How to perform analysis of data in MongoDB using PDI

The demo below shows how to use the Pentaho Data Integration (PDI) tool to perform ad-hoc analysis of data stored in MongoDB and MySQL. The following items are shown (a rough code sketch of steps 1-4 follows the list):

  1. Query data in MongoDB.
  2. Convert the MongoDB JSON document into a basic row of columns using the JSON Input step.
  3. Look up a field from the MongoDB document; the lookup table is stored in MySQL.
  4. Store the document and the lookup data in a table in MySQL.
  5. Perform ad-hoc analysis using the PDI modeler perspective.
  6. Display the results using the PDI visualizer.
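
The PDI transformation itself is built in the Spoon GUI, but the flow of steps 1 through 4 can be approximated in a few lines of code. The sketch below is a rough, hypothetical equivalent in Python using pymongo and mysql-connector-python; the database, collection, table, and column names (demo, orders, customers, orders_flat, customer_id, and so on) are placeholders, not names taken from the demo.

    # Sketch only: approximates steps 1-4 of the demo outside of PDI.
    # All collection, table, and column names below are hypothetical.
    from pymongo import MongoClient
    import mysql.connector

    mongo = MongoClient("mongodb://localhost:27017")
    orders = mongo["demo"]["orders"]

    conn = mysql.connector.connect(host="localhost", user="pdi",
                                   password="secret", database="demo")
    cursor = conn.cursor()

    # 1. Query data in MongoDB.
    for doc in orders.find({"status": "shipped"}):
        # 2. Flatten the JSON document into a basic row of columns.
        row = {
            "order_id": str(doc["_id"]),
            "customer_id": doc.get("customer_id"),
            "total": doc.get("total"),
        }

        # 3. Look up a field; the lookup table lives in MySQL.
        cursor.execute("SELECT name FROM customers WHERE id = %s",
                       (row["customer_id"],))
        match = cursor.fetchone()
        row["customer_name"] = match[0] if match else None

        # 4. Store the document plus the lookup data in a MySQL table.
        cursor.execute(
            "INSERT INTO orders_flat (order_id, customer_id, customer_name, total) "
            "VALUES (%s, %s, %s, %s)",
            (row["order_id"], row["customer_id"], row["customer_name"], row["total"]))

    conn.commit()

Steps 5 and 6 are interactive features of the PDI client (the modeler and visualizer perspectives) and have no direct code equivalent; within the transformation, steps 1 through 4 correspond roughly to the MongoDB Input, JSON Input, Database lookup, and Table output steps.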



Filed under Big Data, MongoDB, PDI, Pentaho

How to get data into MongoDB using PDI

This demo will show how to import data from a CSV file into MongoDB using the Pentaho Data Integration tool (a.k.a. Kettle). The following items will be demonstrated (a short code sketch follows):

  1. Basics of how to map columns from a CSV file to fields in a MongoDB JSON document.
  2. How to handle variable/optional columns.
  3. How to perform basic data scrubbing before loading the data into MongoDB.

Although this demo uses a CSV file as the input data, PDI can just as easily import data from any JDBC-compliant database by using the Table Input step.
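
As a rough illustration of what the transformation does, the sketch below performs the same three tasks in Python with the csv module and pymongo. The file name (contacts.csv), the column names, and the scrubbing rules (trimming whitespace, lower-casing the e-mail address, skipping rows without one) are hypothetical stand-ins for whatever the real data requires.

    # Sketch only: mirrors the CSV -> MongoDB flow described above.
    # File name, column names, and scrubbing rules are hypothetical.
    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    contacts = client["demo"]["contacts"]

    with open("contacts.csv", newline="") as f:
        for row in csv.DictReader(f):
            # 1. Map CSV columns to fields in a MongoDB JSON document,
            #    scrubbing the values (trim whitespace, normalize case) as we go.
            doc = {
                "name": row["name"].strip(),
                "email": row["email"].strip().lower(),
            }

            # 2. Handle variable/optional columns: only add the field when present.
            phone = (row.get("phone") or "").strip()
            if phone:
                doc["phone"] = phone

            # 3. Basic scrubbing: drop rows that fail a simple sanity check.
            if not doc["email"]:
                continue

            contacts.insert_one(doc)

In PDI itself this corresponds roughly to a CSV file input step feeding a MongoDB Output step, with the scrubbing handled by intermediate steps such as String operations or Filter rows.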


Filed under Big Data, MongoDB, PDI, Pentaho