Connecting to Mongo DB using Apache Drill in ETL Validator

Connecting-to-Mongo-DB-using-Apache-Drill-in-ETL-

Over the last few years, we have been working with a number of customers across various industries such as Life Sciences, Financial Services, Higher Education etc.  While the problems related to data testing are similar across industries and JSON and flat files are very common, there are many differences in the file formats.  In few industries, Avro is popular while in others, based on use case, Parquet is more commonly used.

At Datagaps, one of the key challenges is our ability to empower customers to test any of the formats with ease. Of course, we have an option to build native connections for each format but that may not be the best option sometimes.  At Datagaps, we try to leverage open standards, open source frameworks to support our customers. This is where Apache Drill (https://drill.apache.org/) comes into play.

Over the last 6 months, we have been drilling around a bit and absolutely love the speed and flexibility that Apache Drill provides. As of version 3.4.5, we use Drill as the interface between ETL Validator and any of the file formats mentioned above (except flat files since flat files are much more common and deserve native connectors). In this blog, I wanted to take few minutes and explain how easy it is for you to get started with Drill and integrate with ETL Validator.

Drill-ETL-Validator-Image

Assumptions

1. You are running this proof of concept on Windows.

2. You have ETL Validator Complete 3.4.5 or higher installed on the machine.

3. You have Mongo running on the same machine.

Apache Drill Installation

Step 1:  Go to https://drill.apache.org/ and the get the latest version of  Drill. In this blog, we assume version 1.9 of Drill.

Step 2: Using 7z or a similar tool, un-archive the file.

Step 3: Navigate to the bin folder of Apache Drill and then execute the following command “sqlline -u jdbc:drill:zk=local”. This will start the Apache Drill server and now you are ready to define your Mongo database.  Once Drill is started successfully, you should see something like this:

drill_started

Step 4: Launch the web interface to Drill using https://localhost:8047. This interface should be available once the command in step # 3 is run successfully and should look like this:

Drill-admin-ui

Step 5: Navigate to  “Storage” tab and enable the Mongo storage in Apache Drill. Since Mongo is running on the same machine, you do not have to make any changes. Your Mongo setting should look like this:

drill-mongo-storage-config

Setting up Mongo connection in ETL Validator using Drill

Step 1: Launch ETL Validator

Step 2: Create a new “Mongo through Drill” Connection.

  • Name: Can be anything you want.
  • User Name & Password: You can put null/null if there is no username/password. This is a know issue and will be fixed in upcoming versions.
  • Connection Type: Leave it as Auto
  • Host Name: You can provide localhost.
  • Port: Leave the default port on which Drill must be running.
  • Retrieve and select a schema from Mongo.

Step 3: Test the Connection.

Drill-ETL-Validator-Image-1

Now, you are ready to execute Query Comparison or other test plans in ETL Validator. Few sample queries are below. Depending on the complexity of the JSON structure, you may need more time to understand how to get the right query.

  • SELECT * from mongo.test.`addresses`;
  • SELECT  ad.address.building FROM mongo.test.addresses as ad
  • SELECT  ad.address.coord FROM mongo.test.addresses as ad

The above queries assume that the JSON is as follows and these documents are in the mongo.test schema.

{“address”: {“building”: “1007”, “coord”: [-73.856077, 40.848447], “street”: “Morris Park Ave”, “zipcode”: “10462”}, “borough”: “Bronx”, “cuisine”: “Bakery”, “grades”: [{“date”: {“$date”: 1393804800000}, “grade”: “A”, “score”: 2}, {“date”: {“$date”: 1378857600000}, “grade”: “A”, “score”: 6}, {“date”: {“$date”: 1358985600000}, “grade”: “A”, “score”: 10}, {“date”: {“$date”: 1322006400000}, “grade”: “A”, “score”: 9}, {“date”: {“$date”: 1299715200000}, “grade”: “B”, “score”: 14}], “name”: “Morris Park Bake Shop”, “restaurant_id”: “30075445”}

{“address”: {“building”: “469”, “coord”: [-73.961704, 40.662942], “street”: “Flatbush Avenue”, “zipcode”: “11225”}, “borough”: “Brooklyn”, “cuisine”: “Hamburgers”, “grades”: [{“date”: {“$date”: 1419897600000}, “grade”: “A”, “score”: 8}, {“date”: {“$date”: 1404172800000}, “grade”: “B”, “score”: 23}, {“date”: {“$date”: 1367280000000}, “grade”: “A”, “score”: 12}, {“date”: {“$date”: 1336435200000}, “grade”: “A”, “score”: 12}], “name”: “Wendy’S”, “restaurant_id”: “30112340”}

In the next blog, we will show you an example of comparing JSON data with table data using the Query Compare Test Plan. Stay tuned.

Datagaps-logo-1536x406-1

Established in the year 2010 with the mission of building trust in enterprise data & reports. Datagaps provides software for ETL Data Automation, Data Synchronization, Data Quality, Data Transformation, Test Data Generation, & BI Test Automation. An innovative company focused on providing the highest customer satisfaction. We are passionate about data-driven test automation. Our flagship solutions, ETL ValidatorDataFlow, and BI Validator are designed to help customers automate the testing of ETL, BI, Database, Data Lake, Flat File, & XML Data Sources. Our tools support Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, IBM Cognos, etc., data warehousing projects, and BI platforms.  www.datagaps.com 

Related Posts:

Data Quality

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

Synthetic Data

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

ETL Testing

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

BI Validation

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.
Products

product_menu_icon01

DataOps Suite

End-to-End Data Testing Automation

product_menu_icon02

ETL Validator

Automate your Data Reconciliation & ETL/ELT testing

product_menu_icon03

BI Validator

Automate functional regression & performance testing of BI reports

product_menu_icon04

DQ Monitor

Monitor quality of data being Ingested or at rest using DQ rules & AI

product_menu_icon05

Test Data Manager

Maintain data privacy by generating realistic synthetic data using AI

About
Free Trial