Datagaps Data Validation and Migration To Snowflake

Validating_Data_Migration_to_Snowflake-281
This article discusses why and how to use both together, and dives into the challenges of Bulk Data Migration to Snowflake.
Why and How?
People are rapidly adopting cloud architectures for Data Warehouses and Machine Learning projects due to the economies of scale in the cloud. One obstacle in achieving rapid success is the data and data types inconsistencies between on-premise structures and the modern data stacks in the cloud.
When you migrate vast amounts of data to the cloud, the opportunity to introduce mistakes is multiplied due to these reasons and others. The earlier you catch the issues, the less costly it is to resolve the discrepancies.
This is where Datagaps come in to play.
database-and-Snowflake We provide the ability to test millions & even billions of records between source and targets structures of different types such as your on premises database and Snowflake.
This article describes ways in which we can test the data movement between Snowflake and on-premises data and between instances of Snowflake itself. We will also provide some benchmarks for doing comparison testing for large volumes of data moved into Snowflake. Another benefit of the Datagaps approach is, as additional data is moved into the Snowflake structure from other sources, Datagaps provides the ability to monitor the quality of your data structures by continually providing an up to date scoring so that you can determine when data is becoming corrupted.
Benefits of using Datagaps to test data movement into Snowflake

We recently sat down with one of our clients that uses our DataFlow product for testing the data migration from on-Prem SQL Server to Snowflake in the cloud running in AWS.

Datagaps-Snowflake
Implementation

They started the initial migration by performing bulk loads from 400 SQL Server tables to Snowflake with minimal transformations. This was stage 0, where they could perform source and target data comparison for over 500 million rows of data per table. Making use of the Data Migration wizard, the client was able to generate comparison tests for 400 tables in just a few hours. Even though there were few changes at this stage, they still encountered errors that were surfaced by our
DataFlow product.

Examples: Issues include numeric precision differences, null value inconsistencies, truncation issues and character interpolations.
Datagaps-Snowflake
Next, they began to perform incremental new data migrations where they continued to find similar issues that had to be corrected. As this continued, they wanted to transition from this incremental new data migration from SQL Server to loading the new data directly into Snowflake to reap the benefits stated earlier. To accomplish this, their initial ETL processes needed to be migrated to an ELT process aimed at Snowflake.

DataFlow was used once again to check the accuracy between the two systems once the new processes were in place. The validations exposed issues in the new ELT process through several iterations until the transformation were in sync. After a short period of testing, they could cut over to the new system and deprecate the SQL server environment. Now DataFlow continued to validate the incremental data as it was moved into Snowflake, finding issues earlier in the cycle where they are less costly to fix in time and lost credibility.

How it makes a difference?
Many of our clients take this one step further by testing their BI tools against the old warehouse and the new Snowflake implementation. They do regression, performance, and stress testing using our BI Validator tool to compare the old with the new. They can find differences in the look and feel in the output generation of the reports and dashboards. Differences are exposed between the report queries when compared to a database query. Often this is the last task necessary to validate the migration process.
Making use of the inbuilt cluster integration with AWS EMR in the DataOps suite, the client was able to automatically scale the EMR cluster on demand to over 30 nodes depending on the data volumes being compared and scale down once the testing has been completed. This capability helped reduce the cost of the testing while still achieving high performance when comparing tables of size 500 Million Records

0%

Reduction in Testing
Time by

0%

Improved Testing ROI by

Amount of data tested increased from manual testing of 10,000 sample records to complete testing of 500 M.

Datagaps-Snowflake_Banner
Conclusion

In conclusion, the goals of the migration project of agility, cost savings and performance improvements were achieved.

They also realized these benefits months earlier as a result of the improvement in the migration process due to the impact of the DataFlow products contribution in an estimated 50% test cycle reduction.

One of our clients reports comparing a file against a Snowflake instance with

0 0
Billion Records
0
Columns
0
Node EMR Cluster
0
Hours
Datagaps-logo-1536x406-1

Established in the year 2010 with the mission of building trust in enterprise data & reports. Datagaps provides software for ETL Data Automation, Data Synchronization, Data Quality, Data Transformation, Test Data Generation, & BI Test Automation. An innovative company focused on providing the highest customer satisfaction. We are passionate about data-driven test automation. Our flagship solutions, ETL ValidatorDataFlow, and BI Validator are designed to help customers automate the testing of ETL, BI, Database, Data Lake, Flat File, & XML Data Sources. Our tools support Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, IBM Cognos, etc., data warehousing projects, and BI platforms.  www.datagaps.com 

Related Posts:

Data Quality

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

Synthetic Data

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

ETL Testing

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.

BI Validation

Automate testing of Business Intelligence applications by making use of the metadata available from the BI tools such as Tableau, OBIEE, and Business Objects.
Products
product_menu_icon01

DataOps Suite

End-to-End Data Testing Automation

product_menu_icon02

ETL Validator

Automate your Data Reconciliation & ETL/ELT testing

product_menu_icon03

BI Validator

Automate functional regression & performance testing of BI reports

product_menu_icon04

DQ Monitor

Monitor quality of data being Ingested or at rest using DQ rules & AI

product_menu_icon05

Test Data Manager

Maintain data privacy by generating realistic synthetic data using AI

About
Free Trail