Data Validation for Snowflake

Migrating to Snowflake Cloud Data Warehouse

Migration from a legacy data warehouse such as Netezza to a cloud-based Snowflake data warehouse requires multiple steps, and data validation is key to the success of a data migration project. Datagaps DataFlow can be used to validate data at each step of the migration as well as in end-to-end data validation scenarios. If you are looking for Snowflake testing tools – try DataFlow free for 14 days

Step 1: Extract data from the legacy data warehouse

Data is typically extracted into CSV or Parquet format and moved to a landing zone in AWS S3. Depending on the data volumes, AWS offers multiple options for moving the files to S3. Once the data has been moved, validations need to be performed to ensure that all the data was properly extracted and landed in S3. Since there are few transformations in this step, these tests are typically one-to-one comparisons between the tables in the legacy data warehouse and the files in the AWS S3 landing zone.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values

A sample test case diagram is shown below. The JDBC component can be used to read data from the legacy data warehouse, the File component to read data from AWS S3, and the Data Compare component to compare the two datasets. Sample output from the Data Compare component is also shown below.

Data comparison test case

The output of data comparison
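For teams scripting a quick check outside DataFlow, the sketch below illustrates the idea behind the table-to-file comparison: count and aggregate the rows in the legacy table, then compare them against the extracted file in S3. It assumes pandas with s3fs and a DB-API driver such as pyodbc for the legacy warehouse; the DSN, bucket, table, and column names are hypothetical placeholders.

```python
# Minimal sketch of a table-to-file comparison outside DataFlow.
# Assumes a DB-API driver for the legacy warehouse (pyodbc here) and
# pandas + s3fs for reading the extracted CSV from S3.
# The DSN, bucket, table, and column names are hypothetical.
import pyodbc
import pandas as pd

# Row count on the legacy warehouse side
conn = pyodbc.connect("DSN=legacy_dsn")  # hypothetical DSN
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM SALES")
source_count = cur.fetchone()[0]

# Row count on the S3 side (one extracted file shown for brevity)
s3_df = pd.read_csv("s3://sales-landing-zone/SALES/part_0001.csv")  # hypothetical path
target_count = len(s3_df)

print(f"source rows={source_count}, file rows={target_count}")
assert source_count == target_count, "Row counts do not match"

# Spot-check data values: compare an aggregate on a numeric column
cur.execute("SELECT SUM(AMOUNT) FROM SALES")
source_sum = cur.fetchone()[0]
assert abs(float(source_sum) - float(s3_df["AMOUNT"].sum())) < 1e-6, "Value mismatch on AMOUNT"
```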

Step 2: Transform data

Transformations such as data type conversions can be performed in this step. Data curation can also be done to improve data quality before the data is loaded into Snowflake. Before curating the data, it is important to profile it and run data quality tests to identify data quality issues. DataFlow can be used to perform these tasks.

– Compare data between landing zone and staging (curated) zone in S3
– Use Data Profile and Data Rules components to identify data quality issues
– Curate data and sync to the staging zone

Data Profile component

Data Rules component
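As a rough illustration of what the Data Profile and Data Rules components automate, the sketch below profiles an extracted file and evaluates a few simple rules with pandas. The bucket path, column names, and rules are hypothetical placeholders.

```python
# Minimal sketch of profiling and rule checks on the curated data,
# roughly what the Data Profile and Data Rules components automate.
# Assumes pandas + s3fs; path, column names, and rules are hypothetical.
import pandas as pd

df = pd.read_csv("s3://sales-staging-zone/SALES/part_0001.csv")  # hypothetical path

# Profile: null percentage, distinct counts, and min/max per column
profile = pd.DataFrame({
    "null_pct": df.isna().mean() * 100,
    "distinct": df.nunique(),
    "min": df.min(numeric_only=True),
    "max": df.max(numeric_only=True),
})
print(profile)

# Simple data rules, evaluated before the data is loaded into Snowflake
rules = {
    "ORDER_ID is unique": df["ORDER_ID"].is_unique,
    "AMOUNT is non-negative": (df["AMOUNT"] >= 0).all(),
    "ORDER_DATE is populated": df["ORDER_DATE"].notna().all(),
}
for rule, passed in rules.items():
    print(f"{'PASS' if passed else 'FAIL'}: {rule}")
```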

Step 3: Copy data to Snowflake

Assuming that the Snowflake tables have been created, the next step is to copy the data into Snowflake. Snowflake's VALIDATE function can be used to check the loaded data files and identify any errors. DataFlow can be used to compare the data between the staging zone (S3) files and Snowflake after the load.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
– End-to-end data validation (Legacy data warehouse to Snowflake)

DataFlow can be used to perform end-to-end data validation in a single test, as shown below. A single DataFlow can compare data between the legacy data warehouse and S3 as well as between the legacy data warehouse and Snowflake.

End-to-end test case
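The sketch below illustrates this step with the snowflake-connector-python package: copy the staged files into the target table, run Snowflake's VALIDATE function to surface load errors, and pull a row count for comparison against the source. The connection parameters, stage, and table names are hypothetical placeholders.

```python
# Minimal sketch of loading from the staging zone and checking the load
# with snowflake-connector-python. Stage, table, and connection
# parameters are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Copy the curated files from the external stage into the target table
cur.execute("COPY INTO SALES FROM @staging_zone/SALES/ FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")

# VALIDATE returns rows that failed to load in the most recent COPY into SALES
cur.execute("SELECT * FROM TABLE(VALIDATE(SALES, JOB_ID => '_last'))")
errors = cur.fetchall()
print(f"{len(errors)} load errors found")

# Row-count check, to be compared against the staging zone and legacy source counts
cur.execute("SELECT COUNT(*) FROM SALES")
print(f"Snowflake rows: {cur.fetchone()[0]}")
```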

Step 4: Modify reports to use Snowflake

While Snowflake provides JDBC/ODBC drivers and supports most commonly used SQL functions, there will be some differences in how reports are developed and executed against the legacy data warehouse versus Snowflake. Once these changes are made, thorough testing needs to be performed to compare the reports running against the legacy data warehouse with the equivalent reports running against Snowflake.

– Compare report data
– Compare report layout
– Compare report performance
– Stress test reports in the new environments by simulating concurrent user loads
– Compare security

Datagaps BI Validator is a no-code BI testing tool that can help automate all these tests for the supported BI tools.
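For illustration, the sketch below simulates a concurrent user load against the query behind a report using Python's standard thread pool and Snowflake's Python connector; this is the kind of stress test that BI Validator automates at the report level. The query and connection parameters are hypothetical placeholders.

```python
# Minimal sketch of a concurrent-user stress test against the query
# behind a report. Connection parameters and the query are hypothetical;
# BI Validator performs this kind of test at the report level.
import time
from concurrent.futures import ThreadPoolExecutor
import snowflake.connector

REPORT_QUERY = "SELECT REGION, SUM(AMOUNT) FROM SALES GROUP BY REGION"

def run_report(user_id: int) -> float:
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***",
        warehouse="BI_WH", database="ANALYTICS", schema="PUBLIC",
    )
    start = time.perf_counter()
    conn.cursor().execute(REPORT_QUERY).fetchall()
    conn.close()
    return time.perf_counter() - start

# Simulate 20 concurrent report users and summarize response times
with ThreadPoolExecutor(max_workers=20) as pool:
    timings = list(pool.map(run_report, range(20)))

print(f"avg={sum(timings)/len(timings):.2f}s  max={max(timings):.2f}s")
```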

Try BI Validator free for 14 days for your Snowflake BI testing needs – Download Now


Established in 2010 with the mission of building trust in enterprise data and reports, Datagaps provides software for ETL data automation, data synchronization, data quality, data transformation, test data generation, and BI test automation. We are passionate about data-driven test automation and focused on delivering the highest customer satisfaction. Our flagship solutions, ETL Validator, DataFlow, and BI Validator, are designed to help customers automate the testing of ETL, BI, database, data lake, flat file, and XML data sources. Our tools support data warehousing projects and BI platforms including Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, IBM Cognos, and more. www.datagaps.com
