Data Validation for Snowflake


Migrating to Snowflake Cloud Data Warehouse

Migrating from a legacy data warehouse such as Netezza to a cloud-based Snowflake data warehouse involves multiple steps, and data validation is key to the success of such migration projects. Datagaps Data Flow can be used to validate data at each step of the migration as well as in end-to-end data validation scenarios.

Step 1: Extract data from Legacy data warehouse

Data is typically extracted into CSV or Parquet format and moved to a landing zone in AWS S3. Depending on data volumes, AWS offers multiple options for moving the files to S3. Once the data has been moved to S3, validations need to be performed to ensure that all of the data was properly extracted and migrated. Since there is little transformation in this step, these tests are typically one-to-one comparisons between the tables in the legacy data warehouse and the files in the S3 landing zone.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
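The checks above can be sketched in plain Python. This is a minimal illustration, not the Data Flow implementation: the sample rows, file contents, and function names are assumptions, and in practice the source rows would come from a JDBC query while the CSV would be read from S3.

```python
import csv
import hashlib
import io

def row_fingerprints(rows):
    """Return a row count and a set of per-row hashes for a dataset.

    Hashing each row lets us compare completeness and values without
    assuming both sides return rows in the same order.
    """
    hashes = set()
    count = 0
    for row in rows:
        # Normalize each field to a stripped string before hashing so
        # trivial whitespace differences do not cause false mismatches.
        key = "|".join(str(field).strip() for field in row)
        hashes.add(hashlib.sha256(key.encode("utf-8")).hexdigest())
        count += 1
    return count, hashes

def compare_extract(source_rows, csv_text):
    """Compare rows fetched from the legacy warehouse against the
    contents of the CSV file landed in S3."""
    src_count, src_hashes = row_fingerprints(source_rows)
    tgt_count, tgt_hashes = row_fingerprints(csv.reader(io.StringIO(csv_text)))
    return {
        "row_counts_match": src_count == tgt_count,
        "missing_in_target": src_hashes - tgt_hashes,
        "unexpected_in_target": tgt_hashes - src_hashes,
    }

# Hypothetical data: two rows extracted from the warehouse vs. the landed file.
source = [(1, "Alice"), (2, "Bob")]
landed = "1,Alice\n2,Bob\n"
result = compare_extract(source, landed)
print(result["row_counts_match"])  # True when the extract is complete
```

Comparing hashes rather than raw rows keeps memory per row small and makes the check order-independent, which matters because extracts rarely preserve row order.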

A sample test case diagram is shown to the right. The JDBC component can be used to read data from the legacy data warehouse, and the File component can be used to read data from AWS S3. Finally, the Data Compare component can be used to compare the two datasets. Sample output for a Data Compare component is shown below.

Data comparison test case
Output of data comparison

Step 2: Transform data

Transformations such as data type conversions can be performed in this step. Data curation can also be done to improve data quality before the data is loaded into Snowflake. Before curating the data, it is important to profile it and run data quality tests to identify any quality issues. Data Flow can be used to perform these tasks.

– Compare data between landing zone and staging (curated) zone in S3
– Use Data Profile and Data Rules components to identify data quality issues
– Curate data and sync to the staging zone 
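To illustrate the kind of statistics a profiling step produces, here is a minimal Python sketch. The column names and the "empty string means null" convention are illustrative assumptions; the Data Profile component's actual metrics are richer than this.

```python
import csv
import io

def profile(csv_text):
    """Compute simple per-column statistics: null count, non-null count,
    and distinct-value count. Empty strings are treated as nulls, a
    common convention for CSV extracts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stats = {}
    for row in reader:
        for col, val in row.items():
            s = stats.setdefault(col, {"nulls": 0, "non_nulls": 0, "distinct": set()})
            if val is None or val.strip() == "":
                s["nulls"] += 1
            else:
                s["non_nulls"] += 1
                s["distinct"].add(val)
    # Convert the distinct sets into counts for reporting.
    return {c: {"nulls": s["nulls"], "non_nulls": s["non_nulls"],
                "distinct": len(s["distinct"])}
            for c, s in stats.items()}

# Hypothetical landing-zone extract with one missing email value.
data = "id,email\n1,a@x.com\n2,\n3,b@x.com\n"
report = profile(data)
print(report["email"])  # {'nulls': 1, 'non_nulls': 2, 'distinct': 2}
```

Profiles like this make data quality issues (unexpected nulls, low cardinality, duplicates) visible before curation rules are written.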

Data Profile component
Data Rules component

Step 3: Copy data to Snowflake

Assuming that the Snowflake tables have been created, the last step is to copy the data to Snowflake. Use Snowflake's VALIDATE function to validate the loaded data files and identify any errors. Data Flow can be used to compare the data between the staging zone (S3) files and Snowflake after the load.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
– End-to-end data validation (Legacy data warehouse to Snowflake)
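A cheap post-load completeness check compares row counts and a numeric column checksum between the staged file and the rows read back from the loaded table. The sketch below uses hypothetical data and names; in practice the `loaded_rows` side would come from a Snowflake query rather than an in-memory list.

```python
import csv
import io

def summarize(rows, amount_index):
    """Return (row_count, total_of_amount_column) for a row iterator."""
    count, total = 0, 0.0
    for row in rows:
        count += 1
        total += float(row[amount_index])
    return count, round(total, 2)

def check_load(staged_csv, loaded_rows, amount_index=1):
    """Compare aggregates of the staging-zone file against the rows
    read back from the target table after the load."""
    staged = summarize(csv.reader(io.StringIO(staged_csv)), amount_index)
    loaded = summarize(loaded_rows, amount_index)
    return {"staged": staged, "loaded": loaded, "match": staged == loaded}

# Hypothetical staged file and the rows read back after the load.
staged_file = "1001,19.99\n1002,5.00\n"
rows_in_snowflake = [("1001", "19.99"), ("1002", "5.00")]
print(check_load(staged_file, rows_in_snowflake)["match"])  # True
```

Aggregate checks like this are fast because they avoid moving full datasets; full value-level comparison can then be reserved for tables where the aggregates disagree.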

Data Flow can also be used to perform end-to-end data validation in a single test, as shown to the right. A single data flow can compare data between the legacy data warehouse and S3 as well as between the legacy data warehouse and Snowflake.

End-to-end test case

Step 4: Modify reports to use Snowflake

While Snowflake provides JDBC/ODBC drivers and supports most commonly used SQL functions, there are going to be some differences between the way reports are developed and executed in the legacy data warehouse and in Snowflake. Once these changes are made, thorough testing needs to be performed between the reports using the legacy data warehouse and the equivalent reports using Snowflake.

– Compare report data
– Compare report layout
– Compare report performance
– Stress test reports in the new environments by simulating concurrent user loads
– Compare security
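The report-data comparison can be sketched as a cell-by-cell diff of the two reports' tabular output. The report data below is a made-up example; a BI testing tool would also handle layout, formatting, and pagination, which this sketch ignores.

```python
def diff_reports(legacy_rows, snowflake_rows):
    """Return a list of (row_index, col_index, legacy_value, new_value)
    for every cell that differs; an empty list means the reports match."""
    diffs = []
    for i, (old, new) in enumerate(zip(legacy_rows, snowflake_rows)):
        for j, (a, b) in enumerate(zip(old, new)):
            if a != b:
                diffs.append((i, j, a, b))
    # Row-count drift is also a failure, reported as a sentinel entry.
    if len(legacy_rows) != len(snowflake_rows):
        diffs.append(("row_count", len(legacy_rows), len(snowflake_rows), None))
    return diffs

# Hypothetical report outputs: the migrated report differs by one value.
legacy = [("East", 120), ("West", 95)]
migrated = [("East", 120), ("West", 94)]
print(diff_reports(legacy, migrated))  # [(1, 1, 95, 94)]
```

Reporting the exact cell that changed (rather than a pass/fail flag) makes it much easier to trace a regression back to a SQL function or rounding difference introduced during the migration.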

Datagaps BI Validator is a no-code BI testing tool that can help automate all of these tests for the supported BI tools.
