Data Quality in Financial Institutions – Partial Flattening of Mainframe Complex Files


COBOL and Financial Institutions

COBOL (Common Business-Oriented Language) is a programming language that was developed in the 1950s and remains in widespread use today, particularly in the finance and banking sectors. Many financial institutions still rely on COBOL systems to manage their data and processes, even though newer technologies such as Java and Python have largely replaced COBOL in other industries.

One of the reasons that COBOL systems are still in use is that they are extremely stable and reliable. These systems have been in operation for decades and have proven to be effective at handling large volumes of data and transactions. In addition, COBOL systems often include technologies such as VSAM (Virtual Storage Access Method) and copybooks, which help to manage and organize the data in these systems.

However, the hierarchical nature of COBOL and finance datasets can make it difficult to migrate this data to a modern system like Snowflake. In these systems, data is often organized in a tree-like structure with multiple levels of nested records. For example, a financial transaction record might contain multiple account details, each of which might contain multiple transaction details. This hierarchical structure can make it challenging to map the data to a more flat and normalized structure like that used by Snowflake.


To address this challenge, organizations may consider using partial flattening when migrating their data. Partial flattening involves keeping some of the hierarchy in the data while still flattening out other parts. This can be done using DataOps Suite’s Python functions, which allow for more granular control over the data conversion process.
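As a sketch of the idea (illustrative Python only, not the Suite’s actual function; the field names are hypothetical), partial flattening produces one row per entry of a chosen nested level while serializing deeper levels as single columns instead of expanding them into more rows:

```python
import json

def partial_flatten(record, keep_nested):
    """Flatten a nested record into one row per account, keeping the
    nested lists named in `keep_nested` as JSON strings (hierarchy
    preserved) instead of exploding them into additional rows."""
    base = {k: v for k, v in record.items() if not isinstance(v, list)}
    rows = []
    for account in record.get("accounts", []):
        row = dict(base)
        for k, v in account.items():
            # Keep the chosen sub-structure intact as a JSON column.
            row[k] = json.dumps(v) if k in keep_nested else v
        rows.append(row)
    return rows

record = {
    "customer_id": "C-1001",
    "accounts": [
        {"account_no": "A-1",
         "transactions": [{"txn_id": "T-1", "amount": 120.5}]},
        {"account_no": "A-2",
         "transactions": [{"txn_id": "T-2", "amount": -40.0}]},
    ],
}

rows = partial_flatten(record, keep_nested={"transactions"})
# One row per account; transaction details stay nested inside a column.
```

In Snowflake, such a preserved column maps naturally onto a VARIANT column, so the account-to-transactions relationship survives the migration.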

Note that the Suite also works with binary EBCDIC files. In this blog post, we focus on a COBOL system, but the same approach can be applied to binary files.

Partial Flattening vs Complete Flattening

Let’s say that an organization is migrating a financial transaction record from a COBOL system to Snowflake. The transaction record in the COBOL system might have the following structure:


Fig. Base Record
Fig. Partial Flattening
Fig. Complete Flattening
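As an illustration of the kind of record in question (the field names here are hypothetical, standing in for the copybook-defined fields), the hierarchy can be pictured as a nested structure: a base record containing accounts, each containing its own transaction details:

```python
# Hypothetical nested transaction record mirroring the COBOL hierarchy:
# base record -> accounts -> transaction details.
record = {
    "customer_id": "C-1001",
    "branch": "NYC-01",
    "accounts": [
        {
            "account_no": "A-1",
            "transactions": [
                {"txn_id": "T-1", "amount": 120.50},
                {"txn_id": "T-2", "amount": -40.00},
            ],
        },
        {
            "account_no": "A-2",
            "transactions": [
                {"txn_id": "T-3", "amount": 990.00},
            ],
        },
    ],
}
```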

Partial Flattening allows the organization to preserve some of the hierarchical structure of the original data (e.g., the relationship between an account and its transaction details), while still flattening out other parts to make it easier to work with in Snowflake.

Alternatively, the organization could use complete flattening to convert the transaction record. In this case, the entire hierarchical structure of the original data is flattened out, resulting in a more normalized and flat structure. However, this approach may make it more difficult to understand the relationships between different parts of the data, particularly if the data contains multiple levels of hierarchy.

Fig. DataOps Suite Partial Flattening Function
Fig. A Complex JSON Flattened
Fig. Completely Flat File – The complex columns will be further classified into new rows
Fig. Partially Flat File

One of the key differences between partial flattening and complete flattening is the volume of data that is produced. Complete flattening expands every level of hierarchy into rows, so a record with many accounts, each holding many transactions, becomes one row per account–transaction pair. This can produce a significantly larger volume of data. Partial flattening, on the other hand, keeps some of the hierarchy in place as single columns, so fewer rows are generated and the preserved relationships remain directly visible in the data.
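The volume difference is easy to see in a small sketch (plain Python, hypothetical field names): with 4 accounts of 3 transactions each, complete flattening yields one row per account–transaction pair, while partial flattening (keeping transactions nested) yields one row per account:

```python
record = {
    "customer_id": "C-1001",
    "accounts": [
        {"account_no": f"A-{i}",
         "transactions": [{"txn_id": f"T-{i}-{j}"} for j in range(3)]}
        for i in range(4)
    ],
}

# Complete flattening: every nested level expands into rows.
complete = [
    {"customer_id": record["customer_id"],
     "account_no": acct["account_no"], **txn}
    for acct in record["accounts"]
    for txn in acct["transactions"]
]

# Partial flattening: transactions stay nested inside each account row.
partial = [
    {"customer_id": record["customer_id"],
     "account_no": acct["account_no"],
     "transactions": acct["transactions"]}
    for acct in record["accounts"]
]

print(len(complete), len(partial))  # 12 rows vs 4 rows
```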

Data Quality Post Flattening

Once a COBOL file has been converted into related datasets, teams can apply a multitude of traditional test cases that are difficult to implement against the original complex structure.

These take the form of duplicate checks on IDs within a substructure, list-of-values (domain) checks, null checks, character checks, and various other data quality checks. Many of these are available out of the box in Datagaps’ DataOps Suite.
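To illustrate how such rules translate once the data is tabular (plain Python, not the Suite’s zero-code DQ nodes; the column names and valid-value list are hypothetical):

```python
import re
from collections import Counter

rows = [
    {"account_no": "A-1", "txn_id": "T-1", "currency": "USD", "amount": 120.5},
    {"account_no": "A-1", "txn_id": "T-1", "currency": "USD", "amount": 120.5},  # duplicate
    {"account_no": "A-2", "txn_id": "T-3", "currency": "XXX", "amount": None},   # bad domain, null
]

# Duplicate check: txn_id should be unique within each account.
counts = Counter((r["account_no"], r["txn_id"]) for r in rows)
dupes = [key for key, n in counts.items() if n > 1]

# List-of-values (domain) check on the currency column.
valid_currencies = {"USD", "EUR", "GBP"}
bad_lov = [r for r in rows if r["currency"] not in valid_currencies]

# Null check on the amount column.
nulls = [r for r in rows if r["amount"] is None]

# Character check: txn_id must match the expected pattern.
bad_chars = [r for r in rows if not re.fullmatch(r"T-\d+", r["txn_id"])]

print(len(dupes), len(bad_lov), len(nulls), len(bad_chars))  # 1 1 1 0
```

Each of these one-liners would be near-impossible to express directly against the nested mainframe layout, which is exactly why flattening first pays off for data quality work.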

Fig. List of available data rules in this use case
Fig. How a Zero-Code DQ node looks in the Suite
Fig. Results of the Data Rules Check
Conclusion

Overall, the migration of COBOL and finance datasets to a modern system like Snowflake can be a complex and time-consuming process. By using partial flattening and the DataOps Suite’s Python functions, organizations can ensure that their data is accurately and effectively migrated while preserving the hierarchical relationships that matter in these systems. This helps to validate the mainframe datasets and ensure that the data is correctly migrated to the new system, which is critical for maintaining the integrity of the data and ensuring that it is properly understood by users.
