Data Quality in Financial Institutions – Partial Flattening of Mainframe Complex Files


COBOL and Financial Institutions

COBOL (Common Business-Oriented Language) is a programming language that was developed in the 1950s and remains in widespread use today, particularly in the finance and banking sectors. Many financial institutions still rely on COBOL systems to manage their data and processes, even though newer technologies such as Java and Python have largely replaced COBOL in other industries.

One of the reasons that COBOL systems are still in use is that they are extremely stable and reliable. These systems have been in operation for decades and have proven to be effective at handling large volumes of data and transactions. In addition, COBOL systems often include technologies such as VSAM (Virtual Storage Access Method) and copybooks, which help to manage and organize the data in these systems.

However, the hierarchical nature of COBOL and finance datasets can make it difficult to migrate this data to a modern system like Snowflake. In these systems, data is often organized in a tree-like structure with multiple levels of nested records. For example, a financial transaction record might contain multiple account details, each of which might contain multiple transaction details. This hierarchical structure can make it challenging to map the data to a more flat and normalized structure like that used by Snowflake.


To address this challenge, organizations may consider using partial flattening when migrating their data. Partial flattening involves keeping some of the hierarchy in the data while still flattening out other parts. This can be done using DataOps Suite’s Python functions, which allow for more granular control over the data conversion process.
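As a sketch of the idea (illustrative Python only, not the Suite’s actual function; the field names are hypothetical), partial flattening produces one row per entry of a chosen nested level while serializing deeper levels as single columns instead of expanding them into more rows:

```python
import json

def partial_flatten(record, keep_nested):
    """Flatten a nested record into one row per account, keeping the
    nested lists named in `keep_nested` as JSON strings (hierarchy
    preserved) instead of exploding them into additional rows."""
    base = {k: v for k, v in record.items() if not isinstance(v, list)}
    rows = []
    for account in record.get("accounts", []):
        row = dict(base)
        for k, v in account.items():
            # Keep the chosen sub-structure intact as a JSON column.
            row[k] = json.dumps(v) if k in keep_nested else v
        rows.append(row)
    return rows

record = {
    "customer_id": "C-1001",
    "accounts": [
        {"account_no": "A-1",
         "transactions": [{"txn_id": "T-1", "amount": 120.5}]},
        {"account_no": "A-2",
         "transactions": [{"txn_id": "T-2", "amount": -40.0}]},
    ],
}

rows = partial_flatten(record, keep_nested={"transactions"})
# One row per account; transaction details stay nested inside a column.
```

In Snowflake, such a preserved column maps naturally onto a VARIANT column, so the account-to-transactions relationship survives the migration.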

Note that the Suite also works with binary EBCDIC files. In this blog post, we focus on a COBOL system, but the same approach can be applied to binary files.

Partial Flattening vs Complete Flattening

Let’s say that an organization is migrating a financial transaction record from a COBOL system to Snowflake. The transaction record in the COBOL system might have the following structure:


Fig. Base Record
Fig. Partial Flattening
Fig. Complete Flattening
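As an illustration of the kind of record in question (the field names here are hypothetical, standing in for the copybook-defined fields), the hierarchy can be pictured as a nested structure: a base record containing accounts, each containing its own transaction details:

```python
# Hypothetical nested transaction record mirroring the COBOL hierarchy:
# base record -> accounts -> transaction details.
record = {
    "customer_id": "C-1001",
    "branch": "NYC-01",
    "accounts": [
        {
            "account_no": "A-1",
            "transactions": [
                {"txn_id": "T-1", "amount": 120.50},
                {"txn_id": "T-2", "amount": -40.00},
            ],
        },
        {
            "account_no": "A-2",
            "transactions": [
                {"txn_id": "T-3", "amount": 990.00},
            ],
        },
    ],
}
```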

Partial Flattening allows the organization to preserve some of the hierarchical structure of the original data (e.g., the relationship between an account and its transaction details), while still flattening out other parts to make it easier to work with in Snowflake.

Alternatively, the organization could use complete flattening to convert the transaction record. In this case, the entire hierarchical structure of the original data is flattened out, resulting in a more normalized and flat structure. However, this approach may make it more difficult to understand the relationships between different parts of the data, particularly if the data contains multiple levels of hierarchy.

Fig. DataOps Suite Partial Flattening Function
Fig. A Complex JSON Flattened
Fig. Completely Flat File – The complex columns will be further classified into new rows
Fig. Partially Flat File

One of the key differences between partial flattening and complete flattening is the volume of data that is produced. Complete flattening expands every level of hierarchy into rows, so a record with many accounts, each holding many transactions, becomes one row per account–transaction pair. This can produce a significantly larger volume of data. Partial flattening, on the other hand, keeps some of the hierarchy in place as single columns, so fewer rows are generated and the preserved relationships remain directly visible in the data.
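The volume difference is easy to see in a small sketch (plain Python, hypothetical field names): with 4 accounts of 3 transactions each, complete flattening yields one row per account–transaction pair, while partial flattening (keeping transactions nested) yields one row per account:

```python
record = {
    "customer_id": "C-1001",
    "accounts": [
        {"account_no": f"A-{i}",
         "transactions": [{"txn_id": f"T-{i}-{j}"} for j in range(3)]}
        for i in range(4)
    ],
}

# Complete flattening: every nested level expands into rows.
complete = [
    {"customer_id": record["customer_id"],
     "account_no": acct["account_no"], **txn}
    for acct in record["accounts"]
    for txn in acct["transactions"]
]

# Partial flattening: transactions stay nested inside each account row.
partial = [
    {"customer_id": record["customer_id"],
     "account_no": acct["account_no"],
     "transactions": acct["transactions"]}
    for acct in record["accounts"]
]

print(len(complete), len(partial))  # 12 rows vs 4 rows
```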

Data Quality Post Flattening

Once a COBOL file has been converted into related datasets, teams can apply a multitude of traditional test cases that are difficult to implement against the original complex structure.

These take the form of duplicate checks on IDs within a substructure, list-of-values (domain) checks, null checks, character checks, and various other data quality checks. Many of these are available out of the box in Datagaps’ DataOps Suite.
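To illustrate how such rules translate once the data is tabular (plain Python, not the Suite’s zero-code DQ nodes; the column names and valid-value list are hypothetical):

```python
import re
from collections import Counter

rows = [
    {"account_no": "A-1", "txn_id": "T-1", "currency": "USD", "amount": 120.5},
    {"account_no": "A-1", "txn_id": "T-1", "currency": "USD", "amount": 120.5},  # duplicate
    {"account_no": "A-2", "txn_id": "T-3", "currency": "XXX", "amount": None},   # bad domain, null
]

# Duplicate check: txn_id should be unique within each account.
counts = Counter((r["account_no"], r["txn_id"]) for r in rows)
dupes = [key for key, n in counts.items() if n > 1]

# List-of-values (domain) check on the currency column.
valid_currencies = {"USD", "EUR", "GBP"}
bad_lov = [r for r in rows if r["currency"] not in valid_currencies]

# Null check on the amount column.
nulls = [r for r in rows if r["amount"] is None]

# Character check: txn_id must match the expected pattern.
bad_chars = [r for r in rows if not re.fullmatch(r"T-\d+", r["txn_id"])]

print(len(dupes), len(bad_lov), len(nulls), len(bad_chars))  # 1 1 1 0
```

Each of these one-liners would be near-impossible to express directly against the nested mainframe layout, which is exactly why flattening first pays off for data quality work.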

Fig. List of available data rules in this use case
Fig. How a Zero-Code DQ node looks in the Suite
Fig. Results of the Data Rules Check
Conclusion

Overall, the migration of COBOL and finance datasets to a modern system like Snowflake can be a complex and time-consuming process. By using partial flattening and the DataOps Suite’s Python functions, organizations can ensure that their data is accurately and effectively migrated while preserving the hierarchical relationships that matter in these systems. This helps to validate the mainframe datasets and ensure that the data is correctly migrated to the new system, which is critical for maintaining the integrity of the data and ensuring that it is properly understood by users.
