Data Quality Monitoring for Data Lakes with an ML Data Catalog
The quality of data inside an enterprise application or database can be inspected and audited with legacy data quality approaches. But what about data in a data lake? That data arrives as files, may be semi-structured, and often comes from third parties. Its quality can change unexpectedly: an upstream vendor or application changes the format, an upstream data pipeline introduces bad data, or a machine generating the data fails. Such unexpected changes can cause data pipeline failures or feed inaccurate data into analytics.
DvSum’s ML Data Catalog, with its active scan capability, can automatically detect unexpected changes in data volumes, data formats, and data distributions, and alert the relevant data teams to take action. The active scan can also be integrated into data pipelines through APIs, so that data quality issues are caught before they cause pipeline failures.
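To make the three kinds of checks concrete, here is a minimal Python sketch of what volume, format, and distribution checks on an incoming batch might look like. It is not DvSum’s implementation or API: the function names (`volume_anomaly`, `schema_drift`, `mean_shift`, `check_batch`), the z-score thresholds, and the use of a simple mean comparison as a stand-in for distribution monitoring are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def volume_anomaly(history, current, z_threshold=3.0):
    """Return True if the current row count is far from the historical counts."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def schema_drift(expected_columns, records):
    """Compare the field names seen in the records with the expected ones."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    expected = set(expected_columns)
    return {"missing": expected - seen, "unexpected": seen - expected}


def mean_shift(baseline, current, z_threshold=3.0):
    """Crude distribution check: has the batch mean moved away from the baseline mean?"""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(current) != mu
    standard_error = sigma / sqrt(len(current))
    return abs(mean(current) - mu) / standard_error > z_threshold


def check_batch(records, expected_columns, row_count_history,
                numeric_column, baseline_values):
    """Run all three checks and return a list of human-readable issues."""
    issues = []
    if volume_anomaly(row_count_history, len(records)):
        issues.append(f"unexpected volume: {len(records)} rows")
    drift = schema_drift(expected_columns, records)
    if drift["missing"] or drift["unexpected"]:
        issues.append(
            f"format change: missing={sorted(drift['missing'])}, "
            f"unexpected={sorted(drift['unexpected'])}"
        )
    values = [r[numeric_column] for r in records if numeric_column in r]
    if values and mean_shift(baseline_values, values):
        issues.append(f"distribution shift in '{numeric_column}'")
    return issues
```

A pipeline step could call `check_batch` on each newly landed file and halt the load, or notify the owning team, whenever the returned list is not empty. A production system would learn its baselines and thresholds from history rather than hard-coding them as this sketch does.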
Watch the 3-minute tutorial to see it in action.
Schedule a demo today
- Establish a common data understanding
- Accelerate time to value from data
- Enable frictionless and compliant access to data