Data Quality Monitoring for Data Lakes with an ML Data Catalog
The quality of data inside an enterprise application or database can be inspected and audited with legacy data quality approaches. But what about data in a data lake? That data arrives as files, is often semi-structured, and frequently comes from third parties. Its quality can change unexpectedly: an upstream vendor or application changes the file format, an upstream data pipeline introduces bad data, or a machine that generates the data fails. Such unexpected changes can cause data pipeline failures or feed inaccurate data into analytics.
DvSum’s ML Data Catalog with its active scan capability can automatically detect unexpected changes in data volumes, data formats, and data distribution and alert the relevant data teams to take action. The active scan can also be integrated through APIs into data pipelines to check for data quality issues and prevent data pipeline failures.
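DvSum's active scan internals and API are not described on this page, so the following is only a generic illustration of the three kinds of checks mentioned above (volume, format, distribution) and of using them as a gate inside a pipeline. It is a minimal sketch, assuming each landed file has been parsed into a list of dicts. The names `profile`, `detect_changes`, `quality_gate` and `DataQualityError`, the 50% volume tolerance, and the "mean moved by more than 3 baseline standard deviations" rule are all hypothetical choices for this example, not DvSum functionality.

```python
# Hypothetical illustration only; this is not DvSum's API.
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Profile:
    """Summary of one batch of records (e.g. one file landed in the lake)."""
    row_count: int
    columns: frozenset
    numeric_stats: dict = field(default_factory=dict)  # column -> (mean, std)


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile(rows):
    """Profile volume (row count), format (column names) and numeric distributions."""
    columns = frozenset().union(*(row.keys() for row in rows))
    stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat a column as numeric only if every row holds a number
        if values and all(_is_number(v) for v in values):
            stats[col] = (mean(values), pstdev(values))
    return Profile(len(rows), columns, stats)


def detect_changes(baseline, current, volume_tolerance=0.5, z_threshold=3.0):
    """Compare a new batch profile against a baseline; return a list of issues."""
    issues = []

    # 1. Volume: relative change in row count beyond the tolerance
    if baseline.row_count:
        change = abs(current.row_count - baseline.row_count) / baseline.row_count
        if change > volume_tolerance:
            issues.append(f"volume: {baseline.row_count} -> {current.row_count} rows")

    # 2. Format: columns that disappeared or appeared
    missing = baseline.columns - current.columns
    added = current.columns - baseline.columns
    if missing:
        issues.append(f"format: missing columns {sorted(missing)}")
    if added:
        issues.append(f"format: new columns {sorted(added)}")

    # 3. Distribution: mean shift measured in baseline standard deviations
    for col, (base_mean, base_std) in baseline.numeric_stats.items():
        if col not in current.columns:
            continue  # already reported as missing
        if col not in current.numeric_stats:
            issues.append(f"format: column {col!r} is no longer numeric")
            continue
        cur_mean = current.numeric_stats[col][0]
        if base_std == 0:
            shifted = cur_mean != base_mean
        else:
            shifted = abs(cur_mean - base_mean) / base_std > z_threshold
        if shifted:
            issues.append(
                f"distribution: mean of {col!r} moved from {base_mean:.2f} to {cur_mean:.2f}"
            )
    return issues


class DataQualityError(Exception):
    """Raised by the gate so downstream steps never run on a bad batch."""


def quality_gate(baseline, rows):
    """Pipeline step: stop the run if the new batch drifts from the baseline."""
    issues = detect_changes(baseline, profile(rows))
    if issues:
        raise DataQualityError("; ".join(issues))
    return rows


if __name__ == "__main__":
    baseline = profile([{"id": i, "amount": 100.0 + i % 5} for i in range(100)])
    # Simulated upstream change: "amount" renamed to "amt" and far fewer rows
    bad_batch = [{"id": i, "amt": 100.0} for i in range(10)]
    try:
        quality_gate(baseline, bad_batch)
    except DataQualityError as err:
        print("blocked:", err)
```

In a pipeline, a gate like `quality_gate` would run before the load step, so an exception stops the run instead of letting bad data reach downstream tables. A production system would also store baselines per dataset and use more robust distribution tests; this sketch leaves both out.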
Watch the 3-minute tutorial to see it in action.
Unlock the full potential of your data with DvSum
- Create a unified view of your entire data landscape on Day 1
- Streamline data governance with automatic data classification and enrichment
- Improve data accuracy with integrated data quality and cleansing
- Empower business users to get data insights with no-code self-service data exploration