Data Mesh is a strategic, decentralized approach to managing data that moves away from the traditional view of data as a raw source of information or a by-product of a process, and instead treats data as a product in its own right. Treating data as a product means that subject matter experts, data domain owners, and data producers are responsible for the quality, documentation, contextualization, and accessibility of their data.
Data Mesh Benefits
Orchestration in the right hands: With data ownership decentralized among the people working closest to the data, e.g. subject matter experts and cross-functional domain teams, Data Mesh builds accountability with the right set of people and reduces the communication gap between data producers and data consumers.
Brings in agility: Because Data Mesh is built on the key principle of domain-oriented, decentralized data ownership and architecture, it focuses on on-time delivery and product quality, reducing time to market and bringing agility to the process.
Get more value from data: Treating data as a product ensures that data has the right owners, who are responsible for its quality and trustworthiness. It also ensures that data is easily accessible and discoverable, with metadata and other semantics readily available, and better secured with the right policies and security measures.
Need for Data Governance in Data Mesh
While Data Mesh has its own set of benefits, managing data as data products across different domain data owners also introduces complexities and challenges, which can be resolved with the right data governance approach.
Domains and Data become Siloed
When each domain team manages data in its own way, data can remain in silos and fail to match the standards set by the organization. Management of the data is restricted to the particular domain and isolated from the rest of the organization. This hurts discoverability and contextualization, with data replicated across multiple places and difficult to understand. In addition, each domain's quality standards, rules, and metrics may not match the global standards, leading to inconsistent data quality.
Integration with New and Different Data Platforms
With the power previously held by an expert central team moving to domain teams, the effort required to integrate and exploit data from new data platforms will increase. It is fair to say that domain teams need the right data catalog, one covering the entire data stack, to support extracting, integrating, cleansing, and centralizing all information in a single place for the benefit of both producers and consumers.
Difficult to have consistent and trustworthy Data
Different domains have different data quality requirements and rules, and hence may maintain different methodologies and frameworks for dealing with data quality issues. This can lead to mismatched data quality expectations across domains for certain sets of data, so it is important to have a federated way of mitigating data quality issues. For example, the central team can devise a framework and a set of rules and standards for key data sets, which is then propagated to the domains to maintain the required level of quality across all of them.
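The federated pattern described above can be sketched in a few lines of Python: a central team publishes one shared set of rules, and each domain runs the same rules against its own records. The rule and field names here are hypothetical, purely for illustration.

```python
import re

# Centrally defined standard rules for key data sets (hypothetical names).
CENTRAL_RULES = {
    "customer_email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", str(v)) is not None,
    "order_total":    lambda v: isinstance(v, (int, float)) and v >= 0,
}

def apply_rules(records, rules):
    """Return per-rule failure counts for one domain's records."""
    failures = {name: 0 for name in rules}
    for record in records:
        for field, check in rules.items():
            if field in record and not check(record[field]):
                failures[field] += 1
    return failures

# Each domain applies the same central rules to its own data.
sales_records = [
    {"customer_email": "a@example.com", "order_total": 19.99},
    {"customer_email": "not-an-email",  "order_total": -5},
]
print(apply_rules(sales_records, CENTRAL_RULES))
# {'customer_email': 1, 'order_total': 1}
```

Because every domain imports the same `CENTRAL_RULES`, quality expectations stay consistent while execution remains federated.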
How DvSum can support Data Governance in Data Mesh
Covers the entire data and analytics stack
DvSum Agile Data Catalog covers your entire data stack: data lakes, streaming data, databases, analytics platforms, and your BI layer. It covers your hybrid cloud, whether your data is on-premise, in a private cloud, or in AWS, Azure, or GCP. It catalogs your structured data such as tables, CSV files, and compressed data files like Parquet; semi-structured data like JSON or DocumentDB; and unstructured data like media files. Since a Data Mesh involves multiple data products across different domains connecting to various data sources, DvSum can connect to each of them and centralize information in a single place to support the producers and consumers utilizing data.
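To make the idea of centralizing heterogeneous assets concrete, here is a minimal sketch of a single catalog that registers assets from many sources and types. The source names and fields are invented for illustration; this is not DvSum's actual API.

```python
# One central catalog shared by all domains (illustrative only).
catalog = []

def register_asset(source, asset_type, name, fields=None):
    """Record one data asset, whatever system or format it lives in."""
    catalog.append({
        "source": source,     # e.g. a database, data lake, or BI layer
        "type": asset_type,   # structured, semi-structured, unstructured
        "name": name,
        "fields": fields or [],
    })

register_asset("warehouse", "structured", "orders", ["order_id", "total"])
register_asset("data_lake", "semi-structured", "events.json", ["user", "ts"])
register_asset("media_store", "unstructured", "promo_video.mp4")

# Producers and consumers search one place, across every source.
print([a["name"] for a in catalog if a["type"] == "structured"])  # ['orders']
```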
Make data more discoverable and addressable
Since data may be replicated in multiple places in a Data Mesh approach, it is important to have a central repository that makes data easy to understand and find, so anomalies can be avoided and issues addressed. DvSum's Automated Catalog Curation makes this easy by using machine learning to automatically classify data based on universal, industry-specific, or custom entities. Classic examples are names, phone numbers, email addresses, Social Security numbers, and credit card numbers. Classification looks not only at exact or fuzzy column names, but also inspects the actual values and patterns in the data.
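The two signals mentioned above, fuzzy column names and value patterns, can be combined in a simple classifier sketch. The entity definitions and thresholds below are assumptions for illustration, not DvSum's actual models.

```python
import re
from difflib import SequenceMatcher

# Hypothetical entity definitions: name hints plus value patterns.
ENTITIES = {
    "email": {"name_hints": ["email", "e_mail"],
              "pattern": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")},
    "ssn":   {"name_hints": ["ssn", "social_security"],
              "pattern": re.compile(r"\d{3}-\d{2}-\d{4}")},
}

def fuzzy_match(a, b, threshold=0.8):
    """Fuzzy string similarity on lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def classify_column(name, values):
    """Classify a column using both its name and its sampled values."""
    for entity, spec in ENTITIES.items():
        name_hit = any(fuzzy_match(name, hint) for hint in spec["name_hints"])
        value_hits = sum(bool(spec["pattern"].fullmatch(str(v))) for v in values)
        # A name match alone, or mostly matching values, is enough here.
        if name_hit or (values and value_hits / len(values) > 0.8):
            return entity
    return "unclassified"

# Even with an unhelpful column name, the values give it away.
print(classify_column("contact_mail", ["bob@x.com", "sue@y.org"]))  # email
```

Inspecting values as well as names is what catches the common case of a sensitive field hiding behind a generic column name.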
Have Trusted Quality Data
Even if the overall structure is decentralized, an organization must have a common framework for managing data quality in a Data Mesh architecture: one consisting of quality metrics, standard rules, and data lineage for added trustworthiness. DvSum's powerful rules engine can check data quality for master data, transactional data, and data across systems. With its business-friendly user interface, it can orchestrate cleansing with the producers of the data. With DvSum you can also correct data quality issues across different categories of data, such as master data, data pipeline issues, IoT data quality issues, and data drift in machine learning.
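Of the categories above, data drift in machine learning is the easiest to illustrate. Below is a deliberately simple sketch that flags drift when a new batch's mean moves far from a baseline, measured in baseline standard deviations; real systems use richer tests (e.g. PSI or Kolmogorov-Smirnov), and the threshold here is an assumption.

```python
import statistics

def drift_score(baseline, current):
    """Distance of the current batch mean from the baseline mean,
    in baseline standard deviations (a simple z-score style check)."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1.0  # guard against zero spread
    return abs(statistics.mean(current) - mean) / stdev

baseline = [10.0, 11.0, 9.5, 10.5, 10.0]   # what the model was trained on
stable   = [10.2, 9.9, 10.4]               # new batch, similar distribution
drifted  = [15.0, 16.2, 15.5]              # new batch, shifted distribution

print(drift_score(baseline, stable) < 2.0)   # True: within tolerance
print(drift_score(baseline, drifted) > 2.0)  # True: flagged as drift
```

A check like this, run per feature as batches arrive, turns "data drift" from an abstract risk into a concrete, monitorable metric.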
Maintain Data Privacy & Security
DvSum Data Catalog provides the necessary features to tag and classify data per data privacy standards, including the sensitivity of the data, and provides a simple way to identify, manage, and mitigate risks to avoid non-compliance with rules and regulations. With powerful data security features, DvSum's security approach focuses on security governance, risk management, and compliance. This includes encryption at rest and in transit, network security and server hardening, administrative access control, system monitoring, logging and alerting, and more.