3 ways data catalogs drive data migration

18 Nov, 2021


CIOs and CDOs are in the hot seat to deliver data and self-service capabilities to the business. Part of that journey often requires leveraging data in the cloud. But many cloud migrations have ended up as a data swamp, or have blown their delivery timelines and budgets. A key reason is the inability to effectively inventory data and prioritize what to migrate. In this post, we’ll talk about how a data catalog, combined with the Data Ranking technique, can help you plan the sequence of your data migration.

In our last blog, we illustrated how business leaders and C-suite executives are turning to self-service analytics to drive business insights and empower business users. Cloud data offerings have become the norm for delivering quick, effective solutions that meet business requirements and enable self-service analytics. However, businesses that aim to migrate their infrastructure to the cloud face numerous questions about cost and time to implementation. And despite the tangible benefits of moving to a cloud-based infrastructure, why do so many businesses face the same challenges after the move?

As data professionals, we’ve all heard the saying:

“20% of data drives 80% of business decisions”

Identifying that high-quality 20% is often a huge challenge in itself, yet migrating it can have a transformative effect on business intelligence. Problems like an incomplete data inventory prevent accurate estimates of how long a migration will take, and they also make data discovery inefficient. High-quality data should be prioritized for migration, but many businesses cannot tell which data sources to prioritize. On top of that, data cleansing can eat away at budgets and schedules. Let’s see this all-too-common problem in action:

Take Jim, a data analyst at a firm with multiple siloed on-prem data sources. The data is messy, and most of his work is figuring out which data to use from the various sources. Only after this lengthy process can he begin his analysis. His firm decides to move its data capabilities to the cloud and begins the migration. After many months of effort, Jim hopes the new cloud-based data will speed up his work. But, to his surprise, he is still facing the same challenges! What went wrong?

The answer lies in data quality. A poll conducted by DvSum reinforces this: 53% of respondents identified data quality and cleansing as the biggest challenge businesses face when migrating their data to the cloud.

LinkedIn poll: key challenges in migrating data to the cloud (answers as of Nov 8, 2021)

Let’s take a look at how businesses can drive this process through DvSum’s award-winning cloud migration matrix: 

Volume | Velocity | Usage | Business Priority | Migration Priority
*      | *        | *     | High              | 1
*      | *        | High  | Med               | 2
*      | High     | Med   | *                 | 3
*      | *        | Med   | *                 | 4
*      | High     | Low   | *                 | 5
*      | *        | Low   | *                 | 0
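The matrix can be read as an ordered set of rules: an asterisk matches any value, and the first row that matches an asset decides its migration priority. The Python sketch below encodes it that way. The wildcard reading, the first-match ordering, and the names `DataAsset` and `migration_priority` are our own assumptions for illustration, not part of any DvSum product.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "*" in the matrix is a wildcard that matches any level.
ANY = "*"

# Matrix rows in order: (volume, velocity, usage, business_priority, migration_priority).
# Assumption: rows are checked top to bottom and the first match wins.
MIGRATION_MATRIX = [
    (ANY, ANY, ANY, "High", 1),
    (ANY, ANY, "High", "Med", 2),
    (ANY, "High", "Med", ANY, 3),
    (ANY, ANY, "Med", ANY, 4),
    (ANY, "High", "Low", ANY, 5),
    (ANY, ANY, "Low", ANY, 0),
]


@dataclass
class DataAsset:
    """Hypothetical description of one data asset, rated High/Med/Low per attribute."""
    name: str
    volume: str
    velocity: str
    usage: str
    business_priority: str


def migration_priority(asset: DataAsset) -> Optional[int]:
    """Return the priority of the first matching matrix row, or None if no row matches."""
    attrs = (asset.volume, asset.velocity, asset.usage, asset.business_priority)
    for *pattern, priority in MIGRATION_MATRIX:
        if all(p == ANY or p == v for p, v in zip(pattern, attrs)):
            return priority
    return None


if __name__ == "__main__":
    assets = [
        DataAsset("orders", "High", "High", "Low", "High"),
        DataAsset("sales_kpis", "Low", "Low", "High", "Med"),
        DataAsset("clickstream", "High", "High", "Low", "Low"),
        DataAsset("legacy_logs", "Low", "Low", "Low", "Low"),
    ]
    for a in assets:
        print(a.name, migration_priority(a))
```

Running it prints a priority of 1 for `orders`, 2 for `sales_kpis`, 5 for `clickstream`, and 0 for `legacy_logs`. Rule order matters: `orders` also matches rows 5 and 6, but the business-priority row is checked first. Combinations the matrix does not cover, such as high usage with low business priority, return `None` so they can be reviewed by hand.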

In the matrix, an asterisk (*) marks an attribute that does not affect that row’s ranking. The matrix shows that data with high business priority takes precedence over all other data. After that, heavily used data with business relevance comes next. Let’s tie this back to Jim, our data analyst.

Moving the same dirty data to a faster infrastructure means data users face the same pitfalls at a higher cost. Migration gives us an opportunity to cleanse our data and move only the valuable data, saving both cost and time. But how do we identify this “good” data? A data catalog is a powerful tool for this. It provides a unified view of all our data assets, allowing users to identify and tag desirable data, and it offers powerful data governance functionality as well.

So, let’s summarize the 3 ways that data catalogs help drive data migration:

  1. Provide a complete inventory of all data  
  2. Help prioritize which data to move first 
  3. Provide metadata (data about your data) that gives insight into its quality
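The three capabilities above can be combined into a simple migration plan. The Python sketch below assumes each catalog entry already carries a size, a migration priority (for example, taken from the matrix above) and a quality score between 0 and 1. The `CatalogEntry` and `plan_migration` names, their fields, and the 0.8 quality threshold are all hypothetical. The sketch skips priority-0 assets, orders the rest, flags low-quality ones for cleansing before the move, and totals the volume to migrate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative, not a product API."""
    name: str
    size_gb: float           # volume recorded in the inventory
    migration_priority: int  # e.g. from the migration matrix; 0 means "do not migrate"
    quality_score: float     # 0.0 to 1.0, e.g. derived from profiling metadata


def plan_migration(
    entries: List[CatalogEntry], min_quality: float = 0.8
) -> Tuple[List[str], List[str], float]:
    """Return (migration order, assets to cleanse first, total GB to move)."""
    # Prioritize: drop priority-0 assets and sort the rest, lowest number first.
    to_migrate = sorted(
        (e for e in entries if e.migration_priority > 0),
        key=lambda e: e.migration_priority,
    )
    # Quality metadata: flag assets below the threshold for cleansing before the move.
    needs_cleansing = [e.name for e in to_migrate if e.quality_score < min_quality]
    # Inventory: total volume of everything that will actually be moved.
    total_gb = sum(e.size_gb for e in to_migrate)
    return [e.name for e in to_migrate], needs_cleansing, total_gb


catalog = [
    CatalogEntry("clickstream", 500.0, 5, 0.6),
    CatalogEntry("orders", 120.0, 1, 0.95),
    CatalogEntry("legacy_logs", 900.0, 0, 0.3),
    CatalogEntry("sales_kpis", 5.0, 2, 0.7),
]

order, cleanse, total = plan_migration(catalog)
print(order)    # ['orders', 'sales_kpis', 'clickstream']
print(cleanse)  # ['sales_kpis', 'clickstream']
print(total)    # 625.0
```

Dividing the total volume by the expected transfer throughput gives a first-cut estimate of migration duration, and that estimate is only as accurate as the inventory is complete.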

