As surprising as it may seem, despite the enhanced focus by corporate boards on Big Data analytics as a way to determine key business trends, much of the data being supplied is of inferior quality, and this has started to affect several sectors, from finance to pharmaceuticals. A recent study highlighted that nearly 80% of CEOs are worried primarily about the quality of the data on which they base their business decisions.
What should be pointed out is that concern over the quality of raw data is not exactly new; it has been around for several years. The challenges facing the industry are manifold, from improving the quality of incoming data, to dealing with legacy data, to filtering out inherently bad data while retaining what is good. How exactly does one go about it? That is one of the essential questions facing the industry today; furthermore, companies often weigh the benefits of any such data rework against its costs and have yet to determine whether it is worth it. As one company executive put it, current data quality can be likened to oil in its raw form. The data is still unprocessed, and only after undergoing a process of refinement, which likely includes validation, de-duplication, and a complete audit, does its quality improve to the point where the data, and the analytics built on it, become useful to the company.
Good data vs. Bad data
When it comes to defining data, and especially good data, companies and organizations tend to have different takes, but most agree on certain key concepts such as precision and accuracy. Depending on your company's current requirements, the definition may vary, but accuracy and precision remain viable metrics. When it comes to bad or weak data, most companies are well aware of the low quality of their current data but are often reluctant to tackle it because of the time it would take to address the situation and rectify the problem. The trouble is that as data gets duplicated, it soon spreads across the rest of your architecture. As low-quality data proliferates through your systems, it gives rise to discrepancies that can lead to several issues.
More downtime: As your system develops discrepancies, you may face more downtime while trying to reconcile your data.
Resource allocation: These discrepancies can cause your organization to allocate resources less effectively than before.
Slow integration and slow deployment: Due to increased downtime and inconsistencies, you may find it hard to integrate and deploy new systems, and you would find it challenging to meet current industry quality control standards.
Increased frustration: With poor data, you may find it hard to run various aspects of your business smoothly, and this can lead to increased frustration and repeated failures.
Quantifying the cost
When it comes to reviewing the impact of low-quality data, assessing its price, or for that matter quantifying it, can be a trifle difficult, since different organizations utilize their data differently. However, according to Forbes Insights, the following has been determined –
Data problems can cost a company nearly $5 million annually, with some companies estimating their total losses at around $20 million.
Gartner research estimates that nearly 40% of the anticipated value of business initiatives is never achieved on account of low-quality data. Because of poor data, both the planning and the execution of these initiatives tend to be shoddy at best and, at other times, ‘dead in the water.’
More than 85% of all data integration projects do not progress as expected, and many end up as total failures, with others overshooting their budgets.
Most companies have canceled new IT system implementations, mainly on account of poor data.
One telecommunications company was found to be losing $8 million per month mainly due to data errors, which naturally affected its competitiveness in the long run.
Banks were often unable to calculate risks correctly or to carry out proper risk management.
One bank in particular found that nearly 60% of its home equity loans had been derived erroneously, with the principal growing larger each year.
Data quality deficit
Experts have long surmised that organizations suffer from poor-quality data because the data in question often has dubious provenance, or because the company lacks the requisite technology to process and filter the data and to correct it as and when required. As a result, data quality has been steadily eroding over the last few years, and it is time the industry as a whole took some practical measures.
The data in question needs to be validated, cleansed, de-duplicated, and audited; as mentioned earlier, companies are increasingly unlikely to take comprehensive measures on their own to do so, on account of the time and resources involved. The irony is that an organization that cleans up its data can use that same data to determine current and future business trends and, as a result, compete more effectively than before. While bad data can disrupt your company's business processes and even hurt your profit margin, good-quality data can power useful analytics and help determine current and future trends in your business. In turn, it can help you streamline your business to the point that you can make sound business decisions and become more competitive as a result.
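For illustration only, a minimal cleansing pass might look like the sketch below, written in Python with the pandas library; the file name, column names, and validation rules are assumptions, not a prescribed standard.

```python
import pandas as pd

# Hypothetical customer records; file and column names are illustrative only.
df = pd.read_csv("customers.csv")

# Validate: drop rows missing critical fields and discard malformed email addresses.
df = df.dropna(subset=["customer_id", "email"])
df = df[df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True)]

# De-duplicate: keep only the most recently updated record per customer.
df = (df.sort_values("last_updated")
        .drop_duplicates(subset="customer_id", keep="last"))

# Audit: log what the cleansing step did so the process itself is traceable.
print(f"{len(df)} validated, de-duplicated records retained")
df.to_csv("customers_clean.csv", index=False)
```

Even a basic pass like this makes downstream analytics more trustworthy, because every record that reaches them has survived explicit validation and de-duplication rules.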
Data and risk management
Low-quality data has long been a perennial problem for banks and insurers, mainly because most of the data they depend on is unstructured and, as a result, under-utilized. Some of this data may come from questionable sources, but the problem is compounded by the fact that most organizations depend on legacy systems or actuarial software, resulting in increasingly redundant IT architectures. It is akin to trying to run a quantum computer on Windows Vista, which should underscore the issue facing the industry as a whole today. Poor-quality data also impairs a company's ability to assess its risks and to manage them successfully. To say that poor-quality data has influenced risk management would indeed be an understatement.
Furthermore, most of these companies, from small insurers to large banking firms, often duplicate the same data but store it in different formats. This results in unnecessary duplication, which only compounds the problem. While several experts have already pointed out the redundancy of these legacy systems, many organizations continue to use them. As a result, banks, insurers, and others who need to assess risks to ensure their clients are well protected are finding it hard to do so under the circumstances. It should be pointed out that homogeneous data is of value to any company or organization as long as it is aggregated against clear standards and structured so that it is easy to use. Only then would organizations be in a better position to accurately assess their risks and manage them effectively.
Data governance
More IT companies have been asking for more stringent data governance, and with good reason. Data governance is essentially the process by which a company or organization evaluates its data, prioritizes the fiscal aspects and applications of that data, and makes a key determination on how it can be used. This is not a new topic, but then again, it is not a legacy one either.
The point is that with good and sound data governance, companies can determine whether a particular data set can be used, where it was sourced, how it was collected, and what rules should govern it, before tagging it as ‘acceptable data’. While that may sound like just what most companies need, the fact remains that many of the top Fortune 500 companies show low to minimal compliance when it comes to data governance. Again, as pointed out earlier, companies are reluctant to spend valuable time on data governance, and this is a mindset that needs to change immediately.
Companies must have the following in place to allow for effective data governance –
Data quality definitions: These determine the ‘acceptability’ of the data, its condition, whether it can be trusted, and whether it adheres to current data policies.
Glossary: This ensures that all key data is recorded accurately and does not get duplicated unnecessarily.
Roles and responsibilities: This pertains to assigning key roles and responsibilities to specific individuals for maintaining data and ensuring that it adheres to current data policies.
Metadata creation: Metadata can help you track the lineage of a particular data set; it can also give you a better understanding of the relationships between different data sets and business processes, across both internal and external sources, as the sketch below illustrates.
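As a purely illustrative sketch of the metadata idea, the Python data class below shows one way a lineage record could accompany each data set; the field names and example values are assumptions rather than any established standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class DatasetMetadata:
    """Hypothetical lineage record attached to a single data set."""
    name: str                                   # e.g. "q3_home_equity_loans"
    source_system: str                          # where the data originated
    owner: str                                  # person or team responsible under governance policy
    derived_from: List[str] = field(default_factory=list)    # upstream data sets
    quality_checks: List[str] = field(default_factory=list)  # validations already applied
    last_audited: Optional[datetime] = None

# A downstream report can then declare exactly which data it was built from.
loans_metadata = DatasetMetadata(
    name="q3_home_equity_loans",
    source_system="core_banking",
    owner="data-governance-team",
    derived_from=["raw_loan_book"],
    quality_checks=["schema_validated", "de_duplicated"],
)
```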
Benefits of good data governance:
While only a few companies ensure strict compliance when it comes to data governance, that is all the more reason to take a closer look at some of its key benefits.
With good data governance, you would increase the value of your key data
You would be able to use the data to glean key business insights, trends and develop better analytics for your organization
You would be able to increase profits, sales, and revenue for your organization
Set in place strict compliance standards that would be the norm for data collated from various sources
Ensure smooth regulation and transparency of data which includes storing and maintaining it in pristine condition
Provide others in your organization with crucial training on all aspects of data, which should help you to leverage the data much more effectively than before.
Data quality, data governance and copyright
One of the issues to make headlines recently was the passage of the controversial Article 13, part of the directive on copyright in the Digital Single Market, a bill seeking to limit data profusion and to ensure that the creators of the ‘data’ receive due copyright protection or are paid fair remuneration. While this directive applies predominantly to EU nations and seeks to limit the publication, and by extension the duplication, of information for public dissemination without due credit or remuneration being given to the creators, it does bring into sharp focus the current predicament facing the whole industry vis-à-vis poor data quality.
While no one is advocating an Article 13 or something similar for how data should be shared and maintained, it does offer a glimpse of a viable solution. Taking a closer look at Article 13, it seems that all the attention is focused on the copyright part. Fair enough, but that brings a relevant topic into focus: data provenance. If you are assured about the provenance of a certain data set, you would be far more likely to base critical business decisions on it. The issue of poor data quality is a universal one, and one that the global community needs to resolve at the earliest. It should be pointed out that a select few cannot mandate any such solution for the IT field; instead, wide-ranging discussions must be held with all stakeholders to arrive at a viable solution. Of course, data provenance, or tagging each individual data set so its origin can be ascertained, can help, but it still does not eliminate other issues such as redundancy and duplication.
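To make the provenance point concrete, here is a minimal sketch, assuming each record already carries a provenance tag naming its source; the source names and record layout are hypothetical.

```python
# Hypothetical provenance filter: only records from approved sources feed analytics.
TRUSTED_SOURCES = {"core_banking", "crm_export", "audited_partner_feed"}

def is_trusted(record: dict) -> bool:
    """A record is usable only if its provenance tag names a trusted source."""
    return record.get("provenance", {}).get("source") in TRUSTED_SOURCES

records = [
    {"value": 1200, "provenance": {"source": "core_banking"}},
    {"value": 900,  "provenance": {"source": "unknown_upload"}},
]

usable = [r for r in records if is_trusted(r)]  # the second record is filtered out
```

Knowing where each record came from does not fix redundancy or duplication, as noted above, but it does let you decide which data is fit to drive a decision.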
Good quality data: why it is essential
There are those who argue that data governance takes precedence, but managing bad data smoothly makes little sense. It is better to have technologies and specific measures in place that can help eliminate bad data, along with strict data compliance measures to promote good data. Of course, some organizations are well aware that their data is of low quality and yet remain reluctant to take any concrete measures to tackle it. Perhaps it helps to revisit some of the key benefits of ‘good data’ for the organization as a whole.
Decision making: Good-quality data can help your organization develop better business models and enable your executives to make key business decisions with more confidence, since the data on which they base those decisions is of good quality.
Productivity: As your executives perform better, your company should see a jump in productivity as well. They no longer need to spend extra time validating, cleansing, and tagging data; since the data is already good, they can focus on the core aspects of their work and be more productive as a result.
Data compliance: Good-quality data helps ensure that the data you maintain, along with the quality standards in your organization, complies with widely accepted industry standards. Bad data can cause you to fail compliance checks, which can result in fines as well as the loss of valuable business.
Marketing: With good-quality data, you would be able to develop more effective marketing campaigns that target your core demographics much more efficiently. Moreover, the business insights and customer behavior gleaned from studying such data can give you an edge over your nearest competitors.
Opportunity: Useful data can also help you zero in on viable, prospective opportunities much more efficiently.
Conclusion
Low data quality is an industry-wide problem and one that the industry as a whole needs to address right away. Poor data can lead to the loss of prospective customers, and you may also end up losing money (premiums not being calculated accurately, for example) as a result. This is why it is vital that both data governance and poor data quality are tackled immediately. It should also be pointed out that data evolves, which is something else to consider, alongside various compliance measures, when setting in place strict quality standards that may well become the new ‘norm’.