Using Data Management to Drive Business Objectives

At the 19th International Conference on Petroleum Data Integration, Information and Data Management, there was a striking common theme regarding the need to achieve strategic business objectives from data management initiatives.  Many presenters touched on this topic, and some centered their entire presentations on it, including Noah Consulting, EP Energy, ConocoPhillips and CLTech Consulting.

Matt Tatro of Noah Consulting presented on the idea of applying Lean manufacturing principles to oil and gas data management.  In Lean, one is always considering how to continuously improve, and that requires a clear definition of success.  He made a key point that “everything revolves around the improvements and the resulting impact on the balance sheet.”  Entrance strongly believes in this concept, and in our client projects we employ Agile Scrum methodology to facilitate continuous improvement.

As other presenters would go on to note later in the conference, management buy-in is key to data management project success.  The best way to ensure success is to have quantifiable metrics which demonstrate real financial benefits to the organization.  Especially in light of the recent industry downturn, the focus on bottom-line results should give the back office all the more reason to ensure that projects are aligned with the strategic objectives of the business.  In the Envision phase of projects, Entrance always works with clients to identify the business value that is ultimately to be achieved at project completion.

EP Energy presented on how they transformed their IT operation to improve results by turning their focus from technology objectives to business objectives.  ConocoPhillips discussed their strategic approach to prioritizing subsurface data workflows with an end goal of accelerating delivery of value to the business.  Entrance sees many firms in the upstream space embracing the idea of IT alignment with strategic business objectives and we structure our projects to do the same.

Jess Kozman of CLTech addressed the issue head-on in his presentation entitled “Rebranding Data Management for Executive Relevance.”  He calls out three particular challenges around the low visibility of data management in most organizations; the most significant, in my opinion, is the perception of data management (and frequently IT in general) as a cost center rather than a value-add to the organization.  At Entrance we pride ourselves on delivering real business value to our clients, not just a technical solution – this is the most exciting part of what we do to drive efficiency in our clients’ businesses.

In conclusion, ensure that your data management initiatives are value-creating projects.  Quantify the expected results up-front and validate them at the end.  Use Lean or Agile principles to enable you to make changes on-the-fly to keep projects well-aligned with strategic business objectives.

PNEC Data Management Conference Recap: The Role of IT in Data Management

The Role of IT in Data Management

At the 2015 PNEC data management conference for the oil and gas industry, one of the themes addressed by multiple presenters was the relationship between IT and the business. In particular, Chris Josefy and Omar Khan from EP Energy hit on three key areas that resonated with me in their presentation entitled “The Case for Transforming the Role of IT to support Upstream E&P Operations”:

  1. Being guided by the business objective, not the technical objective
  2. Driving more value from how the digital oilfield is managed
  3. Focusing on people, process, and data

Business value and strategic objectives are always in focus at Entrance when we engage on a new project so we strongly agree with the approach that EP Energy IT has taken. The technology is merely an enabling tool that allows the business objectives to be reached! It is often easy for the IT staff to focus on the technology (as they should know this the best) and lose sight of why the initiative is being undertaken at all.

In order to fully leverage the digital oilfield investment, you have to talk to the boots on the ground in the field. So often, decisions are made in the corporate office without assessing or understanding how they will impact day-to-day operations in the field. EP Energy launched a specific initiative to send IT out into the field to understand where the real value is, so that they could implement the most valuable tools in the shortest time. Entrance also believes in this. The photos on our web site are of our own consultants wearing the hardhats!

Ultimately, what this approach means to the technical staff is a focus on people, process and data. People are key to the success of any technology project, and Entrance has written and presented on this topic in the past because it has definitely been our experience as well. The process is about how the work gets done, which you will discover by working side-by-side with people in the field. For the technology to be considered successful in the end, it needs to streamline the process, not burden it with cumbersome extra steps. Lastly, we can’t ignore the data. Understanding the data as it’s understood and used by the field staff is crucial; otherwise you will build data models and processes that are not sustainable.

Keep these points in mind on your next IT project journey!

Conference Recap: PIDX Fall 2014

Standards as Value Creators in Oil and Gas

I attended the PIDX International US Fall Conference last week in Houston where the theme was “How Standards Accelerate the Realization of Value in Oil & Gas.”  A variety of speakers gave presentations around this topic by presenting real world examples, including Entrance’s President Nate Richards.  I want to highlight some of the key messages that I saw coming out of this gathering of industry experts.


Be Proactive Not Reactive

In a joint session, Noble Energy presented with ADP on the ways they use standards to drive global efficiency in their e-invoicing handling.  The message that stood out most to me was that they were being proactive, not reactive, thanks to e-invoicing.  In other words, by leveraging PIDX standards, Noble is able to address issues before they become problems rather than constantly chasing down errors and answers to questions.  It lets Noble ask the questions instead of being the recipient of questions.  The ability to be proactive drives efficiency by reducing time spent on things such as follow-up and re-work.

The Time is Now

A common theme that presented itself across many presentations was the idea that “the time is now” for e-business automation.  The standards are well-established and the technology is robust.  There is a sense that critical mass is being reached as leaner businesses squeeze out inefficiencies in their back offices.  Businesses that are not automating are quickly falling behind the curve as time marches on.  Ultimately, businesses that ignore the momentum toward the automation of trading partner interactions will put themselves at a competitive disadvantage as electronic business transactions become “table stakes” for businesses across the entire oil and gas industry.

Common Business Challenges

As part of a panel discussion, Deloitte elaborated on the value of standards in the oil and gas industry and made an important point about the business challenges facing it.  Specifically, the Deloitte representative highlighted the following challenges:

  • Mergers and Acquisitions
  • Rapid Growth
  • Lack of Integration
  • Technology

The speed of business is constantly accelerating, and nowhere is that more true right now than in oil and gas.  Data exchange standards provide a common language with which to drive process efficiency, which ultimately facilitates M&A and enables rapid growth.  Lack of integration is a historical challenge, but it is a clearable hurdle now for those companies willing to make an investment in their future efficiency.  Technology is constantly changing, and larger organizations may struggle with their internal IT groups to get the right tools to meet their urgent business needs.

Not Just the Standard Schema

On the technical side, it all comes down to actually making everything work at the end of the day.  While standards like PIDX provide a standard schema to facilitate communication between businesses, there is still a significant challenge to be overcome around semantics.  The PIDX standard provides a framework, but ultimately each implementer uses the standard in a way that makes sense to it.  There is still much more to be done around defining the meaning of terms.  For example, there is consistent disagreement between organizations and individuals over the definition of a “well,” which is such a fundamental aspect of any oil and gas data discussion.  (PPDM has done a lot of work on this particular example, but challenges still remain across the lexicon of oil and gas.)

What’s next?

For businesses in the oil and gas industry looking to drive efficiency in the back office, e-business automation is a proven tool.  If you are interested in learning more how to reduce manual processes, eliminate paper, decrease days sales outstanding (DSO) and drive down costs, then it’s time to talk to Entrance consultants about creating a vision and road map for your company’s software strategy that leverages standards and the latest technology to enable an efficient back office.


Increase The Scalability And Efficiency Of Your Oilfield Services With Field Data Capture Software

Don’t let your current time-keeping and work-order processes restrict your growth potential. Streamline and automate these critical tasks with field data capture software.

With the US onshore rig count level holding at ~1,710, 2014 is shaping up to be another banner year for North American oilfield services companies. Are you capturing your share of this business?

Most oilfield professionals we engage with are coming to the realization that they’re leaving money on the table.  Their growth is being stunted or unrealized due to antiquated processes and systems.   The data capture tools and processes currently in the field are simply no longer adequate to meet their future growth plans and the elevated expectations of their clients.

In this blog, I’ll address how using paper-based forms for time-keeping and work-orders in the field – while familiar and convenient – slows down time-sensitive workflows, hinders reporting, and creates data accuracy and integrity issues that negatively impact the payroll and invoicing work streams.  I will also address how a Field Data Capture solution solves these challenges.

Poor data quality – the scourge of Finance and AR

Do you rely on hours-worked documentation from field operations staff to drive invoicing and payroll? Our services industry clients do, so we know the pain of managing invoicing and payroll when that data is missing, incomplete, inaccurate, or untimely.  The impact of poor data quality for hours-worked ripples throughout the organization from the CFO’s concern for Days Sales Outstanding (DSO) to the Accounts Receivable and Payroll clerks that are striving to do their jobs accurately and efficiently.  Electronic field data capture can add scalability and efficiency in the back office AR and Payroll departments.

Five critical questions for AR and payroll teams

AR and Payroll staff must always be on the ball due to the cyclical rhythm of the processes they own in the business.  Whether your processes execute as frequently as weekly or as infrequently as monthly, the obligations of these departments do not change.  As a services firm, AR and Payroll are arguably two of the most critical back office functions.  These questions will help you assess the effectiveness of your own processes:

  • Does your data capture process  differentiate between hours billed and hours paid?  In order to ensure accuracy in the back office activities, it is critical that your system of record clearly distinguish between hours to be invoiced and hours to be paid (especially for hourly employees).


  • How long does it take for your AR/payroll departments to get notified once work is performed? Is it weeks, days, hours, minutes or seconds?  Yes, “seconds” is a possible answer, if you have a mobile-enabled electronic field data capture system in place.


  • How long does your back office take to process invoices or payroll once they’ve received time records?  How much time does your back office staff spend performing data entry, rekeying into multiple systems or manually creating Excel files?  Automation of the static and repetitive invoice generation and payroll processing functions can make your back office staff much more efficient so they can get more done in less time.  In our experience, automation solutions never replace jobs but rather they let the existing staff be more effective.  The efficiency created by automation reduces the need to hire additional staff as the business scales up.  Additionally, an often overlooked benefit is that automation keeps the staff you do have much happier with their jobs.  The productivity increases enabled by automation allow humans to focus on the exceptions and judgment calls that cannot be automated away.


  • How often do you receive complaints about invoice or payroll inaccuracies?  Manual human processes are naturally error-prone even for the most diligent clerk.  Automating manual processes means that you set the business rules one time, up front, and can trust that the process and math are applied the same way every cycle.


  • Is there one critical person in your business process that controls everything?  If your AR or payroll clerk goes on vacation, gets sick or retires, are your business processes able to execute without interruption?  Process automation reduces the risk to the organization by allowing the business to continue executing as usual even when the personnel change.


Attend our free seminar to learn more!

Hopefully, I’ve convinced you that arming your field workers with new data capture software and mobile devices will dramatically improve information flow between the field and your back office, resulting in more efficient and scalable processes that enable, rather than hinder, your workforce in supporting more clients and work.  While I focused specifically on time keeping and work orders, a robust Field Data Capture solution also provides similar benefits for creating inspection reports and managing asset integrity.

If you are interested in learning more about how to jump start your own Field Data Capture improvement initiative and are already starting to consider whether a custom and/or packaged data capture solution is the right approach, I highly recommend that you attend our FREE lunch seminar at III Forks on July 17th, where Nate Richards will provide an overview of Field Data Capture solutions for oilfield services.





3 Reasons Why Data Management Should Be Strategic Rather Than Tactical

Global Business Communication

During the 18th International Conference on Petroleum Data Integration, Information and Data Management (commonly known as the “PNEC” conference) on May 20-22, 2014, there was a session on the first day dedicated to the topic of professional data management. During the panel discussion, an attendee asked the following question: Why do we even need a professional role dedicated to managing data, when data service is a supportive role to various operations?

Trudy Curtis, CEO of PPDM Association, answered this by emphasizing that data management (especially data content management) should not be viewed as a tactical function, but a strategic one, needing lots of focus and planning to help businesses truly benefit from the potential of strong, quality data.

Many businesses indeed do not view data management as a strategic function. Below, I will give three reasons why data management deserves to be a key strategic function within any modern digital business.

Data is the blood of modern business workflow

When considering the business processes, or workflows, of a business, many would describe them as comprising mainly operational procedures. For many of these workflows, the role of data is supportive (i.e., data items are the inputs and artifacts of the workflow, but the data itself does not alter how the workflow is run). Enter the digital age, though, and you suddenly have important workflows that cannot run without putting data management into a much more strategic, proactive role.

Suppose that an upstream company manages leases where the division of interest changes once the wells on the land reach a certain level of production. For example, when the company produces X barrels of crude oil, a particular division of interest doubles (while proportionally shrinking the interests of other entities). The data about the leases and the production accounting data are stored in separate places. In this case, two challenges arise:

  • If the production level data is not accurate (data quality issue), it may trigger the change of division of interest at the wrong level, or not trigger at the right level. This will bring losses to the company, and/or damage relationships with customers.
  • If the production level data is not accessible from the lease management department (data accessibility), then the whole workflow completely relies on someone at the accounting department to notify the land department in order to make the necessary change. Not only is this cumbersome, but the probability of missing the change notice is very high.
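To make the first challenge concrete, here is a minimal Python sketch of such a division-of-interest rule. Everything here is hypothetical: the field names, the threshold, and the doubling-and-reshrinking logic are invented purely to illustrate the example above, not taken from any real land system.

```python
# Hypothetical sketch: flag and apply a division-of-interest (DOI) change
# once cumulative production crosses a lease-defined threshold. All names,
# thresholds, and percentages are illustrative only.

def check_doi_trigger(cumulative_bbl, threshold_bbl, interests):
    """Return updated interests once the production threshold is reached.

    interests: dict of entity -> fractional interest (sums to 1.0).
    When the threshold is crossed, the 'operator' share doubles and the
    remaining entities shrink proportionally, as in the lease example.
    """
    if cumulative_bbl < threshold_bbl:
        return interests  # no change yet

    new_operator_share = min(interests["operator"] * 2, 1.0)
    remaining = 1.0 - new_operator_share
    others_total = sum(v for k, v in interests.items() if k != "operator")
    return {
        k: (new_operator_share if k == "operator"
            else v / others_total * remaining)
        for k, v in interests.items()
    }

before = {"operator": 0.25, "partner_a": 0.45, "partner_b": 0.30}
after = check_doi_trigger(cumulative_bbl=120_000, threshold_bbl=100_000,
                          interests=before)
# after == {"operator": 0.5, "partner_a": 0.3, "partner_b": 0.2}
```

Note that this rule is only as good as its inputs: if `cumulative_bbl` is wrong (the data quality issue) or unavailable to the land department (the accessibility issue), no amount of workflow code saves the outcome.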

As you can see, today’s workflows are increasingly dependent upon data quality, accessibility, governance, etc. to ensure the execution quality of the process. To minimize negative impact due to data issues, data management needs to be done at a strategic level, so that it can plan forward and ensure that all processes in the company are well supported by the needed data. If there is no plan, when you need it, it will not be there.

Unplanned data cannot give meaningful information

One wave that the industry is catching is Business Intelligence (BI). By utilizing data integration, dashboards, data warehouses, etc., BI provides a powerful platform for generating useful information and helping the business make better decisions. There is, though, not enough discussion about the underlying requirement: data quality.

Simply put, the data needs to be of a certain quality to support BI objectives. One common challenge is that the data required for a useful rollup of a dataset often has not been captured. BI projects rely on well-captured data to be successful and useful; if the data has not been captured, a BI project will not miraculously fix this problem.

As the saying goes: “garbage in, garbage out.” BI projects also rely on data that is in good quality, with accurate and precise data to do correct rollups, so that it can provide adequate and realistic information. In fact, the most costly portion of many BI projects is data cleansing, which is required to make the projects successful.
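A tiny illustration of the "garbage in, garbage out" point, using invented numbers: a naive rollup silently drops the records it cannot use, so the aggregate still looks plausible while understating the truth.

```python
# Illustrative only: invented well records, one with an uncaptured value.
wells = [
    {"well": "A-1", "field": "North", "oil_bbl": 1200},
    {"well": "A-2", "field": "North", "oil_bbl": None},  # never captured
    {"well": "B-1", "field": "South", "oil_bbl": 800},
]

def rollup_by_field(records):
    """Sum oil volumes per field, counting how many records were unusable."""
    totals, skipped = {}, 0
    for r in records:
        if r["oil_bbl"] is None:
            skipped += 1  # a naive rollup would drop this row silently
            continue
        totals[r["field"]] = totals.get(r["field"], 0) + r["oil_bbl"]
    return totals, skipped

totals, skipped = rollup_by_field(wells)
# totals == {"North": 1200, "South": 800}, but one North record was
# skipped, so the North figure is wrong unless the gap is surfaced.
```

The dashboard built on `totals` would render without error, which is exactly why the quality problem has to be managed upstream of the BI project.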

If the data has already been managed strategically, ensuring certain quality, governance and availability, projects and operations that rely on this data will be much more cost efficient and successful.

Data maturity needs to grow with the business itself

Many people talk about data growth in terms of volume. Data volume is certainly a key factor, but it would be unwise to overlook the fact that data maturity needs to grow with the business itself as well. It will not magically catch up with the business, and ignoring it in the business roadmap can lead to negative impacts.

Realistically speaking, setting up a mature data model and strategy is costly and time-consuming. Small businesses need quick wins to maintain positive cash flow; therefore, most small businesses cannot afford high data maturity, and “getting the job done” is what they focus on.

As the organization grows, though, the data has to mature with it. Since business requirements will expand, or even change altogether, as the organization grows, the original data model, quality, governance, etc. will not be able to support the growing operations.

Projects to improve data quality, set up governance, ensure accessibility, etc. are expensive and time-consuming, therefore these data improvement projects need to be planned ahead, in accordance with the organization’s roadmap.

Moving forward

Just like IT, data management used to be viewed as a supportive, tactical function. However, in this new digital age, data management deserves better. Align data management with your company’s strategic roadmap, and your organization will have a head start on quality data, ensuring operational efficiency and cost savings in the long run.

Data Management for Oil & Gas: High Performance Computing

Data Management and Technology

The oil and gas industry is dealing with data management on a scale never seen before. One approach to quickly get at relevant data is with High Performance Computing (HPC).

HPC is dedicated to the analysis and display of very large amounts of data that needs to be processed rapidly for best use.

One application is the analysis of technical plays with complex folding. In order to understand the subsurface, three dimensional high definition images are required.

The effective use of HPC in unconventional oil and gas extraction is helping drive the frenetic pace of investment, growth and development that will provide international fuel reserves for the next 50 years. Oil and gas software supported by data intelligence drives productive unconventional operations.

Evolving Data Management Needs

As far back as 2008, the Microsoft High-Performance Computing Oil and Gas Industry Survey conducted by the Oil & Gas Journal Online Research Center indicated that many industry geoscientists and engineers have access to the computing performance levels they require.

However, computing needs are growing more complex, so significant room for improvement exists. Numerous respondents believe that making HPC available to more people industry-wide can increase production, enhance decision-making, reduce delays in drilling, and reduce the overall risk of oil and gas projects.

Chesapeake is the largest leasehold owner in the Marcellus Shale play, which reaches from southern New York to West Virginia. They employ HPC in their shale and tight-sands operations.

3-D imaging enables technical staff to detect fine-scale fracturing and directional dependency characteristics. Seismic data provides a structural road map that helps identify dip changes, small faults and natural fracture orientation.

High Performance Computing in the Real World

Chesapeake routinely performs inversions of pre-stack and post-stack data. Datasets for imaging and inversion support models that represent complex earth structures and physical parameters, where true inversion results are known.

Reservoir maps require constant updating. Advanced pre-stack 3-D techniques are used to extract detailed rock properties that aid in discriminating good rock from bad rock at Marcellus.

Focusing on pre-stack data has significantly increased computational requirements. Depending on the acquisition method, collecting multicomponent 3-D data can increase data size by orders of magnitude.

Advanced algorithms provide results in a matter of days, making it possible to realistically deal with a lease schedule.

Clustered super-computing systems are becoming well priced and scalable. HPC options are not only realistic, but a requirement for independents who want to bring advanced processing capabilities in house.

Check out this blog post on how oil and gas companies are using data management to improve processes.

Keys to Business Intelligence

Five key insights from business intelligence expert David Loshin

In a recent interview, David Loshin, president of business intelligence consultancy Knowledge Integrity, Inc., named five key things organizations can do to promote business intelligence success:

  • Design configurable business intelligence dashboards that can provide needed metrics in real time
  • Provide drill-down capabilities for metrics that are of specific concern for the business
  • Ensure agreement about performance goals and targets throughout the organization
  • Create a cultural understanding of how metrics should be used
  • Experiment with different analyses to determine which ones can provide business value

Design configurable business intelligence dashboards that can provide needed metrics in real time

According to Loshin, the key goal of any business intelligence program should be to provide performance metrics in a way that is informative, but not intrusive. In other words, business intelligence dashboards need to be highly configurable in order to make sure that business users are getting access to the exact data they need, without falling victim to data paralysis caused by having to sift through all the data they don’t need.

In addition, business intelligence dashboards need to be able to provide updates in real time, in order to ensure that business users are making decisions based on the most current view of metrics.

Provide drill-down capabilities for metrics that are of specific concern for the business

Every organization wants different insights from their business intelligence solutions. As a result, business intelligence dashboards should not be one-size-fits-all in the insights they provide.

If an organization knows in advance that a specific metric could be particularly helpful for their business, they should plan ahead to make sure their BI dashboard includes drill-down capabilities for that metric, so that they will be able to get a deeper level of insight when the need arises.

Ensure agreement about performance goals and targets throughout the organization

What are the most important insights that can be gained from a business intelligence solution? For some organizations, it’s figuring out the best way to generate new revenue. For others, it may be reducing costs or mitigating risks.

Either way, it’s important that all key stakeholders understand the values that matter most to the business, and know how BI metrics will be used to help meet those performance goals and targets.

Create a cultural understanding of how metrics should be used

An efficient business intelligence solution should allow individuals to take independent action, but there should also be an organization-wide understanding of how each individual is expected to use the insights provided by the BI solution.

C-level executives set the standard for what data is important to monitor, but they won’t be the ones actually drilling down into the data. As a result, it’s important that all business users have an understanding of how BI can help improve their decision-making.

Experiment with different analyses to determine which ones can provide business value

Business intelligence is most likely to be successful when it has executive support, but executives will probably only provide support for programs that have demonstrated value in the past. Loshin compares this situation to a chicken/egg problem: business users need executive support to implement quality BI solutions, but they often need to prove the value of business intelligence solutions before they can get executive support.

To overcome this problem, Loshin recommends undertaking a series of short experiments to find which BI analyses can provide business value, while weeding out the ones that can’t. It’s quite likely that many of the tested analyses won’t prove valuable, but the ones that do should provide sufficient return to make the experimentation worthwhile.

For more, read this post on the ROI for business intelligence

Business Intelligence: What is Hadoop?

Hadoop: The New Face of Business Intelligence

Big data has changed the way businesses handle their business intelligence initiatives, requiring them to capture, process, and analyze exponentially larger amounts of information. Traditional business intelligence tools like relational databases are no longer sufficient to handle the level of data businesses are experiencing.

If businesses are going to take advantage of the insights offered by big data—instead of drowning in the flood of useless and irrelevant data—they are going to need new tools to help them handle the data crush.

Enter Hadoop. In just a few short years, Hadoop has become one of the most powerful and widely used tools for turning big data into useful insights.

What is Hadoop exactly?

It may have a strange name, but there’s no reason to be intimidated by or confused about what Hadoop actually is. Hadoop is simply an open-source software platform, produced by the non-profit Apache Software Foundation, for the storage and processing of massive data sets.

Hadoop is designed to spread files and workloads across clusters of hardware. This arrangement allows for the increased computational power needed to handle massive amounts of data, and helps organizations protect their workloads from hardware failure.

The Hadoop framework is made up of a number of different modules, including Hadoop Distributed File System (HDFS). HDFS distributes very large files across hardware clusters to ensure maximum aggregate bandwidth. Hadoop MapReduce is a programming model for processing very large data sets.
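The MapReduce model itself is simple enough to sketch in a few lines. The toy Python below simulates the map, shuffle, and reduce phases of the canonical word-count example in a single process; a real Hadoop job expresses the same three steps through Hadoop's own APIs and distributes them across the cluster.

```python
# Toy, single-process illustration of the MapReduce programming model.
# Real Hadoop distributes these phases across cluster nodes; this only
# shows the data flow: map -> shuffle (group by key) -> reduce.
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs, as a mapper would."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Group values by key, the step Hadoop performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big insights", "big wins"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 3, "data": 1, "insights": 1, "wins": 1}
```

Because each mapper and reducer works independently on its slice of the data, the same logic scales from this toy example to the massive files HDFS spreads across a cluster.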

Why do I need to learn about Hadoop?

Simply put, Hadoop has already experienced a very high level of adoption from the business world. It promises to be the standard tool for big data management going forward.

Hadoop is already being used by more than half of Fortune 50 companies, including major names like Yahoo! and Facebook. Eric Baldeschwieler, CEO of Hortonworks, has predicted that as much as half of the world’s data will be processed using Hadoop by the year 2017.

If your business works with data at all, you need to know the name Hadoop. It will touch your organization in some way, if it hasn’t done so already.

What are the advantages of Hadoop?

Hadoop gives your developers the power to conduct batch processing on data sets that include structured, unstructured, and semi-structured data. This makes it a perfect fit for the realities of today’s big data environment.

This flexibility allows it to succeed in ways that traditional business intelligence tools can’t. Hadoop is also highly scalable, and offers enterprise-level big data analytics at a price that midmarket companies can afford.

What are the disadvantages of Hadoop?

With so much fanfare around Hadoop, identifying its shortcomings might seem difficult, but they certainly exist. Hadoop isn’t the simple answer to all of your data management problems.

It’s important that you understand what it can and can’t do before you pursue a Hadoop-based big data solution for your business.

Hadoop is a tool aimed specifically at developers. As a result it can segregate tech users from the business users who actually need to make use of data insights.

If the insights you gain from Hadoop data processing aren’t getting into the right hands, then your Hadoop deployment is just wasting your time and resources.

As an open-source framework, Hadoop should be looked at as a work in progress. Many industry analysts have suggested that the current iteration of Hadoop is not mature enough to provide real-time analytics or ensure the security of sensitive data. Businesses can gain a lot of value by using Hadoop, but they also need to learn about these limitations first.

For more on this topic, read our three-part series on the components of a business intelligence solution.

Data Management: Measuring Quality

Data Management for Upstream

In the world of data management, there are two factors that are important to account for: quality and access.

Measuring data quality was the topic of a PPDM presentation by Jeremy Eade, a Subsurface Data Lead at BP. His approach to data management focused specifically on the consistency of well data.

Data Rules and Business Rules

To start out the discussion, Eade covered data rules and business rules. For those who are unfamiliar, a data rule should check only one point of quality at a time. For example, whether the well's rig release date falls on or after the spud date.

There should be a data rule for each piece of data that the company cares about. A quality data management program should also have business rules that dictate the governance of the database.
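A data rule of this kind can be expressed as a single check against a well record. The sketch below is illustrative only; the field names `spud_date` and `rig_release_date` are hypothetical, not BP's actual schema:

```python
from datetime import date

def rule_rig_release_after_spud(well):
    """Data rule: the rig release date must not precede the spud date.
    Checks exactly one point of quality; returns True if the record passes."""
    spud = well.get("spud_date")
    release = well.get("rig_release_date")
    if spud is None or release is None:
        return False  # a missing date fails the rule outright
    return release >= spud

well = {"spud_date": date(2014, 3, 1), "rig_release_date": date(2014, 5, 20)}
print(rule_rig_release_after_spud(well))  # True: release follows spud
```

Keeping each rule this narrow makes failures easy to interpret: a failing record points to one specific quality problem, not a tangle of possible causes.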

The Four C’s

Eade also covered some more specific data quality dimensions, called the Four C’s. Below are some examples of how they apply in an Upstream situation.

Completeness: Can well data stand on its own? Is relevant data from each system included for each well?

Consistency: Is formatting the same across records? Is the same data recorded for each well? Across databases, is data also consistent?

Correctness: This is where data rules come in. Some thought should be given to how accuracy can be checked for each type of data.

Currency: When was the last time frac data was loaded into the database?
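Each of the Four C's can be scored as a simple percentage across a set of well records. A minimal sketch, using hypothetical field names, of how completeness and currency might be measured:

```python
from datetime import date

# Toy well records; "last_loaded" stands in for the date frac or other
# data was last loaded into the database
wells = [
    {"well_id": "W-1", "spud_date": date(2015, 1, 5), "last_loaded": date(2015, 4, 1)},
    {"well_id": "W-2", "spud_date": None,             "last_loaded": date(2013, 2, 1)},
]

def completeness(records, fields):
    """Share of records in which every required field is populated."""
    ok = sum(all(r.get(f) is not None for f in fields) for r in records)
    return ok / len(records)

def currency(records, as_of, max_age_days=180):
    """Share of records loaded recently enough to be considered current."""
    fresh = sum((as_of - r["last_loaded"]).days <= max_age_days for r in records)
    return fresh / len(records)

print(completeness(wells, ["spud_date"]))       # 0.5: one well is missing its spud date
print(currency(wells, as_of=date(2015, 5, 1)))  # 0.5: one well hasn't been loaded in years
```

Scores like these give the quantifiable metrics that make a data quality program's progress visible to management.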

Quality Measurement in Action

Eade went on to describe what happened when BP started to examine the specifics of its data quality. One observation was that G&G staff spent far too much time on quality control.

In addition, after implementing data rules, the team learned that 20 of BP's wells had null data. They also found many instances where the spud date was listed after the rig release date.

Even best-in-class data quality is of no use to a company if users can't get to the data through an intuitive interface. Read this blog post on data management and presentation for more!

Data Management Problems: Data Cleansing

Data Management and Quality

Many processes in data management affect the quality of data in a database. These processes fall into three categories: processes that add new data to the database, processes such as data cleansing that manipulate data already in the database, and processes that reduce the accuracy of the data over time without actually changing it.

This last kind of accuracy loss is typically the result of changes in the real world that aren't captured by the database's collection processes.

Arkady Maydanchik describes the issue of data cleansing in the first chapter of his book Data Quality Assessment.

Cleansing in Data Management

Data cleansing is becoming increasingly common as more organizations incorporate this process into their data management policies. Traditional data cleansing is a relatively safe process since it's performed manually, meaning that a staff member must review the data before making any corrections.

However, modern data cleansing techniques generally involve automatically making corrections to the data based on a set of rules. This rules-driven approach makes corrections more quickly than a manual process, but it also increases the risk of introducing inaccurate data since an automated process affects more records.

Computer programs often implement these rules, which represent an additional source of data inaccuracy since these programs may have their own bugs that can affect data cleansing.
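The rules-driven approach can be sketched in a few lines. Because an automated pass touches many records at once, the sketch below logs every change to an audit trail so that incorrect corrections can later be found and reversed; the rule and field names are hypothetical examples, not any particular vendor's cleansing engine:

```python
def cleanse(records, rules):
    """Apply each correction rule to every record, logging every change
    made so that bad corrections can be identified and undone later."""
    audit_trail = []
    for record in records:
        for rule in rules:
            field, old, new = rule(record)
            if old != new:
                record[field] = new
                audit_trail.append({"id": record["id"], "field": field,
                                    "old": old, "new": new})
    return records, audit_trail

def normalize_state(record):
    """Example rule: expand a two-letter state code to the full name."""
    old = record.get("state")
    new = {"TX": "Texas", "OK": "Oklahoma"}.get(old, old)
    return "state", old, new

records = [{"id": 1, "state": "TX"}, {"id": 2, "state": "Texas"}]
cleaned, trail = cleanse(records, [normalize_state])
print(trail)  # only record 1 was changed, and the change is logged
```

Note that the rule itself encodes an assumption about the data (that "TX" should always mean "Texas"); if that assumption is wrong for some records, the audit trail is what makes the damage recoverable.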

Problems with Data Cleansing

Part of the risk of automatic data cleansing is due to the complexity of the rules in a typical data management environment, which frequently fail to reflect the organization’s actual data requirements. The data may still be incorrect after executing the data-cleansing process, even when it complies with the theoretical data model. The complexity and unrelated nature of many problems with data quality may result in the creation of additional problems in related data elements after performing data cleansing.

For example, employee data includes attributes that are closely related such as employment history, pay history and position history. Correcting one of these attributes is likely to make it inconsistent with the other employment data attributes.

Another factor that contributes to the problems with modern data cleansing is the complacency that data management personnel often exhibit after implementing this process. The combination of these factors often means that data cleansing creates more problems than it solves.

Case Study

The following case study from Maydanchik's book illustrates the risk of data cleansing. It involved a large corporation with more than 15,000 employees and a history of acquiring other businesses. The client needed to cleanse the employment history in its human resources system, primarily because of the large number of incorrect or missing hire dates for its employees.

These inaccuracies were a significant problem because the hire date was used to calculate retirement benefits for the client’s employees. Several sources of legacy data were available, allowing for the creation of several algorithms to cleanse the employment data.

However, many of these employees were hired by an acquired business rather than directly hired by the client corporation. The calculation of the retirement benefits was supposed to be based on the date that the client acquired the employee instead of the employee’s original hire date, but the original data specifications didn’t reflect this business requirement.

This discrepancy caused the data-cleansing process to apply many changes incorrectly. Fortunately, this process also produced a complete audit trail of the changes, which allowed the data analyst to correct these inconsistencies without too much difficulty.
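An audit trail like the one the analyst relied on makes bad corrections reversible. A hedged sketch of that reversal step, assuming (hypothetically) that each audit entry records the record id, the field changed, and the value before the change:

```python
def revert(records, audit_trail, bad_ids):
    """Undo cleansing changes for records that were corrected in error,
    restoring the old value captured in the audit trail."""
    by_id = {r["id"]: r for r in records}
    for entry in audit_trail:
        if entry["id"] in bad_ids:
            by_id[entry["id"]][entry["field"]] = entry["old"]
    return records

# Record 1's hire date was "corrected" incorrectly; the trail lets us undo it
records = [{"id": 1, "hire_date": "2001-06-15"},
           {"id": 2, "hire_date": "1998-01-02"}]
trail = [{"id": 1, "field": "hire_date",
          "old": "1995-03-10", "new": "2001-06-15"}]
print(revert(records, trail, bad_ids={1}))
```

Without a trail of old values, the only recourse after a bad cleansing run is restoring from backup, which loses every legitimate change made since.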

This data-cleansing project was completed satisfactorily in a relatively short period of time, but many such projects create errors that remain in the database for years.

For more on solving data management issues, check out this post on managing data entry.