Data Pioneering

Surveying The Situation

IMG_0602.JPG
 

This is part 3 of a three-part post on data quality and data strategy. See part 1 and part 2.

Do you encounter data quality issues that impact your reporting or analysis?

Do you wish you knew what was causing these issues and how to resolve them?

Let’s look at some of the most common data challenges.

Improving Consistency

IMG_0682.JPG
 

Why We Care about Data Inconsistencies

To process, analyze, or report on data easily, the data must be consistent. That means consistent spelling and consistent number and date formats. There cannot be any exceptions, because even occasional inconsistencies cause automated reporting and analysis to fail.
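To see why even a few variants matter, here is a minimal sketch in Python. The region values are made up for illustration, and the `normalize` helper is a hypothetical after-the-fact cleanup, not a substitute for fixing the issue at the source.

```python
from collections import Counter

# Hypothetical manually entered values; the first four mean the same region.
ENTRIES = ["Northeast", "northeast", "North East", "Northeast ", "West"]


def normalize(value):
    """Lower-case the value and drop all whitespace so variants collapse."""
    return "".join(value.split()).lower()


# Counting the raw values treats every spelling as a separate region.
raw_counts = Counter(ENTRIES)

# Counting normalized values groups the variants together.
clean_counts = Counter(normalize(v) for v in ENTRIES)
```

With the raw values an automated report would show five regions with one record each. After normalizing it shows two: four records for `northeast` and one for `west`.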

What Causes Data Inconsistencies

Data inconsistencies happen when data is manually entered into a system, and especially when it’s manually tracked outside of one, wherever standardization and controls are lacking. Standardization includes drop-down lists for categorical data (think names of regions or divisions). Controls prevent the entry of invalid data, for example by ensuring that a required field isn’t left blank or that a month isn’t entered where a complete date is expected.
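The standardization and controls described above can be sketched as a small validation function. This is only an illustration: the field names (`client_name`, `region`, `contract_date`), the list of regions, and the `YYYY-MM-DD` date format are all assumptions, and in practice these rules would live in the system's data entry forms rather than in a script.

```python
from datetime import datetime

# Hypothetical allowed values, standing in for a drop-down list.
ALLOWED_REGIONS = {"Northeast", "Southeast", "Midwest", "West"}


def validate_record(record):
    """Return a list of problems found in one manually entered record."""
    problems = []

    # Control: a required field must not be blank.
    name = record.get("client_name")
    if name is None or not str(name).strip():
        problems.append("client_name is required")

    # Standardization: region must match the allowed list exactly.
    region = record.get("region")
    if region not in ALLOWED_REGIONS:
        problems.append(f"region {region!r} is not in the allowed list")

    # Control: the date must be a complete date, not just a month.
    raw_date = record.get("contract_date") or ""
    try:
        datetime.strptime(raw_date, "%Y-%m-%d")
    except ValueError:
        problems.append(f"contract_date {raw_date!r} is not a complete date")

    return problems
```

A record such as `{"client_name": "Acme", "region": "Northeast", "contract_date": "2019-03-14"}` returns an empty list, while one with a blank name, a region typed as `north east`, and a date entered as `March` returns three problems.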

How To Resolve Data Inconsistencies

Addressing manually tracked data: Data must be in a system before it can be analyzed or reported on. Understand the process and why the data is being tracked, confirm that it isn’t already in a system (efforts may be duplicated across divisions), and work with IT to make the necessary system changes and eliminate manual tracking.

Improving system data: Improving standardization and controls at the system level for these data items will resolve these issues.

Filling In The Blanks

IMG_0709.JPG
 

Understanding a process requires comprehensive information about that process. If data exists for only some occurrences of an activity, or for only some of the people involved, it will be difficult or impossible to extract useful insights about that process.

Where You’ll See Missing or Sparse Data

Multi-step, long-term processes like business development and talent acquisition, which require data entry by multiple people about a variety of activities over a period of time, tend to have missing or sparse data issues.

How To Resolve Missing or Sparse Data Issues

If you expect users to enter data in a specific way but find that they aren’t doing what you expected, then you have a business process or communication problem. You’ll need to define the process for how and when data should be entered, and make that process clear to everyone involved. If you want your business development team to enter information about client calls, you’ll need to communicate that to the team.
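Before having that conversation, it can help to know which fields are actually being skipped. Here is a rough sketch of a completeness check, assuming the records have already been exported as a list of dictionaries. The field names in the usage note are hypothetical.

```python
def is_filled(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Return the share of records with a non-blank value for each field."""
    total = len(records)
    report = {}
    for field in fields:
        filled = sum(1 for record in records if is_filled(record.get(field)))
        # Avoid dividing by zero when there are no records at all.
        report[field] = filled / total if total else 0.0
    return report
```

For example, calling `completeness(calls, ["client", "call_date", "notes"])` on a list of client call records returns a number between 0 and 1 for each field. A low share for `call_date` would suggest the team doesn't know that field is expected, which is a process or communication gap rather than a system problem.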

Dealing With Difficult Data Formats

IMG_0667.JPG
 

What is a difficult data format?

A difficult data format is any format that cannot be easily processed (think PDF files, Word documents, or text in an email), which makes analysis complicated.

Why You Might Receive Data In This Format

It is fairly common to receive data in a difficult format, even from mostly automated systems such as building entry systems. Older (legacy) systems, especially homegrown ones, are more likely to produce outputs in difficult formats. When you receive data second hand, rather than data that was extracted specifically for you or your project, it’s possible that the original recipient didn’t intend to run an analysis, so the format was acceptable to them.

How To Fix It

Don’t spend time on difficult data formats unless absolutely necessary. The best way to address them is to identify the original data source and request an easy-to-use format (CSV or Excel). For newer systems this should be simple. Legacy and homegrown systems may require workarounds.
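To show why a CSV export is worth asking for, here is a minimal sketch using Python's built-in `csv` module. The sample export and its column names (`badge_id`, `entry_time`, `door`) are invented for illustration and stand in for what a building entry system might produce.

```python
import csv
import io

# Hypothetical CSV export from a building entry system.
SAMPLE_EXPORT = """badge_id,entry_time,door
1001,2019-03-14 08:02,Main
1002,2019-03-14 08:15,Side
1001,2019-03-15 07:58,Main
"""


def entries_per_badge(csv_text):
    """Count how many entry rows each badge has in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {}
    for row in reader:
        badge = row["badge_id"]
        counts[badge] = counts.get(badge, 0) + 1
    return counts
```

Calling `entries_per_badge(SAMPLE_EXPORT)` returns `{"1001": 2, "1002": 1}`. Getting the same answer from a PDF or the body of an email would first require extracting and cleaning the text, which is the work this section recommends avoiding.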

Finding Your Way

IMG_0731.JPG
 

Many data challenges stem from how data is entered, and many of these issues can be resolved with better definitions, a clearly defined process for data entry, and system controls that prevent inconsistencies like variant spellings.

Sometimes you can’t resolve the issue or the data isn’t collected the way you need it: it’s not frequent enough, detailed enough, or stored for long enough. This requires a more complicated (expensive) solution and you should carefully weigh the costs and benefits before proceeding.

Is your company struggling with data challenges? Contact me at stacey@arielanalytics.com to find out how Ariel Analytics can help.

About the photos: White Mountains, New Hampshire
Thoughts: The White Mountains are home to the famous 6,288-foot Mount Washington, as well as many other 4,000+ foot peaks. You can hike, drive, or take the train up to the summit of Mt Washington. The mountains aren’t as tall as those out west, but the extreme weather conditions result in a lower tree line, which makes for great views. The Lincoln-Lafayette ridge trail is a great hike, and the AMC (Appalachian Mountain Club) hut system gives hikers the option of hiking hut to hut for several days with just their clothing and a toothbrush. The huts provide food, water, bathrooms (no showers), and a place to sleep, but they’re not cheap and you’ll need to book well in advance.

Have a data or analytics question that you’d like to see answered here? Email your questions to stacey@arielanalytics.com.

The day we hiked Mt Washington the weather was perfect and visibility was great. We got lucky, this is pretty rare.
