Dirty Data

In the business world, we get data in all kinds of formats and ways.  Often times this data is what I call dirty and in need of cleaning.  The most common culprits can be easily fixed.  Below are some of them and my most oft-used remedies.

Weird space at the end (or beginning) of text in field
Sometimes I download data, copy and paste it from another source, or get it directly from a database.  And, for some reason,  there are fields or cells with spaces at the end.  My experience tells me that this is usually one of two things.  Either somehow (often data entry error) a person typed a space at the end or beginning on accident.  Or non printable characters (such as carriage returns or the tab key) have been appended.  The latter often happens when converting a tab delimited file.

To remedy, I combine two functions:

Clean: Removes all nonprintable characters from text
Trim: Removes all spaces from a text string except for single spaces between words
Combining the two: =Clean(Trim(textString))

Unwanted characters in fields
Adding data validation to fields is not hard to do.  I’ll post on that later as it relates to Excel and Access.  But, in the meantime, we’ll always get data that is not standardized and often dirty.  (There’s that word again).  Some examples: phone numbers formatted in several different ways (with dashes, periods, parenthesis, etc.); Social Security Numbers (some with dashes, some without); Birth Dates (using different formats).  An easy way to remove unwanted characters, when you know what those characters are, is to use the Substitute() function.  For example, if cell A1 contains “888-555-1212″, you could remove the dashes with SUBSTITUTE(A1,”-“,””).  If, instead, A1 contains “(888) 555-1212″, you could remove all non numeric values with nested (a function within a function) Substitute functions:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,”-“,””),”(“,””),”)”,””),” “,””)

YUCK!  Notice how you have to do a separate substitute function for every unwanted character.  Very cumbersome.  There is another way, if you are able to get over the fear of using VBA (Visual Basic for Applications; See an earlier post for a quick note on overcoming that fear).

Using VBA to include only the values you want.

This one is pretty simple.  The code can look confusing, but you can find examples and help like this easily on the web.  The world is eager to help you if you just ask.  By writing this custom function, you’ll be able to easily clean any data so that it includes only the values you want.  The example provided filters to include only alphanumeric characters, but you could add or delete any that you wish.  Once done, you can simply use this new CleanString function [CleanString(stringToBeCleaned)] just like you would any of the built-in ones Excel provides.

VBA Example: CleanString

Advertisement

About bbluford
I am an executive finance professional with a love for process and application development (MS Access, Excel, Quickbooks), mostly as it relates to Accounting and Business Functions. I also love to write and share ideas with other people in this world. I'm an admitted Gym Rat who works out excessively. The best summation of me is that I love to teach and to learn.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: