Dirty Data Again

It happened again. I was working on a project for which I needed a quick list of all 50 states and their abbreviations. I probably had it somewhere (I'm sure I've needed this information before), but I was too lazy to look for it. If I were in 7th grade, I could probably have just typed them in order from memory. But I'm not, and I can't. So I did what any halfway smart person would do: I googled "State Abbreviations". The search returned a lot of sites, as you'd expect. I decided to use the one from our good old United States Postal Service: http://www.usps.com/ncsc/lookups/abbr_state.txt.

Dirty Data

In the business world, we get data in all kinds of formats and ways. Often this data is what I call dirty and in need of cleaning. The most common culprits can be easily fixed. Below are some of them and the remedies I use most often.

Weird space at the end (or beginning) of text in a field
Sometimes I download data, copy and paste it from another source, or get it directly from a database. And, for some reason, there are fields or cells with spaces at the end. My experience tells me that this is usually one of two things. Either a person typed a space at the end or beginning by accident (often a data entry error), or non-printable characters (such as carriage returns or tabs) have been appended. The latter often happens when converting a tab-delimited file.
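
Here is a minimal sketch in Python of one way to handle both cases. It is not necessarily the remedy the full post goes on to describe, and the function name `clean_field` is made up for illustration. It swaps control characters (tabs, carriage returns, line feeds and the like) for plain spaces, then trims whatever whitespace is left at either end of the field.

```python
import unicodedata


def clean_field(value: str) -> str:
    """Return the field with control characters neutralized and ends trimmed.

    clean_field is a hypothetical helper name, not part of any library.
    """
    # Unicode category "Cc" covers control characters such as tab,
    # carriage return, line feed, and NUL. Replace each with a space so
    # that words separated by a stray tab do not get glued together.
    spaced = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in value
    )
    # strip() with no arguments removes leading and trailing whitespace.
    return spaced.strip()


# Example rows as they might come out of a tab-delimited export
# (sample values invented for illustration).
rows = [["Alabama \r\n", "\tAL "], ["Alaska", "AK\x00"]]
cleaned = [[clean_field(cell) for cell in row] for row in rows]
```

Replacing control characters with a space, rather than deleting them, is a judgment call: it keeps a value like "New<tab>York" readable, at the cost of leaving an extra space inside the text if one was already there.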