September 4, 2012 by terryrichwhitehead
I was going to entitle this post ‘The good, the bad and the ugly’, but I chose the current title not to over-dramatise things, although dealing with bad data can feel much like that. Think about the question: is no data better than bad data? I would argue the case for having no data, as bad data is worthless. Bad data causes all sorts of problems, the extent of which cannot be accurately quantified. What makes it worse is that, in the main, people put up with it; in my opinion they shouldn’t.
IT is blessed with many acronyms, and one springs to mind when thinking of bad data: GIGO (Garbage In, Garbage Out). If you are putting bad data into an IT system, you can be sure of getting bad data out.
You may think my stance on bad data is a little confrontational; well, it is meant to be. Bad data causes confusion, it costs businesses money, it wastes time and it causes frustration.
First of all, let’s look at confusion. A user carries out some task and gets the results, but they don’t look quite right, or they are not what was expected. So what happens? They either spend time checking the data in the hope of verifying its correctness, or they ignore it, hoping it won’t matter; anyway, if it did, it’s someone else’s problem, and the cycle continues.
Bad data wastes time in two ways. Firstly, as above, a user loses confidence in the data and spends time checking it. Secondly, the bad data gives rise to a series of events which should never have happened in the first place. For example, last year a well-known courier made three attempts over three days to deliver a book to my house. Why? Because the GPS data they were using was incorrect, in fact three miles out.
Bad data costs money: in the example above, the bad data would have had a financial impact on the courier. Bad data always costs money, not just because employees have to spend time addressing (no pun intended) the issues that arise, but also where money has been, or is being, spent needlessly. I know of instances where excessive stock has been purchased in the retail trade because of bad data; I’ve even known employees miss out on a yearly bonus because of data being incorrect.
Frustration: most people get frustrated by bad data. At work it gets in the way of getting the job done and, as a consequence, puts more pressure on staff, leaving them frustrated. There is another serious issue here too, in that workplace stress can also be caused by bad data.
Bad data also affects lives outside the workplace. Here’s a true story…
A couple of months ago I was on holiday with the family in the South West of England, and one day we decided to visit a small town on a very picturesque river. A few miles out of the town we passed an electronic signpost advertising the number of free spaces in each of two car parks: one was a park and ride out of town with a hundred-plus free spaces, the other was down in the town itself and also showed over a hundred free spaces. So, with all those free spaces in the town, we drove off down a hill, following the signs straight to the car park, expecting to see all those free spaces. Now, I don’t know if I was at the wrong car park, but when we arrived it was full, and not only that, cars were driving around the car park stalking anyone on foot. It was a hot day, and from the occasional honking of horns and waving of arms you could tell this was causing a bit of frustration. We gave up and headed back up the hill to the park and ride, having wasted both time and fuel as a consequence of bad data.
Now, in this case I would sooner have had no data than bad data. I would then have made my own choice, most likely chosen the park and ride, and saved the time, money and frustration.
So what can be done to reduce the scourge of bad data? Firstly, we need to understand what we mean by data; people automatically think data means the contents of a database. Well, yes and, err, no. Yes, databases hold data, but so do other systems, in files and documents. Once we understand what data is, we then have to find it. This is no easy task: we know where the HR and CRM systems are, but what about all the others that are used day in, day out, that are not on the IT radar, yet are still important to the functioning of the organisation or business? There are the Excel spreadsheets and the Access databases. Worse still, what about the duplication of data from multiple sources?
I believe there is a very strong case for data audits. Get it all out in the open and see what we’ve got. This does not just mean managers; it also means the users, who in the main have a fair idea of what’s going on and what data they are using. Catalogue this data so that we know what it contains, where it is held, where it is used and, most importantly, who is responsible for it.
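As a rough sketch of what such a catalogue might look like (the entry fields, data-set names and file paths below are my own invention, not drawn from any real system), it could start life as something as simple as a list of records noting what each data-set contains, where it lives, where it is used and who owns it:

```python
# A minimal, hypothetical data catalogue: one entry per data-set.
from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    name: str            # what the data-set is called
    contents: str        # what it contains
    location: str        # where it is held (database, file share, spreadsheet...)
    used_by: list[str]   # systems or teams that consume it
    owner: str           # who is responsible for it

catalogue = [
    CatalogueEntry("customer-addresses", "postal addresses",
                   "CRM database", ["CRM", "Dispatch"], "Sales Ops"),
    CatalogueEntry("customer-addresses-copy", "postal addresses",
                   "Excel spreadsheet on a file share", ["Dispatch"], "IT"),
]

# Data duplicated from multiple sources stands out immediately:
by_contents: dict[str, list[str]] = {}
for entry in catalogue:
    by_contents.setdefault(entry.contents, []).append(entry.name)

duplicates = {k: v for k, v in by_contents.items() if len(v) > 1}
print(duplicates)
```

Even a simple structure like this answers the key questions of the audit at a glance, and grouping entries by their contents surfaces the duplication problem mentioned above.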
Once the data-sets have been identified, drive out the duplications, as there should be only one source for each data-set. All the data-sets that have been identified should then have their integrity assessed; if they cannot be easily fixed, there are two options. Option one: remove all the elements of the data-set that are likely to be incorrect. Option two: dispose of the data-set altogether.
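The two options can be sketched in a few lines. The validation rules, field names and the threshold for choosing between the options below are purely illustrative assumptions, not a prescription:

```python
# A hypothetical integrity assessment: validate each record against simple
# rules, then either keep only the passing elements (option one) or
# dispose of the data-set altogether (option two).

def is_valid(record: dict) -> bool:
    """Illustrative rules: a record needs a name and a plausibly sized postcode."""
    return bool(record.get("name")) and 5 <= len(record.get("postcode", "")) <= 8

dataset = [
    {"name": "Acme Ltd", "postcode": "SW1A 1AA"},
    {"name": "", "postcode": "XX"},             # clearly bad
    {"name": "Widgets plc", "postcode": "M1 1AE"},
]

valid = [r for r in dataset if is_valid(r)]
integrity = len(valid) / len(dataset)

# Assumed threshold: if most records pass, strip out the suspect elements;
# otherwise the data-set is not worth keeping at all.
if integrity >= 0.5:
    cleaned = valid   # option one: remove the likely-incorrect elements
else:
    cleaned = []      # option two: dispose of the data-set altogether
```

In practice the rules would come from whoever is responsible for the data-set, which is exactly why the catalogue needs to record an owner for each one.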
You may wonder why I have such an issue with bad data. In a service-orientated architecture, data sits at the very foundations, and any weakness here will percolate up through the various levels, infecting every service it touches and causing weakness throughout the structure, a bit like subsidence in a building producing cracks in the walls.
Over the coming months I will be pushing for an audit of our data sources. I don’t want it to be a witch hunt; I would like it to be open and honest, with absolutely no fear of recrimination.