Thursday, May 22, 2014

on the difficulty of avoiding data devalidation

As is my wont, I was updating my website. In this case the key product I was linking to was the NHS Service Finder database. I smacked through a sample search (like you do) and my eye snagged on a thing halfway down the page. A thing I knew had not existed since 2011.

So I went to the button. You know the button: the one that opens a challenging, half-remembered route to the data curators who keep your information in a reasonable state of validation. (My memory ran something along these lines: there are some forms, and a phone number, and a guy, and a thing you have to do, but he knows his stuff so it's OK.) But the button was gone, and in its place was a simple line of text: "This information was supplied by Serco Global Services on 12-03-2014".

OK, I thought. New route. "Report an issue with this information" link. Insta-auto-holding-reply received. Take it from there. Then I thought to myself, I wonder if [redacted] service is still there?

It was. Four years gone and repeatedly removed from everyone's databases.

But (and this is really the but) it persists, haunting the local databases, or those that barely (or never) update, linked to long-gone web content, or copies of that content, or copies of copies of that content. We're guilty of it ourselves, to an extent; one of the sources I found was (or at least appeared to be) official, or from the right source, though I think it was a random scrap lost on a server somewhere.

24 hours later, I get the email from the person. "This information has been passed to our third party information supply service. Please be advised it may take as long as 6 weeks for any changes to be reflected in the database."

Spare a thought for the information curators. I think about them a lot. Specifically, I try to think about how to encourage them to try the emails and phone numbers they copy from database to database. Because those contact details are dead, every one of them. The email, the phone number, even the web address (which was a surprise to me; I thought it would redirect to current content and, oh, another thing for fixing).

We never want to throw away data, us humans. I remember the days I kept a careful list of all my database deletions, a list of the discontinuations, each with a justification that no-one would ever ask for. The services, clubs, groups, activities that make you go ooh! are the worst, persisting and re-entering for the sake of being interesting, that common currency of the internet.

Well, I have work to do. Time to attend to the synaptic pruning of the semantic web. 

A final note on skipping the email and phone call stages of three-point data validation and just relying on the web search: Google is adjusting its search around you constantly. If it looks like you're searching for evidence of long-discontinued services, it will give you evidence of long-discontinued services. And if you find yourself thinking, "ooh, that sounds like an interesting service," beware. Others have gone ooh before you.
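
For the curators who like their warnings executable, here is a minimal sketch of why the web search should not count on its own. Everything in it is an assumption made for illustration: the `ContactChecks` fields, the `verdict` function and the three labels are hypothetical, and the rule itself (direct contact outweighs web evidence) is one reading of three-point validation, not anyone's official procedure.

```python
from dataclasses import dataclass


@dataclass
class ContactChecks:
    """Outcome of the three checks for one service record (hypothetical fields)."""
    email_answered: bool      # someone replied from the listed address
    phone_answered: bool      # someone picked up on the listed number
    web_evidence_found: bool  # a web search turned up pages mentioning the service


def verdict(checks: ContactChecks) -> str:
    """Classify a record, treating web evidence as the weakest of the three points.

    Assumed rule: a reply by email or phone is direct evidence the service
    exists; web pages alone may be copies of copies of long-gone content.
    """
    if checks.email_answered or checks.phone_answered:
        return "live"
    if checks.web_evidence_found:
        # Pages exist but nobody answers: do not re-enter this record.
        return "unconfirmed"
    return "gone"


# A record that only survives as web content is flagged, not kept.
ghost = ContactChecks(email_answered=False, phone_answered=False,
                      web_evidence_found=True)
print(verdict(ghost))
```

The point of the ordering is that a page turning up in a search never upgrades a record to "live" by itself; only a human on the other end of the email or the phone does.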