Scrubbing Local Data

19 Feb

I partook in a small but interesting discussion a while ago about how bad local data is out there. Not just bad, but also impossible to clean up.

I’ve been taking the time lately to go back through iBegin and ‘scrub’ our data. As it happens, the raw data we purchase is far from perfect (duplicates galore, miscategorizations, etc.). It is essentially a ‘risk’ we take. But that isn’t the end of it – even franchises suffer from big problems when it comes to local data.

Case in point: McDonald’s. You cannot get a more recognizable name. But do note its name – McDonald + ’ + s. Not McDonalds, not Mc Donalds, not MacDonalds, or any of the dozens of other variations.
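To make that concrete, here is a minimal sketch of what a rule-based scrubbing pass over variants like these could look like. It is not iBegin’s actual code: the rule list, the `scrub` and `run_pass` names, and the choice of a straight apostrophe in the canonical name are all assumptions made for illustration.

```python
import re

# Hypothetical rules as (pattern, replacement) pairs, applied in order.
# These are illustrative assumptions, not the rules iBegin actually uses.
RULES = [
    # Collapse "McDonalds", "Mc Donalds", "MacDonalds", "McDonald’s", ...
    (re.compile(r"\bma?c\s*donald\s*['’]?\s*s\b", re.IGNORECASE), "McDonald's"),
    # Drop a trailing generic word such as "Restaurant".
    (re.compile(r"\s+restaurants?\b", re.IGNORECASE), ""),
]


def scrub(name, rules=RULES):
    """Apply every rule to a single listing name and tidy the whitespace."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name)
    return " ".join(name.split())


def run_pass(names, rules=RULES):
    """Run one pass and return the distinct names that remain."""
    return sorted({scrub(name, rules) for name in names})


if __name__ == "__main__":
    raw = ["McDonalds", "Mc Donalds", "MacDonalds",
           "McDonald’s", "McDonalds Restaurant"]
    print(run_pass(raw))
```

If a pass still returns more than one distinct name for the same franchise, the leftovers show which rule to tweak before running it again.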

So while we went through our data pass by pass (basically you create rules, ‘run’ the rules on the data, tweak the rules, and then re-run), I wondered what my esteemed competition was up to.

Looks like not much. Checking them out:

  • InsiderPages. IP couldn’t even handle the single-quote. It did locate McDonald’s, so I guess we give it a pass.
  • Yelp. I found McDonalds, McDonald’s, McDonalds Restaurant, and other variations. A few of the clickable links had a link to the official website, but most did not.
  • Google Maps. A bigger mashup than all the rest combined. No standardized name (What’s a Restrnt?), some are categorized, most aren’t. No link to the official website.
  • TrueLocal. A few results, various spellings, no link to the official site except for the first link. Interestingly if you click on the first result, it shows you many more McDonald’s listings. Those are accurate, but why aren’t they in the default search results?
  • Yahoo Local. Correctly spelled, and all results link to the main website. Perfect.

Really my point here (amidst the connections in my brain) is that if companies cannot even get the data on the largest franchise in the world right, how are they going to cover data on small businesses?

It’s a mind-boggling problem.

1 Response to Scrubbing Local Data


Yahoo Local: I’m Powerful. No Wait, I’m Confusing. - Tech Soapbox

February 19th, 2007 at 7:12 pm

[...] When it comes to the US, Yahoo! Local is by far the best site. As I outlined in my previous post about scrubbing local data, they have taken extra steps to make sure their data is accurate and clean. They have a ton of data and information – from local reviews to web-results to even extra information gleaned from sources like Delicious. [...]