I’m going to smile while I try to steal your content :)

27 Jan

The title I chose pretty much sums up what Original Signal does.

On the face of it, it is one very purty website. There are no ads, the site displays headlines from the most influential blogs in each category, and it even has a nice popup that shows only a teaser of the content (even when the source offers a full feed). Their only source of monetization seems to be the search box at the top – it uses a Yahoo feed, and the top links are sponsored. Some people may have problems with that (after all, none of the content is ‘theirs’), but I do not really mind.

No siree – my problem is with their links.

The first link I saw was to TechCrunch, and as is a habit of mine, I looked at the URL. Instead of a simple direct link or even a tracking link, it’s a full-blown URL: http://web20.originalsignal.com/article/4845/stalk-your-contact-list-with-upscoop.html

Why do they need such a URL? Hrmm, I thought – maybe to prettify it. So I checked for a robots.txt file – nada. Nothing, zilch. This was odd – a full-blown internal link, and nothing telling robots that they should not be spidering the page.
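
For anyone who wants to repeat that check, here is a minimal sketch using Python's standard urllib.robotparser. The function name is_blocked and the sample rules in BLOCKING are made up for illustration; Original Signal served no robots.txt at all, which is what the empty string stands in for.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)

# The article URL quoted in the post.
URL = "http://web20.originalsignal.com/article/4845/stalk-your-contact-list-with-upscoop.html"

# No rules at all: spiders are free to index every /article/ page.
NO_RULES = ""

# Hypothetical rules that would have kept spiders away from those pages.
BLOCKING = "User-agent: *\nDisallow: /article/\n"

if __name__ == "__main__":
    print(is_blocked(NO_RULES, URL))
    print(is_blocked(BLOCKING, URL))
```

With no rules the URL is open to every spider; two lines of robots.txt would have been enough to close it.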

Only one reasonable explanation, and one easy check: see how many pages Google has indexed for the site.

And there we have our beautiful back-stab. You find URLs like http://buzz.originalsignal.com/article/431824/acer-computer-pdoduct.html and http://movies.originalsignal.com/article/14405/carmen-loves-praying.html (among many others). Google clocks the site in at over 3,000 pages. I do not follow SEO news much now, but a while back there was a huge stink about 302 redirects. Basically, sites were doing 302 redirects for outbound links, which got end users to their destination but confused search engines. When a site with a lot of trust/PageRank (i.e. Original Signal) did this, search engines would oftentimes rank the offending site (i.e. Original Signal) and obliterate the original site with a dupe penalty.

And this is exactly what Original Signal is doing. They could link directly to the source (like popurls). If they really wanted to track clicks, all they would have had to do was link to something like out.php?id=xxx or /out/xxx. They could then block it from robots (so that spiders wouldn’t get confused) or use a proper 301 redirect. Nope – they instead chose to build a full URL scheme. Users get absolutely nothing out of it. Search engine spiders index the pages.
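
To make the /out/xxx idea concrete, here is a rough sketch in Python of what such a click tracker could look like. Everything in it is assumed for illustration: the ARTICLE_SOURCES table, the example.com target, and the handle_out function are hypothetical and say nothing about how Original Signal's system is actually built. It shows both safeguards mentioned above, a robots.txt rule for /out/ and a permanent (301) redirect.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: article id -> URL of the original source.
ARTICLE_SOURCES: Dict[int, str] = {
    4845: "https://example.com/original-article",
}

# robots.txt rule that keeps spiders out of the tracking URLs.
ROBOTS_TXT = "User-agent: *\nDisallow: /out/\n"

def handle_out(path: str) -> Tuple[int, Dict[str, str]]:
    """Map a request path like /out/4845 to an HTTP status and headers."""
    prefix = "/out/"
    if not path.startswith(prefix):
        return 404, {}
    raw_id = path[len(prefix):]
    if not raw_id.isdecimal():
        return 404, {}
    target = ARTICLE_SOURCES.get(int(raw_id))
    if target is None:
        return 404, {}
    # A click could be logged here before redirecting.
    # 301 marks the move as permanent, so credit goes to the source URL.
    return 301, {"Location": target}

if __name__ == "__main__":
    print(handle_out("/out/4845"))
    print(handle_out("/out/9999"))
```

The point of the sketch is how little is needed: one short path per article, one redirect, and no indexable page standing between the reader and the source.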

Congratulations on helping pollute the web.

4 Responses to I’m going to smile while I try to steal your content :)


Mr. Benji

May 7th, 2007 at 4:22 am

What do you mean by out.php?id=xxx or out/xxx?
I don’t see that explained anywhere.



May 7th, 2007 at 11:07 am

What I mean is that if they wanted to track clicks, it was as simple as doing something like /out.php?id=xxx (every article in their system has an id value). Instead, they constructed full URLs – to steal traffic.



January 29th, 2008 at 7:03 pm

The URL I left in my reply has been de-indexed in Google, shortly after they indexed a 302 redirect from buzz.originalsignal.com to that URL.
It looks very much as if they are still at it.
Essentially they are a portal for black hatters, with many links from their site to blackhat SEO sites. They are based in Amsterdam and, from what I can work out, they derive their income from porno/casino sites who benefit from their blackhat work.



January 30th, 2008 at 8:28 am

Their source of revenue is selling advertising on their sites on the back of the PR they have stolen from news articles and websites, and from the sale of copied movie material. In addition, they derive income from Google AdSense on their other sites.