Google Sitemaps and dealing with large sitemaps

26 Feb

I like Google’s Webmaster Tools. It is definitely the best ‘outreach’ system from any of the major search engines, and it is an awesome way to get a quick snapshot of your sites (as Google sees them).

One thing that has recently annoyed me is how it deals with sitemaps. Why can a sitemap index only link to 1,000 other sitemaps, and why can one sitemap only hold 50,000 URLs? It may seem like an odd complaint, but when you have the formula of popular site + lots of users + tags, it is easy to generate a lot of pages. That is not to say it would generate over 50 million pages (1,000 sitemaps × 50,000 URLs), but I like having my sitemaps categorized, and it would be easier to have a sitemap_tags.xml, sitemap_pages.xml, etc. Maybe I’m lazy, but the 50,000 number seems arbitrary.

The same goes for the limit of 1,000 sitemaps. On a recent site I created (standard disclaimer of it being unique, and it cost me roughly $500,000), there were a total of 16 million pages. Two hours later I’m still struggling with the sitemap system: it complains that I have over 1,000 sitemaps when in fact I only have roughly 800 (at 20k links per sitemap). I’ve gotten as high as 13.9 million pages, and without the ability to actually contact the Google Webmaster team (I can’t find a way to do that anywhere), I guess I have to settle there. But again, that 1,000 limit seems completely arbitrary.
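For anyone wanting to reproduce the setup, here is a rough sketch of how the splitting described above could be done. It is not the code used on the site: the function names, the file naming scheme, and the use of the sitemaps.org 0.9 namespace are all assumptions made for illustration. The two limits are the ones quoted in this post.

```python
from xml.sax.saxutils import escape

# Namespace from the sitemaps.org 0.9 protocol (an assumption: the post does
# not say which schema version the site used).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000   # per-file limit quoted in the post
MAX_SITEMAPS_PER_INDEX = 1_000  # per-index limit quoted in the post


def sitemaps_needed(total_urls, per_file):
    """Number of sitemap files needed (integer ceiling division)."""
    return (total_urls + per_file - 1) // per_file


def render_sitemap(urls):
    """Return the XML text of one sitemap file listing the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def render_index(sitemap_urls):
    """Return the XML text of a sitemap index pointing at the given files."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )


def build_sitemaps(urls, base_url, per_file=20_000, prefix="sitemap"):
    """Split urls into sitemap files and build one index that lists them.

    Returns (index_xml, {file_name: sitemap_xml}). The names produced here
    (sitemap_0001.xml, ...) are hypothetical, not taken from the post.
    """
    if per_file > MAX_URLS_PER_SITEMAP:
        raise ValueError("too many URLs per sitemap file")
    count = sitemaps_needed(len(urls), per_file)
    if count > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many sitemap files for one index")
    files = {}
    for i in range(count):
        name = f"{prefix}_{i + 1:04d}.xml"
        files[name] = render_sitemap(urls[i * per_file:(i + 1) * per_file])
    root = base_url.rstrip("/") + "/"
    index = render_index(root + name for name in files)
    return index, files
```

With the numbers from this post, sitemaps_needed(16_000_000, 20_000) gives 800, which is under the 1,000 limit. Passing a different prefix (for example "sitemap_tags") is one way to keep the files grouped by category, although every file still counts toward the same index limit.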

Update 7 hours later: I re-visited the page, and now it’s finding them all A-OK.

UPDATE: Read more about this site

Thomas Schulz

February 26th, 2007 at 11:37 am

13.9 million pages… That is quite a large number :)

You could consider only including individual posts / hubs?



February 26th, 2007 at 12:06 pm

The problem is that when you try to make it ‘normally’ indexable (i.e. 100 links per page), you need at least four levels of ‘branches’ (e.g. with three levels, you get 100 links -> 10,000 links -> 1,000,000 links, which is well short of 16 million). Furthermore, the data is not uniform. One ‘spoke’ may lead to 50 links, whereas another spoke may lead to 1,000. In such a situation, the only way to get full coverage is through the sitemaps.
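
The level count above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes every page carries the full 100 links, which (as noted) the real data does not, so the result is a lower bound.

```python
def levels_needed(total_pages, links_per_page=100):
    """Smallest number of link levels whose fan-out reaches total_pages.

    Assumes a uniform fan-out of links_per_page at every level.
    """
    levels = 0
    reachable = 1
    while reachable < total_pages:
        reachable *= links_per_page  # each level multiplies the reach
        levels += 1
    return levels
```

For 16 million pages at 100 links per page, levels_needed(16_000_000) comes out to 4, since three levels only reach 1,000,000 pages.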