PageRank & Scaling – the problems of being ‘big’

31 Mar

With big sites like Blog Flux, when you deliver a service, you’d better have it ready to scale.

We cache every single PageRank request made to our Pagerank Checker. The reason is obvious: PR updates rarely, and when it does, we can just flush the cache. If we instead asked Google for the PageRank every time we got a request, we would end up with 1) slower response times and 2) a higher chance of being banned by Google.
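
The caching idea above can be sketched in a few lines of Python. This is only an illustration, not our production code: the `PageRankCache` class and the `fake_fetch` function are made-up names, and `fake_fetch` stands in for whatever actually queries Google.

```python
from typing import Callable, Dict


class PageRankCache:
    """Cache-aside lookup: go upstream only when the URL is not cached yet."""

    def __init__(self, fetch: Callable[[str], int]) -> None:
        self._fetch = fetch                 # hypothetical upstream lookup
        self._store: Dict[str, int] = {}    # url -> cached PR value
        self.upstream_calls = 0             # how often we had to go upstream

    def get(self, url: str) -> int:
        if url not in self._store:
            self.upstream_calls += 1
            self._store[url] = self._fetch(url)
        return self._store[url]

    def flush(self) -> None:
        """Drop everything, for example after a PageRank update."""
        self._store.clear()


def fake_fetch(url: str) -> int:
    # Placeholder only: a deterministic value in 0..10, not a real PageRank.
    return len(url) % 11


cache = PageRankCache(fake_fetch)
cache.get("example.com")   # miss: one upstream call
cache.get("example.com")   # hit: served from the cache
print(cache.upstream_calls)
```

Repeated requests for the same URL cost one upstream call until the next `flush()`, which is what keeps both the response time and the number of queries sent to Google down.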

I ran the math: we have a total of 33 gigs cached on our server. At an average size of 225 bytes per PR image (0-10), that comes out to almost 150,000,000 different URLs that we have checked for PageRank! That counts only unique URLs. Since the last PageRank reset we have delivered over 10 billion images.
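
For anyone who wants to redo the estimate, here is a rough version in Python. It assumes one cached image per unique URL and ignores filesystem overhead. Since "gigs" could mean either 10^9 or 2^30 bytes, both readings are shown; the variable names are just for illustration.

```python
AVG_IMAGE_BYTES = 225                 # average size of one cached PR image

CACHE_BYTES_DECIMAL = 33 * 10**9      # 33 GB, decimal reading
CACHE_BYTES_BINARY = 33 * 2**30       # 33 GiB, binary reading

# One cached image per unique URL (assumption), so the URL count is
# simply total cache size divided by the average image size.
urls_decimal = CACHE_BYTES_DECIMAL // AVG_IMAGE_BYTES
urls_binary = CACHE_BYTES_BINARY // AVG_IMAGE_BYTES

print(f"decimal gigs: about {urls_decimal:,} URLs")
print(f"binary gigs:  about {urls_binary:,} URLs")
```

Either way the result lands in the neighborhood of 150 million unique URLs.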

I don’t remember the exact date, but this thread puts the last PR update at January 9. Roughly 90 days have elapsed since then. On average, the number of images we’ve served up:

  • Every day: 111 million
  • Every hour: 4.6 million
  • Every minute: 77,000
  • Every second: 1,300
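
These rates are just the 10 billion images divided over roughly 90 days. A quick sketch of the arithmetic, treating both inputs as the round numbers quoted in this post (the real totals are "over" 10 billion and "roughly" 90 days):

```python
TOTAL_IMAGES = 10_000_000_000   # "over 10 billion" images since the last update
DAYS = 90                       # "roughly 90 days"

per_day = TOTAL_IMAGES / DAYS
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60

print(f"per day:    {per_day:,.0f}")
print(f"per hour:   {per_hour:,.0f}")
print(f"per minute: {per_minute:,.0f}")
print(f"per second: {per_second:,.0f}")
```

The figures in the list are these values rounded.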

Average response time is roughly 0.10 seconds.

For the domains we have PR cached for, the most popular first letter was ‘s’, followed by ‘m’ (way behind). The least popular was ‘q’, with ‘z’ about twice as popular. ‘q’ was roughly 2.4% as popular as ‘s’.
