Monthly Archives: June 2006

The Tom Kyte Effect

Hits, pageviews, impressions… call them what you like, but it’s amazing what happens to those numbers when someone with a popular site links to you.

It all started with Dimitri Gielis advertising on the Apex Forums about a World Cup application he has written in Application Express. It’s a fantastic application that allows people to bet (without any money involved) on the outcome of matches in the World Cup.

I offered to host the application on Shellprompt. As an incentive for people to get involved, I also offered a year’s free Apex hosting to the winner (although sorting out what to do in the event of a tie proved to be a headache!). Oh yes, I also made the mistake of entering the competition myself. In the unlikely event that I actually win the free hosting, I’ll give up my “prize” to the person who comes after me (in the competition I mean… not physically comes after me in a threatening manner!).

Dimitri did a great job getting the application up and running on the Shellprompt servers without too much trouble (which is a great showcase of how easily Apex applications can be deployed).

Anyway…that was yesterday….

Today, Tom Kyte blogged about the application, as did Scott Spendolini and Sergio Leunissen.

To say that the application experienced a spike in visitors is an understatement. The “hits” went up tenfold once it was mentioned by Tom, Scott and Sergio.


You can clearly see the spike at 1pm once the posts had started to appear on those blogs.

Now, don’t worry… interest in the application hasn’t started to decline already. What happened was that once I saw that many hits coming in, Dimitri and I started to investigate how we could tweak the application to make it ‘scale’ a bit better.

The ‘hits’ in that graph represent each discrete web request that was made to the server. For example, if a webpage contained 10 images then (roughly speaking) you could expect to get 11 hits when someone viewed that page: one hit for the HTML of the page itself and a separate hit for each of the images.
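The arithmetic is simple enough to sketch. This is only an illustration of the rough rule above; the function name `hits_for_page_view` and the `cached_images` parameter are made up for the example.

```python
def hits_for_page_view(image_count, cached_images=0):
    """Rough number of web requests generated by one view of a page.

    One request fetches the HTML itself, plus one request for every image
    the browser does not already hold in its cache.
    """
    if not 0 <= cached_images <= image_count:
        raise ValueError("cached_images must be between 0 and image_count")
    return 1 + (image_count - cached_images)


print(hits_for_page_view(10))                    # nothing cached
print(hits_for_page_view(10, cached_images=10))  # every image cached
```

With nothing cached, a page with 10 images costs 11 requests; once every image can be reused from the browser cache, the same page view costs just one.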

The reason we saw 10,000 hits wasn’t that there were 10,000 visitors. It was due to the way that Apex serves images that are stored within the database itself. By default the file or image is downloaded again on every request, since the standard download procedure sends no expiry header information. Fortunately you can write your own download procedure, which allows you to include any HTTP headers you wish. An example of this is available on the Apex Wiki under the Image Constantly Refreshing section.
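To give a feel for the kind of headers a custom download procedure might send, here is a minimal sketch in Python. It is not the PL/SQL procedure from the Apex Wiki; the function name `expiry_headers` and the seven-day lifetime are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(now, max_age_days):
    """Build HTTP caching headers that let a browser reuse a file.

    `now` must be a timezone-aware UTC datetime; `max_age_days` is a
    hypothetical lifetime picked by whoever serves the file.
    """
    lifetime = timedelta(days=max_age_days)
    return {
        # Lifetime in seconds, relative to when the response is received.
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
        # Absolute expiry time in the HTTP date format.
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }


served_at = datetime(2006, 6, 9, 13, 0, tzinfo=timezone.utc)
for name, value in expiry_headers(served_at, max_age_days=7).items():
    print(f"{name}: {value}")
```

Until the expiry time passes, a browser that honours these headers can show the file from its own cache without asking the server (and therefore the database) for it again.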

The World Cup application displays a flag for each country, and each of these flags is stored in the database. By including the expiry headers in the download procedure we were able to save the database from being queried every time the user browsed the page. Since those flags aren’t going to change (assuming no country invades another during the World Cup!), we were able to cut out a huge number of essentially redundant calls.

Similarly, the header image for the application was stored in the database and was being requested every time the user browsed the page. Now whilst 70KB isn’t a big size for an image these days, it’s also not necessary to request it every time, so I moved that image to the filesystem and added a few mod_expires rules to make it expire in a number of days (rather than the default of 1 hour).

Now you might read this (if anyone is actually reading this?) and think “what’s all the fuss about? It’s only a small number of images”. However, in my opinion, this is what building systems that scale is all about. It’s about looking at what you’re doing and evaluating each and every step to see whether it’s strictly necessary and whether there are any other ways to do it.

There is also one other big lesson in this… if you don’t have a method of measuring things, then you will never know whether you’re making things better or worse. I took the time a few months ago to learn how to write Apache modules, so that I could create some custom modules that I use for measuring and recording statistics for websites that are hosted on the Shellprompt servers. Those modules have a performance impact of course, but the impact is minimal and the benefit of having a more or less realtime way of querying the website traffic is worth its weight in gold.

So, whilst the graph I showed earlier may look like the traffic spiked then tailed off, what is actually happening is that the site is getting more or less the same number of visitors since the “Blog effect”, but the application now works in a far more efficient manner. Whilst the number of hits seems to have artificially fallen, the fall in the bandwidth consumed is a true reflection of how much more streamlined the application is now (since the flags and header image aren’t being repeatedly downloaded by each user).
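A back-of-the-envelope model shows why hits and bandwidth can both drop while the number of visitors stays the same. Everything here apart from the 70KB header image is a made-up assumption for illustration: the visitor count, the views per visitor, the HTML size, the 32 flags of 2,000 bytes each, and the simplification that a cached image is fetched exactly once per visitor.

```python
def traffic(visitors, views_per_visitor, html_bytes, image_bytes, cached):
    """Return (hits, total_bytes) for a page made of HTML plus images.

    With caching, each visitor fetches every image once; without it, every
    image is fetched again on every page view. The HTML is always fetched.
    """
    page_views = visitors * views_per_visitor
    image_fetches = visitors if cached else page_views
    hits = page_views + image_fetches * len(image_bytes)
    total_bytes = page_views * html_bytes + image_fetches * sum(image_bytes)
    return hits, total_bytes


# One 70KB header image plus 32 hypothetical flag images.
images = [70_000] + [2_000] * 32

before = traffic(100, 5, 20_000, images, cached=False)
after = traffic(100, 5, 20_000, images, cached=True)
print("before:", before)
print("after: ", after)
```

In this toy model the same 100 visitors generate 17,000 hits and 77,000,000 bytes without caching, but only 3,800 hits and 23,400,000 bytes with it.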


This is all completely transparent to the end user of course, since as far as they are concerned the application looks exactly the same. However, it now definitely consumes less bandwidth (which is good from my perspective), it should load faster (which is good from the user’s perspective) and it makes fewer demands on the database (and it’s always good to be a ‘nice neighbour’).

It’s been good working with Dimitri on this today. He’s made a great application and I’m sure people are going to have a lot of fun with it.

The World Cup application can be found here. I encourage everyone to sign up and place your bets!