More on Twitter’s performance woes…

As you might have seen over on ComputerWeekly, Site Confidence has released some interesting statistics on Twitter’s performance woes over the last few weeks, particularly as the World Cup has driven more ‘tweeting’.

The headline figures in the press release pretty much tell the main story:

“Twitter experienced 1.87 per cent downtime between 1 June and 15 June 2010 – representing a total of six hours and 44 minutes – compared with just 0.18 per cent for the whole of May, according to research from website monitoring and load testing specialist, Site Confidence, an NCC Group company”
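As a quick sanity check on those figures, the sketch below converts a downtime percentage over a period back into hours and minutes. The function name is my own. The only inputs are the percentages and date ranges from the press release: 15 days for 1 to 15 June, and 31 days for May.

```python
def downtime(percent: float, days: int) -> tuple[int, int]:
    """Return (hours, minutes) of downtime for `percent` of a `days`-long period,
    rounded to the nearest minute."""
    total_minutes = round(days * 24 * 60 * percent / 100)
    return divmod(total_minutes, 60)


# 1-15 June 2010 is 15 days; May has 31 days.
june_h, june_m = downtime(1.87, 15)
may_h, may_m = downtime(0.18, 31)
print(f"1-15 June: {june_h}h {june_m}m")
print(f"May:       {may_h}h {may_m}m")
```

It reproduces the six hours 44 minutes quoted for June. It also puts May’s 0.18 per cent, spread over a whole month, at roughly one hour 20 minutes.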

It’s worth looking a bit deeper, however, to tease out some more interesting information…

Firstly, let’s look at the site’s performance over the last 52 days. You can see a significant increase in download time at the end of March. This corresponds to the page size growing from about 200K to 400K, which probably reflects a new site release.
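To see why a heavier page shows up directly as a slower download, here is a rough transfer-time calculation. It is a minimal sketch under stated assumptions: “K” is taken to mean kilobytes (1 KB = 1,000 bytes), the 2 Mbit/s link speed is hypothetical, and latency, request count and server time are ignored.

```python
def transfer_seconds(page_kb: float, link_mbit_per_s: float) -> float:
    """Pure transfer time for a page of `page_kb` kilobytes over a link of
    `link_mbit_per_s` megabits per second (latency and server time ignored)."""
    bits = page_kb * 1000 * 8
    return bits / (link_mbit_per_s * 1_000_000)


# Hypothetical 2 Mbit/s link: doubling the page size doubles the transfer time.
for size_kb in (200, 400):
    print(f"{size_kb}K page: {transfer_seconds(size_kb, 2.0):.1f}s at 2 Mbit/s")
```

Real download times also depend on latency, the number of requests and server response time. Treat this as an illustration of the trend rather than a prediction of Twitter’s actual numbers.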

What’s more important from a performance perspective are the min/max error bars (orange lines). They show a widening range between maximum and minimum, which is often a key sign of a site underperforming under load.
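To make the idea of a widening min/max range concrete, here is a minimal sketch. It summarises each day’s samples and flags days whose spread exceeds a multiple of a baseline day. The sample values, the day labels, the `factor` threshold and the choice of the first day as the baseline are all hypothetical. They are not Site Confidence’s method or data.

```python
from statistics import mean

# Hypothetical download-time samples in seconds, keyed by day (made-up numbers,
# not Site Confidence data).
samples = {
    "day 1": [1.9, 2.0, 2.1, 2.0],
    "day 2": [1.9, 2.2, 2.0, 2.1],
    "day 3": [2.0, 3.5, 5.8, 2.4],
}


def summarise(values):
    """Return (min, mean, max, spread), where spread is max minus min."""
    lo, hi = min(values), max(values)
    return lo, mean(values), hi, hi - lo


def widening_days(data, factor=2.0):
    """Return the days whose spread exceeds `factor` times the first day's spread.

    Using the first day as the baseline is an assumption made for this sketch.
    """
    baseline = summarise(next(iter(data.values())))[3]
    return [day for day, vals in data.items() if summarise(vals)[3] > factor * baseline]


for day, vals in samples.items():
    lo, avg, hi, spread = summarise(vals)
    print(f"{day}: min={lo:.1f}s mean={avg:.2f}s max={hi:.1f}s spread={spread:.1f}s")
print("Widening spread:", widening_days(samples))
```

With this made-up data, only “day 3” is flagged. In that sample, a few slow responses stretch the range even though the fastest response barely changes.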

Could this add up to a new site release that wasn’t adequately performance tested, or to infrastructure that wasn’t scaled up to meet the demands of the new platform?

[Chart: Twitter download time over the last 52 days, with min/max error bars in orange]

Secondly, we can look at performance over the last 3 months by hour of day. Again, you can see a distinct increase in average page download time at certain times of day. Note that 1500 to 1700 BST is 1000 to 1200 on the US East Coast and 0700 to 0900 on the West Coast, roughly the start of the US working day.
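The time-zone arithmetic is easy to check with Python’s standard `datetime` module. To keep the example self-contained, this sketch uses fixed UTC offsets rather than a time-zone database. That is an assumption, and it only holds while summer time is in effect on both sides of the Atlantic, as it was in June 2010.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time offsets (valid for June 2010 only; use a tz database otherwise).
BST = timezone(timedelta(hours=1), "BST")
EDT = timezone(timedelta(hours=-4), "EDT")  # US East Coast
PDT = timezone(timedelta(hours=-7), "PDT")  # US West Coast

for hour in (15, 16, 17):
    t = datetime(2010, 6, 15, hour, 0, tzinfo=BST)
    print(f"{t:%H:%M} BST = {t.astimezone(EDT):%H:%M} EDT = {t.astimezone(PDT):%H:%M} PDT")
```

For other dates, `zoneinfo` names such as `Europe/London` and `America/New_York` would handle the daylight-saving transitions automatically.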

[Chart: average download time by hour of day (BST) over the last 3 months, with min/max error bars]

The variability follows the same pattern. When the site is “fastest” (~0900 BST), the range between minimum and maximum is small, and as performance degrades, the min/max variation increases.
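An “hour of day” view like this comes from bucketing every sample by its hour, whatever the date, and then summarising each bucket. The sketch below does that and checks whether the spread grows along with the mean. The sample pairs and the function name are hypothetical, made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hour of day in BST, download time in seconds) samples; made-up
# numbers for illustration only.
samples = [
    (9, 1.8), (9, 1.9), (9, 2.0),
    (12, 2.0), (12, 2.6), (12, 3.1),
    (16, 2.4), (16, 4.1), (16, 6.3),
]


def by_hour(pairs):
    """Group samples by hour; return {hour: (min, mean, max, spread)} sorted by hour."""
    buckets = defaultdict(list)
    for hour, seconds in pairs:
        buckets[hour].append(seconds)
    return {
        hour: (min(v), mean(v), max(v), max(v) - min(v))
        for hour, v in sorted(buckets.items())
    }


stats = by_hour(samples)
for hour, (lo, avg, hi, spread) in stats.items():
    print(f"{hour:02d}00 BST: mean={avg:.2f}s min={lo:.1f}s max={hi:.1f}s spread={spread:.1f}s")

# Order the hours by mean download time and check the spread never shrinks.
ordered = sorted(stats.values(), key=lambda s: s[1])
print("Spread grows with mean:", all(a[3] <= b[3] for a, b in zip(ordered, ordered[1:])))
```

In this made-up data, the slowest hour also has the widest range, which mirrors the pattern described above.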

Yet again, this is a key warning sign of an impending performance problem!

Scaling a service like Twitter is no easy task, particularly since its read/write ratio is very different from that of many other websites (e.g. traditional media publishing or job boards). The key lesson for anyone, however, is to have a robust capacity and performance management plan. This matters most when you reach those “points of inflection” on the performance curve, where “simple” horizontal or vertical scaling is no longer the answer.
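One way to picture a “point of inflection” is the textbook M/M/1 queue. In that model, mean response time is R = S / (1 − U), where S is the mean service time and U is utilisation. This is a swapped-in, idealised model used only to show the shape of the curve. It is not a model of Twitter’s architecture, and the 0.1 s service time is hypothetical.

```python
def response_time(service_s: float, utilisation: float) -> float:
    """Mean response time of an idealised M/M/1 queue: R = S / (1 - U)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_s / (1 - utilisation)


# Hypothetical 0.1 s service time: response time stays flat, then climbs steeply.
for u in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"U={u:.2f}: R={response_time(0.1, u):.2f}s")
```

Going from 50 to 80 per cent utilisation only takes response time from 0.2 s to 0.5 s. The last few percentage points, however, multiply it several times over. Under this model, that is why the warning signs need to be caught before the knee rather than after it.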