Post-mature optimization

I leave optimizations to the end, as much as possible. Get the code working, then profile it and optimize the problem areas. It was not always thus, but obviously one of the occasions when I found myself just deleting code that I had spent hours tweaking and perfecting, the penny dropped and I went for pragmatism and patience.

Of course, it’s possible to go too far the other way, and for too long put off optimizations which are [a] blindingly obvious and [b] not that difficult to apply.

Yesterday, my attention was drawn to the performance of Simple.Web, compared to ASP.NET MVC and Web API. Mattias Nordberg, an application performance consultant working in London, sent me a pull request. Not a huge change, Github shows it at 37 additions and 18 deletions. But it increased the requests-per-second of Simple.Web from 600 to 4,500, putting it roughly on par with ASP.NET MVC and 1,000 reqs/sec slower than Web API.

I ended up in a Skype chat with Mattias and one of his colleagues, Tomas. They said they were evaluating frameworks and performance was the main issue for them. Obviously they’d taken a pretty good look at the code, run it through a profiler and so on. The other thing they were worried about, they said, was the routing implementation. That was completely understandable: I threw together the original routing code in a bit of a hurry, if I’m honest, and I’d always intended to go back and tweak it, but I wasn’t sure how much of a block it really was, and how much I could improve it.

How not to do URI routing

  1. Turn every URI template into a regular expression with captures for the parameter names.
  2. Every time a request runs, run all the regular expressions.
  3. If more than one result is found, filter based on media types.

Seriously, that’s wrong, don’t do it. It’s slow and it absolutely devours CPU time.

In the Skype chat, Mattias mentioned something called the Rete algorithm. I was not familiar with it, and he admitted that he’d only just learned about it himself. I read the Wikipedia page and it sounded similar to the “tweak” I had in mind for the routing table, so I went ahead and had a go at implementing it.

I copied the regex-based Simple.Web routing classes into a console app, set up a routing table with 4,096 entries, and timed how long it took a single thread to run 4,096 lookups: 2.8 seconds. Slow. Much slower than I’d expected, but when I thought about it, not really surprising.

A better way to do URI routing

  1. Break each URI template down by its delimiter (i.e., forward slash).
  2. Construct a tree using the unique values at each level of the templates.
  3. Populate the tree with “matcher” objects that only check the relevant part of a URL:
  4. Every time a request runs, break its URI down and run it through the tree.

I think this is at least a nod in the general direction of the Rete algorithm, although in this case it’s been heavily adapted to the specific task of matching URIs to templates. I don’t know if I’ve made it sounds more or less complicated than it is, but you can look at the code if you’re interested. (One of the things that surprised me a little was that it didn’t actually need any regular expressions at all to reproduce the existing functionality.)

Anyway, initial implementation in a spike project took under an hour, and I ran the same performance test, 4,096 entries, 4,096 lookups: 0.04 seconds. 70 times faster. I reported back to Tomas and Mattias, who immediately asked “and with 10,000 routes?” My test harness was working in multiples of eight, so I bumped it up to 32,768 routes, and doing 32,768 lookups.

The old code took 19.5 seconds.

The new code takes 0.27 seconds.

When a route matched to the last template added to the routing table, the old code took 0.2 seconds. The new code takes 0.0005 seconds.

Oh, this is all still on a single thread, of course.

I integrated this new algorithm into Simple.Web last night, and the demands of reality meant that things slowed down a little, but a single lookup against a 32,768 route table still only takes a shade under 1.5 milliseconds, on an Ultrabook with a ULV Core i7 CPU.

What a difference a day makes

In summary, then, when I woke up yesterday, Simple.Web was an epic performance failure, managing 600 reqs/sec. Out of the blue I get a pull request which increases that almost by a factor of 8. Inspired, I finally get round to fixing the other performance bottleneck, with a speed boost somewhere between one and two orders of magnitude depending on how many routes you have. End result, it’s probably a good 10 times faster than it was yesterday, and it won’t slow down as the number of routes increases.

I’ve got some profiling software installed now, and weighttp, and I plan to run a few more tests and see if I can identify any other hideous bottlenecks that can be optimized without too much effort. I want to be faster than Web API.

I’ll be pushing new packages to NuGet in a day or two, but if you want the CI builds, you can add http://teamcity.cloudapp.net/guestAuth/app/nuget/v1/FeedService.svc/ to your package sources and install them direct from the TeamCity server.

Comments

  1. ferventcoder says:

    Awesome!

  2. “The old code took 19.5 seconds.
    The new code takes 0.27 seconds.”

    That is a serious perf improvement, but in the end it would be unnoticeable. That’s across 32,768 routes, so if I’m understanding correctly this would be the equivalent of 32,768 page requests by users. This means on average, each user would see a faster response time of 0.0005950927734375 seconds.

    • @mark – Great work!
      @dotnetchris – What about a high traffic site? It starts to add up.

    • As Mike says, on a high-traffic site (hundreds or thousands of requests per second) the time spent resolving routes is time not spent doing anything else, especially because it’s CPU-bound. If it takes 20 seconds to resolve 30,000 routes, your maximum requests per second is 1,500, which is rubbish.

      I also made the comparison for single request, which went from two tenths of a second to half a millisecond, before the handler has even started running the actual application code. That’s an unacceptable delay.

      • ” If it takes 20 seconds to resolve 30,000 routes your maximum requests per second is 1,500″ that’s assuming there’s 1 thread, which in real world hosting inside IIS there would be many orders of parallelization. I agree with you that it’s great for them to make that magnitude of a perf improvement, one that measures with lots of 0s in the % increase. I’m just saying with how faster servers are, easily dual-proc octo core HT so even a tiny server can have 32 effective cores, and apps like IIS really know what to do with that.

        Would it reduce throughput of the site? Sure, somewhat, but your site would still be nearly 100% bound to IO. It’s always IO. Whether it’s network or disk that’s always the largest controller of throughput.

        • It’s increasingly common for apps to be hosted in cloud environments, where you pay for every core or process you use. Think Azure or AppHarbor. So reducing CPU overhead as much as possible is probably a more worthwhile pursuit today than it was 5 years ago.

          Since Simple.Web is a non-blocking async framework, IO shouldn’t be an issue in terms of scalability. Therefore improving the performance of blocking operations is, again, a key factor in realising that potential scale.

          • IO will still likely become the limiting factor. Even with full async, there’s only so much work that can be done concurrently before everything starts queuing up anyway. Or even suppose you have infinite concurrency throughput, how much load can you pass to the database, file system, cloud file system before you end up queuing or overloading those resources and then wait for them even in fully nonblocking calls. It’s always IO as the bottleneck (short of developer error, releasing a site with Thread.Sleep(5000) left in it sure won’t have IO as a bottleneck)

Trackbacks

  1. [...] Post-mature optimization – Mark Rendle discusses performance optimisation, kicked off from receiving a pull request from Mattias Nordberg which helped improve the performance (in terms of request throughput) of Simple.Web [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,750 other followers

%d bloggers like this: