Umbraco 5: On performance and the perils of premature optimisation
Great feedback is not necessarily "Guys, everything is awesome and works". While nice star-ratings are always lovely (thanks!), truly 'great' feedback is valuable info about what is broken: the reports of quirks, bugs and any user confusion that show us areas that we've missed and can fix in time for the first production-ready build.
And this is why we're really happy: it's been two weeks since we put that release out just before Christmas, and over the holiday period we've already had around 40 users register at our new v5 issue tracker and a great bunch of useful issues reported that we're already working hard to eradicate.
You have to notice a flaw in something in order to fix it, and we've been delighted to accept pull requests submitted over the holidays from dedicated folks that prove how awesome the Umbraco community is.
One thing that has rightly cropped up, and which I consider a valid issue worth tracking just like anything else, is the speed of running a website on these pre-production v5 builds. If you've taken the builds for a spin, especially on a machine with a spindle-based hard drive and SqlCe, you'll have noticed it's very slow.
I thought I'd put up a quick post reassuring you that, with the testing builds so far, this is partly by design, and explaining how it'll receive focus this month as we prep for a production-ready 5.0 at the end of January.
If you're not into the technical details, the primary point is that this month is our planned moment when we make v5 shiny, speedy and ready.
Make it work, then make it work good, then make it work fast
Basically, I can sum it up pretty quickly: there's no caching in there yet. Job done, right? Blog post over.
Well, that's actually just a small part of the story.
When putting out these pre-release builds, we've had to strike a balance between features and performance precisely to help the bug-reporting process. And so, the preview builds have run slow, but the bugs that have cropped up have been due to real code rather than caching, and can be reproduced - and fixed - much more easily.
So, given you want the car to be fast when it's finished, do you spend weeks optimising the engine now, maybe replacing heavier suspension with the latest trendy lightweight component, even though your beloved creation doesn't yet have any doors? Or do you carry on, and take quantitative measurements later when you are approaching an understanding of the whole car and how it behaves when everything is fitted - and it's heavier?
Well, the automotive engineers amongst us will point out I know nothing about how to build a car :) But when it comes to software, it pays to wait. As I've mentioned before, premature optimisation is a big killer on complex projects. We could have piled hundreds of hours into shaving 5kg off the weight of the car's suspension, only to find the car is now top-heavy and rolls in corners. Likewise, we could have spent a few days optimising a v5 database query we thought would be crucial, only to find it runs once a month. It pays dividends, time and time again, to wait until your architecture is at least 80% feature-complete before gauging where to put your optimisation effort for daily use.
Making it production-ready
Starting this week, we've split effort into two tracks to take us to our first production release: resolving critical logged issues, and performance profiling. The latter is basically my primary focus for the rest of this month:
Profilers are a fantastic weapon in the developer's armoury. They allow you to get quantitative data about the performance of individual portions of your application, without resorting to tons of trace logging and guesswork.
For .NET code, I like dotTrace from JetBrains, because it allows you to profile unit tests straight from Visual Studio so you can get at performance data for known, repeatable units of work.
Here, I've artificially slowed down a unit test and used the product's "Tracing" technique, which gives accurate measurements of how long each of my .NET methods takes. This measuring slows down the application, but it does so uniformly, so as long as you deal in percentage improvements between test runs, you're good to go. Below you can see this test exercises a chain of methods involved in finding an entity in Hive when given a route - basically, one of the first things to happen when a request for a URL comes in:
The level of detail is fantastic for forensic investigations and finding those hotspots.
This is the type of testing I'm doing at the moment and for the rest of this month: looking at common methods, seeing where the bottlenecks are, and working on improving them. You can see here I've reduced the GetEntityByPath method to 10% of its original execution time, and I can measure the impact on the rest of the stack with dotTrace. If this method is used in many places (and it is), the benefits go across the board.
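The "deal in percentage improvements" idea works outside any profiler too. Here's a minimal sketch (hypothetical lookup functions, plain wall-clock timing rather than dotTrace's instrumentation, and Python rather than .NET for brevity): measure a "before" and an "after" implementation the same way, and compare the ratio rather than the absolute numbers.

```python
import time

def naive_lookup(items, target):
    # Linear scan: the deliberately slow "before" implementation.
    for item in items:
        if item == target:
            return True
    return False

def indexed_lookup(index, target):
    # Set membership: the optimised "after" implementation.
    return target in index

def best_time(fn, *args, repeats=5):
    # Best-of-N wall-clock time. Absolute numbers vary per machine,
    # but the *ratio* between two measurements is comparable.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

items = list(range(100_000))
index = set(items)

before = best_time(naive_lookup, items, 99_999)
after = best_time(indexed_lookup, index, 99_999)
print(f"optimised version runs in {after / before:.1%} of the original time")
```

The same principle is what makes dotTrace's uniform instrumentation overhead harmless: it inflates every measurement by roughly the same factor, so relative comparisons between runs stay valid.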
Code itself is not the only thing to blame: sometimes you have a lot of database queries, or they are poorly optimised. Since the default Hive provider in Umbraco 5 uses NHibernate, I can use the excellent NHProf from Hibernating Rhinos to get insight into exactly what queries resulted from a test:
The unit test I've been focussing on today does a lot of work in one NHibernate "session". That's not a problem, as I'm in a contrived testing scenario, and it's useful because I can repeat the same scenario and compare the results each time. The SQL selected in the screenshot above is one query returning one row in under 1ms. Prior to today, that same operation took 5 queries, returned 158 rows, and ran for 252ms.
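That kind of win - collapsing several chatty round-trips into one query - is easy to sketch with a toy route table. None of this is the actual Hive/NHibernate code; it's a hypothetical schema using sqlite in Python purely to show the shape of the change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, slug TEXT);
    INSERT INTO node VALUES (1, NULL, ''), (2, 1, 'about'), (3, 2, 'team');
""")

query_count = 0
def q(sql, params=()):
    # Tiny wrapper so we can count round-trips, NHProf-style.
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def resolve_chatty(segments):
    # "Before": walk the route one segment at a time - one query per segment.
    parent = 1  # root node
    for slug in segments:
        rows = q("SELECT id FROM node WHERE parent_id = ? AND slug = ?",
                 (parent, slug))
        parent = rows[0][0]
    return parent

def resolve_single(segments):
    # "After": one self-joined query resolves the whole two-segment route.
    return q("""
        SELECT c.id FROM node a
        JOIN node b ON b.parent_id = a.id
        JOIN node c ON c.parent_id = b.id
        WHERE a.id = 1 AND b.slug = ? AND c.slug = ?
    """, tuple(segments))[0][0]

query_count = 0
chatty_id = resolve_chatty(["about", "team"])
chatty_queries = query_count

query_count = 0
single_id = resolve_single(["about", "team"])
single_queries = query_count

print(f"chatty: {chatty_queries} queries, single: {single_queries} query")
```

Both versions resolve the same content id; the difference only shows up when you count round-trips, which is exactly the visibility a tool like NHProf gives you.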
It's a fairly well-worn path from here on out. We have unit tests that we know represent the status quo of the build that's out there receiving feedback from the community, and we have a track that is already busy working on those bugs. Matt managed to fix almost 10 in his first day back, which is awesome.
In the meantime this month, we're working on improving the code itself so that we can make smarter optimisations.
My aim, as always, is 200ms response times for an average page, before adding the catch-all page-level output caching.
A quick note about caching
Code that works well without relying on caching is easier to maintain, easier to test, easier to predict, and easier to fix. But that's not to say caching doesn't have a place at all. There is very little caching in the codebase; it's something I prefer to add at the last stage, but it still requires attention. Caching which URL resolves to which piece of content, for example, requires us to cleverly invalidate the cache if any of the content's parents change. But we've put the hooks in place for this type of eventing, and it's being wired up at the moment. And finally, output caching of common website pages will be enabled in configuration.
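To illustrate that invalidation rule (this is not the actual v5 eventing API - just a hypothetical Python sketch of the idea): when a piece of content moves or is renamed, every cached route at or below its path may now resolve differently, so the whole subtree of cache entries has to be evicted.

```python
# Hypothetical route cache: URL path -> content id.
route_cache = {}

def cache_route(path, content_id):
    route_cache[path] = content_id

def invalidate_subtree(changed_path):
    # A change to a node affects its own route *and* every descendant route,
    # since descendants' URLs are built from their ancestors' paths.
    prefix = changed_path.rstrip("/")
    for cached in list(route_cache):
        if cached == prefix or cached.startswith(prefix + "/"):
            del route_cache[cached]

cache_route("/about", 2)
cache_route("/about/team", 3)
cache_route("/news", 4)

# Renaming or moving /about must evict /about AND /about/team,
# but leave the unrelated /news entry alone.
invalidate_subtree("/about")
```

This is why the change events need to fire for ancestors, not just the node that was edited: a rename three levels up silently changes the URL of everything beneath it.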
Core features for a 5-point-0 release are pretty much there; they primarily just need polishing and speeding up. We'll be putting out another RC release containing fixes and the first half of the performance improvements next week, so you'll get a version that is more fun to play with. In the meantime, thanks again for all the testing and reporting that you are doing - it's absolutely brilliant so far. If you would like to help out, we always love pull requests, and if, like me, you get a weird nerdy kick out of code profiling, please do get in touch!