Umbraco 5: On performance and the perils of premature optimisation

Wednesday, January 04, 2012 by Alex Norcliffe

Just before Christmas we released the latest pre-production build of Umbraco 5 for testing, and so far we've had about 800 downloads and great feedback.

Great feedback is not necessarily "Guys, everything is awesome and works". While nice star-ratings are always lovely (thanks!), truly 'great' feedback is valuable info about what is broken: the reports of quirks, bugs and any user confusion that show us areas that we've missed and can fix in time for the first production-ready build.

And this is why we're really happy: it's been two weeks since we put that release out just before Christmas, and over the holiday period we've already had around 40 users register at our new v5 issue tracker and a great bunch of useful issues reported that we're already working hard to eradicate.

You have to notice a flaw in something in order to fix it, and we've been delighted to accept pull requests submitted over the holidays from dedicated folks that prove how awesome the Umbraco community is.

One thing which has rightly cropped up, and which I consider a valid issue worth tracking just like anything else, is the speed of running a website on these pre-production v5 builds. If you've taken the builds for a spin, especially on a machine using a spindle-based hard drive and using SqlCe, you'll have noticed it's very slow.

I thought I'd put up a quick post to reassure you that, with the testing builds so far, this is partly by design, and to explain how it'll receive focus this month as we prep for a production-ready 5.0 at the end of January.

If you're not into the technical details, the primary point is that this month is our planned moment when we make v5 shiny, speedy and ready.

Make it work, then make it work well, then make it work fast

Basically, I can sum it up pretty quickly: there's no caching in there yet. Job done, right? Blog post over.

Well, that's actually just a small part of the story.

When putting out these pre-release builds, we've had to strike a balance between features and performance precisely to help the bug-reporting process. And so the preview builds have run slow, but the bugs that have cropped up have been due to real code rather than caching, and can be reproduced - and fixed - much more easily.

Imagine you're good with tools, and you're building a family car from a kit. You only have the engine fitted to the chassis and some wheels at the moment, and it goes OK but it's a bit slow already.

So, given you want the car to be fast when it's finished, do you spend weeks optimising the engine now, maybe replacing heavier suspension with the latest trendy lightweight component, even though your beloved creation doesn't yet have any doors? Or do you carry on, and take quantitative measurements later when you are approaching an understanding of the whole car and how it behaves when everything is fitted - and it's heavier?

Well, the automotive engineers amongst us will point out I know nothing about how to build a car :) But when it comes to software, it pays to wait. As I've mentioned before, premature optimisation is a big killer on complex projects. We could have piled hundreds of hours into shaving 5kg off the weight of the car's suspension, only to find out it's actually now top-heavy and rolls in corners. Likewise, a few days could have been spent optimising a v5 database query we thought would be crucial, only to find the query gets run once a month. It pays dividends time and time again to wait until your architecture is at least 80% feature-complete before gauging where to put your effort and resources into optimising it for daily use.

Making it production-ready

Starting this week, we've split our effort into two tracks to take us to our first production release: resolving critical logged issues, and performance profiling. What follows is basically my primary focus for the rest of this month:

Code profiling

Profilers are a fantastic weapon in the developer's armoury. They allow you to get quantitative data about the performance of individual portions of your application, without resorting to tons of trace logging and guesswork.

For .NET code, I like dotTrace from JetBrains, because it allows you to profile unit tests straight from Visual Studio so you can get at performance data for known, repeatable units of work.
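
To give a flavour of what such a profiling target looks like - this is only a minimal sketch, and the helper and repository names here are hypothetical stand-ins rather than Umbraco 5's actual API - the test just needs to exercise a known, repeatable unit of work:

    using NUnit.Framework;

    [TestFixture]
    public class RoutingPerformanceTests
    {
        [Test]
        public void Resolves_An_Entity_From_A_Route()
        {
            // Arrange: build the same fixture on every run, so the
            // profiler compares like with like between sessions.
            var repository = TestHelper.CreateRepositoryWithSampleContent();

            // Act: the known, repeatable unit of work we want timings for.
            var entity = repository.GetEntityByPath("/products/widgets");

            // Assert: a real assertion keeps the test honest; the timing
            // data comes from the profiler, not from the test itself.
            Assert.IsNotNull(entity);
        }
    }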

Here, I've artificially slowed down a unit test and used the product's "Tracing" technique, which gives accurate measurements of how long each of my .NET methods takes. This measuring slows down my application, but it does so uniformly, so as long as you deal in percentage improvements between tests, you're good to go. Below you can see this test exercises a chain of methods involved in finding an entity in Hive when given a route - basically, one of the first things to happen when a request for a URL comes in:

[Screenshot: dotTrace trace of the method chain that resolves a route to a Hive entity]

The level of detail is fantastic for forensic investigations and finding those hotspots.

This is the type of testing I'm doing at the moment and for the rest of this month: looking at common methods, seeing where the bottlenecks are, and working on improving them. You can see here I've reduced the GetEntityByPath method to 10% of its original execution time, and I can measure the impact on the rest of the stack with dotTrace. If this method is used in many places (and it is), the benefits go across the board.
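
Even without a profiler you can sanity-check that kind of claim with a plain Stopwatch, provided both runs do identical work and you compare percentages rather than absolute times. A rough sketch (the repository call is the hypothetical one from above):

    using System;
    using System.Diagnostics;

    static class QuickTimings
    {
        // Averages an operation over many iterations; compare two builds
        // as a percentage, since any measurement overhead is uniform.
        public static double AverageMilliseconds(Action operation, int iterations = 100)
        {
            operation(); // warm-up, so JIT compilation isn't measured

            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < iterations; i++)
            {
                operation();
            }
            stopwatch.Stop();

            return stopwatch.Elapsed.TotalMilliseconds / iterations;
        }
    }

    // Usage:
    // var before = QuickTimings.AverageMilliseconds(() => repository.GetEntityByPath(route));
    // ...apply the optimisation, then measure again...
    // var after = QuickTimings.AverageMilliseconds(() => repository.GetEntityByPath(route));
    // Console.WriteLine("Now runs in {0:P0} of the original time", after / before);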

Code itself is not the only thing to blame: sometimes you have a lot of database queries, or they are poorly optimised. Since the default Hive provider in Umbraco 5 uses NHibernate, I can use the excellent NHProf from Hibernating Rhinos to get insight into exactly what queries resulted from a test:

[Screenshot: NHProf showing the queries generated by the unit test]

The unit test I've been focussing on today does a lot of work all in one NHibernate "session", and that's really not a problem as I'm in a contrived testing scenario, but it's useful as I can repeat that same scenario and compare the results each time. The SQL selected in the screenshot above is 1 query returning 1 row in under 1ms. Prior to today, that same operation was 5 queries, 158 rows and 252ms.
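
For the curious, the usual way a five-queries-into-one collapse like that happens with NHibernate is by fetching an association eagerly in the same round-trip instead of lazily, one query at a time. A minimal sketch - the entity classes here are hypothetical stand-ins, not the real Hive model:

    using System.Collections.Generic;
    using NHibernate;

    public class ContentNode
    {
        public virtual int Id { get; set; }
        public virtual IList<NodeAttribute> Attributes { get; set; }
    }

    public class NodeAttribute
    {
        public virtual int Id { get; set; }
        public virtual string Alias { get; set; }
    }

    public static class ContentQueries
    {
        // Lazy loading (the default) fires one query for the node and
        // more when Attributes is first touched - the classic N+1 shape.
        // Fetching eagerly pulls everything back in one joined query.
        public static ContentNode LoadWithAttributes(ISession session, int id)
        {
            return session.QueryOver<ContentNode>()
                .Where(n => n.Id == id)
                .Fetch(n => n.Attributes).Eager
                .SingleOrDefault();
        }
    }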

It's a fairly well-worn path from here on out. We have unit tests that we know represent the status quo of the build out there receiving feedback from the community, and we have a track that is busy working on those bugs already. Matt managed to fix almost 10 in his first day back, which is awesome.

In the meantime this month, we're working on improving the code itself so that we can make smarter optimisations.

My aim, as always, is 200ms response times for an average page, before adding the catch-all page-level output caching.

A quick note about caching

Code that works well rather than relying on caching is easier to maintain, easier to test, easier to predict, and easier to fix. But that's not to say caching doesn't have a place at all. There is very little caching in the codebase; it's something I prefer to add at the last stage, but it does still require attention. Caching which URL resolves to which piece of content, for example, requires us to cleverly invalidate the cache if any of the content's parents change. But we've put the hooks in place for this type of eventing and this is getting wired up at the moment. And finally, output caching of common website pages will be enabled in configuration.
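
To make the URL-caching example concrete, here's the rough shape such a cache can take - a sketch only, with an illustrative invalidation hook rather than the actual v5 eventing API. The key point is that a change anywhere along an entity's path has to evict every cached route beneath it:

    using System;
    using System.Collections.Concurrent;

    public class RouteCache
    {
        // URL -> entity id, resolved once and then served from memory.
        private readonly ConcurrentDictionary<string, Guid> _routes =
            new ConcurrentDictionary<string, Guid>();

        public Guid GetOrAdd(string url, Func<string, Guid> resolve)
        {
            return _routes.GetOrAdd(url, resolve);
        }

        // Hypothetical handler for a "content changed" event: moving or
        // renaming a parent changes every descendant URL, so evict all
        // cached routes under the changed entity's path.
        public void OnContentChanged(string pathOfChangedEntity)
        {
            foreach (var url in _routes.Keys)
            {
                if (url.StartsWith(pathOfChangedEntity, StringComparison.OrdinalIgnoreCase))
                {
                    Guid removed;
                    _routes.TryRemove(url, out removed);
                }
            }
        }
    }

Page-level output caching on top of this is standard ASP.NET MVC territory (the [OutputCache] attribute is one common route), which fits with switching it on in configuration as the final step.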

 

Core features for a 5-point-0 release are pretty much there; they primarily just need polishing and speeding up. We'll be putting out another RC release containing fixes and the first half of the performance improvements next week, so you'll get a version that is more fun to play with. In the meantime, thanks again for all the testing and reporting that you are doing - it's absolutely brilliant so far. If you would like to help out, we always love pull requests, and if like me you get a weird nerdy kick out of code profiling, please do get in touch!

11 comment(s) for “Umbraco 5: On performance and the perils of premature optimisation”

  1. Agha Says:

    Awesome work guys!! I am sure you will also start putting some tutorials together for Umbraco 5.0, especially for those developers who are new to Umbraco and also new to ASP.NET MVC. If there are already some out there please let us know.

  2. Dan Diplo Says:

    Performance is definitely a critical factor and I was always confident you'd address it. As a developer I totally agree with your methodology - it makes perfect sense.

    One thing I'd be curious about is whether moving to a database-driven system, rather than an XML-based system, has hindered or improved performance? I'm aware that relational databases are not always best suited to the kind of hierarchical queries that are common in Umbraco (such as recursive searches up the tree). Add on top of this an ORM abstraction layer, and I can imagine it would be difficult to optimise many operations to a single SQL query. Is this an issue?

  3. Agha Says:

    I have only come across this wiki: http://jupiter.umbraco.org/. Good starting point.

  4. Jeroen Breuer Says:

    @Dan Diplo Umbraco 5 doesn't have to run from a database. It can also run from Examine or both. That will also increase performance. More info here: http://our.umbraco.org/forum/core/umbraco-5-general-discussion/26002-Hive-provider-for-Examine-questions

  5. Chris Canal Says:

    NHibernate is great for tuning query optimization; MultiCriteria and Projections are the performance tuner's best friend :)

  6. Dan Diplo Says:

    @Jeroen Breuer Interesting post on using Examine hive provider. Doesn't Razor in 4.7 do something a bit similar with Media items?

    Though like many I would worry about stability if Examine was the *only* provider. I'd also point out that Lucene isn't any more optimised for relational retrieval than RDBMSs are!

  7. Pinal Bhatt Says:

    Great systematic approach. Jupiter is now moving towards being a product that can be called a great web platform or framework for building websites.

  8. andrew Says:

    great work!!

  9. Mike Says:

    In the wake of Umbraco abandoning V5, I can't stop thinking about all the bad advice in this blog post.

  10. Ed Says:

    Another "premature optimization is the root of all evil" excuse for writing bad software... bet someone wishes they could go back in time and rewrite this blog post, or better yet go back in time and do some 'premature' optimization.

  11. BBI Says:

    In my opinion, premature optimization is still a fault; the Umbraco disaster, now more than a year ago, was the result of a wrong perception of what NHibernate is.

    I am working with NHibernate myself, on a software product where it was once introduced as a mapper: query a database to create objects, send inserts/updates when saving changes, and eliminate adaptations to different databases - MsSql, Oracle, same code in the application.

    But it is in fact far more than just a mapper: Loading strategies to avoid the N+1 vs. "monster join" (cartesian product) problem, caching, optimistic locking and much more.

    So, many who don't know about NHibernate's nature fall into these traps: opening new sessions for every single DB query, no strategy to avoid the N+1/cartesian product problem, not linking objects in the model, and, in general, a data model not adapted to the needs of NHibernate.
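
    The session-per-query trap, for example, looks like this (the names are purely illustrative):

        using NHibernate;

        // A mapped entity, standing in for whatever the model actually has.
        public class Node
        {
            public virtual int Id { get; set; }
        }

        public static class SessionTraps
        {
            // Anti-pattern: a fresh session per query throws away the
            // session-level cache and batching - every call is its own
            // round-trip to the database.
            public static Node LoadWastefully(ISessionFactory factory, int id)
            {
                using (var session = factory.OpenSession())
                {
                    return session.Get<Node>(id);
                }
            }

            // Better: one session per unit of work (e.g. per web request),
            // so repeated loads of the same entity hit the session cache.
            public static Node Load(ISession session, int id)
            {
                return session.Get<Node>(id);
            }
        }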

    I think the problem at Umbraco was not avoiding premature optimization, but (among other things) a misleading concept of NHibernate usage and what users were doing with the additional layer (expressions causing extremely harmful masses of database queries).
