In the recent blog post discussing updates to the Umbraco Marketplace, we mentioned in passing some "behind the scenes" updates we have made to improve the functionality and performance of the website and the API that provides its content. We also wondered whether technically minded readers would be interested in hearing more about why we made these architectural choices and how we went about implementing them.
So that's what we've put together in the following couple of articles.
To be clear, there's no need to follow these to use the Marketplace as a solution or package developer - they are purely for interest for anyone who might like to hear a bit more about the work we do at Umbraco from a technical perspective.
The Umbraco Marketplace Solution
Included in the Marketplace solution is a scheduled process that synchronizes information from appropriately tagged packages hosted at NuGet. This is augmented by additional information provided by package developers via a JSON file hosted on their project websites.
We load that information into a SQL database and expose it via an API. We then have a single page application (SPA) based website that consumes the API and displays information about the Umbraco packages to the website visitors.
The existing combination of reading from the SQL database and using response caching works well, but we think we can make some improvements by using more appropriate infrastructure.
Specifically, we are looking at incorporating two further components into our Azure-based cloud infrastructure:
- Utilizing a Redis-based distributed cache for requests where we are retrieving the full details of a single package by Id.
- Introducing Azure Cognitive Search and using that to support requests where we are retrieving a collection of packages based on a query defining various filter and sort options.
Using Redis as a Distributed Cache for the Umbraco Marketplace
A common way to avoid going to the database for every API request is to introduce some caching, and the easiest way to do this with .NET is via response caching (the rough counterpart of what the older ASP.NET Framework called output caching). This is applied via attributes on controller endpoints where, amongst other things, you can specify how long the response to a given request is cached and returned without querying the database again.
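As a minimal sketch of what attribute-based response caching can look like in ASP.NET Core (this is not the Marketplace's own code: the route, the `PackagesController` name, the five-minute duration, and the returned payload are all placeholders), an endpoint opts in with `[ResponseCache]`, and the response caching middleware keeps eligible responses in server memory:

```csharp
// Program.cs in an ASP.NET Core web project (implicit usings assumed).
using Microsoft.AspNetCore.Mvc;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddResponseCaching();   // registers the in-memory response cache

var app = builder.Build();
app.UseResponseCaching();                // serves cached responses when possible
app.MapControllers();
app.Run();

[ApiController]
[Route("api/packages")]
public class PackagesController : ControllerBase
{
    // Placeholder endpoint: the response may be reused for 300 seconds
    // before the action (and any database query behind it) runs again.
    [HttpGet("{id:guid}")]
    [ResponseCache(Duration = 300, Location = ResponseCacheLocation.Any)]
    public IActionResult GetById(Guid id)
    {
        // A real implementation would query the database here.
        return Ok(new { Id = id, Title = "Example package" });
    }
}
```

The only lever here is the duration: nothing in the attribute lets the application say "this particular package has changed, drop its cached response now".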
Existing Response Caching
This is the approach we had been using for the Umbraco Marketplace since its original release. It’s worked well, and once the cache is populated it’s extremely fast with the response stored in memory. However, it does have a few downsides:
- When load balancing, each web server holds its own copy of the cache in memory, and those copies can differ. A visitor whose requests are served from different nodes could see different results.
- With the cache stored in memory, it’s not particularly durable. A deployment, or other application restart, would clear the cache. This means the first requests afterwards are going to be slower, as they will need to go to the database to serve the response.
- It’s not straightforward to invalidate the cache, and so depending on your application, it may not be easy to find an optimum time span for keeping the data in the cache.
This last point was particularly relevant for us, given we were fully in control of populating the data. Rather than having to select an arbitrary cache duration, which may lead either to serving out-of-date data or to unnecessarily retrieving unchanged data from the database, it would be better to cache for a long time and explicitly invalidate when we know there is a change.
Introducing the Distributed Cache
As an alternative to response caching, we can make use of the IDistributedCache interface, provided by Microsoft as part of .NET. There are a few implementations available for this interface.
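Calling code depends only on the interface, so the "cache for a long time, invalidate explicitly" idea from the previous section can be sketched as below. This is a minimal illustration rather than the Marketplace's code: `PackageDetails`, `CachedPackageReader`, the `package:{id}` key format, the `Guid` identifier type, and the seven-day expiry are all hypothetical.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

// Hypothetical shape of the data being cached.
public record PackageDetails(Guid Id, string Title);

public class CachedPackageReader
{
    private readonly IDistributedCache _cache;
    private readonly Func<Guid, Task<PackageDetails?>> _loadFromDatabase;

    public CachedPackageReader(
        IDistributedCache cache,
        Func<Guid, Task<PackageDetails?>> loadFromDatabase)
    {
        _cache = cache;
        _loadFromDatabase = loadFromDatabase;
    }

    private static string KeyFor(Guid id) => $"package:{id}";

    public async Task<PackageDetails?> GetByIdAsync(Guid id)
    {
        // Try the cache first.
        var cached = await _cache.GetStringAsync(KeyFor(id));
        if (cached is not null)
        {
            return JsonSerializer.Deserialize<PackageDetails>(cached);
        }

        // Cache miss: load from the database and store the result.
        var package = await _loadFromDatabase(id);
        if (package is not null)
        {
            var options = new DistributedCacheEntryOptions
            {
                // Long expiry; freshness comes from explicit invalidation.
                AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(7)
            };
            await _cache.SetStringAsync(
                KeyFor(id), JsonSerializer.Serialize(package), options);
        }

        return package;
    }

    // Called when the package is known to have changed.
    public Task InvalidateAsync(Guid id) => _cache.RemoveAsync(KeyFor(id));
}
```

In a setup like the one described, whatever process writes changed package data could call `InvalidateAsync` for the affected package, so the long expiry acts only as a safety net.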
The first implementation is an in-memory one, which is fast but, of course, has the same downsides as the response caching discussed earlier. However, it’s useful for local development, as you can work against the interface without needing any external component to hold the cache data.
There is also one for SQL Server, which has the benefit of likely already existing in your web application infrastructure but has the downside of still requiring a database query, albeit an efficient one, to retrieve the data.
Another, which is what we’ll be using in production, is a Redis-backed implementation available via this NuGet package.
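Each of these implementations is registered through its own extension method, so choosing between them is a startup concern that calling code never sees. The following is a generic sketch, not the Marketplace's own code. It assumes the `Microsoft.Extensions.Caching.StackExchangeRedis` and `Microsoft.Extensions.Caching.SqlServer` packages (which may or may not be the ones the authors use), and the `DistributedCache:Provider` setting, the connection string names, the instance name, and the table name are all hypothetical.

```csharp
// Program.cs in an ASP.NET Core web project (implicit usings assumed).
var builder = WebApplication.CreateBuilder(args);

// Hypothetical setting; expected values: "Memory", "SqlServer", "Redis".
var provider = builder.Configuration["DistributedCache:Provider"] ?? "Memory";

switch (provider)
{
    case "Redis":
        builder.Services.AddStackExchangeRedisCache(options =>
        {
            options.Configuration =
                builder.Configuration.GetConnectionString("Redis");
            options.InstanceName = "marketplace:"; // optional key prefix
        });
        break;

    case "SqlServer":
        // The cache table has to exist before the application starts.
        builder.Services.AddDistributedSqlServerCache(options =>
        {
            options.ConnectionString =
                builder.Configuration.GetConnectionString("CacheDb");
            options.SchemaName = "dbo";
            options.TableName = "DistributedCache";
        });
        break;

    default:
        // In-memory implementation, convenient for local development.
        builder.Services.AddDistributedMemoryCache();
        break;
}

var app = builder.Build();
app.Run();
```

Whichever branch runs, the rest of the application simply asks the container for `IDistributedCache`.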
You can see from the following gist how we are using the appropriate implementation based on a configuration value: