Back in late April 2013, when we released the Alpha version of Kipinhall, we found that even simple DB queries took around a second to execute. Though the user experience was still admirable, something didn't sit right. As the number of concurrent users increased, the site's performance started to degrade. It became apparent that we had the classic scaling issue.
The easiest solution would have been to add another box, but that's a cop-out. Caching was an obvious choice, but what would you cache, and how would you invalidate entries? So it was time to shut the doors and lock the room until we got to the root cause of the issue and found complementary solutions to resolve it.
We started by profiling our code, followed by profiling the SQL issued by Django. But we couldn't find any obvious issues with our code or the SQL generated.
During development, we spent a considerable amount of time reviewing SQL queries and ensuring that we:
- only fetch what we needed
- get related values at once instead of multiple queries
- paginate queries
- batch inserts
- batch updates
These were just a few items from our code review checklist.
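To make the checklist concrete, here is a minimal sketch of how those items map onto Django ORM calls. The field names (`thumb_url`, `class_year`, `is_alumni`, `name`) and the function names are hypothetical, and each function assumes it is handed a Django manager or queryset; this is an illustration, not our actual code.

```python
def dashboard_page(profiles, page=1, per_page=25):
    """Fetch one page of profiles together with their users.

    `profiles` is assumed to be a manager/queryset for a hypothetical
    profile model that has a `user` foreign key.
    """
    start = (page - 1) * per_page
    queryset = (
        profiles.select_related("user")  # related values in one query
        .only("thumb_url", "class_year", "user__username")  # only what we need
    )
    # Slicing a queryset turns into LIMIT/OFFSET, i.e. pagination in SQL.
    return queryset[start:start + per_page]


def add_majors(majors, names, batch_size=500):
    """Batch insert: a few INSERT statements instead of one per row."""
    rows = [majors.model(name=name) for name in names]
    return majors.bulk_create(rows, batch_size=batch_size)


def graduate_class(profiles, class_year):
    """Batch update: a single UPDATE instead of saving each object."""
    return profiles.filter(class_year=class_year).update(is_alumni=True)
```

In a real project you would call these with something like `Profile.objects`, where `Profile` stands for whatever model holds your profile data.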
We used a variety of profilers, both custom and external:
* Debug toolbar - No brainer
* Python Hotshot. The folks over at gun.io have a middleware that we used.
* Python cProfile Middleware
Unfortunately, nothing stood out from our profiling efforts, and that led us to believe that we were looking in the wrong place.
Perhaps a better test would be the system under stress, but this time using a proper load testing tool. So we quickly shifted our effort to load testing our staging site using Mechanize. The results revealed some interesting issues, which I have listed below along with the corresponding solutions that really made our site fly.
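If you want to try a similar stress test without Mechanize, the sketch below uses only the Python 3 standard library. It is a different, much simpler technique than the one we used: `load_test` fires a caller-supplied function from a pool of threads and reports throughput and latency. The function names and default numbers are my own assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request_fn):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    request_fn()
    return time.perf_counter() - start


def load_test(request_fn, total_requests=100, concurrency=10):
    """Call request_fn total_requests times from `concurrency` threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, [request_fn] * total_requests))
    elapsed = time.perf_counter() - started
    return {
        "requests": total_requests,
        "hits_per_sec": total_requests / elapsed,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
        "max_latency_ms": 1000 * max(latencies),
    }
```

For a real run, `request_fn` could be something like `lambda: urllib.request.urlopen(url).read()`, with `url` pointing at a page on your own staging server.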
It seems that Django makes 2 DB queries for every request, which wasn't because of our code but mainly had to do with Django sessions. When you use the database as session storage, Django makes one call to the database to verify that the current session is valid and another call to the User table. To make matters worse, these calls are made on every request. So you can imagine that when N requests are made concurrently, you end up with 2N DB queries.
Django provides a way to attach custom information about your users via user profiles. You can access them via the get_profile method on the User object.
Note that Django 1.5 allows you to completely customize the User object, so there is no need for additional profile models. Even then, it doesn't solve the frequent access problem.
Our user profile stored information like the thumbnail image URL, class year, majors, college etc. This information is frequently accessed by the dashboard, and even more so when we list all users in search results. Since there is no caching of querysets across requests, Django would issue the same SQL queries even if the underlying data didn't change.
The behavior described above is just with one model. Imagine if you're fetching User models or models that hardly change. You would be executing the same queries on every request, e.g. list of
No Connection pooling
Django, for good reasons, closes the connection to the database at the end of every HTTP request. The reason is that one request shouldn't affect another request's state. But this comes at a cost: each new connection is expensive, sometimes in the range of 150ms.
I am assuming that the Django authors wanted to avoid handling pooling inside Django because connection pooling is sometimes specific to a database. Since Django is database agnostic, it made sense for the authors to shun pool management in favor of simplicity.
Note that Django 1.6 now has persistent connections, but it is still advisable to use a third-party connection pooler. Django's persistent connections are a poor man's implementation of pooling: instead of closing the connection on every request, Django keeps it open for a certain period of time. That is inferior to a mature connection pooler like PgBouncer or pgpool. The latter not only supports pooling but also clustering and parallel query processing.
Template Loading & Compiling
Django templates are loaded from the disk and compiled on every render request.
This is fine on a development environment, since templates are frequently updated by the developers, but pointless on production servers.
On production, they remain the same all along till the next change is pushed to the server.
Caching Sessions with Redis
We moved the Session storage from DB Storage to Redis storage. Moving to Redis with its default persistent store eliminated the excessive session validation queries to DB.
Fortunately, AWS ElastiCache now supports Redis. This is a huge advantage for startups, since you don't have to install and configure Redis on your own. We stayed with the default configuration and hooked up Django to it.
Django-redis-sessions is a Redis session backend that you can simply drop into your Django app.
The configuration is dead simple and the docs are sufficient to get started.
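As a rough idea of what that looks like in `settings.py`, here is a minimal sketch. The host name is a made-up placeholder, and the `SESSION_REDIS_*` names are the ones I know from the django-redis-sessions README; they have changed between releases, so check the docs for the version you install.

```python
# settings.py (fragment)

# Store sessions in Redis instead of the django_session table.
SESSION_ENGINE = "redis_sessions.session"

# Hypothetical ElastiCache endpoint; replace with your own Redis host.
SESSION_REDIS_HOST = "my-sessions.example.cache.amazonaws.com"
SESSION_REDIS_PORT = 6379
SESSION_REDIS_DB = 0

# Prefix for session keys, handy if the Redis instance is shared.
SESSION_REDIS_PREFIX = "session"
```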
With Redis in place, we started seeing some drastic improvements and our load tests performed a lot better, but still not up to our expectations. We were able to shave off ~100ms from each request, but we were still fetching the same results even if the data didn't change.
Using Johnny-Cache with Memcached as our cache server, Django rarely made DB queries and instead it fetched the cached queryset directly from memcache.
Johnny-Cache creates a unique key for each table, which is then invalidated on every table update. This is ideal for read-heavy, write-light tables but won't make much difference for tables that are written to frequently.
The key for each query also takes into account the columns in select, columns in where & columns in
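The idea of a per-table key that is invalidated on update can be sketched in a few lines. This is a toy model to show the principle, not Johnny-Cache's actual code: the class, its method names and the in-memory dict standing in for memcached are all assumptions for illustration.

```python
import hashlib


class TableGenerationCache:
    """Toy queryset cache keyed on per-table generation counters."""

    def __init__(self):
        self._store = {}        # stands in for memcached
        self._generations = {}  # table name -> generation counter

    def _key(self, tables, sql, params):
        # The key mixes in each table's current generation, so bumping a
        # generation makes every older key for that table unreachable.
        gens = ",".join(
            "%s:%d" % (t, self._generations.get(t, 0)) for t in sorted(tables)
        )
        raw = "%s|%s|%r" % (gens, sql, params)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def fetch(self, tables, sql, params, run_query):
        """Return cached rows, hitting the database only on a miss."""
        key = self._key(tables, sql, params)
        if key not in self._store:
            self._store[key] = run_query(sql, params)
        return self._store[key]

    def invalidate(self, table):
        """Call on every write to `table`; old entries are never read again."""
        self._generations[table] = self._generations.get(table, 0) + 1
```

Nothing is ever deleted explicitly: stale entries simply stop being looked up and eventually get evicted by the cache server. That also shows why a table with frequent writes gains little, since its generation changes before cached results can be reused.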
Again, we used AWS ElastiCache with a single node, installed with the defaults. Configuration on the Django side was also dead simple, since it only required the cache server endpoints.
We performed our load testing against the staging server that now had cached querysets. The results were just incredible. It was able to support 3 times as many concurrent users as it could without cached querysets. The beauty of it is that we didn't have to worry about cache invalidation, which to me is a nightmare.
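For reference, a minimal sketch of the Django side, based on the settings described in the Johnny-Cache documentation as I remember them. The memcached endpoint and the key prefix are made-up placeholders, and the exact settings may differ between Johnny-Cache versions.

```python
# settings.py (fragment)

MIDDLEWARE_CLASSES = (
    # Johnny-Cache's middleware should come first.
    "johnny.middleware.LocalStoreMiddleware",
    "johnny.middleware.QueryCacheMiddleware",
    # ... your other middleware ...
)

CACHES = {
    "default": {
        "BACKEND": "johnny.backends.memcached.MemcachedCache",
        # Hypothetical ElastiCache endpoint; replace with your own.
        "LOCATION": ["my-cache.example.cache.amazonaws.com:11211"],
        "JOHNNY_CACHE": True,
    }
}

# Keeps this project's keys apart from others sharing the same server.
JOHNNY_MIDDLEWARE_KEY_PREFIX = "jc_myproject"
```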
Kipin chose Postgres for multiple reasons. Primarily because I have used it in production and am well versed in its configuration. More than that, I like their development approach, which has always leaned towards data integrity. Furthermore, it adheres to the SQL standard, which is a huge plus in my opinion.
Back to the connection-per-request issue: connection setup/teardown is an expensive operation. This cost can be eliminated by not closing connections on each request but instead returning them to a pool where they stay open. This is what pooling services do, and then some.
Furthermore, nothing changes in the Django codebase because the pooling services offer the same interface as the database, so we simply switch from one port to another.
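In `settings.py` the switch can be as small as the sketch below. The database name, user and password are placeholders; 6432 is PgBouncer's default listen port, while Postgres itself defaults to 5432. It assumes PgBouncer runs on the same box as Django.

```python
# settings.py (fragment)

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",           # placeholder
        "USER": "myuser",         # placeholder
        "PASSWORD": "change-me",  # placeholder
        "HOST": "127.0.0.1",      # PgBouncer, not Postgres directly
        "PORT": "6432",           # was "5432" when talking to Postgres
    }
}
```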
Switching to PgBouncer reduced our request execution time from 150ms to 10ms, which is a reduction of more than 90%.
Read more @ this blog post for actual performance metrics. Undoubtedly, pooling is a must have for all Django based sites.
PgBouncer and pgpool require some heavy-handed configuration and constant tweaking to get to the sweet spot. If you don't have the time, or would rather have Django handle it, I recommend using django-postgrespool if you're still on Django 1.4. Django 1.6 now has built-in persistent connections.
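On Django 1.6, persistent connections are controlled by the `CONN_MAX_AGE` database setting. A minimal sketch, with placeholder credentials and an arbitrary lifetime of 60 seconds:

```python
# settings.py (fragment, Django 1.6+)

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",           # placeholder
        "USER": "myuser",         # placeholder
        "PASSWORD": "change-me",  # placeholder
        "HOST": "127.0.0.1",
        "PORT": "5432",
        # Seconds to keep a connection open for reuse.
        # 0 closes it after each request; None keeps it open indefinitely.
        "CONN_MAX_AGE": 60,
    }
}
```

With django-postgrespool, as far as I recall from its README, the change is instead to swap `ENGINE` for `django_postgrespool`.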
This is probably the easiest fix of all. Simply add the cached template loader to your TEMPLATE_LOADERS. This loader wraps other template loaders and caches the compiled templates in memory.
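A minimal sketch of that setting, assuming the two stock loaders (filesystem and app directories) are the ones being wrapped:

```python
# settings.py (fragment, production only)

TEMPLATE_LOADERS = (
    (
        "django.template.loaders.cached.Loader",
        (
            # The wrapped loaders still find the templates on disk;
            # the cached loader keeps the compiled result around.
            "django.template.loaders.filesystem.Loader",
            "django.template.loaders.app_directories.Loader",
        ),
    ),
)
```

It is worth leaving this out of development settings, otherwise template edits won't show up until the server restarts.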
Because the cached template loader uses local memory to cache the compiled templates, it could lead to a bigger memory footprint for your app, especially with a large number of templates.
Furthermore, on a Django restart the cached templates are lost and will require some way of warming up.
So we modified the cached loader to use an external memcache server and noticed no visible latency difference between in-memory and external caching, but a great deal of performance improvement over no caching at all.
With just one medium EC2 box, a queryset/session caching server, template memcaching and a connection pooler, our site was able to handle 100+ hits/sec with client response times of around 25ms.
Note that the measurement was done for a single webpage with 100 items, before and after. Furthermore, the response time doesn't take into account the media/static files that are loaded from a separate static server/CDN.
And yeah, the landing page @ http://kipinhall.com is a WordPress site, so hold off on your ab testing :)
On a cautious note, caching or pooling shouldn't be your only solution. We continue to spend 20% of our development time on code reviews, which primarily focus on our models and how we access/update them, and we review each SQL query generated by Django.
It's time well spent!
Also, it is a recurring task to review and adapt to new technologies that perform better than the old ones. For example, Jinja2 seems to be faster than the Django built-in templates. We will benchmark the two, and if Jinja2 yields better results, we will switch to it. More here.
I intentionally excluded tech details and stats because this is my first blog in the wild and I wanted to keep it short - well, not anymore.
In upcoming articles, I will elaborate on the technical details and the actual implementation of the above solutions, and much more. I will also elaborate on the costs involved in running a tech startup, which for some reason aren't talked about either online or offline.
My goal with this blog is to document the journey from day one of the deployment, with one user, to a massively scaled website with millions of users. The latter is the ultimate goal of any startup, and it is ours too.
So join me!