Lessons From Scale: Django
This February, I visited Pune to speak at PyCon Pune 2017. My talk was basically a brief summary of my learning after having been working with Django in production for the past couple of years, as part of building the DoSelect stack.
Why choose Django?
The last thing that I want to spark off is a flame war over frameworks here. There are a lot of good ones; Flask is one of my favorites, but if you are building something of a greater scope than a one-off web service, you’re going to need the batteries anyway. The pace of prototyping and development, in general, is pretty fast as a result.
Django is an opinionated framework and makes a lot of decisions for you. While most of the design patterns are common sense, it’s very easy to break out of the box when you want to. This gives you a lot of flexibility as and when your application grows in scale and use case.
Lastly, Django has great community support, and it has been around long enough to be considered a mature project.
Tuning your WSGI server
We’ve been using Gunicorn, and it has been pretty awesome for us. As easy it is to setup, it’s important to tune your conf to get the maximum juice out of it. Introspect your application’s needs and take the following decisions:
- Determine worker_class — you need to know when to use sync and async workers. Async workers work great when your load is I/O bound and there are no CPU-heavy processing involved in a request-response cycle. Use sync workers when you are doing a lot of processing. This way, Gunicorn utilizes all available CPU cores optimally, and your response times are sane.
- Adjust parameters like workers, worker_connections, and keep alive wisely. A trip to Gunicorn config documentation is worth your time.
- There is such a thing as too many workers. Even if you scale your CPU and memory, you cannot infinitely scale the number of workers, since the efficiency curve plateaus eventually. Horizontally scaling your hosts is needed in the end.
Tuning your proxy server
NGINX is a great reverse proxy server — it’s lightweight and scales very well with increasing load. A few tweaks and you can optimize the perf.
- Set worker_process auto, which will automatically scale your NGINX processes with the available cores. This is not the default setting.
- Adjust keepalive_timeout. This should match the timeout on your WSGI server, so you don’t run into timeouts for requests that take more time.
- Turn on tcp_nopush and tcp_nodelay.
- gzip all the things — if you’re not doing it already!
What’s taking so long?
If you have a client facing web-app, Chrome Developer Tools should be your best friend. Measure all the requests processed by Django, and take a detailed look into the response times — find the rate limiting step, and optimize it. Rinse. Repeat.
If you make a lot of complex queries, use EXPLAIN ANALYZE on those queries from your SQL command line. This is a nice and easy way to identify which queries are taking a lot of time, so you can optimize them. When using the Django ORM, developers generally don’t think too much about what the actual queries are. Tailing the database logs is a fun exercise that every Django developer should try at times.
Caching
If you’ve scaled your app servers horizontally, it’d be good to use Redis as your primary cache. Redis works wonderfully in such use case.
When you are using Django’s contrib User model, and all your auth works on top of this, start by caching all user sessions. Again, this is not the default Django conf, so a lot of people tend to miss it. While you are at it, you’d want to cache your User model lookups as well, since these resources are being read way many more times than being written to. After that, depending on your use case, you’d want to think about resource-level caching.
As a rule of thumb, use a CDN for all things that the user cannot change. This includes static files, images, and HTML templates (if you have a single-page app).
Django ORM
The ORM is awesome, and one of the primary reasons why Django has been adopted so widely. But after you’ve run your Django app in production for a considerable time, you’d realize that it’s not the silver bullet after all. Take a peek under the hood, and look at the queries the ORM is making. Since it’s so easy to use the ORM, it’s equally easy to use it the wrong way and axe your foot. Do not fear from breaking away from the ORM when you need to.
Automatic relationship access can bite you when you are using something like Django Rest Framework or Tastypie for creating API resources. It’s better to expand relationships carefully. Add extra indices where needed.When you are building a complex application, chances are you’re gonna need a lot on-the-fly processing of data that you get from the DB before you can send it as a response. While a lot of things can be denormalized for better access, this is not feasible in most cases.
For example, the test reports on DoSelect consist of a lot of derived metrics about the test-taker. Most of these metrics are hard to denormalize since they depend on attributes that can change arbitrarily — like the qualification status (which depends on the test cut-off), percentile (which depends on the number of test takers), etc. These derived attributes are also used to sort the leader boards.
For example, the test reports on DoSelect consist of a lot of derived metrics about the test-taker. Most of these metrics are hard to denormalize since they depend on attributes that can change arbitrarily — like the qualification status (which depends on the test cut-off), percentile (which depends on the number of test takers), etc. These derived attributes are also used to sort the leader boards.
Instead of doing these derived calculations in Python, it’s better to do this in the database itself, and query with the result. One example is time taken metric. Normally, you’d need to store start and end times separately, since you might wanna change them. So instead of denormalizing time taken as a separate field, just calculate this in the database.As a rule of thumb, always denormalize data which has no bounds — like number of comments on a post. If a post has, say. 20k comments, you’d better read an integer then perform a COUNT query every time. Dehydration, which means on-the-fly calculation, is better when you know the bounds — like on the previous paragraph, the time was taken.
Scaling your database
Always use database-level connection pooling — which works very well when you are scaling your services horizontally. Django already does application level connection pooling, but things can get complicated when you are using a Celery worker — and you will end up using Celery workers. If you’re using PostgreSQL, pg bouncer is a drop-in solution for this.
Monitor all the queries to see what’s holding you back.
If you’re using a streaming replication of your database, you might want to look at segregating your reads and writes. You can offload all your reads from slaves, and dedicate all writes to the master.
Aside: If you have a use case that involves a lot of filtering, in addition to search, you might want to offload all your list reads to the search engine as well. Elastic search is a great search engine and is optimized for reads. You’d be surprised by the performance boost.
Full talk slides can be found here: https://sanketsaurav.github.io/django-on-steroids. If you have any questions related to this, I’d be happy to answer them. Add a response here, or hit me up on Twitter.