| ============================ |
| Database access optimization |
| ============================ |
| |
| Django's database layer provides various ways to help developers get the most |
| out of their databases. This document gathers together links to the relevant |
| documentation, and adds various tips, organized under a number of headings that |
| outline the steps to take when attempting to optimize your database usage. |
| |
| Profile first |
| ============= |
| |
| As general programming practice, this goes without saying. Find out :ref:`what |
| queries you are doing and what they are costing you |
| <faq-see-raw-sql-queries>`. You may also want to use an external project like |
| django-debug-toolbar_, or a tool that monitors your database directly. |
| |
| Remember that you may be optimizing for speed or memory or both, depending on |
| your requirements. Sometimes optimizing for one will be detrimental to the |
| other, but sometimes they will help each other. Also, work that is done by the |
| database process might not have the same cost (to you) as the same amount of |
| work done in your Python process. It is up to you to decide what your |
| priorities are, where the balance must lie, and profile all of these as required |
| since this will depend on your application and server. |
| |
| With everything that follows, remember to profile after every change to ensure |
| that the change is a benefit, and a big enough benefit given the decrease in |
| readability of your code. **All** of the suggestions below come with the caveat |
| that in your circumstances the general principle might not apply, or might even |
| be reversed. |
| |
| .. _django-debug-toolbar: http://robhudson.github.com/django-debug-toolbar/ |
| |
| Use standard DB optimization techniques |
| ======================================= |
| |
| ...including: |
| |
| * Indexes. This is a number one priority, *after* you have determined from |
| profiling what indexes should be added. Use |
| :attr:`django.db.models.Field.db_index` to add these from Django. |
| |
| * Appropriate use of field types. |
| |
| We will assume you have done the obvious things above. The rest of this document |
| focuses on how to use Django in such a way that you are not doing unnecessary |
| work. This document also does not address other optimization techniques that |
| apply to all expensive operations, such as :doc:`general purpose caching |
| </topics/cache>`. |
| |
| Understand QuerySets |
| ==================== |
| |
| Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good |
| performance with simple code. In particular: |
| |
| Understand QuerySet evaluation |
| ------------------------------ |
| |
| To avoid performance problems, it is important to understand: |
| |
| * that :ref:`QuerySets are lazy <querysets-are-lazy>`. |
| |
| * when :ref:`they are evaluated <when-querysets-are-evaluated>`. |
| |
| * how :ref:`the data is held in memory <caching-and-querysets>`. |
| |
| Understand cached attributes |
| ---------------------------- |
| |
| As well as caching of the whole ``QuerySet``, there is caching of the result of |
| attributes on ORM objects. In general, attributes that are not callable will be |
| cached. For example, assuming the :ref:`example Weblog models |
| <queryset-model-example>`:: |
| |
| >>> entry = Entry.objects.get(id=1) |
| >>> entry.blog # Blog object is retrieved at this point |
| >>> entry.blog # cached version, no DB access |
| |
| But in general, callable attributes cause DB lookups every time:: |
| |
| >>> entry = Entry.objects.get(id=1) |
| >>> entry.authors.all() # query performed |
| >>> entry.authors.all() # query performed again |
| |
| Be careful when reading template code - the template system does not allow use |
| of parentheses, but will call callables automatically, hiding the above |
| distinction. |
| |
| Be careful with your own custom properties - it is up to you to implement |
| caching. |
| |
| Use the ``with`` template tag |
| ----------------------------- |
| |
| To make use of the caching behavior of ``QuerySet``, you may need to use the |
| :ttag:`with` template tag. |
| |
| Use ``iterator()`` |
| ------------------ |
| |
| When you have a lot of objects, the caching behavior of the ``QuerySet`` can |
| cause a large amount of memory to be used. In this case, |
| :meth:`~django.db.models.QuerySet.iterator()` may help. |
| |
| Do database work in the database rather than in Python |
| ====================================================== |
| |
| For instance: |
| |
| * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do |
| filtering in the database. |
| |
| * Use :ref:`F() object query expressions <query-expressions>` to do filtering |
| against other fields within the same model. |
| |
| * Use :doc:`annotate to do aggregation in the database </topics/db/aggregation>`. |
| |
| If these aren't enough to generate the SQL you need: |
| |
| Use ``QuerySet.extra()`` |
| ------------------------ |
| |
| A less portable but more powerful method is |
| :meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be |
| explicitly added to the query. If that still isn't powerful enough: |
| |
| Use raw SQL |
| ----------- |
| |
| Write your own :doc:`custom SQL to retrieve data or populate models |
| </topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django |
| is writing for you and start from there. |
| |
| Retrieve everything at once if you know you will need it |
| ======================================================== |
| |
| Hitting the database multiple times for different parts of a single 'set' of |
| data that you will need all parts of is, in general, less efficient than |
| retrieving it all in one query. This is particularly important if you have a |
| query that is executed in a loop, and could therefore end up doing many database |
| queries, when only one was needed. So: |
| |
| Use ``QuerySet.select_related()`` |
| --------------------------------- |
| |
| Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it: |
| |
| * in view code, |
| |
| * and in :doc:`managers and default managers </topics/db/managers>` where |
| appropriate. Be aware when your manager is and is not used; sometimes this is |
| tricky so don't make assumptions. |
| |
| Don't retrieve things you don't need |
| ==================================== |
| |
| Use ``QuerySet.values()`` and ``values_list()`` |
| ----------------------------------------------- |
| |
| When you just want a ``dict`` or ``list`` of values, and don't need ORM model |
| objects, make appropriate usage of :meth:`~django.db.models.QuerySet.values()`. |
| These can be useful for replacing model objects in template code - as long as |
| the dicts you supply have the same attributes as those used in the template, |
| you are fine. |
| |
| Use ``QuerySet.defer()`` and ``only()`` |
| --------------------------------------- |
| |
| Use :meth:`~django.db.models.QuerySet.defer()` and |
| :meth:`~django.db.models.QuerySet.only()` if there are database columns you |
| know that you won't need (or won't need in most cases) to avoid loading |
| them. Note that if you *do* use them, the ORM will have to go and get them in |
| a separate query, making this a pessimization if you use it inappropriately. |
| |
| Also, be aware that there is some (small extra) overhead incurred inside |
| Django when constructing a model with deferred fields. Don't be too aggressive |
| in deferring fields without profiling as the database has to read most of the |
| non-text, non-VARCHAR data from the disk for a single row in the results, even |
| if it ends up only using a few columns. The ``defer()`` and ``only()`` methods |
| are most useful when you can avoid loading a lot of text data or for fields |
| that might take a lot of processing to convert back to Python. As always, |
| profile first, then optimize. |
| |
| Use QuerySet.count() |
| -------------------- |
| |
| ...if you only want the count, rather than doing ``len(queryset)``. |
| |
| Use QuerySet.exists() |
| --------------------- |
| |
| ...if you only want to find out if at least one result exists, rather than ``if |
| queryset``. |
| |
| But: |
| |
| Don't overuse ``count()`` and ``exists()`` |
| ------------------------------------------ |
| |
| If you are going to need other data from the QuerySet, just evaluate it. |
| |
| For example, assuming an Email model that has a ``body`` attribute and a |
| many-to-many relation to User, the following template code is optimal: |
| |
| .. code-block:: html+django |
| |
| {% if display_inbox %} |
| {% with emails=user.emails.all %} |
| {% if emails %} |
| <p>You have {{ emails|length }} email(s)</p> |
| {% for email in emails %} |
| <p>{{ email.body }}</p> |
| {% endfor %} |
| {% else %} |
| <p>No messages today.</p> |
| {% endif %} |
| {% endwith %} |
| {% endif %} |
| |
| |
| It is optimal because: |
| |
| 1. Since QuerySets are lazy, this does no database queries if 'display_inbox' |
| is False. |
| |
| #. Use of ``with`` means that we store ``user.emails.all`` in a variable for |
| later use, allowing its cache to be re-used. |
| |
| #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called, |
| which causes the ``user.emails.all()`` query to be run on the database, and |
| at the least the first line to be turned into an ORM object. If there aren't |
| any results, it will return False, otherwise True. |
| |
| #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling |
| out the rest of the cache without doing another query. |
| |
| #. The ``for`` loop iterates over the already filled cache. |
| |
| In total, this code does either one or zero database queries. The only |
| deliberate optimization performed is the use of the ``with`` tag. Using |
| ``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause |
| additional queries. |
| |
| Use ``QuerySet.update()`` and ``delete()`` |
| ------------------------------------------ |
| |
| Rather than retrieve a load of objects, set some values, and save them |
| individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update() |
| <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes |
| <topics-db-queries-delete>` where possible. |
| |
| Note, however, that these bulk update methods cannot call the ``save()`` or |
| ``delete()`` methods of individual instances, which means that any custom |
| behavior you have added for these methods will not be executed, including |
| anything driven from the normal database object :doc:`signals </ref/signals>`. |
| |
| Use foreign key values directly |
| ------------------------------- |
| |
| If you only need a foreign key value, use the foreign key value that is already on |
| the object you've got, rather than getting the whole related object and taking |
| its primary key. i.e. do:: |
| |
| entry.blog_id |
| |
| instead of:: |
| |
| entry.blog.id |
| |