| ========================== |
| Serializing Django objects |
| ========================== |
| |
| Django's serialization framework provides a mechanism for "translating" Django |
| objects into other formats. Usually these other formats will be text-based and |
| used for sending Django objects over a wire, but it's possible for a |
| serializer to handle any format (text-based or not). |
| |
| .. seealso:: |
| |
| If you just want to get some data from your tables into a serialized |
| form, you could use the :djadmin:`dumpdata` management command. |
| |
| Serializing data |
| ---------------- |
| |
| At the highest level, serializing data is a very simple operation:: |
| |
| from django.core import serializers |
| data = serializers.serialize("xml", SomeModel.objects.all()) |
| |
| The arguments to the ``serialize`` function are the format to serialize the data |
| to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to |
| serialize. (Actually, the second argument can be any iterator that yields Django |
| objects, but it'll almost always be a ``QuerySet``.) |
| |
| You can also use a serializer object directly:: |
| |
| XMLSerializer = serializers.get_serializer("xml") |
| xml_serializer = XMLSerializer() |
| xml_serializer.serialize(queryset) |
| data = xml_serializer.getvalue() |
| |
| This is useful if you want to serialize data directly to a file-like object |
| (which includes an :class:`~django.http.HttpResponse`):: |
| |
| with open("file.xml", "w") as out: |
|     xml_serializer.serialize(SomeModel.objects.all(), stream=out) |
| |
| Subset of fields |
| ~~~~~~~~~~~~~~~~ |
| |
| If you only want a subset of fields to be serialized, you can |
| specify a ``fields`` argument to the serializer:: |
| |
| from django.core import serializers |
| data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name','size')) |
| |
| In this example, only the ``name`` and ``size`` attributes of each model will |
| be serialized. |
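
The effect of restricting the serialized fields can be sketched in plain Python.
This is only an illustration and not Django's implementation: the
``serialize_records`` helper, the ``store.somemodel`` label and the sample row
are all hypothetical, and the output shape is borrowed from the JSON fixture
examples later in this document.

```python
import json


def serialize_records(records, model_label, fields=None):
    """Serialize dicts into a fixture-like JSON list.

    If `fields` is given, only those keys are kept in each "fields" mapping.
    """
    output = []
    for record in records:
        # The primary key is stored separately from the other fields.
        data = {k: v for k, v in record.items() if k != "pk"}
        if fields is not None:
            data = {k: v for k, v in data.items() if k in fields}
        output.append({"pk": record["pk"], "model": model_label, "fields": data})
    return json.dumps(output, sort_keys=True)


# Hypothetical sample data standing in for a queryset.
rows = [{"pk": 1, "name": "Widget", "size": 3, "color": "red"}]
subset = serialize_records(rows, "store.somemodel", fields=("name", "size"))
```

Here ``color`` is dropped from the output, which is why a deserializer may be
unable to rebuild a complete object from it later.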
| |
| .. note:: |
| |
| Depending on your model, you may find that it is not possible to |
| deserialize a model that only serializes a subset of its fields. If a |
| serialized object doesn't specify all the fields that are required by a |
| model, the deserializer will not be able to save deserialized instances. |
| |
| Inherited models |
| ~~~~~~~~~~~~~~~~ |
| |
| If you have a model that is defined using an :ref:`abstract base class |
| <abstract-base-classes>`, you don't have to do anything special to serialize |
| that model. Just call the serializer on the object (or objects) that you want to |
| serialize, and the output will be a complete representation of the serialized |
| object. |
| |
| However, if you have a model that uses :ref:`multi-table inheritance |
| <multi-table-inheritance>`, you also need to serialize all of the base classes |
| for the model. This is because only the fields that are locally defined on the |
| model will be serialized. For example, consider the following models:: |
| |
| class Place(models.Model): |
|     name = models.CharField(max_length=50) |
| |
| class Restaurant(Place): |
|     serves_hot_dogs = models.BooleanField() |
| |
| If you only serialize the Restaurant model:: |
| |
| data = serializers.serialize('xml', Restaurant.objects.all()) |
| |
| the fields on the serialized output will only contain the ``serves_hot_dogs`` |
| attribute. The ``name`` attribute of the base class will be ignored. |
| |
| In order to fully serialize your Restaurant instances, you will need to |
| serialize the Place models as well:: |
| |
| all_objects = list(Restaurant.objects.all()) + list(Place.objects.all()) |
| data = serializers.serialize('xml', all_objects) |
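
To see why both sets of records are needed, here is a minimal plain-Python
sketch. It assumes that a child record and its parent record share the same
primary key, and that each serialized record carries only its locally defined
fields. The ``merge_inherited`` helper, the ``app.*`` labels and the sample
data are hypothetical and are not part of Django.

```python
# Hypothetical serialized records, shaped like the fixtures in this document.
serialized = [
    {"pk": 1, "model": "app.restaurant", "fields": {"serves_hot_dogs": True}},
    {"pk": 1, "model": "app.place", "fields": {"name": "Demon Dogs"}},
]


def merge_inherited(records, child_label, parent_label):
    """Combine child and parent records that share a primary key."""
    parents = {r["pk"]: r["fields"] for r in records if r["model"] == parent_label}
    merged = {}
    for record in records:
        if record["model"] == child_label:
            # Start from the parent's fields, then add the child's own fields.
            combined = dict(parents.get(record["pk"], {}))
            combined.update(record["fields"])
            merged[record["pk"]] = combined
    return merged


restaurants = merge_inherited(serialized, "app.restaurant", "app.place")
```

Without the ``app.place`` record, the merged result would have no ``name`` at
all.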
| |
| Deserializing data |
| ------------------ |
| |
| Deserializing data is also a fairly simple operation:: |
| |
| for obj in serializers.deserialize("xml", data): |
|     do_something_with(obj) |
| |
| As you can see, the ``deserialize`` function takes the same format argument as |
| ``serialize``, a string or stream of data, and returns an iterator. |
| |
| However, here it gets slightly complicated. The objects returned by the |
| ``deserialize`` iterator *aren't* simple Django objects. Instead, they are |
| special ``DeserializedObject`` instances that wrap a created -- but unsaved -- |
| object and any associated relationship data. |
| |
| Calling ``DeserializedObject.save()`` saves the object to the database; until you call it, nothing is written. |
| |
| This ensures that deserializing is a non-destructive operation even if the |
| data in your serialized representation doesn't match what's currently in the |
| database. Usually, working with these ``DeserializedObject`` instances looks |
| something like:: |
| |
| for deserialized_object in serializers.deserialize("xml", data): |
|     if object_should_be_saved(deserialized_object): |
|         deserialized_object.save() |
| |
| In other words, the usual approach is to examine the deserialized objects to make |
| sure that they are "appropriate" for saving before doing so. Of course, if you |
| trust your data source you could just save the object and move on. |
| |
| The Django object itself can be inspected as ``deserialized_object.object``. |
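
The "wrap now, save later" pattern can be sketched without Django. In this
hypothetical example, ``PendingObject`` stands in for a deserialized wrapper
and a plain dict stands in for a database table. It only illustrates why
iterating over deserialized data is non-destructive, and is not how Django
implements ``DeserializedObject``.

```python
class PendingObject:
    """Hypothetical stand-in for a wrapper around an unsaved object."""

    def __init__(self, obj, store):
        self.object = obj    # the wrapped, not-yet-saved object
        self._store = store  # dict standing in for a database table

    def save(self):
        # Nothing touches the store until save() is called explicitly.
        self._store[self.object["pk"]] = self.object


def deserialize(records, store):
    """Yield one wrapper per record without writing anything."""
    for record in records:
        yield PendingObject(record, store)


database = {1: {"pk": 1, "name": "old"}}
incoming = [{"pk": 1, "name": "new"}, {"pk": 2, "name": ""}]

for pending in deserialize(incoming, database):
    if pending.object["name"]:  # a simple check before saving
        pending.save()
```

The record with an empty ``name`` is inspected but never saved, so the store
only changes for the record that passed the check.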
| |
| .. _serialization-formats: |
| |
| Serialization formats |
| --------------------- |
| |
| Django supports a number of serialization formats, some of which require you |
| to install third-party Python modules: |
| |
| ========== ============================================================== |
| Identifier Information |
| ========== ============================================================== |
| ``xml`` Serializes to and from a simple XML dialect. |
| |
| ``json`` Serializes to and from JSON_ (using a version of simplejson_ |
| bundled with Django). |
| |
| ``yaml`` Serializes to YAML (YAML Ain't a Markup Language). This |
| serializer is only available if PyYAML_ is installed. |
| ========== ============================================================== |
| |
| .. _json: http://json.org/ |
| .. _simplejson: http://undefined.org/python/#simplejson |
| .. _PyYAML: http://www.pyyaml.org/ |
| |
| Notes for specific serialization formats |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| json |
| ^^^^ |
| |
| If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON |
| serializer, you must pass ``ensure_ascii=False`` as a parameter to the |
| ``serialize()`` call. Otherwise, the output won't be encoded correctly. |
| |
| For example:: |
| |
| json_serializer = serializers.get_serializer("json")() |
| json_serializer.serialize(queryset, ensure_ascii=False, stream=response) |
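
The effect of the flag can be seen with Python's standard ``json`` module on
its own. This sketch assumes Python 3 and uses a made-up payload; it shows
only what ``ensure_ascii`` does to the output text, not how Django's
serializer passes the option along.

```python
import json

# "\u00eb" is the character "e with diaeresis", written as an escape so the
# source file itself stays ASCII.
payload = {"name": "Zo\u00eb"}

escaped = json.dumps(payload)                      # default: ensure_ascii=True
literal = json.dumps(payload, ensure_ascii=False)  # keep non-ASCII characters
```

With the default setting the character is written as a ``\u00eb`` escape
sequence; with ``ensure_ascii=False`` it appears in the output as-is, so the
stream it is written to must be able to encode it.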
| |
| The Django source code includes the simplejson_ module. However, if you're |
| using Python 2.6 or later (which includes a built-in version of the module), Django will |
| use the built-in ``json`` module automatically. If you have a system-installed |
| version that includes the C-based speedup extension, or your system version is |
| more recent than the version shipped with Django (currently, 2.0.7), the |
| system version will be used instead of the version included with Django. |
| |
| Be aware that if you're serializing using that module directly, not all Django |
| output can be passed unmodified to simplejson. In particular, :ref:`lazy |
| translation objects <lazy-translations>` need a `special encoder`_ written for |
| them. Something like this will work:: |
| |
| from django.utils import simplejson |
| from django.utils.functional import Promise |
| from django.utils.encoding import force_unicode |
| |
| class LazyEncoder(simplejson.JSONEncoder): |
|     def default(self, obj): |
|         if isinstance(obj, Promise): |
|             # Force evaluation of the lazy translation object. |
|             return force_unicode(obj) |
|         return super(LazyEncoder, self).default(obj) |
| |
| .. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html |
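
The same encoder pattern can be tried with the standard library alone. In this
sketch, ``LazyText`` is a hypothetical stand-in for a lazily evaluated string
(it is not Django's ``Promise``), and the example assumes Python 3 and the
built-in ``json`` module.

```python
import json


class LazyText:
    """Hypothetical stand-in for a lazily evaluated string."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        # The text is only computed when it is actually needed.
        return self._func()


class LazyTextEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called only for objects json cannot encode itself.
        if isinstance(obj, LazyText):
            return str(obj)
        return super().default(obj)


encoded = json.dumps({"label": LazyText(lambda: "Hello")}, cls=LazyTextEncoder)
```

Without the ``cls`` argument, ``json.dumps()`` would raise a ``TypeError`` for
the ``LazyText`` value.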
| |
| .. _topics-serialization-natural-keys: |
| |
| Natural keys |
| ------------ |
| |
| .. versionadded:: 1.2 |
| |
| The ability to use natural keys when serializing/deserializing data was |
| added in the 1.2 release. |
| |
| The default serialization strategy for foreign keys and many-to-many |
| relations is to serialize the value of the primary key(s) of the |
| objects in the relation. This strategy works well for most types of |
| object, but it can cause difficulty in some circumstances. |
| |
| Consider the case of a list of objects that have a foreign key to |
| :class:`ContentType`. If you're going to serialize an object that |
| refers to a content type, you need to have a way to refer to that |
| content type. Content types are automatically created by Django as |
| part of the database synchronization process, so you don't need to |
| include content types in a fixture or other serialized data. As a |
| result, the primary key of any given content type isn't easy to |
| predict; it will depend on how and when :djadmin:`syncdb` was |
| executed to create the content types. |
| |
| There is also the matter of convenience. An integer id isn't always |
| the most convenient way to refer to an object; sometimes, a |
| more natural reference would be helpful. |
| |
| It is for these reasons that Django provides *natural keys*. A natural |
| key is a tuple of values that can be used to uniquely identify an |
| object instance without using the primary key value. |
| |
| Deserialization of natural keys |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Consider the following two models:: |
| |
| from django.db import models |
| |
| class Person(models.Model): |
|     first_name = models.CharField(max_length=100) |
|     last_name = models.CharField(max_length=100) |
| |
|     birthdate = models.DateField() |
| |
|     class Meta: |
|         unique_together = (('first_name', 'last_name'),) |
| |
| class Book(models.Model): |
|     name = models.CharField(max_length=100) |
|     author = models.ForeignKey(Person) |
| |
| Ordinarily, serialized data for ``Book`` would use an integer to refer to |
| the author. For example, in JSON, a Book might be serialized as:: |
| |
| ... |
| { |
|     "pk": 1, |
|     "model": "store.book", |
|     "fields": { |
|         "name": "Mostly Harmless", |
|         "author": 42 |
|     } |
| } |
| ... |
| |
| This isn't a particularly natural way to refer to an author. It |
| requires that you know the primary key value for the author; it also |
| requires that this primary key value is stable and predictable. |
| |
| However, if we add natural key handling to Person, the fixture becomes |
| much more humane. To add natural key handling, you define a default |
| Manager for Person with a ``get_by_natural_key()`` method. In the case |
| of a Person, a good natural key might be the pair of first and last |
| name:: |
| |
| from django.db import models |
| |
| class PersonManager(models.Manager): |
|     def get_by_natural_key(self, first_name, last_name): |
|         return self.get(first_name=first_name, last_name=last_name) |
| |
| class Person(models.Model): |
|     objects = PersonManager() |
| |
|     first_name = models.CharField(max_length=100) |
|     last_name = models.CharField(max_length=100) |
| |
|     birthdate = models.DateField() |
| |
|     class Meta: |
|         unique_together = (('first_name', 'last_name'),) |
| |
| Now books can use that natural key to refer to ``Person`` objects:: |
| |
| ... |
| { |
|     "pk": 1, |
|     "model": "store.book", |
|     "fields": { |
|         "name": "Mostly Harmless", |
|         "author": ["Douglas", "Adams"] |
|     } |
| } |
| ... |
| |
| When you try to load this serialized data, Django will use the |
| ``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]`` |
| into the primary key of an actual ``Person`` object. |
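
The lookup step can be sketched in plain Python. ``PersonTable`` and
``resolve_reference`` are hypothetical names used only for illustration: the
first is an in-memory stand-in for a manager, the second shows the general
idea of turning a natural key into a primary key. Neither is Django's actual
code.

```python
class PersonTable:
    """Hypothetical in-memory stand-in for a manager with natural key lookup."""

    def __init__(self, rows):
        self._rows = rows  # list of dicts with "pk", "first_name", "last_name"

    def get_by_natural_key(self, first_name, last_name):
        matches = [
            row for row in self._rows
            if (row["first_name"], row["last_name"]) == (first_name, last_name)
        ]
        if len(matches) != 1:
            raise LookupError("natural key must match exactly one row")
        return matches[0]


def resolve_reference(value, table):
    """Return a primary key from either a raw pk or a natural key list."""
    if isinstance(value, (list, tuple)):
        return table.get_by_natural_key(*value)["pk"]
    return value


people = PersonTable([{"pk": 42, "first_name": "Douglas", "last_name": "Adams"}])
author_pk = resolve_reference(["Douglas", "Adams"], people)
```

The lookup fails if the key matches zero rows or more than one, which is why
the fields chosen for a natural key need to identify an object uniquely.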
| |
| .. note:: |
| |
| Whatever fields you use for a natural key must be able to uniquely |
| identify an object. This will usually mean that your model will |
| have a uniqueness clause (either ``unique=True`` on a single field, or |
| ``unique_together`` over multiple fields) for the field or fields |
| in your natural key. However, uniqueness doesn't need to be |
| enforced at the database level. If you are certain that a set of |
| fields will be effectively unique, you can still use those fields |
| as a natural key. |
| |
| Serialization of natural keys |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| So how do you get Django to emit a natural key when serializing an object? |
| Firstly, you need to add another method -- this time to the model itself:: |
| |
| class Person(models.Model): |
|     objects = PersonManager() |
| |
|     first_name = models.CharField(max_length=100) |
|     last_name = models.CharField(max_length=100) |
| |
|     birthdate = models.DateField() |
| |
|     def natural_key(self): |
|         return (self.first_name, self.last_name) |
| |
|     class Meta: |
|         unique_together = (('first_name', 'last_name'),) |
| |
| That method should always return a natural key tuple -- in this |
| example, ``(first name, last name)``. Then, when you call |
| ``serializers.serialize()``, you provide a ``use_natural_keys=True`` |
| argument:: |
| |
| >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True) |
| |
| When ``use_natural_keys=True`` is specified, Django will use the |
| ``natural_key()`` method to serialize any reference to objects of the |
| type that defines the method. |
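
The decision described above can be sketched as a small function.
``serialize_reference`` and the two classes are hypothetical and exist only
for illustration; Django's serializers make this choice internally, and their
code is not reproduced here.

```python
def serialize_reference(obj, use_natural_keys=False):
    """Return the value used to refer to `obj` from another serialized object."""
    if use_natural_keys and hasattr(obj, "natural_key"):
        # JSON has no tuple type, so the natural key is emitted as a list.
        return list(obj.natural_key())
    return obj.pk


class Person:
    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name

    def natural_key(self):
        return (self.first_name, self.last_name)


class Tag:
    """A type with no natural_key() method; always referenced by pk."""

    def __init__(self, pk):
        self.pk = pk


author = Person(42, "Douglas", "Adams")
by_key = serialize_reference(author, use_natural_keys=True)
by_pk = serialize_reference(author)
```

Types without a ``natural_key()`` method, such as ``Tag`` here, still fall
back to the primary key even when natural keys are requested.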
| |
| If you are using :djadmin:`dumpdata` to generate serialized data, you |
| use the :djadminopt:`--natural` command line flag to generate natural keys. |
| |
| .. note:: |
| |
| You don't need to define both ``natural_key()`` and |
| ``get_by_natural_key()``. If you don't want Django to output |
| natural keys during serialization, but you want to retain the |
| ability to load natural keys, then you can opt to not implement |
| the ``natural_key()`` method. |
| |
| Conversely, if (for some strange reason) you want Django to output |
| natural keys during serialization, but *not* be able to load those |
| key values, just don't define the ``get_by_natural_key()`` method. |
| |
| Dependencies during serialization |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Since natural keys rely on database lookups to resolve references, it |
| is important that data exists before it is referenced. You can't make |
| a *forward reference* with natural keys: the data you are referencing |
| must exist before you include a natural key reference to that data. |
| |
| To accommodate this limitation, calls to :djadmin:`dumpdata` that use |
| the :djadminopt:`--natural` option will serialize any model with a |
| ``natural_key()`` method before it serializes normal key objects. |
| |
| However, this may not always be enough. If your natural key refers to |
| another object (by using a foreign key or natural key to another object |
| as part of a natural key), then you need to be able to ensure that |
| the objects on which a natural key depends occur in the serialized data |
| before the natural key requires them. |
| |
| To control this ordering, you can define dependencies on your |
| ``natural_key()`` methods. You do this by setting a ``dependencies`` |
| attribute on the ``natural_key()`` method itself. |
| |
| For example, consider the ``Permission`` model in ``contrib.auth``. |
| The following is a simplified version of the ``Permission`` model:: |
| |
| class Permission(models.Model): |
|     name = models.CharField(max_length=50) |
|     content_type = models.ForeignKey(ContentType) |
|     codename = models.CharField(max_length=100) |
|     # ... |
|     def natural_key(self): |
|         return (self.codename,) + self.content_type.natural_key() |
| |
| The natural key for a ``Permission`` is a combination of the codename for the |
| ``Permission``, and the ``ContentType`` to which the ``Permission`` applies. This means |
| that ``ContentType`` must be serialized before ``Permission``. To define this |
| dependency, we add one extra line:: |
| |
| class Permission(models.Model): |
|     # ... |
|     def natural_key(self): |
|         return (self.codename,) + self.content_type.natural_key() |
|     natural_key.dependencies = ['contenttypes.contenttype'] |
| |
| This definition ensures that ``ContentType`` models are serialized before |
| ``Permission`` models. In turn, any object referencing ``Permission`` will |
| be serialized after both ``ContentType`` and ``Permission``. |
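
The ordering guarantee amounts to a topological sort of models by their
declared dependencies. The sketch below shows that idea in plain Python. The
``order_models`` function is hypothetical and is not Django's implementation,
and the ``auth.group`` entry is included only as an example of a model that
refers to ``Permission``.

```python
def order_models(dependencies):
    """Return model labels so each appears after everything it depends on.

    `dependencies` maps a model label to the list of labels it depends on.
    """
    ordered = []
    visiting = set()

    def visit(label):
        if label in ordered:
            return
        if label in visiting:
            raise ValueError("circular dependency involving %s" % label)
        visiting.add(label)
        for dependency in dependencies.get(label, []):
            visit(dependency)
        visiting.discard(label)
        ordered.append(label)

    # Sorting the labels keeps the result deterministic.
    for label in sorted(dependencies):
        visit(label)
    return ordered


deps = {
    "auth.permission": ["contenttypes.contenttype"],
    "contenttypes.contenttype": [],
    "auth.group": ["auth.permission"],  # illustrative dependency
}
order = order_models(deps)
```

A cycle in the declared dependencies has no valid ordering, so this sketch
raises an error rather than guessing.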