'Django dumpdata UTF-8 (Unicode)

Is there a easy way to dump UTF-8 data from a database?

I know this command:

manage.py dumpdata > mydata.json

But the data I got in the file mydata.json, Unicode data looks like:

"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).



Solution 1:[1]

After struggling with similar issues, I've just found, that xml formatter handles UTF8 properly.

manage.py dumpdata --format=xml > output.xml

I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.

Hope this will help someone, as I've landed at this thread when looking for a solution..

Solution 2:[2]

You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False and then encode the result after, or you need to use json.load*() to load the JSON and then dump it with that option.

Solution 3:[3]

Here I wrote a snippet for that. Works for me!

Solution 4:[4]

You can create your own serializer which passes ensure_ascii=False argument to json.dumps function:

# serfializers/json_no_uescape.py
from django.core.serializers.json import *


class Serializer(Serializer):

    def _init_options(self):
        super(Serializer, self)._init_options()
        self.json_kwargs['ensure_ascii'] = False

Then register new serializer (for example in your app __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json

Solution 5:[5]

This solution worked for me from @Julian Polard's post.

Basically just add -Xutf8 in front of py or python when running this command:

python -Xutf8 manage.py dumpdata > data.json

Please upvote his answer as well if this worked for you ^_^

Solution 6:[6]

As YOU has provided a good answer that is accepted, it should be considered that python 3 distincts text and binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.

Solution 7:[7]

I'm usually add next strings in my Makefile:

.PONY: dump

# make APP=core MODEL=Schema dump
dump:
    @python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
    python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
    > ${APP}/fixtures/${MODEL}.json

It's ok for standard django project structure, fix if your project structure is different.

Solution 8:[8]

This problem has been fixed for both JSON and YAML in Django 3.1.

Solution 9:[9]

here's a new solution.

I just shared a repo on github: django-dump-load-utf8.

However, I think this is a bug of django, and hope someone can merge my project to django.

A not bad solution, but I think fix the bug in django would be better.

manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json

Solution 10:[10]

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)

Solution 11:[11]

I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)

In Python 3, I had to open the file in binary mode and decode as unicode-escape. Also I added utf-8 when I open in write (binary) mode.

I hope it helps :)

Solution 12:[12]

Here is the solution from djangoproject.com
You go to Settings there's a "Use Unicode UTF-8 for worldwide language support", box in "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that, and reboot, then we get a sensible, modern, default encoding from Python. djangoproject.com