'Django dumpdata UTF-8 (Unicode)
Is there a easy way to dump UTF-8 data from a database?
I know this command:
manage.py dumpdata > mydata.json
But the data I got in the file mydata.json, Unicode data looks like:
"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"
I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).
Solution 1:[1]
After struggling with similar issues, I've just found, that xml formatter handles UTF8 properly.
manage.py dumpdata --format=xml > output.xml
I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.
Hope this will help someone, as I've landed at this thread when looking for a solution..
Solution 2:[2]
You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False and then encode the result after, or you need to use json.load*() to load the JSON and then dump it with that option.
Solution 3:[3]
Here I wrote a snippet for that. Works for me!
Solution 4:[4]
You can create your own serializer which passes ensure_ascii=False argument to json.dumps function:
# serfializers/json_no_uescape.py
from django.core.serializers.json import *
class Serializer(Serializer):
def _init_options(self):
super(Serializer, self)._init_options()
self.json_kwargs['ensure_ascii'] = False
Then register new serializer (for example in your app __init__.py file):
from django.core.serializers import register_serializer
register_serializer('json-no-uescape', 'serializers.json_no_uescape')
Then you can run:
manage.py dumpdata --format=json-no-uescape > output.json
Solution 5:[5]
This solution worked for me from @Julian Polard's post.
Basically just add -Xutf8 in front of py or python when running this command:
python -Xutf8 manage.py dumpdata > data.json
Please upvote his answer as well if this worked for you ^_^
Solution 6:[6]
As YOU has provided a good answer that is accepted, it should be considered that python 3 distincts text and binary data, so both files must be opened in binary mode:
open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))
Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
Solution 7:[7]
I'm usually add next strings in my Makefile:
.PONY: dump
# make APP=core MODEL=Schema dump
dump:
@python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
> ${APP}/fixtures/${MODEL}.json
It's ok for standard django project structure, fix if your project structure is different.
Solution 8:[8]
This problem has been fixed for both JSON and YAML in Django 3.1.
Solution 9:[9]
here's a new solution.
I just shared a repo on github: django-dump-load-utf8.
However, I think this is a bug of django, and hope someone can merge my project to django.
A not bad solution, but I think fix the bug in django would be better.
manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json
Solution 10:[10]
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)
Solution 11:[11]
I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:
manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)
In Python 3, I had to open the file in binary mode and decode as unicode-escape. Also I added utf-8 when I open in write (binary) mode.
I hope it helps :)
Solution 12:[12]
Here is the solution from djangoproject.com
You go to Settings there's a "Use Unicode UTF-8 for worldwide language support", box in "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that, and reboot, then we get a sensible, modern, default encoding from Python.
djangoproject.com
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
