Downloading "imdb_reviews" from tensorflow_datasets: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 30: invalid continuation byte
When downloading the "imdb_reviews" dataset, I get the error below:
'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte
import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
Downloading and preparing dataset imdb_reviews (80.23 MiB) to C:\Users\desig\tensorflow_datasets\imdb_reviews\plain_text\0.1.0...
Dl Completed...:
0/0 [00:00<?, ? url/s]
Dl Size...:
0/0 [00:00<?, ? MiB/s]
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-6-f3ae52bd604b> in <module>
1 import numpy as np
----> 2 datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
3
~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 _check_no_positional(fn, args, ismethod, allowed=allowed)
51 _check_required(fn, kwargs)
---> 52 return fn(*args, **kwargs)
53
54 return disallow_positional_args_dec(wrapped) # pylint: disable=no-value-for-parameter
~\anaconda3\lib\site-packages\tensorflow_datasets\core\registered.py in load(name, split, data_dir, batch_size, in_memory, shuffle_files, download, as_supervised, decoders, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
298 if download:
299 download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 300 dbuilder.download_and_prepare(**download_and_prepare_kwargs)
301
302 if as_dataset_kwargs is None:
~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 _check_no_positional(fn, args, ismethod, allowed=allowed)
51 _check_required(fn, kwargs)
---> 52 return fn(*args, **kwargs)
53
54 return disallow_positional_args_dec(wrapped) # pylint: disable=no-value-for-parameter
~\anaconda3\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
305 self.info.size_in_bytes = dl_manager.downloaded_size
306 # Write DatasetInfo to disk, even if we haven't computed the statistics.
--> 307 self.info.write_to_directory(self._data_dir)
308 self._log_download_done()
309
~\anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
118 if type is None:
119 try:
--> 120 next(self.gen)
121 except StopIteration:
122 return False
~\anaconda3\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py in incomplete_dir(dirname)
198 try:
199 yield tmp_dir
--> 200 tf.io.gfile.rename(tmp_dir, dirname)
201 finally:
202 if tf.io.gfile.exists(tmp_dir):
~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py in rename_v2(src, dst, overwrite)
543 errors.OpError: If the operation fails.
544 """
--> 545 _pywrap_file_io.RenameFile(
546 compat.as_bytes(src), compat.as_bytes(dst), overwrite)
547
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte
Environment: TensorFlow 2.3.0, NumPy 1.18.5, Python 3.8.8, Windows 10 x64.
Does anyone have an idea? Thank you.
Solution 1:[1]
My TensorFlow version is 2.4.1, and I solved it by updating tfds (tensorflow-datasets) to 4.5.2. Updating tfds to a newer version may therefore help.
Solution 2:[2]
(As mentioned by ???)
Try again after upgrading TensorFlow or tensorflow-datasets, as shown below:
pip install --upgrade tensorflow
pip install --upgrade tensorflow-datasets
import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
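Since both answers come down to upgrading packages, it can help to confirm which versions are actually installed in the active environment before and after running pip. The following is a minimal sketch using `importlib.metadata` from the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of TensorFlow or tfds.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for name in ("tensorflow", "tensorflow-datasets"):
        print(name, installed_version(name) or "not installed")
```

If the printed tensorflow-datasets version is still the old one after upgrading, pip may have installed into a different environment than the one the notebook kernel uses.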
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ??? |
| Solution 2 | TFer2 |
