Downloading "imdb_reviews" from tensorflow_datasets: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 30: invalid continuation byte

When I try to download the "imdb_reviews" dataset, I get the following error:

'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte

import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)

Downloading and preparing dataset imdb_reviews (80.23 MiB) to C:\Users\desig\tensorflow_datasets\imdb_reviews\plain_text\0.1.0...
Dl Completed...:
0/0 [00:00<?, ? url/s]
Dl Size...:
0/0 [00:00<?, ? MiB/s]


---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-f3ae52bd604b> in <module>
      1 import numpy as np
----> 2 datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
      3 

~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
     50     _check_no_positional(fn, args, ismethod, allowed=allowed)
     51     _check_required(fn, kwargs)
---> 52     return fn(*args, **kwargs)
     53 
     54   return disallow_positional_args_dec(wrapped)  # pylint: disable=no-value-for-parameter

~\anaconda3\lib\site-packages\tensorflow_datasets\core\registered.py in load(name, split, data_dir, batch_size, in_memory, shuffle_files, download, as_supervised, decoders, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    298   if download:
    299     download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 300     dbuilder.download_and_prepare(**download_and_prepare_kwargs)
    301 
    302   if as_dataset_kwargs is None:

~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
     50     _check_no_positional(fn, args, ismethod, allowed=allowed)
     51     _check_required(fn, kwargs)
---> 52     return fn(*args, **kwargs)
     53 
     54   return disallow_positional_args_dec(wrapped)  # pylint: disable=no-value-for-parameter

~\anaconda3\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
    305         self.info.size_in_bytes = dl_manager.downloaded_size
    306         # Write DatasetInfo to disk, even if we haven't computed the statistics.
--> 307         self.info.write_to_directory(self._data_dir)
    308     self._log_download_done()
    309 

~\anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
    118         if type is None:
    119             try:
--> 120                 next(self.gen)
    121             except StopIteration:
    122                 return False

~\anaconda3\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py in incomplete_dir(dirname)
    198   try:
    199     yield tmp_dir
--> 200     tf.io.gfile.rename(tmp_dir, dirname)
    201   finally:
    202     if tf.io.gfile.exists(tmp_dir):

~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py in rename_v2(src, dst, overwrite)
    543     errors.OpError: If the operation fails.
    544   """
--> 545   _pywrap_file_io.RenameFile(
    546       compat.as_bytes(src), compat.as_bytes(dst), overwrite)
    547 

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte

TensorFlow version: 2.3.0, NumPy version: 1.18.5, Python version: 3.8.8, OS: Windows 10 x64

Does anyone have an idea what is causing this? Thank you.



Solution 1:[1]

My TensorFlow version is 2.4.1, and I solved it by updating tensorflow-datasets (tfds) to 4.5.2. Updating tfds to a newer version may therefore help.
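Before upgrading, it can help to see which versions are actually installed in the active environment. The snippet below is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The helper names installed_version and version_tuple are made up for this illustration, and the 4.5.2 threshold is simply the version reported to work in this answer, not an official minimum.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '4.5.2' into (4, 5, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Assumption: 4.5.2 is only the tfds version reported to work in this answer.
REPORTED_WORKING_TFDS = (4, 5, 2)

# Print what is installed in the current environment.
for name in ("tensorflow", "tensorflow-datasets"):
    print(name, installed_version(name) or "not installed")

tfds_version = installed_version("tensorflow-datasets")
if tfds_version is not None and version_tuple(tfds_version) < REPORTED_WORKING_TFDS:
    print("tensorflow-datasets is older than 4.5.2; upgrading may help")
```

If the reported tensorflow-datasets version is older than 4.5.2, upgrading it with pip and rerunning tfds.load is the first thing to try.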

Solution 2:[2]

(As mentioned by ???)

Try again after upgrading TensorFlow or tensorflow-datasets, as shown below:

pip install --upgrade tensorflow
pip install --upgrade tensorflow-datasets

import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ???
Solution 2 TFer2