Loading HuggingFace tokenizer from Dropbox (or other cloud storage)

I have a classification model, and I have nearly finished turning it into a Streamlit app.

I have the embeddings and the model on Dropbox. I have successfully imported the embeddings, since they are a single file.

However, AutoTokenizer.from_pretrained() takes the path of a folder containing several files, rather than a single file. The folder contains these files:

  • config.json
  • special_tokens_map.json
  • tokenizer_config.json
  • tokenizer.json

When using the tool locally, I would point the function at the folder and it would work.

However, I am unable to point it at the folder on Dropbox, and as far as I can see I can only download individual files from Dropbox into Python, not a whole folder.

Is there a way of creating a temporary folder in Python, downloading all the files individually into it, and then running AutoTokenizer.from_pretrained() on that folder?
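
For reference, the approach described above (a temporary folder plus individual downloads) could look roughly like the following sketch. The Dropbox share links are placeholders for your own direct-download URLs (note the ?dl=1 query parameter, which makes Dropbox serve the raw file instead of the preview page):

import os
import tempfile
import requests
from transformers import AutoTokenizer

# Placeholder direct-download links to the individual tokenizer files on Dropbox.
FILES = {
    "config.json": "https://www.dropbox.com/s/xxxx/config.json?dl=1",
    "special_tokens_map.json": "https://www.dropbox.com/s/xxxx/special_tokens_map.json?dl=1",
    "tokenizer_config.json": "https://www.dropbox.com/s/xxxx/tokenizer_config.json?dl=1",
    "tokenizer.json": "https://www.dropbox.com/s/xxxx/tokenizer.json?dl=1",
}

with tempfile.TemporaryDirectory() as tmpdir:
    # Download each file into the temporary folder...
    for name, url in FILES.items():
        response = requests.get(url)
        response.raise_for_status()
        with open(os.path.join(tmpdir, name), "wb") as f:
            f.write(response.content)

    # ...then point from_pretrained() at that folder, exactly as with a local path.
    # The tokenizer is loaded into memory, so it remains usable after the
    # temporary directory is cleaned up.
    tokenizer = AutoTokenizer.from_pretrained(tmpdir)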



Solution 1:[1]

To get around this, I uploaded the model to the HuggingFace Hub and loaded it from there.

i.e.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ScoutEU/MyModel")
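
If you want to reproduce this, one way to upload the local files (not necessarily the exact steps used here) is push_to_hub() from transformers. A minimal sketch, assuming you are logged in via huggingface-cli login and substituting your own local folder and repository name:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the locally trained artifacts (the folder path is a placeholder)...
tokenizer = AutoTokenizer.from_pretrained("path/to/local/model_folder")
model = AutoModelForSequenceClassification.from_pretrained("path/to/local/model_folder")

# ...and push them to a Hub repository under your account.
tokenizer.push_to_hub("ScoutEU/MyModel")
model.push_to_hub("ScoutEU/MyModel")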

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: desertnaut