How to Upload Many Files to Google Colab?

I am working on an image segmentation machine learning project and I would like to test it out on Google Colab.

For the training dataset, I have 700 images, mostly 256x256, that I need to load into a Python NumPy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to upload them to Google Colab for use in my project.

So far I have attempted using Google Drive FUSE, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.

How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?



Solution 1:[1]

You can put all your data into your Google Drive and then mount the drive. This is how I have done it. Let me explain in steps.

Step 1: Transfer your data to your Google Drive.

Step 2: Run the following code to mount your Google Drive.

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse


# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()


# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}


# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive


!ls drive/

# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > drive/created.txt

Step 3: Run the following line to check that you can see your desired data in the mounted drive.

!ls drive

Step 4:

Now load your data into a NumPy array (or a pandas DataFrame) as follows. In my case, the train, CV, and test data were in Excel files.

import pandas as pd

train_data = pd.read_excel(r'drive/train.xlsx')
test = pd.read_excel(r'drive/test.xlsx')
cv = pd.read_excel(r'drive/cv.xlsx')
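For the asker's image-segmentation use case, the same mounted path can feed images into a NumPy array. A minimal sketch, assuming the images live under a hypothetical drive/images folder and that Pillow is installed (it is preinstalled on Colab):

```python
import glob
import os

import numpy as np
from PIL import Image

def load_images(pattern):
    """Load every image matching a glob pattern into one (N, H, W, 3) array.

    Assumes all matched images share the same dimensions (e.g. 256x256).
    """
    paths = sorted(glob.glob(pattern, recursive=True))
    return np.stack([np.asarray(Image.open(p).convert('RGB')) for p in paths])

# Usage on the mounted drive (the folder layout here is an assumption):
# train_images = load_images('drive/images/**/*.png')
```

The same function would work for the mask files by pointing the pattern at the mask subfolders.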

Edit

To save files from the Colab notebook environment back to your Drive, you can run the following code.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


# Create & upload a file.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

Solution 2:[2]

Here are a few steps to upload a large dataset to Google Colab:

1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox).
2. Create a shareable link for your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:

!wget your_shareable_file_link

That's it!
You can compress your dataset into a zip or rar file and unzip it after downloading it in Google Colab with this command:

!unzip downloaded_filename -d destination_folder
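The two shell commands above can also be scripted from Python using only the standard library, which is handy if the download should run inside a function. A minimal sketch (the Dropbox URL in the usage comment is a placeholder; note that for Dropbox share links you must change the trailing ?dl=0 to ?dl=1 to fetch the file itself rather than the preview page):

```python
import urllib.request
import zipfile

def download_and_unzip(url, zip_path, dest):
    """Fetch a zip archive and extract it, mirroring !wget followed by !unzip."""
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

# Hypothetical usage:
# download_and_unzip('https://www.dropbox.com/s/abc123/data.zip?dl=1',
#                    'data.zip', 'data/')
```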

Solution 3:[3]

Zip your file first, then upload it to Google Drive.

Then use this simple command to unzip:

!unzip {file_location}

Example:

!unzip drive/models.zip

Solution 4:[4]

Step1: Mount the Drive, by running the following command:

from google.colab import drive
drive.mount('/content/drive')

This will output a link. Click the link, hit Allow, copy the authorization code, and paste it into the box in the Colab cell with the text "Enter your authorization code:" written on top of it. This step just gives Colab permission to access your Google Drive.

Step2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive

Step3: Now work your way through the Drive directories and files to locate your uploaded folder/zipped file.

This process may look something like this: the current working directory when you start off in Colab is /content/. Just to make sure, run the following command in a cell:

!pwd

It will show you the current directory you are in (pwd stands for "print working directory"). Then use commands like:

!ls

to list the directories and files in the directory you are in, and the command:

%cd /directory/name/of/your/choice

to move into the directories to locate your uploaded folder or the uploaded .zip file. (Use %cd rather than !cd here: each ! command runs in its own subshell, so !cd would not change the notebook's working directory.)
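Rather than cd-ing around by hand, the search for the uploaded folder can be scripted. A small sketch with os.walk (the folder name 'dataset' is just an example):

```python
import os

def find_dir(root, name):
    """Walk a directory tree top-down and return the path of the first
    folder whose name matches, or None if it is not found."""
    for dirpath, dirnames, _ in os.walk(root):
        if name in dirnames:
            return os.path.join(dirpath, name)
    return None

# e.g. locate an uploaded 'dataset' folder anywhere under the mounted drive:
# find_dir('/content/drive', 'dataset')
```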

And just like that, you are ready to get your hands dirty with your Machine Learning model! :)

Hopefully, these simple steps will save you from spending too much unnecessary time figuring out how Colab works, when you should actually be spending most of your time on the machine learning model, its hyperparameters, pre-processing...

Solution 5:[5]

There are many ways to do so:

  1. You can push your data to a GitHub repository, then run the following in a Google Colab code cell:

    !git clone https://www.github.com/{repo}.git

  2. You can upload your data to Google Drive, then in your code cell:

    from google.colab import drive
    drive.mount('/content/drive')

  3. Use the transfer.sh tool: visit transfer.sh to see how it works.

Solution 6:[6]

Google Colab has made it more convenient for users to upload files [from the local machine, Google Drive, or GitHub]. Click the Mount Drive option in the pane on the left side of the notebook and you'll get access to all the files stored in your Drive.

Select the file -> right-click -> Copy path

Then read files from the copied path, for example:

import pandas as pd
data = pd.read_csv('your copied path here')

For importing multiple files in one go, you may need to write a function.
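Such a function could look like the following sketch, which reads every CSV in a folder into a dict of DataFrames (the Drive path in the usage comment is an assumption):

```python
import glob
import os

import pandas as pd

def read_all_csvs(folder):
    """Read every .csv file in a folder into a dict keyed by file name."""
    return {os.path.basename(p): pd.read_csv(p)
            for p in sorted(glob.glob(os.path.join(folder, '*.csv')))}

# Hypothetical usage with a copied Drive path:
# frames = read_all_csvs('/content/drive/MyDrive/my_data')
```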

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 desertnaut
Solution 2 desertnaut
Solution 3 feedMe
Solution 4 Sushanth
Solution 5 Mohamed Berrimi
Solution 6 Dr Nisha Arora