Attempting to manually download the MNIST PyTorch dataset in Databricks
I've attempted a couple of different approaches to get the dataset manually loaded into Databricks's DBFS so that PyTorch can load it. However, the MNIST dataset is just a set of binary files. Am I expected to unzip them first, or can I point directly to the gzipped archives? So far all of my attempts have produced this error:
train_dataset = datasets.MNIST(
    'dbfs:/FileStore/tarballs/train_images_idx3_ubyte.gz',
    train=True,

RuntimeError: Dataset not found. You can use download=True to download it
I am aware I can set download=True, but due to the firewalls this is not an option, and I want to just upload the files and wire them in myself via DBFS. Has anyone done this as well?
EDIT: @alexey suggested I need to add the extra MNIST/raw path components, and then change the input to:
train_dataset = datasets.MNIST(
    '/dbfs/FileStore/tarballs',
    train=True,
    download=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]))
data_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
But I get the same error.
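One possible cause of the persistent error (an assumption, not something confirmed in the thread): torchvision's MNIST loader looks for the uncompressed idx files, with their original hyphenated names, under <root>/MNIST/raw. A minimal sketch that builds that layout, assuming the .gz files were uploaded to /dbfs/FileStore/tarballs:

import gzip
import os
import shutil

# Root that will later be passed to datasets.MNIST; the loader appends MNIST/raw itself.
root = "/dbfs/FileStore/tarballs"
raw_dir = os.path.join(root, "MNIST", "raw")
os.makedirs(raw_dir, exist_ok=True)

# File names the loader expects (note hyphens, not underscores).
names = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

for name in names:
    src = os.path.join(root, name)          # uploaded .gz archive
    dst = os.path.join(raw_dir, name[:-3])  # uncompressed idx file, .gz suffix dropped
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)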
Solution 1:[1]
My code and dir:
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../colabx/data', train=True, download=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])))
....\colabx\data\MNIST\raw>ls
t10k-images-idx3-ubyte train-images-idx3-ubyte
t10k-images-idx3-ubyte.gz train-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte train-labels-idx1-ubyte
t10k-labels-idx1-ubyte.gz train-labels-idx1-ubyte.gz
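Translating this back to the Databricks question: once the same layout exists under /dbfs/FileStore/tarballs/MNIST/raw, the call from the EDIT should find the data, with one caveat worth checking: libraries that do plain Python file I/O, like torchvision, need the local FUSE path /dbfs/..., not the dbfs:/ URI used in the first attempt. A quick way to verify the layout from a notebook (paths taken from the question):

import os

# Local filesystem (FUSE) view of DBFS; this is the path style datasets.MNIST can read.
print(os.listdir("/dbfs/FileStore/tarballs/MNIST/raw"))

# The same folder through Databricks utilities, using the dbfs:/ URI scheme.
# display(dbutils.fs.ls("dbfs:/FileStore/tarballs/MNIST/raw"))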
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alexey Birukov |


