Can you run the Opacus privacy engine with the torchtext SequenceTaggingDataset?

I am trying to adapt a PyTorch named entity recognition (NER) model to incorporate differential privacy with the Opacus library. My model uses torchtext to build the dataset; each sentence is fed through a word embedding layer and a character embedding layer, the two representations are concatenated, and the result is fed into an LSTM that tags the words.
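For context, a minimal sketch of the architecture described above (all vocabulary sizes, embedding dimensions, and the mean-pooling over characters are illustrative assumptions, not the asker's actual code):

```python
import torch
import torch.nn as nn

class NERTagger(nn.Module):
    # Illustrative sketch: word + char embeddings, concatenated, into an LSTM tagger.
    # All sizes are placeholder assumptions.
    def __init__(self, word_vocab=1000, char_vocab=50, tag_count=9,
                 word_dim=100, char_dim=25, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.lstm = nn.LSTM(word_dim + char_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, tag_count)

    def forward(self, words, chars):
        # words: (batch, seq)  chars: (batch, seq, max_word_len)
        w = self.word_emb(words)
        # Pool the character embeddings of each word into one vector
        # (real models often use a char-level LSTM or CNN here instead).
        c = self.char_emb(chars).mean(dim=2)
        out, _ = self.lstm(torch.cat([w, c], dim=-1))
        return self.fc(out)  # per-token tag scores: (batch, seq, tag_count)
```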

In building the dataset I use the SequenceTaggingDataset from torchtext 0.6.0 with the fields word, char, and tag.

self.train_dataset, self.val_dataset, self.test_dataset = SequenceTaggingDataset(
    path=input_folder + '/DRUG-AE.tsv',
    fields=(
        (("word", "char"), (self.word_field, self.char_field)),
        ("tag", self.tag_field),
    ),
).split(split_ratio=[0.3, 0.1, 0.1])

Then I create a DataLoader from each of the created datasets:

self.train_iter = DataLoader(self.train_dataset, batch_size=batch_size)
self.val_iter = DataLoader(self.val_dataset, batch_size=batch_size)
self.test_iter = DataLoader(self.test_dataset, batch_size=batch_size)

When defining the Opacus privacy engine, it expects the model, the optimizer, and the training dataloader. However, when I pass my dataloader I receive the error message: "Uniform sampling is not supported for IterableDataset". Is there a way to convert this IterableDataset into a regular (map-style) dataset so it can be used with the privacy engine? The data is relatively small and stored in a local file. Thank you for any time you spend looking into this; if anything is unclear or you would like more code snippets, let me know.
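One possible direction, since the data is small: materialize the iterable dataset into a map-style one. PyTorch treats any object exposing __len__ and __getitem__ as a map-style dataset, which is what Opacus' uniform (Poisson) sampling needs. Below is a minimal sketch of such a wrapper (a hypothetical helper, not part of Opacus or torchtext):

```python
class MapStyleDataset:
    """Materialize an iterable-style dataset into a map-style one.

    Hypothetical helper: pulls every example into a list so that
    __len__ and __getitem__ are available, which uniform sampling
    requires. Fine for small datasets that fit in memory.
    """

    def __init__(self, iterable_dataset):
        # Eagerly read all examples into memory.
        self.examples = list(iterable_dataset)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]
```

A DataLoader built over such a wrapper (e.g. DataLoader(MapStyleDataset(self.train_dataset), batch_size=batch_size)) could then be handed to the privacy engine, assuming the examples are collated into tensors the model accepts.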



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow