'Writing to files from jupyter notebook
I tried to run this code:
from tqdm.auto import tqdm
import os
from datasets import load_dataset
dataset = load_dataset('oscar', 'unshuffled_deduplicated_ar', split='train[:25%]')
text_data = []
file_count = 0
for sample in tqdm(dataset['train']):
sample = sample['text'].replace('\n', ' ')
text_data.append(sample)
if len(text_data) == 10_000:
# once we git the 10K mark, save to file
filename = f'/data/text/oscar_ar/text_{file_count}.txt'
os.makedirs(os.path.dirname(filename), exist_ok=True)
with open(filename, 'w', encoding='utf-8') as fp:
fp.write('\n'.join(text_data))
text_data = []
file_count += 1
# after saving in 10K chunks, we will have ~2082 leftover samples, we save those now too
with open(f'data/text/oscar_ar/text_{file_count}.txt', 'w', encoding='utf-8') as fp:
fp.write('\n'.join(text_data))
and i get following PermissionError:
I've tried changing rights to this directory and running jupyter with sudo privilages but it still doesn't work.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
