'Shuffle rows of a large csv
I want to shuffle this dataset to have a random set. It has 1.6 million rows but the first are 0 and the last 4, so I need pick samples randomly to have more than one class. The actual code prints only class 0 (meaning in just 1 class). I took advice from this platform but doesn't work.
fid = open("sentiment_train.csv", "r")
li = fid.readlines(16000000)
random.shuffle(li)
fid2 = open("shuffled_train.csv", "w")
fid2.writelines(li)
fid2.close()
fid.close()
sentiment_onefourty_train = pd.read_csv('shuffled_train.csv', header= 0, delimiter=",", usecols=[0,5], nrows=100000)
sentiment_onefourty_train.columns=['target', 'text']
print(sentiment_onefourty_train['target'].value_counts())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
