Edit a value that appears multiple times in a large CSV file
I have a large tab-separated CSV file, about 10 GB. I need to replace every "direct" value in the column named Tab2 with the value of the column Tab1 in the same row.
Example:
Tab1 Tab2 Tab3 Tab4 Tab5 Tab6 Tab7 Tab8 Tab9
N'eus ne ADV adv Tense=Pres|Mood=Ind|Person=0|Number=Sing,Plur _ _ _ _
ket ket ADV adv _ _ _ _ _
anezhañ direct PRON prn Case=Acc|Person=3|Gender=Masc|Number=Sing _ _ _ _
ur un DET det PronType=Ind|Number=Sing,Plur _ _ _ _
skiant skiant NOUN n Gender=Fem|Number=Sing _ _ _ _
- - PUNCT guio _ _ _ _ _
rik rik ADJ adj Gender=Masc,Fem|Number=Sing,Plur _ _ _ _
I essentially need to change the example above to:
Tab1 Tab2 Tab3 Tab4 Tab5 Tab6 Tab7 Tab8 Tab9
N'eus ne ADV adv Tense=Pres|Mood=Ind|Person=0|Number=Sing,Plur _ _ _ _
ket ket ADV adv _ _ _ _ _
anezhañ anezhañ PRON prn Case=Acc|Person=3|Gender=Masc|Number=Sing _ _ _ _
ur un DET det PronType=Ind|Number=Sing,Plur _ _ _ _
skiant skiant NOUN n Gender=Fem|Number=Sing _ _ _ _
- - PUNCT guio _ _ _ _ _
rik rik ADJ adj Gender=Masc,Fem|Number=Sing,Plur _ _ _ _
Essentially, I changed the Tab2 value in the third data row from "direct" to "anezhañ" (the Tab1 value of the same row). Of course, this is just an example; the real file contains millions of "direct" occurrences. Because the file is so large, I don't know which tool to use or how to use it. I read that I should use pandas, but I really don't know what to do. Can anyone guide me to a solution, please?
Solution 1:[1]
To put what martineau said into practice, the code below should do what you want. It reads and writes one row at a time, so the 10 GB file never has to fit in memory.
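import csv

# The file is tab-separated, so reader and writer both need delimiter="\t".
# encoding="utf-8" is an assumption; adjust it to match your file.
with open("input.csv", newline="", encoding="utf-8") as infile:
    with open("output.csv", "w", newline="", encoding="utf-8") as outfile:
        reader = csv.DictReader(infile, delimiter="\t")
        writer = csv.DictWriter(outfile, reader.fieldnames, delimiter="\t")
        writer.writeheader()
        # Rows are streamed one at a time, so memory use stays small.
        for row in reader:
            if row["Tab2"] == "direct":
                row["Tab2"] = row["Tab1"]
            writer.writerow(row)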
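One caveat: Python's csv module treats the double quote as a quoting character by default, which can matter if the token columns contain a literal " character. If that is a concern, a plain split on tabs avoids quoting entirely. The following is only a minimal sketch under some assumptions: the file is UTF-8, fields are separated by single tabs with no quoting, and the first line is the header. The function name replace_direct and its parameters are made up for illustration.

```python
def replace_direct(in_path, out_path, target="Tab2", source="Tab1", marker="direct"):
    """Stream a tab-separated file line by line.

    Wherever the `target` column equals `marker`, copy in the `source`
    column of the same row. All other lines are written back unchanged.
    """
    # newline="" keeps the original line endings untouched on read and write.
    with open(in_path, encoding="utf-8", newline="") as infile, \
         open(out_path, "w", encoding="utf-8", newline="") as outfile:
        header = infile.readline()
        outfile.write(header)

        # Look up column positions from the header instead of hard-coding them.
        names = header.rstrip("\r\n").split("\t")
        tgt, src = names.index(target), names.index(source)

        for line in infile:
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve "\n" or "\r\n" as found
            fields = body.split("\t")
            # Skip short or blank lines rather than failing on them.
            if len(fields) > max(tgt, src) and fields[tgt] == marker:
                fields[tgt] = fields[src]
                line = "\t".join(fields) + ending
            outfile.write(line)
```

Calling replace_direct("input.csv", "output.csv") would then write the edited copy next to the original. Like the csv version, it only ever holds one line in memory, so the file size should not be a problem.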
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Edo Akse |
