Edit a value in a large CSV file which appears multiple times

I have a large tab-separated CSV file (about 10 GB). In the column named Tab2, I need to replace every occurrence of the word "direct" with the value of the Tab1 column in the same row.

Example:

Tab1    Tab2    Tab3    Tab4    Tab5    Tab6    Tab7     Tab8   Tab9
N'eus   ne  ADV adv Tense=Pres|Mood=Ind|Person=0|Number=Sing,Plur   _   _   _   _
ket ket ADV adv _   _   _   _   _
anezhañ direct  PRON    prn Case=Acc|Person=3|Gender=Masc|Number=Sing   _   _   _   _
ur  un  DET det PronType=Ind|Number=Sing,Plur   _   _   _   _
skiant  skiant  NOUN    n   Gender=Fem|Number=Sing  _   _   _   _
-   -   PUNCT   guio    _   _   _   _   _
rik rik ADJ adj Gender=Masc,Fem|Number=Sing,Plur    _   _   _   _

Essentially, I need to turn the example above into:

Tab1    Tab2    Tab3    Tab4    Tab5    Tab6    Tab7     Tab8   Tab9
N'eus   ne  ADV adv Tense=Pres|Mood=Ind|Person=0|Number=Sing,Plur   _   _   _   _
ket ket ADV adv _   _   _   _   _
anezhañ anezhañ PRON    prn Case=Acc|Person=3|Gender=Masc|Number=Sing   _   _   _   _
ur  un  DET det PronType=Ind|Number=Sing,Plur   _   _   _   _
skiant  skiant  NOUN    n   Gender=Fem|Number=Sing  _   _   _   _
-   -   PUNCT   guio    _   _   _   _   _
rik rik ADJ adj Gender=Masc,Fem|Number=Sing,Plur    _   _   _   _

In other words, I changed the Tab2 value of the third row from "direct" to "anezhañ" (the Tab1 value of the same row). This is just an example; the real file contains millions of "direct" occurrences. Because the file is so large, I don't know which tool to use or how to use it. I read that I should use pandas, but I really don't know where to start. Can anyone guide me to a solution?



Solution 1:[1]

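To put what martineau said into practice, the code below should do what you want. It uses the standard-library csv module and reads and writes one row at a time, so the 10 GB file is never loaded into memory as a whole.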

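import csv

# The file is tab-separated and contains non-ASCII text (e.g. "anezhañ"),
# so set the delimiter and encoding explicitly. newline="" is what the csv
# module expects, and lineterminator="\n" keeps Unix line endings
# (DictWriter defaults to "\r\n").
with open("input.csv", encoding="utf-8", newline="") as infile:
    with open("output.csv", "w", encoding="utf-8", newline="") as outfile:
        reader = csv.DictReader(infile, delimiter="\t")
        writer = csv.DictWriter(outfile, reader.fieldnames, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        for row in reader:
            # Copy the Tab1 value into Tab2 wherever Tab2 is "direct"
            if row["Tab2"] == "direct":
                row["Tab2"] = row["Tab1"]
            writer.writerow(row)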
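A minimal sketch of an alternative that skips the csv module entirely, assuming the file is UTF-8, has the column names on its first line, and that no field contains an embedded tab or newline. Splitting each line on "\t" avoids csv's quote handling: by default csv.reader treats a field that begins with a double quote (for instance a quotation-mark token in Tab1/Tab2) as a quoted field, which can merge or alter columns. The function names replace_direct and process_file are hypothetical.

```python
from typing import Iterable, Iterator


def replace_direct(lines: Iterable[str], src: str = "Tab1", dst: str = "Tab2",
                   placeholder: str = "direct") -> Iterator[str]:
    """Yield tab-separated lines, copying column `src` into column `dst`
    wherever `dst` equals `placeholder`. Hypothetical helper: it handles
    one line at a time, so memory use does not grow with file size."""
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input
        return
    yield header
    # Look up the column positions by name from the header line
    names = header.rstrip("\r\n").split("\t")
    i_src, i_dst = names.index(src), names.index(dst)
    for line in it:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        fields = body.split("\t")
        # Blank or short lines are passed through unchanged
        if len(fields) > max(i_src, i_dst) and fields[i_dst] == placeholder:
            fields[i_dst] = fields[i_src]
            line = "\t".join(fields) + ending
        yield line


def process_file(in_path: str, out_path: str) -> None:
    # Hypothetical wrapper; in_path/out_path are placeholders.
    # newline="" disables newline translation, so line endings are kept as-is.
    with open(in_path, encoding="utf-8", newline="") as fin, \
         open(out_path, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(replace_direct(fin))
```

For example, process_file("input.csv", "output.csv") writes the corrected copy. One design difference worth knowing: blank lines (such as sentence separators) are passed through unchanged here, whereas csv.DictReader silently skips them.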

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Edo Akse