Making sure no duplicate lines are written to CSV in Python

Currently I'm writing rows from a dataset to a CSV file with the following code:

import csv

with open('Private-Jet-Data.csv', 'a', newline='') as f:
    writer = csv.writer(f, delimiter=",")
    for row in data:
        writer.writerow(row)

Is there a more efficient way to make sure that no row is a duplicate of another in the file without opening up the file first and iterating through the entire file for each row in my data list?



Solution 1:[1]

No, it's not possible.

You will need to keep the existing data in memory for comparison, which means reading the previous rows from the file and appending only those rows that are not already present in it.

Also note that your current code snippet never compares against the entries already present in the .csv file.
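A minimal sketch of that approach: read the existing file once into a set, then append only unseen rows. The filename and the `data` list come from the question; the helper name and the string normalization are my own assumptions (csv.reader returns every field as a string, so rows to be written are compared as strings too).

```python
import csv
import os

def append_unique(path, data):
    """Append only rows from `data` that are not already in the CSV file.

    Hypothetical helper (not from the answer): reads the existing file
    once into a set of tuples, since lists are not hashable.
    """
    seen = set()
    if os.path.exists(path):
        with open(path, newline='') as f:
            seen = {tuple(row) for row in csv.reader(f)}

    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        for row in data:
            # csv.reader yields strings, so normalize before comparing
            key = tuple(str(v) for v in row)
            if key not in seen:
                writer.writerow(row)
                seen.add(key)
```

This still holds every distinct row in memory, but the file is only read once per run instead of once per row.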

Solution 2:[2]

You can just add "seen" values to a set on the fly:

import csv

with open('Private-Jet-Data.csv', 'a', newline='') as f:
    writer = csv.writer(f, delimiter=",")
    seen = set()
    for row in data:
        key = tuple(row)  # lists are not hashable, so use a tuple as the set key
        if key in seen:
            continue
        writer.writerow(row)
        seen.add(key)

This is more efficient than reading the source file twice, but it still uses memory proportional to the number of distinct rows, and as written it only deduplicates within the current `data` list, not against rows already in the file.

Solution 3:[3]

If you can read the whole file into a list of raw lines `ls`, a set removes the duplicates in one step:

 f.writelines(sorted(set(ls)))

set!
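This one-liner can be wrapped into a small read-then-rewrite sketch (the function name is mine, not from the answer). Note that it rewrites the file rather than appending, and `sorted` discards the original row order:

```python
def rewrite_without_duplicates(path):
    """Rewrite a text file with duplicate lines removed (illustrative helper)."""
    with open(path) as f:
        ls = f.readlines()  # raw CSV lines, trailing newlines included
    with open(path, 'w') as f:
        # set() drops exact duplicate lines; sorted() gives a deterministic order
        f.writelines(sorted(set(ls)))
```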

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Anshul Goyal
Solution 2: bruno desthuilliers
Solution 3: Oleg