Check for matching rows in a CSV file in Ruby

I am very new to Ruby and I want to check for rows with the same number in a CSV file.

What I am trying to do is go through the input CSV file and copy each row to the output file, adding another column called "duplicate". While copying, I want to check whether the same phone number is already in the output file; if it is, "dupl" should be added to that row in the duplicate column.

This is what I have.

require 'csv'

file = CSV.read('input_file.csv')

output_file = File.open("output2.csv", "w")
file.each do |row|
  # each row is an array of fields, so join it back into one line
  output_file.write(row.join(","))
  output_file.write("\n")
end
output_file.close

Example:

Phone
(202) 221-1323
(201) 321-0243
(202) 221-1323
(310) 343-4923

Output file:

Phone Duplicate
(202) 221-1323
(201) 321-0243
(202) 221-1323 dupl
(310) 343-4923


Solution 1:[1]

So basically you want to write the input to output and append a "dupl" on the second occurrence of a duplicate?

Copying the input to the output seems fine. To get the "dupl" flag, count the occurrences of each number in the list; if a number occurs more than once, it's a duplicate. But since you only want the flag on the second occurrence, just count how often the number has appeared up to that point:

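require 'csv'

lines = CSV.read('input_file.csv')

File.open('output2.csv', 'w') do |output_file|
  lines.each_with_index do |l, i|
    # l is an array of fields, so join it before appending the separator
    output_file.write(l.join(',') + ',')

    # flag the row if an identical row appeared earlier in the file
    if lines.take(i).count(l) >= 1
      output_file.write('dupl')
    end

    output_file.write("\n")
  end
end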

l is the current line (an array of fields, which here holds just the phone number). take(i) returns all lines before, but not including, the current line, and count(l) counts how often the current line appears among them. If that count is at least one, "dupl" is written.
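To see what take and count do on their own, here is a small sketch on a plain array. The phones array is made-up sample data taken from the example in the question, not read from a file.

```ruby
# Sample data copied from the question's example (an assumption for illustration)
phones = ['(202) 221-1323', '(201) 321-0243', '(202) 221-1323']

phones.each_with_index do |phone, i|
  earlier = phones.take(i)            # every element before index i
  seen_before = earlier.count(phone)  # how many of those equal the current one
  puts "#{phone}: seen #{seen_before} time(s) before"
end
```

For the first element take(0) is an empty array, so the count is 0 and no flag would be written.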

There is probably a more efficient answer to this; this is just a quick, easy-to-understand version.
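One possible more efficient variant, as a rough sketch rather than a definitive implementation: remember the numbers seen so far in a Set, so each row needs a single lookup instead of a rescan of all earlier rows. The helper name flag_duplicates and its column: parameter are hypothetical, and the sketch assumes the first row is a header containing a "Phone" column, as in the example.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: takes CSV text, returns CSV text with an extra
# "Duplicate" column. Assumes the first row is a header row.
def flag_duplicates(csv_text, column: 'Phone')
  seen = Set.new
  rows = CSV.parse(csv_text, headers: true)

  CSV.generate do |out|
    out << rows.headers + ['Duplicate']
    rows.each do |row|
      # Set#add? returns nil when the value is already in the set
      flag = seen.add?(row[column]) ? nil : 'dupl'
      out << row.fields + [flag]
    end
  end
end

input = <<~CSV
  Phone
  (202) 221-1323
  (201) 321-0243
  (202) 221-1323
  (310) 343-4923
CSV

puts flag_duplicates(input)
```

With real files, the same helper could be used as File.write('output2.csv', flag_duplicates(File.read('input_file.csv'))). Rows whose number has not appeared before get an empty Duplicate field; every later repeat gets "dupl".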

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
