Create an awk file to filter out duplicated lines of a dataset
I have the following dataset and I would like to implement an iteration (awk or a for loop) in an awk file that checks it line by line, so that after executing it in the following way:
gawk -f file.awk dataset.csv
it allows me to get a file with the records without duplicates and with the float in the last column rounded to two decimals. Below I attach a sample of my dataset; as you can see, there should be only one record per country.
40462186,US,177827,7671,4395,190
2872296,US,273870,3492,95349,1216
45236699,US,265691,6874,5873,152
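For instance, if the first record per country is the one kept (the question does not say which duplicate should win, so this is an assumption), the output for this sample would be a single line:

40462186,US,177827,7671,4395,190.00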
Since my level is not advanced, I don't mind if the code is long, so that I can familiarise myself with the steps the code goes through.
awk '{a[$1]++}END{for (i in a)if (a[i]>1)print i;}' file
I found that this command can help with such functionality, but it is a shell one-liner, not an awk script.
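Here is a rough sketch of what I imagine file.awk could look like, assuming the input is comma-separated, the country code in column 2 is the deduplication key, and the first record per country should be kept (all three are my assumptions; the array name `seen` is purely illustrative):

```awk
# file.awk -- a sketch under the assumptions stated above
BEGIN {
    FS = ","    # fields in the dataset are comma-separated
    OFS = ","   # keep commas when rebuilding the output record
}
{
    # seen[$2]++ evaluates to the number of times this country
    # (field 2) has appeared so far: 0 (false) on first sight,
    # >0 (true) afterwards, so duplicate countries are skipped.
    if (seen[$2]++)
        next

    # Round the value in the last column to two decimals.
    $NF = sprintf("%.2f", $NF)

    # Assigning to $NF rebuilds $0 using OFS, so print emits
    # the whole record with the commas restored.
    print
}
```

It would then be run as `gawk -f file.awk dataset.csv > output.csv`. The `seen[$2]++` test is the same counting trick as the one-liner above, except it keys on the country field rather than on `$1` (which, with the default whitespace field separator, is the entire CSV line) and filters while reading instead of reporting duplicates in an END block.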
Thank you in advance for your help.