Create an awk file to filter out duplicated lines of a dataset
I have the following dataset and I would like to implement an iteration (awk or a for loop) in an awk file that checks it line by line, so that after executing it in the following way:
gawk -f file.awk dataset.csv
it allows me to get a file with the records without duplicates and with the float in the last column rounded to two decimals. Below I attach a sample of my dataset; as you can see, there should be only one record per country.
40462186,US,177827,7671,4395,190
2872296,US,273870,3492,95349,1216
45236699,US,265691,6874,5873,152
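For instance, if the first record per country is the one kept (the question does not say which duplicate should win, so this is an assumption), the output for this sample would be a single line:

40462186,US,177827,7671,4395,190.00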
Since my level is not advanced, I don't mind if the code is long, so that I can familiarise myself with the steps the code goes through.
awk '{a[$1]++}END{for (i in a)if (a[i]>1)print i;}' file
I found that this command can help with such functionality, but it is a shell one-liner, not an awk script.
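Here is a rough sketch of what I imagine file.awk could look like, assuming the input is comma-separated, the country code in column 2 is the deduplication key, and the first record per country should be kept (all three are my assumptions; the array name `seen` is purely illustrative):

```awk
# file.awk -- a sketch under the assumptions stated above
BEGIN {
    FS = ","    # fields in the dataset are comma-separated
    OFS = ","   # keep commas when rebuilding the output record
}
{
    # seen[$2]++ evaluates to the number of times this country
    # (field 2) has appeared so far: 0 (false) on first sight,
    # >0 (true) afterwards, so duplicate countries are skipped.
    if (seen[$2]++)
        next

    # Round the value in the last column to two decimals.
    $NF = sprintf("%.2f", $NF)

    # Assigning to $NF rebuilds $0 using OFS, so print emits
    # the whole record with the commas restored.
    print
}
```

It would then be run as `gawk -f file.awk dataset.csv > output.csv`. The `seen[$2]++` test is the same counting trick as the one-liner above, except it keys on the country field rather than on `$1` (which, with the default whitespace field separator, is the entire CSV line) and filters while reading instead of reporting duplicates in an END block.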
Thank you in advance for your help.