Creating a unique array in awk: can this snippet be spelled out more explicitly?
Thanks to @EdMorton, I can remove duplicates from an array in awk this way:
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)
    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }
    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}
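For comparison, the same `!seen[...]++` test is commonly used as a pattern with no action. awk's default action prints the current record, so only the first occurrence of each input line is printed. A minimal sketch wrapping it in a shell function (the name `uniq_lines` is illustrative, not part of the original):

```shell
# uniq_lines (illustrative helper name): print each distinct input line once,
# in the order it first appears. !seen[$0]++ is true only the first time a
# given line is met; with no action block, awk prints the record.
uniq_lines() {
  awk '!seen[$0]++'
}

printf '%s\n' a b c d e a b | uniq_lines
```

Unlike `sort -u`, this keeps the original order and does not need sorted input.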
What I don't understand, though, is the `( !seen[array[i]]++ )` condition with its increment:
- I do understand that we collect unique indices in the `seen` array;
- So, we check whether our temp array `seen` already has the index `array[i]` (and add it to `unique` if it hasn't);
- But the increment after the index is the part I still can't get :) (despite the detailed explanation provided by Ed).
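One way to see what the increment does is to trace it. The sketch below (the helper name `trace_seen` and the `old`/`new`/`keep` labels are illustrative) stores the value that `seen[array[i]]++` returns in a variable. A post-increment yields the counter's value from before the 1 is added, so `!old` is true only the first time a given element is met.

```shell
# trace_seen (illustrative helper): for each element, show what seen[...]++
# returns (the count BEFORE the increment), the count after it, and whether
# !old is true, i.e. whether the original loop would keep the element.
trace_seen() {
  awk 'BEGIN {
    n = split("a b c d e a b", array)
    for (i = 1; i <= n; i++) {
      old = seen[array[i]]++   # post-increment: yields the old value, then adds 1
      printf "%s old=%d new=%d keep=%d\n", array[i], old, seen[array[i]], !old
    }
  }'
}

trace_seen
# a old=0 new=1 keep=1
# b old=0 new=1 keep=1
# c old=0 new=1 keep=1
# d old=0 new=1 keep=1
# e old=0 new=1 keep=1
# a old=1 new=2 keep=0
# b old=1 new=2 keep=0
```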
So, my question is the following: can this conditional be rewritten in a more explicit, step-by-step way? Maybe that would really help me finally understand it :)
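Spelled out step by step, the condition reads the current count, always increments it, and keeps the element only when the count it read was 0. A sketch of the original loop rewritten that way (the helper name `unique_explicit` and the variables `value` and `count` are illustrative):

```shell
# unique_explicit (illustrative helper): the original dedup loop with
# if ( !seen[array[i]]++ ) split into separate, explicit steps.
unique_explicit() {
  awk 'BEGIN {
    split("a b c d e a b", array)
    for (i = 1; i in array; i++) {
      value = array[i]
      count = seen[value] + 0   # times value was seen so far (0 if never)
      seen[value] = count + 1   # the ++ part: the counter is always bumped
      if (count == 0) {         # the ! part: true only on the first sighting
        unique[++j] = value
      }
    }
    for (i = 1; i in unique; i++)
      print unique[i]
  }'
}

unique_explicit
```

Adding `+ 0` forces a numeric value, so an element that was never seen (whose `seen` entry is still empty) counts as 0. If the counts themselves are not needed, `if (!(value in seen)) { seen[value] = 1; unique[++j] = value }` tests and records membership directly; the `in` operator checks for a key without creating it.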
Solution 1
Another approach is to store the array's values as keys of a new associative array. Because each key can occur only once, this enforces uniqueness:
BEGIN {
    # it's helpful to use the return value from `split`
    n = split("a b c d e a b", array)
    # use the element value as a key.
    # It doesn't really matter what the right-hand side of the assignment is.
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    # now, it's easy to iterate over the unique keys
    for (elem in uniq) print elem
}
This outputs the following, in no guaranteed order:
a
b
c
d
e
If you're using GNU awk, you can set PROCINFO["sorted_in"] to control the order in which `for (elem in uniq)` traverses the array.
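A minimal sketch of that GNU awk feature, assuming `gawk` is installed: setting `PROCINFO["sorted_in"]` to the predefined value `"@ind_str_asc"` makes the `for (elem in uniq)` loop visit the keys in ascending string order. The function name `uniq_sorted` is illustrative.

```shell
# uniq_sorted (illustrative helper). Requires GNU awk: PROCINFO["sorted_in"]
# is a gawk extension; other awks treat PROCINFO as an ordinary array.
uniq_sorted() {
  gawk 'BEGIN {
    n = split("a b c d e a b", array)
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    PROCINFO["sorted_in"] = "@ind_str_asc"   # traverse keys in ascending string order
    for (elem in uniq) print elem
  }'
}

# usage: uniq_sorted   (prints a, b, c, d, e, one per line)
```

With other awks the setting has no effect, so piping the output through `sort` is the portable alternative.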
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | glenn jackman |
