Compare two files and store differences using conditional
I managed to find half of the solution to my challenge, but I cannot find a way to add a conditional to deal with the other half. I am using awk. The field separator is ; and the values are inside double quotes ". Each file has only 3 fields.
I have two files (file1.txt and file2.txt) and want to store the differences in a third file (results.txt).
file1.txt
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
file2.txt
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
If I use:
awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt
I get:
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
But I do not want " in $2 of file2.txt versus rack1 in $2 of file1.txt to count as a difference between the files. Whenever an entry in file2.txt has " in field $2 and the entry with the same $1 in file1.txt has rack1 in field $2, I want to discard it rather than report it as a difference.
The file is generated dynamically every night, and when that happens, field $2 is rack1 in file1.txt while field $2 is " in file2.txt. This match should be excluded, along with the one I already managed to exclude with the awk command above. Below is the expected output:
Desired results.txt
"SWITCH51";"rack7";"Datacenter2"
I am struggling to find a conditional to handle this scenario.
Solution 1:[1]
You could store the original lines in array a, like you do, plus modified lines where "rack1" is replaced by ":
$ awk -F';' -vOFS=';' 'FNR==NR {a[$0]; if($2=="\"rack1\"") {$2="\"";a[$0]}; next}
!($0 in a)' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
Note the specification of the OFS output field separator. It is needed because when we modify the $2 field, awk reconstructs $0 using OFS, which by default is a space. We need it to remain a semicolon for a correct comparison when parsing file2.txt.
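The effect of OFS on a rebuilt record can be seen in isolation. The following is a minimal sketch, not part of the original answer: it takes one sample line, assigns " to $2 with and without OFS set, and prints both results. The variable names default_ofs and semicolon_ofs are made up for illustration.

```shell
# Illustration only: assigning to a field makes awk rebuild $0 using OFS.
line='"ROUTER9";"rack1";"rack1"'

# Default OFS (a single space): the rebuilt record loses its semicolons.
default_ofs=$(printf '%s\n' "$line" | awk -F';' '{ $2 = "\""; print }')

# OFS set to ";": the rebuilt record keeps the original delimiter.
semicolon_ofs=$(printf '%s\n' "$line" | awk -F';' -v OFS=';' '{ $2 = "\""; print }')

printf '%s\n%s\n' "$default_ofs" "$semicolon_ofs"
```

Only the second form produces a string that can match a line of file2.txt, which is why the answer sets OFS.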
Solution 2:[2]
You could check whether the value of field 2 is just " and replace it with "rack1".
If $0 is not in array a after the replacement, print the unmodified row, which is saved in the tmp variable in the example.
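awk '
BEGIN{FS=OFS=";"}
# First file: remember every line as a key of array a
FNR==NR {a[$0];next}
{
    # Second file: keep the untouched line for printing
    tmp = $0
    # A lone " in field 2 becomes "rack1"; $0 is rebuilt using OFS
    sub(/^"$/, "\"rack1\"", $2)
    if (!($0 in a)) print tmp
}
' file1.txt file2.txt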
Output
"SWITCH51";"rack7";"Datacenter2"
Solution 3:[3]
Based on your shown samples, please try the following awk code. A simple explanation: while reading the first input file, it creates two arrays, a indexed by $0 and b indexed by $1,$3. While reading the second input file, it checks two conditions and prints the line only if $1,$3 is NOT present in b AND $0 is not present in a.
awk -F';' '
FNR==NR{
a[$0]
b[$1,$3]
next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
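One property of this approach is worth checking against your data: because b is keyed on $1 and $3 only, any change in $2 is ignored, not just the " versus rack1 case. The sketch below illustrates this with a hypothetical pair of lines (rack9 versus rack7, which does not occur in the question's samples) and, for contrast, a whole-line comparison that only maps a lone " to "rack1". The temporary directory and the variable names are assumptions made for the demonstration.

```shell
# Hypothetical data: $2 differs in a way the question did not ask to ignore.
tmp=$(mktemp -d)
printf '%s\n' '"SWITCH1";"rack7";"Datacenter1"' > "$tmp/file1.txt"
printf '%s\n' '"SWITCH1";"rack9";"Datacenter1"' > "$tmp/file2.txt"

# Keyed on $1 and $3: the rack9 line is not reported.
solution3_out=$(awk -F';' '
FNR==NR { a[$0]; b[$1,$3]; next }
!(($1,$3) in b) && !($0 in a)
' "$tmp/file1.txt" "$tmp/file2.txt")

# Whole-line key with only a lone " mapped to "rack1": the line is reported.
strict_out=$(awk -F';' '
FNR==NR { a[$0]; next }
{ key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
!(key in a)
' "$tmp/file1.txt" "$tmp/file2.txt")

printf 'keyed on $1,$3: [%s]\nwhole-line key: [%s]\n' "$solution3_out" "$strict_out"
rm -rf "$tmp"
```

If $2 can legitimately change between the nightly runs and you want to see those changes, the whole-line comparison is probably the safer choice. If $1 and $3 together always identify an entry, the shorter test is sufficient.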
Solution 4:[4]
awk -F';' '
NR==FNR { a[$0]; next }
{ key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
!(key in a)
' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | The fourth bird |
| Solution 3 | RavinderSingh13 |
| Solution 4 | Ed Morton |
