Compare two files and store differences using conditional

I managed to find half of the solution to my challenge, but I cannot find a way to add a conditional to deal with the other half. I am using awk. The field separator is ; and the values are enclosed in double quotes ("). Each file has only 3 fields.

I have two files (file1.txt and file2.txt) and want to store the differences in a third file (results.txt).

file1.txt

"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"

file2.txt

"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"

If I use:

awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt

I get:

"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"

But I do not want to treat $2 being " in file2.txt and rack1 in file1.txt as a difference between the files. Whenever an entry in file2.txt has " in field $2 and the entry in file1.txt with the same $1 has rack1 in field $2, I want to discard it rather than report it as a difference.

The file is generated dynamically every night, and when that happens, field $2 is rack1 in file1.txt while field $2 is " in file2.txt. This match should be excluded, in addition to the identical lines that the awk command above already excludes. Below is the expected output:

Desired results.txt

"SWITCH51";"rack7";"Datacenter2"

I am struggling to find a conditional to handle this scenario.

linux awk compare

Solution 1:^[1]

You could store the original lines in array a, like you do, plus modified lines where "rack1" is replaced by ":

$ awk -F';' -vOFS=';' 'FNR==NR {a[$0]; if($2=="\"rack1\"") {$2="\"";a[$0]}; next}
    !($0 in a)' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"

Note the specification of the OFS output field separator. It is needed because when we modify the $2 field, awk reconstructs $0 using OFS, which by default is a space, while we need it to remain a semicolon for a correct comparison when parsing file2.txt.
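The effect of OFS described above can be seen in isolation. This is only a small sketch: the sample record is taken from file1.txt, and the variable names default_ofs and semicolon_ofs are made up for the illustration.

```shell
# Default OFS (a single space): assigning to $2 makes awk rebuild $0
# with spaces between the fields.
default_ofs=$(printf '%s\n' '"ROUTER22";"rack1";"Datacenter4"' |
  awk -F';' '{ $2 = "\""; print }')

# OFS set to ";": the rebuilt record keeps the original separator,
# so it can match the corresponding line of file2.txt.
semicolon_ofs=$(printf '%s\n' '"ROUTER22";"rack1";"Datacenter4"' |
  awk -F';' -v OFS=';' '{ $2 = "\""; print }')

printf '%s\n' "$default_ofs" "$semicolon_ofs"
# prints:
# "ROUTER22" " "Datacenter4"
# "ROUTER22";";"Datacenter4"
```

Only the second form is identical to the line in file2.txt, which is why the lookup in array a succeeds when OFS is set.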

Solution 2:^[2]

You could check whether the value of field 2 is just " and, if so, replace it with "rack1".

If $0 is not in array a after the replacement, print the unmodified row, which is saved in the tmp variable in the example.

awk '
BEGIN{FS=OFS=";"}
FNR==NR {a[$0];next} 
{
  tmp = $0
  sub(/^"$/, "\"rack1\"", $2)
  if (!($0 in a)) print tmp
}
' file1.txt file2.txt

Output

"SWITCH51";"rack7";"Datacenter2"

Solution 3:^[3]

Based on your shown samples, please try the following awk code. In short: while reading the first input file, it creates two arrays, a indexed by $0 and b indexed by $1,$3. While reading the second input file, it checks two conditions: if $1,$3 is NOT present in b AND $0 is not present in a, it prints that line from the second file.

awk -F';' '
FNR==NR{
  a[$0]
  b[$1,$3]
  next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
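As a quick check, the sketch below runs the same program against the sample data from the question, written to a temporary directory. The last line of file2.txt here (SWITCH7 with rack9) is a hypothetical addition, not part of the question. It illustrates that keying on $1,$3 ignores any change in $2, not only the " versus rack1 case. The names tmp3 and sol3_out are made up for the example.

```shell
tmp3=$(mktemp -d)

# Sample data from the question.
printf '%s\n' \
  '"SWITCH1";"rack7";"Datacenter1"' \
  '"SWTICH46";"rack1";"rack1"' \
  '"ROUTER3";"";"rack1"' \
  '"SWITCH7";"rack1";"rack1"' \
  '"ROUTER9";"rack1";"rack1"' \
  '"ROUTER22";"rack1";"Datacenter4"' > "$tmp3/file1.txt"

# The fourth line is hypothetical: same $1 and $3 as in file1.txt,
# but a different rack in $2.
printf '%s\n' \
  '"SWITCH1";"rack7";"Datacenter1"' \
  '"ROUTER22";";"Datacenter4"' \
  '"SWITCH51";"rack7";"Datacenter2"' \
  '"SWITCH7";"rack9";"rack1"' > "$tmp3/file2.txt"

sol3_out=$(awk -F';' '
FNR==NR { a[$0]; b[$1,$3]; next }   # first file: remember full lines and ($1,$3) pairs
!(($1,$3) in b) && !($0 in a)       # second file: print only unseen pairs
' "$tmp3/file1.txt" "$tmp3/file2.txt")

printf '%s\n' "$sol3_out"
# prints:
# "SWITCH51";"rack7";"Datacenter2"

rm -rf "$tmp3"
```

The hypothetical SWITCH7 line is not reported, so this approach fits only if every change in $2 should be ignored for a known $1,$3 pair.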

Solution 4:^[4]

awk -F';' '
    NR==FNR { a[$0]; next }
    { key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
    !(key in a)
' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
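The question also asks for the differences to be stored in results.txt. A minimal sketch, assuming plain shell redirection is acceptable and using a temporary directory (tmp4, a made-up name) to hold the sample files:

```shell
tmp4=$(mktemp -d)

# Sample data from the question.
printf '%s\n' \
  '"SWITCH1";"rack7";"Datacenter1"' \
  '"SWTICH46";"rack1";"rack1"' \
  '"ROUTER3";"";"rack1"' \
  '"SWITCH7";"rack1";"rack1"' \
  '"ROUTER9";"rack1";"rack1"' \
  '"ROUTER22";"rack1";"Datacenter4"' > "$tmp4/file1.txt"

printf '%s\n' \
  '"SWITCH1";"rack7";"Datacenter1"' \
  '"ROUTER22";";"Datacenter4"' \
  '"SWITCH51";"rack7";"Datacenter2"' > "$tmp4/file2.txt"

# Build the lookup key with a lone " in $2 treated as "rack1",
# then redirect the unmatched lines into results.txt.
awk -F';' '
    NR==FNR { a[$0]; next }
    { key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
    !(key in a)
' "$tmp4/file1.txt" "$tmp4/file2.txt" > "$tmp4/results.txt"

results=$(cat "$tmp4/results.txt")
printf '%s\n' "$results"
# prints:
# "SWITCH51";"rack7";"Datacenter2"

rm -rf "$tmp4"
```

Because the key is built in a separate variable, $0 is never modified, so the original line from file2.txt is what ends up in results.txt and no OFS setting is needed.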

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	The fourth bird
Solution 3	RavinderSingh13
Solution 4	Ed Morton
