'join 2 files with different number of rows
Good morning
I got 2 files and I want to join them.
I am using awk but I can use other command in bash. the problem is that when I try to awk some records that are not in both files do not appear in the final file.
file1
supply_DBReplication, 27336
test_after_upgrade, 0
test_describe_topic, 0
teste2e_funcional, 0
test_latency, 0
test_replication, 0
ticket_dl, 90010356798
ticket_dl.replica_cloudera, 0
traza_auditoria_eventos, 0
Ezequiel1,473789563
Ezequiel2,526210437
Ezequiel3,1000000000
file2
Domimio2,supply_bdsupply-stock-valorado-sherpa
Domimio8,supply_DBReplication
Domimio9,test_after_upgrade
Domimio7,test_describe_topic
Domimio3,teste2e_funcional
,test_latency
,test_replication
,ticket_dl
,ticket_dl.replica_cloudera
,traza_auditoria_eventos
And I wish:
file3
Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000
I am executing this
awk 'NR==FNR {T[$1]=FS $2; next} {print $1 T[$2]}' FS="," file1 file2
But i got:
Domimio2, 0
Domimio8, 27336
Domimio9, 0
Domimio7, 0
Domimio3, 0
, 0
, 0
, 90010356798
, 0
, 23034
, 0
How can i do it?
Thank you
Solution 1:[1]
Assumptions:
- join criteria:
file1.field#1==file2.field#2 - output format:
file2.field#1,file1,field#2 file2- if field#1 is blank then replace withNoDomainfile2.field#2- if no match infile1.field#1then outputfile2.field#1+0file1.field#1- if no match infile2.field#2then outputNoDomain+file1.field#2(sorted byfield#2values)
One GNU awk idea:
awk '
BEGIN { FS=OFS="," }
NR==FNR { gsub(" ","",$2) # strip blanks from field #2
a[$1]=$2
next
}
{ $1 = ($1 == "") ? "NoDomain" : $1 # if file2.field#1 is missing then set to "NoDomain"
print $1,a[$2]+0
delete a[$2] # delete file1 entry so we do not print again in the END{} block
}
END { PROCINFO["sorted_in"]="@val_num_asc" # any entries leftover from file1 (ie, no matches) then sort by value and ...
for (i in a)
print "NoDomain",a[i] # print to stdout
}
' file1 file2
NOTE: GNU awk is required for the use of PROCINFO["sorted_in"]; if sorting of the file1 leftovers is not required then PROCINFO["sorted_in"]="@val_num_asc" can be removed from the code
This generates:
Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | markp-fuso |
