'join 2 files with different number of rows

Good morning

I got 2 files and I want to join them.

I am using awk but I can use other command in bash. the problem is that when I try to awk some records that are not in both files do not appear in the final file.

file1

supply_DBReplication, 27336
test_after_upgrade, 0
test_describe_topic, 0
teste2e_funcional, 0
test_latency, 0
test_replication, 0
ticket_dl, 90010356798
ticket_dl.replica_cloudera, 0
traza_auditoria_eventos, 0
Ezequiel1,473789563
Ezequiel2,526210437
Ezequiel3,1000000000

file2

Domimio2,supply_bdsupply-stock-valorado-sherpa
Domimio8,supply_DBReplication
Domimio9,test_after_upgrade
Domimio7,test_describe_topic
Domimio3,teste2e_funcional
,test_latency
,test_replication
,ticket_dl
,ticket_dl.replica_cloudera
,traza_auditoria_eventos

And I wish:

file3

Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000

I am executing this

awk 'NR==FNR {T[$1]=FS $2; next} {print $1 T[$2]}' FS="," file1 file2

But i got:

Domimio2, 0
Domimio8, 27336
Domimio9, 0
Domimio7, 0
Domimio3, 0
, 0
, 0
, 90010356798
, 0
, 23034
, 0

How can i do it?

Thank you

awk


Solution 1:[1]

Assumptions:

  • join criteria: file1.field#1 == file2.field#2
  • output format: file2.field#1 , file1,field#2
  • file2 - if field#1 is blank then replace with NoDomain
  • file2.field#2 - if no match in file1.field#1 then output file2.field#1 + 0
  • file1.field#1 - if no match in file2.field#2 then output NoDomain + file1.field#2 (sorted by field#2 values)

One GNU awk idea:

awk '
BEGIN   { FS=OFS="," }

NR==FNR { gsub(" ","",$2)                        # strip blanks from field #2
          a[$1]=$2
          next
        }

        { $1 = ($1 == "") ? "NoDomain" : $1      # if file2.field#1 is missing then set to "NoDomain"
          print $1,a[$2]+0
          delete a[$2]                           # delete file1 entry so we do not print again in the END{} block
        }

END     { PROCINFO["sorted_in"]="@val_num_asc"   # any entries leftover from file1 (ie, no matches) then sort by value and ...
          for (i in a)
              print "NoDomain",a[i]              # print to stdout
        }
' file1 file2

NOTE: GNU awk is required for the use of PROCINFO["sorted_in"]; if sorting of the file1 leftovers is not required then PROCINFO["sorted_in"]="@val_num_asc" can be removed from the code

This generates:

Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 markp-fuso