Search on the last column with \ delimiter and save the email address associated to it to a variable
I have two files.
file1.txt contains:
META GAIN CORP
GG$
ABG$
PEPRA_UAT
12GHR
CC$
USDP_MAIN
XQ$
PR$
MIX_DEV
and file2.csv contains:
\\fr.usdp.org\SOLE\Home\RD,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\99 FLOOR,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\44 FLOOR,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DATABASE,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\NICE SHORT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\PRO DEV,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\FARE GRUST,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ GROUP,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ TEAM TOOLKIT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\BILLING ELEMENT,[email protected]
\\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\RRT_SEC,[email protected]
I had this in my script, but I can't get the last column correctly if it contains spaces.
for sr in `cat file1.txt`; do
sname=`echo ${sr} | awk -F: '{ print $1 }'`
emdrs=`grep -Fw "${sname}" file2.csv | awk -F',' '{print$2}' | sed 's/[[:space:]]//' | xargs | sed -e 's/ /,/g'`
echo "$sname || To: $emdrs" >> details.txt
done
details.txt output
META || [email protected],[email protected],[email protected],[email protected]
GAIN || [email protected],[email protected],[email protected],[email protected]
CORP || [email protected],[email protected],[email protected],[email protected]
But what I wanted is this:
META GAIN CORP || To: [email protected],[email protected],[email protected],[email protected]
I should also be able to search for strings containing $ (like ABG$), and the result should not include duplicate emails:
ABG$ || To: [email protected],[email protected]
Any help will be greatly appreciated.
Solution 1:[1]
One awk idea (replaces OP's current for loop):
awk -F',|\\\' '                                         # field delimiter of "," or "\"
FNR==NR { srlist[$1]
          next
        }
        { email=$NF
          if (email == "") next
          sr=$(NF-1)
          if (sr in srlist && emlist[sr] !~ email) {    # skip duplicate email addresses
              delim=(emlist[sr]) ? "," : ""
              emlist[sr]=emlist[sr] delim email
          }
        }
END     { for (sr in emlist)
              print sr " || To: " emlist[sr]
        }
' file1.txt file2.csv
This generates:
ABG$ || To: [email protected],[email protected]
META GAIN CORP || To: [email protected],[email protected],[email protected],[email protected]
NOTES:
- while a bit more typing than OP's current for loop, this approach requires a single scan of file2.csv and eliminates the 7 subprocess calls (for each pass through OP's for loop)
- for any appreciable volume of data an awk solution should be noticeably faster
- for the sample data provided:
  - 0.65 secs: awk
  - 1.80 secs: bash/for-loop
Solution 2:[2]
A shell loop is never the right approach for manipulating text; see "Why is using a shell loop to process text considered bad practice?".
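To illustrate why the loop in the question produced separate META, GAIN and CORP lines: an unquoted command substitution is split on any whitespace, not only on newlines. A minimal sketch, using a hypothetical temporary file in place of file1.txt; the `while IFS= read -r` form is shown only for comparison, since it keeps each line intact:

```shell
#!/bin/sh
# Hypothetical two-line input standing in for file1.txt.
tmp=$(mktemp)
printf '%s\n' 'META GAIN CORP' 'ABG$' > "$tmp"

# Unquoted $(cat ...) undergoes word splitting: one iteration per word.
count_for=0
for sr in $(cat "$tmp"); do
    count_for=$((count_for + 1))
done

# read -r with an empty IFS gives one iteration per line, spaces intact.
count_read=0
while IFS= read -r sr; do
    count_read=$((count_read + 1))
    last=$sr
done < "$tmp"

echo "for loop iterations:   $count_for"
echo "while read iterations: $count_read"
rm -f "$tmp"
```

The `for` loop runs four times (once per word) while the `read` loop runs twice (once per line), which is the difference between the output in the question and the output that was wanted.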
Using GNU awk for arrays of arrays:
$ cat tst.awk
BEGIN { FS="[\\\\,]" }
NR == FNR {
    tgts[$0]
    next
}
{
    sr = $(NF-1)
    email = $NF
}
(sr in tgts) && (email != "") {
    emails[sr][email]
}
END {
    for ( sr in emails ) {
        printf "%s || To:", sr
        sep = " "
        for ( email in emails[sr] ) {
            printf "%s%s", sep, email
            sep = ","
        }
        print ""
    }
}
$ awk -f tst.awk file1.txt file2.csv
ABG$ || To: [email protected],[email protected]
META GAIN CORP || To: [email protected],[email protected],[email protected],[email protected]
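Since the question title asks for the addresses in a variable, here is a sketch of capturing the list for a single share name with command substitution. The function name `emails_for`, the `SR` environment variable, the `\\srv\SHARES` paths, and the sample addresses are all hypothetical. The name is passed through `ENVIRON` rather than `-v` so that awk does not interpret backslash escapes in it.

```shell
#!/bin/sh
# emails_for NAME FILE: print the unique, comma-separated addresses whose
# last path component equals NAME (hypothetical helper, not from the answers).
emails_for() {
    SR=$1 awk -F'[\\\\,]' '
    # Appending "" forces a string comparison even for numeric-looking names.
    ($(NF-1) "") == ENVIRON["SR"] && $NF != "" && !seen[$NF]++ {
        out = out sep $NF
        sep = ","
    }
    END { print out }
    ' "$2"
}

# Made-up sample data standing in for file2.csv.
csv=$(mktemp)
cat > "$csv" <<'EOF'
\\srv\SHARES\ABG$,bob@example.com
\\srv\SHARES\ABG$,
\\srv\SHARES\ABG$,amy@example.com
\\srv\SHARES\ABG$,bob@example.com
\\srv\SHARES\NOT ABG$,zed@example.com
EOF

emdrs=$(emails_for 'ABG$' "$csv")
echo "ABG\$ || To: $emdrs"
rm -f "$csv"
```

Comparing the whole last path component, instead of using `grep -Fw`, means a share such as `NOT ABG$` is not picked up when searching for `ABG$`.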
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Ed Morton |
