How to add header text to adjacent content in an unformatted data set, side by side with a delimiter, using sed/awk/python
I have a long list of unformatted data, say data.txt, where each set starts with a header and ends with a blank line, like:
TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox
Now I want to put the header of each set next to each line of its content, separated by a comma, like:
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
so that whenever I grep for a keyword, the relevant content comes back together with its header. Like:
$ grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$
I am a newbie to bash scripting as well as Python and recently started learning both, so I would really appreciate any simple bash script (using sed/awk) or Python script.
Solution 1:[1]
Using sed
$ sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}' input_file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
Match lines containing Type, copy them to the hold space (h) and delete them (d) so the header itself is not printed.
Match lines containing lowercase letters and append the contents of the hold space to them (G). Finally, substitute the newline that G inserts with a comma.
Solution 2:[2]
I would use GNU AWK for this task in the following way. Let the content of file.txt be
TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox
then
awk '/^Type/{header=$0;next}{print /./?$0 ";" header:$0}' file.txt
output
alexmob;TypeA/Price:20$
moblexto;TypeA/Price:20$
unkntom;TypeA/Price:20$
moblexto2;TypeB/Price:25$
unkntom0;TypeB/Price:25$
alexmob3;TypeB/Price:25$
poptop9;TypeB/Price:25$
tyloret;TypeB/Price:25$
rtyuoper0;TypeC/Price:30$
kunlohpe6;TypeC/Price:30$
mobryhox;TypeC/Price:30$
Explanation: if a line starts with (^) Type, set header to that line ($0) and go to the next line. For every other line, if it contains at least one character (/./), print the line ($0) concatenated with ; and header; otherwise print the line ($0) as is. Replace ";" with "," in the command to get the comma-separated output asked for in the question.
(tested in GNU Awk 5.0.1)
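Since the question also asks about Python, the same line-by-line idea can be sketched there. This is only an illustration, not part of the original answer: the function name `attach_headers` and its `delimiter` and `header_prefix` parameters are made up for this example, and it assumes every header starts with `Type`. Unlike the awk command, it skips blank lines instead of printing them.

```python
def attach_headers(lines, delimiter=",", header_prefix="Type"):
    """Yield 'item<delimiter>header' for each non-blank line under a header.

    Hypothetical helper: assumes every header line starts with header_prefix.
    """
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(header_prefix):
            # Remember the current header; the header line itself is not emitted.
            header = line
        elif line.strip() and header is not None:
            # Non-blank content line: pair it with the most recent header.
            yield f"{line}{delimiter}{header}"


sample = """TypeA/Price:20$
alexmob
moblexto

TypeB/Price:25$
moblexto2
""".splitlines()

for row in attach_headers(sample):
    print(row)
```

To process a real file, passing an open file object should work the same way, for example `attach_headers(open("file.txt"))`, since the function only iterates over lines.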
Solution 3:[3]
Using any awk in any shell on every Unix box regardless of which characters are in your data:
$ awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}' file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
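The command above relies on awk's paragraph mode (an empty RS), which treats each blank-line-separated block as one record, with the header as the first field. For readers who prefer Python, a rough equivalent might look like the sketch below. The function name `attach_headers_by_block` is hypothetical, and the sketch assumes the first line of every block is its header. It does not print the empty line between blocks that the awk command emits.

```python
import re


def attach_headers_by_block(text, delimiter=","):
    """Return 'item<delimiter>header' rows, treating blank lines as block separators.

    Hypothetical helper: assumes the first line of each block is the header.
    """
    rows = []
    # Split on runs of blank lines, similar in spirit to awk's RS= paragraph mode.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if not lines:
            continue  # empty input yields one empty block; skip it
        header, *items = lines
        rows.extend(f"{item}{delimiter}{header}" for item in items)
    return rows


text = "TypeA/Price:20$\nalexmob\nmoblexto\n\nTypeB/Price:25$\nmoblexto2\n"
for row in attach_headers_by_block(text):
    print(row)
```

Because the header is identified by position rather than by matching `Type`, this variant does not depend on what characters the headers or items contain, which is the same property the awk answer highlights.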
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Daweo |
| Solution 3 | Ed Morton |
