'Print XML field values to CSV
I have a directory with loads of XML files that are not formatted in the exact same way but the ones I am after contain the 3 same required fields. I need to know the SNPID of each library as they must be unique. For this I need to list each and their ID value.
Firstly I need to only get results from files containing the value="Kontakt" in the "PoweredBy" field. Then for each I need the Name, SNPID and RegKey to print into a 3 column CSV. eg. Mystica | 547 | Best Service - Mystica
The XML files are like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ProductHints spec="1.0.16">
<Product version="2">
<UPID>70026fd5-8f6f-429e-b891-12c2f94bc566</UPID>
<Name>Mystica</Name>
<Type>Content</Type>
<NKSEnabled minVersion="1.1.0.0">true</NKSEnabled>
<Relevance maxVersion="1.0.9.0" minVersion="1.0.0.0">
<Application minVersion="5.0.0.0" nativeContent="true">kontakt</Application>
</Relevance>
<Relevance minVersion="1.1.0.0">
<Application minVersion="5.8.0.0" nativeContent="true">kontakt</Application>
<Application minVersion="2.6.5.0">maschine</Application>
<Application minVersion="1.8.2.0">kkontrol</Application>
</Relevance>
<PoweredBy>Kontakt</PoweredBy>
<Visibility maxVersion="1.0.9.0" minVersion="1.0.0.0" target="Standalone">1</Visibility>
<Company>Best Service</Company>
<AuthSystem>RAS2</AuthSystem>
<SNPID>547</SNPID>
<RegKey>Best Service - Mystica</RegKey>
<Icon>bestservice</Icon>
<ProductSpecific>
<HU>496B8CF4F8B1402C4A6650214DF2514C</HU>
<JDX>C9A2B6D9549FD159D8A3CFF054AAE934C2AC849EC74827847288DF07577A8F22</JDX>
<Visibility type="Number">3</Visibility>
</ProductSpecific>
</Product>
</ProductHints>
I've tried
Computer:~ user$ cd /Library/Application\ Support/Native\ Instruments/Service\ Center
Computer:Service Center
while read -r Name SNPID RegKey
do
echo "+++++++++++++++++++"
echo "Name: ${Name}"
echo "SNPID: ${SNPID}"
echo "RegKey: ${RegKey}"
awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}'
done
but the results are inconsistent (I know bash isn't great for parsing XML but the requirements are pretty basic).
find -name "*.xml" | xargs cat | tr -d "\n" | sed 's/<\/Name>/\n/g' | sed 's/<\/SNPID>/\n/g' | sed 's/<\/RegKey>/: /g' | sed 's/<[^>]*>//g' | egrep "Name:|SNPID:|RegKey:" | sed 's/Name: /---\nName: /g'
also results in error: find: illegal option -- n
Any BASH wizards out there that can help me please? (If possible I would prefer to use terminal without installing other things) Thanks
Solution 1:[1]
This might work for you (GNU sed):
sed -sE '/^\s*<(Name|SNPID|PoweredBy|RegKey)>(.*)<\/\1>\s*$/{s//\1:\2/;H}
$!d;g;/PoweredBy:Kontakt/!d
s/Name:([^\n]*)(.*)/\2\n\1/
s/SNPID:([^\n]*)(.*)/\2|\1/
s/RegKey:([^\n]*)(.*)/\2|\1/
s/.*\n//' file ...
Use the options -s to treat each input file separately and -E to simplify regexps.
Using alternation, extract the required fields into the hold buffer.
At the end of the file if the PoweredBy field does not contain Kontakt no further processing is required.
Otherwise, format the fields in the hold buffer into the required format and print the result.
Solution 2:[2]
Highly recommeded: Don't try to parse xml with anything other than a proper xml parser. For that you can use either xmlstarlet [edited to reflect @Reino's comment below:
xml sel -T -t -m "//ProductHints//Product[PoweredBy="Kontakt"]" -v //SNPID/. --nl -v //Name/. --nl -v //RegKey/. --nl your_file.xml
or, even simpler, xidel:
xidel your_file.xml -e '//ProductHints//Product[PoweredBy="Kontakt"]//(Name,SNPID,RegKey)'
Output (based on the sample xml in your question) should be:
547
Mystica
Best Service - Mystica
Solution 3:[3]
Using sed
$ sed -n '/Name\|SNPID\|RegKey/{s/[^>]*>\([^<]*\).*/\1/;H;d};G;s/\n/ | /g;/^ |/{s/^ | \+| //p}' input_file
Mystica | 547 | Best Service - Mystica
Solution 4:[4]
1. Locating required files using find and grep
grep -l "PoweredBy>Kontakt" $(find . -name "*.xml")
Validate you get the correct file list.
2. Extracting data, and formating data from required files using awk
This awk script, process only the required files one by one.
awk -F"[><]" '
$2 == "Name" {ret = $3 " | "}
$2 == "SNPID" {ret = ret $3 " | "}
$2 == "RegKey" {print ret $3}
' $(grep -l "PoweredBy>Kontakt" $(find . -name "*.xml"))
An alternative using awk without grep command.
awk -F"[><]" '
$2 == "PoweredBy" && $3 != "Kontakt" {nextfile}
$2 == "Name" {ret = $3 " | "}
$2 == "SNPID" {ret = ret $3 " | "}
$2 == "RegKey" {print ret $3}
' $(find . -name "*.xml")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | potong |
| Solution 2 | |
| Solution 3 | HatLess |
| Solution 4 |
