Remove XML tabs & newlines within the same tag in Bash
I have XML data like the sample below, with varying tag names and tag endings (> or />).
This layout is easy to read but awkward for data extraction.
XML source data
<Device name="MotorA"
type="stepper"
factor="2"
profile="high"
SyncMode="false">
<Param name="Gain"
type="Baic"
PID="Standard"
valid="true"
version="1.2"/>
Expected output
<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">
<Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/>
How do I remove the tabs and newlines within each tag (one tag per line) for data extraction in a Bash script?
Environment is "Linux develop 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"
Solution 1:[1]
Where is the closing </Device> tag of your <Device [...]> node? Your XML data suggests the <Param [...]> node is actually a child node rather than a sibling.
I suggest you process XML with an XML parser, like xidel. Luckily, xidel is rather forgiving of unclosed tags:
$ xidel -s "input.xml" -e . --output-node-format=xml
<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">
<Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/></Device>
Notice the closing </Device> tag?
Or properly indented:
$ xidel -s "input.xml" -e . --output-node-format=xml --output-node-indent
<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">
<Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/>
</Device>
Solution 2:[2]
The following works for me (where srcdatas.xml is an XML file containing the XML source data from your question).
awk 'BEGIN {RS=""}{gsub(/\n/, " ", $0); print $0}' srcdatas.xml
Tested on Windows Subsystem for Linux (WSL), specifically
Ubuntu 20.04.2 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)
Inspiration came from a question on Server Fault.
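Note that RS="" puts awk in paragraph mode, where records are separated by blank lines. If the file has no blank line between tags, the whole input becomes a single record and everything is printed on one line. Below is a minimal sketch of a variant that starts a new output line whenever an input line ends with >. It assumes every tag closes at the end of a line, and join_tags is a hypothetical helper name, not part of the original answer.

```shell
#!/usr/bin/env bash
# join_tags: hypothetical helper, not from the original answer.
# Assumption: each tag ends with '>' as the last character of some line.
join_tags() {
  awk '
    {
      sub(/^[ \t]+/, "")               # strip leading indentation
      sub(/[ \t\r]+$/, "")             # strip trailing blanks and CR
      if ($0 == "") next               # skip empty lines
      buf = (buf == "" ? $0 : buf " " $0)   # append with one space
      if ($0 ~ />$/) { print buf; buf = "" } # tag complete: emit it
    }
    END { if (buf != "") print buf }   # flush an unterminated tag
  ' "$@"
}

# Usage: reads the named files, or stdin when no file is given.
printf '<Device name="MotorA"\n\ttype="stepper"\n\tSyncMode="false">\n<Param name="Gain"\n\tversion="1.2"/>\n' | join_tags
```

Text content that spans several lines between tags would be glued onto the next tag by this sketch, so it fits attribute-only data like the sample best.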
Solution 3:[3]
This works if ed is available and acceptable. Note that ed is a line editor, not a tool for parsing XML files.
#!/usr/bin/env bash
ed -s file.xml <<-EOF
g/^</;/>$/j
,p
Q
EOF
Or put the commands in a separate ed script, script.ed:
g/^</;/>$/j
,p
Q
Then run:
ed -s file.xml < script.ed
As a one-liner:
printf '%s\n' 'g/^</;/>$/j' ,p Q | ed -s file.xml
Change Q to w if in-place editing of file.xml is needed.
Solution 4:[4]
This command removes all tabs and newlines, then adds a newline after every >, so each tag ends up on its own line:
tr -d '\t\n' < yourxml | sed -e $'s/>/>\\\n/g'
Tested on
- Mac bash: GNU bash, version 4.4.12(1)-release (x86_64-apple-darwin15.6.0)
- Ubuntu bash: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
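One caveat: if the attributes are separated only by newlines and tabs, deleting those characters outright leaves them adjacent with no space (name="MotorA"type="stepper"). Below is a minimal sketch of a variant that maps tabs and newlines to spaces instead and squeezes repeated spaces. flatten_tags is a hypothetical helper name, and the sketch assumes > never appears inside attribute values or text content.

```shell
#!/usr/bin/env bash
# flatten_tags: hypothetical helper, not from the original answer.
# Assumption: '>' only ever appears as the end of a tag.
flatten_tags() {
  # 1) turn tabs and newlines into spaces, 2) squeeze runs of spaces,
  # 3) break the line after each '>' and drop the spaces that follow it.
  tr '\t\n' '  ' |
    tr -s ' ' |
    sed -e 's/> */>\
/g'
}

# Usage: reads stdin, like the original pipeline.
printf '<Device name="MotorA"\n\ttype="stepper"\n\tSyncMode="false">\n<Param name="Gain"\n\tversion="1.2"/>\n' | flatten_tags
```

The literal backslash-newline inside the sed script stands in for the $'...' quoting used above, so the same function also runs under a plain POSIX sh.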
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Abra |
| Solution 3 | |
| Solution 4 | Taylor G. |
