Remove XML tabs & newlines within the same tag in Bash

I have XML data like the sample below, with different tag names and closing styles (`>` and `/>`).

Each attribute sits on its own line, which is easy to read but problematic for data extraction.

XML source data

<Device name="MotorA" 
type="stepper" 
factor="2" 
profile="high" 
SyncMode="false">

<Param name="Gain" 
type="Baic" 
PID="Standard" 
valid="true" 
version="1.2"/>

Expected output

<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">
<Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/>

How do I remove the tabs and newlines within each tag (one tag, one line) for data extraction in a Bash script?

Environment is "Linux develop 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"



Solution 1:[1]

Where is the closing </Device> tag of your <Device [...]> node? Your XML data suggests the <Param [...]> node is actually a child-node instead of a sibling.

I suggest you process XML with an XML parser, like xidel. Luckily, xidel is rather forgiving of unclosed tags:

$ xidel -s "input.xml" -e . --output-node-format=xml
<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">

<Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/></Device>

Notice the closing </Device> tag?

Or properly indented:

$ xidel -s "input.xml" -e . --output-node-format=xml --output-node-indent
<Device name="MotorA" type="stepper" factor="2" profile="high" SyncMode="false">
  <Param name="Gain" type="Baic" PID="Standard" valid="true" version="1.2"/>
</Device>

Solution 2:[2]

The following works for me (where srcdatas.xml is an XML file containing the XML source data from your question).

awk 'BEGIN {RS=""}{gsub(/\n/, " ", $0); print $0}' srcdatas.xml

Tested on Windows Subsystem for Linux (WSL), specifically

Ubuntu 20.04.2 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)

Inspiration came from the following question on serverfault

linux - remove line break using AWK
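
The awk command above sets RS="" (paragraph mode), so it depends on a blank line separating one tag from the next. If the input might not have those blank lines, a variant can collect lines until one ends with ">". The following is only a sketch under that assumption; the function name join_tags is made up for illustration and is not part of the original answer.

```shell
#!/usr/bin/env bash
# join_tags: hypothetical helper name, not from the original answer.
# Collects input lines until one ends with ">", then prints the
# collected tag on a single line. Reads the named files, or stdin.
join_tags() {
  awk '
    {
      gsub(/^[ \t]+|[ \t]+$/, "")      # trim leading/trailing blanks and tabs
      if ($0 == "") next                # skip empty lines
      buf = (buf == "" ? $0 : buf " " $0)
      if ($0 ~ />$/) { print buf; buf = "" }
    }
    END { if (buf != "") print buf }    # flush an unterminated tag
  ' "$@"
}
```

It would be called as join_tags srcdatas.xml. Like the original, it treats the file as plain text, so a ">" at the end of a line inside an attribute value would end the tag early.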

Solution 3:[3]

This works if ed is available and acceptable. Note that ed is a line editor, not a tool for parsing XML files.

#!/usr/bin/env bash

ed -s file.xml <<'EOF'
g/^</;/>$/j
,p
Q
EOF

Or a separate ed script, script.ed

g/^</;/>$/j
,p
Q

Then

ed -s file.xml < script.ed

As a one-liner

printf '%s\n' 'g/^</;/>$/j' ,p Q | ed -s file.xml

Change Q to w if in-place editing of file.xml is needed.
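
Where ed is not installed, the same idea (keep joining lines until one ends with ">") can probably be expressed with sed. This is a sketch rather than a tested replacement for the ed script; the function name join_until_close is hypothetical, and it assumes every tag eventually ends a line with ">".

```shell
#!/usr/bin/env bash
# join_until_close: hypothetical helper name, not from the original answer.
# Reads the named files, or stdin when no file is given.
join_until_close() {
  # 1. delete blank lines between tags
  # 2. while the pattern space does not end in ">", append the next line
  # 3. replace each embedded newline (plus surrounding blanks) with one space
  sed -e '/^[[:space:]]*$/d' \
      -e ':a' \
      -e '/>[[:space:]]*$/!{' \
      -e 'N' \
      -e 'ba' \
      -e '}' \
      -e 's/[[:space:]]*\n[[:space:]]*/ /g' "$@"
}
```

Usage would be join_until_close file.xml. Unlike the ed version, this writes to standard output only and never modifies the file.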

Solution 4:[4]

This command removes all tabs and newlines, then adds a newline after every > so that each tag ends up on its own line.

tr -d '\t\n' < yourxml | sed -e $'s/>/>\\\n/g'

Tested on

  • Mac bash: GNU bash, version 4.4.12(1)-release (x86_64-apple-darwin15.6.0)
  • Ubuntu bash: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
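
One thing to watch: because tr -d deletes the tabs and newlines outright, attributes that are separated only by a newline or tab (with no trailing space) end up glued together. A possible variant turns them into spaces and squeezes the runs instead. This is a sketch, not the author's command; the function name flatten_tags is made up for illustration.

```shell
#!/usr/bin/env bash
# flatten_tags: hypothetical helper name, not from the original answer.
# Reads XML text on stdin and writes one tag per line on stdout.
flatten_tags() {
  tr '\t\n' '  ' |            # turn tabs and newlines into spaces
    tr -s ' ' |               # squeeze runs of spaces into one
    sed -e $'s/> */>\\\n/g' | # break the line after every ">"
    sed -e '/^$/d'            # drop an empty last line, if any
}
```

It would be used as flatten_tags < yourxml. As with the original command, any ">" starts a new line, so element text such as <a>text</a> would also be split.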

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Abra
Solution 3
Solution 4    Taylor G.