Sed: matching a pattern that spans multiple lines
I have the following text:
[...]
<p class="title">ABC</p>
<p class="text">
<a href="https://site" target="_blank">
TEXT HERE </a>
</p>
[...]
[...]
<p class="title">ABC</p>
<p class="text">
TEXT HERE </p>
[...]
From this text I need to extract:
TEXT HERE<no space>
TEXT HERE<no space>
If the text were on a single line, i.e.
<p class="title">ABC</p><p class="text"><a href="https://site" target="_blank">TEXT HERE </a></p>
<p class="title">ABC</p><p class="text">TEXT HERE </p>
I would solve this problem in the following way:
sed -n "s/.*title\">ABC<\/p>.*\">\([^<]*\).*/\1/p" ./file.txt
But my pattern spans multiple lines, and I don't know how to handle that case. Can somebody point me in the right direction?
Solution 1:[1]
This might work for you (GNU sed):
sed -nE '/"title">ABC<\/p>/{:a;s/<\/p>/&/2;tb;N;ba;:b;s/\n//g;s/ABC//;s/<[^>]*>//g;s/\s*$//;p}' file
Match a line containing "title">ABC</p>, then keep appending the following lines (if needed) until the pattern space contains a second </p>.
Remove newlines if present.
Remove the text ABC.
Remove all tags.
Remove any trailing white space and print the result.
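To make the steps above easier to follow, here is the same script spread over several lines with comments, run against the sample from the question. This is only a sketch: the function name extract_text is hypothetical, the input is supplied through a here-document rather than a file, and GNU sed is assumed (for -E and \s).

```shell
# Hypothetical wrapper around the one-liner above; assumes GNU sed.
extract_text() {
  sed -nE '
    # Start only on a line holding the ABC title paragraph.
    /"title">ABC<\/p>/{
      :a
      # Replace the 2nd </p> with itself; this only succeeds once it exists.
      s/<\/p>/&/2
      # If that substitution succeeded, jump to label b.
      tb
      # Otherwise append the next input line and try again.
      N
      ba
      :b
      # Join the collected lines.
      s/\n//g
      # Drop the title text.
      s/ABC//
      # Strip every tag.
      s/<[^>]*>//g
      # Trim trailing whitespace, then print.
      s/\s*$//
      p
    }
  ' "$@"
}

# Sample input copied from the question; "[...]" stands for unrelated lines.
result=$(extract_text <<'EOF'
[...]
<p class="title">ABC</p>
<p class="text">
<a href="https://site" target="_blank">
TEXT HERE </a>
</p>
[...]
[...]
<p class="title">ABC</p>
<p class="text">
TEXT HERE </p>
[...]
EOF
)
printf '%s\n' "$result"
```

With this sample the script prints TEXT HERE twice, one per line, with no trailing space. Passing a file name instead (extract_text file) behaves like the one-liner.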
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | potong |
