sed multiline delete everything before first occurrence of pattern
I have a multiline string containing some text followed by a JSON, so it has the following format:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
I want to extract the JSON using sed by removing the text before it, that is, everything up to and including MY_JSON: (note the trailing space).
My current solution:
# $str contains above multiline string
$ echo $str | sed '/MY_JSON: /d'
I get the following output:
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
But I want the following output:
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
So the idea is to select everything until the first occurrence of { and delete it. But that doesn't work. It doesn't delete the first n lines until the line where the pattern matches. It also deletes the whole line instead of just the part until the {.
How can I best achieve this with sed?
Solution 1:[1]
You may use this sed:
sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Command Details:
1,/MY_JSON:/: Match from line 1 to the first line that matches MY_JSON:
{/MY_JSON:/!d; s/^MY_JSON: *//;}: Delete every line in that range except the last one, then remove MY_JSON: from that line.
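As a rough sketch, the same command can be tried against a shell variable instead of a file. The sample string is a shortened, hypothetical stand-in for the input in the question, and the variable names str and out are assumptions made for illustration.

```shell
# Hypothetical sample input; the JSON body is shortened for illustration.
str='Some random text
More text before the JSON:
MY_JSON: {
"foo": [
{
"bar": "baz"
}
]
}'

# Quote "$str" so the embedded newlines reach sed intact.
# Lines 1 up to the MY_JSON: line are deleted, except the MY_JSON: line itself,
# from which only the "MY_JSON: " prefix is stripped.
out=$(printf '%s\n' "$str" | sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')
printf '%s\n' "$out"
```

Lines after the MY_JSON: line fall outside the range, so they are printed unchanged.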
Solution 2:[2]
Using sed
$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Solution 3:[3]
If the file has only one JSON structure
Input
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}' -n
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
If the file has multiple JSON structures
Input
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
some
My: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'
OR
sed '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' -n | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'
In the first command above (the one before OR), remove everything after the ; to retain titles such as MY_JSON:
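The two variants of the range command can be compared side by side in a small sketch. The input below is hypothetical and deliberately simple: it assumes that only the top-level closing brace of each JSON block starts a line (nested content is indented), since the range ends at the first line beginning with }. The variable names are assumptions for illustration.

```shell
# Hypothetical input with two JSON blocks separated by other text.
# Assumption: only the top-level closing brace starts at column 0.
str='Some random text
MY_JSON: {
  "foo": "bar"
}
some
My: {
  "baz": 1
}
trailing text'

# Full command: keep only the ranges, then strip everything before the opening brace.
stripped=$(printf '%s\n' "$str" | sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g')

# Command cut off at the ";": the ranges are kept and the titles are retained.
titled=$(printf '%s\n' "$str" | sed '/^[^{]*{/,/^}/!d')

printf '%s\n' "$stripped"
printf '%s\n' "$titled"
```

Note that the substitution also strips leading text, including indentation, from any line inside a block that contains a {, so nested objects would need a different approach.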
Output
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
If alternatives to sed are acceptable, https://unix.stackexchange.com/questions/460087/extract-json-from-a-text-file-with-arbitrary-text has a good solution using grep.
Solution 4:[4]
Here is a solution that takes the positive approach.
Instead of removing the unwanted data, extract the wanted data from the file.
$ sed --quiet '/MY_JSON:/,$ {s/^MY_JSON: //;p}' input.1.txt
{
"foo": [
{
"bar": "baz",
(...) // more content here
}
]
}
Explanation
--quiet Suppress automatic printing, so each line is printed only once, by the explicit p.
/MY_JSON:/,$ Range from the first line matching the regexp /MY_JSON:/ to the last line, denoted by $.
{...} List of sed commands executed on each line in the range.
s/^MY_JSON: //; p Substitute "MY_JSON: " with "", then print the line.
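A minimal sketch of this extraction applied to a shell variable follows. It uses -n, the short form of --quiet, and adds a ; before the closing brace, which some non-GNU sed implementations may require. The sample input, including its trailing line, and the variable names are hypothetical.

```shell
# Hypothetical sample input with text both before and after the JSON.
str='Some random text
MY_JSON: {
"foo": [
{
"bar": "baz"
}
]
}
trailing text'

# -n suppresses automatic printing; only lines in the range are printed by p.
out=$(printf '%s\n' "$str" | sed -n '/MY_JSON:/,$ {s/^MY_JSON: //;p;}')
printf '%s\n' "$out"
```

Because the range runs to the last line ($), any text following the JSON is printed as well, as the last line of this sample shows.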
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | anubhava |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | Dudi Boy |
