sed multiline delete everything before first occurrence of pattern

I have a multiline string containing some text followed by a JSON, so it has the following format:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

I want to extract the JSON with sed by removing all the text before it, that is, everything up to and including MY_JSON: (note the trailing space).

My current solution:

# $str contains above multiline string
$ echo "$str" | sed '/MY_JSON: /d'

I get the following output:

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

But I want the following output:

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

So the idea is to select everything until the first occurrence of { and delete it. But that doesn't work. It doesn't delete the first n lines until the line where the pattern matches. It also deletes the whole line instead of just the part until the {.

How can I best achieve this with sed?



Solution 1:[1]

You may use this sed:

sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}' file

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Command Details:

  • 1,/MY_JSON:/: Match from line 1 to the line that matches MY_JSON:
  • {/MY_JSON:/!d; s/^MY_JSON: *//;}: Within that range, delete every line except the last one (the line containing MY_JSON:), then remove the MY_JSON: prefix and any spaces after it from that line.
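
As a quick check, the command can be run against a shortened copy of the question's sample, fed through a pipe instead of a file. This is only a sketch: the variable names str and result are made up for illustration, and the elided (...) line is left out.

```shell
# Shortened sample modeled on the question (the "(...)" line is omitted).
str='Some random text
It spans multiple lines:
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}'

# From line 1 up to the first line containing MY_JSON:, delete the lines
# that do not contain it and strip the "MY_JSON: " prefix from the one that does.
result=$(printf '%s\n' "$str" | sed '1,/MY_JSON:/ {/MY_JSON:/!d; s/^MY_JSON: *//;}')
printf '%s\n' "$result"
```

One caveat: with the address 1,/MY_JSON:/, sed only starts looking for the end of the range on line 2. If MY_JSON: were on the very first line, the range would run on to the next match, or to the end of the input. GNU sed accepts 0,/MY_JSON:/ for that case, but that form is a GNU extension.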

Solution 2:[2]

Using sed

$ sed 's/^[a-zA-Z][^{]*//;/^$/d' input_file
{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
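
This answer comes without an explanation, so here is a hedged reading of it, checked on a shortened sample. On every line that begins with a letter, the substitution strips everything up to the first { (or the whole line if there is none); /^$/d then drops the lines left empty. It relies on the JSON lines being indented or starting with a brace, which holds for the sample but is an assumption about the input. The names str and result are illustrative.

```shell
# Shortened sample modeled on the question (the "(...)" line is omitted).
str='Some random text
It spans multiple lines:
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}'

# Lines starting with a letter lose everything before the first "{";
# lines that become empty are deleted. Indented JSON lines are untouched.
result=$(printf '%s\n' "$str" | sed 's/^[a-zA-Z][^{]*//;/^$/d')
printf '%s\n' "$result"
```

Blank lines anywhere in the input are removed too, since /^$/d is not limited to the text before the JSON.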

Solution 3:[3]

If the file has only one JSON structure

Input

It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed -n '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}'

Output

{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
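
The command collects the whole input in the hold space (1h;1!H), copies it back into the pattern space on the last line (g), and then runs a single substitution across all lines at once. Below is a minimal sketch on a shortened sample. It assumes a sed in which . and negated bracket expressions also match the newlines embedded in the pattern space, which is how GNU sed behaves. The names str and result are illustrative.

```shell
# Shortened sample modeled on the answer's input (the "(...)" line is omitted).
str='It spans multiple lines:
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}'

# 1h    : on line 1, copy the line into the hold space
# 1!H   : on every other line, append it to the hold space
# ${..} : on the last line, fetch the accumulated text (g) and keep only
#         the part from the first "{" to the last "}", then print it
result=$(printf '%s\n' "$str" | sed -n '1h;1!H;${g;s/^[^:]*:[^{]*\({.*}\).*/\1/p;}')
printf '%s\n' "$result"
```

Because {.*} is greedy, it spans from the first { to the last } in the input, which is why this form only suits a file with a single JSON structure.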

If the file has multiple JSON structures

Input

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}
some
My: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Some random text
It spans across multiple lines and contains a JSON that does not start at the beginning of the line:
MY_JSON: {
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

sed '/^[^{]*{/,/^}/!d;s/^[^{]*{/{/g'

OR

sed -n '1h;1!H;${;g;s/^[^:]*:[^{]*\({.*}\).*/\1/;p}' | sed -n '/^[^{]*{/,/^}/{;p}' | sed 's/^[^{]*{/{/g'

In the first command, remove everything after the ; (the s/^[^{]*{/{/g substitution) to keep titles such as MY_JSON: in the output.

Output

{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
{
  "foo": [
{
      "bar": "baz",
     (...) // more content here
    }
   ]
}
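
In the output above, the nested { lines have lost their indentation, because s/^[^{]*{/{/g runs on every line of the range and not only on the title line. If that matters, one possible variant (a sketch, not part of the original answer) limits the substitution to lines that start with a letter or underscore. That rule for recognising title lines is an assumption, as are the names str and result.

```shell
# Two JSON structures with different titles, separated by stray text.
str='Some random text
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}
some
My: {
  "n": 1
}'

# Print only the ranges from a line containing "{" to a line starting with "}".
# Inside a range, strip the title only on lines beginning with a letter or "_",
# so indented "{" lines keep their leading spaces.
result=$(printf '%s\n' "$str" | sed -n '/^[^{]*{/,/^}/{s/^[A-Za-z_][^{]*{/{/;p;}')
printf '%s\n' "$result"
```

As in the original command, each structure is assumed to end with a } in the first column.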

If alternatives to sed are acceptable, https://unix.stackexchange.com/questions/460087/extract-json-from-a-text-file-with-arbitrary-text has a good solution using grep.

Solution 4:[4]

Here is a solution that takes the positive approach.

Instead of removing the unwanted text, extract the wanted data from the file.

$ sed --quiet '/MY_JSON:/,$  {s/^MY_JSON: //;p}' input.1.txt
{
  "foo": [
    {
      "bar": "baz",
     (...) // more content here
    }
   ]
}

Explanation

--quiet Suppress sed's automatic printing, so that each line is printed only once, by the explicit p command.

/MY_JSON:/,$ Range of lines from the first line matching the regexp /MY_JSON:/ to the last line, which is denoted by $.

{...} List of sed commands executed on each line in the range.

s/^MY_JSON: //; p Substitute "MY_JSON: " with "", then print each line.
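
The same command can be tried on a shortened sample passed through a pipe instead of a file. The sketch below uses -n, the short form of --quiet, and adds a ; before the closing brace, which some non-GNU seds require. The names str and result are illustrative.

```shell
# Shortened sample modeled on the question (the "(...)" line is omitted).
str='Some random text
It spans multiple lines:
MY_JSON: {
  "foo": [
    {
      "bar": "baz"
    }
  ]
}'

# From the first line containing MY_JSON: to the end of the input,
# strip the "MY_JSON: " prefix where present and print the line.
result=$(printf '%s\n' "$str" | sed -n '/MY_JSON:/,$ {s/^MY_JSON: //;p;}')
printf '%s\n' "$result"
```

Since the range runs to $, any text that follows the JSON in the input is printed as well.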

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anubhava
Solution 2
Solution 3
Solution 4 Dudi Boy