Replace a pattern between lines

I am trying to replace a pattern between the lines of a file.

Specifically, I would like to replace ,\n & with , &\n in multiple large files. In effect, this moves the & symbol to the end of the previous line. This is very easy with Ctrl+H, but I found it difficult with sed.

So, the initial file is in the following form:

      A +,
   &  B -,
   &  C ),
   &  D +,
   &  E (,
   &  F *,
 # &  G -,
   &  H +,
   &  I (,
   &  J +,
      K ?,

The desired output is:

      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *, &
#  &  G -,
      H +, &
      I (, &
      J +,
      K ?,

Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:

sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt

sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt

but they fail if the symbol "#" is present in the file.

Is there a simpler way to replace the matched pattern, something like: sed -i 's/,\n &/, &\n /g' file

Thank you in advance!



Solution 1:[1]

Using sed

$ sed ':a;N;s/\n \+\(&\) \(.*\)/ \1\n     \2/;ba' input_file
      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *,
 # &  G -, &
      H +, &
      I (, &
      J +,

Solution 2:[2]

Assuming that the line

 # &  G -,

is a commented line which could get uncommented later, it might make sense to handle the & in this line as well. Not knowing the purpose of the data, this might or might not be useful.

With GNU Awk, the command

awk 'BEGIN { RS=",";ORS="" } { printf "%s%s", ORS, gensub(/(\n[ \t#]*)&/, " \\&\\1 ",1); ORS=RS }' inputfile

will turn the input

      A +,
   &  B -,
   &  C ),
   &  D +,
   &  E (,
   &  F *,
 # &  G -,
   &  H +,
   &  I (,
   &  J +,
      K ?,

into

      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *, &
 #    G -, &
      H +, &
      I (, &
      J +,
      K ?,

This script will only work correctly if the last line is terminated by a newline or if any other character follows the last ,.

Explanation:

  • RS="," sets the comma as record separator instead of a newline for input.
  • ORS="" sets the output record separator to an empty string before the first record.
  • printf "%s%s", ORS, gensub(...) prepends the record separator instead of appending it.
  • gensub is a GNU-specific substitution function that allows backreferences to matched groups.
  • /(\n[ \t#]*)&/ search pattern: The parentheses define a group (1) that consists of a newline \n followed by any sequence of spaces, tabs or comment characters [ \t#]*. The group is followed by an & character.
  • " \\&\\1 " replacement: space followed by &, followed by captured group (1) (\\1) and an additional space to replace the removed &. (The \\& is necessary to get a literal & character instead of inserting the whole match.)
  • ORS=RS sets the output record separator to , after the first record (in fact after every record), so that a comma is prepended to the second and all following records. This ensures that the last record, which should be just a newline, does not get a trailing ,.
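
The remark about \\& can be checked on a one-line input. The sketch below is only an illustration: it uses the POSIX sub() function instead of gensub, on the assumption that & is treated the same way in the replacement string of both, and the helper name show_amp and the sample text a-b are made up.

```shell
# Hypothetical helper: apply two different replacement strings to each input line.
show_amp() {
  awk '{
    whole = $0
    sub(/-/, "[&]", whole)       # unescaped &: the matched text is inserted
    literal = $0
    sub(/-/, "[\\&]", literal)   # \\& in the string: a literal & is inserted
    print whole
    print literal
  }'
}

echo 'a-b' | show_amp
```

For the sample input this should print a[-]b followed by a[&]b, which is why the gensub replacement needs \\& to produce a real & character.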

The version of the GNU Awk script below works as expected only if the last line of the input file is not terminated with a newline. Otherwise it creates an additional line containing a ,, because the last record, which holds just the newline, is terminated by the output record separator ,.

awk 'BEGIN { RS=ORS="," } { print gensub(/(\n[ \t#]*)&/, " \\&\\1 ",1) }' inputfile

If the input file ends with a newline, the output will be

...
      I (, &
      J +,
      K ?,
,

with no newline after the last ,.
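
The difference between the two variants can be reproduced on a tiny input without any substitution at all. This is a minimal sketch, assuming an awk that accepts a single-character RS; the function names copy_prepend and copy_append and the sample text a,b,c are hypothetical.

```shell
# Variant 1: prepend the separator (ORS is empty for the first record only).
copy_prepend() {
  awk 'BEGIN { RS = ","; ORS = "" } { printf "%s%s", ORS, $0; ORS = RS }'
}

# Variant 2: append the separator after every record, including the last one.
copy_append() {
  awk 'BEGIN { RS = ORS = "," } { print $0 }'
}

printf 'a,b,c\n' | copy_prepend   # the input is reproduced unchanged
printf 'a,b,c\n' | copy_append    # an extra , follows the final newline
echo                              # terminate the last, unterminated line
```

The last record of the sample is c followed by a newline, so the appending variant writes its , after that newline, just as in the output shown above.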

Solution 3:[3]

Using sed

sed -En 'H;${g;s/^\n//;s/((\n *#.*)*)\n +&(.*)/ \&\1\n    \3/gmp}' file

Explanation

  • -E Enable extended regexp
  • -n Prevent the default printing of sed
  • H Append to hold space
  • ${ When at the end
  • g Copy the hold space into the pattern space, overwriting its contents
  • s/^\n//; Remove the leading newline from the pattern space
  • s/ Start substitute
  • ((\n *#.*)*) Capture group 1, optionally repeat matching a newline and # followed by the rest of the line
  • \n +&(.*) Match a newline and 1+ spaces, then match & and capture the rest of the line in group 3
  • / Start of the replacement
  •  \&\1\n    \3 The replacement: a space and an escaped (literal) &, then capture group 1, a newline, four spaces and capture group 3
  • / End substitution
  • gmp Flags: g replaces all occurrences, m enables multiline mode (a GNU extension), p prints the pattern space if a substitution was made
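
The H / $ / g steps are easier to follow on a smaller task. The sketch below is not the answer's command: it only collects all lines in the hold space and then joins them with +, to show that a newline can be matched once the whole input sits in the pattern space. The helper name slurp_join and the sample input are made up.

```shell
# H          append each line to the hold space
# ${ ... }   on the last line only:
#   g          copy the hold space into the pattern space
#   s/^\n//    drop the newline that the first H put in front
#   s/\n/+/g   the whole input is one string now, so \n can be matched
#   p          print the result (-n suppressed automatic printing)
slurp_join() {
  sed -n 'H;${g;s/^\n//;s/\n/+/g;p;}'
}

printf 'a\nb\nc\n' | slurp_join
```

For the three sample lines this should print a+b+c. Because the complete input is held in memory, this idiom may be a poor fit for very large files.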

Output

      A +, &
      B -, &
      C ), &
      D +, &
      E (, &
      F *, &
 # &  G -,
      H +, &
      I (, &
      J +,
      K ?,


Solution 4:[4]

This might work for you (GNU sed):

sed -E '/,$/{:a;N;/#[^\n]*$/ba
        s/,((\n.*)*)\n(\s*)&/, \&\1\n\3 /;h;s/(.*)\n.*/\1/p;g;s/.*\n(.*\n)/\1/;D}' file

Form a two line window (but include comments too if necessary).

Format the first line and print it (with comments if found).

Remove all but the last two lines.

Delete the first of the two lines left and repeat.
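
The same windowing idea can also be sketched in plain awk, which may be easier to read than the sed version. This is a rough sketch under stated assumptions, not a translation of the command above: comment lines are assumed to start with optional blanks and #, continuation lines with optional blanks and &, and the previous line is assumed to end in , with no trailing whitespace. The helper name move_amp is hypothetical.

```shell
# Hypothetical helper: move a leading & up to the end of the previous
# non-comment line, keeping comment lines where they are.
move_amp() {
  awk '
    # Print the held line, then any comment lines collected after it.
    function flush(   i) {
      if (have) print held
      for (i = 1; i <= ncomments; i++) print comments[i]
      have = 0
      ncomments = 0
    }
    # Comment line: keep it behind the held line, or print it if none is held.
    /^[ \t]*#/ {
      if (have) { comments[++ncomments] = $0 } else { print }
      next
    }
    # Continuation line: append " &" to the held line and blank out the &.
    /^[ \t]*&/ && have && held ~ /,$/ {
      held = held " &"
      sub(/&/, " ")
    }
    # Any other line: emit the previous window and hold the current line.
    { flush(); held = $0; have = 1 }
    END { flush() }
  ' "$@"
}

printf '%s\n' '      A +,' '   &  B -,' ' # &  G -,' '   &  H +,' '      K ?,' | move_amp
```

To convert a file, something like move_amp file1.txt > file2.txt should work. Only one code line and the comments that follow it are held at a time, so memory use stays small even for large files.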

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HatLess
Solution 2
Solution 3 The fourth bird
Solution 4 potong