Removing "\r\n" line breaks and ^M characters in all columns except the last one (in UNIX)
I got a solution for formatting a UNIX file containing ^M and "\r\n", as per the link shared earlier: https://stackoverflow.com/questions/68919927/removing-new-line-characters-in-csv-file-from-inside-columns-in-unix
The current ask is to get rid of the "\r\n" and ^M characters in every column of the file except the last one, so that the "\r\n" (with its ^M character) at the end of the last column can still be used to format the file with the command awk -v RS='\r\n' '{gsub(/\n/,"")} 1' test.csv
Sample data:
$ cat -v test.csv
234,aa,bb,cc,30,dd^M
22,cc,^M
ff,dd,^M
40,gg^M
pxy,aa,,cc,^M
40
,dd^M
Current output:
234,aa,bb,cc,30,dd
22,cc,
ff,dd,
40,gg
pxy,aa,,cc,
40,dd
Expected output:
234,aa,bb,cc,30,dd
22,cc,ff,dd,40,gg
pxy,aa,,cc,40,dd
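To try the commands on the same data, the sample can be recreated with printf. This is only a sketch: it assumes the ^M shown by cat -v is a carriage return (\r) and that the line holding only 40 ends in a bare \n, as the listing suggests. It also overwrites any existing test.csv in the current directory.

```shell
# Recreate the sample file from the question.
# \r\n marks the rows that end in ^M; the "40" line ends in a bare \n (assumption
# based on the cat -v listing, where that line shows no ^M).
printf '234,aa,bb,cc,30,dd\r\n22,cc,\r\nff,dd,\r\n40,gg\r\npxy,aa,,cc,\r\n40\n,dd\r\n' > test.csv

# Show the carriage returns as ^M to compare with the listing in the question.
cat -v test.csv
```

The cat -v output should match the sample listing line for line, which makes it easy to confirm that a candidate command is being tested against the same bytes.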
Solution 1:[1]
Would you please try a perl solution:
perl -0777 -pe 's/\r?\n(?=,)//g; s/(?<=,)\r?\n//g; s/\r//g' test.csv
Output:
234,aa,bb,cc,30,dd
22,cc,ff,dd,40,gg
pxy,aa,,cc,40,dd
- The -0777 option tells perl to slurp all lines, including line endings, at once.
- The -pe option interprets the next argument as a perl script.
- The regex \r?\n(?=,) matches zero or one CR character followed by a NL character, with a positive lookahead for a comma.
- Then the substitution s/\r?\n(?=,)//g removes the line endings which match the condition above. The following comma is not removed due to the nature of lookaround assertions.
- The substitution s/(?<=,)\r?\n//g is the switched version, which removes the line endings after a comma.
- The final s/\r//g removes any CR characters that still remain.
[Edit]
As the perl script above slurps the whole file into memory, it may be slow if the file is huge. Here is an alternative that processes the input line by line using a state machine.
awk -v ORS="" '                 # empty the output record separator
/^\r?$/ {next}                  # skip blank lines
f && !/^,/ {print "\n"}         # break the line if the flag is set and the line does not start with a comma
{
    sub(/\r$/, "")              # remove trailing CR character
    print                       # print current line (w/o newline)
    if ($0 ~ /,$/) f = 0        # if the line has a trailing comma, clear the flag
    else f = 1                  # if the line properly ends, set the flag
}
END {
    print "\n"                  # append the newline to the last line
}
' test.csv
BTW, if you want blank lines in between, as in the posted expected output, which looks like:
234,aa,bb,cc,30,dd

22,cc,ff,dd,40,gg

pxy,aa,,cc,40,dd
then append another \n in the print line:
f && !/^,/ {print "\n\n"}
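One way to sanity-check the result of either command is to tally the repaired rows by field count. This is a minimal sketch, assuming every repaired row should have the same number of comma-separated columns (six in the sample). field_counts is a hypothetical helper, not part of the original answer.

```shell
# Hypothetical helper: print "<field count> <number of rows>" for each distinct
# field count, skipping empty lines (NF is 0 for them).
field_counts() {
    awk -F, 'NF { n[NF]++ } END { for (k in n) print k, n[k] }' "$@" | sort -n
}

# The repaired sample has three rows of six fields each.
printf '234,aa,bb,cc,30,dd\n22,cc,ff,dd,40,gg\npxy,aa,,cc,40,dd\n' | field_counts
# prints: 6 3
```

Piping the output of the perl or awk command into field_counts should likewise give a single line. More than one line suggests that some row was joined too eagerly or not at all.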
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
