How to grab text after a newline in a text file that is not clean of spaces and tabs [closed]

Assume this: the command needs to take a file name as an argument.

This is the only text I'm showing; the full file has more data (not shown). The problem: the text is only semi-clean. It is full of whitespace, tabs, and Unicode characters, and it has to stay that way for my needs, so copying and pasting the exact text below won't reproduce it (the markup reformats it).

I have some text like this:

*** *
more text with spaces and  tabs
*****
1
Something here and else, 2000 edf, 60 pop
    Usd324.32           2 Usd534.22
2
21st New tetx that will like to select with pattern, 334 pop
    Usd162.14

*** *
more text with spaces and tabs, unicode
*****

I'm trying to grab exactly this text:

  • 1 Something here and else, 2000 edf, 60 pop Usd324.32

Because of the newline and the whitespace, the following command only grabs the 1:

grep -E '1\s.+'

I have also tried combining patterns with alternation:

grep -E '1\s|[A-Z].+'

But it doesn't work: grep starts matching similar patterns in other parts of the text. I have already run these cleanup commands:

awk '{$1=$1}1'   #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r"   #done already

How can I grab:

  • grab one newline
  • grab the whole second line after one newline
  • grab the number Usd324.32 and remove the Usd prefix


Solution 1:[1]

You can use this sed:

sed -En '/^1/ {N;N;s/[[:blank:]]*Usd([^[:blank:]]+)[^\n]*$/\1/; s/\n/ /gp;}' file

1 Something here and else, 2000 edf, 60 pop 324.32
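
To make the individual steps of the one-liner easier to follow, here is the same sed logic spread over several lines and wrapped in a function that takes the file name as its argument. This is only a sketch: the function name grab_record is made up for illustration, and it assumes GNU sed (for -E, for comments inside the script, and for \n inside a bracket expression).

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): grab_record FILE
# Same commands as the one-liner above, one per line, assuming GNU sed.
grab_record() {
  sed -En '
    /^1/ {
      # Append the next two input lines to the pattern space.
      N
      N
      # On the last line, keep only what follows the first Usd,
      # up to the next blank; drop the rest of that line.
      s/[[:blank:]]*Usd([^[:blank:]]+)[^\n]*$/\1/
      # Turn the embedded newlines into spaces and print the result.
      s/\n/ /gp
    }
  ' "$1"
}
```

Note that /^1/ matches any line that starts with 1, so a file with lines such as 10 or 1st at the start of a line would need a stricter address.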

Or this awk would also work:

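awk '$0 == 1 {          # a line whose whole value is 1
   printf "%s", $0      # print it without a trailing newline
   getline              # read the text line into $0
   printf " %s ", $0
   getline              # read the line that holds the amounts
   sub(/Usd/, "")       # drop the first Usd; $0 is split into fields again
   print $1             # print the first amount
}' file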

1 Something here and else, 2000 edf, 60 pop 324.32
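
Since the question mentions stray blanks and matches turning up in the wrong places, a slightly more defensive variant of the awk approach may help. The sketch below is an assumption-laden illustration, not part of the original answer: the function name grab_by_number and its two arguments (record number, file name) are hypothetical. It trims blanks and tabs before comparing, compares as strings, and stops if the file ends early.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are illustrative):
#   grab_by_number NUMBER FILE
grab_by_number() {
  awk -v want="$1" '
    # Copy the line and trim leading/trailing blanks and tabs.
    { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }

    # Appending "" forces a string comparison, so 1.0 does not match 1.
    (line "") == (want "") {
      if ((getline text) <= 0) exit       # no second line: stop
      if ((getline amounts) <= 0) exit    # no third line: stop
      gsub(/^[ \t]+|[ \t]+$/, "", text)
      split(amounts, f, " ")              # split on runs of whitespace
      sub(/^Usd/, "", f[1])               # strip the Usd prefix
      print line, text, f[1]
    }
  ' "$2"
}
```

With the sample text saved in file, grab_by_number 1 file should select the same record as the commands above, and passing 2 selects the second record instead.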

Solution 2:[2]

Pure Bash:

#! /bin/bash

exec <<EOF
*** *
more text with spaces and  tabs                                                             
*****
1
Something here and else, 2000 edf, 60 pop
    Usd324.32           2 Usd534.22
2
21st New tetx that will like to select with pattern, 334 pop
    Usd162.14

*** *
more text with spaces and tabs, unicode
*****
EOF

while read -r line1; do
  if [[ $line1 =~ ^1$ ]]; then
    read -r line2
    read -r line3col1 dontcare
    printf '%s %s %s\n' "$line1" "$line2" "${line3col1#Usd}"
  fi
done
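
The question asks for the file name to be passed as an argument, while the script above reads from a here-document. A minimal sketch of the same loop reading from a file instead could look like this; the function name grab_from_file is hypothetical, and the loop body is otherwise unchanged.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): grab_from_file FILE
# Same read loop as above, with stdin redirected from the named file.
grab_from_file() {
  local line1 line2 line3col1 dontcare
  while read -r line1; do
    # read with the default IFS strips leading and trailing blanks,
    # so an indented or padded 1 still matches here.
    if [[ $line1 =~ ^1$ ]]; then
      read -r line2
      read -r line3col1 dontcare
      printf '%s %s %s\n' "$line1" "$line2" "${line3col1#Usd}"
    fi
  done < "$1"
}
```

In a standalone script, the last line would typically be a call such as grab_from_file "$1". A file with Windows line endings would need the carriage returns removed first, since a trailing \r keeps ^1$ from matching.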

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    anubhava
Solution 2    ceving