Extracting multiple rows from file as per header

I have a text file in the following format. The # comments below are annotations I have added for explanation.

REM XML :  
SET DATAFORMAT DELIMITED
SET SEPARATOR ;   # -------This is used as delimiter
SET THOUSAND ,
SET MR

REM ************************************************* # Part 1
REM Tr
CMD INSERT TABLE_NAME1                                # --- Table name
ATT COLN1 COLN2 COLN3 COLN4 CODE COLN6 COLN7 COLN8    # --- Column names
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;  # --- Data starts
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;  # --- Data ends

REM ************************************************* # --- Another Part
REM Tr
CMD INSERT TABLE_NAME2                                # --- Table Name
ATT COLN1 COLN2 COLN3 COLN4 COLN5 COLN6 CODE COLN8    # --- Column Name
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;  # --- Data Starts
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;  # --- Data Ends

I had written a shell script that parsed Part 1 of the file using awk. The parsing had the following steps:

  1. Get the delimiter from the 'SEPARATOR' line.
  2. Get the field number of the CODE column from the ATT record.
  3. Extract the corresponding field from the DAT records.

To do this, I did...

# Get the line that sets the separator, e.g. "SET SEPARATOR ;".
SEPARATOR=$(awk '/SEPARATOR/{print $0}' "$1")
# Store the field separator in a variable.
SEP=$(echo "$SEPARATOR" | awk '{print $3}')
# Get the line that contains 'CODE' in the file (awk matching is case-sensitive).
CODE=$(awk '/CODE/{print $0}' "$1")
# Find the field number that contains the string 'CODE' in the record.
# Note: the leading ATT keyword is counted as field 1 here.
SEARCH_STRING=CODE
FIELD_NUMBER=$(echo "$CODE" | awk -v b="$SEARCH_STRING" '{for (i=1;i<=NF;i++) { if ($i == b) { print i } }}')
while IFS= read -r line || [[ -n "$line" ]]; do
    : # extract each field (placeholder so that the loop body is not empty)
done < "$1"
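For comparison, the three steps listed above could probably be collapsed into a single awk pass. The following is only a sketch: the function name `extract_code` is hypothetical, and it assumes the separator is a single literal character and that the ATT column names are separated by whitespace, as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: print the CODE field of every DAT record in one awk pass.
# Assumes a single-character separator and whitespace-separated ATT names.
extract_code() {
    awk '
        # Step 1: remember the delimiter from the "SET SEPARATOR" line.
        $1 == "SET" && $2 == "SEPARATOR" { sep = $3; next }

        # Step 2: find the position of CODE among the column names.
        # Field 1 is the ATT keyword itself, so the column index is i - 1.
        $1 == "ATT" {
            col = 0
            for (i = 2; i <= NF; i++) if ($i == "CODE") col = i - 1
            next
        }

        # Step 3: strip the DAT keyword, split on the delimiter, print the field.
        $1 == "DAT" && col > 0 {
            rec = $0
            sub(/^DAT[ \t]+/, "", rec)
            n = split(rec, f, sep)
            if (col <= n) print f[col]
        }
    ' "$1"
}
```

Because the column index is recomputed at every ATT line, the same sketch would also cope with a file in which CODE sits in a different column in each part.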

Now the requirement has changed, and there will be multiple headers in the same file, as shown above. I have to keep track of the header as well. The final output will be 2 files, each with the header.

File1.txt

REM XML_Description :  
SET DATAFORMAT DELIMITED
SET SEPARATOR ;
SET THOUSAND ,
SET MR

REM *******************************************************************
REM Tr
CMD INSERT TABLE_NAME1
ATT COLN1 COLN2 COLN3 COLN4 CODE COLN6 COLN7 COLN8
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;

File2.txt

REM XML_Description :  
SET DATAFORMAT DELIMITED
SET SEPARATOR ;
SET THOUSAND ,
SET MR

REM *******************************************************************
REM Tr
CMD INSERT TABLE_NAME2
ATT COLN1 COLN2 COLN3 COLN4 COLN5 COLN6 CODE COLN8
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;

I cannot work out how to split the file and add the header to each part, so that I can pass these two files to the old parsing script.
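One possible way to do the split, sketched under assumptions rather than offered as a definitive answer: treat every line that begins with `REM ***` as the start of a new part, collect everything before the first such line as the common header, and write the header followed by the part into a numbered file. The function name `split_sections` and the output prefix argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a file into File1.txt, File2.txt, ... where each
# output starts with the common header (all lines before the first "REM ***").
# $1 = input file, $2 = output prefix (defaults to "File").
split_sections() {
    awk -v prefix="${2:-File}" '
        # A line of asterisks after REM starts a new part.
        /^REM [*][*][*]/ {
            if (out != "") close(out)
            out = prefix (++n) ".txt"
            printf "%s", header > out      # repeat the common header first
        }

        # No part seen yet: the line still belongs to the common header.
        out == "" { header = header $0 "\n"; next }

        # Otherwise the line belongs to the current part.
        { print > out }
    ' "$1"
}
```

After that, the existing script could be run on each piece, for example `split_sections input.txt && for f in File*.txt; do ./old_script.sh "$f"; done`, where `input.txt` and `old_script.sh` are placeholder names.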



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
