Extracting multiple rows from a file according to their header
I have a text file in the following format. The # comments are annotations I added for this question; they are not part of the real file.
REM XML :
SET DATAFORMAT DELIMITED
SET SEPARATOR ; # -------This is used as delimiter
SET THOUSAND ,
SET MR
REM ************************************************* # Part 1
REM Tr
CMD INSERT TABLE_NAME1 # --- Table name
ATT COLN1 COLN2 COLN3 COLN4 CODE COLN6 COLN7 COLN8 # --- Column names
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2; # --- Data starts
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2; # --- Data ends
REM ************************************************* # --- Another Part
REM Tr
CMD INSERT TABLE_NAME2 # --- Table Name
ATT COLN1 COLN2 COLN3 COLN4 COLN5 COLN6 CODE COLN8 # --- Column Name
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2; # --- Data Starts
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2; # --- Data Ends
I had written a shell script that parses Part 1 of the file using awk. The parsing has the following steps:
- Get the delimiter from the 'SEPARATOR' line.
- Get the field number of the 'CODE' column from the 'ATT' record.
- Extract the corresponding field from the 'DAT' records.
To do this, I wrote:
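# Store the field separator (third whitespace-separated field of the "SET SEPARATOR" line) in a variable.
SEP=$(awk '$1 == "SET" && $2 == "SEPARATOR" {print $3; exit}' "$1")
# Get the ATT line that holds the column names (the file uses upper-case CODE, so /code/ never matches).
SEARCH_STRING=CODE
ATT_LINE=$(awk '$1 == "ATT" {print; exit}' "$1")
# Find the field number that contains CODE in the ATT record ("ATT" itself is field 1).
FIELD_NUMBER=$(echo "$ATT_LINE" | awk -v b="$SEARCH_STRING" '{for (i = 1; i <= NF; i++) if ($i == b) print i}')
while IFS= read -r line || [[ -n "$line" ]]; do
    # Only DAT records carry data; skip everything else.
    [[ $line == "DAT "* ]] || continue
    # Drop the "DAT " prefix; data column N is ATT field N+1, so print field FIELD_NUMBER-1.
    echo "${line#DAT }" | awk -F "$SEP" -v f="$((FIELD_NUMBER - 1))" '{print $f}'
done < "$1"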
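The same three steps can also be done in a single awk pass. Because it re-reads the CODE position at every 'ATT' line, it would also cope with a file containing several parts. This is a minimal sketch, not the original script: the function name extract_code is hypothetical, and it assumes the layout shown above (a "SET SEPARATOR x" line, then parts that each have an 'ATT' column list followed by 'DAT' records).

```shell
#!/bin/bash
# Hypothetical helper: print the CODE value of every DAT record, part by part.
# Assumes the layout shown above (SET SEPARATOR line, ATT header, DAT records).
extract_code() {
    awk '
        # Remember the separator from the "SET SEPARATOR" line.
        $1 == "SET" && $2 == "SEPARATOR" { sep = $3; next }
        # On each ATT line, find which data column is CODE.
        # ATT itself is field 1, so data column n is field n+1 here.
        $1 == "ATT" {
            col = 0
            for (i = 2; i <= NF; i++) if ($i == "CODE") col = i - 1
            next
        }
        # DAT records: drop the "DAT " prefix, split on the separator, print CODE.
        $1 == "DAT" && col > 0 {
            rec = $0
            sub(/^DAT[ \t]+/, "", rec)
            split(rec, f, sep)
            print f[col]
        }
    ' "$1"
}
```

Calling extract_code input.txt prints one CODE value per 'DAT' record; records in a part whose 'ATT' line has no CODE column are skipped.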
Now the requirement has changed: the same file can contain multiple parts, each with its own table header, as shown above. I also have to keep the common header (the REM/SET lines at the top) with every part. The final output should be two files, each starting with that header:
File1.txt
REM XML_Description :
SET DATAFORMAT DELIMITED
SET SEPARATOR ;
SET THOUSAND ,
SET MR
REM *******************************************************************
REM Tr
CMD INSERT TABLE_NAME1
ATT COLN1 COLN2 COLN3 COLN4 CODE COLN6 COLN7 COLN8
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
File2.txt
REM XML_Description :
SET DATAFORMAT DELIMITED
SET SEPARATOR ;
SET THOUSAND ,
SET MR
REM *******************************************************************
REM Tr
CMD INSERT TABLE_NAME2
ATT COLN1 COLN2 COLN3 COLN4 COLN5 COLN6 CODE COLN8
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
DAT DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;DATA1;DATA2;
I can't figure out how to split the file and prepend the header to each part so that I can pass each resulting file to the old parsing script.
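One possible approach, sketched under assumptions: treat every line before the first 'REM *...' line as the common header, start a new output file at each 'REM *...' line, and write the saved header into it before the part's own lines. The function name split_parts is hypothetical, the FileN.txt names follow the expected output above, the files are written to the current directory, and the header is copied exactly as it appears in the input.

```shell
#!/bin/bash
# Hypothetical helper: split a multi-part file into File1.txt, File2.txt, ...
# Assumption: every part begins with a line starting "REM *"; everything
# before the first such line is the shared header copied into each file.
split_parts() {
    awk '
        # A "REM *..." line starts a new part: close the previous file,
        # open the next one and write the saved header into it first.
        /^REM \*/ {
            if (out) close(out)
            out = "File" (++n) ".txt"
            for (i = 1; i <= h; i++) print hdr[i] > out
        }
        # Lines before the first part are the shared header; keep them.
        !out { hdr[++h] = $0; next }
        # Every line of the current part (including its REM line) goes to its file.
        { print > out }
    ' "$1"
}
```

After split_parts input.txt, each FileN.txt has the same shape as the single-part input the old script expects, so the files could be passed to it one at a time in a loop over File*.txt.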
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow