'python regex to read text file and split row to column

My text file format is like below

ID  col_A   col_B   col_C
1   0.26    0.11    0.18
2   0.27    0.12    0.17
3   0.21    0.10    0.15
----------------------------
AVG 0.25    0.11    0.17
----------------------------
ID  col_D   col_E   col_F
1   0.23    0.18    0.20
2   0.24    0.14    0.17
3   0.23    0.10    0.13
----------------------------
AVG 0.23    0.14    0.17
----------------------------

I'm attempting to use python and regex to export two separate csv files with the format like below

Table 1

ID	col_A	col_B	col_C	col_D	col_E	col_F
1	0.26	0.11	0.18	0.23	0.18	0.20
2	0.27	0.12	0.17	0.24	0.14	0.17
3	0.21	0.10	0.15	0.23	0.10	0.13

Table 2

	col_A	col_B	col_C	col_D	col_E	col_F
AVG	0.25	0.11	0.17	0.23	0.14	0.17

Here's my code:

import re
import pandas as pd


with open('test.txt') as file:
        lines = file.readlines()
        regex = r'\A(?P<ID>\S+)\s*(?P<COL_A>\S+)\s*(?P<COL_B>\S+)\s*(?P<COL_C>\S+)'
        data = []
        
        for line in lines:
            m = re.search(regex, line)
            if m != None:
                data.append([m.group(1),m.group(2),m.group(3),m.group(4)])
        
        df = pd.DataFrame(data)
        
df.to_csv('test.csv', index = False)

My code would result in a strange format like

0	1	2	3
ID	col_A	col_B	col_C
1	0.26	0.11	0.18
2	0.27	0.12	0.17
3	0.21	0.10	0.15
------	---------	---------	---------
AVG	0.25	0.11	0.17
------	---------	---------	---------
ID	col_D	col_E	col_F
1	0.23	0.18	0.20
2	0.24	0.14	0.17
3	0.23	0.10	0.13
------	---------	---------	---------
AVG	0.23	0.14	0.17
------	---------	---------	---------

How can I modify my code to achieve my request? Thank you!

Solution 1:^[1]

Something like this should work for you:

import re
import pandas as pd
import numpy as np

data = []
headers = []
averages = []
ids = []
with open('/content/text.txt') as file:
  for l in file.readlines():
      s = re.findall(r'\S+', l.rstrip().split(',')[0])
      if '-' not in s[0]:
        if 'ID' in s[0]: headers.append(s)
        elif 'AVG' in s[0]: averages.append(s)
        else: 
          data.append(s[1:])
          ids.append(s[0])

data = np.hstack([data[:len(data)//len(headers)], data[len(data)//len(headers):]])
data = np.concatenate([np.expand_dims(np.transpose(sorted(set(ids))), axis=-1), data], axis=1)
columns = sorted(set(np.concatenate(headers)))
df = pd.DataFrame(data, columns=columns).astype({'ID': 'int64'})
main_df = df.set_index('ID')

Table 1:

   col_A col_B col_C col_D col_E col_F
ID                                    
1   0.26  0.11  0.18  0.23  0.18  0.20
2   0.27  0.12  0.17  0.24  0.14  0.17
3   0.21  0.10  0.15  0.23  0.10  0.13

Table 2:

average_df = pd.DataFrame(np.expand_dims(list(map(float, [d for d in np.concatenate(averages) if 'AVG' not in d])), axis=0), columns=columns[1:])
average_df = average_df.rename_axis('AVG')

     col_A  col_B  col_C  col_D  col_E  col_F
AVG                                          
0     0.25   0.11   0.17   0.23   0.14   0.17

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'python regex to read text file and split row to column

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]