Replace numbers from fileinput, maintaining old file formatting and ignoring commas

Hi, I have a config file from another program that I want to manipulate with Python. The only things that need to be replaced are the numbers in the file. The file itself contains tabs, whitespace, and commas. I was able to read the file with pandas, keeping only the relevant information. Now I want to replace the numbers inside the file with the numbers from my pandas dataframe. This is my code so far:

import pandas as pd
import fileinput

params = pd.read_csv("params.txt", skip_blank_lines=True, delim_whitespace=True, skiprows=1, header=None, names=["paramname", "equal", "param1", "param2"])
#params.columns = params.columns.str.strip()
params.drop(params.columns[1], axis=1, inplace=True)
params.replace(',','', regex=True, inplace=True)
params.replace('\t','', regex=True, inplace=True)
params.replace(' ','', regex=True, inplace=True)

it = 0
rowidx = 0
colidx = "param1"


with fileinput.FileInput("params.txt", inplace=True, backup='.bak') as file:
    for line in file:
        if (it % 2 == 0):
            colidx = "param1"
        else:
            colidx = "param2"
        print(' '.join([params.loc[rowidx, colidx] if x.lstrip('-,').isnumeric() else x for x in line.split()]))
        if (it % 2):
            rowidx += 1
        it += 1

Most things are working as I intended, but I still have some issues that I can't find a solution for. I was able to also replace negative numbers with the lstrip function, but there are also some numbers that have a comma after the digit, and lstrip('-,') doesn't seem to work for that. Another problem is that although the numbers are replaced correctly, the rewrite of the file destroys the formatting of the original file (tabs are being removed).

So my questions are:

  1. How can I replace numbers that are followed by a comma without replacing the comma itself?
  2. How do I maintain the original formatting of the input file (tabs etc.)?
  3. How do I take float numbers like 0.5 into account?

This is what the file looks like:

Parameter File

QQ  =   7,  6
RR  =   5,  5
SS  =   0,  0

ay_on   =   0,  0
by_on   =   0,  1
cvc_on  =   1
mvc_on  =   1
rc      =   0

adus    =   -200,   -200
bdus    =   200,    200
au      =   -5, -2
bu      =   5,  2
ay      =   7,  0
by      =   10, 0.5
con_Ayl =
con_byl =

Hp      =   100
Hu      =   20
Hw      =   1
Tsamp   =   100

model   =   1

soft    =   3
Qfac    =   5
sgm_r   =   0.01,   0.01
Tobs    =   -1

mpc_on  =   1

And this is what the dataframe looks like:

   paramname param1  param2
0         QQ      7    6.00
1         RR      5    5.00
2         SS      0    0.00
3      ay_on      0    0.00
4      by_on      0    1.00
5     cvc_on      1     NaN
6     mvc_on      1     NaN
7         rc      0     NaN
8       adus   -200 -200.00
9       bdus    200  200.00
10        au     -5   -2.00
11        bu      5    2.00
12        ay      7    0.00
13        by     10    0.50
14   con_Ayl    NaN     NaN
15   con_byl    NaN     NaN
16        Hp    100     NaN
17        Hu     20     NaN
18        Hw      1     NaN
19     Tsamp    100     NaN
20     model      1     NaN
21      soft      3     NaN
22      Qfac      5     NaN
23     sgm_r   0.01    0.01
24      Tobs     -1     NaN
25    mpc_on      1     NaN

EDIT: I solved the problem with the comma and floats with:

print(' '.join([
    params.loc[rowidx, colidx]
    if (x.lstrip('-').isnumeric() or isfloat(x.lstrip('-')))
    else params.loc[rowidx, colidx] + ','
    if x.endswith(',')
    else x
    for x in line.split()
]))

But the problem that all the tabs are removed from the file is still there. How can I prevent this?
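One way to keep the original tabs is to avoid split()/join() entirely and substitute only the numeric substrings in each line with re.sub; everything around the matches (tabs, spaces, commas) is left untouched. A minimal sketch, assuming the replacement values are supplied left to right by an iterator (the regex and helper name are illustrative, not from the original post):

```python
import re

# Optionally signed integer or decimal: 7, -200 and 0.5 all match; "," does not.
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

def replace_numbers(line, new_values):
    """Replace each number in `line` with the next item from `new_values`,
    keeping all surrounding tabs, spaces and commas exactly as they were."""
    return NUMBER.sub(lambda m: str(next(new_values)), line)

print(replace_numbers("QQ\t=\t7,\t6", iter([9, 8])))
```

Because the pattern already covers negative numbers and floats like 0.5, this also addresses questions 1 and 3 in one place.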



Solution 1:[1]

I prefer to parse the file and then generate the records for pd.DataFrame.

import pandas as pd
import numpy as np


def parse_file_to_records(file_name):
    records = []
    with open(file_name, 'r') as f:
        lines = f.readlines()
        # skip the first line
        for line in lines[1:]:
            line = line.strip().replace(' ', '')
            if not line:
                continue
            paramname, paramvalue = line.split('=')
            param2 = np.nan
            try:
                param1, param2 = paramvalue.split(',')
            except ValueError:
                param1 = paramvalue or np.nan
            records.append({
                'paramname': paramname,
                'param1': param1,
                'param2': param2,
            })
    return records


df = pd.DataFrame.from_records(parse_file_to_records('params.txt'))
print(df)

and the result looks like this:

   paramname param1 param2
0         QQ      7      6
1         RR      5      5
2         SS      0      0
3      ay_on      0      0
4      by_on      0      1
5     cvc_on      1    NaN
6     mvc_on      1    NaN
7         rc      0    NaN
8       adus   -200   -200
9       bdus    200    200
10        au     -5     -2
11        bu      5      2
12        ay      7      0
13        by     10    0.5
14   con_Ayl    NaN    NaN
15   con_byl    NaN    NaN
16        Hp    100    NaN
17        Hu     20    NaN
18        Hw      1    NaN
19     Tsamp    100    NaN
20     model      1    NaN
21      soft      3    NaN
22      Qfac      5    NaN
23     sgm_r   0.01   0.01
24      Tobs     -1    NaN
25    mpc_on      1    NaN
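Note that the param columns of this dataframe still hold strings; if numeric dtypes are needed later, they can be converted with pd.to_numeric. A small follow-up sketch, not part of the original answer, using an illustrative subset of the parsed records:

```python
import pandas as pd
import numpy as np

# Illustrative subset of the parsed records; values are strings, as the parser produces them.
df = pd.DataFrame({'paramname': ['QQ', 'cvc_on', 'sgm_r'],
                   'param1': ['7', '1', '0.01'],
                   'param2': ['6', np.nan, '0.01']})

for col in ('param1', 'param2'):
    # errors='coerce' turns NaN/unparseable entries into NaN instead of raising
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```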

If you want to check whether a string is a valid float, you can write a simple function like this:

def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False


print(isfloat('1.23'))
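This helper also covers question 1 once a trailing comma is stripped before the check; a small illustrative wrapper (the function name is made up):

```python
def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False

def is_number_token(tok):
    # "7," or "-200," should count as a number; only a trailing comma is stripped
    return isfloat(tok.rstrip(','))

print(is_number_token('7,'))   # True
print(is_number_token('-0.5')) # True
print(is_number_token('QQ'))   # False
```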

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
