'combining column with gaps with data from anothert csv file

I have exchange data for my thesis on 1 min interval. Now the data has the following columns: time_period_start;time_period_end;time_open;time_close;rate_open;rate_high;rate_low;rate_close;volume. I need to create one masterfile. I have imported all of the dates by ISO8601 standard. But the problem that im facing is that there are gaps in the data. For example for some exchanges there is not data for a certain minute(no orders). Following a screenshot to show it better:

raw data

How it should look after: afterwards

As you can see the raw data has gaps. And these gaps need to be filled likeabove stated.

Now i have a file with all the dates already there. So my idea was to basically scan the column and compare it to the other column, from the file with the gaps, and if the dates align, insert the closing price in the cel. But im really at a loss on how to do this. I have limited experience with data handling and beginner level in python.

If someone could help me or give any tips on how to tackle the problem, it would be greatly appreciated.

EDIT: as requested tables instead of pictures.

Table before:
time   price   volume
0      10      50
2      15      35
5      13      42

table after:
time   price volume
0      10    50
1      10    0
2      15    35
3      15    0
4      15    0
5      13    42

The table before is roughly a sketch on how the data is now. The table after is how it needs to be. Basically adding the missing rows. Then adding the price to the missing values in price column which needs to be the value of last row that had data. And lastly adding 0 for all non existing values in volume column

I have a way of determining what the missing rows are like this:

df = pd.read_csv(datafile, delimiter=';') # file with data
dm = pd.read_csv(masterfile, delimiter=';') # file with correct time column
x = dm[~dm.time_period_end.isin(df.time_period_end)]
print(x)

Solved:

df = pd.read_csv(datafile, delimiter=';') # file with data
dm = pd.read_csv(masterfile, delimiter=';') # file with correct time column
x = dm[~dm.time_period_end.isin(df.time_period_end)]
df = df.append(x)
df = df.sort_values(by=column1, ascending=True)
df['volume_traded'] = df['volume_traded'].fillna(0)
df['trades_count'] = df['trades_count'].fillna(0)
df['price_close'] = df['price_close'].fillna(method='ffill')
df = df.drop(columns=['time_period_start', 'time_open', 'time_close', 'price_open', 'price_high', 'price_low'])

df.to_csv('combined.csv', index=False, sep=';')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'combining column with gaps with data from anothert csv file

Sources

Related Questions