Reading a data frame with missing values
I am trying to read a data frame with a few columns and a few rows, where some rows are missing values. The fields are sometimes separated by an uneven number of spaces. For example, the data looks like this:
0.5 0.03
0.1 0.2 0.3 2
0.2 0.1 0.1 0.3
0.5 0.03
0.1 0.2 0.3 2
Is there any way to extract only the complete rows, like this:
0.1 0.2 0.3 2
0.2 0.1 0.1 0.3
0.1 0.2 0.3 2
Any suggestions?
Thanks.
Solution 1:[1]
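You can parse the file manually:
import re
import pandas as pd

with open('data.txt') as fp:
    # Split each line on runs of whitespace; short rows are padded with None, which dropna removes
    df = pd.DataFrame([re.split(r'\s+', l.strip()) for l in fp]).dropna(axis=0)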
Output:
>>> df
0 1 2 3
1 0.1 0.2 0.3 2
2 0.2 0.1 0.1 0.3
4 0.1 0.2 0.3 2
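Note that the split-based approach leaves every cell as a string rather than a number. Below is a minimal sketch of converting the surviving rows to floats. The inline `raw` string is an assumption that stands in for the contents of `data.txt`, and the `astype(float)` step is an addition, not part of the original answer:

```python
import io
import re

import pandas as pd

# Assumed stand-in for data.txt, with uneven spacing between fields
raw = """0.5 0.03
0.1   0.2 0.3 2
0.2 0.1  0.1 0.3
0.5 0.03
0.1 0.2 0.3   2
"""

# Split every non-empty line on runs of whitespace
rows = [re.split(r"\s+", line.strip()) for line in io.StringIO(raw) if line.strip()]

# Short rows are padded with missing values; drop them, then convert strings to floats
df = pd.DataFrame(rows).dropna(axis=0).astype(float)
print(df)
```

The original row labels (1, 2 and 4) are kept; calling `reset_index(drop=True)` on the result would renumber them from 0 if that is preferred.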
Solution 2:[2]
You can try this:
import pandas as pd
import numpy as np
df = {
    'col1': [0.5, 0.1, 0.2, 0.5, 0.1],
    'col2': [0.03, 0.2, 0.1, 0.03, 0.2],
    'col3': [np.nan, 0.3, 0.1, np.nan, 0.3],
    'col4': [np.nan, 2, 0.3, np.nan, 2]
}
data = pd.DataFrame(df)
print(data.dropna(axis=0))
Output:
col1 col2 col3 col4
1 0.1 0.2 0.3 2.0
2 0.2 0.1 0.1 0.3
4 0.1 0.2 0.3 2.0
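If the data lives in a text file rather than a hard-coded dictionary, `pandas.read_csv` can probably build the same frame directly. This is a sketch that assumes no row has more than four fields. The column names and the inline `raw` string (standing in for the file) are illustrative and not taken from the question:

```python
import io

import pandas as pd

# Assumed stand-in for the input file, with uneven spacing between fields
raw = """0.5 0.03
0.1   0.2 0.3 2
0.2 0.1  0.1 0.3
0.5 0.03
0.1 0.2 0.3   2
"""

# sep=r"\s+" treats any run of whitespace as one delimiter.
# Supplying four names lets rows with fewer fields be padded with NaN.
data = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    header=None,
    names=["col1", "col2", "col3", "col4"],
)

# Keep only the rows where every column has a value
clean = data.dropna(axis=0)
print(clean)
```

For a real file, the `io.StringIO(raw)` argument would be replaced by the file path.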
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 |
