'pandas - fill in default values for missing records
I would like to fill in missing data in my dataset in the following way:
My data can contain one of the following shapes:
['circle', 'square', 'triangle']
Each shape has width and height attributes,
I would like to fill in missing data: in case my dataset is missing one of the shapes, it will add it with default values
(default values for the example: 'height' = 10, , 'width'= 5)
example: if this is my dataset:
data = [
{'shape': 'circle', 'height': 5, 'width': 4},
{'shape': 'circle', 'height': 2, 'width': 3},
{'shape': 'square', 'height': 6, , 'width': 5}
]
missing: triangle, fill it with default width & height,
result will be:
data = [
{'shape': 'circle', 'height': 5, 'width': 4},
{'shape': 'circle', 'height': 2, 'width': 3},
{'shape': 'square', 'height': 6, 'width': 2},
{'shape': 'triangle', 'height': 10, 'width': 5} # added by code with default values
]
my code:
shapes = ['circle', 'square', 'triangle']
df = pd.DataFrame(data)
# df.?? some code to fill in missing data
Solution 1:[1]
I would add the entries with missing values to the data before making the dataframe, so that it already has the right number of rows. Then you can fill in the missing values by passing a dictionary with the default values to df.fillna():
import pandas as pd
data = [
{'shape': 'circle', 'height': 5, 'width': 4},
{'shape': 'circle', 'height': 2, 'width': 3},
{'shape': 'square', 'height': 6, 'width': 5}
]
shapes = ['circle', 'square', 'triangle']
shapes_present = {d['shape'] for d in data}
shapes_missing = set(shapes) - shapes_present
for shape in shapes_missing:
data.append({'shape': shape})
df = pd.DataFrame(data)
df.fillna({'height': 10, 'width': 5}, inplace=True)
df
shape height width
0 circle 5.0 4.0
1 circle 2.0 3.0
2 square 6.0 5.0
3 triangle 10.0 5.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
