How to unify units?
I am new to Python and need some help with the following question:
I have a DataFrame like this:
df:
| index | height | unit |
|---|---|---|
| 0 | 181.5 | cm |
| 1 | 72.5 | inches |
| 2 | 168.0 | cm |
| 3 | NaN | NaN |
| .. | .. | .. |
... (2000+ rows)
import pandas as pd
df = pd.DataFrame(data=[[181.5, 'cm'],
                        [72.5, 'inches'],
                        [168.0, 'cm'],
                        ['NaN', 'NaN']],
                  columns=['height', 'unit'],
                  index=[1, 2, 3, 4])
I want to convert every unit to "cm", adjust the heights accordingly, and keep the NaN rows as they are.
Solution 1:[1]
Use a dictionary to map conversion factors and use indexing to update the values/units:
import numpy as np
# ensure real NaNs:
df = df.replace('NaN', np.nan)
# set up dictionary of conversion factors
d = {'cm': 1, 'inches': 2.54}
# map converted heights
df['height'] = df['height'].mul(df['unit'].map(d))
# update units
df.loc[df['unit'].isin(d), 'unit'] = 'cm'
Output:
height unit
1 181.50 cm
2 184.15 cm
3 168.00 cm
4 NaN NaN
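For reference, the steps above can be put together as one self-contained sketch. It assumes the missing values arrive as the string 'NaN', as in the question's sample frame, and it swaps `replace` for `pd.to_numeric(..., errors="coerce")` so that the height column is guaranteed to end up as floats. That substitution is my own choice, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question; missing values are the string 'NaN'.
df = pd.DataFrame(
    data=[[181.5, "cm"], [72.5, "inches"], [168.0, "cm"], ["NaN", "NaN"]],
    columns=["height", "unit"],
    index=[1, 2, 3, 4],
)

# Coerce heights to floats; the string 'NaN' becomes a real NaN.
df["height"] = pd.to_numeric(df["height"], errors="coerce")
# Turn the placeholder string in the unit column into a missing value too.
df["unit"] = df["unit"].where(df["unit"] != "NaN")

# Conversion factors to centimetres.
d = {"cm": 1, "inches": 2.54}

# Units absent from d (including NaN) map to NaN, so those heights stay NaN.
df["height"] = df["height"] * df["unit"].map(d)
# Relabel only the rows whose unit was found in the dictionary.
df.loc[df["unit"].isin(list(d)), "unit"] = "cm"
```

The NaN row survives because `map(d)` yields NaN for a missing unit and NaN times NaN is still NaN.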
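Handling unknown units
If some rows have units that are not in the dictionary and you want to leave their heights unchanged, use map(lambda x: d.get(x, 1)) instead of map(d). With plain map(d), an unknown unit maps to NaN, so the corresponding height becomes NaN.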
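A minimal sketch of that fallback, assuming a hypothetical 'feet' row stands in for a unit that is missing from the dictionary (the sample values here are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'feet' is not a key in the conversion dictionary.
df = pd.DataFrame(
    {
        "height": [181.5, 72.5, 5.9, np.nan],
        "unit": ["cm", "inches", "feet", np.nan],
    },
    index=[1, 2, 3, 4],
)

d = {"cm": 1, "inches": 2.54}

# Unknown or missing units fall back to a factor of 1,
# so their heights pass through unchanged.
factors = df["unit"].map(lambda x: d.get(x, 1))
df["height"] = df["height"].mul(factors)

# Only rows whose unit is in the dictionary are relabelled as 'cm'.
df.loc[df["unit"].isin(list(d)), "unit"] = "cm"
```

The 'feet' row keeps both its height and its unit label, which makes unconverted rows easy to find afterwards.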
Solution 2:[2]
Adapted from an existing solution, using a boolean mask to select the rows measured in inches:
mask = (df['unit'] == 'inches')
df_inches = df[mask]
df.loc[mask, 'height'] = df_inches['height'] * 2.54
df.loc[mask, 'unit'] = 'cm'
print(df)
Output:
height unit
1 181.5 cm
2 184.15 cm
3 168.0 cm
4 NaN NaN
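The same mask idea can be sketched without the intermediate `df_inches` frame by multiplying in place. This version assumes the missing values are already real NaNs rather than the string 'NaN', so the height column is numeric:

```python
import numpy as np
import pandas as pd

# Sample frame with real NaNs (an assumption; the question stores 'NaN' strings).
df = pd.DataFrame(
    {
        "height": [181.5, 72.5, 168.0, np.nan],
        "unit": ["cm", "inches", "cm", np.nan],
    },
    index=[1, 2, 3, 4],
)

# Comparing a missing unit with 'inches' gives False, so NaN rows are skipped.
mask = df["unit"] == "inches"

# Convert only the masked rows, then relabel them.
df.loc[mask, "height"] *= 2.54
df.loc[mask, "unit"] = "cm"
```

With only one non-cm unit this is probably the shortest route. Once several units are involved, the dictionary approach of Solution 1 tends to scale better.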
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | white |
