'Normalizing rows of pandas DF when there's string columns?
I'm trying to normalize a Pandas DF by row and there's a column which has string values which is causing me a lot of trouble. Anyone have a neat way to make this work?
For example:
system Fluency Terminology No-error Accuracy Locale convention Other
19 hyp.metricsystem2 111 28 219 98 0 133
18 hyp.metricsystem1 97 22 242 84 0 137
22 hyp.metricsystem5 107 11 246 85 0 127
17 hyp.eTranslation 49 30 262 80 0 143
20 hyp.metricsystem3 86 23 263 89 0 118
21 hyp.metricsystem4 74 17 274 70 0 111
I am trying to normalize each row from Fluency, Terminology, etc. Other over the total. In other words, divide each integer column entry over the total of each row (Fluency[0]/total_row[0], Terminology[0]/total_row[0], ...)
I tried using this command, but it's giving me an error because I have a column of strings
bad_models.div(bad_models.sum(axis=1), axis = 0)
Any help would be greatly appreciated...
Solution 1:[1]
Use select_dtypes to select numeric only columns:
subset = bad_models.select_dtypes('number')
bad_models[subset.columns] = subset.div(subset.sum(axis=1), axis=0)
print(bad_models)
# Output
system Fluency Terminology No-error Accuracy Locale convention Other
19 hyp.metricsystem2 0.211832 0.21374 0.145418 0.193676 0 0.172952
18 hyp.metricsystem1 0.185115 0.167939 0.160691 0.166008 0 0.178153
22 hyp.metricsystem5 0.204198 0.083969 0.163347 0.167984 0 0.16515
17 hyp.eTranslation 0.093511 0.229008 0.173971 0.158103 0 0.185956
20 hyp.metricsystem3 0.164122 0.175573 0.174635 0.175889 0 0.153446
21 hyp.metricsystem4 0.141221 0.129771 0.181939 0.13834 0 0.144343
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
