'Count number of similar columns
I have a data frame which looks like this:
Code (index column) sensor1 sensor2 sensor3 ... sensorn
111 21% 3% 20% 5%
121 23% 80% 19% ... 4%
131 50% 1% 53% ... 4%
........................................
1n1 1% 2% 2% ... 5%
And would like first to round numbers (e.g. to 4% would be 0, from 5% to 14% would be 10%, 15% to 24% would be 20%, etc.). So my dataframe, after this step would look like this:
Code (index column) sensor1 sensor2 sensor3 ... sensorn
111 20% 0% 20% 10%
121 20% 80% 20% ... 0%
131 50% 0% 50% ... 0%
........................................
1n1 0% 0% 0% ... 10%
Next I would like to keep distinct columns and add new row that would indicate the number of same columns:
Code (index column) sensor1 sensor2 ... sensorn
111 20% 0% 10%
121 20% 80% ... 0%
131 50% 0% ... 0%
...................................
1n1 0% 0% ... 10%
-------------------------------------------------------------------
count 2 1 1
In this case, the sensors' numbers in column names don't represent anything meaningful. I rounded the numbers with the function round(), but I do not know how to proceed. Can someone help me? Btw my data frame is really big.
Solution 1:[1]
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
{
"sensor1": ["20%", "20%", "50%"],
"sensor2": ["0%", "80", "0%"],
"sensor3": ["20%", "20%", "50%"],
"sensor4": ["20%", "20%", "50%"],
}
)
print(df)
# Output
sensor1 sensor2 sensor3 sensor4
0 20% 0% 20% 20%
1 20% 80 20% 20%
2 50% 0% 50% 50%
Here is one way to do it:
columns = df.columns.to_list()
counts = [0] * len(columns)
# Count identical columns
for i, col in enumerate(columns):
for other_col in columns:
if col != other_col and df[col].eq(df[other_col]).all():
counts[i] += 1
# Add new row
df.loc["count", :] = counts
# Remove duplicated columns
df = df.T.drop_duplicates(subset=df.index[-1]).T
print(df)
# Output
sensor1 sensor2
0 20% 0%
1 20% 80
2 50% 0%
count 2 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laurent |
