'Count number of similar columns

I have a data frame which looks like this:

Code (index column)   sensor1      sensor2     sensor3   ...   sensorn
        111             21%           3%         20%              5%
        121             23%           80%        19%     ...      4%
        131             50%           1%         53%     ...      4%
                         ........................................
        1n1             1%            2%          2%     ...      5%

And would like first to round numbers (e.g. to 4% would be 0, from 5% to 14% would be 10%, 15% to 24% would be 20%, etc.). So my dataframe, after this step would look like this:

Code (index column)   sensor1      sensor2     sensor3   ...   sensorn
        111             20%           0%         20%              10%
        121             20%           80%        20%     ...      0%
        131             50%           0%         50%     ...      0%
                     ........................................
        1n1             0%            0%          0%     ...      10%

Next I would like to keep distinct columns and add new row that would indicate the number of same columns:

Code (index column)   sensor1      sensor2     ...   sensorn
        111             20%           0%                 10%
        121             20%           80%       ...      0%
        131             50%           0%        ...      0%
                     ...................................
        1n1             0%            0%        ...      10%
        -------------------------------------------------------------------
        count           2              1                  1

In this case, the sensors' numbers in column names don't represent anything meaningful. I rounded the numbers with the function round(), but I do not know how to proceed. Can someone help me? Btw my data frame is really big.



Solution 1:[1]

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "sensor1": ["20%", "20%", "50%"],
        "sensor2": ["0%", "80", "0%"],
        "sensor3": ["20%", "20%", "50%"],
        "sensor4": ["20%", "20%", "50%"],
    }
)
print(df)
# Output
  sensor1 sensor2 sensor3 sensor4
0     20%      0%     20%     20%
1     20%      80     20%     20%
2     50%      0%     50%     50%

Here is one way to do it:

columns = df.columns.to_list()
counts = [0] * len(columns)

# Count identical columns
for i, col in enumerate(columns):
    for other_col in columns:
        if col != other_col and df[col].eq(df[other_col]).all():
            counts[i] += 1

# Add new row
df.loc["count", :] = counts

# Remove duplicated columns
df = df.T.drop_duplicates(subset=df.index[-1]).T

print(df)
# Output
      sensor1 sensor2
0         20%      0%
1         20%      80
2         50%      0%
count       2       0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent