Delete all columns in which a value repeats consecutively more than 3 times
I have a DataFrame `df` that looks like this:
| date | stock1 | stock2 | stock3 | stock4 | stock5 | stock6 | stock7 | stock8 | stock9 | stock10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.9 |
| 11/20 | 0.1 | 0.9 | 0.3 | 0.4 | 0.3 | 0.5 | 0.3 | 0.2 | 0.4 | 0.1 |
| 12/20 | 0.1 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
| 10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.9 |
| 11/20 | 0.8 | 0.9 | 0.3 | 0.4 | 0.3 | 0.5 | 0.3 | 0.2 | 0.9 | 0.1 |
| 12/20 | 0.3 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
| 10/20 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.7 | 0.7 | 0.8 | 0.9 | 0.9 |
| 11/20 | 0.8 | 0.9 | 0.3 | 0.4 | 0.3 | 0.7 | 0.3 | 0.2 | 0.4 | 0.1 |
| 12/20 | 0.3 | 0.6 | 0.9 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.9 | 0.1 |
I want to delete all columns in which the same value repeats consecutively more than 3 times. In this example, the columns "stock1", "stock6" and "stock9" should be deleted. In the other columns, some values occur more than 3 times, but not one after the other. I think I can adapt the code from the question "Removing values that repeat more than 5 times in Pandas DataFrame", but I have not managed to do so yet.
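To make the example reproducible, here is a minimal sketch that rebuilds the table above and checks the expected result with plain Python. The helper name `longest_run` and the variable names are assumptions made for illustration; they are not part of the original question.

```python
from itertools import groupby

import pandas as pd

# Sample data copied from the table in the question.
rows = [
    ["10/20", 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9],
    ["11/20", 0.1, 0.9, 0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.4, 0.1],
    ["12/20", 0.1, 0.6, 0.9, 0.5, 0.6, 0.7, 0.8, 0.7, 0.9, 0.1],
    ["10/20", 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9],
    ["11/20", 0.8, 0.9, 0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.9, 0.1],
    ["12/20", 0.3, 0.6, 0.9, 0.5, 0.6, 0.7, 0.8, 0.7, 0.9, 0.1],
    ["10/20", 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.7, 0.8, 0.9, 0.9],
    ["11/20", 0.8, 0.9, 0.3, 0.4, 0.3, 0.7, 0.3, 0.2, 0.4, 0.1],
    ["12/20", 0.3, 0.6, 0.9, 0.5, 0.6, 0.7, 0.8, 0.7, 0.9, 0.1],
]
columns = ["date"] + [f"stock{i}" for i in range(1, 11)]
df = pd.DataFrame(rows, columns=columns)


def longest_run(values):
    """Length of the longest stretch of identical consecutive values."""
    return max(len(list(group)) for _, group in groupby(values))


# Longest consecutive run for every stock column (hypothetical helper above).
runs = {col: longest_run(df[col].tolist()) for col in columns[1:]}
to_drop = [col for col, length in runs.items() if length > 3]
print(to_drop)  # ['stock1', 'stock6', 'stock9']
```

The longest runs are 4 for "stock1" and "stock6" and 5 for "stock9"; every other column stays at 2 or below.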
Solution 1:[1]
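You may want to avoid `apply` here. Give each run of equal consecutive values its own id per column, count the rows in each run, and drop the columns whose longest run exceeds `N`:

    N = 3
    vals = df.drop(columns='date')
    # a new run id starts wherever a value differs from the one above it
    run_ids = vals.ne(vals.shift()).cumsum()
    # stack to (row, column), then count rows per run id within each column
    longest = (run_ids.stack()
                      .groupby(level=1)
                      .value_counts()
                      .groupby(level=0).max())
    df.drop(columns=longest.index[longest.gt(N)])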
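As a rough, self-contained sketch of the same run-labelling idea, the steps can be wrapped in a function and tried on a small frame. The function name `drop_long_runs`, its parameters `n` and `id_col`, and the `toy` data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def drop_long_runs(df, n=3, id_col="date"):
    """Drop columns whose longest run of equal consecutive values exceeds n.

    Hypothetical helper: id_col is excluded from the check and always kept.
    """
    vals = df.drop(columns=id_col)
    # A new run id starts wherever a value differs from the one above it.
    run_ids = vals.ne(vals.shift()).cumsum()
    # Stack to (row, column), count rows per run id, keep the max per column.
    longest = (
        run_ids.stack()
        .groupby(level=1)
        .value_counts()
        .groupby(level=0)
        .max()
    )
    return df.drop(columns=longest.index[longest.gt(n)])


# Toy data (an assumption, not taken from the question).
toy = pd.DataFrame(
    {
        "date": ["d1", "d2", "d3", "d4", "d5"],
        "a": [1, 1, 1, 1, 2],  # longest run: 4
        "b": [1, 2, 1, 2, 1],  # longest run: 1
        "c": [3, 3, 3, 4, 4],  # longest run: 3
    }
)

print(list(drop_long_runs(toy).columns))       # ['date', 'b', 'c']
print(list(drop_long_runs(toy, n=2).columns))  # ['date', 'b']
```

With the default `n=3`, column "c" survives because its run of three does not exceed the limit; lowering `n` to 2 removes it as well.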
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
