How to do a complex selection of rows in a Pandas dataframe in Python
I have a big df like the one below (only the first lines are shown; the real one has more than 60000k rows):
Id Name Age Friends
0 Will 33 385
1 Jean 26 2
2 Hugh 55 221
3 Deanna 40 465
4 Quark 68 21
5 Weyoun 59 318
6 Gowron 37 220
7 Will 54 307
8 Jadzia 38 380
9 Hugh 27 181
10 Odo 53 191
11 Ben 57 372
........
I would like to store in another dataframe the first 12 rows out of every 100 rows.
I know that with .loc and .iloc you can select 1 row every n rows (100 in the example below):
df1 = df.loc[::100]
I am trying not to iterate over the dataframe with a for loop, since the df is so large that the process slows down a lot. Is there any way to achieve this complex row selection with .loc?
Solution 1:[1]
You can trim the hundreds off the index values by taking each one modulo 100, so e.g. 200-299 becomes 0-99, 123000-123099 becomes 0-99, etc., and then filter for values less than 12. This relies on the dataframe having the default integer index (0, 1, 2, ...):
filtered = df[df.index % 100 < 12]
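If the index is not the default 0, 1, 2, ... (for example after earlier filtering or sorting), the same idea can be applied to row positions instead of index labels. The following is only a minimal sketch under that assumption: the small dataframe, its `Age` values, and the name `positions` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 250 rows whose index labels start at 1000
df = pd.DataFrame({"Age": np.arange(250)}, index=np.arange(1000, 1250))

# Row positions 0..len(df)-1, independent of the index labels
positions = np.arange(len(df))

# Keep the first 12 rows of every block of 100 rows
filtered = df[positions % 100 < 12]
```

Like the one-liner above, this builds a boolean mask in a single vectorized step, so no Python-level loop over the rows is needed.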
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
