Randomly select values from a list column so that all elements across lists are selected
Say I have a pandas DataFrame with a list column 'event_ids':
code  canceled  event_ids
xxx   [1.0]     [107385, 128281, 133015]
xxS   [0.0]     [108664, 110515, 113556]
ssD   [1.0]     [134798, 133499, 125396, 114298, 133915]
cvS   [0.0]     [107611]
eeS   [5.0]     [113472, 115236, 108586, 128043, 114106, 10796...
544W  [44.0]    [107650, 128014, 127763, 118036, 116247, 12802.
How can I select k rows with enough randomness that every element appearing anywhere in 'event_ids' is represented in the sample? In other words, the event vocabulary of the sample should be the same as that of the population. By 'sufficiently random' I mean something along the lines of importance sampling, if that is possible: rows are initially drawn at random and then accepted or rejected according to some condition.
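To make the accept-or-reject idea concrete, here is a minimal sketch of one possible approach. It is a randomized greedy cover, not importance sampling in the strict sense: rows are visited in random order and a row is accepted only if it contributes an event id not yet seen; once the vocabulary is covered, the sample is topped up to k rows uniformly at random. The helper name `sample_covering_rows`, its parameters, and the toy data are hypothetical, and the sketch assumes a unique index and a k large enough for the cover found.

```python
import random

import pandas as pd


def sample_covering_rows(df, k, col="event_ids", seed=None):
    """Return k randomly chosen rows whose lists jointly contain every id.

    Hypothetical helper (not part of pandas). Assumes df has a unique index.
    """
    rng = random.Random(seed)
    vocabulary = set().union(*df[col])  # every id seen anywhere in the column
    order = list(df.index)
    rng.shuffle(order)  # the random visiting order supplies the randomness

    chosen, covered = [], set()
    # Pass 1: accept a row only if it contributes at least one unseen id.
    for idx in order:
        ids = set(df.at[idx, col])
        if not ids <= covered:
            chosen.append(idx)
            covered |= ids
        if covered == vocabulary:
            break

    if len(chosen) > k:
        raise ValueError(
            f"this random order needed {len(chosen)} rows, more than k={k}"
        )

    # Pass 2: top up to k rows with a uniform draw from the remaining rows.
    chosen_set = set(chosen)
    leftovers = [idx for idx in order if idx not in chosen_set]
    chosen += rng.sample(leftovers, min(k - len(chosen), len(leftovers)))
    return df.loc[chosen]


# Toy data (made up for illustration, not the frame from the question).
df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e"],
    "event_ids": [[1, 2], [3, 4], [1, 2], [3, 4], [1, 2, 3, 4]],
})
sample = sample_covering_rows(df, k=3, seed=0)
print(sample)
```

The greedy pass does not find a minimum cover, so a given shuffle may need more than k rows even when a smaller cover exists; retrying with a different seed, or visiting rows that hold rare ids first, are possible refinements.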
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
