How do I find whether an entry in a list is larger than the next 50 numbers of that list, without using for-loops?
I have a very long list of continuous data. For each list entry, I want to know whether it is larger than each of the next 50 entries. Since it is a lot of data, it needs to be efficient, but I haven't gotten much further than this:
data = [5,3,7,4,7,12,6,3,2,1,6 .... 5] # n = >80.000.000
new_list = []
for i, val in enumerate(data):
    # compare against the next 50 entries only, excluding the entry itself
    if val > max(data[i+1:i+51], default=float("-inf")):
        new_list.append(1)
    else:
        new_list.append(0)
Can someone help me vectorize this problem? Or does anyone have suggestions for how to go on from here?
Solution 1:[1]
One option for data of this size is PySpark.
If that cannot be used, the per-entry check can be run in parallel with joblib (shown here with the threading backend):
from joblib import Parallel, delayed

def func(data, index):
    # True if data[index] is larger than each of the next 50 entries (itself excluded);
    # the default covers the last entry, which has no successors
    return data[index] > max(data[index + 1 : index + 51], default=float("-inf"))

src = [5,3,7,4,7,12,6,3,2,1,6,5,5,3,7,4,7,12,6,3,2,1,6,5,5,3,7,4,7,12,6,3,2,1,6,5,5,3,7,4,7,12,6,3,2,1,6,5,5,3,7,4,7,12,6,3,2,1,6,5]
dst = Parallel(n_jobs=-1, backend="threading", verbose=10, max_nbytes=None)(
    delayed(func)(src, index) for index in range(len(src))
)
print(dst)
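Since the question asks for a vectorized version, here is a minimal sketch of one way it could be done with NumPy instead of a Python-level loop. It is not part of the original answer. It assumes NumPy 1.20 or newer (for `sliding_window_view`), that the data fits in memory as a float64 array, and that entries with fewer than 50 successors are compared only against the entries that remain, so the very last entry is flagged 1. The function name `larger_than_next` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def larger_than_next(values, window=50):
    """Return 1 where values[i] > each of the next `window` entries, else 0."""
    a = np.asarray(values, dtype=np.float64)
    if a.size == 0:
        return np.zeros(0, dtype=np.int8)
    # Drop the first entry and pad the tail with -inf, so that window i
    # covers exactly a[i+1 : i+1+window] (shorter near the end of the data).
    padded = np.concatenate([a[1:], np.full(window, -np.inf)])
    # One window per original entry; the view itself does not copy the data.
    next_max = sliding_window_view(padded, window).max(axis=1)
    # Strict comparison, matching `val > max(...)` in the question.
    return (a > next_max).astype(np.int8)


if __name__ == "__main__":
    sample = [5, 3, 7, 4, 7, 12, 6, 3, 2, 1, 6, 5]
    print(larger_than_next(sample, window=3).tolist())
```

This still does roughly `n * window` comparisons, but they run in compiled code rather than in the interpreter. Converting to float64 can lose precision for very large integers, so a different dtype and padding value may be needed for such data.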
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ??? |
