How to find recent values inside a list?
I have a list of dates in the format %y-%m-%d and a single value. I need to find all the dates that are more recent than value:
dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'
The output should be false, false, false, true, true, true. How can I do this quickly, without a for loop?
Solution 1:[1]
You can use NumPy to let the underlying C implementation do the looping (which is much faster than pure Python):
import numpy as np
dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'
dates = np.array(dates, dtype='datetime64')  # parse each string as a date
value = np.array(value, dtype='datetime64')
print(dates > value)  # element-wise comparison, returns a boolean array
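A possible variant, assuming every two-digit year belongs to 2000 to 2099: prepend the century so each string is a full ISO 8601 date before converting, instead of relying on how NumPy reads a two-digit year. `np.char.add` does the prepending element-wise; the helper name `newer_than_np` is hypothetical.

```python
import numpy as np

def newer_than_np(dates, value):
    # Assumption: all two-digit years are in 2000-2099.
    # np.char.add prepends "20" element-wise, e.g. "22-02-10" -> "2022-02-10".
    full = np.char.add("20", np.asarray(dates))
    arr = full.astype("datetime64[D]")
    threshold = np.datetime64("20" + value, "D")
    # Comparing an array with a scalar datetime64 gives a boolean array.
    return arr > threshold

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'
print(newer_than_np(dates, value).tolist())
# [False, False, False, True, True, True]
```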
Solution 2:[2]
The straightforward list comprehension:
output = [date > value for date in dates]
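The comprehension gives correct results because every string has the same zero-padded, year-first layout (`yy-mm-dd`), so lexicographic order matches chronological order. With a day-first format such as `%d-%m-%y`, plain string comparison would give wrong answers. A sketch that parses with `datetime.strptime` instead (the helper name `newer_than_parsed` is hypothetical):

```python
from datetime import datetime

def newer_than_parsed(dates, value, fmt="%y-%m-%d"):
    # Parse the threshold once, then parse each date and compare datetime objects.
    threshold = datetime.strptime(value, fmt)
    return [datetime.strptime(d, fmt) > threshold for d in dates]

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'
print(newer_than_parsed(dates, value))
# [False, False, False, True, True, True]

# With a day-first format, string comparison and parsed comparison disagree:
print('01-03-22' > '15-02-22')  # False, although 1 March is after 15 February
print(newer_than_parsed(['01-03-22'], '15-02-22', fmt="%d-%m-%y"))  # [True]
```

Parsing costs more than comparing strings, so for the year-first format in the question the plain comprehension is the simpler choice.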
Benchmark results against the NumPy solution above (three runs per list size):
len(dates) = 6:
Python is 6.6 times faster than NumPy
Python is 6.5 times faster than NumPy
Python is 7.0 times faster than NumPy
len(dates) = 600:
Python is 3.4 times faster than NumPy
Python is 3.5 times faster than NumPy
Python is 3.5 times faster than NumPy
len(dates) = 60000:
Python is 4.1 times faster than NumPy
Python is 4.0 times faster than NumPy
Python is 4.1 times faster than NumPy
Benchmark code:
from timeit import timeit
import numpy as np
dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'
def f(dates, value):
    dates = np.array(dates, dtype='datetime64')
    value = np.array(value, dtype='datetime64')
    return dates > value
def g(dates, value):
    return [date > value for date in dates]
def test(dates, value, number):
    print(f'\nlen(dates) = {len(dates)}:')
    for _ in range(3):
        tf = timeit(lambda: f(dates, value), number=number)
        tg = timeit(lambda: g(dates, value), number=number)
        print(f'Python is {tf/tg:.1f} times faster than NumPy')
test(dates, value, 10000)
test(dates * 100, value, 1000)
test(dates * 10000, value, 10)
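Note that `f` in the benchmark converts both inputs to `datetime64` on every call, so its timing includes the string parsing, not just the comparison. To time the comparison alone, one could convert once up front. A sketch (the names `f_precomputed`, `arr` and `thr` are hypothetical; no timings are claimed here, run it to see them on your machine):

```python
from timeit import timeit
import numpy as np

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15'] * 10000
value = '22-02-12'

# Convert once, outside the timed code.
arr = np.array(dates, dtype='datetime64')
thr = np.array(value, dtype='datetime64')

def f_precomputed():
    # Only the element-wise comparison is timed.
    return arr > thr

def g():
    return [date > value for date in dates]

t_np = timeit(f_precomputed, number=10)
t_py = timeit(g, number=10)
print(f'NumPy comparison only: {t_np:.4f} s, list comprehension: {t_py:.4f} s')
```

Which number matters depends on whether the dates arrive as strings each time or can be kept as a `datetime64` array.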
Solution 3:[3]
I can't speak to speed, but this might be a solution without a for loop at all:
dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'
def printing(the_list, value, element=0):
    # Print "false" if the date at index `element` is not newer than value, else "true"
    if the_list[element] <= value:
        print("false")
    else:
        print("true")
    # Recurse on the next index instead of looping
    if element + 1 < len(the_list):
        printing(the_list, value, element + 1)
printing(dates, value)
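What I'm doing here is using recursion instead of a loop :D Note that each element adds a stack frame, so with Python's default recursion limit (1000) this raises a RecursionError for lists longer than roughly 1000 items.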
P.S. I'm pretty sure there must be some fancy super mega mathematical algorithm that can solve this with much shorter and much faster code, but... this is all my poor mind came up with :D
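A shorter way to avoid an explicit for loop is to let the built-in `map` do the iteration, passing the bound method `value.__lt__`: `value < date` is the same test as `date > value`. A sketch; it is still O(n) and relies on the same zero-padded string comparison as the list comprehension:

```python
dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'

# value.__lt__(date) evaluates value < date, i.e. date > value.
result = list(map(value.__lt__, dates))
print(result)
# [False, False, False, True, True, True]
```

Unlike the recursive version, this has no depth limit and returns the booleans instead of printing them.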
UPDATE
According to this site, my code is 3 times faster than the NumPy answer's code. So I'm proud of myself :D
My code: 0.030 s
The NumPy code: 0.126 s
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Kelly Bundy |
| Solution 3 | |
