How to find recent values inside a list?

I have a list of dates in the format %y-%m-%d and a single value. I need to find all the dates that are more recent than the value.

dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'

The output should be false,false,false,true,true,true. How can I do this quickly, without a for loop?



Solution 1:[1]

You can use NumPy to let the underlying C implementation do the looping (which is way faster than pure Python):

import numpy as np
dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'
dates = np.array(dates, dtype='datetime64')
value = np.array(value, dtype='datetime64')
print(dates > value)

Solution 2:[2]

The straightforward list comprehension:

output = [date > value for date in dates]
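The comparison above works directly on the strings because zero-padded %y-%m-%d strings sort in the same order as the dates they represent, as long as all years fall in the same century. If that can't be guaranteed, here is a minimal sketch that parses the strings first. The helper names `parse` and `more_recent` are made up for illustration and are not part of the original answer.

```python
from datetime import datetime

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'

def parse(s):
    # %y is a two-digit year: 69-99 map to 1969-1999, 00-68 map to 2000-2068
    return datetime.strptime(s, '%y-%m-%d').date()

def more_recent(dates, value):
    # Parse the cutoff once, then compare real date objects
    cutoff = parse(value)
    return [parse(d) > cutoff for d in dates]

output = more_recent(dates, value)
print(output)  # [False, False, False, True, True, True]
```

Parsing costs extra time per item, so this is likely slower than comparing the raw strings. It only pays off when the plain string order would be wrong, for example '99-12-31' versus '00-01-01'.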

Benchmark results with the suggested NumPy solution:

len(dates) = 6:
Python is 6.6 times faster than NumPy
Python is 6.5 times faster than NumPy
Python is 7.0 times faster than NumPy

len(dates) = 600:
Python is 3.4 times faster than NumPy
Python is 3.5 times faster than NumPy
Python is 3.5 times faster than NumPy

len(dates) = 60000:
Python is 4.1 times faster than NumPy
Python is 4.0 times faster than NumPy
Python is 4.1 times faster than NumPy

Benchmark code:

from timeit import timeit
import numpy as np

dates = ['22-02-10','22-02-11','22-02-12','22-02-13','22-02-14','22-02-15']
value = '22-02-12'

def f(dates, value):
    dates = np.array(dates, dtype='datetime64')
    value = np.array(value, dtype='datetime64')
    return dates > value

def g(dates, value):
    return [date > value for date in dates]

def test(dates, value, number):
    print(f'\nlen(dates) = {len(dates)}:')
    for _ in range(3):
        tf = timeit(lambda: f(dates, value), number=number)
        tg = timeit(lambda: g(dates, value), number=number)
        print(f'Python is {tf/tg:.1f} times faster than NumPy')

test(dates, value, 10000)
test(dates * 100, value, 1000)
test(dates * 10000, value, 10)
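The same timeit pattern can be reused to compare other pure-Python variants. Below is a rough sketch, not part of the original benchmark: `with_map` is a hypothetical variant that avoids writing a `for` at all, and taking the minimum over `timeit.repeat` is one common way to reduce noise. Timings depend on the machine, so none are quoted here.

```python
from timeit import repeat

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15'] * 100
value = '22-02-12'

def with_comprehension(dates, value):
    return [date > value for date in dates]

def with_map(dates, value):
    # value.__lt__(date) is the same test as value < date, i.e. date > value
    return list(map(value.__lt__, dates))

for func in (with_comprehension, with_map):
    # Best of 3 runs, each run calling the function 1000 times
    best = min(repeat(lambda: func(dates, value), number=1000, repeat=3))
    print(f'{func.__name__}: {best:.4f} s for 1000 calls')
```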

Solution 3:[3]

I can't say about speed, but maybe this could be a solution without for loop at all:

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'


def printing(the_list, element):
    # Print "true" if the date at this index is later than value, else "false"
    if the_list[element] <= value:
        print("false")
    else:
        print("true")
    # Recurse on the next index until the end of the list is reached
    new_element = element + 1
    if new_element < len(the_list):
        printing(the_list, new_element)


printing(dates, 0)

What I am doing here is basically using recursion :D
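One thing to keep in mind with recursion in Python: every item adds a stack frame, so a list longer than the interpreter's recursion limit (1000 by default) raises RecursionError. Here is a small sketch of that, using a hypothetical variant `more_recent_recursive` that returns a list instead of printing. It is an illustration, not the original answer's code.

```python
import sys

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'

def more_recent_recursive(the_list, value, element=0, result=None):
    # Collect one boolean per item, one recursive call per item
    if result is None:
        result = []
    if element >= len(the_list):
        return result
    result.append(the_list[element] > value)
    return more_recent_recursive(the_list, value, element + 1, result)

print(more_recent_recursive(dates, value))

# A list longer than the recursion limit cannot be processed this way
long_list = ['22-02-10'] * (sys.getrecursionlimit() + 100)
try:
    more_recent_recursive(long_list, value)
    too_deep = False
except RecursionError:
    too_deep = True
print('hit the recursion limit:', too_deep)
```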

P.S. I'm pretty sure there must be some fancy super mega mathematical algorithm that can solve this with much shorter and much faster code, but... this is all my poor mind came up with :D
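One candidate for a shorter approach, assuming the list is already sorted in ascending order (as it is in the question's example, though the question doesn't promise it), is a binary search for the cutoff position. This is only a sketch, and `more_recent_sorted` is a made-up name.

```python
from bisect import bisect_right

dates = ['22-02-10', '22-02-11', '22-02-12', '22-02-13', '22-02-14', '22-02-15']
value = '22-02-12'

def more_recent_sorted(sorted_dates, value):
    # Index of the first date strictly greater than value (input must be sorted)
    first_newer = bisect_right(sorted_dates, value)
    # Everything before that index is not more recent, everything after is
    return [False] * first_newer + [True] * (len(sorted_dates) - first_newer)

print(more_recent_sorted(dates, value))  # [False, False, False, True, True, True]
```

The search itself takes a logarithmic number of comparisons, but building the output list is still proportional to the length of the input. On unsorted input the result would simply be wrong.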

UPDATE

According to this site, my code is 3 times faster than the NumPy answer's code. So I'm proud of myself :D

My code: 0.030 s to run

The NumPy code: 0.126 s to run

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1
Solution 2   Kelly Bundy
Solution 3