Get the first item from an iterable that matches a condition
I would like to get the first item from a list matching a condition. It's important that the resulting method not process the entire list, which could be quite large. For example, the following function is adequate:
def first(the_iterable, condition=lambda x: True):
    for i in the_iterable:
        if condition(i):
            return i
This function could be used something like this:
>>> first(range(10))
0
>>> first(range(10), lambda i: i > 3)
4
However, I can't think of a good built-in / one-liner to let me do this. I don't particularly want to copy this function around if I don't have to. Is there a built-in way to get the first item matching a condition?
Solution 1:[1]
Damn Exceptions!
I love this answer. However, since next() raises a StopIteration exception when there are no items,
I would use the following snippet to avoid the exception:
a = []
item = next((x for x in a), None)
For example,
a = []
item = next(x for x in a)
will raise a StopIteration exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
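One caveat with a None default: it cannot be told apart from a matching item that is itself None. A minimal sketch of one way around that, assuming a made-up sentinel `_MISSING` and a made-up helper `find_first` (neither is part of the answer above):

```python
# _MISSING is a hypothetical sentinel; any unique object works.
_MISSING = object()


def find_first(iterable, condition):
    """Return (found, item); item is None when nothing matched."""
    item = next((x for x in iterable if condition(x)), _MISSING)
    if item is _MISSING:
        return False, None
    return True, item


# None is a real match here, so `found` is True.
print(find_first([3, None, 5], lambda x: x is None))  # (True, None)
# Nothing matches here, so `found` is False.
print(find_first([3, None, 5], lambda x: x == 42))    # (False, None)
```

If None can never be a legitimate item in your data, the plain `next(..., None)` form is simpler and good enough.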
Solution 2:[2]
As a reusable, documented and tested function
def first(iterable, condition=lambda x: True):
    """
    Returns the first item in the `iterable` that
    satisfies the `condition`.

    If the condition is not given, returns the first item of
    the iterable.

    Raises `StopIteration` if no item satisfying the condition is found.

    >>> first( (1,2,3), condition=lambda x: x % 2 == 0)
    2
    >>> first(range(3, 100))
    3
    >>> first( () )
    Traceback (most recent call last):
        ...
    StopIteration
    """
    return next(x for x in iterable if condition(x))
Version with default argument
@zorf suggested a version of this function where you can have a predefined return value if the iterable is empty or has no items matching the condition:
def first(iterable, default=None, condition=lambda x: True):
    """
    Returns the first item in the `iterable` that
    satisfies the `condition`.

    If the condition is not given, returns the first item of
    the iterable.

    If the `default` argument is given and the iterable is empty,
    or if it has no items matching the condition, the `default` argument
    is returned if it matches the condition.

    The `default` argument being None is the same as it not being given.

    Raises `StopIteration` if no item satisfying the condition is found
    and default is not given or doesn't satisfy the condition.

    >>> first( (1,2,3), condition=lambda x: x % 2 == 0)
    2
    >>> first(range(3, 100))
    3
    >>> first( () )
    Traceback (most recent call last):
        ...
    StopIteration
    >>> first([], default=1)
    1
    >>> first([], default=1, condition=lambda x: x % 2 == 0)
    Traceback (most recent call last):
        ...
    StopIteration
    >>> first([1,3,5], default=1, condition=lambda x: x % 2 == 0)
    Traceback (most recent call last):
        ...
    StopIteration
    """
    try:
        return next(x for x in iterable if condition(x))
    except StopIteration:
        if default is not None and condition(default):
            return default
        else:
            raise
Solution 3:[3]
The most efficient ways in Python 3 are the following (using a similar example):
With "comprehension" style:
next(i for i in range(100000000) if i == 1000)
WARNING: The expression also works in Python 2, but the example uses range, which returns a lazy iterable object in Python 3 rather than a list as in Python 2 (to get a lazy iterable in Python 2, use xrange instead).
Note that the expression uses a generator expression, not a list comprehension. A list comprehension ([i for ...]) would build a list of all matching elements before anything could be taken from it, so the entire range would be processed instead of stopping the iteration once i == 1000.
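To make the difference concrete, here is a small sketch that counts how often the condition is evaluated in each case. The helper `make_counting_predicate` is made up for this illustration, and the range is kept small so it runs instantly:

```python
def make_counting_predicate(target):
    """Return a predicate plus a list recording every value it was asked about."""
    calls = []

    def predicate(i):
        calls.append(i)
        return i == target

    return predicate, calls


# Generator expression: stops as soon as the first match is found.
lazy_pred, lazy_calls = make_counting_predicate(10)
lazy_result = next(i for i in range(1000) if lazy_pred(i))

# List comprehension: tests every element before the first one can be read.
eager_pred, eager_calls = make_counting_predicate(10)
eager_result = [i for i in range(1000) if eager_pred(i)][0]

print(lazy_result, len(lazy_calls))    # 10 11
print(eager_result, len(eager_calls))  # 10 1000
```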
With "functional" style:
next(filter(lambda i: i == 1000, range(100000000)))
WARNING: This doesn't work in Python 2, even if range is replaced with xrange, because filter there returns a list instead of an iterator (inefficient), and the next function only works with iterators.
Default value
As mentioned in other answers, you must pass an extra argument to next if you want to avoid the exception raised when the condition is never fulfilled.
"functional" style:
next(filter(lambda i: i == 1000, range(100000000)), False)
"comprehension" style:
With this style you need to surround the generator expression with () to avoid a SyntaxError: Generator expression must be parenthesized if not sole argument:
next((i for i in range(100000000) if i == 1000), False)
Solution 4:[4]
Similar to using ifilter, you could use a generator expression (Python 2 syntax shown):
>>> (x for x in xrange(10) if x > 5).next()
6
In either case, you probably want to catch StopIteration though, in case no elements satisfy your condition.
Technically speaking, I suppose you could do something like this:
>>> foo = None
>>> for foo in (x for x in xrange(10) if x > 5): break
...
>>> foo
6
It would avoid having to write a try/except block, but it seems rather obscure and an abuse of the syntax.
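For what it's worth, the loop idea reads a bit less obscurely with the else clause of for, which runs only when the loop finishes without hitting break. A rough Python 3 sketch (range instead of xrange; the function name `first_match` is made up for illustration):

```python
def first_match(iterable, condition):
    """Return the first item satisfying condition, or None if there is none."""
    for item in iterable:
        if condition(item):
            break
    else:
        # Reached only when the loop ran to completion without a break,
        # which includes the case of an empty iterable.
        item = None
    return item


print(first_match(range(10), lambda x: x > 5))  # 6
print(first_match(range(3), lambda x: x > 5))   # None
```

No try/except is needed, and the not-found case is explicit rather than relying on a variable initialised before the loop.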
Solution 5:[5]
I would write this
next(x for x in xrange(10) if x > 3)
Solution 6:[6]
For anyone using Python 3.8 or newer, I recommend using assignment expressions, as described in PEP 572 -- Assignment Expressions:
if any((match := i) > 3 for i in range(10)):
    print(match)
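One thing to watch for with this approach: `match` is only meaningful when any() returned True. If nothing matches, it holds the last item examined, and if the iterable is empty it is never assigned at all. A minimal sketch that wraps the pattern in a function so the not-found case is handled explicitly (the name `first_greater_than` is made up for illustration):

```python
def first_greater_than(values, threshold):
    """Return the first value above threshold, or None if there is none."""
    # any() short-circuits, so iteration stops at the first match.
    if any((match := v) > threshold for v in values):
        return match
    # Do not read `match` here: it is stale or unassigned on this path.
    return None


print(first_greater_than(range(10), 3))  # 4
print(first_greater_than([1, 2], 3))     # None
print(first_greater_than([], 3))         # None
```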
Solution 7:[7]
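In Python 2, the itertools module contains a lazy filter function for iterators, ifilter. The first element of the filtered iterator can be obtained by calling next() on it. (In Python 3, ifilter was removed because the built-in filter is already lazy.)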
from itertools import ifilter
print ifilter((lambda i: i > 3), range(10)).next()
Solution 8:[8]
For older versions of Python where the next built-in doesn't exist:
(x for x in range(10) if x > 3).next()
Solution 9:[9]
By using
(index for index, value in enumerate(the_iterable) if condition(value))
one can test the condition against each value in the_iterable in turn and obtain the index of the first match, without needing to evaluate all of the items in the_iterable.
The complete expression to use is
first_index = next(index for index, value in enumerate(the_iterable) if condition(value))
Here first_index holds the index of the first value that satisfies the condition in the expression discussed above.
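As with the value-returning versions, this raises StopIteration when nothing matches. A small sketch that passes a default to next() instead; the wrapper name `first_index` and its `default` parameter are my own additions for illustration:

```python
def first_index(iterable, condition, default=None):
    """Return the index of the first matching item, or default if none match."""
    return next(
        (index for index, value in enumerate(iterable) if condition(value)),
        default,
    )


words = ["apple", "kiwi", "banana"]
print(first_index(words, lambda w: len(w) > 5))   # 2
print(first_index(words, lambda w: w == "pear"))  # None
```

Note that the generator expression needs its own parentheses once next() receives a second argument.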
Solution 10:[10]
This question already has great answers. I'm only adding my two cents because I landed here trying to find a solution to my own problem, which is very similar to the OP.
If you want to find the INDEX of the first item matching a criterion using generators, you can simply do:
next(index for index, value in enumerate(iterable) if condition(value))
Solution 11:[11]
In Python 3:
a = (None, False, 0, 1)
assert next(filter(None, a)) == 1
In Python 2.6:
a = (None, False, 0, 1)
assert next(iter(filter(None, a))) == 1
EDIT: I thought it was obvious, but apparently not: instead of None you can pass a function (or a lambda) with a check for the condition:
a = [2,3,4,5,6,7,8]
assert next(filter(lambda x: x%2, a)) == 3
Solution 12:[12]
You could also use the argwhere function in NumPy. For example:
i) Find the first "l" in "helloworld":
import numpy as np
l = list("helloworld") # Create list
i = np.argwhere(np.array(l)=="l") # i = array([[2],[3],[8]])
index_of_first = i.min()
ii) Find first random number > 0.1
import numpy as np
r = np.random.rand(50) # Create random numbers
i = np.argwhere(r>0.1)
index_of_first = i.min()
iii) Find the last random number > 0.1
import numpy as np
r = np.random.rand(50) # Create random numbers
i = np.argwhere(r>0.1)
index_of_last = i.max()
Solution 13:[13]
Here is a speed test of three approaches. next() is not the fastest.
from timeit import default_timer as timer

# Is set irreflexive?
def a():
    return frozenset((x3, x3) for x3 in set([x1[x2] for x2 in range(2) for x1 in value]) if (x3, x3) in value) == frozenset()

def b():
    return next((False for x1 in value if (x1[0], x1[0]) in value or (x1[1], x1[1]) in value), True)

def c():
    for x1 in value:
        if (x1[0], x1[0]) in value or (x1[1], x1[1]) in value:
            return False
    return True

times = 1000000
value = frozenset({(1, 3), (2, 1)})

start_time = timer()
for x in range(times):
    a()
print("a(): Calculation ended after " + str(round((timer() - start_time) * 1000) / 1000.0) + " sec")

start_time = timer()
for x in range(times):
    b()
print("b(): Calculation ended after " + str(round((timer() - start_time) * 1000) / 1000.0) + " sec")

start_time = timer()
for x in range(times):
    c()
print("c(): Calculation ended after " + str(round((timer() - start_time) * 1000) / 1000.0) + " sec")
Results:
Calculation ended after 1.365 sec
Calculation ended after 0.685 sec
Calculation ended after 0.493 sec
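A hand-rolled timing loop like this also measures the loop overhead itself and runs each variant only once. As a rough sketch, the standard-library timeit module can handle the repetition instead. The names `with_next` and `with_loop` are my own labels for the b() and c() variants, and the timings will differ from machine to machine:

```python
import timeit

value = frozenset({(1, 3), (2, 1)})


def with_next():
    # Same logic as b(): next() over a generator expression with a default.
    return next(
        (False for x1 in value
         if (x1[0], x1[0]) in value or (x1[1], x1[1]) in value),
        True,
    )


def with_loop():
    # Same logic as c(): an explicit loop with an early return.
    for x1 in value:
        if (x1[0], x1[0]) in value or (x1[1], x1[1]) in value:
            return False
    return True


# Take the best of three runs to reduce noise from other processes.
timings = {
    func.__name__: min(timeit.repeat(func, number=50_000, repeat=3))
    for func in (with_next, with_loop)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s per 50,000 calls")
```

Both variants stop at the first match, so any gap between them comes from the overhead of creating the generator, not from how much of the input is processed.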
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
