'Find numbers in Python list which are within a certain distance of each other

I have a list of numbers in Python. It looks like this:

a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]

I want to keep all the numbers which are within + or - 7 of each other and discard the rest. Is there a simple way to do this in Python? I have read about the list filter method but am not sure how to get what I want working with it.

I am new to Python.

Update

Output would ideally be [84, 86, 87, 89, 90] and another list [997, 999, 1000, 1002]. I want to retrieve the sequences and not the outliers. Hope this makes sense.



Solution 1:[1]

This is algorithm problem, try this:

def get_blocks(values):
    mi, ma = 0, 0
    result = []
    temp = []
    for v in sorted(values):
        if not temp:
            mi = ma = v
            temp.append(v)
        else:
            if abs(v - mi) < 7 and abs(v - ma) < 7:
                temp.append(v)
                if v < mi:
                    mi = v
                elif v > ma:
                    ma = v
            else:
                if len(temp) > 1:
                    result.append(temp)
                mi = ma = v
                temp = [v]
    return result

a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
print get_blocks(a)

Output:

[[84, 86, 87, 89, 90], [997, 999, 1000, 1002]]

Solution 2:[2]

If your problems allows transitive relations, i.e. x is in the group as long as it's at most 7 away from any element in the group, then this seems to me like a graph theory problem. To be more specific, you need to find all connected components.

The problem itself is pretty easy to solve with a recursive algorithms. You would first create a dictionary in which every key would be one of the elements and every value would be a list of elements which are at most 7 apart from that element. For your example, you would have something like this:

for element in elements:
    connections[element] = []
    visited[element] = False
    for another in elements:
        if abs(element - another) <= limit:
            connections[element].append(another)

Which would give you something like this

{
    84: [86, 87, 89, 90],
    86: [84, 87, 89, 90],
    ...
    997: [999, 1000, 1002]
    ...
    2014: []
}

Now you need to write a recursive function which will take as input an element and a list, and it will keep adding elements in a list as long as it can find an element which is at most 7 apart from the current element.

def group_elements(element, group):
    if visited[element]:
        return
    visited[element] = True
    group.append(element)
    for another in connections[element]:
        group_elements(another, group)

Somewhere in the code you also need to remember which elements you have already visited to make sure that you don't get into an infinite loop.

visited = {}

You need to call that function for every element in your list.

groups = []
for element in elements:
    if not visited[element]:
        group = []
        group_elements(element, group)
        groups.append(group)
print group

This code should give the following output for your input:

[[87, 84, 86, 89, 90], [2014], [1000, 1002, 997, 999]]

Solution 3:[3]

For any problem like this my first port of call is the Python itertools module. The pairwise function from that link that I use in the code is available in the more-itertools module.

from more_itertools import pairwise
results = []
chunk = []
a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
a.sort()
for v1, v2 in pairwise(a):
    if v2 - v1 <= 7:
        chunk.append(v1)
    elif chunk:
        chunk.append(v1)
        results.append(chunk)
        chunk = []
print(results)
[[84, 86, 87, 89, 90], [997, 999, 1000, 1002]]

I haven't tested for edge cases, so it's buyer beware :)

Solution 4:[4]

a = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999]
temp=a[0]
result=[]
temp1=[]
counter =len(a)

for i in a:
    if i in range(temp-7,temp+7):
        temp1.append(i)
        if counter==1:
            result.append(temp1)
    else:
        if temp1:
            result.append(sorted(temp1))
        temp1=[]
        temp=i
    counter=counter-1
print result

Solution 5:[5]

Say, we have a list aa = [87, 84, 86, 89, 90, 2014, 1000, 1002, 997, 999].

aa = sorted(aa)
aa
[84,86,87,89,90,997,999,1000,1002,2014]

To define differences among neighbors, please use numpy's: ad = np.ediff1d(aa).

ad
array([2, 1, 2, 1, 907, 2, 1, 2, 1012])

To sort off indices of numbers beyond the range, use: np.where(ad < rng)[0] + 1, where rng = 7 is range within which the numbers are kept:

np.where(ad < 7)[0] + 1
array([1, 2, 3, 4, 6, 7, 8])

To split the array by indices into required subarrays: np.split(aa, np.where(ad > rng)[0] + 1). So, the function is:

def splitarr(aa,rng):
    ad = np.ediff1d(aa)
    return np.split(aa, np.where(ad > rng)[0] + 1)

splitarr(aa, 7)    
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002]), array([2014])]

There is no criteria in the question why [2014] is not acceptable in the preferable answer, is it due to it is the only list with a single element? Thus, the filter by Length is proposed:

np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]

where splarr = np.split(aa, np.where(ad > rng)[0] + 1). That returns indices of arrays of allowed length, where np.fromiter() coverts map to numpy and lim is the threshold for array length: lim indicates that only this length and above is allowable. So, the final function is:

def splitarr(aa,rng,lim):
    ad = np.ediff1d(aa)
    splarr = np.split(aa, np.where(ad > rng)[0] + 1)
    return [splarr[i] for i in np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]]

splitarr(aa, 7, 2)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002])]
splitarr(aa,7,1)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002]), array([2014])] 
splitarr(aa, 1, 1)
[array([84]), array([86, 87]), array([89, 90]), array([997]), array([ 999, 1000]), array([1002]), array([2014])]
splitarr(aa,1,2)
[array([86, 87]), array([89, 90]), array([ 999, 1000])]
splitarr(aa,2,2)
[array([84, 86, 87, 89, 90]), array([ 997,  999, 1000, 1002])]

To get lists instead of arrays just replace the last line in a function to:

return [splarr[i].tolist() for i in np.where(np.fromiter(map(len, splarr), dtype=int) >= lim)[0]]

Instead of a comment under the 'best answer': I have noticed it works only with the range 7, and is incorrect for any other range.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 fred.yu
Solution 2 hgazibara
Solution 3 Mark Lawrence
Solution 4 Abhishek Jain
Solution 5