Filtering tuples within a range from a reference list

I have a reference list of tuples, each covering a different range of values.

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), 
 (667, 690), (667, 690), (479, 508), (1112, 1578)]

I also have the following list of lists of tuples, whose values have to be compared against the reference list.

[  [(450,470)],
   [(100, 200), (500, 700)],
   [(0, 29), (3827, 3856)],
   [(820, 835), (1539, 1554)],
   [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
   [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
    (2158, 2163), (2594, 2599), (2643, 2648)]  ]

From each inner list, I am trying to pick one tuple that falls within the range covered by the reference list.

The conditions I considered are:

  1. If the input list contains no tuple with a value in the range of the reference list, then any tuple can be taken. For example, no tuple in [(0, 29), (3827, 3856)] is in the range of the reference list, so I can take either of them. By default, I append the first tuple of the list to the reference list.

  2. If a tuple within the range of the reference list is found, that tuple is appended to the reference list and the search stops for that list. Example: [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)]

  3. If more than one tuple is in the range of the reference list, the first one found is appended to the reference list. Example: [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163), (2594, 2599), (2643, 2648)]

  4. The two values in a tuple are never the same, and the second value is always larger than the first.
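Conditions 1 to 3 amount to "take the first tuple whose start is in range, otherwise take the first tuple of the list". A minimal sketch of that rule for a single inner list, assuming the range is inclusive at both ends and using the thresholds 479 and 1112 that the question's min/max expressions produce for this reference list (pick_tuple is a hypothetical helper name, not part of the original code):

```python
def pick_tuple(candidates, lo, hi):
    """Return the first tuple whose start lies in [lo, hi] (conditions 2 and 3);
    if no tuple qualifies, fall back to the first tuple (condition 1)."""
    for t in candidates:
        if lo <= t[0] <= hi:
            return t  # first match wins, so later in-range tuples are ignored
    # No tuple was in range: condition 1 allows any tuple, so take the first.
    return candidates[0]


# Assumed thresholds: min and max first element of the question's reference list.
LO, HI = 479, 1112

print(pick_tuple([(0, 29), (3827, 3856)], LO, HI))                                # (0, 29)
print(pick_tuple([(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)], LO, HI))  # (622, 635)
print(pick_tuple([(637, 642), (780, 785), (1341, 1346), (1944, 1949)], LO, HI))    # (637, 642)
```

Because the loop only falls back after every tuple has been checked, a list such as [(100, 200), (500, 700)] yields (500, 700).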

To find the range, I took the minimum and maximum of the first elements of the tuples in the reference list. Then I did a simple iteration.
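The same thresholds can be computed directly from the first elements, which makes the intent explicit. A small sketch (starts is an illustrative name). Tuples compare element by element, so the smallest and largest tuples are also the ones with the smallest and largest first element, and the results agree with min(tag_pos_refin)[0] and max(tag_pos_refin)[0]:

```python
tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

# Collect only the first element (the start) of every reference tuple.
starts = [start for start, _ in tag_pos_refin]
min_threshold = min(starts)
max_threshold = max(starts)

# Lexicographic tuple comparison makes these equal to the question's expressions.
assert min_threshold == min(tag_pos_refin)[0]
assert max_threshold == max(tag_pos_refin)[0]

print(min_threshold, max_threshold)  # 479 1112
```

Note that this range only looks at start values, so the end of (1112, 1578) plays no part in it.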

The code I used:

tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), 
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [[(450, 470)],
              [(100, 200), (500, 700)],
              [(0, 29), (3827, 3856)],
              [(820, 835), (1539, 1554)],
              [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
              [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
               (2158, 2163), (2594, 2599), (2643, 2648)]]


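# Range boundaries: smallest and largest first element among the reference tuples
min_threshold = min(tag_pos_refin)[0]
max_threshold = max(tag_pos_refin)[0]

# Pass 1: a list with a single tuple has only one choice, so append it as-is
for tag_pos in tag_pos_db:
    if len(tag_pos) == 1:
        tag_pos_refin.extend(tag_pos)

# Pass 2: lists with several tuples
for tag_pos in tag_pos_db:
    if len(tag_pos) > 1:
        for j in tag_pos:
            # take this tuple if its start is in range (max_threshold itself is excluded)
            if j[0] in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break
            # otherwise take this tuple if the list's smallest start is out of range
            elif min(tag_pos)[0] not in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break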

print(tag_pos_refin)

Output Obtained

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (100, 200), (0, 29), (820, 835), (622, 635), (637, 642)]

Desired Output

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (500, 700), (0, 29), (820, 835), (622, 635), (637, 642)]
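The two outputs differ only in the tuple chosen for [(100, 200), (500, 700)]. A short trace of the first inner-loop iteration for that list, reusing the question's expressions with the thresholds hard-coded as 479 and 1112, suggests why: the elif tests the smallest start of the whole list rather than whether any tuple is in range, so it fires on (100, 200) before (500, 700) is ever examined.

```python
# Thresholds the question's min()/max() expressions produce for its reference list.
min_threshold, max_threshold = 479, 1112
tag_pos = [(100, 200), (500, 700)]

# First iteration of the inner loop: j is (100, 200).
j = tag_pos[0]
if_branch = j[0] in range(min_threshold, max_threshold)
elif_branch = min(tag_pos)[0] not in range(min_threshold, max_threshold)
print(if_branch)    # False: 100 is below 479
print(elif_branch)  # True: min(tag_pos) is (100, 200), so this tests 100 again

# What the fallback needs to know instead: is *any* tuple of the list in range?
any_in_range = any(t[0] in range(min_threshold, max_threshold) for t in tag_pos)
print(any_in_range)  # True: 500 lies in range(479, 1112)
```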

My question is:

Is there a better way to write this code, or a better logic for finding the range, so that (500, 700) is picked instead of (100, 200)?
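One possible restructuring, sketched under a few assumptions: the range is treated as inclusive at both ends (the question's range(min_threshold, max_threshold) excludes max_threshold itself), every inner list is non-empty, and the lists are processed in their original order instead of single-tuple lists first. next() with a default returns the first in-range tuple or, failing that, the first tuple of the list, which covers conditions 1 to 3 in one expression. The names starts, lo, hi, selected and result are illustrative.

```python
tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [[(450, 470)],
              [(100, 200), (500, 700)],
              [(0, 29), (3827, 3856)],
              [(820, 835), (1539, 1554)],
              [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
              [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
               (2158, 2163), (2594, 2599), (2643, 2648)]]

# Range boundaries from the start values of the reference tuples (479 and 1112 here).
starts = [start for start, _ in tag_pos_refin]
lo, hi = min(starts), max(starts)

selected = []
for candidates in tag_pos_db:
    # First tuple whose start is in [lo, hi]; if none, the list's first tuple.
    chosen = next((t for t in candidates if lo <= t[0] <= hi), candidates[0])
    selected.append(chosen)

# Build a new list so the reference list is not modified while it is being used.
result = tag_pos_refin + selected
print(result)
```

Single-tuple lists need no special case: with only one candidate, next() either finds it or falls back to it. Processing in the original order only changes the output order when a single-tuple list comes after a multi-tuple one; for this data the result matches the desired output.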

(The use case is a bit complicated to explain, but the values of a tuple can be thought of as index positions of words or sentences in a text.)



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
