Filtering tuples within a range from a reference list
I have a reference list of tuples containing different ranges of values.
[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
(667, 690), (667, 690), (479, 508), (1112, 1578)]
I have the following list of lists containing tuples of values, which have to be compared against the reference list.
[ [(450,470)],
[(100, 200), (500, 700)],
[(0, 29), (3827, 3856)],
[(820, 835), (1539, 1554)],
[(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
[(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
(2158, 2163), (2594, 2599), (2643, 2648)] ]
I am trying to pick one tuple from each list which is in the range of tuples present in the reference list.
The conditions I considered are:
If the input list contains no tuple which has a value in the range of the reference list, then any tuple can be taken. For example,
[(0, 29), (3827, 3856)]
is not in the range of the reference list, so I can take any of the tuples. By default I append the first tuple in the list to the reference list.
If a tuple within the range of the reference list is found, then that tuple is appended to the reference list and the search stops for that list. An example is
[(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)]
If more than one tuple is present in the range of the reference list, then the first tuple found is appended to the reference list. An example is
[(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163), (2594, 2599), (2643, 2648)]
Values in a tuple will never be the same, and the second value in a tuple will always be larger than the first value.
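The conditions above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `pick_tuple` is hypothetical, the range is taken to run from the smallest to the largest first value in the reference list, and "has a value in the range" is read as "either value of the tuple falls inside that range".

```python
def pick_tuple(candidates, low, high):
    """Return the first tuple with a value in [low, high), else the first tuple.

    Hypothetical helper: the name and the half-open range are assumptions.
    """
    for start, end in candidates:
        # Either value of the tuple may fall inside the range.
        if low <= start < high or low <= end < high:
            return (start, end)
    # No tuple in the whole list matched, so fall back to the first one.
    return candidates[0]


reference = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
             (667, 690), (667, 690), (479, 508), (1112, 1578)]
groups = [[(450, 470)],
          [(100, 200), (500, 700)],
          [(0, 29), (3827, 3856)],
          [(820, 835), (1539, 1554)],
          [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
          [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
           (2158, 2163), (2594, 2599), (2643, 2648)]]

# Range bounds taken from the first position of each reference tuple.
low = min(start for start, _ in reference)
high = max(start for start, _ in reference)

picked = [pick_tuple(group, low, high) for group in groups]
print(reference + picked)
```

The key point of the sketch is that the fallback to the first tuple happens only after every tuple in a list has been checked, not while the first tuple is being examined.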
The logic I used to find the range is this: I took the minimum and maximum of the values in the first position of the tuples in the reference list. Then I did a simple iteration.
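As a small illustration of that range logic, the bounds can be computed from the first positions explicitly. The names `starts`, `low`, `high` and `in_range` are made up for this sketch. Calling `min()` or `max()` on the list of tuples and then taking element `[0]` gives the same numbers, because tuples are compared by their first element first.

```python
reference = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
             (667, 690), (667, 690), (479, 508), (1112, 1578)]

# Collect only the first position of every reference tuple.
starts = [start for start, _ in reference]
low, high = min(starts), max(starts)


def in_range(value):
    # Same test as `value in range(low, high)` for integers:
    # the lower bound is included, the upper bound is not.
    return low <= value < high


print(low, high)
print(in_range(500), in_range(100))
```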
Code I used is
tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [[(450, 470)],
              [(100, 200), (500, 700)],
              [(0, 29), (3827, 3856)],
              [(820, 835), (1539, 1554)],
              [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
              [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163),
               (2594, 2599), (2643, 2648)]
              ]

min_threshold = min(tag_pos_refin)[0]
max_threshold = max(tag_pos_refin)[0]

for tag_pos in tag_pos_db:
    if len(tag_pos) == 1:
        tag_pos_refin.extend(tag_pos)

for tag_pos in tag_pos_db:
    if len(tag_pos) > 1:
        for j in tag_pos:
            if j[0] in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break
            elif min(tag_pos)[0] not in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break

print(tag_pos_refin)
Output Obtained
[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (100, 200), (0, 29), (820, 835), (622, 635), (637, 642)]
Desired Output
[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (500, 700), (0, 29), (820, 835), (622, 635), (637, 642)]
My question is:
Is there a better way to write this code, or better logic for finding the range, so that the tuple picked is (500, 700) instead of (100, 200)?
(The use case is a bit complicated to explain, but the values of a tuple can be thought of as the index positions of words or sentences in a text.)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
