Efficient solution for dataframe range filtering based on ranges in another dataframe

I tried the following code to find the ranges of one dataframe that are not within the ranges of another dataframe. However, it takes more than a day to run on large files, because the last two nested for-loops compare every row of df1 with every row of df2. Each of my 24 dataframes has around 10^8 rows. Is there a more efficient alternative to the approach below?

For a better understanding of my input and expected output, please refer to this thread: Return the range of a dataframe not within a range of another dataframe

My approach:
First, I built (start, end) tuple pairs from (df1['first.start'], df1['first.end']) and (df2['first.start'], df2['first.end']) so that I could apply the range() function to each pair. Then, for each df1 range, I checked whether it lies within any of the df2 ranges. The edge case is df1['first.start'] == df1['first.end'] (a zero-length range), which is skipped. Finally, I collected the matching indices and used them to select rows from df1.
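Since every range here is built with range(start, end) and the default step of 1, the range_subset test on a non-empty main range reduces to two integer comparisons: [x, y) lies within [m, n) exactly when m <= x and y <= n. A small sketch (the helper name `contained` is hypothetical), with a brute-force check against Python's range() semantics on small integers:

```python
def contained(x, y, m, n):
    """True if the half-open integer range [x, y) lies inside [m, n).

    Assumes x < y; zero-length ranges (x == y) are skipped in the question.
    """
    return m <= x and y <= n


# Brute-force check against Python's range() semantics on small integers.
for x in range(-3, 4):
    for y in range(x + 1, 5):
        for m in range(-3, 4):
            for n in range(-3, 5):
                expected = set(range(x, y)) <= set(range(m, n))
                assert contained(x, y, m, n) == expected
```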

# df1 and temp_df2 are existing pandas DataFrames with 'first.start' and 'first.end' columns

# (start, end) tuple pairs for the main ranges (df1) and the filter ranges (temp_df2)
df1_pairs = list(zip(df1['first.start'], df1['first.end']))
df2_pairs = list(zip(temp_df2['first.start'], temp_df2['first.end']))

def range_subset(range1, range2):
    """Whether range1 is a subset of range2."""
    if not range1:
        return True  # empty range is a subset of anything
    if not range2:
        return False  # non-empty range can't be a subset of empty range
    if len(range1) > 1 and range1.step % range2.step:
        return False  # must have a single value or integer multiple step
    return range1.start in range2 and range1[-1] in range2

indices = []
for idx, (x, y) in enumerate(df1_pairs):  # main list
    if x == y:
        continue  # edge case: skip zero-length ranges (start == end)
    for m, n in df2_pairs:  # filter list: len(df1) * len(temp_df2) checks in the worst case
        # check whether the main range lies within this filter range
        if range_subset(range(x, y), range(m, n)):
            indices.append(idx)  # collect the filtered index
            break  # one matching filter range is enough; avoids duplicate indices

result = df1.iloc[indices]
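One possible way to avoid the nested loop, offered as a sketch under stated assumptions rather than a tested solution for the question's data: sort the filter ranges by start and keep a running maximum of their ends. A main range [x, y) lies inside some filter range exactly when the largest end among the filter ranges with start <= x is at least y, so each df1 row needs a single binary search (numpy.searchsorted) instead of a scan over all of temp_df2. The function name `contained_mask` is hypothetical; it assumes integer bounds and start <= end in every row, and treats zero-length ranges as not contained, matching the question's edge-case handling.

```python
import numpy as np


def contained_mask(starts, ends, f_starts, f_ends):
    """Return a boolean array: is [starts[i], ends[i]) inside at least one
    filter range [f_starts[j], f_ends[j])?

    Assumptions (not from the question): integer bounds and start <= end in
    every row. Zero-length main ranges (start == end) count as not contained.
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    f_starts = np.asarray(f_starts)
    f_ends = np.asarray(f_ends)
    if len(f_starts) == 0:
        return np.zeros(len(starts), dtype=bool)  # nothing to be contained in

    # Sort filter ranges by start; running_max_end[j] is the largest end
    # among the first j + 1 filter ranges in that order.
    order = np.argsort(f_starts, kind="stable")
    sorted_starts = f_starts[order]
    running_max_end = np.maximum.accumulate(f_ends[order])

    # k[i] = number of filter ranges whose start is <= starts[i].
    k = np.searchsorted(sorted_starts, starts, side="right")
    has_candidate = k > 0

    # Largest end among those candidates; [x, y) fits in one of them iff it is >= y.
    best_end = running_max_end[np.maximum(k - 1, 0)]
    return has_candidate & (best_end >= ends) & (starts != ends)
```

This replaces roughly len(df1) × len(temp_df2) comparisons with O((N + M) log M) work. With the question's data, `mask = contained_mask(df1['first.start'], df1['first.end'], temp_df2['first.start'], temp_df2['first.end'])` selects the contained rows as `df1[mask]` and all remaining rows as `df1[~mask]`. Since each df1 row is checked independently, df1 can also be processed in chunks if 10^8 rows do not fit in memory at once.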

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
