Chunking the list for efficient logical comparison

I have the following code that I want to optimize. It produces the correct result quickly, but only for lists of up to about 10^5 elements. My real list contains 2*10^8 elements, and I need to process over 24 similar lists, which takes an enormous amount of time. Could anyone suggest a more efficient solution that improves performance without changing the output?

m = df2['first.start'].tolist()
n = df2['first.end'].tolist()

# the following lists are modified in place by the loop below
c = df3['first.seqnames'].tolist()
temp_c = df3['first.seqnames'].tolist()
c2 = df3['second.seqnames'].tolist()
temp_c2 = df3['second.seqnames'].tolist()
x = df3['first.start'].tolist()
y = df3['first.end'].tolist()
a = df3['second.start'].tolist()
b = df3['second.end'].tolist()

# [m,n] -> df2[start,end]; [x,y] -> df3[first.start,first.end]; [a,b] -> df3[second.start,second.end]
for idx1,i in enumerate(x): # only the first start and end are compared for now
    for idx2,j in enumerate(m):
        if (m[idx2]<=x[idx1]): 
            if (x[idx1]<=n[idx2]): 
                #(start, end) = (n+1,y)
                temp = x[idx1]
                x[idx1] = n[idx2]+1
                a[idx1] = a[idx1] + (x[idx1]-temp)
            else:
                continue
        else:
            if(y[idx1]>=n[idx2]):
                #(start, end) = (x,m-1)
                #(start, end) = (n-1,y) 
                temp1 = x[idx1]
                temp2 = y[idx1]
                temp3 = b[idx1]
                y[idx1] = m[idx2] - 1 
                x.insert(idx1+1, n[idx2]-1)
                y.insert(idx1+1, temp2)
                b[idx1] = a[idx1] + (y[idx1]-x[idx1])
                a.insert(idx1+1, temp3-(y[idx1+1]-x[idx1+1]))
                b.insert(idx1+1, temp3)

                temp_c.insert(idx1+1, temp_c[idx1])
                temp_c2.insert(idx1+1, temp_c2[idx1])

            elif (y[idx1]>=m[idx2]):
                #(start, end) = (x,m-1)
                y[idx1] = m[idx2]-1
                b[idx1] = a[idx1] + (y[idx1]-x[idx1])
            else:
                continue
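The branches above appear to trim or split each df3 interval around a df2 interval. A minimal sketch of that subtraction for a single interval follows. It assumes closed integer coordinates, df2 intervals sorted by start and non-overlapping, and that the intended remainders are [x, m-1] and [n+1, y]. The loop above uses n-1 in the split case, which may or may not be intentional, so this is not guaranteed to reproduce its output exactly. The names `subtract_masks` and `split_pair` are hypothetical helpers.

```python
def subtract_masks(start, end, masks):
    """Return the parts of the closed interval [start, end] not covered by masks.

    masks is a list of (m, n) closed intervals, assumed sorted by m and
    non-overlapping (an assumption; the original loop does not check this).
    """
    pieces = []
    cur = start  # left edge of the part still to be examined
    for m, n in masks:
        if n < cur:
            continue                      # mask ends before the remaining part
        if m > end:
            break                         # mask starts after the interval
        if m > cur:
            pieces.append((cur, m - 1))   # keep the part before the mask
        cur = n + 1                       # resume just after the mask
        if cur > end:
            break
    if cur <= end:
        pieces.append((cur, end))         # keep whatever is left at the end
    return pieces


def split_pair(x, y, a, masks):
    """Map each surviving piece of [x, y] to the second interval starting at a.

    Assumes the second interval has the same width as the first, so every
    piece is shifted by the same offset (a - x).
    """
    return [(s, e, a + (s - x), a + (e - x)) for s, e in subtract_masks(x, y, masks)]


print(split_pair(100, 200, 1000, [(150, 160)]))
```

Building new output lists this way avoids calling `list.insert` in the middle of very long lists, which costs time proportional to the list length on every call.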

The df3 dataframe looks like this:

    first.seqnames  first.start first.end   first.width first.strand    second.seqnames second.start    second.end  second.width    second.strand
0       chr1    11462       11468       7   *   chr1    10882   10888           7   *
1       chr1    11470       11471       2   *   chr1    10890   10891           2   *
2       chr1    11473       11484       12  *   chr1    10893   10904           12  *
3       chr1    11676       11677       2   *   chr1    11096   11097           2   *
4       chr1    11782       11849       68  *   chr1    11202   11269           68  *
... ... ... ... ... ... ... ... ... ... ...
1929046 chr1    249235900   249235941   42  *   chr2B   131613429   131613470   42  *
1929047 chr1    249235943   249235949   7   *   chr2B   131613472   131613478   7   *
1929048 chr1    249236698   249236700   3   *   chr2B   131614226   131614228   3   *
1929049 chr1    249236702   249236708   7   *   chr2B   131614230   131614236   7   *
1929050 chr1    249237320   249237335   16  *   chr2B   131614842   131614857   16  *

The df2 looks like:

    first.seqnames  first.start first.end   3   4   5
3503    chr1    346213      346984      .   0   .
3504    chr1    3135466     3136202     .   0   .
3505    chr1    3190760     3191377     .   0   .
3506    chr1    3354604     3355258     .   0   .
3507    chr1    5388136     5388749     .   0   .
... ... ... ... ... ... ...
4530    chr1    245026995   245027904   .   0   .
4531    chr1    246492153   246492971   .   0   .
4532    chr1    246882492   246883154   .   0   .
4533    chr1    247887347   247888175   .   0   .
4534    chr1    249151889   249152623   .   0   .
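The df2 rows shown are sorted by first.start and do not overlap. If that holds for the whole table (an assumption), the inner loop over every df2 row can be replaced with two binary searches per df3 row. A minimal sketch using the standard-library `bisect` module, with `overlapping_range` as a hypothetical helper name:

```python
from bisect import bisect_left, bisect_right


def overlapping_range(x, y, m, n):
    """Return (lo, hi) so that df2 rows lo..hi-1 overlap the closed interval [x, y].

    m and n are the df2 start and end lists, assumed sorted ascending and
    describing non-overlapping intervals.
    """
    lo = bisect_left(n, x)    # first df2 interval whose end is >= x
    hi = bisect_right(m, y)   # first df2 interval whose start is > y
    return lo, max(lo, hi)


# Values taken from the first df2 rows shown above.
m = [346213, 3135466, 3190760]
n = [346984, 3136202, 3191377]

print(overlapping_range(11462, 11468, m, n))      # first df3 row: no overlap
print(overlapping_range(3136000, 3191000, m, n))  # made-up query spanning two rows
```

Each lookup costs O(log M) instead of O(M). For 2*10^8 rows, the same two searches can likely be applied to whole columns at once with NumPy's `np.searchsorted` using `side='left'` and `side='right'`, and only rows where `hi > lo` need the trimming logic.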

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
