Is there a way to simplify my GFF file? Converting a GFF file to MCScanX input

I got the following gff file:

> sl1   FUN_000001  15679   15897
> sl1   FUN_000001  15952   17031
> sl1   FUN_000001  17086   17316
> sl1   FUN_000001  17371   17454
> sl1   FUN_000001  17508   17702
> sl1   FUN_000001  15679   15897
> sl1   FUN_000001  15952   17031
> sl1   FUN_000001  17086   17316
> sl1   FUN_000001  17371   17454
> sl1   FUN_000001  17508   17702
> sl1   FUN_000002  26991   27390
> sl1   FUN_000002  26991   27390
> sl1   FUN_000002  26991   27051
> sl1   FUN_000002  27104   27390
> sl1   FUN_000002  26991   27051
> sl1   FUN_000002  27104   27390
> sl1   FUN_000003  31856   32689
> sl1   FUN_000003  31856   32689
> sl1   FUN_000003  32432   32689
> sl1   FUN_000003  31856   32365
> sl1   FUN_000003  32432   32689
> sl1   FUN_000003  31856   32365
> sl1   FUN_000004  34247   35148
> sl1   FUN_000004  34247   35148
> sl1   FUN_000004  34856   35148
> sl1   FUN_000004  34247   34802
> sl1   FUN_000004  34856   35148
> sl1   FUN_000004  34247   34802
> sl1   FUN_000005  38975   39306
> sl1   FUN_000005  38975   39306
> sl1   FUN_000005  38975   39001
> sl1   FUN_000005  39064   39306
> sl1   FUN_000005  38975   39001
> sl1   FUN_000005  39064   39306

I need to keep only one row per gene (FUN_*****), running from the smallest start coordinate to the largest end coordinate. For example, for gene FUN_000001:

sl1 FUN_000001  15679   15897
sl1 FUN_000001  15952   17031
sl1 FUN_000001  17086   17316
sl1 FUN_000001  17371   17454
sl1 FUN_000001  17508   17702
sl1 FUN_000001  15679   15897
sl1 FUN_000001  15952   17031
sl1 FUN_000001  17086   17316
sl1 FUN_000001  17371   17454
sl1 FUN_000001  17508   17702

My output must be:

sl1 FUN_000001  15679   17702

I tried to use drop_duplicates in Python, but it only lets me keep the first or the last row.

Could anyone help me?
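
One possible approach, as a minimal sketch rather than a definitive solution: since drop_duplicates can only keep an existing row, the rows could instead be grouped by gene and aggregated, taking the minimum of the start column and the maximum of the end column. The sketch below assumes pandas is available and that the file is whitespace-separated with no header. The column names (chrom, gene, start, end) and the inline sample are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: a few rows taken from the question.
# In practice, pass the path of the real file to read_csv instead.
sample = """\
sl1 FUN_000001 15679 15897
sl1 FUN_000001 15952 17031
sl1 FUN_000001 17508 17702
sl1 FUN_000002 26991 27390
sl1 FUN_000002 27104 27390
"""

# Assumed layout: four whitespace-separated columns, no header line.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    header=None,
    names=["chrom", "gene", "start", "end"],
)

# One row per (chrom, gene): smallest start and largest end.
# sort=False keeps the genes in their order of first appearance.
collapsed = df.groupby(["chrom", "gene"], sort=False, as_index=False).agg(
    start=("start", "min"),
    end=("end", "max"),
)

# Print as tab-separated text without header or index.
print(collapsed.to_csv(sep="\t", header=False, index=False), end="")
```

For the sample rows, FUN_000001 collapses to the span 15679 to 17702, matching the expected output above. Passing a file path as the first argument of to_csv would write the result to disk instead of printing it.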



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
