Python script to remove certain things from a string

I have a file with many lines like this:

>6_KA-RFNB-1505/2021-EPI_ISL_8285588-2021-12-02

I need to convert it to:

>6_KA_2021-1202

All of the lines that require this change start with >.

The 6_KA and 2021-12-02 parts differ from line to line.

I also need to add an empty line before every line that I change in this manner.
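To make the required transformation concrete, here is a minimal sketch that uses plain string methods instead of a regex (an alternative technique, not taken from the post). It assumes that the ID is everything before the first "-" and that every line starting with ">" ends in a YYYY-MM-DD date; the function name convert_header is hypothetical. It does no validation, so a header line without a trailing date would raise an error.

```python
def convert_header(line: str) -> str:
    """Rewrite a '>' header line; return any other line unchanged.

    Assumption (hypothetical): the ID is everything before the first '-',
    and the line ends with a YYYY-MM-DD date.
    """
    stripped = line.rstrip("\n")
    if not stripped.startswith(">"):
        return line  # non-header lines pass through untouched
    prefix = stripped.split("-", 1)[0]            # e.g. ">6_KA"
    year, month, day = stripped[-10:].split("-")  # e.g. "2021", "12", "02"
    # The leading "\n" produces the required empty line before the header
    return f"\n{prefix}_{year}-{month}{day}\n"


print(repr(convert_header(">6_KA-RFNB-1505/2021-EPI_ISL_8285588-2021-12-02\n")))
# prints '\n>6_KA_2021-1202\n'
```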



Solution 1:[1]

UPDATE: You changed the requirements after I originally answered your post, but the code below does what you are looking for. The principle remains the same: use a regex to identify the parts of the string you want to keep, and then, as you go through each line of the file, build a new string from the values the regex captured.

import re

# Captures the ID after ">" (e.g. 6_KA) and the YYYY-MM-DD date at the end of the line.
regex = re.compile(r'>(?P<first>[0-9a-zA-Z]{1,3}_[0-9a-zA-Z]{1,3}).*(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})$')

def convert_file(input_file):
    with open(input_file, 'r') as infile, open('Output.txt', 'w') as outfile:
        for line in infile:
            match = regex.match(line)
            if match:
                # Empty line first, then the shortened header, e.g. ">6_KA_2021-1202"
                outfile.write("\n>" + match.group("first") + "_" + match.group("year") + "-"
                              + match.group("month") + match.group("day") + "\n")
            else:
                # Every other line is copied unchanged
                outfile.write(line)

convert_file('data.txt')
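One possible variation, offered as a sketch rather than as part of the answer: move the regex logic into a generator that takes any iterable of lines, so it can be tried on a few strings without touching the file system. The names HEADER and convert_lines are hypothetical; the pattern ends in $ instead of \n so that a last line without a trailing newline still matches.

```python
import re
from typing import Iterable, Iterator

# Same idea as above: the ID after ">" and the trailing YYYY-MM-DD date.
HEADER = re.compile(
    r">(?P<first>[0-9a-zA-Z]{1,3}_[0-9a-zA-Z]{1,3})"
    r".*(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})$"
)


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield rewritten headers (each preceded by an empty line); other lines as-is."""
    for line in lines:
        match = HEADER.match(line)
        if match:
            yield "\n>{first}_{year}-{month}{day}\n".format(**match.groupdict())
        else:
            yield line


sample = [
    ">6_KA-RFNB-1505/2021-EPI_ISL_8285588-2021-12-02\n",
    "ACGTACGT\n",
]
print("".join(convert_lines(sample)), end="")
```

To use it on real files, pass the open input file straight in, e.g. `dst.writelines(convert_lines(src))` inside a `with open('data.txt') as src, open('Output.txt', 'w') as dst:` block. Because it is a generator, only one line is held in memory at a time.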

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1