Reading files faster in Python
I'm writing a script to read a TXT file where each line is a log entry, and I need to split the log into separate files (one for each component: Hor, Sia, Lmu). Reading each line and distributing it into new files works fine with my test file (80 KB), but when I apply it to the actual file (177 MB, around 500k lines) it takes far too long: after more than an hour it had only read about 80k lines.
The lines are like this:
Crm|Hor|SiebelSeed
Crm|Sia|SiebelSeed
Crm|Lmu|LMU|
Is there any way I can make it run faster?
My code
import re

comp_list = []
with open(path, "r", encoding="UTF-16") as file:
    for i, line in enumerate(file):
        if i < 2:  # the first two lines are headers
            continue
        component = re.match(r"Crm\|([A-Za-z0-9_]+)\|", line).group(1)
        if component not in comp_list:
            comp_list.append(component)
            with open(f'HHR_Splitter/output/{component}.txt', 'w+', encoding="UTF-16") as new_file:
                new_file.write(line)
        else:
            with open(f'HHR_Splitter/output/{component}.txt', 'a+', encoding="UTF-16") as existing_file:
                existing_file.write(line)
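The likely bottleneck is that the code above reopens an output file for every single line, so 500k lines means 500k `open()` calls. A common fix is to open each output file once, keep the handles in a dict keyed by component name, and close them at the end. The sketch below assumes the same path layout and UTF-16 encoding as the question; `split_log` and its parameters are illustrative names, not part of the original code.

```python
import re
from pathlib import Path

def split_log(src_path, out_dir):
    """Split a pipe-delimited log into one file per component.

    Opens each output file once and reuses the handle, instead of
    reopening it for every line as in the original script.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(r"Crm\|([A-Za-z0-9_]+)\|")  # compile once, reuse per line
    handles = {}  # component name -> open file handle
    try:
        with open(src_path, "r", encoding="UTF-16") as src:
            for i, line in enumerate(src):
                if i < 2:  # skip the two header lines
                    continue
                match = pattern.match(line)
                if not match:  # ignore lines that aren't log entries
                    continue
                component = match.group(1)
                out = handles.get(component)
                if out is None:  # first time we see this component
                    out = open(out_dir / f"{component}.txt", "w", encoding="UTF-16")
                    handles[component] = out
                out.write(line)
    finally:
        for out in handles.values():
            out.close()
```

A dict lookup also replaces the `component not in comp_list` scan, which is O(n) per line on a list. With only a handful of distinct components, keeping all handles open simultaneously is safe; if there were thousands, an LRU cache of handles would be the usual compromise.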
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow