Reading files faster in Python
I'm writing a script to read a TXT file where each line is a log entry, and I need to split the log into separate files (one for each component: Hor, Sia, Lmu). Reading each line and distributing it into new files works fine with my test file (80 KB), but when I apply it to the actual file (177 MB, around 500k lines) it takes far too long: after more than an hour it had only read about 80k lines.
The lines are like this:
Crm|Hor|SiebelSeed
Crm|Sia|SiebelSeed
Crm|Lmu|LMU|
Is there any way I can make it run faster?
My code
import re

comp_list = []
with open(path, "r", encoding="UTF-16") as file:
    for i, line in enumerate(file):
        if i < 2:  # the first two lines are headers
            continue
        component = re.match(r"Crm\|([A-Za-z0-9_]+)\|", line).group(1)
        if component not in comp_list:
            comp_list.append(component)
            with open(f'HHR_Splitter/output/{component}.txt', 'w+', encoding="UTF-16") as new_file:
                new_file.write(line)
        else:
            with open(f'HHR_Splitter/output/{component}.txt', 'a+', encoding="UTF-16") as existing_file:
                existing_file.write(line)
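The likely bottleneck is that the code above reopens an output file for every single line, so 500k lines means 500k `open()` calls. A common fix is to open each output file once, keep the handles in a dict keyed by component name, and close them at the end. The sketch below assumes the same path layout and UTF-16 encoding as the question; `split_log` and its parameters are illustrative names, not part of the original code.

```python
import re
from pathlib import Path

def split_log(src_path, out_dir):
    """Split a pipe-delimited log into one file per component.

    Opens each output file once and reuses the handle, instead of
    reopening it for every line as in the original script.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(r"Crm\|([A-Za-z0-9_]+)\|")  # compile once, reuse per line
    handles = {}  # component name -> open file handle
    try:
        with open(src_path, "r", encoding="UTF-16") as src:
            for i, line in enumerate(src):
                if i < 2:  # skip the two header lines
                    continue
                match = pattern.match(line)
                if not match:  # ignore lines that aren't log entries
                    continue
                component = match.group(1)
                out = handles.get(component)
                if out is None:  # first time we see this component
                    out = open(out_dir / f"{component}.txt", "w", encoding="UTF-16")
                    handles[component] = out
                out.write(line)
    finally:
        for out in handles.values():
            out.close()
```

A dict lookup also replaces the `component not in comp_list` scan, which is O(n) per line on a list. With only a handful of distinct components, keeping all handles open simultaneously is safe; if there were thousands, an LRU cache of handles would be the usual compromise.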
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow