Python - fix one file according to another
I'm just starting with Python and I'm having trouble solving this.
My problem is that I have two files: some_video.txt and some_video.srt. The srt files were generated using autosub and they have a bad translation. So I thought I could compare them with my some_video.txt and correct the subtitles.
What I want to do is replace the srt text with the equivalent txt text.
This is txt format:
And srt format:
First, I tried to compare the files, but since the equivalent texts are not on the same lines, I couldn't figure out a way to do this properly.
Then I tried to merge the files with their line indexes, remove the lines from the srt that I don't need, and then replace the text, but it did not work.
This is my latest attempt of many:
merged = []  # collects ['txt' or 'srt', line index, line text] entries

with open('./inputs/nao-sei-quantas-almas-pessoa.txt') as f:
    for i, line in enumerate(f):
        #print(f'\nIndex: {i} \nLine: {line}')
        l = ['txt', i, line]
        merged.append(l)

with open('nao-sei-quantas-almas-pessoa.srt') as f:
    for i, line in enumerate(f):
        #print(f'\nIndex: {i} \nType: {type(line)} \nLine: {line}')
        print(f'\nLine: {line} - {line.isalpha()}')
        l = ['srt', i, line]
        merged.append(l)
Solution 1:[1]
Since the srt file contains subtitle numbers, and assuming there is one line of text per timestamp, you can replace the line of text that sits two lines after each number with the correct line, like so:
srtpath = 'nao-sei-quantas-almas-pessoa.srt'
txtpath = './inputs/nao-sei-quantas-almas-pessoa.txt'

with open(txtpath, 'r') as file:
    correct_text = file.readlines()  # Get the correct subtitles from the text file

with open(srtpath, 'r+') as file:
    srt_lines = file.readlines()  # Read what the srt file currently contains
    # Subtitle numbers start at 1, list indexes at 0 (assumes one txt line per subtitle)
    for i in range(1, len(correct_text) + 1):
        if str(i)+'\n' in srt_lines:  # If there is a corresponding number in the srt file (1, 2, 3)
            # Then replace the srt line with the correct line
            srt_lines[srt_lines.index(str(i)+'\n')+2] = correct_text[i - 1]
        else:  # When there are no more numbers in the file, then stop
            break
    file.seek(0)
    file.writelines(srt_lines)  # Write the correct srt lines back to the file
    file.truncate()  # Drop leftover bytes if the new content is shorter
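If a subtitle can span more than one line, the "two lines after the number" rule above no longer holds. The following is a minimal sketch of one way to relax it, not part of the original answer: the function name `replace_srt_text` and the sample data are hypothetical, and it assumes `\n` line endings, blocks separated by a blank line, and one replacement line per block.

```python
def replace_srt_text(srt_content, new_lines):
    """Return srt_content with each block's text replaced by the matching new line.

    Assumes each block is laid out as: index line, timestamp line
    containing '-->', then one or more lines of subtitle text.
    """
    blocks = srt_content.strip().split("\n\n")
    fixed = []
    for block, new_text in zip(blocks, new_lines):
        lines = block.split("\n")
        # Keep the index and timestamp lines; drop the old text lines.
        header = lines[:2]
        fixed.append("\n".join(header + [new_text.strip()]))
    # Blocks with no replacement line are kept unchanged.
    fixed.extend(blocks[len(fixed):])
    return "\n\n".join(fixed) + "\n"


# Hypothetical sample data, for illustration only.
sample_srt = (
    "1\n00:00:00,000 --> 00:00:02,000\nbad text one\n\n"
    "2\n00:00:02,000 --> 00:00:04,000\nbad text two\nsecond line\n"
)
sample_txt = ["Good text one\n", "Good text two\n"]
result = replace_srt_text(sample_srt, sample_txt)
print(result)
```

Working on whole blocks means a two-line subtitle is replaced as a unit, at the cost of assuming the txt file has exactly one line per subtitle.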
Solution 2:[2]
with open('1.txt', 'r') as srt_file:  # the srt file
    srt = srt_file.readlines()
with open('2.txt', 'r') as text_file:  # the file with the correct text
    text = text_file.readlines()

# Collect the srt lines that hold subtitle text
a = []
for i in srt:
    b = i.replace("\n", '')
    if "-->" not in i and "\n" != i and not b.isnumeric():
        a.append(i)

srt_str = "".join(srt)
# Replace each old text line with the corresponding correct line (first occurrence only)
for i, j in zip(a, text):
    srt_str = srt_str.replace(i, j, 1)
print(srt_str)
Explanation:
- Read the two files.
- Kept only the srt lines that are not purely numeric, are not a bare "\n", and do not contain "-->".
- Paired those lines with the text to be substituted using zip.
- Replaced the strings.
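To try the filtering and pairing steps above without touching any files, the same idea can be run on in-memory lists. This is a hedged sketch rather than the answer's own code: `is_text_line`, `replace_text_lines`, and the sample data are illustrative names. It also swaps the string search for replacement by position, so a corrected line cannot accidentally match an earlier, identical piece of text.

```python
def is_text_line(line):
    """True for srt lines that are neither an index, a timestamp, nor blank."""
    stripped = line.strip()
    return bool(stripped) and "-->" not in line and not stripped.isnumeric()


def replace_text_lines(srt_lines, text_lines):
    """Replace the text lines of srt_lines, in order, with text_lines."""
    replacements = iter(text_lines)
    out = []
    for line in srt_lines:
        if is_text_line(line):
            # Use the next corrected line, or keep the original if we run out.
            out.append(next(replacements, line))
        else:
            out.append(line)
    return out


# Hypothetical sample data, for illustration only.
srt_lines = [
    "1\n", "00:00:00,000 --> 00:00:02,000\n", "helo wrld\n", "\n",
    "2\n", "00:00:02,000 --> 00:00:04,000\n", "gud by\n",
]
text_lines = ["Hello world\n", "Goodbye\n"]
fixed = replace_text_lines(srt_lines, text_lines)
print("".join(fixed))
```

Like the code above, this assumes one corrected line per subtitle text line, in the same order.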
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ukulele |
| Solution 2 | Faraaz Kurawle |


