Python - fix one file according to another

I'm just starting with Python and I'm having trouble solving this.

My problem is that I have two files: some_video.txt and some_video.srt. The srt files were generated using autosub and have a bad translation, so I thought I could compare them with my some_video.txt and correct the subtitles.

What I want to do is replace the srt text with the equivalent txt text.

This is the txt format:

[image: sample of the .txt file]

And this is the srt format:

[image: sample of the .srt file]
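
A standard .srt file is a series of cues: a sequence number, a timestamp line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, one or more lines of text, and a blank line. A minimal sketch (not from the question or the answers; `parse_srt`, `Cue` and the sample text are hypothetical) that splits such a file into cues, assuming blank lines separate them:

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    number: int        # sequential cue number, starting at 1
    timing: str        # e.g. "00:00:01,000 --> 00:00:04,000"
    lines: List[str]   # one or more lines of subtitle text


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues, assuming blank lines separate the cues."""
    cues = []
    for block in re.split(r'\n\s*\n', srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or '-->' not in lines[1]:
            continue  # skip anything that does not look like a cue
        cues.append(Cue(int(lines[0].strip()), lines[1].strip(), lines[2:]))
    return cues


# Hypothetical sample: the second cue spans two text lines
sample = """1
00:00:01,000 --> 00:00:04,000
First subtitle line

2
00:00:04,500 --> 00:00:07,000
Second subtitle line
spanning two lines
"""

for cue in parse_srt(sample):
    print(cue.number, cue.timing, cue.lines)
```

To parse a real file, pass it the result of open(path, encoding='utf-8').read(), assuming the file is UTF-8; reading in text mode also turns Windows \r\n line endings into \n.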

First, I tried to compare the files, but since the equivalent texts are not on the same lines, I couldn't figure out a way to do this properly.

Then, I tried to merge the files along with their line indices, remove the srt lines I don't need, and then replace the text, but it did not work.

This is my latest of many attempts:

merged = []  # Collects ['txt' or 'srt', line index, line text] entries from both files

with open('./inputs/nao-sei-quantas-almas-pessoa.txt') as f:
    for i, line in enumerate(f):
        #print(f'\nIndex: {i} \nLine: {line}')
        l = ['txt', i, line]
        merged.append(l)
with open('nao-sei-quantas-almas-pessoa.srt') as f:
    for i, line in enumerate(f):
        #print(f'\nIndex: {i} \nType: {type(line)} \nLine: {line}')
        print(f'\nLine: {line} - {line.isalpha()}')
        l = ['srt', i, line]
        merged.append(l)


Solution 1:[1]

Since the srt file numbers each subtitle, and assuming there is exactly one line of text per timestamp, you can find each number and replace the line two lines below it (right after the timestamp) with the correct line, like so:

srtpath = 'nao-sei-quantas-almas-pessoa.srt'
txtpath = './inputs/nao-sei-quantas-almas-pessoa.txt'

with open(txtpath, 'r') as file:
    correct_text = file.readlines()  # Get the correct subtitles from the text file

with open(srtpath, 'r+') as file:
    srt_lines = file.readlines()  # Read what the srt file currently contains
    # Loop over the correct text. The loop starts at index 1, so correct_text[0]
    # is never used and cue number i receives correct_text[i].
    for i in range(1, len(correct_text)):
        number_line = f'{i}\n'
        if number_line in srt_lines:  # If there is a corresponding number in the srt file (1, 2, 3, ...)
            # The text sits two lines after the number (the timestamp is in between);
            # make sure the replacement ends with a newline so the layout is kept
            srt_lines[srt_lines.index(number_line) + 2] = correct_text[i].rstrip('\n') + '\n'
        else:  # When there are no more numbers in the file, stop
            break
    file.seek(0)
    file.writelines(srt_lines)  # Write the corrected srt lines back to the file
    file.truncate()  # Cut off leftover old content in case the new text is shorter
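
Solution 1 relies on each cue having exactly one text line; if a cue spans two lines, the second wrong line survives. A block-based variant, sketched here as a hypothetical fix_srt helper, keeps each cue's number and timestamp and replaces all of its text lines with one line of the correct text. It assumes cues are separated by blank lines and that the n-th line of the txt file belongs to the n-th cue (unlike Solution 1, it does not skip the first line of the text file):

```python
import re


def fix_srt(srt_text, correct_lines):
    """Give cue n the n-th line of the correct text, keeping numbers and timestamps.

    Assumes cues are separated by blank lines and that each cue starts with its
    number and its timestamp line; any number of (wrong) text lines may follow.
    """
    blocks = re.split(r'\n\s*\n', srt_text.strip())
    fixed_blocks = []
    for n, block in enumerate(blocks):
        lines = block.splitlines()
        if len(lines) >= 2 and n < len(correct_lines):
            # Keep the cue number and timestamp, swap all old text lines for one correct line
            lines = lines[:2] + [correct_lines[n].strip()]
        # Cues without a matching correct line are kept unchanged
        fixed_blocks.append('\n'.join(lines))
    return '\n\n'.join(fixed_blocks) + '\n'


# Hypothetical sample: the first cue has two wrong text lines
srt_text = """1
00:00:01,000 --> 00:00:02,000
bad one
bad two

2
00:00:03,000 --> 00:00:04,000
bad three
"""
correct_lines = ['Good first line\n', 'Good second line\n']
print(fix_srt(srt_text, correct_lines))
```

For real files, fix_srt(open(srtpath).read(), open(txtpath).readlines()) would give the new contents, which can be written to a separate file so the original srt is kept for comparison.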

Solution 2:[2]

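with open('1.txt', 'r') as srt_file:  # the .srt file
    srt = srt_file.readlines()
with open('2.txt', 'r') as text_file:  # the file with the correct text
    text = text_file.readlines()

# Keep the positions of the lines that hold subtitle text:
# skip cue numbers, blank lines and timestamps (which contain "-->")
a = []
for pos, line in enumerate(srt):
    b = line.strip()
    if '-->' not in line and b and not b.isnumeric():
        a.append(pos)

# Pair each subtitle line with the correct text and replace it at the same position;
# replacing by position avoids str.replace() hitting an identical substring elsewhere
for pos, new_line in zip(a, text):
    srt[pos] = new_line.rstrip('\n') + '\n'

srt_str = ''.join(srt)
print(srt_str)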

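Explanation:

  • Read the two files.
  • Kept only the srt lines that are not cue numbers, blank lines, or timestamps (lines containing "-->").
  • Paired those lines with the lines of the correct text using zip.
  • Replaced each original subtitle line with its corrected counterpart.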
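
The pairing step uses zip, which stops silently at the shorter input: if the txt file has more or fewer lines than the srt has subtitle lines, the extra lines are ignored or left uncorrected without any warning, and a count difference usually means the two files are not aligned line for line. A minimal sketch with an explicit count check (subtitle_text_positions and replace_checked are hypothetical names, and the sample lines are made up):

```python
def subtitle_text_positions(srt_lines):
    """Return the indices of lines holding subtitle text (not cue numbers, blanks or timestamps)."""
    positions = []
    for pos, line in enumerate(srt_lines):
        stripped = line.strip()
        if stripped and '-->' not in stripped and not stripped.isnumeric():
            positions.append(pos)
    return positions


def replace_checked(srt_lines, correct_lines):
    """Swap subtitle text line by line, refusing to guess when the counts differ."""
    positions = subtitle_text_positions(srt_lines)
    if len(positions) != len(correct_lines):
        raise ValueError(
            f'{len(positions)} subtitle lines in the srt, '
            f'but {len(correct_lines)} lines of correct text'
        )
    fixed = list(srt_lines)  # copy, so the caller's list is left untouched
    for pos, new_line in zip(positions, correct_lines):
        fixed[pos] = new_line.rstrip('\n') + '\n'
    return fixed


srt_lines = ['1\n', '00:00:01,000 --> 00:00:02,000\n', 'bad one\n', '\n',
             '2\n', '00:00:03,000 --> 00:00:04,000\n', 'bad two\n']
print(''.join(replace_checked(srt_lines, ['Good one', 'Good two'])))
```

On Python 3.10 and later, zip(positions, correct_lines, strict=True) also raises ValueError on a length mismatch, but the explicit check gives a clearer message.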

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ukulele
Solution 2 Faraaz Kurawle