'How to make an SRT file into a dataset?
Is it possible to turn an SRT file, which is used for subtitles in videos into a dataset?
When imported into Excel, the SRT file format looks like this:
1
00:00:03,000 --> 00:00:04,000
OVERLAPS PURE COINCIDENCE THAT
...
This pattern continues as time in the "video"/transcript goes on. I'd like to format the SRT file this way:
number ; start ; end ; text
1 ; 00:00:03,000 ; 00:00:04,000 ; OVERLAPS PURE COINCIDENCE THAT
Solution 1:[1]
The VBA procedure below loads a standard .srt (SubRip Movie Subtitle File) from a local file and splits it into rows/columns on the active Excel worksheet.
Import SRT subtitles from Local File:
Sub importSRTfromFile(fName As String)
'Loads SRT from local file and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, x As Long
'load file
Open fName For Input As #1
While Not EOF(1)
Line Input #1, sIn
sOut = sOut & sIn & vbLf
Wend
Close #1
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For x = 1 To UBound(sArr)
Range("A" & x) = sArr(x)
Next x
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & fName
End Sub
Example Usage:
Sub test_FileImport()
importSRTfromFile "c:\yourPath\yourFilename.srt"
End Sub
Import SRT subtitles from Website URL:
Alternatively, you can import an .srt (or other similar text files) from a Website URL such as https://subtitle-index.org/ with this:
Sub importSRTfromWeb(url As String)
'Loads SRT from URL and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, rw As Long
Dim httpData() As Byte, XMLHTTP As Object
'load file from URL
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
httpData = XMLHTTP.responseBody
Set XMLHTTP = Nothing
sOut = StrConv(httpData, vbUnicode)
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For rw = 1 To UBound(sArr)
Range("A" & rw) = sArr(rw)
Next rw
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & url
End Sub
Example Usage:
Sub testImport()
importSRTfromWeb _
"https://subtitle-index.org/download/4670541854528212663953859964/SRT/Pulp+Fiction"
End Sub
Many sites host free .srt's; you may have to right-click the download button to copy the link (which may have an .srt extension or might be a pointer, like the example above). The procedure won't work on .zip'd files.
More Information:
Wikipedia : SubRip & SRT
MSDN : Split Function (VBA)
Wikipedia : Newline characters
MSDN : UBound Function
Solution 2:[2]
in the above code :
'breakout into rows
For rw = 1 To UBound(sArr)
Range("A" & rw) = sArr(rw)
Next rw
should be replaced with:
'breakout into rows
For rw = 0 To UBound(sArr)
Range("A" & rw+1) = sArr(rw)
Next rw
else the output will start from line 2
Solution 3:[3]
I used Vim and wrote a quick regex to convert a .srt into a .csv file for a translator friend who needed a similar conversion. The csv file can then be opened in Excel / LibreOffice and saved as .xls, .ods or whatever. My friend didn't need the subtitle numbers to appear in the first column so the regex code looks like this :
set fileencoding=utf-8
%s/"/""/g
g/^\d\+$/d
%s@^\(.*\) --> \(.*\)\n@"\1","\2","@g
%s/\n^$/"/g
Variant to keep the sub numbering :
set fileencoding=utf-8
%s/"/""/g
%s@\(^\d\+\)$\n^\(.*\) --> \(.*\)\n@"\1","\2","\3","@g
%s/\n^$/"/g
Save this code into a text file with the .vim extension, then source this file when editing your .srt in Vim / Gvim. Save the result as a .csv. Enjoy the magic of Regexes !
NB : my code uses commas as field separators. Change the commas into semi-colons in the above code to use semi-colons. I've also added double-quotes as string delimitors in case double-quotes and commas occur in the subtitle text. Much more error proof !
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | vielkind |
| Solution 3 |
