'How to make an SRT file into a dataset?

Is it possible to turn an SRT file, which is used for subtitles in videos into a dataset?

When imported into Excel, the SRT file format looks like this:

1
00:00:03,000 --> 00:00:04,000
OVERLAPS PURE COINCIDENCE THAT

...

This pattern continues as time in the "video"/transcript goes on. I'd like to format the SRT file this way:

number ; start ; end ; text

1 ; 00:00:03,000 ; 00:00:04,000 ; OVERLAPS PURE COINCIDENCE THAT


Solution 1:[1]

The VBA procedure below loads a standard .srt (SubRip Movie Subtitle File) from a local file and splits it into rows/columns on the active Excel worksheet.

Import SRT subtitles from Local File:

Sub importSRTfromFile(fName As String)
'Loads SRT from local file and converts to columns in Active Worksheet

    Dim sIn As String, sOut As String, sArr() As String, x As Long

    'load file
    Open fName For Input As #1
        While Not EOF(1)
            Line Input #1, sIn
            sOut = sOut & sIn & vbLf
        Wend
    Close #1

    'convert LFs to delimiters & split into array
    sOut = Replace(sOut, vbLf & vbLf, vbCr)
    sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
    sArr = Split(sOut, vbCr)

    'check if activesheet is blank
    If ActiveSheet.UsedRange.Cells.Count > 1 Then
        If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
            "Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
            vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
        ActiveSheet.Cells.ClearContents
    End If

    'breakout into rows
    For x = 1 To UBound(sArr)
        Range("A" & x) = sArr(x)
    Next x

    'split into columns
    Columns("A:A").TextToColumns Destination:=Range("A1"), _
        DataType:=xlDelimited, Other:=True, OtherChar:="|"

    MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & fName

End Sub

Example Usage:

Sub test_FileImport()
    importSRTfromFile "c:\yourPath\yourFilename.srt"
End Sub

Import SRT subtitles from Website URL:

Alternatively, you can import an .srt (or other similar text files) from a Website URL such as https://subtitle-index.org/ with this:

Sub importSRTfromWeb(url As String)
'Loads SRT from URL and converts to columns in Active Worksheet

    Dim sIn As String, sOut As String, sArr() As String, rw As Long
    Dim httpData() As Byte, XMLHTTP As Object

    'load file from URL
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.send
    httpData = XMLHTTP.responseBody
    Set XMLHTTP = Nothing
    sOut = StrConv(httpData, vbUnicode)

    'convert LFs to delimiters & split into array
    sOut = Replace(sOut, vbLf & vbLf, vbCr)
    sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
    sArr = Split(sOut, vbCr)

    'check if activesheet is blank
    If ActiveSheet.UsedRange.Cells.Count > 1 Then
        If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
            "Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
            vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
        ActiveSheet.Cells.ClearContents
    End If

    'breakout into rows
    For rw = 1 To UBound(sArr)
        Range("A" & rw) = sArr(rw)
    Next rw

    'split into columns
    Columns("A:A").TextToColumns Destination:=Range("A1"), _
        DataType:=xlDelimited, Other:=True, OtherChar:="|"
    MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & url

End Sub

Example Usage:

Sub testImport()
    importSRTfromWeb _
        "https://subtitle-index.org/download/4670541854528212663953859964/SRT/Pulp+Fiction"
End Sub

Many sites host free .srt's; you may have to right-click the download button to copy the link (which may have an .srt extension or might be a pointer, like the example above). The procedure won't work on .zip'd files.


More Information:

Solution 2:[2]

in the above code :

'breakout into rows For rw = 1 To UBound(sArr) Range("A" & rw) = sArr(rw) Next rw

should be replaced with:

'breakout into rows For rw = 0 To UBound(sArr) Range("A" & rw+1) = sArr(rw) Next rw

else the output will start from line 2

Solution 3:[3]

I used Vim and wrote a quick regex to convert a .srt into a .csv file for a translator friend who needed a similar conversion. The csv file can then be opened in Excel / LibreOffice and saved as .xls, .ods or whatever. My friend didn't need the subtitle numbers to appear in the first column so the regex code looks like this :

set fileencoding=utf-8
%s/"/""/g
g/^\d\+$/d
%s@^\(.*\) --> \(.*\)\n@"\1","\2","@g
%s/\n^$/"/g

Variant to keep the sub numbering :

set fileencoding=utf-8
%s/"/""/g
%s@\(^\d\+\)$\n^\(.*\) --> \(.*\)\n@"\1","\2","\3","@g
%s/\n^$/"/g

Save this code into a text file with the .vim extension, then source this file when editing your .srt in Vim / Gvim. Save the result as a .csv. Enjoy the magic of Regexes !

NB : my code uses commas as field separators. Change the commas into semi-colons in the above code to use semi-colons. I've also added double-quotes as string delimitors in case double-quotes and commas occur in the subtitle text. Much more error proof !

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 vielkind
Solution 3