Removing formatting from srt files

This page describes how to remove formatting, time codes, and closed captions from a .srt subtitle file.

subtitleeditor

subtitleeditor has an option to export as plain text. Simply open the .srt file with subtitleeditor then go to File >> Export >> Export Plain Text.

Pros:

* It's very easy!

Cons:

* It does not strip tags such as <i>.

* Dialogue written on multiple lines within the same time code is kept on separate lines. (In some instances, this can be considered a pro.)

* No blank line is left between the dialogue of consecutive time codes.

For example, this input:

15
00:01:37,460 --> 00:01:41,190
Keep going
till you <i>smell money</i>
or step in chocolate.

16
00:01:42,800 --> 00:01:45,230
Okay. Thank you.

will be output as:

Keep going
till you <i>smell money</i>
or step in chocolate.
Okay. Thank you.

Equivalent CLI command with sed

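Note that the following one-line sed command produces almost exactly the same result as the above; the only difference is that a blank line is left between the dialogue of consecutive time codes. For each line consisting only of digits (the subtitle number), N appends the next line (the time code) to it and d deletes both. The -r option enables extended regular expressions in GNU sed.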

sed -r '/^[0-9]+$/{N;d}' subtitles.srt > dialogue.txt

outputs:

Keep going
till you <i>smell money</i>
or step in chocolate.

Okay. Thank you.
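
The tags that both methods leave behind can be removed with a second sed expression. The following is a minimal sketch rather than a tested recipe: it assumes GNU sed, it assumes that anything between < and > is a formatting tag, and it writes the example above to a temporary file (the variable names sample and stripped are only for illustration). Note that a dialogue line made up only of digits would be mistaken for a subtitle number and removed together with the line after it.

```shell
# Sample input: the two subtitles from the example above, in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '15' \
  '00:01:37,460 --> 00:01:41,190' \
  'Keep going' \
  'till you <i>smell money</i>' \
  'or step in chocolate.' \
  '' \
  '16' \
  '00:01:42,800 --> 00:01:45,230' \
  'Okay. Thank you.' > "$sample"

# First expression: on a line made only of digits (the subtitle number),
#   N appends the next line (the time code) to the pattern space and
#   d deletes the pattern space, so both lines are dropped.
# Second expression: delete anything that looks like a tag, e.g. <i> or </i>.
stripped=$(sed -r -e '/^[0-9]+$/{N;d}' -e 's/<[^>]+>//g' "$sample")
printf '%s\n' "$stripped"

rm -f "$sample"
```

With the sample input, the second line is printed as "till you smell money", and the blank line between the two subtitles is kept.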

Custom script
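
One possible approach is sketched below; it is an illustration under stated assumptions, not an established script. The function name strip_srt is hypothetical. It assumes that subtitles are separated by blank lines, that the first two lines of each subtitle are the number and the time code, and that anything between < and > is a formatting tag. Unlike the methods above, it joins the lines of one subtitle into a single line of text.

```shell
# strip_srt is a hypothetical helper name, not part of any existing tool.
strip_srt() {
  # tr drops carriage returns, in case the file uses CRLF line endings.
  tr -d '\r' < "$1" |
  awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per subtitle, one field per line
  {
    text = ""
    # Field 1 is the subtitle number and field 2 the time code; dialogue starts at field 3.
    for (i = 3; i <= NF; i++) {
      line = $i
      gsub(/<[^>]+>/, "", line)        # remove tags such as <i> and </i>
      text = (text == "" ? line : text " " line)
    }
    if (text != "") print text
  }'
}

# Try it on the example from this page, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '15' \
  '00:01:37,460 --> 00:01:41,190' \
  'Keep going' \
  'till you <i>smell money</i>' \
  'or step in chocolate.' \
  '' \
  '16' \
  '00:01:42,800 --> 00:01:45,230' \
  'Okay. Thank you.' > "$sample"

result=$(strip_srt "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

For the sample input this prints two lines: "Keep going till you smell money or step in chocolate." and "Okay. Thank you.". To process a real file, something like strip_srt subtitles.srt > dialogue.txt should work under the same assumptions.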

Issues related to this page:

* Linux software: "script to remove time codes from srt file" (feature request; status: active; priority: normal; last updated: 8 years 43 weeks; unassigned)

* Programming: "Regex: what does {N;d} mean?" (support request; status: active; priority: normal; last updated: 8 years 43 weeks; unassigned)