Thursday, April 14, 2011

Convert subtitles: ASS to SRT

Illustration based on HotDW


Let's dive back into my big pool of inconsistency. This time, I will serve you a simple (g)awk script that easily converts subtitles from the ASS format to the SRT format.



Recently, I came across some "raw" material: a movie in a language I can't understand, and without subtitles. Online I found some subtitles in the ASS format, but their timing was incorrect. As I don't find the ASS format easy to manipulate, I needed a way to convert them to the SRT format.

If you are on Windows, installing grep and gawk already gets you a long way (*):

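grep Dialogue | \
awk -F, 'BEGIN { TELLER = 1 }   # TELLER is the subtitle counter
         {
           # Fields 2 and 3 hold start and end as H:MM:SS.cc; SRT wants HH:MM:SS,mmm.
           # awk strings are 1-indexed, so the time part starts at position 1.
           printf("%s\n0%s,%s0 --> 0%s,%s0\n",
                  TELLER, substr($2,1,7), substr($2,9,2), substr($3,1,7), substr($3,9,2));
           TELLER++;
           # The text is field 10; it may contain commas, so glue the rest back on.
           DIALOGUE = $10;
           for (i = 11; i <= NF; i++)
               DIALOGUE = DIALOGUE "," $i;
           printf("%s\n\n", DIALOGUE);
         }'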

First, grep keeps only the lines that contain "Dialogue"; awk then rewrites them in the SRT format. Suppose you have a subtitle file named "iam-an.ass" containing the following:
[Script Info]
; Script generated by Aegisub
; http://www.aegisub.net
Title: Neon Genesis Evangelion - Episode 26 (neutral Spanish)
Original Script: RoRo
Script Updated By: version 2.8.01
ScriptType: v4.00+
Collisions: Normal
PlayResY: 600
PlayDepth: 0
Timer: 100,0000
Video Aspect Ratio: 0
Video Zoom: 6
Video Position: 0
 
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: DefaultVCD, Arial,28,&H00B4FCFC,&H00B4FCFC,&H00000008,&H80000008,-1,0,0,0,100,100,0.00,0.00,1,1.00,2.00,2,30,30,30,0
  
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.18,0:00:06.85,DefaultVCD, NTP,0000,0000,0000,,Like an Angel with pity on nobody
Dialogue: 0,0:00:07.00,0:00:09.10,DefaultVCD, NTP,0000,0000,0000,,Nobody nobody nobody
(based on the example from Wikipedia)

Open a terminal and run the following command:
./ass-to-srt.sh < iam-an.ass > iam-an.srt

Here "ass-to-srt.sh" is the script shown above; the subtitles are written to "iam-an.srt". The new subtitle file will look like:
1
00:00:01,180 --> 00:00:06,850
Like an Angel with pity on nobody

2
00:00:07,000 --> 00:00:09,100
Nobody nobody nobody

That will do. Now I can use SubtitleEditor to easily adjust the timing of the SRT-subtitles.
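
A side note on the timestamps: the script above cuts them out by character position, which assumes a single-digit hour (H:MM:SS.cc). The sketch below is one possible alternative that splits on the separators instead, so the hour can have any width. The helper name ass2srt_time is made up for this illustration and is not part of the original script.

```shell
#!/bin/sh
# Sketch: convert an ASS timestamp (H:MM:SS.cc) to SRT (HH:MM:SS,mmm)
# by splitting on ":" and "." instead of using fixed character positions.
# ass2srt_time is a hypothetical helper name, used only for illustration.
ass2srt_time() {
    awk -v t="$1" 'BEGIN {
        # p[1]=hours, p[2]=minutes, p[3]=seconds, p[4]=centiseconds
        split(t, p, /[:.]/)
        # centiseconds * 10 = milliseconds; %02d and %03d add the leading zeros
        printf("%02d:%02d:%02d,%03d\n", p[1], p[2], p[3], p[4] * 10)
    }'
}

ass2srt_time "0:00:01.18"    # prints 00:00:01,180
ass2srt_time "12:34:56.07"   # prints 12:34:56,070
```

Calling awk once per timestamp is slow for a whole file; in practice the same split/printf pair could live inside the main awk script as a function.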

---
(*) If you want the full flavour, try Cygwin or go for a complete Linux distribution, but that seems like overkill for this simple script.

4 comments:

  1. I would add an additional grep for removing the blocks between brackets.

    The new code would start like:
    grep Dialogue | grep -Evo '{(.*)}' \
    awk -F, .....

    Regards

    Replies
    1. Sorry, this is the right one:

      grep Dialogue | grep -Evo '{(.*)}' | \
      awk -F, .....
    2. Great, I will check it out :-)

      Thank you very much!
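
A possible alternative to the grep suggested above: grep selects or rejects whole lines, so it cannot cut a {...} block out of the middle of a line. A minimal sketch with awk's gsub, which removes only the blocks and keeps the rest of the line, could look like this. The name strip_ass_tags is hypothetical, and the sample line with {\i1} is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: remove ASS override blocks such as {\i1} from each input line.
# strip_ass_tags is a hypothetical helper name, used only for illustration.
strip_ass_tags() {
    awk '{
        # [^}]* stops at the first closing brace, so each block is removed
        # separately and the text between two blocks survives.
        gsub(/\{[^}]*\}/, "")
        print
    }'
}

# Made-up sample line with two override blocks:
printf '%s\n' '{\i1}Like an Angel{\i0} with pity on nobody' | strip_ass_tags
```

Under these assumptions it would slot into the pipeline between grep and awk: grep Dialogue | strip_ass_tags | awk -F, .....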