Text based subtitles

Posted: Wed May 11, 2022 5:30 pm
by baishen
Is there a way to have MakeMKV output text-based subtitles in addition to the image-based ones from the disc?

The image-based ones cause Plex to transcode the video and chew up my processor, which means I can't really stream any 4K content with subtitles.

Re: Text based subtitles

Posted: Wed May 11, 2022 10:29 pm
by dcoke22
I don't think so. MakeMKV is focused on making a 1:1 copy of the disc. That said, sometimes DVDs have closed captions (distinct from subtitles) and MakeMKV includes something called ccextractor which outputs text (I think). On most Blu-rays, however, there are only PGS subtitles.

After making a .mkv, you can use other programs to extract those subtitles from the file and convert them to a text-based format like .srt.

Also, different Plex clients handle PGS subtitles differently. Depending on what your client is, you might be able to easily change it to something that supports direct play with PGS subtitles.

I'm not really an expert in subtitles, so take everything I say with a grain of salt.

Re: Text based subtitles

Posted: Wed May 11, 2022 11:29 pm
by baishen
I've tried getting something to convert the image subtitles to text, but all of the guides I've seen so far have been for old, outdated software that doesn't work. I wouldn't mind a two step process. Just looking for a way to do it.

AFAIK, the Shield might display image subtitles without transcoding, but I'm not aware of any client outside of that and even then, I'm not positive.

Re: Text based subtitles

Posted: Thu May 12, 2022 12:12 am
by d00zah
Have you considered a resource like opensubtitles.org? You can usually find text-based subs (both full & forced, often "corrected") in most languages. If you're looking to save time, this is definitely a 'path of least resistance'. This is a resource Emby (& probably other media servers) uses to download missing subtitles. There are other similar sites, as well.

NOTE: NO NEED to create an account to search for/download files.

Re: Text based subtitles

Posted: Thu May 12, 2022 11:53 am
by baishen
Plex has the functionality to automatically download those subtitles. I would just like to convert the existing image-based ones instead of trying to figure out which random subtitle file online has the correct forced subs.

Re: Text based subtitles

Posted: Thu May 12, 2022 12:52 pm
by dcoke22
I run Plex. My main client is the current AppleTV 4K and it can display PGS subtitles without transcoding.

I rarely have occasion to try to turn PGS subtitles into .srt, but when I do, I generally use a combination of the MKVToolNix tools and a free online service for the OCR part.

mkvextract is the utility that can copy out the PGS subtitle track. I also use mkvmerge to learn the track numbers for the command.

For example, to get the track IDs, I use mkvmerge.

Code:

% mkvmerge --identify ./MyMovie.mkv 
File './MyMovie.mkv': container: Matroska
Track ID 0: video (AVC/H.264/MPEG-4p10)
Track ID 1: audio (DTS-HD Master Audio)
Track ID 2: audio (DTS)
Track ID 3: subtitles (HDMV PGS)
Chapters: 10 entries
% 

Now I know the subtitle track is ID 3.
Next, I use mkvextract to copy the subtitles out of the file.

Code:

% mkvextract ./MyMovie.mkv tracks 3:subtitles.sup
Extracting track 3 with the CodecID 'S_HDMV/PGS' to the file 'subtitles.sup'. Container format: SUP
Progress: 100%
% 

Now I've got a file called 'subtitles.sup' on my filesystem that contains just the PGS subtitles. It is ready to OCR however you see fit. I use https://subtitletools.com/convert-sup-to-srt-online but there are other ways to do it, I'm sure.
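
If you want to script the two steps above instead of reading the track ID by eye, something like the following might work. It is only a sketch: it assumes a POSIX shell, assumes mkvmerge prints its track lines in the same English format as the sample output in this post, and the function names (pgs_track_ids, extract_pgs) and the output file naming are made up for illustration.

```shell
#!/bin/sh
# Read `mkvmerge --identify` output on stdin and print the ID of every
# track reported as PGS subtitles, one per line. This depends on the
# line format shown earlier in this post, e.g.:
#   Track ID 3: subtitles (HDMV PGS)
pgs_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (HDMV PGS.*$/\1/p'
}

# Hypothetical helper: copy every PGS track of one .mkv into the current
# directory as <name>.track<ID>.sup (naming scheme is an assumption).
extract_pgs() {
    mkv=$1
    base=$(basename "$mkv" .mkv)
    for id in $(mkvmerge --identify "$mkv" | pgs_track_ids); do
        mkvextract "$mkv" tracks "$id:$base.track$id.sup"
    done
}
```

After sourcing that, `extract_pgs ./MyMovie.mkv` should leave a file like 'MyMovie.track3.sup' for the example file above, ready for whatever OCR step you prefer. If the mkvmerge output on your system is localized or formatted differently, the sed pattern would need adjusting.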

Re: Text based subtitles

Posted: Thu May 12, 2022 2:27 pm
by baishen
Thanks. I'll give that a try.

Re: Text based subtitles

Posted: Sat May 14, 2022 12:36 pm
by Chetwood
Subtitle Edit can open the sub from inside the mkv, but it takes a while, so you might as well extract it with MKV Cleaver or something. Its OCR uses Tesseract and works pretty well, but you'll be asked about a lot of words not yet in the dictionary. When ripping TV shows I often use Subextractor instead, which takes a different approach to OCR and is a lot faster with proper source material.