Subtitling and the complexity of distributing two simple rows of text


Playing out subtitles for video, which often consist of two simple rows of text, can be a complex job, and distributing them is even more so. Subtitling (or captioning) is an ancillary data service in video distribution whose importance ranges from a legal requirement for the hard of hearing to a core part of a broadcaster's business case, as with translated subtitling. What makes it complex are the workflows and distribution formats.

When playing out linear channels, early binding means ingesting subtitles into the video material in advance. This is mostly used for hard-of-hearing subtitles and VOD. The most common standards, EBU Teletext and EIA-608, are old and have many limitations, such as the number of characters per row, restricted character sets and limited formatting. Their main advantage is compatibility. Modern standards such as IMSC are coming into use and remove many of these limitations, for example by using Unicode.
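The row and character limits mentioned above can be made concrete with a small sketch. This is an illustrative check only, not a real ingest validator; the limits assumed here are the commonly cited ones of roughly 40 characters per row for EBU Teletext and 32 for EIA-608, with a typical two-row subtitle layout:

```python
# Illustrative per-row limits for two legacy subtitle standards
# (assumed typical values, not a full reading of either spec).
LIMITS = {
    "teletext": {"max_rows": 2, "max_chars": 40},  # EBU Teletext subtitles
    "eia608":   {"max_rows": 4, "max_chars": 32},  # EIA-608 captions
}

def fits(rows, fmt):
    """Return True if a cue's text rows fit the legacy format's limits."""
    limit = LIMITS[fmt]
    return (len(rows) <= limit["max_rows"]
            and all(len(r) <= limit["max_chars"] for r in rows))

print(fits(["Hello, world!", "A second row of text"], "teletext"))  # True
print(fits(["x" * 41], "teletext"))                                 # False
```

A cue that passes such a check for one standard can still fail for another, which is one reason early-bound material is hard to repurpose across markets.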

For translated subtitling, for example where a broadcaster in a non-English-speaking country transmits English-language material, the time needed for translation often makes late-binding playout the better choice: subtitles are ingested at the transmission point and synchronized to timecode. This allows later delivery of subtitle files and the flexibility to change the transmission without re-ingesting material. But it requires integration between the subtitling playout system and the automation, video playout and transmission systems, where subtitles can be inserted into the picture (burn-in), carried as VBI or VANC data, or transmitted as DVB bitmap subtitles.
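The core of late binding is matching subtitle cues against the running timecode. The sketch below is a simplified illustration, assuming SMPTE-style HH:MM:SS:FF timecodes at a fixed 25 fps and a small list of cues; a real playout system would handle drop-frame rates, overlaps and lookahead:

```python
def tc_to_frames(tc, fps=25):
    """Convert an HH:MM:SS:FF timecode string to an absolute frame count."""
    h, m, s, f = (int(p) for p in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def active_cue(cues, now, fps=25):
    """Return the text of the cue whose [in, out) interval covers 'now'."""
    n = tc_to_frames(now, fps)
    for tc_in, tc_out, text in cues:
        if tc_to_frames(tc_in, fps) <= n < tc_to_frames(tc_out, fps):
            return text
    return None  # no subtitle on air at this timecode

# Hypothetical cue list: (in-timecode, out-timecode, text)
cues = [("10:00:01:00", "10:00:04:00", "First subtitle"),
        ("10:00:05:00", "10:00:08:12", "Second subtitle")]

print(active_cue(cues, "10:00:06:10"))  # Second subtitle
```

Because the cues are keyed to timecode rather than baked into the video, the subtitle file can be replaced right up to transmission without touching the ingested material.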

As channels are distributed, the format often changes: from SDI to DVB to streaming, and so on. This means the subtitle format also changes, requiring translation between code pages, resolutions, timing and more. Some formats, such as DVB Subtitling and SCTE-27, are image based and therefore need optical character recognition (OCR) to convert them to a text-based format.
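One of the simpler conversions mentioned above, retiming, can be sketched in a few lines. This assumes cue positions are expressed as frame counts and that the conversion is a plain rescale between frame rates; real converters must also deal with drop-frame timecode and rounding policy:

```python
def retime(frames, src_fps, dst_fps):
    """Rescale a cue's frame position when the channel's frame rate changes."""
    return round(frames * dst_fps / src_fps)

# A cue at frame 250 in a 25 fps source (10 s in) lands at frame 300 at 30 fps.
print(retime(250, 25, 30))  # 300
```

Code-page and resolution conversions are conceptually similar mappings, but with far more edge cases, and the image-based formats add the OCR step on top.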

Automated subtitling, speech-to-text using voice recognition, and automated translation are the holy grails of subtitling. The quality of speech-to-text is continuously improving, and its low cost compared to human transcription means increasing use, especially where cost was previously prohibitive. Automated translation is the next target. Its adoption will depend heavily on quality requirements, starting where quality is secondary. But trusting automated speech-to-text and translation for watching the Game of Thrones finale on HBO is still many years away.


Peter Sjöström, Head of R&D Subtitling, Edgeware

Peter was formerly Head of R&D at Cavena, which was recently acquired by Edgeware.


Read more: Solution brief TV Subtitles Cavena/Edgeware


Fill out the form below and we will get in touch with you.