Using ASR Tools to Produce Automatic Subtitles for TV Broadcasting. A Cross-Linguistic Comparative Analysis

Elena Davitti; Annalisa Sandrelli; Tomasz Korybski; Yuan Zou; Constantin Orasan; Sabine Braun

2024-01-01

Abstract

This paper discusses the potential use of Automatic Speech Recognition (ASR) tools to produce intralingual subtitles for broadcasting purposes. Two ASR tools (Broadstream and Amberscript) were trialled by a UK broadcaster to produce automatic subtitles for pre-recorded content in English and in Italian. More specifically, the two tools were used to produce automatic intralingual subtitles for a British talk show and for a US feature film dubbed into Italian. Both tools produced a timecoded transcript, and an evaluation study was commissioned to compare their performance on the English and Italian materials. Our evaluation focused on two key dimensions of quality: the accuracy of the transcript and the readability of the subtitles in relation to the needs of a potential audience. Accuracy was assessed quantitatively using an adaptation of the NER and NTR models, originally developed to assess accuracy in live subtitling (Romero-Fresco & Martínez 2015, Romero-Fresco & Pöchhacker 2017). Our adapted version focuses on ASR-generated errors and categorises them by error type (content- or form-related) and by level of severity (minor, standard and critical). Readability was assessed qualitatively by analysing subtitle segmentation, that is, both line breaks and subtitle breaks. Our findings indicate that, in terms of accuracy, Broadstream outperformed Amberscript in English, whereas Amberscript delivered more accurate output in Italian. However, all the ASR outputs analysed fell short of the 98% NER accuracy threshold expected for live subtitling in the broadcasting industry (and, arguably, the accuracy expected of pre-recorded subtitles is even higher, because they are not produced under the same time pressure).
As regards readability, subtitle segmentation and timing were found to be relatively poor in the subtitles produced by both tools in both languages, which lowered overall quality further and called for extensive human post-editing. The combined evaluation of accuracy and readability provided insights into the strengths and weaknesses of each tool for the two languages in question and in relation to the TV genres considered. To sum up, the ASR-generated subtitles from the samples provided by the broadcaster can only be considered an intermediate step in the subtitling process. Our analysis shows that producing broadcast-ready subtitles requires substantial human input both before the tools are applied to the materials (customisation and selection of appropriate settings) and after the ASR has generated the subtitles (human editing).
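The accuracy scoring described in the abstract can be illustrated with a minimal sketch. It assumes the penalty weights of the original NER model (0.25 for minor, 0.5 for standard and 1.0 for the most severe errors) and folds all errors into a single weighted penalty, since the adapted model is said to focus on ASR-generated errors. The abstract does not give the adapted formula, so the weights, the class `AsrError` and the function names below are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Assumption: weights taken from the original NER model; the adapted
# model described in the abstract may use different values.
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "critical": 1.0}
ERROR_TYPES = {"content", "form"}


@dataclass(frozen=True)
class AsrError:
    """One annotated ASR error (hypothetical annotation format)."""
    error_type: str  # "content" or "form"
    severity: str    # "minor", "standard" or "critical"


def weighted_penalty(errors):
    """Sum the severity weights of all annotated errors."""
    total = 0.0
    for err in errors:
        if err.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {err.error_type}")
        if err.severity not in SEVERITY_WEIGHTS:
            raise ValueError(f"unknown severity: {err.severity}")
        total += SEVERITY_WEIGHTS[err.severity]
    return total


def accuracy_rate(word_count, errors):
    """NER-style accuracy in percent: (N - penalty) / N * 100."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return (word_count - weighted_penalty(errors)) / word_count * 100


def penalty_by_type(errors):
    """Split the weighted penalty into content and form components."""
    totals = {error_type: 0.0 for error_type in ERROR_TYPES}
    for err in errors:
        totals[err.error_type] += SEVERITY_WEIGHTS[err.severity]
    return totals


def meets_threshold(rate, threshold=98.0):
    """Check a rate against the 98% threshold cited for live subtitling."""
    return rate >= threshold


if __name__ == "__main__":
    sample = [
        AsrError("form", "minor"),
        AsrError("content", "standard"),
        AsrError("content", "critical"),
    ]
    rate = accuracy_rate(200, sample)
    print(f"accuracy: {rate:.2f}%  meets threshold: {meets_threshold(rate)}")
```

Under these assumed weights, a single critical error costs as much as four minor ones, which is why a transcript with few but severe content errors can fall below the threshold even when most words are recognised correctly.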

Year: 2024

Keywords: automatic speech recognition (ASR), automatic subtitles, prerecorded content, accuracy, readability

Appears in types: 1.1 Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14090/5521
