Using ASR tools to produce automatic subtitles for TV broadcasting: a cross-linguistic comparative analysis
Annalisa Sandrelli
In press
Abstract
The proposed paper discusses the potential use of Automatic Speech Recognition (ASR) tools to produce intralingual subtitles for broadcasting purposes. Two ASR tools (Broadstream and Amberscript) were trialled by a UK broadcaster to produce automatic subtitles for pre-recorded content in English and in Italian. More specifically, the two tools were used to produce automatic intralingual subtitles for a British talk show and for a US feature film dubbed into Italian. Both tools produced a time-coded transcript, and an evaluation study was commissioned to compare their performance on the English and Italian materials. Our evaluation focused on two key dimensions of quality: the accuracy of the transcript and the readability of the subtitles in relation to the needs of a potential audience. Accuracy was assessed quantitatively by using an adaptation of the NER and NTR models, originally developed to assess accuracy in live subtitling (Romero-Fresco & Martínez 2015; Romero-Fresco & Pöchhacker 2017). Our adapted version focuses on ASR-generated errors and categorises them by error type (content- or form-related) and by level of severity (minor, standard and critical). Readability was assessed qualitatively by analysing subtitle segmentation, namely both line breaks and subtitle breaks. Our findings indicate that, in terms of accuracy, Broadstream outperformed Amberscript in English, while Amberscript delivered a more accurate output in Italian. However, all the ASR outputs analysed fell short of the 98% NER accuracy threshold expected for live subtitling in the broadcasting industry (and, arguably, pre-recorded subtitles should meet an even higher standard, since they are not produced under live time pressure). As regards readability, subtitle segmentation and timing were relatively poor in the subtitles produced by both tools in both languages, which further reduced overall quality and called for extensive human post-editing. The combined evaluation of accuracy and readability has provided insights into the strengths and weaknesses of each tool for the two languages in question and in relation to the TV genres considered. To sum up, the ASR-generated subtitles from the samples provided by the broadcaster can only be considered an intermediate step in the subtitling process. Our in-depth analysis has clearly shown that, in order to produce broadcast-ready subtitles, substantial human input is required both before the tools are put to work on the materials (customisation and selection of appropriate settings) and after the ASR has generated the subtitles (human editing).
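For illustration, the minimal Python sketch below shows how a NER-style accuracy score of the kind described above can be computed. It assumes the severity weights of the original NER model (minor = 0.25, standard = 0.5, serious = 1.0; Romero-Fresco & Martínez 2015); the function name, weights and figures are illustrative, as the abstract does not detail the adapted model's exact parameters.

# NER-style accuracy: ((N - weighted errors) / N) * 100, where N is
# the number of words in the subtitles. Weights follow the original
# NER model; the adapted ASR-focused model may differ (assumption).
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "critical": 1.0}

def ner_style_score(word_count, errors):
    """errors: (error_type, severity) pairs, where error_type is
    "content" or "form" and severity is a SEVERITY_WEIGHTS key."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return (word_count - penalty) / word_count * 100

# Hypothetical example: a 1000-word transcript with 40 minor,
# 15 standard and 8 critical ASR errors.
errors = ([("content", "minor")] * 40
          + [("form", "standard")] * 15
          + [("content", "critical")] * 8)
print(f"{ner_style_score(1000, errors):.2f}%")  # 97.45%, below the 98% threshold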