AI and audio description: Scribit.Pro investigates whether artificial intelligence can help produce accessibility features

22 May 2024  
Illustration in the Scribit.Pro colors of a woman putting her arm through computer screens.

Developments in AI are moving at lightning speed. New options seem to emerge every week. Scribit.Pro follows these developments closely. We are investigating whether artificial intelligence can assist us in creating audio description, transcription and subtitling.

We already use artificial intelligence regularly when producing accessibility features for videos. Automatically generated subtitles, for example, let our subtitlers and image describers work faster. Speech-to-text technology converts the speech in a video into text, which can then be placed in the Scribit.Pro editor. This gives our professionals a starting point from which they can create professional subtitles for the deaf and hard of hearing. It saves a lot of time, because nothing has to be typed out from scratch.

Customers who make their videos accessible themselves using the Scribit.Pro software can also use this AI feature. On the dashboard they will find a button that generates automatic subtitles for a video. The subtitles then appear in the editor as subtitle blocks with nearly correct timing. That is a perfect basis for getting started yourself.

This technology is astonishingly good, but not flawless. Names or words are sometimes misheard, for example, or full stops end up in the wrong place in a sentence. And even if the artificial intelligence has understood everything correctly and converted it into text properly, the subtitles still require extensive editing. Few people speak in perfect sentences. So when a speaker in a video says “eh” four times in a row, AI produces a subtitle with the word “eh” four times in a row. AI does not aim for understandable, pleasantly readable Dutch or grammatically correct sentences, but the subtitler/image describer should of course do so. The subtitler must also add speaker identification to the subtitles, as well as music and sound indications.
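As an illustration of the kind of clean-up described above, here is a minimal Python sketch that strips repeated filler words from an automatically generated subtitle line. It is not how the Scribit.Pro editor works; the function name `strip_fillers` and the filler list are assumptions made for this example, and the result is only a first pass that a human subtitler would still review (capitalisation, for instance, is left untouched).

```python
import re

# Hypothetical filler list for this example; a real workflow would tune it
# per language and per speaker.
FILLERS = ("eh", "uh", "um")

# Match a standalone filler word, an optional trailing comma, and any
# whitespace that follows it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and tidy up leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_fillers("Eh, eh, eh, eh, we start today."))
# prints: we start today.
```

The word-boundary anchors keep the pattern from touching words that merely contain a filler, such as "behind" or "the". Everything else mentioned above, such as readable sentence structure, speaker identification and sound indications, still needs a person.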

By producing subtitles for the deaf and hard of hearing, Scribit.Pro makes a video more understandable for people with a hearing impairment. The subtitles not only show the dialogue or what a voice-over says, but also indicate that it is a voice-over, that a doorbell rings or that a dance number starts. Scribit.Pro includes all essential auditory information from a video in the subtitles. We also include this information in the transcript that we produce with each video.

To make video content more accessible for people with a visual impairment or, for example, a cognitive challenge, Scribit.Pro produces audio description in the form of an (artificial) voice that describes the images. But can AI also do work in the field of audio description? Can artificial intelligence facilitate, speed up or improve audio description production? Is AI able to convert text that appears on screen in a video into text that can be spoken by the audio description voice? Scribit.Pro would also benefit greatly from an AI model that can interpret moving images and convert them into an image description. This type of artificial intelligence is called generative AI. Scribit.Pro is busy testing various AI models that could be integrated into its own software. We want to investigate whether AI can make the production of accessibility files better and more efficient, and so support image describers in their work.

AI can be a useful assistant for Scribit.Pro's professionals and customers: a fast-working intern who needs guidance, correction and supervision, and who should not be given final responsibility. Making video accessible remains professional, human work.

Keep an eye on our social media if you want to stay informed about the AI features of Scribit.Pro.
