Hello dear /promptcollective readers - first of all happy new year… ouch it’s been a while! Hope you are all doing well and engaging in AI explorations and adventures… I have been very busy with different projects (will be able to share soon!).
But until I am back with a text here, I have a super interesting essay for you from guest author Mats Meisen of the AI dubbing startup voicelayer.ai! He writes about the role of AI dubbing in making stories universally accessible.
I hope you will enjoy the read - I sure did…
The train rattles softly as I make my way back to the small village where I grew up. It’s Christmas in two days: not only the season of families coming together and having a beer with old friends, but also the season of binge-watching. Every year, The Lord of the Rings, Harry Potter, and other classics climb the top lists on Netflix and Amazon. My girlfriend sits next to me, binge-watching a series in German on her smartphone, probably just to make the long ride pass faster. From Berlin, it takes six hours to reach the village.
The village, especially in the early 2000s, felt cut off from the rest of the world. You either played for the local football club, rode horses, or joined the traditional brotherhood. These weren't for me, but I found my windows to the world: the TV in the living room and the computer in the office. I spent most of my childhood exploring the wider world through these powerful portals. I was raised in a household with Marlboro on the table and Coca-Cola bottles in the fridge, traces of my parents’ time living in the U.S., where they experienced the vibrant cultures of cities like Washington and New York. Yet it was German voices that brought Hollywood blockbusters to life in our tiny corner of the world.
But there was another kind of longing. As a teenager, I’d buy VHS tapes and, later, DVDs of American skateboarding videos. I couldn’t understand a word, but I watched them over and over, desperately searching for a way to feel part of a culture that resonated deeply with me. I saw people who traveled the world, who had a radical lifestyle and a love for filmmaking. It was something I wanted to be part of. Even if I had met my heroes, I wouldn’t have been able to speak with them. I wished I could understand the jokes they were making, the stories they were sharing. I remember laughing out loud at their humor, even though I had no idea what they were saying. I wanted to belong to something bigger, something that didn’t stop at national borders, something global.
A Foundation Where It Matters
Fast forward to today. I’m building my startup for an AI dubbing solution. The mission was immediately clear: building a model to mimic the original performance across languages.
The goal is original content in native tongues, regardless of where you are from. Most AI startups today rely on wrapping large language models into business cases, solutions that larger companies can easily replicate in the future, often with better resources. As these giants continue to dominate, startups in this space may struggle to justify their existence.
The true opportunity for AI startups lies in building foundation models that address specific, impactful problems—like dubbing.
Audio deep tech and speech synthesis form a field where it’s relatively feasible to train something with limited resources. Competing in the race for the best LLMs no longer makes sense; the tech giants have the funding and the infrastructure, including extensive datasets, to maintain their lead. But giants like Meta, OpenAI and Google will not build specifically for these special industry cases, as they won't divert their resources from broader innovations. They prefer to provide tools like Google Cloud, AWS and others to empower startups and developers to build specialized solutions.
Dubbing has been underappreciated, particularly by tech innovators in places like California. They don’t experience the same cultural reliance on it, likely because the workforce involved in dubbing is decentralized across the globe. Dubbing is just one of many components involved in producing a film, yet it is oddly overlooked despite being a crucial tool for enabling local content to reach global markets.
For many years, AI dubbing has relied on TTS (text-to-speech) technology, but TTS falls significantly short when it comes to acting and creative storytelling. Dubbing is an intellectually demanding craft. It’s not just about translation; it’s about performance. The holy grail here is to make computers act.
We have recently seen creators and influencers like MrBeast experimenting with multilingual content on social media, speaking German, French, Italian, Thai, Hindi, and other languages. However, much of this work is still primarily human-led or uses basic automation rather than advanced AI-driven dubbing tools. It works okay for social media.
But it’s far from acceptable for dramatic performances. One reason for that is the automation. These one-button solutions often ignore the nuances of the original performance, producing translations that may only approximate the desired tone, voice transfer, and timing. If it doesn’t work—press again.
While many AI solutions have struggled to meet the nuanced demands of dubbing, advancements in voice cloning and prosody modeling are beginning to close the gap.
Modern AI systems are now capable of capturing elements like tone, pitch, and emotional direction, which are crucial for maintaining the integrity of a performance. However, the technology is still far from replacing the human actor, which underscores the need for more advanced and directed models.
A Local Story, Told for Global Audiences
As the demand for localized content continues to grow, the industry is struggling to keep up. Streaming has turned global storytelling into a dinner-table topic, with audiences increasingly expecting high-quality dubs in multiple languages. English dubs in particular are a recent development. The most prominent examples of this cultural shift are the hit series Squid Game (South Korean), Dark (German), and Money Heist (Spanish). This shift makes sense from an industry perspective, but the challenges are immense.
AI dubbing will not only change the technology people use to distribute their films; it will also change the way people design content and will enable other ways of storytelling. Traditionally, filmmaking is very local. The exceptions are the big Hollywood studios, which distribute their content globally, though it is heavily influenced by U.S. American culture. I interviewed a German scriptwriter who is very optimistic about upcoming technologies, as he believes the job will change from writing for a German audience to writing for a wider international audience. This will completely transform storytelling in German cinema, while retaining enough diversity for each film to remain a uniquely German story.
At its core, dubbing is about subjective, creative decisions. It’s not just translating words; it’s about capturing the tone, emotion, and cultural nuance of the original performance.
This is why creatives must stay in the loop. After all, is there ever an objective "right" or "wrong" in art? These subjective choices are what make storytelling resonate across borders, and AI can enhance this process by providing tools that empower these decisions without replacing the human touch.
Subtitles vs. Dubbing
There’s a divide in how people view dubbing. Cinephiles in the Berlin bubble often dismiss it, preferring subtitles to preserve the “original” experience. Meanwhile, for working-class audiences, dubbing serves as a gateway to escapism, offering a chance to forget their everyday lives and immerse themselves fully in a story. Unlike subtitles, which require constant focus and divide attention, dubbing allows for a seamless viewing experience, making narratives more relatable and accessible. By translating not just language but emotion and tone, dubbing connects people to global narratives they might otherwise never access.
Where Traditional Dubbing Fails (Spoiler Alert)
Take the hit series Succession, for example. I personally enjoyed the German dub very much, though others around me criticized a lack of authenticity in the characters. This was especially noticeable with the frequent swearing and cynicism, which are difficult to translate effectively.
Though I really enjoyed watching the series in German, I had to stop after season 3. The most important character in the series, Logan Roy, around whom all other characters revolved, had gotten a new voice actor. I tried to continue with the new voice but couldn't make it past the first episode. What happened makes the story even sadder: Succession begins with Logan Roy suffering a stroke, which sets up the central conflict for the whole series, a power struggle among his children over the company's leadership.
The German voice actor Erich Ludwig, who had lent his voice to Logan for the first three seasons, passed away in 2022, before Logan Roy's death in the series. Credibility is the biggest asset in dubbing, and although the new German actor Hans Bayer was talented and matched Logan's tone and pitch well, my perception of Logan Roy became confused. I simply couldn't believe it was really him speaking anymore.
The Future of Storytelling
I still believe in the mantra of diversity, though our times are leading to... somewhere else.
I think unity can be created through common narratives. We have to make sure the right narratives are shared. Entertainment has always made an effort to build its stories around values. It’s inherent in great storytelling. Don't defend the action; defend what you stand for.
Building AI dubbing technology isn’t just about improving tools; it’s about creating connections. It’s about taking that teenager watching skateboarding videos and giving them a voice they can understand. It’s about enabling filmmakers and creatives to share their work without compromise.
So when we talk about AI in storytelling, let’s think beyond the obvious. This isn’t just a technological advance; it’s a cultural one. We’re not just changing how films, series, and games are made—we’re changing how we see and understand each other. This is the future of storytelling.
Who Are We?
The /promptcollective was founded by Jes Brandhøj (Denmark) and Hannes Jakobsen (Germany). We're on the lookout for like-minded enthusiasts. If you're passionate about the AI-creative nexus, reach out!