D-ID launches an AI video translation tool that includes voice cloning and lip sync

Written by
Sarah Perez
Published on
Aug. 21, 2024, 6:33 p.m.

AI video creation platform D-ID is the latest company to ship a tool for translating videos into other languages using AI. Unlike tools that only dub the audio, D-ID’s also clones the speaker’s voice and alters their lip movements to match the translated words as part of the AI editing process.

The technology stems from D-ID’s earlier work, which you may recall from the viral trend a few years ago in which users animated old family photos and, later, made those photos speak. On the back of that success, the startup closed a $25 million Series B round in 2022 with an eye on serving its growing number of enterprise customers in the U.S. who were using its technology to make AI-powered videos.

With the company’s newly launched AI Video Translate tech, currently offered to D-ID subscribers for free, creators can automatically translate their videos into other languages to expand their reach. The tool currently supports 30 languages, including Arabic, Mandarin, Japanese, Hindi, Spanish and French. A D-ID subscription starts at $56 per year for the cheapest plan, which includes the smallest number of credits to use toward AI features, and goes up to $1,293 per year before shifting to enterprise pricing.

D-ID suggests the new AI video technology could help customers save on localization costs when scaling their campaigns to a global audience in areas like marketing, entertainment, and social media. The technology will compete with other solutions for both dubbing and AI video.

For years, dubbing technologies have made it easier for video viewers to listen to audio in their own language, but they were often inaccessible to smaller creators. That’s been changing as companies have improved access to the technology. For example, YouTube released a multi-language audio feature designed to help its creators connect with a wider audience by translating their videos into other languages. Well-known creator MrBeast (Jimmy Donaldson) was among the early adopters, having used the tech to bring several of his popular videos to 11 more languages.

With AI, the ability to create, translate, or clone voices is also expanding. Microsoft this year announced it would use AI to translate and dub YouTube and other videos while you watch. In July, creator platform Vimeo unveiled tools that not only translate audio and captions but do so by replicating the speaker’s voice with AI. Numerous companies also offer voice cloning or AI translation tools (or sometimes both), including Descript, ElevenLabs, Speechify, Veed, Camb.ai, Captions.ai, and Akool, to name a few. Others, like HeyGen and Deepbrain AI, offer tools that let you create videos using AI avatars that can speak dozens of languages.

Dubbing and lip sync AI libraries, like Wav2Lip, have also made it easier for startups to build these sorts of tools, which they pitch to creators as an easier, and perhaps more affordable, way to use AI technology.

D-ID says its new Video Translation technology will be available through D-ID Studio and its API. A one-month trial is being offered and further demos are on its website.
