Originally published on Medium.


There are now so many lip-sync / AI avatar / talking-face generation companies. Who is who?


The Definitive Guide to Lip Sync Companies

The field of lip-sync is taking off at an exponential rate. While this is very true in academia, with the number of published papers becoming overwhelming, it is also true of the commercial companies that have spun out from this tech. In this guide, I will list some of them and briefly overview each. I’ll also include prices and links for those interested in trying them. I’ve tried to include my best estimate of the tech they use; please note that this is just an estimate based on the companies’ web pages unless otherwise specified. I may earn a commission from some of these links; I’ve marked those with (aff).

If you run, or know of, a lip-sync company that I’ve not mentioned here, please let me know and I’ll update the list. This list is updated regularly but may be somewhat out of date at any given time.

Synthesia

Synthesia is by far the largest player in the market, valued at over $1bn. With a research team putting out several top-tier papers every year, the tech is only going to get better. It’s a bit costly, but the quality is getting very good.

What they do: AI presenters. They generate the lip motion for specific people (you can make custom avatars for a price).

Prices: Starter tier at £17/month, Creator at £52/month. The former gets you 2 hours of video per year, the latter 6. That comes to around $2.20/minute for both tiers. The higher tiers include a webcam avatar.
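For anyone who wants to check the per-minute figures in this guide, here is a minimal sketch of the arithmetic for the Synthesia tiers. It assumes the monthly price is paid for all 12 months and uses an exchange rate of roughly 1.27 USD per GBP, which is my own assumption rather than a quoted figure; the function name `cost_per_minute` is just illustrative.

```python
def cost_per_minute(monthly_price, minutes_per_year, fx_rate=1.0):
    """Annual subscription cost divided by the yearly video allowance."""
    annual_cost = monthly_price * 12 * fx_rate
    return annual_cost / minutes_per_year


# Assumed GBP -> USD rate (not stated anywhere in the pricing pages).
GBP_TO_USD = 1.27

# Starter: £17/month for 2 hours of video per year.
starter = cost_per_minute(17, 2 * 60, GBP_TO_USD)
# Creator: £52/month for 6 hours of video per year.
creator = cost_per_minute(52, 6 * 60, GBP_TO_USD)

print(f"Starter: ${starter:.2f}/min, Creator: ${creator:.2f}/min")
```

Under these assumptions both tiers land at roughly $2.20 per minute. The same calculation can be reused for the other companies, swapping in their monthly price and video allowance.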

The tech: Early Synthesia was almost certainly built around deferred neural rendering. Recently, their associated tech team has been focusing on Neural Parametric Head models; I expect they’ll be using them soon if they aren’t yet. Overall, Synthesia has the most mature tech.

Try them out here (aff)

HeyGen

What they do: HeyGen describes its offering as “Visual Storytelling.” In reality, this is also the AI avatar market for sales, marketing, and training videos. HeyGen has focused more on custom avatars, but it’s also starting to offer real-time avatars.

Prices: Prices range from $24/month ($1.60 per minute) to $360/month for their top-tier offering. The plans offer more custom avatars than competitors.

The tech: HeyGen’s tech is likely focused on fine-tuning. They’ll have a generic model that they fine-tune for each custom avatar. I think it’s a 2D model rather than Neural Rendering based on the feel and marketing.

Try them out here

Flawless

Flawless is a fast-growing company with excellent tech. They, like Synthesia, regularly publish papers at top conferences. Chances are you won’t be able to use their system, though, as they focus on dubbing Hollywood films.

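What they do: Lip-sync for dubbing films, delivered as custom solutions for film and advertising rather than as a consumer product.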

Prices: Not really applicable. They provide custom solutions for film and advertising, which are not available to the general consumer.

The tech: Most of the marketing material shows 3D models overlaid on the face. This means they are probably using some 3DMM-based model.

Try them out here

DeepReel

What they do: DeepReel is another AI avatar company and one that I am quite biased towards, having worked on the tech myself in the past. They offer avatars for sales and marketing campaigns with integrations into Canva and Adobe Express.

Prices: Prices remain at a fairly consistent ~$1.90 per minute across tiers, with plans ranging from $5–200 per month. Custom avatars cost $149 for a webcam avatar and $499 for a studio one.

The tech: As I once worked here, I can’t divulge too much. But from the papers I have published with them, I can say the tech is 3DMM-based.

Try them out here

Colossyan

What they do: Colossyan offers AI presenters focused on internal communications and education. They also have a feature that converts PDF or PowerPoint documents into AI avatar videos.

Prices: The starter plan costs $19/month ($1.90/minute of video), and the pro plan goes up to $158/month ($1.75/minute of video). The pro plan includes a webcam avatar.

The tech: There’s not a lot to go on here. Usually, companies show marketing with a 3D overlay if they use 3D models, and Colossyan doesn’t. The general feel of the avatars is more 2D as well. However, they’re in the same price tier as other competitors using neural rendering, so it’s hard to tell.

Try them out here (aff)

Pipio

What they do: Pipio offers AI avatars for sales and marketing, as well as internal communications.

Prices: Pipio offers only one default price point: $20/month, which comes to $1.75/minute of video. Custom avatars are an add-on: $50 for webcam and $449 for studio.

The tech: Pipio also doesn’t directly offer any hints as to how their model works, but in my opinion it has a 3DMM feel to it.

Try them out here (aff)

SyncLabs

SyncLabs is a YC-backed company spun out by the creators of Wav2Lip. It is likely heavily influenced by their follow-up research work. It is on the cheaper end of available models, with more flexibility but generally lower quality.

What they do: SyncLabs are slightly different from many other companies here. Rather than offering avatars, they allow the lip-sync of any video to any audio. The outputs are lower quality than the predefined avatars but offer a lot more flexibility.

Prices: SyncLabs break from the ~$1.50–2.00 per minute pricing of most other companies: the cheapest plan, at $19/month, comes to $0.95/minute. If you’re willing to spend $999/month, that comes right down to $0.40/minute.

The tech: As SyncLabs is spun out by the Wav2Lip authors, it is almost certain that it builds on this line of tech. Most likely it is related to Wav2LipHQ, although with significant modifications.

Try them out here

D-ID

What they do: D-ID are one of the first companies in the field. They target training videos, sales and marketing, although you might be familiar with some of the TikTok memes made with their early tech.

Prices: D-ID is a bit cheaper than competitors per minute. Plans range from $6–360 per month, and a minute costs around $0.50.

The tech: Judging from the early days of D-ID memes, they likely use some form of image-based warping to generate their avatars (the kind that can work from a single image).

Try them out here

Tavus

What they do: Tavus provides AI replicas, with a focus on custom avatars for sales, marketing, education, and social media. They also emphasize APIs for developers to integrate Tavus’ replicas into their products.

Prices: Tavus offers two plans. The cheaper is $39/month ($1.56/minute), which includes 3 custom avatars. The more expensive includes 7 and is $199/month ($1.32/minute).

The tech: One nice thing about Tavus is that they have been very open about their tech stack. A Medium article by one of their employees goes into detail, but TL;DR: they use a 3DMM plus NeRFs (switching to Gaussian Splatting) and GANs.

Try them out here

GanAI

What they do: GAN AI also offers avatars that focus on replacing single words rather than generating entire sentences. This works well for things like email campaigns.

Prices: GAN AI does not provide prices on its website but instead asks potential clients to book a demo.

The tech: Given that only individual words are replaced, it’s harder to get a feel for how the tech works. However, it appears to be 2D-based.

Try them out here

VEED IO

What they do: VEED IO is more of a platform company than a company specifically focused on AI avatars. They offer additional products such as automatic subtitles and video editing software.

Prices: Only one of their price tiers, the business tier, offers AI avatars. This costs £55/month (~$70) and comes to $3.50/minute. However, avatars are only one of the features offered in this tier, so it’s not an entirely fair comparison.

The tech: Very little information about the AI avatars is available here.

Try them out here (aff)

Captions

What they do: Captions focus on end-to-end video production, including AI video generation, editing, and distribution.

Prices: Captions offer three tiers, but the pricing is not clear. You can generate videos on their app for free, but it appears you must subscribe to their max tier ($25/month) to download them.

The Tech: Captions use a combination of Diffusion, NeRFs, and Gaussian Splatting, depending on the specific model being used.

Try them out here

Some Insights

The first thing to notice is that this market is becoming extremely crowded. These companies are not the only ones; they are just the ones I have encountered. What’s more, there is very little to distinguish many of them. Many offer avatars that stand there and speak the provided text, at a price point that’s very often between $1.70 and $1.90 per minute. If you are considering starting your own lip-sync company, I would advise you to develop a new USP or face tough competition.

If you removed all branding and formatting from each company’s page, in many cases you could not tell the difference.

Despite similar use cases, companies offer a wide range of exciting technologies. There is a mix of work across the different forms of lip-sync models. Both 2D and 3D approaches are used, and deferred neural rendering, NeRFs, Gaussian splatting (early stages), and Wav2Lip-like models are all represented.

Some of the largest tech companies are showing considerable interest in lip-sync models. EMO and VASA-1, from Alibaba and Microsoft respectively, show this clearly. They may be able to outcompete the startups if they decide to develop products in this space. There is evidence that lip-sync models follow the same scaling laws as the rest of ML. Imagine if Google trained a several-billion-parameter model on YouTube!

We will likely see some consolidation in the next year or two. The larger companies may buy smaller ones to acquire the limited pool of experienced lip-sync developers. FAANG-level companies may also consider this if they enter the market; see above.

By Dr Jack Saunders on May 25, 2024.

Canonical link

Exported from Medium on March 18, 2026.