The Science Of Video Captions: How They Impact Audience Retention


You might have noticed that content has recently made a major format shift, with subtitled short-form clips having the internet's attention in a chokehold. Here's why everyone's all about video captions these days.

Short-form captions displayed a few words at a time, in a bold enough font, can increase audience retention. More importantly, they can also improve content comprehension and reading speed. Not only do video captions drive engagement, but they also drive home your content's message.

This article is the most comprehensive, well-cited, and informative resource on the power of video captions in the audience retention context. Make sure to bookmark it for future reference as well.

In this post, you will find out how captions used by content creators are different from movie captions. You will also discover five ways in which captions can help your content alongside five best practices to optimize your captions for maximum viewer retention.

By the end of this post, you'll know the caption fonts, size, color, and type to pick for your videos. But first, let's go over the difference between conventional subtitles and social media content captions.


Subtitles: The Three Use Cases

Almost all pre-2016 resources on subtitles have become partially or completely irrelevant because the most common use case for captioning has shifted. In the earliest era of subtitling, its use was limited to translation. With the evolution of the global media sphere, the role of subtitles has expanded. Today, subtitles are chosen for one of three reasons: translation, clarity, or engagement.

Translation

Most resources on subtitles are limited to translation, as it is the original purpose of content captioning. In the early 2000s, Japanese content, mainly anime, became the greatest beneficiary of subtitles. Currently, Korean media is experiencing a similar cross-cultural boom.

From Korean dramas to Japanese anime, content that has cross-cultural appeal uses subtitles to overcome the language barrier. Subtitling in this context must cover one sentence at a time, be at the bottom of the screen, and shouldn't be distractingly attention-grabbing. But unless you're creating content for a foreign audience, you do not need to adhere to these practices.


Clarity

With the advent of Web 2.0 and the consequent revolution in content creation, everyone everywhere is creating content. And not every creator who makes English content is a native English speaker. The language barrier has now become an accent barrier.

Subtitles can be used in long-form content to lend clarity to the speaker. Even in this context, the subtitling best practices state that one should use sentence-long subtitles that aren't too distracting.

Audience Retention

While subtitles have traditionally been used for translation and clarity, most creators have started using them for something else: engagement. The TikTok effect refers to the shortening of the human attention span. It doesn't matter if TikTok has made our attention spans shorter or if TikTok has succeeded because our attention spans are short. The point is our attention spans are short.

You might have noticed content on Reels and TikTok with a split screen featuring Subway Surfers gameplay. Initially, it was used to avoid copyright strikes when uploading clips from TV shows. But now, it is a popular content format, especially in "storytime" content. And the reason, at least according to Dr. Natalie Coyle, is the engagement bump it gives because of "visual tactility".

Subtitles in short-form content are closer to the Subway Surfers split screen than they are to traditional long-form subtitles. And the reason is simple: they are meant to entrance the viewer, not translate or clarify the audio. That is why you cannot opt out of using captions just because your audio is "clear."

In an engagement-driven content landscape, most creators are resorting to subtitles as engagement-bumping tools. If you want your content to outdo theirs in audience retention, you have to use the same tools. But to use a tool properly, you should know how it works.

How Do Subtitles Impact Audience Retention? Here's The Science

Knowing how video captions improve audience retention will help you make the right captioning choices to maximize your engagement rate. In this section, you'll find the science of subtitles.

  • Subtitles can make learning effortless - Self-improvement is a multi-billion dollar industry thanks to social media. While people want to consume content to improve themselves, they don't want to put in the effort to learn it. Neil deGrasse Tyson and Dr. Huberman are educators from two different disciplines, but their rise proves that the easier you make it for people to learn, the higher you rise as a social media educator.

  • Subtitles can make content digestible - Subtitles that appear in short phrases, aligned with the spoken word, can help learners comprehend and retain information better. A study, published in Theory and Practice in Language Studies, finds that subtitles can make it easier for people to digest spoken-word content.

  • Shorter subtitles are easier to follow - The collective hive of over a million creators has found this via trial and error. But finally, there is enough science to confirm that short subtitles are easier to follow. If you put long sentences in your content captions, you make your audience "read." On the other hand, when you use short, snappy subtitles, you make your audience "catch" words that appear on the screen. In a 2013 study, Kruger et al. found that subtitles that appear in short segments can improve reading speed and comprehension compared to longer, sentence-length subtitles. One can correlate these findings with their own viewing experience to conclude that subtitles that appear in short, incremental segments can improve comprehension and retention of spoken information.

  • Snappy subtitles increase engagement - Given that subtitles, when limited to a few words at a time, make learning easier and can be followed more closely, one can surmise that they're bound to improve engagement. Multiple studies have found that short subtitles can enhance attention, retention, and overall engagement with videos.
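
To make the "few words at a time" idea concrete, here is a minimal sketch of how a transcript could be split into short caption segments. The function name `chunk_captions`, the three-word default, and the rule of also breaking at punctuation are illustrative assumptions, not something taken from the studies mentioned above.

```python
import re


def chunk_captions(transcript: str, max_words: int = 3) -> list[str]:
    """Split a transcript into short caption segments.

    A segment is closed when it reaches max_words, or earlier when a
    word ends with clause punctuation (comma, period, question mark...).
    Both rules are assumptions chosen for illustration.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    segments: list[str] = []
    current: list[str] = []
    for word in transcript.split():
        current.append(word)
        ends_clause = re.search(r"[.,;:!?]$", word) is not None
        if len(current) >= max_words or ends_clause:
            segments.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing words that did not fill a whole segment.
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    line = "Short captions are easier to follow, so keep them snappy."
    for segment in chunk_captions(line, max_words=3):
        print(segment)
```

Breaking at punctuation as well as at a word limit keeps each segment aligned with a natural pause, which is one plausible way to keep captions in step with the spoken word.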

In other words, it is pretty clear that using short, snappy subtitles can help you retain your audience's engagement. But what if you're not sure about the value of audience engagement?

Choose Between Lower-Screen Or Mid-Screen Captions

Before 2018, there was only one type of content captioning: the classic bottom-of-the-screen subtitles. Since the global shift towards short-form content, a new form of captions has emerged: mid-screen captions, which are used in Reels, TikToks, and Shorts with audio value. They are part audio-illustrating tool and part engagement tool.

Knowing which type of captioning your videos require can help you choose the best font for it. Few fonts are as caption-specific as the PT Caption fonts. But since those fonts were designed before mid-screen captions were a thing, they do not work for short-form content captioning.

Ideally, your font choice would work for both short-form and long-form content. But if you only create one type of content (not recommended), then you can stick to a font that works only for that content type.

Judge The Font's Mood

Whether a font works in the middle of the screen or at the bottom in a small size, it is not exempt from the mood standard. When you look at the sample content that most font pages have, what do you feel? Noticing how a font affects your mood is a great way to judge its compatibility with your content.

A font that is overly traditional shouldn't be used for startup how-to videos. And a font that is very obviously modern shouldn't be used to subtitle the history of ancient Egypt.

Cross-Reference With Your Brand

Aside from aligning the font's mood with your content's mood, you must align the font's appearance with your brand's visual identity. If the fonts in your letterhead, logo, and content text clash with your caption font, then you should consider an alternative.

Cross-referencing your font choices with your brand's existing font set is crucial. And if you have yet to build a font set for your brand, then the next tip is even more important.

Consider The Font Family's Versatility

The ideal font for captions has a sufficiently bold weight as well as a medium weight. But the broader a font family, the better it serves a content creator. Chances are, you produce long-form content as well as short clips.

By selecting a font with a versatile family, you leave yourself options for lower-screen captioning as well as mid-screen captions. And the subtitles in your long-form and short-form content match because they share a family.

Check Its Legibility At 14 Px And Aesthetics At 60 Px

To make sure that a font is legible on a smartphone, you should reduce its size to 14 pixels and look at it from three feet away. Aside from ensuring that the text is legible at 14px, you should also make sure it looks good at 60px.

Some fonts are not meant to be used in big and bold text. But any typeface you plan to use for mid-screen captions must look great when blown up to a headline's size.

Check Its Legibility Against A Non-Uniform Background

Many content creators make the mistake of assuming that legibility tests conducted in a word processor are enough for choosing a caption font. You cannot be sure that a font is going to work for subtitles unless you use it in subtitles.

You can either place sample text over an image or actually subtitle a clip of your content with the candidate font to be sure of its legibility.

Cross-Check The Font's License With Your Content Goals

The final thing you must check before you start using a font for captions in your videos is the font's license. Many fonts come with a free personal-use license, so you can use them as long as you're not paid for the piece of content in which they are used.

But to be extra safe, you should try to get fonts with a commercial license. Fonts that are free to use for commercial purposes are truly liberating.

You can use them in your sponsored content, ads, and even on-demand content. Even if you don't get paid for your content just yet, it makes sense not to tie yourself to a font that will eventually limit you.


The Value Of Audience Engagement

By now, it is clear that captions can boost your audience engagement. However, many people think that engagement is relevant to only one or two platforms. Here are quotes from important people at the helm of different social media platforms regarding the importance of engagement:

  • Instagram - Adam Mosseri, the head of Instagram, has said: "Engagement is really important to us because we believe that people get value out of Instagram..."

  • Twitter - Ex-CEO and co-founder Jack Dorsey has stated: "It's important to have a conversation around healthy and sustainable engagement."

  • LinkedIn - While most people see LinkedIn as a less attention-hungry platform, its CEO, Ryan Roslansky, has tied its polite and professional environment to engagement: "For us, engagement is all about creating an environment where our members can build their professional networks, stay informed about the latest industry news and trends, and find new opportunities for career growth," says Roslansky.

  • Snapchat - If you upload engaging content on Snapchat, you're helping its CEO, Evan Spiegel, do his job. He has previously said the following regarding Snapchat's purpose: "Our goal has always been to create a platform that is fun and engaging for our users."

  • TikTok - "We believe that the key to success on TikTok is creating content that is not only engaging but also fun and creative," says TikTok CEO Shou Zi Chew.


Subtitle Selection Best Practices: What The Science Says

If you've read up to this point, you understand that the impact of subtitles on audience retention is backed by research. Subtitles, when done right, entrance your audience, make content comprehension easier for them, and allow them to follow what's being said. "When done right" is the key aspect that this section tackles. Here are the best practices for subtitling content for maximum audience engagement.

Opt For Larger Captions

Traditional captioning practices recommended reducing text size to the minimum legible one. That was because the subtitles weren't meant to keep one from paying attention to the visuals. But unless you're captioning a movie, your subtitles are also a visual tool.

In a Journal of Computer-Assisted Learning study, researchers have found that "smaller text sizes are associated with decreased audience engagement". In the same study, they've also concluded that "larger text sizes can improve engagement".

So, when in doubt, go large. If you're confused between two font sizes, you can choose the bigger option. This brings us to the logical question: how large is too large?

According to several studies, "text-to-content ratios of around 20% to 30% can improve comprehension and retention of information".
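
To get a feel for what 20% to 30% looks like, here is a rough sketch that estimates how much of the frame a caption block covers. It assumes the ratio means the caption block's bounding-box area divided by the frame area, which may not be how the quoted studies define it. The function names, the 1.2 line-height multiplier, and the 1080x1920 example frame are all illustrative assumptions.

```python
def caption_box_height(font_px: float, lines: int, line_height: float = 1.2) -> float:
    """Approximate the caption block height in pixels.

    line_height is a multiplier on the font size (1.2 is an assumed default).
    """
    return font_px * line_height * lines


def caption_coverage(frame_w: int, frame_h: int, box_w: float, box_h: float) -> float:
    """Return the share of the frame area covered by the caption box (0.0 to 1.0)."""
    if frame_w <= 0 or frame_h <= 0:
        raise ValueError("frame dimensions must be positive")
    return (box_w * box_h) / (frame_w * frame_h)


def in_target_range(coverage: float, low: float = 0.20, high: float = 0.30) -> bool:
    """Check whether coverage falls inside the 20% to 30% band."""
    return low <= coverage <= high


if __name__ == "__main__":
    # Hypothetical vertical clip: 1080x1920, captions spanning 90% of the width.
    frame_w, frame_h = 1080, 1920
    box_w = frame_w * 0.9
    box_h = caption_box_height(font_px=120, lines=4)
    coverage = caption_coverage(frame_w, frame_h, box_w, box_h)
    print(f"coverage: {coverage:.0%}, in range: {in_target_range(coverage)}")
```

In this hypothetical setup, four lines of 120px text land at roughly 27% of the frame, which shows why line breaks matter: the same words on one long line would need a much smaller font to fit.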

Aim For 20% To 30% Of The Screen For Spoken-Word Content

This might come off as an extremely unorthodox suggestion, but it is scientifically backed. On paper, 30% of the screen being occupied by text seems like a lot. But remember that many TikTok clips have a 50% split screen with Subway Surfers. While adding a distracting visual element next to the relevant content is a great way to keep people glued to the screen, it is tacky. Do large subtitles seem equally tacky?

Watch this clip from the Oscars.

The text occupies over half the screen at one point and still doesn't look large because of line breaks. It is extremely difficult to have large subtitles without making them look intrusive. ContentFries has hundreds of subtitle templates that accommodate large text with appropriate fonts and colors optimized for engagement. So give our auto-captioning engine a try.

The subtitling engine used for the clip above is offered by Instagram Stories. You can read our post on How to Use Instagram Story Captions. The IG Captions font is an interesting one as it aligns perfectly with the science of subtitling.

Use Sans-Serif Font To Make Your Captions Easy To Read

From Instagram auto-captions to TikTok's default title font, most social media platforms default to a sans-serif typeface. And like every move any social media platform makes, the reason is engagement.

Multiple studies, including ones covered in this post, suggest that average reading speed is higher when captions use a sans-serif font rather than a serif one. Furthermore, we've found that caption comprehension is also better with sans-serif fonts.

Make Caption Fonts Bolder And Choose Contrasting Colors

You can pick a sans-serif font and blow up your captions so they occupy over 50% of the screen and still get skipped because legibility goes beyond size. The color and weight of the font play a very important role. Traditionally, caption fonts were slim because they weren't meant to distract the viewers who were not reading them. Nowadays, everyone's following captions.

Subtitles have generally had contrasting colors regardless of content type. Yellow and white have been used with a black border or background. You don't need to undo those practices of movie captions when creating captions for your own videos.

But you must make the font bolder, so that its heavier strokes give the color more room to show in the glyphs. One study puts it this way: "Using contrasting colors and bold fonts can improve the legibility of text and enhance learning."

Of course, it is a study about advertising media, but all content creators today are advertisers for their respective profiles. Your content gets discovered on the "For You" or the "Home" page of a social media platform. And only if your content is good enough does anyone follow you. Otherwise, people have a hard time watching the entire video.
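
If you'd rather check contrast with numbers than by eye, one option is the contrast-ratio formula from the Web Content Accessibility Guidelines (WCAG). To be clear, this is a web accessibility measure borrowed here for illustration, not the method used in the study quoted above. The sketch below compares a caption color against a few background colors sampled from a frame and reports the weakest pairing. The sample colors and the helper name `worst_case_contrast` are hypothetical.

```python
RGB = tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """WCAG relative luminance: 0.0 for black, 1.0 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """WCAG contrast ratio between two colors, from 1.0 (none) to 21.0 (max)."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def worst_case_contrast(text: RGB, backgrounds: list[RGB]) -> float:
    """Lowest contrast between the text color and any sampled background color."""
    return min(contrast_ratio(text, bg) for bg in backgrounds)


if __name__ == "__main__":
    yellow = (255, 255, 0)
    # Hypothetical background samples: dark jacket, skin tone, bright sky.
    samples = [(20, 20, 30), (224, 172, 105), (200, 230, 255)]
    print(f"worst-case contrast: {worst_case_contrast(yellow, samples):.2f}")
```

WCAG's AA level asks for at least 4.5:1 for normal text and 3:1 for large text. A caption color that fails against bright parts of your footage is a sign that a black border or background, like the one movie subtitles use, is still worth keeping.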

View Text As Animation To Hook Your Viewers

This is a double-edged sword because if you overdo animation to the point where letters are flying all over the screen, you'll lose your audience. At the same time, if you make your subtitles too "standard," you might not be able to entrance your audience.

Several guides suggest that "using animation, such as text that appears and disappears, can increase viewer attention and engagement". When the text appears and disappears, your captions become an animation element. Most people look for visual distractions while consuming audio content.

When watching a video that has high audio value, they zone out of the video aspect and seek distraction elsewhere. The animation elements give visual distraction that keeps them hooked to the tab/screen. When you stop seeing video captions as a comprehension aid and see them as a visual distraction, you'll understand why content today looks the way it does.
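
The simplest version of "text that appears and disappears" is a timed caption file, where each short segment gets its own start and end time. Here is a minimal sketch that writes segments as SubRip (SRT) cues. It assumes each segment's screen time is proportional to its word count, which is only a rough stand-in for real speech timing, and the function names are illustrative.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments: list[str], total_seconds: float) -> str:
    """Build SRT text where each segment appears, then gives way to the next.

    Assumption: screen time is split in proportion to word count. Real
    captioning tools align cues to the audio instead.
    """
    word_counts = [max(len(s.split()), 1) for s in segments]
    total_words = sum(word_counts)
    cues = []
    elapsed_words = 0
    for index, (text, count) in enumerate(zip(segments, word_counts), start=1):
        start = total_seconds * elapsed_words / total_words
        elapsed_words += count
        end = total_seconds * elapsed_words / total_words
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues in the SRT format.
    return "\n".join(cues)


if __name__ == "__main__":
    print(build_srt(["Short captions", "work"], total_seconds=3.0))
```

Because every cue ends exactly where the next begins, only one short phrase is on screen at a time, which is what turns the captions into a moving visual element rather than a static block of text.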


Selling Your Soul For Engagement? The Creator's Dilemma

Having covered how captions affect content engagement, let's go over a related topic: chasing engagement. Many creators dislike the idea of chasing audience retention and losing their vision in the process. We understand this, which is why we don't recommend adding Subway Surfers visuals at the bottom of your content to occupy your viewers' attention.

Large subtitles can look classy and still get the job done. But what if you want to record long-form content that doesn't have visuals baked into it? You could be a standup comedian who wants to record a special with no text on the screen. Or you could be a professional speaker who wants their keynote to look like a TED Talk and not a PowerPoint.

We have worked with several creators who have this dilemma and have come to one conclusion: Do not compromise your vision when it comes to pillar content. The pillar-and-spoke content creation strategy entails creating a value-dense piece of content (pillar) and using high-discovery short-form content (spokes) to lead back to it.

For comedians, their standup specials should be exactly the way they want them. But the clips that they put out from the special should be exactly the way the algorithm wants them. For podcasters and speakers, the same could be said about their flagship content.

The balance between clout-chasing clips and long-form content with integrity is something every respectable creator today has achieved. From MrBeast to Markiplier, almost every content creator is using clips for reach. So even if you put subtitles on some of your content, you'll be in good company.

You can record different spoke and pillar content pieces, but we recommend repurposing. Content repurposing allows you to make high-quality content and then transform it into multiple content pieces. Because you're focused on quality with your pillar piece, you're unlikely to make compromises on it. And once it is recorded, you can create fresh clips, graphics, etc., meant purely for reach.

ContentFries is a content-multiplying platform that allows you to create content for different platforms from one high-quality piece. Among other things, it can automate your videos and offer high-engagement drag-and-drop subtitle templates.

It can also help you add audio visualizers to audio clips so they can be uploaded on video platforms. The platform is free to try and can help you create 36+ pieces of content in under 10 minutes once you get the hang of it.


Final Thoughts

The science is in: captions work. But knowing the context in which they work is crucial. Long-sentence subtitles in a small font size work for movies and TV shows. Short, snappy subtitles work for social media content. Make sure to use the right font, make it bold, and give it a color that contrasts with its background. Above all, remember to add line breaks and make your captions occupy 20% to 30% of the screen when uploading to TikTok or IG Reels.