Multilingual EPUB: How to Handle Multiple Languages in One Ebook
Bilingual books, language learning materials, annotated translations, and academic texts often mix multiple languages in a single EPUB. Correct language tagging is essential for text-to-speech pronunciation, spell checking, hyphenation, and screen reader behavior. This guide covers how to structure multilingual content in EPUB3.
Why language tagging matters
When a TTS engine or screen reader encounters the sentence 「こんにちは」 said John, it needs to know that こんにちは is Japanese (and should use a Japanese voice and pronunciation rules) while the surrounding text is English. Without language tags, a TTS engine may skip the characters that fall outside its default language or mispronounce them badly.
Correct language tagging also affects:
- Hyphenation: Hyphenation rules are language-specific. German hyphenates differently from English.
- Spell checking: Reading apps that check spelling need to know which dictionary to use per word.
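- Font rendering: Some scripts need language-specific glyph variants. Han unification gives Chinese, Japanese, and Korean characters shared code points, so the language tag is what tells the renderer which regional glyph forms to use (for example, Simplified vs. Traditional Chinese), often through the OpenType locl feature.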
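- WCAG 3.1.2 (Language of Parts): This Level AA success criterion requires language tagging for content in a different language from the page's default. EPUB Accessibility 1.1 requires WCAG Level A at minimum and recommends Level AA, so 3.1.2 applies to any publication that claims AA conformance.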
Setting the primary language in the OPF
The OPF metadata declares the primary language of the publication. For a bilingual English/French book:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:language>en</dc:language>
  <dc:language>fr</dc:language>
  ...
</metadata>
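Reading systems treat the first dc:language element as the primary language. Additional elements declare other languages present in the publication. A reading system may use the primary language when choosing a default TTS voice or hyphenation dictionary, but the package-level declaration does not replace language attributes in the content documents, so tag those as well.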
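As a quick check of the ordering rule, the sketch below reads the dc:language elements from a package document with Python's standard library and treats the first one as primary. The sample OPF string and the function name declared_languages are made up for illustration; a real package document carries much more metadata.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical package document, trimmed to the parts that matter here.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>en</dc:language>
    <dc:language>fr</dc:language>
  </metadata>
</package>"""


def declared_languages(opf_xml):
    """Return every non-empty dc:language value in document order."""
    root = ET.fromstring(opf_xml)
    values = []
    for element in root.iter(f"{{{DC_NS}}}language"):
        text = (element.text or "").strip()
        if text:
            values.append(text)
    return values


languages = declared_languages(SAMPLE_OPF)
primary, secondary = languages[0], languages[1:]
print("primary:", primary, "secondary:", secondary)
```

Because the order is significant, swapping the two elements in the OPF changes which language is treated as primary.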
Inline language switches with xml:lang
For content that switches language at the element level, use the xml:lang attribute together with the HTML lang attribute. Both are valid in XHTML, and when both appear on the same element they must carry the same value:
Document-level language
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en"
      lang="en">
Paragraph-level switch
<p>The French translation reads:
  <span xml:lang="fr" lang="fr">
    Le renard brun rapide saute par-dessus le chien paresseux.
  </span>
</p>
Section-level switch
<section xml:lang="de" lang="de">
  <h2>Kapitel 1</h2>
  <p>Dies ist der deutsche Text des Kapitels...</p>
</section>
Parallel text (interlinear / facing page)
For language learning books with parallel text:
<div class="parallel-text">
  <p class="source" xml:lang="la" lang="la">
    Gallia est omnis divisa in partes tres.
  </p>
  <p class="translation" xml:lang="en" lang="en">
    All of Gaul is divided into three parts.
  </p>
</div>
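Since every example above sets xml:lang and lang as a pair, it can help to scan content documents for elements where only one of the two is present or where they disagree. The following is a minimal sketch using Python's standard library; the sample markup and the function name lang_mismatches are hypothetical, and it assumes the input is well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical content document with two deliberate tagging problems.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <body>
    <p xml:lang="la" lang="la">Gallia est omnis divisa in partes tres.</p>
    <p xml:lang="fr">Le renard brun rapide.</p>
    <p xml:lang="de" lang="en">Dies ist der deutsche Text.</p>
  </body>
</html>"""


def lang_mismatches(xhtml):
    """Return (tag, xml:lang, lang) for elements whose two attributes disagree."""
    problems = []
    for element in ET.fromstring(xhtml).iter():
        xml_lang = element.get(XML_LANG)
        html_lang = element.get("lang")
        if xml_lang is None and html_lang is None:
            continue  # element simply inherits its language
        if (xml_lang is None or html_lang is None
                or xml_lang.lower() != html_lang.lower()):
            local_name = element.tag.rsplit("}", 1)[-1]
            problems.append((local_name, xml_lang, html_lang))
    return problems


for problem in lang_mismatches(SAMPLE_XHTML):
    print(problem)
```

The comparison is case-insensitive because language tags are case-insensitive; the check does not validate the tags themselves.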
Right-to-left content
When mixing left-to-right and right-to-left scripts (e.g., English and Arabic, or English and Hebrew), set the dir attribute alongside xml:lang:
<span xml:lang="ar" lang="ar" dir="rtl">
  مرحبا بالعالم
</span>
For entire chapters in a right-to-left script, set dir="rtl" on the body or section element. In the OPF spine, mark the entire publication's page progression direction:
<spine page-progression-direction="rtl">
For a bilingual LTR/RTL book, the page progression direction typically follows the primary language.
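When inline spans are generated by a script, the dir attribute can be derived from the language tag. The sketch below is only an illustration: the RTL_LANGUAGES and RTL_SCRIPTS sets are a small hand-picked assumption rather than a complete list, the helper names are hypothetical, and it assumes the surrounding text is left-to-right.

```python
from xml.sax.saxutils import escape

# Assumption: illustrative, not exhaustive. Extend for the languages you publish.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}
RTL_SCRIPTS = {"arab", "hebr"}


def direction_for(tag):
    """Guess 'rtl' or 'ltr' from a language tag such as 'ar' or 'az-Arab'."""
    subtags = tag.lower().split("-")
    # A four-letter alphabetic subtag after the first is taken as the script.
    script = next((s for s in subtags[1:] if len(s) == 4 and s.isalpha()), None)
    if script is not None:
        return "rtl" if script in RTL_SCRIPTS else "ltr"
    return "rtl" if subtags[0] in RTL_LANGUAGES else "ltr"


def lang_span(text, tag):
    """Wrap text in a span with xml:lang, lang, and dir (only when RTL)."""
    dir_attr = ' dir="rtl"' if direction_for(tag) == "rtl" else ""
    return f'<span xml:lang="{tag}" lang="{tag}"{dir_attr}>{escape(text)}</span>'


print(lang_span("مرحبا بالعالم", "ar"))
print(lang_span("Bonjour", "fr"))
```

An explicit script subtag wins over the language, so a tag like ur-Latn is treated as left-to-right.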
Font considerations for multilingual EPUB
Embedding fonts in multilingual EPUBs requires care:
- Use Unicode fonts that cover all the scripts in your publication. If the embedded font covers only Latin characters, the reading system falls back to system fonts for CJK, Arabic, or Devanagari text, and the fallback may not match your design intent.
- Noto fonts (Google) provide excellent Unicode coverage for almost every script and are free for embedding.
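- Font licensing: Many commercial fonts do not permit EPUB embedding. Check the license before embedding. The SIL Open Font License (OFL) permits embedding, though a font that declares a Reserved Font Name restricts the use of that name on modified versions, so read the license text if you alter the font.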
- Subsetting: CJK fonts are large (10–30 MB). Subset to only the characters actually used with a tool like pyftsubset (from fonttools).
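One way to prepare a subset is to collect the characters that actually appear in the content documents and pass them to pyftsubset through its --unicodes option. The sketch below only builds the command line and does not run it; the sample chapters, the font file names, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapters; in practice, read every XHTML file listed in the spine.
SAMPLE_CHAPTERS = [
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>',
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>こんにちは</p></body></html>',
]


def used_codepoints(xhtml_documents):
    """Return the sorted code points of all printable characters in the text."""
    chars = set()
    for document in xhtml_documents:
        chars.update("".join(ET.fromstring(document).itertext()))
    # isprintable() keeps the space character but drops newlines and tabs.
    return sorted(ord(c) for c in chars if c.isprintable())


def pyftsubset_args(font_path, output_path, codepoints):
    """Build an argument list for pyftsubset (fonttools must be installed to run it)."""
    unicodes = ",".join(f"U+{cp:04X}" for cp in codepoints)
    return ["pyftsubset", font_path,
            f"--unicodes={unicodes}", f"--output-file={output_path}"]


codepoints = used_codepoints(SAMPLE_CHAPTERS)
# Font file names below are placeholders.
args = pyftsubset_args("NotoSansJP-Regular.otf", "NotoSansJP-subset.otf", codepoints)
print(" ".join(args))
```

The argument list can be handed to subprocess.run once fonttools is installed. Text generated at reading time, such as content inserted by CSS, is not captured by this scan.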
FAQ
What BCP 47 codes should I use for language tags?
Use the shortest unambiguous tag: en for English, fr for French, zh-Hans for Simplified Chinese, zh-Hant for Traditional Chinese, pt-BR for Brazilian Portuguese. The full BCP 47 registry is at iana.org/assignments/language-subtag-registry.
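Language tags are case-insensitive, but the examples above follow the casing that BCP 47 recommends: lowercase language, title-case script, uppercase region. A minimal sketch of that convention is shown below; the function name is hypothetical, and it only adjusts case without checking the subtags against the registry.

```python
def normalize_tag_case(tag):
    """Apply the recommended BCP 47 casing to a language tag."""
    normalized = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index == 0 or after_singleton:
            # The first subtag and anything after a singleton stay lowercase.
            normalized.append(subtag.lower())
        elif len(subtag) == 4 and subtag.isalpha():
            normalized.append(subtag.title())   # script, e.g. Hans
        elif len(subtag) == 2 and subtag.isalpha():
            normalized.append(subtag.upper())   # region, e.g. BR
        else:
            normalized.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True              # extension or private use
    return "-".join(normalized)


for raw in ["ZH-hans", "pt-br", "EN", "en-us-x-ab"]:
    print(raw, "->", normalize_tag_case(raw))
```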
Does toolkit.bot preserve language tags when converting PDF to EPUB?
toolkit.bot sets the document language based on the PDF metadata and detected content language. For PDFs with mixed languages, the primary language is set in the OPF. Inline language switching requires post-conversion editing in Sigil or a text editor.
What is WCAG 3.1.2 (Language of Parts)?
WCAG success criterion 3.1.2 (Level AA) requires that any passage in a language different from the page's primary language be identified with a language tag. This allows assistive technologies to pronounce the text correctly. EPUB Accessibility 1.1 requires WCAG Level A at minimum and recommends Level AA, so 3.1.2 must be met by any publication that claims AA conformance.
How do I create a bilingual facing-page EPUB?
Use a two-column CSS layout, or alternate the two languages by chapter or section. The parallel text markup example above works well for interlinear layouts. For side-by-side pages, fixed-layout EPUB3 gives you precise column control but sacrifices reflowability.