Multilingual EPUBs: RTL Text, CJK, and Language Attributes
EPUB supports multilingual content, right-to-left scripts, and mixed-language documents — but each requires specific markup. Here's how to handle non-English EPUBs correctly.
Setting the Document Language
The primary language goes in the OPF metadata and in each XHTML file's html element:
<!-- In content.opf: declare the tag that matches the book's language -->
<dc:language>ar</dc:language> <!-- Arabic -->
<dc:language>ja</dc:language> <!-- Japanese -->
<dc:language>zh-Hans</dc:language> <!-- Simplified Chinese -->
<dc:language>zh-Hant</dc:language> <!-- Traditional Chinese -->
<!-- In each XHTML file -->
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="ar" lang="ar"
xmlns:epub="http://www.idpf.org/2007/ops">
Use BCP 47 language tags. Set both xml:lang and lang, with identical values: XHTML (XML) parsers read xml:lang, while HTML parsers read lang.
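A missing or mismatched attribute is easy to overlook across dozens of chapter files. The following is a minimal Python sketch of a check, assuming each file is well-formed XHTML without named HTML entities; the function names lang_attrs and lang_attrs_ok are illustrative and not part of any EPUB tool.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_attrs(xhtml_text):
    """Return (xml:lang, lang) from the root element; None marks a missing attribute."""
    root = ET.fromstring(xhtml_text)
    return root.get(XML_LANG), root.get("lang")

def lang_attrs_ok(xhtml_text):
    """True when both attributes are present and agree (case-insensitively)."""
    xml_lang, lang = lang_attrs(xhtml_text)
    if xml_lang is None or lang is None:
        return False
    return xml_lang.lower() == lang.lower()
```

Running lang_attrs_ok over every XHTML file in the package gives a quick list of files to fix before running a full validator.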
Right-to-Left (RTL) Text — Arabic, Hebrew, Persian
RTL content needs its direction set in three places: the spine in content.opf, the dir attribute on each XHTML file's html element, and the stylesheet:
<!-- In content.opf spine -->
<spine page-progression-direction="rtl">
<itemref idref="ch1"/>
</spine>
<!-- In the XHTML html element -->
<html xml:lang="ar" lang="ar" dir="rtl">
<!-- In CSS -->
body {
direction: rtl;
text-align: right; /* or text-align: start; for logical properties */
unicode-bidi: embed;
}
The page-progression-direction="rtl" attribute on the spine tells reading systems to reverse the page-turn direction: swiping right, rather than left, advances to the next page.
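To confirm what a package actually declares, the spine attribute can be read straight from the OPF. This is a small sketch using Python's standard library, assuming the OPF uses the standard http://www.idpf.org/2007/opf namespace; spine_direction is a hypothetical helper name. It returns "default" when the attribute is absent, which is the value EPUB 3 assumes in that case.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

def spine_direction(opf_text):
    """Return the spine's page-progression-direction, or "default" if unset."""
    root = ET.fromstring(opf_text)
    spine = root.find(f"{{{OPF_NS}}}spine")
    if spine is None:  # compare with None: an element's truthiness is not a presence test
        raise ValueError("no <spine> element found in the OPF")
    return spine.get("page-progression-direction", "default")
```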
Vertical Text — Japanese and Chinese
Traditional Japanese and Chinese publishing uses vertical right-to-left text (tategumi). EPUB 3 supports this via CSS writing modes:
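body {
-epub-writing-mode: vertical-rl; /* EPUB-prefixed form for older reading systems */
-webkit-writing-mode: vertical-rl;
writing-mode: vertical-rl; /* vertical lines, stacked right to left; unprefixed form last so it wins */
}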
/* For horizontal headers within vertical text */
h1 { writing-mode: horizontal-tb; }
Set page-progression-direction="rtl" in the spine for vertical Japanese (columns flow right to left). Kobo and Apple Books have the best vertical text support; Kindle support is partial.
Mixed-Language Content
For inline language switches (e.g., English quotes in a French document), use lang on the specific element:
<p xml:lang="fr" lang="fr">
Le terme anglais <span xml:lang="en" lang="en">"reflowable"</span> n'a pas
d'équivalent direct en français.
</p>
The language attribute affects hyphenation, font and glyph selection, and screen reader pronunciation: a screen reader switches to the matching voice when it encounters a change of language.
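When auditing a book, it can help to list where the language actually changes. The sketch below walks a well-formed XHTML fragment and reports every element whose declared language differs from the one it inherits; language_switches is an illustrative name, and the approach assumes the markup parses as XML.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def language_switches(xhtml_text):
    """Return (tag, outer_lang, inner_lang) for every element that changes language."""
    found = []

    def walk(element, inherited):
        # Prefer xml:lang, then lang, then whatever the parent declared.
        lang = element.get(XML_LANG) or element.get("lang") or inherited
        if inherited is not None and lang != inherited:
            tag = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
            found.append((tag, inherited, lang))
        for child in element:
            walk(child, lang)

    walk(ET.fromstring(xhtml_text), None)
    return found
```

An empty result for a chapter known to contain foreign-language passages suggests the inline language markup is missing.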
Ruby Annotations (CJK Pronunciation)
Ruby annotations show pronunciation guides above CJK characters (furigana in Japanese, bopomofo in Chinese):
<ruby>漢<rt>かん</rt></ruby><ruby>字<rt>じ</rt></ruby>
EPUB 3 supports HTML5 ruby markup natively. Apple Books renders ruby well; Kindle support varies by device generation.
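Writing ruby markup by hand gets tedious for long passages. As a rough sketch, assuming the base text and its readings are already available as pairs (the pairing itself usually needs a dictionary or manual review), a hypothetical ruby_markup helper could generate the same per-character pattern shown above:

```python
from html import escape

def ruby_markup(pairs):
    """Build one <ruby> element per (base, reading) pair, escaping markup characters."""
    return "".join(
        f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>"
        for base, reading in pairs
    )

print(ruby_markup([("漢", "かん"), ("字", "じ")]))
# prints: <ruby>漢<rt>かん</rt></ruby><ruby>字<rt>じ</rt></ruby>
```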
Font Considerations for Non-Latin Scripts
- Arabic/Hebrew: most system fonts include Arabic and Hebrew; embed only if using a specific typeface. Noto Sans Arabic is freely embeddable (SIL OFL license).
- CJK: CJK fonts are very large (5–20 MB). Subset to only the characters used with pyftsubset:
pyftsubset NotoSansCJK.ttf --text-file=book_chars.txt
- Fallback: always specify a generic fallback:
font-family: "MyFont", serif;
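The book_chars.txt file passed to pyftsubset has to come from somewhere. One possible way to build it, sketched here under the assumption that the chapters are well-formed XHTML strings, is to gather every distinct printable character from their text content; collect_chars is an illustrative helper, not part of fontTools.

```python
import xml.etree.ElementTree as ET

def collect_chars(xhtml_docs):
    """Return the distinct printable characters used across the given XHTML strings."""
    chars = set()
    for doc in xhtml_docs:
        root = ET.fromstring(doc)
        # itertext() yields all text nodes, so markup itself is excluded.
        chars.update("".join(root.itertext()))
    # isprintable() keeps the space character but drops newlines and tabs.
    return "".join(sorted(c for c in chars if c.isprintable()))
```

Writing the returned string to book_chars.txt as UTF-8 gives a file that can be supplied to the --text-file option. Text injected by CSS (for example via content: rules) is not seen by this approach and would need to be added by hand.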
Converting Non-English PDFs to EPUB
toolkit.bot detects the document language from PDF metadata and sets the correct dc:language and xml:lang attributes in the output EPUB. RTL documents get page-progression-direction="rtl" automatically. OCR for scanned non-Latin documents uses the appropriate language model.
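For readers scripting a similar step themselves, the mapping from language tag to page progression can be approximated in a few lines. This is only an illustrative sketch and not toolkit.bot's implementation: the RTL_LANGUAGES set is a small, non-exhaustive assumption, the function name is hypothetical, and vertical Japanese or Chinese (which also uses rtl progression) is deliberately not handled.

```python
# Illustrative, non-exhaustive set of primary language subtags written right to left.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}

def page_progression_for(language_tag):
    """Suggest a page-progression-direction value for a BCP 47 language tag."""
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "default"
```

Returning "default" rather than "ltr" for everything else leaves the decision to the reading system when the language alone does not settle it.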