
Inside an EPUB File: Structure, Files, and How It All Works

June 12, 2026  ·  7 min read

An EPUB is not a mystery format — it's a ZIP archive containing a specific set of files. Once you understand the structure, you can fix broken EPUBs, build converters, or create files from scratch. Here's what's inside every EPUB.

The EPUB ZIP Structure

mybook.epub (ZIP archive)
├── mimetype                    ← must be first, uncompressed
├── META-INF/
│   └── container.xml           ← points to the OPF file
└── OEBPS/  (or any folder name)
    ├── content.opf             ← package document (manifest + spine)
    ├── toc.ncx                 ← EPUB 2 navigation (NCX)
    ├── nav.xhtml               ← EPUB 3 navigation (NAV)
    ├── chapter01.xhtml         ← content files
    ├── chapter02.xhtml
    ├── css/
    │   └── styles.css
    └── images/
        ├── cover.jpg
        └── figure1.png
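
Only two names in this layout are fixed: mimetype at the start of the archive and META-INF/container.xml. Everything under OEBPS/ is located by following pointers, starting from container.xml. Below is a minimal sketch of a check for the fixed parts using Python's standard zipfile module. check_epub_layout is a hypothetical helper. It assumes the central directory lists entries in their physical order, which is the normal case for archives written sequentially.

```python
import io
import zipfile


def check_epub_layout(source) -> list[str]:
    """Return layout problems; an empty list means the fixed parts look right.

    Hypothetical helper. `source` is a file path or a seekable binary file object.
    """
    problems = []
    with zipfile.ZipFile(source) as zf:
        entries = zf.infolist()  # central-directory order
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems


# Usage: an in-memory archive holding just the two fixed entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("META-INF/container.xml", "<container/>",
               compress_type=zipfile.ZIP_DEFLATED)
print(check_epub_layout(buf))  # → []
```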

The mimetype File

The first file in the ZIP must be named mimetype, stored without compression, and contain exactly:

application/epub+zip

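No newline, no BOM, no spaces. Because this file comes first and is stored uncompressed, e-readers and validators can identify an EPUB from the first few bytes of the archive without reading the rest. With Python's zipfile module, write it before any other entry: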

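import zipfile

with zipfile.ZipFile('book.epub', 'w') as z:
    # mimetype MUST be the first entry and stored uncompressed (ZIP_STORED)
    z.writestr(zipfile.ZipInfo('mimetype'), 'application/epub+zip',
               compress_type=zipfile.ZIP_STORED)
    # All other files can be compressed; each path is read from disk
    # and stored in the archive under the same relative name
    z.write('META-INF/container.xml', compress_type=zipfile.ZIP_DEFLATED)
    z.write('OEBPS/content.opf', compress_type=zipfile.ZIP_DEFLATED)
    # nav.xhtml, toc.ncx, chapters, CSS and images are added the same way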
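
The rule pays off at the byte level. A ZIP local file header is 30 bytes long and is followed by the entry's file name, an optional extra field, and then the data. A correctly built EPUB therefore starts with PK\x03\x04, has the name mimetype at offset 30, and (with no extra field) the string application/epub+zip at offset 38. The sketch below checks this using only the first few dozen bytes. looks_like_epub is a hypothetical name, and it parses the header with the struct module rather than any EPUB library. It assumes the sizes are recorded in the local header, so archives written in streaming mode with a trailing data descriptor will be rejected.

```python
import io
import struct
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def looks_like_epub(head: bytes) -> bool:
    """Check leading bytes for a stored 'mimetype' entry (hypothetical helper)."""
    if len(head) < 30 or head[:4] != b"PK\x03\x04":  # local file header signature
        return False
    # Fixed local-header fields after the signature, little-endian:
    # version, flags, method, time, date, CRC-32, compressed size,
    # uncompressed size, file name length, extra field length.
    (_version, _flags, method, _time, _date, _crc,
     comp_size, _size, name_len, extra_len) = struct.unpack("<HHHHHIIIHH", head[4:30])
    name = head[30:30 + name_len]
    start = 30 + name_len + extra_len  # entry data follows the name and extra field
    data = head[start:start + comp_size]
    # method 0 means "stored" (no compression)
    return method == 0 and name == b"mimetype" and data == EPUB_MIMETYPE


# Usage: build a tiny archive in memory and inspect only its first 64 bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
print(looks_like_epub(buf.getvalue()[:64]))  # → True
```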

META-INF/container.xml

This file tells the reading system where to find the OPF package document:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

The full-path is relative to the root of the ZIP. The OEBPS folder name is conventional but not required — you can use any folder name or put the OPF in the root.
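
Reading systems find the package document by parsing this file. Below is a minimal sketch using Python's xml.etree.ElementTree. opf_path is a hypothetical helper. The container format allows more than one rootfile, and this sketch simply returns the first, which EPUB treats as the default rendition.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <container> root element.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def opf_path(container_xml: bytes) -> str:
    """Return the full-path of the first <rootfile> in container.xml (hypothetical helper)."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml lists no <rootfile>")
    return rootfile.attrib["full-path"]


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""
print(opf_path(sample))  # → OEBPS/content.opf
```

With an open archive, the input is simply zf.read("META-INF/container.xml").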

The OPF Package Document (content.opf)

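The OPF file is the heart of an EPUB. It has three required sections: <metadata> (title, unique identifier, language and other Dublin Core fields), <manifest> (every file in the publication with its media type), and <spine> (the default reading order). EPUB 2 files may also carry a <guide> section, which EPUB 3 deprecates. The manifest and spine look like this: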

<manifest>
  <item id="ch1" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  <item id="ch2" href="chapter02.xhtml" media-type="application/xhtml+xml"/>
  <item id="css" href="css/styles.css"  media-type="text/css"/>
  <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg"
        properties="cover-image"/>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"
        properties="nav"/>
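  <item id="fig1" href="images/figure1.png" media-type="image/png"/>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>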

<spine toc="ncx">
  <itemref idref="nav" linear="no"/>
  <itemref idref="ch1"/>
  <itemref idref="ch2"/>
</spine>
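
The spine refers to manifest ids rather than file names, and every manifest href is relative to the folder that holds the OPF. The sketch below turns the spine into archive paths in reading order, skipping linear="no" entries by default. reading_order is a hypothetical helper, and the sample package is trimmed to a manifest and spine (a real OPF also needs <metadata>).

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: bytes, opf_location: str,
                  include_nonlinear: bool = False) -> list[str]:
    """Resolve spine itemrefs to archive paths (hypothetical helper)."""
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_location)  # "OEBPS" for "OEBPS/content.opf"
    # manifest id -> href (relative to the OPF file, possibly percent-encoded)
    hrefs = {item.get("id"): unquote(item.get("href"))
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        if ref.get("linear") == "no" and not include_nonlinear:
            continue  # auxiliary content, e.g. the nav document
        order.append(posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")])))
    return order


opf = b"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter02.xhtml" media-type="application/xhtml+xml"/>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
  <spine>
    <itemref idref="nav" linear="no"/>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""
print(reading_order(opf, "OEBPS/content.opf"))
# → ['OEBPS/chapter01.xhtml', 'OEBPS/chapter02.xhtml']
```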

Content Files — XHTML, Not HTML

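Chapter files must be XHTML, meaning HTML written in XML syntax, so each file has to be well-formed XML. Compared with everyday HTML5: every element must be closed (<br/>, <img .../>), attribute values must be quoted, element names are lowercase and case-sensitive, the root <html> element declares the XHTML namespace, and named HTML entities such as &nbsp; are not defined in XML (only &amp; &lt; &gt; &quot; &apos; are), so use numeric references such as &#160; instead. The extra epub namespace enables semantic markup like epub:type="chapter". A minimal chapter file: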

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>Chapter 1</title>
  <link rel="stylesheet" type="text/css" href="css/styles.css"/>
</head>
<body>
  <section epub:type="chapter">
    <h1>Chapter 1: Introduction</h1>
    <p>Text here.</p>
  </section>
</body>
</html>
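
A quick way to catch the most common XHTML mistakes is to run each chapter through an XML parser. The sketch below uses Python's xml.etree.ElementTree. It checks only well-formedness and the XHTML root namespace, a small subset of what EPUBCheck validates, and xhtml_problem is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_problem(markup: str) -> Optional[str]:
    """Return None if markup is well-formed XML with an XHTML <html> root, else a reason."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as exc:  # unclosed tags, undefined entities, unquoted attributes
        return f"not well-formed: {exc}"
    if root.tag != f"{{{XHTML_NS}}}html":  # ElementTree writes namespaced tags as {uri}name
        return f"unexpected root element {root.tag!r}"
    return None


good = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Ch 1</title></head>'
        '<body><p>One<br/>two</p></body></html>')
print(xhtml_problem(good))                           # → None
print(xhtml_problem(good.replace("<br/>", "<br>")))  # reports a not-well-formed error
```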

Inspecting an EPUB

# List contents without extracting
unzip -l book.epub

# Extract to a folder
unzip book.epub -d book_extracted/

# Pretty-print the OPF (its path is whatever container.xml names)
unzip -p book.epub OEBPS/content.opf | xmllint --format -
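
Where unzip or xmllint is not installed, the standard library can do a similar job. In the sketch below, list_entries mirrors unzip -l but also reports each entry's compression method, which is handy for confirming that mimetype is stored. pretty_member approximates the xmllint --format pipeline using xml.dom.minidom. Both function names are hypothetical, and minidom's indentation differs from xmllint's.

```python
import io
import zipfile
import xml.dom.minidom

METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}


def list_entries(source) -> list[tuple[str, int, str]]:
    """(name, uncompressed size, compression method) for each entry, in archive order."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size,
                 METHOD_NAMES.get(info.compress_type, str(info.compress_type)))
                for info in zf.infolist()]


def pretty_member(source, name: str) -> str:
    """Return one XML member of the archive, re-indented for reading."""
    with zipfile.ZipFile(source) as zf:
        return xml.dom.minidom.parseString(zf.read(name)).toprettyxml(indent="  ")


# Usage with a throwaway in-memory archive
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("OEBPS/content.opf",
               '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"><spine/></package>',
               compress_type=zipfile.ZIP_DEFLATED)
for entry in list_entries(buf):
    print(entry)
print(pretty_member(buf, "OEBPS/content.opf"))
```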

EPUBs from PDF Conversion

When toolkit.bot converts a PDF to EPUB, it generates all required files: mimetype, container.xml, content.opf, toc.ncx, nav.xhtml, chapter XHTML files, embedded images, and a stylesheet. The output passes EPUBCheck validation and includes EPUB Accessibility 1.1 metadata.
