What is inside an EPUB file?

An EPUB is a ZIP archive containing: a mimetype file (must be first, uncompressed), META-INF/container.xml (points to the OPF), content.opf (the package document with metadata, manifest, and spine), XHTML content files (chapters), CSS stylesheets, images, and navigation files (toc.ncx for EPUB 2, nav.xhtml for EPUB 3).

How do I open and inspect an EPUB file?

Rename the .epub to .zip and extract it with any archive tool, or extract it directly on the command line with unzip book.epub -d book_extracted/. To list the contents without extracting, run unzip -l book.epub. The OPF file (usually OEBPS/content.opf) is the main index, so start there.

What is the OPF file in an EPUB?

The OPF (Open Packaging Format) file is the package document that defines the entire publication. It contains four sections: metadata (title, author, language), manifest (list of all files with media types), spine (reading order), and optionally a guide (landmark references). Reading systems locate this file through META-INF/container.xml and then read it to understand the EPUB structure.

Why must the mimetype file be first in the EPUB ZIP?

The EPUB specification requires the mimetype file to be the first entry in the ZIP, stored without compression (ZIP_STORED). This allows software to quickly identify the file as application/epub+zip by reading just the first bytes of the archive, without decompressing anything.

What is the difference between the manifest and spine in an EPUB?

The manifest lists every file in the publication with its id, path, and media type — it's a complete inventory. The spine defines the reading order by referencing manifest item ids in sequence. An image or CSS file appears in the manifest but not the spine. Chapter files appear in both.

Inside an EPUB File: Structure, Files, and How It All Works

June 12, 2026 · 7 min read

An EPUB is not a mystery format — it's a ZIP archive containing a specific set of files. Once you understand the structure, you can fix broken EPUBs, build converters, or create files from scratch. Here's what's inside every EPUB.

The EPUB ZIP Structure

mybook.epub (ZIP archive)
├── mimetype                    ← must be first, uncompressed
├── META-INF/
│   └── container.xml           ← points to the OPF file
└── OEBPS/  (or any folder name)
    ├── content.opf             ← package document (manifest + spine)
    ├── toc.ncx                 ← EPUB 2 navigation (NCX)
    ├── nav.xhtml               ← EPUB 3 navigation (NAV)
    ├── chapter01.xhtml         ← content files
    ├── chapter02.xhtml
    ├── css/
    │   └── styles.css
    └── images/
        ├── cover.jpg
        └── figure1.png

The mimetype File

The first file in the ZIP must be named mimetype, stored without compression, and contain exactly:

application/epub+zip

No newline, no BOM, no spaces. This is how e-readers and validators identify the file as an EPUB without reading the full archive.

Creating EPUBs with Python:

import zipfile

with zipfile.ZipFile('book.epub', 'w') as z:
    # mimetype MUST be first and uncompressed
    z.writestr(zipfile.ZipInfo('mimetype'), 'application/epub+zip',
               compress_type=zipfile.ZIP_STORED)
    # All other files can be compressed
    z.write('META-INF/container.xml', compress_type=zipfile.ZIP_DEFLATED)
    z.write('OEBPS/content.opf', compress_type=zipfile.ZIP_DEFLATED)
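
The three rules above (first entry, stored, exact content) can also be checked on an existing file. The following is a minimal sketch using Python's standard zipfile module; the function name check_mimetype and its messages are illustrative and not part of any EPUB tool. It looks at the order of entries in the ZIP central directory, which normally matches the physical order in the file.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return a list of problems with the mimetype entry (empty list = OK).

    epub_file may be a path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(epub_file) as z:
        entries = z.infolist()
        # Rule 1: mimetype must be the first entry in the archive
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
            return problems
        first = entries[0]
        # Rule 2: it must be stored, not deflated
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        # Rule 3: exact bytes, no newline, BOM, or spaces
        if z.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is not exactly application/epub+zip")
    return problems
```

Calling check_mimetype('book.epub') on a well-formed file should return an empty list. EPUBCheck remains the authoritative validator; this only covers the mimetype entry.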

META-INF/container.xml

This file tells the reading system where to find the OPF package document:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

The full-path is relative to the root of the ZIP. The OEBPS folder name is conventional but not required — you can use any folder name or put the OPF in the root.
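
Because the OPF can live anywhere in the archive, code that reads an EPUB should not hard-code OEBPS/content.opf. A minimal sketch of the lookup with Python's standard library follows; find_opf_path is a hypothetical helper name, and it assumes a single rootfile entry as in the example above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml (same URI as in the example above)
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_file):
    """Return the full-path of the first rootfile listed in container.xml."""
    with zipfile.ZipFile(epub_file) as z:
        root = ET.fromstring(z.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    # full-path is relative to the root of the ZIP
    return rootfile.get("full-path")
```

The returned string can be passed straight to ZipFile.read() to load the package document.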

The OPF Package Document (content.opf)

The OPF file is the heart of an EPUB. It has four sections:

<metadata> — Dublin Core metadata (title, author, language, identifier)
<manifest> — lists every file in the publication with its id, href, and media-type
<spine> — defines the reading order by referencing manifest item ids
<guide> — EPUB 2 landmark references (optional, replaced by NAV landmarks in EPUB 3)

<manifest>
  <item id="ch1" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  <item id="ch2" href="chapter02.xhtml" media-type="application/xhtml+xml"/>
  <item id="css" href="css/styles.css"  media-type="text/css"/>
  <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg"
        properties="cover-image"/>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"
        properties="nav"/>
  <!-- The spine's toc attribute below refers to this item's id -->
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>

<spine toc="ncx">
  <itemref idref="nav" linear="no"/>
  <itemref idref="ch1"/>
  <itemref idref="ch2"/>
</spine>
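
To see how the spine and manifest fit together, here is a rough sketch that resolves the reading order to paths inside the ZIP. It assumes the fragments above sit inside a package root element in the OPF namespace (http://www.idpf.org/2007/opf), that every idref has a matching manifest item, and that hrefs are plain relative paths without percent-encoding. The function name reading_order is illustrative.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml, opf_path="OEBPS/content.opf"):
    """Return ZIP paths of spine items, in reading order."""
    root = ET.fromstring(opf_xml)
    # Manifest: id -> href (href is relative to the OPF file's folder)
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    base = posixpath.dirname(opf_path)
    order = []
    # Spine: each itemref points at a manifest id
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        href = manifest[itemref.get("idref")]
        order.append(posixpath.join(base, href))
    return order
```

Stylesheets and images never appear in the result because they are in the manifest but not in the spine.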

Content Files — XHTML, Not HTML

Chapter files must be valid XHTML, that is, HTML written in well-formed XML syntax. Key differences from ordinary HTML:

Should start with an XML declaration; a doctype is optional in EPUB 3 and, if present, is <!DOCTYPE html>
All tags must be closed: <br/> not <br>
Attribute values must be quoted
Case-sensitive: use lowercase element names
The namespace declaration is required: xmlns="http://www.w3.org/1999/xhtml"

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>Chapter 1</title>
  <link rel="stylesheet" type="text/css" href="../css/styles.css"/>
</head>
<body>
  <section epub:type="chapter">
    <h1>Chapter 1: Introduction</h1>
    <p>Text here.</p>
  </section>
</body>
</html>
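
Most of the rules above come down to XML well-formedness, which a plain XML parser can check. The sketch below uses Python's xml.etree.ElementTree; check_xhtml is a hypothetical helper, and it only tests well-formedness and the root namespace, not full EPUB conformance (EPUBCheck does that).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_xhtml(source):
    """Return a list of problems found in an XHTML document (empty = OK)."""
    try:
        # Fails on unclosed tags such as <br>, unquoted attributes, etc.
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # The root must be <html> in the XHTML namespace
    if root.tag != f"{{{XHTML_NS}}}html":
        return ["root element is not <html> in the XHTML namespace"]
    return []
```

Passing the bytes of a chapter file, for example from ZipFile.read(), reports an unclosed <br> as a parse error with its line and column.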

Inspecting an EPUB

# List contents without extracting
unzip -l book.epub

# Extract to a folder
unzip book.epub -d book_extracted/

# View OPF
unzip -p book.epub OEBPS/content.opf | xmllint --format -
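
Where unzip is not available, the same listing can be approximated with Python's zipfile module. This is a small sketch with an illustrative function name, list_entries, that also shows whether each entry is stored or compressed, which is handy for checking the mimetype entry.

```python
import zipfile


def list_entries(epub_file):
    """Return (name, storage, size_in_bytes) for each entry, in archive order."""
    with zipfile.ZipFile(epub_file) as z:
        return [
            (
                info.filename,
                "stored" if info.compress_type == zipfile.ZIP_STORED else "compressed",
                info.file_size,  # uncompressed size
            )
            for info in z.infolist()
        ]
```

For a well-formed EPUB the first tuple should be ('mimetype', 'stored', 20), since application/epub+zip is 20 bytes long.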

EPUBs from PDF Conversion

When toolkit.bot converts a PDF to EPUB, it generates all required files: mimetype, container.xml, content.opf, toc.ncx, nav.xhtml, chapter XHTML files, embedded images, and a stylesheet. The output passes EPUBCheck validation and includes EPUB Accessibility 1.1 metadata.
