What is it about?
I wanted to create some EPUB files for my own personal use, and I had to look at too many different documents to find the information I needed, so I decided to write the tutorial I wish I had had at the beginning.
This document covers EPUB 3.3, the current version. It describes the technical requirements and my understanding of the best practices. It is targeted at people with some technical background in current web technologies.
There are lots of different and incompatible EPUB tools, and this tutorial won’t cover everything (for example I won’t cover rights management or digital signatures), so things can and will go wrong.
If you spot an error, or if I missed some important information, please contact me.
If you want to make some EPUB files with JavaScript or TypeScript, I created a library that may help you.
Time for a lore dump
Having a general sense of the format should make creating EPUB files less frustrating, and that requires some background information, including some history.
Very little “not invented here”
Whenever possible, EPUB relies on industry standards, or at least on what the industry standards were when each decision was made[1], which means:
-
An EPUB is a ZIP file
-
The content relies on web technologies like HTML and CSS
-
The metadata use Dublin Core formats
-
There’s XML
This has two consequences:
-
The specification is very dry but not long, because it relies on other (and much longer) existing specifications
-
Creating simple EPUB files doesn’t require a lot of specific tooling because you can use existing libraries and applications even if they are not EPUB-aware.
Made in ancient times, for ancient hardware
Books can include lots of text and pictures. Creating ebook reading devices that are not high-end computers[2] means that giving access to the book content shouldn’t require reading the whole file at once or keeping all of it in memory.
As some of the people working on the standard care about this use case, it is taken into account, which has some consequences on the format.
XML, XHTML and namespaces
If you have had to deal with web technologies, you have probably heard of HTML.
XML and HTML are two descendants of the same technology called SGML, which gives them a strong family resemblance.
HTML, and especially HTML as used by browsers, can deal with syntax errors. The goal is to be able to browse websites even if their creators made mistakes.
XML syntax is simpler and stricter, which is bad for people making errors, but good for software developers: this aspect of tooling can be simpler, because you can expect the provided documents to have the right format and stop at the first issue.
This is not the place to tell you about the great hopes people had for XML at the end of the 90s, and the great sadness that ensued. The short version is that XHTML 1 is a descendant of HTML 4 with an XML syntax, like two family branches joining back. The idea was to be able to use existing XML tools to write and prepare content for the web. But the “stop at the first error” behavior meant that XHTML documents stayed a minority, and the web industry went on with HTML 5 instead of XHTML 2.
So if you have worked with HTML, XHTML syntax will look very familiar, and XML will look familiar.
The metadata of an EPUB is in XML, and the content is based on XHTML documents like this one:
<?xml version="1.0"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Chapter 1</title>
<link type="text/css" rel="stylesheet" href="style.css"/>
</head>
<body>
<h1>Chapter title</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aliquam elementum lacus sed tristique aliquet.</p>
<img src="image/1.svg"/>
</body>
</html>
Renaming HTML files to .xhtml won’t work! What can work instead is using an XML tool that accepts HTML-style tags to open your files and to save them with the proper XML syntax.
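To illustrate why renaming is not enough, here is a minimal Python sketch that feeds the same paragraph to a strict XML parser, once with an HTML-style line break and once with the XML-style one. The function name is mine, and the parser is the one from the Python standard library, not the one used by any particular ebook system.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # An XML parser stops at the first syntax error.
        return False
    return True


# Valid HTML: the <br> tag is never closed.
HTML_STYLE = "<p>First line<br>second line</p>"
# Same content with the XML syntax: the tag is self-closed.
XML_STYLE = "<p>First line<br/>second line</p>"
```

With these two strings, `is_well_formed_xml(HTML_STYLE)` is false and `is_well_formed_xml(XML_STYLE)` is true, even though a browser would display both the same way.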
The XHTML of EPUB is not limited to the tags defined in XHTML 1 but should work with any HTML 5 content (the current HTML version) written with the XML syntax, except that the default formatting of tags specific to HTML 5 may not be applied.
Beyond the strictness, the thing to know about XML is namespaces, which are a way to use different sets of tags and attributes in the same document while avoiding clashes.
It works by using a different prefix for each set of tags and attributes, so each set lives in its own “name space”. For example, EPUB-specific tags and attributes have an epub: prefix.
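As a small illustration of what a prefix means for tools, the Python sketch below parses an XHTML fragment and lists the values of its epub:type attributes. The fragment and the function name are made up for this example. Note that the parser does not keep the prefix itself: it replaces it with the namespace URL that the prefix was bound to.

```python
import xml.etree.ElementTree as ET

# The URL bound to the epub: prefix in EPUB documents.
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical fragment, for illustration only.
SNIPPET = (
    '<p xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops">'
    'Lorem<a href="#note_1" epub:type="noteref">1</a> ipsum</p>'
)


def epub_types(markup: str) -> list[str]:
    """Return the epub:type values found in the markup, in document order."""
    root = ET.fromstring(markup)
    # ElementTree stores namespaced names as "{namespace URL}local name".
    attribute = f"{{{EPUB_NS}}}type"
    return [
        element.get(attribute)
        for element in root.iter()
        if element.get(attribute) is not None
    ]
```

Here `epub_types(SNIPPET)` returns `["noteref"]`, and the root tag is seen by the parser as `{http://www.w3.org/1999/xhtml}p`.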
CSS
As you’ve seen in the above example, CSS is supported, with each XHTML document referencing its own CSS file(s).
As with HTML 5, the CSS support level is not identical on all ebook systems. For example, you may have to adapt your XHTML to avoid modern CSS selectors.
Most ebook systems are designed to make the default styling as readable as possible, so if you don’t provide any specific CSS the result should be a standard-looking ebook.
When styling, also remember to take readability into account: fancy styling can sometimes make your text harder to read.
A possible approach is to make several versions of your book available: providing one version with only minimal styling would ensure that everybody is able to read it.
Extensions & compatibility
The EPUB standard is designed by a working group with members from many organizations. Having an interoperable EPUB format doesn’t mean that extensions aren’t possible.
Quite the opposite: the standard defines which parts are fixed, which parts can be freely extended, and which parts are not fixed but where nobody has tried to go, so if you try, good luck, because you’ll be alone.
This has three consequences:
-
If you stick to the common standard and practices, things should mostly work.
-
If you open an existing EPUB file, you may find undocumented things.
-
If you want to create EPUB files for a specific system, you may have to jump through hoops.
The EPUB standard also tries to stay compatible across versions, so you can have files that are valid both as EPUB 3 and as EPUB 2. As a result, EPUB 2-specific elements are deprecated and no longer used, but still explicitly allowed.
EPUBCheck
EPUBCheck is a free, industry-standard tool to check the validity of an EPUB file.
It’s not very fast, and some of the error messages could be more explicit, but it’s very thorough and a bit stricter than the specification, so it’s very handy when you’re toying with the format. Using it as part of your book building chain could save you a lot of time.
The beginning of an EPUB file
An EPUB file is a ZIP file, which is a bundle of files.
The contained files can be compressed or directly stored as is, which has two consequences:
-
As some picture formats like JPEG are already compressed, storing them as is avoids a useless compression step.
-
It provides an easy way to detect EPUB files (beyond checking the file extension)
Many file formats start with a “magic number”: by reading the beginning of a file and comparing it to a list of known values, you can deduce its format. For example, ZIP files start with PK.
The next step is to be able to detect that a ZIP file is an EPUB file. It works by mandating that the first file in the ZIP bundle has a specific name and content, and that it must be stored uncompressed.
In a ZIP file, the directory that describes the contained files is at the end of the file. So if you look at the beginning, you’ll get the ZIP header of the first file (which includes its name) followed by its content. If that content is uncompressed, the whole thing works like a kind of extended magic number.
So the first file of an EPUB:
-
Must be called mimetype
-
Must contain application/epub+zip and only this
-
Must be uncompressed
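The “extended magic number” idea can be sketched in a few lines of Python that only look at the first bytes of a file. The offsets come from the layout of the ZIP header that precedes each file; the function name is mine, and this is a simplified check (for example it does not verify that nothing follows the expected content), not a replacement for a real validator.

```python
import struct

MIMETYPE_NAME = b"mimetype"
MIMETYPE_CONTENT = b"application/epub+zip"


def looks_like_epub(data: bytes) -> bool:
    """Check the first bytes of a file for the EPUB 'extended magic number'."""
    # A ZIP header for a file is 30 bytes long and starts with PK\x03\x04.
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return False
    # Compression method at offset 8: 0 means "stored" (uncompressed).
    (compression,) = struct.unpack_from("<H", data, 8)
    # Lengths of the file name and of the extra attributes at offsets 26 and 28.
    name_length, extra_length = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_length]
    content_start = 30 + name_length + extra_length
    content = data[content_start:content_start + len(MIMETYPE_CONTENT)]
    return (
        compression == 0
        and extra_length == 0
        and name == MIMETYPE_NAME
        and content == MIMETYPE_CONTENT
    )
```

Since the name is 8 bytes long, a well-formed EPUB has the text application/epub+zip at byte 38, which is what many detection tools look for.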
No ZIP extra attributes
A last word about the ZIP part: many ZIP creation tools store metadata in “extra attributes” (called “extra fields” in the ZIP specification) by default, for example timestamps, because the default ones only have a 2-second precision.
EPUB ZIP files should not use them, so check whether the tools you want to use can avoid inserting them. The Linux zip command has a --no-extra option for this case.
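If you build your files with a script instead of the zip command, the same rules apply. Here is a minimal sketch using the zipfile module of the Python standard library, which, as far as I can tell, does not add extra attributes for small files like these. The function name and the shape of the files argument are assumptions made for this example.

```python
import zipfile


def write_epub(target, files: dict[str, bytes]) -> None:
    """Write an EPUB ZIP bundle to a path or to a binary file object.

    `files` maps paths inside the EPUB to their content and must not
    contain the mimetype file, which is added here.
    """
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype file must come first and must be stored uncompressed.
        archive.writestr(
            "mimetype",
            b"application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # The other files can be compressed.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

You could skip the compression for already compressed files like JPEG pictures by choosing the compress_type from the file extension.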
OEBPS directory
A practice that used to be common but was never mandatory was to put all the EPUB content files inside an OEBPS directory (OEBPS, standing for Open eBook Publication Structure, was the ancestor of the EPUB format).
Except for the files with a fixed path, you are free to put your files where you want, but it’s still good practice to organize them all in a subdirectory like EPUB or CONTENT, instead of putting them at the EPUB’s root.
The OPF file
Three quarters of an EPUB’s metadata types are contained in a single .opf (for “Open Packaging Format”) file.
Where to find it
The path of this file is for you to decide, and it must be specified as the rootfile of a container file so it can be found:
<?xml version="1.0"?>
<container
version="1.0"
xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
>
<rootfiles>
<rootfile
full-path="PATH_TO_YOUR_OPF_FILE.opf"
media-type="application/oebps-package+xml"
/>
</rootfiles>
</container>
The container file’s path must be META-INF/container.xml, so that tools are able to find it.
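From a reading tool’s point of view, finding the OPF file means parsing the container file and reading the full-path attribute of its rootfile. Here is a minimal Python sketch of that lookup; the function name is an assumption, and it only handles the first rootfile.

```python
import xml.etree.ElementTree as ET

# Namespace used by the container file shown above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml) -> str:
    """Return the path of the OPF file declared in META-INF/container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None or "full-path" not in rootfile.attrib:
        raise ValueError("no rootfile with a full-path attribute")
    return rootfile.attrib["full-path"]
```

The container_xml argument can be the text or the raw bytes of the file, for example as returned by `zipfile.ZipFile(path).read("META-INF/container.xml")`.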
General structure
<?xml version="1.0"?>
<package version="3.0"
xmlns="http://www.idpf.org/2007/opf"
unique-identifier="BookId"
>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<!-- Metadata part -->
<dc:title>Book title</dc:title>
<dc:language>en</dc:language>
<dc:identifier
id="BookId"
>https://example.com/ebook</dc:identifier>
<meta property="dcterms:modified">2025-07-31T13:39:26Z</meta>
<dc:creator>Impressive author, Phd.</dc:creator>
<dc:publisher>Large publisher ltd.</dc:publisher>
</metadata>
<manifest>
<!-- Manifest part -->
<item
id="toc"
properties="nav"
href="toc.xhtml"
media-type="application/xhtml+xml"
/>
<item
id="cover-image"
href="cover-image.png"
media-type="image/png"
properties="cover-image"
/>
<item
id="css_1"
href="css/css_1.css"
media-type="text/css"
/>
<item
id="image_1"
href="image/1.svg"
media-type="image/svg+xml"
fallback="image_1_jpg"
/>
<item
id="image_1_jpg"
href="image/1.jpg"
media-type="image/jpeg"
/>
<item
id="part_1"
href="part/part_1.xhtml"
media-type="application/xhtml+xml"
/>
<item
id="part_2"
href="part/part_2.xhtml"
media-type="application/xhtml+xml"
/>
</manifest>
<spine>
<!-- Spine part -->
<itemref idref="part_1"/>
<itemref idref="part_2"/>
</spine>
</package>
Book information
The first part is a set of information about the book:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Book title</dc:title>
<dc:language>en-US</dc:language>
<dc:identifier
id="BookId"
>https://example.com/ebook</dc:identifier>
<meta property="dcterms:modified">2025-07-31T13:39:26Z</meta>
<dc:creator>Impressive author, Phd.</dc:creator>
<dc:publisher>Large publisher ltd.</dc:publisher>
</metadata>
Note: dc stands for Dublin Core, which is a set of industry-standard metadata terms.
Mandatory fields
-
One title (dc:title). Having several titles is technically possible, but support is inconsistent.
-
One language (dc:language), using the IETF language tag format (BCP 47). It represents the main language of the book; individual XHTML files, or even parts of XHTML files, can specify their own languages.
-
One or more identifiers (dc:identifier) that can contain a UUID, a DOI, an ISBN or a URL; using a URL is nowadays suggested. The id attribute of one identifier must have the same value as the unique-identifier attribute of the OPF package tag; the value of this id is not significant.
-
One last modification date (a meta tag with the dcterms:modified property), in UTC with the YYYY-MM-DDThh:mm:ssZ format, as in the example above.
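These rules are easy to get wrong when editing a file by hand, so here is a small Python sketch that checks the title, language and identifier rules on the text of an OPF file. The function name and the wording of the messages are mine; EPUBCheck remains the reference for real validation.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def metadata_problems(opf_xml) -> list[str]:
    """Return a list of problems found in the mandatory metadata fields."""
    package = ET.fromstring(opf_xml)
    metadata = package.find("opf:metadata", NAMESPACES)
    if metadata is None:
        return ["missing metadata element"]
    problems = []
    for name in ("title", "language", "identifier"):
        if metadata.find(f"dc:{name}", NAMESPACES) is None:
            problems.append(f"missing dc:{name}")
    # The package's unique-identifier must match the id of one identifier.
    unique_id = package.get("unique-identifier")
    identifier_ids = [
        element.get("id")
        for element in metadata.findall("dc:identifier", NAMESPACES)
    ]
    if unique_id is None or unique_id not in identifier_ids:
        problems.append("unique-identifier does not match any dc:identifier id")
    return problems
```

An empty list means that these three fields look fine; it says nothing about the rest of the file.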
Optional but useful fields
-
One publication date (dc:date) in the ISO 8601 format.
-
One or more creators (dc:creator).
Other fields
Other optional metadata can be added, like secondary contributors, the format of the identifier, the ebook type and ebook subjects. See the specification for details about them.
Manifest
<manifest>
<item
id="toc"
properties="nav"
href="toc.xhtml"
media-type="application/xhtml+xml"
/>
<item
id="cover-image"
href="cover-image.png"
media-type="image/png"
properties="cover-image"
/>
<item
id="css_1"
href="css/css_1.css"
media-type="text/css"
/>
<item
id="image_1"
href="image/1.svg"
media-type="image/svg+xml"
fallback="image_1_jpg"
/>
<item
id="image_1_jpg"
href="image/1.jpg"
media-type="image/jpeg"
/>
<item
id="part_1"
href="part/part_1.xhtml"
media-type="application/xhtml+xml"
/>
<item
id="part_2"
href="part/part_2.xhtml"
media-type="application/xhtml+xml"
/>
</manifest>
The manifest provides an exhaustive list of all files used in the ebook, which includes:
-
The content XHTML files
-
The images used in the ebook
-
The cover image
-
The table of contents file
-
The style sheets
-
Any other file
For example, if an XHTML file uses an image not listed in the manifest, your reader may not display it, even if technically the file can be found in the EPUB file.
The mimetype, META-INF/container.xml and OPF files must not be listed in the manifest.
The order of the items in the manifest is not meaningful.
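Forgetting to list a file is a common mistake, so a build script can compare the content of the ZIP with the manifest. Here is a minimal Python sketch, assuming that the href values are relative to the directory of the OPF file and are not percent-encoded. The function name and its arguments are hypothetical.

```python
import posixpath

# Files with a fixed path that must not be listed in the manifest.
FIXED_FILES = {"mimetype", "META-INF/container.xml"}


def unlisted_files(zip_names, opf_path, manifest_hrefs) -> list[str]:
    """Return the files of the ZIP that the manifest should list but doesn't."""
    opf_directory = posixpath.dirname(opf_path)
    # Turn each href into a path from the root of the ZIP.
    listed = {
        posixpath.normpath(posixpath.join(opf_directory, href))
        for href in manifest_hrefs
    }
    not_expected = FIXED_FILES | {opf_path}
    return sorted(
        name
        for name in zip_names
        # Names ending with "/" are directory entries, not files.
        if not name.endswith("/")
        and name not in listed
        and name not in not_expected
    )
```

The zip_names argument can come from `zipfile.ZipFile(path).namelist()`, and the hrefs from the item tags of the manifest.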
Each item:
-
Must have a unique id attribute used to identify it.
-
Must have an href attribute that contains its path in the EPUB hierarchy, relative to the OPF file.
-
Must have a media-type attribute that contains its type, in the media type (MIME type) format.
-
Can have a properties attribute that defines specific roles for some items. These include nav for the table of contents and cover-image for the cover image, both described below.
-
Can have a fallback attribute that contains the id of another item that is supposed to be used as a fallback if the current item can’t be displayed. For example, if the initial item is an SVG file, you can provide a fallback for systems that don’t support this format. This feature is cool but unfortunately not supported by all readers.
Spine
<spine>
<itemref idref="part_1"/>
<itemref idref="part_2"/>
</spine>
The spine lists the XHTML documents in the reading order of the ebook.
Each document is referenced by its manifest id.
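To show how the spine and the manifest work together, here is a Python sketch that resolves the reading order of an OPF file into a list of paths, and fails if the spine references an id that the manifest doesn’t define. The function name is mine.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml) -> list[str]:
    """Return the href of each spine document, in reading order."""
    package = ET.fromstring(opf_xml)
    # Map each manifest id to its href.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    order = []
    for itemref in package.findall("opf:spine/opf:itemref", OPF_NS):
        idref = itemref.get("idref")
        if idref not in hrefs:
            raise ValueError(f"spine references unknown manifest id: {idref!r}")
        order.append(hrefs[idref])
    return order
```

The returned paths are the href values as written in the manifest, so they are relative to the OPF file.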
Table of contents
An EPUB file must include a table of contents (TOC) that contains links to the different parts of the book. This TOC must be an XHTML document identified in the manifest with the properties="nav" attribute:
<manifest>
<item
id="toc"
properties="nav"
href="toc.xhtml"
media-type="application/xhtml+xml"
/>
</manifest>
The TOC’s content must be placed inside a nav tag, with the epub:type="toc" attribute.
XML requires the epub namespace to be declared, typically on the html tag.
The table hierarchy is defined using nested ordered lists with ol and li tags.
<?xml version="1.0"?>
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:epub="http://www.idpf.org/2007/ops"
>
<head>
<title>Table of contents</title>
</head>
<body>
<nav epub:type="toc">
<h1>Table of contents</h1>
<ol>
<li><a href="part/part_1.xhtml#id_title_1">Title 1</a>
<ol>
<li>
<a
href="part/part_1.xhtml#id_title_1_1"
>Title 1.1</a>
</li>
<li>
<a
href="part/part_1.xhtml#id_title_1_2"
>Title 1.2</a>
</li>
</ol>
</li>
<li>
<a
href="part/part_2.xhtml#id_title_2"
>Title 2</a>
</li>
<li>
<a
href="part/part_3.xhtml#id_title_3"
>Title 3</a>
</li>
</ol>
</nav>
</body>
</html>
The XHTML documents don’t need to follow the TOC organization. The TOC can be omitted from the spine; in this case it’s only used for navigation.
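To make the nested list structure more concrete, here is a Python sketch that reads a TOC document like the one above and returns its entries with their depth. The function names are mine, and the sketch only looks at the first link and the first nested list of each li tag.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def toc_entries(nav_xhtml) -> list[tuple[int, str, str]]:
    """Return (depth, title, href) for each entry of the TOC, in order."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(XHTML + "nav"):
        if nav.get(EPUB_TYPE) == "toc":
            top_list = nav.find(XHTML + "ol")
            return list(_walk(top_list, 1)) if top_list is not None else []
    return []


def _walk(ordered_list, depth):
    """Yield the entries of an ol tag, then recurse into the nested lists."""
    for item in ordered_list.findall(XHTML + "li"):
        link = item.find(XHTML + "a")
        if link is not None:
            title = "".join(link.itertext()).strip()
            yield depth, title, link.get("href")
        nested = item.find(XHTML + "ol")
        if nested is not None:
            yield from _walk(nested, depth + 1)
```

Each href is relative to the TOC file, which is why the links of the example start with part/ when the TOC file sits one level above the content files.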
Cover image
An EPUB file can define a cover image. It is identified in the manifest with the properties="cover-image" attribute:
<manifest>
<item
id="cover-image"
href="cover-image.png"
media-type="image/png"
properties="cover-image"
/>
</manifest>
Different ebook systems have different requirements regarding the cover image size.
Summary of the files so far
-
An uncompressed mimetype file with fixed content
-
A META-INF/container.xml file that provides the path to the OPF file
-
An OPF file with most of the metadata: book metadata, manifest and spine
-
A table of contents file
Footnotes
EPUBs support “footnotes”, which is a misnomer since they are displayed in popups, which avoids jumping back and forth like in physical books.
Footnotes use XHTML links with EPUB-specific attributes:
<?xml version="1.0"?><!DOCTYPE html>
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:epub="http://www.idpf.org/2007/ops"
>
<head>
<title>Chapter 1</title>
</head>
<body>
<p>Lorem<a href="notes.xhtml#note_1" epub:type="noteref">1</a> ipsum</p>
</body>
</html>
<?xml version="1.0"?><!DOCTYPE html>
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:epub="http://www.idpf.org/2007/ops"
>
<head>
<title>Notes</title>
</head>
<body>
<aside id="note_1" epub:type="footnote">Note text</aside>
</body>
</html>
Notes can appear in the same document as the main text or in a separate one. The only constraints are:
-
The link must be right, with the link anchor (the part after the #) being the same as the note’s id.
-
The xmlns:epub="http://www.idpf.org/2007/ops" namespace must be declared in the XHTML documents, to make the epub:type attributes valid.
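A broken anchor is easy to miss, so here is a Python sketch that lists the note links of a text document that don’t match any footnote id of a notes document. The function name is mine, and the sketch only compares the part after the #: it doesn’t check that the file part of the link points to the right document.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def dangling_noterefs(text_xhtml, notes_xhtml) -> list[str]:
    """Return the href of each note link without a matching footnote."""
    notes_root = ET.fromstring(notes_xhtml)
    # Collect the id of every tag marked as a footnote.
    note_ids = {
        element.get("id")
        for element in notes_root.iter()
        if element.get(EPUB_TYPE) == "footnote"
    }
    dangling = []
    for link in ET.fromstring(text_xhtml).iter(XHTML + "a"):
        if link.get(EPUB_TYPE) != "noteref":
            continue
        href = link.get("href", "")
        # Keep only the anchor, the part after the "#".
        _, _, anchor = href.partition("#")
        if anchor not in note_ids:
            dangling.append(href)
    return dangling
```

When the notes are in the same document as the text, the same document can be passed for both arguments.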
Spoilers
There is no native EPUB support for spoilers, but some ebook systems support the details tag, which provides a similar feature:
<?xml version="1.0"?><!DOCTYPE html>
<html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:epub="http://www.idpf.org/2007/ops"
>
<head>
<title>Spoiler example</title>
</head>
<body>
<details>
<summary>Click to see the spoiler</summary>
Spoiler content.
</details>
</body>
</html>
On systems that don’t support it, there is no good workaround. The best you can probably do is to add custom CSS formatting, but decreasing text readability is not a good idea because it creates accessibility issues.
As far as I know, there is no way to detect whether a system supports the tag, so there is no way to switch the CSS formatting off when it does.
The end
That’s it, happy publishing!