Random Python Scripts #2: How to Edit eBooks with Python - Rename Characters, Fix Translations, and More

ikerdomingoperez
Sep 8, 2025
7 min read

How we got here

Recently, my wife (a pure bookworm) told me:

“I like this book (Unseen, by Mari Jungstedt). Seriously, it’s so good, proper Nordic Noir. But there’s one tiny detail that keeps taking me out of the story on every page: the main character. Why does his name have to be Knutas? Really, is there not a better name?”

That planted the seed. And I guess you already know what we are going to do: edit eBooks with Python.

This is a small tutorial to help you understand how eBooks work, and how to adapt them so your reading flows better. It’s not just about weird character names: sometimes a name is so familiar that every time it’s mentioned, you picture someone you know. Or maybe a bad translation keeps using a strange or incorrect word. Or… maybe you just want to set the world on fire.

There are a few libraries (pun intended) for working with eBooks in Python; the most popular are ebooklib and epub_meta.

Understanding the EPUB format

For this example, I’ll use an eBook I created for my wife some time ago: a compilation of short stories by Sally Rooney, originally published in different magazines.

First, let’s understand the format. An EPUB is just a ZIP file. If you unzip it, you’ll find a bunch of files: cover images, metadata, CSS, and, most importantly, the HTML/XHTML pages containing the text. So, let's unzip it and take a look at the brains.
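Because an EPUB is a ZIP archive, Python's standard-library zipfile module can peek inside without extracting anything to disk. A minimal sketch; list_epub_contents is a hypothetical helper name, not part of ebooklib.

```python
import zipfile


def list_epub_contents(path):
    """Return (file name, uncompressed size in bytes) for every entry in an EPUB."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Usage (hypothetical, with the example book from this post):
# for name, size in list_epub_contents('Raccoon Stories - Sally Rooney.epub'):
#     print(f'{size:>10}  {name}')
```

You should typically see a mimetype entry, META-INF/container.xml, the package (.opf) file, stylesheets, images and the (X)HTML pages.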

If you want to do it fast, you could just open the HTML files, find the occurrences in your IDE, replace them, save, zip it back up, rename to .epub… and you’re done. Tutorial over.
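One catch with the quick route: the EPUB container format (OCF) expects the mimetype file to be the first entry in the archive, stored without compression. A generic "compress folder" action may not do that, and strict readers or validators may reject the result. A minimal sketch with the standard zipfile module; zip_epub_folder is a hypothetical helper, and it assumes the folder has the same layout as the unzipped original.

```python
import os
import zipfile


def zip_epub_folder(folder, output_path):
    """Pack an unzipped EPUB folder back into an .epub archive.

    Assumes `folder` holds the files exactly as they came out of the original
    EPUB (mimetype, META-INF/, and the content files).
    """
    with zipfile.ZipFile(output_path, 'w') as archive:
        # The mimetype entry must come first and must be stored uncompressed.
        archive.writestr('mimetype', 'application/epub+zip', compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed normally, keeping relative paths.
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, folder).replace(os.sep, '/')
                if arcname == 'mimetype':
                    continue  # already written above
                archive.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

For example, after editing the HTML files in a hypothetical unzipped folder called raccoon_stories, zip_epub_folder('raccoon_stories', 'edited.epub') writes the new book.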

Or… you could enjoy the thrill of using BeautifulSoup to parse HTML, find and replace text, and rebuild the book. You know, for the science.

Installing the tools to edit eBooks with Python

Install the libraries with pip:

pip install ebooklib beautifulsoup4

ebooklib: to navigate through the eBook contents, modify them, or create a new book.
BeautifulSoup4: the most popular HTML parser for Python (you probably know it already).

To load a book, just call epub.read_epub:

from ebooklib import epub

book = epub.read_epub('Raccoon Stories - Sally Rooney.epub')

The book object exposes plenty of attributes and methods; dir() lists them all:

>>> dir(book)
['EPUB_VERSION', 'FOLDER_NAME', 'IDENTIFIER_ID', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_id_html', '_id_image', '_id_static', 'add_author', 'add_item', 'add_metadata', 'add_prefix', 'bindings', 'direction', 'get_item_with_href', 'get_item_with_id', 'get_items', 'get_items_of_media_type', 'get_items_of_type', 'get_metadata', 'get_template', 'guide', 'items', 'language', 'metadata', 'namespaces', 'pages', 'prefixes', 'reset', 'set_cover', 'set_direction', 'set_identifier', 'set_language', 'set_template', 'set_title', 'set_unique_metadata', 'spine', 'templates', 'title', 'toc', 'uid', 'version']

Check the metadata

>>> book.get_metadata('DC', 'title')
[('Raccoon Stories', {})]
>>> book.get_metadata('DC', 'creator')
[('Sally Rooney', {'{http://www.idpf.org/2007/opf}role': 'aut', '{http://www.idpf.org/2007/opf}file-as': 'Sally Rooney'})]
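These values live in the package document (the .opf file). In the EPUB container layout, META-INF/container.xml points to that file, and its metadata section holds Dublin Core elements such as dc:title and dc:creator. To see that without ebooklib, here is a stdlib-only sketch; read_dc_metadata is a hypothetical helper, and it assumes the first rootfile listed in container.xml is the one you want.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard namespaces for container.xml, the OPF package file and Dublin Core.
NS = {
    'c': 'urn:oasis:names:tc:opendocument:xmlns:container',
    'opf': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}


def read_dc_metadata(epub_path, field):
    """Return the text of every <dc:FIELD> element, e.g. field='title' or 'creator'."""
    with zipfile.ZipFile(epub_path) as archive:
        # container.xml tells us where the package (.opf) file lives.
        container = ET.fromstring(archive.read('META-INF/container.xml'))
        opf_path = container.find('c:rootfiles/c:rootfile', NS).get('full-path')
        package = ET.fromstring(archive.read(opf_path))
    return [el.text for el in package.findall(f'opf:metadata/dc:{field}', NS)]
```

Calling read_dc_metadata('Raccoon Stories - Sally Rooney.epub', 'creator') should give the same author ebooklib reported above.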

And loop through the content

>>> for item in book.get_items():
...     print(item.get_name())
... 
titlepage.xhtml
index_split_000.html
index_split_001.html
index_split_002.html
index_split_003.html
index_split_004.html
toc.ncx
page_styles.css
stylesheet.css
cover.jpeg
index-1_1.png

Those HTML files are the ones we need. Some eBooks store their content in .xhtml (or even .xml) files instead, but in this case it's good old HTML. Either way, ebooklib exposes these content pages as items of type ebooklib.ITEM_DOCUMENT. So, let's loop again and check how many document items there are. We expect to find 6 in this example: the five HTML files plus the title page XHTML.

>>> import ebooklib
>>> for item in book.get_items():
...     if item.get_type() == ebooklib.ITEM_DOCUMENT:
...         print(item.get_name())
... 
titlepage.xhtml
index_split_000.html
index_split_001.html
index_split_002.html
index_split_003.html
index_split_004.html

Finding and replacing text

All we need to do now is loop through each of them, load with bs4, and find and replace.

For character names, you will usually need at least three replacements:

Full name
First name only
Last name only
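The three pairs can be generated from the old and new full names. Applying them in a single regex pass, longest key first, makes the full name win over its parts and stops one replacement from feeding into the next (handy if you ever swap two names). A sketch; name_replacements, replace_all and the sample names are hypothetical, not from the book or the script below.

```python
import re


def name_replacements(old_full, new_full):
    """Build (full, first, last) pairs from two 'First Last' strings.

    Assumes each name has a first name followed by the rest of the name.
    """
    old_first, old_last = old_full.split(' ', 1)
    new_first, new_last = new_full.split(' ', 1)
    return [(old_full, new_full), (old_first, new_first), (old_last, new_last)]


def replace_all(text, pairs):
    """Apply every (old, new) pair in ONE pass over the text.

    Longer keys are tried first, so the full name wins over its parts, and
    replaced text is never re-scanned, so replacements cannot cascade.
    """
    mapping = dict(pairs)
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)
```

For instance, replace_all('John Doe met Doe. John left.', name_replacements('John Doe', 'Jim Roe')) returns 'Jim Roe met Roe. Jim left.', and swapping two names with [('Anna', 'Maria'), ('Maria', 'Anna')] works, whereas two chained str.replace calls would turn both into Anna.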

In my example book, good Inspector Knutas is not part of the story, so I replaced the character "Libby" instead. Searching for "Libby" in my reader returned 13 matches, one of them lowercase.

Screenshot: searching for “Libby” in the reader.

With BeautifulSoup, we need to load each HTML file, find matching text nodes, replace them, and save the result as a new EPUB. The script isn’t the most Pythonic, but it works. And more importantly, it’s easy to adapt or scale.

import logging
from ebooklib import epub, ITEM_DOCUMENT
from bs4 import BeautifulSoup


logger = logging.getLogger('epub')
logger.setLevel(logging.INFO)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(logging.Formatter('%(asctime)s | %(levelname)s | %(message)s'))
logger.addHandler(stream_handler)

BOOK_NAME = 'Raccoon Stories - Sally Rooney.epub'

REPLACEMENTS = [
    ('libby', 'mandy'),
    ('Libby', 'Mandy')
]


def main():
    logger.info('Welcome to the eBook character name fixer.')
    logger.info(f'Loading book {BOOK_NAME}.')
    book = epub.read_epub(BOOK_NAME)
    logger.info(
        f'Book loaded. Title: {book.get_metadata("DC", "title")[0][0]} by {book.get_metadata("DC", "creator")[0][0]}.'
    )
    for epub_item in book.items:
        if epub_item.get_type() == ITEM_DOCUMENT:
            logger.info(f'Found epub document item {epub_item.get_name()}. Loading...')
            content = BeautifulSoup(epub_item.get_content(), features='html.parser')
            logger.info('Content loaded. Searching for occurrences.')
            replacements = 0
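            for text_chunk in content.find_all(string=True):
                new_text = str(text_chunk)
                for to_search, to_replace in REPLACEMENTS:
                    occurrences = new_text.count(to_search)
                    if occurrences:
                        logger.info(f'Found {to_search} {occurrences} times.')
                        new_text = new_text.replace(to_search, to_replace)
                        logger.info(f'Replacing [{to_search} >> {to_replace}] at {epub_item.get_name()}')
                        replacements += occurrences
                # Swap the node once, after all replacements, so replace_with() is never called
                # on a node that was already detached; type(...) keeps comments as comments.
                if new_text != text_chunk:
                    text_chunk.replace_with(type(text_chunk)(new_text))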
            epub_item.set_content(str(content).encode('utf-8'))
            logger.info(f'Total replacements: {replacements}')
    epub.write_epub('modified_book.epub', book)


if __name__ == '__main__':
    main()

Output:

12:10:28,593 | INFO | Welcome to the eBook character name fixer.
12:10:28,593 | INFO | Loading book Raccoon Stories - Sally Rooney.epub.
12:10:28,598 | INFO | Book loaded. Title: Raccoon Stories by Sally Rooney.
12:10:28,598 | INFO | Found epub document item titlepage.xhtml. Loading...
12:10:28,605 | INFO | Content loaded. Searching for occurrences.
12:10:28,605 | INFO | Total replacements: 0
12:10:28,605 | INFO | Found epub document item index_split_000.html. Loading...
12:10:28,607 | INFO | Content loaded. Searching for occurrences.
12:10:28,608 | INFO | Total replacements: 0
12:10:28,608 | INFO | Found epub document item index_split_001.html. Loading...
12:10:28,609 | INFO | Content loaded. Searching for occurrences.
12:10:28,609 | INFO | Total replacements: 0
12:10:28,609 | INFO | Found epub document item index_split_002.html. Loading...
12:10:28,610 | INFO | Content loaded. Searching for occurrences.
12:10:28,611 | INFO | Total replacements: 0
12:10:28,611 | INFO | Found epub document item index_split_003.html. Loading...
12:10:28,612 | INFO | Content loaded. Searching for occurrences.
12:10:28,612 | INFO | Total replacements: 0
12:10:28,612 | INFO | Found epub document item index_split_004.html. Loading...
12:10:28,700 | INFO | Content loaded. Searching for occurrences.
12:10:28,705 | INFO | Found libby 1 times.
12:10:28,705 | INFO | Replacing [libby >> mandy] at index_split_004.html
12:10:28,705 | INFO | Found Libby 1 times.
12:10:28,705 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,705 | INFO | Found Libby 1 times.
12:10:28,705 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,705 | INFO | Found Libby 1 times.
12:10:28,705 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,706 | INFO | Found Libby 1 times.
12:10:28,706 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,706 | INFO | Found Libby 2 times.
12:10:28,706 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,706 | INFO | Found Libby 1 times.
12:10:28,706 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,706 | INFO | Found Libby 3 times.
12:10:28,706 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,706 | INFO | Found Libby 1 times.
12:10:28,706 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,707 | INFO | Found Libby 1 times.
12:10:28,707 | INFO | Replacing [Libby >> Mandy] at index_split_004.html
12:10:28,740 | INFO | Total replacements: 13

Near the end of the script, two lines do the heavy lifting: set_content() replaces each chapter's HTML with the edited version, and write_epub() saves everything as a new book. In other words, we rewrite each HTML file and then repackage the EPUB.

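for epub_item in book.items:
    if epub_item.get_type() == ITEM_DOCUMENT:
        ...  # parse with BeautifulSoup and replace the names
        epub_item.set_content(str(content).encode('utf-8'))
epub.write_epub('modified_book.epub', book)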

Now let’s verify the output: no more “Libby,” and 13 “Mandy” hits instead.
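If you would rather check from Python than from your reader, a rough sketch with zipfile and re counts whole-word matches in every (X)HTML file of the new EPUB. count_in_epub is a hypothetical helper; it searches the raw markup, so a name inside an attribute or a title tag would also be counted, and a name split by tags (say, Lib<i>by</i>) would be missed.

```python
import re
import zipfile


def count_in_epub(epub_path, word, ignore_case=True):
    """Count whole-word matches of `word` per (X)HTML file; files with no hits are omitted."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(rf'\b{re.escape(word)}\b', flags)
    counts = {}
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(('.html', '.xhtml', '.htm')):
                text = archive.read(name).decode('utf-8', errors='replace')
                hits = len(pattern.findall(text))
                if hits:
                    counts[name] = hits
    return counts
```

With the modified book, count_in_epub('modified_book.epub', 'Libby') should come back empty, and sum(count_in_epub('modified_book.epub', 'Mandy').values()) gives the total number of hits.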

Perfect! Search and replace complete. Time to keep reading.

Practical tips

Case sensitivity: as in our example, the name may also appear in lowercase (or all caps), so cover every casing variant in your replacements.
Partial matches: short names can appear inside other words (Ann in Annual, for instance). Use word boundaries or a regex to avoid unwanted replacements.
Non-text nodes: the script only edits text nodes, so attributes like href or class stay untouched. Don't change them unless you intend to.
XHTML strictness: EPUB content documents are XHTML, so keep the output well-formed to avoid reader warnings. BeautifulSoup can use other parsers, such as lxml or its XML mode (features='xml', which also needs lxml); this example uses the built-in html.parser.
Backup everything: Always write to a new filename. Keep the original EPUB untouched.
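The first two tips (case sensitivity and partial matches) can be handled together with a regular expression: \b word boundaries stop Ann from matching inside Annual, and a replacement function keeps the casing of each hit, so one rule covers Libby, libby and LIBBY. A sketch using only the standard re module; replace_name is a hypothetical helper, not part of ebooklib or BeautifulSoup.

```python
import re


def replace_name(text, old, new):
    """Replace whole-word, case-insensitive matches of `old` with `new`,
    copying the casing style of each match (Libby -> Mandy, libby -> mandy, LIBBY -> MANDY)."""
    pattern = re.compile(rf'\b{re.escape(old)}\b', flags=re.IGNORECASE)

    def match_case(match):
        found = match.group(0)
        if found.isupper():
            return new.upper()
        if found.islower():
            return new.lower()
        if found[0].isupper():
            return new[:1].upper() + new[1:]
        return new  # mixed casing: use the replacement as given

    return pattern.sub(match_case, text)
```

For example, replace_name("Libby's cat and libby", 'Libby', 'Mandy') returns "Mandy's cat and mandy", while "Annabel" and "annual" survive replace_name(..., 'Ann', 'Eve'). Inside the script's text-node loop, you could call it on str(text_chunk) in place of str.replace.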

Creating a new EPUB from scratch

Just to complete this small tutorial, here are a few tips on creating your own book from scratch. The tricky part is wrapping your text in HTML. It doesn't require advanced HTML, but manually adding tags to every line can still be a pain in the arse. There are automated tools that convert Word or Markdown files to EPUB, but for the science, here is a minimal Python + ebooklib workflow.

First, create a new book and set the basic metadata:

from ebooklib import epub

book = epub.EpubBook()

book.set_identifier('id123')
book.set_title('eBooks for dummies')
book.set_language('en')
book.add_author('Iker Domingo')

Then, create the chapters:

introduction = epub.EpubHtml(
    title='Introduction',
    file_name='intro.html',
    lang='en'
)
introduction.content = '''
<h1>Introduction</h1>
<p>First eBook ever created with Python</p>
'''

chapter1 = epub.EpubHtml(
    title='Chapter 1',
    file_name='chap_01.html',
    lang='en'
)
chapter1.content = '''
<h1>Chapter 1</h1>
<p>blah blah blah</p>
'''

Alternatively, write a basic text-to-HTML converter to avoid typing the same tags over and over: wrap each line in <p> tags and add a nice header for the title.

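import html

def text_to_html(title, text):
    # Escape &, < and > so the chapter stays well-formed, then wrap each line in <p> tags
    paragraphs = ''.join(f'<p>{html.escape(line)}</p>' for line in text.split('\n'))
    return f'<h1>{html.escape(title)}</h1>\n{paragraphs}'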

introduction = epub.EpubHtml(
    title='Introduction',
    file_name='intro.html',
    lang='en'
)
introduction.content = text_to_html(
    title='Introduction',
    text='First line\nSecond line'
)

chapter1 = epub.EpubHtml(
    title='Chapter 1',
    file_name='chap_01.html',
    lang='en'
)
chapter1.content = text_to_html(
    title='Chapter 1',
    text='Chapter 1 First line\nChapter 1 Second line'
)
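If your chapters come from a plain-text file where paragraphs are separated by blank lines and long lines are hard-wrapped, wrapping every single line in <p> would split paragraphs mid-sentence. A variant under that assumption; paragraphs_to_html is a hypothetical helper, and its output can be assigned to an EpubHtml's content just like the one above.

```python
import html
import re


def paragraphs_to_html(title, text):
    """Turn plain text into chapter HTML, treating blank lines as paragraph breaks."""
    # Split on one or more blank (or whitespace-only) lines: each chunk is one paragraph.
    chunks = re.split(r'\n\s*\n', text.strip())
    paragraphs = []
    for chunk in chunks:
        # Re-join hard-wrapped lines inside a paragraph with single spaces.
        joined = ' '.join(line.strip() for line in chunk.splitlines() if line.strip())
        if joined:
            # Escape &, < and > so the chapter stays well-formed XHTML.
            paragraphs.append(f'<p>{html.escape(joined)}</p>')
    return f'<h1>{html.escape(title)}</h1>\n' + '\n'.join(paragraphs)
```

For example, paragraphs_to_html('Chapter 1', 'First line\nstill first.\n\nSecond & last\n') produces a heading plus two paragraphs, the first re-joined into one line and the second with the ampersand escaped.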

Anyway, we can create as many "chapters" as we want as EpubHtml objects, but each one then has to be added to the book:

book.add_item(introduction)
book.add_item(chapter1)

We can also create a Table of Contents

book.toc = (
    epub.Link('intro.html', 'Introduction', 'intro'),
    epub.Link('chap_01.html', 'Chapter 1', 'chap1')
)

Add the NCX and Nav files, which EPUB readers use for navigation:

book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())

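You can also add your own CSS. Adding it to the book only packages the file, so link it from each chapter as well:

style = '''
body { font-family: Arial, sans-serif; }
h1 { color: darkblue; }
'''
nav_css = epub.EpubItem(
    uid="style_nav",
    file_name="style/nav.css",
    media_type="text/css",
    content=style
)
book.add_item(nav_css)
# Link the stylesheet from each chapter; without a <link>, readers won't apply it
introduction.add_item(nav_css)
chapter1.add_item(nav_css)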

Then define the reading order (the spine):

book.spine = ['nav', introduction, chapter1]

Aaand it's done:

epub.write_epub('python_stuff.epub', book)
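Before opening the result in a reader, a few structural rules from the EPUB container spec are easy to check with zipfile: mimetype first, stored uncompressed, with the right content, plus a META-INF/container.xml. A quick sketch, not a substitute for a full validator such as EPUBCheck; check_epub_container is a hypothetical helper.

```python
import zipfile


def check_epub_container(epub_path):
    """Return a list of basic packaging problems (empty if none were found)."""
    problems = []
    with zipfile.ZipFile(epub_path) as archive:
        infos = archive.infolist()
        names = [info.filename for info in infos]
        # mimetype must be the first entry and stored without compression.
        if not infos or infos[0].filename != 'mimetype':
            problems.append('mimetype is not the first entry')
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append('mimetype is compressed')
        if 'mimetype' in names and archive.read('mimetype') != b'application/epub+zip':
            problems.append('mimetype has unexpected content')
        # container.xml tells readers where the package (.opf) file is.
        if 'META-INF/container.xml' not in names:
            problems.append('META-INF/container.xml is missing')
    return problems
```

Usage: check_epub_container('python_stuff.epub') returns an empty list when these basic checks pass.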

The result? So professional.

Wrap-up

And that’s it: understand the EPUB structure, target just the document items, parse safely, replace precisely, and package it back up. Whether you are fixing an annoying translation or giving a character a new identity, this small script can keep the story flowing the way you want.

If you enjoy custom scripts or have an idea for a niche eBook tweak, reach out. Send your suggestion and I will turn it into a small, focused script for a future Random Python Scripts post. Edge cases are welcome: they’re the most fun to solve.

Oh, I almost forgot. Wondering what I renamed Inspector Knutas to? Plot twist: I didn't. That book was good old ink and paper.