[Simple tutorial] Converting MDX to an EPUB e-book

阿弥陀佛 · December 10, 2024, 13:18

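Some e-books are best made into an MDX for lookup, while some dictionaries are better suited to being read straight through, so there is sometimes a need to convert MDX to EPUB.
I worked this method out myself and it may not be the most efficient; if you know a better one, please add it.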

1. Unpack the MDX. I used MdxExport; note that the entry order in its output may be scrambled. I have not tested other unpacking tools yet.

2. Lightly process the unpacked TXT so that it follows this basic format:

<title>Title</title>
<body0>
Body text...
</body0>

This can be done with a regex replacement. Search for:

</>\n(.+)

Replace with:

</body0>\n<title>\1</title>\n<body0>

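Then fix the first and last headwords by hand so they match the same format: the first headword has no </> before it and the last </> has no headword after it, so the regex leaves both unconverted.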
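The same conversion can also be sketched in Python, which handles the first and last entries without manual fixes. This is only a sketch under two assumptions that are not confirmed for every unpacking tool: each entry ends with a line containing only </>, and the first line of each entry is the headword. The function name mdx_txt_to_tagged is made up for illustration.

```python
import re


def mdx_txt_to_tagged(text):
    """Rewrite "</>"-separated entries as <title>/<body0> blocks."""
    text = text.replace('\r\n', '\n')  # normalise Windows line endings
    blocks = []
    # Assumption: a line containing only "</>" ends each entry
    for entry in re.split(r'^</>[ \t]*$', text, flags=re.MULTILINE):
        entry = entry.strip()
        if not entry:
            continue
        # Assumption: the first line of an entry is its headword
        headword, _, body = entry.partition('\n')
        blocks.append(
            f'<title>{headword.strip()}</title>\n'
            f'<body0>\n{body.strip()}\n</body0>'
        )
    return '\n'.join(blocks) + '\n'


sample = 'apple\n<p>a fruit</p>\n</>\nbook\n<p>bound pages</p>\n</>\n'
print(mdx_txt_to_tagged(sample))
```

Splitting on the separator instead of doing a search-and-replace treats every entry the same way, so no entry needs special handling.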

3. If there is an MDD, unpack it too, and put the CSS, images and TXT all in the same directory, for example D:\01\05.
For example, put the CRFDPIC folder inside D:\01\05.
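Before converting, it can help to check that every image referenced in the TXT was actually copied into that directory. The following is a minimal sketch, assuming the entries reference images through double-quoted src attributes of <img> tags; find_missing_images is a hypothetical helper name, not part of any tool mentioned here.

```python
import os
import re

# Captures the value of a double-quoted src attribute inside an <img> tag
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"', re.IGNORECASE)


def find_missing_images(text, base_dir):
    """Return the image paths referenced in text that are absent from base_dir."""
    missing = []
    for src in sorted(set(IMG_SRC.findall(text))):
        # A leading "/" is dropped so the path is resolved inside base_dir
        if not os.path.isfile(os.path.join(base_dir, src.lstrip('/'))):
            missing.append(src)
    return missing
```

Calling it with the contents of the TXT and the working directory returns a sorted list of the paths that could not be found; an empty list means every referenced image is in place.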

4. Write the Python code:

import mimetypes
import os
import re

from ebooklib import epub

WORK_DIR = 'D:\\01\\05'  # directory holding the TXT, CSS and image files
LANGUAGE = 'ja'  # language code written into the EPUB; change it to match the book (e.g. 'zh')

# Matches the src attribute of an <img> tag; group 2 is the image path
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def txt_to_epub(txt_path, epub_path, css_path, img_dir):
    """Convert one tagged TXT file to EPUB; return True on success."""
    # Create the EPUB book object
    book = epub.EpubBook()

    # Set the book title (taken from the TXT file name) and language
    book.set_title(os.path.splitext(os.path.basename(txt_path))[0])
    book.set_language(LANGUAGE)

    # Check that the stylesheet exists
    if not os.path.exists(css_path):
        print(f"Stylesheet {css_path} does not exist.")
        return False

    # Add the stylesheet to the book
    with open(css_path, 'rb') as css_file:
        style = epub.EpubItem(uid="style", file_name="style/ziyuan.css",
                              media_type="text/css", content=css_file.read())
    book.add_item(style)

    # Read the TXT file
    with open(txt_path, 'r', encoding='utf-8') as file:
        content = file.read()

    # Split the content into chapters, one per <title> entry
    chapters = content.split('<title>')[1:]

    # Chapter objects, in reading order
    chapters_list = []

    # Maps each original image src to its file name inside the EPUB,
    # so an image used by several entries is stored only once
    images = {}

    def embed_image(match):
        # Add the referenced image to the book and rewrite its src attribute
        img_src = match.group(2)
        img_file = os.path.join(img_dir, img_src.lstrip('/'))
        if img_src not in images:
            if not os.path.isfile(img_file):
                return match.group(0)  # image not found: leave the tag unchanged
            ext = os.path.splitext(img_file)[1].lower() or '.jpg'
            media_type = mimetypes.guess_type(img_file)[0] or 'image/jpeg'
            img_uid = f"image{len(images) + 1}"
            with open(img_file, 'rb') as img:
                book.add_item(epub.EpubItem(uid=img_uid, file_name=f"image_{img_uid}{ext}",
                                            media_type=media_type, content=img.read()))
            images[img_src] = f"image_{img_uid}{ext}"
        return match.group(1) + images[img_src] + match.group(3)

    for index, chapter in enumerate(chapters):
        # Extract the title
        title = chapter.split('</title>')[0].strip()
        # Extract the body
        body0_index = chapter.find('<body0>')
        if body0_index != -1:
            body_content = chapter[body0_index + len('<body0>'):].split('</body0>')[0].strip()
        else:
            body_content = chapter.strip()

        # Embed the images and point each src at the embedded copy
        body_content = IMG_SRC_RE.sub(embed_image, body_content)

        # Create the EPUB chapter and link the stylesheet to it
        chapter_html = epub.EpubHtml(title=title, file_name=f'chap_{index+1}.xhtml', lang=LANGUAGE)
        chapter_html.content = f'<h1>{title}</h1>{body_content}'
        chapter_html.add_item(style)
        book.add_item(chapter_html)
        chapters_list.append(chapter_html)

    # Build the table of contents and add the navigation files
    book.toc = tuple(chapters_list)
    book.add_item(epub.EpubNcx())
    book.add_item(epub.EpubNav())

    # Set the reading order (navigation page first, then the chapters)
    book.spine = ['nav'] + chapters_list

    # Save the EPUB file
    epub.write_epub(epub_path, book, {})
    return True


# Convert every TXT file in the working directory
for txt_file in os.listdir(WORK_DIR):
    if txt_file.endswith('.txt'):
        txt_path = os.path.join(WORK_DIR, txt_file)
        epub_path = os.path.splitext(txt_path)[0] + '.epub'
        css_path = os.path.join(WORK_DIR, 'ziyuan.css')  # path to the CSS file
        img_dir = WORK_DIR  # image directory
        if txt_to_epub(txt_path, epub_path, css_path, img_dir):
            print(f"Converted {txt_file} to EPUB format.")

print("Finished processing all TXT files.")

Run the script to generate the EPUB e-book. (Note: I cannot program; the code above was written by AI and works after some adjustments.)
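Since an EPUB file is a ZIP archive, a quick way to confirm that the run produced chapters and embedded images is to count the files inside it. The sketch below uses only the standard library; summarize_epub is a hypothetical helper, and the list of image extensions is an assumption about what the book contains.

```python
import zipfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif')  # assumed image formats


def summarize_epub(epub_path):
    """Count the XHTML documents and image files inside an EPUB archive."""
    with zipfile.ZipFile(epub_path) as archive:
        names = archive.namelist()
    return {
        'files': len(names),
        # XHTML documents include the chapters and any navigation page
        'xhtml': sum(1 for n in names if n.lower().endswith('.xhtml')),
        'images': sum(1 for n in names if n.lower().endswith(IMAGE_EXTENSIONS)),
    }
```

An image count of zero for a book that should contain pictures usually means the image paths in the TXT did not match the files on disk.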

中国佛教文化大观TXT.zip (1.5 MB)
中国佛教文化大观.epub (2.3 MB)
The book has images, but they are very large, so I generated an image-free edition; the EPUB with images is 105 MB.
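How the image-free edition was produced is not stated above. One possible way, sketched here purely as an assumption, is to delete the <img> tags from the TXT before conversion so that no references to missing files remain in the book; strip_images is a hypothetical helper name.

```python
import re

# Matches a whole <img ...> tag, self-closing or not
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)


def strip_images(html):
    """Return html with every <img> tag removed."""
    return IMG_TAG.sub('', html)


print(strip_images('<p>before<img src="CRFDPIC/a.jpg"/>after</p>'))
```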

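This is only a very rough version. To improve it further, the data structure inside the TXT needs to be adjusted with regex so that the book's layout looks better.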

fufupu · December 10, 2024, 13:28

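Thanks for sharing. I have long wanted to build an EPUB of high-frequency words from a dictionary, selected by word frequency, so I can browse it regularly, but I have been stuck on how to approach things like images and chapters. The HTML I put together myself is so large that it often freezes, and since I am weak at programming I do not know of any good method.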

sxingbai · May 22, 2025, 04:03

I have also been wanting to convert a couple of books to EPUB these past two days. Thanks a lot!

FengLiang · June 21, 2025, 02:03

Learned a lot, though I cannot program. If there were an app I could give it a try. Thanks.