史记辞典 (Shiji Dictionary), image edition

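nyyb · January 21, 2025, 01:30

I took a look. It is the same kind of tool as this one: it upscales the image resolution. I have used it, and the results on blurry images are not great. Have a look at this one, which should work on the same principle: https://bigjpg.com/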

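nyyb · January 21, 2025, 01:42

I ran the same image through bigjpg and SourceBook; the results are below. SourceBook comes out slightly better, possibly because its model is tuned specifically for images of text, while bigjpg focuses on non-text images:
bigjpg:

SourceBook:

Both were upscaled 4x.
Original image:

This is the original text:

Posted here for everyone's reference.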

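aimdict · January 21, 2025, 01:44

For the mdx format, an image-based dictionary is a last resort when no other source is available; text is the way to go. So we should thank 阿弥陀佛 for his work: the vast majority of the dictionaries he has made are in text format.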

nyyb · January 21, 2025, 01:46

Yes, it would be great if there were a text version.

HQYang · January 21, 2025, 02:36

Could you share the small script from step 3, the one that batch-generates the headwords?

jcz777 · January 21, 2025, 02:38

On Baidu Pan there is a DJVU version, which may be sharper.

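nyyb · January 21, 2025, 02:45

It is only a few lines of code, really too simple to be worth showing. But since you asked, here it is, for what it is worth:

Install opencc before use by running: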

pip install opencc

import re

import opencc

converter = opencc.OpenCC('t2s.json')


def convert_mdx():
    with open(r'E:\dowload\史记辞典-index.txt', 'r', encoding='utf-8') as f:
        with open(r'E:\dowload\史记辞典.txt', 'w', encoding='utf-8') as fw:
            lines = f.readlines()
            # First pass: map each page number to its headword
            # (the last one wins if a page number repeats)
            mapping = dict()
            for line in lines:
                line = line.strip()
                if not line:
                    continue
                mt = re.search(r'(\d+)', line)
                if mt:
                    index = mt.start(1)
                    headword = line[:index].strip()
                    pic_num = mt.group(1)
                    mapping[pic_num] = headword
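            # Second pass: write one image entry per index line,
            # with links to the previous and next page where they exist
            for line in lines: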
                line = line.strip()
                if not line:
                    continue
                mt = re.search(r'(\d+)', line)
                if mt:
                    index = mt.start(1)
                    headword = line[:index].strip()
                    pic_num = mt.group(1)

                    last_href = ''
                    prev_num = str(int(pic_num) - 1)
                    if prev_num in mapping:
                        last_headword = mapping[prev_num]
                        if last_headword:
                            last_href = f'<a href="entry://{last_headword}">上一页</a>'

                    next_href = ''
                    next_num = str(int(pic_num) + 1)
                    if next_num in mapping:
                        next_headword = mapping[next_num]
                        if next_headword:
                            next_href = f'<a href="entry://{next_headword}">下一页</a>'
                    fw.write("""%s
<img src="/%s.png" width="1080px"><br/><br/><center>%s %s</center>
</>
""" % (headword, pic_num.rjust(6, '0'), last_href, next_href))
                    # Add a Simplified Chinese redirect entry when the
                    # converted headword differs from the original
                    simple_word = converter.convert(headword)
                    if simple_word != headword:
                        fw.write(f"""{simple_word}
@@@LINK={headword}
</>
""")
                else:
                    print('Invalid line: ' + line)

# Generate the magick commands
def gen_cmds():
    with open(r'E:\dowload\cmds.txt', 'w', encoding='utf-8') as f:
        for i in range(854):
            png_name = str(i + 1).rjust(6, '0')
            cmd = f'magick {png_name}.png -crop +200+150 -crop -142-386 +repage -strip output/{png_name}.png\n'
            f.write(cmd)


convert_mdx()
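
To make the expected input easier to see, here is a minimal sketch of how the script above treats one index line. It assumes each line of the index file is a headword followed by a page number; the sample line and the helper names `parse_index_line` and `image_name` are made up for illustration and are not part of the script.

```python
import re


def parse_index_line(line):
    """Split one index line into (headword, page_number), or return None.

    Mirrors the script above: the first run of digits is the page number,
    and everything before it is the headword.
    """
    line = line.strip()
    mt = re.search(r'(\d+)', line)
    if not mt:
        return None
    return line[:mt.start(1)].strip(), mt.group(1)


def image_name(page_number):
    """Zero-pad the page number to six digits, as the script does for the img src."""
    return page_number.rjust(6, '0') + '.png'


# Hypothetical index line, for illustration only
print(parse_index_line('項羽 12'))  # ('項羽', '12')
print(image_name('12'))            # 000012.png
```

A line with no digits returns None here, which corresponds to the "invalid line" branch of the script.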

nyyb · January 21, 2025, 03:10

Thanks for sharing. It is indeed sharper than my pdg version. I will replace the mdd file when I have time.

wwr21 · January 21, 2025, 03:22

Could you share the tools and batch commands used for cropping the images in step 4? Many thanks.

nyyb · January 21, 2025, 03:45

To get the magick command, just install this software (magick is the ImageMagick command-line tool).

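nyyb · January 21, 2025, 03:51

In the command above, crop means cropping. The four values are the left, top, right, and bottom offsets. Roughly speaking, the origin of the coordinate system is the top-left corner of the image: +200 means moving 200 pixels to the right, and a negative value means the opposite direction.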
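
As a quick sanity check on those four numbers, here is a minimal sketch that computes the region kept after trimming. It assumes, as described above, that 200, 150, 142, and 386 are the margins removed from the left, top, right, and bottom. The function name `crop_box` and the 2000x3000 page size are made up for illustration.

```python
def crop_box(width, height, left=200, top=150, right=142, bottom=386):
    """Return (x0, y0, x1, y1) of the region kept after trimming the margins.

    Coordinates use the top-left corner of the image as the origin.
    """
    x0, y0 = left, top
    x1, y1 = width - right, height - bottom
    if x1 <= x0 or y1 <= y0:
        raise ValueError('margins are larger than the image')
    return x0, y0, x1, y1


# Hypothetical page size, for illustration only
box = crop_box(2000, 3000)
print(box)                             # (200, 150, 1858, 2614)
print(box[2] - box[0], box[3] - box[1])  # 1658 2464
```

The margins will need adjusting for scans with a different page size or layout.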

say · April 7, 2025, 06:37

The proofreading is done.
汉书辞典.txt (341.0 KB)

Aaron · April 7, 2025, 13:43

I don't quite understand where this came from?

say · April 8, 2025, 00:35

say · April 8, 2025, 00:38

Which book is the image you posted from? I can't tell.

Aaron · April 8, 2025, 00:51

My mistake, I thought it was 《史记辞典》.

yuppie98 · April 8, 2025, 02:19

So wouldn't it be better to have the original scanned edition on hand to read alongside this electronic dictionary? Could you share the _10324703.uvz file?

last_idol · April 8, 2025, 02:26

籍合网 has a text version; the only thing missing now is an account.

say65 · April 8, 2025, 04:35

I will take on proofreading 三国志辞典.

say · April 8, 2025, 04:48

汉书辞典.djv… and others, 2 files in total
Link: https://pan.baidu.com/s/1rHveoe-veIoAbPEQ1DBbyA
Extraction code: uizg