A very crude script for batch-extracting headwords

hkreporter · August 18, 2022, 14:22

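# Dependency: pip install mdict_utils
# Features:
# Extracts headwords from every *.mdx file in a folder and all of its subfolders
# Tags each extracted headword txt with the date, because the dictionaries may be updated later
# The folder to extract from is entered by the user

# Purpose of the extraction:
# Compare headwords
# Extend headword lists
# Once the lists are extended, extract headwords according to a word list and build a custom glossary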


# -*- coding: utf-8 -*-


import datetime
import os

# Folder to scan; the headwords of every mdx file in it and in its subfolders are extracted
root_dir = input("Enter the path of mdx:")

# The output name carries "词头" (headwords) plus the date, so one search in Everything
# finds the headword txt files across all folders
# mdict -k "absolute_path_of_the_dict.mdx">"absolute_path_of_the_dict 词头 date.txt"
date_object = datetime.date.today()


def extract_headwords(root_dir):
    for folder, _subfolders, files in os.walk(root_dir):
        for name in files:
            stem, ext = os.path.splitext(name)  # split the file name into stem and extension
            if ext == ".mdx":
                file_path = os.path.join(folder, name)
                output_path = os.path.join(
                    folder, stem + " 词头 " + str(date_object) + ".txt"
                )
                # Build the command once so the printed and executed strings cannot differ
                command = 'mdict -k "' + file_path + '" > "' + output_path + '"'
                print(command)
                os.system(command)


if __name__ == "__main__":
    extract_headwords(root_dir)
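
Once the headword files exist, the comparison step listed under the purposes above could look roughly like the sketch below. It assumes one headword per line and UTF-8 text (pass a different `encoding` if the shell redirect wrote the files in another code page). The names `load_headwords` and `compare_headwords` are hypothetical and not part of mdict_utils.

```python
def load_headwords(path, encoding="utf-8"):
    """Read one headword per line, ignoring blank lines (assumed file layout)."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def compare_headwords(old, new):
    """Return (only_in_old, only_in_new, shared) as sorted lists."""
    old, new = set(old), set(new)
    return sorted(old - new), sorted(new - old), sorted(old & new)
```

The second list returned by `compare_headwords` holds the candidates for extending a headword list; the first shows what the other dictionary lacks.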

atauzki · August 20, 2022, 05:39

Using the API directly is better

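import os
import sys
import datetime
from mdict_utils import reader

input_dir = sys.argv[1]

for name in os.listdir(input_dir):
    if name.endswith(".mdx"):
        # os.listdir returns bare file names, so join them with the folder before opening
        input_path = os.path.join(input_dir, name)
        # The output file is written to the current working directory
        output_path = name + "-Headwords-" + str(datetime.date.today()) + ".txt"
        headwords = reader.get_keys(input_path, passcode="utf8")
        with open(output_path, "w", encoding="utf8") as f:
            for key in headwords:
                # Keys may come back as bytes; decode them before writing to a text file
                if isinstance(key, bytes):
                    key = key.decode("utf8")
                f.write(key + "\n")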
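
The loop above only looks at the top level of the folder, whereas the original script also descended into subfolders. A minimal sketch of that discovery step with `pathlib` follows. `find_mdx_files` is a hypothetical helper name, and matching the suffix case-insensitively is an assumption, not something either script in this thread does.

```python
from pathlib import Path


def find_mdx_files(root):
    """Return every .mdx file under root, subfolders included, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        # is_file() skips folders that merely happen to be named like "x.mdx"
        if p.is_file() and p.suffix.lower() == ".mdx"
    )
```

Each returned path can be converted with `str(path)` and used in place of `input_path` in the loop above.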

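dictionaryfan · August 20, 2022, 08:57

Since you already have the mdx, you can just use GoldenDict: right-click the dictionary, open its headword list, and export it as txt.

Of course, writing a program to do it automatically is good too, and it is especially useful for batch jobs.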

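hkreporter · August 20, 2022, 11:42

Thanks for the pointers!
A beginner question: are there other APIs like reader? Where are they documented? I couldn't find anything in its GitHub repository (liuyug/mdict-utils: MDict pack/unpack/list/info tool).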

atauzki · August 20, 2022, 16:36

Read the source code; whatever functions it defines are the ones you can use.
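
One way to follow that advice without opening every file is to ask Python which functions a module defines. The sketch below uses the standard `inspect` module. `list_functions` is a hypothetical helper, and it is demonstrated on the standard-library `json` module so that it runs without mdict_utils installed.

```python
import importlib
import inspect


def list_functions(module_name):
    """Return {name: signature} for public functions defined in the module itself."""
    module = importlib.import_module(module_name)
    result = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private names and helpers that were only imported from elsewhere
        if obj.__module__ == module.__name__ and not name.startswith("_"):
            result[name] = str(inspect.signature(obj))
    return result


if __name__ == "__main__":
    for name, signature in list_functions("json").items():
        print(name + signature)
```

With mdict_utils installed, calling `list_functions("mdict_utils.reader")` should show `get_keys`, the function used earlier in this thread, along with whatever else that module defines.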