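The tutorials on this forum all point to dead links. I spent a week figuring out how people online build dictionaries, so I am posting what I learned here to save beginners some detours. I could not follow any of the tutorials I found on this forum, which is why I wrote this one. With a little Python, you can build your own dictionary.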
Preparation
Install these packages to create .mdx and .mdd files:
pip install pyglossary mdict-utils gTTS
To preview how the .mdx and .mdd files look, install GoldenDict (GitHub - goldendict/goldendict), a feature-rich dictionary lookup program that supports multiple dictionary formats (StarDict/Babylon/Lingvo/Dictd) and online dictionaries.
Introduction
A dictionary has two main files: .mdx and .mdd. The .mdx file holds the dictionary's content, and the .mdd file holds dictionary assets such as images, audio, and cover images.
To build the .mdx file, collect words, definitions, and example sentences.
The .mdd file holds the sound files, which we generate with the gTTS package installed above. It also holds the images and CSS files that the entries in the .mdx refer to.
All the resources are in my GitHub repository:
GitHub - cia1099/mdict: Tutorial how to generate .mdx file to make your own dictionary
Getting started
Before we write any code, create a project structure like this:
./
├── example
│ ├── playsound.png
│ └── example.css #(optional)
├── example.json
├── example.css
└── write_mdx.py
example.json contains the dictionary content as key-value pairs: each key is a word, and each value holds its definition plus anything else you want to show, such as examples or pronunciation. We will discuss the value format later. You can also parse another data source (.csv, .db, .txt, …); this example uses JSON to create the .mdx file.
example.css styles the entries. You can keep it outside the example folder. When it is outside the .mdd, you can edit the layout and appearance and see the changes in GoldenDict immediately.
The example directory collects all of the words' assets, including images and sound files. We will use gTTS to generate the sound files and save them in this directory. You can also put your .css file in this folder so that it is packed into the .mdd and hidden.
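If your word list lives in a .csv file rather than JSON, a small loader can produce the same word -> [part of speech, definition] mapping used in the rest of this post. This is only a minimal sketch: the column names word, pos, and definition and the function name load_csv_dictionary are assumptions for illustration and do not come from the repository.

```python
import csv
import io


def load_csv_dictionary(stream) -> dict[str, list[str]]:
    """Read rows with the hypothetical columns word,pos,definition into a dict."""
    dictionary = {}
    for row in csv.DictReader(stream):
        # Same shape as example.json: word -> [part of speech, definition]
        dictionary[row["word"].strip()] = [row["pos"].strip(), row["definition"].strip()]
    return dictionary


# In-memory sample so the sketch runs without any file on disk
sample = io.StringIO(
    'word,pos,definition\n'
    'doe,noun,"a deer, a female deer."\n'
    'limp,adjective,not firm or strong\n'
)
csv_dictionary = load_csv_dictionary(sample)
print(csv_dictionary)
```

With a real file, you would pass an open file handle instead of the io.StringIO sample.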
1. Generate pronunciation by gTTS
We can generate pronunciation for any language gTTS supports. We generate the audio for each word with:
import asyncio
from pathlib import Path
from gtts import gTTS

async def rpc_gtts(text: str, lang: str = "en", dir: str = ""):
    tts = gTTS(text, lang=lang)
    # Name the file after the first word of the text, e.g. "limp.mp3"
    filename = text.split(" ")[0].lower() + ".mp3"
    # tts.save blocks on network I/O, so run it in a worker thread
    await asyncio.to_thread(tts.save, str(Path(dir) / filename))
text is the content to convert to speech, lang is the target language, and dir is the directory to save into (example in our case). We use the first word of text as the file stem. The function is declared async because it is I/O-heavy: the save call blocks on the network, so we hand it to a worker thread with asyncio.to_thread (Python 3.9+). The event loop can then run many requests concurrently, which saves a lot of time.
import asyncio
import json
from pathlib import Path
from typing import Coroutine

async def run_together(coroutines: list[Coroutine]):
    if len(coroutines) < 1:
        print("There are no coroutines to run")
        return
    print(f"Requesting {len(coroutines)} gTTS services concurrently")
    await asyncio.gather(*coroutines)

with open("example.json", "r") as rf:
    dictionary = json.load(rf)
dict_dir = Path("example")
# Skip words whose .mp3 already exists in the asset folder
asyncio.run(
    run_together([
        rpc_gtts(word, dir=str(dict_dir))
        for word in dictionary.keys()
        if not (dict_dir / f"{word}.mp3").exists()
    ])
)
After running the code above, you will find the .mp3 files in the example folder. Now we have our pronunciation files. Congratulations, dude!
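With a large word list, sending every request at once may get throttled by the speech service. One possible way to cap the number of simultaneous requests is asyncio.Semaphore, sketched below. To keep the example runnable offline, fake_save is a hypothetical stand-in for the blocking gTTS save call, and the limit of 2 is an arbitrary assumption.

```python
import asyncio
import time


def fake_save(word: str) -> str:
    """Hypothetical stand-in for a blocking call such as gTTS(...).save(...)."""
    time.sleep(0.01)  # simulate network latency
    return f"{word}.mp3"


async def save_limited(word: str, limit: asyncio.Semaphore) -> str:
    # At most `max_concurrent` tasks are inside this block at the same time
    async with limit:
        # Run the blocking call in a thread so the event loop stays free
        return await asyncio.to_thread(fake_save, word)


async def save_all(words: list[str], max_concurrent: int = 2) -> list[str]:
    limit = asyncio.Semaphore(max_concurrent)  # created inside the running loop
    return await asyncio.gather(*(save_limited(w, limit) for w in words))


saved = asyncio.run(save_all(["doe", "ray", "limp"]))
print(saved)
```

asyncio.gather returns results in the order the coroutines were passed in, so the output order matches the input word order regardless of which request finishes first.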
2. Example of dictionary content
Here is an example of our dictionary. You can structure your own dictionary differently as long as you can parse the format. Our example.json looks like:
{
"doe": ["noun", "a deer, a female deer."],
"ray": ["noun", "a drop of golden sun."],
"limp": ["adjective", "not firm or strong"]
}
We convert each entry to HTML:
def parse2html(word: str, pos: str, definition: str) -> str:
    # NOTE: the phonetic "lɪmp" below is hardcoded for the demo entry;
    # substitute each word's own transcription in a real dictionary.
    html = f"""
    <link href="example.css" rel="stylesheet" type="text/css" />
    <div class="ml-1em">
    <span class="color-navy"><b>{word}</b></span>
    <span class="color-purple">/<a href="sound://{word}.mp3"><img src="/playsound.png"></a> lɪmp/ </span>
    <span class="color-darkslategray bold italic">{pos}</span>
    </div>
    <div class="ml-2em">{definition}</div>
    """
    # .mdx entries are stored on a single line, so strip the newlines
    return html.replace("\n", "")
The HTML class names correspond to the rules in example.css. Revise them as you like.
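If a definition contains characters such as < or &, they would be interpreted as markup in the generated entry. One possible safeguard, sketched here with the standard library's html.escape, is to escape the text fields before they go into the template. escape_entry is a hypothetical helper name and is not part of the original script.

```python
from html import escape


def escape_entry(word: str, pos: str, definition: str) -> tuple[str, str, str]:
    """Escape the text fields before they are interpolated into HTML."""
    return escape(word), escape(pos), escape(definition)


safe = escape_entry("limp", "adjective", "not firm & strong <archaic>")
print(safe[2])
```

The escaped values can then be passed to parse2html in place of the raw strings. Skip this step if your definitions intentionally contain HTML.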
We use this function to convert the example.json data and write it to the .mdx file:
import json
from mdict_utils.base.writemdict import MDictWriter

with open("example.json", "r") as rf:
    dictionary = json.load(rf)
# Map each word to its rendered HTML entry
writer = MDictWriter(
    {word: parse2html(word, v[0], v[1]) for word, v in dictionary.items()},
    title="Example Dictionary",
    description="This is an example dictionary.",
)
with open("example.mdx", "wb") as wf:
    writer.write(wf)
3. Pack our dictionary with sounds and images
This is the easiest step. Run this command in the terminal to pack the resource folder into a .mdd file:
mdict -a example example.mdd
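Before packing, it can be worth checking that every sound:// link in the generated entries has a matching file in the resource folder, since a missing .mp3 only shows up later as a silent play button. The following is a rough sketch of such a check; missing_sounds is a hypothetical helper, and the demo uses a temporary folder in place of ./example.

```python
import re
import tempfile
from pathlib import Path

# Capture the file name after sound:// up to the closing quote
SOUND_REF = re.compile(r'sound://([^"]+)')


def missing_sounds(entries: dict[str, str], asset_dir: Path) -> list[str]:
    """Return sound files referenced in the HTML but absent from asset_dir."""
    missing = set()
    for html in entries.values():
        for name in SOUND_REF.findall(html):
            if not (asset_dir / name).exists():
                missing.add(name)
    return sorted(missing)


# Self-contained demo: only doe.mp3 exists in the temporary asset folder
with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp)
    (assets / "doe.mp3").touch()
    entries = {
        "doe": '<a href="sound://doe.mp3">play</a>',
        "ray": '<a href="sound://ray.mp3">play</a>',
    }
    result = missing_sounds(entries, assets)
print(result)
```

In the real project, entries would be the word-to-HTML mapping passed to MDictWriter and asset_dir would be Path("example").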
Summary
We now have all the files GoldenDict needs. Open GoldenDict to check the example.
Existing well-made dictionaries are a good way to see how entries are built.
You can query a word to see how its entry is formatted:
mdict -q <word> <dict>.mdx
Or use pyglossary to unpack the .mdx file:
# txt format
pyglossary example.mdx example.txt
# json format
pyglossary example.mdx example.json
mdict-utils can also unpack:
mkdir -p mdx
mdict -x <dict>.mdx -d ./mdx
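That is it: you have successfully built an mdx dictionary. I hope this saves you from the detours I took. The formatting on this forum is hard to work with, so see my GitHub for a cleaner layout.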