Tutorial: making MDX and MDD files

cia1099 · August 29, 2024, 08:49

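The tutorials on this forum all point to dead links. I spent a week working out how people online build dictionaries, so I am posting it here to save beginners the detours. I could not follow any of the tutorials I found on this forum, which is why I wrote this one. With a little Python knowledge, you can build your own dictionary.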

Preparation

Install these packages to create .mdx and .mdd files:

pip install pyglossary mdict-utils gTTS

To see how the .mdx and .mdd files render together, install GoldenDict (GitHub - goldendict/goldendict), a dictionary lookup program that supports multiple dictionary formats (StarDict/Babylon/Lingvo/Dictd) and online dictionaries.

Introduction

A dictionary has two main files: .mdx and .mdd. The .mdx file holds the dictionary's content, and the .mdd file holds its assets, such as images, audio, and cover images.
To build a .mdx file, collect words, definitions, and example sentences.
A .mdd file includes sound files, which we generate with gTTS. It can also embed the images and CSS files that the entries in the .mdx file refer to.

All the resources for this tutorial are in my GitHub repository:

GitHub - cia1099/mdict: Tutorial how to generate .mdx file to make your own dictionary

Getting started

Before writing any code, create a project structure like this:

./
├── example
│ ├── playsound.png
│ └── example.css #(optional)
├── example.json
├── example.css
└── write_mdx.py

example.json holds the dictionary contents as key-value pairs: each key is a word, and each value holds its definition plus anything else you want to show, such as examples or phonetics. We will look at the value format later. You can also parse another data source (.csv, .db, .txt, ...); this example uses JSON to create the .mdx file.
example.css styles the contents. You can keep it outside the /example folder. While it stays outside the .mdd, you can edit the layout and appearance and see the change in GoldenDict instantly.
The example directory collects all of the words' assets, including images and sound files. We will use gTTS to generate the sound files and put them in this directory. You can also put your .css file in this folder to hide it.

1. Generate pronunciation by gTTS

We can generate pronunciation for any language gTTS supports. Each word is saved with this function:

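import asyncio
from pathlib import Path
from gtts import gTTS

async def rpc_gtts(text: str, lang: str = "en", dir: str = ""):
  tts = gTTS(text, lang=lang)
  # Name the file after the first word of the text, lowercased
  filename = text.split(" ")[0].lower() + ".mp3"
  # tts.save() blocks on network I/O, so run it in a worker thread (Python 3.9+)
  await asyncio.to_thread(tts.save, str(Path(dir) / filename))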

text is the content to convert to speech, lang is the target language, and dir is the directory to save the file in (example in our case). The first word of text, lowercased, becomes the file stem. The function is declared async because it is I/O-heavy: each call waits on a network request. Running the calls on an event loop lets us issue many requests concurrently:

import asyncio
import json
from pathlib import Path
from typing import Coroutine

async def run_together(coroutines: list[Coroutine]):
  if len(coroutines) < 1:
    print("There are no coroutines to run")
    return
  print(f"Requesting {len(coroutines)} gTTS services concurrently")
  await asyncio.gather(*coroutines)

with open("example.json", "r") as rf:
  dictionary = json.load(rf)

dict_dir = Path("example")
# Skip words whose .mp3 file already exists
asyncio.run(
  run_together([
    rpc_gtts(word, dir=str(dict_dir))
    for word in dictionary.keys()
    if not (dict_dir / f"{word}.mp3").exists()
  ])
)

After running the code above, you will find the .mp3 files in the example folder. Congratulations, the pronunciation files are ready!
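A coroutine only runs concurrently with others while it is awaiting something; a blocking call inside an async function still holds the event loop. The sketch below illustrates one way to hand blocking work to worker threads, without touching the network. fake_save is a hypothetical stand-in for a blocking call such as saving a gTTS file, and the sketch assumes Python 3.9 or later for asyncio.to_thread.

```python
import asyncio
import time

def fake_save(word: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a blocking call such as gTTS(...).save(...)
    time.sleep(delay)
    return f"{word}.mp3"

async def save_in_thread(word: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays free
    return await asyncio.to_thread(fake_save, word)

async def save_all(words: list[str]) -> list[str]:
    # gather() returns results in the same order as its arguments
    return await asyncio.gather(*(save_in_thread(w) for w in words))

start = time.perf_counter()
files = asyncio.run(save_all(["doe", "ray", "limp"]))
elapsed = time.perf_counter() - start
print(files, f"{elapsed:.2f}s")
```

Because the three sleeps overlap in separate threads, the total time should be close to one delay rather than three.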

2. Example of dictionary content

Here is an example of our dictionary data. You can use any format you like, as long as you can parse it. Our example.json looks like this:

{
"doe": ["noun", "a deer, a female deer."],
"ray": ["noun", "a drop of golden sun."],
"limp": ["adjective", "not firm or strong"]
}
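Since each value is expected to be a two-item list of part of speech and definition, it can help to check the data before building anything. The following is a minimal sketch of such a check; validate_entries is a hypothetical helper, not part of any library used in this tutorial.

```python
import json

def validate_entries(raw: str) -> dict[str, tuple[str, str]]:
    """Parse JSON text and check every value is [part_of_speech, definition]."""
    data = json.loads(raw)
    entries = {}
    for word, value in data.items():
        is_pair = isinstance(value, list) and len(value) == 2
        if not (is_pair and all(isinstance(item, str) for item in value)):
            raise ValueError(f"Entry {word!r} must be a list of two strings")
        entries[word] = (value[0], value[1])
    return entries

sample = '{"doe": ["noun", "a deer, a female deer."], "limp": ["adjective", "not firm or strong"]}'
entries = validate_entries(sample)
print(entries["limp"])
```

A malformed entry, such as a value with only one item, raises ValueError with the offending word in the message.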

We convert each entry to HTML:

def parse2html(word: str, pos: str, definition: str) -> str:
  html = f"""
<link href="example.css" rel="stylesheet" type="text/css" />
<div class="ml-1em">
<span class="color-navy"><b>{word}</b></span>
<span class="color-purple">/<a href="sound://{word}.mp3"><img src="/playsound.png"></a> lɪmp/ </span>
<span class="color-darkslategray bold italic">{pos}</span>
</div>
<div class="ml-2em">{definition}</div>
"""
  return html.replace("\n", "")

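The class names in the HTML come from example.css. Revise them as you like. Note that the transcription lɪmp is hard-coded in this template, so every entry will display it; replace it with each word's own transcription if your data has one.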
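If your data does include a transcription per word, one possible variant is sketched below. render_entry and its phonetic parameter are hypothetical names introduced for illustration; the markup and class names follow the template above. The sketch also escapes the text fields with html.escape, which assumes your definitions are plain text rather than HTML.

```python
from html import escape

def render_entry(word: str, pos: str, definition: str, phonetic: str = "") -> str:
    """Hypothetical variant of parse2html with a per-word phonetic field."""
    sound = f'<a href="sound://{escape(word)}.mp3"><img src="/playsound.png"></a>'
    # Show the transcription between slashes only when one is supplied
    ipa = f"/{escape(phonetic)}/" if phonetic else ""
    parts = [
        '<link href="example.css" rel="stylesheet" type="text/css" />',
        '<div class="ml-1em">',
        f'<span class="color-navy"><b>{escape(word)}</b></span>',
        f'<span class="color-purple">{sound} {ipa}</span>',
        f'<span class="color-darkslategray bold italic">{escape(pos)}</span>',
        "</div>",
        f'<div class="ml-2em">{escape(definition)}</div>',
    ]
    return "".join(parts)

html = render_entry("limp", "adjective", "not firm & not strong", phonetic="lɪmp")
print(html)
```

If you want HTML inside definitions (for example, bold example sentences), skip the escape call on that field.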

We use this function to convert the example.json data and write it to a .mdx file:

import json
from mdict_utils.base.writemdict import MDictWriter

with open("example.json", "r") as rf:
  dictionary = json.load(rf)

# Map each headword to its HTML entry
writer = MDictWriter(
  {word: parse2html(word, v[0], v[1]) for word, v in dictionary.items()},
  title="Example Dictionary",
  description="This is an example dictionary.",
)
with open("example.mdx", "wb") as wf:
  writer.write(wf)

3. Pack our dictionary with sounds and images

This is the easiest step. Run the following command in the terminal from the project root to pack the resource folder into a .mdd file.

mdict -a example example.mdd
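Each entry links to sound://{word}.mp3, so every headword needs a file with exactly that name in the resource folder. Before packing, a quick check like the sketch below can list any words whose audio is missing. missing_audio is a hypothetical helper, and the demonstration runs in a temporary folder with an empty placeholder file instead of real audio.

```python
from pathlib import Path
import tempfile

def missing_audio(words: list[str], resource_dir: Path) -> list[str]:
    """Return the words that have no <word>.mp3 file in resource_dir."""
    return [w for w in words if not (resource_dir / f"{w}.mp3").exists()]

# Demonstration in a temporary folder, so no real project files are touched
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "doe.mp3").write_bytes(b"")  # empty placeholder, not real audio
    missing = missing_audio(["doe", "ray"], folder)

print(missing)
```

In the real project you would pass the keys of example.json and Path("example") instead.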

Summary

We now have the files GoldenDict needs. Open it to check the example.
Existing dictionaries are a good way to see how entries are built, for example:

freedict

You can query a word to see how an existing dictionary formats its entries:

mdict -q <word> <dict>.mdx

Or use pyglossary to unpack the .mdx:

# txt format
pyglossary example.mdx example.txt
# json format
pyglossary example.mdx example.json

mdict-utils can also unpack:

mkdir -p mdx
mdict -x <dict>.mdx -d ./mdx

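That is all it takes to build an mdx dictionary. I hope this saves you the detours I took. The formatting here is hard to work with; for a cleaner layout, see my GitHub.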

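iandros · August 29, 2024, 10:54

I can't follow this. Why not write it in Chinese?

wwr21 · August 29, 2024, 11:08

I really can't follow it.

ddok · August 30, 2024, 13:04

The catch is that I don't have any Python basics at all, hahaha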