Forvo .DSL to Forvo .MDX (Conversion instructions)

Forvo Audios Conversion (.dsl → .mdx)

INSTRUCTIONS (Linux users)

You can find the Forvo audios and ready .dsl dictionaries for them here:

1. Create a folder (let's call it “Forvo folder”) with:

• Forvo audio files for the target language (uncompressed) inside an “audio” folder. For French, for instance, the path should look like this:

…/Forvo folder/audio/fr/{forvo usernames}/…

• .dsl file for your language
• forvo_{language}.txt (blank file)
• title.html (just open it and write the name you want to be displayed for your dictionary)
• description.html (you can add a description for the dictionary or leave this file blank)
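
Before moving on, the layout from step 1 can be checked with a few lines of Python. This is only a sketch: the function name `missing_items` and its parameters are made up for illustration, and it assumes exactly the file names listed above.

```python
from pathlib import Path


def missing_items(forvo_folder, language, language_code):
    """Return the step-1 items that are not in the Forvo folder yet."""
    root = Path(forvo_folder)
    # Each entry maps a human-readable name to "is it there?"
    checks = {
        f"audio/{language_code}/": (root / "audio" / language_code).is_dir(),
        "a .dsl file": any(root.glob("*.dsl")),
        f"forvo_{language}.txt": (root / f"forvo_{language}.txt").is_file(),
        "title.html": (root / "title.html").is_file(),
        "description.html": (root / "description.html").is_file(),
    }
    return [name for name, present in checks.items() if not present]
```

Calling `missing_items("Forvo folder", "french", "fr")` should return an empty list once everything is in place.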

2. Download this Pyglossary version with OctopusMdictSource support (attached below)

Pyglossary 4.6.1 (with MDX support).zip (896.1 KB)

Also install an interface:
• Tkinter-based interface

**Debian/Ubuntu:** apt-get install python3-tk tix
**openSUSE:** zypper install python3-tk tix
**Fedora:** yum install python3-tkinter tix
**Mac OS X:** read https://www.python.org/download/mac/tcltk/
**Nix / NixOS:** nix-shell -p python38Packages.tkinter tix

• Gtk3-based interface

**Debian/Ubuntu:** apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
**openSUSE:** zypper install python3-gobject gtk3
**Fedora:** dnf install pygobject3 python3-gobject gtk3
**ArchLinux:**
    pacman -S python-gobject gtk3
    https://aur.archlinux.org/packages/pyglossary/
**Mac OS X**: brew install pygobject3 gtk+3
**Nix / NixOS:** nix-shell -p pkgs.gobject-introspection python38Packages.pygobject3 python38Packages.pycairo

If you already have some version of Pyglossary installed, you can just add the OctopusMdictSource plugin (attached below) to …/Pyglossary_folder/pyglossary/plugins

octopus_mdict_source.py (4.7 KB)

3. Install mdict-utils

pip install mdict-utils
or
pip3 install mdict-utils

4. Open your terminal, go to the Pyglossary folder and open the application

python3 pyglossary.pyw

The Pyglossary UI should open

5. For the input file field, select the .dsl file in the Forvo folder. The input format is ABBYY Lingvo (.dsl).
   For the output field, select the .txt file in the Forvo folder. The output format is Octopus Mdict Source.

6. The .mdx source has now been generated from the .dsl. Next you have to manually fix the paths to the audio files. Open your “forvo_{language}.txt” in a text editor that supports regular expressions [RegEx] (I use the Kate text editor) and run these two replacements in order:

• FIND: \[s\]
• REPLACE: <a href="sound://

• FIND: //{language_code}/(.*?)/(.*?)\.opus\[/s\]
• REPLACE: //{language_code}/\1/\2.opus">\2</a>

Instead of {language_code} you should use the code for your language, like “fr” for French or “ru” for Russian, just like the name of the zip file with the audio files.
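
The same two replacements can be sketched in Python with the standard `re` module. The function name `fix_sound_links` and the sample line in the usage note are assumptions for illustration; the patterns themselves are the ones given above.

```python
import re


def fix_sound_links(text, language_code, ext="opus"):
    """Turn DSL [s]...[/s] sound tags into sound:// links, in two passes."""
    # Pass 1: the opening [s] tag becomes the start of an <a> element.
    text = re.sub(r"\[s\]", '<a href="sound://', text)
    # Pass 2: close the href, reuse the file name (group 2) as the link text.
    pattern = rf"//{re.escape(language_code)}/(.*?)/(.*?)\.{re.escape(ext)}\[/s\]"
    replacement = rf'//{language_code}/\1/\2.{ext}">\2</a>'
    return re.sub(pattern, replacement, text)
```

For a hypothetical line such as `[s]fr/user1/bonjour.opus[/s]`, `fix_sound_links(line, "fr")` gives `<a href="sound://fr/user1/bonjour.opus">bonjour</a>`. Passing `ext="mp3"` covers the .mp3 variant.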

7. [OPTIONAL] you can link a .css to it, so that further customization is possible

• FIND: <div style=(.*)
• REPLACE: <link rel="stylesheet" type="text/css" href="style.css" />\n<div style=\1

It may take a minute or so to apply the changes.
If you are going to do this, create “style.css” inside the Forvo folder.
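
As a rough Python equivalent of this optional step (the function name `add_css_link` is made up for illustration, and the attribute quotes are plain ASCII double quotes):

```python
import re


def add_css_link(text):
    """Insert a stylesheet link on its own line before each '<div style=' line."""
    # (.*) captures the rest of the line; "." does not match a newline by default.
    return re.sub(
        r"<div style=(.*)",
        r'<link rel="stylesheet" type="text/css" href="style.css" />\n<div style=\1',
        text,
    )
```

In the replacement string, `re.sub` turns `\n` into a real line break and `\1` into the captured rest of the line, which mirrors what Kate does.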

Before the final step, your Forvo folder should look something like this:

8. Now you just have to compile the .mdx and the .mdd using mdict-utils

• COMPILING THE .MDX
Open your terminal inside the Forvo folder and type:

mdict --title title.html --description description.html -a forvo_{language}.txt forvo_{language}.mdx

• COMPILING THE .MDD

mdict --title title.html --description description.html -a audio forvo_{language}.mdd
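
If you would rather run both compile commands from Python, a minimal sketch could look like this. It only wraps the two `mdict` commands shown above; the function names and the `language` / `forvo_folder` parameters are assumptions for illustration.

```python
import subprocess


def mdict_commands(language):
    """Build the two mdict command lines: first the .mdx, then the .mdd."""
    common = ["mdict", "--title", "title.html", "--description", "description.html"]
    return [
        common + ["-a", f"forvo_{language}.txt", f"forvo_{language}.mdx"],
        common + ["-a", "audio", f"forvo_{language}.mdd"],
    ]


def compile_dictionary(language, forvo_folder):
    """Run both commands inside the Forvo folder; stop at the first failure."""
    for command in mdict_commands(language):
        subprocess.run(command, cwd=forvo_folder, check=True)
```

`compile_dictionary("french", "Forvo folder")` would then be the same as typing the two commands by hand inside the Forvo folder.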

Now you can delete everything, except the .mdx, the .mdd and the .css

You can rename the .mdx and the .mdd as you wish, and also add an icon (.png, .jpg, etc.), but make sure the .mdx, the .mdd and the icon all have the same name

WARNING: if you want to customize the dictionary using the .css, don’t ever rename it. The .css name is already specified within the compiled .mdx file
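
Renaming can be sketched in Python too. The helper below is hypothetical (name, parameters and the list of icon extensions are assumptions): it gives the .mdx, the .mdd and any icon the same new name and leaves style.css alone.

```python
from pathlib import Path


def rename_dictionary(folder, old_name, new_name):
    """Rename old_name.mdx/.mdd/.png/.jpg to new_name.*; style.css is not touched."""
    renamed = []
    for ext in (".mdx", ".mdd", ".png", ".jpg"):
        source = Path(folder) / f"{old_name}{ext}"
        if source.is_file():
            target = source.with_name(f"{new_name}{ext}")
            source.rename(target)
            renamed.append(target.name)
    return renamed
```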


Of course, the ideal would be a script that automates this whole process, but unfortunately I’m not a programmer and have little experience writing Python scripts.

I’m not a Windows user so I can’t say much about the process there, but both Pyglossary and mdict-utils are available there.
For Windows, you can alternatively use MDX Builder instead of mdict-utils, but honestly I think mdict-utils is much better and faster

These instructions are for the .opus audio files. If you want to use .mp3 audio, just replace .opus with .mp3 in the RegEx steps and use the .mp3 audio folder. I don’t see any reason for doing this, since the .opus files are high quality and take much less space.

RegEx notes:

• In Kate, \1, \2, … paste what was captured by each (.*?) or (.*) group in the “FIND” field, in order of appearance. (.*?) is a non-greedy capture group: it matches as few characters as possible

• On Linux, \n is used for linebreaks, but I heard you should use \r\n on Windows
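
To see why the non-greedy form matters, here is a small Python comparison. The sample line with two sound tags is invented for the demonstration.

```python
import re

# Hypothetical line containing two sound tags.
line = "//fr/user1/bonjour.opus[/s] //fr/user2/salut.opus[/s]"

# Non-greedy groups stop at the first possible "/" and ".opus".
lazy = re.findall(r"//fr/(.*?)/(.*?)\.opus", line)

# Greedy groups grab as much as they can, so both tags merge into one match.
greedy = re.findall(r"//fr/(.*)/(.*)\.opus", line)

print(lazy)    # [('user1', 'bonjour'), ('user2', 'salut')]
print(greedy)  # one match; its first group runs from "user1" up to "user2"
```

With (.*?) every audio file gets its own match, which is what the FIND patterns above rely on.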


Following these same guidelines, I made Forvo French (MDX) and Forvo Persian (MDX):

• Link to the Forvo Persian (MDX): https://cloud.freemdict.com/index.php/s/M7BzW2PAWDBF5k8

• Link to the Forvo French (MDX): https://cloud.freemdict.com/index.php/s/8my4FYLCcGm2yed


Thank you very much ! Great job ! You are really kind… I hope more people would be able to create their favorite Pronunciation Dictionaries…!
