185,000 German Pronunciations (raw data + script)

Raw Data: 185,000 GERMAN PRONUNCIATIONS FROM THE ENGLISH WIKTIONARY (3.5 GiB).

Download: 3.52 GB folder on MEGA

UPDATE: An .mdx dictionary was made with this raw data. See the new post HERE.


99.99% of the audio files are in .ogg format and are of very high quality. They are suitable for building a pronunciation dictionary for GoldenDict.

However, the audio files can also be used directly in GoldenDict after decompressing the download. Just open "Menu > Dictionaries > Sound Dirs" and choose the path of the folder containing all the sounds.

The naming of the sound files is very simple: each file is named after its word. For example, the file for the German word “Haus” is “Haus.ogg”.

The scraping was done on Linux, starting from a JSON file that contains all the audio URLs:
https://kaikki.org/dictionary/German/index.html#:~:text=Alternative%20forms%20(4068)-,download,-Download%20JSON%20data
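
For anyone who wants to see what the first step might look like, here is a minimal Python sketch that pulls (file name, URL) pairs out of such a dump. It is not the script used for this post. It assumes the download is in JSON Lines form (one entry per line) and that each entry has a "word" field plus a "sounds" list whose items may carry an "ogg_url" field; check those names against the file you actually download. The function name `iter_audio_urls` is made up for this example.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_audio_urls(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (file name, URL) pairs, one per word, from JSON Lines text.

    Assumed layout (verify against the real dump): every line is a JSON
    object with a "word" string and an optional "sounds" list, where a
    sound item may contain an "ogg_url" string.
    """
    seen = set()  # words that already have a file, to avoid duplicates
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        word = entry.get("word")
        if not word or word in seen:
            continue
        for sound in entry.get("sounds", []):
            url = sound.get("ogg_url")  # assumed field name
            if url:
                seen.add(word)
                # Same naming scheme as in this post: "Haus" -> "Haus.ogg"
                yield f"{word}.ogg", url
                break  # keep only the first audio found for this word
```

You would feed it an open file, for example `for name, url in iter_audio_urls(open(path, encoding="utf-8")): ...`, and pass each URL to whatever downloader you prefer. Only the first audio per word is kept here, which is a simplification.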

The English Wiktionary contains more than 900,000 audio files in many languages (not only English). Those pronunciations can be scraped with this source code:

Suggestion: to obtain all the “headwords”, I recommend using CopyQ.

You just select all the files in a folder (Ctrl+A) and copy them; CopyQ then stores all the names in its clipboard history.

Then you can paste the names of all the audio files into a .txt file. That would make it possible to build an .mdx very quickly…
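
If you would rather not go through the clipboard, the same headword list can be produced with a few lines of Python. This is an alternative to the CopyQ route, not a description of it. It is only a sketch: the function names `collect_headwords` and `write_headwords` are invented here, and it assumes all the sounds sit directly in one folder and are named as in the first post ("Haus.ogg" for “Haus”).

```python
from pathlib import Path
from typing import List


def collect_headwords(folder: str, suffix: str = ".ogg") -> List[str]:
    """Return the sorted headwords, i.e. file names without the extension."""
    return sorted(
        p.stem  # "Haus.ogg" -> "Haus"
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def write_headwords(folder: str, out_file: str) -> int:
    """Write one headword per line to out_file and return how many there were."""
    words = collect_headwords(folder)
    Path(out_file).write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)
```

For example, `write_headwords("sounds", "headwords.txt")` would write the list next to your working directory, where "sounds" stands for wherever you unpacked the download.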

Good idea.