DWDS (Collaboration to create .mdx)

tovaremeterio · 2021-07-19 23:16

Is anyone else interested in scraping this German dictionary?

I am looking for volunteers to work together.

Please send a message if interested.
Email: tovaremeterio.56l9g@simplelogin.fr

An .mdx of DWDS has been made, but it is not perfect. It works very well in “MDict” on Android but not so well in GoldenDict on PC. If interested, send a PM.

James1 · 2021-07-20 05:07

How can I scrape this website with Beautiful Soup in Python to make a dictionary? https://pypi.org/project/beautifulsoup4/

tovaremeterio · 2021-08-23 16:42

A user from the Telegram group provided this info, which is useful for anyone who would like to scrape the site and create an .mdx:

Download all entries as web pages, edit them with Notepad++, then merge all the HTML. After combining the edited HTML files you make a txt file, then convert it with MDX Builder.
You have to know a little bit about regex. Notepad++ can edit thousands of files at the same time. To download all the entries you need wget. To get the website links for the entries, view the source of the web pages that list the entries, then delete everything else with regex (you keep only the links to the entries). You may need to do this for the sub-pages too. To merge the files at the end you need PowerShell or a third-party program. You can also download the audio with wget and link it using regex. You don’t need to be a programmer. Sorry if my description isn’t clear, but these are the basic steps.
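For anyone who prefers a script to Notepad++ and PowerShell, two of the steps above (keeping only the entry links, and merging the edited HTML into one txt file) can be sketched in Python using only the standard library. This is a rough sketch, not a tested scraper: the `/wb/<headword>` link pattern and the base URL are assumptions about the site's markup, the function names are made up for illustration, and the output layout (headword line, one line of HTML, then `</>` on its own line) is assumed to be what your MDX builder expects. Downloading the pages themselves is still left to wget.

```python
import re
from urllib.parse import unquote, urljoin

# Assumption: entry links look like href="/wb/<headword>".
# Adjust the pattern after viewing the real page source.
ENTRY_LINK = re.compile(r'href="(/wb/[^"]+)"')


def extract_entry_links(index_html, base_url="https://www.dwds.de"):
    """Keep only the links to entries; drop everything else on the page."""
    links = []
    for path in ENTRY_LINK.findall(index_html):
        url = urljoin(base_url, path)
        if url not in links:  # skip duplicates, keep first-seen order
            links.append(url)
    return links


def headword_from_url(url):
    """Turn the last URL segment back into a readable headword."""
    return unquote(url.rsplit("/", 1)[-1])


def build_mdx_source(entries):
    """Merge (headword, html) pairs into one text blob for the MDX builder.

    Assumed layout per entry: headword, HTML on one line, then "</>".
    """
    blocks = []
    for headword, html in entries:
        body = " ".join(html.split())  # collapse newlines and extra spaces
        blocks.append(f"{headword}\n{body}\n</>")
    return "\n".join(blocks) + "\n"
```

The list of URLs from `extract_entry_links` can be written to a text file, one per line, and handed to `wget -i` for the download step. The string returned by `build_mdx_source` is what you would save as the txt file before converting.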

tovaremeterio · 2021-08-23 16:48

Here is the raw data from DWDS, in case anyone would like to make their own .mdx version.

James1 · 2021-09-09 11:46

Thanks for your beautiful guideline. I don’t know what wget and regex are, though.