JH-Recommendations - Dictionary Data - Open Data Collections



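Besides scanning and OCR-ing paper books, cracking app data files, and crawling online web pages, shouldn't we think more about how to use the open data that already exists? That is, how to extract or generate more valuable data from it?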

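For example, generating contemporary word frequencies from corpora to help judge how important a word is, or extracting phrases from corpora to help with actually using words.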
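As a minimal sketch of both ideas, the snippet below counts word frequencies and the most common two-word sequences (bigrams) in a plain-text corpus. The regex tokenizer and the function names are illustrative assumptions, not any project's actual tooling; real corpora need proper tokenization, lemmatization and a better phrase measure than raw bigram counts.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of lowercase ASCII letters and apostrophes.
# Real corpora need proper tokenization and usually lemmatization.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs in the corpus text."""
    return Counter(tokenize(text))


def bigram_frequencies(text: str) -> Counter:
    """Count adjacent word pairs, a crude stand-in for real phrase extraction."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    corpus = "Take care. Take it easy. Take care of yourself."
    print(word_frequencies(corpus).most_common(3))
    print(bigram_frequencies(corpus).most_common(2))
```

On a real corpus, raw bigram counts are dominated by pairs like "of the"; an association measure such as pointwise mutual information computed from the same counts is a common next step for finding genuine collocations.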


The Incomplete List of Open Data Collections

This list is meant to be updated over the long run, starting today.

Multilingual

1. Dict.cc translation data

https://www1.dict.cc/translation_file_request.php?l=e

Terms of Use - Highlights

  • Use of the data for personal purposes is permitted, provided that the data is not passed on to third parties or published in any way
  • Programs using data of dict.cc must be subject to the terms of the GPL
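For personal use of a downloaded file, a minimal parser might look like the one below. It assumes, without having checked the current export, a UTF-8 text file where lines starting with `#` are comments and each entry is a tab-separated line with the source term first and the translation second; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def parse_translation_lines(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each source term to the list of its translations.

    Assumed format (not checked against the official export): lines starting
    with '#' are comments; entry lines are tab-separated with the source term
    in the first column and the translation in the second.
    """
    table = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not fit the assumed layout
        table[fields[0].strip()].append(fields[1].strip())
    return dict(table)


def load_translation_file(path: str) -> dict[str, list[str]]:
    """Read a locally saved export (path is wherever you stored it)."""
    with open(path, encoding="utf-8") as handle:
        return parse_translation_lines(handle)


if __name__ == "__main__":
    sample = ["# comment", "Haus\thouse\tnoun", "Haus\thome\tnoun", ""]
    print(parse_translation_lines(sample))
```

Headwords in such files may carry extra annotations (for example markers in braces or brackets), which this sketch leaves untouched.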

2. FreeDict project

Terms of Use - Highlights

  • Each dictionary has separate licensing terms which are located in the TEI header at the beginning of each dictionary file.
  • The majority of the dictionaries are licensed under the GPL.
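Since the terms live in each file's TEI header, they can be read without loading the whole dictionary. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming a TEI P5 file in the `http://www.tei-c.org/ns/1.0` namespace with the terms inside `fileDesc/publicationStmt/availability` (where TEI conventionally puts them); individual files may differ, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional

TEI_URI = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace
NS = {"tei": TEI_URI}


def read_tei_header(source: BinaryIO) -> Optional[ET.Element]:
    """Stream the XML and return <teiHeader> as soon as it is complete,
    so the (possibly large) dictionary body is never fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{TEI_URI}}}teiHeader":
            return elem
    return None


def license_text(header: ET.Element) -> Optional[str]:
    """Return the whitespace-normalized text of the <availability> element."""
    availability = header.find(
        "tei:fileDesc/tei:publicationStmt/tei:availability", NS
    )
    if availability is None:
        return None
    return " ".join(" ".join(availability.itertext()).split())


if __name__ == "__main__":
    sample = (
        b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        b"<publicationStmt><availability><p>GNU General Public License</p>"
        b"</availability></publicationStmt></fileDesc></teiHeader>"
        b"<text><body/></text></TEI>"
    )
    header = read_tei_header(io.BytesIO(sample))
    print(license_text(header) if header is not None else "no header")
```

For a real file, open it in binary mode (`open(path, "rb")`) and pass the handle to `read_tei_header`.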

Monolingual - English

Not-Really Free Open Group - English-Corpora and Its Allies

Therefore, converting them to .mdx is generally not what their owners intend. However, given mdict-analysis, GetDict and several other tools, .mdx files could arguably be regarded as open source anyway. (Not legally verified!)

Nevertheless, it is better than nothing.


Be polite and honorable!
Don’t abuse the kindness of others.


Here is one that is already basically usable: https://glosbe.com

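Dict.cc is usable too, and it has an iOS app. I think the biggest strength of the two dictionary collections above is that they allow the data to be downloaded and developed further.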

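dict.cc's license forbids third parties from offering the data for download and forbids its use by non-GPL programs.
FreeDict's dictionaries are in TEI format, which is well suited to lookups; although the dictionaries themselves are GPL-licensed, they can be used by non-GPL software.
glosbe.com's data combines several open-source dictionaries, drawn mainly from Wiktionary, and may be distributed, modified and used freely without restriction.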
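To illustrate the point about TEI and lookups, here is a minimal sketch that streams a TEI dictionary into a headword-to-translations index. It assumes FreeDict-style entries where `<form><orth>` holds the headword and `<cit type="trans"><quote>` holds each translation; other TEI dictionaries may be structured differently, and `build_index` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import BinaryIO

TEI_URI = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace
NS = {"tei": TEI_URI}
ENTRY_TAG = f"{{{TEI_URI}}}entry"


def build_index(source: BinaryIO) -> dict[str, list[str]]:
    """Stream TEI <entry> elements into a headword -> translations index.

    Assumes FreeDict-style entries: <form><orth> holds the headword and
    <cit type="trans"><quote> holds each translation.
    """
    index = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ENTRY_TAG:
            continue
        headwords = [
            o.text.strip() for o in elem.iterfind("tei:form/tei:orth", NS) if o.text
        ]
        translations = [
            q.text.strip()
            for q in elem.iterfind(".//tei:cit[@type='trans']/tei:quote", NS)
            if q.text
        ]
        for headword in headwords:
            index[headword].extend(translations)
        elem.clear()  # drop the processed entry's children to save memory
    return dict(index)


if __name__ == "__main__":
    sample = (
        b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        b"<entry><form><orth>Haus</orth></form><sense>"
        b'<cit type="trans"><quote>house</quote></cit></sense></entry>'
        b"</body></text></TEI>"
    )
    print(build_index(io.BytesIO(sample)))
```

A plain dictionary keyed by headword gives constant-time exact lookups; prefix or fuzzy search would need a different structure on top of it.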

I searched around again, but still couldn't find anywhere on glosbe.com to download the dictionary data files.

Indeed, glosbe.com doesn't offer them, but Wiktionary does provide downloads.
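Wiktionary's downloads are Wikimedia's MediaWiki XML dumps (for English, typically `enwiktionary-latest-pages-articles.xml.bz2` on dumps.wikimedia.org). A minimal streaming reader is sketched below; it strips the XML namespace because the export schema version varies between dumps, and keeps only main-namespace (`ns` 0) pages. The function names are hypothetical, and the result is raw wikitext that still has to be parsed for definitions. Keep in mind that Wiktionary text is licensed under CC BY-SA, so attribution and share-alike terms carry over to derived data.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix; the export schema version varies by dump."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: BinaryIO) -> Iterator[tuple[str, str]]:
    """Yield (title, wikitext) for main-namespace (ns 0) pages of a MediaWiki XML dump."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {"title": None, "ns": None, "text": None}
        for child in elem.iter():
            name = local_name(child.tag)
            if name in fields:
                fields[name] = child.text
        if fields["ns"] == "0" and fields["title"]:
            yield fields["title"], fields["text"] or ""
        elem.clear()  # release the finished page so memory stays bounded


def iter_dump(path: str) -> Iterator[tuple[str, str]]:
    """Stream a bzip2-compressed dump file such as a *-pages-articles.xml.bz2."""
    with bz2.open(path, "rb") as handle:
        yield from iter_pages(handle)


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>cat</title><ns>0</ns><id>1</id>"
        b"<revision><id>5</id><text>==English==</text></revision></page>"
        b"</mediawiki>"
    )
    for title, text in iter_pages(io.BytesIO(sample)):
        print(title, text)
```

Streaming with `iterparse` and clearing each finished page keeps memory use low even for multi-gigabyte dumps, which is why the whole file is never loaded as a tree.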