中文詞彙網路 (Chinese Wordnet, CWN)

  • Format: MDX
  • Unique headwords: 10368
  • Scraping date: 09.2023

Download link: FreeMdict Cloud

OBS: Someone had already made an MDX version of 中文詞彙網路, but I didn’t like how that dictionary was structured: many tables with many empty cells, and it wasn’t easy to identify things at a quick glance. So I decided to make my own version, structured similarly to the online query on the LOPE website. Also, that older version has around 8110 unique headwords, fewer than what I got by scraping the site recently.

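“Chinese Wordnet (中文詞彙網路, CWN) is a language knowledge resource that attempts to address word senses and lexical semantic relations. Its core elements are the synonym sets (synsets) of Chinese words and the semantic relations that link those synsets; through these relations, the synsets are connected to form a semantic network.
Chinese Wordnet represents nearly twenty years of accumulated research. It was initially driven by the Institute of Linguistics, Academia Sinica, and was completed in 2010. The Graduate Institute of Linguistics at National Taiwan University currently maintains it and is working on broader and more flexible tools and resources, such as XML, SQLite format, and the WordNet LMF data format. In the era of digital humanities, annotated corpus data is often an indispensable resource for computational linguistics and natural language processing, and the creation of wordnets for more languages also makes this resource material for cross-lingual research.
Chinese Wordnet covers content words (open class), namely nouns, verbs, adjectives, and adverbs. If you are using Chinese Wordnet for the first time, you can refer to the beginner's tutorial (新手教學) and quick start/tools (快速上手/小工具).”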

Check the handbooks for more information about how this dictionary works and what each mark means. They are also included in the .zip file.

WARNING: Don’t rename the .css file or the font file. If you rename them, they won’t work properly unless you unpack the .mdx file, change the link to the .css, and compile it again.

By: 「水」a.k.a. Bogozarnyj a.k.a. Scho a.k.a. Зун a.k.a. Shuibogo


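Thanks for making this one; it is a rare and valuable small corpus. I noticed that the official website gives the number of entries as 29321, while this MDX version has 10368. I'm not sure why it is so much lower. Could the official count include the various variants of each entry?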

When studying the corpus I noticed the same thing, and my conclusion is that this number refers to something else. Their website provides the dataset (from 2022.08), and the number of unique lemmas there is around 8,000; as far as I know, each unique lemma corresponds to a unique headword. Also, for the scraping word list I used not only these lemmas but also around 100000 headwords from the Taiwan MOE Recompiled dictionary, which gives nice coverage, but about 94% of those headwords had no entry in the CWN. The headwords from the previous MDX version of 中文詞彙網路 are also all included, and these 10368 unique headwords are everything I could get. So this “29321” number is weird…
Another thing to note is that a considerable number of entries do exist on the website but are empty, with zero content. I deleted all the empty entries from the .MDX file. Maybe these empty entries are counted among the “29321” entries.
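
To illustrate the kind of filtering described here, below is a minimal Python sketch. It assumes the scraped results are held as (headword, content) pairs; the function name `clean_entries` and the sample data are hypothetical and are not taken from the script actually used to build the dictionary.

```python
def clean_entries(entries):
    """Drop entries with empty content and group the rest by headword.

    `entries` is an iterable of (headword, content) pairs; this layout
    is an assumption made for the example.
    """
    merged = {}
    for headword, content in entries:
        content = content.strip()
        if not content:
            # The page exists but has zero content, so skip it.
            continue
        merged.setdefault(headword, []).append(content)
    return merged


# Hypothetical sample data: one duplicate headword and one empty entry.
sample = [
    ("水", "<div>sense 1</div>"),
    ("水", "<div>sense 2</div>"),
    ("火", "   "),
    ("山", "<div>sense 1</div>"),
]

cleaned = clean_entries(sample)
print(len(cleaned))  # number of unique, non-empty headwords
```

Counting the keys of the resulting dictionary gives the number of unique headwords that actually carry content, which is how a figure such as 10368 could be checked against a raw entry count that still includes empty pages.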

OK got it. Thanks for your kind reply~~

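Thanks for sharing this great resource! I hope someone can modify the CSS to make the font a bit smaller and the paragraph spacing more compact!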

I switched to a different browser, and even replying is this difficult.