The encyclopedic entries in the Institute of Linguistics' 《现代汉语大词典》 need improvement

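A few days ago I bit the bullet and spent 1,000 yuan on the Institute of Linguistics' 《现代汉语大词典》 (roughly, "Comprehensive Dictionary of Modern Chinese"), and I have been leafing through it over the past two days. My basic impression is that this dictionary is best treated as a reference for language and character usage, an area where it is very thorough. Its encyclopedic entries, however, contain many serious flaws, which fall mainly into the following categories: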

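  • Logical flaws: superordinate concepts are poorly chosen, so entries end up in the wrong category
  • Currency flaws: definitions have not kept up with the times and lag in scientific and technical accuracy
  • Systemic flaws: the compilation conventions lack overall coordination, so definitions within one word group follow inconsistent formats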

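For encyclopedic knowledge, I suggest going straight to Baidu Baike (Wikipedia is of course better, but even Baidu Baike comfortably beats this dictionary).

Below are a few questionable entries I happened across over these two days.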


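【阿尔法狗】 (phonetic rendering of "AlphaGo") Same as 阿尔法围棋. [Note] The English name of 围棋 (weiqi) is "go". [From English AlphaGo]

【阿尔法围棋】 ("Alpha weiqi") An artificial-intelligence program Go player; a robot Go player: […examples…] Also translated as 阿尔法狗.

  • Comment: Generalizing 阿尔法围棋 as "an artificial-intelligence program Go player" or "a robot Go player" is both a factual and a logical error. First, the superordinate concept is wrong: 阿尔法围棋 (AlphaGo) is in essence a computer program, not a physical robot. Second, the word is a proper noun referring specifically to the AI Go software developed by Google DeepMind, and it has not undergone any semantic generalization in current Chinese usage.

  • Suggested revision: use a specific-reference definition: 【阿尔法围棋】 An artificial-intelligence Go program developed by Google DeepMind.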


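【癌】 (cancer; carcinoma) 1. A malignant tumor arising from epithelial tissue. The cancerous cells grow quickly and often spread to nearby tissues and organs or metastasize to distant ones. Common types include stomach cancer, lung cancer… Also called 癌瘤. 2. Popular name for certain malignant tumors: 淋巴~ (lymphatic cancer) | 血~ (blood cancer) | 骨~ (bone cancer).

【恶性肿瘤】 (malignant tumor) A type of tumor that is not enclosed by a membrane, whose cells proliferate abnormally and are highly irregular in shape and size, and whose boundary with normal tissue is indistinct. It can metastasize within the body and is highly destructive. Both 癌 (carcinoma) and 肉瘤 (sarcoma) are malignant tumors.

  • Comment: The definition of 癌 covers both the narrow pathological sense and the popular usage, and is handled well. The definition of 恶性肿瘤, however, is clearly deficient. It describes a lesion that is "not enclosed by a membrane … highly irregular in shape and size", which matches the morphology of only some solid tumors and cannot cover diffuse malignancies such as leukemia and most lymphomas. In modern medicine, what defines a malignant tumor is not its morphology but its malignant biological behavior.

  • Suggested revision: rebuild the definition around biological behavior and extend the list of types: 【恶性肿瘤】 A type of tumor composed of abnormally proliferating tumor cells, with the capacity to invade surrounding tissue, destroy organ structure, and metastasize. Includes carcinomas and sarcomas, as well as malignant diseases of the blood and immune system such as leukemia and lymphoma.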


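【枸橼酸】 Former name of 柠檬酸 (citric acid).

【柠檬酸】 (citric acid) An organic compound; colorless crystals. … Formerly called 枸橼酸.

  • Comment: Labeling 枸橼酸 simply as "the former name of 柠檬酸" is seriously misleading as a matter of fact. The two are standard names for the same chemical substance in different fields: 枸橼酸 is the official generic name prescribed by the Chinese Pharmacopoeia (《药典》) and is used widely, and mandatorily, in medicine and pharmaceuticals, while 柠檬酸 is the everyday name used in the food industry and daily life. If 《现代汉语大词典》 is unfortunately one day treated as gospel by copy editors and proofreaders, as 《现代汉语词典》 already is, this definition could easily lead editors of medical publications to wrongly change the professional pharmaceutical term to 柠檬酸.

  • Suggested revision: 【枸橼酸】 Citric acid (柠檬酸); the official generic name in medicine and biochemistry. 【柠檬酸】 … Also called 枸橼酸 in medicine and biochemistry.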


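【硬盘】 (hard disk) A magnetic disk used in computers, based on rigid rotating platters… Note: the solid-state drives (固态硬盘) that have appeared in recent years are also a kind of 硬盘.

  • Comment: The definition lags badly behind the technology. "A magnetic disk based on rigid rotating platters" describes only the mechanical hard disk drive (HDD). The note does mention the solid-state drive (SSD), but an SSD is built on flash memory chips and has no "rotating platters", which directly contradicts the main definition. In current usage, 硬盘 has become a general term for a computer's high-capacity non-volatile storage.

  • Suggested revision: move the superordinate concept up a level so that it rests on what the thing essentially is, a storage device: 【硬盘】 1. A high-capacity storage device used in computers, chiefly the mechanical hard disk drive, based on rotating magnetic platters, and the solid-state drive, based on flash memory chips. 2. Specifically, a mechanical hard disk drive.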


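【新型冠状病毒】 (novel coronavirus) Refers specifically to the coronavirus newly discovered in humans in 2019, the seventh coronavirus known to date that can infect humans. Abbreviated 新冠病毒.

【新型冠状病毒肺炎】 (novel coronavirus pneumonia) Refers specifically to the pneumonia caused by infection with the novel coronavirus newly discovered in 2019. The main clinical manifestations are fever, dry cough, and fatigue; a minority of patients also have nasal congestion, runny nose, sore throat, muscle pain, and diarrhea. Mild cases present only low fever and slight fatigue, with no signs of pneumonia. Severe cases develop breathing difficulty after onset, and the most serious can progress rapidly to acute respiratory distress syndrome. Abbreviated 新冠肺炎. English name: COVID-19 (Corona Virus Disease 2019). Note: in light of viral mutation and the characteristics of the disease, on December 26, 2022, the National Health Commission renamed 新型冠状病毒肺炎 ("novel coronavirus pneumonia") as 新型冠状病毒感染 ("novel coronavirus infection").

  • Comment: The entries omit a foreign-language abbreviation and rest on an outdated medical concept. First, since the disease entry gives COVID-19, the virus entry should likewise give the international standard abbreviation SARS-CoV-2. Second, the novel coronavirus causes a systemic infection that does not necessarily involve pneumonia, which is why the National Health Commission renamed the disease 新型冠状病毒感染 at the end of 2022. A definition that still takes "pneumonia" as its superordinate concept no longer matches current medical consensus and will perpetuate the misunderstanding.

  • Suggested revision: make 【新型冠状病毒感染】 the main entry, defined as "an infectious disease caused by the novel coronavirus", with an objective description of its systemic symptoms; the original 【新型冠状病毒肺炎】 can be kept as a secondary entry, defined as "former name of 新型冠状病毒感染".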


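【单簧管】 (clarinet) A wind instrument consisting of four parts: the 嘴子 (mouthpiece), the 小筒 (barrel), the body, and the bell; a single reed is fitted to the mouthpiece. Also called 黑管.

【竖笛】 (recorder) A vertical woodwind instrument. It originated in 15th-century Italy and became popular across Europe. The body has eight or six holes; its range is rather narrow and its tone mellow. Also called 直笛.

  • Comment: 1. Inconsistent format: the definition of 竖笛 opens with 一种 ("a kind of") while that of 单簧管 does not. 2. The dictionary labels community-specific (regional) vocabulary elsewhere but overlooks it here: in Taiwan, the 单簧管 (clarinet) is commonly called 竖笛. 3. Dated description: the part names 嘴子 and 小筒 are archaic and long out of use; the usual terms today are 笛头 and 脖管 respectively.

  • Suggested revision: update the part names in 【单簧管】 and add "also called 竖笛 in Taiwan" at the end of the definition. Revise 【竖笛】 at the same time, splitting it into separate senses for the mainland and Taiwan usages.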


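【口琴】 (harmonica) A musical instrument, usually with two parallel rows of small holes containing brass reeds; sound is produced by blowing or drawing air through the holes with the mouth.

【柳琴】 (liuqin) A string instrument shaped like a pipa but smaller, with four strings. Popularly called 土琵琶.

  • Comment: 1. Inconsistent format, as with 【单簧管】 and 【竖笛】 above. 2. "Two parallel rows of small holes" describes only the tremolo harmonica (复音口琴) and cannot cover the blues harmonica or chromatic harmonica that are widely popular today.

  • Suggested revision: 1. Standardize the classification format for instrument entries, for example by opening every entry with a category term such as 簧管乐器 (reed wind instrument) or 弹拨乐器 (plucked string instrument). 2. For 【口琴】, drop the restrictive description of the hole layout and distill the core mechanism: "A wind instrument, generally rectangular, fitted inside with a number of metal reeds; sound is produced when blowing or drawing air through the mouth makes the reeds vibrate."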
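The format inconsistency in these instrument entries is the kind of thing a simple script could flag before publication. Below is a minimal sketch, assuming the entries are available as plain records; the `ENTRIES` list, the "instrument" group label, and both function names are hypothetical and only reuse the definition openings quoted above. It is not how the dictionary's editors actually work.

```python
from collections import defaultdict

# Hypothetical sample data: (headword, word group, opening of the definition as quoted above).
ENTRIES = [
    ("单簧管", "instrument", "管乐器,由嘴子、小筒、管身和喇叭口四部分构成"),
    ("竖笛", "instrument", "一种直式的木管乐器"),
    ("口琴", "instrument", "一种乐器,一般上面有两行并列的小孔"),
    ("柳琴", "instrument", "弦乐器,外形像琵琶"),
]


def opening_style(definition: str) -> str:
    """Classify a definition by whether it opens with 一种 ("a kind of")."""
    return "yizhong" if definition.startswith("一种") else "bare"


def inconsistent_groups(entries):
    """Return {group: {style: [headwords]}} for word groups that mix opening styles."""
    by_group = defaultdict(lambda: defaultdict(list))
    for headword, group, definition in entries:
        by_group[group][opening_style(definition)].append(headword)
    # Keep only the groups in which more than one opening style occurs.
    return {g: dict(styles) for g, styles in by_group.items() if len(styles) > 1}


report = inconsistent_groups(ENTRIES)
for group, styles in report.items():
    print(group, styles)
```

A real check would need a proper word-group classification and more style rules than this single prefix test, but even this much would surface the four entries discussed here.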


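【单晶体】 (single crystal) A crystal whose atoms are arranged according to one uniform pattern. It has a definite external form, and its physical properties generally differ along different directions.

【单晶硅】 (monocrystalline silicon) A form of elemental silicon.

  • Comment: The definitions are not linked. Since the dictionary already includes the superordinate concept 【单晶体】, the definition of 【单晶硅】 should not be the vague "a form of elemental silicon".

  • Suggested revision: make the entries logically consistent with each other: 【单晶硅】 The single-crystal form of silicon. Its physical properties …, and it can be used for ….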


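【同性恋】 (homosexuality) Sexual behavior between people of the same sex: 他公开了自己的~身份 (he made his ~ identity public) | 现代社会对于~已经日益理性和宽容了 (modern society has become increasingly rational and tolerant toward ~).

  • Comment: The superordinate concept is too narrow and lags behind current understanding in psychology and sociology. Subsuming the word under "sexual behavior" is inappropriate, because homosexuality is in essence a sexual orientation, which includes emotional identification and sexual attraction and is not limited to behavior. The entry's own example, "made his identity public", points precisely to social identity rather than to sexual acts alone.

  • Suggested revision: broaden the superordinate concept: 【同性恋】 Emotional attachment, sexual attraction, or sexual behavior between people of the same sex. The related orientation entries such as 异性恋 (heterosexuality) and 双性恋 (bisexuality) should be revised on the same logic. In addition, narrow wording such as "between persons of the opposite sex" in related entries such as 性爱 should be changed accordingly.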

Generally speaking, language and corpora are affected by several factors:

  1. Language policy.
  2. Target audience/learning level.
  3. Region/Society.
  4. Age.

The list could go on.
And three key factors dictate what a corpus can actually write:

  1. Insanity.
  2. Taboo.
  3. Institutional Ratification.

These factors allow or control how language and knowledge propagate and work in society.
Every institute must follow the ratification dictated by power.

So, taking that factor into account, many corpora have no choice but to follow the rules; even though editors and creators want to add more, they cannot.
Where I live, Urban Dictionary is banned because of its taboo nature; the government does not allow it.

So a corpus may lack something today, but a change in policy tomorrow may allow it to add more. Even the West was at some point severe about what could and could not be said.

(Just a little knowledge I thought I would share. I learned this during my linguistics studies.)

I largely agree with your point. The selection and treatment of corpus data are constrained by many factors, which is already evident from the examples I raised. My comments also carried a degree of critique toward this reality, even if I did not state it explicitly.

However, even within those constraints, this dictionary still had room to do better. In my view, beyond corpus-related limitations, another major factor is that the book still seems rooted in a traditional model of lexicography and publishing.

From the lexicographic side, it appears to have relied heavily on a small number of scholars compiling entries manually, likely with some computer assistance, but not enough to ensure consistent quality across the whole work. The capacity of any individual compiler is limited, and different compilers inevitably bring different assumptions, priorities, and stylistic habits. What I had hoped was that computational tools would help not only improve efficiency, but also reduce inconsistency in quality and style across entries. However, that goal does not seem to have been fully achieved here.

From the publishing side, it still looks like a conventional print-dictionary workflow: a draft is produced, printed out, manually marked up, and then revised. That kind of process makes careful version control difficult. The layout also appears to rely on an old Founder typesetting system, which affects visual presentation and may reduce overall editorial efficiency. It is also worth noting the long delay between the initial completion in early 2021 and the official publication in 2026.

I am not trained in linguistics; I read dictionaries only as an amateur interest. So some of the points above may reflect my own assumptions. I would be very grateful for any criticism or correction.

You raised fair points. It is lagging behind other corpora, which leaves room for improvement. Five years is a huge gap; a lot of improvements could have been made in that window.

Another factor the company is definitely looking at is the economic one. Maybe they do not have enough resources to compete with other corpora.

And yes, we live in a digital world, with the internet making the whole world a global village and shrinking distances, even if only virtually. Knowledge propagates much faster these days.

But again:

If something is traditional, well, tradition has to be preserved to prevent an alienation effect, as far as the language is concerned.

Digital piracy. Even Oxford has not made the “Advanced Learners’ Thesaurus” available in digital format, and I yearn to make an mdx version of it. I am about to get a print version, and it is going to cost me $150.

I will give my critique, not criticism. Even though you say you have no link to linguistics, you appear to have quite good knowledge and judgment, judging from what you wrote in the first post.

I agree that the traditional model of lexicography is not entirely negative. It does help reduce digital piracy, and it can also preserve a sense of seriousness and authority.

But what I want to stress is that lexicographic compilation and dictionary publication are two different things and should not be conflated. Traditional print publication inevitably creates a long gap between finalization, review, and printing, which can leave a dictionary partly outdated by the time it is published. This cannot be eliminated entirely, but it can be partly reduced through digital proofreading, version control, and a more efficient revision workflow, for example by using an electronic review system based on version management.
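To make the "version management" idea a little more concrete, here is a minimal sketch of how one revision to an entry could be presented to a reviewer as a diff instead of as handwritten markup on a printout. It uses Python's standard `difflib`; the `review_diff` helper and the `@draft` / `@revised` labels are my own hypothetical names, and the two texts are the 【单晶硅】 definition quoted in my first post and the first sentence of my suggested revision. I am not claiming any publisher works this way.

```python
import difflib


def review_diff(old_text: str, new_text: str, entry_id: str) -> list[str]:
    """Return a unified diff between two versions of one dictionary entry."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{entry_id}@draft",    # hypothetical version label
            tofile=f"{entry_id}@revised",    # hypothetical version label
            lineterm="",
        )
    )


old_entry = "【单晶硅】单质硅的一种形态。"
new_entry = "【单晶硅】硅的单晶体形态。"

diff = review_diff(old_entry, new_entry, "单晶硅")
print("\n".join(diff))
```

Each change then carries its own before and after text, so a reviewer can accept or reject it individually and the history of an entry stays recoverable.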

More importantly, this dictionary explicitly aims to combine synchronic and diachronic perspectives, prescriptive and descriptive approaches, and scholarly and practical purposes: it seeks to dynamically record nearly a century of lexical change in modern Chinese, to break away from the usual static model of definition, to describe new words, new senses, and new usages in contemporary Chinese society, and to draw on recent advances in linguistics and semantics to improve practical value. That goal is reasonable, but such an ambitious “have it all” approach is difficult to realize through a traditional manual compilation model and a print-based publication process alone.

So my point is not to reject tradition, but to argue that if a dictionary still relies mainly on traditional compilation and publication workflows, it will be hard to fully achieve these multiple goals at once. For rapidly changing new words and new senses, the dictionary needs not only more precise definitions, but also more dynamic mechanisms for lexical selection, editing, and updating. For example, if it still includes internet buzzwords that were already popular when the draft was first completed but have since clearly faded from use, that suggests the final lexical screening could benefit from up-to-date frequency databases and corpus statistics.
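As a rough illustration of the frequency-based screening I have in mind, here is a minimal sketch. Everything in it is an assumption for the sake of the example: the function name `faded_words`, the thresholds, and the per-million frequencies are invented, and a real screening step would draw on an actual corpus and on editorial judgment.

```python
def faded_words(draft_freq, current_freq, min_draft=5.0, max_ratio=0.2):
    """Return headwords that were frequent at drafting time but have since faded.

    Both arguments map a word to its frequency per million tokens.
    A word is flagged when it reached at least `min_draft` in the draft-era
    corpus and its current frequency is at most `max_ratio` of that value.
    """
    flagged = []
    for word, before in draft_freq.items():
        if before < min_draft:
            continue  # never frequent enough to count as a buzzword
        now = current_freq.get(word, 0.0)  # absent from the current corpus means 0
        if now / before <= max_ratio:
            flagged.append(word)
    return sorted(flagged)


# Invented frequencies for placeholder words, not real corpus data.
draft = {"buzzword_a": 40.0, "buzzword_b": 12.0, "stable_word": 30.0, "rare_word": 1.0}
current = {"buzzword_a": 2.0, "buzzword_b": 11.0, "stable_word": 28.0}

candidates = faded_words(draft, current)
print(candidates)
```

The output is only a list of candidates for an editor to review; whether a faded word stays in as a historical record is still a lexicographic decision.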

So the issue is not just a few flawed entries. It is that the dictionary still relies too heavily on a traditional publishing logic and has not fully adapted to an era of rapid knowledge change. We may need to explore a new production model that better balances knowledge dissemination, long-term preservation, and the protection of copyright and intellectual property.