Oxford Advanced Learner's English-Chinese Dictionary, 4th Edition: Text Corrections (updated 2022-02-20)


Unicode may not yet have supported IPA when it was produced, so the Kingsoft font scheme was chosen.


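Does anyone have a pdawiki account and could repost something for me? In the thread ⚜ 打造完美《牛津高阶双解 第四版》(2019.11.11更新) - MDict 词库资源区 - MDict Dictionaries - 掌上百科 - PDAWIKI - Powered by Discuz! ("Building the perfect OALD English-Chinese 4th edition", updated 2019.11.11), user achallan posted data on run-together words at "floors 1250, 1263 and 1281", but I can't see the attachments. I could find a way to scan for these errors with a program myself, but since someone has already done the work, there is no need to duplicate the effort.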
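A minimal sketch of what such a programmatic scan for run-together words might look like. This is not achallan's method, which is not described here; it assumes a known-word set is available (the tiny `vocabulary` below is a hypothetical stand-in) and flags any token that is not a known word but splits cleanly into two known words.

```python
import re

def find_glued_words(text, vocabulary, min_part=3):
    """Return (token, left, right) for tokens that are not known words
    but split into two known words of at least min_part letters each."""
    hits = []
    for token in re.findall(r"[A-Za-z]+", text):
        lower = token.lower()
        if lower in vocabulary:
            continue
        # Try every split point; report the first split that works.
        for i in range(min_part, len(lower) - min_part + 1):
            left, right = lower[:i], lower[i:]
            if left in vocabulary and right in vocabulary:
                hits.append((token, left, right))
                break
    return hits

# Hypothetical mini vocabulary; a real scan would load a full word list.
vocabulary = {"the", "european", "economic", "community", "also"}
sample = "the Common Market (also the European EconomicCommunity)"
glued = find_glued_words(sample, vocabulary)
print(glued)
```

With a real word list this would produce false positives (legitimate compounds missing from the list), so the hits would still need manual review.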




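I was using lgmcw's original floor numbers. The posts are actually somewhere nearby; the floor numbers have probably shifted a little because some posts were deleted.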

wrong.txt (43.5 KB)
wrong.def.txt (159.3 KB)





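Unicode 1.0 was released in 1991 and included 89 characters in the IPA Extensions block. IPA needs far more characters than DJ or KK notation, which exist specifically to represent English pronunciation. The Traditional Chinese CD-ROM of OALD4 was probably not produced until around 1994, so Unicode should not take the blame here; I think the fault lies with Microsoft's operating systems and the font vendors. Even once Unicode had a specification, font vendors would not necessarily build a font covering the full repertoire, and that is still true today: as far as I know, the only Chinese fonts that cover all the Han characters currently in Unicode are Japan's Hanazono Mincho (花园明体) and 全宋体, a font made by an individual in Taiwan.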



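Being released in 1991 does not mean the operating systems popular at the time supported it.
Windows Me, released in 2000, was still built on the ANSI architecture.
Unicode only really started to become widespread with Windows XP, released in 2001.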

Windows NT was the first operating system that used “wide characters” in system calls. Using the UCS-2 encoding scheme at first, it was upgraded to UTF-16 starting with Windows 2000,

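The "Unicode補完計畫" (Unicode Supplement Project) modifies the code page tables in the operating system to handle the mapping between Unicode and non-Unicode character codes. It was first developed on Microsoft's Windows NT family (including Windows 2000 and Windows XP), whose core is Unicode-based, and later added support for Windows 98 and Windows Me, whose core is ANSI-based.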




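After looking into it, I found that whoever first released the OALD4 txt text was a nasty psychopath. This person not only deliberately removed the entries beginning with L from the middle of the text, but also removed the phrases beginning with "the", deliberately cut some long entries beginning with B so that they break off abruptly partway through, and did the same to the content of a few seemingly random long entries further on. I originally thought these gaps might be extraction errors, but they are not; they are deliberate. A technical failure would leave defects distributed at random, whereas the defects in this text are clearly the result of intentional manipulation. After N years online, this is the first time I have confirmed that such a sicko exists. A real eye-opener.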


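It turns out this text was not extracted independently either. It shares a source with the other OALD4 txt files online: the abruptly truncated, incomplete entries are the same.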





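2) While fixing the run-together-word errors, I found that some entries in the original txt base text have incomplete definitions that were cut off partway through. I tried to track these down (for example, by searching with the regex [^.?!'] \n★) and found that the entries listed below are missing large amounts of text, so I filled in the missing text from the yru edition of the illustrated OALD4 mdx.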


back bad balance bar beat beg begin believe bend best better between big bind bit bite block blow board body boil bone bottom bread break breath bring business but buy






up use

value variety view virtue vision viz voice volume


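3) Restored about 400 pound signs (£) that were missing throughout the text, and corrected a number of special symbols such as ä, ç, ô, ∨ and ∧.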
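The regex search mentioned in step 2) can be sketched roughly as follows. This is only an illustration under assumptions: it assumes each entry begins with ★ followed by the headword on the same line, and that lines end with a trailing space before the newline (which is what the pattern implies). The sample definitions are invented for the demonstration.

```python
import re

# The pattern from step 2): a character that is not sentence-final
# punctuation, then a space and a newline, then the ★ that (by
# assumption) opens the next entry.
TRUNCATED = re.compile(r"[^.?!'] \n★")

def find_truncated_entries(text):
    """Return the headword line of each entry that ends without
    terminal punctuation right before the next ★ entry."""
    suspects = []
    for match in TRUNCATED.finditer(text):
        # The suspect entry starts at the last ★ before the match.
        entry_start = text.rfind("★", 0, match.start())
        first_line = text[entry_start + 1:].split("\n", 1)[0]
        suspects.append(first_line.strip())
    return suspects

# Invented sample text; only the headwords come from the list above.
sample = (
    "★back\n"
    "n part of the body from the neck to \n"
    "★bad\n"
    "adj not good. \n"
    "★balance\n"
    "n an instrument for weighing. \n"
)
suspects = find_truncated_entries(sample)
print(suspects)
```

A scan like this only produces candidates. Entries that legitimately end without a full stop will also match, so each hit still needs checking against a complete copy.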



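I compared the corrected text programmatically against the text extracted from an OALD4 mdx I had collected (possibly the yru edition; see https://github.com/mahavivo/english-dictionary/tree/master/OALD4 ) and then verified the results by hand. A preliminary list of the headwords missing from the corrected text: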

acerbic | acetic | -ade | adjacent | admonish | aftermath | afters | -age | AIDS | -al | -an | -ana | -ance, -ence | -ancy, -ency | aniseed | -ant, -ent | anticipation | antiknock | antimony | antipathy | arcane | -ard | -arian | artisan | artiste | -ary | -ate | -ation | -ative | -ator | augur | august | August | auricle | auriferous | balls | bead | become | -bellied | besom | biceps | biro | birthday | bisect | -bodied | bop | buckler | by- | carcinogen | cheers | cheery | -cide | cirrhosis | cirrus | cissy | Cistercian | cistern | citadel | cite | comely | contentious | contest | context | CONTRA | COORDINATE | CO-ORDINATE | -cracy | -crat | -cy | -d | Dail Eireann | despot | -dom | -ectomy | -ed | -ee | -eer | 'em | embrace | -en | -ence | -ent | -er | -ery | -ese | -esque | -ess | -ette | -ey | falsetto | falsify | federal | fo’c’s’le | -fold | fortunate | -ful | future | -fy | -gamy | garlic | gnome | -gon | -gram | gram | -graph | -graphy | gully | ha’p’orth | -hood | hormone | -ial | -ian | -iana | -iatrics | -iatry | -ible | -ic | -ics | -ide | -ie | -ify | -in | indispensable | indoors | institution | instruct | -ion | -ise | -ish | -ism | -ist | -ite | -ition | -itis | -ity | -ive | -ize, -ise | L/Cpl | lights | -mania | -ment | metallurgy | -meter | -metre | misgovern | -most | mynah | ne’er | ne’er-do-well | -ness | -oid | op cit | -or | -ory | -ous | overshoot | PARA | -path | -pathy | -philia | -phobia | -phone | prepuce | re-cover | ritual | ruble | -ry | -scape | -scope | -ship | -shooter | -sion | -some | -speak | -sphere | -ster | sulfide | syndrome | -th | times | -tion | toccata | toils | 'un | -ure | Virginia `creeper | vv | -ward | -ways | -wise | -xion | -y | -acy
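A programmatic comparison of this kind can be sketched as a set difference over headword lists. The sketch assumes the headwords have already been extracted from both sources into plain lists (how that extraction was done is not described here), and the function name `missing_headwords` is made up for illustration. Comparison is case-sensitive, so that pairs like "august" and "August" stay distinct.

```python
def missing_headwords(reference, candidate):
    """Return headwords found in `reference` but not in `candidate`,
    deduplicated and sorted case-insensitively."""
    have = {h.strip() for h in candidate}
    seen = set()
    missing = []
    for headword in reference:
        headword = headword.strip()
        if headword and headword not in have and headword not in seen:
            seen.add(headword)
            missing.append(headword)
    return sorted(missing, key=str.lower)

# Toy lists for illustration; real input would be the full headword
# lists of the mdx-derived text and the corrected txt.
reference = ["acerbic", "acetic", "acid", "-ade", "August", "august"]
candidate = ["acid", "august"]
result = missing_headwords(reference, candidate)
print(" | ".join(result))
```

The output is only a candidate list. Differences in how the two sources spell or split headwords show up as "missing" too, which is why manual verification is still needed.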


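After comparing against endnote's 2022 New Year edition mdx and removing the data that duplicates the "yru edition" results, the additional "missing" headwords found are as follows: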

-able | -acy | agr- | aide-mémoire | -ance | -ancy | -ant | anthrop- | appliqué | après-ski | arête | attaché | aut- | blasé | bric-à-brac, bric-a-brac | bête noire | café, cafe | canapé | cardi- | cent- | chargé d’affaires | chron- | château | cliché | communiqué | compère, compere | consommé | coupé | crypt- | crème de la crème | crème de menthe | débutante | début | dem- | derm- | diamanté | débâcle, debacle | décolleté | décor, decor | déjà vu | démarche, demarche | déshabillé | détente | electr- | entrepôt, entrepot | entrée, entree | Eur- | exposé, expose | façade, facade | flambé | fête, fete | glacé | -graphical | gruyère | habitué | hect- | hex- | -ian | ingénue, ingenue | -ion | is- | -ize | kümmel | lycée | manqué | matinée | -metre | mise-en-scène | mélange | ménage, menage | mésalliance | métier | mêlée, melee | necr- | neur- | née, nee | orth- | outré | pant- | passé | path- | phil- | phon- | physi- | pied-à-terre | pietà | pièce de rèsistance | première, premiere | prot- | précis | pseud- | psych- | pâté | quadr- | raison d’être | recherché | retroussé | risqué | rosé | roué | résumé | röntgen | sauté | sept- | señor | soigné | soirée, soiree | son et lumière | table d’hôte | tel- | tetr- | the- | therm- | touché | tête-a-tête | Virginia creeper | vis-à-vis | à la carte | à la mode | éclair | éclat | élite, elite | émigré, emigré | épée, epee





-ability, -ibility | -ability, -ibility | -able, -ible | -able, -ible | -ably, -ibly | -acy | -ade | -age | -al | -ally | -an | -ana | -ance, -ence | -ance, -ence | -ancy, -ency | -ancy, -ency | -ant, -ent | -ant, -ent | -ard | -arian | -ary | -ary | -ate | -ately | -ation | -ative | -atively | -ator | -bedded | -behaved | -bellied | -bellied | -bodied | -born | -bound | -brimmed | -burger | -cheeked | -chested | -cidal | -cide | -cornered | -cracy | -craft | -crat | -cratic | -cy (also -acy) | -d | -decker | -deep | -dimensional | -dom | -eared | -ectomy | -ed (also -d) | -edged | -ee | -eer | -en | -ence | -ent | -er | -ery (also -ry) | -ese | -esque | -ess | -ess | -ette | -ey | -eyed | -faced | -faceted | -fired | -flavoured (US -flavored) | -fold | -footed | -footed | -footer | -footer | -former | -free | -friendly | -ful | -fy | -gamous, -gamously | -gamy | -goer | -gon | -gonal | -grained | -gram | -graph | -grapher | -graphic(al) | -graphy | -haired | -handed | -handled | -headed | -hearted | -heeled | -hipped | -hood | -hued | -humoured (US -humored) | -ial | -ially | -ian (also -an) | -iana (also -ana) | -iatric | -iatric, -iatrical | -iatrics | -iatry | -ible | -ic | -ical | -ical | -ically | -ics | -ide | -ie | -ify (also -fy) | -ily | -in | -in-chief | -iness | -intensive | -intentioned | -ion (also -ation, -ition, -sion, -tion, -xion) | -ise | -ise, -ize | -ish | -ism | -ist | -ist | -ite | -ition | -itis | -ity | -ive | -ization, -isation | -izationally, -isationally | -ize, -ise | -man | -man | -mania | -maniac | -mannered | -manship | -manship | -masted | -ment | -mental | -mentally | -mentioned | -meter | -metre (US -meter) | -most | -mouthed | -natured | -ness | -nosed | -oid | -orientated | -ory | -ous | -path | -path | -pathic | -pathy | -phile (also -phil) | -philia | -philiac | -phobe | -phobia | -phobic | -phone | -phonic | -pronged | -raiser | -resistant | -rimmed | -roomed | -ry | -saving | -scape | -scope | -scopic(al) | -scopy | -seater | -sexed | -shaped | -ship | -shooter | -shy | -sick | -sided | -sighted | -sion | -sized | -skinned | -skulled | -sleeved | -soled | -some | -sounding | -speak | -sphere | -spheric (also -spherical) | -spirited | -spoken | -stemmed | -ster | -suited | -syllabled | -tailed | -tasting | -tempered | -th | -throated | -tiered | -tion | -toned | -tongued | -ure | -voiced | -waisted | -ward | -wards (also esp US -ward) | -ways | -wheeled | -wheeler | -wide | -willed | -wise | -witted | -woman | -worthy | -xion | -y | -y (also -ey)

the ,South Pole | the ,Supreme Being | the Almighty | the Antarctic | the Antarctic Circle | the Arctic | the Arctic Circle | the Ark of the Covenant | the Authorized Version | the Black Country | the Black Death | the Blessed | the Blessed Sacrament | the British | the British Isles | the Broads | the Bronze Age | the Christian Era | the Church of England | the Civil Service | the Common Market (also the European EconomicCommunity) | the Communist Party | the Conservative Party | the Coptic Church | the Corn Laws | the Dark Ages | the Dark Continent | the East End | the Eastern Bloc | the English Channel (also the Channel) | the Eternal City | the European Economic Community (abbr 缩写 EEC) | the Far East | the Far West | the First World War (also World War I) | the Foreign and Commonwealth Office (abbr 缩写 FCO) | the Grand National | the Great Bear | the Great Lakes | the Great War | the Green Party | the Gulf Stream | the Holy City | the Holy Father | the Holy Ghost | the Holy Grail | the Holy Land | the Holy See | the Holy Spirit (also the Holy Ghost) | the Home Counties | the Home Guard | the Home Office | the House of Commons (also the Commons) | the House of Lords (also the Lords) | the House of Representatives | the Houses of Parliament | the Immaculate Conception | the Industrial Revolution | the Infinite | the Iron Age | the Iron Curtain | the Jolly Roger | the Last Judgement | the Last Supper | the Metropolitan Police (also the Met) | the Middle Ages | the Middle East | the Middle West | the Midlands | the Midwest | the Milky Way | the National Debt | the New Testament | the New World | the North Country | the North Pole | the Old Testament | the Old World | the Olympic Games | the Open University | the Orthodox Church (also The Eastern Orthodox Church) | the Peak District | the Pilgrim Fathers (also the Pilgrims) | the Pleistocene | the Pliocene | the Press Association (abbr 缩写 PA) | the Queen’s English | the Redeemer | the Revised Standard Version | the Roman alphabet | the Security Council | the Social and Liberal Democrats (abbr 缩写 SLD) | the Son of God, the Son of Man | the Spanish Main | the Stars and Stripes | the State Department | the Stone Age | the Supreme Court | the Supreme Soviet | the Union Jack (also the Union flag) | the United Kingdom (abbr 缩写 (the) UK) | the United Nations (abbr 缩写 (the) UN) | the United States (of America) (abbrs 缩写 (the) US, USA) | the Upper Chamber (also the Upper House) | the West Country | the West End | the White House | the Wild West | the women's movement | the absolute | the accused | the ancients | the armed forces, the armed services | the assured | the beau monde | the bereaved | the blind | the body politic | the burden of proof | the class struggle (also the class war) | the damned | the deceased | the deep South | the departed | the dispossessed | the dog-star | the elect | the electric chair | the evening star | the fair sex | the faithful | the fallen | the few | the fine print | the first person | the foregoing | the former | the front bench | the front line | the generation gap | the golden mean | the gripes | the handicapped | the high jump | the high sea (also the high seas) | the holy of holies | the home front | the home straight (also esp US the home stretch) | the homeless | the human race | the impossible | the inevitable | the infirm | the initiated | the insane | the insured | the jet set | the jitters | the just | the kiss of life | the last post | the last rites | the many | the metric system | the middle distance | the midnight sun | the military | the missing | the money supply | the morning star | the next | the northern lights | the occult | the old | the old country | the old guard | the once | the open | the open sea | the open season | the oppressed | the poor | the priesthood | the rag trade | the rank and file | the ravages | the rich | the rising generation | the roadway | the sack | the sack | the safe period | the same | the sandman | the second coming | the seventh day | the small hours | the solar system | the solar year | the sterling area | the sulks | the supernatural | the synoptic gospels | the unconscious | the undermentioned | the underprivileged | the undersigned | the unemployed | the unexpected | the unwaged | the unwary | the utmost (also the uttermost) | the vitals | the weak | the wicked | the working class (also the working classes) | the yellow press

(right) up one’s street | April Fool’s Day | L/Cpl | Virginia creeper | a few | a fifth column | a gentleman’s agreement | a mite | a sword of Damocles | agitation | askance (at sb/sth) | be taken with sb/sth | cent(i) | incompre-hensibly | lawcourt (also court of law) | like a ton of bricks | make bricks without straw | on the quiet | take the bread out of sb's mouth | will-o-the-wisp


