Building the Oxford Elementary Learner's English-Chinese Dictionary (牛津初阶英汉双解词典)

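The main problem with an elementary-level dictionary is that the people who build dictionaries have almost no use for it themselves. It is hard to pour time into a dictionary you don't use, fixing hundreds or thousands of errors plus layout and formatting issues. Besides, as electronic dictionaries, the Oxford intermediate and advanced dictionaries are perfectly usable: both include Chinese explanations and define words in simple English, so primary school pupils can use them too.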


That's right. At the intermediate level, the best-made one on the forum so far is probably Longman Active: [协作接力2.0] Longman Active Study Dictionary 5th 同类词中译


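An elementary dictionary still has its value: besides lookups, it can be used to test vocabulary size. The Oxford mini dictionary (牛津小词典) has over twenty thousand entries with classic definitions and can serve as a word list. Many people on the forum labor over the COCA 60,000 and 20,000 word lists; they would do better to use this little dictionary directly. The print edition is a softcover with thick paper that doesn't show through, clear printing, and carefully considered translations. It is close to ideal. The elementary dictionary has over eight thousand entries, roughly twice what a primary school pupil needs. Together, these three dictionaries cover primary school through university.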


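Indeed. I hope someone experienced picks this up later; if the earlier audio could be integrated as well, it would be perfect. The print edition feels nice in the hand and isn't thick. It is said that Oxford put considerable care into the elementary dictionary.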

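I adapted my earlier script to do the insertion. Some results are good and some are poor; it needs careful tuning of which insertions should be applied and which should not.
The script works like this: strip the tags from the two entries, compare the resulting text, and reduce the differences to insertions, modifications, and deletions. In effect, the program decides which changes to apply. The current strategy is to apply all insertions and nothing else. In practice this could be refined, for example by not applying differences that are only line breaks or punctuation, and by handling replacements where a short piece of English is replaced with another short piece of English plus Chinese.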
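
The strip, compare, and apply-only-insertions idea described above can be sketched roughly as follows. This is not the attached script; it is a minimal illustration assuming the English entry is HTML and the bilingual entry is plain OCR text. The names `strip_tags` and `apply_insertions` are hypothetical, and HTML entities such as `&amp;` are not handled.

```python
import difflib
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher, good enough for a sketch


def strip_tags(html):
    """Return the tag-free text plus, for each text char, its index in html."""
    chars, offsets = [], []
    pos = 0
    for m in TAG_RE.finditer(html):
        for k in range(pos, m.start()):
            chars.append(html[k])
            offsets.append(k)
        pos = m.end()
    for k in range(pos, len(html)):
        chars.append(html[k])
        offsets.append(k)
    return "".join(chars), offsets


def apply_insertions(html, other_text):
    """Copy into html only the spans that other_text inserts; skip the rest."""
    text, offsets = strip_tags(html)
    matcher = difflib.SequenceMatcher(None, text, other_text, autojunk=False)
    out, last = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "insert":
            continue  # current strategy: ignore 'replace' and 'delete'
        if i1 > 0:
            html_pos = offsets[i1 - 1] + 1  # right after the preceding text char
        else:
            html_pos = offsets[0] if offsets else len(html)
        out.append(html[last:html_pos])
        out.append(other_text[j1:j2])
        last = html_pos
    out.append(html[last:])
    return "".join(out)


if __name__ == "__main__":
    print(apply_insertions("<b>cat</b> <i>a small animal</i>",
                           "cat a small animal 猫"))
```

Because only `insert` opcodes are applied, an OCR slip such as `d0g` for `dog` is left alone, while text that exists only in the bilingual version is carried over. Refinements such as skipping whitespace-only insertions, or splitting a `replace` into its English and Chinese parts, would go in the loop over opcodes.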

Here is a screenshot of the result (I didn't download the mdd, so the icons don't display).

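I also found that the headwords don't line up. In the OCR version, lowercase l was misrecognized as uppercase I, which produces two I entries.
In addition, the English edition has the following extra headwords: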
[‘Jeep™’, ‘VDU’, ‘appointment’, ‘backpack¹’, ‘bulletin board’, ‘camp¹’, ‘candlestick’, ‘choose’, ‘clean²’, ‘code’, ‘color,colored,colorful’, ‘conference’, ‘contract’, ‘cow’, ‘crowd²’, ‘drink¹’, ‘entertainment’, ‘fairly’, ‘false’, ‘farther’, ‘fascinate’, ‘fascinating’, ‘fed’, ‘giraffe’, ‘hare’, ‘held’, ‘highlight’, ‘im-’, ‘inspiration’, ‘label²’, ‘lash’, ‘lb’, ‘leant’, ‘leapt’, ‘measurement’, ‘mid-’, ‘midnight’, ‘million’, ‘mind²’, ‘paint²’, ‘park¹’, ‘pay phone’, ‘peer’, ‘plant¹’, ‘pumpkin’, ‘rainforest’, ‘soaked’, ‘spat’, ‘spend’, ‘swop’, ‘tap²’, “the Qu’ran”, ‘then’, ‘true’, ‘usually’, ‘well²’, ‘went’, ‘wept’, ‘wheelchair’]
The bilingual edition has the following extra headwords:
[‘BMX’, ‘Blu-ray’, ‘CCTV’, ‘CD burner’, ‘CGI’, ‘COW’, ‘DT’, ‘DVD burner’, ‘FALSE’, ‘Facebook™’, ‘HD’, ‘IWB’, ‘Ib’, ‘JeepT™’, ‘LOL’, ‘SIM card’, ‘SWOP’, ‘TRUE’, ‘Twitter™’, ‘Wi-Fi™’, ‘address book’, ‘alliteration’, ‘alternative medicine’, ‘anorexia’, ‘antiseptic’, ‘antivirus’, ‘app’, ‘appendix’, ‘asteroid’, ‘autism’, ‘baguette’, ‘billboard’, ‘bin bag’, ‘bin liner’, ‘biodiversity’, ‘black hole’, ‘bling’, ‘blood group’, ‘bloodstream’, ‘bookmark’, ‘bookshelf’, ‘breakcrumbs’, ‘builetin board’, ‘call centre’, ‘canopy’, ‘cartridge’, ‘celebrity’, ‘charger’, ‘cheetah’, ‘chewy’, ‘chill’, ‘cleam²’, ‘climate change’, ‘clip art’, ‘clone’, ‘colloquial’, ‘color,colored,colorful’, ‘comic strip’, ‘compile’, ‘contract²’, ‘contract¹’, ‘control pad’, ‘coral’, ‘cubic’, ‘debut’, ‘desktop’, ‘dial-up’, ‘digit’, ‘digital television’, ‘direct speech’, ‘e-’, ‘e-book’, ‘e-reader’, ‘educated’, ‘emoticon’, ‘equation’, ‘exhausted’, ‘expire’, ‘fact sheet’, ‘factor’, ‘findings’, ‘fish finger’, ‘flat-screen’, ‘flip chart’, ‘flow chart’, ‘follower’, ‘font’, ‘food chain’, ‘foreman’, ‘founder’, ‘friction’, ‘fạscinate’, ‘gigabyte’, ‘go-kart’, ‘godchild’, ‘google’, ‘guide dog’, ‘hand-held’, ‘handset’, ‘hare/heə(r)’, ‘hash’, ‘hashtag’, ‘hat-trick’, ‘heldform’, ‘highlighter’, ‘highlight²’, ‘highlight¹’, ‘hip hop’, ‘hoarding’, ‘hoody’, ‘iPod™’, ‘iabel²’, ‘im’, ‘imam’, ‘inbox’, ‘interactive whiteboard’, ‘jargon’, ‘junk mail’, ‘keypad’, ‘landmine’, ‘layout’, ‘leantform’, ‘leaptform’, ‘left-click’, ‘leggings’, ‘literacy’, ‘long-term’, ‘measurementﺩ’, ‘megabyte’, ‘megastore’, ‘memory stick’, ‘metaphor’, ‘microblogging’, ‘mid’, ‘midmight’, ‘mindl²’, ‘misprint’, ‘misspell’, ‘mixed race’, ‘mmllion’, ‘mouse mat’, ‘multiplex’, ‘old-age pensioner’, ‘paint ²’, ‘panini’, ‘parlk¹’, ‘particle’, ‘payphone’, ‘peer²’, ‘peer¹’, ‘photoshop’, ‘pie chart’, ‘pixel’, ‘plasma
screen’, ‘podcast’, ‘procedure’, ‘raw material’, ‘reality TV’, ‘rearrange’, ‘recharge’, ‘reef’, ‘rename’, ‘reuse’, ‘right-click’, ‘router’, ‘scornful’, ‘search engine’, ‘selfishness’, ‘sequel’, ‘set book’, ‘sewer’, ‘short-term’, ‘show-off’, ‘silhouette’, ‘smartphone’, ‘smiley’, ‘smoothie’, ‘social networking’, ‘spatform’, ‘special effects’, ‘special needs’, ‘spellcheck’, ‘spellchecker’, ‘stir-fry’, ‘sudoku’, ‘sunblock’, ‘suncream’, ‘superfood’, ‘supermodel’, ‘tactics’, “the Qur’an”, ‘theme park’, ‘then/ðen’, ‘tonsil’, ‘top-up’, ‘topping’, ‘top³’, ‘touch pad’, ‘touch screen’, ‘trackpad’, ‘transplant’, ‘trilogy’, ‘tweet’, ‘twitter’, ‘upload’, ‘username’, ‘volcanic’, ‘wand’, ‘well’, ‘wentform’, ‘weptform’, ‘wheeIchair’, ‘whiteboard’, ‘wiki’, ‘wind farm’, ‘wind turbine’, ‘wireless’, ‘wok’]
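
Many of the mismatches above look like OCR near-misses (for example `wheeIchair` against `wheelchair`) rather than genuinely different headwords. A rough way to separate the two cases, assuming both headword lists are already available as Python lists, is to take the set differences and then pair leftovers by string similarity. The function name `compare_headwords` and the `cutoff` value are illustrative assumptions, not taken from the attached script.

```python
import difflib


def compare_headwords(english, bilingual, cutoff=0.8):
    """Return headwords unique to each side, plus likely OCR-variant pairs."""
    only_en = sorted(set(english) - set(bilingual))
    only_bi = sorted(set(bilingual) - set(english))
    pairs = {}
    for word in only_bi:
        # Best fuzzy match among the unmatched English headwords, if any
        close = difflib.get_close_matches(word, only_en, n=1, cutoff=cutoff)
        if close:
            pairs[word] = close[0]
    return only_en, only_bi, pairs


if __name__ == "__main__":
    en = ["wheelchair", "bulletin board", "giraffe", "cat"]
    bi = ["wheeIchair", "builetin board", "hashtag", "cat"]
    only_en, only_bi, pairs = compare_headwords(en, bi)
    print(pairs)
```

Headwords that end up in `pairs` are candidates for OCR correction and still need a manual check; the ones left over on either side are more likely real differences between the editions.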

I don't have time to fine-tune it; the original poster is welcome to build on this script:
extractable_text.py (24.5 KB)

Also, since a fifth edition now exists, I think it would be better to work on the fifth edition directly.


Could you send me your draft with the Chinese inserted? I'll see whether I can fix it up by hand.

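In this one, a lot of the Chinese hasn't been inserted. It is only meant as a demo of the program, and the tag insertion positions also have problems, so I don't recommend editing on top of it.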

Oxford Essential Dictionary.cn.mdx.zip (1.8 MB)

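Also, from the discussion I gather that this English version is actually the first edition of the Oxford Essential Dictionary, which corresponds to the third edition of the Chinese version. That would explain why so many headwords don't match. The example sentences may differ as well, so the matching will run into problems. I am currently studying the fifth-edition data provided by @amob and plan to try building the fifth edition.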


OK, thanks. Looking forward to the fifth edition.