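Original thread: O X ten raw_data
This thread was created to organize the data from the original thread, so the "upvote/downvote voting" option was selected at the top left of the new thread to sort and filter the content automatically.
Before replying:
- Please check the original thread and the existing posts first, and reply inside the relevant post as a nested comment (the "Add comment" link at the bottom left of each post).
- Start a new post only after confirming the topic isn't covered yet.
Thanks, everyone.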
Source material: OALD_10th_data.7z (15.4 MB)
Technical notes: an example of assembling an MDX entry
```
f
<a href="sound://sentence_mp3/__em%23_gbs_1.wav">ddd</a>
<img src="ill/fruit_misc.jpg">
<a href="sound://word_mp3/5p%23_gb_1.mp3">b</a>
</>
```
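Resource paths must be escaped with urllib.parse.quote(name), otherwise the files won't be found. Using MDX's private sound:// pronunciation protocol means no JS is needed. For audio, choose Qt Multimedia (the FFmpeg backend failed in testing; tested on goldendict-ng).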
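A minimal Python sketch of building the entry above, assuming the usual MDX source-text layout (headword line, HTML body lines, then a `</>` terminator line). The helper names `sound_link` and `mdx_entry` are hypothetical; the point is that `urllib.parse.quote` turns `#` into `%23`, which is presumably how names like `5p#_gb_1.mp3` end up as `5p%23_gb_1.mp3` in the links.

```python
from urllib.parse import quote


def sound_link(folder: str, filename: str, label: str) -> str:
    # Percent-encode the file name ('#' -> '%23'); unescaped names are not found by the client
    return f'<a href="sound://{folder}/{quote(filename)}">{label}</a>'


def mdx_entry(headword: str, body_lines: list[str]) -> str:
    # One entry in MDX source text: headword, HTML body lines, then the "</>" separator
    return "\n".join([headword, *body_lines, "</>"])


if __name__ == "__main__":
    print(mdx_entry("f", [
        sound_link("sentence_mp3", "__em#_gbs_1.wav", "ddd"),
        '<img src="ill/fruit_misc.jpg">',
        sound_link("word_mp3", "5p#_gb_1.mp3", "b"),
    ]))
```

Running it prints an entry in the same shape as the example; real entries would be concatenated one after another into the source file before compiling.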
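MDD file notes: the default .mdd holds the required resources; the optional ones are split by number: 1 = images, 2 = word pronunciations, 3 = example-sentence audio (the sentence audio was previously packed as 1; just rename it to 3).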
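A small Python sketch of routing each resource to its MDD file. It assumes the common `<name>.N.mdd` naming for extra MDD files, and the folder-to-number table is inferred from the paths in the entry example (ill/, word_mp3/, sentence_mp3/); both the table and `mdd_name` are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical folder -> MDD number table, inferred from the example entry's paths
VOLUME_BY_FOLDER = {"ill": 1, "word_mp3": 2, "sentence_mp3": 3}


def mdd_name(base: str, resource_path: str) -> str:
    """Pick the MDD file a resource belongs in; anything else goes to the default .mdd."""
    parts = PurePosixPath(resource_path).parts
    folder = parts[0] if len(parts) > 1 else ""  # only a leading folder counts
    number = VOLUME_BY_FOLDER.get(folder)
    return f"{base}.mdd" if number is None else f"{base}.{number}.mdd"
```

For example, `mdd_name("OALD10", "ill/fruit_misc.jpg")` gives `OALD10.1.mdd`, while a top-level `style.css` stays in `OALD10.mdd`; the base name OALD10 is made up.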
Data reconstruction: the source data is in post #2, 40970 entries in total.
Keys: excluding the first and last lines (the `{` and `}`), each comes to 40970, i.e. complete and identical.
```
      1 {
      2   "data": {
      3     "o10dict": {
      4       "id": {
      5 +---122910 lines: "u596c17338875400e.30be04e6.154e2615987.3466": {
 122915       },
 122916       "word": {
 122917 +---122790 lines: "-ie": {
 245707       },
 245708       "word_body": {
 245709 +---8973718 lines: "o10dict": {
9219427       }
9219428     }
9219429   },
9219430   "status_code": {
9219431 +--40972 lines: "0": {
9260403   },
9260404   "message": {
9260405 +--40972 lines: "\u6210\u529f": {
9301377   }
9301378 }
```
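A rough Python way to repeat that completeness check on the parsed JSON instead of counting folded lines. It assumes, from the fold labels, that status_code and message each wrap child objects ("0" and "\u6210\u529f", which decodes to 成功, "success") holding one member per entry; the function names and the script name check_counts.py are hypothetical.

```python
import json
import sys
from typing import Any


def member_counts(section: dict[str, Any]) -> dict[str, int]:
    """Number of members in each child of a JSON object (scalars count as 1)."""
    return {k: len(v) if isinstance(v, (dict, list)) else 1 for k, v in section.items()}


def is_full(doc: dict[str, Any], key: str, expected: int = 40970) -> bool:
    """True if every child under doc[key] holds exactly `expected` members."""
    counts = member_counts(doc[key])
    return bool(counts) and all(n == expected for n in counts.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_counts.py <path-to-raw-json>
    with open(sys.argv[1], encoding="utf-8") as f:
        doc = json.load(f)
    for key in ("status_code", "message"):
        print(key, member_counts(doc[key]), is_full(doc, key))
```

`json.load` reads the whole 9-million-line file into memory, which is fine on a normal desktop but worth knowing before running it.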
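To complete the data, someone needs to compare these words against the app.
The numbers are line numbers in the source text file. This concerns an entry's top bar (before the word pronunciation, where the headword and the word-list level markers sit): what is special about these 11 words?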
Update: screenshots kindly provided by bud
Example-sentence audio (MP3), downloaded with aria2c:
```
aria2c --input-file='./../eSP_urls_aria.txt'
```
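If you need to rebuild a list like eSP_urls_aria.txt, here is a small Python sketch. It relies on aria2's input-file format, where each URI line may be followed by option lines that start with whitespace (here `dir=` and `out=`). The URL, output folder, and `aria2_input` name are placeholders, not the real source addresses.

```python
def aria2_input(jobs: list[tuple[str, str]], out_dir: str) -> str:
    """Build text for aria2c --input-file: one URI per job plus per-download options."""
    lines: list[str] = []
    for url, filename in jobs:
        lines.append(url)
        # Option lines must begin with a space or tab and hold one option each
        lines.append(f"  dir={out_dir}")
        lines.append(f"  out={filename}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    jobs = [("https://example.com/audio/sample_1.mp3", "sample_1.mp3")]  # placeholder URL
    print(aria2_input(jobs, "sentence_mp3"), end="")
```

Redirect the output to a file (e.g. `python make_list.py > urls_aria.txt`) and pass that file to `aria2c --input-file`.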
Pronunciation categories: British vs. American, and strong vs. weak forms
```
+------------40388 lines: "BrE": {
+------------40383 lines: "NAmE": {
+------------ 86 lines: "NAmE also": {
+------------ 87 lines: "BrE also": {
+------------ 55 lines: "EAfrE": {
+------------ 53 lines: "SAfrE": {
+------------ 11 lines: "WAfrE": {
+------------ 3 lines: "BrE sometimes": {
+------------40388 lines: "": {
+------------ 35 lines: "strong form": {
+------------ 10 lines: "weak form": {
+------------ 10 lines: "before vowels": {
+------------ 3 lines: "before names": {
+------------ 5 lines: "before vowels and finally": {
```
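To see how often each label actually occurs (rather than how many lines its fold spans), one option is to walk the parsed JSON and count keys. This Python sketch does that iteratively; the label set is copied from the fold listing of pronunciation labels, while `count_labels` and the assumption that the labels appear as object keys (as the fold lines suggest) are mine.

```python
from collections import Counter
from typing import Any

# Labels seen in the fold listing, including the empty-string key
LABELS = {
    "BrE", "NAmE", "NAmE also", "BrE also", "EAfrE", "SAfrE", "WAfrE",
    "BrE sometimes", "", "strong form", "weak form", "before vowels",
    "before names", "before vowels and finally",
}


def count_labels(root: Any, wanted: set[str] = LABELS) -> Counter:
    """Count how many times each wanted string occurs as an object key, at any depth."""
    counts: Counter = Counter()
    stack = [root]  # explicit stack avoids recursion limits on deep nesting
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in wanted:
                    counts[key] += 1
                stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return counts
```

Key counts and fold line counts measure different things, so the numbers are not expected to match the listing one for one.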