Longman 1.25 data extraction

Longman's HTML is fairly well-formed, so the overall format I settled on is:
{
    "word": "de·mea·nour",
    "pos": " noun",
    "speakers": [
        "sound://media/english/breProns/demeanour0205.mp3",
        "sound://media/english/ameProns/laaddemeanor.mp3"
    ],
    "defineList": [
        {
            "englishExplain": "the way someone behaves, dresses, speaks etc that shows what their character is like",
            "chExplain": " 〔反映某人性格特点的〕举止,外表,风度",
            "exampleList": [
                {
                    "audio": "sound://media/english/exaProns/p008-000816347.mp3",
                    "exampleEnglish": "his quiet, reserved demeanour 他少言寡语、含蓄的举止",
                    "exampleChinese": " 他少言寡语、含蓄的举止"
                }
            ]
        }
    ]
}
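The sample record shows two quirks: some fields carry a leading space, and exampleEnglish also contains the Chinese translation. Here is a rough Python sketch of how a record in this format could be tidied after loading. The normalize helper is hypothetical (it is not part of my extraction code), and stripping the syllable dots from the headword is only an assumption about what a lookup would want.

```python
import json

# Sample record from the post, with plain ASCII quotes.
RAW = """
{
  "word": "de·mea·nour",
  "pos": " noun",
  "speakers": [
    "sound://media/english/breProns/demeanour0205.mp3",
    "sound://media/english/ameProns/laaddemeanor.mp3"
  ],
  "defineList": [
    {
      "englishExplain": "the way someone behaves, dresses, speaks etc that shows what their character is like",
      "chExplain": " 〔反映某人性格特点的〕举止,外表,风度",
      "exampleList": [
        {
          "audio": "sound://media/english/exaProns/p008-000816347.mp3",
          "exampleEnglish": "his quiet, reserved demeanour 他少言寡语、含蓄的举止",
          "exampleChinese": " 他少言寡语、含蓄的举止"
        }
      ]
    }
  ]
}
"""


def normalize(entry):
    """Return a cleaned copy of one entry (hypothetical helper)."""
    cleaned = {
        # Assumption: lookups want the headword without syllable dots.
        "word": entry["word"].replace("·", ""),
        "pos": entry["pos"].strip(),
        "speakers": list(entry["speakers"]),
        "defineList": [],
    }
    for definition in entry["defineList"]:
        examples = []
        for ex in definition["exampleList"]:
            zh = ex["exampleChinese"].strip()
            en = ex["exampleEnglish"].strip()
            # In the sample the English field ends with the translation; cut it off.
            if zh and en.endswith(zh):
                en = en[: -len(zh)].rstrip()
            examples.append(
                {"audio": ex["audio"], "exampleEnglish": en, "exampleChinese": zh}
            )
        cleaned["defineList"].append(
            {
                "englishExplain": definition["englishExplain"].strip(),
                "chExplain": definition["chExplain"].strip(),
                "exampleList": examples,
            }
        )
    return cleaned


entry = normalize(json.loads(RAW))
print(entry["word"], entry["pos"])
print(entry["defineList"][0]["exampleList"][0]["exampleEnglish"])
```

Whether to keep the dotted headword, the plain one, or both depends on how the data is queried later.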

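Then I extracted the content inside it. Honestly, HTML parsing is really slow and the whole thing took me a full day, but compared with regex matching on the txt, I would rather spend the extra time.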
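To illustrate why a real HTML parser is more robust than regex here, below is a minimal sketch using only Python's standard html.parser. The sample markup and the class names "hwd" and "def" are made up for the example, since the actual Longman HTML is not shown in this post, and FieldCollector and extract are hypothetical names rather than the code I actually ran.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are placeholders, not the real Longman ones.
SAMPLE = (
    '<div class="entry"><span class="hwd">demeanour</span>'
    '<span class="def">the way someone <b>behaves</b></span></div>'
)

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldCollector(HTMLParser):
    """Collect the text found inside elements whose class is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._stack = []  # one slot per open element: its class if wanted, else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class")
        self._stack.append(cls if cls in self.wanted else None)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost enclosing wanted element, so that
        # nested tags such as <b> do not break the field apart.
        for cls in reversed(self._stack):
            if cls is not None:
                self.fields.setdefault(cls, []).append(data)
                break


def extract(html, wanted):
    """Return {class name: joined text} for the wanted classes found in html."""
    collector = FieldCollector(wanted)
    collector.feed(html)
    collector.close()
    return {name: "".join(parts) for name, parts in collector.fields.items()}


print(extract(SAMPLE, ["hwd", "def"]))
```

A regex would have to special-case the nested `<b>` tag; the parser follows the nesting on its own, which is the robustness that makes the slower run acceptable.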
Link: https://pan.baidu.com/s/1RjMZQLvXC6yTBLqhNN_IKQ?pwd=3bdv Extraction code: 3bdv

I created four tables:
DROP TABLE IF EXISTS row_word;
CREATE TABLE row_word(
    ID INT(32) NOT NULL COMMENT 'primary key',
    CREATED_BY VARCHAR(32) COMMENT 'created by',
    CREATED_TIME DATETIME COMMENT 'creation time',
    UPDATED_BY VARCHAR(32) COMMENT 'updated by',
    UPDATED_TIME DATETIME COMMENT 'update time',
    word VARCHAR(255) COMMENT 'word',
    pos VARCHAR(255) COMMENT 'part of speech',
    PRIMARY KEY (ID)
) COMMENT = 'raw word table';

DROP TABLE IF EXISTS word_def;
CREATE TABLE word_def(
    ID INT(32) NOT NULL COMMENT 'primary key',
    CREATED_BY VARCHAR(32) COMMENT 'created by',
    CREATED_TIME DATETIME COMMENT 'creation time',
    UPDATED_BY VARCHAR(32) COMMENT 'updated by',
    UPDATED_TIME DATETIME COMMENT 'update time',
    english_explain VARCHAR(255) COMMENT 'English definition',
    chinese_explain VARCHAR(255) COMMENT 'Chinese definition',
    word_id VARCHAR(255) COMMENT 'word id',
    PRIMARY KEY (ID)
) COMMENT = 'word definitions';

DROP TABLE IF EXISTS word_voice;
CREATE TABLE word_voice(
    ID INT(32) NOT NULL COMMENT 'primary key',
    CREATED_BY VARCHAR(32) COMMENT 'created by',
    CREATED_TIME DATETIME COMMENT 'creation time',
    UPDATED_BY VARCHAR(32) COMMENT 'updated by',
    UPDATED_TIME DATETIME COMMENT 'update time',
    word_id VARCHAR(255) COMMENT 'word id',
    mp3_path VARCHAR(255) COMMENT 'word audio path',
    PRIMARY KEY (ID)
) COMMENT = 'word pronunciations';

DROP TABLE IF EXISTS example;
CREATE TABLE example(
    ID INT(32) NOT NULL COMMENT 'primary key',
    CREATED_BY VARCHAR(32) COMMENT 'created by',
    CREATED_TIME DATETIME COMMENT 'creation time',
    UPDATED_BY VARCHAR(32) COMMENT 'updated by',
    UPDATED_TIME DATETIME COMMENT 'update time',
    audio VARCHAR(255) COMMENT 'example audio path',
    example_english VARCHAR(255) COMMENT 'example sentence (English)',
    example_chinese VARCHAR(255) COMMENT 'example sentence (Chinese)',
    def_id VARCHAR(255) COMMENT 'definition id',
    PRIMARY KEY (ID)
) COMMENT = 'word example sentences';
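To show how one JSON record maps onto these four tables, here is a small sketch that loads the sample entry and reads it back. It makes several assumptions: it uses an in-memory SQLite database instead of MySQL so that it runs anywhere, it leaves out the audit columns (CREATED_BY and so on), it lets the database generate the IDs, and it stores word_id and def_id as integers rather than VARCHAR. The load_entry and lookup functions are hypothetical names for illustration.

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL tables (audit columns left out).
SCHEMA = """
CREATE TABLE row_word  (ID INTEGER PRIMARY KEY, word TEXT, pos TEXT);
CREATE TABLE word_def  (ID INTEGER PRIMARY KEY, english_explain TEXT,
                        chinese_explain TEXT, word_id INTEGER);
CREATE TABLE word_voice(ID INTEGER PRIMARY KEY, word_id INTEGER, mp3_path TEXT);
CREATE TABLE example   (ID INTEGER PRIMARY KEY, audio TEXT, example_english TEXT,
                        example_chinese TEXT, def_id INTEGER);
"""

ENTRY = {
    "word": "de·mea·nour",
    "pos": " noun",
    "speakers": [
        "sound://media/english/breProns/demeanour0205.mp3",
        "sound://media/english/ameProns/laaddemeanor.mp3",
    ],
    "defineList": [
        {
            "englishExplain": "the way someone behaves, dresses, speaks etc "
                              "that shows what their character is like",
            "chExplain": " 〔反映某人性格特点的〕举止,外表,风度",
            "exampleList": [
                {
                    "audio": "sound://media/english/exaProns/p008-000816347.mp3",
                    "exampleEnglish": "his quiet, reserved demeanour 他少言寡语、含蓄的举止",
                    "exampleChinese": " 他少言寡语、含蓄的举止",
                }
            ],
        }
    ],
}


def load_entry(conn, entry):
    """Insert one record into the four tables and return the new word ID."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO row_word (word, pos) VALUES (?, ?)",
        (entry["word"], entry["pos"].strip()),
    )
    word_id = cur.lastrowid  # links pronunciations and definitions to the word
    cur.executemany(
        "INSERT INTO word_voice (word_id, mp3_path) VALUES (?, ?)",
        [(word_id, path) for path in entry["speakers"]],
    )
    for definition in entry["defineList"]:
        cur.execute(
            "INSERT INTO word_def (english_explain, chinese_explain, word_id) "
            "VALUES (?, ?, ?)",
            (definition["englishExplain"], definition["chExplain"].strip(), word_id),
        )
        def_id = cur.lastrowid  # links examples to this definition
        cur.executemany(
            "INSERT INTO example (audio, example_english, example_chinese, def_id) "
            "VALUES (?, ?, ?, ?)",
            [
                (ex["audio"], ex["exampleEnglish"], ex["exampleChinese"].strip(), def_id)
                for ex in definition["exampleList"]
            ],
        )
    conn.commit()
    return word_id


def lookup(conn, word):
    """Return (english_explain, chinese_explain, example_english) rows for a word."""
    return conn.execute(
        "SELECT d.english_explain, d.chinese_explain, e.example_english "
        "FROM row_word w "
        "JOIN word_def d ON d.word_id = w.ID "
        "LEFT JOIN example e ON e.def_id = d.ID "
        "WHERE w.word = ?",
        (word,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_entry(conn, ENTRY)
for row in lookup(conn, "de·mea·nour"):
    print(row)
```

With the MySQL tables as defined above, ID is NOT NULL without auto-increment, so the loader would have to supply its own IDs instead of relying on lastrowid.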

Could you explain how to use this?

Which Longman did you extract this from? Is it the 2.15 version from FF?

Yes, but I found the data was not quite complete, so I combined it with the Google API.

It is mainly meant for online word-lookup services that want to have their own word database.