You mean the OP's version? See hkreporter's post. The link is blurred out by the OP, but if you click on the blur, it clears and becomes a clickable link:
Thanks, but I mean the second modified version you were talking about at #15.
Oh, I wasn’t talking about 2nd version there. I was explaining what I did to arrive at the 1st version.
Potential 2nd version features could be phrase / lemma extraction, but that is not likely to be done. The 1st version is adequate.
Right, thanks.
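I often use "double down" to test whether a dictionary is up to date and reasonably comprehensive.
It is a **common phrase** in British and American political writing.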
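My understanding is that the headwords to crawl are saved one per line in a text file. When crawling, read the headwords line by line, and after fetching each one, use its line number as the filename. When checking, test whether a file named after that line number already exists, and skip the fetch if it does.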
he # line 0
hello # line 1
help # line 2
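Read the headwords line by line:
with open('words.txt', 'r', encoding='utf-8') as f:
    for idx, line in enumerate(f):
        print(idx, line.strip())  # print the line number and the headword
When saving a crawled page, use the line number idx as the filename:
with open(str(idx) + ".html", 'w', encoding='utf-8') as f:
    f.write("content")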
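Putting the two snippets together, here is a minimal sketch of the whole loop with the skip-if-exists check. The names `crawl`, `fetch`, and `out_dir` are hypothetical, and `fetch` is only a placeholder for whatever request code you actually use:

```python
import os


def fetch(word):
    # Hypothetical stand-in for the real HTTP request; returns placeholder HTML.
    return "<html>" + word + "</html>"


def crawl(words_path, out_dir, fetch_page=fetch):
    """Crawl every headword in words_path; return the line numbers fetched this run."""
    os.makedirs(out_dir, exist_ok=True)
    fetched = []
    with open(words_path, "r", encoding="utf-8") as f:
        for idx, line in enumerate(f):
            word = line.strip()
            if not word:
                continue  # blank line: nothing to crawl, but idx still advances
            target = os.path.join(out_dir, str(idx) + ".html")
            if os.path.exists(target):
                continue  # already saved in an earlier run, so skip it
            with open(target, "w", encoding="utf-8") as out:
                out.write(fetch_page(word))
            fetched.append(idx)
    return fetched
```

Because the filename is only a number, characters in the headword that are not allowed in filenames never matter, and an interrupted run can simply be started again. The line numbers stay valid only as long as words.txt is not reordered between runs.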
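Thanks for the pointers!
Numeric filenames really are more compatible. It seems I just hadn't run into any very complicated headwords before, which is why nothing errored.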
I will learn to clean the definition part first.
Maybe the Oxford Collocations Dictionary is enough for me. It is not in my plan at the moment.
Unabridged is the same as Dictionary dot com. Collins is also available. I think a separate RHL learner's version would be good. Thanks.
I’m trying to split these 3 dictionaries, but I’m still working on it because of my limited coding ability.
The 3 split titles, with links, are in the post at #4.
RHL learners is one of the few dictionaries that teach words with reference to their roots. Thank you for your efforts. Already perfect.
I updated the separated versions of RHUD and RHLD at #1. Give them a try.
20240328: Deduplicated by headword + content; separate mdx versions of RHUD and RHLD.
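For anyone curious what deduplicating by headword plus content can look like, here is a rough sketch. It is not the script actually used for these files, and it assumes the entries have already been parsed into (headword, content) pairs; the name `dedupe_entries` is hypothetical:

```python
def dedupe_entries(entries):
    """Drop entries whose (headword, content) pair has already been seen."""
    seen = set()
    unique = []
    for headword, content in entries:
        # Compare on whitespace-trimmed text so stray spaces do not hide duplicates.
        key = (headword.strip(), content.strip())
        if key in seen:
            continue
        seen.add(key)
        unique.append((headword, content))  # keep the first occurrence, in order
    return unique
```

Entries that share a headword but differ in content are all kept, so homographs and entries from different sources survive.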
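Later I'll add the information to the description and then delete it.
I'll add the phonetic-transcription entries later.
I try not to touch content that came from the official site (even though I don't know what it's for). If you're interested, you could strip it out with a regex.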
Please post the code here for hiding the dictionary name at the top.
You can add the code below to the CSS file packed in the mdd.
For hiding the dictionary name:
.small1 {
    display: none;
}
And for hiding the separator line:
.entrySeparator {
    display: none;
}
I’ve already updated the CSS file in the mdd to hide both the dictionary name and the separator line.
It should work fine with the new mdd file.
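Solved that via CSS along the way.
Hello, and thanks for making and sharing this. I found a bug: when I look up the word "language" and click the semantic label "computing" on the right, it brings up the headword "Computinga", with "Computing" and "a" stuck together. Similar problems exist in both of the two latest dictionaries. I hope this can be fixed. Many thanks.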