Buying it costs $395, which is too expensive.
I searched on Google and couldn't find a free way to get it.
All I found was an alternative:
The sample only includes entries starting with m.
€250
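Thanks a lot. I had a look: it's sorted alphabetically. Is there a ready-made version sorted by frequency? (If not, I'll write some code to process it later.)
Honestly, word lists like COCA's aren't very useful. Most people only need about 20,000 words, and you can't reach those 20,000 by memorizing a dictionary anyway. Memorizing a dictionary only has a short-term memory effect; in the long run it doesn't help.
I'm not memorizing a dictionary, I'm memorizing n-gram collocations.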
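For context, a 2-gram list is essentially a count of adjacent word pairs in a corpus. A minimal illustration, assuming a naive lowercase regex tokenizer (COCA's own tokenization and tagging are not reproduced here); `bigram_counts` is a hypothetical name:

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs using a naive regex tokenizer."""
    # Lowercase and keep runs of letters/apostrophes; sentence boundaries are ignored
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


counts = bigram_counts("Make a decision. We need to make a decision soon.")
for (w1, w2), n in counts.most_common(3):
    print(w1, w2, n)
```

Pairs like "make a" and "a decision" bubble up because they recur; a real corpus list is the same idea at a much larger scale.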
OK, I've finished the Python code to sort this DSL file.
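```python
# Skip pairs containing very common function words (a set gives fast lookups)
skip_words = {
    "the", "be", "and", "of", "a", "in", "to", "have", "it", "i",
    "that", "for", "you", "he", "with", "on", "do", "say", "this",
}

with open('N-grams-2.dsl', 'r', encoding='utf-16-le') as file:
    lines = file.readlines()

# Split the lines into chunks, treating every 6 lines as one entry
n = 6
groups = [lines[i:i+n] for i in range(0, len(lines), n)]
# Drop the first chunk: it begins with \ufeff, the UTF-16 byte-order mark,
# which the 'utf-16-le' codec keeps (the plain 'utf-16' codec strips it)
del groups[0]

result = []
for group in groups:
    first_line = group[0].strip()  # headword line; strip() removes the trailing newline
    if first_line == '————————':  # separator line, not an entry
        continue
    first_line_split = first_line.split()
    if len(first_line_split) < 2:  # not a two-word entry; avoids an IndexError
        continue
    word1, word2 = first_line_split[0], first_line_split[1]
    if word1.lower() in skip_words or word2.lower() in skip_words:
        continue
    # The frequency is the last whitespace-separated token on the entry's last line
    last_line_last_part = int(group[-1].split()[-1])
    result.append((first_line, last_line_last_part))

# Highest frequency first
result.sort(key=lambda item: item[1], reverse=True)

# Explicit encoding so non-ASCII headwords don't fail on Windows
with open('output.txt', 'w', encoding='utf-8') as output_file:
    for item in result:
        output_file.write(f"{item[0]}, {item[1]}\n")
```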
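The script relies on every entry being exactly 6 lines; a card with more or fewer lines would shift every later chunk. One alternative is to split on the DSL layout itself. Below is a minimal sketch, assuming the usual Lingvo DSL conventions (header lines start with `#`, headword lines start at column 0, card-body lines are indented) and, as in the script above, that the count is the last token of a card's last line. `parse_dsl_counts` and `top_pairs` are hypothetical helper names, and the sketch has not been checked against the actual `N-grams-2.dsl` file.

```python
# Sketch only: assumes the standard DSL layout described above.

def parse_dsl_counts(text):
    """Return (headword, count) pairs from DSL text."""
    entries = []
    headword, body = None, []

    def flush():
        # Record the previous card if its last body line ends in an integer
        if headword is not None and body:
            last_token = body[-1].split()[-1]
            if last_token.isdecimal():
                entries.append((headword, int(last_token)))

    for raw in text.lstrip('\ufeff').splitlines():  # drop a leading BOM if present
        if not raw.strip() or raw.startswith('#'):
            continue  # blank line or header directive such as #NAME
        if raw[0].isspace():
            body.append(raw.strip())  # indented: part of the current card
        else:
            flush()  # a new headword closes the previous card
            headword, body = raw.strip(), []
    flush()
    return entries


def top_pairs(entries, skip_words):
    """Keep entries whose first two words avoid skip_words, most frequent first."""
    kept = []
    for headword, count in entries:
        words = headword.lower().split()
        if len(words) >= 2 and words[0] not in skip_words and words[1] not in skip_words:
            kept.append((headword, count))
    return sorted(kept, key=lambda e: e[1], reverse=True)


# Usage, mirroring the script above:
# with open('N-grams-2.dsl', encoding='utf-16-le') as f:
#     pairs = top_pairs(parse_dsl_counts(f.read()), skip_words)
```

Single-token headwords (such as a separator line) are dropped by `top_pairs`, so no special case for the separator is needed.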