SUBTLEXus word frequency list: spoken American English corpus based on subtitles

Improved frequency measures based on American English subtitles
Brysbaert & New compiled a new frequency measure on the basis of American subtitles (51 million words in total).

Introduction
  • The frequency per million words, called SUBTLEXWF (Subtitle frequency: word form frequency)
  • The percentage of films in which a word occurs, called SUBTLEXCD (Subtitle frequency: contextual diversity; see Adelman, Brown, & Quesada (2006) for the qualities of this measure).
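To make the two measures concrete, here is a minimal sketch of how they could be computed from tokenized subtitle files. This is not the authors' actual pipeline: the function name `subtitle_measures` and the toy data are made up for illustration, and it assumes each film is already a list of lowercase tokens.

```python
from collections import Counter


def subtitle_measures(films):
    """Return {word: (frequency per million words, % of films containing it)}.

    `films` is a list of token lists, one list per film (hypothetical input
    format, assumed for this sketch).
    """
    total_tokens = sum(len(tokens) for tokens in films)
    word_counts = Counter()  # total occurrences of each word
    film_counts = Counter()  # number of films in which each word occurs
    for tokens in films:
        word_counts.update(tokens)
        film_counts.update(set(tokens))  # count each film at most once per word

    measures = {}
    for word, count in word_counts.items():
        per_million = count / total_tokens * 1_000_000  # word form frequency
        cd_percent = film_counts[word] / len(films) * 100  # contextual diversity
        measures[word] = (per_million, cd_percent)
    return measures


# Toy data: four very short "films" (invented for the example).
toy_films = [
    ["you", "know", "you", "are", "right"],
    ["i", "know", "that"],
    ["you", "are", "here"],
    ["that", "is", "right"],
]
measures = subtitle_measures(toy_films)
```

In the toy data, "you" occurs three times but only in two of the four films, so its contextual diversity is 50% no matter how often it repeats within one film. That is the difference between the two measures.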

The percentage of variance accounted for by these measures is significantly higher than the variance accounted for by Kucera & Francis, and Celex.

           Acc (all words, N=37,059)   RT (all words, N=31,201)
SUBTLWF    30.1                        62.3
SUBTLCD    31.3                        62.9
Source

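My word list is still at the stage of collecting vocabulary books, so this is a by-product. I haven't touched it for two weeks because I went off to learn Python.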

Words only
subtlexus.txt (719.9 KB)

Words + frequency
subtlexus-freq.txt (915.3 KB)

Same as above, in JSON format
SUBTLEXus.zip (422.4 KB)

Thank you very much. I was about to give up on using this frequency list because I really couldn't find it. :+1: :+1: :+1:

OK, then I have to delete the files. The original authors do not want them redistributed...

I didn't download it right away and ended up missing it :sob:

:grimacing: In that case, I won't redistribute it either. Thanks.