Improved frequency measures based on American English subtitles
Brysbaert & New compiled a new frequency measure on the basis of American subtitles (51 million words in total).
Introduction
- The frequency per million words, called SUBTLEXWF (Subtitle frequency: word form frequency)
- The percentage of films in which a word occurs, called SUBTLEXCD (Subtitle frequency: contextual diversity; see Adelman, Brown, & Quesada (2006) for the qualities of this measure).
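As a rough illustration of how the two measures are defined, here is a minimal sketch that computes frequency per million words and the percentage of films containing each word from a toy corpus. The function name `subtitle_measures` and the input layout (one token list per film) are assumptions made for this example; they are not the authors' actual pipeline.

```python
from collections import Counter


def subtitle_measures(films):
    """Compute two per-word measures from a list of films.

    `films` is a list of token lists, one list per film (hypothetical layout).
    Returns {word: (frequency_per_million, percent_of_films)}.
    """
    total_tokens = sum(len(tokens) for tokens in films)
    word_counts = Counter()   # how often each word occurs overall
    film_counts = Counter()   # in how many films each word occurs

    for tokens in films:
        word_counts.update(tokens)
        film_counts.update(set(tokens))  # count each film at most once per word

    return {
        word: (
            word_counts[word] * 1_000_000 / total_tokens,  # word form frequency
            100.0 * film_counts[word] / len(films),        # contextual diversity
        )
        for word in word_counts
    }


# Toy corpus: two "films" with 8 tokens in total.
films = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "the", "end"],
]
measures = subtitle_measures(films)
print(measures["the"])  # (375000.0, 100.0)
print(measures["cat"])  # (125000.0, 50.0)
```

In the toy corpus, "the" occurs 3 times out of 8 tokens and appears in both films, while "cat" occurs once and appears in only one of them.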
These measures account for a significantly higher percentage of variance than the Kucera & Francis and Celex frequencies.
| Measure | Acc, all words (N=37,059) | RT, all words (N=31,201) |
| --- | --- | --- |
| SUBTLWF | 30.1 | 62.3 |
| SUBTLCD | 31.3 | 62.9 |
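For readers unfamiliar with "percentage of variance accounted for", the sketch below shows one common way to compute it: the squared correlation (R²) of a simple linear regression, times 100. The source does not say which regression model was used, so this is only an assumed, simplified case, and the numbers in the example are made up for illustration.

```python
def percent_variance_explained(x, y):
    """Return 100 * R^2 for a simple linear regression of y on x.

    For a single predictor, R^2 equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r_squared = sxy ** 2 / (sxx * syy)
    return 100.0 * r_squared


# Made-up data: a frequency predictor and reaction times in ms.
log_freq = [1, 2, 3, 4]
rt = [700, 650, 640, 590]
pct = percent_variance_explained(log_freq, rt)
print(round(pct, 2))
```

A value of 62.3 in the table would then mean that the frequency measure accounts for 62.3% of the variance in the outcome.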
Source
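The word list is still at the wordbook-collection stage, so this is something of a by-product. I haven't touched it for two weeks because I went off to learn Python.
Words only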
subtlexus.txt (719.9 KB)
Words + frequency
subtlexus-freq.txt (915.3 KB)
Same as above, in JSON format
SUBTLEXus.zip (422.4 KB)