COCA Frequency 60,000 — 2020.mdx (v1.2, 2024.1.12)

Data fetched in Dec. 2023 (COCA hasn't been updated since 2019).
PERSONAL STUDY USE ONLY. DO NOT REDISTRIBUTE OR SELL.

COCA Frequency 60,000 — 2020.mdx (1.1 MB)
COCA Frequency 60,000 — 2020.css (3.2 KB)

Version: 1.2
(Jan. 12, 2024, made by @Waylon)

  1. Adaptive width and reduced margins for a more compact layout.
  2. Minor tweaks to font size, color and spacing.
  3. CSS commented in both Chinese and English.

Version: 1.1
(Jan. 7, 2024, made by @Waylon, inspired by @encn)

  1. PoS data is now shown beside the headword as a tag.
  2. Overhauled to use an external stylesheet.
  3. CSS commented in Chinese.

Version: 1.0
(Jan. 5, 2024, made by @Waylon)

NOTE:
This material is for PERSONAL USE ONLY.
Free but WITHOUT ANY WARRANTY.
Includes only public data from:
The Corpus of Contemporary American English (Davies, Mark. 2008-).
Available online at English-Corpora: COCA.


On mobile, hiding the frequency data and the "Rank: " label is recommended to reduce the entry width (see the comments in the .css file).

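Thanks. The previous 60k list was the 2016 edition, and the frequencies now look roughly doubled. There's more data, but I don't know how good the newly added texts are; surely it hasn't gotten worse with the update?
It's nice and clean, but I still think splitting the CSS out into a separate file would be better.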

The OP has ushered in a new era.

What link are you after? Please don't advertise Xianyu (second-hand marketplace) listings here.

I mean the link to the COCA license.

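I'm not entirely sure, but I think the genre data requires buying this:

and the collocation data probably requires this:

This one probably lets you buy the whole database, including the two above.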

Thanks to the OP for the effort; data like this doesn't come easily.

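The OP's txt looks like it was converted from an Excel file and is a bit messy. I extracted and sorted the data, wrote it out as a JSON file, and repacked the mdx with the CSS as an external file.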

json & mdx

COCA Frequency.7z (1.7 MB)
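The post doesn't say how the conversion was done. As a rough sketch of the idea (extract rows, group them by headword, sort by rank, write JSON), the Python below assumes a tab-separated text file whose columns are rank, headword, PoS and frequency. That column order and the function name `txt_to_entries` are hypothetical, not taken from the original files. Values are kept as strings to match the coca_freq.json layout posted later in the thread, and the sample rows reuse figures from that excerpt.

```python
import csv
import io
import json
from collections import defaultdict


def txt_to_entries(lines):
    """Group tab-separated rows into the coca_freq.json layout.

    ASSUMPTION (hypothetical format): each data row is
    rank<TAB>headword<TAB>PoS<TAB>freq. Adjust the unpacking below
    if the real txt export uses a different column order.
    """
    grouped = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines, header rows and anything without a numeric rank.
        if len(row) < 4 or not row[0].strip().isdigit():
            continue
        rank, headword, pos, freq = (cell.strip() for cell in row[:4])
        grouped[headword].append({"PoS": pos, "Rank": rank, "Freq": freq})

    entries = []
    for headword, content in grouped.items():
        # Within one headword, list parts of speech by rank (most frequent first).
        content.sort(key=lambda item: int(item["Rank"]))
        entries.append({"headword": headword, "content": content})
    # Order headwords by their best (lowest) rank.
    entries.sort(key=lambda entry: int(entry["content"][0]["Rank"]))
    return entries


sample = (
    "rank\tword\tPoS\tfreq\n"
    "3\tand\tconjunction\t24778098\n"
    "1\tthe\tarticle\t50033612\n"
    "36203\tand\tnoun\t220\n"
)
entries = txt_to_entries(io.StringIO(sample))
print(json.dumps(entries, ensure_ascii=False, indent=2))
```

To write the result to disk instead of printing it, `json.dump(entries, f, ensure_ascii=False, indent=2)` on a file opened with `encoding="utf-8"` would do.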

The colors and fonts in the OP's original file are great! Could you match them too?
Thanks!

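A suggestion: you could put the headword on the left with the data side by side on the right, and give the PoS tags a uniform width:
(I didn't lay it out this way because I didn't want to abbreviate Rank and Freq, which makes the entry too wide to fit side by side without wrapping on mobile.)
Before: [screenshot]
Mock-up: [screenshot]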

Nicely done! You could carry over the dictionary's notes and release it as a v1.0-xxx version.

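MdxExport couldn't export it, so I exported it with another Python tool. I don't really know .json and haven't used it; I'll look into it when I have time.
If there's a next version, I'd like to borrow from yours. Would it be OK to add your name (encn?)?
The simplified PoS tag is a nice touch. In the next version I plan to drop the Rank and Freq labels as well and use borders for everything.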

coca_freq.json has the following format, which makes it easy for programs to read:

[
  {
    "headword": "the",
    "content": [
      {
        "PoS": "article",
        "Rank": "1",
        "Freq": "50033612"
      }
    ]
  },
  {
    "headword": "be",
    "content": [
      {
        "PoS": "verb",
        "Rank": "2",
        "Freq": "32394756"
      }
    ]
  },
  {
    "headword": "and",
    "content": [
      {
        "PoS": "conjunction",
        "Rank": "3",
        "Freq": "24778098"
      },
      {
        "PoS": "noun",
        "Rank": "36203",
        "Freq": "220"
      }
    ]
  },
...
]
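Since the file is meant to be easy for programs to read, here is a minimal Python sketch of loading it into a lookup table. It assumes the file is named coca_freq.json, is UTF-8 encoded and sits in the working directory; `build_index` and `load_index` are hypothetical helper names. Because Rank and Freq are stored as strings in the file, they are converted to integers here.

```python
import json


def build_index(entries):
    """Map each headword to a list of (PoS, rank, freq) tuples."""
    index = {}
    for entry in entries:
        # Rank and Freq are strings in coca_freq.json, so convert them to int.
        index.setdefault(entry["headword"], []).extend(
            (item["PoS"], int(item["Rank"]), int(item["Freq"]))
            for item in entry["content"]
        )
    return index


def load_index(path="coca_freq.json"):
    """Load the JSON file (assumed UTF-8) and index it by headword."""
    with open(path, encoding="utf-8") as f:
        return build_index(json.load(f))


# Inline sample in the same layout, so the example runs without the file.
sample = json.loads("""[
  {"headword": "and", "content": [
    {"PoS": "conjunction", "Rank": "3", "Freq": "24778098"},
    {"PoS": "noun", "Rank": "36203", "Freq": "220"}
  ]}
]""")
index = build_index(sample)
total = sum(freq for _pos, _rank, freq in index["and"])
print(index["and"][0], total)  # ('conjunction', 3, 24778098) 24778318
```

Summing Freq across a headword's PoS entries, as above, gives a rough total for the word form regardless of part of speech.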

Updated!

This could be a good addition to sound++

How can I set it up with the colors and layout from the OP's post #1? I really like them. Could you share a copy? Thanks!

Do you mean v1.0, or another version?