En.wiktionary.org mdx 20231001 (10月数据完成)

Yes, that would solve part of the duplicate problems, but even then, you have to preprocessing the entire data first (and copy it somewhere like another 700GB).

1 个赞