It is on-demand, and therefore neither periodic nor stale.
There are still a small number of duplicates (older versions of the same article), but I have a way to drop them as part of data cleansing.
How about parsing data from the ZIM files? They are updated regularly.
I think ZIM files drop all quotations and may not include images or audio links. I do remember converting Wiktionary from ZIM, and that is what I found.
I used (title, id, url) as the primary key and kept only the latest date_modified, as a stringent dedupe criterion. The result is a whopping 14% reduction (i.e. duplication) rate by volume.
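Roughly what I do, as a minimal Python sketch. It assumes each snapshot chunk is NDJSON (one article per line) with title, id, url, and date_modified fields; the field names, the placeholder filename, and the byte-based volume metric are assumptions, so adjust them to the actual schema:

```python
import json

def dedupe(rows):
    """Keep only the row with the latest date_modified for each
    (title, id, url) key. Assumes date_modified is an ISO 8601
    string, so lexicographic order matches chronological order."""
    latest = {}
    for row in rows:
        key = (row["title"], row["id"], row["url"])
        kept = latest.get(key)
        if kept is None or row["date_modified"] > kept["date_modified"]:
            latest[key] = row
    return list(latest.values())

def volume(rows):
    # Serialized size in bytes, as one way to measure "by volume";
    # the exact metric behind the 14% figure may differ.
    return sum(len(json.dumps(row)) for row in rows)

with open("chunk_0.ndjson", encoding="utf-8") as f:  # placeholder filename
    rows = [json.loads(line) for line in f]

deduped = dedupe(rows)
print(f"duplicate/stale share by volume: {1 - volume(deduped) / volume(rows):.1%}")
```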
This means the snapshot chunking algorithm is really faulty (it retrieves data regardless of whether it is the current version): a full 14% of its data by volume is duplicate or stale.
Correction: zh.wikipedia has approximately 14% stale duplicate content.
en.wikipedia so far has 0% stale duplicate content. They fixed this bug as of the 23rd; perhaps the zh.wikipedia snapshot downloaded on Oct 11th didn't get the bug fix.
Thanks for sharing! How do I open the example sentences?
Could someone clean up the empty list numbers inside?
Look up an entry and scroll down; you will find the example sentences there (note: not every entry has example sentences).
I thought the HTML dump has not been available since 24 March 2025.
What is the data source from which you created the mdd/mdx of English and Chinese Wikipedia?
That service was a convenience storage server built around the commercial site that serves the snapshots and live data, possibly for profit (it is a commercial domain).
Wikimedia Enterprise - APIs for AI, Search & Knowledge Graphs
So you paid for that service?
I didn’t, but they have paying customers. The data service is capped for free users.
The AI replacement for human-written Wikipedia is here:
The cap is 15 snapshot requests, counted even when a request fails, which is likely for large downloads, so 15 is effectively very low. Plus, the limit is IP-based.
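One mitigation, sketched below: resume a failed download with an HTTP Range request instead of restarting it. Whether the snapshot endpoint honors Range headers, and whether a resumed request still counts toward the cap, are assumptions to verify; the URL and token handling here are placeholders:

```python
import os
import requests

SNAPSHOT_URL = "https://example.invalid/v2/snapshots/zhwiki/download"  # placeholder
TOKEN = os.environ["WME_TOKEN"]  # hypothetical: however you store your token

def resumable_download(url, path, chunk_size=1 << 20):
    """Resume a partial download via an HTTP Range header so a retry
    picks up where the last attempt stopped instead of re-transferring
    the whole multi-gigabyte snapshot."""
    done = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {"Authorization": f"Bearer {TOKEN}"}
    if done:
        headers["Range"] = f"bytes={done}-"
    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # 206 means the server honored the Range; anything else restarts.
        mode = "ab" if resp.status_code == 206 else "wb"
        with open(path, mode) as f:
            for block in resp.iter_content(chunk_size):
                f.write(block)

resumable_download(SNAPSHOT_URL, "zhwiki-snapshot.tar.gz")
```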
Fixed the extra list-number issue.
Also removed the redundant "Category:" prefix.
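For anyone redoing this cleanup themselves, a minimal sketch; treating "分类:" as the localized zh form is my assumption:

```python
def strip_category_prefix(title):
    # Drop a leading "Category:" (or the assumed zh form "分类:")
    # from an entry title; leave other titles untouched.
    for prefix in ("Category:", "分类:"):
        if title.startswith(prefix):
            return title[len(prefix):]
    return title

assert strip_category_prefix("Category:Physics") == "Physics"
```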
many thanks
Updated the CSS (small mdd).

Could the example sentences be collapsed by default?
Probably in the next update.