Search inside Offline Wikipedia Full Text (exe+txt)

I downloaded the entire English Wikipedia dump and split all paragraphs into separate sentences, stored in a single 15 GB plain-text file (compressed to 4 GB for upload). With this simple application, you can search for a phrase and retrieve up to 1,000 sentences that contain it.
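A minimal Python sketch of this kind of phrase search, assuming one sentence per line in the text file. The thread does not show how Wiki-Finder itself is implemented; the function name `find_sentences` and the case-insensitive default are hypothetical choices for illustration.

```python
from typing import Iterable, List

MAX_RESULTS = 1000  # cap taken from the post: "at most 1000 sentences"


def find_sentences(lines: Iterable[str], phrase: str,
                   max_results: int = MAX_RESULTS,
                   case_sensitive: bool = False) -> List[str]:
    """Return up to max_results lines that contain phrase.

    Assumes one sentence per line (hypothetical layout). Accepts any
    iterable of lines, so an open file handle is streamed line by line
    instead of being loaded into memory all at once.
    """
    needle = phrase if case_sensitive else phrase.lower()
    results: List[str] = []
    for line in lines:
        if len(results) >= max_results:
            break  # stop scanning once the cap is reached
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:
            results.append(line.rstrip("\n"))
    return results
```

For example, `with open("Single.txt", encoding="utf-8") as f: hits = find_sentences(f, "some phrase")` scans the file once and stops early when 1,000 matches have been found (the UTF-8 encoding is an assumption about the file). A linear scan of 15 GB is slow on every query; a real tool might build an index instead.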

Download Wiki-Finder


There appear to be 4.89 million duplicate lines, wasting approximately 450 MB of file size.
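One way such figures could be measured, sketched in Python (the thread does not say how the 4.89 million count was obtained): read the file in binary mode, hash each line, and count lines whose hash was already seen, adding up their byte lengths. The function name `duplicate_stats` is hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def duplicate_stats(lines: Iterable[bytes]) -> Tuple[int, int]:
    """Count lines that repeat an earlier line, and the bytes they occupy.

    Lines are compared without their trailing newline. Only a 16-byte
    BLAKE2b digest per distinct line is stored, not the line itself;
    digest collisions are extremely unlikely but not impossible.
    """
    seen = set()
    dup_lines = 0
    dup_bytes = 0
    for line in lines:
        digest = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
        if digest in seen:
            dup_lines += 1
            dup_bytes += len(line)  # includes the line ending, as on disk
        else:
            seen.add(digest)
    return dup_lines, dup_bytes
```

Usage: `with open("Single.txt", "rb") as f: print(duplicate_stats(f))`. On a file of this size the set of digests may still need many gigabytes of RAM.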

Why won't Wiki_Single_Sentences.exe run?

What is the error message?

Could you please describe the exact problem so that I can work on it?

There are millions of identical sentences in the Single.txt file. You can use EmEditor (30-day trial) to remove the duplicate lines. Notepad++ may be able to do the same.
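As an alternative to an editor, here is a minimal Python sketch that removes duplicate lines while keeping the first occurrence and the original order. The function name `unique_lines` and the output file name `Single_dedup.txt` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each distinct line once, in original order (first occurrence wins).

    Lines are compared without their trailing newline, and only a 16-byte
    BLAKE2b digest of each distinct line is kept in memory.
    """
    seen = set()
    for line in lines:
        key = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
        if key not in seen:
            seen.add(key)
            yield line
```

Usage: `with open("Single.txt", "rb") as src, open("Single_dedup.txt", "wb") as dst: dst.writelines(unique_lines(src))`. If the order of sentences does not matter, GNU `sort -u` also removes duplicates and sorts through temporary files on disk, so it does not need to hold every line in memory.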