Supports Python 3.8 only
https://github.com/ffreemt/dzbee
First install fasttext, pycld2, and PyICU, e.g. pip install fasttext pycld2 PyICU
(On systems without a C++ build environment, install the matching whl files from Archived: Python Extension Packages for Windows - Christoph Gohlke,
e.g. pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl PyICU-2.8.1-cp38-cp38-win_amd64.whl
)
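Before installing dzbee itself, it can help to confirm that the interpreter and the three prerequisites are in place. The following is a minimal sketch, not part of dzbee: the helper names missing_modules and python_is_38 are made up for illustration, and the module list assumes that PyICU is imported as icu while fasttext and pycld2 import under their package names.

```python
import sys
from importlib.util import find_spec

# Import names of the prerequisites. PyICU is imported as "icu";
# fasttext and pycld2 use their package names.
PREREQ_MODULES = ["fasttext", "pycld2", "icu"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def python_is_38(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3.8.x."""
    return tuple(version_info[:2]) == (3, 8)


def report():
    """Print a short summary of what still needs attention."""
    if not python_is_38():
        print("dzbee supports Python 3.8 only; found", sys.version.split()[0])
    missing = missing_modules(PREREQ_MODULES)
    if missing:
        print("not importable yet:", ", ".join(missing))
    else:
        print("all prerequisites are importable")
```

Calling report() prints which of the three modules are still missing, so the pip install (or whl install) step above can be repeated for just those.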
Install
pip install dzbee
or
poetry add dzbee
Documentation: dzbee --help
or python -m dzbee
Usage: dzbee file1 file2
or python -m dzbee file1 file2
(default output formats: tsv and xlsx)
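To call the command line from a script, the invocation can be assembled and run with subprocess. This is a sketch under stated assumptions: build_dzbee_cmd and run_dzbee are hypothetical helpers, not part of dzbee, and they use only the python -m dzbee form and the -s and -p flags listed in the help text.

```python
import subprocess
import sys
from typing import List, Optional


def build_dzbee_cmd(
    file1: str,
    file2: Optional[str] = None,
    *,
    need_sep: bool = False,
    show_plot: bool = False,
) -> List[str]:
    """Build the argument list for `python -m dzbee` (hypothetical helper)."""
    # The help text says a single input file must be combined with -s.
    if file2 is None and not need_sep:
        raise ValueError("a single input file requires need_sep=True (-s)")
    cmd = [sys.executable, "-m", "dzbee", file1]
    if file2 is not None:
        cmd.append(file2)
    if need_sep:
        cmd.append("-s")
    if show_plot:
        cmd.append("-p")
    return cmd


def run_dzbee(file1: str, file2: Optional[str] = None, **flags) -> subprocess.CompletedProcess:
    """Run dzbee and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_dzbee_cmd(file1, file2, **flags), check=True)
```

For example, run_dzbee("de.txt", "zh.txt") corresponds to dzbee de.txt zh.txt, and run_dzbee("mixed.txt", need_sep=True) corresponds to dzbee mixed.txt -s. The file names here are placeholders.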
Sample results (Stefan Zweig's Sternstunden der Menschheit (人类的群星闪耀时); the interfering French passages are ignored!)
Usage: python -m dzbee [OPTIONS] file1 [file2]...
Align de-zh texts, fast.
e.g.
* dzbee file1 file2
* dzbee file1 file2 -p # show plots
* dzbee file1 -s
* dzbee file1 file2 -s
Arguments:
  file1 [file2]...  files (absolute or relative paths) to be aligned; if only
                    one file is specified, the -s flag must be used to signal
                    that it is a mixed German/Chinese text file and needs to
                    be separated. [required]
Options:
  --eps FLOAT                     epsilon [default: 10]
  --min-samples INTEGER           eps, min-samples: Larger eps or smaller
                                  min_samples will result in more aligned pairs
                                  but also more false positives (pairs falsely
                                  identified as candidates). On the other hand,
                                  smaller eps or larger min_samples values tend
                                  to miss `good` pairs. [default: 6]
  -s, --need-sep                  Separate input files that are mixed German
                                  and Chinese text.
  -p, --show-plot                 Show heatmap and align trace plots in the
                                  default browser.
  --save-xlsx / --no-save-xlsx    Save xlsx. [default: save-xlsx]
  --save-tsv / --no-save-tsv      Save tsv. [default: save-tsv]
  --save-csv / --no-save-csv      Save csv. [default: no-save-csv]
  -v, -V, --version               Show version info and exit.
  --help                          Show this message and exit.
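
The trade-off described for --eps and --min-samples can be illustrated with a toy density check. The option names echo the eps and min_samples parameters of density-based clustering such as DBSCAN, but whether dzbee uses DBSCAN internally is an assumption here, not something the help text states. The function core_points and the sample data are made up for illustration and are not dzbee code.

```python
from math import dist


def core_points(points, eps, min_samples):
    """Return the points that have at least `min_samples` points
    (themselves included) within distance `eps`."""
    cores = []
    for p in points:
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_samples:
            cores.append(p)
    return cores


# Toy data: ten candidate pairs along a diagonal plus two stray candidates.
points = [(i, i) for i in range(10)] + [(0, 9), (9, 0)]

strict = core_points(points, eps=3, min_samples=6)        # small eps
wider = core_points(points, eps=5, min_samples=6)         # larger eps
fewer_needed = core_points(points, eps=3, min_samples=4)  # smaller min_samples

print(len(strict), len(wider), len(fewer_needed))
```

In this toy data, raising eps or lowering min_samples lets more points pass the density test, which mirrors the "more aligned pairs but also more false positives" wording in the help text; tightening either value has the opposite effect.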