Open source: dzbee, a Python package and command-line tool for aligning German-Chinese texts

Supports Python 3.8 only
https://github.com/ffreemt/dzbee

First install fasttext, pycld2 and PyICU, for example: pip install fasttext pycld2 PyICU
(On systems without a C++ build environment, install the matching whl files from Python Extension Packages for Windows - Christoph Gohlke,
for example: pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl PyICU-2.8.1-cp38-cp38-win_amd64.whl)
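Before installing dzbee itself, it may help to confirm that the three prerequisites can actually be found by the interpreter. A minimal sketch, assuming the usual import names (fasttext, pycld2, and icu for PyICU); the helper name missing_prereqs is made up for illustration and is not part of dzbee:

```python
import importlib.util
import sys

# pip name -> import name; PyICU is imported as "icu" (assumed usual names).
PREREQS = {"fasttext": "fasttext", "pycld2": "pycld2", "PyICU": "icu"}


def missing_prereqs(prereqs=PREREQS):
    """Return the pip names of prerequisites that cannot be located."""
    return [
        pip_name
        for pip_name, module in prereqs.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    # The post states that only Python 3.8 is supported.
    if sys.version_info[:2] != (3, 8):
        print("warning: dzbee supports Python 3.8 only")
    missing = missing_prereqs()
    print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing would need to be installed first, either with pip or from a prebuilt whl as described above.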

Installation
pip install dzbee
or
poetry add dzbee

Documentation: dzbee --help or python -m dzbee --help

Usage: dzbee file1 file2 or python -m dzbee file1 file2 (default output formats: tsv and xlsx)
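For scripted runs, the same command can be launched from Python. This is only a sketch: build_cmd and run_dzbee are hypothetical helpers, not part of dzbee, and the only flags used are -s and -p as listed in the tool's help output. It assumes dzbee is installed for the current interpreter.

```python
import subprocess
import sys


def build_cmd(file1, file2=None, need_sep=False, show_plot=False):
    """Build the argument list for `python -m dzbee` (hypothetical helper)."""
    if file2 is None and not need_sep:
        # Per the help text, a single mixed-language file requires -s.
        raise ValueError("a single input file requires need_sep=True (-s)")
    cmd = [sys.executable, "-m", "dzbee", str(file1)]
    if file2 is not None:
        cmd.append(str(file2))
    if need_sep:
        cmd.append("-s")
    if show_plot:
        cmd.append("-p")
    return cmd


def run_dzbee(file1, file2=None, **kwargs):
    """Run dzbee; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_cmd(file1, file2, **kwargs), check=True)
```

For example, run_dzbee("de.txt", "zh.txt") would align two separate files, and run_dzbee("mixed.txt", need_sep=True) would first split a mixed file; the file names are placeholders.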

Sample results (Stefan Zweig's Sternstunden der Menschheit; the French passages in it do not throw the alignment off :grinning:!)

Usage: python -m dzbee [OPTIONS] file1 [file2]...

  Align de-zh texts, fast.

  e.g.

  * dzbee file1 file2

  * dzbee file1 file2 -p  # show plots

  * dzbee file1 -s

  * dzbee file1 file2 -s

Arguments:
  file1 [file2]...  files (absolute or relative paths) to be aligned; if only
                    one file is specified, the -s flag must be used to signal
                    it's an german/chinese mixed text file and needs to be
                    separated.  [required]

Options:
  --eps FLOAT                   epsilon  [default: 10]
  --min-samples INTEGER         eps, min-samples: Larger esp or smaller
                                min_samples will result in more aligned pairs
                                but also more false positives (pairs falsely
                                identified as candidates). On the other hand,
                                smaller esp or larger min_samples values tend
                                to miss `good` pairs.  [default: 6]
  -s, --need-sep                Separate input files that are mixed german
                                and chinese text.
  -p, --show-plot               Show heatmap and align trace plots in the
                                default browser.
  --save-xlsx / --no-save-xlsx  Save xlsx.  [default: save-xlsx]
  --save-tsv / --no-save-tsv    Save tsv.  [default: save-tsv]
  --save-csv / --no-save-csv    Save csv.  [default: no-save-csv]
  -v, -V, --version             Show version info and exit.
  --help                        Show this message and exit.
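The --eps and --min-samples description reads like the two parameters of density-based clustering such as DBSCAN. Whether dzbee uses DBSCAN internally is an assumption here; the help text does not say so. Under that assumption, the trade-off can be illustrated in plain Python with DBSCAN's core-point test: a point is kept if at least min_samples points (itself included) lie within distance eps. The toy coordinates below are invented.

```python
import math


def core_points(points, eps=10.0, min_samples=6):
    """Return points with at least `min_samples` neighbours within `eps`.

    Only the core-point criterion of DBSCAN (the point itself counts as a
    neighbour); not a full clustering and not dzbee's actual code.
    """
    return [
        p
        for p in points
        if sum(math.dist(p, q) <= eps for q in points) >= min_samples
    ]


# Invented toy data: a diagonal "alignment trace" plus two stray points
# that sit close to each other but far from the trace.
trace = [(i, i) for i in range(0, 40, 2)]
strays = [(5, 35), (8, 33)]
points = trace + strays

strict = core_points(points, eps=5, min_samples=6)    # smaller eps
default = core_points(points, eps=10, min_samples=6)  # CLI defaults
loose = core_points(points, eps=10, min_samples=2)    # smaller min_samples
print(len(strict), len(default), len(loose))
```

In this toy setup the stricter setting keeps nothing, the default values keep the interior of the trace, and the looser setting keeps the whole trace but also admits the two stray points, which mirrors the "more pairs but more false positives" wording in the help text.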

Why not use pycld3? Hahaha

pycld2 and PyICU are there because they are dependencies of another package (polyglot).

The problem I know of is that pycld3 is extremely hard to install on Windows.

It is probably aimed at Linux/Mac; there does not seem to be a whl for Windows on https://www.lfd.uci.edu/~gohlke/pythonlibs/ yet.