Word lookup and translation on Linux with Tesseract (OCR) and GoldenDict

slbtty · May 21, 2023, 23:29

Capture2Text has a Linux port, which is essentially Tesseract with a GUI. For Linux users that is overkill, since a few lines of shell do the same job :)

Call a screenshot tool to save the selected region to a temporary file, run Tesseract on it for OCR, and pass the recognized text to GoldenDict.

[Screen recording: Peek 2023-05-21 19-07]

For the screenshot, KDE uses spectacle; Sway/wlroots uses grim + slurp.
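As a rough sketch of the Sway path: grim writes the PNG to standard output when the output file is `-`, and Tesseract accepts `stdin` and `stdout` in place of file names, so the temporary files can probably be skipped there. The function name `ocr_region_sway` is made up for illustration, and the sketch assumes grim, slurp, and tesseract are installed.

```shell
# Hypothetical helper: select a region, OCR it, and print the recognized text.
#   $1 = Tesseract language code (defaults to eng)
ocr_region_sway() {
    local lang="${1:-eng}"
    local region
    # slurp prints the selected geometry; a cancelled selection returns non-zero.
    region="$(slurp)" || return 1
    # "-" makes grim write the PNG to stdout; tesseract reads it from stdin.
    grim -g "$region" - | tesseract stdin stdout --oem 1 -l "$lang"
}
```

With that function defined, a lookup could be started with `goldendict "$(ocr_region_sway eng)"`.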

To make Tesseract recognize a different language, change the -l eng argument.
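As an illustration of the `-l` switch: Tesseract lists the installed language packs with `--list-langs`, and several languages can be combined with `+`. The helper below is a hypothetical sketch that joins language codes into a single `-l` value. `chi_sim` (Simplified Chinese) and the file name `shot.png` are only examples, and each code needs its matching traineddata package installed.

```shell
# Hypothetical helper: join language codes into the form tesseract expects,
# e.g. "eng+chi_sim".
join_langs() {
    local IFS='+'
    # "$*" joins the arguments with the first character of IFS.
    printf '%s\n' "$*"
}

# Show which language packs are installed (run manually):
#   tesseract --list-langs
#
# OCR an image with English and Simplified Chinese, printing to stdout:
#   tesseract shot.png stdout --oem 1 -l "$(join_langs eng chi_sim)"
```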

#!/usr/bin/env bash

set -e

case $DESKTOP_SESSION in
    sway)
        grim -g "$(slurp)" /tmp/tmp.just_random_name.png
    ;;
    plasmawayland | plasma)
        spectacle --region --nonotify --background \
        --output /tmp/tmp.just_random_name.png
    ;;
    *)
        echo "Unsupported desktop session: $DESKTOP_SESSION" >&2
        exit 1
    ;;
esac

# note that tesseract appends .txt to the output file name
tesseract /tmp/tmp.just_random_name.png /tmp/tmp.just_random_name --oem 1 -l eng

goldendict "$(cat /tmp/tmp.just_random_name.txt)"

rm /tmp/tmp.just_random_name.png
rm /tmp/tmp.just_random_name.txt
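The script above writes to a fixed path in /tmp and, because of `set -e`, leaves the files behind whenever a step fails. A possible variant, sketched under the assumption that `mktemp` is available, uses a private temporary directory with an EXIT trap. It also strips the form feed that Tesseract normally appends to its text output. The function names `capture_region`, `clean_ocr_text`, and `ocr_lookup` are invented for this sketch.

```shell
# Take a region screenshot into the file given as $1 (same tools as the script above).
capture_region() {
    case "$DESKTOP_SESSION" in
        sway)
            grim -g "$(slurp)" "$1"
        ;;
        plasmawayland | plasma)
            spectacle --region --nonotify --background --output "$1"
        ;;
        *)
            echo "Unsupported desktop session: $DESKTOP_SESSION" >&2
            return 1
        ;;
    esac
}

# Drop form feeds, fold line breaks and tabs into spaces, then squeeze and trim.
clean_ocr_text() {
    tr -d '\f' | tr '\n\t' '  ' | tr -s ' ' | sed -e 's/^ //' -e 's/ $//'
}

# The body runs in a subshell (note the parentheses), so the EXIT trap removes
# the temporary directory whether the steps succeed or fail.
ocr_lookup() (
    lang="${1:-eng}"
    workdir="$(mktemp -d)" || exit 1
    trap 'rm -rf "$workdir"' EXIT

    capture_region "$workdir/shot.png" || exit 1
    # tesseract appends .txt to the output base name
    tesseract "$workdir/shot.png" "$workdir/shot" --oem 1 -l "$lang" || exit 1
    goldendict "$(clean_ocr_text < "$workdir/shot.txt")"
)
```

Putting these definitions in a file that starts with `#!/usr/bin/env bash` and ends with `ocr_lookup eng` would give a drop-in replacement for the script.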

In KDE, bind the script to a global shortcut.

ylxdxx · May 23, 2023, 08:31

You can use scrot for the screenshot. Tesseract's accuracy is disappointing; I recommend rapidocr instead.
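Before switching engines, one thing that may be worth trying is Tesseract's page segmentation mode. The default mode (3) performs automatic page segmentation, whereas `--psm 7` treats the image as a single text line and `--psm 8` as a single word, which may suit a small cropped selection better. A minimal sketch, with the hypothetical helper name `ocr_snippet`; how much this helps depends on the font and image size and is not measured here.

```shell
# Hypothetical helper: OCR a small cropped image and print the text.
#   $1 = image path
#   $2 = language code (defaults to eng)
#   $3 = page segmentation mode (defaults to 7, a single text line)
ocr_snippet() {
    tesseract "$1" stdout --oem 1 --psm "${3:-7}" -l "${2:-eng}"
}
```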

xiaoyifang · May 23, 2023, 09:52

This could go into the howto.