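While browsing some forum posts, I noticed people asking how to turn strings of escape sequences back into readable text. Most replies recommended the find-and-replace feature of a particular editor. I can't run that editor here and don't know how well it works, so here is a quick alternative: a single command does the job.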
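The tool used here is uni2ascii (the package also provides its companion command, ascii2uni). Install it first with the following command: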
sudo apt install uni2ascii
Once installed, it is ready to use. For example:
- Chinese to Unicode escapes
echo -n "你好" | uni2ascii -a C -q
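The result is the same text in \x{...} notation, which is what format C produces (see the format list further down).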
- Web pages often contain text in the form
\u6df1\u5733
which can be decoded like this:
echo '\u4F60\u597D' | ascii2uni -a U -q
The result is:
你好
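- Web pages also often contain text in the form
&#x4F60;&#x597D;
which can be decoded like this:
echo '&#x4F60;&#x597D;' | ascii2uni -a H -q
The result is:
你好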
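As a quick sanity check, the encode and decode directions shown above can be chained into a round trip. The following is a minimal sketch, assuming uni2ascii and ascii2uni are both installed and that format U (the \u00E9 form) is accepted by both commands; the variable names are only illustrative.

```shell
#!/bin/sh
# Round trip: Chinese text -> \uXXXX escapes -> Chinese text.
# "original", "escaped" and "roundtrip" are illustrative variable names.
original='你好'

# Encode: non-ASCII characters become \uXXXX escapes (format U).
escaped=$(printf '%s\n' "$original" | uni2ascii -a U -q)

# Decode: turn the \uXXXX escapes back into UTF-8 text.
roundtrip=$(printf '%s\n' "$escaped" | ascii2uni -a U -q)

printf 'escaped:   %s\n' "$escaped"
printf 'roundtrip: %s\n' "$roundtrip"
```

If the round trip works, the last line shows the original text again. printf '%s\n' is used instead of echo so that backslashes are passed through unchanged regardless of which shell runs the script.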
Run ascii2uni -L to see the supported formats:
raw hexadecimal numbers (00E9) R
standard form hexadecimal numbers (0x00E9) X
prefix v decimal (Perl format) (v233) 2
prefix $ hexadecimal ($00E9) 3
prefix 16# hexadecimal (16#00E9) 4
prefix #x hexadecimal (Common Lisp format) (#x00E9) 1
prefix #16r hexadecimal (#16r00E9) 5
prefix \u decimal (\u0233) V
prefix \u hexadecimal (\u00E9) U
prefix \U outside BMP, \u within, hexadecimal (U+0000-U+FFFF) L
prefix U hexadecimal (U00E9) E
prefix u hexadecimal (u00E9) F
prefix %u hexadecimal (%u00E9) 9
prefix U+ hexadecimal (U+00E9) P
prefix X with hexadecimal in single quotes (X'00E9') G
prefix 16# and suffix # hexadecimal (16#00E9#) 6
prefix U in anglebrackets hexadecimal (<U00E9>) A
prefix backslash-x hexadecimal (\x00E9) B
prefix backslash-x hexadecimal in braces (\x{00E9}) C
HTML numeric character references - decimal (&#0233;) D
HTML numeric character references - hexadecimal (&#x00E9;) H
SGML numeric character references - decimal (\#0233;) N
SGML numeric character references - hexadecimal (\#x00E9;) M
octal escapes for 3 low bytes in big-endian order (\000\000\351) O
hexadecimal escapes for 3 low bytes in big-endian order (\x00\x00\xE9) S
decimal escapes for 3 low bytes in big-endian order (\d000\d000\d233) T
hexadecimal UTF-8 with each byte's hex preceded by an =-sign (=C3=A9). RFC 2045 Quoted Printable. I
hexadecimal UTF-8 with each byte's hex preceded by a %-sign (%C3%A9). RFC 2396 URI escape format. J
hexadecimal UTF-8 with each byte's hex preceded by a backslash-x (\xC3\xA9). Apache log format. 7
hexadecimal UTF-8 with each byte's hex surrounded by angle brackets (<C3><A9>) 0
octal UTF-8 with backslash escapes (\303\251) K
HTML character entities Q
all three types of HTML escape: hexadecimal character references, decimal character references, and character entities Y
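To illustrate picking a different letter from the list, here is a small sketch that decodes a percent-encoded (URL-escaped) string with format J. It assumes ascii2uni handles format J as the list describes; the sample string and the variable names are made up for the example.

```shell
#!/bin/sh
# Decode a percent-encoded UTF-8 string using format J (URI escapes).
# %E4%BD%A0 is the UTF-8 encoding of 你 and %E5%A5%BD that of 好.
url_escaped='%E4%BD%A0%E5%A5%BD'

# -a J selects the %XX format, -q suppresses the summary messages.
decoded=$(printf '%s\n' "$url_escaped" | ascii2uni -a J -q)

printf '%s\n' "$decoded"
```

Swapping the letter after -a is all it takes to target one of the other notations in the list.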
Both commands can read directly from a file as well as from standard input. On Linux, conversion really is this simple: a single command.
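Reading from a file looks much the same as reading from a pipe. Here is a minimal sketch, assuming the input file name is passed as the last argument; the temporary directory and the file names escaped.txt and decoded.txt are invented for the example.

```shell
#!/bin/sh
# Decode a whole file of \uXXXX escapes instead of piping text in.
workdir=$(mktemp -d)

# Two sample lines; printf '%s\n' writes the backslashes literally.
printf '%s\n' '\u4F60\u597D' '\u6DF1\u5733' > "$workdir/escaped.txt"

# The file name is given as an argument; decoded text goes to stdout,
# which is redirected into a second file here.
ascii2uni -a U -q "$workdir/escaped.txt" > "$workdir/decoded.txt"

cat "$workdir/decoded.txt"
```

With the two sample lines above, decoded.txt should contain 你好 and 深圳, one per line.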