OALD10 reverse lookup doesn't feel well done. headword: 抠门 cheap

amob · February 6, 2025, 06:52

If jieba was used for the word segmentation, it was probably run in precise mode. I always use full mode when I do segmentation.

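import jieba

sentence = "我来自中国人民大学"  # "I come from Renmin University of China"
# Precise mode (the default)
words = jieba.cut(sentence)
print("Precise mode:  %s" % " ".join(words))
# Full mode
words = jieba.cut(sentence, cut_all=True)
print("Full mode:  %s" % " ".join(words))
# Paddle mode. To my knowledge this only takes effect after jieba.enable_paddle()
# (which needs paddlepaddle installed); otherwise it falls back to precise mode.
words = jieba.cut(sentence, use_paddle=True)
print("Paddle mode:  %s" % " ".join(words))
# Search-engine mode
words = jieba.cut_for_search(sentence)
print("Search mode:  %s" % " ".join(words))

Precise mode:  我 来自 中国人民大学
Full mode:  我 来自 中国 中国人民大学 国人 人民 人民大学 大学
Paddle mode:  我 来自 中国人民大学
Search mode:  我 来自 中国 国人 人民 大学 中国人民大学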

>>> import jieba
>>> seg_list = jieba.cut_for_search("小气的；抠门儿的")
>>> print(", ".join(seg_list))
Building prefix dict from the default dictionary ...
Loading model from cache
Loading model cost 0.564 seconds.
Prefix dict has been built successfully.
小气, 的, ；, 抠门, 门儿, 抠门儿, 的
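
To show why this matters for reverse lookup, here is a rough sketch of a reverse index keyed on segmentation tokens. It is not how the dictionary file is actually built; `build_reverse_index` and `quoted_cut_for_search` are hypothetical names, and I am assuming the definition string above belongs to the entry cheap, as the thread title suggests. To keep it runnable without jieba installed, the tokenizer is a stand-in that returns the search-mode tokens quoted above.

```python
from collections import defaultdict


def build_reverse_index(entries, tokenize):
    """Map every token of a definition to the headwords that contain it."""
    index = defaultdict(set)
    for headword, definition in entries:
        for token in tokenize(definition):
            index[token].add(headword)
    return index


# Stand-in for jieba.cut_for_search: the token list is copied from the
# session output above, so no segmentation library is needed here.
QUOTED_TOKENS = {
    "小气的；抠门儿的": ["小气", "的", "；", "抠门", "门儿", "抠门儿", "的"],
}


def quoted_cut_for_search(text):
    return QUOTED_TOKENS[text]


# Assumption: this definition is attached to the headword "cheap".
entries = [("cheap", "小气的；抠门儿的")]
index = build_reverse_index(entries, quoted_cut_for_search)

print(sorted(index["抠门"]))    # ['cheap']
print(sorted(index["抠门儿"]))  # ['cheap']
```

Because the token list contains both 抠门 and 抠门儿, a query for either form reaches cheap. If the segmenter emitted only 抠门儿 for this definition, a query for 抠门 would find nothing in an index like this one.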