Lexical semantic similarity
Semantic similarity with WordNet
Reference: "Introduction to WordNet and Similarity Computation" (WordNet介绍及相似度计算)
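Get all senses (synsets) of a word:
from nltk.corpus import wordnet as wn  # requires the corpus: nltk.download('wordnet')
print(wn.synsets("dog"))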
>>>[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'),
Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'),
Synset('andiron.n.01'), Synset('chase.v.01')]
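Each synset name in the list above has three dot-separated parts: the lemma, a part-of-speech tag (n noun, v verb, a adjective, s adjective satellite, r adverb) and a two-digit sense number. So 'dog.n.03' is the third noun sense of "dog", and 'chase.v.01' is a verb sense. The helper `parse_synset_name` below is a hypothetical illustration, not part of NLTK. It splits such a name without loading the WordNet data.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_name(name: str) -> SynsetName:
    """Split a WordNet synset name such as 'dog.n.01' (hypothetical helper)."""
    # Split from the right so that any dots inside the lemma are kept.
    lemma, pos, sense = name.rsplit(".", 2)
    if pos not in {"n", "v", "a", "s", "r"}:
        raise ValueError(f"unknown part-of-speech tag: {pos!r}")
    return SynsetName(lemma, pos, int(sense))


print(parse_synset_name("dog.n.01"))    # SynsetName(lemma='dog', pos='n', sense=1)
print(parse_synset_name("chase.v.01"))  # SynsetName(lemma='chase', pos='v', sense=1)
```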
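Compute semantic similarity. path_similarity scores two senses by the shortest path connecting them in the hypernym (is-a) hierarchy: 1 / (path length + 1). The score is 1 for identical senses and approaches 0 as the senses move farther apart.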
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
# Path similarity between the senses 'dog.n.01' and 'cat.n.01' (the first noun senses of "dog" and "cat")
similar = dog.path_similarity(cat)
print(similar)
>>>0.2
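The 0.2 above corresponds to a path of length 4 (dog, canine, carnivore, feline, cat). The sketch below assumes a tiny hand-built taxonomy, `HYPERNYMS`, shaped like that corner of WordNet. It is hypothetical and not loaded from WordNet, whose real hierarchy is far larger. The sketch walks up from each sense to all of its hypernyms and takes the shortest route through a shared ancestor, which is the idea behind path similarity.

```python
from collections import deque

# Hypothetical toy taxonomy: node -> list of direct hypernyms (is-a parents).
# Hand-built for illustration; NOT read from WordNet.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["placental"],
    "domestic_animal": ["animal"],
    "placental": ["animal"],
    "animal": [],
}


def ancestor_distances(graph, node):
    """BFS upward: distance from `node` to each reachable hypernym (itself at 0)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for parent in graph.get(cur, []):
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist


def path_similarity(graph, a, b):
    """1 / (shortest path through a common hypernym + 1); None if unrelated."""
    da = ancestor_distances(graph, a)
    db = ancestor_distances(graph, b)
    common = da.keys() & db.keys()
    if not common:
        return None
    shortest = min(da[c] + db[c] for c in common)
    return 1 / (shortest + 1)


print(path_similarity(HYPERNYMS, "dog", "cat"))  # 0.2 (via carnivore, length 4)
print(path_similarity(HYPERNYMS, "dog", "dog"))  # 1.0
```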
Surface (string) similarity
FuzzyWuzzy
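FuzzyWuzzy is a Python library for fuzzy string matching. It scores a pair of strings with an integer from 0 to 100 based on Levenshtein-style edit similarity.
Installation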
pip install fuzzywuzzy
Usage
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Simple ratio: similarity of the two whole strings
fuzz.ratio("this is a test", "this is a test!")
>>> 97
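When python-Levenshtein is not installed, FuzzyWuzzy falls back to the standard library's difflib.SequenceMatcher and reports the rounded value of 100 * ratio(). That ratio is 2*M/T, where M is the number of matching characters and T is the combined length. The function `simple_ratio` below is a hypothetical stand-in built on that fallback, not FuzzyWuzzy's own code. With the C extension installed, the library's scores may differ slightly.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """0..100 similarity: 2*M/T from difflib, rounded (hypothetical helper)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# 14 matching characters out of 14 + 15 = 29: 2*14/29 = 0.9655...
print(simple_ratio("this is a test", "this is a test!"))  # 97
# Word order matters here: only 20 of 22 characters line up.
print(simple_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 91
```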
# Partial ratio: best match of the shorter string against substrings of the longer one
fuzz.partial_ratio("this is a test", "this is a test!")
>>> 100
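The idea behind partial ratio can be sketched by sliding a window the length of the shorter string across the longer one and keeping the best score. This is a simplified, exhaustive version, and `partial_ratio_sketch` is a hypothetical name. FuzzyWuzzy's own partial_ratio only tries windows aligned to difflib's matching blocks, so it is faster and can occasionally score differently.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best 0..100 ratio of the shorter string vs. equal-length windows of the longer."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
print(partial_ratio_sketch("test", "this is a test!"))            # 100
```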
# Token sort ratio: ignore word order
fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
>>> 91
fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
>>> 100
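Token sort ratio normalizes both strings, sorts their words alphabetically and then compares the results with the simple ratio. Word order therefore stops mattering. In the sketch below, `normalize` and `token_sort_ratio_sketch` are hypothetical helpers. The normalization (lowercase, non-alphanumeric characters turned into spaces) is an assumption modeled loosely on FuzzyWuzzy's default preprocessing.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase and turn runs of non-alphanumeric characters into single spaces."""
    return re.sub(r"\W+", " ", s.lower()).strip()


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Sort the words of each string, then compute the 0..100 difflib ratio."""
    sa = " ".join(sorted(normalize(a).split()))
    sb = " ".join(sorted(normalize(b).split()))
    return round(100 * SequenceMatcher(None, sa, sb).ratio())


print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
# Duplicated words still count, so this pair is not a perfect match:
print(token_sort_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 84
```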
# Token set ratio: ignore duplicated words and compare word sets (a subset scores 100)
fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
>>> 84
fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
>>> 100
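Token set ratio works on sets of words. Let the shared words, sorted and joined, be the intersection string. Each side's string is that intersection followed by its own leftover words. The score is the best simple ratio among the three pairings. If one string's words are a subset of the other's, the intersection equals that string, which is why the example above scores 100. The sketch below follows that description. `token_set_ratio_sketch` is a hypothetical name, and the normalization is an assumption.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def _words(s: str) -> set:
    """Lowercased set of alphanumeric words (duplicates collapse)."""
    return set(re.sub(r"\W+", " ", s.lower()).split())


def token_set_ratio_sketch(a: str, b: str) -> int:
    ta, tb = _words(a), _words(b)
    common = " ".join(sorted(ta & tb))
    # Intersection first, then the words only this side has.
    t1 = (common + " " + " ".join(sorted(ta - tb))).strip()
    t2 = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(_ratio(common, t1), _ratio(common, t2), _ratio(t1, t2))


print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
print(token_set_ratio_sketch("fuzzy bear", "fuzzy bear with extra words"))  # 100
```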
# Pick the most similar strings from a list of candidates
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
process.extract("new york jets", choices, limit=2)
>>> [('New York Jets', 100), ('New York Giants', 78)]
process.extractOne("cowboys", choices)
>>> ("Dallas Cowboys", 90)
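Mechanically, process.extract scores every candidate against the query and returns the best `limit` matches, and extractOne returns only the top match. The sketch below shows that ranking step with a hypothetical scorer, `lower_ratio` (difflib ratio on lowercased strings). FuzzyWuzzy's default scorer (WRatio) combines several of the ratios above with extra preprocessing, so the numbers here differ from the 78 and 90 shown above. Only the ranking logic is meant to carry over. `extract` and `extract_one` are hypothetical names.

```python
import heapq
from difflib import SequenceMatcher


def lower_ratio(a: str, b: str) -> int:
    """Hypothetical scorer: 0..100 difflib ratio after lowercasing both strings."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract(query, choices, scorer=lower_ratio, limit=5):
    """Return up to `limit` (choice, score) pairs, highest score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one(query, choices, scorer=lower_ratio):
    """Return the single best (choice, score) pair, or None for no choices."""
    best = extract(query, choices, scorer, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))  # 'New York Jets' first, with 100
print(extract_one("cowboys", choices))  # ('Dallas Cowboys', 67)
```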
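Please credit the source when reposting. You are welcome to check the sources cited in this article and to point out any errors or unclear wording, either in the comments below or by email to changzeyan@foxmail.com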