Jaro Winkler
class py_stringmatching.similarity_measure.jaro_winkler.JaroWinkler(prefix_weight=0.1)
Computes the Jaro-Winkler measure.
The Jaro-Winkler measure is designed to capture cases where two strings have a low Jaro score, but share a prefix and thus are likely to match.
Parameters: prefix_weight (float) – Weight to give to the prefix (defaults to 0.1).
prefix_weight
float – An attribute to store the prefix weight.
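The prefix adjustment described above can be sketched in plain Python. This is a minimal illustration of the textbook Jaro and Jaro-Winkler definitions, not the library's own code: the function names `jaro` and `jaro_winkler`, the `max_prefix` parameter, and the cap of four prefix characters are assumptions taken from the usual formulation.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (illustrative, not the library's code)."""
    if not s1 or not s2:
        return 0.0
    # Two equal characters count as a match only if their positions
    # are at most `window` apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are
    # half-transpositions; two of them make one transposition.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_weight: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro score boosted by the shared prefix (assumed cap: 4 characters)."""
    j = jaro(s1, s2)
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    # The boost grows with the prefix length and shrinks as j nears 1.
    return j + prefix_len * prefix_weight * (1.0 - j)
```

With these assumptions, `jaro_winkler('DWAYNE', 'DUANE')` gives a Jaro score of about 0.822 and a one-character shared prefix, which lifts the result to about 0.84.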
get_raw_score(string1, string2)
Computes the raw Jaro-Winkler score between two strings.
Parameters: string1, string2 (str) – Input strings.
Returns: Jaro-Winkler similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.
Examples
>>> from py_stringmatching.similarity_measure.jaro_winkler import JaroWinkler
>>> jw = JaroWinkler()
>>> jw.get_raw_score('MARTHA', 'MARHTA')
0.9611111111111111
>>> jw.get_raw_score('DWAYNE', 'DUANE')
0.84
>>> jw.get_raw_score('DIXON', 'DICKSONX')
0.8133333333333332
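The first example value can be reproduced by hand, assuming the standard Jaro-Winkler definition (the symbols below are illustrative and do not come from the library). For 'MARTHA' and 'MARHTA', all six characters match (m = 6), the swapped T/H pair counts as one transposition (t = 1), the shared prefix 'MAR' has length ℓ = 3, and the default prefix weight is p = 0.1:

```latex
\begin{align*}
\mathrm{Jaro} &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
  = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{5}{6}\right)
  = \frac{17}{18} \approx 0.9444 \\
\mathrm{JW} &= \mathrm{Jaro} + \ell \, p \, (1 - \mathrm{Jaro})
  = \frac{17}{18} + 3 \cdot 0.1 \cdot \frac{1}{18}
  = \frac{173}{180} \approx 0.9611
\end{align*}
```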
get_sim_score(string1, string2)
Computes the normalized Jaro-Winkler similarity score between two strings. This method simply calls get_raw_score.
Parameters: string1, string2 (str) – Input strings.
Returns: Normalized Jaro-Winkler similarity (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.
Examples
>>> from py_stringmatching.similarity_measure.jaro_winkler import JaroWinkler
>>> jw = JaroWinkler()
>>> jw.get_sim_score('MARTHA', 'MARHTA')
0.9611111111111111
>>> jw.get_sim_score('DWAYNE', 'DUANE')
0.84
>>> jw.get_sim_score('DIXON', 'DICKSONX')
0.8133333333333332