Jaro Winkler

class py_stringmatching.similarity_measure.jaro_winkler.JaroWinkler(prefix_weight=0.1)

Computes the Jaro-Winkler measure.

The Jaro-Winkler measure is designed to capture cases where two strings have a low Jaro score but share a common prefix, and are therefore likely to match.
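To make the description concrete, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler formulas. It is not the py_stringmatching implementation: the helper names `jaro` and `jaro_winkler`, the four-character cap on the prefix, and the conventions for identical and empty strings are assumptions of this sketch. On the three example pairs used on this page it reproduces the documented scores up to floating-point rounding.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings (textbook definition, hypothetical helper)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if equal and no further apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    s1_flags = [False] * len(s1)
    s2_flags = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not s2_flags[j] and s2[j] == ch:
                s1_flags[i] = s2_flags[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters, in the order they occur in each string.
    s1_seq = [c for c, f in zip(s1, s1_flags) if f]
    s2_seq = [c for c, f in zip(s2, s2_flags) if f]
    # t is half the number of positions where the matched sequences disagree.
    t = sum(a != b for a, b in zip(s1_seq, s2_seq)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro-Winkler similarity: boost the Jaro score by the shared prefix."""
    sim = jaro(s1, s2)
    # Length of the common prefix, capped at 4 characters (assumed, per Winkler).
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_weight * (1 - sim)


if __name__ == "__main__":
    for a, b in [("MARTHA", "MARHTA"), ("DWAYNE", "DUANE"), ("DIXON", "DICKSONX")]:
        print(a, b, round(jaro_winkler(a, b), 4))
```

Unlike the library method, this sketch does not raise TypeError for non-string inputs.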

Parameters: prefix_weight (float) – Weight to give to the prefix (defaults to 0.1).
prefix_weight

float – An attribute to store the prefix weight.
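For reference, the textbook formulas behind the measure are below, in assumed notation: m is the number of matching characters, t is half the number of transpositions, ℓ is the length of the common prefix (capped at 4 in Winkler's formulation; that the library uses the same cap is an assumption here), and p is prefix_weight. The Jaro score is taken as 0 when m = 0. Because ℓ ≤ 4, keeping p ≤ 0.25 guarantees ℓp ≤ 1, so the adjusted score stays in [0, 1].

```latex
\mathrm{sim}_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right),
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell\, p\,(1 - \mathrm{sim}_j)
```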

get_prefix_weight()

Get prefix weight.

Returns: prefix weight (float).
get_raw_score(string1, string2)

Computes the raw Jaro-Winkler score between two strings.

Parameters: string1, string2 (str) – Input strings.
Returns: Jaro-Winkler similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> jw = JaroWinkler()
>>> jw.get_raw_score('MARTHA', 'MARHTA')
0.9611111111111111
>>> jw.get_raw_score('DWAYNE', 'DUANE')
0.84
>>> jw.get_raw_score('DIXON', 'DICKSONX')
0.8133333333333332
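The first example can be checked by hand. The sketch below redoes the arithmetic with exact fractions, using counts derived by hand from the standard Jaro matching rules rather than read from the library: all six characters of 'MARTHA' and 'MARHTA' match, the swapped 'TH'/'HT' counts as one transposition, and the common prefix 'MAR' has length 3.

```python
from fractions import Fraction

# Hand-derived counts for 'MARTHA' vs 'MARHTA' (assumed, not library output).
len1 = len2 = 6
m = 6           # all six characters match within the search window
t = 1           # two out-of-order matches ('TH' vs 'HT') = one transposition
prefix_len = 3  # common prefix 'MAR'
p = Fraction(1, 10)  # default prefix_weight

jaro = (Fraction(m, len1) + Fraction(m, len2) + Fraction(m - t, m)) / 3
jw = jaro + prefix_len * p * (1 - jaro)

print(jaro)       # 17/18
print(jw)         # 173/180
print(float(jw))  # matches the documented 0.9611111111111111 up to rounding
```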
get_sim_score(string1, string2)

Computes the normalized Jaro-Winkler similarity score between two strings. Simply calls get_raw_score.

Parameters: string1, string2 (str) – Input strings.
Returns: Normalized Jaro-Winkler similarity (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> jw = JaroWinkler()
>>> jw.get_sim_score('MARTHA', 'MARHTA')
0.9611111111111111
>>> jw.get_sim_score('DWAYNE', 'DUANE')
0.84
>>> jw.get_sim_score('DIXON', 'DICKSONX')
0.8133333333333332
set_prefix_weight(prefix_weight)

Set prefix weight.

Parameters: prefix_weight (float) – Weight to give to the prefix.
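The prefix weight controls how much a shared prefix lifts the Jaro score. The hypothetical helper `winkler_boost` below applies the standard Winkler adjustment to a precomputed Jaro score so that different weights can be compared. The Jaro value 37/45 for 'DWAYNE' vs 'DUANE' (four matches, no transpositions, prefix length 1) is computed by hand. This documentation does not say whether set_prefix_weight rejects values above 0.25, so the last line only shows that such a weight can push the score past 1 when the prefix is four characters long.

```python
from fractions import Fraction


def winkler_boost(jaro: Fraction, prefix_len: int, prefix_weight: Fraction) -> Fraction:
    """Hypothetical helper: apply the Winkler prefix adjustment to a Jaro score."""
    length = min(prefix_len, 4)  # assumed cap of four prefix characters
    return jaro + length * prefix_weight * (1 - jaro)


# Hand-computed Jaro score for 'DWAYNE' vs 'DUANE': m = 4, t = 0.
jaro_dd = Fraction(37, 45)

for weight in (Fraction(0), Fraction(1, 10), Fraction(1, 4)):
    # A weight of 0 leaves the Jaro score unchanged; 1/10 is the default.
    print(weight, winkler_boost(jaro_dd, 1, weight))
# Prints 0 37/45, then 1/10 21/25 (= 0.84), then 1/4 13/15

# With a 4-character prefix, a weight above 1/4 can exceed 1.
print(winkler_boost(Fraction(1, 2), 4, Fraction(3, 10)))  # 11/10
```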