Jaro

class py_stringmatching.similarity_measure.jaro.Jaro

Computes the Jaro measure.

The Jaro measure is an edit-distance-style similarity measure, developed mainly to compare short strings such as first and last names. Scores range from 0 (no matching characters) to 1 (identical strings).
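As a rough illustration of how such a score can be computed, the sketch below implements the textbook Jaro formula, (m/|s1| + m/|s2| + (m - t)/m) / 3, where m is the number of matching characters and t is half the number of matched characters that appear in a different order. It is not the library's source code: the function name jaro_similarity is hypothetical, and the handling of empty strings (returning 0.0) is an assumption made for this example.

```python
def jaro_similarity(s1, s2):
    """Textbook Jaro similarity; an illustrative sketch, not py_stringmatching's code."""
    if not isinstance(s1, str) or not isinstance(s2, str):
        raise TypeError("both inputs must be strings")
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0  # assumption: an empty input scores 0

    # Two equal characters count as a match only if their positions
    # differ by at most this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that line up in a different order,
    # counted in pairs.
    seq1 = [c for c, flag in zip(s1, matched1) if flag]
    seq2 = [c for c, flag in zip(s2, matched2) if flag]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0
```

For 'MARTHA' and 'MARHTA' all six characters match and one pair (T, H) is transposed, so the formula gives (1 + 1 + 5/6) / 3, which agrees with the documented score of about 0.944.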

get_raw_score(string1, string2)

Computes the raw Jaro score between two strings.

Parameters: string1, string2 (str) – Input strings.
Returns: Jaro similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> from py_stringmatching.similarity_measure.jaro import Jaro
>>> jaro = Jaro()
>>> jaro.get_raw_score('MARTHA', 'MARHTA')
0.9444444444444445
>>> jaro.get_raw_score('DWAYNE', 'DUANE')
0.8222222222222223
>>> jaro.get_raw_score('DIXON', 'DICKSONX')
0.7666666666666666

get_sim_score(string1, string2)

Computes the normalized Jaro similarity score between two strings. Because the raw Jaro score already lies between 0 and 1, this method simply calls get_raw_score.

Parameters: string1, string2 (str) – Input strings.
Returns: Normalized Jaro similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> from py_stringmatching.similarity_measure.jaro import Jaro
>>> jaro = Jaro()
>>> jaro.get_sim_score('MARTHA', 'MARHTA')
0.9444444444444445
>>> jaro.get_sim_score('DWAYNE', 'DUANE')
0.8222222222222223
>>> jaro.get_sim_score('DIXON', 'DICKSONX')
0.7666666666666666