Soundex

Soundex phonetic similarity measure

class py_stringmatching.similarity_measure.soundex.Soundex

Soundex phonetic similarity measure class.

get_raw_score(string1, string2)

Computes the Soundex phonetic similarity between two strings.

Phonetic measures such as Soundex match strings based on how they sound. These measures are especially effective for matching names, since names are often spelled in different ways that sound the same. For example, Meyer, Meier, and Mire sound the same, as do Smith, Smithe, and Smythe.

Soundex is used primarily to match surnames. It does not work as well for names of East Asian origin, because much of the discriminating power of these names resides in the vowel sounds, which the Soundex code ignores.
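To make the description above concrete, here is a minimal sketch of the classic American Soundex encoding, under stated assumptions. It is not the library's own implementation: the helper names `soundex_code` and `soundex_match` are hypothetical, and dropping non-letter characters before encoding is an assumption chosen to agree with the 'a,,li' example in this page.

```python
# Letter groups of the classic American Soundex scheme.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_code(name):
    """Return a 4-character Soundex code (hypothetical helper)."""
    # Assumption: characters outside A-Z are dropped before encoding.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("input must contain at least one letter")
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    # Pad with zeros and truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


def soundex_match(string1, string2):
    """Return 1 if both strings share a Soundex code, else 0 (hypothetical helper)."""
    return int(soundex_code(string1) == soundex_code(string2))
```

Because vowels carry no digit, Meyer, Meier, and Mire all collapse to the same code in this sketch, which is also why names distinguished mainly by their vowels are hard to tell apart.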

Parameters: string1, string2 (str) – Input strings.
Returns: Soundex similarity score (int).
Raises: TypeError – If the inputs are not strings.

Examples

>>> s = Soundex()
>>> s.get_raw_score('Robert', 'Rupert')
1
>>> s.get_raw_score('Sue', 's')
1
>>> s.get_raw_score('Gough', 'Goff')
0
>>> s.get_raw_score('a,,li', 'ali')
1

get_sim_score(string1, string2)

Computes the normalized Soundex similarity between two strings.

Parameters: string1, string2 (str) – Input strings.
Returns: Normalized Soundex similarity (int).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> s = Soundex()
>>> s.get_sim_score('Robert', 'Rupert')
1
>>> s.get_sim_score('Sue', 's')
1
>>> s.get_sim_score('Gough', 'Goff')
0
>>> s.get_sim_score('a,,li', 'ali')
1