Hamming Distance

class py_stringmatching.similarity_measure.hamming_distance.HammingDistance

Computes Hamming distance.

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Thus, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
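
The definition above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's implementation: the function name `hamming_distance` is hypothetical, and the input checks are an assumption modeled on the exceptions documented for `get_raw_score`.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    # Reject non-string inputs, including None (assumed to mirror the
    # TypeError documented for get_raw_score).
    if not isinstance(s1, str) or not isinstance(s2, str):
        raise TypeError("both inputs must be strings")
    # The distance is only defined for strings of the same length.
    if len(s1) != len(s2):
        raise ValueError("input strings must have the same length")
    # Walk both strings in parallel and count the mismatching positions.
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming_distance("alex", "john"))
print(hamming_distance("JOHN", "john"))
```

The comparison is case-sensitive, so 'JOHN' and 'john' differ at every position.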

get_raw_score(string1, string2)

Computes the raw Hamming distance between two strings.

Parameters:

string1, string2 (str) – Input strings.

Returns:

Hamming distance (int).

Raises:
  • TypeError – If the inputs are not strings or if one of the inputs is None.
  • ValueError – If the input strings are not of the same length.

Examples

>>> hd = HammingDistance()
>>> hd.get_raw_score('', '')
0
>>> hd.get_raw_score('alex', 'john')
4
>>> hd.get_raw_score(' ', 'a')
1
>>> hd.get_raw_score('JOHN', 'john')
4

get_sim_score(string1, string2)

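Computes the normalized Hamming similarity score between two strings. The score lies between 0.0 (the strings differ at every position) and 1.0 (the strings are identical).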

Parameters:

string1, string2 (str) – Input strings.

Returns:

Normalized Hamming similarity score (float).

Raises:
  • TypeError – If the inputs are not strings or if one of the inputs is None.
  • ValueError – If the input strings are not of the same length.

Examples

>>> hd = HammingDistance()
>>> hd.get_sim_score('', '')
1.0
>>> hd.get_sim_score('alex', 'john')
0.0
>>> hd.get_sim_score(' ', 'a')
0.0
>>> hd.get_sim_score('JOHN', 'john')
0.0
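
This page does not spell out how the score is normalized. One formula consistent with the examples above is 1 - distance / length, with two empty strings treated as identical (score 1.0). The sketch below assumes that formula. The function name `hamming_similarity` is hypothetical and is not part of py_stringmatching.

```python
def hamming_similarity(s1, s2):
    """Return 1 - (Hamming distance / length) for equal-length strings."""
    # Input checks assumed to mirror the documented TypeError / ValueError.
    if not isinstance(s1, str) or not isinstance(s2, str):
        raise TypeError("both inputs must be strings")
    if len(s1) != len(s2):
        raise ValueError("input strings must have the same length")
    # Assumption: two empty strings are identical, which also avoids
    # dividing by zero.
    if len(s1) == 0:
        return 1.0
    # Count mismatching positions, then scale by the common length.
    distance = sum(c1 != c2 for c1, c2 in zip(s1, s2))
    return 1.0 - distance / len(s1)


print(hamming_similarity("alex", "john"))
print(hamming_similarity("karolin", "kathrin"))
```

Under this assumption, strings that differ in only some positions receive an intermediate score: 'karolin' and 'kathrin' differ in 3 of 7 positions, giving 1 - 3/7.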