Hamming Distance
class py_stringmatching.similarity_measure.hamming_distance.HammingDistance
Computes Hamming distance.
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Thus, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
get_raw_score(string1, string2)
Computes the raw Hamming distance between two strings.
- Parameters
string1 (str) – First input string.
string2 (str) – Second input string.
- Returns
Hamming distance (int).
- Raises
TypeError – If the inputs are not strings or if one of the inputs is None.
ValueError – If the input strings are not of the same length.
Examples
>>> from py_stringmatching.similarity_measure.hamming_distance import HammingDistance
>>> hd = HammingDistance()
>>> hd.get_raw_score('', '')
0
>>> hd.get_raw_score('alex', 'john')
4
>>> hd.get_raw_score(' ', 'a')
1
>>> hd.get_raw_score('JOHN', 'john')
4
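The documented behaviour of get_raw_score can be approximated in plain Python. The following is a minimal sketch, not the library's implementation: the function name `hamming_raw_score` is hypothetical, and the error messages are assumptions chosen for illustration.

```python
def hamming_raw_score(string1, string2):
    """Count the positions at which two equal-length strings differ.

    Hypothetical helper for illustration; not part of py_stringmatching.
    """
    # Mirror the documented TypeError for None or non-string inputs.
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError("both inputs must be strings")
    # Mirror the documented ValueError for strings of different lengths.
    if len(string1) != len(string2):
        raise ValueError("input strings must be of the same length")
    # Walk both strings in lockstep and count the mismatching positions.
    return sum(c1 != c2 for c1, c2 in zip(string1, string2))


print(hamming_raw_score("alex", "john"))  # 4
print(hamming_raw_score("JOHN", "john"))  # 4
```

The comparison is case-sensitive, which is why 'JOHN' and 'john' differ in all four positions.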
get_sim_score(string1, string2)
Computes the normalized Hamming similarity score between two strings.
- Parameters
string1 (str) – First input string.
string2 (str) – Second input string.
- Returns
Normalized Hamming similarity score (float).
- Raises
TypeError – If the inputs are not strings or if one of the inputs is None.
ValueError – If the input strings are not of the same length.
Examples
>>> from py_stringmatching.similarity_measure.hamming_distance import HammingDistance
>>> hd = HammingDistance()
>>> hd.get_sim_score('', '')
1.0
>>> hd.get_sim_score('alex', 'john')
0.0
>>> hd.get_sim_score(' ', 'a')
0.0
>>> hd.get_sim_score('JOHN', 'john')
0.0
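The normalization itself is not spelled out here. One formula consistent with the examples above is 1 - distance / length, with two empty strings treated as identical (score 1.0). The sketch below assumes that formula; the function name `hamming_sim_score` is hypothetical and the code is not taken from the library.

```python
def hamming_sim_score(string1, string2):
    """Return a similarity in [0.0, 1.0] for two equal-length strings.

    Hypothetical helper; assumes similarity = 1 - distance / length.
    """
    # Mirror the documented TypeError for None or non-string inputs.
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError("both inputs must be strings")
    # Mirror the documented ValueError for strings of different lengths.
    if len(string1) != len(string2):
        raise ValueError("input strings must be of the same length")
    # Two empty strings are identical; this also avoids dividing by zero.
    if len(string1) == 0:
        return 1.0
    # Raw Hamming distance: number of mismatching positions.
    distance = sum(c1 != c2 for c1, c2 in zip(string1, string2))
    # Assumed normalization: fraction of positions that match.
    return 1.0 - distance / len(string1)


print(hamming_sim_score("alex", "john"))  # 0.0
print(hamming_sim_score("abcd", "abcf"))  # 0.75
```

Under this assumption a score of 1.0 means the strings are identical and 0.0 means they differ at every position.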