Bag Distance

Bag distance measure

class py_stringmatching.similarity_measure.bag_distance.BagDistance

Bag distance measure class.

get_raw_score(string1, string2)

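Computes the bag distance between two strings.

For two strings X and Y (passed as string1 and string2), the bag distance is \(\max( |bag(X) - bag(Y)|, |bag(Y) - bag(X)| )\), where bag(S) is the multiset of characters in S, the minus sign denotes multiset difference, and |·| counts elements with multiplicity.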

Parameters: string1, string2 (str) – Input strings
Returns: Bag distance (int)
Raises: TypeError – If the inputs are not strings

Examples

>>> bd = BagDistance()
>>> bd.get_raw_score('cat', 'hat')
1
>>> bd.get_raw_score('Niall', 'Neil')
2
>>> bd.get_raw_score('aluminum', 'Catalan')
5
>>> bd.get_raw_score('ATCG', 'TAGC')
0
>>> bd.get_raw_score('abcde', 'xyz')
5
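
The values above can be reproduced without the library. The following is a minimal sketch, not the library's actual implementation: bag_distance is a hypothetical standalone helper that applies the formula directly, using collections.Counter as the multiset.

```python
from collections import Counter


def bag_distance(string1: str, string2: str) -> int:
    """Hypothetical helper: bag distance computed from the definition."""
    bag1, bag2 = Counter(string1), Counter(string2)
    # Counter subtraction drops zero and negative counts, which is
    # exactly multiset difference.
    only_in_1 = sum((bag1 - bag2).values())
    only_in_2 = sum((bag2 - bag1).values())
    return max(only_in_1, only_in_2)


print(bag_distance('Niall', 'Neil'))  # bag differences are {a, l} and {e}
```

For 'Niall' and 'Neil', the characters left over on the first side are a and one l (size 2), and on the second side only e (size 1), so the distance is 2. Because only character counts matter, any two anagrams such as 'ATCG' and 'TAGC' have distance 0.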


get_sim_score(string1, string2)

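Computes the normalized bag similarity between two strings. The score lies between 0.0 and 1.0, where 1.0 means the two strings contain exactly the same characters with the same counts.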

Parameters: string1, string2 (str) – Input strings
Returns: Normalized bag similarity (float)
Raises: TypeError – If the inputs are not strings

Examples

>>> bd = BagDistance()
>>> bd.get_sim_score('cat', 'hat')
0.6666666666666667
>>> bd.get_sim_score('Niall', 'Neil')
0.6
>>> bd.get_sim_score('aluminum', 'Catalan')
0.375
>>> bd.get_sim_score('ATCG', 'TAGC')
1.0
>>> bd.get_sim_score('abcde', 'xyz')
0.0
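
The page does not state how the score is normalized. The sketch below assumes the similarity is one minus the bag distance divided by the length of the longer string, which agrees with every example above. bag_similarity is a hypothetical helper, and its handling of two empty strings is likewise an assumption rather than documented behavior.

```python
from collections import Counter


def bag_similarity(string1: str, string2: str) -> float:
    """Hypothetical helper: 1 - bag_distance / max(len(string1), len(string2))."""
    if not string1 and not string2:
        # Assumption: two empty strings are treated as identical.
        return 1.0
    bag1, bag2 = Counter(string1), Counter(string2)
    distance = max(sum((bag1 - bag2).values()),
                   sum((bag2 - bag1).values()))
    # The distance never exceeds the longer length, so the result is in [0, 1].
    return 1.0 - distance / max(len(string1), len(string2))


print(bag_similarity('aluminum', 'Catalan'))  # 1 - 5/8
```

Under this assumption, 'aluminum' and 'Catalan' have a bag distance of 5 and a longer length of 8, giving 1 - 5/8 = 0.375.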
