Bag Distance

Bag distance measure

class py_stringmatching.similarity_measure.bag_distance.BagDistance

Bag distance measure class.

get_raw_score(string1, string2)

Computes the bag distance between two strings.

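For two strings X and Y, the bag distance is \(\max(|\mathrm{bag}(X) - \mathrm{bag}(Y)|, |\mathrm{bag}(Y) - \mathrm{bag}(X)|)\), where \(\mathrm{bag}(X)\) is the multiset of characters of X, \(-\) denotes multiset difference, and \(|\cdot|\) counts elements with multiplicity.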
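The formula maps naturally onto Python's `collections.Counter`, whose `-` operator implements multiset difference (it drops counts that fall to zero or below). The following is a minimal sketch under that reading, not the library's own implementation; the function name `bag_distance` is hypothetical.

```python
from collections import Counter


def bag_distance(string1: str, string2: str) -> int:
    """Hypothetical sketch: the larger of the two multiset differences."""
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError("Input strings must be of type str")
    bag1 = Counter(string1)  # multiset of characters in string1
    bag2 = Counter(string2)  # multiset of characters in string2
    # Counter subtraction keeps only positive counts, i.e. multiset difference;
    # summing the remaining counts gives its size with multiplicity.
    size_1_minus_2 = sum((bag1 - bag2).values())
    size_2_minus_1 = sum((bag2 - bag1).values())
    return max(size_1_minus_2, size_2_minus_1)


print(bag_distance("Niall", "Neil"))        # 2
print(bag_distance("aluminum", "Catalan"))  # 5
```

Because only character counts matter, anagrams such as `ATCG` and `TAGC` have distance 0 even though the strings differ.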

Parameters
  • string1 (str) – First input string

  • string2 (str) – Second input string

Returns

Bag distance (int)

Raises

TypeError – If the inputs are not strings

Examples

>>> from py_stringmatching.similarity_measure.bag_distance import BagDistance
>>> bd = BagDistance()
>>> bd.get_raw_score('cat', 'hat')
1
>>> bd.get_raw_score('Niall', 'Neil')
2
>>> bd.get_raw_score('aluminum', 'Catalan')
5
>>> bd.get_raw_score('ATCG', 'TAGC')
0
>>> bd.get_raw_score('abcde', 'xyz')
5
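As a worked check of the formula, the `aluminum`/`Catalan` example expands as follows (multisets are written with repeated elements; `C` has no counterpart in `aluminum`):

```latex
\begin{aligned}
\mathrm{bag}(\texttt{aluminum}) &= \{a, i, l, m, m, n, u, u\} \\
\mathrm{bag}(\texttt{Catalan}) &= \{C, a, a, a, l, n, t\} \\
|\mathrm{bag}(\texttt{aluminum}) - \mathrm{bag}(\texttt{Catalan})| &= |\{i, m, m, u, u\}| = 5 \\
|\mathrm{bag}(\texttt{Catalan}) - \mathrm{bag}(\texttt{aluminum})| &= |\{C, a, a, t\}| = 4 \\
\text{bag distance} &= \max(5, 4) = 5
\end{aligned}
```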

get_sim_score(string1, string2)

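Computes the normalized bag similarity between two strings. For two strings X and Y, this is \(1 - \mathrm{bag\_distance}(X, Y) / \max(\mathrm{len}(X), \mathrm{len}(Y))\), which lies in [0, 1] because the bag distance never exceeds the length of the longer string.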
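The examples below are consistent with normalizing the bag distance by the length of the longer string, i.e. similarity = 1 - distance / max(len(string1), len(string2)). The following is a minimal sketch under that assumption, not the library's own code; the names `bag_distance` and `bag_similarity` are hypothetical, and returning 1.0 for two empty strings is an assumed convention.

```python
from collections import Counter


def bag_distance(string1: str, string2: str) -> int:
    # Larger of the two multiset differences (Counter '-' drops non-positive counts).
    bag1, bag2 = Counter(string1), Counter(string2)
    return max(sum((bag1 - bag2).values()), sum((bag2 - bag1).values()))


def bag_similarity(string1: str, string2: str) -> float:
    """Hypothetical sketch: 1 - bag distance / length of the longer string."""
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError("Input strings must be of type str")
    max_len = max(len(string1), len(string2))
    # Assumption: two empty strings are treated as identical (similarity 1.0),
    # which also avoids dividing by zero.
    if max_len == 0:
        return 1.0
    return 1 - bag_distance(string1, string2) / max_len


print(bag_similarity("cat", "hat"))           # 0.6666666666666667
print(bag_similarity("aluminum", "Catalan"))  # 0.375
```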

Parameters
  • string1 (str) – First input string

  • string2 (str) – Second input string

Returns

Normalized bag similarity (float)

Raises

TypeError – If the inputs are not strings

Examples

>>> from py_stringmatching.similarity_measure.bag_distance import BagDistance
>>> bd = BagDistance()
>>> bd.get_sim_score('cat', 'hat')
0.6666666666666667
>>> bd.get_sim_score('Niall', 'Neil')
0.6
>>> bd.get_sim_score('aluminum', 'Catalan')
0.375
>>> bd.get_sim_score('ATCG', 'TAGC')
1.0
>>> bd.get_sim_score('abcde', 'xyz')
0.0