Monge Elkan
class py_stringmatching.similarity_measure.monge_elkan.MongeElkan(sim_func=jaro_winkler_function)
Computes the Monge-Elkan measure.
The Monge-Elkan similarity measure is a hybrid similarity measure that combines the benefits of sequence-based and set-based methods. This can be effective in domains where more control is needed over the similarity measure. It implicitly uses a secondary similarity measure, such as Levenshtein, to compute the overall similarity score. See the string matching chapter in the DI book (Principles of Data Integration).
Parameters: sim_func (function) – Secondary similarity function. This is expected to be a sequence-based similarity measure (defaults to the Jaro-Winkler similarity measure).
sim_func (function) – An attribute to store the secondary similarity function.
get_raw_score(bag1, bag2)
Computes the raw Monge-Elkan score between two bags (lists).
Parameters: bag1, bag2 (list) – Input lists.
Returns: Monge-Elkan similarity score (float).
Raises: TypeError – If the inputs are not lists or if one of the inputs is None.
Examples
>>> me = MongeElkan()
>>> me.get_raw_score(['Niall'], ['Neal'])
0.8049999999999999
>>> me.get_raw_score(['Niall'], ['Nigel'])
0.7866666666666667
>>> me.get_raw_score(['Comput.', 'Sci.', 'and', 'Eng.', 'Dept.,', 'University', 'of', 'California,', 'San', 'Diego'],
...                  ['Department', 'of', 'Computer', 'Science,', 'Univ.', 'Calif.,', 'San', 'Diego'])
0.8677218614718616
>>> me.get_raw_score([''], ['a'])
0.0
>>> me = MongeElkan(sim_func=NeedlemanWunsch().get_raw_score)
>>> me.get_raw_score(['Comput.', 'Sci.', 'and', 'Eng.', 'Dept.,', 'University', 'of', 'California,', 'San', 'Diego'],
...                  ['Department', 'of', 'Computer', 'Science,', 'Univ.', 'Calif.,', 'San', 'Diego'])
2.0
>>> me = MongeElkan(sim_func=Affine().get_raw_score)
>>> me.get_raw_score(['Comput.', 'Sci.', 'and', 'Eng.', 'Dept.,', 'University', 'of', 'California,', 'San', 'Diego'],
...                  ['Department', 'of', 'Computer', 'Science,', 'Univ.', 'Calif.,', 'San', 'Diego'])
2.25
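To illustrate how a hybrid measure of this kind combines a token-level (set-based) pass with a sequence-based secondary function, here is a minimal sketch of the usual textbook formulation of Monge-Elkan: for each token in the first bag, take the best secondary score against any token in the second bag, then average over the first bag. The names `monge_elkan_sketch` and `ratio_sim` are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for a secondary measure such as Jaro-Winkler; this is not the library's own code, and its handling of edge cases may differ.

```python
from difflib import SequenceMatcher


def ratio_sim(s1, s2):
    """Illustrative secondary similarity in [0, 1].

    Assumption: a stdlib stand-in, not the Jaro-Winkler default of MongeElkan.
    """
    return SequenceMatcher(None, s1, s2).ratio()


def monge_elkan_sketch(bag1, bag2, sim_func=ratio_sim):
    """Average, over tokens in bag1, of the best sim_func score against bag2."""
    # Assumption: an empty bag scores 0.0 in this sketch.
    if not bag1 or not bag2:
        return 0.0
    total = 0.0
    for token1 in bag1:
        # Best match for this token among all tokens of the second bag.
        total += max(sim_func(token1, token2) for token2 in bag2)
    return total / len(bag1)


# The average runs over the first bag only, so argument order matters.
print(monge_elkan_sketch(['a', 'b'], ['a']))  # 0.5
print(monge_elkan_sketch(['a'], ['a', 'b']))  # 1.0
```

Because the score is an average of secondary scores, its range follows that of the secondary function. This is consistent with the examples above, where an unnormalized secondary measure such as NeedlemanWunsch yields values greater than 1.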
References
- Principles of Data Integration book