Editex
Editex distance measure
class py_stringmatching.similarity_measure.editex.Editex(match_cost=0, group_cost=1, mismatch_cost=2, local=False)
Editex distance measure class.
Parameters: - match_cost (int) – Cost assigned when two characters are identical, default=0
- group_cost (int) – Cost assigned when two different characters belong to the same Editex letter group, default=1
- mismatch_cost (int) – Cost assigned when two characters are neither identical nor in the same Editex letter group, default=2
- local (bool) – Whether to use the local variant of Editex, default=False
get_raw_score(string1, string2)
Computes the Editex distance between two strings.
The algorithm is described on pages 3 and 4 of Zobel, Justin and Philip Dart. 1996. Phonetic string matching: Lessons from information retrieval. In: Proceedings of the ACM-SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland. 166–173. http://goanna.cs.rmit.edu.au/~jz/fulltext/sigir96.pdf
The local variant is based on Ring, Nicholas and Alexandra L. Uitdenbogerd. 2009. Finding ‘Lucy in Disguise’: The Misheard Lyric Matching Problem. In: Proceedings of the 5th Asia Information Retrieval Symposium, Sapporo, Japan. 157–167. http://www.seg.rmit.edu.au/research/download.php?manuscript=404
Parameters: string1, string2 (str) – Input strings
Returns: Editex distance (int)
Raises: TypeError – If the inputs are not strings
Examples
>>> ed = Editex()
>>> ed.get_raw_score('cat', 'hat')
2
>>> ed.get_raw_score('Niall', 'Neil')
2
>>> ed.get_raw_score('aluminum', 'Catalan')
12
>>> ed.get_raw_score('ATCG', 'TAGC')
6
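The dynamic program behind get_raw_score can be sketched in pure Python as follows. This is a minimal sketch, not the py_stringmatching implementation. It assumes the ten letter groups from Zobel and Dart, case-insensitive comparison, and their deletion rule that dropping a character which follows H or W costs group_cost. It also models the local variant on the Abydos code listed under References, assuming that variant simply leaves the first column of the matrix at zero. The names `editex`, `_r_cost`, `_d_cost`, and `LETTER_GROUPS` are hypothetical.

```python
# Hypothetical sketch of Editex; not the py_stringmatching source.

# Letter groups from Zobel and Dart (1996). H and W belong to no group;
# they are handled by the deletion rule in _d_cost.
LETTER_GROUPS = (
    frozenset("AEIOUY"), frozenset("BP"), frozenset("CKQ"), frozenset("DT"),
    frozenset("LR"), frozenset("MN"), frozenset("GJ"), frozenset("FPV"),
    frozenset("SXZ"), frozenset("CSZ"),
)


def _r_cost(a, b, match_cost, group_cost, mismatch_cost):
    """Substitution cost r(a, b)."""
    if a == b:
        return match_cost
    if any(a in group and b in group for group in LETTER_GROUPS):
        return group_cost
    return mismatch_cost


def _d_cost(prev, ch, match_cost, group_cost, mismatch_cost):
    """Cost d(prev, ch) of deleting or inserting ch when it follows prev."""
    # Assumed rule: a character that follows a different H or W costs group_cost.
    if prev != ch and prev in ("H", "W"):
        return group_cost
    return _r_cost(prev, ch, match_cost, group_cost, mismatch_cost)


def editex(string1, string2, match_cost=0, group_cost=1, mismatch_cost=2,
           local=False):
    """Editex distance computed with a Levenshtein-style dynamic program."""
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError("Editex inputs must be strings")
    costs = (match_cost, group_cost, mismatch_cost)
    # Compare case-insensitively; a leading space gives every position a predecessor.
    a = " " + string1.upper()
    b = " " + string2.upper()
    n, m = len(a) - 1, len(b) - 1
    d = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumption modeled on Abydos: the local variant leaves column 0 at zero,
    # so skipping a prefix of string1 is free.
    if not local:
        for i in range(1, n + 1):
            d[i][0] = d[i - 1][0] + _d_cost(a[i - 1], a[i], *costs)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + _d_cost(b[j - 1], b[j], *costs)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + _d_cost(a[i - 1], a[i], *costs),  # delete a[i]
                d[i][j - 1] + _d_cost(b[j - 1], b[j], *costs),  # insert b[j]
                d[i - 1][j - 1] + _r_cost(a[i], b[j], *costs),  # match or substitute
            )
    return d[n][m]


print(editex("cat", "hat"))     # 2
print(editex("Niall", "Neil"))  # 2
print(editex("ATCG", "TAGC"))   # 6
```

Under this sketch, C and K share a letter group, so editex("cat", "kat") returns 1; raising group_cost to 2 makes grouped letters cost as much as a mismatch, and editex("cat", "kat", group_cost=2) returns 2.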
References
- Abydos Library - https://github.com/chrislit/abydos/blob/master/abydos/distance.py
get_sim_score(string1, string2)
Computes the normalized Editex similarity between two strings.
Parameters: string1, string2 (str) – Input strings
Returns: Normalized Editex similarity (float)
Raises: TypeError – If the inputs are not strings
Examples
>>> ed = Editex()
>>> ed.get_sim_score('cat', 'hat')
0.66666666666666674
>>> ed.get_sim_score('Niall', 'Neil')
0.80000000000000004
>>> ed.get_sim_score('aluminum', 'Catalan')
0.25
>>> ed.get_sim_score('ATCG', 'TAGC')
0.25
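The documented similarity values are consistent with dividing the raw distance by mismatch_cost times the length of the longer string and subtracting the result from 1. The sketch below assumes that normalization, which is inferred from the examples rather than taken from the library source. `editex_sim` is a hypothetical helper that takes an already computed raw score, and its handling of two empty strings is also an assumption.

```python
def editex_sim(raw_score, string1, string2, mismatch_cost=2):
    """Hypothetical normalization of a raw Editex distance (assumed formula)."""
    longest = max(len(string1), len(string2))
    if longest == 0:
        # Assumption: two empty strings are treated as identical.
        return 1.0
    # With the default weights no alignment costs more than
    # mismatch_cost * longest, so the result stays in [0, 1].
    return 1 - raw_score / (mismatch_cost * longest)


# Raw scores taken from the get_raw_score examples.
print(editex_sim(2, "cat", "hat"))            # 0.6666666666666667
print(editex_sim(2, "Niall", "Neil"))         # 0.8
print(editex_sim(12, "aluminum", "Catalan"))  # 0.25
print(editex_sim(6, "ATCG", "TAGC"))          # 0.25
```

The outputs match the documented values; the documentation's 0.66666666666666674 and 0.80000000000000004 are 17-digit renderings of the same floating-point numbers.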
References
- Abydos Library - https://github.com/chrislit/abydos/blob/master/abydos/distance.py
set_group_cost(group_cost)
Sets the group cost.
Parameters: group_cost (int) – Cost assigned when two different characters belong to the same Editex letter group