Affine Gap
- class py_stringmatching.similarity_measure.affine.Affine(gap_start=1, gap_continuation=0.5, sim_func=identity_function)
Returns the affine gap score between two strings.
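The affine gap measure is an extension of the Needleman-Wunsch measure that handles longer gaps more gracefully: opening a gap and continuing an existing gap are charged separate costs, so a long gap need not cost as much as many short ones. For more information, refer to the string matching chapter in the DI book (“Principles of Data Integration”).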
- Parameters
gap_start (float) – Cost of starting (opening) a gap (defaults to 1).
gap_continuation (float) – Cost of continuing an already open gap (defaults to 0.5).
sim_func (function) – Function computing similarity score between two characters, which are represented as strings (defaults to an identity function, which returns 1 if the two characters are the same and returns 0 otherwise).
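As a rough illustration of how these parameters interact, the sketch below assumes that a single gap spanning k characters costs gap_start + (k - 1) * gap_continuation, which is consistent with the defaults and the examples on this page. The names gap_cost and identity_sim are hypothetical helpers written for this illustration; they are not part of py_stringmatching.

```python
def gap_cost(length, gap_start=1.0, gap_continuation=0.5):
    """Cost of one gap spanning `length` characters (assumed affine model)."""
    if length <= 0:
        return 0.0
    # The first gap character pays gap_start; each later one pays gap_continuation.
    return gap_start + (length - 1) * gap_continuation


def identity_sim(c1, c2):
    """Hypothetical stand-in for the default sim_func: 1 if equal, 0 otherwise."""
    return 1 if c1 == c2 else 0


print(gap_cost(1))  # 1.0
print(gap_cost(2))  # 1.5
print(identity_sim('a', 'a'), identity_sim('a', 'b'))  # 1 0
```

Under this assumption, a two-character gap with the default costs is penalized 1.5 rather than 2, which is why longer gaps are cheaper than the same number of separate one-character gaps.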
- gap_start
An attribute to store the gap cost at the start.
- Type
float
- gap_continuation
An attribute to store the gap continuation cost.
- Type
float
- sim_func
An attribute to store the similarity function.
- Type
function
- get_raw_score(string1, string2)
Computes the affine gap score between two strings. This score can be outside the range [0,1].
- Parameters
string1 (str) – First input string.
string2 (str) – Second input string.
- Returns
Affine gap score between the two input strings (float).
- Raises
TypeError – If the inputs are not strings or if one of the inputs is None.
Examples
>>> aff = Affine()
>>> aff.get_raw_score('dva', 'deeva')
1.5
>>> aff = Affine(gap_start=2, gap_continuation=0.5)
>>> aff.get_raw_score('dva', 'deeve')
-0.5
>>> aff = Affine(gap_continuation=0.2, sim_func=lambda s1, s2: (int(1 if s1 == s2 else 0)))
>>> aff.get_raw_score('AAAGAATTCA', 'AAATCA')
4.4
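The scores in these examples can be reproduced with a small dynamic program. The following is a plain-Python sketch for illustration only, not the library's implementation: it assumes the usual three-table formulation of affine gap alignment, and the function name affine_gap_score, the table names M, X and Y, and the choice to return 0 for an empty input are all assumptions made here.

```python
NEG_INF = float("-inf")


def affine_gap_score(s1, s2, gap_start=1.0, gap_continuation=0.5,
                     sim_func=lambda a, b: 1 if a == b else 0):
    """Illustrative affine gap score over two strings (assumed formulation)."""
    if not s1 or not s2:
        return 0.0  # assumption: an empty input scores 0
    n, m = len(s1), len(s2)
    # M[i][j]: best score with s1[i-1] aligned to s2[j-1]
    # X[i][j]: best score with s1[i-1] aligned to a gap
    # Y[i][j]: best score with s2[j-1] aligned to a gap
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = NEG_INF
        X[i][0] = -gap_start - (i - 1) * gap_continuation
        Y[i][0] = NEG_INF
    for j in range(1, m + 1):
        M[0][j] = NEG_INF
        X[0][j] = NEG_INF
        Y[0][j] = -gap_start - (j - 1) * gap_continuation
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = sim_func(s1[i - 1], s2[j - 1]) + best_prev
            # Either open a new gap after a match state, or extend an open gap.
            X[i][j] = max(M[i - 1][j] - gap_start, X[i - 1][j] - gap_continuation)
            Y[i][j] = max(M[i][j - 1] - gap_start, Y[i][j - 1] - gap_continuation)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_gap_score('dva', 'deeva'))                   # 1.5
print(affine_gap_score('dva', 'deeve', gap_start=2))      # -0.5
print(round(affine_gap_score('AAAGAATTCA', 'AAATCA',
                             gap_continuation=0.2), 2))   # 4.4
```

In the first example, three matching characters contribute 3 and the two-character gap for "ee" costs 1 + 0.5, giving 1.5. The last value is rounded only to hide floating-point noise from repeatedly subtracting 0.2.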
- set_gap_continuation(gap_continuation)
Sets the gap continuation cost.
- Parameters
gap_continuation (float) – Cost for the gap continuation.