Needleman Wunsch
class py_stringmatching.similarity_measure.needleman_wunsch.NeedlemanWunsch(gap_cost=1.0, sim_func=identity_function)
Computes Needleman-Wunsch measure.
The Needleman-Wunsch measure generalizes the Levenshtein distance and considers the global alignment between two strings. An alignment between two strings is a set of correspondences between their characters, allowing for gaps. Each alignment is assigned a score: every correspondence contributes the value returned by sim_func for its two characters, and every gap subtracts gap_cost. The measure is the score of the best alignment, that is, the maximal score over all alignments.
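The scoring described above can be sketched with the standard dynamic-programming recurrence. This is a minimal pure-Python illustration, not the library's implementation; the function name `needleman_wunsch_score` is hypothetical, and the sketch assumes a linear gap penalty and a default per-character score of 1 for equal characters and 0 otherwise.

```python
def needleman_wunsch_score(s1, s2, gap_cost=1.0, sim_func=None):
    """Return the best global-alignment score between s1 and s2 (illustrative sketch)."""
    if sim_func is None:
        # Assumed default: 1 for identical characters, 0 otherwise.
        sim_func = lambda a, b: 1.0 if a == b else 0.0

    n, m = len(s1), len(s2)
    # prev[j] holds the best score for aligning the processed prefix of s1 with s2[:j].
    # Aligning an empty prefix of s1 with s2[:j] needs j gaps.
    prev = [-j * gap_cost for j in range(m + 1)]

    for i in range(1, n + 1):
        # Aligning s1[:i] with an empty prefix of s2 needs i gaps.
        curr = [-i * gap_cost] + [0.0] * m
        for j in range(1, m + 1):
            match = prev[j - 1] + sim_func(s1[i - 1], s2[j - 1])  # pair the two characters
            delete = prev[j] - gap_cost                            # gap in s2
            insert = curr[j - 1] - gap_cost                        # gap in s1
            curr[j] = max(match, delete, insert)
        prev = curr

    return prev[m]


print(needleman_wunsch_score('dva', 'deeva'))
```

Only two rows of the table are kept at a time, since each cell depends on the current and previous row alone. With the defaults, 'dva' against 'deeva' has three matching characters and two gaps, so the sketch prints 1.0.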
Parameters:
- gap_cost (float) – Cost of a gap (defaults to 1.0).
- sim_func (function) – Similarity function that scores each correspondence between two characters (defaults to an identity function, which returns 1 if the two characters are the same and 0 otherwise).
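To make the expected shape of sim_func concrete, here is a small sketch of two candidate functions: each takes two characters and returns a number. Both names are hypothetical stand-ins written for illustration; `identity_score` only mirrors the documented default behavior and is not the library's own function.

```python
def identity_score(c1, c2):
    """Stand-in for the documented default: 1 if the characters are equal, 0 otherwise."""
    return 1 if c1 == c2 else 0


def case_insensitive_score(c1, c2):
    """Hypothetical variant: reward matches regardless of case, penalize mismatches."""
    return 2.0 if c1.lower() == c2.lower() else -1.0


print(identity_score('a', 'a'), identity_score('a', 'A'))
print(case_insensitive_score('a', 'A'), case_insensitive_score('a', 'b'))
```

A function of this shape could be supplied through the sim_func parameter, for example `NeedlemanWunsch(sim_func=case_insensitive_score)`. Returning a negative value for mismatches makes the alignment prefer gaps over pairing unrelated characters whenever the gap cost is lower than the penalty.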
gap_cost
float – An attribute to store the gap cost.
sim_func
function – An attribute to store the similarity function.
get_raw_score(string1, string2)
Computes the raw Needleman-Wunsch score between two strings.
Parameters: string1, string2 (str) – Input strings.
Returns: Needleman-Wunsch similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.
Examples
>>> nw = NeedlemanWunsch()
>>> nw.get_raw_score('dva', 'deeva')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.0)
>>> nw.get_raw_score('dva', 'deeve')
2.0
>>> nw = NeedlemanWunsch(gap_cost=1.0, sim_func=lambda s1, s2 : (2.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('dva', 'deeve')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.5, sim_func=lambda s1, s2 : (1.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('GCATGCUA', 'GATTACA')
2.5