Smith Waterman
-
class py_stringmatching.similarity_measure.smith_waterman.SmithWaterman(gap_cost=1.0, sim_func=identity_function)
Computes the Smith-Waterman measure.
The Smith-Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings. Instead of looking at the entire sequence, the algorithm compares segments of all possible lengths and optimizes the similarity measure. See the string matching chapter in the DI book (Principles of Data Integration).
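The local alignment described above is usually computed with dynamic programming. The following is a minimal pure-Python sketch, not the library's implementation: the function name `smith_waterman_score` is hypothetical, and it assumes a linear gap penalty (each gap character costs `gap_cost`) and a default similarity of 1 for identical characters and 0 otherwise.

```python
def smith_waterman_score(s1, s2, gap_cost=1.0, sim_func=None):
    """Best local-alignment score between s1 and s2 (linear gap penalty assumed)."""
    if sim_func is None:
        # Assumed default: 1 for identical characters, 0 otherwise.
        sim_func = lambda a, b: 1 if a == b else 0

    # prev is the previous row of the score matrix; row 0 is all zeros.
    prev = [0.0] * (len(s2) + 1)
    best = 0.0
    for i in range(1, len(s1) + 1):
        curr = [0.0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            # Extend the alignment diagonally with a (mis)match.
            match = prev[j - 1] + sim_func(s1[i - 1], s2[j - 1])
            # Open or extend a gap in either string.
            delete = prev[j] - gap_cost
            insert = curr[j - 1] - gap_cost
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0.0, match, delete, insert)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("cat", "hat"))  # 2.0
```

The clamp at zero is what makes the alignment local: a poorly matching prefix never drags down the score of a well-matching region later in the strings.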
- Parameters
gap_cost (float) – Cost of a gap (defaults to 1.0).
sim_func (function) – Similarity function that scores the correspondence between two characters (defaults to an identity function, which returns 1 if the two characters are the same and 0 otherwise).
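To illustrate the shape of a similarity function, here is a small sketch of callables that take two characters and return a number. All three names are hypothetical; `identity_sim` is only a stand-in that mirrors the documented default behavior, not the library's own `identity_function`.

```python
def identity_sim(c1, c2):
    """Stand-in for the documented default: 1 if equal, 0 otherwise."""
    return 1 if c1 == c2 else 0


def case_insensitive_sim(c1, c2):
    """Treat upper- and lower-case versions of a letter as a match."""
    return 1 if c1.lower() == c2.lower() else 0


def match_mismatch_sim(c1, c2):
    """Reward matches and penalize mismatches."""
    return 2 if c1 == c2 else -1


print(identity_sim("a", "A"))          # 0
print(case_insensitive_sim("a", "A"))  # 1
print(match_mismatch_sim("a", "b"))    # -1
```

Any such callable could then be supplied through the `sim_func` parameter, for example `SmithWaterman(sim_func=case_insensitive_sim)`.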
-
gap_cost
An attribute to store the gap cost.
- Type
float
-
sim_func
An attribute to store the similarity function.
- Type
function
-
get_raw_score(string1, string2)
Computes the raw Smith-Waterman score between two strings.
- Parameters
string1 (str) – First input string.
string2 (str) – Second input string.
- Returns
Smith-Waterman similarity score (float).
- Raises
TypeError – If the inputs are not strings or if one of the inputs is None.
Examples
>>> sw = SmithWaterman()
>>> sw.get_raw_score('cat', 'hat')
2.0
>>> sw = SmithWaterman(gap_cost=2.2)
>>> sw.get_raw_score('dva', 'deeve')
1.0
>>> sw = SmithWaterman(gap_cost=1, sim_func=lambda s1, s2: (2 if s1 == s2 else -1))
>>> sw.get_raw_score('dva', 'deeve')
2.0
>>> sw = SmithWaterman(gap_cost=1.4, sim_func=lambda s1, s2: (1.5 if s1 == s2 else 0.5))
>>> sw.get_raw_score('GCATAGCU', 'GATTACA')
6.5