Needleman Wunsch

class py_stringmatching.similarity_measure.needleman_wunsch.NeedlemanWunsch(gap_cost=1.0, sim_func=identity_function)

Computes the Needleman-Wunsch measure.

The Needleman-Wunsch distance generalizes the Levenshtein distance and considers global alignment between two strings. Specifically, it is computed by assigning a score to each alignment between the two input strings and choosing the score of the best alignment, that is, the maximal score. An alignment between two strings is a set of correspondences between their characters, allowing for gaps.
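The alignment scoring described above can be sketched as a small dynamic program in plain Python. This is a minimal illustration under stated assumptions, not the library's actual implementation: the function name `needleman_wunsch_score` and its default scoring lambda are hypothetical, chosen only to mirror the `gap_cost` and `sim_func` parameters of the class.

```python
def needleman_wunsch_score(s1, s2, gap_cost=1.0,
                           sim_func=lambda a, b: 1.0 if a == b else 0.0):
    """Return the best global alignment score of s1 and s2 (illustrative sketch)."""
    cols = len(s2) + 1
    # Row 0: aligning the empty prefix of s1 with s2[:j] costs j gaps.
    prev = [-j * gap_cost for j in range(cols)]
    for i in range(1, len(s1) + 1):
        # Column 0: aligning s1[:i] with the empty prefix of s2 costs i gaps.
        curr = [-i * gap_cost] + [0.0] * (cols - 1)
        for j in range(1, cols):
            pair = prev[j - 1] + sim_func(s1[i - 1], s2[j - 1])  # align two characters
            gap_in_s2 = prev[j] - gap_cost      # s1[i-1] aligned with a gap
            gap_in_s1 = curr[j - 1] - gap_cost  # s2[j-1] aligned with a gap
            curr[j] = max(pair, gap_in_s2, gap_in_s1)
        prev = curr
    return prev[-1]


print(needleman_wunsch_score('dva', 'deeva'))
```

Only two rows of the score table are kept at a time, which is enough when the score alone is needed and the alignment itself does not have to be reconstructed.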

Parameters:
  • gap_cost (float) – Cost of gap (defaults to 1.0).
  • sim_func (function) – Similarity function that scores each correspondence between characters (defaults to an identity function, which returns 1 if the two characters are the same and 0 otherwise).
gap_cost

float

An attribute to store the gap cost.

sim_func

function

An attribute to store the similarity function.

get_gap_cost()

Get gap cost.

Returns: Gap cost (float).
get_raw_score(string1, string2)

Computes the raw Needleman-Wunsch score between two strings.

Parameters: string1, string2 (str) – Input strings.
Returns: Needleman-Wunsch similarity score (float).
Raises: TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> nw = NeedlemanWunsch()
>>> nw.get_raw_score('dva', 'deeva')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.0)
>>> nw.get_raw_score('dva', 'deeve')
2.0
>>> nw = NeedlemanWunsch(gap_cost=1.0, sim_func=lambda s1, s2 : (2.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('dva', 'deeve')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.5, sim_func=lambda s1, s2 : (1.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('GCATGCUA', 'GATTACA')
2.5
get_sim_func()

Get the similarity function.

Returns: Similarity function (function).
set_gap_cost(gap_cost)

Set gap cost.

Parameters: gap_cost (float) – Cost of gap.
set_sim_func(sim_func)

Set similarity function.

Parameters: sim_func (function) – Similarity function to give a score for the correspondence between characters.
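
As an illustration of what a custom similarity function might look like, here is a minimal sketch. The name `case_insensitive_sim` and the scores 1.0 and -1.0 are assumptions made for this example and are not part of the library; the only requirement taken from the documentation above is that the function accepts two characters and returns a score.

```python
def case_insensitive_sim(char1, char2):
    """Hypothetical similarity function: ignore case when comparing characters."""
    # 1.0 for a case-insensitive match, -1.0 for a mismatch (illustrative values).
    return 1.0 if char1.lower() == char2.lower() else -1.0


# Possible usage, assuming py_stringmatching is installed and NeedlemanWunsch
# has been imported from it:
#   nw = NeedlemanWunsch()
#   nw.set_sim_func(case_insensitive_sim)
#   nw.set_gap_cost(0.5)
#   nw.get_raw_score('DVA', 'deeva')
print(case_insensitive_sim('A', 'a'))
```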