Needleman Wunsch

class py_stringmatching.similarity_measure.needleman_wunsch.NeedlemanWunsch(gap_cost=1.0, sim_func=identity_function)

Computes the Needleman-Wunsch measure.

The Needleman-Wunsch measure generalizes the Levenshtein distance and considers a global alignment between two strings. Specifically, it is computed by assigning a score to each alignment between the two input strings and taking the score of the best alignment, that is, the maximal score. An alignment between two strings is a set of correspondences between their characters, allowing for gaps.
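
The scoring described above can be sketched with a standard dynamic-programming table. The following is a minimal pure-Python illustration, not the library's implementation. The names `nw_score` and `identity` are hypothetical, and the sketch assumes that each gap subtracts `gap_cost` from the score and each character correspondence adds the value returned by `sim_func`.

```python
def identity(c1, c2):
    """Hypothetical stand-in for the default similarity: 1 if equal, else 0."""
    return 1.0 if c1 == c2 else 0.0


def nw_score(s1, s2, gap_cost=1.0, sim_func=identity):
    """Return the maximal global-alignment score (illustrative sketch only)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # score[i][j] holds the best score for aligning s1[:i] with s2[:j].
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = -i * gap_cost
    for j in range(1, cols):
        score[0][j] = -j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                # Pair s1[i-1] with s2[j-1].
                score[i - 1][j - 1] + sim_func(s1[i - 1], s2[j - 1]),
                # Leave s1[i-1] unpaired (gap in s2).
                score[i - 1][j] - gap_cost,
                # Leave s2[j-1] unpaired (gap in s1).
                score[i][j - 1] - gap_cost,
            )
    return score[-1][-1]
```

Under these assumptions, `nw_score('dva', 'deeva')` evaluates to 1.0: three matching characters contribute 3.0 and the two gaps opposite the extra `e` characters subtract 2.0.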

Parameters
  • gap_cost (float) – Cost of a gap (defaults to 1.0).

  • sim_func (function) – Similarity function that gives a score for each correspondence between the characters (defaults to an identity function, which returns 1 if the two characters are the same and 0 otherwise).
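
To illustrate the shape of a `sim_func` callable, here is a minimal sketch of a case-insensitive scorer. The name `case_insensitive_sim` and the score values are assumptions chosen for illustration; the only requirement taken from the description above is that the function accepts two characters and returns a numeric score.

```python
def case_insensitive_sim(c1, c2):
    """Hypothetical similarity function that ignores case."""
    # The same letter in any case scores 1.0; anything else scores 0.0.
    return 1.0 if c1.lower() == c2.lower() else 0.0
```

A function like this could then be passed to the constructor as `NeedlemanWunsch(sim_func=case_insensitive_sim)`.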

gap_cost

An attribute to store the gap cost.

Type

float

sim_func

An attribute to store the similarity function.

Type

function

get_gap_cost()

Get the gap cost.

Returns

Gap cost (float).

get_raw_score(string1, string2)

Computes the raw Needleman-Wunsch score between two strings.

Parameters
  • string1 (str) – First input string.

  • string2 (str) – Second input string.

Returns

Needleman-Wunsch similarity score (float).

Raises

TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> nw = NeedlemanWunsch()
>>> nw.get_raw_score('dva', 'deeva')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.0)
>>> nw.get_raw_score('dva', 'deeve')
2.0
>>> nw = NeedlemanWunsch(gap_cost=1.0, sim_func=lambda s1, s2: (2.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('dva', 'deeve')
1.0
>>> nw = NeedlemanWunsch(gap_cost=0.5, sim_func=lambda s1, s2: (1.0 if s1 == s2 else -1.0))
>>> nw.get_raw_score('GCATGCUA', 'GATTACA')
2.5

get_sim_func()

Get the similarity function.

Returns

Similarity function (function).

set_gap_cost(gap_cost)

Set the gap cost.

Parameters

gap_cost (float) – Cost of a gap.

set_sim_func(sim_func)

Set the similarity function.

Parameters

sim_func (function) – Similarity function that gives a score for each correspondence between the characters.