Affine Gap

class py_stringmatching.similarity_measure.affine.Affine(gap_start=1, gap_continuation=0.5, sim_func=identity_function)[source]

Returns the affine gap score between two strings.

The affine gap measure extends the Needleman-Wunsch measure to handle longer gaps more gracefully: starting a gap and continuing it are charged separately (see gap_start and gap_continuation below). For more information, refer to the string matching chapter of the DI book (“Principles of Data Integration”).
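To make the idea concrete, the following is a minimal, self-contained sketch of an affine gap alignment score using the standard three-table dynamic program. It is not the library's source code; the function name `affine_gap_score` is hypothetical, and the sketch assumes that a gap of length k costs gap_start + (k - 1) * gap_continuation and that both inputs are non-empty.

```python
NEG_INF = float("-inf")


def affine_gap_score(s1, s2, gap_start=1.0, gap_continuation=0.5, sim_func=None):
    """Hypothetical helper: best global alignment score with affine gap costs."""
    if sim_func is None:
        # Identity similarity: 1 for equal characters, 0 otherwise.
        sim_func = lambda a, b: 1 if a == b else 0

    rows, cols = len(s1) + 1, len(s2) + 1
    # m[i][j]: best score when s1[i-1] is aligned to s2[j-1]
    # x[i][j]: best score when s1[i-1] is aligned to a gap
    # y[i][j]: best score when s2[j-1] is aligned to a gap
    m = [[NEG_INF] * cols for _ in range(rows)]
    x = [[NEG_INF] * cols for _ in range(rows)]
    y = [[NEG_INF] * cols for _ in range(rows)]
    m[0][0] = 0.0

    # A prefix aligned entirely to gaps pays one start cost plus continuations.
    for i in range(1, rows):
        x[i][0] = -(gap_start + (i - 1) * gap_continuation)
    for j in range(1, cols):
        y[0][j] = -(gap_start + (j - 1) * gap_continuation)

    for i in range(1, rows):
        for j in range(1, cols):
            m[i][j] = sim_func(s1[i - 1], s2[j - 1]) + max(
                m[i - 1][j - 1], x[i - 1][j - 1], y[i - 1][j - 1]
            )
            # Either open a new gap after a match, or extend the current gap.
            x[i][j] = max(m[i - 1][j] - gap_start, x[i - 1][j] - gap_continuation)
            y[i][j] = max(m[i][j - 1] - gap_start, y[i][j - 1] - gap_continuation)

    return max(m[-1][-1], x[-1][-1], y[-1][-1])


print(affine_gap_score("dva", "deeva"))                 # 1.5
print(affine_gap_score("dva", "deeve", gap_start=2.0))  # -0.5
```

Under these assumptions the sketch reproduces the values shown in the get_raw_score examples on this page.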

Parameters
  • gap_start (float) – Cost of starting a gap (defaults to 1).

  • gap_continuation (float) – Cost of continuing a gap (defaults to 0.5).

  • sim_func (function) – Function computing similarity score between two characters, which are represented as strings (defaults to an identity function, which returns 1 if the two characters are the same and returns 0 otherwise).
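As an illustration of the sim_func parameter, here is a small sketch of a custom character similarity function. The name `case_insensitive_sim` is hypothetical and not part of the library; it follows the contract described above (two single-character strings in, a numeric score out).

```python
def case_insensitive_sim(char1, char2):
    """Hypothetical similarity function: 1 if the characters match ignoring case, else 0."""
    return 1 if char1.lower() == char2.lower() else 0


print(case_insensitive_sim("D", "d"))  # 1
print(case_insensitive_sim("d", "v"))  # 0

# Assuming py_stringmatching is installed, the function could be supplied as:
#   from py_stringmatching.similarity_measure.affine import Affine
#   aff = Affine(sim_func=case_insensitive_sim)
```

Returning graded scores (for example, a partial score for visually similar characters) should work the same way, as long as the function accepts two characters and returns a number.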

gap_start

An attribute to store the gap cost at the start.

Type

float

gap_continuation

An attribute to store the gap continuation cost.

Type

float

sim_func

An attribute to store the similarity function.

Type

function

get_gap_continuation()[source]

Get gap continuation cost.

Returns

gap continuation cost (float).

get_gap_start()[source]

Get gap start cost.

Returns

gap start cost (float).

get_raw_score(string1, string2)[source]

Computes the affine gap score between two strings. This score can be outside the range [0,1].
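The arithmetic below sketches why the raw score is not confined to [0,1]. It assumes a gap of length k costs gap_start + (k - 1) * gap_continuation, which is consistent with the documented example outputs; `gap_penalty` is a hypothetical helper, not a library function.

```python
def gap_penalty(length, gap_start=1.0, gap_continuation=0.5):
    """Hypothetical helper: cost of one gap spanning `length` characters."""
    return gap_start + (length - 1) * gap_continuation


# 'dva' vs 'deeva': three matching characters and one gap of length 2 ('ee').
print(3 - gap_penalty(2))                 # 1.5, above 1

# 'dva' vs 'deeve' with gap_start=2: two matches (d, v), one mismatch (a/e)
# scoring 0, and one gap of length 2 ('ee').
print(2 - gap_penalty(2, gap_start=2.0))  # -0.5, below 0
```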

Parameters
  • string1 (str) – First input string.

  • string2 (str) – Second input string.

Returns

Affine gap score between the two input strings (float).

Raises

TypeError – If the inputs are not strings or if one of the inputs is None.

Examples

>>> aff = Affine()
>>> aff.get_raw_score('dva', 'deeva')
1.5
>>> aff = Affine(gap_start=2, gap_continuation=0.5)
>>> aff.get_raw_score('dva', 'deeve')
-0.5
>>> aff = Affine(gap_continuation=0.2, sim_func=lambda s1, s2: (int(1 if s1 == s2 else 0)))
>>> aff.get_raw_score('AAAGAATTCA', 'AAATCA')
4.4

get_sim_func()[source]

Get similarity function.

Returns

similarity function (function).

set_gap_continuation(gap_continuation)[source]

Set gap continuation cost.

Parameters

gap_continuation (float) – Cost for the gap continuation.

set_gap_start(gap_start)[source]

Set gap start cost.

Parameters

gap_start (float) – Cost for the gap at the start.

set_sim_func(sim_func)[source]

Set similarity function.

Parameters

sim_func (function) – Function computing similarity score between two characters, represented as strings.