Tversky Index

Tversky index similarity measure

class py_stringmatching.similarity_measure.tversky_index.TverskyIndex(alpha=0.5, beta=0.5)

Tversky index similarity measure class.

Parameters:	alpha, beta (float) – Tversky index parameters (each defaults to 0.5).

get_alpha()

Get the current value of alpha.

Returns:	alpha (float)

get_beta()

Get the current value of beta.

Returns:	beta (float)

get_raw_score(set1, set2)

Computes the Tversky index similarity between two sets.

The Tversky index is an asymmetric similarity measure on sets that compares a variant to a prototype. It generalizes Dice’s coefficient (alpha = beta = 0.5) and the Tanimoto coefficient (alpha = beta = 1).

For sets X and Y, the Tversky index is a number between 0 and 1 given by: \(tversky\_index(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha |X-Y| + \beta |Y-X|}\) where \(\alpha, \beta \geq 0\).

Parameters:	set1,set2 (set or list) – Input sets (or lists). Input lists are converted to sets.
Returns:	Tversky index similarity (float)
Raises:	`TypeError` – If the inputs are not sets (or lists) or if one of the inputs is None.

Examples

>>> tvi = TverskyIndex()
>>> tvi.get_raw_score(['data', 'science'], ['data'])
0.6666666666666666
>>> tvi.get_raw_score(['data', 'management'], ['data', 'data', 'science'])
0.5
>>> tvi.get_raw_score({1, 1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 7, 8})
0.5454545454545454
>>> tvi = TverskyIndex(0.5, 0.5)
>>> tvi.get_raw_score({1, 1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 7, 8})
0.5454545454545454
>>> tvi = TverskyIndex(beta=0.5)
>>> tvi.get_raw_score(['data', 'management'], ['data', 'data', 'science'])
0.5
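
The formula given for get_raw_score can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `tversky_index` and the choice to return 0.0 when the denominator is zero are assumptions made for this sketch. It also checks the two special cases, Dice (alpha = beta = 0.5) and Tanimoto/Jaccard (alpha = beta = 1).

```python
def tversky_index(set1, set2, alpha=0.5, beta=0.5):
    """Illustrative Tversky index (hypothetical helper, not library code)."""
    # Lists are converted to sets, so duplicate tokens collapse.
    x, y = set(set1), set(set2)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption of this sketch: a zero denominator yields 0.0.
    return common / denom if denom else 0.0


# Same inputs as the first documented example above.
print(tversky_index(['data', 'science'], ['data']))

# Special cases, compared against directly computed coefficients.
x, y = {1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 8}
dice = 2 * len(x & y) / (len(x) + len(y))
jaccard = len(x & y) / len(x | y)
print(abs(tversky_index(x, y, 0.5, 0.5) - dice) < 1e-12)
print(abs(tversky_index(x, y, 1, 1) - jaccard) < 1e-12)
```

With the default parameters the sketch reduces to Dice's coefficient, which is why the documented examples behave like a Dice score on the deduplicated inputs.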

get_sim_score(set1, set2)

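Computes the normalized Tversky index similarity between two sets. The raw Tversky index already lies between 0 and 1, and the examples below return the same values as get_raw_score.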

Parameters:	set1,set2 (set or list) – Input sets (or lists). Input lists are converted to sets.
Returns:	Normalized Tversky index similarity (float)
Raises:	`TypeError` – If the inputs are not sets (or lists) or if one of the inputs is None.

Examples

>>> tvi = TverskyIndex()
>>> tvi.get_sim_score(['data', 'science'], ['data'])
0.6666666666666666
>>> tvi.get_sim_score(['data', 'management'], ['data', 'data', 'science'])
0.5
>>> tvi.get_sim_score({1, 1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 7, 8})
0.5454545454545454
>>> tvi = TverskyIndex(0.5, 0.5)
>>> tvi.get_sim_score({1, 1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 7, 8})
0.5454545454545454
>>> tvi = TverskyIndex(beta=0.5)
>>> tvi.get_sim_score(['data', 'management'], ['data', 'data', 'science'])
0.5

set_alpha(alpha)

Set alpha

Parameters:	alpha (float) – Tversky index parameter: the non-negative weight on |X-Y| in the formula above.

set_beta(beta)

Set beta

Parameters:	beta (float) – Tversky index parameter: the non-negative weight on |Y-X| in the formula above.
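
Changing alpha and beta changes how each set's unmatched elements are penalized. The sketch below, in plain Python, shows that equal weights give a symmetric score while unequal weights make the result depend on argument order. It is not the library's code: `tversky` is a hypothetical helper, and returning 0.0 for a zero denominator is an assumption of this sketch.

```python
def tversky(set1, set2, alpha, beta):
    """Illustrative Tversky index (hypothetical helper, not library code)."""
    x, y = set(set1), set(set2)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption of this sketch: a zero denominator yields 0.0.
    return common / denom if denom else 0.0


small = {1, 2, 3, 4}
large = {2, 3, 4, 5, 6, 7, 8}

# Equal weights: swapping the arguments does not change the score.
symmetric = tversky(small, large, 0.5, 0.5) == tversky(large, small, 0.5, 0.5)

# alpha=1, beta=0: only elements missing from the second set are penalized.
forward = tversky(small, large, alpha=1.0, beta=0.0)   # 3 / (3 + 1)
backward = tversky(large, small, alpha=1.0, beta=0.0)  # 3 / (3 + 4)

print(symmetric, forward, backward)
```

With alpha = 1 and beta = 0, the score is the fraction of the first set that also appears in the second, so a small set contained mostly in a large one scores higher than the reverse comparison.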