Matcher Combiner
- class py_entitymatching.matchercombiner.matchercombiner.MajorityVote
THIS CLASS IS EXPERIMENTAL AND NOT TESTED. USE AT YOUR OWN RISK.
The goal of this combiner is to combine a list of predictions from multiple matchers to produce a consolidated prediction. In majority voting-based combining, the prediction that occurs most often is returned as the consolidated prediction. If there is no clear winning prediction (for example, 0 and 1 occur an equal number of times), then 0 is returned.
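The voting rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `majority_vote` is hypothetical, predictions are assumed to be binary (0 or 1), and each row is assumed to hold one prediction per matcher.

```python
def majority_vote(rows):
    """Hypothetical sketch: consolidate binary predictions row by row.

    rows: iterable of sequences, one 0/1 prediction per matcher.
    Returns a list with one consolidated prediction per row.
    """
    consolidated = []
    for row in rows:
        ones = sum(1 for p in row if p == 1)   # matchers voting "match"
        zeros = len(row) - ones                # matchers voting "no-match"
        # 1 wins only with a strict majority; a tie falls back to 0.
        consolidated.append(1 if ones > zeros else 0)
    return consolidated


# Three matchers, four candidate pairs.
rows = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(majority_vote(rows))  # [1, 0, 1, 0]
```

With a DataFrame of prediction columns, the rows could be obtained with `df.values.tolist()` before calling the sketch.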
Implementation-wise, the intent is that a combiner command takes an object of this class as a parameter and uses that object to combine the predictions.
- combine(predictions)
Combine a list of predictions from matchers using majority voting.
- Parameters
predictions (DataFrame) – A table containing predictions from multiple matchers.
- Returns
A list of consolidated predictions.
Examples
>>> from py_entitymatching import DTMatcher, RFMatcher, NBMatcher
>>> from py_entitymatching.matchercombiner.matchercombiner import MajorityVote
>>> dt = DTMatcher()
>>> rf = RFMatcher()
>>> nb = NBMatcher()
>>> # H is the training set containing feature vectors.
>>> dt.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> # L is the test set for which we should get predictions.
>>> dt.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='dt_predictions')
>>> rf.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> rf.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='rf_predictions')
>>> nb.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> nb.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='nb_predictions')
>>> mv_combiner = MajorityVote()
>>> L['consol_predictions'] = mv_combiner.combine(L[['dt_predictions', 'rf_predictions', 'nb_predictions']])
- class py_entitymatching.matchercombiner.matchercombiner.WeightedVote(weights=None, threshold=None)
THIS CLASS IS EXPERIMENTAL AND NOT TESTED. USE AT YOUR OWN RISK.
The goal of this combiner is to combine a list of predictions from multiple matchers to produce a consolidated prediction. In weighted voting-based combining, each prediction is given a weight. A weighted sum of the predictions is computed and compared to a threshold. If the sum is greater than or equal to the threshold, the consolidated prediction is a match (i.e., 1); otherwise it is a no-match (i.e., 0).
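The weighted rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `weighted_vote` is hypothetical, predictions are assumed to be binary (0 or 1), one weight is assumed per matcher, and both `weights` and `threshold` must be given explicitly because the behavior of the `None` defaults is not described here.

```python
def weighted_vote(rows, weights, threshold):
    """Hypothetical sketch: consolidate binary predictions by weighted sum.

    rows: iterable of sequences, one 0/1 prediction per matcher.
    weights: one weight per matcher, in the same order as the predictions.
    threshold: minimum weighted sum for a row to count as a match.
    """
    consolidated = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError("each row needs exactly one prediction per weight")
        score = sum(w * p for w, p in zip(weights, row))
        # A score equal to the threshold still counts as a match.
        consolidated.append(1 if score >= threshold else 0)
    return consolidated


# Same weights and threshold as in the WeightedVote example.
rows = [
    [1, 1, 0],  # weighted sum about 0.3
    [0, 0, 0],  # weighted sum 0.0
    [1, 0, 0],  # weighted sum 0.1
    [0, 1, 0],  # weighted sum 0.2, equal to the threshold
]
print(weighted_vote(rows, weights=[0.1, 0.2, 0.1], threshold=0.2))  # [1, 0, 0, 1]
```

Because the comparison uses floating-point arithmetic, sums that land very close to the threshold may be sensitive to rounding; choosing a threshold that is not exactly reachable by a sum of weights avoids that ambiguity.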
Implementation-wise, the intent is that a combiner command takes an object of this class as a parameter and uses that object to combine the predictions.
- combine(predictions)
Combine a list of predictions from matchers using weighted voting.
- Parameters
predictions (DataFrame) – A table containing predictions from multiple matchers.
- Returns
A list of consolidated predictions.
Examples
>>> from py_entitymatching import DTMatcher, RFMatcher, NBMatcher
>>> from py_entitymatching.matchercombiner.matchercombiner import WeightedVote
>>> dt = DTMatcher()
>>> rf = RFMatcher()
>>> nb = NBMatcher()
>>> # H is the training set containing feature vectors.
>>> dt.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> # L is the test set for which we should get predictions.
>>> dt.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='dt_predictions')
>>> rf.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> rf.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='rf_predictions')
>>> nb.fit(table=H, exclude_attrs=['_id', 'l_id', 'r_id'], target_attr='label')
>>> nb.predict(table=L, exclude_attrs=['_id', 'l_id', 'r_id'], append=True, inplace=True, target_attr='nb_predictions')
>>> wv_combiner = WeightedVote(weights=[0.1, 0.2, 0.1], threshold=0.2)
>>> L['consol_predictions'] = wv_combiner.combine(L[['dt_predictions', 'rf_predictions', 'nb_predictions']])