Triggers

class py_entitymatching.MatchTrigger
add_action(value)

Adds an action to the match trigger. If the result of a rule is the same value as the condition status, then the action will be carried out. The condition status can be added with the function add_cond_status.

Parameters

value (integer) – The action. Currently only the values 0 and 1 are supported.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> mt.add_cond_rule(rule, match_f)
>>> mt.add_cond_status(True)
>>> mt.add_action(1)
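
The relationship between the rule result, the condition status and the action can be modelled in a few lines of plain Python. This is a minimal sketch, not the library's implementation: `apply_trigger` and its parameters are hypothetical names, and it assumes that the action value replaces the predicted label when the trigger fires.

```python
def apply_trigger(rule_result, cond_status, action, predicted_label):
    """Return the label after the trigger has been considered.

    Assumption: when the rule result equals the condition status,
    the action value (0 or 1) replaces the predicted label.
    """
    if rule_result == cond_status:
        return action
    return predicted_label


# Condition status True, action 1: the rule holds, so the label becomes 1.
print(apply_trigger(rule_result=True, cond_status=True, action=1, predicted_label=0))
# The rule does not hold, so the original prediction is kept.
print(apply_trigger(rule_result=False, cond_status=True, action=1, predicted_label=0))
```

With a condition status of False the same logic fires on pairs for which the rule does not hold.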
add_cond_rule(conjunct_list, feature_table, rule_name=None)

Adds a rule to the match trigger.

Parameters
  • conjunct_list (list) – A list of conjuncts specifying the rule.

  • feature_table (DataFrame) – A DataFrame containing all the features referenced by the rule. If feature_table is None, a feature table must already have been set with the set_feature_table function. Otherwise an AssertionError will be raised and the rule will not be added to the match trigger.

  • rule_name (string) – A string specifying the name of the rule to be added (defaults to None). If rule_name is not specified, a name is chosen automatically. If a rule with the specified rule_name already exists, an AssertionError will be raised and the rule will not be added to the match trigger.

Returns

The name of the rule added (string).

Raises
  • AssertionError – If rule_name already exists.

  • AssertionError – If feature_table is None and no feature table has been set with set_feature_table.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> mt.add_cond_rule(rule, match_f)
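
A rule is a conjunction: it holds only when every conjunct in the list holds. The sketch below illustrates that idea with plain callables instead of the conjunct strings the library takes. The names `title_sim`, `year_equal` and `make_rule` are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for a Levenshtein similarity feature.

```python
from difflib import SequenceMatcher


def title_sim(ltuple, rtuple):
    # Stand-in similarity in [0, 1]; not the library's Levenshtein feature.
    return SequenceMatcher(None, ltuple["title"], rtuple["title"]).ratio()


def year_equal(ltuple, rtuple):
    return ltuple["year"] == rtuple["year"]


def make_rule(conjuncts):
    """Combine predicates so the rule holds only if every conjunct holds."""
    def rule(ltuple, rtuple):
        return all(conjunct(ltuple, rtuple) for conjunct in conjuncts)
    return rule


rule = make_rule([
    lambda lt, rt: title_sim(lt, rt) > 0.7,
    year_equal,
])

a = {"title": "Entity Matching", "year": 2016}
b = {"title": "Entity Matching", "year": 2016}
c = {"title": "Entity Matching", "year": 2017}

print(rule(a, b))  # both conjuncts hold
print(rule(a, c))  # the year conjunct fails, so the whole rule fails
```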
add_cond_status(status)

Adds a condition status to the match trigger. If the result of a rule is the same value as the condition status, then the action will be carried out. The action can be added with the function add_action.

Parameters

status (boolean) – The condition status.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> mt.add_cond_rule(rule, match_f)
>>> mt.add_cond_status(True)
>>> mt.add_action(1)
delete_rule(rule_name)

Deletes a rule from the match trigger.

Parameters

rule_name (string) – Name of the rule to be deleted.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> # Name the rule explicitly so that it can be referenced later
>>> mt.add_cond_rule(rule, match_f, rule_name='rule_1')
>>> mt.delete_rule('rule_1')
execute(input_table, label_column, inplace=True, verbose=False)

Executes the rules of the match trigger for a table of matcher results.

Parameters
  • input_table (DataFrame) – The input table of type pandas DataFrame containing tuple pairs and labels from matching.

  • label_column (string) – The name of the attribute in the input table where the predictions are stored.

  • inplace (boolean) – A flag to indicate whether the predictions should be updated in place in the input table (defaults to True).

  • verbose (boolean) – A flag to indicate whether the debug information should be logged (defaults to False).

Returns

A DataFrame with predictions updated.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> mt.add_cond_rule(rule, match_f)
>>> mt.add_cond_status(True)
>>> mt.add_action(1)
>>> # The table H is a table with prediction labels generated from matching
>>> mt.execute(input_table=H, label_column='predicted_labels', inplace=False)
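
The effect of the inplace flag can be sketched without the library. In the following model a list of row dictionaries stands in for the DataFrame, and `execute_trigger` and `same_title` are hypothetical names. It assumes the usual meaning of the flag: with inplace=False the input is left untouched and an updated copy is returned.

```python
import copy


def execute_trigger(rows, label_column, rule, cond_status, action, inplace=True):
    """Apply a trigger to a list of row dicts (a stand-in for a DataFrame)."""
    table = rows if inplace else copy.deepcopy(rows)
    for row in table:
        # Overwrite the prediction only where the trigger fires.
        if rule(row) == cond_status:
            row[label_column] = action
    return table


def same_title(row):
    return row["ltitle"] == row["rtitle"]


H = [
    {"ltitle": "a", "rtitle": "a", "predicted_labels": 0},
    {"ltitle": "a", "rtitle": "b", "predicted_labels": 0},
]

updated = execute_trigger(H, "predicted_labels", same_title, True, 1, inplace=False)
print([row["predicted_labels"] for row in updated])  # labels in the returned copy
print([row["predicted_labels"] for row in H])        # original rows are unchanged
```

Passing inplace=True in this model returns the same list object that was passed in, with its labels updated.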
get_rule(rule_name)

Returns the function corresponding to a rule.

Parameters

rule_name (string) – Name of the rule.

Returns

A function object corresponding to the specified rule.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> # Name the rule explicitly so that it can be looked up by name
>>> mt.add_cond_rule(rule, match_f, rule_name='rule_1')
>>> mt.get_rule('rule_1')
get_rule_names()

Returns the names of all the rules in the match trigger.

Returns

A list of names of all the rules in the match trigger (list).

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> mt.add_cond_rule(rule, match_f)
>>> mt.get_rule_names()
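
Adding, naming, listing and deleting rules amounts to maintaining an ordered mapping from rule names to functions. The toy registry below sketches that bookkeeping. `RuleRegistry` is a hypothetical class, and its automatic naming scheme (`auto_rule_0`, `auto_rule_1`, ...) is an assumption made for illustration, not the scheme the library uses.

```python
from collections import OrderedDict


class RuleRegistry:
    """Toy rule store; the auto-naming scheme here is hypothetical."""

    def __init__(self):
        self._rules = OrderedDict()
        self._counter = 0

    def add(self, fn, rule_name=None):
        if rule_name is None:
            rule_name = "auto_rule_%d" % self._counter
            self._counter += 1
        # Mirror the documented behaviour: duplicate names are rejected.
        if rule_name in self._rules:
            raise AssertionError("A rule named %r already exists" % rule_name)
        self._rules[rule_name] = fn
        return rule_name

    def delete(self, rule_name):
        del self._rules[rule_name]

    def names(self):
        return list(self._rules.keys())


registry = RuleRegistry()
first = registry.add(lambda lt, rt: True)
registry.add(lambda lt, rt: False, rule_name="strict")
print(registry.names())
registry.delete(first)
print(registry.names())
```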
set_feature_table(feature_table)

Sets feature table for the match trigger.

Parameters

feature_table (DataFrame) – A DataFrame containing features.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> mt.set_feature_table(match_f)
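
A feature table set this way acts as a fallback for rules that are added without their own feature table. The sketch below models that lookup. `TriggerSketch` and `resolve_feature_table` are hypothetical names, and the precedence shown (an explicit argument wins over the stored table) is an assumption.

```python
class TriggerSketch:
    """Minimal model of the feature-table fallback; not the library class."""

    def __init__(self):
        self.feature_table = None

    def set_feature_table(self, feature_table):
        self.feature_table = feature_table

    def resolve_feature_table(self, feature_table=None):
        # Prefer the table passed with the rule, else the stored one.
        if feature_table is None:
            feature_table = self.feature_table
        if feature_table is None:
            raise AssertionError("No feature table was supplied or set")
        return feature_table


trigger = TriggerSketch()
trigger.set_feature_table({"title_title_lev_sim": "stored"})
print(trigger.resolve_feature_table())
print(trigger.resolve_feature_table({"title_title_lev_sim": "explicit"}))
```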
view_rule(rule_name)

Prints the source code of the function corresponding to a rule.

Parameters

rule_name (string) – Name of the rule to be viewed.

Examples

>>> import py_entitymatching as em
>>> mt = em.MatchTrigger()
>>> A = em.read_csv_metadata('path_to_csv_dir/table_A.csv', key='id')
>>> B = em.read_csv_metadata('path_to_csv_dir/table_B.csv', key='id')
>>> match_f = em.get_features_for_matching(A, B)
>>> rule = ['title_title_lev_sim(ltuple, rtuple) > 0.7']
>>> # Name the rule explicitly so that it can be viewed by name
>>> mt.add_cond_rule(rule, match_f, rule_name='rule_1')
>>> mt.view_rule('rule_1')
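
Since get_rule returns a function object and view_rule prints the source of that function, one plausible design is to generate the function from a source string and keep the string alongside it. The sketch below shows that approach under stated assumptions: `build_rule` is a hypothetical helper, the conjuncts are plain Python expressions over `ltuple` and `rtuple` rather than feature names, and nothing here is taken from the library's code.

```python
def build_rule(name, conjuncts):
    """Compile a list of conjunct strings into a function and keep its source."""
    body = " and ".join("(%s)" % conjunct for conjunct in conjuncts)
    source = "def %s(ltuple, rtuple):\n    return %s\n" % (name, body)
    namespace = {}
    exec(source, namespace)  # only appropriate for trusted rule strings
    return namespace[name], source


fn, source = build_rule(
    "rule_1",
    ["ltuple['year'] == rtuple['year']", "len(ltuple['title']) > 3"],
)

print(source)  # what a view_rule-style method could print
print(fn({"year": 2016, "title": "Magellan"}, {"year": 2016, "title": "Magellan"}))
```

Keeping the source string avoids having to recover it later, which is not possible with `inspect.getsource` for functions created through `exec`.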