Evaluating the Matching Output
py_entitymatching.eval_matches(data_frame, gold_label_attr, predicted_label_attr)
- Evaluates the matches produced by a matcher. Given a DataFrame containing gold labels and predicted labels, this function compares the two and returns accuracy measures such as precision, recall, and F1.
- Parameters:
  - data_frame (DataFrame) – The input pandas DataFrame containing the “gold” labels and the “predicted” labels.
  - gold_label_attr (string) – The attribute in the input DataFrame that contains the “gold” labels.
  - predicted_label_attr (string) – The attribute in the input DataFrame that contains the “predicted” labels.
- Returns:
  - A Python dictionary containing accuracy measures such as precision, recall, and F1.
- Raises:
  - AssertionError – If data_frame is not of type pandas DataFrame.
  - AssertionError – If gold_label_attr is not of type string.
  - AssertionError – If predicted_label_attr is not of type string.
  - AssertionError – If gold_label_attr is not in the input DataFrame.
  - AssertionError – If predicted_label_attr is not in the input DataFrame.
- Examples:
  >>> import py_entitymatching as em
  >>> # G is the labeled data used for development purposes, match_f is the feature table
  >>> H = em.extract_feature_vecs(G, feature_table=match_f, attrs_after='gold_labels')
  >>> dt = em.DTMatcher()
  >>> dt.fit(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], target_attr='gold_labels')
  >>> pred_table = dt.predict(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], append=True, target_attr='predicted_labels')
  >>> eval_summary = em.eval_matches(pred_table, 'gold_labels', 'predicted_labels')
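To make the three measures concrete, here is a minimal plain-Python sketch of how precision, recall, and F1 can be computed from a gold label column and a predicted label column. It assumes the labels are 0/1 with 1 meaning "match". It is not py_entitymatching's own implementation, and `evaluate_labels` and the keys it returns are hypothetical names chosen for illustration.

```python
from typing import Dict, Sequence


def evaluate_labels(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall and F1 from parallel 0/1 label sequences.

    Illustrative sketch only: assumes 1 means "match" and 0 means "non-match".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    # Guard against division by zero when nothing is predicted or labeled as a match.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "false_positives": fp, "false_negatives": fn}


result = evaluate_labels([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(round(result["f1"], 4))  # 0.6667
```

With gold labels [1, 1, 0, 0, 1] and predictions [1, 0, 1, 0, 1] there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3.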
py_entitymatching.print_eval_summary(eval_summary)
- Prints a summary of the evaluation results.
- Parameters:
  - eval_summary (dictionary) – A dictionary containing evaluation results, typically the output of the eval_matches function.
- Examples:
  >>> import py_entitymatching as em
  >>> # G is the labeled data used for development purposes, match_f is the feature table
  >>> H = em.extract_feature_vecs(G, feature_table=match_f, attrs_after='gold_labels')
  >>> dt = em.DTMatcher()
  >>> dt.fit(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], target_attr='gold_labels')
  >>> pred_table = dt.predict(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], append=True, target_attr='predicted_labels')
  >>> eval_summary = em.eval_matches(pred_table, 'gold_labels', 'predicted_labels')
  >>> em.print_eval_summary(eval_summary)
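As a rough illustration of what printing a summary involves, the sketch below renders an evaluation dictionary as a short text report. It is hypothetical: the key names (`precision`, `recall`, `f1`, `false_positives`, `false_negatives`), the helper `format_eval_summary`, and the layout are assumptions, and the output is not meant to match print_eval_summary's actual format.

```python
from typing import Mapping, Union

Number = Union[int, float]


def format_eval_summary(summary: Mapping[str, Number]) -> str:
    """Render an evaluation dictionary as a short text report.

    Hypothetical sketch: the key names are assumptions, and the layout does not
    reproduce print_eval_summary's real output.
    """
    lines = [
        # Ratios in [0, 1] are shown as percentages with two decimals.
        f"Precision : {summary['precision']:.2%}",
        f"Recall    : {summary['recall']:.2%}",
        f"F1        : {summary['f1']:.2%}",
        # Counts are shown as plain integers.
        f"False positives : {summary['false_positives']}",
        f"False negatives : {summary['false_negatives']}",
    ]
    return "\n".join(lines)


example = {"precision": 2 / 3, "recall": 2 / 3, "f1": 2 / 3,
           "false_positives": 1, "false_negatives": 1}
print(format_eval_summary(example))
```

Keeping formatting in a function that returns a string, rather than printing directly, makes the report easy to test or log.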
py_entitymatching.get_false_positives_as_df(table, eval_summary, verbose=False)
- Selects the false positives from the input table, based on the evaluation results, and returns them as a DataFrame.
- Parameters:
  - table (DataFrame) – The input table (pandas DataFrame) that was used for evaluation.
  - eval_summary (dictionary) – A Python dictionary containing evaluation results, typically the output of the eval_matches function.
- Returns:
  - A pandas DataFrame containing only the false positives from the input table.
  - The output DataFrame's properties are set to match those of the input DataFrame.
- Examples:
  >>> import py_entitymatching as em
  >>> # G is the labeled data used for development purposes, match_f is the feature table
  >>> H = em.extract_feature_vecs(G, feature_table=match_f, attrs_after='gold_labels')
  >>> dt = em.DTMatcher()
  >>> dt.fit(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], target_attr='gold_labels')
  >>> pred_table = dt.predict(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], append=True, target_attr='predicted_labels')
  >>> eval_summary = em.eval_matches(pred_table, 'gold_labels', 'predicted_labels')
  >>> false_pos_df = em.get_false_positives_as_df(H, eval_summary)
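Conceptually, a false positive is a pair that the matcher predicted as a match (1) but whose gold label is a non-match (0). The plain-Python sketch below shows that filter on a table modelled as a list of dicts, one per tuple pair. It is an illustration under that assumption, not how get_false_positives_as_df is implemented, and `false_positive_rows` is a hypothetical helper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def false_positive_rows(rows: List[Row], gold_attr: str, pred_attr: str) -> List[Row]:
    """Return rows predicted as matches (1) whose gold label is a non-match (0).

    Hypothetical helper: the "table" is a list of dicts, one per tuple pair.
    """
    return [row for row in rows if row[pred_attr] == 1 and row[gold_attr] == 0]


table = [
    {"_id": 0, "ltable_id": "a1", "rtable_id": "b1", "gold_labels": 1, "predicted_labels": 1},
    {"_id": 1, "ltable_id": "a2", "rtable_id": "b2", "gold_labels": 0, "predicted_labels": 1},
    {"_id": 2, "ltable_id": "a3", "rtable_id": "b3", "gold_labels": 1, "predicted_labels": 0},
]

fps = false_positive_rows(table, "gold_labels", "predicted_labels")
print([row["_id"] for row in fps])  # [1]
```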
py_entitymatching.get_false_negatives_as_df(table, eval_summary, verbose=False)
- Selects the false negatives from the input table, based on the evaluation results, and returns them as a DataFrame.
- Parameters:
  - table (DataFrame) – The input table (pandas DataFrame) that was used for evaluation.
  - eval_summary (dictionary) – A Python dictionary containing evaluation results, typically the output of the eval_matches function.
- Returns:
  - A pandas DataFrame containing only the false negatives from the input table.
  - The output DataFrame's properties are set to match those of the input DataFrame.
- Examples:
  >>> import py_entitymatching as em
  >>> # G is the labeled data used for development purposes, match_f is the feature table
  >>> H = em.extract_feature_vecs(G, feature_table=match_f, attrs_after='gold_labels')
  >>> dt = em.DTMatcher()
  >>> dt.fit(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], target_attr='gold_labels')
  >>> pred_table = dt.predict(table=H, exclude_attrs=['_id', 'ltable_id', 'rtable_id', 'gold_labels'], append=True, target_attr='predicted_labels')
  >>> eval_summary = em.eval_matches(pred_table, 'gold_labels', 'predicted_labels')
  >>> false_neg_df = em.get_false_negatives_as_df(H, eval_summary)
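Since get_false_negatives_as_df takes the evaluation results rather than label attribute names, one way to picture it is as a lookup of the (ltable_id, rtable_id) pairs recorded in the summary. The sketch below assumes a hypothetical summary key, `false_neg_pairs`, and a table modelled as a list of dicts; neither reflects the library's internal representation, and `rows_for_pairs` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def rows_for_pairs(rows: List[Row], pairs: Iterable[Tuple[Any, Any]],
                   l_key: str = "ltable_id", r_key: str = "rtable_id") -> List[Row]:
    """Return the rows whose (left id, right id) pair appears in `pairs`.

    Rows keep their original table order; the column names default to the
    ltable_id/rtable_id attributes used in the examples.
    """
    wanted = set(pairs)  # set membership makes each lookup O(1) on average
    return [row for row in rows if (row[l_key], row[r_key]) in wanted]


# Hypothetical evaluation summary: the "false_neg_pairs" key is an assumption.
summary = {"false_neg_pairs": [("a3", "b3")]}

table = [
    {"_id": 0, "ltable_id": "a1", "rtable_id": "b1", "gold_labels": 1, "predicted_labels": 1},
    {"_id": 1, "ltable_id": "a2", "rtable_id": "b2", "gold_labels": 0, "predicted_labels": 1},
    {"_id": 2, "ltable_id": "a3", "rtable_id": "b3", "gold_labels": 1, "predicted_labels": 0},
]

false_negatives = rows_for_pairs(table, summary["false_neg_pairs"])
print([row["_id"] for row in false_negatives])  # [2]
```

Collecting the pairs into a set keeps the selection linear in the size of the table, regardless of how many pairs the summary records.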