Labeling

py_entitymatching.label_table(table, label_column_name, verbose=False)

Label a pandas DataFrame (for supervised learning purposes).

This function labels a DataFrame, typically for supervised learning purposes. It expects the input DataFrame to have the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable) in the catalog. It creates a copy of the input DataFrame, adds the label column at the end of the copy, fills the column with 0, invokes a GUI for the user to enter labels (0: non-match, 1: match), and finally returns the labeled DataFrame. It also copies the properties of the input DataFrame to the output DataFrame.
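The steps that precede the GUI can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name prepare_for_labeling and the toy column names are hypothetical, and the catalog metadata handling and the GUI itself are left out.

```python
import pandas as pd


def prepare_for_labeling(table, label_column_name):
    """Hypothetical helper that mimics the pre-GUI steps described above."""
    # Work on a copy so the input DataFrame is left untouched.
    labeled = table.copy()
    # Assigning a new column appends it at the end; every row starts at 0 (non-match).
    labeled[label_column_name] = 0
    return labeled


# Toy candidate set; the column names are illustrative only.
S = pd.DataFrame({
    "_id": [0, 1, 2],
    "ltable_id": ["a1", "a2", "a3"],
    "rtable_id": ["b1", "b5", "b7"],
})
G = prepare_for_labeling(S, "label")
```

In the real function, the user would then overwrite some of these 0 values with 1 through the GUI before the DataFrame is returned.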

Parameters:
  • table (DataFrame) – The input DataFrame to be labeled. Specifically, a DataFrame containing the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable) in the catalog.
  • label_column_name (string) – The column name to be given for the labels entered by the user.
  • verbose (boolean) – A flag to indicate whether more detailed information about the execution steps should be printed out (default value is False).
Returns:

A new DataFrame with the labels entered by the user. The output DataFrame has the same properties as the input DataFrame.

Raises:
  • AssertionError – If table is not of type pandas DataFrame.
  • AssertionError – If label_column_name is not of type string.
  • AssertionError – If label_column_name is already present in the input table.
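The three conditions listed above can be checked ahead of time, before any GUI is launched. The sketch below is an assumption about how such checks could look, not the library's own code; the helper name check_label_table_inputs and its messages are hypothetical.

```python
import pandas as pd


def check_label_table_inputs(table, label_column_name):
    """Hypothetical pre-check mirroring the documented AssertionError cases."""
    assert isinstance(table, pd.DataFrame), \
        "table is not of type pandas DataFrame"
    assert isinstance(label_column_name, str), \
        "label_column_name is not of type string"
    assert label_column_name not in table.columns, \
        "label_column_name is already present in the input table"


# A table that already has a 'label' column violates the third condition.
already_labeled = pd.DataFrame({"_id": [0, 1], "label": [0, 1]})
try:
    check_label_table_inputs(already_labeled, "label")
    rejected = False
except AssertionError:
    rejected = True
```

Running the check first lets a script fail early instead of after the labeling window has been opened.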

Examples

>>> import py_entitymatching as em
>>> G = em.label_table(S, label_column_name='label') # S is the (sampled) table that has to be labeled.

py_entitymatching.new_label_table(df, label_column_name)

Method to be invoked to launch the Labeler application.

Parameters:
  • df (DataFrame) – A DataFrame containing the tuple pairs that are possible matches.
  • label_column_name (str) – Name of column to be used to save tuple pair labels. This column will be created if it doesn’t already exist.
Returns:

The updated DataFrame with the label column, comments, and tags.

Raises:
  • AssertionError – If df is not of type pandas DataFrame.
  • AssertionError – If label_column_name is not of type string.
  • ImportError – If the Python version is lower than 3.5.
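
Because of the version requirement, a script that must also run on older interpreters may want to test the Python version before calling new_label_table. The following is a minimal sketch using only the standard library; the helper name supports_new_labeler is hypothetical, and the library's own check may be implemented differently.

```python
import sys


def supports_new_labeler(version_info=None):
    """Hypothetical guard: True when the interpreter is Python 3.5 or newer."""
    # Default to the running interpreter; a tuple can be passed in for testing.
    v = sys.version_info if version_info is None else version_info
    # Compare only (major, minor) against the documented minimum of 3.5.
    return tuple(v[:2]) >= (3, 5)


can_use_new_labeler = supports_new_labeler()
```

A caller could, for example, fall back to label_table when this returns False.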