Labeling
py_entitymatching.label_table(table, label_column_name, verbose=False)
Label a pandas DataFrame (for supervised learning purposes).
This function labels a DataFrame, typically for supervised learning purposes. It expects the input DataFrame to have the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable). It creates a copy of the input DataFrame, adds the label column at the end of the copy, fills the column with 0, launches a GUI in which the user enters labels (0: non-match, 1: match), and finally returns the labeled DataFrame. The function also copies the properties of the input DataFrame to the output DataFrame.
Parameters:
- table (DataFrame) – The input DataFrame to be labeled. Specifically, a DataFrame whose candidate-set metadata (such as key, fk_ltable, fk_rtable, ltable, rtable) is in the catalog.
- label_column_name (string) – The name of the column that will hold the labels entered by the user.
- verbose (boolean) – A flag indicating whether more detailed information about the execution steps should be printed (default: False).
Returns: A new DataFrame with the labels entered by the user. The output DataFrame's properties are set to those of the input DataFrame.
Raises:
- AssertionError – If table is not of type pandas DataFrame.
- AssertionError – If label_column_name is not of type string.
- AssertionError – If the label_column_name is already present in the input table.
Examples
>>> import py_entitymatching as em
>>> G = em.label_table(S, label_column_name='label')  # S is the (sampled) table that has to be labeled.
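The non-interactive steps described above (argument checks, copying the input, appending a zero-filled label column) can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `init_label_column` and the toy column names are hypothetical, and the GUI step and the catalog property copy are omitted.

```python
import pandas as pd


def init_label_column(table, label_column_name):
    """Hypothetical helper mirroring the documented pre-GUI steps of label_table."""
    # The three documented AssertionError conditions.
    assert isinstance(table, pd.DataFrame), "table is not a pandas DataFrame"
    assert isinstance(label_column_name, str), "label_column_name is not a string"
    assert label_column_name not in table.columns, "label column already present"

    labeled = table.copy()          # work on a copy; the input stays untouched
    labeled[label_column_name] = 0  # new column lands at the end, default 0 (non-match)
    return labeled


# Toy candidate set; these column names are illustrative only.
S = pd.DataFrame({"_id": [0, 1], "ltable_id": ["a1", "a2"], "rtable_id": ["b1", "b7"]})
G = init_label_column(S, "label")
print(list(G.columns))      # ['_id', 'ltable_id', 'rtable_id', 'label']
print(G["label"].tolist())  # [0, 0]
```

In the real function, the user would then overwrite these default 0 values through the GUI before the DataFrame is returned.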
py_entitymatching.new_label_table(df, label_column_name)
Method to be invoked to launch the Labeler application.
Parameters:
- df (DataFrame) – A DataFrame containing the tuple pairs that are possible matches.
- label_column_name (str) – The name of the column used to save the tuple pair labels. This column is created if it does not already exist.
Returns: The updated DataFrame with the label column, comments, and tags.
Raises:
- AssertionError – If df is not of type pandas DataFrame.
- AssertionError – If label_column_name is not of type string.
- ImportError – If the Python version is lower than 3.5.
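The documented preconditions of new_label_table can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the helper name `ensure_label_column` is hypothetical, the order of the checks is a guess, working on a copy is an assumption, and the initial value of a newly created label column is not stated in this reference, so `None` is used purely as a placeholder.

```python
import sys

import pandas as pd


def ensure_label_column(df, label_column_name):
    """Hypothetical helper mirroring the documented checks of new_label_table."""
    # Documented ImportError condition: Python older than 3.5.
    if sys.version_info < (3, 5):
        raise ImportError("the Labeler application requires Python 3.5 or newer")

    # Documented AssertionError conditions.
    assert isinstance(df, pd.DataFrame), "df is not a pandas DataFrame"
    assert isinstance(label_column_name, str), "label_column_name is not a string"

    out = df.copy()  # assumption: the sketch does not modify the caller's DataFrame
    if label_column_name not in out.columns:
        out[label_column_name] = None  # placeholder; the real initial value is not documented here
    return out


# Toy tuple pairs; column names are illustrative only.
pairs = pd.DataFrame({"ltable_id": ["a1", "a2"], "rtable_id": ["b1", "b7"]})
with_new_column = ensure_label_column(pairs, "gold")

# Unlike label_table, an existing label column is accepted and kept as-is.
already_labeled = pairs.assign(gold=[1, 0])
kept = ensure_label_column(already_labeled, "gold")
```

The main contrast with label_table is that an existing label column is not an error here, which suits resuming a labeling session on a partially labeled table.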