Labeling
- py_entitymatching.label_table(table, label_column_name, verbose=False)
Label a pandas DataFrame (for supervised learning purposes).
This function labels a DataFrame, typically for supervised learning. It expects the input DataFrame to have the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable) in the catalog. It creates a copy of the input DataFrame, appends the label column at the end, fills the column with 0, opens a GUI in which the user enters labels (0: non-match, 1: match), and returns the labeled DataFrame. The properties of the input DataFrame are also copied to the output DataFrame.
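The non-interactive steps described above (copy the input, append the label column, fill it with 0) can be sketched in plain pandas. This is only an illustration under stated assumptions: `prepare_for_labeling` is a hypothetical helper, not part of py_entitymatching, the column names in the sample table are made up, and the GUI step and the catalog property copy are deliberately left out.

```python
import pandas as pd


def prepare_for_labeling(table, label_column_name):
    """Hypothetical helper mimicking the pre-GUI steps of label_table."""
    # Work on a copy so the caller's DataFrame is left untouched.
    labeled = table.copy()
    # A newly assigned column is appended at the end; default every row to 0 (non-match).
    labeled[label_column_name] = 0
    return labeled


# Made-up candidate set: a key plus foreign keys into the left and right tables.
candidates = pd.DataFrame({
    "_id": [0, 1, 2],
    "ltable_id": ["a1", "a2", "a3"],
    "rtable_id": ["b1", "b2", "b3"],
})

prepared = prepare_for_labeling(candidates, "label")
print(prepared)
```

In the real function the user would then overwrite some of these zeros with 1 in the GUI, and the catalog metadata of the input would be carried over to the result.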
- Parameters
table (DataFrame) – The input DataFrame to be labeled. Specifically, a DataFrame containing the metadata of a candidate set (such as key, fk_ltable, fk_rtable, ltable, rtable) in the catalog.
label_column_name (string) – The column name to be given for the labels entered by the user.
verbose (boolean) – A flag to indicate whether more detailed information about the execution steps should be printed out (default value is False).
- Returns
A new DataFrame with the labels entered by the user. The output DataFrame’s properties are set to be the same as those of the input DataFrame.
- Raises
AssertionError – If table is not of type pandas DataFrame.
AssertionError – If label_column_name is not of type string.
AssertionError – If the label_column_name is already present in the input table.
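Because labeling is interactive, it can help to verify the three documented conditions before the GUI opens. The sketch below is an assumption about how a caller might do that: `check_label_inputs` is a hypothetical helper written for this example, and it mirrors the documented conditions rather than reproducing the library's internal checks.

```python
import pandas as pd


def check_label_inputs(table, label_column_name):
    """Hypothetical pre-check mirroring the documented AssertionError cases."""
    assert isinstance(table, pd.DataFrame), \
        "table must be a pandas DataFrame"
    assert isinstance(label_column_name, str), \
        "label_column_name must be a string"
    assert label_column_name not in table.columns, \
        "label_column_name is already a column of the input table"


# Made-up sample table for illustration.
sample = pd.DataFrame({"_id": [0, 1], "ltable_id": ["a1", "a2"], "rtable_id": ["b1", "b2"]})

check_label_inputs(sample, "label")  # passes silently

try:
    check_label_inputs(sample, "_id")  # "_id" already exists in the table
except AssertionError as err:
    print("rejected:", err)
```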
Examples
>>> import py_entitymatching as em
>>> G = em.label_table(S, label_column_name='label') # S is the (sampled) table that has to be labeled.