Handling Metadata

py_entitymatching.get_catalog()

Gets the catalog information for the current session.

Returns

A Python dictionary containing the catalog information.

Specifically, the dictionary maps the Python identifier of each DataFrame (obtained by id(DataFrame object)) to that DataFrame's properties.

Examples

>>> import py_entitymatching as em
>>> catalog = em.get_catalog()
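
To make the id()-keyed layout described above concrete, here is a minimal pure-Python sketch of such a dictionary. It is an illustration only, not the py_entitymatching implementation: toy_catalog, toy_set_property, and the Table class are hypothetical names, and a plain class stands in for a pandas DataFrame so the snippet runs without dependencies.

```python
# Hypothetical stand-in for a DataFrame; any Python object has an id().
class Table:
    def __init__(self, name):
        self.name = name


# Assumed layout: {id(object): {property_name: property_value}}
toy_catalog = {}


def toy_set_property(obj, property_name, property_value):
    """Record a property under the object's Python identifier."""
    toy_catalog.setdefault(id(obj), {})[property_name] = property_value
    return True


A = Table("A")
B = Table("B")
toy_set_property(A, "key", "id")

# The dictionary is keyed by identity, so a distinct object has no entry.
print(id(A) in toy_catalog)  # True
print(id(B) in toy_catalog)  # False
print(toy_catalog[id(A)])    # {'key': 'id'}
```

In a layout like this, a copy of an object has a different identifier and therefore does not share the original's entry.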
py_entitymatching.get_catalog_len()

Gets the length (i.e., the number of entries) of the catalog.

Returns

The number of entries in the catalog as an integer.

Examples

>>> import py_entitymatching as em
>>> catalog_len = em.get_catalog_len()
py_entitymatching.del_catalog()

Deletes the catalog for the current session.

Returns

A Boolean value of True is returned if the deletion was successful.

Examples

>>> import py_entitymatching as em
>>> em.del_catalog()
py_entitymatching.is_catalog_empty()

Checks if the catalog is empty.

Returns

A Boolean value of True is returned if the catalog is empty, else returns False.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.is_catalog_empty()
 # False
py_entitymatching.is_dfinfo_present(data_frame)

Checks whether the DataFrame information is present in the catalog.

Parameters

data_frame (DataFrame) – The DataFrame that should be checked for its presence in the catalog.

Returns

A Boolean value of True is returned if the DataFrame is present in the catalog, else False is returned.

Raises

AssertionError – If data_frame is not of type pandas DataFrame.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.is_dfinfo_present(A)
 # True
py_entitymatching.is_property_present_for_df(data_frame, property_name)

Checks if the given property is present for the given DataFrame in the catalog.

Parameters
  • data_frame (DataFrame) – The DataFrame for which the property must be checked.

  • property_name (string) – The name of the property that should be checked for its presence for the DataFrame in the catalog.

Returns

A Boolean value of True is returned if the property is present for the given DataFrame.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If property_name is not of type string.

  • KeyError – If data_frame is not present in the catalog.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.is_property_present_for_df(A, 'key')
 # True
>>> em.is_property_present_for_df(A, 'fk_ltable')
 # False
py_entitymatching.show_properties(data_frame)

Prints the properties for a DataFrame that is present in the catalog.

Parameters

data_frame (DataFrame) – The input pandas DataFrame for which the properties must be displayed.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'key_attr' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'key_attr')
>>> em.show_properties(A)
# id: 4572922488  # This will change dynamically
# key: key_attr
py_entitymatching.show_properties_for_id(object_id)

Shows the properties for an object id present in the catalog.

Specifically, given an object id (typically obtained by calling id(<object>), where the object could be a DataFrame), this function displays the properties present for that object id in the catalog.

Parameters

object_id (int) – The Python identifier of an object (typically a pandas DataFrame).

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'key_attr' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'key_attr')
>>> em.show_properties_for_id(id(A))
# id: 4572922488  # This will change dynamically
# key: key_attr
py_entitymatching.get_property(data_frame, property_name)

Gets the value of a property (with the given property name) for a pandas DataFrame from the catalog.

Parameters
  • data_frame (DataFrame) – The DataFrame for which the property should be retrieved.

  • property_name (string) – The name of the property that should be retrieved.

Returns

A Python object (typically a string or a pandas DataFrame depending on the property name) is returned.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If property_name is not of type string.

  • KeyError – If data_frame information is not present in the catalog.

  • KeyError – If requested property for the data_frame is not present in the catalog.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.get_property(A, 'key')
 # id
py_entitymatching.set_property(data_frame, property_name, property_value)

Sets the value of a property (with the given property name) for a pandas DataFrame in the catalog.

Parameters
  • data_frame (DataFrame) – The DataFrame for which the property must be set.

  • property_name (string) – The name of the property to be set.

  • property_value (object) – The value of the property to be set. This is typically a string (such as key) or pandas DataFrame (such as ltable, rtable).

Returns

A Boolean value of True is returned if the update was successful.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If property_name is not of type string.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_property(A, 'key', 'id')
>>> em.get_property(A, 'key')
 # id
>>> em.get_key(A)
 # id

Note

If the input DataFrame is not present in the catalog, this function will create an entry in the catalog and set the given property.
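
The contrast between set_property (which creates a missing catalog entry) and get_property (which raises KeyError for a missing entry or property) can be sketched with a plain dictionary. This is a hedged illustration, not the library's code: toy_catalog, toy_set_property, and toy_get_property are hypothetical names, and a bare object() stands in for a DataFrame.

```python
toy_catalog = {}  # hypothetical layout: {id(object): {property_name: value}}


def toy_set_property(obj, name, value):
    # Creates the entry on first use, mirroring the note above.
    toy_catalog.setdefault(id(obj), {})[name] = value
    return True


def toy_get_property(obj, name):
    # Both failure modes surface as KeyError, as in the Raises lists above.
    if id(obj) not in toy_catalog:
        raise KeyError("object is not present in the catalog")
    if name not in toy_catalog[id(obj)]:
        raise KeyError("property is not present for the object")
    return toy_catalog[id(obj)][name]


table = object()  # stand-in for a DataFrame

try:
    toy_get_property(table, "key")
except KeyError:
    print("lookup failed before any property was set")

toy_set_property(table, "key", "id")
print(toy_get_property(table, "key"))  # id
```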

py_entitymatching.del_property(data_frame, property_name)

Deletes a property for a pandas DataFrame from the catalog.

Parameters
  • data_frame (DataFrame) – The input DataFrame for which a property must be deleted from the catalog.

  • property_name (string) – The name of the property that should be deleted.

Returns

A Boolean value of True is returned if the deletion was successful.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If property_name is not of type string.

  • KeyError – If data_frame information is not present in the catalog.

  • KeyError – If requested property for the DataFrame is not present in the catalog.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_property(A, 'key', 'id')
>>> em.get_property(A, 'key')
# id
>>> em.del_property(A, 'key')
>>> em.is_property_present_for_df(A, 'key')
# False
py_entitymatching.copy_properties(source_data_frame, target_data_frame, replace=True)

Copies properties from a source DataFrame to target DataFrame in the catalog.

Parameters
  • source_data_frame (DataFrame) – The DataFrame from which the properties are to be copied, in the catalog.

  • target_data_frame (DataFrame) – The DataFrame to which the properties are to be copied, in the catalog.

  • replace (boolean) – A flag indicating whether the source DataFrame’s properties can replace the target DataFrame’s properties in the catalog. The default value is True. Specifically, if the target DataFrame’s information is already present in the catalog, the function checks the replace flag. If the flag is True, the function first deletes the target’s existing properties and then sets them from the source DataFrame’s properties. If the flag is False, the function returns without modifying the existing properties.

Returns

A Boolean value of True is returned if the copying was successful.

Raises
  • AssertionError – If source_data_frame is not of type pandas DataFrame.

  • AssertionError – If target_data_frame is not of type pandas DataFrame.

  • KeyError – If source DataFrame is not present in the catalog.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.copy_properties(A, B)
>>> em.get_key(B)
# 'id'
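
The behavior of the replace flag described for copy_properties can be sketched on a plain dictionary. This is an illustration under stated assumptions, not the library's implementation: toy_copy_properties is a hypothetical name, bare object() instances stand in for DataFrames, and returning False when nothing is copied is an assumption (the text above only says the function returns without modifying the target).

```python
def toy_copy_properties(catalog, source, target, replace=True):
    """Copy the source entry to the target, honoring the replace flag."""
    if id(source) not in catalog:
        raise KeyError("source is not present in the catalog")
    if id(target) in catalog and not replace:
        # Existing target properties are left untouched.
        return False  # assumed return value for the no-op case
    # Drop any existing target properties, then copy the source's.
    catalog[id(target)] = dict(catalog[id(source)])
    return True


source, target = object(), object()
catalog = {id(source): {"key": "id"}, id(target): {"key": "old_id"}}

toy_copy_properties(catalog, source, target, replace=False)
print(catalog[id(target)])  # {'key': 'old_id'}

toy_copy_properties(catalog, source, target)
print(catalog[id(target)])  # {'key': 'id'}
```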
py_entitymatching.get_key(data_frame)

Gets the value of ‘key’ property for a DataFrame from the catalog.

Parameters

data_frame (DataFrame) – The DataFrame for which the key must be retrieved from the catalog.

Returns

A string value containing the key column name is returned (if present).

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.get_key(A)
# 'id'

See also

get_property()

py_entitymatching.set_key(data_frame, key_attribute)

Sets the value of ‘key’ property for a DataFrame in the catalog with the given attribute (i.e., column name).

Specifically, this function sets the key attribute for the DataFrame if the given attribute satisfies the following two conditions:

  • The key attribute should have unique values.

  • The key attribute should not have missing values. A missing value is represented as np.nan.

Parameters
  • data_frame (DataFrame) – The DataFrame for which the key must be set in the catalog.

  • key_attribute (string) – The key attribute (column name) in the DataFrame.

Returns

A Boolean value of True is returned, if the given attribute satisfies the conditions for a key and the update was successful.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If key_attribute is not of type string.

  • KeyError – If given key_attribute is not in the DataFrame columns.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> em.set_key(A, 'id')
>>> em.get_key(A)
# 'id'

See also

set_property()
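
The two conditions that set_key places on a key attribute (unique values, no missing values) can be checked with a few lines of plain Python. This is a minimal sketch, not the library's validation code: is_valid_key is a hypothetical helper, it operates on a list of column values rather than a DataFrame, and treating None as missing alongside NaN is an assumption.

```python
import math


def is_valid_key(values):
    """Return True if the values are unique and none of them is missing."""
    def is_missing(value):
        # NaN is the missing-value marker; None is treated the same here (assumption).
        return value is None or (isinstance(value, float) and math.isnan(value))

    if any(is_missing(value) for value in values):
        return False
    return len(set(values)) == len(values)


print(is_valid_key([1, 2, 3]))            # True
print(is_valid_key([1, 2, 2]))            # False (duplicate value)
print(is_valid_key([1.0, float("nan")]))  # False (missing value)
```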

py_entitymatching.get_fk_ltable(data_frame)

Gets the foreign key to left table for a DataFrame from the catalog.

Specifically, this is a convenience function that gets the foreign key to the left table using the underlying get_property() function. It is typically called on a DataFrame that has metadata such as fk_ltable, fk_rtable, ltable, and rtable.

Parameters

data_frame (DataFrame) – The input DataFrame for which the foreign key ltable property must be retrieved.

Returns

A Python object (typically a string) is returned.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_fk_ltable(C, 'ltable_id')
>>> em.get_fk_ltable(C)
# 'ltable_id'

See also

get_property()

py_entitymatching.set_fk_ltable(data_frame, fk_ltable)

Sets the foreign key to ltable for a DataFrame in the catalog.

Specifically, this is a convenience function that sets the foreign key to the left table using the py_entitymatching.set_property() function. It is typically called on a DataFrame that has metadata such as fk_ltable, fk_rtable, ltable, and rtable.

Parameters
  • data_frame (DataFrame) – The input DataFrame for which the foreign key ltable property must be set.

  • fk_ltable (string) – The attribute that must be set as the foreign key to the ltable in the catalog.

Returns

A Boolean value of True is returned if the foreign key to ltable was set successfully.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If fk_ltable is not of type string.

  • AssertionError – If fk_ltable is not in the input DataFrame.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_fk_ltable(C, 'ltable_id')
>>> em.get_fk_ltable(C)
# 'ltable_id'

See also

set_property()

py_entitymatching.get_fk_rtable(data_frame)

Gets the foreign key to right table for a DataFrame from the catalog.

Specifically, this is a convenience function that gets the foreign key to the right table using the py_entitymatching.get_property() function. It is typically called on a DataFrame that has metadata such as fk_ltable, fk_rtable, ltable, and rtable.

Parameters

data_frame (DataFrame) – The input DataFrame for which the foreign key rtable property must be retrieved.

Returns

A Python object (typically a string) is returned.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_fk_rtable(C, 'rtable_id')
>>> em.get_fk_rtable(C)
# 'rtable_id'

See also

get_property()

py_entitymatching.set_fk_rtable(data_frame, foreign_key_rtable)

Sets the foreign key to rtable for a DataFrame in the catalog.

Specifically, this is a convenience function that sets the foreign key to the right table using the set_property() function. It is typically called on a DataFrame that has metadata such as fk_ltable, fk_rtable, ltable, and rtable.

Parameters
  • data_frame (DataFrame) – The input DataFrame for which the foreign key rtable property must be set.

  • foreign_key_rtable (string) – The attribute that must be set as foreign key to rtable in the catalog.

Returns

A Boolean value of True is returned if the foreign key to rtable was set successfully.

Raises
  • AssertionError – If data_frame is not of type pandas DataFrame.

  • AssertionError – If foreign_key_rtable is not of type string.

  • AssertionError – If foreign_key_rtable is not in the input DataFrame.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_fk_rtable(C, 'rtable_id')
>>> em.get_fk_rtable(C)
# 'rtable_id'

See also

set_property()

py_entitymatching.get_ltable(candset)

Gets the ltable for a DataFrame from the catalog.

Parameters

candset (DataFrame) – The input table for which the ltable must be returned.

Returns

The pandas DataFrame that the ‘ltable’ property of the input table points to.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_ltable(C, A)
>>> id(em.get_ltable(C)) == id(A)
# True

See also

get_property()

py_entitymatching.set_ltable(candset, table)

Sets the ltable for a DataFrame in the catalog.

Parameters
  • candset (DataFrame) – The input table for which the ltable must be set.

  • table (DataFrame) – The table (typically a pandas DataFrame) that must be set as ltable for the input DataFrame.

Returns

A Boolean value of True is returned, if the update was successful.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_ltable(C, A)
>>> id(em.get_ltable(C)) == id(A)
# True

See also

set_property()

py_entitymatching.get_rtable(candset)

Gets the rtable for a DataFrame from the catalog.

Parameters

candset (DataFrame) – Input table for which the rtable must be returned.

Returns

The pandas DataFrame that the ‘rtable’ property of the input table points to.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_rtable(C, B)
>>> id(em.get_rtable(C)) == id(B)
# True

See also

get_property()

py_entitymatching.set_rtable(candset, table)

Sets the rtable for a DataFrame in the catalog.

Parameters
  • candset (DataFrame) – The input table for which the rtable must be set.

  • table (DataFrame) – The table that must be set as rtable for the input DataFrame.

Returns

A Boolean value of True is returned, if the update was successful.

Examples

>>> import py_entitymatching as em
>>> import pandas as pd
>>> A = pd.DataFrame({'id' : [1, 2], 'colA':['a', 'b'], 'colB' : [10, 20]})
>>> B = pd.DataFrame({'id' : [1, 2], 'colA':['c', 'd'], 'colB' : [30, 40]})
>>> em.set_key(A, 'id')
>>> em.set_key(B, 'id')
>>> C = pd.DataFrame({'id':[1, 2], 'ltable_id':[1, 2], 'rtable_id':[2, 1]})
>>> em.set_key(C, 'id')
>>> em.set_rtable(C, B)
>>> id(em.get_rtable(C)) == id(B)
# True

See also

set_property()
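
How the four candidate-set properties (fk_ltable, fk_rtable, ltable, rtable) fit together can be sketched without the library. The snippet below is an illustration only: tables are plain lists of row dictionaries rather than DataFrames, and metadata, base_key, and resolve_pair are hypothetical names chosen to mirror the A, B, and C tables used in the examples above.

```python
# Toy tables as lists of row dicts (stand-ins for DataFrames).
A = [{"id": 1, "colA": "a"}, {"id": 2, "colA": "b"}]
B = [{"id": 1, "colA": "c"}, {"id": 2, "colA": "d"}]
C = [{"id": 1, "ltable_id": 1, "rtable_id": 2},
     {"id": 2, "ltable_id": 2, "rtable_id": 1}]

# Hypothetical metadata for C, using the property names from this section.
metadata = {"key": "id", "fk_ltable": "ltable_id", "fk_rtable": "rtable_id",
            "ltable": A, "rtable": B}
base_key = "id"  # key attribute of both A and B in this example


def resolve_pair(row, meta):
    """Follow the two foreign keys of a candidate row to the base-table rows."""
    left = next(r for r in meta["ltable"]
                if r[base_key] == row[meta["fk_ltable"]])
    right = next(r for r in meta["rtable"]
                 if r[base_key] == row[meta["fk_rtable"]])
    return left, right


for row in C:
    left, right = resolve_pair(row, metadata)
    print(row["id"], left["colA"], right["colA"])
# 1 a d
# 2 b c
```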