Profilers
py_stringsimjoin.profiler.profiler.profile_table_for_join(input_table, profile_attrs=None)
Profiles the attributes in the table to suggest implications for join.
Parameters:
- input_table (DataFrame) – input table to profile.
- profile_attrs (list) – list of attribute names from the input table to be profiled (defaults to None). If not provided, all attributes in the input table will be profiled.
Returns: A DataFrame containing the profile output, with three columns:
- 'Unique values', the number of unique values in each attribute,
- 'Missing values', the number of missing values in each attribute, and
- 'Comments', which contains comments about each attribute.
The output dataframe is indexed by attribute name, so that the statistics for each attribute can be easily accessed using the attribute name.
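To illustrate the shape of the result, the following is a minimal plain-Python sketch of the two numeric statistics described above. It is not the library's implementation and does not call py_stringsimjoin or pandas. The helper `profile_rows`, the list-of-dicts input, and the sample rows are hypothetical. It assumes that `None` marks a missing value and that missing values are excluded from the unique count, which may differ from the library's behavior. The 'Comments' column is omitted because its contents are not specified here.

```python
# Hypothetical stand-in for the profile statistics; not the library's code.
def profile_rows(rows, profile_attrs=None):
    """Count unique and missing values per attribute in a list of dict rows."""
    if profile_attrs is None:
        # Mirror the documented default: profile every attribute.
        profile_attrs = list(rows[0].keys()) if rows else []
    profile = {}
    for attr in profile_attrs:
        values = [row.get(attr) for row in rows]
        # Assumption: None marks a missing value and is not counted as unique.
        present = [v for v in values if v is not None]
        profile[attr] = {
            'Unique values': len(set(present)),
            'Missing values': len(values) - len(present),
        }
    return profile


# Made-up sample table with one repeated name and one missing zip code.
rows = [
    {'id': 1, 'name': 'Ann Smith', 'zip': '53703'},
    {'id': 2, 'name': 'Bob Jones', 'zip': None},
    {'id': 3, 'name': 'Ann Smith', 'zip': '53703'},
]

profile = profile_rows(rows)
# As in the documented output, statistics are looked up by attribute name.
print(profile['name']['Unique values'])   # 2
print(profile['zip']['Missing values'])   # 1
```

Because the DataFrame returned by profile_table_for_join is indexed by attribute name, the equivalent lookup on the real output would presumably be a label-based access such as `output.loc['name', 'Unique values']`, where `output` is a hypothetical variable holding the returned DataFrame.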