Imputing Missing Values
py_entitymatching.impute_table(table, exclude_attrs=None, missing_val='NaN', strategy='mean', axis=0, val_all_nans=0, verbose=True)
Impute the missing values in a table.
- Parameters
table (DataFrame) – DataFrame whose values should be imputed.
exclude_attrs (List) – List of attribute names to be excluded from imputing (defaults to None).
missing_val (string or int) – The placeholder for the missing values. All occurrences of missing_val will be imputed. For missing values encoded as np.nan, use the string value ‘NaN’ (defaults to ‘NaN’).
strategy (string) – String that specifies how to impute values. Valid strings: ‘mean’, ‘median’, ‘most_frequent’ (defaults to ‘mean’).
axis (int) – Axis along which to impute: axis=0 imputes along columns, axis=1 along rows (defaults to 0).
val_all_nans (float) – Value to fill in if all the values in the column are NaN (defaults to 0).
- Returns
Imputed DataFrame.
- Raises
AssertionError – If table is not of type pandas DataFrame.
Examples
>>> import py_entitymatching as em
>>> # H is the feature vector table that should be imputed. Specifically, impute the missing values
>>> # in each column with the mean of that column.
>>> H = em.impute_table(H, exclude_attrs=['_id', 'ltable_id', 'rtable_id'], strategy='mean')
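To make the strategy, exclude_attrs, and val_all_nans parameters more concrete, the following is a minimal sketch in plain Python of column-wise imputation (the axis=0 case). It is not the library's implementation. The helpers impute_column and impute_columns are hypothetical names introduced here for illustration, and a table is modelled as a dict of column lists rather than a DataFrame. How the library breaks ties for 'most_frequent' is not stated above, so the example avoids ties.

```python
import math
import statistics


def _is_nan(value):
    """Return True only for float NaN, the missing-value marker assumed here."""
    return isinstance(value, float) and math.isnan(value)


def impute_column(values, strategy="mean", val_all_nans=0):
    """Fill NaN entries of a single column (hypothetical helper, not em API)."""
    present = [v for v in values if not _is_nan(v)]
    if not present:
        # Every entry is missing, so no statistic exists: use the constant.
        return [val_all_nans] * len(values)
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "most_frequent":
        fill = statistics.mode(present)
    else:
        raise ValueError("strategy must be 'mean', 'median' or 'most_frequent'")
    return [fill if _is_nan(v) else v for v in values]


def impute_columns(columns, exclude_attrs=None, strategy="mean", val_all_nans=0):
    """Impute every column except those named in exclude_attrs."""
    excluded = set(exclude_attrs or [])
    return {
        name: list(vals) if name in excluded
        else impute_column(vals, strategy, val_all_nans)
        for name, vals in columns.items()
    }


nan = float("nan")
# A toy stand-in for a feature vector table; the column names are made up.
H = {
    "_id": [0, 1, 2],
    "name_sim": [0.5, nan, 1.0],
    "all_missing": [nan, nan, nan],
}
result = impute_columns(H, exclude_attrs=["_id"], strategy="mean")
print(result["name_sim"])     # the NaN is replaced by the column mean
print(result["all_missing"])  # every entry becomes val_all_nans
```

Excluding identifier columns such as '_id' matters because replacing a missing key with a mean or median would produce a value that does not refer to any real tuple.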