======
Guides
======

This page gives concrete examples of using py_entitymatching. The examples use sample data that is bundled with the package, and they are provided as Jupyter notebooks.

A Quick Tour of Jupyter Notebook
--------------------------------

`This tutorial `_ gives a quick tour of installing and using Jupyter notebook.

End-to-End EM Workflows
-----------------------

* EM workflow with blocking using an overlap blocker and matching using a Random Forest matcher: `Jupyter notebook `_
* EM workflow with blocking using an overlap blocker, selecting among multiple matchers, using the selected matcher to predict matches, and evaluating the predicted matches: `Jupyter notebook `_
* EM workflow with blocking using multiple blockers (overlap and attribute equivalence blockers), debugging the blocker output, selecting among multiple matchers, debugging the matcher output, using the selected matcher to predict matches, and evaluating the predicted matches: `Jupyter notebook `_

Stepwise Guides
---------------

* Reading CSV files from disk: `Jupyter notebook `_
* Down sampling: `Jupyter notebook `_
* Data profiling: `Jupyter notebook `_
* Data exploration: `Jupyter notebook `_
* Blocking:

  * Using overlap blocker: `Jupyter notebook `_
  * Using attribute equivalence blocker: `Jupyter notebook `_
  * Using rule-based blocker: `Jupyter notebook `_
  * Using blackbox blocker: `Jupyter notebook `_
  * Combining multiple blockers: `Jupyter notebook `_
  * Debugging blocker output: `Jupyter notebook `_

* Handling features:

  * Generating features manually: `Jupyter notebook `_
  * Editing attribute types and generating features manually: `Jupyter notebook `_
  * Adding features to feature table: `Jupyter notebook `_
  * Removing features from feature table: `Jupyter notebook `_

* Sampling and labeling: `Jupyter notebook `_
* Matching:

  * Selecting the best learning-based matcher (involves splitting the labeled data, generating features, instantiating multiple matchers, and debugging the matcher output): `Jupyter notebook `_
  * Performing matching using rule-based matcher: `Jupyter notebook `_
  * Improving matching results using triggers: `Jupyter notebook `_
  * Evaluating the predictions from a matcher: `Jupyter notebook `_
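
For readers who want a feel for what overlap blocking does before opening the notebooks, the idea can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the py_entitymatching API, the function names ``tokenize`` and ``overlap_block`` and the sample rows are hypothetical, and the tokenization (lowercased whitespace split) is a simplification chosen for brevity.

```python
# Plain-Python illustration of overlap blocking. This is NOT the
# py_entitymatching API; every name and row below is hypothetical.

def tokenize(value):
    """Lowercase a string and split it into a set of whitespace tokens."""
    return set(value.lower().split())


def overlap_block(table_a, table_b, attr, overlap_size=1):
    """Return (id_a, id_b) pairs whose `attr` values share at least
    `overlap_size` tokens. Each table is a list of dicts with an 'id' key."""
    candidates = []
    for row_a in table_a:
        tokens_a = tokenize(row_a[attr])
        for row_b in table_b:
            shared = tokens_a & tokenize(row_b[attr])
            if len(shared) >= overlap_size:
                candidates.append((row_a["id"], row_b["id"]))
    return candidates


# Two tiny made-up tables standing in for the bundled sample data.
A = [{"id": "a1", "name": "Kevin Smith"}, {"id": "a2", "name": "Maria Garcia"}]
B = [{"id": "b1", "name": "K. Smith"}, {"id": "b2", "name": "William Bridge"}]

# Keep only pairs that share at least one name token.
pairs = overlap_block(A, B, "name", overlap_size=1)
print(pairs)
```

Only the surviving candidate pairs would then be passed on to a matcher, which is why blocking reduces the work the later steps have to do. The nested loop here compares every pair and is meant for clarity only; it says nothing about how the package implements its blockers.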