=====================================================
Splitting Labeled Data into Training and Testing Sets
=====================================================

While doing entity matching, you will have to split data for multiple purposes. Some examples are:

1. Split labeled data into a development set and a test set. The development set is used to come up with the right features for the learning-based matcher, and the `test` set is used to evaluate the matcher.

2. Split feature vectors into a train set and a test set. The train set is used to train the learning-based matcher, and the test set is used for evaluation.

py_entitymatching provides the `split_train_test` command for the above needs. An example of using `split_train_test` is shown below:

    >>> train_test = em.split_train_test(G, train_proportion=0.5)

In the above, `split_train_test` returns a dictionary with two keys: `train` and `test`. The value for the key `train` is a DataFrame containing the tuples allocated from the input table based on `train_proportion`. Similarly, the value for the key `test` is a DataFrame containing the tuples for evaluation. An example of getting the train and test DataFrames from the output of the `split_train_test` command is shown below:

    >>> devel_set = train_test['train']
    >>> eval_set = train_test['test']

The value chosen for the train proportion depends on the context of its use. For instance, if the data is split for machine learning purposes, then the train proportion is typically larger than the test proportion. The most commonly used values of `train_proportion` are between 0.5 and 0.8.

Please refer to the API reference of :py:meth:`~py_entitymatching.split_train_test` for more details.
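As a minimal sketch of the second use case above, the snippet below splits a hypothetical feature-vector table `H` with a 0.7 train proportion and inspects the resulting sizes. The table name `H` and the `random_state` argument (used here to make the split reproducible) are assumptions for illustration:

    >>> # Hypothetical feature-vector table H; split roughly 70/30 for training and evaluation.
    >>> train_test = em.split_train_test(H, train_proportion=0.7, random_state=0)
    >>> train_set = train_test['train']
    >>> test_set = train_test['test']
    >>> len(train_set), len(test_set)   # approximately 70% and 30% of the rows in H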