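The ``n_jobs`` rule in the ``cross_validate`` docstring maps a requested value to a worker count: a positive value is used as given, ``-1`` means all CPUs, and values below ``-1`` give ``n_cpus + n_jobs + 1``. The following is a minimal, hypothetical sketch of that arithmetic. The helper name `effective_n_jobs` and the clamp to at least one worker are assumptions for illustration and are not part of Surprise's API.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Map an ``n_jobs`` value to a number of workers (hypothetical helper).

    Follows the convention stated in the ``cross_validate`` docstring; the
    lower bound of one worker is an assumption of this sketch.
    """
    if n_cpus is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> n_cpus, -2 -> n_cpus - 1, ... i.e. n_cpus + n_jobs + 1,
        # never going below a single worker.
        return max(n_cpus + n_jobs + 1, 1)
    # Positive values are used as given; 1 means no parallelism at all.
    return n_jobs
```

On an 8-CPU machine, for instance, `effective_n_jobs(-2, n_cpus=8)` returns `7`, which matches "all CPUs but one". Passing `n_cpus` explicitly keeps the result independent of the machine running the code.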
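The ``pre_dispatch`` parameter of ``cross_validate`` accepts three kinds of values: ``None``, an int, or a string expression in ``n_jobs`` such as ``'2*n_jobs'``. The sketch below resolves each case to a number of jobs dispatched up front. `resolve_pre_dispatch` is a hypothetical helper, not Surprise's or joblib's code. It assumes `n_jobs` has already been turned into a positive worker count. Capping the result at the total number of jobs is also an assumption of this sketch.

```python
def resolve_pre_dispatch(pre_dispatch, n_jobs, n_total_jobs):
    """Return how many jobs are dispatched up front (hypothetical helper).

    ``n_jobs`` is assumed to be the effective, positive number of workers and
    ``n_total_jobs`` the number of folds to evaluate.
    """
    if pre_dispatch is None:
        # All jobs are created and spawned immediately.
        return n_total_jobs
    if isinstance(pre_dispatch, int):
        count = pre_dispatch
    else:
        # Evaluate an expression such as "2*n_jobs" with only n_jobs in scope.
        # This is not a sandbox: only evaluate trusted strings.
        count = eval(pre_dispatch, {"__builtins__": {}}, {"n_jobs": n_jobs})
    # Dispatch at least one job and never more than exist (sketch's choice).
    return max(1, min(int(count), n_total_jobs))
```

With two workers and ten folds, the default ``'2*n_jobs'`` dispatches four jobs up front. A smaller value trades some scheduling latency for lower peak memory use, as the docstring notes.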
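The procedure that ``cross_validate`` documents has four steps. It splits ``data`` according to ``cv`` (``None`` means five folds, an int means that many folds), fits on each trainset, computes accuracy measures such as RMSE and MAE on each testset, and times both the fit and the test. The sketch below reproduces that flow without Surprise. It is a minimal, hypothetical stand-in: `kfold_indices` and `toy_cross_validate` are illustrative names, the "algorithm" is a global-mean predictor, and ratings are plain `(user, item, rating)` tuples rather than a Surprise `Dataset`. A non-int, non-`None` ``cv`` is assumed here to be an iterable of `(train_indices, test_indices)` pairs. The result key names were chosen for illustration only.

```python
import math
import random
import time


def kfold_indices(n_items, n_splits=5, seed=0):
    """Yield (train_indices, test_indices) pairs for a shuffled k-fold split."""
    if not 2 <= n_splits <= n_items:
        raise ValueError("n_splits must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    start = 0
    for fold in range(n_splits):
        # Spread the remainder over the first folds so sizes differ by <= 1.
        size = n_items // n_splits + (1 if fold < n_items % n_splits else 0)
        stop = start + size
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def toy_cross_validate(ratings, cv=None):
    """Cross-validate a global-mean predictor on (user, item, rating) tuples."""
    if cv is None:
        splits = kfold_indices(len(ratings), n_splits=5)
    elif isinstance(cv, int):
        splits = kfold_indices(len(ratings), n_splits=cv)
    else:
        splits = cv  # assumed: iterable of (train_indices, test_indices)

    results = {"test_rmse": [], "test_mae": [], "fit_time": [], "test_time": []}
    for train_idx, test_idx in splits:
        start = time.perf_counter()
        # "Fitting" the toy model means computing the mean training rating.
        mean = sum(ratings[i][2] for i in train_idx) / len(train_idx)
        results["fit_time"].append(time.perf_counter() - start)

        start = time.perf_counter()
        # "Testing" means comparing every test rating with that mean.
        errors = [ratings[i][2] - mean for i in test_idx]
        results["test_time"].append(time.perf_counter() - start)

        results["test_rmse"].append(math.sqrt(sum(e * e for e in errors) / len(errors)))
        results["test_mae"].append(sum(abs(e) for e in errors) / len(errors))
    return results
```

Calling `toy_cross_validate(ratings, cv=3)` yields three entries per key. Passing an explicit list such as `[([0, 1, 2], [3])]` evaluates a single hand-made split, which is a rough analogue of handing ``cross_validate`` a custom cross-validation iterator.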
def cross_validate(
    algo,
    data,
    measures=["rmse", "mae"],
    cv=None,
    return_train_measures=False,
    n_jobs=1,
    pre_dispatch="2*n_jobs",
    verbose=False,
):
    """
    Run a cross validation procedure for a given algorithm, reporting accuracy
    measures and computation times.

    See an example in the :ref:`User Guide <cross_validate_example>`.

    Args:
        algo(:obj:`AlgoBase \
            <surprise.prediction_algorithms.algo_base.AlgoBase>`):
            The algorithm to evaluate.
        data(:obj:`Dataset <surprise.dataset.Dataset>`): The dataset on which
            to evaluate the algorithm.
        measures(list of string): The performance measures to compute. Allowed
            names are function names as defined in the :mod:`accuracy
            <surprise.accuracy>` module. Default is ``['rmse', 'mae']``.
        cv(cross-validation iterator, int or ``None``): Determines how the
            ``data`` parameter will be split (i.e. how trainsets and testsets
            will be defined). If an int is passed, :class:`KFold
            <surprise.model_selection.split.KFold>` is used with the
            appropriate ``n_splits`` parameter. If ``None``, :class:`KFold
            <surprise.model_selection.split.KFold>` is used with
            ``n_splits=5``.
        return_train_measures(bool): Whether to compute performance measures on
            the trainsets. Default is ``False``.
        n_jobs(int): The maximum number of folds evaluated in parallel.

            - If ``-1``, all CPUs are used.
            - If ``1`` is given, no parallel computing code is used at all,\
                which is useful for debugging.
            - For ``n_jobs`` below ``-1``, ``(n_cpus + n_jobs + 1)`` are\
                used. For example, with ``n_jobs = -2`` all CPUs but one are\
                used.

            Default is ``1``.
        pre_dispatch(int or string): Controls the number of jobs that get
            dispatched during parallel execution. Reducing this number can be
            useful to avoid an explosion of memory consumption when more jobs
            get dispatched than CPUs can process. This parameter can be:

            - ``None``, in which case all the jobs are immediately created\
                and spawned. Use this for lightweight and fast-running\
                jobs, to avoid delays due to on-demand spawning of the\
                jobs.
            - An int, giving the exact number of total jobs that are\
                spawned.
            - A string, giving an expression as a function of ``n_jobs``,\
                as in ``'2*n_jobs'``.