opencv/modules/ml/doc/gradient_boosted_trees.rst

2012-10-17 03:18:30 +04:00
.. _Gradient Boosted Trees:
Gradient Boosted Trees
======================
.. highlight:: cpp
Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
classification and regression problems. Moreover, it can use any
differentiable loss function; some popular ones are implemented.
Using decision trees (:ocv:class:`CvDTree`) as base learners makes it possible to process
both ordered and categorical variables.
.. _Training GBT:
Training the GBT model
----------------------
The Gradient Boosted Trees model is an ensemble of single regression trees
built in a greedy fashion. The training procedure is an iterative process
similar to numerical optimization via the gradient descent method. The total loss
on the training set depends only on the current model predictions for the
training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. The gradient of :math:`\mathcal{L}(F)`
can be computed as follows:
.. math::
grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
\dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
\dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .
At every training step, a single regression tree is built to predict the
components of the antigradient vector. The step length is computed according to the
loss function, separately for every region determined by a tree leaf. The explicit
step can be eliminated by changing the values of the leaves directly.
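
As a worked illustration, take the squared loss
:math:`L(y,F)=\dfrac{1}{2}(y-F)^2` from the list of regression losses in this
section. Its antigradient components are simply the residuals of the current model:

.. math::

    -\dfrac{\partial{L(y_i, F(x_i))}}{\partial{F(x_i)}} = y_i - F(x_i) ,

so in that case each new tree is fitted to whatever the current ensemble still
gets wrong on the training samples.
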
See below the main scheme of the training process:

#. Find the best constant model.
#. For :math:`i` in :math:`[1,M]`:

   #. Compute the antigradient.
   #. Grow a regression tree to predict antigradient components.
   #. Change values in the tree leaves.
   #. Add the tree to the model.

The following loss functions are implemented for regression problems:

* Squared loss (``CvGBTrees::SQUARED_LOSS``):
  :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`

* Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
  :math:`L(y,f(x))=|y-f(x)|`

* Huber loss (``CvGBTrees::HUBER_LOSS``):
  :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
  \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
  \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,
  where :math:`\delta` is an estimate of the :math:`\alpha`-quantile of
  :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.

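
The three regression losses and their antigradients (the negative derivative
with respect to :math:`f(x)`) can be written down directly from the formulas
above. The function names below are hypothetical and are not part of the
``CvGBTrees`` interface; ``delta`` is passed in explicitly rather than
estimated from a quantile.

```cpp
#include <cmath>

// Squared loss: L = (y - f)^2 / 2.
inline double squaredLoss(double y, double f) {
    const double r = y - f;
    return 0.5 * r * r;
}

// Absolute loss: L = |y - f|.
inline double absoluteLoss(double y, double f) {
    return std::fabs(y - f);
}

// Huber loss: quadratic for |y - f| <= delta, linear beyond it.
inline double huberLoss(double y, double f, double delta) {
    const double a = std::fabs(y - f);
    return a > delta ? delta * (a - 0.5 * delta) : 0.5 * a * a;
}

// Antigradient of the squared loss: the residual y - f.
inline double squaredAntigradient(double y, double f) {
    return y - f;
}

// Antigradient of the absolute loss: the sign of the residual
// (0 is used at the non-differentiable point y == f).
inline double absoluteAntigradient(double y, double f) {
    return y > f ? 1.0 : (y < f ? -1.0 : 0.0);
}

// Antigradient of the Huber loss: the residual clipped to [-delta, delta].
inline double huberAntigradient(double y, double f, double delta) {
    const double r = y - f;
    if (std::fabs(r) <= delta) return r;
    return r > 0.0 ? delta : -delta;
}
```

The clipping in ``huberAntigradient`` is what makes the Huber loss less
sensitive to outliers than the squared loss: a large residual contributes at
most ``delta`` to the target that the next tree is fitted to.
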
The following loss functions are implemented for classification problems:

* Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
  :math:`K` functions are built, one function for each output class, and
  :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
  where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
  is the estimate of the probability of :math:`y=k`.

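
The class probabilities and the deviance loss can be sketched as follows. The
function names are hypothetical, class labels are 0-based indices in this
sketch (the formulas above count classes from 1), and subtracting the largest
output before exponentiating is a common numerical safeguard that is assumed
here rather than taken from the ``CvGBTrees`` source.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// p_k = exp(f_k) / sum_i exp(f_i). Assumes f is non-empty.
inline std::vector<double> classProbabilities(const std::vector<double>& f) {
    // Shifting by the maximum does not change the ratio but avoids overflow.
    const double maxF = *std::max_element(f.begin(), f.end());
    std::vector<double> p(f.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < f.size(); ++k) {
        p[k] = std::exp(f[k] - maxF);
        sum += p[k];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Deviance for a sample whose true class has the 0-based index y.
inline double devianceLoss(std::size_t y, const std::vector<double>& f) {
    return -std::log(classProbabilities(f)[y]);
}

// Antigradient with respect to each f_k: 1(y = k) - p_k.
inline std::vector<double> devianceAntigradient(std::size_t y,
                                                const std::vector<double>& f) {
    std::vector<double> g = classProbabilities(f);
    for (std::size_t k = 0; k < g.size(); ++k)
        g[k] = (k == y ? 1.0 : 0.0) - g[k];
    return g;
}
```

Each of the :math:`K` functions gets its own antigradient component, which is
why one tree per class is grown at every boosting iteration.
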
As a result, you get the following model:
.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,
where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.
.. _Predicting with GBT:
Predicting with the GBT Model
-----------------------------
To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, this sum is the predicted value.
For classification problems, the result is the class with the largest output, :math:`\arg\max_{i=1..K}(f_i(x))`.
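
A minimal sketch of both prediction rules, assuming the responses of the
individual trees have already been computed. ``ensembleOutput`` and
``predictClass`` are hypothetical helpers, the class index returned is 0-based,
and using the same ``f0`` for every class is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Regression output: f(x) = f0 + nu * (sum of the tree responses).
inline float ensembleOutput(float f0, float nu,
                            const std::vector<float>& treeResponses) {
    float sum = 0.f;
    for (float r : treeResponses) sum += r;
    return f0 + nu * sum;
}

// Classification: responsesPerClass[k] holds the tree responses of the k-th
// ensemble. Returns the 0-based index of the class with the largest output;
// the first such class wins in case of a tie.
inline std::size_t predictClass(float f0, float nu,
        const std::vector<std::vector<float> >& responsesPerClass) {
    std::size_t best = 0;
    float bestScore = 0.f;
    for (std::size_t k = 0; k < responsesPerClass.size(); ++k) {
        const float score = ensembleOutput(f0, nu, responsesPerClass[k]);
        if (k == 0 || score > bestScore) {
            best = k;
            bestScore = score;
        }
    }
    return best;
}
```
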
.. highlight:: cpp
CvGBTreesParams
---------------
.. ocv:struct:: CvGBTreesParams : public CvDTreeParams
GBT training parameters.
The structure contains parameters for each single decision tree in the ensemble,
as well as characteristics of the whole model. The structure is derived from
:ocv:class:`CvDTreeParams`, but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priorities are not used.
CvGBTreesParams::CvGBTreesParams
--------------------------------
.. ocv:function:: CvGBTreesParams::CvGBTreesParams()
.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )
:param loss_function_type: Type of the loss function used for training
(see :ref:`Training GBT`). It must be one of the
following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
types are used for regression problems, and the last one for
classification.
:param weak_count: Number of boosting algorithm iterations. ``weak_count*K`` is the total
number of trees in the GBT model, where ``K`` is the number of output classes
(equal to one in case of regression).
:param shrinkage: Regularization parameter (see :ref:`Training GBT`).
:param subsample_portion: Portion of the whole training set used for each algorithm iteration.
The subset is generated randomly. For more information, see
http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
:param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).
:param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).
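
The random subsampling controlled by ``subsample_portion`` can be pictured with
the following sketch. ``drawSubsample`` is a hypothetical helper; drawing
without replacement, rounding down, and the use of ``std::mt19937`` are
assumptions of this example, not a description of how ``CvGBTrees`` selects
its subset.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices of a random subset holding `portion` of the samples.
// A fresh subset would be drawn at every boosting iteration.
inline std::vector<std::size_t> drawSubsample(std::size_t sampleCount,
                                              float portion,
                                              std::mt19937& rng) {
    std::vector<std::size_t> idx(sampleCount);
    std::iota(idx.begin(), idx.end(), 0);       // 0, 1, ..., sampleCount - 1
    std::shuffle(idx.begin(), idx.end(), rng);  // random order
    // Keep the first portion * sampleCount indices (rounded down).
    const std::size_t keep =
        static_cast<std::size_t>(static_cast<double>(portion) * sampleCount);
    idx.resize(std::min(keep, sampleCount));
    return idx;
}
```

With ``portion`` equal to 1 every iteration sees the whole training set;
smaller values make each tree see a different random part of it.
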
By default, the following constructor is used:

.. code-block:: cpp

    // Arguments follow the constructor signature:
    // shrinkage = 0.01f, subsample_portion = 0.8f.
    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )

CvGBTrees
---------
.. ocv:class:: CvGBTrees : public CvStatModel
The class implements the Gradient boosted tree model as described in the beginning of this section.
CvGBTrees::CvGBTrees
--------------------
Default and training constructors.
.. ocv:function:: CvGBTrees::CvGBTrees()
.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )
.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )
.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>
The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameter descriptions.
CvGBTrees::train
----------------
Trains a Gradient boosted tree model.
.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)
.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )
.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)
.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval
The matrix-based forms of the train method follow the common template (see :ocv:func:`CvStatModel::train`).
Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
``CV_32S`` or ``CV_32F``. In both cases it is converted into a ``CV_32F``
matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must each be a
list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
a dummy parameter.
The other form of the :ocv:func:`CvGBTrees::train` function uses :ocv:class:`CvMLData` as a
data set container. ``update`` is still a dummy parameter.
All parameters specific to the GBT model are passed into the training function
as a :ocv:class:`CvGBTreesParams` structure.
CvGBTrees::predict
------------------
Predicts a response for an input sample.
.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const
.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const
.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval
:param sample: Input feature vector that has the same format as every training set
element. If not all the variables were actually used during training,
``sample`` must still contain placeholder values at the corresponding positions.
:param missing: Missing values mask, which is a matrix of the same size as
``sample`` having the ``CV_8U`` type. ``1`` marks a missing value
at the same position in the ``sample`` vector. If there are no missing values
in the feature vector, an empty matrix can be passed instead of the missing mask.
:param weakResponses: Matrix used to obtain predictions of all the trees.
The matrix has :math:`K` rows,
where :math:`K` is the count of output classes (1 for the regression case).
The matrix has as many columns as the ``slice`` length.
:param slice: Parameter defining the part of the ensemble used for prediction.
If ``slice = Range::all()``, all trees are used. Use this parameter to
get predictions of GBT models with different ensemble sizes while training
only one model.
:param k: Index of the tree ensemble to use in case of a classification problem,
where one ensemble is built per class (see :ref:`Training GBT`). Use this
parameter to change the output to the sum of the trees' predictions in the
``k``-th ensemble only. To get the total GBT model prediction, the ``k`` value
must be -1. For regression problems, ``k`` must also be equal to -1.
The method predicts the response corresponding to the given sample
(see :ref:`Predicting with GBT`).
The result is either the class label or the estimated function value. The
:ocv:func:`CvGBTrees::predict` method uses the parallel version of the GBT model
prediction if OpenCV is built with the TBB library. In this case, predictions
of single trees are computed in parallel.
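
The idea behind the ``slice`` parameter, getting the outputs of smaller
ensembles from one trained model, can be sketched with running sums. The
helper ``stagedOutputs`` is hypothetical and works on precomputed tree
responses of a regression model; how ``CvGBTrees::predict`` combines a slice
with the constant model internally is not shown here.

```cpp
#include <vector>

// out[m] is the output of the model truncated to its first m + 1 trees:
// f0 + nu * (T_1(x) + ... + T_{m+1}(x)).
inline std::vector<float> stagedOutputs(float f0, float nu,
                                        const std::vector<float>& treeResponses) {
    std::vector<float> out;
    out.reserve(treeResponses.size());
    float sum = 0.f;
    for (float r : treeResponses) {
        sum += r;                     // running sum of tree responses
        out.push_back(f0 + nu * sum);
    }
    return out;
}
```

Comparing such truncated outputs against held-out responses is one way to
choose the ensemble size without retraining.
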
CvGBTrees::clear
----------------
Clears the model.
.. ocv:function:: void CvGBTrees::clear()
.. ocv:pyfunction:: cv2.GBTrees.clear() -> None
The function deletes the data set information and all the weak models and sets all internal
variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
destructor.
CvGBTrees::calc_error
---------------------
Calculates a training or testing error.
.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )
:param _data: Data set.
:param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
(``CV_TEST_ERROR``).
:param resp: If non-zero, a vector of predictions on the corresponding data set is
returned.
If :ocv:class:`CvMLData` is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
used to get a training/testing error easily and (optionally) all predictions
on the training/testing set. If the Intel* TBB* library is used, the error is computed in
parallel, namely, predictions for different samples are computed at the same time.
In case of a regression problem, the mean squared error is returned. For
classification, the result is the misclassification error in percent.
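
The two error measures can be computed from predictions and true responses as
in the sketch below. The function names are hypothetical, both vectors are
assumed to have the same length, and returning 0 for an empty data set is a
choice made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Regression: mean of the squared differences between predictions and responses.
inline double meanSquaredError(const std::vector<float>& predicted,
                               const std::vector<float>& actual) {
    if (predicted.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double d = static_cast<double>(predicted[i]) - actual[i];
        sum += d * d;
    }
    return sum / predicted.size();
}

// Classification: share of wrongly predicted labels, in percent.
inline double misclassificationPercent(const std::vector<int>& predicted,
                                       const std::vector<int>& actual) {
    if (predicted.empty()) return 0.0;
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != actual[i]) ++wrong;
    return 100.0 * wrong / predicted.size();
}
```
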