.. _Gradient Boosted Trees:

Gradient Boosted Trees
======================

.. highlight:: cpp

Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
classification and regression problems. Moreover, it can use any
differentiable loss function; some popular ones are implemented.
Using decision trees (:ocv:class:`CvDTree`) as base learners makes it possible
to process both ordered and categorical variables.

.. _Training GBT:

Training the GBT model
----------------------

The Gradient Boosted Trees model is an ensemble of single regression trees
built in a greedy fashion. The training procedure is an iterative process
similar to numerical optimization via the gradient descent method. The total loss
on the training set depends only on the current model predictions for the
training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. The gradient of :math:`\mathcal{L}(F)`
can be computed as follows:

.. math::
    grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
    \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
    \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .

At every training step, a single regression tree is built to predict the
components of the antigradient vector. The step length is computed according to the
loss function, separately for every region determined by a tree leaf. The
explicit step can be eliminated by changing the values of the leaves directly.

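
For two of the regression losses listed later in this section the antigradient has a simple closed form: for the squared loss it is the residual :math:`y_i - F(x_i)`, and for the absolute loss it is the sign of that residual. The following is a minimal sketch in plain C++; the function names are hypothetical and are not part of the OpenCV API.

```cpp
#include <cstddef>
#include <vector>

// Antigradient of the squared loss L = (y - F)^2 / 2 with respect to F:
// -dL/dF = y - F, that is, the ordinary residual.
std::vector<double> antigradientSquared(const std::vector<double>& y,
                                        const std::vector<double>& F)
{
    std::vector<double> g(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        g[i] = y[i] - F[i];
    return g;
}

// Antigradient of the absolute loss L = |y - F|: the sign of the residual.
// At the non-differentiable point y == F this sketch returns 0.
std::vector<double> antigradientAbsolute(const std::vector<double>& y,
                                         const std::vector<double>& F)
{
    std::vector<double> g(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double r = y[i] - F[i];
        g[i] = static_cast<double>((r > 0.0) - (r < 0.0));
    }
    return g;
}
```
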
The main scheme of the training process is as follows:

#.
    Find the best constant model.
#.
    For :math:`i` in :math:`[1,M]`:

    #.
        Compute the antigradient.
    #.
        Grow a regression tree to predict antigradient components.
    #.
        Change values in the tree leaves.
    #.
        Add the tree to the model.


The following loss functions are implemented for regression problems:

*
    Squared loss (``CvGBTrees::SQUARED_LOSS``):
    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
*
    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
    :math:`L(y,f(x))=|y-f(x)|`
*
    Huber loss (``CvGBTrees::HUBER_LOSS``):
    :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
    \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
    \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,

    where :math:`\delta` is an estimate of the :math:`\alpha`-quantile of
    :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.


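
The three regression losses above translate directly into code. The sketch below uses hypothetical function names and takes :math:`\delta` as an explicit argument instead of estimating it from a quantile, which is an assumption made to keep the example short.

```cpp
#include <cmath>

// Squared loss: (y - f)^2 / 2.
double squaredLoss(double y, double f)
{
    const double r = y - f;
    return 0.5 * r * r;
}

// Absolute loss: |y - f|.
double absoluteLoss(double y, double f)
{
    return std::fabs(y - f);
}

// Huber loss: quadratic for |y - f| <= delta, linear beyond that.
// The two branches agree at |y - f| == delta, so the loss is continuous.
double huberLoss(double y, double f, double delta)
{
    const double a = std::fabs(y - f);
    return a > delta ? delta * (a - 0.5 * delta) : 0.5 * a * a;
}
```
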
The following loss functions are implemented for classification problems:

*
    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
    :math:`K` functions are built, one function for each output class, and
    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
    where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
    is the estimate of the probability that :math:`y=k`.

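
The class probabilities and the deviance can be computed as in the sketch below. The function names are hypothetical, and class indices are 0-based in the code while the formula counts classes from 1. Subtracting the maximum before exponentiation is a standard numerical-stability step that does not change the result; it is an addition of this sketch, not something stated about the library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// p_k = exp(f_k) / sum_i exp(f_i), computed with the maximum subtracted
// from every f_k so that exp() cannot overflow.
std::vector<double> classProbabilities(const std::vector<double>& f)
{
    const double fmax = *std::max_element(f.begin(), f.end());
    std::vector<double> p(f.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < f.size(); ++k) {
        p[k] = std::exp(f[k] - fmax);
        sum += p[k];
    }
    for (std::size_t k = 0; k < f.size(); ++k)
        p[k] /= sum;
    return p;
}

// Deviance for one sample whose true class has the 0-based index y:
// only the term with 1(y = k) = 1 survives, so the loss is -ln p_y.
double devianceLoss(std::size_t y, const std::vector<double>& f)
{
    return -std::log(classProbabilities(f)[y]);
}
```
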
As a result, you get the following model:

.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,

where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.

.. _Predicting with GBT:

Predicting with the GBT Model
-----------------------------

To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, this sum is the prediction.
For classification problems, one sum :math:`f_i(x)` is computed per class and
the result is :math:`\arg\max_{i=1..K}(f_i(x))`.


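
The prediction rule can be sketched as follows. The layout of ``treeResponses`` (one vector of tree responses per class, for a single fixed sample), the use of one shared ``f0`` for all classes, and the function names are assumptions of this example; class indices are 0-based.

```cpp
#include <cstddef>
#include <vector>

// f(x) = f0 + shrinkage * sum of the tree responses for one ensemble.
// For regression this value is the prediction.
double ensembleSum(const std::vector<double>& responses,
                   double f0, double shrinkage)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < responses.size(); ++i)
        sum += responses[i];
    return f0 + shrinkage * sum;
}

// Classification: treeResponses[k] holds the responses of the trees built
// for class k. The predicted class is the one with the largest sum.
std::size_t predictClass(const std::vector<std::vector<double> >& treeResponses,
                         double f0, double shrinkage)
{
    std::size_t best = 0;
    double bestValue = ensembleSum(treeResponses[0], f0, shrinkage);
    for (std::size_t k = 1; k < treeResponses.size(); ++k) {
        const double v = ensembleSum(treeResponses[k], f0, shrinkage);
        if (v > bestValue) {
            bestValue = v;
            best = k;
        }
    }
    return best;
}
```
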
.. highlight:: cpp


CvGBTreesParams
---------------
.. ocv:class:: CvGBTreesParams

    GBT training parameters.

The structure contains parameters for each single decision tree in the ensemble,
as well as the whole model characteristics. The structure is derived from
:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priorities are not used.

CvGBTreesParams::CvGBTreesParams
--------------------------------
.. ocv:function:: CvGBTreesParams::CvGBTreesParams()

.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )

    :param loss_function_type: Type of the loss function used for training
        (see :ref:`Training GBT`). It must be one of the
        following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
        ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
        types are used for regression problems, and the last one for
        classification.

    :param weak_count: Number of boosting iterations. ``weak_count*K`` is the total
        number of trees in the GBT model, where ``K`` is the number of output classes
        (equal to one in the case of regression).

    :param shrinkage: Regularization parameter (see :ref:`Training GBT`).

    :param subsample_portion: Portion of the whole training set used for each algorithm iteration.
        The subset is generated randomly. For more information see
        http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.

    :param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).

    :param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).

By default the following constructor is used:

.. code-block:: cpp

    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.8f, 0.01f, 3, false)
        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )

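
The effect of ``subsample_portion`` can be pictured as drawing a random subset of sample indices before each boosting iteration. The sketch below is only an illustration: ``drawSubsample`` is a hypothetical helper, and the rounding and the random generator used by the library may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices of a random subset that holds about `portion` of the
// n training samples (truncated towards zero, but at least one sample).
std::vector<std::size_t> drawSubsample(std::size_t n, double portion,
                                       std::mt19937& rng)
{
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t(0));  // 0, 1, ..., n-1
    std::shuffle(idx.begin(), idx.end(), rng);          // random order
    std::size_t count = static_cast<std::size_t>(portion * n);
    if (count < 1) count = 1;
    if (count > n) count = n;
    idx.resize(count);                                  // keep the first `count`
    return idx;
}
```
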
CvGBTrees
---------
.. ocv:class:: CvGBTrees

    The class implements the Gradient boosted tree model as described in the beginning of this section.

CvGBTrees::CvGBTrees
--------------------
Default and training constructors.

.. ocv:function:: CvGBTrees::CvGBTrees()

.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )

.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )

.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>

The constructors follow the conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameter descriptions.

CvGBTrees::train
----------------
Trains a Gradient boosted tree model.

.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )

.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval

The train methods that take matrices follow the common template (see :ocv:func:`CvStatModel::train`).
Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
``CV_32S`` or ``CV_32F``. In both cases it is converted into a ``CV_32F``
matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must each be either a
list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
a dummy parameter.

The form of :ocv:func:`CvGBTrees::train` that takes :ocv:class:`CvMLData` uses it as a
data set container. ``update`` is still a dummy parameter.

All parameters specific to the GBT model are passed into the training function
as a :ocv:class:`CvGBTreesParams` structure.


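
The two accepted encodings of ``varIdx`` and ``sampleIdx`` describe the same kind of selection: a mask marks every element as used or unused, while an index list names only the used ones. The conversion below is a sketch on plain standard-library containers; ``maskToIndices`` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cstddef>
#include <vector>

// Converts a 0/1 mask into the equivalent list of selected indices.
std::vector<int> maskToIndices(const std::vector<unsigned char>& mask)
{
    std::vector<int> indices;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] != 0)
            indices.push_back(static_cast<int>(i));
    return indices;
}
```
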
CvGBTrees::predict
------------------
Predicts a response for an input sample.

.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const

.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const

.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval

    :param sample: Input feature vector that has the same format as every training set
        element. If not all the variables were actually used during training,
        ``sample`` must still contain placeholder values at the corresponding positions.

    :param missing: Missing values mask, which is a matrix of the same size as
        ``sample`` having the ``CV_8U`` type. ``1`` corresponds to a missing value
        in the same position in the ``sample`` vector. If there are no missing values
        in the feature vector, an empty matrix can be passed instead of the missing mask.

    :param weakResponses: Matrix used to obtain predictions of all the trees.
        The matrix has :math:`K` rows,
        where :math:`K` is the number of output classes (1 for the regression case).
        The matrix has as many columns as the ``slice`` length.

    :param slice: Parameter defining the part of the ensemble used for prediction.
        If ``slice = Range::all()``, all trees are used. Use this parameter to
        get predictions of GBT models with different ensemble sizes while training
        only one model.

    :param k: Index of the tree ensemble in a classification problem, where one
        ensemble is built per class (see :ref:`Training GBT`). Use this
        parameter to change the output to the sum of the trees' predictions in the
        ``k``-th ensemble only. To get the total GBT model prediction, the ``k`` value
        must be -1. For regression problems, ``k`` must also be -1.

The method predicts the response corresponding to the given sample
(see :ref:`Predicting with GBT`).
The result is either the class label or the estimated function value. The
:ocv:func:`CvGBTrees::predict` method uses a parallel version of the GBT model
prediction if OpenCV is built with the TBB library. In this case, predictions
of single trees are computed in parallel.


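
The idea behind ``slice`` is that a model with fewer trees is a prefix of the full ensemble, so one trained model yields predictions for every smaller ensemble size. The sketch below shows this for a single regression sample on plain vectors; ``stagedPredictions`` is a hypothetical helper, and including ``f0`` and the shrinkage in every partial sum is an assumption based on the model formula in :ref:`Training GBT`.

```cpp
#include <cstddef>
#include <vector>

// treeResponses[i] is the response of the i-th tree for one fixed sample.
// Element m of the result is the prediction of the model that uses only
// the first m + 1 trees: f0 + shrinkage * (T_1 + ... + T_{m+1}).
std::vector<double> stagedPredictions(const std::vector<double>& treeResponses,
                                      double f0, double shrinkage)
{
    std::vector<double> out;
    out.reserve(treeResponses.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < treeResponses.size(); ++i) {
        sum += treeResponses[i];
        out.push_back(f0 + shrinkage * sum);
    }
    return out;
}
```

Comparing such staged predictions against held-out responses is one way to choose the number of trees without retraining.
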
CvGBTrees::clear
----------------
Clears the model.

.. ocv:function:: void CvGBTrees::clear()

.. ocv:pyfunction:: cv2.GBTrees.clear() -> None

The function deletes the data set information and all the weak models and sets all internal
variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
destructor.


CvGBTrees::calc_error
---------------------
Calculates a training or testing error.

.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )

    :param _data: Data set.

    :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
        (``CV_TEST_ERROR``).

    :param resp: If non-zero, a vector of predictions on the corresponding data set is
        returned.

If :ocv:class:`CvMLData` is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
used to get the training or testing error and, optionally, all predictions
on the training or testing set. If the Intel* TBB* library is used, the error is computed in
parallel, namely, predictions for different samples are computed at the same time.
For a regression problem, the mean squared error is returned. For
classification, the result is the misclassification error in percent.
