.. _Gradient Boosted Trees:

Gradient Boosted Trees
======================

.. highlight:: cpp

Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
classification and regression problems. Moreover, it can use any
differentiable loss function, and some popular ones are implemented.
Using decision trees (:ocv:class:`CvDTree`) as base learners allows processing
of both ordered and categorical variables.

.. _Training GBT:

Training the GBT model
----------------------

The Gradient Boosted Trees model represents an ensemble of single regression
trees built in a greedy fashion. The training procedure is an iterative process
similar to the numerical optimization via the gradient descent method. The
summary loss on the training set depends only on the current model predictions
for the training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. The :math:`\mathcal{L}(F)`
gradient can be computed as follows:

.. math::
    grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
    \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
    \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .
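
For instance, for the squared loss :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
(see the list of implemented losses below), the :math:`i`-th antigradient
component reduces to the ordinary residual, so each new tree simply fits the
current prediction errors:

.. math::
    -\dfrac{\partial{L(y_i, F(x_i))}}{\partial{F(x_i)}} = y_i - F(x_i) .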

At every training step, a single regression tree is built to predict the
antigradient vector components. The step length is computed corresponding to
the loss function, separately for every region determined by a tree leaf. It
can be eliminated by changing the values of the leaves directly.

The main scheme of the training process is shown below; a schematic sketch of
the loop follows the list:

#.
    Find the best constant model.
#.
    For :math:`i` in :math:`[1,M]`:

    #.
        Compute the antigradient.
    #.
        Grow a regression tree to predict antigradient components.
    #.
        Change values in the tree leaves.
    #.
        Add the tree to the model.
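
A schematic C++-style sketch of this loop is shown below. It is illustrative
only: ``Tree``, ``bestConstant``, ``antigradient``, ``fitRegressionTree``, and
``adjustLeaves`` are hypothetical names standing in for the steps above, not
parts of the OpenCV API.

.. code-block:: cpp

    // Schematic training loop (illustration only, not the OpenCV internals).
    std::vector<Tree> model;
    double f0 = bestConstant( y );                     // 1: best constant model
    std::vector<double> F( N, f0 );                    // current predictions F(x_n)
    for( int i = 0; i < M; i++ )                       // 2: boosting iterations
    {
        std::vector<double> g = antigradient( y, F );  // 2a: antigradient components
        Tree t = fitRegressionTree( X, g );            // 2b: grow a tree to predict them
        adjustLeaves( t, y, F );                       // 2c: per-leaf step length
        model.push_back( t );                          // 2d: add the tree to the model
        for( int n = 0; n < N; n++ )                   // update the predictions with
            F[n] += nu * t.predict( X[n] );            // shrinkage nu (see below)
    }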

The following loss functions are implemented for regression problems:

*
    Squared loss (``CvGBTrees::SQUARED_LOSS``):
    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`

*
    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
    :math:`L(y,f(x))=|y-f(x)|`

*
    Huber loss (``CvGBTrees::HUBER_LOSS``):
    :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
    \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
    \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,

    where :math:`\delta` is the :math:`\alpha`-quantile estimate of
    :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.

The following loss function is implemented for classification problems:

*
    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
    :math:`K` functions are built, one function for each output class, and
    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
    where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
    is the estimate of the probability of :math:`y=k`.
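
In this case the antigradient also has a simple closed form: for the
:math:`k`-th class function it is the difference between the class indicator
and the current probability estimate,

.. math::
    -\dfrac{\partial{L}}{\partial{f_k(x)}} = 1(y=k) - p_k(x) .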

As a result, you get the following model:

.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,

where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.

.. _Predicting with GBT:

Predicting with the GBT Model
-----------------------------

To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, this sum is the answer.
For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.
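
A minimal sketch of this in code, assuming an already trained classification
model ``gbt`` and a test row vector ``sample`` of the ``CV_32F`` type (see
:ocv:func:`CvGBTrees::predict` below for the full parameter list):

.. code-block:: cpp

    // With the default arguments, predict() performs the argmax itself
    // and returns the class label.
    float label = gbt.predict( sample );

    // Passing k >= 0 instead returns the summed responses f_k(x) of the
    // k-th per-class ensemble, so the argmax can also be done manually.
    float f0 = gbt.predict( sample, cv::Mat(), cv::Range::all(), 0 );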

.. highlight:: cpp

CvGBTreesParams
---------------
.. ocv:struct:: CvGBTreesParams : public CvDTreeParams

GBT training parameters.

The structure contains parameters for each single decision tree in the ensemble,
as well as the whole model characteristics. The structure is derived from
:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priors are not used.

CvGBTreesParams::CvGBTreesParams
--------------------------------
.. ocv:function:: CvGBTreesParams::CvGBTreesParams()

.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )

    :param loss_function_type: Type of the loss function used for training
        (see :ref:`Training GBT`). It must be one of the
        following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
        ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
        types are used for regression problems, and the last one for
        classification.

    :param weak_count: Count of boosting algorithm iterations. ``weak_count*K`` is the total
        count of trees in the GBT model, where ``K`` is the output classes count
        (equal to one in case of a regression).

    :param shrinkage: Regularization parameter (see :ref:`Training GBT`).
2011-06-30 00:06:42 +02:00
:param subsample_portion: Portion of the whole training set used for each algorithm iteration.
Subset is generated randomly. For more information see
http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
2011-06-15 23:54:25 +02:00
2011-06-30 00:06:42 +02:00
:param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).
2011-06-15 23:54:25 +02:00
2011-06-30 00:06:42 +02:00
:param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).
2011-06-15 23:54:25 +02:00

By default the following constructor is used:

.. code-block:: cpp

    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )
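
For example, a non-default configuration for a classification task might be
constructed as follows (the particular values are illustrative only, not
recommendations):

.. code-block:: cpp

    CvGBTreesParams params( CvGBTrees::DEVIANCE_LOSS, // loss for classification
                            500,    // weak_count: boosting iterations
                            0.05f,  // shrinkage
                            0.8f,   // subsample_portion
                            2,      // max_depth of every tree
                            false   // use_surrogates
                          );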

CvGBTrees
---------
.. ocv:class:: CvGBTrees : public CvStatModel

The class implements the Gradient Boosted Trees model as described in the beginning of this section.

CvGBTrees::CvGBTrees
--------------------
Default and training constructors.

.. ocv:function:: CvGBTrees::CvGBTrees()

.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )

.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )

.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>

The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameter descriptions.

CvGBTrees::train
----------------
Trains a Gradient Boosted Trees model.

.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )

.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval

The first train method follows the common template (see :ocv:func:`CvStatModel::train`).
Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
``CV_32S`` or ``CV_32F``. In both cases it is converted into a ``CV_32F``
matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must be a
list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
a dummy parameter.

The second form of the :ocv:func:`CvGBTrees::train` function uses :ocv:class:`CvMLData` as a
data set container. ``update`` is still a dummy parameter.

All parameters specific to the GBT model are passed into the training function
as a :ocv:class:`CvGBTreesParams` structure.
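
A minimal end-to-end sketch of the first form is given below. The synthetic
data is a placeholder so that the snippet is self-contained; only the
``train`` and ``predict`` calls reflect the documented API.

.. code-block:: cpp

    #include <cstdio>
    #include "opencv2/core/core.hpp"
    #include "opencv2/ml/ml.hpp"
    using namespace cv;

    int main()
    {
        // Synthetic regression set: 100 row samples with 8 features each.
        Mat trainData( 100, 8, CV_32F );
        randu( trainData, Scalar(0), Scalar(1) );

        // Trivially learnable response: the sum of the features in each row.
        Mat responses;
        reduce( trainData, responses, 1, CV_REDUCE_SUM, CV_32F );

        CvGBTrees gbt;
        gbt.train( trainData, CV_ROW_SAMPLE, responses,
                   Mat(), Mat(), Mat(), Mat(),  // no varIdx/sampleIdx/varType/mask
                   CvGBTreesParams() );         // default parameters

        // Predict the response for the first training sample.
        printf( "prediction: %f\n", gbt.predict( trainData.row(0) ) );
        return 0;
    }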

CvGBTrees::predict
------------------
Predicts a response for an input sample.

.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const

.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const

.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval

    :param sample: Input feature vector that has the same format as every training set
        element. If not all the variables were actually used during training,
        ``sample`` may contain arbitrary values at the corresponding positions.

    :param missing: Missing values mask, which is a matrix of the same size as
        ``sample`` of the ``CV_8U`` type. ``1`` corresponds to the missing value
        in the same position in the ``sample`` vector. If there are no missing values
        in the feature vector, an empty matrix can be passed instead of the missing mask.

    :param weakResponses: Matrix used to obtain predictions of all the trees.
        The matrix has :math:`K` rows,
        where :math:`K` is the count of output classes (1 for the regression case).
        The matrix has as many columns as the ``slice`` length.

    :param slice: Parameter defining the part of the ensemble used for prediction.
        If ``slice = Range::all()``, all trees are used. Use this parameter to
        get predictions of the GBT models with different ensemble sizes while
        training only one model.

    :param k: Index of the tree ensemble built in case of the classification problem
        (see :ref:`Training GBT`). Use this
        parameter to change the output to the sum of the trees' predictions in the
        ``k``-th ensemble only. To get the total GBT model prediction, the ``k`` value
        must be -1. For regression problems, ``k`` is also equal to -1.

The method predicts the response corresponding to the given sample
(see :ref:`Predicting with GBT`).
The result is either the class label or the estimated function value. The
:ocv:func:`CvGBTrees::predict` method enables using the parallel version of the GBT model
prediction if OpenCV is built with the TBB library. In this case, predictions
of single trees are computed in a parallel fashion.
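
A brief sketch of the ``slice`` parameter, assuming a trained model ``gbt``
and a sample vector ``sample`` as above:

.. code-block:: cpp

    // Prediction using only the first 10 boosting iterations...
    float early = gbt.predict( sample, Mat(), Range( 0, 10 ) );

    // ...versus the full ensemble, to see how the answer changes with the
    // ensemble size without retraining the model.
    float full = gbt.predict( sample, Mat(), Range::all() );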

CvGBTrees::clear
----------------
Clears the model.

.. ocv:function:: void CvGBTrees::clear()

.. ocv:pyfunction:: cv2.GBTrees.clear() -> None

The function deletes the data set information and all the weak models and sets all internal
variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
destructor.

CvGBTrees::calc_error
---------------------
Calculates a training or testing error.

.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float>* resp = 0 )

    :param _data: Data set.

    :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
        (``CV_TEST_ERROR``).

    :param resp: If non-zero, a vector of predictions on the corresponding data set is
        returned.

If the :ocv:class:`CvMLData` data is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
used to get a training/testing error easily and (optionally) all predictions
on the training/testing set. If the Intel TBB library is used, the error is computed in a
parallel way, namely, predictions for different samples are computed at the same time.
In case of a regression problem, a mean squared error is returned. For
classification problems, the result is the misclassification error in percent.
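
A minimal sketch of this workflow is shown below; the file name ``data.csv``
and its layout (response in the last column) are assumptions made for the
example.

.. code-block:: cpp

    CvMLData data;
    if( data.read_csv( "data.csv" ) == 0 )  // 0 indicates success
    {
        // Assume the response is stored in the last column.
        data.set_response_idx( data.get_values()->cols - 1 );

        // Hold out 30% of the samples for testing.
        CvTrainTestSplit split( 0.7f );
        data.set_train_test_split( &split );

        CvGBTrees gbt;
        gbt.train( &data, CvGBTreesParams() );

        float train_error = gbt.calc_error( &data, CV_TRAIN_ERROR );
        float test_error  = gbt.calc_error( &data, CV_TEST_ERROR );
        printf( "train: %f, test: %f\n", train_error, test_error );
    }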