.. _Gradient Boosted Trees:

Gradient Boosted Trees
======================

.. highlight:: cpp

Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
classification and regression problems. Moreover, it can use any
differentiable loss function; several popular ones are implemented.
Using decision trees (:ocv:class:`CvDTree`) as base learners allows processing of
both ordered and categorical variables.

.. _Training GBT:

Training the GBT model
----------------------

The Gradient Boosted Trees model represents an ensemble of single regression trees
built in a greedy fashion. The training procedure is an iterative process
similar to numerical optimization via the gradient descent method. The summary loss
on the training set depends only on the current model predictions for the
training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. The :math:`\mathcal{L}(F)`
gradient can be computed as follows:

.. math::
    grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
    \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
    \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .

At every training step, a single regression tree is built to predict the
antigradient vector components. The step length is computed in accordance with the
loss function, separately for every region determined by a tree leaf. It
can be eliminated by changing the values of the leaves directly.

See below the main scheme of the training process:

#.
    Find the best constant model.
#.
    For :math:`i` in :math:`[1,M]`:

    #.
        Compute the antigradient.
    #.
        Grow a regression tree to predict antigradient components.
    #.
        Change values in the tree leaves.
    #.
        Add the tree to the model.

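As a toy illustration of this scheme (not the ``CvGBTrees`` implementation), the
following self-contained sketch boosts depth-1 regression trees (stumps) on
one-dimensional data with the squared loss, for which the antigradient is simply
the vector of residuals:

.. code-block:: cpp

    #include <cstdio>
    #include <vector>

    // A depth-1 regression tree: one split, two leaf values.
    struct Stump
    {
        float thresh, left, right;
        float predict(float x) const { return x < thresh ? left : right; }
    };

    int main()
    {
        const int N = 8, M = 50;
        const float nu = 0.1f;                    // shrinkage (see the model formula below)
        float x[N] = {0, 1, 2, 3, 4, 5, 6, 7};
        float y[N] = {0, 0, 1, 1, 4, 4, 9, 9};

        // Step 1: under squared loss, the best constant model is the mean response.
        float f0 = 0;
        for (int i = 0; i < N; ++i) f0 += y[i] / N;
        std::vector<float> F(N, f0);              // current model predictions
        std::vector<Stump> ensemble;

        for (int m = 0; m < M; ++m)               // step 2: boosting iterations
        {
            float g[N];                           // 2a: antigradient = residuals
            for (int i = 0; i < N; ++i) g[i] = y[i] - F[i];

            // 2b + 2c: fit the best stump to the residuals; for squared loss
            // the optimal value of each leaf is the mean residual in its region.
            Stump best = {0, 0, 0};
            float bestErr = 1e30f;
            for (int s = 1; s < N; ++s)           // x is sorted, so split by index
            {
                float lm = 0, rm = 0;
                for (int i = 0; i < s; ++i) lm += g[i] / s;
                for (int i = s; i < N; ++i) rm += g[i] / (N - s);
                float err = 0;
                for (int i = 0; i < N; ++i)
                {
                    float d = g[i] - (i < s ? lm : rm);
                    err += d * d;
                }
                if (err < bestErr)
                {
                    bestErr = err;
                    best.thresh = x[s]; best.left = lm; best.right = rm;
                }
            }
            ensemble.push_back(best);             // 2d: add the tree to the model
            for (int i = 0; i < N; ++i) F[i] += nu * best.predict(x[i]);
        }

        // The resulting model is f0 plus the shrunken sum of all stump responses.
        float sum = 0;
        for (size_t t = 0; t < ensemble.size(); ++t) sum += ensemble[t].predict(4.f);
        printf("prediction at x = 4: %.3f (training target 4)\n", f0 + nu * sum);
        return 0;
    }

Here the per-leaf mean of the residuals plays the role of the loss-specific step
length from step 2c.
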
The following loss functions are implemented for regression problems:

*
    Squared loss (``CvGBTrees::SQUARED_LOSS``):
    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
*
    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
    :math:`L(y,f(x))=|y-f(x)|`
*
    Huber loss (``CvGBTrees::HUBER_LOSS``):
    :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
    \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
    \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,

    where :math:`\delta` is the :math:`\alpha`-quantile estimate of
    :math:`|y-f(x)|`. In the current implementation, :math:`\alpha=0.2`.

The following loss function is implemented for classification problems:

*
    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
    :math:`K` functions are built, one function for each output class, and
    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
    where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
    is the estimate of the probability that :math:`y=k`.

As a result, you get the following model:

.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,

where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.

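For instance, with :math:`f_0 = 0.5`, :math:`\nu = 0.1`, :math:`M = 3`, and tree
responses :math:`T_1(x) = 1`, :math:`T_2(x) = 2`, :math:`T_3(x) = 3` for some
sample :math:`x` (the numbers are arbitrary), the model prediction is

.. math:: f(x) = 0.5 + 0.1\cdot(1 + 2 + 3) = 1.1 .
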
.. _Predicting with GBT:

Predicting with the GBT Model
-----------------------------

To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, this sum is the answer.
For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.

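As a minimal sketch (not a ``CvGBTrees`` call), assuming the :math:`K` per-class
ensemble sums :math:`f_1(x), ..., f_K(x)` have already been computed into an
array, the class label is the index of the largest sum:

.. code-block:: cpp

    #include <algorithm>

    // Sketch: pick the class label from K per-class ensemble sums f_k(x).
    // Returns a 0-based index; the scores themselves are assumed to be
    // computed as described above.
    int classifyFromScores(const float* f, int K)
    {
        return (int)(std::max_element(f, f + K) - f);
    }
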
.. highlight:: cpp

CvGBTreesParams
---------------
.. ocv:struct:: CvGBTreesParams : public CvDTreeParams

GBT training parameters.

The structure contains parameters for each single decision tree in the ensemble,
as well as characteristics of the whole model. The structure is derived from
:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priors are not used.

CvGBTreesParams::CvGBTreesParams
--------------------------------
.. ocv:function:: CvGBTreesParams::CvGBTreesParams()

.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )

   :param loss_function_type: Type of the loss function used for training
    (see :ref:`Training GBT`). It must be one of the
    following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
    ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
    types are used for regression problems, and the last one for
    classification.

   :param weak_count: Count of boosting algorithm iterations. ``weak_count*K`` is the total
    count of trees in the GBT model, where ``K`` is the output classes count
    (equal to one in case of a regression).

   :param shrinkage: Regularization parameter (see :ref:`Training GBT`).

   :param subsample_portion: Portion of the whole training set used for each algorithm iteration.
    The subset is generated randomly. For more information see
    http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.

   :param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).

   :param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).

By default, the following constructor is used (note the parameter order:
shrinkage comes before ``subsample_portion``):

.. code-block:: cpp

    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )

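For example, a non-default configuration can be built with the full constructor;
the values below are arbitrary illustrations, not recommendations:

.. code-block:: cpp

    // Illustrative setup for a regression model: 500 boosting iterations,
    // shrinkage 0.05, 80% stochastic subsampling, trees of depth 3.
    CvGBTreesParams params(CvGBTrees::SQUARED_LOSS, // loss_function_type
                           500,                     // weak_count
                           0.05f,                   // shrinkage
                           0.8f,                    // subsample_portion
                           3,                       // max_depth
                           false);                  // use_surrogates
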
CvGBTrees
---------
.. ocv:class:: CvGBTrees : public CvStatModel

The class implements the Gradient Boosted Trees model as described at the beginning of this section.

CvGBTrees::CvGBTrees
--------------------
Default and training constructors.

.. ocv:function:: CvGBTrees::CvGBTrees()

.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )

.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )

.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>

The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameter descriptions.

CvGBTrees::train
----------------
Trains a Gradient Boosted Trees model.

.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )

.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval

The first train method follows the common template (see :ocv:func:`CvStatModel::train`).
Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
``CV_32S`` or ``CV_32F``. In both cases it is converted into a ``CV_32F``
matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must each be a
list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
a dummy parameter.

The second form of the :ocv:func:`CvGBTrees::train` function uses :ocv:class:`CvMLData` as a
data set container. ``update`` is still a dummy parameter.

All parameters specific to the GBT model are passed into the training function
as a :ocv:class:`CvGBTreesParams` structure.

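A minimal training sketch, assuming the feature matrix and responses are filled
elsewhere (the sizes below are placeholders):

.. code-block:: cpp

    cv::Mat trainData(100, 10, CV_32F); // 100 samples, 10 features each
    cv::Mat responses(100, 1, CV_32F);  // one response per sample
    // ... fill trainData and responses ...

    CvGBTreesParams params;             // default parameters
    params.loss_function_type = CvGBTrees::ABSOLUTE_LOSS;

    CvGBTrees gbt;
    gbt.train(trainData, CV_ROW_SAMPLE, responses,
              cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
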
CvGBTrees::predict
------------------
Predicts a response for an input sample.

.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const

.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const

.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval

   :param sample: Input feature vector that has the same format as every training set
    element. If not all the variables were actually used during training,
    ``sample`` can contain arbitrary (unused) values at the corresponding positions.

   :param missing: Missing values mask, which is a matrix of the same size as
    ``sample`` of the ``CV_8U`` type. ``1`` corresponds to a missing value
    in the same position in the ``sample`` vector. If there are no missing values
    in the feature vector, an empty matrix can be passed instead of the missing mask.

   :param weakResponses: Matrix used to obtain predictions of all the trees.
    The matrix has :math:`K` rows,
    where :math:`K` is the count of output classes (1 for the regression case).
    The matrix has as many columns as the ``slice`` length.

   :param slice: Parameter defining the part of the ensemble used for prediction.
    If ``slice = Range::all()``, all trees are used. Use this parameter to
    get predictions of GBT models with different ensemble sizes while training
    only one model.

   :param k: Index of the tree ensemble built for the classification problem
    (see :ref:`Training GBT`). Use this
    parameter to change the output to the sum of the trees' predictions in the
    ``k``-th ensemble only. To get the total GBT model prediction, the ``k`` value
    must be -1. For regression problems, ``k`` is also equal to -1.

The method predicts the response corresponding to the given sample
(see :ref:`Predicting with GBT`).
The result is either the class label or the estimated function value. The
:ocv:func:`CvGBTrees::predict` method enables using the parallel version of the GBT model
prediction if OpenCV is built with the TBB library. In this case, predictions
of single trees are computed in parallel.

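Continuing the training sketch above, the following shows a full-ensemble
prediction and a prediction restricted to the first 100 trees (the sample and
the tree count are placeholders):

.. code-block:: cpp

    cv::Mat sample = trainData.row(0);  // a CV_32F feature vector
    float full    = gbt.predict(sample);                 // all trees, k = -1
    float partial = gbt.predict(sample, cv::Mat(),
                                cv::Range(0, 100));      // first 100 trees only
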
CvGBTrees::clear
----------------
Clears the model.

.. ocv:function:: void CvGBTrees::clear()

.. ocv:pyfunction:: cv2.GBTrees.clear() -> None

The function deletes the data set information and all the weak models, and sets all internal
variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
destructor.

CvGBTrees::calc_error
---------------------
Calculates a training or testing error.

.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )

   :param _data: Data set.

   :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
    (``CV_TEST_ERROR``).

   :param resp: If non-zero, a vector of predictions on the corresponding data set is
    returned.

If :ocv:class:`CvMLData` is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
used to get a training/testing error easily and, optionally, all predictions
on the training/testing set. If the Intel TBB library is used, the error is computed in a
parallel way: predictions for different samples are computed at the same time.
In case of a regression problem, the mean squared error is returned. For
classification problems, the result is the misclassification error in percent.
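
A usage sketch with a :ocv:class:`CvMLData` container; the file name, response
column, and split proportion are placeholders:

.. code-block:: cpp

    CvMLData data;
    data.read_csv("train.csv");       // hypothetical CSV file
    data.set_response_idx(0);         // responses are in the first column
    CvTrainTestSplit split(0.8f);     // 80% train / 20% test
    data.set_train_test_split(&split);

    CvGBTrees gbt;
    gbt.train(&data, CvGBTreesParams());

    std::vector<float> testPredictions;
    float testError = gbt.calc_error(&data, CV_TEST_ERROR, &testPredictions);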