.. _Gradient Boosted Trees:

Gradient Boosted Trees
======================

.. highlight:: cpp

Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf.
In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
classification and regression problems. Moreover, it can use any
differentiable loss function; several popular ones are implemented.
Using decision trees (:ocv:class:`CvDTree`) as base learners allows GBT to process
both ordered and categorical variables.


.. _Training GBT:

Training the GBT model
----------------------

The Gradient Boosted Trees model is an ensemble of single regression trees
built in a greedy fashion. Training is an iterative process similar to
numerical optimization via the gradient descent method. The summary loss
on the training set depends only on the current model predictions for the
training samples; in other words,
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. The gradient of :math:`\mathcal{L}(F)` can therefore be
computed component by component:

.. math::

    \operatorname{grad}(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
    \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, \ldots ,
    \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .

At every training step, a single regression tree is built to predict the
components of the antigradient (negative gradient) vector. The step length is
computed according to the loss function, separately for every region determined
by a tree leaf. This separate step can be eliminated by changing the values of
the leaves directly.

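
For the squared and absolute losses defined in this section, the optimal step in each leaf has a closed form. That is why the separate line search can be replaced by setting the leaf values directly. The following plain C++ sketch computes that constant from the residuals :math:`r_i = y_i - F(x_i)` of the samples falling into one leaf: the mean minimizes the squared loss, and a median minimizes the absolute loss. The function names are hypothetical and are not part of the OpenCV API.

.. code-block:: cpp

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Hypothetical helper: optimal leaf constant for squared loss.
    // argmin_c sum (r_i - c)^2 / 2 is the mean of the residuals in the leaf.
    double leaf_value_squared(const std::vector<double>& residuals) {
        if (residuals.empty()) return 0.0;
        double sum = std::accumulate(residuals.begin(), residuals.end(), 0.0);
        return sum / static_cast<double>(residuals.size());
    }

    // Hypothetical helper: optimal leaf constant for absolute loss.
    // argmin_c sum |r_i - c| is any median; for even sizes the upper median is used.
    double leaf_value_absolute(std::vector<double> residuals) {
        if (residuals.empty()) return 0.0;
        std::size_t mid = residuals.size() / 2;
        std::nth_element(residuals.begin(), residuals.begin() + mid, residuals.end());
        return residuals[mid];
    }

For example, for residuals 1, 2, and 6, the squared-loss leaf value is 3 and the absolute-loss leaf value is 2.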

The main scheme of the training process is as follows:

#.
    Find the best constant model.
#.
    For :math:`i` in :math:`[1,M]`, where :math:`M` is the number of boosting iterations:

    #.
        Compute the antigradient.
    #.
        Grow a regression tree to predict the antigradient components.
    #.
        Change the values in the tree leaves (the step length).
    #.
        Add the tree to the model.

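
The scheme above can be illustrated with a deliberately small sketch. It uses squared loss, a single numeric feature, and depth-1 regression trees ("stumps") fitted by exhaustive search. The code is plain C++ written for this explanation, not the CvGBTrees implementation, and all names in it are hypothetical. For squared loss, the antigradient equals the residual vector and the mean leaf values are already the optimal steps, so step 2.3 needs no extra work here.

.. code-block:: cpp

    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Depth-1 regression tree on one numeric feature.
    struct Stump {
        double threshold;    // split point on x
        double left, right;  // leaf values for x <= threshold and x > threshold
        double operator()(double x) const { return x <= threshold ? left : right; }
    };

    struct GbtModel {
        double f0 = 0.0;         // best constant model
        double shrinkage = 1.0;  // nu in (0, 1]
        std::vector<Stump> trees;
        double predict(double x) const {
            double sum = 0.0;
            for (const Stump& t : trees) sum += t(x);
            return f0 + shrinkage * sum;
        }
    };

    // Fit a stump to targets r by trying every x[i] as a threshold and keeping
    // the split with the smallest squared error. Leaf values are the mean
    // targets, which are optimal for squared loss. Assumes x is non-empty.
    Stump fit_stump(const std::vector<double>& x, const std::vector<double>& r) {
        Stump best{x[0], 0.0, 0.0};
        double best_sse = -1.0;
        for (double thr : x) {
            double sum_l = 0.0, sum_r = 0.0;
            std::size_t n_l = 0, n_r = 0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                if (x[i] <= thr) { sum_l += r[i]; ++n_l; }
                else             { sum_r += r[i]; ++n_r; }
            }
            double mean_l = n_l ? sum_l / n_l : 0.0;
            double mean_r = n_r ? sum_r / n_r : 0.0;
            double sse = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double d = r[i] - (x[i] <= thr ? mean_l : mean_r);
                sse += d * d;
            }
            if (best_sse < 0.0 || sse < best_sse) {
                best_sse = sse;
                best = {thr, mean_l, mean_r};
            }
        }
        return best;
    }

    GbtModel train_gbt(const std::vector<double>& x, const std::vector<double>& y,
                       int weak_count, double shrinkage) {
        GbtModel model;
        model.shrinkage = shrinkage;
        // Step 1: the best constant model for squared loss is the mean response.
        model.f0 = std::accumulate(y.begin(), y.end(), 0.0) / y.size();
        std::vector<double> r(y.size());
        for (int it = 0; it < weak_count; ++it) {
            // Step 2.1: antigradient of squared loss = residuals y_i - F(x_i).
            for (std::size_t i = 0; i < y.size(); ++i) r[i] = y[i] - model.predict(x[i]);
            // Steps 2.2-2.3: grow a tree on the antigradient; its mean leaf
            // values already are the optimal steps for squared loss.
            // Step 2.4: add the tree to the model.
            model.trees.push_back(fit_stump(x, r));
        }
        return model;
    }

For instance, ``train_gbt({1.0, 2.0, 3.0, 4.0}, {1.0, 1.0, 5.0, 5.0}, 1, 1.0)`` starts from :math:`f_0 = 3`. It then fits one stump that splits at :math:`x \le 2` with leaf values -2 and 2, and therefore reproduces the training responses exactly.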

The following loss functions are implemented for regression problems:

*
    Squared loss (``CvGBTrees::SQUARED_LOSS``):
    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
*
    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
    :math:`L(y,f(x))=|y-f(x)|`
*
    Huber loss (``CvGBTrees::HUBER_LOSS``):
    :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
    \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
    \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,

    where :math:`\delta` is an estimate of the :math:`\alpha`-quantile of
    :math:`|y-f(x)|` over the training samples. In the current implementation, :math:`\alpha=0.2`.

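
The losses above and their antigradients :math:`-\partial L/\partial f(x)`, which are the targets of each new tree, can be written as the following plain C++ sketch. The function names are hypothetical. The document only states that :math:`\delta` is an :math:`\alpha`-quantile estimate with :math:`\alpha=0.2`. The quantile convention in ``huber_delta`` (ascending sort, index :math:`\lfloor\alpha(n-1)\rfloor`) is an assumption of this sketch and is not necessarily the one CvGBTrees uses.

.. code-block:: cpp

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    double squared_loss(double y, double f) { return 0.5 * (y - f) * (y - f); }
    double squared_antigrad(double y, double f) { return y - f; }

    double absolute_loss(double y, double f) { return std::fabs(y - f); }
    // sign(y - f); at y == f the derivative is undefined and 0 is returned.
    double absolute_antigrad(double y, double f) {
        return y > f ? 1.0 : (y < f ? -1.0 : 0.0);
    }

    double huber_loss(double y, double f, double delta) {
        double a = std::fabs(y - f);
        return a > delta ? delta * (a - delta / 2.0) : 0.5 * a * a;
    }
    double huber_antigrad(double y, double f, double delta) {
        double r = y - f;
        if (std::fabs(r) <= delta) return r;  // quadratic zone: like squared loss
        return r > 0.0 ? delta : -delta;      // linear zone: delta * sign(r)
    }

    // Hypothetical delta estimate: lower empirical alpha-quantile of |y_i - f_i|.
    // Assumes y and f have the same, non-zero size.
    double huber_delta(const std::vector<double>& y, const std::vector<double>& f,
                       double alpha = 0.2) {
        std::vector<double> a(y.size());
        for (std::size_t i = 0; i < y.size(); ++i) a[i] = std::fabs(y[i] - f[i]);
        std::sort(a.begin(), a.end());
        std::size_t k = static_cast<std::size_t>(alpha * (a.size() - 1));
        return a[k];
    }

For example, with :math:`\delta = 2`, a residual of 5 falls into the linear zone, so the Huber loss is :math:`2\cdot(5-1) = 8` and the antigradient is clipped to 2.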

The following loss function is implemented for classification problems:

*
    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
    :math:`K` functions are built, one function for each output class, and
    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
    where :math:`1(y=k)` equals 1 if :math:`y=k` and 0 otherwise, and
    :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
    is the estimate of the probability of :math:`y=k`.

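
Differentiating the deviance loss with respect to :math:`f_k(x)` gives the antigradient :math:`1(y=k) - p_k(x)`. In gradient boosting, the tree of the :math:`k`-th ensemble is grown to predict these values. The plain C++ sketch below computes the probabilities, the loss, and the antigradient. It uses hypothetical names and indexes classes from 0 rather than from 1 as in the formula.

.. code-block:: cpp

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // p_k(x) from the K ensemble outputs f_1(x)..f_K(x). The maximum is
    // subtracted before exponentiating for numerical stability; this does not
    // change the result. f must be non-empty.
    std::vector<double> class_probabilities(const std::vector<double>& f) {
        double fmax = *std::max_element(f.begin(), f.end());
        std::vector<double> p(f.size());
        double denom = 0.0;
        for (std::size_t k = 0; k < f.size(); ++k) {
            p[k] = std::exp(f[k] - fmax);
            denom += p[k];
        }
        for (double& v : p) v /= denom;
        return p;
    }

    // Deviance loss of a sample whose class is y (0-based).
    double deviance_loss(int y, const std::vector<double>& f) {
        return -std::log(class_probabilities(f)[y]);
    }

    // Antigradient with respect to each f_k: 1(y = k) - p_k(x).
    std::vector<double> deviance_antigrad(int y, const std::vector<double>& f) {
        std::vector<double> g = class_probabilities(f);
        for (std::size_t k = 0; k < g.size(); ++k)
            g[k] = (static_cast<int>(k) == y ? 1.0 : 0.0) - g[k];
        return g;
    }

With two classes and outputs :math:`f_1 = f_2 = 0`, both probabilities are 0.5, the loss is :math:`\ln 2`, and the antigradient for a sample of the second class is :math:`(-0.5, 0.5)`.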

As a result, you get the following model:

.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,

where :math:`f_0` is the initial guess (the best constant model), :math:`T_i` is
the regression tree built at the :math:`i`-th iteration, and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*. For classification, one such function :math:`f_k(x)` is built for
each of the :math:`K` classes.

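
For a single sample, evaluating the model is a direct transcription of the formula above. In this sketch, ``gbt_output`` is a hypothetical helper, not an OpenCV function, and the outputs :math:`T_i(x)` of the individual trees are assumed to be already computed.

.. code-block:: cpp

    #include <vector>

    // f(x) = f_0 + nu * sum_{i=1..M} T_i(x) for one sample, given T_i(x).
    double gbt_output(double f0, double nu, const std::vector<double>& tree_outputs) {
        double sum = 0.0;
        for (double t : tree_outputs) sum += t;
        return f0 + nu * sum;
    }

For example, with :math:`f_0 = 2`, :math:`\nu = 0.5`, and tree outputs 1, -0.5, and 3, the model output is :math:`2 + 0.5 \cdot 3.5 = 3.75`.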

.. _Predicting with GBT:

Predicting with the GBT Model
-----------------------------

To get the GBT model prediction, you need to compute the sum of the responses of
all the trees in the ensemble, as in the model formula above. For regression
problems, this is the answer. For classification problems, the result is
:math:`\arg\max_{i=1..K}(f_i(x))`.

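
For classification, the final decision is an argmax over the :math:`K` ensemble outputs. The minimal sketch below uses a hypothetical helper that returns a 0-based index. :ocv:func:`CvGBTrees::predict` returns the class label instead, and the mapping from index to label is omitted here.

.. code-block:: cpp

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // Index of the ensemble with the largest output f_k(x); on ties, the
    // first maximum wins. f must be non-empty.
    int predict_class(const std::vector<double>& f) {
        auto it = std::max_element(f.begin(), f.end());
        return static_cast<int>(std::distance(f.begin(), it));
    }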

CvGBTreesParams
---------------
.. ocv:struct:: CvGBTreesParams : public CvDTreeParams

GBT training parameters.

The structure contains parameters for each single decision tree in the ensemble,
as well as the whole model characteristics. The structure is derived from
:ocv:class:`CvDTreeParams`, but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priorities are not used.


CvGBTreesParams::CvGBTreesParams
--------------------------------
.. ocv:function:: CvGBTreesParams::CvGBTreesParams()

.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )

    :param loss_function_type: Type of the loss function used for training
        (see :ref:`Training GBT`). It must be one of the
        following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
        ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
        types are used for regression problems, and the last one for
        classification.

    :param weak_count: Number of boosting iterations. ``weak_count*K`` is the total
        number of trees in the GBT model, where ``K`` is the number of output classes
        (equal to one for regression).

    :param shrinkage: Regularization parameter :math:`\nu` (see :ref:`Training GBT`).

    :param subsample_portion: Portion of the whole training set used for each algorithm iteration.
        The subset is generated randomly. For more information, see
        http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.

    :param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).

    :param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).

By default, the following constructor is used (shrinkage 0.01, subsample portion 0.8):

.. code-block:: cpp

    // loss, weak_count, shrinkage, subsample_portion, max_depth, use_surrogates
    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
        // per-tree CvDTreeParams: max_depth=3, min_sample_count=10, max_categories=10
        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )

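
The ``subsample_portion`` parameter corresponds to stochastic gradient boosting: each iteration grows its tree on a random part of the training set. The sketch below shows one way to draw such a subset in plain C++. It assumes sampling without replacement and rounding of ``subsample_portion * n`` to the nearest integer. The exact scheme used by CvGBTrees is not described in this document, and ``draw_subsample`` is a hypothetical name.

.. code-block:: cpp

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Random subset of the n training-sample indices for one boosting
    // iteration, drawn without replacement and returned in increasing order.
    std::vector<int> draw_subsample(int n, float subsample_portion, std::mt19937& rng) {
        if (n <= 0) return {};
        std::vector<int> idx(n);
        std::iota(idx.begin(), idx.end(), 0);
        std::shuffle(idx.begin(), idx.end(), rng);
        int m = static_cast<int>(subsample_portion * n + 0.5f);  // round to nearest
        m = std::max(1, std::min(m, n));                         // keep 1..n samples
        idx.resize(m);
        std::sort(idx.begin(), idx.end());
        return idx;
    }

With ``subsample_portion = 0.8f`` and 10 training samples, each call returns 8 distinct indices in increasing order. A fresh subset is drawn for every iteration.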

CvGBTrees
---------
.. ocv:class:: CvGBTrees : public CvStatModel

The class implements the Gradient Boosted Trees model described at the beginning of this section.

CvGBTrees::CvGBTrees
--------------------
Default and training constructors.

.. ocv:function:: CvGBTrees::CvGBTrees()

.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )

.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )

.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>

The constructors follow the conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameter descriptions.


CvGBTrees::train
----------------
Trains a Gradient Boosted Trees model.

.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )

.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)

.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval

The first two forms (taking ``Mat`` or ``CvMat*`` arguments) follow the common template
(see :ocv:func:`CvStatModel::train`).
Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
``CV_32S`` or ``CV_32F``; in both cases, it is converted into a ``CV_32F``
matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must be a
list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
a dummy parameter.

The form of :ocv:func:`CvGBTrees::train` that takes a :ocv:class:`CvMLData` pointer uses it
as the data set container. ``update`` is still a dummy parameter.

All parameters specific to the GBT model are passed into the training function
as a :ocv:class:`CvGBTreesParams` structure.

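
Because ``varIdx`` and ``sampleIdx`` may be given either as an index list or as a mask, it helps to see how the two forms relate. The plain C++ sketch below uses a hypothetical helper and assumes that a nonzero mask entry marks a selected element.

.. code-block:: cpp

    #include <cstddef>
    #include <vector>

    // Index list equivalent to a selection mask: every position i with
    // mask[i] != 0, in increasing order.
    std::vector<int> mask_to_indices(const std::vector<unsigned char>& mask) {
        std::vector<int> idx;
        for (std::size_t i = 0; i < mask.size(); ++i)
            if (mask[i] != 0) idx.push_back(static_cast<int>(i));
        return idx;
    }

For example, the mask ``1 0 1 1 0`` corresponds to the index list ``0 2 3``.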

CvGBTrees::predict
------------------
Predicts a response for an input sample.

.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const

.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const

.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval

    :param sample: Input feature vector that has the same format as every training set
        element. If not all the variables were actually used during training,
        ``sample`` may contain dummy values at the positions of the unused variables.

    :param missing: Missing values mask, which is a matrix of the same size as
        ``sample`` having the ``CV_8U`` type. ``1`` corresponds to the missing value
        in the same position in the ``sample`` vector. If there are no missing values
        in the feature vector, an empty matrix can be passed instead of the missing mask.

    :param weakResponses: Matrix used to obtain predictions of all the trees
        (available in the ``CvMat*`` form only). The matrix has :math:`K` rows,
        where :math:`K` is the number of output classes (1 for the regression case).
        The matrix has as many columns as the ``slice`` length.

    :param slice: Parameter defining the part of the ensemble used for prediction.
        If ``slice = Range::all()`` (``CV_WHOLE_SEQ`` in the ``CvMat*`` form), all trees
        are used. Use this parameter to get predictions of GBT models with different
        ensemble sizes while training only one model.

    :param k: Index of the tree ensemble to use in the classification case, where
        :math:`K` ensembles are built (see :ref:`Training GBT`). Use this
        parameter to change the output to the sum of the trees' predictions in the
        ``k``-th ensemble only. To get the total GBT model prediction, ``k``
        must be -1. For regression problems, ``k`` should also be -1.

The method predicts the response corresponding to the given sample
(see :ref:`Predicting with GBT`).
The result is either the class label or the estimated function value. If
OpenCV is built with the TBB library, :ocv:func:`CvGBTrees::predict` uses a
parallel version of the GBT model prediction, in which the predictions of
single trees are computed in parallel.

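
The ``slice`` parameter makes it cheap to compare ensemble sizes, because the prediction of every truncated model is a prefix sum of the tree outputs. The plain C++ sketch below computes the outputs of all models built from the first :math:`m` trees in one pass. It uses hypothetical names and assumes that the outputs :math:`T_i(x)` of the individual trees for one sample are already available. It also assumes that :math:`f_0` and the shrinkage are applied as in the model formula; how ``predict`` treats them for a partial slice is not stated in this document.

.. code-block:: cpp

    #include <vector>

    // Element m of the result is f_0 + nu * sum_{i=1..m+1} T_i(x), i.e. the
    // output of the model that keeps only the first m+1 trees.
    std::vector<double> staged_outputs(double f0, double nu,
                                       const std::vector<double>& tree_outputs) {
        std::vector<double> staged;
        staged.reserve(tree_outputs.size());
        double sum = 0.0;
        for (double t : tree_outputs) {
            sum += t;
            staged.push_back(f0 + nu * sum);
        }
        return staged;
    }

With :math:`f_0 = 1`, :math:`\nu = 0.5`, and tree outputs 2, 2, and -4, the staged outputs are 2, 3, and 1.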

CvGBTrees::clear
----------------
Clears the model.

.. ocv:function:: void CvGBTrees::clear()

.. ocv:pyfunction:: cv2.GBTrees.clear() -> None

The function deletes the data set information and all the weak models, and resets all internal
variables to their initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
destructor.


CvGBTrees::calc_error
---------------------
Calculates a training or testing error.

.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )

    :param _data: Data set.

    :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test
        (``CV_TEST_ERROR``).

    :param resp: If non-null, it receives the vector of predictions on the corresponding data set.

If :ocv:class:`CvMLData` is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
used to get a training or testing error easily and, optionally, all predictions
on the training or testing set. If the Intel TBB library is used, the error is computed in
parallel; that is, predictions for different samples are computed at the same time.
For a regression problem, the mean squared error is returned. For
classification, the result is the misclassification error in percent.

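
The two error measures returned by :ocv:func:`CvGBTrees::calc_error` can be written down directly. The plain C++ sketch below uses hypothetical helper names. It works on vectors of true responses and predictions rather than on :ocv:class:`CvMLData`.

.. code-block:: cpp

    #include <cstddef>
    #include <vector>

    // Regression: mean squared error between true responses and predictions.
    double mean_squared_error(const std::vector<float>& y, const std::vector<float>& pred) {
        double s = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) {
            double d = static_cast<double>(y[i]) - pred[i];
            s += d * d;
        }
        return y.empty() ? 0.0 : s / y.size();
    }

    // Classification: percentage of samples whose predicted label differs
    // from the true label.
    double misclassification_percent(const std::vector<float>& y, const std::vector<float>& pred) {
        std::size_t wrong = 0;
        for (std::size_t i = 0; i < y.size(); ++i)
            if (y[i] != pred[i]) ++wrong;
        return y.empty() ? 0.0 : 100.0 * wrong / y.size();
    }

For example, one wrong label out of four samples gives a misclassification error of 25 percent.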
