| .. _Gradient Boosted Trees:
 | |
| 
 | |
| Gradient Boosted Trees
 | |
| ======================
 | |
| 
 | |
| Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
 | |
| Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
 | |
| In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
 | |
| classification and regression problems. Moreover, it can use any
 | |
| differentiable loss function; some popular ones are implemented.
 | |
| Using decision trees (:ref:`CvDTree`) as base learners makes it possible to
 | |
| process both ordered and categorical variables.
 | |
| 
 | |
| 
 | |
| .. _Training the GBT model:
 | |
| 
 | |
| Training the GBT model
 | |
| ----------------------
 | |
| 
 | |
| A Gradient Boosted Trees model is an ensemble of single regression trees
 | |
| that are built in a greedy fashion. The training procedure is an iterative process
 | |
| similar to numerical optimization via the gradient descent method. The total loss
 | |
| on the training set depends only on the current model predictions for the
 | |
| training samples, in other words
 | |
| :math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
 | |
| \equiv \mathcal{L}(F)`. The gradient of :math:`\mathcal{L}(F)`
 | |
| can be computed as follows:
 | |
| 
 | |
| .. math::
 | |
|     grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},
 | |
|     \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,
 | |
|     \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .
 | |
| On every training step, a single regression tree is built to predict the
 | |
| components of the antigradient vector. The step length is computed according to the
 | |
| loss function, separately for every region determined by a tree leaf, so an explicit
 | |
| step length can be eliminated by changing the leaf values directly.
 | |
| 
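As a minimal sketch of the gradient formula above (not OpenCV code), consider the squared loss :math:`L(y,F)=\dfrac{1}{2}(y-F)^2`. Its partial derivative with respect to :math:`F(x_i)` is :math:`-(y_i-F(x_i))`, so the antigradient is simply the vector of residuals. The helper name ``squared_loss_antigradient`` is hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of OpenCV: antigradient of the total squared
// loss sum_i 1/2 * (y_i - F(x_i))^2 with respect to the predictions F(x_i).
// Each gradient component is -(y_i - F(x_i)), so the antigradient component
// is the residual y_i - F(x_i).
std::vector<double> squared_loss_antigradient(const std::vector<double>& y,
                                              const std::vector<double>& f)
{
    assert(y.size() == f.size());
    std::vector<double> g(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        g[i] = y[i] - f[i];
    return g;
}
```

For other loss functions only the per-sample derivative changes; the vector still has one component per training sample.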
 | |
| The main scheme of the training process is shown below.
 | |
| 
 | |
| #.
 | |
|     Find the best constant model.
 | |
| #.
 | |
|     For :math:`i` in :math:`[1,M]`:
 | |
| 
 | |
|     #.
 | |
|         Compute the antigradient.
 | |
|     #.
 | |
|         Grow a regression tree to predict antigradient components.
 | |
|     #.
 | |
|         Change values in the tree leaves.
 | |
|     #.
 | |
|         Add the tree to the model.
 | |
| 
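The scheme above can be sketched in a few dozen lines for the simplest setting: one numeric feature, squared loss, and one-split trees ("stumps") as weak learners. This is an illustrative sketch under those assumptions, not the OpenCV implementation; the names ``Stump``, ``BoostedModel``, ``fit_stump`` and ``train_gbt`` are hypothetical. For the squared loss the leaf means of the residuals are already the optimal leaf values, so the "change values in the tree leaves" step needs no extra work here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical one-split regression tree ("stump") on a single feature.
struct Stump {
    double threshold;  // samples with x <= threshold fall into the left leaf
    double left;       // value stored in the left leaf
    double right;      // value stored in the right leaf
    double eval(double x) const { return x <= threshold ? left : right; }
};

// Model of the form f(x) = f0 + shrinkage * sum_i T_i(x).
struct BoostedModel {
    double f0;
    double shrinkage;
    std::vector<Stump> trees;
    double predict(double x) const {
        double sum = 0.0;
        for (const Stump& t : trees) sum += t.eval(x);
        return f0 + shrinkage * sum;
    }
};

// Least-squares stump: try every sample value as a threshold and keep the
// split with the smallest squared error; leaf values are the leaf means.
Stump fit_stump(const std::vector<double>& x, const std::vector<double>& r)
{
    Stump best{x[0], 0.0, 0.0};
    double best_err = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < x.size(); ++c) {
        double sum_l = 0.0, sum_r = 0.0;
        std::size_t n_l = 0, n_r = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (x[i] <= x[c]) { sum_l += r[i]; ++n_l; }
            else              { sum_r += r[i]; ++n_r; }
        }
        const double mean_l = sum_l / n_l;  // n_l >= 1: sample c is on the left
        const double mean_r = n_r ? sum_r / n_r : 0.0;
        double err = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double d = r[i] - (x[i] <= x[c] ? mean_l : mean_r);
            err += d * d;
        }
        if (err < best_err) { best_err = err; best = Stump{x[c], mean_l, mean_r}; }
    }
    return best;
}

BoostedModel train_gbt(const std::vector<double>& x, const std::vector<double>& y,
                       int weak_count, double shrinkage)
{
    assert(!x.empty() && x.size() == y.size());
    BoostedModel model{0.0, shrinkage, {}};
    // Step 1: the best constant model for the squared loss is the mean response.
    for (double v : y) model.f0 += v;
    model.f0 /= y.size();
    std::vector<double> f(y.size(), model.f0);  // current predictions F(x_i)
    for (int m = 0; m < weak_count; ++m) {
        // Step 2.1: antigradient of the squared loss = residuals.
        std::vector<double> r(y.size());
        for (std::size_t i = 0; i < y.size(); ++i) r[i] = y[i] - f[i];
        // Steps 2.2 and 2.3: grow a tree on the antigradient; its leaf means
        // are already the optimal leaf values for the squared loss.
        const Stump t = fit_stump(x, r);
        // Step 2.4: add the tree to the model and update the predictions.
        model.trees.push_back(t);
        for (std::size_t i = 0; i < y.size(); ++i) f[i] += shrinkage * t.eval(x[i]);
    }
    return model;
}
```

With a shrinkage below 1 each tree corrects only part of the remaining residual, which is why more iterations are needed to reach the same training error.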
 | |
| 
 | |
| The following loss functions are implemented:
 | |
| 
 | |
| *for regression problems:*
 | |
| 
 | |
| #.
 | |
|     Squared loss (``CvGBTrees::SQUARED_LOSS``):
 | |
|     :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
 | |
| #.
 | |
|     Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
 | |
|     :math:`L(y,f(x))=|y-f(x)|`
 | |
| #.
 | |
|     Huber loss (``CvGBTrees::HUBER_LOSS``):
 | |
|     :math:`L(y,f(x)) = \left\{ \begin{array}{lr}
 | |
|     \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\
 | |
|     \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,
 | |
|     where :math:`\delta` is the :math:`\alpha`-quantile estimate of
 | |
|     :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.
 | |
| 
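The three regression losses listed above translate directly into code. The following is a sketch with hypothetical function names, not the OpenCV implementation; in particular, ``delta`` is passed in as an argument here instead of being estimated as a quantile of the residuals.

```cpp
#include <cassert>
#include <cmath>

// Squared loss: 1/2 * (y - f)^2.
double squared_loss(double y, double f)
{
    const double r = y - f;
    return 0.5 * r * r;
}

// Absolute loss: |y - f|.
double absolute_loss(double y, double f)
{
    return std::fabs(y - f);
}

// Huber loss: quadratic for small residuals, linear for large ones.
// delta is assumed to be given; estimating it from the data is not shown.
double huber_loss(double y, double f, double delta)
{
    assert(delta > 0.0);
    const double a = std::fabs(y - f);
    return a > delta ? delta * (a - 0.5 * delta) : 0.5 * a * a;
}
```

The two branches of the Huber loss agree at :math:`|y-f(x)|=\delta`, where both equal :math:`\delta^2/2`, so the loss is continuous.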
 | |
| *for classification problems:*
 | |
| 
 | |
| 4.
 | |
|     Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
 | |
|     :math:`K` functions are built, one function for each output class, and
 | |
|     :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,
 | |
|     where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
 | |
|     is the estimate of the probability that :math:`y=k`.
 | |
| 
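The deviance loss can be sketched as follows. The function names are hypothetical, class labels are assumed to be zero-based indices ``0..K-1`` (the formulas above number them from 1), and subtracting the largest :math:`f_k(x)` before exponentiation is a common numerical safeguard added here, not something stated in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// p_k = exp(f_k) / sum_i exp(f_i), computed for all K classes at once.
// Subtracting max(f) leaves the result unchanged but avoids overflow.
std::vector<double> class_probabilities(const std::vector<double>& f)
{
    assert(!f.empty());
    const double f_max = *std::max_element(f.begin(), f.end());
    std::vector<double> p(f.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < f.size(); ++k) {
        p[k] = std::exp(f[k] - f_max);
        sum += p[k];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Deviance (cross-entropy) loss for a sample whose true class is y:
// only the term with k == y survives the indicator, giving -ln p_y.
double deviance_loss(std::size_t y, const std::vector<double>& f)
{
    assert(y < f.size());
    return -std::log(class_probabilities(f)[y]);
}
```

When all :math:`K` functions return the same value, every class gets probability :math:`1/K` and the loss equals :math:`\ln K`.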
 | |
| In the end we get the model in the following form:
 | |
| 
 | |
| .. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,
 | |
| where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
 | |
| is a regularization parameter from the interval :math:`(0,1]`, further called
 | |
| *shrinkage*.
 | |
| 
 | |
| 
 | |
| .. _Predicting with GBT model:
 | |
| 
 | |
| Predicting with GBT model
 | |
| -------------------------
 | |
| 
 | |
| To get the GBT model prediction, compute the sum of the responses of
 | |
| all the trees in the ensemble. For regression problems this sum is the answer, and
 | |
| for classification problems the result is :math:`\arg\max_{i=1..K}(f_i(x))`.
 | |
| 
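A minimal sketch of both prediction rules, assuming the outputs of the individual trees for one sample have already been computed. The function names are hypothetical, and the class index returned by ``classification_response`` is zero-based.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Regression: f(x) = f0 + shrinkage * sum_i T_i(x), where tree_outputs[i]
// holds T_i(x) for the sample being predicted.
double regression_response(double f0, double shrinkage,
                           const std::vector<double>& tree_outputs)
{
    double sum = 0.0;
    for (double t : tree_outputs) sum += t;
    return f0 + shrinkage * sum;
}

// Classification: f[k] is the value of the k-th function f_k(x);
// the predicted class is the index of the largest one.
std::size_t classification_response(const std::vector<double>& f)
{
    assert(!f.empty());
    return static_cast<std::size_t>(
        std::distance(f.begin(), std::max_element(f.begin(), f.end())));
}
```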
 | |
| 
 | |
| .. highlight:: cpp
 | |
| 
 | |
| 
 | |
| .. index:: CvGBTreesParams
 | |
| .. _CvGBTreesParams:
 | |
| 
 | |
| CvGBTreesParams
 | |
| ---------------
 | |
| .. c:type:: CvGBTreesParams
 | |
| 
 | |
| GBT training parameters ::
 | |
| 
 | |
|     struct CvGBTreesParams : public CvDTreeParams
 | |
|     {
 | |
|         int weak_count;
 | |
|         int loss_function_type;
 | |
|         float subsample_portion;
 | |
|         float shrinkage;
 | |
| 
 | |
|         CvGBTreesParams();
 | |
|         CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage,
 | |
|             float subsample_portion, int max_depth, bool use_surrogates );
 | |
|     };
 | |
| 
 | |
| The structure contains parameters for each single decision tree in the ensemble,
 | |
| as well as the characteristics of the whole model. The structure is derived from
 | |
| :ref:`CvDTreeParams`, but not all of the decision tree parameters are supported:
 | |
| cross-validation, pruning, and class priors are not used. The full
 | |
| list of parameters is shown below:
 | |
| 
 | |
| ``weak_count``
 | |
| 
 | |
|     The number of boosting algorithm iterations. ``weak_count*K`` is the total
 | |
|     number of trees in the GBT model, where ``K`` is the number of output classes
 | |
|     (equal to one in the case of regression).
 | |
|     
 | |
| ``loss_function_type``
 | |
| 
 | |
|     The type of the loss function used for training
 | |
|     (see :ref:`Training the GBT model`). It must be one of the
 | |
|     following: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,
 | |
|     ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three
 | |
|     are used for regression problems, and the last one is used for
 | |
|     classification.
 | |
|     
 | |
| ``shrinkage``
 | |
| 
 | |
|     Regularization parameter (see :ref:`Training the GBT model`).
 | |
|     
 | |
| ``subsample_portion``
 | |
| 
 | |
|     The portion of the whole training set used on each algorithm iteration.
 | |
|     The subset is generated randomly. For more information, see
 | |
|     http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
 | |
| 
 | |
| ``max_depth``
 | |
| 
 | |
|     The maximal depth of each decision tree in the ensemble (see :ref:`CvDTree`).
 | |
| 
 | |
| ``use_surrogates``
 | |
| 
 | |
|     If ``true``, surrogate splits are built (see :ref:`CvDTree`).
 | |
|     
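To illustrate what ``subsample_portion`` means, the sketch below draws a random subset of sample indices for one boosting iteration. It is an assumption-laden illustration, not the OpenCV code: the function name ``draw_subsample`` is hypothetical, sampling is done without replacement, and the subset size is rounded down but kept at one sample or more.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices of a random subset containing roughly
// portion * n of the n training samples (without replacement).
std::vector<std::size_t> draw_subsample(std::size_t n, double portion,
                                        std::mt19937& rng)
{
    assert(portion > 0.0 && portion <= 1.0);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::shuffle(idx.begin(), idx.end(), rng);          // random order
    const std::size_t count =
        std::max<std::size_t>(1, static_cast<std::size_t>(portion * n));
    idx.resize(std::min(count, n));                     // keep the first count
    return idx;
}
```

A fresh subset would be drawn on every iteration, so each tree sees a different part of the training set.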
 | |
| By default the following constructor is used:
 | |
| 
 | |
| .. code-block:: cpp
 | |
| 
 | |
|     CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
 | |
|         : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )
 | |
| 
 | |
| 
 | |
| 
 | |
| .. index:: CvGBTrees
 | |
| .. _CvGBTrees:
 | |
| 
 | |
| CvGBTrees
 | |
| ---------
 | |
| .. c:type:: CvGBTrees
 | |
| 
 | |
| GBT model ::
 | |
| 
 | |
| 	class CvGBTrees : public CvStatModel
 | |
| 	{
 | |
| 	public:
 | |
| 
 | |
| 		enum {SQUARED_LOSS=0, ABSOLUTE_LOSS, HUBER_LOSS=3, DEVIANCE_LOSS};
 | |
| 
 | |
| 		CvGBTrees();
 | |
| 		CvGBTrees( const cv::Mat& trainData, int tflag,
 | |
|                         const Mat& responses, const Mat& varIdx=Mat(),
 | |
|                         const Mat& sampleIdx=Mat(), const cv::Mat& varType=Mat(),
 | |
|                         const Mat& missingDataMask=Mat(),
 | |
|                         CvGBTreesParams params=CvGBTreesParams() );
 | |
| 
 | |
| 		virtual ~CvGBTrees();
 | |
| 		virtual bool train( const Mat& trainData, int tflag,
 | |
|                         const Mat& responses, const Mat& varIdx=Mat(),
 | |
|                         const Mat& sampleIdx=Mat(), const Mat& varType=Mat(),
 | |
|                         const Mat& missingDataMask=Mat(),
 | |
|                         CvGBTreesParams params=CvGBTreesParams(),
 | |
|                         bool update=false );
 | |
| 		
 | |
| 		virtual bool train( CvMLData* data,
 | |
|                         CvGBTreesParams params=CvGBTreesParams(),
 | |
|                         bool update=false );
 | |
| 
 | |
| 		virtual float predict( const Mat& sample, const Mat& missing=Mat(),
 | |
|                         const Range& slice = Range::all(),
 | |
|                         int k=-1 ) const;
 | |
| 
 | |
| 		virtual void clear();
 | |
| 
 | |
| 		virtual float calc_error( CvMLData* _data, int type,
 | |
|                         std::vector<float> *resp = 0 );
 | |
| 
 | |
| 		virtual void write( CvFileStorage* fs, const char* name ) const;
 | |
| 
 | |
| 		virtual void read( CvFileStorage* fs, CvFileNode* node );
 | |
| 
 | |
| 	protected:
 | |
| 		
 | |
| 		CvDTreeTrainData* data;
 | |
| 		CvGBTreesParams params;
 | |
| 		CvSeq** weak;
 | |
| 		Mat& orig_response;
 | |
| 		Mat& sum_response;
 | |
| 		Mat& sum_response_tmp;
 | |
| 		Mat& weak_eval;
 | |
| 		Mat& sample_idx;
 | |
| 		Mat& subsample_train;
 | |
| 		Mat& subsample_test;
 | |
| 		Mat& missing;
 | |
| 		Mat& class_labels;
 | |
| 		RNG* rng;
 | |
| 		int class_count;
 | |
| 		float delta;
 | |
| 		float base_value;
 | |
| 		
 | |
| 		...
 | |
| 
 | |
| 	};
 | |
| 
 | |
| 
 | |
| 	
 | |
| .. index:: CvGBTrees::train
 | |
| 
 | |
| .. _CvGBTrees::train:
 | |
| 
 | |
| CvGBTrees::train
 | |
| ----------------
 | |
| .. c:function:: bool train(const Mat & trainData, int tflag, const Mat & responses, const Mat & varIdx=Mat(), const Mat & sampleIdx=Mat(), const Mat & varType=Mat(), const Mat & missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)
 | |
| 
 | |
| .. c:function:: bool train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)
 | |
|     
 | |
| 	Trains a Gradient boosted tree model.
 | |
| 	
 | |
| The first train method follows the common template (see :ref:`CvStatModel::train`).
 | |
| Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
 | |
| ``trainData`` must be of ``CV_32F`` type. ``responses`` must be a matrix of type
 | |
| ``CV_32S`` or ``CV_32F``; in both cases it is converted into a ``CV_32F``
 | |
| matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must each be a
 | |
| list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
 | |
| a dummy parameter.
 | |
| 
 | |
| The second form of the :ref:`CvGBTrees::train` function uses :ref:`CvMLData` as a
 | |
| data set container. ``update`` is still a dummy parameter.
 | |
| 
 | |
| All parameters specific to the GBT model are passed into the training function
 | |
| as a :ref:`CvGBTreesParams` structure.
 | |
| 
 | |
| 
 | |
| .. index:: CvGBTrees::predict
 | |
| 
 | |
| .. _CvGBTrees::predict:
 | |
| 
 | |
| CvGBTrees::predict
 | |
| ------------------
 | |
| .. c:function:: float predict(const Mat & sample, const Mat & missing=Mat(), const Range & slice = Range::all(), int k=-1) const
 | |
| 
 | |
|     Predicts a response for an input sample.
 | |
|  
 | |
| The method predicts the response corresponding to the given sample
 | |
| (see :ref:`Predicting with GBT model`).
 | |
| The result is either the class label or the estimated function value.
 | |
| The :c:func:`predict` method can use the parallel version of the GBT model
 | |
| prediction if OpenCV is built with the TBB library. In this case, predictions
 | |
| of single trees are computed in parallel.
 | |
| 
 | |
| ``sample``
 | |
| 
 | |
|     An input feature vector that has the same format as every training set
 | |
|     element. Hence, if not all the variables were actually used during training,
 | |
|     ``sample`` has to contain dummy values in the corresponding places.
 | |
|     
 | |
| ``missing``
 | |
| 
 | |
|     The missing values mask: a one-dimensional matrix of the same size as
 | |
|     ``sample``, of ``CV_8U`` type. ``1`` corresponds to a missing value
 | |
|     in the same position in the ``sample`` vector. If there are no missing values
 | |
|     in the feature vector, an empty matrix can be passed instead of the missing mask.
 | |
|     
 | |
| ``weak_responses``
 | |
| 
 | |
|     In addition to the prediction of the whole model, the predictions of all the trees
 | |
|     can be obtained by passing a ``weak_responses`` matrix with :math:`K` rows,
 | |
|     where :math:`K` is the number of output classes (1 in the case of regression),
 | |
|     and with as many columns as the ``slice`` length.
 | |
|     
 | |
| ``slice``
 | |
|     
 | |
|     Defines the part of the ensemble used for prediction.
 | |
|     All trees are used when ``slice = Range::all()``. This parameter is useful for
 | |
|     getting predictions of GBT models with different ensemble sizes while training
 | |
|     only a single model.
 | |
|     
 | |
| ``k``
 | |
|     
 | |
|     In the case of a classification problem, not one but :math:`K` tree
 | |
|     ensembles are built (see :ref:`Training the GBT model`). By passing this
 | |
|     parameter, the output can be changed to the sum of the trees' predictions in the
 | |
|     ``k``'th ensemble only. To get the total GBT model prediction, the ``k`` value
 | |
|     must be -1. For regression problems, ``k`` has to be equal to -1 as well.
 | |
|     
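The effect of ``slice`` can be sketched for the regression case as follows. This is a hedged illustration, not the OpenCV code: ``slice_response`` is a hypothetical helper, the range is taken as half-open ``[begin, end)``, and the sketch assumes that the constant term and the shrinkage are applied to a partial ensemble in the same way as to the full one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Response of the sub-ensemble made of trees begin .. end-1, where
// tree_outputs[i] holds T_i(x) for the sample being predicted.
double slice_response(double f0, double shrinkage,
                      const std::vector<double>& tree_outputs,
                      std::size_t begin, std::size_t end)
{
    assert(begin <= end && end <= tree_outputs.size());
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += tree_outputs[i];
    return f0 + shrinkage * sum;
}
```

Calling the helper with ``end`` equal to 10, 20, 30 and so on gives the responses of models with 10, 20, 30 trees from one trained ensemble, which is the use case described for ``slice`` above.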
 | |
| 
 | |
|     
 | |
| .. index:: CvGBTrees::clear
 | |
| 
 | |
| .. _CvGBTrees::clear:
 | |
| 
 | |
| CvGBTrees::clear
 | |
| ----------------
 | |
| .. c:function:: void clear()
 | |
| 
 | |
|     Clears the model.
 | |
|     
 | |
| Deletes the data set information and all the weak models, and sets all internal
 | |
| variables to the initial state. It is called in :ref:`CvGBTrees::train` and in the
 | |
| destructor.
 | |
| 
 | |
| 
 | |
| .. index:: CvGBTrees::calc_error
 | |
| 
 | |
| .. _CvGBTrees::calc_error:
 | |
| 
 | |
| CvGBTrees::calc_error
 | |
| ---------------------
 | |
| .. c:function:: float calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )
 | |
| 
 | |
|     Calculates training or testing error.
 | |
|     
 | |
| If :ref:`CvMLData` is used to store the data set, :c:func:`calc_error` can be
 | |
| used to get the training or testing error easily and (optionally) all predictions
 | |
| on the training/testing set. If the TBB library is used, the error is computed in
 | |
| parallel: predictions for different samples are computed at the same time.
 | |
| In the case of a regression problem, the mean squared error is returned. For
 | |
| classification, the result is the misclassification error in percent.
 | |
| 
 | |
| ``_data``
 | |
| 
 | |
|     Data set.
 | |
|     
 | |
| ``type``
 | |
|     
 | |
|     Defines what error should be computed: train (``CV_TRAIN_ERROR``) or test
 | |
|     (``CV_TEST_ERROR``).
 | |
| 
 | |
| ``resp``
 | |
|     
 | |
|     If not ``0``, a vector of predictions on the corresponding data set is
 | |
|     returned.
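
The two error measures mentioned for :c:func:`calc_error` can be sketched as plain functions over already computed predictions. The names ``mean_squared_error`` and ``misclassification_percent`` are hypothetical; the sketch only mirrors the definitions given above and does not reproduce the OpenCV implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Regression: mean of the squared differences between responses and predictions.
double mean_squared_error(const std::vector<double>& responses,
                          const std::vector<double>& predictions)
{
    assert(!responses.empty() && responses.size() == predictions.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < responses.size(); ++i) {
        const double d = responses[i] - predictions[i];
        sum += d * d;
    }
    return sum / responses.size();
}

// Classification: share of wrongly predicted labels, in percent.
double misclassification_percent(const std::vector<int>& labels,
                                 const std::vector<int>& predictions)
{
    assert(!labels.empty() && labels.size() == predictions.size());
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < labels.size(); ++i)
        if (labels[i] != predictions[i]) ++wrong;
    return 100.0 * wrong / labels.size();
}
```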
 | |
| 
 | 
