merged all the latest changes from 2.4 to trunk
@@ -83,7 +83,7 @@ The constructors.

:param boost_type: Type of the boosting algorithm. Possible values are:

- * **CvBoost::DISCRETE** Discrete AbaBoost.
+ * **CvBoost::DISCRETE** Discrete AdaBoost.

* **CvBoost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.

* **CvBoost::LOGIT** LogitBoost. It can produce good regression fits.

* **CvBoost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.

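For orientation, here is a minimal sketch of how these flags are typically used with the OpenCV 2.4 C++ API; the ``trainData`` and ``responses`` matrices are hypothetical placeholders::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    cv::Mat trainData, responses;          // hypothetical, filled elsewhere
    CvBoostParams params(CvBoost::GENTLE,  // boost_type
                         100,              // weak_count
                         0.95,             // weight_trim_rate
                         2,                // max_depth of each weak tree
                         false,            // use_surrogates
                         0);               // priors
    CvBoost boost;
    boost.train(trainData, CV_ROW_SAMPLE, responses,
                cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
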
@@ -159,9 +159,9 @@ The constructors.

:param max_depth: The maximum possible depth of the tree. That is the training algorithms attempts to split a node while its depth is less than ``max_depth``. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.

- :param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be splitted.
+ :param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be split.

- :param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be splitted.
+ :param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split.

:param use_surrogates: If true then surrogate splits will be built. These splits allow to work with missing data and compute variable importance correctly.

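These parameters map directly onto the full ``CvDTreeParams`` constructor; a hedged illustration (the values are arbitrary, not recommendations)::

    CvDTreeParams params(8,      // max_depth
                         10,     // min_sample_count: smaller nodes are not split
                         0.01f,  // regression_accuracy
                         true,   // use_surrogates: enables missing-data handling
                         10,     // max_categories
                         10,     // cv_folds
                         true,   // use_1se_rule
                         true,   // truncate_pruned_tree
                         0);     // priors
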
@@ -239,6 +239,8 @@ There are four ``train`` methods in :ocv:class:`CvDTree`:

* The **last** method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed :ocv:class:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``subsampleIdx`` are counted relatively to the ``_sample_idx`` , passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]`` , then ``subsampleIdx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.

+ The function is parallelized with the TBB library.

CvDTree::predict

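For the common case (the first ``train`` overload), a minimal round trip looks roughly like this; ``trainData``, ``responses``, and ``sample`` are hypothetical placeholders::

    CvDTree tree;
    tree.train(trainData, CV_ROW_SAMPLE, responses,
               cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), CvDTreeParams());
    CvDTreeNode* node = tree.predict(sample);  // sample: one feature row
    double response = node->value;
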
@@ -273,7 +275,7 @@ Returns error of the decision tree.

* **CV_TRAIN_ERROR** Error on train samples.

- * **CV_TEST_ERROR** Erron on test samples.
+ * **CV_TEST_ERROR** Error on test samples.

:param resp: If it is not null then size of this vector will be set to the number of samples and each element will be set to result of prediction on the corresponding sample.

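A sketch of computing both error types, assuming a hypothetical ``dataset.csv`` and a 70/30 split set up through ``CvTrainTestSplit``::

    CvMLData data;
    data.read_csv("dataset.csv");      // hypothetical file name
    data.set_response_idx(0);
    CvTrainTestSplit split(0.7f);      // 70% train, 30% test
    data.set_train_test_split(&split);

    CvDTree tree;
    tree.train(&data, CvDTreeParams());
    float trainError = tree.calc_error(&data, CV_TRAIN_ERROR);
    float testError  = tree.calc_error(&data, CV_TEST_ERROR);
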
@@ -5,11 +5,11 @@ Extremely randomized trees have been introduced by Pierre Geurts, Damien Ernst a

#. Extremely randomized trees don't apply the bagging procedure to constract the training samples for each tree. The same input training set is used to train all trees.

- #. Extremely randomized trees pick a node split very extremely (both a variable index and variable spliting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable spliting value) among random subset of variables.
+ #. Extremely randomized trees pick a node split very extremely (both a variable index and variable splitting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable splitting value) among random subset of variables.

CvERTrees
----------
.. ocv:class:: CvERTrees

- The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get detailes. To set the training parameters of Extremely randomized trees the same class :ocv:class:`CvRTParams` is used.
+ The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get details. To set the training parameters of Extremely randomized trees the same class :ocv:class:`CvRTParams` is used.

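Because the interface is inherited unchanged, a usage sketch differs from :ocv:class:`CvRTrees` only in the class name (placeholder ``trainData``/``responses``/``sample`` again)::

    CvRTParams params(10, 2, 0, false, 16, 0,
                      false,  // calc_var_importance
                      0,      // nactive_vars: 0 -> sqrt(#features)
                      100,    // max number of trees
                      0.01f,  // forest_accuracy
                      CV_TERMCRIT_ITER | CV_TERMCRIT_EPS);
    CvERTrees ertrees;
    ertrees.train(trainData, CV_ROW_SAMPLE, responses,
                  cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
    float prediction = ertrees.predict(sample);
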
@@ -61,7 +61,7 @@ At the second step (Maximization step or M-step), the mixture parameter estimate

Alternatively, the algorithm may start with the M-step when the initial values for
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
- :math:`p_{i,k}` . Often (including macnine learning) the
+ :math:`p_{i,k}` . Often (including machine learning) the
:ocv:func:`kmeans` algorithm is used for that purpose.

One of the main problems of the EM algorithm is a large number

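For example, a rough pre-clustering sketch that turns hard :ocv:func:`kmeans` labels into initial :math:`p_{i,k}` (``samples`` and ``nclusters`` are hypothetical)::

    cv::Mat labels, centers;
    cv::kmeans(samples, nclusters, labels,
               cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 10, 1.0),
               5, cv::KMEANS_PP_CENTERS, centers);
    // Turn hard assignments into initial probabilities:
    // p_{i,k} = 1 for the assigned cluster, 0 otherwise.
    cv::Mat probs = cv::Mat::zeros(samples.rows, nclusters, CV_32F);
    for (int i = 0; i < samples.rows; ++i)
        probs.at<float>(i, labels.at<int>(i)) = 1.f;
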
@@ -115,7 +115,7 @@ The constructors

:param start_step: The start step of the EM algorithm:

* **CvEM::START_E_STEP** Start with Expectation step. You need to provide means :math:`a_k` of mixture components to use this option. Optionally you can pass weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
- * **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilites :math:`p_{i,k}` to use this option.
+ * **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilities :math:`p_{i,k}` to use this option.
* **CvEM::START_AUTO_STEP** Start with Expectation step. You need not provide any parameters because they will be estimated by the k-means algorithm.

:param term_crit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``term_crit.max_iter`` (number of M-steps) or when relative change of likelihood logarithm is less than ``term_crit.epsilon``.

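A hedged sketch of the ``START_AUTO_STEP`` path (note: depending on the 2.4 minor version, ``CvEM`` is declared in the ml or the legacy module; ``samples`` is a placeholder)::

    CvEMParams params;
    params.nclusters    = 3;
    params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
    params.start_step   = CvEM::START_AUTO_STEP;  // k-means initialization
    params.term_crit    = cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                         100, 1e-6);
    CvEM em;
    cv::Mat labels;
    em.train(samples, cv::Mat(), params, &labels);
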
@@ -139,7 +139,7 @@ The default constructor represents a rough rule-of-the-thumb:

}

- With another contstructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
+ With another constructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.

CvEM

@@ -250,7 +250,7 @@ Returns vectors of probabilities for each training sample.

.. ocv:pyfunction:: cv2.EM.getProbs() -> probs

- For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilites :math:`p_{i,k}` to belong to a mixture component :math:`k`.
+ For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilities :math:`p_{i,k}` to belong to a mixture component :math:`k`.

CvEM::getLikelihood

@@ -19,10 +19,10 @@ Training the GBT model

----------------------

Gradient Boosted Trees model represents an ensemble of single regression trees
- built in a greedy fashion. Training procedure is an iterative proccess
+ built in a greedy fashion. Training procedure is an iterative process
similar to the numerical optimization via the gradient descent method. Summary loss
on the training set depends only on the current model predictions for the
- thaining samples, in other words
+ training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. And the :math:`\mathcal{L}(F)`
gradient can be computed as follows:

@@ -37,7 +37,7 @@ antigradient vector components. Step length is computed corresponding to the

loss function and separately for every region determined by the tree leaf. It
can be eliminated by changing values of the leaves directly.

- See below the main scheme of the training proccess:
+ See below the main scheme of the training process:

#.
Find the best constant model.

@@ -86,7 +86,7 @@ As a result, you get the following model:

.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,

where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
- is a regularization parameter from the interval :math:`(0,1]`, futher called
+ is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.

.. _Predicting with GBT:

@@ -94,7 +94,7 @@ is a regularization parameter from the interval :math:`(0,1]`, futher called

Predicting with the GBT Model
-----------------------------

- To get the GBT model prediciton, you need to compute the sum of responses of
+ To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, it is the answer.
For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.

@@ -108,7 +108,7 @@ CvGBTreesParams

GBT training parameters.

- The structure contains parameters for each sigle decision tree in the ensemble,
+ The structure contains parameters for each single decision tree in the ensemble,
as well as the whole model characteristics. The structure is derived from
:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priorities are not used.

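An illustrative parameter set via the ``CvGBTreesParams`` constructor (the values are arbitrary; ``trainData`` and ``responses`` are placeholders)::

    CvGBTreesParams params(CvGBTrees::SQUARED_LOSS,  // loss_function_type
                           200,     // weak_count: boosting iterations
                           0.1f,    // shrinkage: the nu parameter above
                           0.8f,    // subsample_portion
                           3,       // max_depth of each tree
                           false);  // use_surrogates
    CvGBTrees gbt;
    gbt.train(trainData, CV_ROW_SAMPLE, responses,
              cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
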
@@ -205,10 +205,10 @@ Predicts a response for an input sample.

.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval

:param sample: Input feature vector that has the same format as every training set
- element. If not all the variables were actualy used during training,
+ element. If not all the variables were actually used during training,
``sample`` contains forged values at the appropriate places.

- :param missing: Missing values mask, which is a dimentional matrix of the same size as
+ :param missing: Missing values mask, which is a dimensional matrix of the same size as
``sample`` having the ``CV_8U`` type. ``1`` corresponds to the missing value
in the same position in the ``sample`` vector. If there are no missing values
in the feature vector, an empty matrix can be passed instead of the missing mask.

@@ -225,7 +225,7 @@ Predicts a response for an input sample.

:param k: Number of tree ensembles built in case of the classification problem
(see :ref:`Training GBT`). Use this
- parameter to change the ouput to sum of the trees' predictions in the
+ parameter to change the output to sum of the trees' predictions in the
``k``-th ensemble only. To get the total GBT model prediction, ``k`` value
must be -1. For regression problems, ``k`` is also equal to -1.

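For instance, with a trained ``gbt`` model (see the training sketch earlier) and a hypothetical ``sample`` row::

    // Total model prediction over all ensembles: k = -1
    float y = gbt.predict(sample, cv::Mat(), cv::Range::all(), -1);
    // Sum of the trees in ensemble 0 only (classification models):
    float y0 = gbt.predict(sample, cv::Mat(), cv::Range::all(), 0);
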
@@ -79,6 +79,8 @@ In case of C++ interface you can use output pointers to empty matrices and the f

If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method.

+ The function is parallelized with the TBB library.

CvKNearest::get_max_k
---------------------
Returns the number of maximum neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.

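A short usage sketch with hypothetical matrices; the requested ``k`` must not exceed the value reported by ``get_max_k``::

    CvKNearest knn(trainData, responses);  // default max_k = 32
    cv::Mat results, neighborResponses, dists;
    knn.find_nearest(samples, 5, results, neighborResponses, dists);  // k = 5
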
@@ -62,9 +62,9 @@ Reads the data set from a ``.csv``-like ``filename`` file and stores all read va

:param filename: The input file name

- While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labeles. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.
+ While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labels. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.

- Also, when reading the data, the method constructs the mask of missing values. For example, values are egual to `'?'`.
+ Also, when reading the data, the method constructs the mask of missing values. For example, values are equal to `'?'`.

CvMLData::get_values
--------------------

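A minimal sketch of the workflow described above, with a hypothetical ``data.csv``::

    CvMLData data;
    if (data.read_csv("data.csv") == 0)  // 0 means success
    {
        data.set_response_idx(0);        // column 0 holds the responses
        // A categorical variable with numerical class labels is read as
        // ordered; override it explicitly:
        data.change_var_type(0, CV_VAR_CATEGORICAL);
        const CvMat* values = data.get_values();
    }
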
@@ -74,7 +74,7 @@ using

:ocv:funcx:`PCA::operator()` or similar technique, and train a smaller network
on only essential features.

- Another MPL feature is an inability to handle categorical
+ Another MLP feature is an inability to handle categorical
data as is. However, there is a workaround. If a certain feature in the
input or output (in case of ``n`` -class classifier for
:math:`n>2` ) layer is categorical and can take

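The recoding workaround amounts to one-of-n encoding; a small helper sketch (the function name is made up for illustration)::

    // Encode a categorical value from {0, ..., n-1} as a binary row:
    // 1 at the column of the value, 0 elsewhere.
    cv::Mat encodeOneOfN(int value, int n)
    {
        cv::Mat row = cv::Mat::zeros(1, n, CV_32F);
        row.at<float>(0, value) = 1.f;
        return row;
    }
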
@@ -238,6 +238,9 @@ Trains/updates MLP.

This method applies the specified training algorithm to computing/adjusting the network weights. It returns the number of done iterations.

+ The RPROP training algorithm is parallelized with the TBB library.

CvANN_MLP::predict
------------------
Predicts responses for input samples.

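A hedged sketch of creating and training a small network with RPROP; the layer sizes and the ``inputs``/``outputs`` matrices are hypothetical::

    int sizes[] = { 10, 16, 3 };  // input, hidden, output layer sizes
    cv::Mat layerSizes(1, 3, CV_32S, sizes);
    CvANN_MLP mlp(layerSizes, CvANN_MLP::SIGMOID_SYM, 1, 1);

    CvANN_MLP_TrainParams params(
        cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 1000, 0.01),
        CvANN_MLP_TrainParams::RPROP,  // the parallelized algorithm
        0.1);                          // initial RPROP update value
    int iterations = mlp.train(inputs, outputs, cv::Mat(), cv::Mat(), params);
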
@@ -275,4 +278,4 @@ Returns neurons weights of the particular layer.

.. ocv:function:: double* CvANN_MLP::get_weights(int layer)

:param layer: Index of the particular layer.

@@ -60,3 +60,4 @@ Predicts the response for sample(s).

The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples``. In case of multiple input vectors, there should be one output vector ``results``. The predicted class for a single input vector is returned by the method.

+ The function is parallelized with the TBB library.

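A short sketch of both calling conventions, with placeholder matrices::

    CvNormalBayesClassifier bayes;
    bayes.train(trainData, responses);
    cv::Mat results;
    bayes.predict(samples, &results);      // several rows: one result per row
    float cls = bayes.predict(sampleRow);  // single row: class returned directly
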
@@ -9,7 +9,7 @@ Random trees have been introduced by Leo Breiman and Adele Cutler:

http://www.stat.berkeley.edu/users/breiman/RandomForests/
. The algorithm can deal with both classification and regression problems. Random trees is a collection (ensemble) of tree predictors that is called
*forest*
- further in this section (the term has been also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that recieved the majority of "votes". In case of a regression, the classifier response is the average of the responses over all the trees in the forest.
+ further in this section (the term has been also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of "votes". In case of a regression, the classifier response is the average of the responses over all the trees in the forest.

All the trees are trained with the same parameters but on different training sets. These sets are generated from the original training set using the bootstrap procedure: for each training set, you randomly select the same number of vectors as in the original set ( ``=N`` ). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each trained tree, not all the variables are used to find the best split, but a random subset of them. With each node a new subset is generated. However, its size is fixed for all the nodes and all the trees. It is a training parameter set to
:math:`\sqrt{number\_of\_variables}` by default. None of the built trees are pruned.

@@ -67,7 +67,7 @@ The constructors.

:param nactive_vars: The size of the randomly selected subset of features at each tree node and that are used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.

- :param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (suprise, suprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.
+ :param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (surprise, surprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.

:param forest_accuracy: Sufficient accuracy (OOB error).

@@ -77,7 +77,7 @@ The constructors.

* **CV_TERMCRIT_EPS** Terminate learning by the ``forest_accuracy``;

- * **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criterias.
+ * **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criteria.

For meaning of other parameters see :ocv:func:`CvDTreeParams::CvDTreeParams`.

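Putting the parameters above together, an illustrative constructor call (the values are arbitrary)::

    CvRTParams params(10, 10, 0, false, 15, 0,
                      true,   // calc_var_importance
                      0,      // nactive_vars: 0 -> sqrt(total features)
                      100,    // max_num_of_trees_in_the_forest
                      0.01f,  // forest_accuracy (OOB error)
                      CV_TERMCRIT_ITER | CV_TERMCRIT_EPS);  // both criteria
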
@@ -112,6 +112,8 @@ Trains the Random Trees model.

The method :ocv:func:`CvRTrees::train` is very similar to the method :ocv:func:`CvDTree::train` and follows the generic method :ocv:func:`CvStatModel::train` conventions. All the parameters specific to the algorithm training are passed as a :ocv:class:`CvRTParams` instance. The estimate of the training error (``oob-error``) is stored in the protected class member ``oob_error``.

+ The function is parallelized with the TBB library.

CvRTrees::predict
-----------------
Predicts the output for an input sample.

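And a matching train/predict sketch under the same placeholder-data assumption; ``getVarImportance`` is meaningful only if ``calc_var_importance`` was set::

    CvRTrees forest;
    forest.train(trainData, CV_ROW_SAMPLE, responses,
                 cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params);
    float vote = forest.predict(sample);  // majority vote / average
    cv::Mat importance = forest.getVarImportance();
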
@@ -42,7 +42,7 @@ In this declaration, some methods are commented off. These are methods for which

CvStatModel::CvStatModel
------------------------

- The default constuctor.
+ The default constructor.

.. ocv:function:: CvStatModel::CvStatModel()

@@ -121,7 +121,7 @@ The constructors.

:param coef0: Parameter ``coef0`` of a kernel function (POLY / SIGMOID).

- :param Cvalue: Parameter ``C`` of a SVM optimiazation problem (C_SVC / EPS_SVR / NU_SVR).
+ :param Cvalue: Parameter ``C`` of a SVM optimization problem (C_SVC / EPS_SVR / NU_SVR).

:param nu: Parameter :math:`\nu` of a SVM optimization problem (NU_SVC / ONE_CLASS / NU_SVR).

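An illustrative setup of the fields described above (the values are arbitrary)::

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.gamma       = 0.5;  // kernel parameter
    params.C           = 10;   // the Cvalue parameter above
    params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);
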
@@ -242,6 +242,9 @@ Predicts the response for input sample(s).

If you pass one sample then prediction result is returned. If you want to get responses for several samples then you should pass the ``results`` matrix where prediction results will be stored.

+ The function is parallelized with the TBB library.

CvSVM::get_default_grid
-----------------------
Generates a grid for SVM parameters.

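A sketch of both prediction modes described above, with placeholder matrices and the ``params`` from the earlier setup::

    CvSVM svm;
    svm.train(trainData, responses, cv::Mat(), cv::Mat(), params);
    float label = svm.predict(sampleRow);  // one sample: result is returned
    cv::Mat results;
    svm.predict(samples, results);         // several samples: results filled in
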