merged all the latest changes from 2.4 to trunk

Vadim Pisarevsky
2012-04-13 21:50:59 +00:00
parent 020f9a6047
commit 2fd1e2ea57
416 changed files with 12852 additions and 6070 deletions

View File

@@ -83,7 +83,7 @@ The constructors.
:param boost_type: Type of the boosting algorithm. Possible values are:
* **CvBoost::DISCRETE** Discrete AbaBoost.
* **CvBoost::DISCRETE** Discrete AdaBoost.
* **CvBoost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
* **CvBoost::LOGIT** LogitBoost. It can produce good regression fits.
* **CvBoost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
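
The chosen type is just the first field of ``CvBoostParams``. A minimal training sketch with the C++ interface, where ``trainData`` and ``responses`` are placeholder matrices with one sample per row::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    void trainAdaBoost( const cv::Mat& trainData, const cv::Mat& responses )
    {
        // boost_type, weak_count, weight_trim_rate, max_depth, use_surrogates, priors
        CvBoostParams params( CvBoost::REAL, 100, 0.95, 2, false, 0 );
        CvBoost boost;
        boost.train( trainData, CV_ROW_SAMPLE, responses,
                     cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params );
    }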

View File

@@ -159,9 +159,9 @@ The constructors.
:param max_depth: The maximum possible depth of the tree. That is, the training algorithm attempts to split a node while its depth is less than ``max_depth``. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.
:param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be splitted.
:param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be split.
:param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be splitted.
:param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split.
:param use_surrogates: If true then surrogate splits will be built. These splits allow the algorithm to work with missing data and to compute variable importance correctly.
@@ -239,6 +239,8 @@ There are four ``train`` methods in :ocv:class:`CvDTree`:
* The **last** method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed :ocv:class:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``subsampleIdx`` are counted relative to the ``_sample_idx``, passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]``, then ``subsampleIdx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
The function is parallelized with the TBB library.
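
A minimal sketch of the first ``train`` variant followed by a prediction; the parameter values are illustrative only::

    // max_depth, min_sample_count, regression_accuracy, use_surrogates,
    // max_categories, cv_folds, use_1se_rule, truncate_pruned_tree, priors
    CvDTreeParams params( 8, 10, 0.01f, true, 10, 10, true, true, 0 );
    CvDTree dtree;
    dtree.train( trainData, CV_ROW_SAMPLE, responses,
                 cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params );
    CvDTreeNode* node = dtree.predict( sample );   // returns the reached leaf
    double response = node->value;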
CvDTree::predict
@@ -273,7 +275,7 @@ Returns error of the decision tree.
* **CV_TRAIN_ERROR** Error on train samples.
* **CV_TEST_ERROR** Erron on test samples.
* **CV_TEST_ERROR** Error on test samples.
:param resp: If it is not null then the size of this vector will be set to the number of samples and each element will be set to the result of prediction on the corresponding sample.
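
Both error codes plug into :ocv:func:`CvDTree::calc_error`. A short sketch, assuming ``dtree`` is a trained tree and ``data`` is a loaded ``CvMLData`` with a train/test split set::

    std::vector<float> resp;                          // optional per-sample predictions
    float trainErr = dtree.calc_error( &data, CV_TRAIN_ERROR, &resp );
    float testErr  = dtree.calc_error( &data, CV_TEST_ERROR, 0 );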

View File

@@ -5,11 +5,11 @@ Extremely randomized trees have been introduced by Pierre Geurts, Damien Ernst a
#. Extremely randomized trees don't apply the bagging procedure to construct the training samples for each tree. The same input training set is used to train all trees.
#. Extremely randomized trees pick a node split very extremely (both a variable index and variable spliting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable spliting value) among random subset of variables.
#. Extremely randomized trees pick a node split very extremely (both a variable index and variable splitting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable splitting value) among random subset of variables.
CvERTrees
----------
.. ocv:class:: CvERTrees
The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get detailes. To set the training parameters of Extremely randomized trees the same class :ocv:class:`CvRTParams` is used.
The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get details. To set the training parameters of Extremely randomized trees the same class :ocv:class:`CvRTParams` is used.
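
Because of the shared interface, switching from Random Forest to Extremely randomized trees is a one-line change. A sketch with illustrative parameter values::

    // max_depth, min_sample_count, regression_accuracy, use_surrogates, max_categories,
    // priors, calc_var_importance, nactive_vars, max_trees, forest_accuracy, termcrit
    CvRTParams params( 10, 2, 0, false, 16, 0, false, 0, 100, 0.01f,
                       CV_TERMCRIT_ITER | CV_TERMCRIT_EPS );
    CvERTrees ertrees;                                // drop-in replacement for CvRTrees
    ertrees.train( trainData, CV_ROW_SAMPLE, responses,
                   cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params );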

View File

@@ -61,7 +61,7 @@ At the second step (Maximization step or M-step), the mixture parameter estimate
Alternatively, the algorithm may start with the M-step when the initial values for
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math:`p_{i,k}` . Often (including macnine learning) the
:math:`p_{i,k}` . Often (including machine learning) the
:ocv:func:`kmeans` algorithm is used for that purpose.
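
Such a pre-clustering pass is a single call. A sketch, where ``samples`` is a ``CV_32F`` matrix with one point per row and ``nclusters`` matches the number of mixture components::

    cv::Mat labels, centers;
    cv::kmeans( samples, nclusters, labels,
                cv::TermCriteria( cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 10, 1.0 ),
                3, cv::KMEANS_PP_CENTERS, centers );
    // labels can then seed the initial p_{i,k}: 1 for the assigned cluster, 0 otherwise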
One of the main problems of the EM algorithm is a large number
@@ -115,7 +115,7 @@ The constructors
:param start_step: The start step of the EM algorithm:
* **CvEM::START_E_STEP** Start with Expectation step. You need to provide means :math:`a_k` of mixture components to use this option. Optionally you can pass weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
* **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilites :math:`p_{i,k}` to use this option.
* **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilities :math:`p_{i,k}` to use this option.
* **CvEM::START_AUTO_STEP** Start with Expectation step. You need not provide any parameters because they will be estimated by the k-means algorithm.
:param term_crit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``term_crit.max_iter`` (number of M-steps) or when relative change of likelihood logarithm is less than ``term_crit.epsilon``.
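
Put together, a typical setup of the C++ wrapper class looks as follows (grounded in the constructor declared in this commit; ``samples`` and ``labels`` are placeholders)::

    CvEMParams params;
    params.nclusters    = 3;
    params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
    params.start_step   = CvEM::START_AUTO_STEP;      // no initial means/probs needed
    params.term_crit    = cvTermCriteria( CV_TERMCRIT_ITER + CV_TERMCRIT_EPS,
                                          100, FLT_EPSILON );
    CvEM em;
    cv::Mat labels;
    em.train( samples, cv::Mat(), params, &labels );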
@@ -139,7 +139,7 @@ The default constructor represents a rough rule-of-the-thumb:
}
With another contstructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
With another constructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
CvEM
@@ -250,7 +250,7 @@ Returns vectors of probabilities for each training sample.
.. ocv:pyfunction:: cv2.EM.getProbs() -> probs
For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilites :math:`p_{i,k}` to belong to a mixture component :math:`k`.
For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilities :math:`p_{i,k}` to belong to a mixture component :math:`k`.
CvEM::getLikelihood

View File

@@ -19,10 +19,10 @@ Training the GBT model
----------------------
Gradient Boosted Trees model represents an ensemble of single regression trees
built in a greedy fashion. Training procedure is an iterative proccess
built in a greedy fashion. Training procedure is an iterative process
similar to the numerical optimization via the gradient descent method. Summary loss
on the training set depends only on the current model predictions for the
thaining samples, in other words
training samples, in other words
:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))
\equiv \mathcal{L}(F)`. And the :math:`\mathcal{L}(F)`
gradient can be computed as follows:
@@ -37,7 +37,7 @@ antigradient vector components. Step length is computed corresponding to the
loss function and separately for every region determined by the tree leaf. It
can be eliminated by changing values of the leaves directly.
See below the main scheme of the training proccess:
See below the main scheme of the training process:
#.
Find the best constant model.
@@ -86,7 +86,7 @@ As a result, you get the following model:
.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,
where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
is a regularization parameter from the interval :math:`(0,1]`, futher called
is a regularization parameter from the interval :math:`(0,1]`, further called
*shrinkage*.
.. _Predicting with GBT:
@@ -94,7 +94,7 @@ is a regularization parameter from the interval :math:`(0,1]`, futher called
Predicting with the GBT Model
-----------------------------
To get the GBT model prediciton, you need to compute the sum of responses of
To get the GBT model prediction, you need to compute the sum of responses of
all the trees in the ensemble. For regression problems, it is the answer.
For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.
@@ -108,7 +108,7 @@ CvGBTreesParams
GBT training parameters.
The structure contains parameters for each sigle decision tree in the ensemble,
The structure contains parameters for each single decision tree in the ensemble,
as well as the whole model characteristics. The structure is derived from
:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
cross-validation, pruning, and class priorities are not used.
@@ -205,10 +205,10 @@ Predicts a response for an input sample.
.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval
:param sample: Input feature vector that has the same format as every training set
element. If not all the variables were actualy used during training,
element. If not all the variables were actually used during training,
``sample`` contains forged values at the appropriate places.
:param missing: Missing values mask, which is a dimentional matrix of the same size as
:param missing: Missing values mask, which is a dimensional matrix of the same size as
``sample`` having the ``CV_8U`` type. ``1`` corresponds to the missing value
in the same position in the ``sample`` vector. If there are no missing values
in the feature vector, an empty matrix can be passed instead of the missing mask.
@@ -225,7 +225,7 @@ Predicts a response for an input sample.
:param k: Number of tree ensembles built in case of the classification problem
(see :ref:`Training GBT`). Use this
parameter to change the ouput to sum of the trees' predictions in the
parameter to change the output to sum of the trees' predictions in the
``k``-th ensemble only. To get the total GBT model prediction, ``k`` value
must be -1. For regression problems, ``k`` is also equal to -1.
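
A prediction sketch showing the role of ``k``; here ``gbt`` is assumed to be an already trained ``CvGBTrees`` classification model and the other arguments keep their defaults::

    float total = gbt.predict( sample, cv::Mat(), cv::Range::all(), -1 );  // whole model
    float k0    = gbt.predict( sample, cv::Mat(), cv::Range::all(), 0 );   // ensemble 0 only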

View File

@@ -79,6 +79,8 @@ In case of C++ interface you can use output pointers to empty matrices and the f
If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method.
The function is parallelized with the TBB library.
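
A sketch of the batch form with output matrices (their sizes are set by the call; ``trainData``, ``responses``, and ``testData`` are placeholders)::

    CvKNearest knn;
    knn.train( trainData, responses );                // defaults: classification, max_k=32
    cv::Mat results, neighborResponses, dists;
    knn.find_nearest( testData, 4, results, neighborResponses, dists );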
CvKNearest::get_max_k
---------------------
Returns the maximum number of neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.

View File

@@ -62,9 +62,9 @@ Reads the data set from a ``.csv``-like ``filename`` file and stores all read va
:param filename: The input file name
While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labeles. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.
While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labels. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.
Also, when reading the data, the method constructs the mask of missing values. For example, values are egual to `'?'`.
Also, when reading the data, the method constructs the mask of missing values (for example, values equal to `'?'`).
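
A typical loading sequence (the file name is hypothetical)::

    CvMLData data;
    if( data.read_csv( "dataset.csv" ) == 0 )         // returns 0 on success
    {
        data.set_response_idx( 0 );                   // column 0 holds the responses
        data.change_var_type( 0, CV_VAR_CATEGORICAL ); // override the auto-detected type
        const CvMat* values = data.get_values();
    }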
CvMLData::get_values
--------------------

View File

@@ -74,7 +74,7 @@ using
:ocv:funcx:`PCA::operator()` or similar technique, and train a smaller network
on only essential features.
Another MPL feature is an inability to handle categorical
Another MLP feature is an inability to handle categorical
data as is. However, there is a workaround. If a certain feature in the
input or output (in case of ``n`` -class classifier for
:math:`n>2` ) layer is categorical and can take
@@ -238,6 +238,9 @@ Trains/updates MLP.
This method applies the specified training algorithm to compute/adjust the network weights. It returns the number of iterations performed.
The RPROP training algorithm is parallelized with the TBB library.
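
A create/train/predict sketch; the layer sizes and RPROP parameters are illustrative, and the ``CvANN_MLP_TrainParams`` constructor order is assumed from the 2.x API::

    cv::Mat layerSizes = (cv::Mat_<int>(1, 3) << nfeatures, 10, 1);
    CvANN_MLP mlp;
    mlp.create( layerSizes, CvANN_MLP::SIGMOID_SYM );
    CvANN_MLP_TrainParams tp( cvTermCriteria( CV_TERMCRIT_ITER, 300, 0.01 ),
                              CvANN_MLP_TrainParams::RPROP, 0.1 );
    int iters = mlp.train( inputs, outputs, cv::Mat(), cv::Mat(), tp );
    cv::Mat predicted;
    mlp.predict( testInputs, predicted );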
CvANN_MLP::predict
------------------
Predicts responses for input samples.
@@ -275,4 +278,4 @@ Returns neurons weights of the particular layer.
.. ocv:function:: double* CvANN_MLP::get_weights(int layer)
:param layer: Index of the particular layer.

View File

@@ -60,3 +60,4 @@ Predicts the response for sample(s).
The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples``. In case of multiple input vectors, there should be one output vector ``results``. The predicted class for a single input vector is returned by the method.
The function is parallelized with the TBB library.
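
A batch-prediction sketch, where ``samples`` holds one input vector per row::

    CvNormalBayesClassifier nbayes;
    nbayes.train( trainData, responses );
    cv::Mat results;
    nbayes.predict( samples, &results );              // one predicted class per row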

View File

@@ -9,7 +9,7 @@ Random trees have been introduced by Leo Breiman and Adele Cutler:
http://www.stat.berkeley.edu/users/breiman/RandomForests/
. The algorithm can deal with both classification and regression problems. Random trees is a collection (ensemble) of tree predictors that is called
*forest*
further in this section (the term has been also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that recieved the majority of "votes". In case of a regression, the classifier response is the average of the responses over all the trees in the forest.
further in this section (the term has been also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of "votes". In case of a regression, the classifier response is the average of the responses over all the trees in the forest.
All the trees are trained with the same parameters but on different training sets. These sets are generated from the original training set using the bootstrap procedure: for each training set, you randomly select the same number of vectors as in the original set ( ``=N`` ). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each trained tree, not all the variables are used to find the best split, but a random subset of them. With each node a new subset is generated. However, its size is fixed for all the nodes and all the trees. It is a training parameter set to
:math:`\sqrt{number\_of\_variables}` by default. None of the built trees are pruned.
@@ -67,7 +67,7 @@ The constructors.
:param nactive_vars: The size of the randomly selected subset of features at each tree node that is used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.
:param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (suprise, suprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.
:param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (surprise, surprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes past a certain number of trees. Also keep in mind that the number of trees increases the prediction time linearly.
:param forest_accuracy: Sufficient accuracy (OOB error).
@@ -77,7 +77,7 @@ The constructors.
* **CV_TERMCRIT_EPS** Terminate learning by the ``forest_accuracy``;
* **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criterias.
* **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criteria.
For meaning of other parameters see :ocv:func:`CvDTreeParams::CvDTreeParams`.
@@ -112,6 +112,8 @@ Trains the Random Trees model.
The method :ocv:func:`CvRTrees::train` is very similar to the method :ocv:func:`CvDTree::train` and follows the generic method :ocv:func:`CvStatModel::train` conventions. All the parameters specific to the algorithm training are passed as a :ocv:class:`CvRTParams` instance. The estimate of the training error (``oob-error``) is stored in the protected class member ``oob_error``.
The function is parallelized with the TBB library.
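
A training sketch; ``get_var_importance`` returns meaningful values only when ``calc_var_importance`` is set, and the data matrices are placeholders::

    CvRTParams params;                                // defaults: up to 100 trees
    params.calc_var_importance = true;
    CvRTrees rtrees;
    rtrees.train( trainData, CV_ROW_SAMPLE, responses,
                  cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params );
    const CvMat* importance = rtrees.get_var_importance();
    float prediction = rtrees.predict( sample );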
CvRTrees::predict
-----------------
Predicts the output for an input sample.

View File

@@ -42,7 +42,7 @@ In this declaration, some methods are commented off. These are methods for which
CvStatModel::CvStatModel
------------------------
The default constuctor.
The default constructor.
.. ocv:function:: CvStatModel::CvStatModel()

View File

@@ -121,7 +121,7 @@ The constructors.
:param coef0: Parameter ``coef0`` of a kernel function (POLY / SIGMOID).
:param Cvalue: Parameter ``C`` of a SVM optimiazation problem (C_SVC / EPS_SVR / NU_SVR).
:param Cvalue: Parameter ``C`` of a SVM optimization problem (C_SVC / EPS_SVR / NU_SVR).
:param nu: Parameter :math:`\nu` of a SVM optimization problem (NU_SVC / ONE_CLASS / NU_SVR).
@@ -242,6 +242,9 @@ Predicts the response for input sample(s).
If you pass one sample then prediction result is returned. If you want to get responses for several samples then you should pass the ``results`` matrix where prediction results will be stored.
The function is parallelized with the TBB library.
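
A sketch combining the parameters above with the batch ``predict`` described here (parameter values are illustrative)::

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.C           = 1;                           // the Cvalue parameter above
    params.gamma       = 0.5;
    CvSVM svm;
    svm.train( trainData, responses, cv::Mat(), cv::Mat(), params );
    cv::Mat results;
    svm.predict( samples, results );                  // one response per input row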
CvSVM::get_default_grid
-----------------------
Generates a grid for SVM parameters.

View File

@@ -46,6 +46,10 @@
#ifdef __cplusplus
#include <map>
#include <string>
#include <iostream>
// Apple defines a check() macro somewhere in the debug headers
// that interferes with a method definition in this header
#undef check
@@ -121,6 +125,7 @@ CV_INLINE CvParamLattice cvDefaultParamLattice( void )
#define CV_TYPE_NAME_ML_ANN_MLP "opencv-ml-ann-mlp"
#define CV_TYPE_NAME_ML_CNN "opencv-ml-cnn"
#define CV_TYPE_NAME_ML_RTREES "opencv-ml-random-trees"
#define CV_TYPE_NAME_ML_ERTREES "opencv-ml-extremely-randomized-trees"
#define CV_TYPE_NAME_ML_GBT "opencv-ml-gradient-boosting-trees"
#define CV_TRAIN_ERROR 0
@@ -549,114 +554,99 @@ protected:
/****************************************************************************************\
* Expectation - Maximization *
\****************************************************************************************/
struct CV_EXPORTS_W_MAP CvEMParams
namespace cv
{
CvEMParams();
CvEMParams( int nclusters, int cov_mat_type=1/*CvEM::COV_MAT_DIAGONAL*/,
int start_step=0/*CvEM::START_AUTO_STEP*/,
CvTermCriteria term_crit=cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, FLT_EPSILON),
const CvMat* probs=0, const CvMat* weights=0, const CvMat* means=0, const CvMat** covs=0 );
CV_PROP_RW int nclusters;
CV_PROP_RW int cov_mat_type;
CV_PROP_RW int start_step;
const CvMat* probs;
const CvMat* weights;
const CvMat* means;
const CvMat** covs;
CV_PROP_RW CvTermCriteria term_crit;
};
class CV_EXPORTS_W CvEM : public CvStatModel
class CV_EXPORTS_W EM : public Algorithm
{
public:
// Type of covariation matrices
enum { COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2 };
enum {COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2, COV_MAT_DEFAULT=COV_MAT_DIAGONAL};
// Default parameters
enum {DEFAULT_NCLUSTERS=10, DEFAULT_MAX_ITERS=100};
// The initial step
enum { START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0 };
enum {START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0};
CV_WRAP CvEM();
CvEM( const CvMat* samples, const CvMat* sampleIdx=0,
CvEMParams params=CvEMParams(), CvMat* labels=0 );
//CvEM (CvEMParams params, CvMat * means, CvMat ** covs, CvMat * weights,
// CvMat * probs, CvMat * log_weight_div_det, CvMat * inv_eigen_values, CvMat** cov_rotate_mats);
virtual ~CvEM();
virtual bool train( const CvMat* samples, const CvMat* sampleIdx=0,
CvEMParams params=CvEMParams(), CvMat* labels=0 );
virtual float predict( const CvMat* sample, CV_OUT CvMat* probs ) const;
#ifndef SWIG
CV_WRAP CvEM( const cv::Mat& samples, const cv::Mat& sampleIdx=cv::Mat(),
CvEMParams params=CvEMParams() );
CV_WRAP virtual bool train( const cv::Mat& samples,
const cv::Mat& sampleIdx=cv::Mat(),
CvEMParams params=CvEMParams(),
CV_OUT cv::Mat* labels=0 );
CV_WRAP virtual float predict( const cv::Mat& sample, CV_OUT cv::Mat* probs=0 ) const;
CV_WRAP virtual double calcLikelihood( const cv::Mat &sample ) const;
CV_WRAP int getNClusters() const;
CV_WRAP cv::Mat getMeans() const;
CV_WRAP void getCovs(CV_OUT std::vector<cv::Mat>& covs) const;
CV_WRAP cv::Mat getWeights() const;
CV_WRAP cv::Mat getProbs() const;
CV_WRAP inline double getLikelihood() const { return log_likelihood; }
CV_WRAP inline double getLikelihoodDelta() const { return log_likelihood_delta; }
#endif
CV_WRAP EM(int nclusters=EM::DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL,
const TermCriteria& termcrit=TermCriteria(TermCriteria::COUNT+
TermCriteria::EPS,
EM::DEFAULT_MAX_ITERS, FLT_EPSILON));
virtual ~EM();
CV_WRAP virtual void clear();
int get_nclusters() const;
const CvMat* get_means() const;
const CvMat** get_covs() const;
const CvMat* get_weights() const;
const CvMat* get_probs() const;
inline double get_log_likelihood() const { return log_likelihood; }
inline double get_log_likelihood_delta() const { return log_likelihood_delta; }
CV_WRAP virtual bool train(InputArray samples,
OutputArray labels=noArray(),
OutputArray probs=noArray(),
OutputArray logLikelihoods=noArray());
// inline const CvMat * get_log_weight_div_det () const { return log_weight_div_det; };
// inline const CvMat * get_inv_eigen_values () const { return inv_eigen_values; };
// inline const CvMat ** get_cov_rotate_mats () const { return cov_rotate_mats; };
CV_WRAP virtual bool trainE(InputArray samples,
InputArray means0,
InputArray covs0=noArray(),
InputArray weights0=noArray(),
OutputArray labels=noArray(),
OutputArray probs=noArray(),
OutputArray logLikelihoods=noArray());
CV_WRAP virtual bool trainM(InputArray samples,
InputArray probs0,
OutputArray labels=noArray(),
OutputArray probs=noArray(),
OutputArray logLikelihoods=noArray());
CV_WRAP int predict(InputArray sample,
OutputArray probs=noArray(),
CV_OUT double* logLikelihood=0) const;
virtual void read( CvFileStorage* fs, CvFileNode* node );
virtual void write( CvFileStorage* fs, const char* name ) const;
CV_WRAP bool isTrained() const;
virtual void write_params( CvFileStorage* fs ) const;
virtual void read_params( CvFileStorage* fs, CvFileNode* node );
AlgorithmInfo* info() const;
virtual void read(const FileNode& fn);
protected:
virtual void setTrainData(int startStep, const Mat& samples,
const Mat* probs0,
const Mat* means0,
const vector<Mat>* covs0,
const Mat* weights0);
virtual void set_params( const CvEMParams& params,
const CvVectors& train_data );
virtual void init_em( const CvVectors& train_data );
virtual double run_em( const CvVectors& train_data );
virtual void init_auto( const CvVectors& samples );
virtual void kmeans( const CvVectors& train_data, int nclusters,
CvMat* labels, CvTermCriteria criteria,
const CvMat* means );
CvEMParams params;
double log_likelihood;
double log_likelihood_delta;
bool doTrain(int startStep,
OutputArray labels,
OutputArray probs,
OutputArray logLikelihoods);
virtual void eStep();
virtual void mStep();
CvMat* means;
CvMat** covs;
CvMat* weights;
CvMat* probs;
void clusterTrainSamples();
void decomposeCovs();
void computeLogWeightDivDet();
CvMat* log_weight_div_det;
CvMat* inv_eigen_values;
CvMat** cov_rotate_mats;
void computeProbabilities(const Mat& sample, int& label, Mat* probs, double* logLikelihood) const;
// all inner matrices have type CV_64FC1
CV_PROP_RW int nclusters;
CV_PROP_RW int covMatType;
CV_PROP_RW int maxIters;
CV_PROP_RW double epsilon;
Mat trainSamples;
Mat trainProbs;
Mat trainLogLikelihoods;
Mat trainLabels;
Mat trainCounts;
CV_PROP Mat weights;
CV_PROP Mat means;
CV_PROP vector<Mat> covs;
vector<Mat> covsEigenValues;
vector<Mat> covsRotateMats;
vector<Mat> invCovsEigenValues;
Mat logWeightDivDet;
};
} // namespace cv
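// Usage sketch for the new cv::EM declared above, following the signatures as they
// appear in this commit ("samples" is a placeholder matrix with one point per row):
//
//     cv::EM em(3, cv::EM::COV_MAT_DIAGONAL);
//     cv::Mat labels, probs, logLikelihoods;
//     em.train(samples, labels, probs, logLikelihoods);
//     cv::Mat sampleProbs;
//     double logLikelihood = 0;
//     int label = em.predict(sample, sampleProbs, &logLikelihood);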
/****************************************************************************************\
* Decision Tree *
@@ -1052,6 +1042,7 @@ public:
CvForestTree* get_tree(int i) const;
protected:
virtual std::string getName() const;
virtual bool grow_forest( const CvTermCriteria term_crit );
@@ -1125,6 +1116,7 @@ public:
#endif
virtual bool train( CvMLData* data, CvRTParams params=CvRTParams() );
protected:
virtual std::string getName() const;
virtual bool grow_forest( const CvTermCriteria term_crit );
};
@@ -2012,17 +2004,10 @@ CVAPI(void) cvCreateTestSet( int type, CvMat** samples,
CvMat** responses,
int num_classes, ... );
#endif
/****************************************************************************************\
* Data *
\****************************************************************************************/
#include <map>
#include <string>
#include <iostream>
#define CV_COUNT 0
#define CV_PORTION 1
@@ -2133,8 +2118,6 @@ typedef CvSVMParams SVMParams;
typedef CvSVMKernel SVMKernel;
typedef CvSVMSolver SVMSolver;
typedef CvSVM SVM;
typedef CvEMParams EMParams;
typedef CvEM ExpectationMaximization;
typedef CvDTreeParams DTreeParams;
typedef CvMLData TrainData;
typedef CvDTree DecisionTree;
@@ -2156,5 +2139,7 @@ template<> CV_EXPORTS void Ptr<CvDTreeSplit>::delete_obj();
}
#endif
#endif // __cplusplus
#endif // __OPENCV_ML_HPP__
/* End of file. */

View File

@@ -161,8 +161,21 @@ int CvMLData::read_csv(const char* filename)
fclose(file);
return -1;
}
for( ptr = buf; *ptr != '\0'; ptr++ )
cols_count += (*ptr == delimiter);
ptr = buf;
while( *ptr == ' ' )
ptr++;
for( ; *ptr != '\0'; )
{
if(*ptr == delimiter || *ptr == ' ')
{
cols_count++;
ptr++;
while( *ptr == ' ' ) ptr++;
}
else
ptr++;
}
if ( cols_count == 0)
{
@@ -606,7 +619,7 @@ void CvMLData::set_train_test_split( const CvTrainTestSplit * spl)
CV_ERROR( CV_StsBadArg, "train samples count is not correct" );
train_sample_portion = train_sample_portion <= FLT_EPSILON ||
1 - train_sample_portion <= FLT_EPSILON ? 1 : train_sample_portion;
train_sample_count = cvFloor( train_sample_portion * sample_count );
train_sample_count = std::max(1, cvFloor( train_sample_portion * sample_count ));
}
if ( train_sample_count == sample_count )
@@ -625,8 +638,10 @@ void CvMLData::set_train_test_split( const CvTrainTestSplit * spl)
for (int i = 0; i < sample_count; i++ )
sample_idx[i] = i;
train_sample_idx = cvCreateMatHeader( 1, train_sample_count, CV_32SC1 );
test_sample_idx = cvCreateMatHeader( 1, test_sample_count, CV_32SC1 );
*train_sample_idx = cvMat( 1, train_sample_count, CV_32SC1, &sample_idx[0] );
CV_Assert(test_sample_count > 0);
test_sample_idx = cvCreateMatHeader( 1, test_sample_count, CV_32SC1 );
*test_sample_idx = cvMat( 1, test_sample_count, CV_32SC1, &sample_idx[train_sample_count] );
}
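// Caller-side sketch of the guarded path above (the file name is hypothetical):
//
//     CvMLData data;
//     data.read_csv("dataset.csv");
//     CvTrainTestSplit spl(0.7f);          // 70% train; mix=true by default
//     data.set_train_test_split(&spl);     // train_sample_count now clamped to >= 1
//     const CvMat* trainIdx = data.get_train_sample_idx();
//     const CvMat* testIdx  = data.get_test_sample_idx();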

File diff suppressed because it is too large

View File

@@ -1517,6 +1517,11 @@ CvERTrees::~CvERTrees()
{
}
std::string CvERTrees::getName() const
{
return CV_TYPE_NAME_ML_ERTREES;
}
bool CvERTrees::train( const CvMat* _train_data, int _tflag,
const CvMat* _responses, const CvMat* _var_idx,
const CvMat* _sample_idx, const CvMat* _var_type,

View File

@@ -246,6 +246,10 @@ CvRTrees::~CvRTrees()
clear();
}
std::string CvRTrees::getName() const
{
return CV_TYPE_NAME_ML_RTREES;
}
CvMat* CvRTrees::get_active_var_mask()
{
@@ -726,7 +730,8 @@ void CvRTrees::write( CvFileStorage* fs, const char* name ) const
if( ntrees < 1 || !trees || nsamples < 1 )
CV_Error( CV_StsBadArg, "Invalid CvRTrees object" );
cvStartWriteStruct( fs, name, CV_NODE_MAP, CV_TYPE_NAME_ML_RTREES );
std::string modelNodeName = this->getName();
cvStartWriteStruct( fs, name, CV_NODE_MAP, modelNodeName.c_str() );
cvWriteInt( fs, "nclasses", nclasses );
cvWriteInt( fs, "nsamples", nsamples );

View File

@@ -44,34 +44,49 @@
using namespace std;
using namespace cv;
void defaultDistribs( vector<Mat>& means, vector<Mat>& covs )
static
void defaultDistribs( Mat& means, vector<Mat>& covs, int type=CV_32FC1 )
{
float mp0[] = {0.0f, 0.0f}, cp0[] = {0.67f, 0.0f, 0.0f, 0.67f};
float mp1[] = {5.0f, 0.0f}, cp1[] = {1.0f, 0.0f, 0.0f, 1.0f};
float mp2[] = {1.0f, 5.0f}, cp2[] = {1.0f, 0.0f, 0.0f, 1.0f};
means.create(3, 2, type);
Mat m0( 1, 2, CV_32FC1, mp0 ), c0( 2, 2, CV_32FC1, cp0 );
Mat m1( 1, 2, CV_32FC1, mp1 ), c1( 2, 2, CV_32FC1, cp1 );
Mat m2( 1, 2, CV_32FC1, mp2 ), c2( 2, 2, CV_32FC1, cp2 );
means.resize(3), covs.resize(3);
m0.copyTo(means[0]), c0.copyTo(covs[0]);
m1.copyTo(means[1]), c1.copyTo(covs[1]);
m2.copyTo(means[2]), c2.copyTo(covs[2]);
Mat mr0 = means.row(0);
m0.convertTo(mr0, type);
c0.convertTo(covs[0], type);
Mat mr1 = means.row(1);
m1.convertTo(mr1, type);
c1.convertTo(covs[1], type);
Mat mr2 = means.row(2);
m2.convertTo(mr2, type);
c2.convertTo(covs[2], type);
}
// generate points sets by normal distributions
void generateData( Mat& data, Mat& labels, const vector<int>& sizes, const vector<Mat>& means, const vector<Mat>& covs, int labelType )
static
void generateData( Mat& data, Mat& labels, const vector<int>& sizes, const Mat& _means, const vector<Mat>& covs, int dataType, int labelType )
{
vector<int>::const_iterator sit = sizes.begin();
int total = 0;
for( ; sit != sizes.end(); ++sit )
total += *sit;
assert( means.size() == sizes.size() && covs.size() == sizes.size() );
assert( !data.empty() && data.rows == total );
assert( data.type() == CV_32FC1 );
CV_Assert( _means.rows == (int)sizes.size() && covs.size() == sizes.size() );
CV_Assert( !data.empty() && data.rows == total );
CV_Assert( data.type() == dataType );
labels.create( data.rows, 1, labelType );
randn( data, Scalar::all(0.0), Scalar::all(1.0) );
randn( data, Scalar::all(-1.0), Scalar::all(1.0) );
vector<Mat> means(sizes.size());
for(int i = 0; i < _means.rows; i++)
means[i] = _means.row(i);
vector<Mat>::const_iterator mit = means.begin(), cit = covs.begin();
int bi, ei = 0;
sit = sizes.begin();
@@ -83,7 +98,7 @@ void generateData( Mat& data, Mat& labels, const vector<int>& sizes, const vecto
assert( cit->rows == data.cols && cit->cols == data.cols );
for( int i = bi; i < ei; i++, p++ )
{
Mat r(1, data.cols, CV_32FC1, data.ptr<float>(i));
Mat r = data.row(i);
r = r * (*cit) + *mit;
if( labelType == CV_32FC1 )
labels.at<float>(p, 0) = (float)l;
@@ -95,6 +110,7 @@ void generateData( Mat& data, Mat& labels, const vector<int>& sizes, const vecto
}
}
static
int maxIdx( const vector<int>& count )
{
int idx = -1;
@@ -112,74 +128,83 @@ int maxIdx( const vector<int>& count )
return idx;
}
static
bool getLabelsMap( const Mat& labels, const vector<int>& sizes, vector<int>& labelsMap )
{
int total = 0, setCount = (int)sizes.size();
vector<int>::const_iterator sit = sizes.begin();
for( ; sit != sizes.end(); ++sit )
total += *sit;
size_t total = 0, nclusters = sizes.size();
for(size_t i = 0; i < sizes.size(); i++)
total += sizes[i];
assert( !labels.empty() );
assert( labels.rows == total && labels.cols == 1 );
assert( labels.total() == total && (labels.cols == 1 || labels.rows == 1));
assert( labels.type() == CV_32SC1 || labels.type() == CV_32FC1 );
bool isFlt = labels.type() == CV_32FC1;
labelsMap.resize(setCount);
vector<int>::iterator lmit = labelsMap.begin();
vector<bool> buzy(setCount, false);
int bi, ei = 0;
for( sit = sizes.begin(); sit != sizes.end(); ++sit, ++lmit )
labelsMap.resize(nclusters);
vector<bool> buzy(nclusters, false);
int startIndex = 0;
for( size_t clusterIndex = 0; clusterIndex < sizes.size(); clusterIndex++ )
{
vector<int> count( setCount, 0 );
bi = ei;
ei = bi + *sit;
if( isFlt )
vector<int> count( nclusters, 0 );
for( int i = startIndex; i < startIndex + sizes[clusterIndex]; i++)
{
for( int i = bi; i < ei; i++ )
count[(int)labels.at<float>(i, 0)]++;
int lbl = isFlt ? (int)labels.at<float>(i) : labels.at<int>(i);
CV_Assert(lbl < (int)nclusters);
count[lbl]++;
CV_Assert(count[lbl] < (int)total);
}
else
{
for( int i = bi; i < ei; i++ )
count[labels.at<int>(i, 0)]++;
}
*lmit = maxIdx( count );
if( buzy[*lmit] )
return false;
buzy[*lmit] = true;
startIndex += sizes[clusterIndex];
int cls = maxIdx( count );
CV_Assert( !buzy[cls] );
labelsMap[clusterIndex] = cls;
buzy[cls] = true;
}
return true;
for(size_t i = 0; i < buzy.size(); i++)
if(!buzy[i])
return false;
return true;
}
float calcErr( const Mat& labels, const Mat& origLabels, const vector<int>& sizes, bool labelsEquivalent = true )
static
bool calcErr( const Mat& labels, const Mat& origLabels, const vector<int>& sizes, float& err, bool labelsEquivalent = true )
{
int err = 0;
assert( !labels.empty() && !origLabels.empty() );
assert( labels.cols == 1 && origLabels.cols == 1 );
assert( labels.rows == origLabels.rows );
assert( labels.type() == origLabels.type() );
assert( labels.type() == CV_32SC1 || labels.type() == CV_32FC1 );
err = 0;
CV_Assert( !labels.empty() && !origLabels.empty() );
CV_Assert( labels.rows == 1 || labels.cols == 1 );
CV_Assert( origLabels.rows == 1 || origLabels.cols == 1 );
CV_Assert( labels.total() == origLabels.total() );
CV_Assert( labels.type() == CV_32SC1 || labels.type() == CV_32FC1 );
CV_Assert( origLabels.type() == labels.type() );
vector<int> labelsMap;
bool isFlt = labels.type() == CV_32FC1;
if( !labelsEquivalent )
{
getLabelsMap( labels, sizes, labelsMap );
if( !getLabelsMap( labels, sizes, labelsMap ) )
return false;
for( int i = 0; i < labels.rows; i++ )
if( isFlt )
err += labels.at<float>(i, 0) != labelsMap[(int)origLabels.at<float>(i, 0)];
err += labels.at<float>(i) != labelsMap[(int)origLabels.at<float>(i)] ? 1.f : 0.f;
else
err += labels.at<int>(i, 0) != labelsMap[origLabels.at<int>(i, 0)];
err += labels.at<int>(i) != labelsMap[origLabels.at<int>(i)] ? 1.f : 0.f;
}
else
{
for( int i = 0; i < labels.rows; i++ )
if( isFlt )
err += labels.at<float>(i, 0) != origLabels.at<float>(i, 0);
err += labels.at<float>(i) != origLabels.at<float>(i) ? 1.f : 0.f;
else
err += labels.at<int>(i, 0) != origLabels.at<int>(i, 0);
err += labels.at<int>(i) != origLabels.at<int>(i) ? 1.f : 0.f;
}
return (float)err / (float)labels.rows;
err /= (float)labels.rows;
return true;
}
//--------------------------------------------------------------------------------------------
@@ -198,17 +223,22 @@ void CV_KMeansTest::run( int /*start_from*/ )
Mat data( pointsCount, 2, CV_32FC1 ), labels;
vector<int> sizes( sizesArr, sizesArr + sizeof(sizesArr) / sizeof(sizesArr[0]) );
vector<Mat> means, covs;
Mat means;
vector<Mat> covs;
defaultDistribs( means, covs );
generateData( data, labels, sizes, means, covs, CV_32SC1 );
generateData( data, labels, sizes, means, covs, CV_32FC1, CV_32SC1 );
int code = cvtest::TS::OK;
float err;
Mat bestLabels;
// 1. flag==KMEANS_PP_CENTERS
kmeans( data, 3, bestLabels, TermCriteria( TermCriteria::COUNT, iters, 0.0), 0, KMEANS_PP_CENTERS, noArray() );
err = calcErr( bestLabels, labels, sizes, false );
if( err > 0.01f )
if( !calcErr( bestLabels, labels, sizes, err , false ) )
{
ts->printf( cvtest::TS::LOG, "Bad output labels if flag==KMEANS_PP_CENTERS.\n" );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.01f )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) if flag==KMEANS_PP_CENTERS.\n", err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
@@ -216,10 +246,14 @@ void CV_KMeansTest::run( int /*start_from*/ )
// 2. flag==KMEANS_RANDOM_CENTERS
kmeans( data, 3, bestLabels, TermCriteria( TermCriteria::COUNT, iters, 0.0), 0, KMEANS_RANDOM_CENTERS, noArray() );
err = calcErr( bestLabels, labels, sizes, false );
if( err > 0.01f )
if( !calcErr( bestLabels, labels, sizes, err, false ) )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) if flag==KMEANS_PP_CENTERS.\n", err );
ts->printf( cvtest::TS::LOG, "Bad output labels if flag==KMEANS_RANDOM_CENTERS.\n" );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.01f )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) if flag==KMEANS_RANDOM_CENTERS.\n", err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
}
@@ -229,10 +263,14 @@ void CV_KMeansTest::run( int /*start_from*/ )
for( int i = 0; i < 0.5f * pointsCount; i++ )
bestLabels.at<int>( rng.next() % pointsCount, 0 ) = rng.next() % 3;
kmeans( data, 3, bestLabels, TermCriteria( TermCriteria::COUNT, iters, 0.0), 0, KMEANS_USE_INITIAL_LABELS, noArray() );
err = calcErr( bestLabels, labels, sizes, false );
if( err > 0.01f )
if( !calcErr( bestLabels, labels, sizes, err, false ) )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) if flag==KMEANS_PP_CENTERS.\n", err );
ts->printf( cvtest::TS::LOG, "Bad output labels if flag==KMEANS_USE_INITIAL_LABELS.\n" );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.01f )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) if flag==KMEANS_USE_INITIAL_LABELS.\n", err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
}
@@ -255,20 +293,26 @@ void CV_KNearestTest::run( int /*start_from*/ )
// train data
Mat trainData( pointsCount, 2, CV_32FC1 ), trainLabels;
vector<int> sizes( sizesArr, sizesArr + sizeof(sizesArr) / sizeof(sizesArr[0]) );
vector<Mat> means, covs;
Mat means;
vector<Mat> covs;
defaultDistribs( means, covs );
generateData( trainData, trainLabels, sizes, means, covs, CV_32FC1 );
generateData( trainData, trainLabels, sizes, means, covs, CV_32FC1, CV_32FC1 );
// test data
Mat testData( pointsCount, 2, CV_32FC1 ), testLabels, bestLabels;
generateData( testData, testLabels, sizes, means, covs, CV_32FC1 );
generateData( testData, testLabels, sizes, means, covs, CV_32FC1, CV_32FC1 );
int code = cvtest::TS::OK;
KNearest knearest;
knearest.train( trainData, trainLabels );
knearest.find_nearest( testData, 4, &bestLabels );
float err = calcErr( bestLabels, testLabels, sizes, true );
if( err > 0.01f )
float err;
if( !calcErr( bestLabels, testLabels, sizes, err, true ) )
{
ts->printf( cvtest::TS::LOG, "Bad output labels.\n" );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.01f )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) on test data.\n", err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
@@ -276,95 +320,216 @@ void CV_KNearestTest::run( int /*start_from*/ )
ts->set_failed_test_info( code );
}
class EM_Params
{
public:
EM_Params(int nclusters=10, int covMatType=EM::COV_MAT_DIAGONAL, int startStep=EM::START_AUTO_STEP,
const cv::TermCriteria& termCrit=cv::TermCriteria(cv::TermCriteria::COUNT+cv::TermCriteria::EPS, 100, FLT_EPSILON),
const cv::Mat* probs=0, const cv::Mat* weights=0,
const cv::Mat* means=0, const std::vector<cv::Mat>* covs=0)
: nclusters(nclusters), covMatType(covMatType), startStep(startStep),
probs(probs), weights(weights), means(means), covs(covs), termCrit(termCrit)
{}
int nclusters;
int covMatType;
int startStep;
// all 4 following matrices should have type CV_32FC1
const cv::Mat* probs;
const cv::Mat* weights;
const cv::Mat* means;
const std::vector<cv::Mat>* covs;
cv::TermCriteria termCrit;
};
//--------------------------------------------------------------------------------------------
class CV_EMTest : public cvtest::BaseTest {
class CV_EMTest : public cvtest::BaseTest
{
public:
CV_EMTest() {}
protected:
virtual void run( int start_from );
int runCase( int caseIndex, const EM_Params& params,
const cv::Mat& trainData, const cv::Mat& trainLabels,
const cv::Mat& testData, const cv::Mat& testLabels,
const vector<int>& sizes);
};
void CV_EMTest::run( int /*start_from*/ )
int CV_EMTest::runCase( int caseIndex, const EM_Params& params,
const cv::Mat& trainData, const cv::Mat& trainLabels,
const cv::Mat& testData, const cv::Mat& testLabels,
const vector<int>& sizes )
{
int sizesArr[] = { 5000, 7000, 8000 };
int pointsCount = sizesArr[0]+ sizesArr[1] + sizesArr[2];
// train data
Mat trainData( pointsCount, 2, CV_32FC1 ), trainLabels;
vector<int> sizes( sizesArr, sizesArr + sizeof(sizesArr) / sizeof(sizesArr[0]) );
vector<Mat> means, covs;
defaultDistribs( means, covs );
generateData( trainData, trainLabels, sizes, means, covs, CV_32SC1 );
// test data
Mat testData( pointsCount, 2, CV_32FC1 ), testLabels, bestLabels;
generateData( testData, testLabels, sizes, means, covs, CV_32SC1 );
int code = cvtest::TS::OK;
cv::Mat labels;
float err;
ExpectationMaximization em;
CvEMParams params;
params.nclusters = 3;
em.train( trainData, Mat(), params, &bestLabels );
cv::EM em(params.nclusters, params.covMatType, params.termCrit);
if( params.startStep == EM::START_AUTO_STEP )
em.train( trainData, labels );
else if( params.startStep == EM::START_E_STEP )
em.trainE( trainData, *params.means, *params.covs, *params.weights, labels );
else if( params.startStep == EM::START_M_STEP )
em.trainM( trainData, *params.probs, labels );
// check train error
err = calcErr( bestLabels, trainLabels, sizes, false );
if( err > 0.002f )
if( !calcErr( labels, trainLabels, sizes, err , false ) )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) on train data.\n", err );
ts->printf( cvtest::TS::LOG, "Case index %i : Bad output labels.\n", caseIndex );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.008f )
{
ts->printf( cvtest::TS::LOG, "Case index %i : Bad accuracy (%f) on train data.\n", caseIndex, err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
}
// check test error
bestLabels.create( testData.rows, 1, CV_32SC1 );
labels.create( testData.rows, 1, CV_32SC1 );
for( int i = 0; i < testData.rows; i++ )
{
Mat sample( 1, testData.cols, CV_32FC1, testData.ptr<float>(i));
bestLabels.at<int>(i,0) = (int)em.predict( sample, 0 );
Mat sample = testData.row(i);
double likelihood = 0;
Mat probs;
labels.at<int>(i,0) = (int)em.predict( sample, probs, &likelihood );
}
err = calcErr( bestLabels, testLabels, sizes, false );
if( err > 0.005f )
if( !calcErr( labels, testLabels, sizes, err, false ) )
{
ts->printf( cvtest::TS::LOG, "Bad accuracy (%f) on test data.\n", err );
ts->printf( cvtest::TS::LOG, "Case index %i : Bad output labels.\n", caseIndex );
code = cvtest::TS::FAIL_INVALID_OUTPUT;
}
else if( err > 0.008f )
{
ts->printf( cvtest::TS::LOG, "Case index %i : Bad accuracy (%f) on test data.\n", caseIndex, err );
code = cvtest::TS::FAIL_BAD_ACCURACY;
}
return code;
}
void CV_EMTest::run( int /*start_from*/ )
{
int sizesArr[] = { 500, 700, 800 };
int pointsCount = sizesArr[0]+ sizesArr[1] + sizesArr[2];
// Points distribution
Mat means;
vector<Mat> covs;
defaultDistribs( means, covs, CV_64FC1 );
// train data
Mat trainData( pointsCount, 2, CV_64FC1 ), trainLabels;
vector<int> sizes( sizesArr, sizesArr + sizeof(sizesArr) / sizeof(sizesArr[0]) );
generateData( trainData, trainLabels, sizes, means, covs, CV_64FC1, CV_32SC1 );
// test data
Mat testData( pointsCount, 2, CV_64FC1 ), testLabels;
generateData( testData, testLabels, sizes, means, covs, CV_64FC1, CV_32SC1 );
EM_Params params;
params.nclusters = 3;
Mat probs(trainData.rows, params.nclusters, CV_64FC1, cv::Scalar(1));
params.probs = &probs;
Mat weights(1, params.nclusters, CV_64FC1, cv::Scalar(1));
params.weights = &weights;
params.means = &means;
params.covs = &covs;
int code = cvtest::TS::OK;
int caseIndex = 0;
{
params.startStep = cv::EM::START_AUTO_STEP;
params.covMatType = cv::EM::COV_MAT_GENERIC;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_AUTO_STEP;
params.covMatType = cv::EM::COV_MAT_DIAGONAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_AUTO_STEP;
params.covMatType = cv::EM::COV_MAT_SPHERICAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_M_STEP;
params.covMatType = cv::EM::COV_MAT_GENERIC;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_M_STEP;
params.covMatType = cv::EM::COV_MAT_DIAGONAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_M_STEP;
params.covMatType = cv::EM::COV_MAT_SPHERICAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_E_STEP;
params.covMatType = cv::EM::COV_MAT_GENERIC;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_E_STEP;
params.covMatType = cv::EM::COV_MAT_DIAGONAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
{
params.startStep = cv::EM::START_E_STEP;
params.covMatType = cv::EM::COV_MAT_SPHERICAL;
int currCode = runCase(caseIndex++, params, trainData, trainLabels, testData, testLabels, sizes);
code = currCode == cvtest::TS::OK ? code : currCode;
}
ts->set_failed_test_info( code );
}
class CV_EMTest_Smoke : public cvtest::BaseTest {
class CV_EMTest_SaveLoad : public cvtest::BaseTest {
public:
CV_EMTest_Smoke() {}
CV_EMTest_SaveLoad() {}
protected:
virtual void run( int /*start_from*/ )
{
int code = cvtest::TS::OK;
CvEM em;
const int nclusters = 2;
cv::EM em(nclusters);
Mat samples = Mat(3,2,CV_32F);
samples.at<float>(0,0) = 1;
samples.at<float>(1,0) = 2;
samples.at<float>(2,0) = 3;
CvEMParams params;
params.nclusters = 2;
Mat samples = Mat(3,1,CV_64FC1);
samples.at<double>(0,0) = 1;
samples.at<double>(1,0) = 2;
samples.at<double>(2,0) = 3;
Mat labels;
em.train(samples, Mat(), params, &labels);
em.train(samples, labels);
Mat firstResult(samples.rows, 1, CV_32FC1);
Mat firstResult(samples.rows, 1, CV_32SC1);
for( int i = 0; i < samples.rows; i++)
firstResult.at<float>(i) = em.predict( samples.row(i) );
firstResult.at<int>(i) = em.predict(samples.row(i));
// Write out
string filename = tempfile() + ".xml";
{
FileStorage fs = FileStorage(filename, FileStorage::WRITE);
try
{
em.write(fs.fs, "EM");
fs << "em" << "{";
em.write(fs);
fs << "}";
}
catch(...)
{
@@ -378,11 +543,11 @@ protected:
// Read in
{
FileStorage fs = FileStorage(filename, FileStorage::READ);
FileNode fileNode = fs["EM"];
CV_Assert(fs.isOpened());
FileNode fn = fs["em"];
try
{
em.read(const_cast<CvFileStorage*>(fileNode.fs), const_cast<CvFileNode*>(fileNode.node));
em.read(fn);
}
catch(...)
{
@@ -395,7 +560,7 @@ protected:
int errCaseCount = 0;
for( int i = 0; i < samples.rows; i++)
errCaseCount = std::abs(em.predict(samples.row(i)) - firstResult.at<float>(i)) < FLT_EPSILON ? 0 : 1;
errCaseCount = std::abs(em.predict(samples.row(i)) - firstResult.at<int>(i)) < FLT_EPSILON ? 0 : 1;
if( errCaseCount > 0 )
{
@@ -410,4 +575,4 @@ protected:
TEST(ML_KMeans, accuracy) { CV_KMeansTest test; test.safe_run(); }
TEST(ML_KNearest, accuracy) { CV_KNearestTest test; test.safe_run(); }
TEST(ML_EM, accuracy) { CV_EMTest test; test.safe_run(); }
TEST(ML_EM, smoke) { CV_EMTest_Smoke test; test.safe_run(); }
TEST(ML_EM, save_load) { CV_EMTest_SaveLoad test; test.safe_run(); }

View File

@@ -451,7 +451,6 @@ CV_MLBaseTest::CV_MLBaseTest(const char* _modelName)
nbayes = 0;
knearest = 0;
svm = 0;
em = 0;
ann = 0;
dtree = 0;
boost = 0;
@@ -463,8 +462,6 @@ CV_MLBaseTest::CV_MLBaseTest(const char* _modelName)
knearest = new CvKNearest;
else if( !modelName.compare(CV_SVM) )
svm = new CvSVM;
else if( !modelName.compare(CV_EM) )
em = new CvEM;
else if( !modelName.compare(CV_ANN) )
ann = new CvANN_MLP;
else if( !modelName.compare(CV_DTREE) )
@@ -487,8 +484,6 @@ CV_MLBaseTest::~CV_MLBaseTest()
delete knearest;
if( svm )
delete svm;
if( em )
delete em;
if( ann )
delete ann;
if( dtree )
@@ -756,8 +751,6 @@ void CV_MLBaseTest::save( const char* filename )
knearest->save( filename );
else if( !modelName.compare(CV_SVM) )
svm->save( filename );
else if( !modelName.compare(CV_EM) )
em->save( filename );
else if( !modelName.compare(CV_ANN) )
ann->save( filename );
else if( !modelName.compare(CV_DTREE) )
@@ -778,8 +771,6 @@ void CV_MLBaseTest::load( const char* filename )
knearest->load( filename );
else if( !modelName.compare(CV_SVM) )
svm->load( filename );
else if( !modelName.compare(CV_EM) )
em->load( filename );
else if( !modelName.compare(CV_ANN) )
ann->load( filename );
else if( !modelName.compare(CV_DTREE) )

View File

@@ -44,7 +44,6 @@ protected:
CvNormalBayesClassifier* nbayes;
CvKNearest* knearest;
CvSVM* svm;
CvEM* em;
CvANN_MLP* ann;
CvDTree* dtree;
CvBoost* boost;