Clustering
==========

.. highlight:: cpp

kmeans
------
Finds centers of clusters and groups input samples around the clusters.

.. ocv:function:: double kmeans( InputArray data, int K, InputOutputArray bestLabels, TermCriteria criteria, int attempts, int flags, OutputArray centers=noArray() )

.. ocv:pyfunction:: cv2.kmeans(data, K, bestLabels, criteria, attempts, flags[, centers]) -> retval, bestLabels, centers

.. ocv:cfunction:: int cvKMeans2( const CvArr* samples, int cluster_count, CvArr* labels, CvTermCriteria termcrit, int attempts=1, CvRNG* rng=0, int flags=0, CvArr* _centers=0, double* compactness=0 )

:param samples: Floating-point matrix of input samples, one row per sample.
:param data: Data for clustering.
:param cluster_count: Number of clusters to split the set by.
:param K: Number of clusters to split the set by.
:param labels: Input/output integer array that stores the cluster indices for every sample (named ``bestLabels`` in the C++ and Python signatures).
:param criteria: The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy. The accuracy is specified as ``criteria.epsilon``. As soon as each of the cluster centers moves by less than ``criteria.epsilon`` on some iteration, the algorithm stops.
:param termcrit: The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy.
:param attempts: Number of times the algorithm is executed using different initial labelings. The algorithm returns the labels that yield the best compactness (see the ``compactness`` parameter of the C function and the return value described below).
:param rng: CvRNG state initialized by RNG().
:param flags: Flag that can take the following values:

    * **KMEANS_RANDOM_CENTERS** Select random initial centers in each attempt.

    * **KMEANS_PP_CENTERS** Use ``kmeans++`` center initialization by Arthur and Vassilvitskii [Arthur2007]_.

    * **KMEANS_USE_INITIAL_LABELS** During the first (and possibly the only) attempt, use the user-supplied labels instead of computing them from the initial centers. For the second and further attempts, use the random or semi-random centers. Use one of the ``KMEANS_*_CENTERS`` flags to specify the exact method.

:param centers: Output matrix of the cluster centers, one row per cluster center.
:param _centers: Output matrix of the cluster centers, one row per cluster center.
:param compactness: Output compactness value, computed as described below.

The function ``kmeans`` implements a k-means algorithm that finds the
centers of ``cluster_count`` clusters and groups the input samples
around the clusters. As an output,
:math:`\texttt{labels}_i` contains a 0-based cluster index for
the sample stored in the
:math:`i^{th}` row of the ``samples`` matrix.

The function returns the compactness measure that is computed as

.. math::

    \sum _i \| \texttt{samples} _i - \texttt{centers} _{ \texttt{labels} _i} \| ^2

after every attempt. The best (minimum) value is chosen and the
corresponding labels and the compactness value are returned by the function.
Alternatively, you can use only the core of the function: set the number of
attempts to 1, initialize the labels each time using a custom algorithm, pass them with
``flags`` = ``KMEANS_USE_INITIAL_LABELS``, and then choose the best (most compact) clustering yourself.
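
The compactness measure above can be sketched in plain C++ without any OpenCV types. This is only an illustration under stated assumptions, not the library implementation: ``Point``, ``squaredDistance``, and ``compactness`` are hypothetical names, and each sample or center is stored as a ``std::vector<double>`` rather than as a matrix row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: one sample or one cluster center.
typedef std::vector<double> Point;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
    {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Sum over all samples of the squared distance between the sample and
// the center of the cluster it is assigned to (labels are 0-based).
double compactness(const std::vector<Point>& samples,
                   const std::vector<Point>& centers,
                   const std::vector<int>& labels)
{
    assert(samples.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        assert(labels[i] >= 0 &&
               static_cast<std::size_t>(labels[i]) < centers.size());
        total += squaredDistance(samples[i], centers[labels[i]]);
    }
    return total;
}
```

For example, with samples (0, 0), (2, 0), (10, 0), centers (1, 0) and (10, 0), and labels 0, 0, 1, the sketch yields 1 + 1 + 0 = 2. Relabeling the second sample to cluster 1 raises the value to 1 + 64 + 0 = 65, so the first labeling is the more compact of the two.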

partition
---------
Splits an element set into equivalency classes.

.. ocv:function:: template<typename _Tp, class _EqPredicate> int partition( const vector<_Tp>& vec, vector<int>& labels, _EqPredicate predicate=_EqPredicate())

:param vec: Set of elements stored as a vector.
:param labels: Output vector of labels. It contains as many elements as ``vec``. Each label ``labels[i]`` is a 0-based cluster index of ``vec[i]``.
:param predicate: Equivalence predicate (pointer to a boolean function of two arguments or an instance of the class that has the method ``bool operator()(const _Tp& a, const _Tp& b)`` ). The predicate returns ``true`` when the elements are certainly in the same class, and returns ``false`` if they may or may not be in the same class.

The generic function ``partition`` implements an
:math:`O(N^2)` algorithm for
splitting a set of
:math:`N` elements into one or more equivalency classes, as described in
http://en.wikipedia.org/wiki/Disjoint-set_data_structure. The function
returns the number of equivalency classes.
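
The idea of grouping elements with a disjoint-set structure can be sketched in plain C++ as follows. This is a simplified illustration, not the OpenCV implementation: ``partitionSketch`` and ``findRoot`` are hypothetical names, the predicate is assumed to be symmetric, and the sketch omits the balancing and path compression that a production disjoint-set structure would normally use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Follow parent links up to the root of the tree that contains node i.
static int findRoot(const std::vector<int>& parent, int i)
{
    while (parent[i] != i)
        i = parent[i];
    return i;
}

// Hypothetical sketch: group the elements of vec into equivalency classes.
// Returns the number of classes and fills labels with 0-based class indices.
template <typename T, typename EqPredicate>
int partitionSketch(const std::vector<T>& vec, std::vector<int>& labels,
                    EqPredicate predicate)
{
    const int n = static_cast<int>(vec.size());

    // Initially every element is the root of its own single-node tree.
    std::vector<int> parent(n);
    for (int i = 0; i < n; ++i)
        parent[i] = i;

    // Evaluate the predicate on every pair and merge the trees of
    // elements that are certainly in the same class.
    for (int i = 0; i < n; ++i)
    {
        for (int j = i + 1; j < n; ++j)
        {
            if (predicate(vec[i], vec[j]))
            {
                const int ri = findRoot(parent, i);
                const int rj = findRoot(parent, j);
                if (ri != rj)
                    parent[rj] = ri;
            }
        }
    }

    // Number the roots consecutively in order of first appearance.
    std::vector<int> rootLabel(n, -1);
    labels.assign(n, 0);
    int classCount = 0;
    for (int i = 0; i < n; ++i)
    {
        const int r = findRoot(parent, i);
        if (rootLabel[r] < 0)
            rootLabel[r] = classCount++;
        labels[i] = rootLabel[r];
    }
    return classCount;
}
```

With the values 1, 2, 10, 11, 3, 30 and a predicate that returns ``true`` when two values differ by at most 1, the sketch produces three classes. The values 1 and 3 end up in the same class through 2 even though the predicate returns ``false`` for that pair, which is why a ``false`` result only means that the elements may or may not belong together.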

.. [Arthur2007] D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms, 2007.