Purpose: updated the last section of chapter 10

2011-03-31 22:07:17 +00:00 · 2011-03-31 22:07:17 +00:00 · 3f2daa1dcf
commit 3f2daa1dcf
parent 60633fddd0
10 changed files with 259 additions and 219 deletions
--- a/modules/gpu/doc/data_structures.rst
+++ b/modules/gpu/doc/data_structures.rst
@ -7,8 +7,7 @@ gpu::DevMem2D\_
 ---------------
 .. cpp:class:: gpu::DevMem2D\_

-This lightweight class encapsulates pitched memory on a GPU and is passed to nvcc-compiled code (CUDA kernels). Typically, it is used internally by OpenCV and by users who write device code. You can call its members from both host and device code. 
-::
+This lightweight class encapsulates pitched memory on a GPU and is passed to nvcc-compiled code (CUDA kernels). Typically, it is used internally by OpenCV and by users who write device code. You can call its members from both host and device code. ::

    template <typename T> struct DevMem2D_
    {
@ -103,9 +102,10 @@ This is a base storage class for GPU memory with reference counting. Its interfa
 *   
    no expression templates technique support
    
-Beware that the latter limitation may lead to overloaded matrix operators that cause memory allocations. The ``GpuMat`` class is convertible to :cpp:class:`gpu::DevMem2D_` and :cpp:class:`gpu::PtrStep_` so it can be passed to directly to kernel.
+Beware that the latter limitation may lead to overloaded matrix operators that cause memory allocations. The ``GpuMat`` class is convertible to :cpp:class:`gpu::DevMem2D_` and :cpp:class:`gpu::PtrStep_` so it can be passed directly to a kernel.

 **Note:**
+
 In contrast with :c:type:`Mat`, in most cases ``GpuMat::isContinuous() == false`` . This means that rows are aligned to size depending on the hardware. Single-row ``GpuMat`` is always a continuous matrix. ::

    class CV_EXPORTS GpuMat
@ -141,6 +141,7 @@ In contrast with :c:type:`Mat`, in most cases ``GpuMat::isContinuous() == false`


 **Note:**
+
 You are not recommended to leave static or global ``GpuMat`` variables allocated, that is to rely on its destructor. The destruction order of such variables and CUDA context is undefined. GPU memory release function returns error if the CUDA context has been destroyed before.

 See Also:
@ -156,14 +157,15 @@ This class with reference counting wraps special memory type allocation function
 :func:`Mat`-like but with additional memory type parameters.
    
 *
-    ``ALLOC_PAGE_LOCKED``:  Sets a page locked memory type, used commonly for fast and asynchronous upload/download data from/to GPU.
+    ``ALLOC_PAGE_LOCKED``:  Sets a page-locked memory type, commonly used for fast and asynchronous uploading/downloading of data to/from the GPU.
 *
    ``ALLOC_ZEROCOPY``:  Specifies a zero copy memory allocation that enables mapping the host memory to GPU address space, if supported.
 *
    ``ALLOC_WRITE_COMBINED``:  Sets the write combined buffer that is not cached by CPU. Such buffers are used to supply GPU with data when GPU only reads it. The advantage is a better CPU cache utilization.

 **Note:**
-Allocation size of such memory types is usually limited. For more details please see "CUDA 2.2 Pinned Memory APIs" document or "CUDA_C Programming Guide".
+
+Allocation size of such memory types is usually limited. For more details, see the "CUDA 2.2 Pinned Memory APIs" document or the "CUDA C Programming Guide".
 ::

    class CV_EXPORTS CudaMem
@ -212,7 +214,7 @@ gpu::CudaMem::createGpuMatHeader

 .. cpp:function:: GpuMat gpu::CudaMem::createGpuMatHeader() const

-    Maps CPU memory to GPU address space and creates :cpp:class:`gpu::GpuMat` header without reference counting for it. This can be done only if memory was allocated with ``ALLOC_ZEROCOPY`` flag and if it is supported by the hardware (laptops often share video and CPU memory, so address spaces can be mapped, and that eliminates an extra copy).
+    Maps CPU memory to GPU address space and creates the :cpp:class:`gpu::GpuMat` header without reference counting for it. This can be done only if memory was allocated with the ``ALLOC_ZEROCOPY`` flag and if it is supported by the hardware (laptops often share video and CPU memory, so address spaces can be mapped, which eliminates an extra copy).

 .. index:: gpu::CudaMem::canMapHostMemory

@ -220,7 +222,7 @@ gpu::CudaMem::canMapHostMemory
 ----------------------------------
 .. cpp:function:: static bool gpu::CudaMem::canMapHostMemory()

-    Returns true if the current hardware supports address space mapping and ``ALLOC_ZEROCOPY`` memory allocation.
+    Returns ``true`` if the current hardware supports address space mapping and ``ALLOC_ZEROCOPY`` memory allocation.

 .. index:: gpu::Stream

@ -228,9 +230,10 @@ gpu::Stream
 -----------
 .. cpp:class:: gpu::Stream

-This class encapsulated a queue of asynchronous calls. Some functions have overloads with the additional ``gpu::Stream`` parameter. The overloads do initialization work (allocate output buffers, upload constants, and so on), start the GPU kernel, and return before results are ready. You can check whether all operation are complete via :cpp:func:`gpu::Stream::queryIfComplete`. You can asynchronously upload/download data from/to page-locked buffers, using :cpp:class:`gpu::CudaMem` or :c:type:`Mat` header that points to a region of :cpp:class:`gpu::CudaMem`.
+This class encapsulates a queue of asynchronous calls. Some functions have overloads with the additional ``gpu::Stream`` parameter. The overloads do initialization work (allocate output buffers, upload constants, and so on), start the GPU kernel, and return before results are ready. You can check whether all operations are complete via :cpp:func:`gpu::Stream::queryIfComplete`. You can asynchronously upload/download data from/to page-locked buffers, using the :cpp:class:`gpu::CudaMem` or :c:type:`Mat` header that points to a region of :cpp:class:`gpu::CudaMem`.

 **Note:**
+
 Currently, you may face problems if an operation is enqueued twice with different data. Some functions use the constant GPU memory, and next call may update the memory before the previous one has been finished. But calling different operations asynchronously is safe because each operation has its own constant buffer. Memory copy/upload/download/set operations to the buffers you hold are also safe. 
 ::

@ -275,7 +278,7 @@ gpu::Stream::queryIfComplete
 --------------------------------
 .. cpp:function:: bool gpu::Stream::queryIfComplete()

-    Returns true if the current stream queue is finished, otherwise false.
+    Returns ``true`` if the current stream queue is finished. Otherwise, it returns ``false``.

 .. index:: gpu::Stream::waitForCompletion

@ -283,7 +286,7 @@ gpu::Stream::waitForCompletion
 ----------------------------------
 .. cpp:function:: void gpu::Stream::waitForCompletion()

-    Blocks until all operations in the stream are complete.
+    Blocks the current CPU thread until all operations in the stream are complete.

 .. index:: gpu::StreamAccessor

@ -318,14 +321,14 @@ gpu::createContinuous

    The following wrappers are also available:
    
-    *
-        .. cpp:function:: GpuMat gpu::createContinuous(int rows, int cols, int type)
-    *
-        .. cpp:function:: void gpu::createContinuous(Size size, int type, GpuMat& m)
-    *
-        .. cpp:function:: GpuMat gpu::createContinuous(Size size, int type)
+    
+		* .. cpp:function:: GpuMat gpu::createContinuous(int rows, int cols, int type)
+    
+		* .. cpp:function:: void gpu::createContinuous(Size size, int type, GpuMat& m)
+    
+		* .. cpp:function:: GpuMat gpu::createContinuous(Size size, int type)

-    Matrix is called continuous if its elements are stored continuously, that is wuthout gaps in the end of each row.
+    A matrix is called continuous if its elements are stored continuously, that is, without gaps at the end of each row.

 .. index:: gpu::ensureSizeIsEnough

@ -341,13 +344,13 @@ gpu::ensureSizeIsEnough

    :param cols: Minimum desired number of columns.
    
-    :param size: rows and cols passed as a structure
+    :param size: Rows and columns passed as a structure.

    :param type: Desired matrix type.

    :param m: Destination matrix.

-    The following wrapper is also available:
+    The following wrapper is also available:

    

--- a/modules/gpu/doc/feature_detection_and_description.rst
+++ b/modules/gpu/doc/feature_detection_and_description.rst
@ -9,7 +9,7 @@ gpu::SURF_GPU
 -------------
 .. cpp:class:: gpu::SURF_GPU

-Class used for extracting Speeded Up Robust Features (SURF) from an image. 
+This class is used for extracting Speeded Up Robust Features (SURF) from an image. 
 ::

    class SURF_GPU : public CvSURFParams
@ -72,12 +72,12 @@ Class used for extracting Speeded Up Robust Features (SURF) from an image.

 The class ``SURF_GPU`` implements Speeded Up Robust Features descriptor. There is a fast multi-scale Hessian keypoint detector that can be used to find the keypoints (which is the default option). But the descriptors can also be computed for the user-specified keypoints. Only 8 bit grayscale images are supported.

-The class ``SURF_GPU`` can store results in the GPU and CPU memory. It provides functions to convert results between CPU and GPU version ( ``uploadKeypoints``,``downloadKeypoints``,``downloadDescriptors`` ). The format of CPU results is the same as ``SURF`` results. GPU results are stored in  ``GpuMat`` . The ``keypoints`` matrix is one-row matrix of the ``CV_32FC6`` type. It contains 6 float values per feature: ``x, y, laplacian, size, dir, hessian`` .  The ``descriptors`` matrix is
-:math:`\texttt{nFeatures} \times \texttt{descriptorSize}` matrix with ``CV_32FC1`` type.
+The class ``SURF_GPU`` can store results in the GPU and CPU memory. It provides functions to convert results between CPU and GPU versions ( ``uploadKeypoints``, ``downloadKeypoints``, ``downloadDescriptors`` ). The format of CPU results is the same as ``SURF`` results. GPU results are stored in ``GpuMat`` . The ``keypoints`` matrix is a one-row matrix of the ``CV_32FC6`` type. It contains 6 float values per feature: ``x, y, laplacian, size, dir, hessian`` . The ``descriptors`` matrix is a
+:math:`\texttt{nFeatures} \times \texttt{descriptorSize}` matrix with the ``CV_32FC1`` type.

 The class ``SURF_GPU`` uses some buffers and provides access to it. All buffers can be safely released between function calls.

-See Also: :c:type:`SURF`.
+See Also: :c:type:`SURF`

 .. index:: gpu::BruteForceMatcher_GPU

@ -85,7 +85,7 @@ gpu::BruteForceMatcher_GPU
 --------------------------
 .. cpp:class:: gpu::BruteForceMatcher_GPU

-Brute-force descriptor matcher. For each descriptor in the first set, this matcher finds the closest descriptor in the second set by trying each one. This descriptor matcher supports masking permissible matches between descriptor sets. ::
+This is a brute-force descriptor matcher. For each descriptor in the first set, this matcher finds the closest descriptor in the second set by trying each one. This descriptor matcher supports masking permissible matches between descriptor sets. ::

    template<class Distance>
    class BruteForceMatcher_GPU
@ -170,9 +170,9 @@ Brute-force descriptor matcher. For each descriptor in the first set, this match
    };


-The class ``BruteForceMatcher_GPU`` has the interface similar to class :c:type:`DescriptorMatcher`. It has two groups of ``match`` methods: for matching descriptors of one image with another image or with an image set. Also, all functions have an alternative: save results to the GPU memory or to the CPU memory. ``Distance`` template parameter is kept for CPU/GPU interfaces similarity. ``BruteForceMatcher_GPU`` supports only ``L1<float>`` and ``L2<float>`` distance types.
+The class ``BruteForceMatcher_GPU`` has an interface similar to the class :c:type:`DescriptorMatcher`. It has two groups of ``match`` methods: for matching descriptors of one image with another image or with an image set. Each function also has two variants: one saves results to the GPU memory and the other to the CPU memory. The ``Distance`` template parameter is kept for similarity between the CPU and GPU interfaces. ``BruteForceMatcher_GPU`` supports only the ``L1<float>`` and ``L2<float>`` distance types.

-See also: :c:type:`DescriptorMatcher`, :c:type:`BruteForceMatcher`.
+See Also: :c:type:`DescriptorMatcher`, :c:type:`BruteForceMatcher`

 .. index:: gpu::BruteForceMatcher_GPU::match

@ -185,7 +185,7 @@ gpu::BruteForceMatcher_GPU::match
    Finds the best match for each descriptor from a query set with train descriptors.

 See Also:
-:c:func:`DescriptorMatcher::match` .
+:c:func:`DescriptorMatcher::match` 

 .. index:: gpu::BruteForceMatcher_GPU::matchSingle

@ -201,7 +201,7 @@ gpu::BruteForceMatcher_GPU::matchSingle
    
    :param trainIdx: The output single-row ``CV_32SC1`` matrix that contains the best train index for each query. If some query descriptors are masked out in ``mask`` , it contains -1.
    
-    :param distance: The output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``mask``, it will contains ``FLT_MAX``.
+    :param distance: The output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``mask``, it contains ``FLT_MAX``.

    :param mask: Mask specifying permissible matches between the input query and train matrices of descriptors.

@ -213,17 +213,17 @@ gpu::BruteForceMatcher_GPU::matchCollection

    Finds the best match for each query descriptor from train collection. Results are stored in the GPU memory.

-    :param queryDescs: Query set of descriptors.
+	:param queryDescs: Query set of descriptors.
    
-    :param trainCollection: :cpp:class:`gpu::GpuMat` containing train collection. It can be obtained from the collection of train descriptors that was set using the ``add``     method by :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`. Or it may contain a user-defined collection. This is a one-row matrix where each element is ``DevMem2D`` pointing out to a matrix of train descriptors.
+	:param trainCollection: :cpp:class:`gpu::GpuMat` containing train collection. It can be obtained from the collection of train descriptors that was set using the ``add`` method by :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`. Or it may contain a user-defined collection. This is a one-row matrix where each element is ``DevMem2D`` pointing to a matrix of train descriptors.
    
-    :param trainIdx: The output single-row ``CV_32SC1`` matrix that contains the best train index for each query. If some query descriptors are masked out in ``maskCollection``  , it contains -1.
+	:param trainIdx: The output single-row ``CV_32SC1`` matrix that contains the best train index for each query. If some query descriptors are masked out in ``maskCollection``  , it contains -1.
    
-    :param imgIdx: The output single-row ``CV_32SC1`` matrix that contains image train index for each query. If some query descriptors are masked out in ``maskCollection``  , it contains -1.
+	:param imgIdx: The output single-row ``CV_32SC1`` matrix that contains image train index for each query. If some query descriptors are masked out in ``maskCollection``  , it contains -1.
    
-    :param distance: The output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``maskCollection``  , it contains ``FLT_MAX``.
+	:param distance: The output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``maskCollection``  , it contains ``FLT_MAX``.

-    :param maskCollection: ``GpuMat``  containing a set of masks. It can be obtained from  ``std::vector<GpuMat>``  by  :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection` or it may contain  a user-defined mask set. This is an empty matrix or one-row matrix where each element is a  ``PtrStep``  that points to one mask.
+	:param maskCollection: ``GpuMat``  containing a set of masks. It can be obtained from  ``std::vector<GpuMat>``  by  :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection` or it may contain  a user-defined mask set. This is an empty matrix or one-row matrix where each element is a  ``PtrStep``  that points to one mask.

 .. index:: gpu::BruteForceMatcher_GPU::makeGpuCollection

@ -231,7 +231,8 @@ gpu::BruteForceMatcher_GPU::makeGpuCollection
 -------------------------------------------------
 .. cpp:function:: void gpu::BruteForceMatcher_GPU::makeGpuCollection(GpuMat& trainCollection, GpuMat& maskCollection, const vector<GpuMat>&masks = std::vector<GpuMat>())

-    Makes gpu collection of train descriptors and masks in suitable format for :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` function.
+    Creates a GPU collection of train descriptors and masks in a format suitable for the
+    :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` function.

 .. index:: gpu::BruteForceMatcher_GPU::matchDownload

@ -241,7 +242,9 @@ gpu::BruteForceMatcher_GPU::matchDownload

 .. cpp:function:: void gpu::BruteForceMatcher_GPU::matchDownload(const GpuMat& trainIdx, GpuMat& imgIdx, const GpuMat& distance, std::vector<DMatch>&matches)

-    Downloads ``trainIdx``, ``imgIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::matchSingle` or :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` to CPU vector with :c:type:`DMatch`.
+    Downloads the ``trainIdx``, ``imgIdx``, and ``distance`` matrices obtained via
+    :cpp:func:`gpu::BruteForceMatcher_GPU::matchSingle` or
+    :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` to a CPU vector of :c:type:`DMatch`.

 .. index:: gpu::BruteForceMatcher_GPU::knnMatch

@ -254,7 +257,7 @@ gpu::BruteForceMatcher_GPU::knnMatch
 .. c:function:: void knnMatch(const GpuMat& queryDescs, std::vector< std::vector<DMatch> >&matches, int k, const std::vector<GpuMat>&masks = std::vector<GpuMat>(), bool compactResult = false )

 See Also:
-:func:`DescriptorMatcher::knnMatch` .
+:func:`DescriptorMatcher::knnMatch` 

 .. index:: gpu::BruteForceMatcher_GPU::knnMatch

@ -266,9 +269,9 @@ gpu::BruteForceMatcher_GPU::knnMatch

    :param queryDescs: Query set of descriptors.
    :param trainDescs: Training set of descriptors. It is not be added to train descriptors collection stored in the class object.
-    :param trainIdx: The output matrix of ``queryDescs.rows x k`` size and ``CV_32SC1`` type. ``trainIdx.at<int>(i, j)`` contains an index of the j-th best match for the i-th query descriptor. If some query descriptors are masked out in ``mask``, it will contains -1.
-    :param distance: The output matrix of ``queryDescs.rows x k`` size and ``CV_32FC1`` type. ``distance.at<float>(i, j)`` contains a distance from the j-th best match for the i-th query descriptor to the query descriptor. If some query descriptors are masked out in ``mask``, it will contain ``FLT_MAX``.
-    :param allDist: The floating-point matrix of the size ``queryDescs.rows x trainDescs.rows``. This is a buffer to store all distances between each query descriptors and each train descriptor. On output, ``allDist.at<float>(queryIdx, trainIdx)`` will contain ``FLT_MAX`` if ``trainIdx`` is one from k best.
+    :param trainIdx: The output matrix of ``queryDescs.rows x k`` size and ``CV_32SC1`` type. ``trainIdx.at<int>(i, j)`` contains an index of the j-th best match for the i-th query descriptor. If some query descriptors are masked out in ``mask``, it contains -1.
+    :param distance: The output matrix of ``queryDescs.rows x k`` size and ``CV_32FC1`` type. ``distance.at<float>(i, j)`` contains a distance from the j-th best match for the i-th query descriptor to the query descriptor. If some query descriptors are masked out in ``mask``, it contains ``FLT_MAX``.
+    :param allDist: The floating-point matrix of the size ``queryDescs.rows x trainDescs.rows``. This is a buffer to store all distances between each query descriptor and each train descriptor. On output, ``allDist.at<float>(queryIdx, trainIdx)`` contains ``FLT_MAX`` if ``trainIdx`` is one of the k best.

    :param k: Number of the best matches per each query descriptor (or less if it is not possible).

@ -280,7 +283,7 @@ gpu::BruteForceMatcher_GPU::knnMatchDownload
 ------------------------------------------------
 .. cpp:function:: void gpu::BruteForceMatcher_GPU::knnMatchDownload(const GpuMat& trainIdx, const GpuMat& distance, std::vector< std::vector<DMatch> >&matches, bool compactResult = false)

-    Downloads ``trainIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::knnMatch` to CPU vector with :c:type:`DMatch`. If ``compactResult`` is true ``matches`` vector will not contain matches for fully masked out query descriptors.
+    Downloads the ``trainIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::knnMatch` to a CPU vector of :c:type:`DMatch`. If ``compactResult`` is true, the ``matches`` vector does not contain matches for fully masked-out query descriptors.

 .. index:: gpu::BruteForceMatcher_GPU::radiusMatch

@ -292,11 +295,10 @@ gpu::BruteForceMatcher_GPU::radiusMatch

 .. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatch(const GpuMat& queryDescs, std::vector< std::vector<DMatch> >&matches, float maxDistance, const std::vector<GpuMat>&masks = std::vector<GpuMat>(), bool compactResult = false)

-    This function works only on devices with the compute capability
-:math:`>=` 1.1.
+    This function works only on devices with compute capability :math:`>=` 1.1.

 See Also:
-:func:`DescriptorMatcher::radiusMatch` .
+:func:`DescriptorMatcher::radiusMatch` 

 .. index:: gpu::BruteForceMatcher_GPU::radiusMatch

@ -310,9 +312,9 @@ gpu::BruteForceMatcher_GPU::radiusMatch
    
    :param trainDescs: Training set of descriptors. It is not added to train descriptors collection stored in the class object.
    
-    :param trainIdx: ``trainIdx.at<int>(i, j)`` is the index of j-th training descriptor which is close enough to i-th query descriptor. If ``trainIdx`` is empty, it is created with the size ``queryDescs.rows x trainDescs.rows``. When the matrix is pre-allocated, it can have less than ``trainDescs.rows`` columns. Then the function will return as many matches for each query descriptors as fit into the matrix.
+    :param trainIdx: ``trainIdx.at<int>(i, j)`` is the index of the j-th training descriptor that is close enough to the i-th query descriptor. If ``trainIdx`` is empty, it is created with the size ``queryDescs.rows x trainDescs.rows``. When the matrix is pre-allocated, it can have fewer than ``trainDescs.rows`` columns. Then, the function returns as many matches for each query descriptor as fit into the matrix.
    
-    :param nMatches: ``nMatches.at<unsigned int>(0, i)`` contains the number of matching descriptors for the i-th query descriptor. The value can be larger than ``trainIdx.cols`` - it means that the function could not store all the matches since it did not have enough memory.
+    :param nMatches: ``nMatches.at<unsigned int>(0, i)`` contains the number of matching descriptors for the i-th query descriptor. The value can be larger than ``trainIdx.cols``, which means that the function could not store all the matches because it did not have enough memory.
    
    :param distance: ``distance.at<int>(i, j)`` Distance between the j-th match for the j-th query descriptor and this very query descriptor. The matrix has the ``CV_32FC1`` type and the same size as ``trainIdx``.

@ -328,5 +330,5 @@ gpu::BruteForceMatcher_GPU::radiusMatchDownload
 ---------------------------------------------------
 .. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatchDownload(const GpuMat& trainIdx, const GpuMat& nMatches, const GpuMat& distance, std::vector< std::vector<DMatch> >&matches, bool compactResult = false)

-    Downloads ``trainIdx``, ``nMatches`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::radiusMatch` to CPU vector with :c:type:`DMatch`. If ``compactResult`` is true ``matches`` vector will not contain matches for fully masked out query descriptors.
+    Downloads the ``trainIdx``, ``nMatches``, and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::radiusMatch` to a CPU vector of :c:type:`DMatch`. If ``compactResult`` is true, the ``matches`` vector does not contain matches for fully masked-out query descriptors.

--- a/modules/gpu/doc/image_filtering.rst
+++ b/modules/gpu/doc/image_filtering.rst
@ -13,7 +13,7 @@ gpu::BaseRowFilter_GPU
 ----------------------
 .. cpp:class:: gpu::BaseRowFilter_GPU

-The base class for linear or non-linear filters that processes rows of 2D arrays. Such filters are used for the "horizontal" filtering passes in separable filters. ::
+This is a base class for linear or non-linear filters that process rows of 2D arrays. Such filters are used for the "horizontal" filtering passes in separable filters. ::

    class BaseRowFilter_GPU
    {
@ -25,7 +25,9 @@ The base class for linear or non-linear filters that processes rows of 2D arrays
    };


-**Note:** This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.
+**Note:** 
+
+This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.

 .. index:: gpu::BaseColumnFilter_GPU

@ -33,7 +35,7 @@ gpu::BaseColumnFilter_GPU
 -------------------------
 .. cpp:class:: gpu::BaseColumnFilter_GPU

-The base class for linear or non-linear filters that processes columns of 2D arrays. Such filters are used for the "vertical" filtering passes in separable filters. ::
+This is a base class for linear or non-linear filters that process columns of 2D arrays. Such filters are used for the "vertical" filtering passes in separable filters. ::

    class BaseColumnFilter_GPU
    {
@ -46,6 +48,7 @@ The base class for linear or non-linear filters that processes columns of 2D arr


 **Note:**
+
 This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.

 .. index:: gpu::BaseFilter_GPU
@ -54,7 +57,7 @@ gpu::BaseFilter_GPU
 -------------------
 .. cpp:class:: gpu::BaseFilter_GPU

-The base class for non-separable 2D filters. ::
+This is a base class for non-separable 2D filters. ::

    class CV_EXPORTS BaseFilter_GPU
    {
@ -68,6 +71,7 @@ The base class for non-separable 2D filters. ::


 **Note:**
+
 This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.

 .. index:: gpu::FilterEngine_GPU
@ -76,7 +80,7 @@ gpu::FilterEngine_GPU
 ---------------------
 .. cpp:class:: gpu::FilterEngine_GPU

-The base class for Filter Engine. ::
+This is a base class for Filter Engine. ::

    class CV_EXPORTS FilterEngine_GPU
    {
@ -89,9 +93,10 @@ The base class for Filter Engine. ::


 The class can be used to apply an arbitrary filtering operation to an image. It contains all the necessary intermediate buffers. Pointers to the initialized ``FilterEngine_GPU`` instances are returned by various ``create*Filter_GPU`` functions (see below), and they are used inside high-level functions such as
-:func:`gpu::filter2D`,:func:`gpu::erode`,:func:`gpu::Sobel` , and others.
+:func:`gpu::filter2D`, :func:`gpu::erode`, :func:`gpu::Sobel` , and others.

-By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary memory allocation for intermediate buffers and get much better performance: ::
+By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary memory allocation for intermediate buffers and get much better performance: 
+::

    while (...)
    {
@ -113,9 +118,11 @@ By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary mem
    // Release buffers only once
    filter.release();

- ``FilterEngine_GPU`` can process a rectangular sub-region of an image. By default, if ``roi == Rect(0,0,-1,-1)``,``FilterEngine_GPU`` processes the inner region of an image ( ``Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height)`` ), because some filters do not check whether indices are outside the image for better perfomance. See below to understand which filters support processing the whole image and which do not and identify image type limitations.
+ ``FilterEngine_GPU`` can process a rectangular sub-region of an image. By default, if ``roi == Rect(0,0,-1,-1)``, ``FilterEngine_GPU`` processes the inner region of an image ( ``Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height)`` ), because some filters, for better performance, do not check whether indices are outside the image. See below to understand which filters support processing the whole image and which do not, and to identify image type limitations.

-**Note:** The GPU filters do not support the in-place mode.
+**Note:** 
+
+The GPU filters do not support the in-place mode.

 See also: :cpp:class:`gpu::BaseRowFilter_GPU`, :cpp:class:`gpu::BaseColumnFilter_GPU`, :cpp:class:`gpu::BaseFilter_GPU`, :cpp:func:`gpu::createFilter2D_GPU`, :cpp:func:`gpu::createSeparableFilter_GPU`, :cpp:func:`gpu::createBoxFilter_GPU`, :cpp:func:`gpu::createMorphologyFilter_GPU`, :cpp:func:`gpu::createLinearFilter_GPU`, :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :cpp:func:`gpu::createDerivFilter_GPU`, :cpp:func:`gpu::createGaussianFilter_GPU`.

@ -171,7 +178,9 @@ gpu::getRowSumFilter_GPU

    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

 .. index:: gpu::getColumnSumFilter_GPU

@ -189,7 +198,9 @@ gpu::getColumnSumFilter_GPU

    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

 .. index:: gpu::createBoxFilter_GPU

@ -209,9 +220,11 @@ gpu::createBoxFilter_GPU

    :param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`boxFilter`.
+See Also: :c:func:`boxFilter`

 .. index:: gpu::boxFilter

@ -231,9 +244,11 @@ gpu::boxFilter

    :param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`boxFilter`.
+See Also: :c:func:`boxFilter`

 .. index:: gpu::blur

@ -251,9 +266,11 @@ gpu::blur

    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`blur`, :cpp:func:`gpu::boxFilter`.
+See Also: :c:func:`blur`, :cpp:func:`gpu::boxFilter`

 .. index:: gpu::createMorphologyFilter_GPU

@ -275,9 +292,11 @@ gpu::createMorphologyFilter_GPU

    :param anchor: Anchor position within the structuring element. Negative values mean that the anchor is at the center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`createMorphologyFilter`.
+See Also: :c:func:`createMorphologyFilter`

 .. index:: gpu::erode

@ -297,9 +316,11 @@ gpu::erode

    :param iterations: Number of times erosion to be applied.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`erode`.
+See Also: :c:func:`erode`

 .. index:: gpu::dilate

@ -319,9 +340,11 @@ gpu::dilate

    :param iterations: Number of times dilation to be applied.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`dilate`.
+See Also: :c:func:`dilate`

 .. index:: gpu::morphologyEx

@ -354,9 +377,11 @@ gpu::morphologyEx

    :param iterations: Number of times erosion and dilation to be applied.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`morphologyEx` .
+See Also: :c:func:`morphologyEx` 

 .. index:: gpu::createLinearFilter_GPU

@ -378,9 +403,11 @@ gpu::createLinearFilter_GPU

    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`createLinearFilter`.
+See Also: :c:func:`createLinearFilter`

 .. index:: gpu::filter2D

@ -400,9 +427,11 @@ gpu::filter2D

    :param anchor: Anchor of the kernel that indicates the relative position of a filtered point within the kernel. The anchor resides within the kernel. The special default value (-1,-1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

-See Also: :c:func:`filter2D`.
+See Also: :c:func:`filter2D`

 .. index:: gpu::Laplacian

@ -423,6 +452,7 @@ gpu::Laplacian
    :param scale: Optional scale factor for the computed Laplacian values. By default, no scaling is applied (see  :c:func:`getDerivKernels` ).

 	**Note:**
+	
 	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

 See Also: :c:func:`Laplacian`, :func:`gpu::filter2D` .
@ -471,9 +501,9 @@ gpu::getLinearColumnFilter_GPU

 	There are two versions of the algorithm: NPP and OpenCV.
 	* NPP version is called when ``dstType == CV_8UC1`` or ``dstType == CV_8UC4`` and ``bufType == dstType`` . Otherwise, the OpenCV version is called. NPP supports only ``BORDER_CONSTANT`` border type and does not check indices outside the image. 
-	* OpenCV version supports only ``CV_32F`` buffer depth and ``BORDER_REFLECT101``,``BORDER_REPLICATE``, and ``BORDER_CONSTANT`` border types. It checks indices outside image.
+	* OpenCV version supports only ``CV_32F`` buffer depth and ``BORDER_REFLECT101``, ``BORDER_REPLICATE``, and ``BORDER_CONSTANT`` border types. It checks indices outside image.
 	
-See also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :c:func:`createSeparableLinearFilter`.
+See Also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :c:func:`createSeparableLinearFilter`

 .. index:: gpu::createSeparableLinearFilter_GPU

@ -494,7 +524,7 @@ gpu::createSeparableLinearFilter_GPU
    :param rowBorderType, columnBorderType: Pixel extrapolation method in the horizontal and vertical directions. For details, see  :c:func:`borderInterpolate`. For details on limitations, see :cpp:func:`gpu::getLinearRowFilter_GPU`, :cpp:func:`gpu::getLinearColumnFilter_GPU`.


-See also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`.
+See Also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`

 .. index:: gpu::sepFilter2D

@ -516,7 +546,7 @@ gpu::sepFilter2D

    :param rowBorderType, columnBorderType: Pixel extrapolation method. For details, see  :c:func:`borderInterpolate`.

-See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`sepFilter2D`.
+See Also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`sepFilter2D`

 .. index:: gpu::createDerivFilter_GPU

@ -538,7 +568,7 @@ gpu::createDerivFilter_GPU

    :param rowBorderType, columnBorderType: Pixel extrapolation method. See  :c:func:`borderInterpolate` for details.

-See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createDerivFilter`.
+See Also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createDerivFilter`

 .. index:: gpu::Sobel

@ -564,7 +594,7 @@ gpu::Sobel

    :param rowBorderType, columnBorderType: Pixel extrapolation method. See  :c:func:`borderInterpolate` for details.

-See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Sobel`.
+See Also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Sobel`

 .. index:: gpu::Scharr

@ -588,7 +618,7 @@ gpu::Scharr

    :param rowBorderType, columnBorderType: Pixel extrapolation method. For details, see  :c:func:`borderInterpolate`  and :c:func:`Scharr` .

-See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Scharr`.
+See Also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Scharr`

 .. index:: gpu::createGaussianFilter_GPU

@ -608,7 +638,7 @@ gpu::createGaussianFilter_GPU

    :param rowBorderType, columnBorderType: Border type to use. See  :c:func:`borderInterpolate` for details.

-See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createGaussianFilter`.
+See Also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createGaussianFilter`

 .. index:: gpu::GaussianBlur

@ -628,7 +658,7 @@ gpu::GaussianBlur

    :param rowBorderType, columnBorderType: Pixel extrapolation method. See  :c:func:`borderInterpolate` for details.

-See also: :cpp:func:`gpu::createGaussianFilter_GPU`, :c:func:`GaussianBlur`.
+See Also: :cpp:func:`gpu::createGaussianFilter_GPU`, :c:func:`GaussianBlur`

 .. index:: gpu::getMaxFilter_GPU

@ -646,7 +676,9 @@ gpu::getMaxFilter_GPU

    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

 .. index:: gpu::getMinFilter_GPU

@ -664,4 +696,6 @@ gpu::getMinFilter_GPU

    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.

-	**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
+	**Note:** 
+	
+	This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
--- a/modules/gpu/doc/image_processing.rst
+++ b/modules/gpu/doc/image_processing.rst
@ -12,7 +12,7 @@ gpu::meanShiftFiltering
   TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER
   + TermCriteria::EPS, 5, 1))

-    Performs mean-shift filtering for each point of the source image. It maps each point of the source image into another point. As a result, we have a new color and new position of each point.
+    Performs mean-shift filtering for each point of the source image. It maps each point of the source image into another point. As a result, you have a new color and new position of each point.

    :param src: Source image. Only  ``CV_8UC4`` images are supported for now.

@ -22,7 +22,7 @@ gpu::meanShiftFiltering

    :param sr: Color window radius.

-    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
+    :param criteria: Termination criteria. See :c:class:`TermCriteria`.

 .. index:: gpu::meanShiftProc

@ -44,10 +44,10 @@ gpu::meanShiftProc

    :param sr: Color window radius.

-    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
+    :param criteria: Termination criteria. See :c:class:`TermCriteria`.

 See Also:
-:c:func:`gpu::meanShiftFiltering` .
+:c:func:`gpu::meanShiftFiltering` 

 .. index:: gpu::meanShiftSegmentation

@ -55,7 +55,7 @@ gpu::meanShiftSegmentation
 ------------------------------
 .. cpp:function:: void gpu::meanShiftSegmentation(const GpuMat& src, Mat& dst, int sp, int sr, int minsize, TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 5, 1))

-    Performs a mean-shift segmentation of the source image and eleminates small segments.
+    Performs a mean-shift segmentation of the source image and eliminates small segments.

    :param src: Source image. Only  ``CV_8UC4`` images are supported for now.

@ -67,7 +67,7 @@ gpu::meanShiftSegmentation

    :param minsize: Minimum segment size. Smaller segments are merged.

-    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
+    :param criteria: Termination criteria. See :c:class:`TermCriteria`.

 .. index:: gpu::integral

@ -86,7 +86,7 @@ gpu::integral
    :param sqsum: Squared integral image of the  ``CV_32FC1`` type.

 See Also:
-:c:func:`integral` .
+:c:func:`integral` 

 .. index:: gpu::sqrIntegral

@ -124,7 +124,7 @@ gpu::cornerHarris

    :param src: Source image. Only  ``CV_8UC1`` and  ``CV_32FC1`` images are supported for now.

-    :param dst: Destination image containing cornerness values. The size is the same. The type is ``CV_32FC1`` .
+    :param dst: Destination image containing cornerness values. The size is the same as that of ``src``. The type is ``CV_32FC1`` .

    :param blockSize: Neighborhood size.

@ -135,7 +135,7 @@ gpu::cornerHarris
    :param borderType: Pixel extrapolation method. Only  ``BORDER_REFLECT101`` and  ``BORDER_REPLICATE`` are supported for now.

 See Also:
-:c:func:`cornerHarris` .
+:c:func:`cornerHarris` 

 .. index:: gpu::cornerMinEigenVal

@ -159,7 +159,7 @@ gpu::cornerMinEigenVal

    :param borderType: Pixel extrapolation method. Only ``BORDER_REFLECT101`` and ``BORDER_REPLICATE`` are supported for now.

-See also: :c:func:`cornerMinEigenVal`.
+See Also: :c:func:`cornerMinEigenVal`

 .. index:: gpu::mulSpectrums

@ -183,7 +183,7 @@ gpu::mulSpectrums
    Only full (not packed) ``CV_32FC2`` complex spectrums in the interleaved format are supported for now.

 See Also:
-:c:func:`mulSpectrums` .
+:c:func:`mulSpectrums` 

 .. index:: gpu::mulAndScaleSpectrums

@ -209,7 +209,7 @@ gpu::mulAndScaleSpectrums
    Only full (not packed) ``CV_32FC2`` complex spectrums in the interleaved format are supported for now.

 See Also:
-:c:func:`mulSpectrums` .
+:c:func:`mulSpectrums` 

 .. index:: gpu::dft

@ -217,13 +217,13 @@ gpu::dft
 ------------
 .. cpp:function:: void gpu::dft(const GpuMat& src, GpuMat& dst, Size dft_size, int flags=0)

-    Performs a forward or inverse discrete Fourier transform (1D or 2D) of the floating point matrix. Use to handle real matrices (CV32FC1) and complex matrices in the interleaved format (CV32FC2).
+    Performs a forward or inverse discrete Fourier transform (1D or 2D) of the floating point matrix. Use it to handle real matrices (``CV_32FC1``) and complex matrices in the interleaved format (``CV_32FC2``).

    :param src: Source matrix (real or complex).

    :param dst: Destination matrix (real or complex).

-    :param dft_size: Size of discrete Fourier transform.
+    :param dft_size: Size of a discrete Fourier transform.

    :param flags: Optional flags:

@ -231,7 +231,7 @@ gpu::dft

            * **DFT_SCALE** Scale the result: divide it by the number of elements in the transform (obtained from  ``dft_size`` ).

-            * **DFT_INVERSE** Invert DFT. Use for complex-complex cases (real-complex and complex-real cases are respectively forward and inverse always).
+            * **DFT_INVERSE** Invert DFT. Use for complex-complex cases (real-complex and complex-real cases are always forward and inverse, respectively).

            * **DFT_REAL_OUTPUT** Specify the output as real. The source matrix is the result of real-complex transform, so the destination matrix must be real.
            
@ -239,7 +239,7 @@ gpu::dft
    The source matrix should be continuous, otherwise reallocation and data copying is performed. The function chooses an operation mode depending on the flags, size, and channel count of the source matrix:

    *
-        If the source matrix is complex and the output is not specified as real, the destination matrix is complex, has the ``dft_size``    size and ``CV_32FC2``    type. The destination matrix contains a full result of the DFT (forward or inverse).
+        If the source matrix is complex and the output is not specified as real, the destination matrix is complex and has the ``dft_size``    size and ``CV_32FC2``    type. The destination matrix contains a full result of the DFT (forward or inverse).

    *
        If the source matrix is complex and the output is specified as real, the function assumes that its input is the result of the forward transform (see next item). The destination matrix has the ``dft_size``    size and ``CV_32FC1``    type. It contains the result of the inverse DFT.
@ -248,7 +248,7 @@ gpu::dft
        If the source matrix is real (its type is ``CV_32FC1``    ), forward DFT is performed. The result of the DFT is packed into complex ( ``CV_32FC2``    ) matrix. So, the width of the destination matrix is ``dft_size.width / 2 + 1``    . But if the source is a single column, the height is reduced instead of the width.
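
The size rule for a real source can be sketched as a small standalone helper. This is an illustration only: ``Size2i`` and ``packedDftSize`` are hypothetical names, not OpenCV API, and the single-column formula (``height / 2 + 1``) is an assumption made by analogy with the width rule stated above.

```cpp
// Hypothetical stand-in for cv::Size, so the sketch needs no OpenCV headers.
struct Size2i {
    int width;
    int height;
};

// Size of the packed complex result of a forward DFT of a real matrix.
// Assumption: for a single-column source the height is reduced with the
// same "n / 2 + 1" rule that the documentation gives for the width.
Size2i packedDftSize(Size2i dft_size) {
    if (dft_size.width == 1) {
        // Single column: reduce the height instead of the width.
        return {1, dft_size.height / 2 + 1};
    }
    // General case: only the width is reduced.
    return {dft_size.width / 2 + 1, dft_size.height};
}
```

Under these assumptions, a real transform with ``dft_size`` of 8x4 yields a complex result of 5x4.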

 See Also:
-:c:func:`dft` .
+:c:func:`dft` 

 .. index:: gpu::convolve

@ -260,7 +260,7 @@ gpu::convolve
 .. cpp:function:: void gpu::convolve(const GpuMat& image, const GpuMat& templ, GpuMat& result,
   bool ccorr, ConvolveBuf& buf)

-    Computes convolution (or cross-correlation) of two images.
+    Computes a convolution (or cross-correlation) of two images.

    :param image: Source image. Only  ``CV_32FC1`` images are supported for now.

@ -280,7 +280,7 @@ gpu::ConvolveBuf
 ----------------
 .. cpp:class:: gpu::ConvolveBuf

-    Provides a memory buffer for the
+This class provides a memory buffer for the
    :c:func:`gpu::convolve` function. 
 ::

@ -302,7 +302,7 @@ gpu::ConvolveBuf::ConvolveBuf
 ---------------------------------
 .. cpp:function:: ConvolveBuf::ConvolveBuf()

-    Constructs an empty buffer that will be properly resized after the first call of the 
+    Constructs an empty buffer that is properly resized after the first call of the 
    :c:func:`convolve` function.

 .. cpp:function:: ConvolveBuf::ConvolveBuf(Size image_size, Size templ_size)
@ -329,20 +329,20 @@ gpu::matchTemplate

    The following methods are supported for the ``CV_8U`` depth images for now:

-    * CV_TM_SQDIFF
-    * CV_TM_SQDIFF_NORMED
-    * CV_TM_CCORR
-    * CV_TM_CCORR_NORMED
-    * CV_TM_CCOEFF
-    * CV_TM_CCOEFF_NORMED
+    * ``CV_TM_SQDIFF``
+    * ``CV_TM_SQDIFF_NORMED``
+    * ``CV_TM_CCORR``
+    * ``CV_TM_CCORR_NORMED``
+    * ``CV_TM_CCOEFF``
+    * ``CV_TM_CCOEFF_NORMED``

    The following methods are supported for the ``CV_32F`` images for now:

-    * CV_TM_SQDIFF
-    * CV_TM_CCORR
+    * ``CV_TM_SQDIFF``
+    * ``CV_TM_CCORR``

 See Also:
-:c:func:`matchTemplate` .
+:c:func:`matchTemplate` 

 .. index:: gpu::remap

@ -362,13 +362,13 @@ gpu::remap

    The function transforms the source image using the specified map:

-    .. math::
+.. math::

-        \texttt{dst} (x,y) =  \texttt{src} (xmap(x,y), ymap(x,y))
+    \texttt{dst} (x,y) =  \texttt{src} (xmap(x,y), ymap(x,y))

-    Values of pixels with non-integer coordinates are computed using bilinear interpolation.
+    Values of pixels with non-integer coordinates are computed using bilinear interpolation.

-See Also: :c:func:`remap` .
+See Also: :c:func:`remap` 

 .. index:: gpu::cvtColor

@ -378,7 +378,7 @@ gpu::cvtColor

 .. cpp:function:: void gpu::cvtColor(const GpuMat& src, GpuMat& dst, int code, int dcn, const Stream& stream)

-    Converts image from one color space to another.
+    Converts an image from one color space to another.

    :param src: Source image with  ``CV_8U``, ``CV_16U``, or  ``CV_32F`` depth and 1, 3, or 4 channels.

@ -390,10 +390,10 @@ gpu::cvtColor

    :param stream: Stream for the asynchronous version.

-    3-channel color spaces (like ``HSV``,``XYZ``, and so on) can be stored to a 4-channel image for better perfomance.
+    3-channel color spaces (like ``HSV``, ``XYZ``, and so on) can be stored in a 4-channel image for better performance.

 See Also:
-:func:`cvtColor` .
+:func:`cvtColor` 

 .. index:: gpu::threshold

@ -418,7 +418,7 @@ gpu::threshold
    :param stream: Stream for the asynchronous version.

 See Also:
-:func:`threshold` .
+:func:`threshold` 

 .. index:: gpu::resize

@ -428,7 +428,7 @@ gpu::resize

    Resizes an image.

-    :param src: Source image. Supports  the ``CV_8UC1`` and  ``CV_8UC4`` types.
+    :param src: Source image.  ``CV_8UC1`` and  ``CV_8UC4`` types are supported.

    :param dst: Destination image  with the same type as  ``src`` . The size is ``dsize`` (when it is non-zero) or the size is computed from  ``src.size()``, ``fx``, and  ``fy`` .

@ -453,7 +453,7 @@ gpu::resize

    :param interpolation: Interpolation method. Only  ``INTER_NEAREST`` and  ``INTER_LINEAR`` are supported.

-See Also: :func:`resize` .
+See Also: :func:`resize` 

 .. index:: gpu::warpAffine

@ -463,7 +463,7 @@ gpu::warpAffine

    Applies an affine transformation to an image.

-    :param src: Source image. Supports  ``CV_8U``, ``CV_16U``, ``CV_32S``, or  ``CV_32F`` depth and 1, 3, or 4 channels.
+    :param src: Source image.  ``CV_8U``, ``CV_16U``, ``CV_32S``, or  ``CV_32F`` depth and 1, 3, or 4 channels are supported.

    :param dst: Destination image with the same type as  ``src`` . The size is  ``dsize`` . 

@ -471,10 +471,10 @@ gpu::warpAffine

    :param dsize: Size of the destination image.

-    :param flags: Combination of interpolation methods (see  :func:`resize` ) and the optional flag  ``WARP_INVERSE_MAP`` specifying that  ``M`` is the inverse transformation (``dst=>src``). Only ``INTER_NEAREST``, ``INTER_LINEAR``, and  ``INTER_CUBIC`` interpolation methods are supported.
+    :param flags: Combination of interpolation methods (see  :func:`resize`) and the optional flag  ``WARP_INVERSE_MAP`` specifying that  ``M`` is an inverse transformation (``dst=>src``). Only ``INTER_NEAREST``, ``INTER_LINEAR``, and  ``INTER_CUBIC`` interpolation methods are supported.

 See Also:
-:func:`warpAffine` .
+:func:`warpAffine` 

 .. index:: gpu::warpPerspective

@ -495,7 +495,7 @@ gpu::warpPerspective
    :param flags: Combination of interpolation methods (see  :func:`resize` ) and the optional flag  ``WARP_INVERSE_MAP`` specifying that  ``M`` is the inverse transformation (``dst => src``). Only  ``INTER_NEAREST``, ``INTER_LINEAR``, and  ``INTER_CUBIC`` interpolation methods are supported.

 See Also:
-:func:`warpPerspective` .
+:func:`warpPerspective` 

 .. index:: gpu::rotate

@ -505,7 +505,7 @@ gpu::rotate

    Rotates an image around the origin (0,0) and then shifts it.

-    :param src: Source image. Supports  ``CV_8UC1`` and  ``CV_8UC4`` types.
+    :param src: Source image.  ``CV_8UC1`` and  ``CV_8UC4`` types are supported.

    :param dst: Destination image with the same type as  ``src`` . The size is  ``dsize`` . 

@ -520,7 +520,7 @@ gpu::rotate
    :param interpolation: Interpolation method. Only  ``INTER_NEAREST``, ``INTER_LINEAR``, and  ``INTER_CUBIC`` are supported.

 See Also:
-:func:`gpu::warpAffine` .
+:func:`gpu::warpAffine` 

 .. index:: gpu::copyMakeBorder

@ -532,7 +532,7 @@ gpu::copyMakeBorder

    :param src: Source image. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1``, and  ``CV_32FC1`` types are supported.

-    :param dst: Destination image with the same type as  ``src`` . The size is  ``Size(src.cols+left+right, src.rows+top+bottom)`` .
+    :param dst: Destination image with the same type as  ``src``. The size is  ``Size(src.cols+left+right, src.rows+top+bottom)`` .

    :param top, bottom, left, right: Number of pixels in each direction from the source image rectangle to extrapolate. For example:  ``top=1, bottom=1, left=1, right=1`` mean that 1 pixel-wide border needs to be built.

@ -540,6 +540,7 @@ gpu::copyMakeBorder

 See Also:
 :func:`copyMakeBorder`
+
 .. index:: gpu::rectStdDev

 gpu::rectStdDev
--- a/modules/gpu/doc/initalization_and_information.rst
+++ b/modules/gpu/doc/initalization_and_information.rst
@ -17,7 +17,7 @@ gpu::setDevice
 ------------------
 .. cpp:function:: void setDevice(int device)

-    Sets a device and initializes it for the current thread. If call of this function is omitted, a default device is initialized at the fist GPU usage.
+    Sets a device and initializes it for the current thread. If the call of this function is omitted, a default device is initialized at the first GPU usage.

    :param device: System index of a GPU device starting with 0.

@ -27,7 +27,7 @@ gpu::getDevice
 ------------------
 .. cpp:function:: int getDevice()

-    Returns the current device index that was set by {gpu::getDevice} or initialized by default.
+    Returns the current device index that was set by ``gpu::setDevice`` or initialized by default.

 .. index:: gpu::GpuFeature

@ -81,7 +81,7 @@ This class provides functionality for querying the specified GPU properties.
 .. Comment: two lines below look like a bug

 gpu::DeviceInfo::DeviceInfo
------------------------------- ``_``
+------------------------------- 
 .. cpp:function:: gpu::DeviceInfo::DeviceInfo()

 .. cpp:function:: gpu::DeviceInfo::DeviceInfo(int device_id)
@ -144,7 +144,7 @@ gpu::DeviceInfo::supports
 -----------------------------
 .. cpp:function:: bool gpu::DeviceInfo::supports(GpuFeature feature)

-    Provides information on GPU feature support. This function returns true if the device has the specified GPU feature, otherwise returns false.
+    Provides information on GPU feature support. This function returns ``true`` if the device has the specified GPU feature. Otherwise, it returns ``false``.

    :param feature: Feature to be checked. See :cpp:class:`gpu::GpuFeature`.

@ -154,7 +154,7 @@ gpu::DeviceInfo::isCompatible
 ---------------------------------
 .. cpp:function:: bool gpu::DeviceInfo::isCompatible()

-    Checks the GPU module and device compatibility. This function returns true if the GPU module can be run on the specified device, otherwise returns false.
+    Checks the GPU module and device compatibility. This function returns ``true`` if the GPU module can be run on the specified device. Otherwise, it returns ``false``.

 .. index:: gpu::TargetArchs

@ -164,13 +164,13 @@ gpu::TargetArchs
 ----------------
 .. cpp:class:: gpu::TargetArchs

-This class provides a set of static methods to check what NVIDIA card architecture the GPU module was built for.
+This class provides a set of static methods to check what NVIDIA* card architecture the GPU module was built for.

 The following method checks whether the module was built with the support of the given feature:

-.. cpp:function:: static bool gpu::TargetArchs::builtWith(GpuFeature feature)
+	.. cpp:function:: static bool gpu::TargetArchs::builtWith(GpuFeature feature)

-    :param feature: Feature to be checked. See :cpp:class:`gpu::GpuFeature`.
+		:param feature: Feature to be checked. See :cpp:class:`gpu::GpuFeature`.

 There is a set of methods to check whether the module contains intermediate (PTX) or binary GPU code for the given architecture(s):

@ -192,7 +192,7 @@ There is a set of methods to check whether the module contains intermediate (PTX

        :param minor: Minor compute capability version.

-    According to the CUDA C Programming Guide Version 3.2: "PTX code produced for some specific compute capability can always be compiled to binary code of greater or equal compute capability".
+According to the CUDA C Programming Guide Version 3.2: "PTX code produced for some specific compute capability can always be compiled to binary code of greater or equal compute capability".


 .. index:: gpu::MultiGpuManager
@ -201,7 +201,7 @@ gpu::MultiGpuManager
 --------------------
 .. cpp:class:: gpu::MultiGpuManager

-Provides functionality for working with many GPUs. ::
+This class provides functionality for working with many GPUs. ::

    class MultiGpuManager
    {
@ -229,7 +229,7 @@ gpu::MultiGpuManager::MultiGpuManager
 ----------------------------------------
 .. cpp:function:: gpu::MultiGpuManager::MultiGpuManager()

-    Creates multi GPU manager, but doesn't initialize it.
+    Creates a multi-GPU manager but does not initialize it.



@ -239,7 +239,7 @@ gpu::MultiGpuManager::~MultiGpuManager
 ----------------------------------------
 .. cpp:function:: gpu::MultiGpuManager::~MultiGpuManager()

-    Releases multi GPU manager.
+    Releases a multi-GPU manager.



@ -249,7 +249,7 @@ gpu::MultiGpuManager::init
 ----------------------------------------
 .. cpp:function:: void gpu::MultiGpuManager::init()

-    Initializes multi GPU manager.
+    Initializes a multi-GPU manager.



@ -259,9 +259,9 @@ gpu::MultiGpuManager::gpuOn
 ----------------------------------------
 .. cpp:function:: void gpu::MultiGpuManager::gpuOn(int gpu_id)

-    Makes the given GPU active.
+    Activates the given GPU.
    
-    :param gpu_id: Index of the GPU device in system starting with 0.
+    :param gpu_id: System index of the GPU device starting with 0.



@ -271,5 +271,5 @@ gpu::MultiGpuManager::gpuOff
 ----------------------------------------
 .. cpp:function:: void gpu::MultiGpuManager::gpuOff()

-    Finishes the piece of work on the current GPU.
+    Finishes a piece of work on the current GPU.

--- a/modules/gpu/doc/introduction.rst
+++ b/modules/gpu/doc/introduction.rst
@ -6,35 +6,35 @@ GPU Module Introduction
 General Information
 -------------------

-The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using NVidia* CUDA Runtime API and supports only NVidia GPUs. The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advantage of GPU whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others), ready to be used by the application developers.
+The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using NVIDIA* CUDA Runtime API and supports only NVIDIA GPUs. The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advantage of GPU whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others), ready to be used by the application developers.

 The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.

-The GPU module depends on the CUDA Toolkit and NVidia Performance Primitives library (NPP). Make sure you have the latest versions of this software installed. You can download two libraries for all supported platforms from the NVidia site. To compile the OpenCV GPU module, you need a compiler compatible with the Cuda Runtime Toolkit.
+The GPU module depends on the CUDA Toolkit and NVIDIA Performance Primitives library (NPP). Make sure you have the latest versions of this software installed. You can download two libraries for all supported platforms from the NVIDIA site. To compile the OpenCV GPU module, you need a compiler compatible with the CUDA Runtime Toolkit.

 The OpenCV GPU module is designed for ease of use and does not require any knowledge of CUDA. However, such knowledge is certainly useful for handling non-trivial cases or achieving the highest performance. It is helpful to understand the cost of various operations, what the GPU does, what the preferred data formats are, and so on. The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then, for the best possible performance, you may still need to write your own kernels to avoid extra write and read operations on the intermediate results.

-To enable CUDA support, configure OpenCV using CMake with ``WITH_CUDA=ON`` . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module is built. Otherwise, the module is still built, but at runtime all functions from the module throw
+To enable CUDA support, configure OpenCV using ``CMake`` with ``WITH_CUDA=ON`` . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module is built. Otherwise, the module is still built, but at runtime all functions from the module throw
 :func:`Exception` with ``CV_GpuNotSupported`` error code, except for
-:func:`gpu::getCudaEnabledDeviceCount()`. The latter function returns zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using
-:func:`gpu::getCudaEnabledDeviceCount()` function, you can implement a high-level algorithm that will detect GPU presence at runtime and choose the appropriate implementation (CPU or GPU) accordingly.
+:func:`gpu::getCudaEnabledDeviceCount()`. The latter function returns zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using the
+:func:`gpu::getCudaEnabledDeviceCount()` function, you can implement a high-level algorithm that will detect GPU presence at runtime and choose an appropriate implementation (CPU or GPU) accordingly.

-Compilation for Different NVidia* Platforms
+Compilation for Different NVIDIA* Platforms
 -------------------------------------------

-NVidia* compiler enables generating binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so the compatibility with other GPUs is not guaranteed. PTX is targeted for a virtual platform that is defined entirely by the set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.
+The NVIDIA* compiler can generate binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so compatibility with other GPUs is not guaranteed. PTX targets a virtual platform that is defined entirely by its set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.

 At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails.
 By default, the OpenCV GPU module includes:

 *
-    Binaries for compute capabilities 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN``     in CMake)
+    Binaries for compute capabilities 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN``     in ``CMake``)

 *
-    PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX``     in CMake)
+    PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX``     in ``CMake``)

 This means that for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT'ed. For devices with CC 1.0, no code is available and the functions throw
-:func:`Exception`. For platforms where JIT compilation is performed first, run is slow.
+:func:`Exception`. On platforms where JIT compilation is performed, the first run is slow.

 On a GPU with CC 1.0, you can still compile the GPU module and most of the functions will run flawlessly. To achieve this, add "1.0" to the list of binaries, for example, ``CUDA_ARCH_BIN="1.0 1.3 2.0"`` . The functions that cannot be run on CC 1.0 GPUs throw an exception.

@ -44,7 +44,7 @@ You can always determine at runtime whether the OpenCV GPU-built binaries (or PT
 Threading and Multi-threading
 ------------------------------

-The OpenCV GPU module follows the CUDA Runtime API conventions regarding the multi-threaded programming. This means that for the first API call a CUDA context is created implicitly, attached to the current CPU thread and then is used as the thread's "current" context. All further operations, such as memory allocation, GPU code compilation, are associated with the context and the thread. Because any other thread is not attached to the context, memory (and other resources) allocated in the first thread cannot be accessed by the other thread. Instead, for this other thread CUDA creates another context associated with it. In short, by default, different threads do not share resources.
+The OpenCV GPU module follows the CUDA Runtime API conventions regarding multi-threaded programming. This means that for the first API call a CUDA context is created implicitly, attached to the current CPU thread, and then used as the thread's "current" context. All further operations, such as memory allocation and GPU code compilation, are associated with the context and the thread. Because no other thread is attached to the context, memory (and other resources) allocated in the first thread cannot be accessed by another thread. Instead, for this other thread CUDA creates another context associated with it. In short, by default, different threads do not share resources.

 But you can remove this limitation by using the CUDA Driver API (version 3.1 or later). You can retrieve context reference for one thread, attach it to another thread, and make it "current" for that thread. As a result, the threads can share memory and other resources. It is also possible to create a context explicitly before calling any GPU code and attach it to all the threads you want to share the resources with.

@ -56,7 +56,7 @@ Utilizing Multiple GPUs
 In the current version, each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, you have to manually distribute the work between GPUs. Here are the two ways of utilizing multiple GPUs:

 *
-    If you only use synchronous functions, create several CPU threads (one per each GPU) and from within each thread create a CUDA context for the corresponding GPU using
+    If you use only synchronous functions, create several CPU threads (one per GPU) and from within each thread create a CUDA context for the corresponding GPU using
    :func:`gpu::setDevice()`     or Driver API. Each of the threads will use the associated GPU.

 *
--- a/modules/gpu/doc/matrix_reductions.rst
+++ b/modules/gpu/doc/matrix_reductions.rst
@ -17,7 +17,7 @@ gpu::meanStdDev

    :param stddev: Standard deviation value.

-See Also: :c:func:`meanStdDev` .
+See Also: :c:func:`meanStdDev` 

 .. index:: gpu::norm

@ -37,7 +37,7 @@ gpu::norm

    :param buf: Optional buffer to avoid extra memory allocations. It is resized automatically.

-See Also: :c:func:`norm`.
+See Also: :c:func:`norm`

 .. index:: gpu::sum

@ -53,7 +53,7 @@ gpu::sum

    :param buf: Optional buffer to avoid extra memory allocations. It is resized automatically.

-See Also: :c:func:`sum` .
+See Also: :c:func:`sum` 

 .. index:: gpu::absSum

@ -103,9 +103,9 @@ gpu::minMax

    :param buf: Optional buffer to avoid extra memory allocations. It is resized automatically.

-	The Function does not work with ``CV_64F`` images on GPUs with the compute capability < 1.3.
+The function does not work with ``CV_64F`` images on GPUs with the compute capability < 1.3.
 	
-See Also: :c:func:`minMaxLoc` .
+See Also: :c:func:`minMaxLoc` 

 .. index:: gpu::minMaxLoc

@ -135,7 +135,7 @@ gpu::minMaxLoc

 	The function does not work with ``CV_64F`` images on GPU with the compute capability < 1.3.

-See Also: :c:func:`minMaxLoc` .
+See Also: :c:func:`minMaxLoc` 

 .. index:: gpu::countNonZero

@ -153,4 +153,4 @@ gpu::countNonZero

 	The function does not work with ``CV_64F`` images on GPUs with the compute capability < 1.3.
 	
-	See Also: :c:func:`countNonZero` .
+	See Also: :c:func:`countNonZero` 
--- a/modules/gpu/doc/object_detection.rst
+++ b/modules/gpu/doc/object_detection.rst
@ -9,7 +9,7 @@ gpu::HOGDescriptor
 ------------------
 .. cpp:class:: gpu::HOGDescriptor

-     Provides a histogram of Oriented Gradients [Navneet Dalal and Bill Triggs. Histogram of oriented gradients for human detection. 2005.] descriptor and detector.
+This class provides a histogram of Oriented Gradients [Navneet Dalal and Bill Triggs. Histogram of oriented gradients for human detection. 2005.] descriptor and detector.
 ::

    struct CV_EXPORTS HOGDescriptor
@ -61,7 +61,7 @@ gpu::HOGDescriptor
    }


-	Interfaces of all methods are kept similar to the ``CPU HOG`` descriptor and detector analogues as much as possible.
+Interfaces of all methods are kept similar to the ``CPU HOG`` descriptor and detector analogues as much as possible.

 .. index:: gpu::HOGDescriptor::HOGDescriptor

@ -150,17 +150,17 @@ gpu::HOGDescriptor::detect
   vector<Point>\& found_locations, double hit_threshold=0,
   Size win_stride=Size(), Size padding=Size())

-    Performs object detection without a multi-scale window.
+	Performs object detection without a multi-scale window.

-    :param img: Source image.  ``CV_8UC1``  and  ``CV_8UC4`` types are supported for now.
+	:param img: Source image.  ``CV_8UC1``  and  ``CV_8UC4`` types are supported for now.

-    :param found_locations: Left-top corner points of detected objects boundaries.
+	:param found_locations: Left-top corner points of detected objects boundaries.

    :param hit_threshold: Threshold for the distance between features and SVM classifying plane. Usually it is 0 and should be specfied in the detector coefficients (as the last free coefficient). But if the free coefficient is omitted (which is allowed), you can specify it manually here.

-    :param win_stride: Window stride. It must be a multiple of block stride.
+	:param win_stride: Window stride. It must be a multiple of block stride.

-    :param padding: Mock parameter to keep the CPU interface compatibility. Must be (0,0).
+	:param padding: Mock parameter to keep the CPU interface compatibility. It must be (0,0).

 .. index:: gpu::HOGDescriptor::detectMultiScale

@ -171,7 +171,7 @@ gpu::HOGDescriptor::detectMultiScale
   Size win_stride=Size(), Size padding=Size(),
   double scale0=1.05, int group_threshold=2)

-    Performs object detection with a multi-scale window.
+	Performs object detection with a multi-scale window.

    :param img: Source image. See  :func:`gpu::HOGDescriptor::detect`  for type limitations.

@ -181,12 +181,11 @@ gpu::HOGDescriptor::detectMultiScale

    :param win_stride: Window stride. It must be a multiple of block stride.

-    :param padding: Mock parameter to keep the CPU interface compatibility. Must be (0,0).
+    :param padding: Mock parameter to keep the CPU interface compatibility. It must be (0,0).

    :param scale0: Coefficient of the detection window increase.

-    :param group_threshold: Coefficient to regulate the similarity threshold. When detected, some objects can be covered by many rectangles. 0 means not to perform grouping. See
-    :func:`groupRectangles` .
+    :param group_threshold: Coefficient to regulate the similarity threshold. When detected, some objects can be covered by many rectangles. 0 means not to perform grouping. See  :func:`groupRectangles` .

 .. index:: gpu::HOGDescriptor::getDescriptors

@ -217,7 +216,7 @@ gpu::CascadeClassifier_GPU
 --------------------------
 .. cpp:class:: gpu::CascadeClassifier_GPU

-    The cascade classifier class used for object detection. 
+This cascade classifier class is used for object detection. 
 ::

    class CV_EXPORTS CascadeClassifier_GPU
@ -252,7 +251,7 @@ gpu::CascadeClassifier_GPU::CascadeClassifier_GPU

    Loads the classifier from a file.

-    :param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the haartraining application) and NVidia's ``nvbin`` are supported.
+    :param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the ``haartraining`` application) and NVIDIA's ``nvbin`` are supported.

 .. index:: gpu::CascadeClassifier_GPU::empty

@ -274,7 +273,7 @@ gpu::CascadeClassifier_GPU::load

    Loads the classifier from a file. The previous content is destroyed.

-    :param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the haartraining application) and NVidia's ``nvbin`` are supported.
+    :param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the ``haartraining`` application) and NVIDIA's ``nvbin`` are supported.

 .. index:: gpu::CascadeClassifier_GPU::release

@ -294,7 +293,7 @@ gpu::CascadeClassifier_GPU::detectMultiScale

    :param image: Matrix of type  ``CV_8U``  containing an image where objects should be detected.

-    :param objects: Buffer to store detected objects (rectangles). If it is empty, it is allocated with the default size. If not empty, the function searches not more than N objects, where N = sizeof(objectsBufer's data)/sizeof(cv::Rect).
+    :param objects: Buffer to store detected objects (rectangles). If it is empty, it is allocated with the default size. If it is not empty, the function searches for at most N objects, where ``N = sizeof(objectsBuffer's data)/sizeof(cv::Rect)``.

    :param scaleFactor: Value to specify how much the image size is reduced at each image scale.

@ -302,7 +301,8 @@ gpu::CascadeClassifier_GPU::detectMultiScale

    :param minSize: Minimum possible object size. Objects smaller than that are ignored.

-    The function returns the number of detected objects, so you can retrieve them as in the following example: ::
+    The function returns the number of detected objects, so you can retrieve them as in the following example: 
+::

    gpu::CascadeClassifier_GPU cascade_gpu(...);

@ -324,5 +324,5 @@ gpu::CascadeClassifier_GPU::detectMultiScale
    imshow("Faces", image_cpu);


-See Also: :c:func:`CascadeClassifier::detectMultiScale` .
+See Also: :c:func:`CascadeClassifier::detectMultiScale` 

--- a/modules/gpu/doc/operations_on_matrices.rst
+++ b/modules/gpu/doc/operations_on_matrices.rst
@ -16,7 +16,7 @@ gpu::transpose
    :param dst: Destination matrix.

 See Also:
-:c:func:`transpose` .
+:c:func:`transpose` 

 .. index:: gpu::flip

@ -40,7 +40,7 @@ gpu::flip
            

 See Also:
-:c:func:`flip` .
+:c:func:`flip` 

 .. index:: gpu::LUT

@ -52,12 +52,12 @@ gpu::LUT

    :param src: Source matrix.  ``CV_8UC1``  and  ``CV_8UC3``  matrices are supported for now.

-    :param lut: Look-up table of 256 elements. Must be continuous, ``CV_8U`` matrix.
+    :param lut: Look-up table of 256 elements. It is a continuous ``CV_8U`` matrix.

-    :param dst: Destination matrix with the same depth as  ``lut``  and the same number of channels as  ``src`` .
+    :param dst: Destination matrix with the same depth as  ``lut``  and the same number of channels as  ``src``.
            

-See Also: :c:func:`LUT` .
+See Also: :c:func:`LUT` 

 .. index:: gpu::merge

@ -81,7 +81,7 @@ gpu::merge

    :param stream: Stream for the asynchronous version.

-See Also: :c:func:`merge` .
+See Also: :c:func:`merge` 

 .. index:: gpu::split

@ -103,7 +103,7 @@ gpu::split

    :param stream: Stream for the asynchronous version.

-See Also: :c:func:`split`.
+See Also: :c:func:`split`

 .. index:: gpu::magnitude

@ -119,16 +119,16 @@ gpu::magnitude

    :param xy: Source complex matrix in the interleaved format (``CV_32FC2``).
    
-    :param x: Source matrix, containing real components (``CV_32FC1``).
+    :param x: Source matrix containing real components (``CV_32FC1``).

-    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
+    :param y: Source matrix containing imaginary components (``CV_32FC1``).

    :param magnitude: Destination matrix of float magnitudes (``CV_32FC1``).

    :param stream: Stream for the asynchronous version.

 See Also:
-:c:func:`magnitude` .
+:c:func:`magnitude` 

 .. index:: gpu::magnitudeSqr

@ -144,9 +144,9 @@ gpu::magnitudeSqr

    :param xy: Source complex matrix in the interleaved format (``CV_32FC2``).

-    :param x: Source matrix, containing real components (``CV_32FC1``).
+    :param x: Source matrix containing real components (``CV_32FC1``).

-    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
+    :param y: Source matrix containing imaginary components (``CV_32FC1``).

    :param magnitude: Destination matrix of float magnitude squares (``CV_32FC1``).

@ -173,7 +173,7 @@ gpu::phase
    :param stream: Stream for the asynchronous version.

 See Also:
-:c:func:`phase` .
+:c:func:`phase` 

 .. index:: gpu::cartToPolar

@ -198,7 +198,7 @@ gpu::cartToPolar
    :param stream: Stream for the asynchronous version.

 See Also:
-:c:func:`cartToPolar` .
+:c:func:`cartToPolar` 

 .. index:: gpu::polarToCart

@ -223,4 +223,4 @@ gpu::polarToCart
    :param stream: Stream for the asynchronous version.

 See Also:
-:c:func:`polarToCart` .
+:c:func:`polarToCart` 
--- a/modules/gpu/doc/per_element_operations.rst
+++ b/modules/gpu/doc/per_element_operations.rst
@ -15,13 +15,13 @@ gpu::add

    Computes a matrix-matrix or matrix-scalar sum.

-    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1``, and ``CV_32FC1`` matrices are supported for now.

    :param src2: Second source matrix or a scalar to be added to ``src1``.

    :param dst: Destination matrix with the same size and type as ``src1``.

-See Also: :c:func:`add`.
+See Also: :c:func:`add`

 .. index:: gpu::subtract

@ -31,15 +31,15 @@ gpu::subtract

 .. cpp:function:: void gpu::subtract(const GpuMat& src1, const Scalar& src2, GpuMat& dst)

-    Computes matrix-matrix or matrix-scalar difference.
+    Computes a matrix-matrix or matrix-scalar difference.

-    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1``, and ``CV_32FC1`` matrices are supported for now.

    :param src2: Second source matrix or a scalar to be subtracted from ``src1``.

    :param dst: Destination matrix with the same size and type as ``src1``.

-See Also: :c:func:`subtract`.
+See Also: :c:func:`subtract`



@ -53,13 +53,13 @@ gpu::multiply

    Computes a matrix-matrix or matrix-scalar per-element product.

-    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1``, and ``CV_32FC1`` matrices are supported for now.

    :param src2: Second source matrix or a scalar to be multiplied by ``src1`` elements.

    :param dst: Destination matrix with the same size and type as ``src1``.

-See Also: :c:func:`multiply`.
+See Also: :c:func:`multiply`


 .. index:: gpu::divide
@ -72,7 +72,7 @@ gpu::divide

    Computes a matrix-matrix or matrix-scalar sum.

-    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param src1: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1``, and ``CV_32FC1`` matrices are supported for now.

    :param src2: Second source matrix or a scalar. The ``src1`` elements are divided by it.

@ -80,7 +80,7 @@ gpu::divide

 	This function, in contrast to :c:func:`divide`, uses a round-down rounding mode.

-See Also: :c:func:`divide`.
+See Also: :c:func:`divide`



@ -96,7 +96,7 @@ gpu::exp

    :param dst: Destination matrix with the same size and type as ``src``.

-See Also: :c:func:`exp`.
+See Also: :c:func:`exp`



@ -112,7 +112,7 @@ gpu::log

    :param dst: Destination matrix with the same size and type as ``src``.

-See Also: :c:func:`log`.
+See Also: :c:func:`log`



@ -132,7 +132,7 @@ gpu::absdiff

    :param dst: Destination matrix with the same size and type as ``src1``.

-See Also: :c:func:`absdiff`.
+See Also: :c:func:`absdiff`

 .. index:: gpu::compare

@ -157,7 +157,7 @@ gpu::compare
            * **CMP_LE:** ``src1(.) <= src2(.)``
            * **CMP_NE:** ``src1(.) != src2(.)``

-See Also: :c:func:`compare`.
+See Also: :c:func:`compare`


 .. index:: gpu::bitwise_not
@ -168,7 +168,7 @@ gpu::bitwise_not

 .. cpp:function:: void gpu::bitwise_not(const GpuMat& src, GpuMat& dst, const GpuMat& mask, const Stream& stream)

-    Performs per-element bitwise inversion.
+    Performs a per-element bitwise inversion.

    :param src: Source matrix.

@ -188,7 +188,7 @@ gpu::bitwise_or

 .. cpp:function:: void gpu::bitwise_or(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)

-    Performs per-element bitwise disjunction of two matrices.
+    Performs a per-element bitwise disjunction of two matrices.

    :param src1: First source matrix.

@ -210,7 +210,7 @@ gpu::bitwise_and

 .. cpp:function:: void gpu::bitwise_and(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)

-    Performs per-element bitwise conjunction of two matrices.
+    Performs a per-element bitwise conjunction of two matrices.

    :param src1: First source matrix.

@ -232,7 +232,7 @@ gpu::bitwise_xor

 .. cpp:function:: void gpu::bitwise_xor(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)

-    Performs per-element bitwise "exclusive or" of two matrices.
+    Performs a per-element bitwise "exclusive or" operation on two matrices.

    :param src1: First source matrix.

@ -268,7 +268,7 @@ gpu::min

    :param stream: Stream for the asynchronous version.

-See Also: :c:func:`min`.
+See Also: :c:func:`min`



@ -294,4 +294,4 @@ gpu::max

    :param stream: Stream for the asynchronous version.

-See Also: :c:func:`max`.
+See Also: :c:func:`max`