added videoio docs and tutorials

This commit is contained in:
Ishank gulati
2015-12-17 10:16:10 +05:30
committed by ishank08
parent 8d79285d02
commit 24da1ba3dc
15 changed files with 30 additions and 25 deletions


@@ -0,0 +1,19 @@
Video Input and Output (videoio module) {#tutorial_table_of_content_videoio}
=========================================
This section contains tutorials on how to read and save your video files.

- @subpage tutorial_video_input_psnr_ssim

    *Compatibility:* \> OpenCV 2.0

    *Author:* Bernát Gábor

    You will learn how to read video streams, and how to calculate similarity values such as PSNR
    or SSIM.

- @subpage tutorial_video_write

    *Compatibility:* \> OpenCV 2.0

    *Author:* Bernát Gábor

    You will learn how to write video files with OpenCV.


@@ -0,0 +1,251 @@
Video Input with OpenCV and similarity measurement {#tutorial_video_input_psnr_ssim}
==================================================
Goal
----
Today it is common to have a digital video recording system at your disposal. Therefore, you will
eventually come to the situation where you no longer process a batch of images, but video streams.
These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files
stored on a hard disk drive. Luckily OpenCV treats these two in the same manner, with the same C++
class. So here's what you'll learn in this tutorial:
- How to open and read video streams
- Two ways for checking image similarity: PSNR and SSIM
The source code
---------------
As a test case to show these off, I've created a small program that reads in two video files and
performs a similarity check between them. This is something you could use to check just how well a
new video compression algorithm works. Let there be a reference (original) video
like [this small Megamind clip
](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/video/Megamind.avi) and [a compressed
version of it ](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/video/Megamind_bugy.avi).
You may also find the source code and these video files in the
`samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/` folder of the OpenCV source library.
@include cpp/tutorial_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp
How to read a video stream (online-camera or offline-file)?
-----------------------------------------------------------
Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture
C++ class. This itself builds on the FFmpeg open source library, which is a basic dependency of
OpenCV, so you shouldn't need to worry about it. A video is composed of a succession of images,
which we refer to in the literature as frames. In the case of a video file there is a *frame rate*
specifying how much time passes between two frames. While video cameras usually have a limit on how
many frames they can digitize per second, this property is less important, since at any given time
the camera simply captures the current snapshot of the world.
The first thing you need to do is assign a source to the @ref cv::VideoCapture class. You can do
this either via the @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open function. If the argument is an
integer, the class binds to a camera (a device); the number passed here is the ID of the device,
assigned by the operating system. If you have a single camera attached to your system its ID will
probably be zero, with further devices counting up from there. If the parameter passed is a string,
it refers to a video file, and the string gives the location and name of the file.
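As a quick illustration of the two forms, here is a minimal sketch (not part of the sample program;
the device ID and file name are only placeholders):
@code{.cpp}
VideoCapture cameraCapture(0);                   // bind to the first camera the OS reports
VideoCapture fileCapture("video/Megamind.avi");  // open a video file via a (relative) path

VideoCapture lateCapture;                        // the open function is equivalent
lateCapture.open("video/Megamind.avi");          // to passing the source to the constructor
@endcode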
For example, a valid command line for the source code above is:
@code{.bash}
video/Megamind.avi video/Megamind_bugy.avi 35 10
@endcode
We do a similarity check, which requires a reference and a test case video file. The first two
arguments specify these. Here we use relative paths: the application will look into its current
working directory, open the *video* folder, and try to find *Megamind.avi* and *Megamind_bugy.avi*
inside it.
@code{.cpp}
const string sourceReference = argv[1],sourceCompareWith = argv[2];
VideoCapture captRefrnc(sourceReference);
// or
VideoCapture captUndTst;
captUndTst.open(sourceCompareWith);
@endcode
To check whether binding the class to a video source was successful, use the @ref cv::VideoCapture::isOpened
function:
@code{.cpp}
if (!captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
@endcode
Closing the video happens automatically when the object's destructor is called. However, if you want
to close it before that you need to call its @ref cv::VideoCapture::release function. The frames of the video are just
simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture object and put
them into a *Mat* object. The video streams are sequential. You may get the frames one after another
using the @ref cv::VideoCapture::read function or the overloaded \>\> operator:
@code{.cpp}
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;
captUndTst.read(frameUnderTest);  // the read function is equivalent to the stream operator above
@endcode
The read operations above will leave the *Mat* objects empty if no frame could be acquired (either
because the video stream was closed or because you reached the end of the video file). We can check
for this with a simple if:
@code{.cpp}
if (frameReference.empty() || frameUnderTest.empty())
{
    // exit the program
}
@endcode
A read operation consists of a frame grab followed by a decode step applied to it. You may call these
two explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.
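Here is a minimal sketch of the explicit two-step form, reusing the reference capture from above:
@code{.cpp}
Mat grabbedFrame;
if (captRefrnc.grab())                   // grab the next frame from the stream
    captRefrnc.retrieve(grabbedFrame);   // decode the grabbed frame into a Mat
@endcode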
Videos have a lot of information attached to them besides the content of the frames. These are
usually numbers, although in some cases they may be short character sequences (4 bytes or less). To
acquire this information there is a general function named @ref cv::VideoCapture::get that returns double
values for these properties. Use bitwise operations to decode the characters from the double when the
property is a character code, and a simple cast when only integer values are valid. Its single
argument is the ID of the queried property. For example, here we get the size of the frames in the
reference and the test case video file, plus the number of frames inside the reference:
@code{.cpp}
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
     uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),    // size of the test case frames
                 (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << " Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
@endcode
When you are working with videos you may often want to control these values yourself. To do this
there is a @ref cv::VideoCapture::set function. Its first argument is the ID of the property you want to
change, and the second is a double containing the value to be set. It returns true if it succeeds and
false otherwise. A good example of its use is seeking in a video file to a given time or frame:
@code{.cpp}
captRefrnc.set(CAP_PROP_POS_MSEC, 1200);  // go to 1.2 seconds into the video (the position is given in milliseconds)
captRefrnc.set(CAP_PROP_POS_FRAMES, 10); // go to the 10th frame of the video
// now a read operation would read the frame at the set position
@endcode
For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and
@ref cv::VideoCapture::set functions.
Image similarity - PSNR and SSIM
--------------------------------
We want to check just how imperceptible our video conversion was, therefore we need a system to
check the similarity or difference frame by frame. The most common algorithm used for this is PSNR
(**Peak Signal-to-Noise Ratio**). The simplest definition of this starts out from the *mean squared
error*. Let there be two images, I1 and I2, with a two dimensional size of i by j, composed of c
channels.
\f[MSE = \frac{1}{c*i*j} \sum{(I_1-I_2)^2}\f]
Then the PSNR is expressed as:
\f[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\f]
Here \f$MAX_I\f$ is the maximum valid value of a pixel. In the case of a simple image with one byte
per pixel per channel this is 255. When the two images are the same the MSE will be zero, resulting
in an invalid division by zero in the PSNR formula. In this case the PSNR is undefined, so we'll need
to handle that case separately. The transition to a logarithmic scale is made because pixel values
have a very wide dynamic range. For example, for an 8-bit image with an MSE of 100 the PSNR is
\f$10 \cdot \log_{10}(255^2/100) \approx 28.1\f$ dB. All this translated to OpenCV and a C++ function
looks like:
@code{.cpp}
double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel

    double sse = s.val[0] + s.val[1] + s.val[2];  // sum channels

    if (sse <= 1e-10)          // for small values return zero
        return 0;
    else
    {
        double mse  = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}
@endcode
Typical result values for video compression are anywhere between 30 and 50, where higher is better.
If the images differ significantly you'll get much lower values, around 15 or so. This similarity
check is easy and fast to calculate, however in practice it may turn out to be somewhat inconsistent
with human perception. The **structural similarity** (SSIM) algorithm aims to correct this.
Describing the method goes well beyond the purpose of this tutorial. For that I invite you to read
the article introducing it. Nevertheless, you can get a good sense of it by looking at the OpenCV
implementation below.
@sa
SSIM is described in more depth in: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli,
"Image quality assessment: From error visibility to structural similarity," IEEE Transactions on
Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
@code{.cpp}
Scalar getMSSIM(const Mat& i1, const Mat& i2)
{
    const double C1 = 6.5025, C2 = 58.5225;
    /***************************** INITS **********************************/
    int d = CV_32F;

    Mat I1, I2;
    i1.convertTo(I1, d);        // cannot calculate on one byte large values
    i2.convertTo(I2, d);

    Mat I2_2  = I2.mul(I2);     // I2^2
    Mat I1_2  = I1.mul(I1);     // I1^2
    Mat I1_I2 = I1.mul(I2);     // I1 * I2

    /*********************** PRELIMINARY COMPUTING ************************/
    Mat mu1, mu2;               // local means
    GaussianBlur(I1, mu1, Size(11, 11), 1.5);
    GaussianBlur(I2, mu2, Size(11, 11), 1.5);

    Mat mu1_2   = mu1.mul(mu1);
    Mat mu2_2   = mu2.mul(mu2);
    Mat mu1_mu2 = mu1.mul(mu2);

    Mat sigma1_2, sigma2_2, sigma12;  // local variances and covariance

    GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
    sigma1_2 -= mu1_2;

    GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
    sigma2_2 -= mu2_2;

    GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
    sigma12 -= mu1_mu2;

    ///////////////////////////////// FORMULA ////////////////////////////////
    Mat t1, t2, t3;

    t1 = 2 * mu1_mu2 + C1;
    t2 = 2 * sigma12 + C2;
    t3 = t1.mul(t2);            // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

    t1 = mu1_2 + mu2_2 + C1;
    t2 = sigma1_2 + sigma2_2 + C2;
    t1 = t1.mul(t2);            // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

    Mat ssim_map;
    divide(t3, t1, ssim_map);   // ssim_map = t3./t1;

    Scalar mssim = mean(ssim_map);  // mssim = average of ssim map
    return mssim;
}
@endcode
This will return a similarity index for each channel of the image. The value is between zero and
one, where one corresponds to a perfect match. Unfortunately, the repeated Gaussian blurring is quite
costly, so while PSNR may work in a real-time-like environment (24 frames per second), SSIM will take
significantly longer to reach a similar throughput.
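Because of this cost difference, a common strategy (and the one described below for the sample) is to
compute PSNR on every frame and fall back to SSIM only when the PSNR drops below a trigger value. A
condensed sketch of such a loop, where the name psnrTriggerValue is chosen here purely for
illustration:
@code{.cpp}
Mat frameReference, frameUnderTest;
double psnrTriggerValue = 35;                  // run SSIM only below this PSNR (dB)

for (;;)
{
    captRefrnc >> frameReference;
    captUndTst >> frameUnderTest;

    if (frameReference.empty() || frameUnderTest.empty())
        break;                                 // one of the streams has ended

    double psnr = getPSNR(frameReference, frameUnderTest);
    cout << "PSNR: " << psnr << " dB";

    if (psnr < psnrTriggerValue && psnr > 0)   // skip the costly SSIM when PSNR is good enough
    {
        Scalar mssim = getMSSIM(frameReference, frameUnderTest);
        cout << "  MSSIM: B " << mssim.val[0] << " G " << mssim.val[1] << " R " << mssim.val[2];
    }
    cout << endl;
}
@endcode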
Therefore, the source code presented at the start of the tutorial performs the PSNR measurement for
each frame, and the SSIM only for the frames where the PSNR falls below an input value. For
visualization purposes we show both images in an OpenCV window and print the PSNR and MSSIM values to
the console. Expect to see something like:
![](images/outputVideoInput.png)
You may observe a runtime instance of this [on YouTube here](https://www.youtube.com/watch?v=iOcNljutOgg).
\htmlonly
<div align="center">
<iframe title="Video Input with OpenCV (Plus PSNR and MSSIM)" width="560" height="349" src="http://www.youtube.com/embed/iOcNljutOgg?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly

@@ -0,0 +1,160 @@
Creating a video with OpenCV {#tutorial_video_write}
============================
Goal
----
Whenever you work with video feeds you may eventually want to save your image processing result in
the form of a new video file. For simple video outputs you can use the built-in OpenCV @ref cv::VideoWriter
class, designed for this purpose. In this tutorial you will learn:
- How to create a video file with OpenCV
- What type of video files you can create with OpenCV
- How to extract a given color channel from a video
As a simple demonstration I'll just extract one of the BGR color channels of an input video file
into a new video. You can control the flow of the application from its command line arguments:
- The first argument points to the video file to work on
- The second argument may be one of the characters: R G B. This will specify which of the channels
to extract.
- The last argument is the character Y (Yes) or N (No). If this is N, the output will use the same
  codec as the input video file. Otherwise (Y), a window will pop up and allow you to select the
  codec to use yourself.
For example, a valid command line would look like:
@code{.bash}
video-write.exe video/Megamind.avi R Y
@endcode
The source code
---------------
You may also find the source code in the
`samples/cpp/tutorial_code/videoio/video-write/` folder of the OpenCV source library, or [download it
from here ](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-write/video-write.cpp).
@include cpp/tutorial_code/videoio/video-write/video-write.cpp
The structure of a video
------------------------
To start, you should have an idea of how a video file looks. Every video file is a container. The
type of the container is indicated by the file's extension (for example *avi*, *mov* or *mkv*). The
container holds multiple elements: video feeds, audio feeds, or other tracks (such as subtitles). How
these feeds are stored is determined by the codec used for each one of them. For audio tracks,
commonly used codecs are *mp3* or *aac*. For video the list is somewhat longer and includes names
such as *XVID*, *DIVX*, *H264* or *LAGS* (*Lagarith Lossless Codec*). The full list of codecs you may
use on a system depends on which ones you have installed.
![](images/videoFileStructure.png)
As you can see, things can get really complicated with videos. However, OpenCV is mainly a computer
vision library, not a video streaming, codec, and writing library. Therefore, the developers tried to
keep this part as simple as possible. As a result, for video containers OpenCV supports only the
*avi* extension, in its first version. A direct limitation of this is that you cannot save a video
file larger than 2 GB. Furthermore, you can only create and expand a single video track inside the
container; there is no support for audio or other track editing. Nevertheless, any video codec
present on your system might work. If you run into these limitations you will need to look into more
specialized video writing libraries such as *FFmpeg*, or codecs such as *HuffYUV*, *CorePNG* and
*LCL*. As an alternative, create the video track with OpenCV and expand it with sound tracks, or
convert it to other formats, using video manipulation programs such as *VirtualDub* or *AviSynth*.
The *VideoWriter* class
-----------------------
The content written here builds on the assumption that you have already read the
@ref tutorial_video_input_psnr_ssim tutorial and know how to read video files. To create a video file
you just need to create an instance of the @ref cv::VideoWriter class. You can specify its properties
either via parameters in the constructor or later on via the @ref cv::VideoWriter::open function.
Either way, the parameters are the same:
-# The name of the output file, which contains the container type in its extension. At the moment
only *avi* is supported. We construct this from the input file name, add to it the name of the
channel to use, and finish it off with the container extension.
@code{.cpp}
const string source = argv[1]; // the source file name
string::size_type pAt = source.find_last_of('.'); // Find extension point
const string NAME = source.substr(0, pAt) + argv[2][0] + ".avi"; // Form the new name with container
@endcode
-# The codec to use for the video track. Every video codec has a unique short name of at most four
characters: hence the *XVID*, *DIVX* or *H264* names. This is called a four character code (FourCC).
You may also ask for it from an input video by using its *get* function. Because the *get* function
is a general function it always returns double values. A double value is stored on 64 bits; four
characters are four bytes, meaning 32 bits. These four characters are coded in the lower 32 bits of
the *double*. A simple way to throw away the upper 32 bits is to just convert this value to *int*:
@code{.cpp}
VideoCapture inputVideo(source); // Open input
int ex = static_cast<int>(inputVideo.get(CAP_PROP_FOURCC)); // Get Codec Type- Int form
@endcode
OpenCV internally works with this integer type and expects it as its second parameter. To convert
from the integer form to a string we may use one of two methods: a bitwise operator or a union. The
first one, extracting the characters from the int, looks like this (an "and" operation, some shifting
and a 0 appended at the end to close the string):
@code{.cpp}
char EXT[] = {(char)(ex & 0XFF), (char)((ex & 0XFF00) >> 8), (char)((ex & 0XFF0000) >> 16), (char)((ex & 0XFF000000) >> 24), 0};
@endcode
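As a small usage note, you can print the decoded four character code to verify it, for example:
@code{.cpp}
cout << "Input codec type: " << EXT << endl;  // e.g. "XVID"
@endcode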
You can do the same thing with the *union* as:
@code{.cpp}
union { int v; char c[5]; } uEx;
uEx.v = ex;                      // from int to char via the union
uEx.c[4] = '\0';                 // terminate the string
@endcode
The advantage of this is that the conversion is done automatically when you assign, while with the
bitwise operator you need to redo the operations whenever the codec type changes. If you know the
codec's four character code beforehand, you can use the *CV_FOURCC* macro to build the integer:
@code{.cpp}
CV_FOURCC('P','I','M','1') // this is an MPEG1 codec, from the characters to integer
@endcode
If you pass minus one for this argument, then a window will pop up at runtime that lists all the
codecs installed on your system and asks you to select the one to use:
![](images/videoCompressSelect.png)
-# The frame rate (frames per second) for the output video. Again, here I keep the input video's
frame rate by using the *get* function.
-# The size of the frames for the output video. Here too I keep the input video's frame size by
using the *get* function.
-# The final argument is optional. By default it is true, which means the output will be in color
(so you will write three channel images). To create a grayscale video pass false here.
Here is how I use it in the sample:
@code{.cpp}
VideoWriter outputVideo;
Size S = Size((int) inputVideo.get(CAP_PROP_FRAME_WIDTH),   // acquire input size
              (int) inputVideo.get(CAP_PROP_FRAME_HEIGHT));
outputVideo.open(NAME, ex, inputVideo.get(CAP_PROP_FPS), S, true);
@endcode
Afterwards, you use the @ref cv::VideoWriter::isOpened() function to find out whether the open operation
succeeded. The video file closes automatically when the *VideoWriter* object is destroyed. After a
successful open you can send the frames of the video in sequential order by using the
@ref cv::VideoWriter::write function of the class. Alternatively, you can use its overloaded operator \<\< :
@code{.cpp}
outputVideo.write(res); //or
outputVideo << res;
@endcode
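The open check mentioned above is just the usual pattern; a minimal sketch, reusing the names from
the previous snippet, might look like:
@code{.cpp}
if (!outputVideo.isOpened())
{
    cout << "Could not open the output video for writing: " << NAME << endl;
    return -1;
}
@endcode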
Extracting a color channel from a BGR image means setting the values of the other channels to zero.
You can do this either with image scanning operations or by using the split and merge operations: you
first split the channels into separate images, replace the other channels with zero images of the
same size and type, and finally merge them back together:
@code{.cpp}
vector<Mat> spl;                            // one Mat per channel
split(src, spl);                            // process - extract only the correct channel
for (int i = 0; i < 3; ++i)
    if (i != channel)                       // channel holds the index of the channel to keep
        spl[i] = Mat::zeros(S, spl[0].type());
merge(spl, res);
@endcode
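Putting the read, channel extraction, and write steps together, the per-frame loop of the sample
boils down to something like this condensed sketch (assuming inputVideo, outputVideo, S and channel
have been set up as in the snippets above):
@code{.cpp}
Mat src, res;
vector<Mat> spl;

for (;;)
{
    inputVideo >> src;                      // read a frame from the input video
    if (src.empty())
        break;                              // end of the input stream

    split(src, spl);                        // keep only the selected channel
    for (int i = 0; i < 3; ++i)
        if (i != channel)
            spl[i] = Mat::zeros(S, spl[0].type());
    merge(spl, res);

    outputVideo << res;                     // append the frame to the output video
}
@endcode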
Put all this together and you'll get the source code above, whose runtime result will show something
along these lines:
![](images/resultOutputWideoWrite.png)
You may observe a runtime instance of this [on YouTube here](https://www.youtube.com/watch?v=jpBwHxsl1_0).
\htmlonly
<div align="center">
<iframe title="Creating a video with OpenCV" width="560" height="349" src="http://www.youtube.com/embed/jpBwHxsl1_0?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly