added videoio docs and tutorials

This commit is contained in:
Ishank gulati
2015-12-17 10:16:10 +05:30
committed by ishank08
parent 8d79285d02
commit 24da1ba3dc
15 changed files with 30 additions and 25 deletions


@@ -0,0 +1,19 @@
Video Input and Output (videoio module) {#tutorial_table_of_content_videoio}
=========================================
This section contains tutorials on how to read and save your video files.

- @subpage tutorial_video_input_psnr_ssim

    *Compatibility:* \> OpenCV 2.0

    *Author:* Bernát Gábor

    You will learn how to read video streams, and how to calculate similarity values such as PSNR
    or SSIM.

- @subpage tutorial_video_write

    *Compatibility:* \> OpenCV 2.0

    *Author:* Bernát Gábor

    You will learn how to write video files with OpenCV.


@@ -0,0 +1,251 @@
Video Input with OpenCV and similarity measurement {#tutorial_video_input_psnr_ssim}
==================================================
Goal
----
Today it is common to have a digital video recording system at your disposal. Therefore, you will
eventually come to the situation where you no longer process a batch of images, but video streams.
These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files
stored on a hard disk drive. Luckily OpenCV treats these two in the same manner, with the same C++
class. So here's what you'll learn in this tutorial:
- How to open and read video streams
- Two ways for checking image similarity: PSNR and SSIM
The source code
---------------
As a test case to show these off, I've created a small program that reads in two video files and
performs a similarity check between them. This is something you could use to check just how well a
new video compression algorithm works. Let there be a reference (original) video
like [this small Megamind clip
](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/video/Megamind.avi) and [a compressed
version of it ](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/video/Megamind_bugy.avi).
You may also find the source code and these video files in the
`samples/cpp/tutorial_code/videoio/video-input-psnr-ssim/` folder of the OpenCV source library.
@include cpp/tutorial_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp
How to read a video stream (online-camera or offline-file)?
-----------------------------------------------------------
Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture
C++ class. This itself builds on the FFmpeg open source library, which is a basic dependency of
OpenCV, so you shouldn't need to worry about it. A video is composed of a succession of images,
which we refer to in the literature as frames. In the case of a video file there is a *frame rate*
specifying how much time passes between two frames. While video cameras usually have a limit on how
many frames they can digitize per second, this property is less important, since at any given time
the camera simply captures the current snapshot of the world.
The first thing you need to do is assign a source to the @ref cv::VideoCapture class. You can do
this either via the @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open function. If the argument is an
integer, the class binds to a camera (a device); the number passed here is the ID of the device,
assigned by the operating system. If you have a single camera attached to your system its ID will
probably be zero, with further devices counting up from there. If the parameter passed is a string,
it refers to a video file, and the string gives the location and name of the file.
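As a quick illustration of the two forms, here is a minimal sketch (not part of the sample program;
the device ID and file name are only placeholders):
@code{.cpp}
VideoCapture cameraCapture(0);                   // bind to the first camera the OS reports
VideoCapture fileCapture("video/Megamind.avi");  // open a video file via a (relative) path

VideoCapture lateCapture;                        // the open function is equivalent
lateCapture.open("video/Megamind.avi");          // to passing the source to the constructor
@endcode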
For example, a valid command line for the source code above is:
@code{.bash}
video/Megamind.avi video/Megamind_bugy.avi 35 10
@endcode
We do a similarity check, which requires a reference and a test case video file. The first two
arguments specify these. Here we use relative paths: the application will look into its current
working directory, open the *video* folder, and try to find *Megamind.avi* and *Megamind_bugy.avi*
inside it.
@code{.cpp}
const string sourceReference = argv[1],sourceCompareWith = argv[2];
VideoCapture captRefrnc(sourceReference);
// or
VideoCapture captUndTst;
captUndTst.open(sourceCompareWith);
@endcode
To check whether binding the class to a video source was successful, use the @ref cv::VideoCapture::isOpened
function:
@code{.cpp}
if (!captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
@endcode
Closing the video happens automatically when the object's destructor is called. However, if you want
to close it before that you need to call its @ref cv::VideoCapture::release function. The frames of the video are just
simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture object and put
them into a *Mat* object. The video streams are sequential. You may get the frames one after another
using the @ref cv::VideoCapture::read function or the overloaded \>\> operator:
@code{.cpp}
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;
captUndTst.read(frameUnderTest);  // the read function is equivalent to the stream operator above
@endcode
The read operations above will leave the *Mat* objects empty if no frame could be acquired (either
because the video stream was closed or because you reached the end of the video file). We can check
for this with a simple if:
@code{.cpp}
if (frameReference.empty() || frameUnderTest.empty())
{
    // exit the program
}
@endcode
A read operation consists of a frame grab followed by a decode step applied to it. You may call these
two explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.
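Here is a minimal sketch of the explicit two-step form, reusing the reference capture from above:
@code{.cpp}
Mat grabbedFrame;
if (captRefrnc.grab())                   // grab the next frame from the stream
    captRefrnc.retrieve(grabbedFrame);   // decode the grabbed frame into a Mat
@endcode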
Videos have a lot of information attached to them besides the content of the frames. These are
usually numbers, although in some cases they may be short character sequences (4 bytes or less). To
acquire this information there is a general function named @ref cv::VideoCapture::get that returns double
values for these properties. Use bitwise operations to decode the characters from the double when the
property is a character code, and a simple cast when only integer values are valid. Its single
argument is the ID of the queried property. For example, here we get the size of the frames in the
reference and the test case video file, plus the number of frames inside the reference:
@code{.cpp}
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
     uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),    // size of the test case frames
                 (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << " Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
@endcode
When you are working with videos you may often want to control these values yourself. To do this
there is a @ref cv::VideoCapture::set function. Its first argument is the ID of the property you want to
change, and the second is a double containing the value to be set. It returns true if it succeeds and
false otherwise. A good example of its use is seeking in a video file to a given time or frame:
@code{.cpp}
captRefrnc.set(CAP_PROP_POS_MSEC, 1200);  // go to 1.2 seconds into the video (the position is given in milliseconds)
captRefrnc.set(CAP_PROP_POS_FRAMES, 10); // go to the 10th frame of the video
// now a read operation would read the frame at the set position
@endcode
For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and
@ref cv::VideoCapture::set functions.
Image similarity - PSNR and SSIM
--------------------------------
We want to check just how imperceptible our video conversion was, therefore we need a system to
check the similarity or difference frame by frame. The most common algorithm used for this is PSNR
(**Peak Signal-to-Noise Ratio**). The simplest definition of this starts out from the *mean squared
error*. Let there be two images, I1 and I2, with a two dimensional size of i by j, composed of c
channels.
\f[MSE = \frac{1}{c*i*j} \sum{(I_1-I_2)^2}\f]
Then the PSNR is expressed as:
\f[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\f]
Here \f$MAX_I\f$ is the maximum valid value of a pixel. In the case of a simple image with one byte
per pixel per channel this is 255. When the two images are the same the MSE will be zero, resulting
in an invalid division by zero in the PSNR formula. In this case the PSNR is undefined, so we'll need
to handle that case separately. The transition to a logarithmic scale is made because pixel values
have a very wide dynamic range. For example, for an 8-bit image with an MSE of 100 the PSNR is
\f$10 \cdot \log_{10}(255^2/100) \approx 28.1\f$ dB. All this translated to OpenCV and a C++ function
looks like:
@code{.cpp}
double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel

    double sse = s.val[0] + s.val[1] + s.val[2];  // sum channels

    if (sse <= 1e-10)          // for small values return zero
        return 0;
    else
    {
        double mse  = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}
@endcode
Typical result values for video compression are anywhere between 30 and 50, where higher is better.
If the images differ significantly you'll get much lower values, around 15 or so. This similarity
check is easy and fast to calculate, however in practice it may turn out to be somewhat inconsistent
with human perception. The **structural similarity** (SSIM) algorithm aims to correct this.
Describing the method goes well beyond the purpose of this tutorial. For that I invite you to read
the article introducing it. Nevertheless, you can get a good sense of it by looking at the OpenCV
implementation below.
@sa
SSIM is described in more depth in: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli,
"Image quality assessment: From error visibility to structural similarity," IEEE Transactions on
Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
@code{.cpp}
Scalar getMSSIM(const Mat& i1, const Mat& i2)
{
    const double C1 = 6.5025, C2 = 58.5225;
    /***************************** INITS **********************************/
    int d = CV_32F;

    Mat I1, I2;
    i1.convertTo(I1, d);        // cannot calculate on one byte large values
    i2.convertTo(I2, d);

    Mat I2_2  = I2.mul(I2);     // I2^2
    Mat I1_2  = I1.mul(I1);     // I1^2
    Mat I1_I2 = I1.mul(I2);     // I1 * I2

    /*********************** PRELIMINARY COMPUTING ************************/
    Mat mu1, mu2;               // local means
    GaussianBlur(I1, mu1, Size(11, 11), 1.5);
    GaussianBlur(I2, mu2, Size(11, 11), 1.5);

    Mat mu1_2   = mu1.mul(mu1);
    Mat mu2_2   = mu2.mul(mu2);
    Mat mu1_mu2 = mu1.mul(mu2);

    Mat sigma1_2, sigma2_2, sigma12;  // local variances and covariance

    GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
    sigma1_2 -= mu1_2;

    GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
    sigma2_2 -= mu2_2;

    GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
    sigma12 -= mu1_mu2;

    ///////////////////////////////// FORMULA ////////////////////////////////
    Mat t1, t2, t3;

    t1 = 2 * mu1_mu2 + C1;
    t2 = 2 * sigma12 + C2;
    t3 = t1.mul(t2);            // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

    t1 = mu1_2 + mu2_2 + C1;
    t2 = sigma1_2 + sigma2_2 + C2;
    t1 = t1.mul(t2);            // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

    Mat ssim_map;
    divide(t3, t1, ssim_map);   // ssim_map = t3./t1;

    Scalar mssim = mean(ssim_map);  // mssim = average of ssim map
    return mssim;
}
@endcode
This will return a similarity index for each channel of the image. The value is between zero and
one, where one corresponds to a perfect match. Unfortunately, the repeated Gaussian blurring is quite
costly, so while PSNR may work in a real-time-like environment (24 frames per second), SSIM will take
significantly longer to reach a similar throughput.
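Because of this cost difference, a common strategy (and the one described below for the sample) is to
compute PSNR on every frame and fall back to SSIM only when the PSNR drops below a trigger value. A
condensed sketch of such a loop, where the name psnrTriggerValue is chosen here purely for
illustration:
@code{.cpp}
Mat frameReference, frameUnderTest;
double psnrTriggerValue = 35;                  // run SSIM only below this PSNR (dB)

for (;;)
{
    captRefrnc >> frameReference;
    captUndTst >> frameUnderTest;

    if (frameReference.empty() || frameUnderTest.empty())
        break;                                 // one of the streams has ended

    double psnr = getPSNR(frameReference, frameUnderTest);
    cout << "PSNR: " << psnr << " dB";

    if (psnr < psnrTriggerValue && psnr > 0)   // skip the costly SSIM when PSNR is good enough
    {
        Scalar mssim = getMSSIM(frameReference, frameUnderTest);
        cout << "  MSSIM: B " << mssim.val[0] << " G " << mssim.val[1] << " R " << mssim.val[2];
    }
    cout << endl;
}
@endcode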
Therefore, the source code presented at the start of the tutorial performs the PSNR measurement for
each frame, and the SSIM only for the frames where the PSNR falls below an input value. For
visualization purposes we show both images in an OpenCV window and print the PSNR and MSSIM values to
the console. Expect to see something like:
![](images/outputVideoInput.png)
You may observe a runtime instance of this [on YouTube here](https://www.youtube.com/watch?v=iOcNljutOgg).
\htmlonly
<div align="center">
<iframe title="Video Input with OpenCV (Plus PSNR and MSSIM)" width="560" height="349" src="http://www.youtube.com/embed/iOcNljutOgg?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly

@@ -0,0 +1,160 @@
Creating a video with OpenCV {#tutorial_video_write}
============================
Goal
----
Whenever you work with video feeds you may eventually want to save your image processing result in
the form of a new video file. For simple video outputs you can use the built-in OpenCV @ref cv::VideoWriter
class, designed for this purpose. In this tutorial you will learn:
- How to create a video file with OpenCV
- What type of video files you can create with OpenCV
- How to extract a given color channel from a video
As a simple demonstration I'll just extract one of the BGR color channels of an input video file
into a new video. You can control the flow of the application from its command line arguments:
- The first argument points to the video file to work on
- The second argument may be one of the characters: R G B. This will specify which of the channels
to extract.
- The last argument is the character Y (Yes) or N (No). If this is N, the output will use the same
  codec as the input video file. Otherwise (Y), a window will pop up and allow you to select the
  codec to use yourself.
For example, a valid command line would look like:
@code{.bash}
video-write.exe video/Megamind.avi R Y
@endcode
The source code
---------------
You may also find the source code in the
`samples/cpp/tutorial_code/videoio/video-write/` folder of the OpenCV source library, or [download it
from here ](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/videoio/video-write/video-write.cpp).
@include cpp/tutorial_code/videoio/video-write/video-write.cpp
The structure of a video
------------------------
To start, you should have an idea of how a video file looks. Every video file is a container. The
type of the container is indicated by the file's extension (for example *avi*, *mov* or *mkv*). The
container holds multiple elements: video feeds, audio feeds, or other tracks (such as subtitles). How
these feeds are stored is determined by the codec used for each one of them. For audio tracks,
commonly used codecs are *mp3* or *aac*. For video the list is somewhat longer and includes names
such as *XVID*, *DIVX*, *H264* or *LAGS* (*Lagarith Lossless Codec*). The full list of codecs you may
use on a system depends on which ones you have installed.
![](images/videoFileStructure.png)
As you can see, things can get really complicated with videos. However, OpenCV is mainly a computer
vision library, not a video streaming, codec, and writing library. Therefore, the developers tried to
keep this part as simple as possible. As a result, for video containers OpenCV supports only the
*avi* extension, in its first version. A direct limitation of this is that you cannot save a video
file larger than 2 GB. Furthermore, you can only create and expand a single video track inside the
container; there is no support for audio or other track editing. Nevertheless, any video codec
present on your system might work. If you run into these limitations you will need to look into more
specialized video writing libraries such as *FFmpeg*, or codecs such as *HuffYUV*, *CorePNG* and
*LCL*. As an alternative, create the video track with OpenCV and expand it with sound tracks, or
convert it to other formats, using video manipulation programs such as *VirtualDub* or *AviSynth*.
The *VideoWriter* class
-----------------------
The content written here builds on the assumption that you have already read the
@ref tutorial_video_input_psnr_ssim tutorial and know how to read video files. To create a video file
you just need to create an instance of the @ref cv::VideoWriter class. You can specify its properties
either via parameters in the constructor or later on via the @ref cv::VideoWriter::open function.
Either way, the parameters are the same:
-# The name of the output file, which contains the container type in its extension. At the moment
only *avi* is supported. We construct this from the input file name, add to it the name of the
channel to use, and finish it off with the container extension.
@code{.cpp}
const string source = argv[1]; // the source file name
string::size_type pAt = source.find_last_of('.'); // Find extension point
const string NAME = source.substr(0, pAt) + argv[2][0] + ".avi"; // Form the new name with container
@endcode
-# The codec to use for the video track. Every video codec has a unique short name of at most four
characters: hence the *XVID*, *DIVX* or *H264* names. This is called a four character code (FourCC).
You may also ask for it from an input video by using its *get* function. Because the *get* function
is a general function it always returns double values. A double value is stored on 64 bits; four
characters are four bytes, meaning 32 bits. These four characters are coded in the lower 32 bits of
the *double*. A simple way to throw away the upper 32 bits is to just convert this value to *int*:
@code{.cpp}
VideoCapture inputVideo(source); // Open input
int ex = static_cast<int>(inputVideo.get(CAP_PROP_FOURCC)); // Get Codec Type- Int form
@endcode
OpenCV internally works with this integer type and expects it as its second parameter. To convert
from the integer form to a string we may use one of two methods: a bitwise operator or a union. The
first one, extracting the characters from the int, looks like this (an "and" operation, some shifting
and a 0 appended at the end to close the string):
@code{.cpp}
char EXT[] = {(char)(ex & 0XFF), (char)((ex & 0XFF00) >> 8), (char)((ex & 0XFF0000) >> 16), (char)((ex & 0XFF000000) >> 24), 0};
@endcode
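As a small usage note, you can print the decoded four character code to verify it, for example:
@code{.cpp}
cout << "Input codec type: " << EXT << endl;  // e.g. "XVID"
@endcode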
You can do the same thing with the *union* as:
@code{.cpp}
union { int v; char c[5]; } uEx;
uEx.v = ex;                      // from int to char via the union
uEx.c[4] = '\0';                 // terminate the string
@endcode
The advantage of this is that the conversion is done automatically when you assign, while with the
bitwise operator you need to redo the operations whenever the codec type changes. If you know the
codec's four character code beforehand, you can use the *CV_FOURCC* macro to build the integer:
@code{.cpp}
CV_FOURCC('P','I','M','1') // this is an MPEG1 codec, from the characters to integer
@endcode
If you pass minus one for this argument, then a window will pop up at runtime that lists all the
codecs installed on your system and asks you to select the one to use:
![](images/videoCompressSelect.png)
-# The frame rate (frames per second) for the output video. Again, here I keep the input video's
frame rate by using the *get* function.
-# The size of the frames for the output video. Here too I keep the input video's frame size by
using the *get* function.
-# The final argument is optional. By default it is true, which means the output will be in color
(so you will write three channel images). To create a grayscale video pass false here.
Here is how I use it in the sample:
@code{.cpp}
VideoWriter outputVideo;
Size S = Size((int) inputVideo.get(CAP_PROP_FRAME_WIDTH),   // acquire input size
              (int) inputVideo.get(CAP_PROP_FRAME_HEIGHT));
outputVideo.open(NAME, ex, inputVideo.get(CAP_PROP_FPS), S, true);
@endcode
Afterwards, you use the @ref cv::VideoWriter::isOpened() function to find out whether the open operation
succeeded. The video file closes automatically when the *VideoWriter* object is destroyed. After a
successful open you can send the frames of the video in sequential order by using the
@ref cv::VideoWriter::write function of the class. Alternatively, you can use its overloaded operator \<\< :
@code{.cpp}
outputVideo.write(res); //or
outputVideo << res;
@endcode
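The open check mentioned above is just the usual pattern; a minimal sketch, reusing the names from
the previous snippet, might look like:
@code{.cpp}
if (!outputVideo.isOpened())
{
    cout << "Could not open the output video for writing: " << NAME << endl;
    return -1;
}
@endcode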
Extracting a color channel from a BGR image means setting the values of the other channels to zero.
You can do this either with image scanning operations or by using the split and merge operations: you
first split the channels into separate images, replace the other channels with zero images of the
same size and type, and finally merge them back together:
@code{.cpp}
vector<Mat> spl;                            // one Mat per channel
split(src, spl);                            // process - extract only the correct channel
for (int i = 0; i < 3; ++i)
    if (i != channel)                       // channel holds the index of the channel to keep
        spl[i] = Mat::zeros(S, spl[0].type());
merge(spl, res);
@endcode
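Putting the read, channel extraction, and write steps together, the per-frame loop of the sample
boils down to something like this condensed sketch (assuming inputVideo, outputVideo, S and channel
have been set up as in the snippets above):
@code{.cpp}
Mat src, res;
vector<Mat> spl;

for (;;)
{
    inputVideo >> src;                      // read a frame from the input video
    if (src.empty())
        break;                              // end of the input stream

    split(src, spl);                        // keep only the selected channel
    for (int i = 0; i < 3; ++i)
        if (i != channel)
            spl[i] = Mat::zeros(S, spl[0].type());
    merge(spl, res);

    outputVideo << res;                     // append the frame to the output video
}
@endcode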
Put all this together and you'll get the source code above, whose runtime result will show something
along these lines:
![](images/resultOutputWideoWrite.png)
You may observe a runtime instance of this [on YouTube here](https://www.youtube.com/watch?v=jpBwHxsl1_0).
\htmlonly
<div align="center">
<iframe title="Creating a video with OpenCV" width="560" height="349" src="http://www.youtube.com/embed/jpBwHxsl1_0?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly