GSoC Python Tutorials

GSoC Python Tutorials

removed white spaces

removed blank lines at EOF

removed duplicate labels
This commit is contained in:
abidrahmank 2013-09-22 00:14:27 +05:30
parent c0c575d68e
commit 899781b3d6
329 changed files with 9660 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.3 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 45 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

@ -0,0 +1,213 @@
.. _calibration:
Camera Calibration
********************
Goal
=======
In this section,
* We will learn about distortions in camera, intrinsic and extrinsic parameters of camera etc.
* We will learn to find these parameters, undistort images etc.
Basics
========
Today's cheap pinhole cameras introduces a lot of distortion to images. Two major distortions are radial distortion and tangential distortion.
Due to radial distortion, straight lines will appear curved. Its effect is more as we move away from the center of image. For example, one image is shown below, where two edges of a chess board are marked with red lines. But you can see that border is not a straight line and doesn't match with the red line. All the expected straight lines are bulged out. Visit `Distortion (optics) <http://en.wikipedia.org/wiki/Distortion_%28optics%29>`_ for more details.
.. image:: images/calib_radial.jpg
:alt: Radial Distortion
:align: center
This distortion is solved as follows:
.. math::
x_{corrected} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{corrected} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
Similarly, another distortion is the tangential distortion which occurs because image taking lense is not aligned perfectly parallel to the imaging plane. So some areas in image may look nearer than expected. It is solved as below:
.. math::
x_{corrected} = x + [ 2p_1xy + p_2(r^2+2x^2)] \\
y_{corrected} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]
In short, we need to find five parameters, known as distortion coefficients given by:
.. math::
Distortion \; coefficients=(k_1 \hspace{10pt} k_2 \hspace{10pt} p_1 \hspace{10pt} p_2 \hspace{10pt} k_3)
In addition to this, we need to find a few more information, like intrinsic and extrinsic parameters of a camera. Intrinsic parameters are specific to a camera. It includes information like focal length (:math:`f_x,f_y`), optical centers (:math:`c_x, c_y`) etc. It is also called camera matrix. It depends on the camera only, so once calculated, it can be stored for future purposes. It is expressed as a 3x3 matrix:
.. math::
camera \; matrix = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ]
Extrinsic parameters corresponds to rotation and translation vectors which translates a coordinates of a 3D point to a coordinate system.
For stereo applications, these distortions need to be corrected first. To find all these parameters, what we have to do is to provide some sample images of a well defined pattern (eg, chess board). We find some specific points in it ( square corners in chess board). We know its coordinates in real world space and we know its coordinates in image. With these data, some mathematical problem is solved in background to get the distortion coefficients. That is the summary of the whole story. For better results, we need atleast 10 test patterns.
Code
========
As mentioned above, we need atleast 10 test patterns for camera calibration. OpenCV comes with some images of chess board (see ``samples/cpp/left01.jpg -- left14.jpg``), so we will utilize it. For sake of understanding, consider just one image of a chess board. Important input datas needed for camera calibration is a set of 3D real world points and its corresponding 2D image points. 2D image points are OK which we can easily find from the image. (These image points are locations where two black squares touch each other in chess boards)
What about the 3D points from real world space? Those images are taken from a static camera and chess boards are placed at different locations and orientations. So we need to know :math:`(X,Y,Z)` values. But for simplicity, we can say chess board was kept stationary at XY plane, (so Z=0 always) and camera was moved accordingly. This consideration helps us to find only X,Y values. Now for X,Y values, we can simply pass the points as (0,0), (1,0), (2,0), ... which denotes the location of points. In this case, the results we get will be in the scale of size of chess board square. But if we know the square size, (say 30 mm), and we can pass the values as (0,0),(30,0),(60,0),..., we get the results in mm. (In this case, we don't know square size since we didn't take those images, so we pass in terms of square size).
3D points are called **object points** and 2D image points are called **image points.**
Setup
---------
So to find pattern in chess board, we use the function, **cv2.findChessboardCorners()**. We also need to pass what kind of pattern we are looking, like 8x8 grid, 5x5 grid etc. In this example, we use 7x6 grid. (Normally a chess board has 8x8 squares and 7x7 internal corners). It returns the corner points and retval which will be True if pattern is obtained. These corners will be placed in an order (from left-to-right, top-to-bottom)
.. seealso:: This function may not be able to find the required pattern in all the images. So one good option is to write the code such that, it starts the camera and check each frame for required pattern. Once pattern is obtained, find the corners and store it in a list. Also provides some interval before reading next frame so that we can adjust our chess board in different direction. Continue this process until required number of good patterns are obtained. Even in the example provided here, we are not sure out of 14 images given, how many are good. So we read all the images and take the good ones.
.. seealso:: Instead of chess board, we can use some circular grid, but then use the function **cv2.findCirclesGrid()** to find the pattern. It is said that less number of images are enough when using circular grid.
Once we find the corners, we can increase their accuracy using **cv2.cornerSubPix()**. We can also draw the pattern using **cv2.drawChessboardCorners()**. All these steps are included in below code:
::
import numpy as np
import cv2
import glob
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
images = glob.glob('*.jpg')
for fname in images:
img = cv2.imread(fname)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Find the chess board corners
ret, corners = cv2.findChessboardCorners(gray, (7,6),None)
# If found, add object points, image points (after refining them)
if ret == True:
objpoints.append(objp)
corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
imgpoints.append(corners2)
# Draw and display the corners
img = cv2.drawChessboardCorners(img, (7,6), corners2,ret)
cv2.imshow('img',img)
cv2.waitKey(500)
cv2.destroyAllWindows()
One image with pattern drawn on it is shown below:
.. image:: images/calib_pattern.jpg
:alt: Calibration Pattern
:align: center
Calibration
------------
So now we have our object points and image points we are ready to go for calibration. For that we use the function, **cv2.calibrateCamera()**. It returns the camera matrix, distortion coefficients, rotation and translation vectors etc.
::
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
Undistortion
---------------
We have got what we were trying. Now we can take an image and undistort it. OpenCV comes with two methods, we will see both. But before that, we can refine the camera matrix based on a free scaling parameter using **cv2.getOptimalNewCameraMatrix()**. If the scaling parameter ``alpha=0``, it returns undistorted image with minimum unwanted pixels. So it may even remove some pixels at image corners. If ``alpha=1``, all pixels are retained with some extra black images. It also returns an image ROI which can be used to crop the result.
So we take a new image (``left12.jpg`` in this case. That is the first image in this chapter)
::
img = cv2.imread('left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi=cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))
1. Using **cv2.undistort()**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the shortest path. Just call the function and use ROI obtained above to crop the result.
::
# undistort
dst = cv2.undistort(img, mtx, dist, None, newcameramtx)
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
cv2.imwrite('calibresult.png',dst)
2. Using **remapping**
^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is curved path. First find a mapping function from distorted image to undistorted image. Then use the remap function.
::
# undistort
mapx,mapy = cv2.initUndistortRectifyMap(mtx,dist,None,newcameramtx,(w,h),5)
dst = cv2.remap(img,mapx,mapy,cv2.INTER_LINEAR)
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
cv2.imwrite('calibresult.png',dst)
Both the methods give the same result. See the result below:
.. image:: images/calib_result.jpg
:alt: Calibration Result
:align: center
You can see in the result that all the edges are straight.
Now you can store the camera matrix and distortion coefficients using write functions in Numpy (np.savez, np.savetxt etc) for future uses.
Re-projection Error
=======================
Re-projection error gives a good estimation of just how exact is the found parameters. This should be as close to zero as possible. Given the intrinsic, distortion, rotation and translation matrices, we first transform the object point to image point using **cv2.projectPoints()**. Then we calculate the absolute norm between what we got with our transformation and the corner finding algorithm. To find the average error we calculate the arithmetical mean of the errors calculate for all the calibration images.
::
mean_error = 0
for i in xrange(len(objpoints)):
imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
error = cv2.norm(imgpoints[i],imgpoints2, cv2.NORM_L2)/len(imgpoints2)
tot_error += error
print "total error: ", mean_error/len(objpoints)
Additional Resources
======================
Exercises
============
#. Try camera calibration with circular grid.

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

View File

@ -0,0 +1,67 @@
.. _py_depthmap:
Depth Map from Stereo Images
******************************
Goal
=======
In this session,
* We will learn to create depth map from stereo images.
Basics
===========
In last session, we saw basic concepts like epipolar constraints and other related terms. We also saw that if we have two images of same scene, we can get depth information from that in an intuitive way. Below is an image and some simple mathematical formulas which proves that intuition. (Image Courtesy :
.. image:: images/stereo_depth.jpg
:alt: Calculating depth
:align: center
The above diagram contains equivalent triangles. Writing their equivalent equations will yield us following result:
.. math::
disparity = x - x' = \frac{Bf}{Z}
:math:`x` and :math:`x'` are the distance between points in image plane corresponding to the scene point 3D and their camera center. :math:`B` is the distance between two cameras (which we know) and :math:`f` is the focal length of camera (already known). So in short, above equation says that the depth of a point in a scene is inversely proportional to the difference in distance of corresponding image points and their camera centers. So with this information, we can derive the depth of all pixels in an image.
So it finds corresponding matches between two images. We have already seen how epiline constraint make this operation faster and accurate. Once it finds matches, it finds the disparity. Let's see how we can do it with OpenCV.
Code
========
Below code snippet shows a simple procedure to create disparity map.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('tsukuba_l.png',0)
imgR = cv2.imread('tsukuba_r.png',0)
stereo = cv2.createStereoBM(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL,imgR)
plt.imshow(disparity,'gray')
plt.show()
Below image contains the original image (left) and its disparity map (right). As you can see, result is contaminated with high degree of noise. By adjusting the values of numDisparities and blockSize, you can get more better result.
.. image:: images/disparity_map.jpg
:alt: Disparity Map
:align: center
.. note:: More details to be added
Additional Resources
=============================
Exercises
============
1. OpenCV samples contain an example of generating disparity map and its 3D reconstruction. Check ``stereo_match.py`` in OpenCV-Python samples.

Binary file not shown.

After

Width:  |  Height:  |  Size: 11 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB

View File

@ -0,0 +1,158 @@
.. _epipolar_geometry:
Epipolar Geometry
*********************
Goal
========
In this section,
* We will learn about the basics of multiview geometry
* We will see what is epipole, epipolar lines, epipolar constraint etc.
Basic Concepts
=================
When we take an image using pin-hole camera, we loose an important information, ie depth of the image. Or how far is each point in the image from the camera because it is a 3D-to-2D conversion. So it is an important question whether we can find the depth information using these cameras. And the answer is to use more than one camera. Our eyes works in similar way where we use two cameras (two eyes) which is called stereo vision. So let's see what OpenCV provides in this field.
(*Learning OpenCV* by Gary Bradsky has a lot of information in this field.)
Before going to depth images, let's first understand some basic concepts in multiview geometry. In this section we will deal with epipolar geometry. See the image below which shows a basic setup with two cameras taking the image of same scene.
.. image:: images/epipolar.jpg
:alt: Epipolar geometry
:align: center
If we are using only the left camera, we can't find the 3D point corresponding to the point :math:`x` in image because every point on the line :math:`OX` projects to the same point on the image plane. But consider the right image also. Now different points on the line :math:`OX` projects to different points (:math:`x'`) in right plane. So with these two images, we can triangulate the correct 3D point. This is the whole idea.
The projection of the different points on :math:`OX` form a line on right plane (line :math:`l'`). We call it **epiline** corresponding to the point :math:`x`. It means, to find the point :math:`x` on the right image, search along this epiline. It should be somewhere on this line (Think of it this way, to find the matching point in other image, you need not search the whole image, just search along the epiline. So it provides better performance and accuracy). This is called **Epipolar Constraint**. Similarly all points will have its corresponding epilines in the other image. The plane :math:`XOO'` is called **Epipolar Plane**.
:math:`O` and :math:`O'` are the camera centers. From the setup given above, you can see that projection of right camera :math:`O'` is seen on the left image at the point, :math:`e`. It is called the **epipole**. Epipole is the point of intersection of line through camera centers and the image planes. Similarly :math:`e'` is the epipole of the left camera. In some cases, you won't be able to locate the epipole in the image, they may be outside the image (which means, one camera doesn't see the other).
All the epilines pass through its epipole. So to find the location of epipole, we can find many epilines and find their intersection point.
So in this session, we focus on finding epipolar lines and epipoles. But to find them, we need two more ingredients, **Fundamental Matrix (F)** and **Essential Matrix (E)**. Essential Matrix contains the information about translation and rotation, which describe the location of the second camera relative to the first in global coordinates. See the image below (Image courtesy: Learning OpenCV by Gary Bradsky):
.. image:: images/essential_matrix.jpg
:alt: Essential Matrix
:align: center
But we prefer measurements to be done in pixel coordinates, right? Fundamental Matrix contains the same information as Essential Matrix in addition to the information about the intrinsics of both cameras so that we can relate the two cameras in pixel coordinates. (If we are using rectified images and normalize the point by dividing by the focal lengths, :math:`F=E`). In simple words, Fundamental Matrix F, maps a point in one image to a line (epiline) in the other image. This is calculated from matching points from both the images. A minimum of 8 such points are required to find the fundamental matrix (while using 8-point algorithm). More points are preferred and use RANSAC to get a more robust result.
Code
=========
So first we need to find as many possible matches between two images to find the fundamental matrix. For this, we use SIFT descriptors with FLANN based matcher and ratio test.
::
import cv2
import numpy as np
from matplotlib import pyplot as plt
img1 = cv2.imread('myleft.jpg',0) #queryimage # left image
img2 = cv2.imread('myright.jpg',0) #trainimage # right image
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
good = []
pts1 = []
pts2 = []
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.8*n.distance:
good.append(m)
pts2.append(kp2[m.trainIdx].pt)
pts1.append(kp1[m.queryIdx].pt)
Now we have the list of best matches from both the images. Let's find the Fundamental Matrix.
::
pts1 = np.int32(pts1)
pts2 = np.int32(pts2)
F, mask = cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS)
# We select only inlier points
pts1 = pts1[mask.ravel()==1]
pts2 = pts2[mask.ravel()==1]
Next we find the epilines. Epilines corresponding to the points in first image is drawn on second image. So mentioning of correct images are important here. We get an array of lines. So we define a new function to draw these lines on the images.
::
def drawlines(img1,img2,lines,pts1,pts2):
''' img1 - image on which we draw the epilines for the points in img2
lines - corresponding epilines '''
r,c = img1.shape
img1 = cv2.cvtColor(img1,cv2.COLOR_GRAY2BGR)
img2 = cv2.cvtColor(img2,cv2.COLOR_GRAY2BGR)
for r,pt1,pt2 in zip(lines,pts1,pts2):
color = tuple(np.random.randint(0,255,3).tolist())
x0,y0 = map(int, [0, -r[2]/r[1] ])
x1,y1 = map(int, [c, -(r[2]+r[0]*c)/r[1] ])
img1 = cv2.line(img1, (x0,y0), (x1,y1), color,1)
img1 = cv2.circle(img1,tuple(pt1),5,color,-1)
img2 = cv2.circle(img2,tuple(pt2),5,color,-1)
return img1,img2
Now we find the epilines in both the images and draw them.
::
# Find epilines corresponding to points in right image (second image) and
# drawing its lines on left image
lines1 = cv2.computeCorrespondEpilines(pts2.reshape(-1,1,2), 2,F)
lines1 = lines1.reshape(-1,3)
img5,img6 = drawlines(img1,img2,lines1,pts1,pts2)
# Find epilines corresponding to points in left image (first image) and
# drawing its lines on right image
lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1,1,2), 1,F)
lines2 = lines2.reshape(-1,3)
img3,img4 = drawlines(img2,img1,lines2,pts2,pts1)
plt.subplot(121),plt.imshow(img5)
plt.subplot(122),plt.imshow(img3)
plt.show()
Below is the result we get:
.. image:: images/epiresult.jpg
:alt: Epilines
:align: center
You can see in the left image that all epilines are converging at a point outside the image at right side. That meeting point is the epipole.
For better results, images with good resolution and many non-planar points should be used.
Additional Resources
==========================
Exercises
=============
#. One important topic is the forward movement of camera. Then epipoles will be seen at the same locations in both with epilines emerging from a fixed point. `See this discussion <http://answers.opencv.org/question/17912/location-of-epipole/>`_.
#. Fundamental Matrix estimation is sensitive to quality of matches, outliers etc. It becomes worse when all selected matches lie on the same plane. `Check this discussion <http://answers.opencv.org/question/18125/epilines-not-correct/>`_.

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

View File

@ -0,0 +1,132 @@
.. _pose_estimation:
Pose Estimation
*********************
Goal
==========
In this section,
* We will learn to exploit calib3d module to create some 3D effects in images.
Basics
========
This is going to be a small section. During the last session on camera calibration, you have found the camera matrix, distortion coefficients etc. Given a pattern image, we can utilize the above information to calculate its pose, or how the object is situated in space, like how it is rotated, how it is displaced etc. For a planar object, we can assume Z=0, such that, the problem now becomes how camera is placed in space to see our pattern image. So, if we know how the object lies in the space, we can draw some 2D diagrams in it to simulate the 3D effect. Let's see how to do it.
Our problem is, we want to draw our 3D coordinate axis (X, Y, Z axes) on our chessboard's first corner. X axis in blue color, Y axis in green color and Z axis in red color. So in-effect, Z axis should feel like it is perpendicular to our chessboard plane.
First, let's load the camera matrix and distortion coefficients from the previous calibration result.
::
import cv2
import numpy as np
import glob
# Load previously saved data
with np.load('B.npz') as X:
mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
Now let's create a function, ``draw`` which takes the corners in the chessboard (obtained using **cv2.findChessboardCorners()**) and **axis points** to draw a 3D axis.
::
def draw(img, corners, imgpts):
corner = tuple(corners[0].ravel())
img = cv2.line(img, corner, tuple(imgpts[0].ravel()), (255,0,0), 5)
img = cv2.line(img, corner, tuple(imgpts[1].ravel()), (0,255,0), 5)
img = cv2.line(img, corner, tuple(imgpts[2].ravel()), (0,0,255), 5)
return img
Then as in previous case, we create termination criteria, object points (3D points of corners in chessboard) and axis points. Axis points are points in 3D space for drawing the axis. We draw axis of length 3 (units will be in terms of chess square size since we calibrated based on that size). So our X axis is drawn from (0,0,0) to (3,0,0), so for Y axis. For Z axis, it is drawn from (0,0,0) to (0,0,-3). Negative denotes it is drawn towards the camera.
::
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
axis = np.float32([[3,0,0], [0,3,0], [0,0,-3]]).reshape(-1,3)
Now, as usual, we load each image. Search for 7x6 grid. If found, we refine it with subcorner pixels. Then to calculate the rotation and translation, we use the function, **cv2.solvePnPRansac()**. Once we those transformation matrices, we use them to project our **axis points** to the image plane. In simple words, we find the points on image plane corresponding to each of (3,0,0),(0,3,0),(0,0,3) in 3D space. Once we get them, we draw lines from the first corner to each of these points using our ``draw()`` function. Done !!!
::
for fname in glob.glob('left*.jpg'):
img = cv2.imread(fname)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret, corners = cv2.findChessboardCorners(gray, (7,6),None)
if ret == True:
corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
# Find the rotation and translation vectors.
rvecs, tvecs, inliers = cv2.solvePnPRansac(objp, corners2, mtx, dist)
# project 3D points to image plane
imgpts, jac = cv2.projectPoints(axis, rvecs, tvecs, mtx, dist)
img = draw(img,corners2,imgpts)
cv2.imshow('img',img)
k = cv2.waitKey(0) & 0xff
if k == 's':
cv2.imwrite(fname[:6]+'.png', img)
cv2.destroyAllWindows()
See some results below. Notice that each axis is 3 squares long.:
.. image:: images/pose_1.jpg
:alt: Pose Estimation
:align: center
Render a Cube
---------------
If you want to draw a cube, modify the draw() function and axis points as follows.
Modified draw() function:
::
def draw(img, corners, imgpts):
imgpts = np.int32(imgpts).reshape(-1,2)
# draw ground floor in green
img = cv2.drawContours(img, [imgpts[:4]],-1,(0,255,0),-3)
# draw pillars in blue color
for i,j in zip(range(4),range(4,8)):
img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]),(255),3)
# draw top layer in red color
img = cv2.drawContours(img, [imgpts[4:]],-1,(0,0,255),3)
return img
Modified axis points. They are the 8 corners of a cube in 3D space:
::
axis = np.float32([[0,0,0], [0,3,0], [3,3,0], [3,0,0],
[0,0,-3],[0,3,-3],[3,3,-3],[3,0,-3] ])
And look at the result below:
.. image:: images/pose_2.jpg
:alt: Pose Estimation
:align: center
If you are interested in graphics, augmented reality etc, you can use OpenGL to render more complicated figures.
Additional Resources
===========================
Exercises
===========

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.5 KiB

View File

@ -0,0 +1,79 @@
.. _PY_Table-Of-Content-Calib:
Camera Calibration and 3D Reconstruction
----------------------------------------------
* :ref:`calibration`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|calib_1| Let's find how good is our camera. Is there any distortion in images taken with it? If so how to correct it?
=========== ======================================================
.. |calib_1| image:: images/calibration_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`pose_estimation`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|calib_2| This is a small section which will help you to create some cool 3D effects with calib module.
=========== ======================================================
.. |calib_2| image:: images/pose_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`epipolar_geometry`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|calib_3| Let's understand epipolar geometry and epipolar constraint.
=========== ======================================================
.. |calib_3| image:: images/epipolar_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`py_depthmap`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|calib_4| Extract depth information from 2D images.
=========== ======================================================
.. |calib_4| image:: images/depthmap_icon.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. We use a custom table of content format and as the table of content only informs Sphinx about the hierarchy of the files, no need to show it.
.. toctree::
:hidden:
../py_calibration/py_calibration
../py_pose/py_pose
../py_epipolar_geometry/py_epipolar_geometry
../py_depthmap/py_depthmap

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

View File

@ -0,0 +1,181 @@
.. _Basic_Ops:
Basic Operations on Images
******************************
Goal
=======
Learn to:
* Access pixel values and modify them
* Access image properties
* Setting Region of Image (ROI)
* Splitting and Merging images
Almost all the operations in this section is mainly related to Numpy rather than OpenCV. A good knowledge of Numpy is required to write better optimized code with OpenCV.
*( Examples will be shown in Python terminal since most of them are just single line codes )*
Accessing and Modifying pixel values
=======================================
Let's load a color image first:
::
>>> import cv2
>>> import numpy as np
>>> img = cv2.imread('messi5.jpg')
You can access a pixel value by its row and column coordinates. For BGR image, it returns an array of Blue, Green, Red values. For grayscale image, just corresponding intensity is returned.
::
>>> px = img[100,100]
>>> print px
[157 166 200]
# accessing only blue pixel
>>> blue = img[100,100,0]
>>> print blue
157
You can modify the pixel values the same way.
::
>>> img[100,100] = [255,255,255]
>>> print img[100,100]
[255 255 255]
.. warning:: Numpy is a optimized library for fast array calculations. So simply accessing each and every pixel values and modifying it will be very slow and it is discouraged.
.. note:: Above mentioned method is normally used for selecting a region of array, say first 5 rows and last 3 columns like that. For individual pixel access, Numpy array methods, ``array.item()`` and ``array.itemset()`` is considered to be more better. But it always returns a scalar. So if you want to access all B,G,R values, you need to call ``array.item()`` separately for all.
Better pixel accessing and editing method :
.. code-block:: python
# accessing RED value
>>> img.item(10,10,2)
59
# modifying RED value
>>> img.itemset((10,10,2),100)
>>> img.item(10,10,2)
100
Accessing Image Properties
=============================
Image properties include number of rows, columns and channels, type of image data, number of pixels etc.
Shape of image is accessed by ``img.shape``. It returns a tuple of number of rows, columns and channels (if image is color):
::
>>> print img.shape
(342, 548, 3)
.. note:: If image is grayscale, tuple returned contains only number of rows and columns. So it is a good method to check if loaded image is grayscale or color image.
Total number of pixels is accessed by ``img.size``:
::
>>> print img.size
562248
Image datatype is obtained by ``img.dtype``:
::
>>> print img.dtype
uint8
.. note:: ``img.dtype`` is very important while debugging because a large number of errors in OpenCV-Python code is caused by invalid datatype.
Image ROI
===========
Sometimes, you will have to play with certain region of images. For eye detection in images, first face detection is done all over the image and when face is obtained, we select the face region alone and search for eyes inside it instead of searching whole image. It improves accuracy (because eyes are always on faces :D ) and performance (because we search for a small area)
ROI is again obtained using Numpy indexing. Here I am selecting the ball and copying it to another region in the image:
::
>>> ball = img[280:340, 330:390]
>>> img[273:333, 100:160] = ball
Check the results below:
.. image:: images/roi.jpg
:alt: Image ROI
:align: center
Splitting and Merging Image Channels
======================================
Sometimes you will need to work separately on B,G,R channels of image. Then you need to split the BGR images to single planes. Or another time, you may need to join these individual channels to BGR image. You can do it simply by:
::
>>> b,g,r = cv2.split(img)
>>> img = cv2.merge((b,g,r))
Or
>>> b = img[:,:,0]
Suppose, you want to make all the red pixels to zero, you need not split like this and put it equal to zero. You can simply use Numpy indexing, and that is more faster.
::
>>> img[:,:,2] = 0
.. warning:: ``cv2.split()`` is a costly operation (in terms of time). So do it only if you need it. Otherwise go for Numpy indexing.
Making Borders for Images (Padding)
====================================
If you want to create a border around the image, something like a photo frame, you can use **cv2.copyMakeBorder()** function. But it has more applications for convolution operation, zero padding etc. This function takes following arguments:
* **src** - input image
* **top**, **bottom**, **left**, **right** - border width in number of pixels in corresponding directions
* **borderType** - Flag defining what kind of border to be added. It can be following types:
* **cv2.BORDER_CONSTANT** - Adds a constant colored border. The value should be given as next argument.
* **cv2.BORDER_REFLECT** - Border will be mirror reflection of the border elements, like this : *fedcba|abcdefgh|hgfedcb*
* **cv2.BORDER_REFLECT_101** or **cv2.BORDER_DEFAULT** - Same as above, but with a slight change, like this : *gfedcb|abcdefgh|gfedcba*
* **cv2.BORDER_REPLICATE** - Last element is replicated throughout, like this: *aaaaaa|abcdefgh|hhhhhhh*
* **cv2.BORDER_WRAP** - Can't explain, it will look like this : *cdefgh|abcdefgh|abcdefg*
* **value** - Color of border if border type is ``cv2.BORDER_CONSTANT``
Below is a sample code demonstrating all these border types for better understanding:
::
import cv2
import numpy as np
from matplotlib import pyplot as plt
BLUE = [255,0,0]
img1 = cv2.imread('opencv_logo.png')
replicate = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REPLICATE)
reflect = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REFLECT)
reflect101 = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REFLECT_101)
wrap = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_WRAP)
constant= cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_CONSTANT,value=BLUE)
plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')
plt.show()
See the result below. (Image is displayed with matplotlib. So RED and BLUE planes will be interchanged):
.. image:: images/border.jpg
:alt: Border Types
:align: center
Additional Resources
=========================
Exercises
===========

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

View File

@ -0,0 +1,115 @@
.. _Image_Arithmetics:
Arithmetic Operations on Images
*********************************
Goal
=====
* Learn several arithmetic operations on images like addition, subtraction, bitwise operations etc.
* You will learn these functions : **cv2.add()**, **cv2.addWeighted()** etc.
Image Addition
================
You can add two images by OpenCV function, ``cv2.add()`` or simply by numpy operation, ``res = img1 + img2``. Both images should be of same depth and type, or second image can just be a scalar value.
.. note:: There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a saturated operation while Numpy addition is a modulo operation.
For example, consider below sample:
::
>>> x = np.uint8([250])
>>> y = np.uint8([10])
>>> print cv2.add(x,y) # 250+10 = 260 => 255
[[255]]
>>> print x+y # 250+10 = 260 % 256 = 4
[4]
It will be more visible when you add two images. OpenCV function will provide a better result. So always better stick to OpenCV functions.
Image Blending
=================
This is also image addition, but different weights are given to images so that it gives a feeling of blending or transparency. Images are added as per the equation below:
.. math::
g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)
By varying :math:`\alpha` from :math:`0 \rightarrow 1`, you can perform a cool transition between one image to another.
Here I took two images to blend them together. First image is given a weight of 0.7 and second image is given 0.3. ``cv2.addWeighted()`` applies following equation on the image.
.. math::
dst = \alpha \cdot img1 + \beta \cdot img2 + \gamma
Here :math:`\gamma` is taken as zero.
::
img1 = cv2.imread('ml.png')
img2 = cv2.imread('opencv_logo.jpg')
dst = cv2.addWeighted(img1,0.7,img2,0.3,0)
cv2.imshow('dst',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
Check the result below:
.. image:: images/blending.jpg
:alt: Image Blending
:align: center
Bitwise Operations
===================
This includes bitwise AND, OR, NOT and XOR operations. They will be highly useful while extracting any part of the image (as we will see in coming chapters), defining and working with non-rectangular ROI etc. Below we will see an example on how to change a particular region of an image.
I want to put OpenCV logo above an image. If I add two images, it will change color. If I blend it, I get an transparent effect. But I want it to be opaque. If it was a rectangular region, I could use ROI as we did in last chapter. But OpenCV logo is a not a rectangular shape. So you can do it with bitwise operations as below:
::
# Load two images
img1 = cv2.imread('messi5.jpg')
img2 = cv2.imread('opencv_logo.png')
# I want to put logo on top-left corner, So I create a ROI
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols ]
# Now create a mask of logo and create its inverse mask also
img2gray = cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(img2gray, 10, 255, cv2.THRESH_BINARY)
mask_inv = cv2.bitwise_not(mask)
# Now black-out the area of logo in ROI
img1_bg = cv2.bitwise_and(roi,roi,mask = mask_inv)
# Take only region of logo from logo image.
img2_fg = cv2.bitwise_and(img2,img2,mask = mask)
# Put logo in ROI and modify the main image
dst = cv2.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst
cv2.imshow('res',img1)
cv2.waitKey(0)
cv2.destroyAllWindows()
See the result below. Left image shows the mask we created. Right image shows the final result. For more understanding, display all the intermediate images in the above code, especially ``img1_bg`` and ``img2_fg``.
.. image:: images/overlay.jpg
:alt: Otsu's Thresholding
:align: center
Additional Resources
======================
Exercises
============
#. Create a slide show of images in a folder with smooth transition between images using ``cv2.addWeighted`` function

View File

@ -0,0 +1,4 @@
.. _Mathematical_Tools:
Mathematical Tools in OpenCV
********************************

View File

@ -0,0 +1,141 @@
.. _Optimization_Techniques:
Performance Measurement and Improvement Techniques
****************************************************
Goal
======
In image processing, since you are dealing with large number of operations per second, it is mandatory that your code is not only providing the correct solution, but also in the fastest manner. So in this chapter, you will learn
* To measure the performance of your code.
* Some tips to improve the performance of your code.
* You will see these functions : **cv2.getTickCount**, **cv2.getTickFrequency** etc.
Apart from OpenCV, Python also provides a module **time** which is helpful in measuring the time of execution. Another module **profile** helps to get detailed report on the code, like how much time each function in the code took, how many times the function was called etc. But, if you are using IPython, all these features are integrated in an user-friendly manner. We will see some important ones, and for more details, check links in **Additional Resouces** section.
Measuring Performance with OpenCV
==================================
**cv2.getTickCount** function returns the number of clock-cycles after a reference event (like the moment machine was switched ON) to the moment this function is called. So if you call it before and after the function execution, you get number of clock-cycles used to execute a function.
**cv2.getTickFrequency** function returns the frequency of clock-cycles, or the number of clock-cycles per second. So to find the time of execution in seconds, you can do following:
::
e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
time = (e2 - e1)/ cv2.getTickFrequency()
We will demonstrate with following example. Following example apply median filtering with a kernel of odd size ranging from 5 to 49. (Don't worry about what will the result look like, that is not our goal):
::
img1 = cv2.imread('messi5.jpg')
e1 = cv2.getTickCount()
for i in xrange(5,49,2):
img1 = cv2.medianBlur(img1,i)
e2 = cv2.getTickCount()
t = (e2 - e1)/cv2.getTickFrequency()
print t
# Result I got is 0.521107655 seconds
.. note:: You can do the same with ``time`` module. Instead of ``cv2.getTickCount``, use ``time.time()`` function. Then take the difference of two times.
Default Optimization in OpenCV
================================
Many of the OpenCV functions are optimized using SSE2, AVX etc. It contains unoptimized code also. So if our system support these features, we should exploit them (almost all modern day processors support them). It is enabled by default while compiling. So OpenCV runs the optimized code if it is enabled, else it runs the unoptimized code. You can use **cv2.useOptimized()** to check if it is enabled/disabled and **cv2.setUseOptimized()** to enable/disable it. Let's see a simple example.
::
# check if optimization is enabled
In [5]: cv2.useOptimized()
Out[5]: True
In [6]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable it
In [7]: cv2.setUseOptimized(False)
In [8]: cv2.useOptimized()
Out[8]: False
In [9]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
See, optimized median filtering is ~2x faster than unoptimized version. If you check its source, you can see median filtering is SIMD optimized. So you can use this to enable optimization at the top of your code (remember it is enabled by default).
Measuring Performance in IPython
============================================================
Sometimes you may need to compare the performance of two similar operations. IPython gives you a magic command ``%timeit`` to perform this. It runs the code several times to get more accurate results. Once again, they are suitable to measure single line codes.
For example, do you know which of the following addition operation is more better, ``x = 5; y = x**2``, ``x = 5; y = x*x``, ``x = np.uint8([5]); y = x*x`` or ``y = np.square(x)`` ? We will find it with %timeit in IPython shell.
::
In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
You can see that, ``x = 5 ; y = x*x`` is fastest and it is around 20x faster compared to Numpy. If you consider the array creation also, it may reach upto 100x faster. Cool, right? *(Numpy devs are working on this issue)*
.. note:: Python scalar operations are faster than Numpy scalar operations. So for operations including one or two elements, Python scalar is better than Numpy arrays. Numpy takes advantage when size of array is a little bit bigger.
We will try one more example. This time, we will compare the performance of **cv2.countNonZero()** and **np.count_nonzero()** for same image.
::
In [35]: %timeit z = cv2.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop
In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop
See, OpenCV function is nearly 25x faster than Numpy function.
.. note:: Normally, OpenCV functions are faster than Numpy functions. So for same operation, OpenCV functions are preferred. But, there can be exceptions, especially when Numpy works with views instead of copies.
More IPython magic commands
=============================
There are several other magic commands to measure the performance, profiling, line profiling, memory measurement etc. They all are well documented. So only links to those docs are provided here. Interested readers are recommended to try them out.
Performance Optimization Techniques
=====================================
There are several techniques and coding methods to exploit maximum performance of Python and Numpy. Only relevant ones are noted here and links are given to important sources. The main thing to be noted here is that, first try to implement the algorithm in a simple manner. Once it is working, profile it, find the bottlenecks and optimize them.
#. Avoid using loops in Python as far as possible, especially double/triple loops etc. They are inherently slow.
#. Vectorize the algorithm/code to the maximum possible extent because Numpy and OpenCV are optimized for vector operations.
#. Exploit the cache coherence.
#. Never make copies of array unless it is needed. Try to use views instead. Array copying is a costly operation.
Even after doing all these operations, if your code is still slow, or use of large loops are inevitable, use additional libraries like Cython to make it faster.
Additional Resources
======================
1. `Python Optimization Techniques <http://wiki.python.org/moin/PythonSpeed/PerformanceTips>`_
2. Scipy Lecture Notes - `Advanced Numpy <http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy>`_
3. `Timing and Profiling in IPython <http://pynash.org/2013/03/06/timing-and-profiling.html>`_
Exercises
============

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.9 KiB

View File

@ -0,0 +1,75 @@
.. _PY_Table-Of-Content-Core:
Core Operations
-----------------------------------------------------------
* :ref:`Basic_Ops`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|core_1| Learn to read and edit pixel values, working with image ROI and other basic operations.
=========== ======================================================
.. |core_1| image:: images/pixel_ops.jpg
:height: 90pt
:width: 90pt
* :ref:`Image_Arithmetics`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|core_2| Perform arithmetic operations on images
=========== ======================================================
.. |core_2| image:: images/image_arithmetic.jpg
:height: 90pt
:width: 90pt
* :ref:`Optimization_Techniques`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|core_4| Getting a solution is important. But getting it in the fastest way is more important. Learn to check the speed of your code, optimize the code etc.
=========== ======================================================
.. |core_4| image:: images/speed.jpg
:height: 90pt
:width: 90pt
* :ref:`Mathematical_Tools`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|core_5| Learn some of the mathematical tools provided by OpenCV like PCA, SVD etc.
=========== ======================================================
.. |core_5| image:: images/maths_tools.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. We use a custom table of content format and as the table of content only informs Sphinx about the hierarchy of the files, no need to show it.
.. toctree::
:hidden:
../py_basic_ops/py_basic_ops
../py_image_arithmetics/py_image_arithmetics
../py_optimization/py_optimization
../py_maths_tools/py_maths_tools

View File

@ -0,0 +1,63 @@
.. _BRIEF:
BRIEF (Binary Robust Independent Elementary Features)
***********************************************************
Goal
=======
In this chapter
* We will see the basics of BRIEF algorithm
Theory
=============
We know SIFT uses 128-dim vector for descriptors. Since it is using floating point numbers, it takes basically 512 bytes. Similarly SURF also takes minimum of 256 bytes (for 64-dim). Creating such a vector for thousands of features takes a lot of memory which are not feasible for resouce-constraint applications especially for embedded systems. Larger the memory, longer the time it takes for matching.
But all these dimensions may not be needed for actual matching. We can compress it using several methods like PCA, LDA etc. Even other methods like hashing using LSH (Locality Sensitive Hashing) is used to convert these SIFT descriptors in floating point numbers to binary strings. These binary strings are used to match features using Hamming distance. This provides better speed-up because finding hamming distance is just applying XOR and bit count, which are very fast in modern CPUs with SSE instructions. But here, we need to find the descriptors first, then only we can apply hashing, which doesn't solve our initial problem on memory.
BRIEF comes into picture at this moment. It provides a shortcut to find the binary strings directly without finding descriptors. It takes smoothened image patch and selects a set of :math:`n_d` (x,y) location pairs in an unique way (explained in paper). Then some pixel intensity comparisons are done on these location pairs. For eg, let first location pairs be :math:`p` and :math:`q`. If :math:`I(p) < I(q)`, then its result is 1, else it is 0. This is applied for all the :math:`n_d` location pairs to get a :math:`n_d`-dimensional bitstring.
This :math:`n_d` can be 128, 256 or 512. OpenCV supports all of these, but by default, it would be 256 (OpenCV represents it in bytes. So the values will be 16, 32 and 64). So once you get this, you can use Hamming Distance to match these descriptors.
One important point is that BRIEF is a feature descriptor, it doesn't provide any method to find the features. So you will have to use any other feature detectors like SIFT, SURF etc. The paper recommends to use CenSurE which is a fast detector and BRIEF works even slightly better for CenSurE points than for SURF points.
In short, BRIEF is a faster method feature descriptor calculation and matching. It also provides high recognition rate unless there is large in-plane rotation.
BRIEF in OpenCV
=====================
Below code shows the computation of BRIEF descriptors with the help of CenSurE detector. (CenSurE detector is called STAR detector in OpenCV)
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate STAR detector
star = cv2.FeatureDetector_create("STAR")
# Initiate BRIEF extractor
brief = cv2.DescriptorExtractor_create("BRIEF")
# find the keypoints with STAR
kp = star.detect(img,None)
# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)
print brief.getInt('bytes')
print des.shape
The function ``brief.getInt('bytes')`` gives the :math:`n_d` size used in bytes. By default it is 32. Next one is matching, which will be done in another chapter.
Additional Resources
==========================
#. Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, "BRIEF: Binary Robust Independent Elementary Features", 11th European Conference on Computer Vision (ECCV), Heraklion, Crete. LNCS Springer, September 2010.
#. LSH (Locality Sensitive Hasing) at wikipedia.

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

View File

@ -0,0 +1,136 @@
.. _FAST:
FAST Algorithm for Corner Detection
*************************************
Goal
=======
In this chapter,
* We will understand the basics of FAST algorithm
* We will find corners using OpenCV functionalities for FAST algorithm.
Theory
=========
We saw several feature detectors and many of them are really good. But when looking from a real-time application point of view, they are not fast enough. One best example would be SLAM (Simultaneous Localization and Mapping) mobile robot which have limited computational resources.
As a solution to this, FAST (Features from Accelerated Segment Test) algorithm was proposed by Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in 2006 (Later revised it in 2010). A basic summary of the algorithm is presented below. Refer original paper for more details (All the images are taken from original paper).
Feature Detection using FAST
------------------------------
1. Select a pixel :math:`p` in the image which is to be identified as an interest point or not. Let its intensity be :math:`I_p`.
2. Select appropriate threshold value :math:`t`.
3. Consider a circle of 16 pixels around the pixel under test. (See the image below)
.. image:: images/fast_speedtest.jpg
:alt: A corner in the image
:align: center
4. Now the pixel :math:`p` is a corner if there exists a set of :math:`n` contiguous pixels in the circle (of 16 pixels) which are all brighter than :math:`I_p + t`, or all darker than :math:`I_p t`. (Shown as white dash lines in the above image). :math:`n` was chosen to be 12.
5. A **high-speed test** was proposed to exclude a large number of non-corners. This test examines only the four pixels at 1, 9, 5 and 13 (First 1 and 9 are tested if they are too brighter or darker. If so, then checks 5 and 13). If :math:`p` is a corner, then at least three of these must all be brighter than :math:`I_p + t` or darker than :math:`I_p t`. If neither of these is the case, then :math:`p` cannot be a corner. The full segment test criterion can then be applied to the passed candidates by examining all pixels in the circle. This detector in itself exhibits high performance, but there are several weaknesses:
* It does not reject as many candidates for n < 12.
* The choice of pixels is not optimal because its efficiency depends on ordering of the questions and distribution of corner appearances.
* Results of high-speed tests are thrown away.
* Multiple features are detected adjacent to one another.
First 3 points are addressed with a machine learning approach. Last one is addressed using non-maximal suppression.
Machine Learning a Corner Detector
------------------------------------
1. Select a set of images for training (preferably from the target application domain)
2. Run FAST algorithm in every images to find feature points.
3. For every feature point, store the 16 pixels around it as a vector. Do it for all the images to get feature vector :math:`P`.
4. Each pixel (say :math:`x`) in these 16 pixels can have one of the following three states:
.. image:: images/fast_eqns.jpg
:alt: FAST equation
:align: center
5. Depending on these states, the feature vector :math:`P` is subdivided into 3 subsets, :math:`P_d`, :math:`P_s`, :math:`P_b`.
6. Define a new boolean variable, :math:`K_p`, which is true if :math:`p` is a corner and false otherwise.
7. Use the ID3 algorithm (decision tree classifier) to query each subset using the variable :math:`K_p` for the knowledge about the true class. It selects the :math:`x` which yields the most information about whether the candidate pixel is a corner, measured by the entropy of :math:`K_p`.
8. This is recursively applied to all the subsets until its entropy is zero.
9. The decision tree so created is used for fast detection in other images.
Non-maximal Suppression
---------------------------
Detecting multiple interest points in adjacent locations is another problem. It is solved by using Non-maximum Suppression.
1. Compute a score function, :math:`V` for all the detected feature points. :math:`V` is the sum of absolute difference between :math:`p` and 16 surrounding pixels values.
2. Consider two adjacent keypoints and compute their :math:`V` values.
3. Discard the one with lower :math:`V` value.
Summary
-----------
It is several times faster than other existing corner detectors.
But it is not robust to high levels of noise. It is dependant on a threshold.
FAST Feature Detector in OpenCV
==================================
It is called as any other feature detector in OpenCV. If you want, you can specify the threshold, whether non-maximum suppression to be applied or not, the neighborhood to be used etc.
For the neighborhood, three flags are defined, ``cv2.FAST_FEATURE_DETECTOR_TYPE_5_8``, ``cv2.FAST_FEATURE_DETECTOR_TYPE_7_12`` and ``cv2.FAST_FEATURE_DETECTOR_TYPE_9_16``. Below is a simple code on how to detect and draw the FAST feature points.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate FAST object with default values
fast = cv2.FastFeatureDetector()
# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv2.drawKeypoints(img, kp, color=(255,0,0))
# Print all default params
print "Threshold: ", fast.getInt('threshold')
print "nonmaxSuppression: ", fast.getBool('nonmaxSuppression')
print "neighborhood: ", fast.getInt('type')
print "Total Keypoints with nonmaxSuppression: ", len(kp)
cv2.imwrite('fast_true.png',img2)
# Disable nonmaxSuppression
fast.setBool('nonmaxSuppression',0)
kp = fast.detect(img,None)
print "Total Keypoints without nonmaxSuppression: ", len(kp)
img3 = cv2.drawKeypoints(img, kp, color=(255,0,0))
cv2.imwrite('fast_false.png',img3)
See the results. First image shows FAST with nonmaxSuppression and second one without nonmaxSuppression:
.. image:: images/fast_kp.jpg
:alt: FAST Keypoints
:align: center
Additional Resources
=========================
#. Edward Rosten and Tom Drummond, “Machine learning for high speed corner detection” in 9th European Conference on Computer Vision, vol. 1, 2006, pp. 430443.
#. Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol 32, pp. 105-119.
Exercises
============

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

View File

@ -0,0 +1,110 @@
.. _PY_feature_homography:
Feature Matching + Homography to find Objects
***********************************************
Goal
======
In this chapter,
* We will mix up the feature matching and findHomography from calib3d module to find known objects in a complex image.
Basics
=========
So what we did in last session? We used a queryImage, found some feature points in it, we took another trainImage, found the features in that image too and we found the best matches among them. In short, we found locations of some parts of an object in another cluttered image. This information is sufficient to find the object exactly on the trainImage.
For that, we can use a function from calib3d module, ie **cv2.findHomography()**. If we pass the set of points from both the images, it will find the perpective transformation of that object. Then we can use **cv2.perspectiveTransform()** to find the object. It needs atleast four correct points to find the transformation.
We have seen that there can be some possible errors while matching which may affect the result. To solve this problem, algorithm uses RANSAC or LEAST_MEDIAN (which can be decided by the flags). So good matches which provide correct estimation are called inliers and remaining are called outliers. **cv2.findHomography()** returns a mask which specifies the inlier and outlier points.
So let's do it !!!
Code
=========
First, as usual, let's find SIFT features in images and apply the ratio test to find the best matches.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
MIN_MATCH_COUNT = 10
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1,des2,k=2)
# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
if m.distance < 0.7*n.distance:
good.append(m)
Now we set a condition that atleast 10 matches (defined by MIN_MATCH_COUNT) are to be there to find the object. Otherwise simply show a message saying not enough matches are present.
If enough matches are found, we extract the locations of matched keypoints in both the images. They are passed to find the perpective transformation. Once we get this 3x3 transformation matrix, we use it to transform the corners of queryImage to corresponding points in trainImage. Then we draw it.
::
if len(good)>MIN_MATCH_COUNT:
src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
matchesMask = mask.ravel().tolist()
h,w = img1.shape
pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
dst = cv2.perspectiveTransform(pts,M)
img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA)
else:
print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT)
matchesMask = None
Finally we draw our inliers (if successfully found the object) or matching keypoints (if failed).
::
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
singlePointColor = None,
matchesMask = matchesMask, # draw only inliers
flags = 2)
img3 = cv2.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
plt.imshow(img3, 'gray'),plt.show()
See the result below. Object is marked in white color in cluttered image:
.. image:: images/homography_findobj.jpg
:alt: Finding object with feature homography
:align: center
Additional Resources
============================
Exercises
==================

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 16 KiB

View File

@ -0,0 +1,154 @@
.. _Harris_Corners:
Harris Corner Detection
****************************
Goal
=======
In this chapter,
* We will understand the concepts behind Harris Corner Detection.
* We will see the functions: **cv2.cornerHarris()**, **cv2.cornerSubPix()**
Theory
==========
In last chapter, we saw that corners are regions in the image with large variation in intensity in all the directions. One early attempt to find these corners was done by **Chris Harris & Mike Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so now it is called Harris Corner Detector. He took this simple idea to a mathematical form. It basically finds the difference in intensity for a displacement of :math:`(u,v)` in all directions. This is expressed as below:
.. math::
E(u,v) = \sum_{x,y} \underbrace{w(x,y)}_\text{window function} \, [\underbrace{I(x+u,y+v)}_\text{shifted intensity}-\underbrace{I(x,y)}_\text{intensity}]^2
Window function is either a rectangular window or gaussian window which gives weights to pixels underneath.
We have to maximize this function :math:`E(u,v)` for corner detection. That means, we have to maximize the second term. Applying Taylor Expansion to above equation and using some mathematical steps (please refer any standard text books you like for full derivation), we get the final equation as:
.. math::
E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}
where
.. math::
M = \sum_{x,y} w(x,y) \begin{bmatrix}I_x I_x & I_x I_y \\
I_x I_y & I_y I_y \end{bmatrix}
Here, :math:`I_x` and :math:`I_y` are image derivatives in x and y directions respectively. (Can be easily found out using **cv2.Sobel()**).
Then comes the main part. After this, they created a score, basically an equation, which will determine if a window can contain a corner or not.
.. math::
R = det(M) - k(trace(M))^2
where
* :math:`det(M) = \lambda_1 \lambda_2`
* :math:`trace(M) = \lambda_1 + \lambda_2`
* :math:`\lambda_1` and :math:`\lambda_2` are the eigen values of M
So the values of these eigen values decide whether a region is corner, edge or flat.
* When :math:`|R|` is small, which happens when :math:`\lambda_1` and :math:`\lambda_2` are small, the region is flat.
* When :math:`R<0`, which happens when :math:`\lambda_1 >> \lambda_2` or vice versa, the region is edge.
* When :math:`R` is large, which happens when :math:`\lambda_1` and :math:`\lambda_2` are large and :math:`\lambda_1 \sim \lambda_2`, the region is a corner.
It can be represented in a nice picture as follows:
.. image:: images/harris_region.jpg
:alt: Classification of Image Points
:align: center
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a suitable give you the corners in the image. We will do it with a simple image.
Harris Corner Detector in OpenCV
====================================
OpenCV has the function **cv2.cornerHarris()** for this purpose. Its arguments are :
* **img** - Input image, it should be grayscale and float32 type.
* **blockSize** - It is the size of neighbourhood considered for corner detection
* **ksize** - Aperture parameter of Sobel derivative used.
* **k** - Harris detector free parameter in the equation.
See the example below:
::
import cv2
import numpy as np
filename = 'chessboard.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray,2,3,0.04)
#result is dilated for marking the corners, not important
dst = cv2.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv2.imshow('dst',img)
if cv2.waitKey(0) & 0xff == 27:
cv2.destroyAllWindows()
Below are the three results:
.. image:: images/harris_result.jpg
:alt: Harris Corner Detection
:align: center
Corner with SubPixel Accuracy
===============================
Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function **cv2.cornerSubPix()** which further refines the corners detected with sub-pixel accuracy. Below is an example. As usual, we need to find the harris corners first. Then we pass the centroids of these corners (There may be a bunch of pixels at a corner, we take their centroid) to refine them. Harris corners are marked in red pixels and refined corners are marked in green pixels. For this function, we have to define the criteria when to stop the iteration. We stop it after a specified number of iteration or a certain accuracy is achieved, whichever occurs first. We also need to define the size of neighbourhood it would search for corners.
::
import cv2
import numpy as np
filename = 'chessboard2.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# find Harris corners
gray = np.float32(gray)
dst = cv2.cornerHarris(gray,2,3,0.04)
dst = cv2.dilate(dst,None)
ret, dst = cv2.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)
# find centroids
ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)
# define the criteria to stop and refine the corners
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)
# Now draw them
res = np.hstack((centroids,corners))
res = np.int0(res)
img[res[:,1],res[:,0]]=[0,0,255]
img[res[:,3],res[:,2]] = [0,255,0]
cv2.imwrite('subpixel5.png',img)
Below is the result, where some important locations are shown in zoomed window to visualize:
.. image:: images/subpixel3.png
:alt: Corner Detection with SubPixel Accuracy
:align: center
Additional Resources
======================
Exercises
============

Binary file not shown.

After

Width:  |  Height:  |  Size: 49 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.0 KiB

View File

@ -0,0 +1,52 @@
.. _Features_Meaning:
Understanding Features
************************
Goal
=====
In this chapter, we will just try to understand what are features, why are they important, why corners are important etc.
Explanation
==============
Most of you will have played the jigsaw puzzle games. You get a lot of small pieces of a images, where you need to assemble them correctly to form a big real image. **The question is, how you do it?** What about the projecting the same theory to a computer program so that computer can play jigsaw puzzles? If the computer can play jigsaw puzzles, why can't we give a lot of real-life images of a good natural scenery to computer and tell it to stitch all those images to a big single image? If the computer can stitch several natural images to one, what about giving a lot of pictures of a building or any structure and tell computer to create a 3D model out of it?
Well, the questions and imaginations continue. But it all depends on the most basic question? How do you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into a big single image? How can you stitch a lot of natural images to a single image?
The answer is, we are looking for specific patterns or specific features which are unique, which can be easily tracked, which can be easily compared. If we go for a definition of such a feature, we may find it difficult to express it in words, but we know what are they. If some one asks you to point out one good feature which can be compared across several images, you can point out one. That is why, even small children can simply play these games. We search for these features in an image, we find them, we find the same features in other images, we align them. That's it. (In jigsaw puzzle, we look more into continuity of different images). All these abilities are present in us inherently.
So our one basic question expands to more in number, but becomes more specific. **What are these features?**. *(The answer should be understandable to a computer also.)*
Well, it is difficult to say how humans find these features. It is already programmed in our brain. But if we look deep into some pictures and search for different patterns, we will find something interesting. For example, take below image:
.. image:: images/feature_building.jpg
:alt: Understanding features
:align: center
Image is very simple. At the top of image, six small image patches are given. Question for you is to find the exact location of these patches in the original image. How many correct results you can find ?
A and B are flat surfaces, and they are spread in a lot of area. It is difficult to find the exact location of these patches.
C and D are much more simpler. They are edges of the building. You can find an approximate location, but exact location is still difficult. It is because, along the edge, it is same everywhere. Normal to the edge, it is different. So edge is much more better feature compared to flat area, but not good enough (It is good in jigsaw puzzle for comparing continuity of edges).
Finally, E and F are some corners of the building. And they can be easily found out. Because at corners, wherever you move this patch, it will look different. So they can be considered as a good feature. So now we move into more simpler (and widely used image) for better understanding.
.. image:: images/feature_simple.png
:alt: Features
:align: center
Just like above, blue patch is flat area and difficult to find and track. Wherever you move the blue patch, it looks the same. For black patch, it is an edge. If you move it in vertical direction (i.e. along the gradient) it changes. Put along the edge (parallel to edge), it looks the same. And for red patch, it is a corner. Wherever you move the patch, it looks different, means it is unique. So basically, corners are considered to be good features in an image. (Not just corners, in some cases blobs are considered good features).
So now we answered our question, "what are these features?". But next question arises. How do we find them? Or how do we find the corners?. That also we answered in an intuitive way, i.e., look for the regions in images which have maximum variation when moved (by a small amount) in all regions around it. This would be projected into computer language in coming chapters. So finding these image features is called **Feature Detection**.
So we found the features in image (Assume you did it). Once you found it, you should find the same in the other images. What we do? We take a region around the feature, we explain it in our own words, like "upper part is blue sky, lower part is building region, on that building there are some glasses etc" and you search for the same area in other images. Basically, you are describing the feature. Similar way, computer also should describe the region around the feature so that it can find it in other images. So called description is called **Feature Description**. Once you have the features and its description, you can find same features in all images and align them, stitch them or do whatever you want.
So in this module, we are looking to different algorithms in OpenCV to find features, describe them, match them etc.
Additional Resources
=======================
Exercises
===========

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

@ -0,0 +1,207 @@
.. _Matcher:
Feature Matching
*********************************************
Goal
=====
In this chapter
* We will see how to match features in one image with others.
* We will use the Brute-Force matcher and FLANN Matcher in OpenCV
Basics of Brute-Force Matcher
===================================
Brute-Force matcher is simple. It takes the descriptor of one feature in first set and is matched with all other features in second set using some distance calculation. And the closest one is returned.
For BF matcher, first we have to create the BFMatcher object using **cv2.BFMatcher()**. It takes two optional params. First one is ``normType``. It specifies the distance measurement to be used. By default, it is ``cv2.NORM_L2``. It is good for SIFT, SURF etc (``cv2.NORM_L1`` is also there). For binary string based descriptors like ORB, BRIEF, BRISK etc, ``cv2.NORM_HAMMING`` should be used, which used Hamming distance as measurement. If ORB is using ``VTA_K == 3 or 4``, ``cv2.NORM_HAMMING2`` should be used.
Second param is boolean variable, ``crossCheck`` which is false by default. If it is true, Matcher returns only those matches with value (i,j) such that i-th descriptor in set A has j-th descriptor in set B as the best match and vice-versa. That is, the two features in both sets should match each other. It provides consistant result, and is a good alternative to ratio test proposed by D.Lowe in SIFT paper.
Once it is created, two important methods are *BFMatcher.match()* and *BFMatcher.knnMatch()*. First one returns the best match. Second method returns `k` best matches where k is specified by the user. It may be useful when we need to do additional work on that.
Like we used cv2.drawKeypoints() to draw keypoints, **cv2.drawMatches()** helps us to draw the matches. It stacks two images horizontally and draw lines from first image to second image showing best matches. There is also **cv2.drawMatchesKnn** which draws all the k best matches. If k=2, it will draw two match-lines for each keypoint. So we have to pass a mask if we want to selectively draw it.
Let's see one example for each of SURF and ORB (Both use different distance measurements).
Brute-Force Matching with ORB Descriptors
--------------------------------------------
Here, we will see a simple example on how to match features between two images. In this case, I have a queryImage and a trainImage. We will try to find the queryImage in trainImage using feature matching. ( The images are ``/samples/c/box.png`` and ``/samples/c/box_in_scene.png``)
We are using SIFT descriptors to match features. So let's start with loading images, finding descriptors etc.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
orb = cv2.ORB()
# find the keypoints and descriptors with SIFT
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
Next we create a BFMatcher object with distance measurement ``cv2.NORM_HAMMING`` (since we are using ORB) and ``crossCheck`` is switched on for better results. Then we use Matcher.match() method to get the best matches in two images. We sort them in ascending order of their distances so that best matches (with low distance) come to front. Then we draw only first 10 matches (Just for sake of visibility. You can increase it as you like)
::
# create BFMatcher object
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des1,des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 10 matches.
img3 = cv2.drawMatches(img1,kp1,img2,kp2,matches[:10], flags=2)
plt.imshow(img3),plt.show()
Below is the result I got:
.. image:: images/matcher_result1.jpg
:alt: ORB Feature Matching with Brute-Force
:align: center
What is this Matcher Object?
-----------------------------------
The result of ``matches = bf.match(des1,des2)`` line is a list of DMatch objects. This DMatch object has following attributes:
* ``DMatch.distance`` - Distance between descriptors. The lower, the better it is.
* ``DMatch.trainIdx`` - Index of the descriptor in train descriptors
* ``DMatch.queryIdx`` - Index of the descriptor in query descriptors
* ``DMatch.imgIdx`` - Index of the train image.
Brute-Force Matching with SIFT Descriptors and Ratio Test
-------------------------------------------------------------
This time, we will use ``BFMatcher.knnMatch()`` to get k best matches. In this example, we will take k=2 so that we can apply ratio test explained by D.Lowe in his paper.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good.append([m])
# cv2.drawMatchesKnn expects list of lists as matches.
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,flags=2)
plt.imshow(img3),plt.show()
See the result below:
.. image:: images/matcher_result2.jpg
:alt: SIFT Descriptor with ratio test
:align: center
FLANN based Matcher
==========================
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional features. It works more faster than BFMatcher for large datasets. We will see the second example with FLANN based matcher.
For FLANN based matcher, we need to pass two dictionaries which specifies the algorithm to be used, its related parameters etc. First one is IndexParams. For various algorithms, the information to be passed is explained in FLANN docs. As a summary, for algorithms like SIFT, SURF etc. you can pass following:
::
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
While using ORB, you can pass the following. The commented values are recommended as per the docs, but it didn't provide required results in some cases. Other values worked fine.:
::
index_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) #2
Second dictionary is the SearchParams. It specifies the number of times the trees in the index should be recursively traversed. Higher values gives better precision, but also takes more time. If you want to change the value, pass ``search_params = dict(checks=100)``.
With these informations, we are good to go.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in xrange(len(matches))]
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.7*n.distance:
matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = 0)
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
plt.imshow(img3,),plt.show()
See the result below:
.. image:: images/matcher_flann.jpg
:alt: FLANN based matching
:align: center
Additional Resources
========================
Exercises
=================

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

View File

@ -0,0 +1,75 @@
.. _ORB:
ORB (Oriented FAST and Rotated BRIEF)
***************************************
Goal
======
In this chapter,
* We will see the basics of ORB
Theory
==========
As an OpenCV enthusiast, the most important thing about the ORB is that it came from "OpenCV Labs". This algorithm was brought up by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a good alternative to SIFT and SURF in computation cost, matching performance and mainly the patents. Yes, SIFT and SURF are patented and you are supposed to pay them for its use. But ORB is not !!!
ORB is basically a fusion of FAST keypoint detector and BRIEF descriptor with many modifications to enhance the performance. First it use FAST to find keypoints, then apply Harris corner measure to find top N points among them. It also use pyramid to produce multiscale-features. But one problem is that, FAST doesn't compute the orientation. So what about rotation invariance? Authors came up with following modification.
It computes the intensity weighted centroid of the patch with located corner at center. The direction of the vector from this corner point to centroid gives the orientation. To improve the rotation invariance, moments are computed with x and y which should be in a circular region of radius :math:`r`, where :math:`r` is the size of the patch.
Now for descriptors, ORB use BRIEF descriptors. But we have already seen that BRIEF performs poorly with rotation. So what ORB does is to "steer" BRIEF according to the orientation of keypoints. For any feature set of :math:`n` binary tests at location
:math:`(x_i, y_i)`, define a :math:`2 \times n` matrix, :math:`S` which contains the coordinates of these pixels. Then using the orientation of patch, :math:`\theta`, its rotation matrix is found and rotates the :math:`S` to get steered(rotated) version :math:`S_\theta`.
ORB discretize the angle to increments of :math:`2 \pi /30` (12 degrees), and construct a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation :math:`\theta` is consistent across views, the correct set of points :math:`S_\theta` will be used to compute its descriptor.
BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But once it is oriented along keypoint direction, it loses this property and become more distributed. High variance makes a feature more discriminative, since it responds differentially to inputs. Another desirable property is to have the tests uncorrelated, since then each test will contribute to the result. To resolve all these, ORB runs a greedy search among all possible binary tests to find the ones that have both high variance and means close to 0.5, as well as being uncorrelated. The result is called **rBRIEF**.
For descriptor matching, multi-probe LSH which improves on the traditional LSH, is used. The paper says ORB is much faster than SURF and SIFT and ORB descriptor works better than SURF. ORB is a good choice in low-power devices for panorama stitching etc.
ORB in OpenCV
================
As usual, we have to create an ORB object with the function, **cv2.ORB()** or using feature2d common interface. It has a number of optional parameters. Most useful ones are ``nFeatures`` which denotes maximum number of features to be retained (by default 500), ``scoreType`` which denotes whether Harris score or FAST score to rank the features (by default, Harris score) etc. Another parameter, ``WTA_K`` decides number of points that produce each element of the oriented BRIEF descriptor. By default it is two, ie selects two points at a time. In that case, for matching, ``NORM_HAMMING`` distance is used. If WTA_K is 3 or 4, which takes 3 or 4 points to produce BRIEF descriptor, then matching distance is defined by ``NORM_HAMMING2``.
Below is a simple code which shows the use of ORB.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate STAR detector
orb = cv2.ORB()
# find the keypoints with ORB
kp = orb.detect(img,None)
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
# draw only keypoints location,not size and orientation
img2 = cv2.drawKeypoints(img,kp,color=(0,255,0), flags=0)
plt.imshow(img2),plt.show()
See the result below:
.. image:: images/orb_kp.jpg
:alt: ORB Keypoints
:align: center
ORB feature matching, we will do in another chapter.
Additional Resources
==========================
#. Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to SIFT or SURF. ICCV 2011: 2564-2571.
Exercises
==============

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.5 KiB

View File

@ -0,0 +1,77 @@
.. _shi_tomasi:
Shi-Tomasi Corner Detector & Good Features to Track
*******************************************************
Goal
=======
In this chapter,
* We will learn about the another corner detector: Shi-Tomasi Corner Detector
* We will see the function: **cv2.goodFeaturesToTrack()**
Theory
=========
In last chapter, we saw Harris Corner Detector. Later in 1994, J. Shi and C. Tomasi made a small modification to it in their paper **Good Features to Track** which shows better results compared to Harris Corner Detector. The scoring function in Harris Corner Detector was given by:
.. math::
R = \lambda_1 \lambda_2 - k(\lambda_1+\lambda_2)^2
Instead of this, Shi-Tomasi proposed:
.. math::
R = min(\lambda_1, \lambda_2)
If it is a greater than a threshold value, it is considered as a corner. If we plot it in :math:`\lambda_1 - \lambda_2` space as we did in Harris Corner Detector, we get an image as below:
.. image:: images/shitomasi_space.png
:alt: Shi-Tomasi Corner Space
:align: center
From the figure, you can see that only when :math:`\lambda_1` and :math:`\lambda_2` are above a minimum value, :math:`\lambda_{min}`, it is conidered as a corner(green region).
Code
=======
OpenCV has a function, **cv2.goodFeaturesToTrack()**. It finds N strongest corners in the image by Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, image should be a grayscale image. Then you specify number of corners you want to find. Then you specify the quality level, which is a value between 0-1, which denotes the minimum quality of corner below which everyone is rejected. Then we provide the minimum euclidean distance between corners detected.
With all these informations, the function finds corners in the image. All corners below quality level are rejected. Then it sorts the remaining corners based on quality in the descending order. Then function takes first strongest corner, throws away all the nearby corners in the range of minimum distance and returns N strongest corners.
In below example, we will try to find 25 best corners:
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(gray,25,0.01,10)
corners = np.int0(corners)
for i in corners:
x,y = i.ravel()
cv2.circle(img,(x,y),3,255,-1)
plt.imshow(img),plt.show()
See the result below:
.. image:: images/shitomasi_block1.jpg
:alt: Shi-Tomasi Corners
:align: center
This function is more appropriate for tracking. We will see that when its time comes.
Additional Resources
======================
Exercises
============

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.3 KiB

View File

@ -0,0 +1,139 @@
.. _sift_intro:
Introduction to SIFT (Scale-Invariant Feature Transform)
*************************************************************
Goal
======
In this chapter,
* We will learn about the concepts of SIFT algorithm
* We will learn to find SIFT Keypoints and Descriptors.
Theory
========
In last couple of chapters, we saw some corner detectors like Harris etc. They are rotation-invariant, which means, even if the image is rotated, we can find the same corners. It is obvious because corners remain corners in rotated image also. But what about scaling? A corner may not be a corner if the image is scaled. For example, check a simple image below. A corner in a small image within a small window is flat when it is zoomed in the same window. So Harris corner is not scale invariant.
.. image:: images/sift_scale_invariant.jpg
:alt: Scale-Invariance
:align: center
So, in 2004, **D.Lowe**, University of British Columbia, came up with a new algorithm, Scale Invariant Feature Transform (SIFT) in his paper, **Distinctive Image Features from Scale-Invariant Keypoints**, which extract keypoints and compute its descriptors. *(This paper is easy to understand and considered to be best material available on SIFT. So this explanation is just a short summary of this paper)*.
There are mainly four steps involved in SIFT algorithm. We will see them one-by-one.
1. Scale-space Extrema Detection
--------------------------------------
From the image above, it is obvious that we can't use the same window to detect keypoints with different scale. It is OK with small corner. But to detect larger corners we need larger windows. For this, scale-space filtering is used. In it, Laplacian of Gaussian is found for the image with various :math:`\sigma` values. LoG acts as a blob detector which detects blobs in various sizes due to change in :math:`\sigma`. In short, :math:`\sigma` acts as a scaling parameter. For eg, in the above image, gaussian kernel with low :math:`\sigma` gives high value for small corner while guassian kernel with high :math:`\sigma` fits well for larger corner. So, we can find the local maxima across the scale and space which gives us a list of :math:`(x,y,\sigma)` values which means there is a potential keypoint at (x,y) at :math:`\sigma` scale.
But this LoG is a little costly, so SIFT algorithm uses Difference of Gaussians which is an approximation of LoG. Difference of Gaussian is obtained as the difference of Gaussian blurring of an image with two different :math:`\sigma`, let it be :math:`\sigma` and :math:`k\sigma`. This process is done for different octaves of the image in Gaussian Pyramid. It is represented in below image:
.. image:: images/sift_dog.jpg
:alt: Difference of Gaussian
:align: center
Once this DoG are found, images are searched for local extrema over scale and space. For eg, one pixel in an image is compared with its 8 neighbours as well as 9 pixels in next scale and 9 pixels in previous scales. If it is a local extrema, it is a potential keypoint. It basically means that keypoint is best represented in that scale. It is shown in below image:
.. image:: images/sift_local_extrema.jpg
:alt: Difference of Gaussian
:align: center
Regarding different parameters, the paper gives some empirical data which can be summarized as, number of octaves = 4, number of scale levels = 5, initial :math:`\sigma=1.6`, :math:`k=\sqrt{2}` etc as optimal values.
2. Keypoint Localization
------------------------------------
Once potential keypoints locations are found, they have to be refined to get more accurate results. They used Taylor series expansion of scale space to get more accurate location of extrema, and if the intensity at this extrema is less than a threshold value (0.03 as per the paper), it is rejected. This threshold is called **contrastThreshold** in OpenCV
DoG has higher response for edges, so edges also need to be removed. For this, a concept similar to Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the pricipal curvature. We know from Harris corner detector that for edges, one eigen value is larger than the other. So here they used a simple function,
.. math:
\frac{Tr(H)^2}{Det(H)} < \frac{(r+1)^2}{r} \; \text{where} \; r = \frac{\lambda_1}{\lambda_2}; \; \lambda_1 > \lambda_2
If this ratio is greater than a threshold, called **edgeThreshold** in OpenCV, that keypoint is discarded. It is given as 10 in paper.
So it eliminates any low-contrast keypoints and edge keypoints and what remains is strong interest points.
3. Orientation Assignment
-----------------------------------
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neigbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction is calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created. (It is weighted by gradient magnitude and gaussian-weighted circular window with :math:`\sigma` equal to 1.5 times the scale of keypoint. The highest peak in the histogram is taken and any peak above 80% of it is also considered to calculate the orientation. It creates keypoints with same location and scale, but different directions. It contribute to stability of matching.
4. Keypoint Descriptor
-----------------------------------------
Now keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is devided into 16 sub-blocks of 4x4 size. For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation etc.
5. Keypoint Matching
----------------------------------------
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second closest-match may be very near to the first. It may happen due to noise or some other reasons. In that case, ratio of closest-distance to second-closest distance is taken. If it is greater than 0.8, they are rejected. It eliminaters around 90% of false matches while discards only 5% correct matches, as per the paper.
So this is a summary of SIFT algorithm. For more details and understanding, reading the original paper is highly recommended. Remember one thing, this algorithm is patented. So this algorithm is included in Non-free module in OpenCV.
SIFT in OpenCV
=================
So now let's see SIFT functionalities available in OpenCV. Let's start with keypoint detection and draw them. First we have to construct a SIFT object. We can pass different parameters to it which are optional and they are well explained in docs.
::
import cv2
import numpy as np
img = cv2.imread('home.jpg')
gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT()
kp = sift.detect(gray,None)
img=cv2.drawKeypoints(gray,kp)
cv2.imwrite('sift_keypoints.jpg',img)
**sift.detect()** function finds the keypoint in the images. You can pass a mask if you want to search only a part of image. Each keypoint is a special structure which has many attributes like its (x,y) coordinates, size of the meaningful neighbourhood, angle which specifies its orientation, response that specifies strength of keypoints etc.
OpenCV also provides **cv2.drawKeyPoints()** function which draws the small circles on the locations of keypoints. If you pass a flag, **cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will draw a circle with size of keypoint and it will even show its orientation. See below example.
::
img=cv2.drawKeypoints(gray,kp,flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.jpg',img)
See the two results below:
.. image:: images/sift_keypoints.jpg
:alt: SIFT Keypoints
:align: center
Now to calculate the descriptor, OpenCV provides two methods.
1. Since you already found keypoints, you can call **sift.compute()** which computes the descriptors from the keypoints we have found. Eg: ``kp,des = sift.compute(gray,kp)``
2. If you didn't find keypoints, directly find keypoints and descriptors in a single step with the function, **sift.detectAndCompute()**.
We will see the second method:
::
sift = cv2.SIFT()
kp, des = sift.detectAndCompute(gray,None)
Here kp will be a list of keypoints and des is a numpy array of shape :math:`Number\_of\_Keypoints \times 128`.
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images. That we will learn in coming chapters.
Additional Resources
=====================
Exercises
=============

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.7 KiB

View File

@ -0,0 +1,145 @@
.. _SURF:
Introduction to SURF (Speeded-Up Robust Features)
*****************************************************
Goal
======
In this chapter,
* We will see the basics of SURF
* We will see SURF functionalities in OpenCV
Theory
==========
In last chapter, we saw SIFT for keypoint detection and description. But it was comparatively slow and people needed more speeded-up version. In 2006, three people, Bay, H., Tuytelaars, T. and Van Gool, L, published another paper, "SURF: Speeded Up Robust Features" which introduced a new algorithm called SURF. As name suggests, it is a speeded-up version of SIFT.
In SIFT, Lowe approximated Laplacian of Gaussian with Difference of Gaussian for finding scale-space. SURF goes a little further and approximates LoG with Box Filter. Below image shows a demonstration of such an approximation. One big advantage of this approximation is that, convolution with box filter can be easily calculated with the help of integral images. And it can be done in parallel for different scales. Also the SURF rely on determinant of Hessian matrix for both scale and location.
.. image:: images/surf_boxfilter.jpg
:alt: Box Filter approximation of Laplacian
:align: center
For orientation assignment, SURF uses wavelet responses in horizontal and vertical direction for a neighbourhood of size 6s. Adequate guassian weights are also applied to it. Then they are plotted in a space as given in below image. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of angle 60 degrees. Interesting thing is that, wavelet response can be found out using integral images very easily at any scale. For many applications, rotation invariance is not required, so no need of finding this orientation, which speeds up the process. SURF provides such a functionality called Upright-SURF or U-SURF. It improves speed and is robust upto :math:`\pm 15^{\circ}`. OpenCV supports both, depending upon the flag, **upright**. If it is 0, orientation is calculated. If it is 1, orientation is not calculated and it is more faster.
.. image:: images/surf_orientation.jpg
:alt: Orientation Assignment in SURF
:align: center
For feature description, SURF uses Wavelet responses in horizontal and vertical direction (again, use of integral images makes things easier). A neighbourhood of size 20sX20s is taken around the keypoint where s is the size. It is divided into 4x4 subregions. For each subregion, horizontal and vertical wavelet responses are taken and a vector is formed like this, :math:`v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|})`. This when represented as a vector gives SURF feature descriptor with total 64 dimensions. Lower the dimension, higher the speed of computation and matching, but provide better distinctiveness of features.
For more distinctiveness, SURF feature descriptor has an extended 128 dimension version. The sums of :math:`d_x` and :math:`|d_x|` are computed separately for :math:`d_y < 0` and :math:`d_y \geq 0`. Similarly, the sums of :math:`d_y` and :math:`|d_y|` are split
up according to the sign of :math:`d_x` , thereby doubling the number of features. It doesn't add much computation complexity. OpenCV supports both by setting the value of flag **extended** with 0 and 1 for 64-dim and 128-dim respectively (default is 128-dim)
Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for underlying interest point. It adds no computation cost since it is already computed during detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. In the matching stage, we only compare features if they have the same type of contrast (as shown in image below). This minimal information allows for faster matching, without reducing the descriptor's performance.
.. image:: images/surf_matching.jpg
:alt: Fast Indexing for Matching
:align: center
In short, SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3 times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change.
SURF in OpenCV
====================
OpenCV provides SURF functionalities just like SIFT. You initiate a SURF object with some optional conditions like 64/128-dim descriptors, Upright/Normal SURF etc. All the details are well explained in docs. Then as we did in SIFT, we can use SURF.detect(), SURF.compute() etc for finding keypoints and descriptors.
First we will see a simple demo on how to find SURF keypoints and descriptors and draw it. All examples are shown in Python terminal since it is just same as SIFT only.
::
>>> img = cv2.imread('fly.png',0)
# Create SURF object. You can specify params here or later.
# Here I set Hessian Threshold to 400
>>> surf = cv2.SURF(400)
# Find keypoints and descriptors directly
>>> kp, des = surf.detectAndCompute(img,None)
>>> len(kp)
699
1199 keypoints is too much to show in a picture. We reduce it to some 50 to draw it on an image. While matching, we may need all those features, but not now. So we increase the Hessian Threshold.
::
# Check present Hessian threshold
>>> print surf.hessianThreshold
400.0
# We set it to some 50000. Remember, it is just for representing in picture.
# In actual cases, it is better to have a value 300-500
>>> surf.hessianThreshold = 50000
# Again compute keypoints and check its number.
>>> kp, des = surf.detectAndCompute(img,None)
>>> print len(kp)
47
It is less than 50. Let's draw it on the image.
::
>>> img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
See the result below. You can see that SURF is more like a blob detector. It detects the white blobs on wings of butterfly. You can test it with other images.
.. image:: images/surf_kp1.jpg
:alt: SURF Keypoints with Orientation
:align: center
Now I want to apply U-SURF, so that it won't find the orientation.
::
# Check upright flag, if it False, set it to True
>>> print surf.upright
False
>>> surf.upright = True
# Recompute the feature points and draw it
>>> kp = surf.detect(img,None)
>>> img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
See the results below. All the orientations are shown in same direction. It is more faster than previous. If you are working on cases where orientation is not a problem (like panorama stitching) etc, this is more better.
.. image:: images/surf_kp2.jpg
:alt: Upright-SURF
:align: center
Finally we check the descriptor size and change it to 128 if it is only 64-dim.
::
# Find size of descriptor
>>> print surf.descriptorSize()
64
# That means flag, "extended" is False.
>>> surf.extended
False
# So we make it to True to get 128-dim descriptors.
>>> surf.extended = True
>>> kp, des = surf.detectAndCompute(img,None)
>>> print surf.descriptorSize()
128
>>> print des.shape
(47, 128)
Remaining part is matching which we will do in another chapter.
Additional Resources
=======================
Exercises
==============

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

View File

@ -0,0 +1,170 @@
.. _PY_Table-Of-Content-Feature2D:
Feature Detection and Description
------------------------------------------
* :ref:`Features_Meaning`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_1| What are the main features in an image? How can finding those features be useful to us?
=========== ======================================================
.. |f2d_1| image:: images/features_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`Harris_Corners`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_2| Okay, Corners are good features? But how do we find them?
=========== ======================================================
.. |f2d_2| image:: images/harris_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`shi_tomasi`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_3| We will look into Shi-Tomasi corner detection
=========== ======================================================
.. |f2d_3| image:: images/shi_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`sift_intro`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_4| Harris corner detector is not good enough when scale of image changes. Lowe developed a breakthrough method to find scale-invariant features and it is called SIFT
=========== ======================================================
.. |f2d_4| image:: images/sift_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`SURF`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_5| SIFT is really good, but not fast enough, so people came up with a speeded-up version called SURF.
=========== ======================================================
.. |f2d_5| image:: images/surf_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`FAST`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_06| All the above feature detection methods are good in some way. But they are not fast enough to work in real-time applications like SLAM. There comes the FAST algorithm, which is really "FAST".
=========== ======================================================
.. |f2d_06| image:: images/fast_icon.jpg
:height: 90pt
:width: 90pt
* :ref:`BRIEF`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_07| SIFT uses a feature descriptor with 128 floating point numbers. Consider thousands of such features. It takes lots of memory and more time for matching. We can compress it to make it faster. But still we have to calculate it first. There comes BRIEF which gives the shortcut to find binary descriptors with less memory, faster matching, still higher recognition rate.
=========== ======================================================
.. |f2d_07| image:: images/brief.jpg
:height: 90pt
:width: 90pt
* :ref:`ORB`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_08| SIFT and SURF are good in what they do, but what if you have to pay a few dollars every year to use them in your applications? Yeah, they are patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
=========== ======================================================
.. |f2d_08| image:: images/orb.jpg
:height: 90pt
:width: 90pt
* :ref:`Matcher`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_09| We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
=========== ======================================================
.. |f2d_09| image:: images/matching.jpg
:height: 90pt
:width: 90pt
* :ref:`PY_feature_homography`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|f2d_10| Now we know about feature matching. Let's mix it up with `calib3d` module to find objects in a complex image.
=========== ======================================================
.. |f2d_10| image:: images/homography_icon.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. We use a custom table of content format and as the table of content only informs Sphinx about the hierarchy of the files, no need to show it.
.. toctree::
:hidden:
../py_features_meaning/py_features_meaning
../py_features_harris/py_features_harris
../py_shi_tomasi/py_shi_tomasi
../py_sift_intro/py_sift_intro
../py_surf_intro/py_surf_intro
../py_fast/py_fast
../py_brief/py_brief
../py_orb/py_orb
../py_matcher/py_matcher
../py_feature_homography/py_feature_homography

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

View File

@ -0,0 +1,106 @@
.. _Drawing_Functions:
Drawing Functions in OpenCV
******************************
Goal
=====
.. container:: enumeratevisibleitemswithsquare
* Learn to draw different geometric shapes with OpenCV
* You will learn these functions : **cv2.line()**, **cv2.circle()** , **cv2.rectangle()**, **cv2.ellipse()**, **cv2.putText()** etc.
Code
=====
In all the above functions, you will see some common arguments as given below:
* img : The image where you want to draw the shapes
* color : Color of the shape. for BGR, pass it as a tuple, eg: ``(255,0,0)`` for blue. For grayscale, just pass the scalar value.
* thickness : Thickness of the line or circle etc. If **-1** is passed for closed figures like circles, it will fill the shape. *default thickness = 1*
* lineType : Type of line, whether 8-connected, anti-aliased line etc. *By default, it is 8-connected.* ``cv2.LINE_AA`` gives anti-aliased line which looks great for curves.
Drawing Line
-------------
To draw a line, you need to pass starting and ending coordinates of line. We will create a black image and draw a blue line on it from top-left to bottom-right corners.
::
import numpy as np
import cv2
# Create a black image
img = np.zeros((512,512,3), np.uint8)
# Draw a diagonal blue line with thickness of 5 px
img = cv2.line(img,(0,0),(511,511),(255,0,0),5)
Drawing Rectangle
-------------------
To draw a rectangle, you need top-left corner and bottom-right corner of rectangle. This time we will draw a green rectangle at the top-right corner of image.
::
img = cv2.rectangle(img,(384,0),(510,128),(0,255,0),3)
Drawing Circle
----------------
To draw a circle, you need its center coordinates and radius. We will draw a circle inside the rectangle drawn above.
::
img = cv2.circle(img,(447,63), 63, (0,0,255), -1)
Drawing Ellipse
--------------------
To draw the ellipse, we need to pass several arguments. One argument is the center location (x,y). Next argument is axes lengths (major axis length, minor axis length). ``angle`` is the angle of rotation of ellipse in anti-clockwise direction. ``startAngle`` and ``endAngle`` denotes the starting and ending of ellipse arc measured in clockwise direction from major axis. i.e. giving values 0 and 360 gives the full ellipse. For more details, check the documentation of **cv2.ellipse()**. Below example draws a half ellipse at the center of the image.
::
img = cv2.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
Drawing Polygon
------------------
To draw a polygon, first you need coordinates of vertices. Make those points into an array of shape ``ROWSx1x2`` where ROWS are number of vertices and it should be of type ``int32``. Here we draw a small polygon of with four vertices in yellow color.
::
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1,1,2))
img = cv2.polylines(img,[pts],True,(0,255,255))
.. Note:: If third argument is ``False``, you will get a polylines joining all the points, not a closed shape.
.. Note:: ``cv2.polylines()`` can be used to draw multiple lines. Just create a list of all the lines you want to draw and pass it to the function. All lines will be drawn individually. It is more better and faster way to draw a group of lines than calling ``cv2.line()`` for each line.
Adding Text to Images:
------------------------
To put texts in images, you need specify following things.
* Text data that you want to write
* Position coordinates of where you want put it (i.e. bottom-left corner where data starts).
* Font type (Check **cv2.putText()** docs for supported fonts)
* Font Scale (specifies the size of font)
* regular things like color, thickness, lineType etc. For better look, ``lineType = cv2.LINE_AA`` is recommended.
We will write **OpenCV** on our image in white color.
::
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv2.LINE_AA)
Result
----------
So it is time to see the final result of our drawing. As you studied in previous articles, display the image to see it.
.. image:: images/drawing.jpg
:alt: Drawing Functions in OpenCV
:align: center
Additional Resources
========================
1. The angles used in ellipse function is not our circular angles. For more details, visit `this discussion <http://answers.opencv.org/question/14541/angles-in-ellipse-function/>`_.
Exercises
==============
#. Try to create the logo of OpenCV using drawing functions available in OpenCV

Binary file not shown.

After

Width:  |  Height:  |  Size: 27 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

View File

@ -0,0 +1,139 @@
.. _PY_Display_Image:
Getting Started with Images
*****************************
Goals
======
.. container:: enumeratevisibleitemswithsquare
* Here, you will learn how to read an image, how to display it and how to save it back
* You will learn these functions : **cv2.imread()**, **cv2.imshow()** , **cv2.imwrite()**
* Optionally, you will learn how to display images with Matplotlib
Using OpenCV
=============
Read an image
--------------
Use the function **cv2.imread()** to read an image. The image should be in the working directory or a full path of image should be given.
Second argument is a flag which specifies the way image should be read.
* cv2.CV_LOAD_IMAGE_COLOR : Loads a color image. Any transparency of image will be neglected. It is the default flag.
* cv2.CV_LOAD_IMAGE_GRAYSCALE : Loads image in grayscale mode
* cv2.CV_LOAD_IMAGE_UNCHANGED : Loads image as such including alpha channel
.. note:: Instead of these three flags, you can simply pass integers 1, 0 or -1 respectively.
See the code below:
::
import numpy as np
import cv2
# Load an color image in grayscale
img = cv2.imread('messi5.jpg',0)
.. warning:: Even if the image path is wrong, it won't throw any error, but ``print img`` will give you ``None``
Display an image
-----------------
Use the function **cv2.imshow()** to display an image in a window. The window automatically fits to the image size.
First argument is a window name which is a string. second argument is our image. You can create as many windows as you wish, but with different window names.
::
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
A screenshot of the window will look like this (in Fedora-Gnome machine):
.. image:: images/opencv_screenshot.jpg
:alt: Screenshot of Image Window in OpenCV
:align: center
**cv2.waitKey()** is a keyboard binding function. Its argument is the time in milliseconds. The function waits for specified milliseconds for any keyboard event. If you press any key in that time, the program continues. If **0** is passed, it waits indefinitely for a key stroke. It can also be set to detect specific key strokes like, if key `a` is pressed etc which we will discuss below.
**cv2.destroyAllWindows()** simply destroys all the windows we created. If you want to destroy any specific window, use the function **cv2.destroyWindow()** where you pass the exact window name as the argument.
.. note:: There is a special case where you can already create a window and load image to it later. In that case, you can specify whether window is resizable or not. It is done with the function **cv2.namedWindow()**. By default, the flag is ``cv2.WINDOW_AUTOSIZE``. But if you specify flag to be ``cv2.WINDOW_NORMAL``, you can resize window. It will be helpful when image is too large in dimension and adding track bar to windows.
See the code below:
::
cv2.namedWindow('image', cv2.WINDOW_NORMAL)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Write an image
---------------
Use the function **cv2.imwrite()** to save an image.
First argument is the file name, second argument is the image you want to save.
::
cv2.imwrite('messigray.png',img)
This will save the image in PNG format in the working directory.
Sum it up
---------------
Below program loads an image in grayscale, displays it, save the image if you press 's' and exit, or simply exit without saving if you press `ESC` key.
::
import numpy as np
import cv2
img = cv2.imread('messi5.jpg',0)
cv2.imshow('image',img)
k = cv2.waitKey(0)
if k == 27: # wait for ESC key to exit
cv2.destroyAllWindows()
elif k == ord('s'): # wait for 's' key to save and exit
cv2.imwrite('messigray.png',img)
cv2.destroyAllWindows()
.. warning:: If you are using a 64-bit machine, you will have to modify ``k = cv2.waitKey(0)`` line as follows : ``k = cv2.waitKey(0) & 0xFF``
Using Matplotlib
=================
Matplotlib is a plotting library for Python which gives you wide variety of plotting methods. You will see them in coming articles. Here, you will learn how to display image with Matplotlib. You can zoom images, save it etc using Matplotlib.
::
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
plt.imshow(img, cmap = 'gray', interpolation = 'bicubic')
plt.xticks([]), plt.yticks([]) # to hide tick values on X and Y axis
plt.show()
A screen-shot of the window will look like this :
.. image:: images/matplotlib_screenshot.jpg
:alt: Screenshot of Image Window in Matplotlib
:align: center
.. seealso:: Plenty of plotting options are available in Matplotlib. Please refer to Matplotlib docs for more details. Some, we will see on the way.
.. warning:: Color image loaded by OpenCV is in BGR mode. But Matplotlib displays in RGB mode. So color images will not be displayed correctly in Matplotlib if image is read with OpenCV. Please see the exercises for more details.
Additional Resources
======================
#. `Matplotlib Plotting Styles and Features <http://matplotlib.org/api/pyplot_api.html>`_
Exercises
==========
#. There is some problem when you try to load color image in OpenCV and display it in Matplotlib. Read `this discussion <http://stackoverflow.com/a/15074748/1134940>`_ and understand it.

View File

@ -0,0 +1,108 @@
.. _Mouse_Handling:
Mouse as a Paint-Brush
***********************
Goal
======
.. container:: enumeratevisibleitemswithsquare
* Learn to handle mouse events in OpenCV
* You will learn these functions : **cv2.setMouseCallback()**
Simple Demo
=============
Here, we create a simple application which draws a circle on an image wherever we double-click on it.
First we create a mouse callback function which is executed when a mouse event take place. Mouse event can be anything related to mouse like left-button down, left-button up, left-button double-click etc. It gives us the coordinates (x,y) for every mouse event. With this event and location, we can do whatever we like. To list all available events available, run the following code in Python terminal:
::
>>> import cv2
>>> events = [i for i in dir(cv2) if 'EVENT' in i]
>>> print events
Creating mouse callback function has a specific format which is same everywhere. It differs only in what the function does. So our mouse callback function does one thing, it draws a circle where we double-click. So see the code below. Code is self-explanatory from comments :
::
import cv2
import numpy as np
# mouse callback function
def draw_circle(event,x,y,flags,param):
if event == cv2.EVENT_LBUTTONDBLCLK:
cv2.circle(img,(x,y),100,(255,0,0),-1)
# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv2.namedWindow('image')
cv2.setMouseCallback('image',draw_circle)
while(1):
cv2.imshow('image',img)
if cv2.waitKey(20) & 0xFF == 27:
break
cv2.destroyAllWindows()
More Advanced Demo
===================
Now we go for much more better application. In this, we draw either rectangles or circles (depending on the mode we select) by dragging the mouse like we do in Paint application. So our mouse callback function has two parts, one to draw rectangle and other to draw the circles. This specific example will be really helpful in creating and understanding some interactive applications like object tracking, image segmentation etc.
::
import cv2
import numpy as np
drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to curve
ix,iy = -1,-1
# mouse callback function
def draw_circle(event,x,y,flags,param):
global ix,iy,drawing,mode
if event == cv2.EVENT_LBUTTONDOWN:
drawing = True
ix,iy = x,y
elif event == cv2.EVENT_MOUSEMOVE:
if drawing == True:
if mode == True:
cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv2.circle(img,(x,y),5,(0,0,255),-1)
elif event == cv2.EVENT_LBUTTONUP:
drawing = False
if mode == True:
cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv2.circle(img,(x,y),5,(0,0,255),-1)
Next we have to bind this mouse callback function to OpenCV window. In the main loop, we should set a keyboard binding for key 'm' to toggle between rectangle and circle.
::
img = np.zeros((512,512,3), np.uint8)
cv2.namedWindow('image')
cv2.setMouseCallback('image',draw_circle)
while(1):
cv2.imshow('image',img)
k = cv2.waitKey(1) & 0xFF
if k == ord('m'):
mode = not mode
elif k == 27:
break
cv2.destroyAllWindows()
Additional Resources
========================
Exercises
==========
#. In our last example, we drew filled rectangle. You modify the code to draw an unfilled rectangle.

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.3 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.5 KiB

View File

@ -0,0 +1,89 @@
.. _PY_Table-Of-Content-Gui:
Gui Features in OpenCV
-----------------------------------------------------------
* :ref:`PY_Display_Image`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|gui_1| Learn to load an image, display it and save it back
=========== ======================================================
.. |gui_1| image:: images/image_display.jpg
:height: 90pt
:width: 90pt
* :ref:`Display_Video`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|gui_2| Learn to play videos, capture videos from Camera and write it as a video
=========== ======================================================
.. |gui_2| image:: images/video_display.jpg
:height: 90pt
:width: 90pt
* :ref:`Drawing_Functions`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|gui_5| Learn to draw lines, rectangles, ellipses, circles etc with OpenCV
=========== ======================================================
.. |gui_5| image:: images/drawing.jpg
:height: 90pt
:width: 90pt
* :ref:`Mouse_Handling`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|gui_3| Draw stuffs with your mouse
=========== ======================================================
.. |gui_3| image:: images/mouse_drawing.jpg
:height: 90pt
:width: 90pt
* :ref:`Trackbar`
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=========== ======================================================
|gui_4| Create trackbar to control certain parameters
=========== ======================================================
.. |gui_4| image:: images/trackbar.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. We use a custom table of content format and as the table of content only informs Sphinx about the hierarchy of the files, no need to show it.
.. toctree::
:hidden:
../py_image_display/py_image_display
../py_video_display/py_video_display
../py_drawing_functions/py_drawing_functions
../py_mouse_handling/py_mouse_handling
../py_trackbar/py_trackbar

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Some files were not shown because too many files have changed in this diff Show More