Doxygen tutorials: python basic
doc/py_tutorials/py_ml/py_kmeans/py_kmeans_index.markdown
Normal file
@@ -0,0 +1,10 @@
K-Means Clustering {#tutorial_py_kmeans_index}
==================

- @subpage tutorial_py_kmeans_understanding

    Read to get an intuitive understanding of K-Means Clustering

- @subpage tutorial_py_kmeans_opencv

    Now let's try K-Means functions in OpenCV
@@ -0,0 +1,201 @@
K-Means Clustering in OpenCV {#tutorial_py_kmeans_opencv}
============================

Goal
----

- Learn to use the **cv2.kmeans()** function in OpenCV for data clustering

Understanding Parameters
------------------------

### Input parameters

-# **samples** : It should be of **np.float32** data type, and each feature should be put in a
    single column.
-# **nclusters(K)** : Number of clusters required at the end.
-# **criteria** : It is the iteration termination criteria. When this criteria is satisfied, the
    algorithm iteration stops. Actually, it should be a tuple of 3 parameters,
    `( type, max_iter, epsilon )`:
    - 3.a - type of termination criteria. It has 3 flags as below:
        - **cv2.TERM_CRITERIA_EPS** - stop the algorithm iteration if the specified accuracy, *epsilon*, is reached.
        - **cv2.TERM_CRITERIA_MAX_ITER** - stop the algorithm after the specified number of iterations, *max_iter*.
        - **cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER** - stop the iteration when any of the above conditions is met.
    - 3.b - max_iter - An integer specifying the maximum number of iterations.
    - 3.c - epsilon - Required accuracy.
-# **attempts** : Flag to specify the number of times the algorithm is executed using different
    initial labellings. The algorithm returns the labels that yield the best compactness. This
    compactness is returned as output.
-# **flags** : This flag is used to specify how initial centers are taken. Normally two flags are
    used for this: **cv2.KMEANS_PP_CENTERS** and **cv2.KMEANS_RANDOM_CENTERS**.

### Output parameters

-# **compactness** : It is the sum of squared distances from each point to its corresponding
    center.
-# **labels** : This is the label array (same as 'code' in the previous article) where each element
    is marked '0', '1', and so on.
-# **centers** : This is the array of centers of clusters.

Now we will see how to apply the K-Means algorithm with three examples.

1. Data with Only One Feature
-----------------------------

Consider you have a set of data with only one feature, i.e. one-dimensional. For example, we can
take our t-shirt problem where you use only the height of people to decide the size of the t-shirt.

So we start by creating the data and plotting it in Matplotlib.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt

x = np.random.randint(25,100,25)      # 25 values in the lower range
y = np.random.randint(175,255,25)     # 25 values in the upper range
z = np.hstack((x,y))                  # combine into one array of 50 values
z = z.reshape((50,1))                 # make it a column vector
z = np.float32(z)                     # cv2.kmeans() needs np.float32 data
plt.hist(z,256,[0,256]),plt.show()
@endcode
So we have 'z', which is an array of size 50 with values ranging from 0 to 255. I have reshaped 'z'
to a column vector. This will be more useful when more than one feature is present. Then I converted
the data to np.float32 type.

We get the following image:



Now we apply the KMeans function. Before that we need to specify the criteria. My criteria is such
that, whenever 10 iterations of the algorithm are run, or an accuracy of epsilon = 1.0 is reached,
the algorithm stops and returns the answer.
@code{.py}
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans
compactness,labels,centers = cv2.kmeans(z,2,None,criteria,10,flags)
@endcode
This gives us the compactness, labels and centers. In this case, I got centers of 60 and 207. Labels
will have the same size as the test data, where each element is labelled '0', '1', '2', etc.
depending on its centroid. Now we split the data into different clusters depending on their labels.
@code{.py}
A = z[labels==0]
B = z[labels==1]
@endcode
Now we plot A in red, B in blue, and their centroids in yellow.
@code{.py}
# Now plot 'A' in red, 'B' in blue, 'centers' in yellow
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'y')
plt.show()
@endcode
Below is the output we got:



2. Data with Multiple Features
------------------------------

In the previous example, we took only height for the t-shirt problem. Here, we will take both height
and weight, i.e. two features.

Remember, in the previous case, we made our data into a single column vector. Each feature is
arranged in a column, while each row corresponds to an input test sample.

For example, in this case, we set a test data of size 50x2, which is the heights and weights of 50
people. The first column corresponds to the heights of all 50 people and the second column to their
weights. The first row contains two elements, where the first one is the height of the first person
and the second one his weight. Similarly, the remaining rows correspond to the heights and weights
of the other people. Check the image below:



Now I move directly to the code:
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt

X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center=cv2.kmeans(Z,2,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

# Now separate the data, note the ravel()
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]

# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()
@endcode
Below is the output we get:



3. Color Quantization
---------------------

Color quantization is the process of reducing the number of colors in an image. One reason to do so
is to reduce memory usage. Sometimes, some devices may have limitations such that they can produce
only a limited number of colors. In those cases also, color quantization is performed. Here we use
k-means clustering for color quantization.

There is nothing new to be explained here. There are 3 features, say R, G, B. So we need to reshape
the image to an array of Mx3 size (M is the number of pixels in the image). And after the clustering,
we apply the centroid values (they are also R, G, B) to all pixels, such that the resulting image will
have the specified number of colors. And again we need to reshape it back to the shape of the
original image. Below is the code:
@code{.py}
import numpy as np
import cv2

img = cv2.imread('home.jpg')
Z = img.reshape((-1,3))

# convert to np.float32
Z = np.float32(Z)

# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))

cv2.imshow('res2',res2)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
See the result below for K=8:



Additional Resources
--------------------

Exercises
---------
@@ -0,0 +1,85 @@
Understanding K-Means Clustering {#tutorial_py_kmeans_understanding}
================================

Goal
----

In this chapter, we will understand the concepts of K-Means Clustering, how it works, etc.

Theory
------

We will deal with this using an example which is commonly used.

### T-shirt size problem

Consider a company which is going to release a new model of T-shirt to the market. Obviously they
will have to manufacture models in different sizes to satisfy people of all sizes. So the company
makes a chart of people's heights and weights, and plots them on a graph, as below:



The company can't create t-shirts for every possible size. Instead, they divide people into Small,
Medium and Large, and manufacture only these 3 models which will fit all the people. This grouping of
people into three groups can be done by k-means clustering, and the algorithm provides us the best 3
sizes, which will satisfy all the people. And if it doesn't, the company can divide people into more
groups, maybe five, and so on. Check the image below:


### How does it work?

This algorithm is an iterative process. We will explain it step-by-step with the help of images.

Consider a set of data as below (you can consider it as the t-shirt problem). We need to cluster this
data into two groups.



**Step 1:** The algorithm randomly chooses two centroids, \f$C1\f$ and \f$C2\f$ (sometimes, any two
data points are taken as the centroids).

**Step 2:** It calculates the distance from each point to both centroids. If a data point is closer
to \f$C1\f$, then that point is labelled '0'. If it is closer to \f$C2\f$, then it is labelled '1'
(if there are more centroids, they are labelled '2', '3', etc).

In our case, we will color all the '0'-labelled points red and the '1'-labelled points blue. So we
get the following image after the above operations.



**Step 3:** Next we calculate the average of all the blue points and of all the red points
separately, and those will be our new centroids. That is, \f$C1\f$ and \f$C2\f$ shift to the newly
calculated centroids. (Remember, the images shown are not true values and are not to true scale, they
are just for demonstration.)

And again, perform Step 2 with the new centroids and label the data '0' and '1'.

So we get the result as below:



Now **Step 2** and **Step 3** are iterated until both centroids converge to fixed points.
*(Or the iteration may be stopped depending on the criteria we provide, like a maximum number of
iterations, or a specific accuracy being reached, etc.)* **These points are such that the sum of
distances between the data points and their corresponding centroids is minimum**. Or simply, the sum
of distances between \f$C1 \leftrightarrow Red\_Points\f$ and \f$C2 \leftrightarrow Blue\_Points\f$
is minimum.

\f[minimize \;\bigg[J = \sum_{All\: Red\_Points}distance(C1,Red\_Point) + \sum_{All\: Blue\_Points}distance(C2,Blue\_Point)\bigg]\f]

The final result looks approximately like this:



So this is just an intuitive understanding of K-Means Clustering. For more details and a mathematical
explanation, please read any standard machine learning textbook or check the links in the additional
resources. This is just the top layer of K-Means clustering. There are a lot of modifications to this
algorithm, like how to choose the initial centroids, how to speed up the iteration process, etc.
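
If it helps to see the two steps as code, here is a minimal NumPy sketch of the iteration, assuming
plain 1-D data (the helper name `kmeans_step` and the sample heights are made up for illustration;
this is not the OpenCV implementation):
@code{.py}
import numpy as np

def kmeans_step(data, centroids):
    # Step 2: label each point with the index of its nearest centroid
    dists = np.abs(data[:, None] - centroids[None, :])
    labels = np.argmin(dists, axis=1)
    # Step 3: move each centroid to the mean of the points assigned to it
    new_centroids = np.array([data[labels == k].mean() for k in range(len(centroids))])
    return labels, new_centroids

heights = np.float32([150, 155, 162, 171, 178, 185])
centroids = np.float32([150, 155])        # arbitrary initial centroids
for _ in range(10):                       # iterate Steps 2 and 3
    labels, centroids = kmeans_step(heights, centroids)
print(labels)
print(centroids)
@endcode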

Additional Resources
--------------------

-# [Machine Learning Course](https://www.coursera.org/course/ml), Video lectures by Prof. Andrew Ng
    (Some of the images are taken from this)

Exercises
---------
doc/py_tutorials/py_ml/py_knn/py_knn_index.markdown
Normal file
@@ -0,0 +1,10 @@
K-Nearest Neighbour {#tutorial_py_knn_index}
===================

- @subpage tutorial_py_knn_understanding

    Get a basic understanding of what kNN is

- @subpage tutorial_py_knn_opencv

    Now let's use kNN in OpenCV for digit recognition OCR
@@ -0,0 +1,121 @@
OCR of Hand-written Data using kNN {#tutorial_py_knn_opencv}
==================================

Goal
----

In this chapter
- We will use our knowledge of kNN to build a basic OCR application.
- We will try it with the digits and alphabets data that comes with OpenCV.

OCR of Hand-written Digits
--------------------------

Our goal is to build an application which can read handwritten digits. For this we need some
train_data and test_data. OpenCV comes with an image digits.png (in the folder
opencv/samples/python2/data/) which has 5000 handwritten digits (500 for each digit). Each digit is
a 20x20 image. So our first step is to split this image into 5000 different digits. For each digit,
we flatten it into a single row with 400 pixels. That is our feature set, i.e. the intensity values
of all pixels. It is the simplest feature set we can create. We use the first 250 samples of each
digit as train_data, and the next 250 samples as test_data. So let's prepare them first.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

# Make it into a Numpy array. Its size will be (50,100,20,20)
x = np.array(cells)

# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()

# Initiate kNN, train the data, then test it with test data for k=5
knn = cv2.KNearest()
knn.train(train,train_labels)
ret,result,neighbours,dist = knn.find_nearest(test,k=5)

# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print accuracy
@endcode
So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option to
improve accuracy is to add more data for training, especially the wrongly classified ones. So
instead of building this training data every time I start the application, I had better save it, so
that next time I can directly read this data from a file and start classification. You can do this
with the help of some NumPy functions like np.savetxt, np.savez, np.load, etc. Please check their
docs for more details.
@code{.py}
# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)

# Now load the data
with np.load('knn_data.npz') as data:
    print data.files
    train = data['train']
    train_labels = data['train_labels']
@endcode
In my system, it takes around 4.4 MB of memory. Since we are using intensity values (uint8 data) as
features, it would be better to convert the data to np.uint8 first and then save it. It takes only
1.1 MB in this case. Then while loading, you can convert it back into float32.
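
For instance, a small sketch of that idea might look like the following (the file name
knn_data_uint8.npz is just a placeholder):
@code{.py}
# save as uint8 to keep the file small
np.savez('knn_data_uint8.npz',
         train=np.uint8(train), train_labels=np.uint8(train_labels))

# load and convert back to float32 before feeding it to kNN
with np.load('knn_data_uint8.npz') as data:
    train = data['train'].astype(np.float32)
    train_labels = data['train_labels'].astype(np.float32)
@endcode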

OCR of English Alphabets
------------------------

Next we will do the same for the English alphabet, but there is a slight change in the data and
feature set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data, in the
opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, at first sight, look
like garbage. Actually, in each row, the first column is a letter, which is our label. The next 16
numbers following it are its different features. These features are obtained from the [UCI Machine
Learning Repository](http://archive.ics.uci.edu/ml/). You can find the details of these features on
[this page](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition).

There are 20000 samples available, so we take the first 10000 as training samples and the remaining
10000 as test samples. We should change the letters to ASCII codes, because we can't work with
letters directly.
@code{.py}
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the data, converters convert the letter to a number
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                    converters= {0: lambda ch: ord(ch)-ord('A')})

# split the data to two, 10000 each for train and test
train, test = np.vsplit(data,2)

# split trainData and testData to features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])

# Initiate the kNN, classify, measure accuracy.
knn = cv2.KNearest()
knn.train(trainData, responses)
ret, result, neighbours, dist = knn.find_nearest(testData, k=5)

correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print accuracy
@endcode
It gives me an accuracy of 93.22%. Again, if you want to increase accuracy, you can iteratively add
the misclassified data to the training set at each stage.

Additional Resources
--------------------

Exercises
---------
@@ -0,0 +1,153 @@
Understanding k-Nearest Neighbour {#tutorial_py_knn_understanding}
=================================

Goal
----

In this chapter, we will understand the concepts of the k-Nearest Neighbour (kNN) algorithm.

Theory
------

kNN is one of the simplest classification algorithms available for supervised learning. The idea is
to search for the closest match of the test data in the feature space. We will look into it with the
image below.



In the image, there are two families, Blue Squares and Red Triangles. We call each family a
**Class**. Their houses are shown in their town map, which we call the feature space. *(You can
consider a feature space as a space where all data are projected. For example, consider a 2D
coordinate space. Each data point has two features, its x and y coordinates. You can represent this
data in your 2D coordinate space, right? Now imagine there are three features: you need 3D space.
Now consider N features, where you need N-dimensional space, right? This N-dimensional space is its
feature space. In our image, you can consider it a 2D case with two features)*.

Now a new member comes into the town and creates a new home, which is shown as the green circle. He
should be added to one of these Blue/Red families. We call that process **Classification**. What do
we do? Since we are dealing with kNN, let us apply this algorithm.

One method is to check who his nearest neighbour is. From the image, it is clear that it is the Red
Triangle family. So he is also added to the Red Triangles. This method is simply called **Nearest
Neighbour**, because classification depends only on the nearest neighbour.

But there is a problem with that. The Red Triangle may be the nearest. But what if there are a lot of
Blue Squares near him? Then Blue Squares have more strength in that locality than the Red Triangles.
So just checking the nearest one is not sufficient. Instead we check some k nearest families. Then
whichever family is in the majority among them, the new guy belongs to that family. In our image,
let's take k=3, i.e. the 3 nearest families. He has two Reds and one Blue (there are two Blues
equidistant, but since k=3, we take only one of them), so again he should be added to the Red family.
But what if we take k=7? Then he has 5 Blue families and 2 Red families. Great!! Now he should be
added to the Blue family. So it all changes with the value of k. An even funnier thing: what if
k = 4? He has 2 Red and 2 Blue neighbours. It is a tie!!! So it is better to take k as an odd number.
This method is called **k-Nearest Neighbour**, since classification depends on the k nearest
neighbours.

Again, in kNN, it is true we are considering k neighbours, but we are giving equal importance to all
of them, right? Is that fair? For example, take the case of k=4. We said it is a tie. But see, the 2
Red families are closer to him than the other 2 Blue families. So he is more eligible to be added to
Red. So how do we explain that mathematically? We give some weight to each family depending on its
distance to the new-comer. Those who are near to him get higher weights, while those far away get
lower weights. Then we add up the total weights of each family separately. Whichever family gets the
highest total weight, the new-comer goes to that family. This is called **modified kNN**.
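
As a rough illustration of that weighted voting (this is only a sketch; the helper name weighted_knn
and the toy coordinates are made up, not part of OpenCV):
@code{.py}
import numpy as np

def weighted_knn(train, labels, newcomer, k=4):
    # distances from the new-comer to every known house
    d = np.sqrt(((train - newcomer) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    # closer neighbours get larger weights (inverse distance)
    w = 1.0 / (d[nearest] + 1e-6)
    votes = np.bincount(labels[nearest], weights=w)
    return np.argmax(votes)          # family with the highest total weight

train = np.float32([[1, 1], [2, 1], [8, 8], [9, 9]])
labels = np.array([0, 0, 1, 1])      # 0 = Red, 1 = Blue
family = weighted_knn(train, labels, np.float32([2, 2]), k=4)
print family                         # -> 0, the Red family wins
@endcode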

So what are some important things you see here?

- You need to have information about all the houses in town, right? Because we have to check the
    distance from the new-comer to all the existing houses to find the nearest neighbour. If there
    are plenty of houses and families, it takes a lot of memory, and also more time for computation.
- There is almost zero time for any kind of training or preparation.

Now let's see it in OpenCV.

kNN in OpenCV
-------------

We will do a simple example here, with two families (classes), just like above. Then in the next
chapter, we will do an even better example.

So here, we label the Red family as **Class-0** (denoted by 0) and the Blue family as **Class-1**
(denoted by 1). We create 25 families, or 25 training samples, and label them either Class-0 or
Class-1. We do all this with the help of the random number generator in NumPy.

Then we plot it with the help of Matplotlib. Red families are shown as red triangles and Blue
families are shown as blue squares.
@code{.py}
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0,100,(25,2)).astype(np.float32)

# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0,2,(25,1)).astype(np.float32)

# Take Red families and plot them
red = trainData[responses.ravel()==0]
plt.scatter(red[:,0],red[:,1],80,'r','^')

# Take Blue families and plot them
blue = trainData[responses.ravel()==1]
plt.scatter(blue[:,0],blue[:,1],80,'b','s')

plt.show()
@endcode
You will get something similar to our first image. Since you are using a random number generator, you
will get different data each time you run the code.

Next, initiate the kNN algorithm and pass the trainData and responses to train the kNN (it constructs
a search tree).

Then we will bring one new-comer and classify him into a family with the help of kNN in OpenCV.
Before going to kNN, we need to know something about our test data (the data of the new-comers). Our
data should be a floating point array with size
\f$number \; of \; testdata \times number \; of \; features\f$. Then we find the nearest neighbours
of the new-comer. We can specify how many neighbours we want. It returns:

-# The label given to the new-comer depending upon the kNN theory we saw earlier. If you want the
    Nearest Neighbour algorithm, just specify k=1, where k is the number of neighbours.
-# The labels of the k nearest neighbours.
-# The corresponding distances from the new-comer to each nearest neighbour.

So let's see how it works. The new-comer is marked in green.
@code{.py}
newcomer = np.random.randint(0,100,(1,2)).astype(np.float32)
plt.scatter(newcomer[:,0],newcomer[:,1],80,'g','o')

knn = cv2.KNearest()
knn.train(trainData,responses)
ret, results, neighbours ,dist = knn.find_nearest(newcomer, 3)

print "result: ", results,"\n"
print "neighbours: ", neighbours,"\n"
print "distance: ", dist

plt.show()
@endcode
I got the result as follows:
@code{.py}
result:  [[ 1.]]
neighbours:  [[ 1.  1.  1.]]
distance:  [[ 53.  58.  61.]]
@endcode
It says our new-comer got 3 neighbours, all from the Blue family. Therefore, he is labelled as part
of the Blue family. It is obvious from the plot below:



If you have a large amount of data, you can just pass it as an array. The corresponding results are
also obtained as arrays.
@code{.py}
# 10 new-comers
newcomers = np.random.randint(0,100,(10,2)).astype(np.float32)
ret, results,neighbours,dist = knn.find_nearest(newcomers, 3)
# The results will also contain 10 labels.
@endcode

Additional Resources
--------------------

-# [NPTEL notes on Pattern Recognition, Chapter 11](http://www.nptel.iitm.ac.in/courses/106108057/12)

Exercises
---------
@@ -0,0 +1,131 @@
Understanding SVM {#tutorial_py_svm_basics}
=================

Goal
----

In this chapter
- We will see an intuitive understanding of SVM

Theory
------

### Linearly Separable Data

Consider the image below, which has two types of data, red and blue. In kNN, for a test sample we
used to measure its distance to all the training samples and take the one with the minimum distance.
It takes plenty of time to measure all the distances and plenty of memory to store all the training
samples. But considering the data given in the image, do we need that much?



Consider another idea. We find a line, \f$f(x)=ax_1+bx_2+c\f$, which divides the data into two
regions. When we get a new test_data \f$X\f$, just substitute it into \f$f(x)\f$. If \f$f(X) > 0\f$,
it belongs to the blue group, else it belongs to the red group. We can call this line the **Decision
Boundary**. It is very simple and memory-efficient. Such data, which can be divided into two with a
straight line (or hyperplanes in higher dimensions), is called **Linearly Separable**.
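
As a toy sketch of that decision rule (the coefficients a, b, c below are made-up values, not learned
by any SVM):
@code{.py}
# Hypothetical decision boundary f(x) = a*x1 + b*x2 + c
a, b, c = 1.0, 1.0, -10.0

def classify(x1, x2):
    # the sign of f(x) tells us which side of the line the point is on
    return 'blue' if a*x1 + b*x2 + c > 0 else 'red'

print classify(8, 7)   # 8 + 7 - 10 > 0  -> 'blue'
print classify(2, 1)   # 2 + 1 - 10 < 0  -> 'red'
@endcode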

So in the above image you can see that plenty of such lines are possible. Which one will we take?
Very intuitively, we can say that the line should pass as far as possible from all the points. Why?
Because there can be noise in the incoming data, and this data should not affect the classification
accuracy. So taking the farthest line will provide more immunity against noise. Therefore, what SVM
does is find a straight line (or hyperplane) with the largest minimum distance to the training
samples. See the bold line in the image below passing through the center.



So to find this Decision Boundary, you need training data. Do you need all of it? NO. Just the
samples which are close to the opposite group are sufficient. In our image, they are the one blue
filled circle and the two red filled squares. We call them **Support Vectors**, and the lines passing
through them are called **Support Planes**. They are adequate for finding our decision boundary; we
need not worry about all the data. This helps in data reduction.

What happens is that, first, two hyperplanes are found which best represent the data. For example,
the blue data is represented by \f$w^Tx+b_0 > 1\f$ while the red data is represented by
\f$w^Tx+b_0 < -1\f$, where \f$w\f$ is the **weight vector** ( \f$w=[w_1, w_2,..., w_n]\f$) and
\f$x\f$ is the feature vector (\f$x = [x_1,x_2,..., x_n]\f$). \f$b_0\f$ is the **bias**. The weight
vector decides the orientation of the decision boundary, while the bias decides its location. Now the
decision boundary is defined to be midway between these hyperplanes, so it is expressed as
\f$w^Tx+b_0 = 0\f$. The minimum distance from a support vector to the decision boundary is given by
\f$distance_{support \, vectors}=\frac{1}{||w||}\f$. The margin is twice this distance, and we need
to maximize this margin. That is, we need to minimize a new function \f$L(w, b_0)\f$ with some
constraints, which can be expressed as below:

\f[\min_{w, b_0} L(w, b_0) = \frac{1}{2}||w||^2 \; \text{subject to} \; t_i(w^Tx+b_0) \geq 1 \; \forall i\f]

where \f$t_i\f$ is the label of each class, \f$t_i \in \{-1,1\}\f$.

### Non-Linearly Separable Data

Consider some data which can't be divided into two with a straight line. For example, consider
one-dimensional data where 'X' is at -3 and +3 and 'O' is at -1 and +1. Clearly it is not linearly
separable. But there are methods to solve these kinds of problems. If we can map this data set with a
function, \f$f(x) = x^2\f$, we get 'X' at 9 and 'O' at 1, which are linearly separable.

Otherwise we can convert this one-dimensional data to two-dimensional data. We can use the function
\f$f(x)=(x,x^2)\f$ to map the data. Then 'X' becomes (-3,9) and (3,9) while 'O' becomes (-1,1) and
(1,1). This is also linearly separable. In short, there is a higher chance for non-linearly separable
data in a lower-dimensional space to become linearly separable in a higher-dimensional space.
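
A two-line sketch of that mapping, just to make it tangible (the separating threshold of 5 is an
arbitrary choice):
@code{.py}
import numpy as np

x = np.float32([-3, -1, 1, 3])            # 'X' at +/-3, 'O' at +/-1
mapped = np.column_stack((x, x**2))       # f(x) = (x, x^2)
print mapped
# The second coordinate is 9 for 'X' and 1 for 'O', so a line such as
# x2 = 5 now separates the two classes.
@endcode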

In general, it is possible to map points in a d-dimensional space to some D-dimensional space
\f$(D>d)\f$ to check the possibility of linear separability. There is an idea which helps to compute
the dot product in the high-dimensional (kernel) space by performing computations in the
low-dimensional input (feature) space. We can illustrate this with the following example.

Consider two points in two-dimensional space, \f$p=(p_1,p_2)\f$ and \f$q=(q_1,q_2)\f$. Let \f$\phi\f$
be a mapping function which maps a two-dimensional point to three-dimensional space as follows:

\f[\phi (p) = (p_{1}^2,p_{2}^2,\sqrt{2} p_1 p_2), \quad \phi (q) = (q_{1}^2,q_{2}^2,\sqrt{2} q_1 q_2)\f]

Let us define a kernel function \f$K(p,q)\f$ which does a dot product between the two mapped points,
as shown below:

\f[\begin{aligned}
K(p,q) = \phi(p).\phi(q) &= \phi(p)^T \phi(q) \\
         &= (p_{1}^2,p_{2}^2,\sqrt{2} p_1 p_2).(q_{1}^2,q_{2}^2,\sqrt{2} q_1 q_2) \\
         &= p_{1}^2 q_{1}^2 + p_{2}^2 q_{2}^2 + 2 p_1 q_1 p_2 q_2 \\
         &= (p_1 q_1 + p_2 q_2)^2 \\
\phi(p).\phi(q) &= (p.q)^2
\end{aligned}\f]

It means that a dot product in the three-dimensional space can be obtained as the squared dot product
in the two-dimensional space. This can be applied to higher dimensional spaces as well. So we can
calculate higher dimensional features from the lower dimensions itself. Once we map them, we get a
higher dimensional space.
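
A quick numerical check of that identity, with two arbitrarily chosen points:
@code{.py}
import numpy as np

p = np.float32([1, 2])
q = np.float32([3, 4])

def phi(v):
    # the explicit 3-D mapping used above
    return np.float32([v[0]**2, v[1]**2, np.sqrt(2)*v[0]*v[1]])

print np.dot(phi(p), phi(q))   # dot product in the mapped 3-D space -> 121.0
print np.dot(p, q)**2          # squared dot product in the original 2-D space -> 121.0
@endcode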

In addition to all these concepts, there is the problem of misclassification. So just finding the
decision boundary with maximum margin is not sufficient. We need to consider the problem of
misclassification errors also. Sometimes, it may be possible to find a decision boundary with less
margin, but with reduced misclassification. Anyway, we need to modify our model such that it finds
the decision boundary with maximum margin, but with less misclassification. The minimization
criterion is modified as:

\f[min \; ||w||^2 + C(distance \; of \; misclassified \; samples \; to \; their \; correct \; regions)\f]

The image below shows this concept. For each sample of the training data a new parameter \f$\xi_i\f$
is defined. It is the distance from the corresponding training sample to its correct decision region.
For those which are not misclassified, they fall on their corresponding support planes, so their
distance is zero.



So the new optimization problem is:

\f[\min_{w, b_{0}} L(w,b_0) = ||w||^{2} + C \sum_{i} {\xi_{i}} \text{ subject to } y_{i}(w^{T} x_{i} + b_{0}) \geq 1 - \xi_{i} \text{ and } \xi_{i} \geq 0 \text{ } \forall i\f]

How should the parameter C be chosen? It is obvious that the answer to this question depends on how
the training data is distributed. Although there is no general answer, it is useful to take into
account these rules (a small sketch of how C enters the OpenCV call follows the list):

- Large values of C give solutions with fewer misclassification errors but a smaller margin.
    Consider that in this case it is expensive to make misclassification errors. Since the aim of
    the optimization is to minimize the argument, few misclassification errors are allowed.
- Small values of C give solutions with a bigger margin and more classification errors. In this
    case the minimization does not weight the sum term that much, so it focuses more on finding a
    hyperplane with a big margin.
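
In the OpenCV Python API used in the next tutorial, C is simply one entry of the parameter dictionary
passed to the trainer; the two dictionaries below are only illustrative values, not recommendations:
@code{.py}
import cv2

params_large_C = dict(svm_type=cv2.SVM_C_SVC, kernel_type=cv2.SVM_LINEAR, C=100.0)
params_small_C = dict(svm_type=cv2.SVM_C_SVC, kernel_type=cv2.SVM_LINEAR, C=0.1)

# svm = cv2.SVM()
# svm.train(trainData, responses, params=params_large_C)  # fewer errors, smaller margin
# svm.train(trainData, responses, params=params_small_C)  # bigger margin, more errors
@endcode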

Additional Resources
--------------------

-# [NPTEL notes on Statistical Pattern Recognition, Chapters 25-29](http://www.nptel.iitm.ac.in/courses/106108057/26)

Exercises
---------
doc/py_tutorials/py_ml/py_svm/py_svm_index.markdown
Normal file
@@ -0,0 +1,10 @@
Support Vector Machines (SVM) {#tutorial_py_svm_index}
=============================

- @subpage tutorial_py_svm_basics

    Get a basic understanding of what SVM is

- @subpage tutorial_py_svm_opencv

    Let's use SVM functionalities in OpenCV
@@ -0,0 +1,138 @@
OCR of Hand-written Data using SVM {#tutorial_py_svm_opencv}
==================================

Goal
----

In this chapter

- We will revisit the hand-written data OCR, but with SVM instead of kNN.

OCR of Hand-written Digits
--------------------------

In kNN, we directly used the pixel intensities as the feature vector. This time we will use
[Histogram of Oriented Gradients](http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients) (HOG)
as the feature vector.

Here, before finding the HOG, we deskew the image using its second order moments. So we first define
a function **deskew()** which takes a digit image and deskews it. Below is the deskew() function:
@code{.py}
def deskew(img):
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11']/m['mu02']
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    img = cv2.warpAffine(img,M,(SZ, SZ),flags=affine_flags)
    return img
@endcode
The image below shows the deskew function applied to an image of zero. The left image is the original
one and the right image is the deskewed one.



Next we have to find the HOG descriptor of each cell. For that, we find the Sobel derivatives of each
cell in the X and Y directions. Then find their magnitude and direction of gradient at each pixel.
This gradient is quantized to 16 integer values. Divide this image into four sub-squares. For each
sub-square, calculate the histogram of directions (16 bins) weighted with their magnitude. So each
sub-square gives you a vector containing 16 values. Four such vectors (of the four sub-squares)
together give us a feature vector containing 64 values. This is the feature vector we use to train
our data.
@code{.py}
def hog(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)

    # quantizing binvalues in (0...16)
    bins = np.int32(bin_n*ang/(2*np.pi))

    # Divide to 4 sub-squares
    bin_cells = bins[:10,:10], bins[10:,:10], bins[:10,10:], bins[10:,10:]
    mag_cells = mag[:10,:10], mag[10:,:10], mag[:10,10:], mag[10:,10:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)
    return hist
@endcode
Finally, as in the previous case, we start by splitting our big dataset into individual cells. For
every digit, 250 cells are reserved for training data and the remaining 250 are reserved for testing.
The full code is given below:
@code{.py}
import cv2
import numpy as np

SZ=20
bin_n = 16 # Number of bins

svm_params = dict( kernel_type = cv2.SVM_LINEAR,
                    svm_type = cv2.SVM_C_SVC,
                    C=2.67, gamma=5.383 )

affine_flags = cv2.WARP_INVERSE_MAP|cv2.INTER_LINEAR

def deskew(img):
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11']/m['mu02']
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    img = cv2.warpAffine(img,M,(SZ, SZ),flags=affine_flags)
    return img

def hog(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    bins = np.int32(bin_n*ang/(2*np.pi))    # quantizing binvalues in (0...16)
    bin_cells = bins[:10,:10], bins[10:,:10], bins[:10,10:], bins[10:,10:]
    mag_cells = mag[:10,:10], mag[10:,:10], mag[:10,10:], mag[10:,10:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)     # hist is a 64-element vector
    return hist

img = cv2.imread('digits.png',0)

cells = [np.hsplit(row,100) for row in np.vsplit(img,50)]

# First half is trainData, remaining is testData
train_cells = [ i[:50] for i in cells ]
test_cells = [ i[50:] for i in cells]

######     Now training      ########################

deskewed = [map(deskew,row) for row in train_cells]
hogdata = [map(hog,row) for row in deskewed]
trainData = np.float32(hogdata).reshape(-1,64)
responses = np.float32(np.repeat(np.arange(10),250)[:,np.newaxis])

svm = cv2.SVM()
svm.train(trainData,responses, params=svm_params)
svm.save('svm_data.dat')

######     Now testing      ########################

deskewed = [map(deskew,row) for row in test_cells]
hogdata = [map(hog,row) for row in deskewed]
testData = np.float32(hogdata).reshape(-1,bin_n*4)
result = svm.predict_all(testData)

#######   Check Accuracy   ########################
mask = result==responses
correct = np.count_nonzero(mask)
print correct*100.0/result.size
@endcode
This particular technique gave me nearly 94% accuracy. You can try different values for the various
parameters of the SVM to check if higher accuracy is possible. Or you can read technical papers on
this area and try to implement them.

Additional Resources
--------------------

-# [Histograms of Oriented Gradients Video](http://www.youtube.com/watch?v=0Zib1YEE4LU)

Exercises
---------

-# OpenCV samples contain digits.py which applies a slight improvement of the above method to get
    improved results. It also contains the reference. Check it and understand it.
@@ -0,0 +1,16 @@
Machine Learning {#tutorial_py_table_of_contents_ml}
================

- @subpage tutorial_py_knn_index

    Learn to use kNN for classification
    Plus learn about handwritten digit recognition using kNN

- @subpage tutorial_py_svm_index

    Understand concepts of SVM

- @subpage tutorial_py_kmeans_index

    Learn to use K-Means Clustering to group data to a number of clusters.
    Plus learn to do color quantization using K-Means Clustering