Support Vector Machines

The objective is to find the widest "street" (margin) that separates the two classes. The best decision boundary is the median line running down the middle of that street.

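Imagine a vector w perpendicular to the median line of the street. To classify an unknown point u, project it onto w: u is on the positive side if w·u + b ≥ 0, and on the negative side otherwise.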
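A minimal sketch of that decision rule, assuming made-up values for w and b (in a trained SVM both come out of the optimization). A point u lands on the positive side of the street when w·u + b ≥ 0.

```python
import numpy as np

# Hypothetical weight vector w (perpendicular to the median line) and offset b.
# These numbers are invented for illustration; training would produce them.
w = np.array([1.0, -1.0])
b = -0.5

def classify(u: np.ndarray) -> int:
    """Return +1 if u lies on the positive side of the median line, else -1."""
    return 1 if np.dot(w, u) + b >= 0 else -1

print(classify(np.array([2.0, 0.0])))  # w·u + b = 2 - 0 - 0.5 = 1.5  -> 1
print(classify(np.array([0.0, 2.0])))  # w·u + b = 0 - 2 - 0.5 = -2.5 -> -1
```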




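The optimization problem: maximize the width of the street, $2/\|w\|$, while classifying every training point correctly. Maximizing $2/\|w\|$ is the same as minimizing $\frac{1}{2}\|w\|^2$ (the square and the 1/2 only make the math more convenient):

$$\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i,$$

where $y_i = +1$ or $-1$ is the label of training point $x_i$.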
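A sketch of the margin computation with scikit-learn, assuming a tiny hand-made separable dataset and a very large C to approximate the hard-margin problem (every point classified correctly). It reads w from `coef_` and b from `intercept_`, computes the street width 2/||w||, and checks that the support vectors sit on the gutters, where y_i(w·x_i + b) = 1.

```python
import numpy as np
from sklearn.svm import SVC

# Small, linearly separable toy data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Assumption: a very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # normal vector to the median line
b = clf.intercept_[0]     # offset
margin = 2.0 / np.linalg.norm(w)  # width of the street

# Support vectors lie on the gutters: y_i (w·x_i + b) should be close to 1.
sv_idx = clf.support_
gutter_values = y[sv_idx] * (X[sv_idx] @ w + b)

print("w =", w, "b =", b)
print("street width =", margin)
print("y_i(w.x_i + b) at support vectors:", gutter_values)
```

For these six points the exact solution works out by hand to w = (0.4, 0.4), b = -2.2 and a street width of 5/√2 ≈ 3.54, so the printed values should be close to those, up to the solver's tolerance.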



[Figures: kernels of SVM; SVM circular similarity]



SVM optimization doesn't get stuck in local optima: the problem is convex, so any local optimum is also the global optimum. Neural network training, by contrast, is non-convex and can get stuck in local minima.

SVM works fine on linearly separable points. For points that are not linearly separable, you need to transform (project) them into a higher-dimensional space where they become linearly separable. A Gaussian (RBF) or polynomial kernel does this implicitly: it computes dot products in that higher-dimensional space without ever building the transformed points (the "kernel trick").
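A sketch comparing kernels on data that no straight line can separate, assuming scikit-learn's synthetic `make_circles` dataset (two concentric rings) as the example. The parameter choices (`degree=2`, `coef0=1`) are illustrative, not tuned.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in the original 2-D space separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)                      # Gaussian kernel
poly = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)  # polynomial kernel

print("linear kernel training accuracy:", linear.score(X, y))
print("rbf kernel training accuracy:   ", rbf.score(X, y))
print("poly kernel training accuracy:  ", poly.score(X, y))
```

The linear kernel should hover well below perfect accuracy, while both nonlinear kernels should separate the rings almost perfectly.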

A kernel K(x, z) acts as a similarity measure between two points, and choosing a kernel is a way to impart domain knowledge about which points should count as "similar".
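To make "kernel as similarity" concrete, here is a sketch of the Gaussian kernel K(x, z) = exp(-||x - z||² / (2σ²)) written by hand, with arbitrary example points. Identical points score 1, and the score decays toward 0 with distance. The last lines check it against scikit-learn's `rbf_kernel`, which uses the parameter gamma = 1/(2σ²) instead of σ.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(x: np.ndarray, z: np.ndarray, sigma: float = 1.0) -> float:
    """Similarity in (0, 1]: 1 for identical points, decaying toward 0 with distance."""
    return float(np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2)))

a = np.array([0.0, 0.0])
near = np.array([0.1, 0.0])
far = np.array([3.0, 0.0])

print(gaussian_kernel(a, a))     # 1.0
print(gaussian_kernel(a, near))  # exp(-0.005), about 0.995
print(gaussian_kernel(a, far))   # exp(-4.5), about 0.011

# scikit-learn parameterizes the same kernel with gamma = 1 / (2 * sigma**2).
sigma = 1.0
sk = rbf_kernel(a[None, :], far[None, :], gamma=1 / (2 * sigma ** 2))[0, 0]
print(np.isclose(sk, gaussian_kernel(a, far, sigma)))  # True
```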

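Adding a nonlinear feature (for example x1² + x2² for data arranged in concentric circles) often makes the data linearly separable, so a plain linear SVM can then do the job.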
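A sketch of adding one nonlinear feature by hand, assuming concentric-circle toy data from scikit-learn's `make_circles`. The extra feature x1² + x2² (squared distance from the origin) is our illustrative choice; with it, a linear SVM can split the rings with a plane in 3-D.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric rings: not linearly separable in the original (x1, x2) space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Hand-crafted nonlinear feature: squared distance from the origin, x1^2 + x2^2.
r2 = (X ** 2).sum(axis=1, keepdims=True)
X_aug = np.hstack([X, r2])

# The same *linear* SVM on both feature sets.
plain = SVC(kernel="linear").fit(X, y)
augmented = SVC(kernel="linear").fit(X_aug, y)

print("linear SVM, original features:", plain.score(X, y))
print("linear SVM, with x1^2 + x2^2: ", augmented.score(X_aug, y))
```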

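Mercer condition: a function K(x, z) is a valid kernel (it equals a dot product φ(x)·φ(z) in some feature space) if it is symmetric and, for every finite set of points x_1, ..., x_n, the Gram matrix with entries K(x_i, x_j) is positive semidefinite.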
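Mercer's condition can't be verified exhaustively in code, but a finite-sample check is easy: build the Gram matrix on some sample points and look at its smallest eigenvalue. This is a sketch, assuming random Gaussian sample points. A clearly negative eigenvalue proves a function is not a valid kernel; non-negative eigenvalues on one sample do not prove that it is. The "bad" kernel here, K(x, z) = -||x - z||², is a hypothetical counterexample chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 arbitrary sample points in 3-D

def min_gram_eigenvalue(K: np.ndarray) -> float:
    """Smallest eigenvalue of a (symmetrized) Gram matrix."""
    K = (K + K.T) / 2  # remove tiny asymmetries from floating-point round-off
    return float(np.linalg.eigvalsh(K).min())

# Gaussian (RBF) kernel: satisfies Mercer's condition.
K_rbf = rbf_kernel(X, gamma=0.5)

# Hypothetical counterexample: negative squared Euclidean distance.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_bad = -sq_dists

print("RBF min eigenvalue:", min_gram_eigenvalue(K_rbf))  # >= 0 up to round-off
print("bad min eigenvalue:", min_gram_eigenvalue(K_bad))  # clearly negative
```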

Writing an SVM in sklearn:

from sklearn import svm


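Parameters of svm.SVC:

  1. kernel – the kernel type, e.g. kernel="linear"; other kernels include kernel="rbf" (the default for svm.SVC) and kernel="poly".
  2. C – controls the tradeoff between a smooth decision boundary and classifying training points correctly. A large C favors classifying more training points correctly; a small C favors a smoother boundary.
  3. gamma – how far the influence of a single training example reaches (used by the "rbf", "poly" and "sigmoid" kernels). Low values mean far reach; high values mean close reach.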
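A minimal sketch of passing kernel, C and gamma to `svm.SVC` and fitting with `fit(X, y)`, assuming scikit-learn's synthetic `make_moons` data; the grid values are arbitrary. It records training and test accuracy for each setting so the effect of C and gamma can be compared.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Noisy two-class toy data (chosen for illustration only).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}  # (C, gamma) -> (train accuracy, test accuracy)
for C in (0.1, 1, 100):
    for gamma in (0.1, 1, 10):
        # kernel="rbf" is the default for svm.SVC; spelled out here for clarity.
        clf = svm.SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X_train, y_train)
        train_acc = clf.score(X_train, y_train)
        test_acc = clf.score(X_test, y_test)
        results[(C, gamma)] = (train_acc, test_acc)
        print(f"C={C:<5} gamma={gamma:<4} train={train_acc:.2f} "
              f"test={test_acc:.2f} support vectors={len(clf.support_)}")
```

Larger C and larger gamma typically push training accuracy up; when the test accuracy starts falling behind it, the model is overfitting.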


  1. A general method that is convex and guaranteed to produce a global solution.
  2. A small sigma in the Gaussian kernel can cause overfitting, because each training point's influence shrinks to a tight region right around it. (In sklearn, gamma = 1/(2·sigma²), so a small sigma means a large gamma.)
  3. For handwritten character recognition, a polynomial kernel of degree n = 2 (nonlinear) works well.
  4. SVMs don't perform well on very large datasets, because kernel SVM training time can grow roughly cubically with the number of training samples. When the classes overlap heavily and the data are noisy, a Naive Bayes classifier can be a better choice than an SVM.
  5. SVMs "don't work well with lots and lots of noise, so when the classes are very overlapping, you have to count independent evidence."
  6. Compute/training time can become very high on large datasets.
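A sketch of the degree-2 polynomial kernel on handwritten digits, assuming scikit-learn's bundled 8×8 `load_digits` dataset as a small stand-in for character recognition. scikit-learn's polynomial kernel is (gamma·u·v + coef0)^degree; the first check confirms that gamma=1, coef0=1, degree=2 gives (u·v + 1)². The classifier then leaves gamma at its default, so the accuracy printed here depends on that choice and on the train/test split.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.model_selection import train_test_split

# 8x8 handwritten digit images that ship with scikit-learn (no download needed).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# (gamma * <u, v> + coef0) ** degree with gamma=1, coef0=1, degree=2 is (u.v + 1)^2.
K = polynomial_kernel(X[:3], X[:3], degree=2, gamma=1, coef0=1)
manual = (X[:3] @ X[:3].T + 1) ** 2
print(np.allclose(K, manual))  # True

# Degree-2 polynomial SVM; gamma stays at its default ("scale").
clf = svm.SVC(kernel="poly", degree=2, coef0=1)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```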
