K-Nearest Neighbors Algorithm

K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm which can be used for both classification as well as regression predictive problems. However, it is mainly used for classification predictive problems in industry. The following two properties would define KNN well −

  • Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase and uses all the data for training while classification.
  • Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it doesn’t assume anything about the underlying data.

Why K-NN Algorithm?

Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1, so this data point will lie in which of these categories. To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular dataset. Consider the below diagram:

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

  • Step-1: Select the number K of the neighbors
  • Step-2: Calculate the Euclidean distance of K number of neighbors
  • Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
  • Step-4: Among these k neighbors, count the number of the data points in each category.
  • Step-5: Assign the new data points to that category for which the number of the neighbor is maximum.
  • Step-6: Our model is ready.

Pros of KNN

  • It is very simple algorithm to understand and interpret.
  • It is very useful for nonlinear data because there is no assumption about data in this algorithm.
  • It is a versatile algorithm as we can use it for classification as well as regression.
  • It has relatively high accuracy but there are much better supervised learning models than KNN.

Cons of KNN

  • It is computationally a bit expensive algorithm because it stores all the training data.
  • High memory storage required as compared to other supervised learning algorithms.
  • Prediction is slow in case of big N.
  • It is very sensitive to the scale of data as well as irrelevant features.

Thank you for reading! I would appreciate any comments, notes, corrections, questions or suggestions — if there’s anything you’d like me to write about, please don’t hesitate to let me know.