The k-Nearest Neighbors algorithm is a simple machine-learning algorithm that classifies a new data point according to the classes of the data points closest to it. An example should illustrate the nature and effectiveness of the algorithm.
Consider a given apartment renter. If three out of the five renters living nearest to our renter make >$75k/yr, then our renter probably makes >$75k/yr, too. Of course, it’s possible that he or she doesn’t, but it’s likely that he or she does. The k-Nearest Neighbors algorithm operates on this principle. Given an integer k, the algorithm classifies a new data point as the class that appears most frequently among the k data points nearest to the new data point.
As this great blog post explains, it’s crucial to pick an appropriate metric for measuring the nearness of neighbors. In my implementation, I used a common metric, the Euclidean distance.