A simple algorithm that classifies based on the 'majority vote' of its neighbors.
K-Nearest Neighbors (k-NN) is a simple yet effective supervised learning algorithm used for both classification and regression. It is considered a 'lazy learner', or instance-based algorithm, because it does not build a general internal model from the training data; instead, it stores the entire training dataset in memory. When a prediction is needed for a new, unseen data point, k-NN finds the 'k' closest training points (the 'neighbors') according to a distance metric, typically Euclidean distance. For a classification task, it assigns the new point to the class that is most common among those k neighbors (a 'majority vote'). For a regression task, it predicts the average of the neighbors' target values.

The choice of 'k' is a critical hyperparameter. A small 'k' makes the model sensitive to noise in the training data, while a large 'k' oversmooths the decision boundary and adds some computational cost. Because k-NN relies on distances, it is essential to scale the features before applying the algorithm: a feature measured on a larger scale would otherwise dominate the distance calculation.

Its main advantages are simplicity and ease of interpretation. However, it can be slow and memory-intensive on large datasets, because every prediction requires computing distances to all training points.
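To make the mechanics concrete, here is a minimal from-scratch sketch of k-NN classification: Euclidean distance, a majority vote among the k nearest neighbors, and simple standardization to illustrate why feature scaling matters. The toy dataset, feature choices, and the `knn_predict` helper are invented for illustration and are not a reference implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify a single point by majority vote among its k nearest neighbors."""
    # Euclidean distances from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy dataset (illustrative only): two features on very different scales
X = np.array([
    [170.0, 30000.0],
    [165.0, 32000.0],
    [180.0, 90000.0],
    [175.0, 95000.0],
])
y = np.array([0, 0, 1, 1])

# Without scaling, the second feature would dominate the Euclidean distance.
# Standardizing each feature to zero mean and unit variance puts them on equal footing.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

query = np.array([172.0, 60000.0])
query_scaled = (query - mean) / std

print(knn_predict(X_scaled, y, query_scaled, k=3))
```

Note that the same scaling statistics (mean and standard deviation) computed from the training data are applied to the query point, so that distances are measured in a consistent feature space.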