Learning from labeled data for regression and classification.
Supervised learning is a machine learning paradigm in which the goal is to learn a function that maps inputs to outputs from example input-output pairs. The name 'supervised' comes from the idea of a teacher or supervisor providing the correct answers (labels) for the training data. The algorithm's task is to learn a general rule that correctly predicts the output for new, unseen inputs.

This process involves two main phases: training and inference. During training, the algorithm is fed a dataset containing input features and their corresponding correct labels. It adjusts its internal parameters, typically iteratively, to minimize the difference between its predictions and the actual labels. During inference, the trained model is used to make predictions on new data for which the labels are unknown.

Supervised learning problems fall broadly into two types: regression and classification. In a regression problem, the goal is to predict a continuous numerical output, that is, a real value. For example, predicting the price of a house from its features (square footage, number of bedrooms, location) is a regression task. In a classification problem, the goal is to predict a discrete, categorical label. For instance, classifying an email as either 'spam' or 'not spam' based on its content is a classification task. Other examples include identifying a handwritten digit (classes 0-9) or diagnosing a disease (classes 'positive' or 'negative').

The choice of algorithm and evaluation metrics depends heavily on whether the problem is one of regression or classification. A regression model is judged by how far its numerical predictions fall from the true values, whereas a classifier is judged by how often it assigns the correct class.
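The two phases and the two problem types can be made concrete with a minimal sketch in plain Python. Everything here is an illustrative assumption rather than a prescribed method: the function names (`train_regressor`, `train_classifier`, `predict_value`, `predict_label`), the single-feature linear models, the learning rates, and the tiny datasets are all invented for the example. The regressor minimizes mean squared error by gradient descent; the classifier is a logistic regression that minimizes average log loss.

```python
import math


def train_regressor(xs, ys, lr=0.05, epochs=5000):
    """Training phase (regression): fit y ~ w*x + b by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between current predictions and the true labels.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_value(params, x):
    """Inference phase (regression): return a continuous prediction."""
    w, b = params
    return w * x + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(xs, labels, lr=0.5, epochs=5000):
    """Training phase (classification): logistic regression via gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predicted probability of class 1 minus the true 0/1 label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_label(params, x):
    """Inference phase (classification): return a discrete label, 0 or 1."""
    w, b = params
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Toy regression data (invented): the labels follow y = 2x + 1 exactly.
reg_params = train_regressor([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print("regression prediction for x=5:", predict_value(reg_params, 5.0))

# Toy classification data (invented): one hypothetical feature per email,
# e.g. a count of suspicious words; label 1 = 'spam', 0 = 'not spam'.
clf_params = train_classifier(
    [0.5, 1.0, 1.5, 3.0, 3.5, 4.0],
    [0, 0, 0, 1, 1, 1],
)
print("label for x=0.8:", predict_label(clf_params, 0.8))
print("label for x=3.8:", predict_label(clf_params, 3.8))
```

Both training functions share the same loop structure and differ only in the model output and the loss being minimized, which mirrors the point that the problem type drives the choice of algorithm and metric. In practice one would usually reach for an established library rather than hand-written loops; the sketch only aims to show where training ends and inference begins.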