A powerful algorithm for binary classification problems.
Despite its name, Logistic Regression is a supervised learning algorithm used for classification, not regression. It is one of the most popular and widely used algorithms for binary classification problems, where the output is one of two categories (e.g., Yes/No, True/False, Spam/Not Spam).

The core idea of Logistic Regression is to take a linear combination of the input features (similar to linear regression) and pass it through a special function called the sigmoid or logistic function. The sigmoid function squashes any real-valued number into a range between 0 and 1, so the output can be interpreted as a probability. For example, if the model outputs 0.85 for a given email, it means there is an 85% probability that the email is spam.

To make a final decision, a threshold is applied, typically 0.5. If the probability is greater than the threshold, the instance is classified as belonging to the positive class (e.g., spam); otherwise, it's classified as the negative class.

The model is trained by finding the optimal weights that minimize a cost function, usually the Log Loss or Binary Cross-Entropy, which measures the difference between the predicted probabilities and the actual class labels.

Logistic Regression is valued for its simplicity, interpretability (you can easily understand the influence of each feature), and efficiency in training, making it an excellent baseline model for any classification task.
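The pieces described above (sigmoid, threshold, and gradient descent on the log loss) can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the toy dataset, learning rate, and epoch count are arbitrary choices for demonstration.

```python
import numpy as np

def sigmoid(z):
    # Squash any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    # Gradient descent on the binary cross-entropy (log loss)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted probabilities
        grad_w = X.T @ (p - y) / n_samples
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b, threshold=0.5):
    # Classify as positive (1) when the probability exceeds the threshold
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Toy separable data: label 1 when the single feature is large
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = train_logistic_regression(X, y)
print(predict(X, w, b))  # should recover the labels on this separable toy set
```

Note that the gradient of the log loss with respect to the weights reduces to the simple form `X.T @ (p - y) / n`, which is what makes training efficient.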