Logistic Regression Simplified: A Beginner’s Guide to Binary Classification

Hi, I’m Dhyuthidhar! If you're new here, welcome! I love writing about topics in Computer Science, especially Machine Learning. In this blog, we’ll explore Logistic Regression—a supervised learning model. I’ll break it down into simple, digestible pieces to help you understand the concept easily.

Buckle up, guys!


What is Logistic Regression?

Logistic Regression is a machine learning model primarily used for classification problems. For example, if you want to determine whether a message is fake or real, Logistic Regression can help classify it.

Why "Regression" in Logistic Regression?

The term "regression" might seem misleading because Logistic Regression is more about classification than regression. The name originates from statistics, where its mathematical formulation resembles Linear Regression.


Problem Statement

Logistic Regression models the output y as a binary value (e.g., 0 or 1) based on an input x. However, the output of a linear combination of features, wx + b, ranges from −∞ to +∞. This unbounded range can’t be interpreted as a class probability, so it isn’t suitable for classification.

The Solution: Sigmoid Function

To solve this, scientists adopted the sigmoid function (also known as the logistic function), which maps any real-valued number to a range between 0 and 1.

The sigmoid function is given by:

$$S(x) = \frac{1}{1 + e^{-x}}$$

Here, e is Euler's number; e^x is often written as exp(x) in programming languages.

The sigmoid produces a smooth S-shaped curve: very negative inputs map close to 0, very positive inputs map close to 1, and an input of 0 maps to exactly 0.5.
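To build intuition, here’s a minimal sketch of the sigmoid in Python (I’m assuming NumPy here, which isn’t mentioned in the post itself):

```python
import numpy as np

def sigmoid(x):
    """Map any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and the curve crosses 0.5 exactly at x = 0.
for x in [-6, -2, 0, 2, 6]:
    print(f"S({x:+d}) = {sigmoid(x):.4f}")
```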

Logistic Model Equation

Logistic Regression applies the sigmoid function to a linear equation:

$$f_{(w, b)}(x) = \frac{1}{1 + e^{-(wx + b)}}$$

The term wx + b comes from Linear Regression. By optimizing w and b, we can interpret the output f(x) as the probability of the positive class.

  • If f(x) > 0.5, it’s classified as positive.

  • If f(x) ≤ 0.5, it’s classified as negative.

The threshold value (default is 0.5) can vary depending on the problem.
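As a quick sketch (with made-up weights and input, purely for illustration), prediction looks like this in Python:

```python
import numpy as np

def predict_proba(x, w, b):
    """Probability of the positive class: sigmoid of the linear score wx + b."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def predict(x, w, b, threshold=0.5):
    """Classify as positive (1) if the predicted probability exceeds the threshold."""
    return 1 if predict_proba(x, w, b) > threshold else 0

# Hypothetical parameters and input, just for illustration
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])
print(predict_proba(x, w, b))  # a probability in (0, 1)
print(predict(x, w, b))        # 0 or 1
```

Raising the threshold above 0.5 makes the model more conservative about predicting the positive class, which helps when false positives are costly.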


How Do We Find the Optimal w* and b*?

Linear vs. Logistic Regression

In Linear Regression, we minimize the mean squared error (MSE).
In Logistic Regression, we maximize the likelihood of the training data.

Likelihood Function

The likelihood function represents how likely the observed data is under our model. For a labelled example (x_i, y_i):

$$L(y_i | x_i, w, b) = \begin{cases} f_{(w,b)}(x_i) & \text{if } y_i = 1 \\ 1 - f_{(w,b)}(x_i) & \text{if } y_i = 0 \end{cases}$$

For n training examples, the likelihood function is:

$$L(w, b) = \prod_{i=1}^{n} f_{(w,b)}(x_i)^{y_i} \cdot (1 - f_{(w,b)}(x_i))^{1 - y_i}$$
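Here’s a small sketch of this product in Python (the toy data and parameters are made up for illustration):

```python
import numpy as np

def likelihood(X, y, w, b):
    """Product over all examples of f(x_i)^y_i * (1 - f(x_i))^(1 - y_i)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # f(x_i) for every example
    return np.prod(np.where(y == 1, p, 1 - p))

# Toy data: four labelled examples with two features each
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
print(likelihood(X, y, w=np.array([0.5, 0.5]), b=0.0))
```

Notice that the likelihood is a product of numbers between 0 and 1, so it shrinks toward 0 very quickly as the dataset grows. That’s one practical motivation for the next step.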

Maximizing the Log-Likelihood

The product of many probabilities is unwieldy to work with (and prone to numerical underflow), so we take the logarithm, which turns the product into a sum. This gives the log-likelihood:

$$\text{Log-Likelihood} = \sum_{i=1}^{n} \left[ y_i \log f_{(w,b)}(x_i) + (1 - y_i) \log (1 - f_{(w,b)}(x_i)) \right]$$

Because the logarithm is a monotonically increasing function, maximizing the log-likelihood is equivalent to maximizing the original likelihood.
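A sketch of the log-likelihood, reusing the toy data above (the small eps is my own addition to guard against taking log of exactly 0):

```python
import numpy as np

def log_likelihood(X, y, w, b):
    """Sum of y_i*log(f(x_i)) + (1 - y_i)*log(1 - f(x_i)) over all examples."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities f(x_i)
    eps = 1e-12  # avoid log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
print(log_likelihood(X, y, w=np.array([0.5, 0.5]), b=0.0))
```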


Optimization: Gradient Descent

Unlike Linear Regression, Logistic Regression has no closed-form solution for the optimal parameters. Instead, we use gradient descent to iteratively adjust w and b, minimizing the negative log-likelihood (our cost function), which is the same as maximizing the log-likelihood.
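Here’s a minimal gradient-descent training loop in Python. It uses the standard gradient of the negative log-likelihood; the learning rate, epoch count, and toy data are all illustrative choices, not prescriptions:

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit w and b by gradient descent on the negative log-likelihood."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities f(x_i)
        # Gradient of the negative log-likelihood with respect to w and b
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data, purely for illustration
X = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, 0, 0])
w, b = train_logistic_regression(X, y)
print("w* =", w, "b* =", b)
```

Each step nudges w and b in the direction that makes the training labels more probable; with enough iterations, the parameters converge toward the maximum-likelihood solution.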


Conclusion

Logistic Regression combines a linear model with the sigmoid function to handle classification problems. The optimal parameters are found by maximizing the log-likelihood, which in practice is done with gradient descent. This makes Logistic Regression a foundational tool for binary classification tasks.


What’s Next?
Stay tuned for my next blog, where I’ll dive into Decision Tree Learning—a powerful algorithm for both classification and regression tasks.

If you enjoyed this blog, try implementing Logistic Regression yourself and let me know your experience or doubts in the comments below! 😊