Hi everyone! 👋
I’m Dhyuthidhar Saraswathula, a passionate blogger exploring data science topics like ML and SQL. If you haven’t checked out my previous blogs, I highly recommend visiting my profile. Take a quick look, and then come back here for today’s deep dive!
Today, we’re diving into the basics of neural networks and uncovering the math behind them. So, buckle up for an exciting journey into the world of AI! 🚀
Let’s start with an analogy: Vector Functions
To understand neural networks, let’s take a detour into math and discuss vector functions.
A vector function is a function that takes a vector as input and produces another vector as output. It’s also called a vector-valued function.
For example:
$$\mathbf{f}(x) = \begin{pmatrix} 2x \\ x + 1 \\ x^2 \end{pmatrix}$$
Here, f(x) takes a scalar x as input and returns a vector with three components.
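Here's a minimal Python sketch of that exact function (using NumPy for the vector; I'll lean on NumPy for the examples in this post):

```python
import numpy as np

def f(x):
    """Vector-valued function: takes a scalar x, returns a 3-component vector."""
    return np.array([2 * x, x + 1, x ** 2])

print(f(3))  # [6 4 9]
```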
In neural networks, the transformation applied by each layer is a vector function. For instance, in a network with three layers:
$$y = f_3(f_2(f_1(x)))$$
f1 and f2: Vector functions (each takes its weights w, biases b, and the vector coming in from the previous step)
f3: Scalar- or vector-valued function, depending on whether the output is a regression value or classification probabilities (see the sketch after this list).
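As a rough sketch of that composition, here are three toy placeholder functions of my own (not a trained network; real layers learn their weights and biases):

```python
import numpy as np

# Toy placeholder layer functions: each takes a vector and returns a vector.
def f1(x):
    return np.maximum(0, 2 * x + 1)   # made-up "weights" and "bias", then ReLU

def f2(h):
    return np.maximum(0, h - 0.5)

def f3(h):
    return h.sum()                     # scalar output, e.g. a regression value

x = np.array([1.0, -2.0, 0.5])
y = f3(f2(f1(x)))                      # y = f3(f2(f1(x)))
print(y)                               # 4.0
```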
Given the unit vectors i, j, and k parallel to the x-, y-, and z-axes respectively, we can write a three-dimensional vector-valued function as:
$$r(t) = x(t) \hat{i} + y(t) \hat{j} + z(t) \hat{k}$$
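For example, a helix (my own example, not from the book) is a vector-valued function of a single parameter t:

```python
import numpy as np

def r(t):
    """r(t) = cos(t) i + sin(t) j + t k, a helix in 3D space."""
    return np.array([np.cos(t), np.sin(t), t])

print(r(np.pi / 2))  # approximately [0, 1, 1.57]
```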
Breaking Down Neural Networks
1. Input Layer
This layer represents the input vector x, analogous to the parameter t in vector functions.
2. Hidden Layers
Each hidden layer applies transformations to its input using weights and biases, followed by an activation function:
$$\mathbf{h}_l = g_l(\mathbf{W}_l \mathbf{z} + \mathbf{b}_l)$$
Here:
W_l: Weight matrix of layer l
b_l: Bias vector of layer l
g_l: Activation function, applied element-wise; z is the vector coming into the layer
Just as r(t) has one component per unit vector, h_l has one component per unit in the layer (a short code sketch follows this list).
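A minimal sketch of one hidden layer in Python; the layer sizes and parameter values are arbitrary, chosen only for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# One hidden layer with 2 inputs and 3 units (arbitrary example values).
W = np.array([[ 0.2, -0.5],
              [ 1.0,  0.3],
              [-0.7,  0.8]])           # weight matrix, shape (3, 2)
b = np.array([0.1, -0.2, 0.05])        # bias vector, shape (3,)

z = np.array([1.5, -1.0])              # input to the layer
h = relu(W @ z + b)                    # matrix-vector product, add bias, activate
print(h)                               # [0.9 1.  0. ]
```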
3. Output Layer
The output layer produces the final result:
- For classification, it outputs a vector of probabilities:
$$\mathbf{y} = [y_1, y_2, y_3, \ldots, y_n]$$
- For regression, it outputs a scalar value:
$$y = f_3(\mathbf{h}_2)$$
Understanding Feed-Forward Neural Networks (FFNN)
Neural networks are essentially a combination of interconnected units (neurons), which can be visualized as:
Circle (No transformation): Represents a simple input.
Rectangle (Transformation applied): Represents mathematical functions applied to inputs.
Here’s how it works:
The input is represented as a vector.
A linear transformation (weights and biases) is applied.
A non-linear activation function is applied to produce the output (see the forward-pass sketch below).
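Putting those three steps together, here is a rough forward-pass sketch with randomly initialized, untrained parameters (purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def dense(x, W, b, activation):
    """One fully connected layer: linear transformation, then activation."""
    return activation(W @ x + b)

# Random parameters for a 4 -> 5 -> 3 -> 1 network (illustration only).
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0, 0.0])    # input vector
h1 = dense(x, W1, b1, relu)            # hidden layer 1
h2 = dense(h1, W2, b2, relu)           # hidden layer 2
y = dense(h2, W3, b3, lambda z: z)     # linear output head -> regression value
print(y)
```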
Activation Functions
Every unit applies an activation function to its input to add non-linearity. Some commonly used functions are listed below (with a short code sketch after the list):
- Logistic (Sigmoid): Outputs values between 0 and 1.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
- Hyperbolic Tangent (tanh): Outputs values between -1 and 1.
$$\text{tanh}(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$
- ReLU (Rectified Linear Unit): Outputs 0 for negative inputs and z otherwise.
$$\text{ReLU}(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } z \geq 0 \end{cases}$$
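Here is what these three look like in Python:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)              # squashes to (-1, 1)

def relu(z):
    return np.maximum(0, z)        # 0 for negative inputs, z otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # approximately [0.119 0.5   0.881]
print(tanh(z))      # approximately [-0.964  0.     0.964]
print(relu(z))      # [0. 0. 2.]
```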
Fully Connected Neural Networks
A layer is fully connected if every unit in one layer is connected to every unit in the next.
If all layers in a network are fully connected, the network is called a fully connected neural network.
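A quick way to see what "fully connected" implies: a layer with m inputs and n units has an n × m weight matrix plus n biases, so n(m + 1) parameters. A small sketch with made-up layer widths:

```python
def fc_layer_params(n_inputs, n_units):
    """Parameter count of one fully connected layer: weights plus biases."""
    return n_units * n_inputs + n_units

# A made-up 4 -> 5 -> 3 -> 1 fully connected network:
sizes = [4, 5, 3, 1]
total = sum(fc_layer_params(m, n) for m, n in zip(sizes[:-1], sizes[1:]))
print(total)  # (5*4 + 5) + (3*5 + 3) + (1*3 + 1) = 47
```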
Key Points About Neural Networks
Final Layer Functionality
The last activation function (e.g., f3) determines the problem type (illustrated in the sketch after this list):
Linear: Regression problem
Logistic (sigmoid): Binary classification
Softmax: Multi-class classification
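Softmax isn't written out above, so here is its usual definition alongside the other two output heads, applied to some made-up raw scores:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])     # made-up raw output-layer scores

print(z)              # linear head: used as-is for regression
print(sigmoid(z[0]))  # sigmoid head: probability for binary classification
print(softmax(z))     # softmax head: class probabilities that sum to 1
```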
Parameters
Weights (W_l): Represented as a matrix, one per layer l.
Biases (b_l): Represented as a vector, one per layer l.
Operations
Each layer performs a matrix-vector multiplication, adds the biases, and applies an activation function to produce its output (a quick shape check follows below).
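As a sanity check on the shapes involved (the layer widths are arbitrary):

```python
import numpy as np

sizes = [4, 5, 3, 1]                   # arbitrary widths: input -> hidden -> hidden -> output
for l, (m, n) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    W = np.zeros((n, m))               # weight matrix W_l, shape (n, m)
    b = np.zeros(n)                    # bias vector b_l, shape (n,)
    print(f"layer {l}: W shape {W.shape}, b shape {b.shape}")
```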
What’s Next?
In the next blog, I’ll discuss how to implement a neural network from scratch—no pre-built libraries, just pure Python! Stay tuned.
References
- Andriy Burkov’s 100-Page Machine Learning Book