Building Artificial Neural Networks (ANN) from Scratch
Last Updated : 03 Jun, 2025
An Artificial Neural Network (ANN) is a collection of interconnected layers of neurons. It includes:
- Input Layer: Receives input features.
- Hidden Layers: Process information through weighted connections and activation functions.
- Output Layer: Produces the final prediction.
- Weights and Biases: Trainable parameters that adjust during learning.
- Activation Functions: Introduce non-linearity, which allows the network to learn complex patterns.
Let's build an ANN from scratch using Python and NumPy, without relying on deep learning libraries such as TensorFlow or PyTorch. This approach will give a better understanding of how neural networks work.
Step 1: Importing Necessary Libraries
We will use NumPy to handle numerical computations efficiently.
Python
import numpy as np
Step 2: Initializing the Neural Network
- Sets initial weights and biases for a two-layer neural network.
- Uses np.random.seed(42) for reproducible results.
- Weights (W1, W2) initialized with small random values scaled by 0.01 to avoid large initial weights.
- W1 shape: (hidden layer size, input layer size).
- W2 shape: (output layer size, hidden layer size).
- Biases (b1, b2) initialized to zero vectors matching their layer sizes.
Python
def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(42)  # For reproducibility
    parameters = {
        "W1": np.random.randn(hidden_size, input_size) * 0.01,
        "b1": np.zeros((hidden_size, 1)),
        "W2": np.random.randn(output_size, hidden_size) * 0.01,
        "b2": np.zeros((output_size, 1))
    }
    return parameters
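To double-check the shape conventions listed above, a quick sanity check (using hypothetical sizes of 2 inputs, 4 hidden units and 1 output) can print the shape of every parameter:
Python
# Hypothetical layer sizes for a quick shape check
params = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
for name, value in params.items():
    print(name, value.shape)
# Expected: W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)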
Step 3: Defining Activation Functions
Activation functions introduce non-linearity into the model, helping it learn complex patterns. Here we use sigmoid (for the output layer), ReLU (for the hidden layer) and the derivative of ReLU, which is needed later during backpropagation:
Python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def relu_derivative(Z):
    return (Z > 0).astype(int)
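As a quick illustration (the values below are hand-picked and not part of the original example), sigmoid squashes its inputs into the range (0, 1), while ReLU clips negatives to zero and its derivative is 1 only where the input is positive:
Python
# Illustrative inputs only
Z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(Z))          # approx [0.119 0.5 0.881] -- squashed into (0, 1)
print(relu(Z))             # [0. 0. 2.] -- negatives clipped to zero
print(relu_derivative(Z))  # [0 0 1] -- gradient flows only where Z > 0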
Step 4: Forward Propagation
In forward propagation, the function computes the output of the neural network for a given input X and the current parameters.
- First, it calculates the linear combination Z1 for the hidden layer by multiplying the input X with the weights W1 and adding bias b1.
- It then applies the ReLU activation function to Z1 producing the hidden layer activations A1.
- Next, it calculates the linear combination Z2 for the output layer by multiplying A1 with W2 and adding b2.
- The sigmoid activation function is applied to Z2 to produce the final output A2.
- The function returns the output A2 along with a cache containing intermediate values needed for backpropagation.
Python
def forward_propagation(X, parameters):
    W1, b1, W2, b2 = parameters["W1"], parameters["b1"], parameters["W2"], parameters["b2"]
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
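As a small sketch of how the shapes line up (hypothetical sizes again: 2 features, 4 hidden units, 1 output and 3 examples, with one example per column of X), the forward pass maps an input of shape (2, 3) to an output of shape (1, 3):
Python
# Hypothetical input: 2 features, 3 examples (one example per column)
X_demo = np.random.randn(2, 3)
params = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
A2_demo, cache_demo = forward_propagation(X_demo, params)
print(cache_demo["A1"].shape)  # (4, 3) -- hidden layer activations
print(A2_demo.shape)           # (1, 3) -- one prediction per example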
Step 5: Computing the Cost
The cost function calculates the binary cross-entropy loss, which measures how well the neural network's predictions A2 match the true labels Y. Here m is the number of examples, and np.squeeze removes any extra dimensions, returning the cost as a scalar.
Python
def compute_cost(Y, A2):
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return np.squeeze(cost)
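To make the behaviour of this loss concrete, here is a small hand-made check (values chosen purely for illustration): confident, correct predictions give a cost near zero, while confident, wrong predictions give a large cost.
Python
Y_demo = np.array([[1, 0]])
good = np.array([[0.95, 0.05]])    # confident and correct
bad = np.array([[0.05, 0.95]])     # confident and wrong
print(compute_cost(Y_demo, good))  # approx 0.05
print(compute_cost(Y_demo, bad))   # approx 3.0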
Step 6: Backpropagation
Backpropagation computes the gradients needed to update the network parameters during training.
- It calculates the error at the output layer (dZ2) as the difference between predicted outputs (A2) and true labels (Y).
- Using this error, it computes gradients of the weights (dW2) and biases (db2) for the output layer.
- Then, it backpropagates the error to the hidden layer by multiplying with the transpose of W2 and element-wise with the derivative of the ReLU activation (relu_derivative).
- Finally, it calculates gradients for the hidden layer weights (dW1) and biases (db1).
- All gradients are averaged over the number of examples m to ensure stable updates.
Python
def backward_propagation(X, Y, parameters, cache):
    m = X.shape[1]
    W2 = parameters["W2"]
    dZ2 = cache["A2"] - Y
    dW2 = np.dot(dZ2, cache["A1"].T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * relu_derivative(cache["Z1"])
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads
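A common way to verify a hand-written backward pass is a finite-difference gradient check. The helper below is not part of the original code; it is a minimal sketch that perturbs a single entry of W1 and compares the numerical estimate with the analytical gradient, which should agree to several decimal places.
Python
def numerical_grad_W1(X, Y, parameters, i, j, eps=1e-5):
    # Central-difference estimate of d(cost)/d(W1[i, j])
    params_plus = {k: v.copy() for k, v in parameters.items()}
    params_minus = {k: v.copy() for k, v in parameters.items()}
    params_plus["W1"][i, j] += eps
    params_minus["W1"][i, j] -= eps
    cost_plus = compute_cost(Y, forward_propagation(X, params_plus)[0])
    cost_minus = compute_cost(Y, forward_propagation(X, params_minus)[0])
    return (cost_plus - cost_minus) / (2 * eps)

X_chk = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
Y_chk = np.array([[0, 0, 0, 1]])
params = initialize_parameters(2, 4, 1)
_, cache_chk = forward_propagation(X_chk, params)
grads_chk = backward_propagation(X_chk, Y_chk, params, cache_chk)
print(grads_chk["dW1"][0, 0], numerical_grad_W1(X_chk, Y_chk, params, 0, 0))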
Step 7: Updating Parameters
Gradient descent updates the parameters using the computed gradients and a learning rate.
Python
def update_parameters(parameters, grads, learning_rate):
    for key in parameters.keys():
        parameters[key] -= learning_rate * grads["d" + key]
    return parameters
Step 8: Training the Neural Network
We train the neural network over multiple iterations, updating parameters using backpropagation and gradient descent.
Python
def train_neural_network(X, Y, input_size, hidden_size, output_size, epochs=1000, learning_rate=0.01):
    parameters = initialize_parameters(input_size, hidden_size, output_size)
    for i in range(epochs):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(Y, A2)
        grads = backward_propagation(X, Y, parameters, cache)
        parameters = update_parameters(parameters, grads, learning_rate)
        if i % 100 == 0:
            print(f"Epoch {i}: Cost = {cost}")
    return parameters
Step 9: Making Predictions
The trained model predicts outputs by performing forward propagation and applying a threshold of 0.5.
Python
def predict(X, parameters):
    A2, _ = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(int)
Step 10: Testing the Model
We test the model using an AND logic gate dataset.
Python
# Example data (AND logic gate)
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])
Y = np.array([[0, 0, 0, 1]])

trained_parameters = train_neural_network(X, Y, input_size=2, hidden_size=4,
                                           output_size=1, epochs=10000, learning_rate=0.1)
predictions = predict(X, trained_parameters)
print("Predictions:", predictions)
Output:
The neural network starts with small random weights and a high error. Over 10,000 epochs, it optimizes its weights and biases using gradient descent, and the cost decreases steadily, confirming effective learning. The final predictions match the expected AND gate truth table, showing that the network has successfully learned the AND logic.
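As a small follow-up check (not part of the original snippet), the predictions can be compared against the expected truth table to report an accuracy figure:
Python
expected = np.array([[0, 0, 0, 1]])
accuracy = np.mean(predictions == expected) * 100
print(f"Accuracy: {accuracy}%")  # expected to reach 100% once training has converged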