Genetic Algorithm for Reinforcement Learning: Python Implementation

Last Updated: 08 Apr, 2025

In reinforcement learning, the challenge is to find the best policy or the best set of parameters for a given environment. Genetic Algorithm (GA) is an optimization algorithm inspired by the process of natural evolution. It is used to find approximate solutions to complex problems by evolving a population of candidate solutions over generations.

Integrating Genetic Algorithms with Reinforcement Learning lets us optimize the policy of an RL model.

Why Use a Genetic Algorithm for RL?

  1. Exploration of Non-differentiable Spaces: If the reward function is not differentiable, traditional RL methods may struggle. GAs explore the solution space by evolving individuals without relying on gradients.
  2. Global Optimization: GAs are good at finding a global optimum in large or complex search spaces, whereas gradient-based methods can get stuck in local optima.
  3. Avoiding Local Minima: GAs maintain population diversity, reducing the risk of premature convergence to local minima, a common issue with gradient-descent methods.

Example of Genetic Algorithm for Policy Optimization

Let’s imagine that we are applying a GA to evolve a policy for a simple RL task like balancing a pole on a cart.

  1. Initialization: Start with a population of random neural networks representing different policies. Each individual in the population could be a neural network with random weights.
  2. Evaluation: Run the environment with each network and calculate the cumulative reward (fitness) for each agent. For example, the fitness could be how long the agent can balance the pole.
  3. Selection: Select the top-performing policies based on fitness. The best-performing networks are more likely to “reproduce.”
  4. Crossover: Create new networks (offspring) by combining parts of the weights of the top-performing networks.
  5. Mutation: Introduce small random changes to the offspring networks’ weights to add diversity.
  6. Repeat: The process is repeated for several generations. With each generation, the population evolves toward better-performing policies for balancing the pole. A minimal toy run of these six steps is sketched just below.
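
Before the CartPole version, here is a minimal, self-contained toy run of these six steps. The fitness is just the sum of a 10-gene vector, a stand-in for cumulative reward; every name in this sketch is illustrative and not part of the implementation that follows.

Python
import numpy as np

rng = np.random.default_rng(0)

def fitness(ind):
    # Toy fitness: sum of the genes (stands in for episode reward)
    return ind.sum()

pop = [rng.normal(size=10) for _ in range(20)]            # 1. Initialization
for gen in range(30):
    scores = [fitness(ind) for ind in pop]                # 2. Evaluation
    ranked = sorted(zip(scores, pop), key=lambda p: -p[0])
    parents = [ind for _, ind in ranked[:10]]             # 3. Selection (top half)
    children = []
    for _ in range(20):
        i, j = rng.choice(len(parents), size=2, replace=False)
        cut = rng.integers(1, 10)                         # 4. Crossover point
        child = np.concatenate([parents[i][:cut], parents[j][cut:]])
        mask = rng.random(10) < 0.1                       # 5. Mutation (~10% of genes)
        child = child + mask * rng.normal(scale=0.1, size=10)
        children.append(child)
    pop = children                                        # 6. Repeat
print("best toy fitness:", max(fitness(ind) for ind in pop))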

Python Implementation of Genetic Algorithm for Reinforcement Learning

To implement it, we follow the steps below:

1. Installing the necessary libraries

We will implement a Genetic Algorithm to optimize the policy of an RL agent, using the OpenAI Gym framework to create the environment. Note that recent Gym releases (0.26 and later) return (observation, info) from reset() and five values from step(); the code below follows that API.

pip install gym

2. Importing Libraries and Creating the Environment

We import numpy, random and matplotlib, and set up the environment. The environment used is CartPole-v1, a classic control problem where the agent has to balance a pole on a cart.

Python
import gym
import numpy as np
import random
import matplotlib.pyplot as plt

env = gym.make('CartPole-v1')

3. Population Initialization

This block defines the population initialization function. A population of agents is generated randomly. Each agent’s policy is represented by a set of weights that determines how the agent will act based on the current state.

  • np.random.randn(): Generates the random weight matrix representing each individual in the population.
  • input_dim: corresponds to the number of state variables (4 for CartPole).
  • output_dim: corresponds to the number of possible actions (2 for CartPole: left or right).
  • * 0.5: scales the random weights into a smaller initial range so the starting policies are not too extreme.
Python
def initialize_population(pop_size, input_dim, output_dim):
    population = []
    for _ in range(pop_size):
        individual = np.random.randn(input_dim, output_dim) * 0.5
        population.append(individual)
    return population
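
As a quick sanity check (a hypothetical usage line, not part of the original walkthrough), each individual for CartPole is a 4x2 weight matrix, one column of weights per action:

Python
# 5 random policies for CartPole: 4 state variables, 2 actions
population = initialize_population(pop_size=5, input_dim=4, output_dim=2)
print(len(population), population[0].shape)  # 5 (4, 2)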

4. Fitness Evaluation Function

This function evaluates how well an individual policy performs in the environment. The agent’s performance is measured by the total reward it accumulates. The function terminates either when the episode ends or after a set number of steps (max_steps).

  • np.dot(state, individual): Computes the action to take based on the current state and the individual’s policy weights. This represents a linear function approximator for the agent’s decision-making.
  • env.step(action): Applies the selected action to the environment and returns the next state, the reward, the terminated and truncated flags (which signal the end of the episode, e.g. the pole falling or a time limit being reached) and an info dict.
  • np.argmax: Selects the action with the highest value from the computed values.
Python
def fitness_function(individual, env, max_steps=100):
    state, _ = env.reset()  # Gym 0.26+: reset() returns (observation, info)
    done = False
    total_reward = 0
    steps = 0
    while not done and steps < max_steps:
        # Linear policy: pick the action with the highest score
        action = np.argmax(np.dot(state, individual))
        # Gym 0.26+: step() returns five values
        state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        total_reward += reward
        steps += 1
        print(f"Step: {steps}, Action: {action}, Reward: {reward}, Total Reward: {total_reward}, Done: {done}")
    return total_reward

5. Tournament Selection

This function selects individuals from the population using tournament selection. A random subset of the population is chosen and the best individual from the subset is selected to move on to the next generation.

  • random.sample: Selects a random set of individuals from the population.
  • np.argmax(tournament_fitness): Chooses the fittest individual from the tournament.
Python
def tournament_selection(population, fitness_scores, tournament_size=3):
    selected = []
    for _ in range(len(population)):
        tournament = random.sample(range(len(population)), tournament_size)
        tournament_fitness = [fitness_scores[i] for i in tournament]
        winner = tournament[np.argmax(tournament_fitness)]
        selected.append(population[winner])
    return selected

6. Crossover

The crossover function combines two parent solutions to create two offspring by exchanging parts of their “genetic material” (policy weights). This introduces diversity into the population and is an important part of the evolutionary process.

  • random.randint(1, len(parent1) - 1): Randomly selects a crossover point within the parent policies.
  • np.concatenate: Joins the parts of the two parents to create two offspring.
Python
def crossover(parent1, parent2):
    crossover_point = random.randint(1, len(parent1) - 1)
    offspring1 = np.concatenate((parent1[:crossover_point], parent2[crossover_point:]), axis=0)
    offspring2 = np.concatenate((parent2[:crossover_point], parent1[crossover_point:]), axis=0)
    return offspring1, offspring2

7. Mutation

Mutation introduces random changes to an individual’s policy to maintain genetic diversity and help the algorithm explore different parts of the solution space. The mutation rate determines the probability of a gene (policy weight) being altered.

  • random.random() < mutation_rate: Determines if a mutation should occur at a particular position.
  • np.random.uniform(-0.5, 0.5): Introduces a random change to the policy weights within a fixed range, adding diversity to the population.
Python
def mutate(individual, mutation_rate=0.05):
    # Note: individual is a 2-D weight matrix, so each candidate "gene"
    # here is a whole row; the same random offset is added to that row.
    for i in range(len(individual)):
        if random.random() < mutation_rate:
            individual[i] += np.random.uniform(-0.5, 0.5)
    return individual

8. Genetic Algorithm Loop

This function initializes the population, evaluates the fitness of each individual, performs selection, crossover and mutation to create the next generation, and repeats for a specified number of generations.

  • initialize_population: Creates the initial population of agents (policies).
  • The loop iterates through generations, evaluating each individual, selecting the fittest, performing crossover and mutation and updating the population.
Python
def genetic_algorithm(env, pop_size=50, generations=10, mutation_rate=0.01, max_steps_per_generation=50):
    input_dim = env.observation_space.shape[0]
    output_dim = env.action_space.n
    population = initialize_population(pop_size, input_dim, output_dim)

    for gen in range(generations):
        print(f"Generation {gen} start")
        fitness_scores = []
        for individual in population:
            total_reward = fitness_function(individual, env, max_steps=max_steps_per_generation)
            fitness_scores.append(total_reward)

        print(f"Generation {gen}, Best Fitness: {max(fitness_scores)}")

        selected_population = tournament_selection(population, fitness_scores)

        next_generation = []
        for i in range(0, len(selected_population), 2):
            parent1, parent2 = selected_population[i], selected_population[i + 1]
            offspring1, offspring2 = crossover(parent1, parent2)
            next_generation.append(mutate(offspring1, mutation_rate))
            next_generation.append(mutate(offspring2, mutation_rate))

        population = next_generation

    return population

9. Running the Genetic Algorithm

Here we run the genetic algorithm on the CartPole environment.

Python
final_population = genetic_algorithm(env, pop_size=50, generations=10, mutation_rate=0.01, max_steps_per_generation=50) 

Output:

(Screenshot: a per-step log showing Step, Action, Reward, Total Reward and Done for every timestep, followed by each generation’s best fitness.)

Model Working

Each line represents one timestep during the agent’s operation. Here’s a breakdown of the key parts:

  • Step: The current step within the episode.
  • Action: The action taken by the agent at that step, either 0 (push the cart left) or 1 (push it right).
  • Reward: The reward received for that action. CartPole gives a reward of 1.0 for every step the pole remains balanced.
  • Total Reward: The cumulative reward for the agent at that point in the episode.
  • Done: A flag indicating whether the episode has finished. It is False here, meaning the agent is still balancing the pole.

The Best Fitness at Generation 9 is 50.0, meaning the best individual balanced the pole for all 50 allowed steps (the per-generation cap, max_steps_per_generation=50). This suggests the agent is progressing; over more generations the population should continue to evolve toward better performance. A simple way to see this progress is to plot the best fitness of each generation, as sketched below.
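
matplotlib is imported at the start but never used in the loop above. A small hypothetical extension (assuming the same helper functions defined earlier) records the best fitness of each generation and plots the learning curve:

Python
def genetic_algorithm_with_history(env, pop_size=50, generations=10,
                                   mutation_rate=0.01, max_steps=50):
    # Same loop as genetic_algorithm, but it also records the best
    # fitness of every generation so progress can be plotted.
    input_dim = env.observation_space.shape[0]
    output_dim = env.action_space.n
    population = initialize_population(pop_size, input_dim, output_dim)
    best_per_gen = []
    for gen in range(generations):
        scores = [fitness_function(ind, env, max_steps) for ind in population]
        best_per_gen.append(max(scores))
        selected = tournament_selection(population, scores)
        population = []
        for i in range(0, len(selected), 2):
            o1, o2 = crossover(selected[i], selected[i + 1])
            population += [mutate(o1, mutation_rate), mutate(o2, mutation_rate)]
    plt.plot(best_per_gen, marker='o')
    plt.xlabel('Generation')
    plt.ylabel('Best cumulative reward')
    plt.title('GA learning curve on CartPole-v1')
    plt.show()
    return population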

10. Visualization

This function evaluates the agent’s performance using the best policy found by the GA. The agent interacts with the environment by selecting actions based on the learned policy until the episode ends or max_steps is reached; a way to render the run for visualization is shown afterwards.

  • np.argmax(np.dot(state, policy)): The agent selects an action based on the current state and the learned policy.
Python
def evaluate_best_policy(policy, env, max_steps=500):
    state, _ = env.reset()
    done = False
    total_reward = 0
    steps = 0
    while not done and steps < max_steps:
        action = np.argmax(np.dot(state, policy))
        state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        total_reward += reward
        steps += 1
    print(f"Total Reward: {total_reward}")

# Take an individual from the final (evolved) population; re-evaluating
# fitness and picking the argmax would guarantee the single best one.
best_policy = final_population[0]
evaluate_best_policy(best_policy, env)
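
The function above only prints the total reward. To actually watch the pole being balanced, one option (a sketch, assuming Gym 0.26+, where the render mode is requested when the environment is created) is:

Python
# Hypothetical rendering run: Gym 0.26+ takes the render mode at creation
render_env = gym.make('CartPole-v1', render_mode='human')
evaluate_best_policy(best_policy, render_env)
render_env.close()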

Output:

Total Reward: 97.0

The output shows that the agent, using the best-found policy, balanced the pole for 97 steps, accumulating 97.0 reward points. The genetic algorithm has therefore evolved a policy that works fairly well on the CartPole task.

By combining the exploration capabilities of GAs with the decision-making framework of RL, we can enhance the ability of agents to adapt and optimize in challenging tasks, leading to more robust and diverse solutions.


