Author: Sevendi Eldrige Rifki Poluan
Created: May 2023
A learning repository implementing various popular Reinforcement Learning (RL) techniques and algorithms using OpenAI Gym environments.
Reinforcement Learning (RL) is a subfield of machine learning that focuses on training agents to make sequential decisions in an environment to maximize a cumulative reward signal. The goal of RL is to develop algorithms that can learn to make intelligent decisions through trial and error, much like how humans and animals learn from experience.
Key Concepts:
- Agent: An entity that learns and makes decisions in an environment
- Environment: The world in which the agent operates
- State: The current situation of the agent in the environment
- Action: A decision the agent makes to interact with the environment
- Reward: Feedback signal indicating the quality of the agent's action
- Policy: A mapping from states to actions that the agent learns
- Q-Value: The expected cumulative (discounted) reward of taking an action in a state and following the policy thereafter
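To make the Q-value idea concrete, here is a minimal sketch of a single tabular Q-learning update. It is not taken from the notebooks, which approximate Q with a neural network rather than a table; the function name `q_learning_update` and the `alpha` (learning rate) and `gamma` (discount factor) defaults are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the updated Q(s, a).

    `q` maps (state, action) pairs to estimated returns.
    """
    # Value of the best action in the next state (zero if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction `alpha` toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Illustrative usage with made-up states 0 and 1 and two actions.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=1, done=False, n_actions=2)
```

Starting from an all-zero table, the first update moves Q(0, 1) one tenth of the way toward the target of 1.0.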
The agent must balance two competing objectives:
- Exploration: Discovering new strategies and states to find the optimal policy
- Exploitation: Using current knowledge to pick the actions currently estimated to yield the highest reward
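A common way to trade off these two objectives is an epsilon-greedy rule: act randomly with probability epsilon, otherwise take the action with the highest estimated value. The sketch below assumes that rule; the function names and the decay schedule (`start`, `end`, `decay`) are hypothetical and may differ from what the notebooks use.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability `epsilon`, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay epsilon from `start`, never going below `end`."""
    return max(end, start * decay ** step)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random; decaying epsilon over training shifts the agent from exploration toward exploitation.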
This repository focuses on Deep Q-Learning, implemented as a Deep Q-Network (DQN), an algorithm that combines:
- Q-Learning with neural networks to handle complex environments
- Experience replay to break correlations in training data
- Target networks to stabilize training
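The last two ingredients can be sketched without any deep learning framework. The example below is an illustrative assumption, not the notebooks' code: `ReplayBuffer` and `dqn_target` are hypothetical names, and the target network is represented only by the list of Q-values it would output for the next state.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_target(reward, next_q_from_target_net, done, gamma=0.99):
    """Regression target for Q(s, a).

    `next_q_from_target_net` holds the Q-values that a separate, slowly
    updated copy of the network assigns to the next state.
    """
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_from_target_net)
```

In a full training loop, the online network would be fit to these targets on sampled batches, and the target network's weights would be refreshed from the online network only every so often.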
RL has real-world applications across multiple domains:
- Robotics: Training robots for manipulation and control tasks
- Finance: Portfolio optimization and algorithmic trading
- Gaming: Developing intelligent agents for complex games
- Autonomous Systems: Decision-making in self-driving cars
- Control Systems: Optimizing resource allocation
reinforcement-learning/
├── README.md                          # This file
└── DeepQLearning/
    ├── CartPole-v1/
    │   ├── deep_q_learning_cartpole.ipynb
    │   ├── README.md
    │   ├── Render/                    # Demo GIFs (before & after training)
    │   └── Saved model/
    │       └── model.weights.h5
    └── MountainCar-v0/
        ├── deep_q_learning_mountaincar.ipynb
        ├── README.md
        ├── Render/                    # Demo GIFs (before & after training)
        └── Saved model/
            └── model.weights.h5
Objective: Train an agent to balance a pole on a moving cart without falling.
Environment: CartPole-v1 from OpenAI Gym
Key Features:
- 4-dimensional state space (cart position, cart velocity, pole angle, pole angular velocity)
- 2 discrete actions (push cart left or right)
- Episode ends when the pole tilts beyond ±12°, the cart moves beyond ±2.4 units, or 500 timesteps elapse
- Reward: +1 for each timestep the pole remains upright
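The failure conditions above can be written as a small check on an observation. This is a sketch under stated assumptions, not the environment's own code: `cartpole_terminated` is a hypothetical helper, and it assumes the observation lists the pole angle in radians, in the order given in the state-space bullet.

```python
import math

ANGLE_LIMIT_RAD = 12 * math.pi / 180  # ±12° expressed in radians (about 0.2095)
POSITION_LIMIT = 2.4                  # cart position bound, in environment units


def cartpole_terminated(observation):
    """Return True if the pole has fallen too far or the cart left the track."""
    cart_position, _cart_velocity, pole_angle, _pole_angular_velocity = observation
    return abs(cart_position) > POSITION_LIMIT or abs(pole_angle) > ANGLE_LIMIT_RAD
```

Because the reward is +1 per timestep, an episode's total reward equals the number of steps taken before this check first returns True.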
Results:
- Baseline (untrained): Quick failure
- After training: Successfully maintains pole balance for extended periods
Objective: Train an agent to drive an underpowered car up a steep mountain.
Environment: MountainCar-v0 from OpenAI Gym
Key Features:
- 2-dimensional state space (car position and velocity)
- 3 discrete actions (push left, coast, push right)
- Episode ends when the car reaches the top (position ≥ 0.5) or 200 timesteps have elapsed
- Challenge: The car's engine is underpowered, so it must build momentum to climb
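To see why momentum matters, the sketch below re-implements the MountainCar update rule in plain Python and compares two hand-written policies. It is an approximation for illustration only: the constants mirror the values used by Gym's MountainCar-v0 as far as I know them, the start position is fixed at -0.5 for repeatability (the real environment samples it at random), and `step`, `run_episode`, and both policy functions are hypothetical names.

```python
import math

FORCE, GRAVITY = 0.001, 0.0025
MIN_POS, MAX_POS, MAX_SPEED, GOAL_POS = -1.2, 0.6, 0.07, 0.5


def step(position, velocity, action):
    """Advance one timestep; action is 0 (push left), 1 (coast), 2 (push right)."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def run_episode(policy, start_position=-0.5, max_steps=200):
    """Run one episode; return (reached_goal, steps_taken)."""
    position, velocity = start_position, 0.0
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return True, t
    return False, max_steps


def always_right(position, velocity):
    return 2  # naive policy: push toward the goal on every step


def follow_momentum(position, velocity):
    return 2 if velocity >= 0 else 0  # push in the current direction of travel


reached_naive, _ = run_episode(always_right)
reached_momentum, steps_momentum = run_episode(follow_momentum)
```

In this simplified model, pushing right on every step stalls partway up the slope, while pushing in the direction of travel swings the car back and forth until it clears the goal within the 200-step limit. A trained DQN agent has to discover a similar strategy from reward alone.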
Results:
- Baseline (untrained): Cannot reach the top
- After training: Learns to build momentum and successfully climb the mountain
👉 Explore Mountain Car Project
- Python 3.7+
- Jupyter Notebook
- TensorFlow/Keras
- OpenAI Gym 0.26.2
# Clone or navigate to the repository
cd reinforcement-learning
# Install required packages
pip install --upgrade pip
pip install gym==0.26.2
pip install tensorflow
pip install jupyter

- Navigate to a project folder (e.g., CartPole-v1/)
- Open the Jupyter notebook:

jupyter notebook deep_q_learning_cartpole.ipynb

- Follow the notebook cells to:
  - Train the DQN model
  - Evaluate the trained agent
  - Visualize results
| Technology | Purpose |
|---|---|
| TensorFlow/Keras | Deep neural networks for Q-function approximation |
| OpenAI Gym | Standard RL environments and interfaces |
| NumPy | Numerical computations and array operations |
| Matplotlib | Visualization of training results |
| ImageIO | Creating and saving demonstration GIFs |
For a deeper understanding of the concepts implemented: