Traffic Light Signal Control Summary
Introduction
- Single-Agent RL
- Centralized
- often needs to collect traffic data from the whole network as the global state
- may lead to high latency and a single point of failure
- Multi-Agent RL
- decentralized
- each signalized intersection is regarded as an agent.
- the main challenge is how to respond to the dynamic interactions between each signal agent and the environment.
Background of Reinforcement Learning
Single-Agent RL
Uses Q-learning to solve sequential decision-making problems by learning estimates of the optimal value of each action. However, it is not easy to learn the values of all actions in all states when the state space or action space is large.
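The tabular Q-learning update described above can be sketched on a toy problem. The corridor environment below is purely illustrative (states 0..4, actions left/right, reward 1 for reaching the goal); it just demonstrates the update rule Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)):

```python
import random
from collections import defaultdict

# Toy 1-D corridor MDP (illustrative, not a traffic environment):
# states 0..4, action 0 = left, 1 = right, reward 1 on reaching state 4.
GOAL = 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.3

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
Q = defaultdict(float)  # Q[(state, action)] -> value estimate

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        if random.random() < EPS:
            a = random.choice([0, 1])
        else:
            a = max((0, 1), key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of s2
        target = r + (0.0 if done else GAMMA * max(Q[(s2, 0)], Q[(s2, 1)]))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

After training, the learned values reflect discounted distance to the goal, e.g. Q[(3, 1)] approaches 1 and the greedy policy at every state moves right. The curse of dimensionality mentioned above appears here directly: the table has |S|·|A| entries, which becomes infeasible for large state spaces.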
Multi-Agent RL
MARL enables each agent to learn the optimal strategy to maximize its own cumulative reward.
However, it is generally impossible for all players in a game to optimize their payoffs simultaneously.
Description of the Proposed Method
A. Independent double Q-learning method
Traditional Q-learning methods may cause overestimation, which to some extent harms the performance of RL methods.
The double Q-learning method uses two estimators, which helps avoid the overestimation issue.
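The double-estimator idea can be sketched as a single update rule: one estimator selects the argmax action at the next state, the other evaluates it, which decouples selection from evaluation and counters the overestimation bias of the plain max. The helper below is an illustrative sketch, not the paper's exact implementation:

```python
import random
from collections import defaultdict

def double_q_update(QA, QB, s, a, r, s2, actions,
                    alpha=0.1, gamma=0.9, done=False):
    """One double Q-learning step: update a randomly chosen estimator.

    Selection uses one table, evaluation uses the other, so a single
    noisy overestimate cannot both pick itself and value itself.
    """
    if random.random() < 0.5:
        sel, ev = QA, QB          # select with A, evaluate with B
    else:
        sel, ev = QB, QA          # select with B, evaluate with A
    if done:
        target = r
    else:
        a_star = max(actions, key=lambda x: sel[(s2, x)])  # argmax under selector
        target = r + gamma * ev[(s2, a_star)]              # value under evaluator
    sel[(s, a)] += alpha * (target - sel[(s, a)])

# Minimal usage: a single terminal transition with reward 1
random.seed(1)
QA, QB = defaultdict(float), defaultdict(float)
double_q_update(QA, QB, s=0, a=1, r=1.0, s2=0, actions=[0, 1], done=True)
```

In practice the behaviour policy acts greedily with respect to QA + QB, and each transition updates only one of the two tables, as above.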
B. Cooperative Double Q-learning method
When the number of agents is relatively large, it is often not feasible to directly calculate the joint action function for each agent k.
This method drastically reduces the input dimension of each agent k's Q-function: the joint action dimension decreases from C^Nk to a constant C^2.
In Co-DQL, the mean-field approximation makes every independent agent learn the awareness of collaboration with the others. Moreover, the reward allocation mechanism and the local state sharing method of agents improve the stability and robustness of the training process compared with the independent agent learning method.
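The dimension reduction above can be sketched numerically: instead of conditioning agent k's Q-function on the full joint action of its Nk neighbors (C^Nk combinations for C discrete actions), the mean-field approximation conditions it on the mean of the neighbors' one-hot actions, so only a C×C pairwise interaction needs to be modeled. The toy pairwise Q-table below is illustrative, not the paper's network:

```python
import numpy as np

C = 4  # number of discrete actions (e.g. signal phases); illustrative

def mean_action(neighbor_actions, n_actions=C):
    """Mean of the neighbors' one-hot actions: a distribution over C actions."""
    onehots = np.eye(n_actions)[neighbor_actions]
    return onehots.mean(axis=0)

# Toy pairwise interaction table: Q_k(a_k, a_bar) = pair_table[a_k] . a_bar,
# i.e. a C x C object replaces a C^Nk-dimensional joint-action input.
rng = np.random.default_rng(0)
pair_table = rng.normal(size=(C, C))

def mean_field_q(a_k, a_bar):
    return pair_table[a_k] @ a_bar

# Four hypothetical neighbors took actions 2, 2, 1, 3
a_bar = mean_action([2, 2, 1, 3])
q_values = np.array([mean_field_q(a, a_bar) for a in range(C)])
best = int(q_values.argmax())  # agent k's greedy action under the mean field
```

Whatever the number of neighbors, the Q-function's action input stays a fixed-length C-vector, which is what lets the joint action dimension drop from C^Nk to the constant C^2.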