Traffic Light Signal Control Summary
Introduction
- Single-Agent RL
- Centralized
- often needs to collect traffic data from the whole network as the global state
- may lead to high latency and a single point of failure
- Multi-Agent RL
- decentralized
- each signalized intersection is regarded as an agent.
- the main challenge is how to respond to the dynamic interactions between each signal agent and the environment.
Background of Reinforcement Learning
Single-Agent RL
Uses Q-learning to solve sequential decision-making problems by learning estimates of the optimal value of each action. However, it is not easy to learn the values of all actions in all states when the state space or action space is large.
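The tabular Q-learning update described above can be sketched on a toy problem. The corridor environment below is purely illustrative (states 0..4, actions left/right, reward 1 for reaching the goal); it just demonstrates the update rule Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)):

```python
import random
from collections import defaultdict

# Toy 1-D corridor MDP (illustrative, not a traffic environment):
# states 0..4, action 0 = left, 1 = right, reward 1 on reaching state 4.
GOAL = 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.3

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
Q = defaultdict(float)  # Q[(state, action)] -> value estimate

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        if random.random() < EPS:
            a = random.choice([0, 1])
        else:
            a = max((0, 1), key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of s2
        target = r + (0.0 if done else GAMMA * max(Q[(s2, 0)], Q[(s2, 1)]))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

After training, the learned values reflect discounted distance to the goal, e.g. Q[(3, 1)] approaches 1 and the greedy policy at every state moves right. The curse of dimensionality mentioned above appears here directly: the table has |S|·|A| entries, which becomes infeasible for large state spaces.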
Multi-Agent RL
MARL enables each agent to learn the optimal strategy to maximize its own cumulative reward.
However, it is generally impossible for all players in a game to optimize their payoffs simultaneously.
Description of the Proposed Method
A. Independent double Q-learning method
Traditional Q-learning methods may cause overestimation, which to some extent harms the performance of RL methods.
The double Q-learning method uses two estimators, which helps avoid the overestimation issue.
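The double-estimator idea can be sketched as a single update rule: one estimator selects the argmax action at the next state, the other evaluates it, which decouples selection from evaluation and counters the overestimation bias of the plain max. The helper below is an illustrative sketch, not the paper's exact implementation:

```python
import random
from collections import defaultdict

def double_q_update(QA, QB, s, a, r, s2, actions,
                    alpha=0.1, gamma=0.9, done=False):
    """One double Q-learning step: update a randomly chosen estimator.

    Selection uses one table, evaluation uses the other, so a single
    noisy overestimate cannot both pick itself and value itself.
    """
    if random.random() < 0.5:
        sel, ev = QA, QB          # select with A, evaluate with B
    else:
        sel, ev = QB, QA          # select with B, evaluate with A
    if done:
        target = r
    else:
        a_star = max(actions, key=lambda x: sel[(s2, x)])  # argmax under selector
        target = r + gamma * ev[(s2, a_star)]              # value under evaluator
    sel[(s, a)] += alpha * (target - sel[(s, a)])

# Minimal usage: a single terminal transition with reward 1
random.seed(1)
QA, QB = defaultdict(float), defaultdict(float)
double_q_update(QA, QB, s=0, a=1, r=1.0, s2=0, actions=[0, 1], done=True)
```

In practice the behaviour policy acts greedily with respect to QA + QB, and each transition updates only one of the two tables, as above.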
B. Cooperative Double Q-learning method
When the number of agents is relatively large, it is often not feasible to directly calculate the joint action function for each agent k.
This method drastically reduces the input dimension of each agent k's Q-function: the joint action dimension decreases from C^Nk to a constant C^2.
In Co-DQL, the mean-field approximation makes every independent agent learn the awareness of collaboration with the others. Moreover, the reward allocation mechanism and the local state sharing method of agents improve the stability and robustness of the training process compared with the independent agent learning method.
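The dimension reduction above can be sketched numerically: instead of conditioning agent k's Q-function on the full joint action of its Nk neighbors (C^Nk combinations for C discrete actions), the mean-field approximation conditions it on the mean of the neighbors' one-hot actions, so only a C×C pairwise interaction needs to be modeled. The toy pairwise Q-table below is illustrative, not the paper's network:

```python
import numpy as np

C = 4  # number of discrete actions (e.g. signal phases); illustrative

def mean_action(neighbor_actions, n_actions=C):
    """Mean of the neighbors' one-hot actions: a distribution over C actions."""
    onehots = np.eye(n_actions)[neighbor_actions]
    return onehots.mean(axis=0)

# Toy pairwise interaction table: Q_k(a_k, a_bar) = pair_table[a_k] . a_bar,
# i.e. a C x C object replaces a C^Nk-dimensional joint-action input.
rng = np.random.default_rng(0)
pair_table = rng.normal(size=(C, C))

def mean_field_q(a_k, a_bar):
    return pair_table[a_k] @ a_bar

# Four hypothetical neighbors took actions 2, 2, 1, 3
a_bar = mean_action([2, 2, 1, 3])
q_values = np.array([mean_field_q(a, a_bar) for a in range(C)])
best = int(q_values.argmax())  # agent k's greedy action under the mean field
```

Whatever the number of neighbors, the Q-function's action input stays a fixed-length C-vector, which is what lets the joint action dimension drop from C^Nk to the constant C^2.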