Perturbation Analysis, Markov Decision Processes, and Reinforcement Learning

Perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) cover different aspects of performance optimization of discrete event dynamic systems (DEDSs). PA provides performance sensitivities for a DEDS by analysing the dynamical behaviour of a single sample path of the system; performance optimization can then be achieved by combining PA with stochastic approximation methods. MDPs provide a general model for performance optimization of DEDSs, and policy iteration, the basic approach in MDPs, can be implemented based on sample paths. The goal of RL is to learn how to make decisions that improve a system's performance by observing its behaviour. Recent research shows that these three seemingly different areas are naturally related.

This four-week short course serves as an introduction to these three main areas of DEDS optimization. The focus will be on the fundamental concepts and the relations among the areas. Some up-to-date research results will be presented, sample-path-based implementation will be emphasized, and examples will be given to illustrate the applications of the optimization theories.
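The combination of PA with stochastic approximation mentioned above can be sketched as a Robbins–Monro recursion driven by noisy gradient estimates. The quadratic performance measure, the noise level, and the step-size sequence below are illustrative assumptions, not taken from the course material:

```python
import random

# Illustrative sketch: a sample-path gradient estimate (of the kind PA
# would supply) fed into a Robbins-Monro stochastic approximation update.
# J(theta) = (theta - 2)^2 is a stand-in performance measure; its true
# gradient 2*(theta - 2) is corrupted with zero-mean noise to mimic a
# single-sample-path PA estimate.

def noisy_gradient(theta, rng):
    """PA-style estimate: true gradient plus zero-mean observation noise."""
    return 2.0 * (theta - 2.0) + rng.gauss(0.0, 0.5)

def stochastic_approximation(theta0=0.0, steps=5000, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, steps + 1):
        a_n = 1.0 / n   # diminishing steps: sum a_n diverges, sum a_n^2 converges
        theta -= a_n * noisy_gradient(theta, rng)
    return theta

print(stochastic_approximation())   # approaches the optimum theta* = 2
```

Under the standard step-size conditions in the comment, the iterate converges to the minimizer of the performance measure even though each individual gradient estimate is noisy.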
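Policy iteration itself fits in a few lines. The following is a minimal sketch on a hypothetical two-state, two-action MDP with discounted reward; all transition probabilities and rewards are made-up numbers for illustration:

```python
# Minimal policy iteration for a finite discounted MDP (illustrative numbers).
# P[s][a][t] is the probability of moving from state s to state t under
# action a; R[s][a] is the one-step reward for taking action a in state s.
P = [
    [[0.8, 0.2], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.9, 0.1]],   # transitions from state 1 under actions 0, 1
]
R = [
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
]

def evaluate(policy, P, R, gamma=0.9, tol=1e-10):
    """Policy evaluation: iterate the fixed-policy Bellman operator to convergence."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = evaluate(policy, P, R, gamma)
        stable = True
        for s in range(len(P)):
            q = [R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                 for a in range(len(R[s]))]
            best = q.index(max(q))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V

policy, V = policy_iteration(P, R)
print(policy)   # greedy policy after convergence
```

The sample-path-based implementations emphasized in the course replace the exact evaluation step above with estimates obtained by observing the running system, which is precisely where the connection to RL arises.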