基于序贯检测的快速马尔可夫决策:理论、方法及应用

Sequential Detection Based Quickest Markov Decision Processes: Theory, Algorithms, and Applications

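  • Abstract: Targeting complex environments with abrupt state changes and detection noise, this paper addresses control aftereffects and action latency and explores ways to improve the timeliness of decision-making and control. It proposes a quickest Markov decision framework based on sequential detection and applies it to several typical scenarios, including smart grids, disease control, and hydrology. Specifically, the paper uncovers the connection between quickest change-point detection in statistical signal processing and Markov decision processes in stochastic optimal control, and establishes a constrained Markov decision framework with a four-dimensional state. The framework selects a feasible joint detection-control policy that maximizes the expected return of the controlled object or achieves the best trade-off between average reward and risk. Compared with the traditional layered strategy of "detect the change point first, then adjust the controllable variables," the proposed method achieves cross-layer coordination by "adjusting the controllable variables while detecting the change point," which effectively addresses the challenges that detection delay and response lag pose to the timeliness of decision-making and control. In the smart grid, disease control, and hydrology scenarios, the "adjust-while-detecting" approach is shown to significantly outperform the traditional "adjust-after-detecting" approach. Finally, the paper briefly discusses the potential value of sequential-detection-based quickest Markov decision processes in offshore carbon sequestration and in network attack detection and defense.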

     

    Abstract: In this paper, joint signal processing and control methods for complex dynamical systems with statistical change points, observation noise, aftereffects, and action latency were investigated to maximize the overall utility of delay-sensitive decision making. A unified framework combining quickest change detection in statistical signal processing and the Markov decision process in stochastic optimal control was presented, along with its potential applications in smart grid, disease control, and hydrology. By leveraging a four-dimensional constrained Markov decision process, the proposed framework maximized the expected reward, characterized by the weighted sum of the income and the risk, while satisfying various constraints due to operations, feasibility, and environments. In contrast to the conventional layered architecture, in which an action is launched only after the change point is detected, the new architecture enabled a cross-layer, cross-disciplinary collaboration between signal processing and control, which made real-time decisions in a much timelier manner based on instantaneous likelihood estimation. This paradigm shift brought substantial gain for dynamical or stochastic systems that are sensitive to latency in decision or control and that suffer from large detection delay and/or strong aftereffects. It was demonstrated that the joint detection and control strategy outperformed the control-after-detection policy in smart grid, disease control, and hydrology, with considerable gain observed. Finally, we briefly envisioned the potential applications of sequential detection based quickest Markov decision processes in carbon capture and storage beneath the seafloor as well as in network attack detection and mitigation.

     
