Multi-domain Joint Interference Avoidance Based on Deep Reinforcement Learning


    Abstract: The channel openness of wireless communication systems makes them extremely vulnerable to external malicious interference, so the quality of communication links is difficult to guarantee. To address this problem, this paper designs a multi-domain joint interference avoidance decision-making method based on deep reinforcement learning. The method combines anti-jamming means from the frequency domain, power domain, and modulation-and-coding domain to avoid interference, achieving reliable communication while taking system performance into account. First, the joint intelligent interference avoidance problem is modeled as a Markov Decision Process (MDP) whose action space includes switching channels, adjusting transmit power, and changing the modulation and coding scheme. Then, the clipped Proximal Policy Optimization algorithm (PPO-Clip) is used to solve for the system's optimal joint interference avoidance policy. PPO-Clip updates the policy iteratively over multiple training rounds using small batches of samples, which avoids the difficulty of choosing a step size and the excessively large policy updates that affect plain policy-gradient methods. Finally, the effectiveness and reliability of the proposed algorithm are verified under sweep interference, random sweep interference, and intelligent blocking interference environments, respectively.
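The two core ingredients described above, a joint multi-domain action space and the PPO-Clip surrogate objective, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the numbers of channels, power levels, and modulation-and-coding schemes, and the clipping parameter `eps`, are illustrative assumptions.

```python
import numpy as np

# (1) Joint action space: every (channel, power level, MCS) triple is one
# discrete action, so a single policy head selects all three domains at once.
# The sizes below are assumed for illustration only.
N_CHANNELS, N_POWER_LEVELS, N_MCS = 8, 4, 4
N_ACTIONS = N_CHANNELS * N_POWER_LEVELS * N_MCS  # flat joint action space

def decode_action(a):
    """Map a flat action index back to a (channel, power, mcs) triple."""
    channel, rest = divmod(a, N_POWER_LEVELS * N_MCS)
    power, mcs = divmod(rest, N_MCS)
    return channel, power, mcs

# (2) PPO-Clip surrogate: clip the probability ratio so that a single update
# cannot move the new policy too far from the one that collected the samples.
def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    ratio = np.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (elementwise minimum) surrogate, averaged over the batch.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

Because the surrogate takes the minimum of the clipped and unclipped terms, an advantage estimate stops contributing gradient once the ratio leaves the `[1 - eps, 1 + eps]` band, which is what removes the need for the careful step-size tuning of plain policy-gradient methods.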

     
