Deep Reinforcement Learning Based Intelligent Decision-Making for Communication Links Between UAVs

周世阳; 程郁凡; 徐丰; 雷霞

doi:10.16798/j.issn.1003-0530.2022.07.008

Abstract

Owing to the flexibility, speed, and low cost of unmanned aerial vehicle (UAV) networking, aerial base stations are considered a promising technology for future wireless communications. UAV clusters can complete complex tasks through mutual coordination and cooperation, which gives them significant research and practical value, yet efficient communication between UAVs remains a major challenge. To save as much transmission power as possible while still meeting the required inter-UAV communication rate, this paper proposes an intelligent decision-making algorithm for cluster scheme selection and power control based on deep reinforcement learning. First, three UAV cluster schemes are designed to provide seamless wireless coverage for ground users. Then, a deep Q-network (DQN) based cluster scheme and power control algorithm is proposed, in which a deep neural network outputs the jointly decided UAV cluster scheme and transmission power under different conditions; importance sampling is also studied to improve training efficiency. Simulation results show that the proposed deep reinforcement learning algorithm correctly selects the UAV cluster scheme and transmission power and, compared with the deep learning without reinforcement learning (DL-WO-RL) algorithm, meets the inter-UAV communication rate requirement with lower transmission power. Moreover, importance sampling shortens the convergence time of the DQN algorithm.
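The abstract does not specify the state, action, or reward definitions, so the Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It flattens a (cluster scheme, power level) pair into a single DQN output index, checks an inter-UAV link against the Shannon rate, and scores the choice with a hypothetical reward that penalises a missed rate target and otherwise favours lower power. It also assumes that "importance sampling" refers to the weights used in prioritized experience replay. The power grid, the reward shape, and the default `alpha`/`beta` values are illustrative assumptions, as are all function names.

```python
import math
import random

# Hypothetical action grid: the paper designs three cluster schemes, but the
# transmit-power levels below are illustrative assumptions, not from the paper.
CLUSTER_SCHEMES = (0, 1, 2)
POWER_LEVELS_W = (0.1, 0.2, 0.5, 1.0)


def encode_action(scheme, power_idx):
    """Map a (cluster scheme, power level) pair to one flat DQN output index."""
    return scheme * len(POWER_LEVELS_W) + power_idx


def decode_action(action):
    """Inverse of encode_action: return (scheme, power_idx)."""
    return divmod(action, len(POWER_LEVELS_W))


def shannon_rate(power_w, gain, bandwidth_hz, noise_psd_w_per_hz):
    """Achievable rate in bit/s on an AWGN link: B * log2(1 + P*g / (N0*B))."""
    snr = power_w * gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def reward(power_w, rate_bps, min_rate_bps, penalty=-1.0):
    """Assumed reward: a fixed penalty if the rate target is missed,
    otherwise a value that grows as the transmit power falls."""
    if rate_bps < min_rate_bps:
        return penalty
    return 1.0 - power_w / max(POWER_LEVELS_W)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, else pick the action with the largest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Prioritized-replay sampling with importance-sampling (IS) weights.

    P(i) = p_i^alpha / sum_k p_k^alpha and w_i = (N * P(i))^(-beta),
    normalised by the largest possible weight so that every w_i <= 1.
    """
    n = len(priorities)
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    max_weight = (n * min(probs)) ** (-beta)
    weights = [(n * probs[i]) ** (-beta) / max_weight for i in indices]
    return indices, weights
```

In a full DQN, each sampled temporal-difference error would be multiplied by its IS weight before the gradient step, which corrects the bias introduced by non-uniform sampling; `beta` is typically annealed towards 1 over the course of training.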
