Joint Optimization for Multi-UAV Sensing Based on Integrated Sensing and Communication
Abstract: Low-Altitude Wireless Networks (LAWNs) are a core enabler of wide-area coverage and intelligent sensing, but their overall performance is tightly constrained by scarce spectrum resources and limited onboard hardware capabilities. Integrated Sensing and Communication (ISAC) addresses this problem by sharing spectrum and hardware between the two functions, which improves resource utilization and enables tighter integration of sensing and communication.
Building on ISAC, cooperative sensing with multiple Unmanned Aerial Vehicles (UAVs) can overcome the coverage and sensing-accuracy bottlenecks of single-UAV systems, making it a key approach to meeting the stringent requirements of low-altitude supervision. However, multi-UAV collaboration introduces complex spatial geometric constraints and co-channel interference. Moreover, in three-dimensional dynamic environments, communication and sensing tasks are strongly coupled, so the joint optimization of UAV trajectories, transmit power, subcarrier allocation, and user association becomes a highly challenging Mixed-Integer Non-Linear Programming (MINLP) problem. In addition, classical Multi-Agent Reinforcement Learning (MARL) algorithms suffer from state-space explosion, slow convergence, and inefficient policy search in large-scale networks, and therefore struggle to meet the real-time requirements of dynamic cooperative sensing. To address these challenges, this paper proposes a Quantum-enhanced Hierarchical Multi-Agent Proximal Policy Optimization (Q-H-MAPPO) algorithm that aims to maximize the joint performance of multi-UAV cooperative sensing and communication. First, a joint optimization model for cooperative sensing is constructed. The Cramér-Rao Lower Bound (CRLB) is adopted as the sensing performance metric to quantify the impact of the multi-UAV geometric configuration on sensing accuracy, and the positioning error is minimized subject to communication Quality of Service (QoS) guarantees. Next, under a Centralized Training and Decentralized Execution (CTDE) framework, a Hierarchical Markov Decision Process (H-MDP) is designed.
This hierarchical structure decomposes the task and thereby decouples the discrete variables from the continuous ones. Furthermore, an amplitude encoding mechanism based on variational quantum circuits is introduced, and a graph attention mechanism is designed by drawing on the idea of the quantum swap test. By emulating the feature-mapping capability of quantum states within a classical computing framework, the proposed method efficiently extracts the non-linear cooperative relationships among agents together with channel state information. Simulation results show that the proposed Q-H-MAPPO algorithm performs well in multi-target, high-load dynamic scenarios. In a scenario with six targets, the algorithm reduces the positioning CRLB to approximately 0.18 m, at least 7% lower than the other benchmark methods. In a large-scale network of 15 UAVs, the system sum rate reaches approximately 31.5 Mbps, an improvement of approximately 21% to 44% over these baselines. When the network scales to 20 UAVs, the inference latency remains stable at approximately 20 to 26 ms, a reduction of approximately 81% to 87% compared with typical baselines. The algorithm also converges fastest, stabilizing within 150 to 200 training episodes. These results validate the advantages of Q-H-MAPPO in improving cooperative sensing accuracy, system throughput, and real-time decision-making in large-scale low-altitude networks.
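To illustrate how a CRLB-based metric can capture the effect of UAV geometry on positioning accuracy, the following is a minimal sketch and not the paper's model. It assumes range-only measurements with independent Gaussian noise of standard deviation `sigma`, and a ground target whose 2D position is unknown. The function name `range_crlb` and all parameter values are hypothetical.

```python
import math


def range_crlb(uav_positions, target_xy, sigma=1.0):
    """Return sqrt(trace(CRLB)) in metres for a 2D ground target.

    Assumed model (not taken from the paper): each UAV at (x, y, z)
    measures its range to the target with i.i.d. Gaussian noise of
    standard deviation `sigma`.
    """
    tx, ty = target_xy
    j11 = j12 = j22 = 0.0  # entries of the 2x2 Fisher information matrix
    for ux, uy, uz in uav_positions:
        dx, dy = tx - ux, ty - uy
        r = math.sqrt(dx * dx + dy * dy + uz * uz)
        # Gradient of the range with respect to the target's (x, y).
        gx, gy = dx / r, dy / r
        j11 += gx * gx / sigma ** 2
        j12 += gx * gy / sigma ** 2
        j22 += gy * gy / sigma ** 2
    det = j11 * j22 - j12 * j12
    if det <= 1e-12:
        # Degenerate geometry (e.g. collinear with the target): unobservable.
        return math.inf
    # For a 2x2 matrix, trace(J^-1) = (j11 + j22) / det.
    return math.sqrt((j11 + j22) / det)


# UAVs spread around the target versus clustered on one side of it.
spread = [(100, 0, 50), (0, 100, 50), (-100, 0, 50)]
clustered = [(100, 0, 50), (100, 10, 50), (100, -10, 50)]
print(range_crlb(spread, (0, 0)), range_crlb(clustered, (0, 0)))
```

Under these assumptions the spread formation yields a much smaller bound than the clustered one, which is the kind of geometry dependence a trajectory optimizer can exploit.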
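The idea of emulating amplitude encoding and a swap-test-style similarity on a classical computer can be sketched as follows. This is an illustrative assumption rather than the Q-H-MAPPO implementation: it omits the trainable variational circuit, uses real-valued features of equal length, and turns the state fidelity into attention weights with a plain softmax. The function names and the `temperature` parameter are hypothetical.

```python
import math


def amplitude_encode(features):
    """Zero-pad to a power-of-two length and L2-normalise.

    The resulting entries play the role of real state amplitudes.
    """
    n = 1
    while n < len(features):
        n *= 2
    padded = list(features) + [0.0] * (n - len(features))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in padded]


def swap_test_p0(state_a, state_b):
    """Probability that a swap-test ancilla reads 0: 1/2 + |<a|b>|^2 / 2."""
    overlap = sum(a * b for a, b in zip(state_a, state_b))
    return 0.5 + 0.5 * overlap * overlap


def fidelity_attention(query, neighbours, temperature=1.0):
    """Softmax attention weights over neighbours, scored by state fidelity.

    Assumes every neighbour has the same feature length as `query`.
    """
    q = amplitude_encode(query)
    scores = []
    for nb in neighbours:
        # Recover the fidelity |<q|nb>|^2 from the swap-test probability.
        fidelity = 2.0 * swap_test_p0(q, amplitude_encode(nb)) - 1.0
        scores.append(fidelity / temperature)
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


weights = fidelity_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)
```

In this sketch a neighbour whose encoded state aligns with the query receives the larger weight, while an orthogonal neighbour receives the smaller one.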