Codec for Cross-modal Semantic Communication in 6G

陈鸣锴; 柳明浩; 王文俊; 王磊; 郑宝玉

doi:10.16798/j.issn.1003-0530.2023.07.001

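Abstract: In the 6G era, semantic communication is regarded as one of the most promising research directions because it can reconcile users' demand for immersive multimodal experiences with low-latency, highly reliable communication. To this end, this paper proposes a cross-modal semantic communication system built around a semantic encoding module and a semantic decoding module. The Frobenius norm is used to measure the similarity of the encoded semantic intermediate vectors of the three modalities; shared features are discarded, and the features unique to each modality are retained and combined by a weighted sum. This completes cross-modal semantic fusion and realizes end-to-end data transmission driven by users' differing task requirements in multimodal services. The codec framework enables cross-modal transmission of speech, text, and image data, offers a solution for pragmatics-oriented communication, and greatly improves user experience. In addition, the paper proposes an architecture for evaluating cross-modal semantic similarity between the transmitter and the receiver. It consists mainly of a Siamese network and a pseudo-Siamese network, accurately obtains the matching loss between modal contents, and feeds that loss back to guide parameter optimization of the encoder and decoder, minimizing the loss and driving the network to converge so that the codec conveys semantics accurately. Simulation results show that the proposed system clearly outperforms a traditional communication system. At high SNR, the similarity for nearly every modality exceeds 90%. At low SNR the advantage is more pronounced: cross-modal semantic similarity improves by more than 53% over traditional communication, which supports the superiority and feasibility of cross-modal semantic communication.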

Abstract: In the 6G era, semantic communication is considered one of the most promising research directions, because it addresses users' requirements for immersive multi-modal experience together with low latency and high reliability. For this reason, a cross-modal semantic communication system based on deep learning is proposed in this paper, in which a semantic encoder and a semantic decoder are designed. The Frobenius norm is used to judge the similarity of the encoded semantic intermediate vectors of the three modalities; features they share are discarded, and the unique features are preserved and combined by a feature-weighted summation. This achieves cross-modal semantic fusion and realizes end-to-end data transmission driven by the different task requirements of users in multi-modal services. The encoding and decoding framework realizes cross-modal transmission of multi-modal data including voice, text, and image; it provides a solution for pragmatics-oriented communication tasks and greatly enhances user experience. This paper also proposes an architecture for evaluating the semantic similarity between the receiver and the transmitter. The architecture is composed of a Siamese network and a pseudo-Siamese network: the Siamese network compares content of the same modality, and the pseudo-Siamese network compares content of different modalities. In this way, the matching loss between the modal contents is obtained accurately. The loss is fed back to guide the optimization of the encoder and decoder parameters so that the loss value reaches its minimum and the whole network converges, which achieves accurate semantic transmission through the encoder and decoder. The simulation results show that the proposed cross-modal semantic communication system is clearly superior to the traditional communication system. At high SNR, the similarity for almost all modalities is above 90%. At low SNR, the advantage of the cross-modal semantic communication system is more obvious: the similarity of cross-modal semantic transmission is improved by more than 53% compared with traditional communication. These results support the superiority and feasibility of cross-modal semantic communication.
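The fusion step summarized in the abstract (Frobenius-norm similarity check, discard of duplicated features, weighted sum of the remaining ones) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `frobenius_distance` and `fuse_modalities`, the `threshold` parameter, the choice to compare whole per-modality feature matrices of equal shape, and the normalization of the weights are all assumptions made for illustration.

```python
import numpy as np


def frobenius_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Frobenius norm of the difference between two equally shaped 2-D feature matrices."""
    return float(np.linalg.norm(a - b, ord="fro"))


def fuse_modalities(features, weights, threshold=1e-6):
    """Hypothetical fusion: drop near-duplicate feature matrices, then take a weighted sum.

    features:  list of 2-D arrays with identical shape (one per modality; assumed layout)
    weights:   positive weight per modality (assumed to be given, not learned here)
    threshold: distance at or below which two matrices count as "the same" (assumed)
    """
    kept = []  # indices of modalities whose features are retained
    for i, f in enumerate(features):
        # Keep a modality only if it differs from every modality already kept.
        if all(frobenius_distance(f, features[j]) > threshold for j in kept):
            kept.append(i)

    # Renormalize the weights of the retained modalities so they sum to 1.
    w = np.array([weights[i] for i in kept], dtype=float)
    w /= w.sum()

    fused = sum(wi * features[i] for wi, i in zip(w, kept))
    return fused, kept


if __name__ == "__main__":
    speech = np.ones((2, 3))
    text = np.ones((2, 3))    # identical to speech, so it is discarded
    image = np.zeros((2, 3))
    fused, kept = fuse_modalities([speech, text, image], [0.5, 0.3, 0.2])
    print(kept)   # indices of the retained modalities
    print(fused)
```

In this toy run the text features duplicate the speech features, so only the speech and image features enter the weighted sum. A real system would operate on learned encoder outputs and would likely decide similarity at a finer granularity than whole matrices.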
