RDI Data Augmentation Method for Millimeter-wave Radar Gesture Classification
Abstract: Deep learning can significantly improve the accuracy of gesture classification based on millimeter-wave radar. However, deep learning models depend heavily on large amounts of data and are prone to overfitting when training samples are scarce. Because parameters differ considerably between millimeter-wave radars and data collection is time-consuming and labor-intensive, the amount of available radar gesture data is often very limited. To address this data scarcity, this study proposes a data augmentation method, the Range-Doppler Image AutoEncoder with Attention Module (RDI-AEAM), which aims to enhance the range-Doppler image (RDI) representation of millimeter-wave radar gesture data. The method targets three characteristics of RDIs: they lack semantic information, they are difficult to annotate, and their features are indistinct. To this end, RDI-AEAM constructs an autoencoder network that incorporates an attention module. First, the autoencoder performs feature extraction and data compression, learning the distribution of the input data and extracting useful features. 
Second, the attention module learns features along the channel and spatial dimensions, addressing the problem of indistinct features and allowing the model to concentrate on the important ones. During training, the labels of the original data are predefined, and the quality of the generated data is measured with a mean squared error loss function. When the generated data reach the set threshold, they are associated with the predefined labels, so no additional post hoc annotation is needed. In the first experiment, 100% of the training set is used for augmentation. Compared with training on the original training set alone, augmentation improves accuracy by 0.83% on our self-built dataset and by 0.39% and 3.23% on the public deepSoli and VR-HGR datasets, respectively, indicating that RDI-AEAM improves gesture classification performance. We further investigate augmentation from less original data: using only 25% of the training set for augmentation yields improvements of 1.92%, 2.62%, and 1.56% on the three datasets, respectively.
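The threshold-based labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2x2 maps, the gesture labels, the threshold value of 0.01, and the reading of "reaching the threshold" as "reconstruction MSE at or below the threshold" are all assumptions made for illustration. The trained autoencoder is replaced here by a hypothetical stand-in that adds a constant offset to its input.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D maps (lists of rows)."""
    n = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / n


def augment(samples, reconstruct, threshold):
    """Return generated samples whose MSE to their source RDI is within the threshold.

    Each accepted sample inherits the predefined label of its source,
    so no separate annotation pass is needed.
    Assumption: "within" means MSE <= threshold.
    """
    accepted = []
    for rdi, label in samples:
        generated = reconstruct(rdi)
        if mse(rdi, generated) <= threshold:
            accepted.append((generated, label))
    return accepted


def make_offset_reconstructor(offset):
    """Hypothetical stand-in for a trained autoencoder: shifts every cell by a constant."""
    def reconstruct(rdi):
        return [[v + offset for v in row] for row in rdi]
    return reconstruct


# Toy RDIs with hypothetical gesture labels.
toy_samples = [
    ([[0.0, 0.1], [0.9, 0.2]], "swipe"),
    ([[0.5, 0.4], [0.1, 0.0]], "push"),
]

# A close reconstruction (MSE about 0.0025) is kept with its source label.
good = augment(toy_samples, make_offset_reconstructor(0.05), threshold=0.01)
# A poor reconstruction (MSE about 0.25) is discarded.
bad = augment(toy_samples, make_offset_reconstructor(0.5), threshold=0.01)
```

In this sketch the label never passes through the generator; it is carried alongside the source sample and attached only after the quality check, which mirrors the abstract's point that no additional labeling is required.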