基于多域表征与多分辨率融合的多通道毫米波雷达手势识别

Hand-Gesture Recognition for Multi-channel Millimeter-Wave Radar Based on Multi-Resolution and Multi-Presentation Fusion

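  • Abstract: With the spread of smart devices and the rapid development of technology, gesture recognition has shown great application value in smart homes and intelligent driving. The key challenge is to maintain high recognition accuracy across different users and orientations and when gesture features are easily confused. To address the low recognition rate for confusable gestures and the insufficient use of features, this paper proposes a gesture recognition method based on a convolutional neural network with multi-domain representation and multi-resolution fusion. First, the Short-Time Fourier Transform (STFT), the Two-Dimensional Fast Fourier Transform (2D-FFT), and Minimum Variance Distortionless Response (MVDR) beamforming are used to generate three representation images in different domains: time-frequency, time-range, and time-angle images. For these three images, a composite neural network is designed in which three parallel Two-Dimensional Convolutional Neural Networks (2DCNNs) are connected in series with a Multi-Resolution Fusion Module (MRFM) to extract gesture features from the images and perform recognition. Finally, a multi-domain representation image dataset containing seven gestures was created to train and test the model. The test results show that, in scenarios with different users, orientations, and environments, the proposed method improves the average recognition accuracy on the seven confusable gestures by 2.6% compared with a model without the MRFM, and by 4.8% compared with models that use only a single type of representation.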

     

    Abstract: With the accelerating spread of smart devices and the rapid development of technology, gesture recognition shows great application potential and broad market prospects in smart homes and smart driving. In these fields, the key problem is maintaining efficient and accurate recognition across different users and orientations and in the presence of confusable gesture features. To address the low recognition rate for confusable gestures and the underutilization of feature information, this study proposes a gesture recognition method based on a convolutional neural network with multi-domain representation and multi-resolution fusion, in which the gesture features complement one another. First, three representation images in different domains, i.e., time-frequency, time-range, and time-angle images, are generated by the Short-Time Fourier Transform (STFT), the Two-Dimensional Fast Fourier Transform (2D-FFT), and Minimum Variance Distortionless Response (MVDR) beamforming, respectively. For these three images, a composite neural network is designed in which three parallel Two-Dimensional Convolutional Neural Networks (2DCNNs) are connected in series with a Multi-Resolution Fusion Module (MRFM) to extract gesture features from the images and recognize the gestures. Finally, a multi-domain representation image dataset containing seven types of gestures was created to train and test the model. The test results show that, in gesture recognition scenarios with different users, orientations, and environments, the proposed method improves the average recognition accuracy on the seven confusable gestures by at least 2.3% compared with the model without the MRFM, and by at least 4.6% compared with models that use only a single type of representation.
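
The three representations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array layouts, the uniform linear array with half-wavelength spacing, the Hann window, and the diagonal loading factor are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def stft_map(x, win_len=64, hop=16):
    """Time-frequency map: magnitude STFT of a complex 1-D signal.
    Window type, length, and hop are illustrative assumptions."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    # FFT of each windowed segment; fftshift centres zero frequency.
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spec).T  # shape: (frequency bins, time steps)

def time_range_map(cube):
    """Time-range map from a cube shaped (frames, chirps, samples)."""
    # 2D-FFT per frame: chirp axis -> Doppler, sample axis -> range.
    rd = np.fft.fft2(cube, axes=(1, 2))
    # Collapse the Doppler axis to keep one range profile per frame.
    return np.abs(rd).sum(axis=1).T  # shape: (range bins, frames)

def mvdr_spectrum(snapshots, angles_deg, spacing=0.5, loading=1e-3):
    """MVDR spatial spectrum P(theta) = 1 / (a^H R^-1 a).
    snapshots: complex array (channels, n_snapshots).
    spacing: element spacing in wavelengths (assumed uniform linear array).
    loading: relative diagonal loading to keep R invertible (assumption)."""
    m, n = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n  # sample covariance R
    cov = cov + loading * np.trace(cov).real / m * np.eye(m)
    cov_inv = np.linalg.inv(cov)
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Steering matrix, one column per candidate angle: (channels, angles).
    steer = np.exp(2j * np.pi * spacing
                   * np.outer(np.arange(m), np.sin(theta)))
    denom = np.einsum("ma,mk,ka->a", steer.conj(), cov_inv, steer).real
    return 1.0 / denom

def time_angle_map(frames, angles_deg):
    """Stack one MVDR spectrum per frame; frames: (n_frames, channels, n)."""
    return np.stack([mvdr_spectrum(f, angles_deg) for f in frames], axis=1)
```

Under these assumptions each function returns a 2-D array with time on the second axis, so the three maps could be rendered as images and passed to three parallel 2D-CNN branches as the abstract describes. The network itself and the MRFM are not sketched here because the abstract gives no architectural details.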

     
