Wang Jun, Lu Shu, Li Yunwei. Modelling Long-Term Temporal Relationship and Spatial Attention for Multi-Modal Sign Language Recognition[J]. JOURNAL OF SIGNAL PROCESSING, 2020, 36(9): 1429-1439. DOI: 10.16798/j.issn.1003-0530.2020.09.007

Modelling Long-Term Temporal Relationship and Spatial Attention for Multi-Modal Sign Language Recognition

  • Two difficulties in continuous sign language recognition are the redundant information in the spatio-temporal dimensions of sign language data and the alignment of the data with a given label sequence. We therefore propose a sign language sentence recognition model that combines an attention mechanism with connectionist temporal classification (CTC). The model extracts short-term spatio-temporal features from the color video segments, depth video segments, and hand motion trajectories in the sign language data. To obtain long-term spatio-temporal features, the features of the three modalities are fused and weighted by spatial attention, then fed in temporal order into a bidirectional long short-term memory (BiLSTM) network for sequence modeling. Finally, a decoder network that integrates the attention mechanism with CTC is trained end-to-end to achieve accurate recognition of continuous sign language. The model was evaluated on a Chinese sign language dataset we collected ourselves and achieved an accuracy of 0.943.
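The modality-fusion step described above (spatial attention weighting the color, depth, and trajectory features before the BiLSTM) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the scoring vector `w` stands in for a learned attention layer, and the shapes (`T` clips, `M = 3` modalities, `D`-dimensional features) are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feats, w):
    """Fuse per-modality features with attention weights.

    feats: (T, M, D) short-term features for T clips and M modalities
           (here M = 3: color, depth, hand trajectory).
    w:     (D,) scoring vector, a stand-in for a learned attention layer.
    Returns the fused (T, D) features and the (T, M) attention weights.
    """
    scores = feats @ w                              # (T, M) relevance per modality
    alpha = softmax(scores, axis=1)                 # weights sum to 1 over modalities
    fused = (alpha[..., None] * feats).sum(axis=1)  # (T, D) weighted sum
    return fused, alpha

rng = np.random.default_rng(0)
T, M, D = 8, 3, 16                      # illustrative sizes only
feats = rng.standard_normal((T, M, D))  # per-clip, per-modality features
w = rng.standard_normal(D)
fused, alpha = attention_fuse(feats, w)
```

In the full model, `fused` would then be fed in temporal order to the BiLSTM; the softmax ensures each clip's three modality weights are non-negative and sum to one.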
