Wu Wenbo, Gu Guanghua, Liu Qingru, Zhao Zhiming, Li Gang. Dense image caption with deep convolution and global visual[J]. JOURNAL OF SIGNAL PROCESSING, 2020, 36(9): 1525-1532. DOI: 10.16798/j.issn.1003-0530.2020.09.018

Dense image caption with deep convolution and global visual

  • To address the inaccurate localization of regions of interest (ROIs) and the coarse-grained region descriptions in dense image captioning, this paper proposes a dense image captioning algorithm based on deep convolution and global features. The algorithm adopts a joint model of a residual network and a parallel long short-term memory (LSTM) network to reduce overlapping region localization and to improve descriptions that are coarse-grained and lack detail. First, a deep residual network and the region proposal network (RPN) layer of Faster R-CNN are used to obtain more accurate region bounding boxes and avoid overlapping region annotations. Then the global, local, and context features are each fed into the parallel LSTM network, and a fusion operator integrates the three outputs to produce the final description sentence. Compared with two mainstream algorithms on a public dataset, the proposed model shows some advantages.
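
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which fusion operator is used, so the element-wise sum and max below are assumptions chosen only for illustration, and the names `fuse` and `next_word` are hypothetical. Each of the three parallel branches (global, local, context) is assumed to emit one score per vocabulary word at a given decoding step.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(global_scores, local_scores, context_scores, mode="sum"):
    """Fuse per-word scores from the three branches (hypothetical operator).

    mode="sum" adds the three scores element-wise,
    mode="max" keeps the largest of the three for each word.
    """
    if not (len(global_scores) == len(local_scores) == len(context_scores)):
        raise ValueError("all branches must score the same vocabulary")
    triples = zip(global_scores, local_scores, context_scores)
    if mode == "sum":
        fused = [g + l + c for g, l, c in triples]
    elif mode == "max":
        fused = [max(t) for t in triples]
    else:
        raise ValueError("unknown fusion mode: " + mode)
    return softmax(fused)


def next_word(vocab, global_scores, local_scores, context_scores, mode="sum"):
    """Pick the most probable word after fusing the three branch outputs."""
    probs = fuse(global_scores, local_scores, context_scores, mode)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best]


# Toy example: made-up scores for a three-word vocabulary.
vocab = ["a", "dog", "runs"]
g = [0.1, 2.0, 0.3]   # global-feature branch
l = [0.2, 1.5, 0.1]   # local (region) feature branch
c = [0.0, 0.5, 1.0]   # context-feature branch
print(next_word(vocab, g, l, c, mode="sum"))
```

In a real model the three score vectors would come from the LSTM branches at each time step; here they are hand-written numbers so that the fusion logic can be read in isolation.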
