Sentiment Analysis of Mid-length Microblogs Based on Capsule Network
Abstract: Inferring users' sentiment tendencies from microblog text can improve the efficiency of public-opinion monitoring. This paper applies deep learning to sentiment classification of microblog corpora. It constructs a high-quality microblog sentiment classification dataset whose text-length distribution matches that of recent microblogs, and it analyzes how microblog text length affects sentiment classification. Mid-length texts are highly subjective and their sentences are only weakly related to one another, so detection accuracy on them is comparatively low. To address this problem, we propose a capsule-network-based sentiment analysis model for mid-length microblogs. Building on an attention mechanism and a fusion of local and global features, the model uses capsule vectors to extract deep sentiment features, improving detection on mid-length texts. Experiments on the dataset collected and constructed in this paper show that the proposed model outperforms a range of deep learning baselines. In a comparison across corpora of different text lengths, classification accuracy decreases gradually as text length increases; however, compared with a conventional LSTM, the proposed model's advantage grows as texts get longer, demonstrating the feasibility of the model for sentiment classification of mid-length microblog texts.
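The abstract does not specify how the capsule layer is built. As a hedged illustration only, the sketch below implements the squash nonlinearity and routing-by-agreement from the original capsule-network formulation (Sabour, Frosst and Hinton, 2017). It assumes two output capsules, one for negative and one for positive sentiment, whose vector lengths serve as class scores. The functions `squash`, `dynamic_routing` and `capsule_lengths`, the toy prediction vectors `u_hat`, the three routing iterations and the two-class setup are all illustrative assumptions, not the configuration used in this paper.

```python
import math
from typing import List

Vector = List[float]


def squash(v: Vector) -> Vector:
    """Capsule nonlinearity: keeps the direction of v and maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in v)
    if sq_norm == 0.0:
        return [0.0] * len(v)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in v]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement (assumed standard form, not the paper's exact layer).

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns the squashed output vectors v_j of the upper capsules.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits, start uniform
    v: List[Vector] = []
    for it in range(iterations):
        # Coupling coefficients: softmax over upper capsules for each lower capsule.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions, then squash, for every upper capsule.
        v = [
            squash([sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)])
            for j in range(n_upper)
        ]
        if it < iterations - 1:
            # Raise the logit wherever a prediction agrees (dot product) with the output.
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v


def capsule_lengths(v: List[Vector]) -> List[float]:
    """The length of each output capsule is read as that class's score."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in v]


# Hypothetical toy input: 3 lower capsules, 2 class capsules (0 = negative, 1 = positive).
u_hat = [
    [[0.1, 0.0], [2.0, 1.0]],
    [[0.0, 0.1], [1.8, 1.2]],
    [[-0.1, 0.0], [2.2, 0.9]],
]
scores = capsule_lengths(dynamic_routing(u_hat, iterations=3))
labels = ["negative", "positive"]
print(labels[scores.index(max(scores))], scores)
```

In this toy input the lower capsules agree on the positive capsule, so routing shifts coupling toward it and its output length approaches 1. Using vector length as the class probability is what lets a capsule layer encode *how* a feature is present (its direction) separately from *whether* it is present (its length).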