Semi-supervised Microblog Text Sentiment Classification Based on Global Feature Graph
Abstract: With the spread of online social networking, short texts such as microblog posts have developed forms of expression and ways of venting emotion that differ from those of traditional articles, which makes machine-learning-based sentiment analysis of short texts increasingly difficult. To address these new linguistic characteristics, we crawl a large amount of microblog data without sentiment labels and build a microblog short-text corpus. From the whole corpus we construct a global relationship graph between words and short texts, use BERT (Bidirectional Encoder Representations from Transformers) document embeddings as the feature values of the graph nodes, and apply graph convolution to propagate and extract features between nodes. A sample of the unlabeled microblog data is annotated manually, and a semi-supervised machine learning method combined with the global relationship graph is used to improve the performance of the sentiment classifier. Experiments show that, as the proportion of unlabeled data increases, the method captures global features better and classifies sentiment more accurately. Comparative experiments on the self-built manually labeled data set, the COAE2014 data set, and the NLP&CC2014 data set show that the method performs well in both precision and recall.
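The pipeline summarized in the abstract (a word/short-text graph, node features, graph convolution, and a loss computed only on labeled texts) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the two-layer architecture and the symmetric normalization follow the common GCN formulation of Kipf and Welling, the unit edge weights are a placeholder because the abstract does not say how edges are weighted, and random vectors stand in for BERT embeddings. All function and variable names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w1, w2):
    """Two-layer GCN: softmax(A_norm @ ReLU(A_norm @ X @ W1) @ W2)."""
    hidden = np.maximum(a_norm @ x @ w1, 0.0)
    logits = a_norm @ hidden @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, labeled_mask):
    """Mean cross-entropy over labeled nodes only; unlabeled nodes add no loss."""
    idx = np.flatnonzero(labeled_mask)
    picked = probs[idx, labels[idx]]
    return float(-np.log(picked + 1e-12).mean())

# Toy graph: nodes 0-3 are short texts, nodes 4-6 are words.
# An edge (text, word) means the word occurs in the text.
# Assumption: weight 1.0 per edge; the abstract does not specify edge weights.
edges = [(0, 4), (0, 5), (1, 4), (2, 5), (2, 6), (3, 6)]
n_nodes = 7
adj = np.zeros((n_nodes, n_nodes))
for text_node, word_node in edges:
    adj[text_node, word_node] = adj[word_node, text_node] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(n_nodes, 8))       # stand-in for BERT embeddings
w1 = rng.normal(size=(8, 4)) * 0.1      # untrained weights, for shape only
w2 = rng.normal(size=(4, 2)) * 0.1      # 2 output classes (e.g. pos/neg)

# Only texts 0 and 1 carry a sentiment label; other entries are ignored.
labels = np.array([0, 1, 0, 0, 0, 0, 0])
labeled_mask = np.array([True, True, False, False, False, False, False])

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, x, w1, w2)
loss = masked_cross_entropy(probs, labels, labeled_mask)
print("class probabilities per node:", probs.shape, "masked loss:", loss)
```

Unlabeled texts still influence the result because they share word nodes with labeled texts, so their features pass through the normalized adjacency matrix even though they contribute nothing to the loss. That is the sense in which adding unlabeled data can help capture global features.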
Keywords:
- Microblog text
- Sentiment analysis
- Graph Convolutional Network
- Semi-supervised