Abstract:
Classical probabilistic topic models discover the latent topic information of documents from word co-occurrences and are therefore widely used in text clustering and categorization tasks. In the last few years, following the successful application of word embeddings and neural networks, neural-network-based approaches have become the mainstream of text categorization research. This paper demonstrates the superiority of neural networks in text categorization tasks by comparing Convolutional Neural Networks (CNN) with probabilistic topic models. On this basis, it extracts a document feature vector through a CNN and names it the Convolutional Semantic Feature. To better describe the topic information of documents, this paper proposes a new kind of feature that combines the Convolutional Semantic Feature with latent topic information. The experimental results show that this new feature is superior to either an individual probabilistic topic model or a CNN model alone, and that it clearly improves F1 performance on topic categorization tasks.
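
As a rough illustration of the combined feature, the sketch below joins a CNN-derived document vector with a topic distribution. The abstract does not state how the two are combined, so simple concatenation after L2 normalization is an assumption made here, and the names `l2_normalize`, `combine_features`, `conv_feature`, and `topic_dist` are hypothetical. The input values are toy numbers, not experimental data.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def combine_features(conv_feature: Sequence[float],
                     topic_dist: Sequence[float]) -> List[float]:
    """Concatenate a CNN document vector with a topic distribution.

    Assumption: concatenation of the two normalized vectors. The source
    only says the two kinds of information are combined.
    """
    return l2_normalize(conv_feature) + l2_normalize(topic_dist)


# Toy inputs: a 2-dimensional "Convolutional Semantic Feature" and a
# 3-topic distribution (both hypothetical).
conv_feature = [3.0, 4.0]
topic_dist = [0.5, 0.5, 0.0]

combined = combine_features(conv_feature, topic_dist)
print(len(combined))
```

Normalizing each part before concatenation keeps one representation from dominating the other purely because of scale. A downstream classifier would then be trained on `combined` in place of either vector alone.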