Abstract:
Accurately detecting near-duplicate images is important for redundancy removal and copyright infringement detection. To improve on Uniform Splitting based Support Vector Machine External Clustering (US-SVMEC), this paper proposes a near-duplicate image clustering algorithm that combines a Greedy Tree with SVMEC (GT-SVMEC). First, SVMEC is applied to split the dataset into two clusters. Then, a greedy tree-growing algorithm chooses the “best” cluster to split next. This procedure is repeated until no further improvement can be achieved. In addition, to overcome the problem of visual-word synonymy, a Probabilistic Latent Semantic Analysis (PLSA) model is adopted to map co-occurring visual words to the same direction in the latent semantic space. Experimental results show that, compared with SVM Internal Clustering (SVMIC) and US-SVMEC, the proposed approach clearly improves clustering performance.
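The greedy split-and-repeat loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' GT-SVMEC: the SVMEC bisection step is swapped for a plain 2-means split, and because the abstract does not define which cluster is “best”, the sketch assumes it is the one whose split most reduces the within-cluster sum of squared errors (SSE). The stopping rule “no improvement” is modeled with a hypothetical per-cluster `penalty`, so a split is accepted only if its SSE reduction exceeds that penalty. All function names here are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def mean(cluster: List[Point]) -> Point:
    """Centroid of a non-empty cluster."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(cluster: List[Point]) -> float:
    """Within-cluster sum of squared distances to the centroid."""
    c = mean(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def bisect(cluster: List[Point], iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Stand-in for SVMEC (assumption): split one cluster in two with 2-means.

    Seeds are the point farthest from cluster[0] and the point farthest from that seed.
    """
    a = max(cluster, key=lambda p: sq_dist(p, cluster[0]))
    b = max(cluster, key=lambda p: sq_dist(p, a))
    centers = [a, b]
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iters):
        left = [p for p in cluster if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in cluster if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(left), mean(right)]
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return left, right


def greedy_tree_clustering(points: List[Point], penalty: float) -> List[List[Point]]:
    """Greedily split the cluster whose bisection gives the largest SSE reduction.

    Stops when no split reduces SSE by more than `penalty` (hypothetical parameter).
    """
    clusters = [list(points)]
    while True:
        best: Optional[Tuple[float, int, List[Point], List[Point]]] = None
        for i, c in enumerate(clusters):
            if len(c) < 2:
                continue  # singletons cannot be split
            left, right = bisect(c)
            if not left or not right:
                continue
            gain = sse(c) - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, i, left, right)
        if best is None or best[0] <= penalty:
            return clusters  # no split improves the penalized objective
        _, i, left, right = best
        clusters[i:i + 1] = [left, right]  # grow the tree at the chosen leaf


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(sorted(c) for c in greedy_tree_clustering(pts, penalty=1.0)))
```

With two well-separated groups, the first split separates them, and the further splits inside each group reduce SSE by less than the assumed penalty, so two clusters remain. Setting `penalty=0.0` keeps splitting until every cluster is a singleton, which shows why some stopping criterion is needed.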