Abstract:
The traditional bag-of-words model for scene classification ignores both the contextual information in images and the category differences between image features. To address this problem, a scene classification method based on multi-direction context features and the spatial pyramid model is presented. First, each image is divided into patches on a regular grid and scale-invariant feature transform (SIFT) descriptors are extracted; for each local image patch, three context features are then formed by combining the features of its neighborhood regions in three directions, respectively. Second, the context features of each image category are clustered separately into visual words, which are collated to form the final codebook, and the visual-word histogram of each image is computed. Finally, pyramid histograms of visual words are obtained by spatial pyramid matching and classified with a support vector machine (SVM). The method combines feature similarity with contextual relations, and building the codebook separately for each scene category makes it more discriminative. Experiments on common scene image databases show that this method outperforms existing methods.
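The pipeline summarized above can be sketched in Python with NumPy as follows. This is a minimal, hypothetical illustration, not the authors' implementation, and every function name in it is invented for the example. It rests on several assumptions the abstract does not settle. The per-patch SIFT descriptors are taken as given in an (H, W, d) grid, so extraction is not shown. The three directions are taken to be right, below, and below-right. A patch's context feature in a direction is taken to be its own descriptor concatenated with its neighbor's, with edge padding at the image border. Each direction gets its own codebook, built by plain k-means per category. The per-level weights of spatial pyramid matching are omitted, and the SVM step is not shown.

```python
import numpy as np

# Assumed (row, col) neighbor offsets for the three directions; the abstract
# does not specify which three directions are used.
DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}


def context_features(grid):
    """grid: (H, W, d) per-patch descriptors. Returns {direction: (H, W, 2d)}
    formed by concatenating each patch with its neighbor in that direction
    (assumed combination rule; border patches reuse themselves via padding)."""
    h, w, _ = grid.shape
    padded = np.pad(grid, ((0, 1), (0, 1), (0, 0)), mode="edge")
    feats = {}
    for name, (dr, dc) in DIRECTIONS.items():
        neighbor = padded[dr:dr + h, dc:dc + w]
        feats[name] = np.concatenate([grid, neighbor], axis=2)
    return feats


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (k, D) centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def build_codebook(features_by_category, words_per_category):
    """Clusters each category's features separately, then stacks all the
    centroids into a single codebook."""
    return np.vstack([kmeans(f, words_per_category)
                      for _, f in sorted(features_by_category.items())])


def quantize(feat_map, codebook):
    """Maps each (H, W) feature to the index of its nearest visual word."""
    h, w, dim = feat_map.shape
    flat = feat_map.reshape(-1, dim)
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).reshape(h, w)


def spatial_pyramid_histogram(words, vocab_size, levels=2):
    """Concatenates visual-word histograms over 2^l x 2^l cells for
    l = 0..levels, then L1-normalizes (per-level weights omitted)."""
    h, w = words.shape
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        for r in np.array_split(np.arange(h), cells):
            for c in np.array_split(np.arange(w), cells):
                cell = words[np.ix_(r, c)]
                parts.append(np.bincount(cell.ravel(), minlength=vocab_size))
    hist = np.concatenate(parts).astype(float)
    return hist / max(hist.sum(), 1.0)


def image_representation(grid, codebooks, levels=2):
    """Concatenates the pyramid histograms of the three context-feature maps,
    each quantized with its own direction's codebook."""
    feats = context_features(grid)
    return np.concatenate([
        spatial_pyramid_histogram(quantize(feats[name], codebooks[name]),
                                  len(codebooks[name]), levels)
        for name in DIRECTIONS])
```

To train, one would pool `context_features(grid)[direction]` (reshaped to rows) per category, call `build_codebook` once per direction, and feed the vectors from `image_representation` to an SVM. A histogram intersection kernel is a common choice for pyramid histograms, though the abstract does not say which kernel the method uses.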