Authors: Shangwen Li and C.-C. Jay Kuo

Text-based information retrieval techniques have made significant progress over the last few decades, giving rise to large search engine companies such as Google. However, image-based retrieval is still an open problem with no fully satisfactory solution. Current content-based image retrieval (CBIR) methods extract low-level features (such as shape, color, and texture) and search for related images based on feature similarity. This is rather unreliable, however, since the same object can look quite different in different scenes. Another way to handle the problem is to first annotate each image with the key concepts it contains and then use text-based search methods to retrieve relevant images. However, manually labeling images is extremely time-consuming. Consequently, automatic image annotation is a promising way to solve the image retrieval problem.
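As a rough illustration of the feature-similarity retrieval described above, here is a minimal sketch. It assumes a global color histogram as the low-level feature and histogram intersection as the similarity measure. Both are illustrative choices, not ones the text prescribes, since it names shape, color, and texture features only in general. The names `color_histogram`, `histogram_intersection`, and `retrieve` are hypothetical. Images are plain lists of RGB tuples so the example needs no imaging library.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize each RGB channel into equal bins and return a normalized joint histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map 0..255 to 0..bins_per_channel-1 for each channel.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty images
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms: the overlapping mass."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query: Sequence[Pixel],
             database: Dict[str, Sequence[Pixel]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank database images by color-histogram similarity to the query image."""
    q = color_histogram(query)
    scored = [(name, histogram_intersection(q, color_histogram(px)))
              for name, px in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A global histogram discards spatial layout and shifts with lighting. As a result, two photos of the same object taken in different scenes can score poorly, while unrelated images with similar colors can score highly. This is the unreliability the paragraph points to.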

Currently, I am still searching for a good solution to the automatic image annotation problem. As shown in Fig. 1, the biggest challenge in image annotation is linking low-level features with high-level linguistic concepts. The current literature offers no satisfactory solution to this: typically, all proposed algorithms achieve an F measure below 0.5. My approach first annotates each image with coarse metadata, such as human/non-human, indoor/outdoor, and visually salient or not. Once images are categorized into these coarse classes, a different annotation method can be applied to each class accordingly.
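For reference, the F measure in the annotation literature is often computed per keyword. For each word, precision is the fraction of images the system labeled with that word that actually contain it. Recall is the fraction of images actually containing the word that the system labeled with it. These values are averaged over the vocabulary and combined as F = 2PR / (P + R). The sketch below assumes this per-word convention and computes F from the mean precision and mean recall. Papers differ on these details (for example, per-image averaging, or averaging per-word F values), so numbers are not always directly comparable. The function name `annotation_f_measure` is hypothetical.

```python
from typing import Dict, Set, Tuple


def annotation_f_measure(ground_truth: Dict[str, Set[str]],
                         predicted: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Return (mean precision, mean recall, F) over the ground-truth vocabulary.

    Both arguments map an image id to its set of keywords. Assumed conventions:
    a word that is never predicted gets precision 0, and predicted words absent
    from the ground-truth vocabulary are ignored.
    """
    vocabulary = set().union(*ground_truth.values())
    if not vocabulary:
        return 0.0, 0.0, 0.0
    precisions, recalls = [], []
    for word in sorted(vocabulary):
        relevant = {img for img, words in ground_truth.items() if word in words}
        retrieved = {img for img, words in predicted.items() if word in words}
        correct = len(relevant & retrieved)
        precisions.append(correct / len(retrieved) if retrieved else 0.0)
        recalls.append(correct / len(relevant))  # relevant is never empty here
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f
```

With toy labels where the ground truth is {img1: sky, tree; img2: sky, water; img3: tree} and the predictions are {img1: sky, water; img2: sky; img3: tree, sky}, the sketch gives P = 5/9, R = 1/2, and F = 10/19, or about 0.53.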

Many current image annotation algorithms use probabilistic topic models to link features and concepts [1][2][3][4]. Other methods use the k-nearest-neighbor (KNN) approach to solve the problem [5]. However, none of them attempt to divide the [...]