Main Subject Detection via Adaptive Feature Refinement
Cuong T. Vu and Damon M. Chandler
Journal of Electronic Imaging, 20 (1), March 2011
We present an algorithm for main subject detection (MSD) which operates by adaptively refining low-level features. The algorithm computes, in a block-based fashion, five feature maps corresponding to lightness distance, color distance, contrast, local sharpness, and edge strength. These feature maps are adaptively combined and gradually refined via three stages. The final combination of the refined feature maps produces an estimate of the main subject's location. Our results on a large image database show that relatively simple, low-level features, when used in an adaptive and iterative fashion, can be very effective at MSD.
Provided in this web page:
- Diagram of the proposed algorithm
- Representative results
- Overall performance (tested on a 5000-image database)
- Implementation code
|Diagram of the proposed algorithm|
Figure 1 outlines the algorithm. In the baseline stage, five features are measured and adaptively combined to obtain the baseline MSD map, from which the baseline bounding box around the main subject is derived. This initial bounding box provides a first guess of the main-subject and background regions. The background guess allows the algorithm to modify the features, which are again adaptively combined to obtain the refined MSD map and refined bounding box. The refined bounding box is then treated as the new foreground estimate and used to modify the features once more. These features are adaptively combined into the final MSD map, which yields the final bounding box.
Fig. 1. Outline of the proposed algorithm.
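The three-stage flow outlined in Fig. 1 can be sketched in Python as follows. This is only a minimal illustration of the control flow, not the published implementation: the function names, the thresholding rule used to draw the box, the background-suppression rule, and the weighting rule are all hypothetical stand-ins, and the five feature maps are assumed to be precomputed arrays of equal shape.

```python
import numpy as np

def combine(features, weights):
    """Weighted average of same-shaped feature maps, rescaled to [0, 1]."""
    stack = np.stack(features).astype(float)
    weights = np.asarray(weights, dtype=float)
    combined = np.tensordot(weights / weights.sum(), stack, axes=1)
    span = combined.max() - combined.min()
    if span == 0:
        return np.zeros_like(combined)
    return (combined - combined.min()) / span

def bounding_box(msd_map, frac=0.5):
    """Hypothetical rule: box around all cells >= frac * max of the map.
    Returns (top, left, bottom, right) with exclusive bottom/right."""
    mask = msd_map >= frac * msd_map.max()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1

def inside_mask(shape, box):
    """Boolean mask that is True inside the box."""
    top, left, bottom, right = box
    mask = np.zeros(shape, dtype=bool)
    mask[top:bottom, left:right] = True
    return mask

def suppress_background(feature, box):
    """Hypothetical refinement: subtract the mean background response."""
    outside = ~inside_mask(feature.shape, box)
    level = feature[outside].mean() if outside.any() else 0.0
    return np.clip(feature - level, 0.0, None)

def separation_weights(features, box):
    """Hypothetical adaptive weights: favor maps that respond more
    strongly inside the current box than outside it."""
    if box is None:  # baseline stage: no foreground guess yet
        return np.ones(len(features))
    weights = []
    for f in features:
        inside = inside_mask(f.shape, box)
        bg = f[~inside].mean() if (~inside).any() else 0.0
        weights.append(max(f[inside].mean() - bg, 0.0))
    weights = np.array(weights)
    return weights if weights.sum() > 0 else np.ones(len(features))

def detect_main_subject(features, stages=3):
    """Baseline, refined, and final stages: combine, box, modify, repeat."""
    box = None
    for _ in range(stages):
        if box is not None:
            features = [suppress_background(f, box) for f in features]
        msd_map = combine(features, separation_weights(features, box))
        box = bounding_box(msd_map)
    return msd_map, box
```

Each pass reuses the previous bounding box as its foreground/background guess, which mirrors the baseline, refined, and final stages described above.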
Fig. 2. Comparison of different algorithms' final maps. Other algorithms usually detect not only the main subject regions but also several regions in the background. Overall, our MSD maps show clear main subject regions with well-suppressed background regions.
Fig. 3. Comparison of different algorithms' bounding boxes.
Fig. 4. Representative results of our algorithm on other images from the bounding-box-based ground truth database.
Figure 2 shows our MSD maps compared with maps from other algorithms, and Figure 3 shows the bounding boxes around main subjects based on each map. As shown in Figure 3, most algorithms can yield an accurate bounding box for some images. However, as shown in Figure 2, these accurate bounding boxes do not necessarily stem from accurate maps. The maps from other algorithms generally detect not only the main-subject regions but also several regions in the background. Note, however, that the maps from visual-attention-based algorithms, such as those of Itti et al., Harel et al., and Le Meur et al., were designed to denote regions of visual attention rather than main-subject regions. Overall, our maps show clear main-subject regions and good suppression of background regions; our bounding boxes around main subjects are therefore easy to draw. Figure 4 shows additional representative results of our algorithm.
We evaluate the performance of our algorithm on the MSRA Salient Object Database, Image Set B, which contains 5000 images. These are 24 bits/pixel color images with sizes ranging from 222×165 to 400×400 pixels. Each image in this set contains only one main subject that has been consistently labeled by nine human observers. The ground-truth bounding box surrounding the main subject was obtained by averaging the observers' results.
Table 1 shows the Precision, Recall, F-measure, and boundary displacement error (BDE) of our algorithm and others on this database. It is important to note that Recall is not necessarily an appropriate measure for MSD, since a 100% Recall can be easily obtained by selecting the whole image. The main challenge in MSD is to simultaneously obtain high Precision and F-measure, and low BDE. As can be seen from Table 1, on these three criteria, our algorithm gives very competitive results compared to the best results (to our knowledge), those of Liu et al., given that we use only low-level features. Other algorithms generally yield higher BDE and lower Precision and F-measure. However, note that the algorithms of Achanta et al. and Hou et al. were not designed to yield a bounding box around the main subject. Also, given that the algorithms of Itti et al., Harel et al., and Le Meur et al. were designed to predict visual attention, their performance on this database is noteworthy.
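The remark about Recall can be checked with a small Python sketch. It assumes area-based definitions on axis-aligned boxes given as (top, left, bottom, right) with exclusive bottom/right, and a balanced (harmonic-mean) F-measure. This page does not state the exact F-measure weighting or the BDE computation used for Table 1, so both choices here are assumptions, and the function names and example boxes are hypothetical.

```python
def box_area(box):
    """Area of a (top, left, bottom, right) box; zero if it is empty."""
    top, left, bottom, right = box
    return max(bottom - top, 0) * max(right - left, 0)

def intersection(a, b):
    """Overlap of two boxes (may be an empty box with zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))

def precision_recall_f(detected, truth):
    """Area-based Precision, Recall, and balanced F-measure (assumed)."""
    overlap = box_area(intersection(detected, truth))
    detected_area = box_area(detected)
    truth_area = box_area(truth)
    precision = overlap / detected_area if detected_area else 0.0
    recall = overlap / truth_area if truth_area else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical 200x300 image with a 100x100 ground-truth box.
truth = (40, 60, 140, 160)
whole_image = (0, 0, 200, 300)
p, r, f = precision_recall_f(whole_image, truth)
print(f"precision={p:.3f} recall={r:.3f} f={f:.3f}")
# precision=0.167 recall=1.000 f=0.286
```

Selecting the whole image reaches 100% Recall, but Precision and F-measure drop sharply, which is why Precision, F-measure, and BDE are the more informative criteria.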
|We are currently packaging our code, in both Matlab and C, and it will be available soon. In the meantime, if you need to run our code on your images, please send us an email (cuong dot vu at okstate.edu) with a link to your images. We will run our code and send you the results.|