Copyright © 2014 by Intelligent Systems Laboratory, Computer Science Department, Technion - Israel Institute of Technology, Haifa 3200003, Israel. All rights reserved

Esaliency & VSLE - Visual Attention Processes Based On Stochastic Models

By: Tamar Avraham and Michael Lindenbaum

Image analysis processes often scan the image exhaustively, looking for familiar objects at different locations and sizes. An attention mechanism that assigns priorities to different image parts, and thereby directs the analysis process to examine the more interesting locations first, can make the whole process much more efficient.





Motivated by studies of the human visual system, this study focuses mainly on inner-scene similarity as a source of information and tests its usefulness for directing computer visual attention from different perspectives. Taking a stochastic approach, we model the identities of the candidate image parts as a set of correlated random variables and derive two attention/search algorithms from this model:

  • The first algorithm, denoted VSLE (Visual Search using Linear Estimation), is a dynamic search procedure. Each subsequent fixation is selected by combining inner-scene similarity information with the recognizer's feedback on previously attended sub-images. We show that VSLE can accelerate even fast detection processes, such as the one suggested by Viola and Jones [3] for face detection. See [1] for details.
  • The second algorithm, denoted Esaliency (Extended Saliency), needs no recognition feedback and does not change its priorities during the search. It is therefore called a static algorithm, and it competes with previous attention mechanisms that also compute a saliency map in advance and use it to guide the order of fixations. This algorithm combines inner-scene similarity information with the common expectation that a scene contains a relatively small number of objects of interest. Unlike other accepted models of visual attention (e.g., Itti and Koch's [4]), which associate saliency with local uniqueness, Esaliency takes a global approach: it considers a region salient if there are only a few (or no) other similar regions in the whole scene. The algorithm uses a graphical-model approximation that allows the most likely hypotheses for target locations to be found efficiently. Its performance on natural scenes has been extensively tested, and its advantages for directing the recognition algorithm's resources are demonstrated. See [2] for details.
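The dynamic idea behind VSLE, estimating the labels of unattended candidates linearly from the labels the recognizer has already reported, can be illustrated with a minimal sketch. It assumes that each candidate's label is a Bernoulli variable with a common prior, and that the covariance between two labels decays exponentially with the distance between the candidates' feature vectors. The function name vsle_order, the prior and scale parameters, and the recognizer callback are hypothetical and are not the EsaliencyVSLE application's interface; the actual covariance model and stopping rule are described in [1].

```python
import numpy as np


def vsle_order(features, recognizer, prior=0.1, scale=1.0, max_fixations=None):
    """Hypothetical sketch of a linear-estimation dynamic search.

    features:   (n, d) array, one feature vector per candidate sub-image.
    recognizer: callable(i) -> bool, the recognizer's feedback on candidate i.
    Returns the fixation order and the candidates recognized as targets.
    """
    n = len(features)
    # Pairwise Euclidean distances between candidate feature vectors.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Assumed covariance model: Bernoulli(prior) variance, decaying
    # exponentially with feature distance (similar candidates are correlated).
    cov = prior * (1 - prior) * np.exp(-dist / scale)

    attended, labels = [], []
    estimate = np.full(n, prior)
    limit = n if max_fixations is None else min(n, max_fixations)
    for _ in range(limit):
        # Fixate on the unattended candidate with the highest estimated label.
        masked = estimate.copy()
        if attended:
            masked[attended] = -np.inf
        k = int(np.argmax(masked))
        attended.append(k)
        labels.append(1.0 if recognizer(k) else 0.0)

        # Linear (LMMSE) estimate of every label given the observed labels:
        # prior + C_{all,S} C_{S,S}^{-1} (l_S - prior); tiny jitter for stability.
        c_ss = cov[np.ix_(attended, attended)] + 1e-9 * np.eye(len(attended))
        c_as = cov[:, attended]
        residual = np.array(labels) - prior
        estimate = prior + c_as @ np.linalg.solve(c_ss, residual)

    targets = [i for i, lab in zip(attended, labels) if lab == 1.0]
    return attended, targets
```

Once the recognizer confirms one target, candidates that look similar to it receive high estimates and are fixated next, while candidates similar to a rejected region are pushed down the queue.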

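The global notion of saliency used by Esaliency, where a region is salient if few other regions in the whole scene resemble it, can be contrasted with local uniqueness using a much simpler scoring rule. The sketch below is not the Esaliency algorithm, which relies on a stochastic model and a graphical-model approximation (see [2]); it only illustrates a global uniqueness score that counts similar candidates anywhere in the scene. The names global_uniqueness and static_fixation_order and the radius threshold are hypothetical.

```python
import numpy as np


def global_uniqueness(features, radius):
    """Score each candidate by how few similar candidates exist in the whole scene."""
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Number of *other* candidates closer than `radius` in feature space
    # (subtract 1 to exclude the candidate itself).
    similar = (dist < radius).sum(axis=1) - 1
    return 1.0 / (1.0 + similar)


def static_fixation_order(features, radius):
    """Static priorities: most globally unique candidates first."""
    scores = global_uniqueness(features, radius)
    # Stable sort so candidates with equal scores keep their original order.
    return [int(i) for i in np.argsort(-scores, kind="stable")]
```

Because the order is computed once, before any recognition takes place, it behaves like a precomputed saliency map rather than a dynamic search.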


Papers for Download:

[1] Tamar Avraham and Michael Lindenbaum. Dynamic Visual Search Using Inner-Scene Similarity: Algorithms and Bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 28(2):251–264, 2006.

[2] Tamar Avraham and Michael Lindenbaum. Esaliency (Extended Saliency): Meaningful Attention Using Stochastic Image Modeling. In preparation.



The EsaliencyVSLE application implements both the VSLE and the Esaliency algorithms and can apply either one or both. It can be applied to single images, to a set of images, to a video sequence, or to frames from a live camera. The input images/frames may be grayscale or color. Its graphical user interface lets the user set parameters, set initial priors, and view the results. The recommended default parameters are set automatically when the application starts. For a detailed description of the application and its input parameters, see the Application Documentation.

To download the application and some sample input sets, see The EsaliencyVSLE application.


Example Results for VSLE:

The VSLE algorithm was applied to the MIT+CMU face database [5], which includes 130 images with 511 marked faces. The images were segmented to yield the initial candidates. The number of fixations required to detect the targets was recorded both for a sequential scan of each image and for the (dynamic) scanning order set by VSLE.

[Figure: number of fixations required to detect the faces, sequential scan vs. VSLE scan order]


When VSLE decides the scan order, targets are located faster. In fact, if scanning stops after only 30% of each image has been examined, 72% of the faces have already been located, and if it stops after 55% of each image, 90% of the faces are detected.
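The percentages above relate the fraction of each image that is scanned to the fraction of all faces found. A minimal sketch of that computation follows, assuming the scanned fraction is measured in candidate sub-images (the page does not specify whether it is measured in candidates or in area). The function name detected_fraction and its input layout are hypothetical.

```python
import math


def detected_fraction(scan_orders, targets, stop_fraction):
    """Fraction of all targets found when every image's scan stops early.

    scan_orders:   per-image list of candidate indices, in the order visited.
    targets:       per-image set of candidate indices that contain a target.
    stop_fraction: fraction of each image's candidates examined (0..1).
    """
    found = total = 0
    for order, target_set in zip(scan_orders, targets):
        # Number of candidates examined before stopping in this image.
        budget = math.floor(stop_fraction * len(order))
        found += len(target_set.intersection(order[:budget]))
        total += len(target_set)
    return found / total if total else 0.0
```

Evaluating this for a sequential order and for the VSLE order at several stop fractions gives the two curves being compared.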

Some examples: faces detected when using VSLE and stopping after scanning x% of the image.

[Figure: examples of faces detected with VSLE when stopping after scanning x% of the image]


In the above experiment we assumed that a perfect recognizer is available. We then checked the results when the Viola and Jones face detector (as implemented in the OpenCV library) plays the role of the recognizer. We compare two scenarios. In the first, we apply the Viola and Jones algorithm to each of the 130 images and record the detection time in each image. In the second, VSLE sets the order of attention: each time VSLE chooses a region to fixate on, the Viola and Jones face detector is applied to the corresponding sub-image. The time to detection is recorded in this case as well. The following graph compares the detection times.

[Graph: detection times using the Viola and Jones detector alone vs. with VSLE setting the order of attention]


At the beginning, VSLE spends some time on segmentation and similarity measurements, but after that, for most faces, the time to detection is shorter than when using V&J's algorithm alone. Thus, if the time is limited to, say, 75% of the total time needed to process the whole image, VSLE helps.
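The time-budget argument can be made concrete by counting the faces detected before a per-image deadline in each scenario. The sketch below assumes that, for each image, the detection time of every face and the time needed to process the whole image are available; the function name detected_within_budget and the numbers in the usage lines are hypothetical and are not the measured results.

```python
def detected_within_budget(detection_times, full_times, budget_fraction):
    """Fraction of faces whose detection time fits in a per-image time budget.

    detection_times: per-image list of times (seconds) at which each face was detected.
    full_times:      per-image time (seconds) needed to process the whole image.
    budget_fraction: e.g. 0.75 for 75% of the full processing time.
    """
    hits = total = 0
    for times, full in zip(detection_times, full_times):
        budget = budget_fraction * full
        # Count faces detected no later than this image's deadline.
        hits += sum(1 for t in times if t <= budget)
        total += len(times)
    return hits / total if total else 0.0


# Hypothetical single-image example: the full scan takes 1.0 s.
full_scan_only = detected_within_budget([[0.8, 0.9]], [1.0], 0.75)
vsle_ordered = detected_within_budget([[0.3, 0.7]], [1.0], 0.75)
```

The initial segmentation and similarity overhead simply shifts every VSLE detection time later, so VSLE wins under a budget only if its ordering gains outweigh that fixed cost.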

See [1] for more details on these experiments and for further experiments testing VSLE.


Example Results for Esaliency:

Esaliency was applied to images from the Washington University database [6]. This database includes natural scenes with objects of various categories (examples in the leftmost column of the images below). Human subjects, unaware of the research goal, were asked to mark what they considered to be interesting objects (2nd column). Below are some examples of the first fixations suggested by Esaliency (3rd column) and those suggested by the iLab toolkit (4th column), which implements Itti and Koch's attention mechanism [7]. Note that Esaliency is a bottom-up attention mechanism: it sets static priorities before the search process and does not use any top-down information about the properties of the target objects.

[Figure: example images (1st column), human-marked objects (2nd column), first Esaliency fixations (3rd column), first iLab toolkit fixations (4th column)]

Below are some examples where Esaliency is applied to images from the StreetScenes database [8]. In these examples we tested the ability of Esaliency to locate pedestrians:

[Figure: Esaliency fixations on StreetScenes images, pedestrian search]

See [2] for more examples and details.



[3] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision (IJCV), 57(2):137–154, May 2004.

[4] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(11):1254–1259, 1998.