Bayesian Tracking for Real-Time Computer Vision Applications

*NEWS*: Since June 2016, the vision-ary project has been part of ARGO Vision, an innovative firm that excels in visual recognition. For inquiries about cascades and more, please contact ARGO Vision.


On this page we highlight how a probabilistic interpretation of the output of a cascade of boosted classifiers can be exploited for Bayesian tracking in video streams. In particular, real-time face and object detection can be achieved by relying on such a Bayesian framework. Results show that this integrated approach is appealing with respect to both robustness and computational efficiency.

Tracking objects in video, that is, estimating the correct state (e.g., position, size) of an object given all the observations (vector-valued measurements) up to the current frame of the video stream, is a fundamental step for many applications [1]. The Bayesian approach provides a unifying framework in which tracking is addressed by computing the posterior probability density function (pdf) of the state over a generic hypothesis space. The theoretically optimal solution is provided by recursive Bayesian filtering: in a prediction step, the state dynamics and the posterior at the previous time step are used to derive the prior pdf of the current state; then, in the update step, the likelihood function of the current measurement is used to compute the posterior.
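For reference, the two-step recursion just described is usually written as follows. The notation is an assumption, since the symbols are not given on this page: $x_t$ denotes the state at frame $t$, $z_t$ the measurement at frame $t$, and $z_{1:t}$ all measurements up to frame $t$.

```latex
% Prediction: prior of the current state from the dynamics and the previous posterior
p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}

% Update: posterior from the likelihood of the current measurement
p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})
```

Here $p(z_t \mid x_t)$ plays the role of the likelihood and $p(x_t \mid x_{t-1})$ that of the state dynamics; the proportionality constant is the normalizing term $p(z_t \mid z_{1:t-1})$.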


Eq. (1) requires specifying the likelihood and the dynamics, together with an algorithm for computing the Bayes recursion. As for the latter, the Particle Filtering algorithm (PF, [2]) is at present a widely used and effective technique for dealing with nonlinear, non-Gaussian estimation [1]. As regards likelihood and dynamics, different models have been proposed [1], [3]. However, specific applications (e.g., behavioral biometrics, content-based video analysis, surveillance) may require tracking a certain class of objects [1]. In such cases, simultaneous tracking and recognition has been shown to be an effective approach, which improves over methods that handle the two tasks separately [3]. Clearly, this approach can be pursued effectively only if the object detection/recognition step is fully embedded within (1), in the likelihood term, under the assumption that a specific class of objects is to be observed. In this respect, the work of Verma et al. [4] on face tracking is a paradigmatic example of integrated detection and filtering. They adopt the Schneiderman and Kanade face detector [5], since it provides a suitable probabilistic output that can be conveniently plugged into (1). A limitation of this detector is that it is too slow for real-time use (26 s for a frame of size 352×288).
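The Particle Filtering algorithm cited above approximates the Bayes recursion with a set of samples (particles). As a minimal sketch, assuming a one-dimensional state, random-walk dynamics, and a Gaussian likelihood (all illustrative choices, not the models used by the authors), one bootstrap-filter cycle of predict, update, and resample might look like this; every function and parameter name is hypothetical:

```python
import math
import random


def gaussian_likelihood(z, x, sigma):
    """Unnormalized p(z | x) for a scalar measurement with Gaussian noise."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def particle_filter_step(particles, z, rng, dyn_sigma=1.0, obs_sigma=2.0):
    """One predict/update/resample cycle of a bootstrap particle filter."""
    # Prediction: propagate each particle through the assumed random-walk dynamics.
    predicted = [x + rng.gauss(0.0, dyn_sigma) for x in particles]
    # Update: weight each hypothesis by the likelihood of the current measurement.
    weights = [gaussian_likelihood(z, x, obs_sigma) for x in predicted]
    if sum(weights) == 0.0:
        # Degenerate case: every weight underflowed, so fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resampling: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(measurements, n_particles=500, seed=0):
    """Return the posterior-mean state estimate after each measurement."""
    rng = random.Random(seed)
    # Initial hypotheses spread uniformly over an assumed range of positions.
    particles = [rng.uniform(0.0, 100.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates


# Toy data: an object moving right by 1 unit per frame, observed with noise.
_noise = random.Random(1)
true_positions = [20.0 + t for t in range(30)]
measurements = [x + _noise.gauss(0.0, 2.0) for x in true_positions]
estimates = track(measurements)
```

In a detector-driven tracker, the Gaussian likelihood would be replaced by a term derived from the detector's probabilistic output evaluated at each particle's position and size.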

An appealing solution is to exploit the real-time detection scheme proposed by Viola and Jones (VJ, [6]), a cascade of AdaBoost classifiers and arguably the most commonly employed detection method (many free LBP, HAAR, and HOG cascades optimized for ARM are available here). Unfortunately, unlike the detector adopted in [4], cascade classifiers have no obvious probabilistic interpretation. The combination of AdaBoost and PF has been proposed in [7], but there the detection results were combined heuristically in the proposal function, rather than being used to model a likelihood function in a principled way. In [8], PF is integrated with a monolithic AdaBoost classifier via the probabilistic interpretation given by Friedman et al. in [9]; however, the tight coupling is spoiled by the introduction of a mean shift iteration on color features to predict new hypotheses in adjacent regions. In contrast, we exploit the cascaded classifiers described in [10]. Their interpretation in a statistical framework suitable for modeling the observation function is formally discussed in the following paper.
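To illustrate the kind of probabilistic reading that [9] gives to a boosted classifier: Friedman et al. show that the additive AdaBoost score F(x) estimates half the log-odds of the positive class, so that p(y = +1 | x) = 1 / (1 + exp(-2 F(x))). The sketch below applies this mapping to a single monolithic classifier. The function names and the weight convention (alpha_m = 0.5 * log((1 - err_m) / err_m)) are assumptions made for illustration, and this is not the cascade interpretation developed by the authors.

```python
import math


def boosted_score(weak_outputs, alphas):
    """Additive score F(x) = sum_m alpha_m * h_m(x), with each h_m(x) in {-1, +1}."""
    return sum(a * h for a, h in zip(alphas, weak_outputs))


def score_to_probability(score):
    """Map F(x) to p(y = +1 | x) = 1 / (1 + exp(-2 F(x))), in a numerically stable way."""
    if score >= 0.0:
        return 1.0 / (1.0 + math.exp(-2.0 * score))
    # For negative scores use the equivalent form exp(2F) / (1 + exp(2F)) to avoid overflow.
    e = math.exp(2.0 * score)
    return e / (1.0 + e)


# Hypothetical example: three weak classifiers vote +1, -1, +1 on an image window.
score = boosted_score([1, -1, 1], [0.5, 0.2, 0.3])
probability = score_to_probability(score)
```

A probability of this kind could serve as the likelihood weight of a particle whose state selects the image window being classified.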


0. The evolution of boosted cascades: SCARTMAN detection.
1. Boccignone, G., Marcelli, A., Napoletano, P., Di Fiore, G., Iacovoni, G., Morsa, S.: Bayesian integration of face and low-level cues for foveated video coding. IEEE Trans. Circ. Sys. Video Tech. 18 (2008) 1727–1740
2. Verma, R., Schmid, C., Mikolajczyk, K.: Face Detection and Tracking in a Video by Propagating Detection Probabilities. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10) (2003) 1215–1228
3. Okuma, K., Taleghani, A., de Freitas, N., Little, J., Lowe, D.: A Boosted Particle Filter: Multitarget Detection and Tracking. Lecture Notes in Computer Science (2004) 28–39
4. Isard, M., Blake, A.: CONDENSATION – Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision 29(1) (1998) 5–28
5. Schneiderman, H., Kanade, T.: A statistical method for 3D object detection applied to faces and cars. Proc. IEEE Conf. Computer Vision and Pattern Recognition (2000)
6. Viola, P., Jones, M.: Robust Real-Time Face Detection. International Journal of Computer Vision 57(2) (2004) 137–154
7. Pantic, M., Pentland, A., Nijholt, A., Huang, T.: Human Computing and Machine Understanding of Human Behavior: A Survey. Lecture Notes in Computer Science 4451 (2007) 47
8. Li, P., Wang, H.: Probabilistic Object Tracking Based on Machine Learning and Importance Sampling. Lecture Notes in Computer Science 3522 (2005) 161–167
9. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Technical report, Stanford University (1998)
10. Lienhart, R., Kuranov, E., Pisarevsky, V.: Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In DAGM 25th Pattern Recognition Symposium (2003)
11. Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.S.: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE 90(2) (2002)
12. OpenCV library:


  1. Hello,
    My name is Marcos and I’m a student at a Brazilian university called INATEL ( I’m working on a project for our
    technology fair ( I’m using OpenCV 2.4.11 on a Raspberry Pi to detect faces with the lbpcascade_frontalface.xml cascade provided in OpenCV’s source. However, I’m getting a lot of false positives and sometimes my program runs too slowly.
    I read in your posts that you trained a cascade that has better performance on ARM devices ( and that can detect faces in different poses ( Could you please provide me the face detection cascade? It would help me a lot 🙂 Thanks!
