Full Frontal / Partial Faces
Face detection is not a trivial task, especially on ARM devices. Before using the following cascades, read this page carefully to get the best performance and to learn the terms of usage.
The most appealing solution for face detection is the real-time scheme proposed by Viola and Jones: essentially a cascade of AdaBoost classifiers, and arguably the most commonly employed detection method. The core of the scheme is the detection model, trained with the OpenCV suite. OpenCV provides two applications for training cascades: opencv_haartraining and opencv_traincascade. opencv_traincascade is the newer version, written in C++ in accordance with the OpenCV 2.x API. It supports Haar, LBP (Local Binary Patterns) and HOG (Histogram of Oriented Gradients) features. LBP and HOG features are integer-valued, fewer in number and more discriminant than Haar features, so both training and detection with LBP and HOG are several times faster than with Haar features. The detection quality of HOG, LBP and Haar depends on several factors: the quality of the training dataset, the training parameters and the semantics of the object to detect. It is possible to train an LBP-based or HOG-based classifier that provides quality close to, or better than, that of a Haar-based one.
A cascaded classifier is obtained by chaining a set of monolithic classifiers, or stages, so as to increase the specialization of the classifiers along the cascade. An example is classified as positive if and only if every stage judges it so, while negative examples are discarded according to an early-reject strategy. During training, each stage falsely accepts a fixed ratio f of the non-face patterns in the training sample, while wrongly eliminating only a very small portion 1-d of the face patterns; formally:
p(F_i(x) ≥ 0 | F_{i-1}(x) ≥ 0, y = 1) = d
p(F_i(x) ≥ 0 | F_{i-1}(x) ≥ 0, y = -1) = f
where F_i(x) is the weighted sum output by the i-th stage classifier, i = 1, …, k, and k is the total number of stages. By applying the chain rule of conditional probability and recursively exploiting the Markov property intrinsic to the cascade (each stage is evaluated only on examples accepted by the previous one), the global detection and false acceptance rates of the trained cascade are easily derived as
d_g = d^k
f_g = f^k
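To give a feeling for these two formulas, here is a minimal Python sketch that evaluates them. The per-stage values d = 0.995 and f = 0.5 and the stage count k = 20 are illustrative assumptions, not the settings used for the cascades on this page; the helper names are hypothetical.

```python
import math

def global_rates(d, f, k):
    """Return (d_g, f_g) = (d**k, f**k) for a cascade of k stages."""
    return d ** k, f ** k

def stages_needed(f, target_fg):
    """Smallest number of stages k such that f**k <= target_fg."""
    return math.ceil(math.log(target_fg) / math.log(f))

# Illustrative per-stage rates (assumed, not taken from this page).
d, f, k = 0.995, 0.5, 20
dg, fg = global_rates(d, f, k)
print(f"d_g = {dg:.4f}, f_g = {fg:.2e}")  # d_g = 0.9046, f_g = 9.54e-07
print("stages for f_g <= 1e-6:", stages_needed(f, 1e-6))  # 20
```

Even a per-stage detection rate as high as 0.995 costs almost 10% of the faces over 20 stages, which is why d has to stay very close to 1 while f can be as loose as 0.5.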
If the training examples are representative of the learning task, one can expect similar detection and false alarm rates when applying the cascade to test examples (taking this for granted is one of the biggest errors people make when they train a cascade). It is worth spending a few words on the assumption that the training and real-world distributions are the same. On one hand, it is clear that this hypothesis does not hold strictly: due to the nature of tracking (i.e. particle filtering), many hypotheses will be placed in the neighborhood of the face without being perfectly aligned with the pattern. Since the original training sample usually contains only aligned face regions, most of these hypotheses are unlikely to be classified as positives by the monolithic classifier. On the other hand, one can expect them to pass through a larger number of stages than true negative patterns (a number weakly inversely proportional to the degree of misplacement), resulting in a higher likelihood and hence contributing to the correct estimation of the density distribution.
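The early-reject behaviour, and the idea that a slightly misaligned hypothesis survives more stages than a true negative, can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the stage functions, their thresholds, and the use of the fraction of stages passed as a likelihood are hypothetical illustrations, not the scoring used by OpenCV or by our tracker.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stage type: maps a sample to a stage score F_i(x).
Stage = Callable[[float], float]

def run_cascade(stages: Sequence[Stage], x: float) -> Tuple[bool, int]:
    """Evaluate stages in order and stop at the first negative score.

    Returns (accepted, stages_passed).
    """
    passed = 0
    for stage in stages:
        if stage(x) < 0:
            return False, passed  # early reject: later stages are never run
        passed += 1
    return True, passed

def soft_likelihood(stages: Sequence[Stage], x: float) -> float:
    """Fraction of stages passed, usable as a crude weight for a hypothesis."""
    _, passed = run_cascade(stages, x)
    return passed / len(stages)

# Toy stages: x stands for a misalignment in pixels and each stage tolerates
# less of it than the previous one. The thresholds are made up.
toy_stages = [lambda x, t=t: t - abs(x) for t in (8.0, 4.0, 2.0, 1.0)]

for offset in (0.5, 3.0, 10.0):
    accepted, passed = run_cascade(toy_stages, offset)
    print(offset, accepted, passed, soft_likelihood(toy_stages, offset))
```

With these toy thresholds, an offset of 0.5 passes all four stages, an offset of 3.0 is rejected after two, and an offset of 10.0 is rejected immediately, so the weight decreases with the degree of misplacement.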
Moreover, the same issue applies to hypotheses generated under other circumstances that are not taken into account by the original training, such as out-of-plane rotations. One possible approach is to embed the out-of-plane rotations in the training set to extend the class of positives. In the same spirit, misalignment should also be included; however, this would introduce too much variability for the basic opencv_haartraining learning technique, which is based on weak features. To perform this harder training, we modified the OpenCV approach to positive and negative selection to improve the capabilities of the weak features.
Below are our face detection cascades, built with OpenCV 2.4.9 / 2.4.11. This frontal face detection model, trained to detect full and partial frontal human faces, is approx. 2x faster than OpenCV and much more reliable and stable for landmark localization, which makes it suitable for real-time apps on ARM devices.
Full frontal (with partial profiles) human face detection cascade, trained with:
- approx. 17,000 positive samples (randomly sampled)
- approx. 1.1 billion negative sub-regions containing outdoor and indoor samples (10%-90%)
- Training window size: w=50, h=50 (aspect ratio 1:1)
LBP: (contact us)
- Feature set: 166,464 features
- Training time: ~3 days
- TP: ~94.51% of positive training set
- FN: ~5.49% of positive training set
- FP: ~6.8e-6% of negative training set
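The size of the LBP feature set follows from the 50x50 training window. The Python sketch below reproduces the count under one assumption: that the feature pool is the block-LBP enumeration in which each feature is a 3x3 grid of equal cells, defined by its top-left corner and cell size, and has to fit entirely inside the window. The function names are hypothetical.

```python
def placements_1d(n: int) -> int:
    """Number of (offset, cell size) pairs whose 3-cell span fits in n pixels."""
    # For cell size s the span is 3*s pixels, leaving n - 3*s + 1 offsets.
    return sum(n - 3 * s + 1 for s in range(1, n // 3 + 1))

def count_lbp_features(win_w: int, win_h: int) -> int:
    """Count block-LBP features in a win_w x win_h window.

    Horizontal and vertical choices are independent, so the total is a product.
    """
    return placements_1d(win_w) * placements_1d(win_h)

print(placements_1d(50))           # 408
print(count_lbp_features(50, 50))  # 166464
```

The result, 408 x 408 = 166,464, matches the feature set size reported above.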
HOG: (contact us)
- Training time: ~3 days
- TP: ~93.87% of positive training set
- FN: ~6.13% of positive training set
- FP: ~1e-6% of negative training set
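To put the percentages above in perspective, they can be converted into approximate sample counts. This Python sketch assumes the rounded set sizes given earlier (about 17,000 positives and about 1.1 billion negative sub-regions) and reads the FP figures as percentages, as written; the resulting counts are therefore only rough estimates.

```python
POSITIVES = 17_000          # approx. number of positive samples
NEGATIVES = 1_100_000_000   # approx. number of negative sub-regions

# Rates in percent, as reported for each cascade.
REPORTED = {
    "LBP": {"tp": 94.51, "fn": 5.49, "fp": 6.8e-6},
    "HOG": {"tp": 93.87, "fn": 6.13, "fp": 1e-6},
}

def as_counts(rates):
    """Convert percentage rates into approximate sample counts."""
    return {
        "detected": POSITIVES * rates["tp"] / 100,
        "missed": POSITIVES * rates["fn"] / 100,
        "false_positives": NEGATIVES * rates["fp"] / 100,
    }

for name, rates in REPORTED.items():
    # TP and FN partition the positive set, so they should sum to 100%.
    assert abs(rates["tp"] + rates["fn"] - 100.0) < 1e-9
    counts = as_counts(rates)
    print(name, {key: round(value) for key, value in counts.items()})
```

Under these assumptions the LBP cascade misses roughly 930 training faces and accepts about 75 negative sub-regions, while the HOG cascade misses roughly 1,040 faces and accepts about 11.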
OpenCV references: documentation and official guide.