If your goal is to create a facial recognition system for use in an app of some sort, whether the app is intended for social networking or security, you’ll need to decide on one of the various ways to detect faces in images and video. Over the years many different facial detection algorithms have been designed, and both regular machine learning techniques, as well as deep learning techniques, can be used to carry out face detection.
If you are trying to detect small faces in images, there are a few additional considerations, but the process is very similar to regular face detection. Let's take a look at detecting faces in images, whether those faces are large or small.
When designing a system that does face detection using machine learning, it is important to understand how humans detect faces and how computers can replicate this process.
People can recognize faces from birth, and from approximately four months of age onwards, they can distinguish one person from another. This is done by detecting features of the face, relevant patterns of the face, and joining the patterns together into a complete object. People pay attention to the eyes, mouth, nose, cheekbones, eyebrows, ears, etc., detecting these features individually at first and joining the features together in a system known as bottom-up processing.
Computer software capable of recognizing human faces imitates this general procedure. The software can use a variety of different traits to recognize a human face, and the system can also create unique combinations of traits to recognize unique individuals. As an example, an algorithm will begin by detecting low-level features like patterns in different regions of the face and then join these features together to make a representation of a person, using features like their facial proportions, eye color, and skin color.
Applications that carry out face detection using machine learning can utilize a variety of algorithms to detect faces, such as checking similarities and differences between facial proportions, analyzing the contours of a face and comparing them to other faces, and detecting symmetries with a neural network. One of the most commonly used algorithms is the Viola-Jones method, popular because it can detect faces at different degrees of rotation as well as run in real time.
The Viola-Jones framework uses Haar-like features, white and black rectangular masks, to recognize faces. The masks are generated in different regions of the image; the algorithm sums the brightness of the pixels under each mask and then calculates the difference between the resulting values. Analyzing the accumulated results lets the algorithm recognize faces in the image, after which the face can be tracked as the angle and image quality change. Correlation algorithms, or motion vector prediction algorithms, are typically used to track the face once it has been detected.
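The core trick that makes Viola-Jones fast is the integral image, which lets the sum of pixels under any rectangular mask be read off in constant time. A minimal sketch of one two-rectangle Haar-like feature (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns, so any rectangle's pixel
    sum can be recovered with at most four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in the h x w rectangle with top-left corner
    (top, left), computed from the integral image ii."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(img, top, left, h, w):
    """A horizontal two-rectangle Haar-like feature: brightness of the
    left half of the window minus brightness of the right half."""
    ii = integral_image(img.astype(np.int64))
    half = w // 2
    return (rect_sum(ii, top, left, h, half)
            - rect_sum(ii, top, left + half, h, half))
```

A full detector evaluates thousands of such features at many positions and scales, feeding them into a cascade of weak classifiers; this sketch shows only the single-feature computation.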
Common steps for preprocessing images containing faces include transforming the image from RGB to grayscale, because faces are easier to detect in grayscale images. After the image has been converted to grayscale, transformations like cropping, resizing, sharpening, or blurring can be applied as needed. Image segmentation is then carried out, partitioning the image into regions so that the classifier can focus only on the possible faces in the image.
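The grayscale step can be sketched in a few lines; this version uses the common ITU-R BT.601 luma weights, though other weightings exist:

```python
import numpy as np

def rgb_to_grayscale(img):
    """Convert an H x W x 3 RGB image to a single-channel grayscale
    image as a weighted sum of the color channels (BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return img @ weights
```

The weights sum to 1.0, so a white pixel maps to full brightness and the overall intensity range of the image is preserved.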
After the face is detected by a chosen algorithm, recognition can be done on the image of the face. The system can compare its representation of the face against reference images contained in a database, comparing values at different points with the values it has in its representation. If the values are sufficiently close together, the system recognizes the person in the image. A typical facial recognition system may track around 100 points in the face to achieve this.
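The point-by-point comparison can be sketched as a distance test between two sets of tracked coordinates. The tolerance and the two-point example below are illustrative, not values from any particular system:

```python
import numpy as np

def faces_match(points_a, points_b, tolerance=5.0):
    """Compare two sets of tracked facial points (N x 2 arrays of
    coordinates). Treat them as the same face when the mean
    point-to-point distance falls under the tolerance."""
    distances = np.linalg.norm(points_a - points_b, axis=1)
    return distances.mean() < tolerance
```

A production system would first normalize the points for scale and rotation so that the same face photographed at a different distance still matches.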
When it comes to detecting faces using machine learning in images where the faces are far away, and thus quite small, several issues crop up, chiefly the lack of resolution. Many of the traditional methods to detect faces break down when applied to small images of faces, but several tactics can be used to enhance the performance of systems that detect small faces.
Keeping the context of the image in mind can help greatly when trying to detect small faces. Images of city streets are more likely to contain faces than fields of flowers, and the network can be instructed to pay attention to, or try to detect, low-resolution faces in such contexts. It is also possible to create multiple systems, some that specialize in detecting low-resolution faces and others that specialize in detecting high-resolution faces. Region proposals can also be useful when trying to accurately detect small faces.
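Another simple tactic is to upsample a candidate region before handing it to the detector, so a small face reaches the minimum window size the detector expects. A minimal nearest-neighbor upsampling sketch:

```python
import numpy as np

def upsample_nearest(region, factor=2):
    """Nearest-neighbor upsampling: each pixel becomes a
    factor x factor block, enlarging a small candidate face region
    without introducing new pixel values."""
    return np.kron(region, np.ones((factor, factor), dtype=region.dtype))
```

Nearest-neighbor scaling adds no new detail, so detectors aimed at small faces usually pair it with, or replace it by, learned super-resolution.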
When tracking a small object like a face, there are a variety of methods that can be employed to accomplish the task. Face detection methods include feature-based methods, knowledge-based methods, template matching methods, and appearance-based methods.
Feature-based methods are methods which extract specific features from the image of the face and compare these features to predefined features to recognize the face. In this approach, an image classifier is trained on both images of faces and images of non-faces. This method gets more robust the more the classifier is trained and the larger the training set is.
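As a deliberately simple stand-in for such a classifier, the sketch below averages the feature vectors of face and non-face training examples and labels a new vector by whichever centroid it is closer to. Real feature-based systems use far richer features and stronger classifiers; the class name is illustrative:

```python
import numpy as np

class CentroidFaceClassifier:
    """Toy face/non-face classifier: store the mean feature vector of
    each class and label a query by the nearer centroid."""

    def fit(self, face_features, nonface_features):
        self.face_centroid = np.mean(face_features, axis=0)
        self.nonface_centroid = np.mean(nonface_features, axis=0)
        return self

    def is_face(self, features):
        d_face = np.linalg.norm(features - self.face_centroid)
        d_nonface = np.linalg.norm(features - self.nonface_centroid)
        return d_face < d_nonface
```

Even this toy version shows why more training data helps: each additional example nudges the centroids toward a more representative summary of its class.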
Knowledge-based face detection methods utilize certain rulesets, derived from human knowledge, to detect faces. As an example, if we state that a face needs to have certain distinguished regions, like eyes, a nose, a mouth, etc., that must all be in certain positions relating to each other, only images that fit this general layout will be classified as a face. The drawback of this approach is that it can be rather sensitive to false positives and that it struggles to find multiple faces within an image.
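A ruleset like this can be expressed directly in code. The sketch below checks a few geometric constraints on detected feature positions; the exact rules and tolerances are illustrative:

```python
def layout_is_face(eyes, nose, mouth):
    """Knowledge-based check on detected feature positions, given as
    (x, y) coordinates with y increasing downward: two eyes roughly
    level, nose below and between the eyes, mouth below the nose."""
    (lx, ly), (rx, ry) = eyes
    eyes_level = abs(ly - ry) < 0.1 * abs(rx - lx)
    nose_below_eyes = nose[1] > max(ly, ry)
    mouth_below_nose = mouth[1] > nose[1]
    nose_between_eyes = min(lx, rx) < nose[0] < max(lx, rx)
    return (eyes_level and nose_below_eyes
            and mouth_below_nose and nose_between_eyes)
```

The brittleness of the approach is visible here: a tilted face violates the eyes-level rule, and overlapping faces confuse which eyes belong with which mouth.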
Face detection methods that use template matching operate by using parameterized, or pre-defined, face templates. These templates are given to the algorithm, which compares the template to the image to look for correlations. Template matching is simpler to implement, but typically lackluster in performance due to the rigidity of templates. However, templates which allow for more flexibility have been proposed.
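The core of template matching is a sliding-window comparison. This sketch scores each window by its sum of squared differences against the template and returns the best position; real implementations typically use normalized cross-correlation instead:

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return the (row, col) of
    the window with the smallest sum of squared differences, i.e. the
    closest match to the template."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            ssd = np.sum((window - template) ** 2)
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

The rigidity mentioned above is apparent: the template matches only at its own scale and orientation, which is why deformable-template variants were proposed.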
Appearance-based face detection methods work by combining machine learning tactics with statistical analysis to determine relevant features/characteristics of faces. Appearance-based methods do face detection using machine learning by training on images of faces, letting the network learn the features of faces. Different algorithms used to recognize faces, which belong to appearance-based models, include neural networks, support vector machines, naive Bayes classifiers, hidden Markov models, and inductive learning. These methods typically provide the best performance for face detection and enable the use of feature extraction for face recognition/classification.
One of the most common applications for face detection using machine learning is designing a face recognition system. In general, the process of detecting faces for a facial recognition system in images or video can be broken down into four steps: detection, alignment, feature extraction, and feature matching.
In the detection step, an algorithm such as Viola-Jones is used to detect the faces. Once a face has been detected, its location, size, and pose are extracted and facial alignment is done. Face alignment endeavors to determine the geometric structure of the face, forming a canonical, general-purpose representation of it by examining it under different conditions, such as different rotations and scales.
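One simple form of geometric alignment rotates the landmark points so that the line between the eyes becomes horizontal. A minimal sketch, assuming landmark coordinates are already available:

```python
import numpy as np

def align_by_eyes(points, left_eye, right_eye):
    """Rotate a set of facial landmark points about the midpoint of
    the eyes so the eye line becomes horizontal, a basic step toward
    a canonical face representation."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.arctan2(dy, dx)          # tilt of the eye line
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])
    center = (np.asarray(left_eye) + np.asarray(right_eye)) / 2.0
    return (np.asarray(points) - center) @ rotation.T + center
```

Full alignment pipelines also normalize scale and translation, so every face lands in the same canonical frame before feature extraction.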
After face alignment has been completed, feature extraction is done on the image of the face. Feature extraction is when the unique characteristics of the face, such as the shape and position of the eyes, the size of the jaw, and the shape of the nose, are highlighted and interpreted by the network. This forms a representation of the individual's face that can be compared with other images to determine whether both contain the same person.
Feature matching is where the features used to represent a person are compared with the features extracted from a new input image. Typically, the representation is made from images of a person drawn from a database of enrolled users. If the database representation of the individual is sufficiently similar to the new input image, the face detected in the image is classified as belonging to the person in the database.
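The matching step can be sketched as a similarity search over the enrolled database. This version uses cosine similarity with an illustrative threshold; the function and database names are hypothetical:

```python
import numpy as np

def identify(query, database, threshold=0.9):
    """Compare a query feature vector against each enrolled user's
    feature vector using cosine similarity. Return the best-matching
    name if its similarity clears the threshold, otherwise None."""
    best_name, best_sim = None, -1.0
    for name, features in database.items():
        sim = np.dot(query, features) / (
            np.linalg.norm(query) * np.linalg.norm(features))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```

The threshold controls the trade-off between false accepts and false rejects, and in practice it is tuned on held-out pairs of matching and non-matching faces.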