Face Tracking Software: Behind the technology

Published on 2023-04-04 by Rody

Facial Recognition is an analytics program that identifies and authenticates a specific person by their facial features from an image or video. The software uses biometrics to map the geometry of the face.
milestonesys.com

Facial detection and tracking technology is a unique product for mobile and web app developers. Its high performance is achieved due to:

Integrated 3D math model
Datasets tuned to work perfectly with Math algorithms
Optimization for iOS (Apple CPUs) and Android

Integrated 3D math model

Other solutions create filters by identifying 2D points on the face first, then applying nonlinear equations to create a 3D model of the head.

Developers establish a 3D model of the head directly, skipping the identification of 2D points. This makes the solution more accurate. The 3D math model reduces all possible transformations to a limited number of variables.
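
To illustrate what "a limited number of variables" can mean, here is a minimal sketch that assumes the head is treated as a rigid 3D model whose pose is described by six variables (three rotation angles and a translation). The function names, the pinhole camera and the default focal length are assumptions made for this example, not details of the product.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(yaw, pitch, roll):
    """Rotation from yaw (about Y), pitch (about X) and roll (about Z), in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))

def project_head(points, pose, focal=500.0, center=(320.0, 240.0)):
    """Project 3D head-model points into the image using a 6-variable pose.

    pose = (yaw, pitch, roll, tx, ty, tz); focal and center describe an
    assumed pinhole camera (values are illustrative).
    """
    yaw, pitch, roll, tx, ty, tz = pose
    r = rotation_matrix(yaw, pitch, roll)
    projected = []
    for x, y, z in points:
        xc = r[0][0] * x + r[0][1] * y + r[0][2] * z + tx
        yc = r[1][0] * x + r[1][1] * y + r[1][2] * z + ty
        zc = r[2][0] * x + r[2][1] * y + r[2][2] * z + tz
        projected.append((center[0] + focal * xc / zc,
                          center[1] + focal * yc / zc))
    return projected
```

Fitting such a model means searching over six numbers per frame instead of estimating every 2D point independently.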

Datasets tuned to work perfectly with Math algorithms

For data sets, the developers use complementary systems: semi-supervised metric learning and generative adversarial networks (GANs) work together. They use hand-crafted, hand-tuned code alongside compilers, harnessing human ability where necessary.

Math models cut the execution time on a smartphone, reduce the training time for the algorithm, and allow for a larger data set, which improves the quality of the results. The developers use a rather unconventional form of deep learning, mixing CNNs with different variations of Random Forests.

Optimization for iOS (Apple CPUs) and Android

Engineers have developed unconventional types of neural network layers, tuned to specific architectures, namely Apple A9-A13 CPUs and Android devices.

The AR SDK runs even on constrained devices, providing fast performance on 90% of smartphones.

Discover advanced technologies

3D face detection and tracking
3D face analysis
Background subtraction

3D face detection and tracking

Face detection and head pose tracking

This technology detects the face and head-pose movements. Once the face has been detected, the algorithm switches to head-pose tracking mode by using the position of the head on the previous image as the initial approximant. If the face is lost, the algorithm switches back to face detection mode.
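
The switching between the two modes described above can be sketched as a small loop. This is a sketch under stated assumptions: `detect` and `track` are hypothetical callables supplied by the caller, and each returns a head pose, or `None` when the face is not found.

```python
def run_tracker(frames, detect, track):
    """Alternate between face detection and head-pose tracking.

    detect(frame) -> pose or None
    track(frame, previous_pose) -> pose or None (None means the face was lost)
    Both callables are hypothetical placeholders for real models.
    """
    pose = None
    results = []
    for frame in frames:
        if pose is None:
            # No face yet (or it was lost): run full detection.
            mode = "detect"
            pose = detect(frame)
        else:
            # Face known: start from the pose found in the previous frame.
            mode = "track"
            pose = track(frame, pose)
        results.append((mode, pose))
    return results
```

Tracking starts from the previous pose, so it only has to search a small neighborhood, which is why it is usually cheaper than running detection on every frame.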

Because it is based on a directly inferred 3D model of the head (rather than one transformed from a 2D model), the technology also works with a low signal-to-noise ratio (SNR) and in poor lighting conditions. The incorporated model can forecast the appearance of the head in the subsequent frame, increasing stability and precision.
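
How the forecast is computed is not described here. As a simple stand-in, the sketch below assumes a constant-velocity model: the pose keeps changing by the same amount it changed between the last two frames. The function name is hypothetical.

```python
def predict_next_pose(previous_pose, current_pose):
    """Constant-velocity forecast (an assumption): next = current + (current - previous)."""
    return tuple(2 * c - p for p, c in zip(previous_pose, current_pose))
```

The predicted pose can then serve as the starting point for tracking in the next frame.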

Features:

High performance (30-60 fps)
High quality
Extreme angles, ranging from –90° to +90°
Efficient operation in poor lighting conditions
Operation even with obstructions of up to 50% of the face
Stable detection and resistance to partial obstructions of the face, including glasses and complex haircuts
Depending upon needs, a 3D model of a face with 64 to 3,308 vertices is created
Supports 360 degrees rotation of the smartphone camera
Estimation of distance from the smartphone
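
Distance estimation, the last feature in the list above, can be illustrated with the pinhole camera model. The sketch assumes the focal length in pixels is known and uses an average adult interpupillary distance of about 63 mm as the real-world reference. Both the reference and the function name are assumptions for this example.

```python
def estimate_distance_mm(pixel_ipd, focal_px, real_ipd_mm=63.0):
    """Estimate camera-to-face distance from the eye spacing measured in the image.

    pixel_ipd   -- distance between pupil centers in pixels
    focal_px    -- camera focal length in pixels (assumed known)
    real_ipd_mm -- assumed real interpupillary distance (about 63 mm on average)
    """
    if pixel_ipd <= 0:
        raise ValueError("pixel_ipd must be positive")
    return focal_px * real_ipd_mm / pixel_ipd
```

For example, with a focal length of 1000 px and pupils 126 px apart, the estimate is 500 mm.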

Eye tracking and gaze detection

Thanks to this technology, it is possible to both “track” a person’s gaze, and control a smartphone’s function with eye movements. An algorithm detects micro-movements of the eye with subpixel accuracy in real-time. Based on that data, a vector of movement is created.
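
One common way to reach subpixel accuracy is to take a weighted centroid instead of the single darkest pixel. The sketch below assumes a small grayscale patch around the eye (values 0 to 255, nested lists) and builds the movement vector from two successive pupil centers. It illustrates the idea and is not the algorithm used in the product.

```python
def pupil_center(patch):
    """Subpixel pupil center as the darkness-weighted centroid of a grayscale patch."""
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            weight = 255 - value  # darker pixels weigh more
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    if total == 0:
        return None  # completely white patch: no pupil found
    return (sum_x / total, sum_y / total)

def movement_vector(previous_center, current_center):
    """Vector of pupil movement between two frames."""
    return (current_center[0] - previous_center[0],
            current_center[1] - previous_center[1])
```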

The face recognition algorithm helps to measure the distance to various points on a scanned surface with a high degree of precision and to detect its shape. It can detect, for instance, whether the user’s eyes are open or closed.

Features:

High degree of precision
Eye pupil detection and tracking
Eye states: open & closed
Eye blinking
Attention tracking
Facial motion capture
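
The eye states and blinking features listed above can be combined into a simple blink counter. The sketch assumes one open/closed flag per video frame and treats a short closed run followed by an open eye as a blink. The 10-frame limit and the function names are illustrative assumptions.

```python
def count_blinks(eye_open, max_closed_frames=10):
    """Count blinks in a per-frame sequence of booleans (True = eye open)."""
    blinks = 0
    closed_run = 0
    for is_open in eye_open:
        if not is_open:
            closed_run += 1
        else:
            # Eye reopened: a short closure counts as a blink, a long one does not.
            if 0 < closed_run <= max_closed_frames:
                blinks += 1
            closed_run = 0
    return blinks

def blinks_per_minute(eye_open, fps):
    """Blink frequency for a clip recorded at the given frame rate."""
    minutes = len(eye_open) / fps / 60.0
    return count_blinks(eye_open) / minutes if minutes > 0 else 0.0
```

Closures longer than the limit are ignored so that deliberately closed eyes are not counted as blinks.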

Face Motion Capture

Face Motion Capture involves scanning facial movements and converting them to computer animation for movies, games, or real-time avatars. It can operate either in real time or on the user’s previously saved data and/or face-motion models. Because it is derived from the movements of real people, the technology produces more realistic and nuanced character animation than manual animation does.

Features:

Fast execution – operates in real time from a single frame or several frames. Each subsequent frame may improve the model.
Can be integrated into the Face Recognition pipeline in order to develop precise models of users combining both visual similarity and the resemblance of facial gestures, emotions and other motion-related features.

3D FMC applications:

Face recognition (biometrics)
Autofocus (photography)
Unconventional elements of the user interface, games
Measuring user interaction and engagement with in-app ads
Behavioral analytics of mobile UI/UX based on eye gaze tracking
Work style analysis: eye-blinking frequency, employee attention and concentration, estimation of the employee’s type of activity (typing, reading), and calculation of on-screen time.

3D face analysis option

Face segmentation

Face segmentation is a computer vision task that assigns a facial-feature label (nose, mouth, eye, hair, etc.) to each pixel in a face image. Our face segmentation techniques combine complex cascaded machine learning algorithms with color models and Monte Carlo approaches. This leads to precise detection of the eyes and their structure (iris, pupil, eyeball, etc.) as well as the nose, ears, cheeks, chin, mouth, lips, eyebrows and forehead.

Features:

Access over a convenient API
Separate parts of the face can be detected for further analysis
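
To show how a separate part of the face might be picked out of a segmentation result, the sketch below assumes the output is a label map: a 2D grid with one label per pixel. The label strings and the function name are hypothetical; the actual API is not documented here.

```python
def part_bounding_box(label_map, label):
    """Return (min_x, min_y, max_x, max_y) of all pixels carrying the label, or None."""
    xs, ys = [], []
    for y, row in enumerate(label_map):
        for x, value in enumerate(row):
            if value == label:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # this part is not visible in the image
    return (min(xs), min(ys), max(xs), max(ys))
```

The resulting box can be cropped and passed on for further analysis, for example to an eye-state classifier.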

Evaluation of anthropometric parameters

Facial anthropometry refers to the measurement of individual facial features. Our novel algorithm automatically detects a set of anthropometric facial fiducial points that are associated with these features. This makes it possible to recognize refined facial patterns and recreate detailed semantics, facial expressions and anthropometrics.
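
Once fiducial points are available, anthropometric parameters can be computed as distances between them. The sketch below assumes four 2D points with hypothetical names and normalizes by the distance between the eyes, so that the values do not depend on image scale.

```python
import math

def anthropometric_ratios(points):
    """Scale-independent measurements from fiducial points (hypothetical names)."""
    left, right = points["left_eye"], points["right_eye"]
    nose, chin = points["nose_tip"], points["chin"]
    interocular = math.dist(left, right)
    eye_mid = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    return {
        "nose_length": math.dist(eye_mid, nose) / interocular,
        "face_height": math.dist(eye_mid, chin) / interocular,
    }
```

Exaggerating such ratios is one conceivable route to the caricature avatars mentioned in the features.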

Features:

Access over a convenient API
Reconstruction of face geometry “cleaned” of facial expressions
Creation of caricature avatars in an instant

Skin and hair color

The developers have built a library for detecting hair and skin color on iOS. The area above the person’s forehead is used for hair color detection because it is less sensitive to head turns than other areas of hair. The technology recognises sharp intensity patterns above the hairline and analyzes their color. Bald individuals show no such sharp intensity patterns, so the technology can also detect the absence of hair.

To detect skin color, samples are taken from areas of the face that the algorithms identify specifically as facial skin.
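
A rough sketch of both ideas follows. It assumes a vertical profile of grayscale intensities sampled from the forehead upward, and reduces the "sharp intensity pattern" to the largest step between neighboring samples. The threshold of 40 and the function names are illustrative assumptions.

```python
def has_hair(intensity_profile, min_step=40):
    """True if the profile crossing the hairline contains a sharp intensity change."""
    steps = [abs(b - a) for a, b in zip(intensity_profile, intensity_profile[1:])]
    return bool(steps) and max(steps) >= min_step

def mean_color(pixels):
    """Average RGB color of the sampled pixels (hair or skin samples)."""
    if not pixels:
        raise ValueError("no pixels sampled")
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))
```

A bald head gives a smooth profile with no large step, so `has_hair` returns `False`.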

Features:

Precise color measurement
Visual skin color and skin tone correction
Skin-related disorder detection
Virtual makeup

Hair style detection

This technology is based on convolutional neural networks. The training set consists of images of men and women, categorised by hairstyle. Learning is the process of matching an image with the most relevant hairstyle sample.

Semi-supervised metric learning was used to create the data set. A GAN was then trained to expand the data set, and the final network was trained on the expanded set. To improve quality, a loss function selected specifically for the task was used.
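
Matching an image with the most relevant hairstyle sample can be pictured as a nearest-neighbor search in an embedding space, which is what metric learning typically produces. In the sketch below the embeddings are plain tuples, cosine similarity is the assumed metric, and the hairstyle names are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def closest_hairstyle(embedding, samples):
    """Return the name of the hairstyle sample most similar to the embedding."""
    return max(samples, key=lambda name: cosine_similarity(embedding, samples[name]))
```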

Features:

Detecting hair style, hair color
Changing hair color
Haircuts of any shape can be detected, regardless of hair length
The algorithm distinguishes with pixel precision between hair, face and other parts, such as beard, moustache or glasses
Matting is done with pixel precision. If the background is seen through the hair, it is detected as part of the hair.

App Ideas:

Virtual hair salon: trying out new hair colors and hair styles
Detecting hair style for 3D avatar creation

Emotion and expression recognition

People’s emotions are reflected on their faces, and with the ability to “read a face” it is possible to deliver more personalized content. Our technology detects six basic emotions: anger, disgust, fear, happiness, sadness, and surprise.

The variables of the model returned by the recognition algorithm can be used either directly or transformed into parameters for another model, such as the Facial Action Coding System (FACS), which describes the facial muscle movements that correspond to specific emotions.
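
To make the link between FACS and emotions concrete, the sketch below scores each emotion by how many of its prototype action units (AUs) are active. The AU combinations are commonly cited prototypes and vary between sources, so treat the table as illustrative. How the product converts its model variables to AUs is not described here, and the function name is hypothetical.

```python
# Commonly cited prototype AU sets; sources differ in the details (illustrative only).
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
    "fear": {1, 2, 4, 5, 7, 20, 26},
    "anger": {4, 5, 7, 23},
    "disgust": {9, 15, 17},
}

def emotion_from_action_units(active_aus):
    """Return (emotion, score), where score is the matched fraction of the prototype."""
    active = set(active_aus)
    best_name, best_score = None, 0.0
    for name, prototype in EMOTION_PROTOTYPES.items():
        score = len(active & prototype) / len(prototype)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

With AUs 6 and 12 active (cheek raiser and lip corner puller), the best match is happiness.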

Features:

Real-time detection of anger, disgust, fear, happiness, sadness, and surprise
Access to a convenient API
Conversion of data to Paul Ekman’s Facial Action Coding System

App ideas:

Mood-related content in mobile applications, such as three-dimensional visual masks that reflect users’ feelings while they are communicating in mobile video chats
Targeted ads
Detection of tiredness and degree of stress
Use of emotional reactions to a product or content in empathic apps and advertisements
Creation of human-friendly digital products which react to human facial expressions
Detection of emotional states of patients for health purposes (e.g. helping patients with schizophrenia)

Digital Face Beautification

This technology is used for creating visually beautified images of users. Anthropometric data and facial expressions are analyzed and corrected in real time.

Features:

Smoothing of skin
Correction of face skin tone
Whitening of eyes and teeth
Correction of face shape (make it slimmer, wider, increase/decrease eye size, change the shape of the nose and head proportions)
Change hair color
Improve face symmetry
Shape and color eyebrows
Correction of lip shape
Virtual makeup
Face morphing

App ideas:

Improving the user’s look during video chat
Cosmetic surgery: ‘Before’ and ‘after’ visual aids during consultation with a patient
Cosmetics: makeup application
Demoing effects of face skin products
Fixing smartphone camera distortions
Making selfies more attractive

Background subtraction

The developers have built a library for Apple iOS that separates a person’s image from the background.

This technology can be used for the real-time replacement of backgrounds with both static and animated textures. Backgrounds can be changed into a number of optional presets during video calls or used as an engaging effect in advertising.

The technology is based on convolutional neural networks that take color images as the input and output a probability mask showing whether each pixel belongs to the class “person” or to the class “background.” This makes high performance and high-quality results possible for real-time background replacement.

The shortage of data sets for separating objects from the background is addressed by creating a small initial data set, which is then expanded through active learning and subsequent fine-tuning.
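
Given such a probability mask, replacing the background is a per-pixel blend. The sketch below assumes images as nested lists of RGB tuples and a mask of "person" probabilities between 0 and 1. A production implementation would run on the GPU; this version only shows the arithmetic.

```python
def composite(foreground, background, person_prob):
    """Blend the camera frame over a new background using the person probability mask."""
    out = []
    for fg_row, bg_row, p_row in zip(foreground, background, person_prob):
        row = []
        for fg, bg, p in zip(fg_row, bg_row, p_row):
            # p = 1 keeps the camera pixel, p = 0 shows the new background.
            row.append(tuple(round(p * f + (1.0 - p) * b) for f, b in zip(fg, bg)))
        out.append(row)
    return out
```

Fractional probabilities give soft edges, which matters around hair, where the background shows through.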
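
The active learning strategy is not specified here. One widely used option is uncertainty sampling: the images the current model is least sure about are sent for labeling first. The sketch below assumes that strategy and scores each image by how close its predicted probabilities are to 0.5. All names are hypothetical.

```python
def uncertainty(probabilities):
    """Mean per-pixel uncertainty: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    if not probabilities:
        return 0.0
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probabilities) / len(probabilities)

def select_for_labeling(predictions, k):
    """Pick the k image names with the most uncertain predicted masks.

    predictions maps an image name to a flat list of per-pixel probabilities.
    """
    ranked = sorted(predictions, key=lambda name: uncertainty(predictions[name]),
                    reverse=True)
    return ranked[:k]
```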

A correctly selected data set helps to obtain optimal results and high-quality implementation.

Features:

Augmentation of raw output with classic computer vision and signal filtering algorithms – lightweight post-processing (deletion of small contours, precise definition of borders, additional image recognition).
Post-processing, including matting, filters and removal of small objects
Portrait mode and bokeh effect
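
The deletion of small contours mentioned in the features can be illustrated with a connected-component filter on a binary mask. The sketch assumes a mask of 0/1 values as nested lists and 4-connectivity. The function name and the size threshold are illustrative.

```python
from collections import deque

def remove_small_regions(mask, min_size):
    """Return a copy of the mask with connected regions smaller than min_size cleared."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and out[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out
```

Small isolated blobs are usually misclassified background, so clearing them gives a cleaner person mask.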

App Ideas:

Replacement of an unsuitable background and noise removal
Animation effects in the background that are changed by the user as part of interactivity
Animated emotion-related background
Advertisements
Adding colors
Protection of privacy
360-degree 2D and 3D backgrounds for educational purposes, e.g. for mixed reality
‘Hollywood effects’ on a mobile phone
Replacing backgrounds for practical purposes (e.g. during a business call) or to entertain (e.g. jungle instead of a wall)
Editing “boring” backgrounds to create perfect videos
Removing unwanted objects or people from videos