05 Feb 2019 |
Research article |
Intelligent and Autonomous Systems
Novel Face Synthesis for Face Recognition in Video Surveillance
Face recognition is a challenge in video surveillance due to uncontrolled capture conditions (pose variations, expression, illumination, blur, scale, etc.), and the limited number of reference stills. One effective solution to improve the robustness of these systems is to augment the reference set by generating synthetic faces based on the original still. This paper introduces a novel algorithm for face synthesis that generates a compact set of synthetic faces under real-world capture conditions. Keywords: Face Recognition, Video Surveillance, Face Synthesis.
Face Recognition in Video Surveillance
Face recognition (FR) in video surveillance has received significant interest due to covert capture using surveillance cameras, flexible control, a high performance-to-cost ratio, and the possibility of live-feed analysis. In recent years, public security organizations have deployed large networks of video surveillance cameras. Despite recent progress in computer vision and machine learning, designing a robust system for FR in video surveillance under uncontrolled capture conditions remains a challenge. This is due in part to the limited number of representative reference stills per target individual. In addition, reference still images may differ significantly from the faces captured in videos.
Domain-Specific Face Synthesis
This research aims to address the limited number of reference still images, as well as facial appearance variations, by generating multiple synthetic face images per reference still in order to improve the representativeness of face models. This paper introduces a face synthesis approach that harnesses discriminant information from a generic set during the synthesis process. The new algorithm, called domain-specific face synthesis (DSFS), maps representative variation information from the generic set onto the original reference stills. In this way, a compact set of synthetic faces is generated that represents reference still images and probe video frames under common capture conditions.
As shown in Fig. 1, the DSFS technique involves two main steps: (1) characterizing capture condition information from the Operation Domain (OD), and (2) generating synthetic face images based on the information obtained in the first step. Prior to operation (during the camera calibration process), a generic set is collected from video captured in the OD. A compact and representative subset of face images is selected by clustering this generic set in a capture condition space defined by pose, illumination, and blur. The 3D model of each reference still image is reconstructed using a 3D morphable model and rendered based on the pose representatives. Finally, the illumination-dependent layers of the lighting representatives are extracted and projected onto the rendered reference images with the same pose. In this manner, domain-specific variations are effectively transferred onto the reference still images.
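The first step above can be sketched in a few lines of code. The snippet below is a minimal, illustrative sketch, not the authors' implementation: it assumes each generic-set face has already been described by a capture-condition feature vector (e.g. yaw angle, illumination level, blur measure), runs a plain k-means clustering over those features, and picks the real face nearest each cluster centre as its representative. All function and variable names are assumptions for illustration.

```python
import numpy as np

def select_representatives(features, k, iters=50, seed=0):
    """Cluster generic-set faces in a capture-condition space and
    return, for each cluster, the index of the real face image
    closest to the cluster centre (the condition representative).

    features: (n, d) array, e.g. one row of [yaw, illumination, blur]
    per generic-set face (illustrative feature choice).
    """
    rng = np.random.default_rng(seed)
    # Initialize centres with k distinct faces chosen at random.
    centres = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each face to its nearest centre (Euclidean distance).
        d = np.linalg.norm(features[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned faces.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = features[labels == j].mean(axis=0)
    # Representative = actual face image nearest each final centre.
    d = np.linalg.norm(features[:, None] - centres[None], axis=2)
    return [int(d[:, j].argmin()) for j in range(k)]

# Toy example: 6 faces described by (yaw, illumination, blur),
# forming three obvious condition groups.
feats = np.array([[0.0, 1.0, 0.0], [5.0, 1.0, 0.0],     # frontal, lit
                  [60.0, 0.2, 0.0], [65.0, 0.3, 0.0],   # profile, dark
                  [0.0, 0.1, 2.0], [2.0, 0.1, 2.1]])    # frontal, blurred
reps = select_representatives(feats, k=3)
```

Each returned index points at a real video frame, so the representatives carry genuine OD capture conditions rather than averaged, synthetic ones.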
The main advantage of the proposed approach is the ability to provide a compact set that can accurately represent the original reference face with relevant intra-class variations in pose, illumination, motion blur, etc., corresponding to capture conditions.
Still-to-Video Face Recognition Using Face Synthesis
In a particular implementation for still-to-video FR (see Fig. 2), original and synthetic face images are used to design a structural dictionary with powerful variation-representation abilities for Sparse Representation-based Classification (SRC). The dictionary blocks represent intra-class variations computed from either the reference faces themselves or the synthetic faces. Combining SRC with the proposed DSFS improves the robustness of SRC to domain variations for video-based FR in a single-sample-per-person scenario.
The main steps of the proposed domain-invariant still-to-video FR with dictionary augmentation are summarized as follows:
- Step 1. Generation of Synthetic Facial Images: In this step, a set of synthetic face images is generated for each image of the reference gallery set using the DSFS.
- Step 2. Augmentation of Dictionary: The synthetic images generated through the DSFS technique are added to the reference dictionary to design a cross-domain dictionary. The presented dictionary design in this work enables SRC to perform recognition with only one reference still image and makes it robust to the visual domain shift.
- Step 3. Classification: Given a probe sample, the SRC first codes the probe sample as a sparse linear combination of all the training reference and synthetic samples, and then classifies the probe sample by evaluating which class leads to the minimum representation error.
- Step 4. Validation: In practical FR systems, it is important to detect and then reject invalid probe images. A sparsity concentration index (SCI) criterion is used for this purpose.
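Steps 3 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the dictionary columns are vectorized reference and synthetic faces grouped by class, a basic ISTA loop stands in for whichever l1 solver the authors use, and the SCI is computed from the fraction of l1 coefficient mass concentrated on the predicted class.

```python
import numpy as np

def sparse_code(D, y, lam=0.1, iters=500):
    """ISTA: a simple stand-in l1 solver for
    min_x ||D x - y||^2 + lam * ||x||_1."""
    L = np.linalg.norm(D, 2) ** 2          # squared spectral norm of D
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        g = D.T @ (D @ x - y)              # half of the smooth gradient
        z = x - g / L                      # gradient step (size 1/(2L))
        x = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return x

def classify(D, labels, y, k):
    """SRC step: code the probe over all atoms, pick the class with the
    minimum class-restricted reconstruction error, and report an SCI-style
    score for validating/rejecting the probe."""
    x = sparse_code(D, y)
    residuals = {}
    for c in range(k):
        xc = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    pred = min(residuals, key=residuals.get)
    l1 = np.abs(x).sum()
    sci = 0.0 if l1 == 0 else (k * np.abs(x[labels == pred]).sum() / l1 - 1) / (k - 1)
    return pred, sci

# Toy usage: 2 classes, 4 atoms each; the probe is exactly a class-0 atom.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = D[:, 1]
pred, sci = classify(D, labels, y, k=2)
```

An SCI near 1 means the sparse code concentrates on one class (a confident match); an SCI near 0 means the coefficients are spread across classes, so the probe can be rejected as invalid.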
Fig. 3 shows examples of synthetic images generated under different pose, illumination, and contrast conditions using the DSFS technique on the Chokepoint dataset, where the Basel Face Model is used as the generative 3D shape model.
For proof-of-concept validation, an augmented dictionary with a block structure based on DSFS is designed, and face classification is performed within an SRC framework. Our experiments on the Chokepoint dataset show that augmenting the reference dictionary of still-to-video FR systems using the proposed DSFS approach can provide a higher level of accuracy than state-of-the-art approaches, with only a moderate increase in computational complexity.
For more information on the design of a robust still-to-video face recognition system for changing surveillance environments using face synthesis, please refer to the following research paper:
Mokhayeri, Fania, Eric Granger, and Guillaume-Alexandre Bilodeau. “Domain-Specific Face Synthesis for Video Face Recognition From a Single Sample Per Person.” IEEE Transactions on Information Forensics and Security 14, no. 3 (2019): 757-772.
Fania Mokhayeri is currently a Ph.D. candidate in LIVIA Laboratory at ÉTS. Her research interests include computer vision, machine learning, face recognition, and video surveillance applications.
Program: Information Technology Engineering
Research laboratory: LIVIA – Imaging, Vision and Artificial Intelligence Laboratory