Fusion of Logically Concatenated Cross Binary Pattern and ε-Dragging Linear Regression for Face Classification across Poses

Abstract—Recognition across poses is a fundamental challenge in face recognition: the appearance of the face undergoes drastic changes as the pose angle changes. The popular uniform texture-based descriptor, the Local Binary Pattern (LBP), loses its appeal as the underlying intensity information changes. During pose changes, crucial facial key points become concealed, and hence the face recognition rate falls. In this paper, we propose utilizing an LBP-based texture descriptor, the Logically Concatenated Cross Binary Pattern (LC-CBP), for extracting contour information from face images. The produced feature descriptors show superior performance in terms of computational complexity. In order to retain only discriminant features, a balanced approach, Marginalized Elastic-Net Regularization with ε-dragging, is incorporated into the regularization term of the linear regression model. Classification is realized using the regression residuals in the least-squares sense. To make the proposed approach robust to poses, Procrustes analysis is utilized in the preprocessing step when pose variations are significant. A comparative investigation of the proposed approach is presented on the UMIST database, the AT&T database, and the Indian Movie Face Database.


I. INTRODUCTION
In face recognition, there is an inherent challenge of pose changes. Any small change in the pose of the face presented for recognition can cause a colossal change in the image's pixel information, thereby adversely affecting recognition. The Local Binary Pattern (LBP) [1], a popular face descriptor, fails to deliver a reliable recognition rate in the presence of pose changes, as it is based on intensity information, which changes as the face pose changes. To combat pose changes, sufficient discriminative information must be retained to overcome the information loss incurred during feature occlusion. Two approaches are commonly used. The first is to craft a pose-invariant descriptor; the second is to reconstruct the frontal face image, or another desired pose, from the input pose image. A pose-invariant descriptor is expected to have three essential properties: it should retain enough discriminative information to overcome the information loss that occurs during feature occlusion, have a low dimension, and be easy to compute. The second approach involves modeling techniques based on shape analysis and shape modeling. Procrustes analysis is one of the shape analysis approaches utilized for analyzing a given set of shapes in terms of linear transformations (translation, scaling, reflection, and orthogonal rotation); the analysis attains the minimum value of a dissimilarity measure between the given shape and the required shape.

[Manuscript received April 19, 2019; revised December 18, 2019. Published in the International Journal of Machine Learning and Computing, Vol. 10, No. 1, January 2020. Kumud Arora is with the Inderprastha Engineering College, Ghaziabad, India (e-mail: kumud.kundu@ipec.org.in). Poonam Garg is with the Institute of Management and Technology, Ghaziabad, India (e-mail: pgarg@imt.edu).]
Driven by the need for a fast response time in face recognition, low computational complexity is desired at the feature extraction step. For frontal face recognition, a plethora of feature descriptors have been investigated and applied. After comparing 32 of the most promising recent LBP variants, Liu et al. [2] noted that the highest classification accuracies are provided by methods that combine LBP with other complementary features. However, a face image carries a large amount of informative content, especially around the key points of the face, and processing all of it increases both the computational and the time complexity. Reducing them requires a compromise on the volume of content processed, and this compromise can be realized by retaining only discriminative content. A discriminative least-squares framework has been utilized effectively for image classification and feature selection [3]-[5]. The core idea is to apply slack variables to enhance the class margins under the abstract framework of least-squares regression. The idea of enlarging the class margin of the Discriminative Least-Squares Regression (DLSR) framework is satisfactory for face classification only when the distribution of the training faces is in accordance with that of the probe faces across different poses. When the two distributions differ, minimizing the class margin of the DLSR framework works better.
In this paper, an LBP-based texture descriptor, the Logically Concatenated Cross Binary Pattern (LC-CBP) [6], is used for extracting information content from a face image with low computational complexity. In order to retain the most discriminative information content, positive ε-dragging embedded in linear regression is utilized when the Procrustes transformation is applied to the training set, and negative ε-dragging embedded in linear regression is utilized when no Procrustes transformation is applied. The main contribution of this paper is the utilization of a simple descriptor, the LC-CBP, with low computational complexity, together with the dragging approach of the discriminative least-squares framework. Further, a multimodal variant of the LC-CBP descriptor is proposed in which texture values are used in conjunction with intensity values; this fusion improved the classification accuracy effectively. In Section II, work related to popular face descriptors and the elastic-net regularized linear regression framework is discussed. Section III describes the proposed framework utilizing the LC-CBP descriptor and the ε-dragging-based discriminative elastic-net regularized linear regression framework. Section IV presents the outcomes, along with their analysis. The last section presents the conclusion and future work.

II. RELATED WORK
Face images are organized as a two-dimensional (2D) pattern that can be analyzed in both the spatial and the frequency domains. The 2D structure can be organized into different facial features, each with a different ability to discriminate between persons. The reliability of face classification depends heavily on the extraction of facial features. The primary challenge stems from instances where pose variations are present, which may lead to the occlusion of key features. In addition, the demand for a short response time requires a short computational time for feature extraction. The feature extraction process is closely entangled with the integration of soft discrimination decisions that intelligently define the separation criterion between the probe face and the gallery faces. With Linear Discriminant Analysis (LDA), a small set of features carrying the most relevant information is utilized for classification; LDA explicitly tries to model the difference between the various subjects [7]. Fisher's Linear Discriminant, a representative class subspace analysis method [8], [9], distinguishes gallery face images of different classes in the high-dimensional feature space constructed from the intensity values of 2D gallery images. The most discriminative features are sought by maximizing the between-class separation while minimizing the within-class scatter. In face classification, which is an inherently multiclass classification problem, the limitation of this method is the generation of linear decision boundaries between the different subjects. Linear decision boundaries cannot yield a reliable classification, as minor variations in view angle result in nonlinear transformations that cannot be represented accurately by linear boundaries. In the case of small face databases like the AT&T database [20], LDA also suffers from a "small sample size problem," which introduces a singularity issue in the covariance matrix.
In order to overcome this issue, Dai and Yuen [10] proposed using regularization instead of optimizing the Fisher index. In the literature, regularization performed with an L1 or L0 norm, or a combination of both, leads to the sparse representations used abundantly in feature selection. Zhang, Dai, Xu, and Jordan [11] proved the equivalence between regularized Fisher Discriminant Analysis and multivariate linear regression problems under mild restrictions on the regularization parameter. In order to utilize the residuals of ordinary least-squares regression in a classification framework, Xiang, Nie, Meng, Pan, and Zhang [12] presented the DLSR framework for enlarging the distance between different classes for multiclass classification and feature selection. They introduced the ε-dragging technique to enlarge the distance between classes, embedding class label information into the LSR formulation. The optimization model of DLSR is defined as [12]

  min_{W, t, M} ||XW + e_n t^T − (Y + B ⊙ M)||_F^2 + λ||W||_F^2,  s.t. M ≥ 0,

where c is the number of classes, with each sample's class label c_i ∈ {1, 2, …, c}; X = [x_1, x_2, …, x_n]^T ∈ R^{n×m} is the set of n images; W ∈ R^{m×c} is the matrix of derived regression coefficients (in the unregularized case, W = (X^T X)^† X^T Y); t ∈ R^c is a translation vector; λ is a positive regularization parameter; e_n = [1, 1, …, 1]^T ∈ R^n is a vector of all ones; ⊙ is the Hadamard (element-wise) product operator; and ||·||_F is the Frobenius norm. Y is the label matrix, with each row a binary regression target taking the value +1 in the column of the desired subject and 0 elsewhere; for example, if the ith sample image belongs to the third of five classes, its target is [0, 0, 1, 0, 0]^T. B ∈ R^{n×c} is a constant matrix representing the dragging direction, whose (i, j)th element is

  B_ij = +1 if the ith sample belongs to class j, and B_ij = −1 otherwise,

and the non-negative matrix M holds the dragging amounts.
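The DLSR model above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are ours, the translation vector t is absorbed into W via an appended column of ones, and the non-negative dragging amounts M are updated by the element-wise closed form that follows from fixing W.

```python
import numpy as np

def dlsr_fit(X, labels, n_classes, lam=0.1, n_iter=20):
    """Sketch of Discriminative Least-Squares Regression with e-dragging.

    X: (n, m) sample matrix; labels: length-n integer class labels.
    Alternates between a ridge solve for [W; t] and the element-wise
    optimal update of the non-negative dragging matrix M.
    """
    n, m = X.shape
    Y = np.zeros((n, n_classes))
    Y[np.arange(n), labels] = 1.0          # binary regression targets
    B = np.where(Y == 1.0, 1.0, -1.0)      # dragging directions B_ij
    M = np.zeros_like(Y)                   # non-negative dragging amounts
    Xc = np.hstack([X, np.ones((n, 1))])   # absorb translation t into W
    for _ in range(n_iter):
        R = Y + B * M                      # relaxed (dragged) targets
        # ridge-regularised least-squares solve for [W; t]
        Wt = np.linalg.solve(Xc.T @ Xc + lam * np.eye(m + 1), Xc.T @ R)
        P = Xc @ Wt                        # current predictions
        M = np.maximum(B * (P - Y), 0.0)   # optimal M given W, t
    return Wt[:-1], Wt[-1]                 # W (m, c) and t (c,)

def dlsr_predict(X, W, t):
    """Assign each sample to the class with the largest regression score."""
    return (X @ W + t).argmax(axis=1)
```

On linearly well-separated data the learned scores recover the class labels via a simple argmax, which is the residual-based classification rule used throughout the paper.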
In order to enhance the performance of DLSR in feature extraction and classification, Lu, Lai, Fan, Cui, and Zhu [13] integrated the local geometrical structure into the data regularization terms. Their Manifold Discriminant Regression Learning (MDRL) computes the separating plane between classes by deriving an optimal subspace from a within-class graph and a between-class graph; both L1 and L2,1 norms are used as regularization terms to produce sparse projections for feature extraction and classification. Its robust variant, RMDRL, utilizes the nuclear norm as a regularization term to learn a robust projection matrix.
Recently, Xie, Yang, Qian, Tai, and Zhang [14] and Yang, Luo, Qian, Tai, Zhang, and Xu [15] showed that the nuclear-norm-based matrix regression model (NMR) has great potential in dealing with structural noise. The objective function of the NMR takes the form

  min_W ||Y − XW||_F^2 + λ||W||_*,

where ||W||_* is the trace (or nuclear) norm of the regression coefficients W. Tai et al. [16] proposed a 2D matrix-based error model applying the nuclear-norm constraint to the error term to deal with face alignment and recognition. Their structural orthogonal Procrustes regression handles pose variations existing in 2D face images, and its objective function is optimized using an augmented Lagrangian multiplier. For face classification with mild pose variations, Peng et al. [4] used negative slack variables under a discriminative least-squares framework to minimize the margin between identity-coupled pose-variant images. To deal with nonlinear variations of face images across poses, they utilized the "kernel trick" to transform the nonlinear face space into a high-dimensional linear space. However, this becomes computationally challenging for large image databases such as the Indian Movie Face Database (IMFDB) [17]. Formulating an efficient feature selection and representation model for multiclass classification problems remains a vital area for further study.

III. PROPOSED WORK
While utilizing the framework of discriminative regression models, the primary challenge is to learn a model that can compactly utilize the extracted features in a discriminative manner. For classification purposes, the framework must be capable of generating a minimal residual error. In this paper, we propose utilizing the LC-CBP with DLSR (Fig. 1). The goal of the proposed framework is to broaden the interclass distances as much as possible with positive ε-dragging and to reduce the interclass distances as much as possible with negative ε-dragging.
The proposed framework comprises three stages:
(1) Face alignment using Procrustes analysis when the pose variation about the yaw axis exceeds 30°.
(2) Feature extraction using the LC-CBP descriptor.
(3) Face classification using the ε-dragging approach for discriminative linear regression.

A. Face Preprocessing
During this step, face detection is performed using the Viola-Jones algorithm [18]. The detected face is cropped from the input image, and then the cropped face image is used for the following steps.

B. Alignment Using Procrustes Transformation
For reliable feature matching, the training and test features must lie in the same feature space. Fig. 2 depicts the alignment of gallery face images across poses to the probe image, so that the corresponding key features of identity-coupled poses lie closer together in that space.
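The alignment step can be illustrated with a small NumPy sketch of orthogonal Procrustes analysis on corresponding landmark sets. This is our own minimal version, assuming landmarks are already in correspondence; it recovers the similarity transform (translation, uniform scale, rotation/reflection) that best maps the gallery shape onto the probe shape.

```python
import numpy as np

def procrustes_align(src, dst):
    """Return a function mapping points of shape `src` (k, 2) onto shape
    `dst` (k, 2) by the least-squares similarity transform.

    Solves the orthogonal Procrustes problem: centre both shapes, take
    the SVD of the cross-covariance to get the optimal rotation (or
    reflection), then fit the optimal uniform scale.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d          # centred shapes
    U, s, Vt = np.linalg.svd(A.T @ B)      # cross-covariance SVD
    Rm = U @ Vt                            # optimal orthogonal matrix
    scale = s.sum() / (A ** 2).sum()       # optimal uniform scale
    return lambda p: scale * (p - mu_s) @ Rm + mu_d
```

Applying the returned mapping to every gallery landmark set brings identity-coupled poses into a common frame before feature extraction, which is the role Procrustes analysis plays in the preprocessing stage.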

C. LC-CBP Extraction
A pattern is produced by considering the direct neighbors and extended neighbors of 5 × 5 image cells across the principal and non-principal diagonals of the cell [6]. The center pixel of the cell is encoded by summing a threshold function f applied to the difference between each neighbor's intensity and the center pixel's intensity,

  f(I_i − I_c) = 1 if I_i − I_c ≥ 0, and 0 otherwise,

where I_c is the center pixel and I_1 is the top-left pixel of the immediate neighborhood, with the remaining diagonal pixels taken in anti-cyclic order; the extended diagonal pixels are taken into consideration in the same fashion. In order to extract maximum information from the diagonal pixels under consideration, the signed LBP is logically concatenated with the magnitude pattern using AND/OR/XOR operators (Fig. 3). Here the sign and magnitude patterns of the block are concatenated by a logical OR function; other logical functions (AND, XOR, or NOT) can equally be used to generate the concatenated pattern. This pattern reduces the feature size under consideration by approximately 65%, inducing an increase in computational speed.
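The construction can be sketched for a single 5 × 5 cell as follows. This is an illustrative approximation of the LC-CBP idea rather than the exact encoding of [6]: the sampling order of the diagonal pixels and the magnitude threshold (the mean absolute difference) are our assumptions.

```python
import numpy as np

def lc_cbp_cell(cell, op=np.bitwise_or):
    """Illustrative LC-CBP-style code for one 5x5 cell.

    The 4 immediate and 4 extended diagonal neighbours of the centre
    pixel are thresholded against the centre (sign pattern) and against
    the mean absolute difference (magnitude pattern); the two 8-bit
    codes are then logically concatenated (OR by default).
    """
    c = float(cell[2, 2])
    # principal/anti-diagonal neighbours: immediate ring, then extended
    coords = [(1, 1), (1, 3), (3, 3), (3, 1),
              (0, 0), (0, 4), (4, 4), (4, 0)]
    diffs = np.array([float(cell[r, q]) - c for r, q in coords])
    sign_bits = (diffs >= 0).astype(int)                     # f(I_i - I_c)
    mag_bits = (np.abs(diffs) >= np.abs(diffs).mean()).astype(int)
    weights = 1 << np.arange(8)                              # 1, 2, ..., 128
    sign_code = int((sign_bits * weights).sum())
    mag_code = int((mag_bits * weights).sum())
    return op(sign_code, mag_code)                           # concatenation
```

Swapping `op` for `np.bitwise_and` or `np.bitwise_xor` reproduces the operator choices compared in Fig. 4; only 8 diagonal pixels per 5 × 5 cell are encoded, which is the source of the reduced feature size.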

D. ε-Dragging Linear Regression
For classification purposes, there is usually a need to maximize the distance between instances belonging to different classes. However, when the distribution of the training samples is not in accord with that of the test samples, minimization of the interclass distances is sought instead. In order to broaden or to reduce the distance between classes, the DLSR model makes use of class label information as well as slack variables when approximating the misclassification error. The n training instances are {(x_i, y_i)}_{i=1}^n, where each data point x_i ∈ R^m has an associated class label y_i ∈ {1, …, c}. The binary regression target for multiclass classification is the vector F_i with only the ith element equal to one and the rest zero (e.g., F_1 = [1, 0, 0, …, 0]^T). The linear regression model aims to learn the mapping by solving

  min_{W, t, M} ||XW + e_n t^T − (Y + B ⊙ M)||_F^2 + λ||W||_F^2,  s.t. M ≥ 0,

where W ∈ R^{m×c} is a transformation matrix that converts the training feature set X into the class label binary matrix Y, e_n = [1, 1, …, 1]^T is a vector of all ones, ⊙ is the Hadamard product operator, λ is a positive regularization parameter, t ∈ R^c is a translation vector, and B is the dragging coefficient matrix whose non-negative amounts are held in M. In the case of negative dragging, the dragging coefficients are defined by

  B_ij = −1 if the ith sample belongs to class j, and B_ij = +1 otherwise;

in the case of positive dragging, the signs are reversed:

  B_ij = +1 if the ith sample belongs to class j, and B_ij = −1 otherwise.

The modified label matrix is R = Y + B ⊙ M. With negative dragging, the dragged targets are pulled toward one another, so the distance between the regression targets of samples is decreased and identity-coupled samples are clustered together. With positive dragging, considering the ε constraint, the targets of two samples x_i and x_j from different classes satisfy ||r_i − r_j||_2^2 ≥ 2, i.e., the distance between samples of different classes is increased beyond the √2 separation of the plain binary targets. In order to improve the flexibility of the ε-dragging approach, enforcing a marginalized constraint makes the target space more distinct.
The objective function of a linear regression model with Marginalized Elastic-Net Regularization (MENR) and positive ε-dragging is framed as follows:

  min_{W, t, R} ||XW + e_n t^T − R||_F^2 + λ1||W||_F^2 + λ2||W||_*,

where the learned regression targets R follow the constraint of a fixed marginal value K, which sets a separating plane between the true and false classes; that is, for positive dragging,

  R_{i,p} − max_{j ≠ p} R_{i,j} ≥ K,

so the ith sample from the pth (target) class is expected to have a learned target value larger than all the rest by the fixed value K. Similarly, the objective function of a linear regression model with MENR and negative ε-dragging uses the constraint

  min_{j ≠ p} R_{i,j} − R_{i,p} ≥ K,

so the ith sample from the pth class is expected to have a value smaller than the rest of the learned targets by the fixed value K. Following [17], the nuclear norm of W can be handled through a factorization: considering the SVD W = UΣV^T, let W1 = U√Σ and W2 = √ΣV^T, so that W = W1W2 and ||W||_* = ½(||W1||_F^2 + ||W2||_F^2). The reformulated objective function to be minimized can now be defined as

  min ||XW + e_n t^T − R||_F^2 + λ1||W||_F^2 + (λ2/2)(||W1||_F^2 + ||W2||_F^2),  s.t. W = W1W2.

During the training phase, the transformation W is learned from the training data points. During the testing phase, the class targets of a probe face x_p are estimated by the simple product W^T x_p.
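The K-margin constraint on the learned targets can be illustrated with a small sketch. This is our own clipping-style projection, not the paper's closed-form target update: each row of the predicted targets P is adjusted so that the true-class entry exceeds (positive dragging) or stays below (negative dragging) every other entry by at least K.

```python
import numpy as np

def update_targets(P, labels, K=1.0, positive=True):
    """Project predictions P (n, c) onto the fixed-margin constraint.

    Positive dragging: true-class entry >= max of others + K.
    Negative dragging: true-class entry <= min of others - K.
    """
    R = P.astype(float).copy()
    for i in range(R.shape[0]):
        p = labels[i]
        others = np.delete(R[i], p)        # entries of the false classes
        if positive:
            R[i, p] = max(R[i, p], others.max() + K)
        else:
            R[i, p] = min(R[i, p], others.min() - K)
    return R
```

Rows that already satisfy the margin are left untouched, so the update only "drags" targets that violate the separating plane set by K.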

E. Augmented Lagrange Multiplier (ALM) Strategy to Optimize MENR
Here, the ALM approach is utilized to solve the MENR optimization problem above by alternately maximizing the dual of the original problem and minimizing its augmented Lagrangian. The augmented Lagrangian function of the problem is

  ℒ(R, W, W1, W2, C1) = ||XW + e_n t^T − R||_F^2 + λ1||W||_F^2 + (λ2/2)(||W1||_F^2 + ||W2||_F^2) + ⟨C1, W − W1W2⟩ + (μ/2)||W − W1W2||_F^2,

where C1 is a Lagrange multiplier, μ is a penalty parameter, and ⟨a, b⟩ = tr(a^T b).
The block coordinate descent scheme is utilized to find the minimum points of ℒ with respect to the primal variables. The following are the steps to minimize the augmented Lagrangian by updating one coordinate block at a time.
(a) Update W1 by fixing the other variables. Being a least-squares problem with regularization, its solution is

  W1 = μ(W + C1/μ)W2^T (λ2 I + μ W2W2^T)^{−1}.

(b) Update W2 by fixing the other variables. Being a least-squares problem with regularization, its solution is

  W2 = (λ2 I + μ W1^T W1)^{−1} μ W1^T (W + C1/μ).

(c) Update W. The optimal solution for W is derived by setting ∂ℒ/∂W = 0, which gives

  (2X^T X + 2λ1 I + μI)W = 2X^T (R − e_n t^T) + μ W1W2 − C1.

(d) Update the regression targets R row-wise using the optimal solution derived in [5]. For the feature set of the ith image, the optimal target representing its actual class (the mth) must satisfy the constraint

  R_{i,m} − max_{j ≠ m} R_{i,j} ≥ K  for positive ε-dragging, and
  min_{j ≠ m} R_{i,j} − R_{i,m} ≥ K  for negative ε-dragging.

An optimal solution for R is derived from the rows of the predicted regression targets of the training data, computed as P = XW + e_n t^T, together with the associated class labels: each row of P is updated so that it satisfies the margin constraint for its class and then assigned to the corresponding row of R. Steps (a), (b), (c), and (d) are repeated within each outer iteration.

(e) Update the Lagrange multiplier C1 by

  C1 ← C1 + μ(W − W1W2).

(f) Repeat steps (a)-(e) until convergence, i.e., until ||W − W1W2|| falls below a tolerance; the projection matrix W forms the optimized output.
The calculated regression targets are classified using a simple nearest neighbor classifier.
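The ALM updates of steps (a), (b), (c), and (e) can be sketched compactly in NumPy. This is a simplified illustration under our own assumptions, not the paper's implementation: the translation t and the target update of step (d) are omitted, the factorization rank and the ridge warm start for W are our choices, and a fixed number of iterations stands in for the convergence test of step (f).

```python
import numpy as np

def menr_alm(X, R, lam1=1e-3, lam2=1e-3, mu=0.25, rank=None, n_iter=100):
    """ALM sketch for min_W ||XW - R||_F^2 + lam1||W||_F^2 + lam2||W||_*,
    with the nuclear norm handled via W = W1 W2 and
    ||W||_* = 0.5 (||W1||_F^2 + ||W2||_F^2).
    """
    n, m = X.shape
    c = R.shape[1]
    rank = rank or min(m, c)                   # factorisation rank
    rng = np.random.default_rng(0)
    G = 2 * X.T @ X + (2 * lam1 + mu) * np.eye(m)
    W = np.linalg.solve(G, 2 * X.T @ R)        # ridge warm start (non-zero)
    W1 = rng.normal(size=(m, rank))
    W2 = rng.normal(size=(rank, c))
    C1 = np.zeros((m, c))
    for _ in range(n_iter):
        A = W + C1 / mu
        # (a) update W1: regularised least squares with W2 fixed
        W1 = mu * A @ W2.T @ np.linalg.inv(lam2 * np.eye(rank) + mu * W2 @ W2.T)
        # (b) update W2: regularised least squares with W1 fixed
        W2 = np.linalg.inv(lam2 * np.eye(rank) + mu * W1.T @ W1) @ (mu * W1.T @ A)
        # (c) update W from the zero-gradient condition of the Lagrangian
        W = np.linalg.solve(G, 2 * X.T @ R + mu * W1 @ W2 - C1)
        # (e) update the Lagrange multiplier
        C1 = C1 + mu * (W - W1 @ W2)
    return W
```

With small λ1, λ2 the learned W closely fits the dragged targets R, and a probe feature is then classified by a nearest-neighbor rule on its estimated target, as stated above.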

F. Complexity Analysis of the Proposed Approach
With the LC-CBP descriptor, the number of operations required to encode the binary pattern of a cell of size k × k (k > 2) is O((k−1)(k−1)). If the image contains T such k × k cells, utilizing the descriptor reduces the number of operations per cell by a factor of approximately 1/k². Therefore, the runtime complexity O(2mnc + 2mcr + m) of the MENR, with dimensionality m, n instances, and c classes, now involves correspondingly fewer operations, approximately fewer by a factor of 1/k² for each of the T cells making up the m dimensions of a sample image.

IV. EXPERIMENTAL RESULTS
This section presents the results of the experiments conducted to evaluate the significance of the sign of the ε-dragging coefficients in DLSR-based classification. Experiments were performed with MATLAB 2017 on a Core i5 2.3 GHz processor with 8 GB of DDR4 RAM. Table I describes the data sets utilized to validate the efficiency of the proposed approach. For each experiment, the input images for the training set were selected randomly, and the probe sample was selected randomly from the remainder of the set. Ten runs were conducted for each experiment; to account for the random composition of the training set, a standard deviation is associated with each classification accuracy, and the best value is reported as the accuracy.
Utilizing the regression framework for classification translates into learning to predict effectively the label of a query sample from the given training set and an estimator function f, i.e., finding a coefficient vector β such that the residual error is minimized and the correlation with the actual class is maximized.

UMIST [19]: This database is now known as the Sheffield Face Database. It consists of 564 images of 20 subjects; each subject has a range of poses varying from profile to frontal views, in separate directories labeled 1a, 1b, …, 1t.

AT&T [20]: This database contains 10 images each of 40 subjects, with minor variations in lighting, facial expression (open/closed eyes, smiling/not smiling), facial details (glasses/no glasses), and yaw poses of up to 30°. All images have a dark uniform background.

In the preprocessing step, Procrustes transformation analysis is applied to align the images subject/class-wise as per the orientation of the probe sample. Fig. 4 highlights the effect of applying the different logical operators (AND, OR, and XOR) between the magnitude and sign of the binary pattern. Discriminative information, such as the silhouettes of key features, is retained by the different logical operators of the LC-CBP descriptor. The conducted experiments showed that, among the OR/XOR/AND logical operators, XOR fares better for poses in which the face deviates more than 20° along the yaw axis.
The appropriate values of λ1 and λ2 are studied by observing the impact of varying their values on the classification accuracy; the value of μ is taken as 0.25. Tables II and III demonstrate this impact, and from them it was found that the best classification accuracy is obtained with λ1 = 0.001 and λ2 = 0.001. Tables IV, V, and VI present the comparison of classification accuracy against other regression-based classification approaches; from these comparisons, it was found that the performance of MENRLR is better than that of the other approaches. Marginalized elastic-net regression with positive dragging (MENRLR-LCBP+) fares better than marginalized elastic-net regression with negative dragging (MENRLR-LCBP−) on the AT&T database. The underlying reason for the better performance with positive dragging is that the AT&T database has images with minor or no pose variations; maximizing the separating planes between the classes of the database therefore favors good classification accuracy.
MENRLR-LCBP− fares better than MENRLR-LCBP+ on the UMIST database. The underlying reason for the better performance with negative dragging is that UMIST has images with pose variations: applying the Procrustes transformation as per the probe increases both the distance between interclass images and the misclassification error.
MENRLR-LCBP− fares better than MENRLR-LCBP+ on the IMFDB. The underlying reason for the better performance with negative dragging is that the IMFDB has images with large pose variations. Minimizing the distance between interclass images with pose variations favors the clustering of pose-variant images of a class, and the separating planes between the various classes of the database increase, hence favoring better classification accuracy. With positive dragging, however, the increase in the separation between classes is accompanied by an increase in the misclassification error.

V. CONCLUSION
The experimental results obtained above clearly demonstrate that the sign of the ε-dragging coefficients has a significant impact on the classification of pose-variant images. A DLSR model with MENR and the LC-CBP descriptor is applied for efficient classification of compact characteristics of face images across pose variations. For frontal or near-frontal face images, it was found that positively signed ε-dragging coefficients, along with the Procrustes transformation, fare better by increasing the distance between the images of different classes. For half-profile to nearly profile poses, negatively signed ε-dragging coefficients fare better, as they indirectly help cluster identity-coupled images, thereby boosting the classification accuracy.