Optimal Geometrical Set for Automated Marker Placement to Virtualized Real-Time Facial Emotions

In recent years, real-time face recognition has been a major topic of interest in developing intelligent human-machine interaction systems. Over the past several decades, researchers have proposed different algorithms for facial expression recognition, but there has been little focus on detection in real-time scenarios. The present work proposes a new algorithmic method of automated marker placement used to classify six facial expressions: happiness, sadness, anger, fear, disgust, and surprise. Emotional facial expressions were captured using a webcam, while the proposed algorithm placed a set of eight virtual markers on each subject's face. Facial feature extraction methods, including marker distance (the distance from each marker to the center of the face) and change in marker distance (the change in distance between the original and new marker positions), were used to extract three statistical features (mean, variance, and root mean square) from the real-time video sequence. The initial position of each marker was passed to the optical flow algorithm for marker tracking during each emotional facial expression. Finally, the extracted statistical features were mapped into corresponding emotional facial expressions using two simple non-linear classifiers, K-nearest neighbor and probabilistic neural network. The results indicate that the proposed automated marker placement algorithm effectively placed eight virtual markers on each subject's face and gave a maximum mean emotion classification rate of 96.94% using the probabilistic neural network.


Introduction
Non-verbal communication plays an important role in developing intelligent machines that can exhibit better interaction with humans by closely emulating human-human communication. Researchers have increased their focus on developing intelligent human-machine interface (HMI) systems for assisting elderly people that could improve their quality of life [1,2]. Human body gestures, postures, and facial expressions are used as non-verbal communication mediums to develop HMI systems. Among these modalities, facial expression is the most common due to its cost effectiveness, more reliable detection, and shorter computation time, among other advantages [2-5]. Over the past several decades, researchers have developed intelligent methodologies to effectively recognize human facial expressions that have been implemented in real-time systems for a variety of applications, such as video gaming, machine vision, pain assessment, psychology, behavioral analysis, and clinical diagnosis [6-9]. As a result, recent HMI systems can easily "understand" the expressions of humans and perform different tasks [10-12].
Emotions can be universally categorized into six types: anger, sadness, surprise, fear, happiness, and disgust. Emotions can be assessed using different modalities, such as physiological signals, gestures, speech, and facial expressions [5,13-15]. Each method of emotion recognition has its own advantages and limitations. Although physiological signals inherently detect human emotions through central and/or peripheral nervous system activity, issues with higher computational complexity, the presence of noise and artifacts in acquired signals, and intrusive electrode placement on the human body limit the development of intelligent real-time systems. Furthermore, most subjects become uncomfortable wearing the electrodes all day long when interacting with systems for any given application. Indeed, most physiological signal-based emotion recognition systems have been developed within a controlled laboratory environment, and very few have been developed in real-time scenarios [16,17]. Therefore, recent developments in novel image processing algorithms will likely make facial expression detection more reliable and effective for real-time system development than other modalities.

Facial Action Coding System (FACS)
The Facial Action Coding System (FACS) was originally proposed by Ekman and Friesen [18,19] to identify facial expressions of human emotions. The FACS comprises 46 action units (AUs) and 7000 proposed combinations for facial expression detection. Although researchers have used different numbers of AUs for developing facial expression recognition systems in laboratory environments, very few have proposed the detection of facial expressions in real-time [20,21]. Thus, no standard has been proposed for using either a specific set or combination of AUs to identify facial expressions. Ekman and Friesen [18] previously discussed facial muscle activation with different emotions and defined the facial AU system for classification of facial expressions. S1 Table shows the effective changes of AUs in facial muscles for each emotion [22]. In this research, the FACS was therefore used as a guideline for identifying expressions: it serves as an investigative tool for studying the movements of the markers as each expression takes place. All eight virtual markers were placed and investigated according to the AUs.

Face and Eye Detection
Face detection is a very important step in facial expression recognition. An efficient automated face detection system should be able to detect a subject's face in complex scenes with cluttered backgrounds and be able to locate its exact position therein [23]. In face detection, facial features, such as the eyes, nose, and mouth, serve as reference points [24]. By scanning facial images at different resolutions, the scale or size of a face can be analyzed [25]. Several face detection methods have been reported for the recognition of facial expressions [11,20]. However, AU-based face detection has been used more often in previous studies than other methods [4,18], such as the Viola and Jones face detection method [26]. In real-time scenarios, face detection is performed through either image pixels or Haar-like features [27]. The image pixel-based approach requires a longer computation time to detect the face, and the number of pixels varies in proportion to face shape and pigmentation [28]. Haar-like features can be used to compute changes in pixel contrast (white and black) between adjacent rectangular groups instead of using original pixel values (Fig 1). A more detailed description of the Haar-like features method can be found in a previous study [29]. The Haar-like features method efficiently detects any object, including human faces, in a given image sequence using the AdaBoost cascade classifier [26]. The Viola and Jones method detects faces using Haar-like features in a shorter time with less computational complexity [26]. S1 Fig shows the flow chart of the Viola and Jones algorithm for face detection.
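To make the Haar-like feature idea concrete, a two-rectangle (edge) feature can be evaluated in constant time with an integral image (summed-area table). The sketch below is a minimal, self-contained illustration on a tiny synthetic grayscale patch; the patch values and feature geometry are illustrative assumptions, not the paper's implementation.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y][x] = row + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle (x0, y0)-(x1, y1)."""
    s = ii[y1][x1]
    if x0 > 0: s -= ii[y1][x0 - 1]
    if y0 > 0: s -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0: s += ii[y0 - 1][x0 - 1]
    return s

# Two-rectangle Haar-like feature: bright half minus dark half.
img = [[1, 1, 5, 5],
       [1, 1, 5, 5],
       [1, 1, 5, 5],
       [1, 1, 5, 5]]
ii = integral_image(img)
left = rect_sum(ii, 0, 0, 1, 3)    # dark half
right = rect_sum(ii, 2, 0, 3, 3)   # bright half
feature = right - left             # large value = strong vertical edge
```

Because every rectangle sum costs only four lookups, the cascade can evaluate thousands of such features per window quickly, which is what makes the Viola and Jones detector fast enough for real-time use.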
In the current report, Haar-like features were used to detect the front of each subject's face and their eyes. The facial image captured by webcam was passed to the OpenCV library, where faces were detected via the Haar cascade classifier. Before being sent to OpenCV, the acquired facial image was subjected to double-precision formatting and converted to grayscale to reduce computational time and memory. Haar cascade classifiers can detect a subject's face in 0.067 s, faster than other face detection methods [26]. The system then creates an ellipse around the subject's face and places a "+" mark on both eyes in order to position virtual markers on the subject's face [30,31]; most human faces are relatively ellipsoidal in shape, so we drew the ellipse based on methods discussed previously [30,31]. S2A Fig shows one subject's facial image captured by webcam, and S2B Fig shows the image after face and eye detection. In this research, six basic emotions were classified. Haar-like features were used to identify the user's face and eyes, and a total of eight automated virtual markers were placed on the user's face at specific locations, as discussed in the Proposed Method section. Earlier work examined totals of ten, eight, and six virtual markers to determine the number of markers that yields the best emotion recognition; eight virtual markers gave the best accuracy and were found to be the optimal number [32]. Hence, only eight virtual markers are discussed in this paper. All markers were then passed to the optical flow algorithm (OFA) [33,34] to predict their future positions, and the movement of the markers for each emotion was investigated with the guidance of the FACS. The methods and results are discussed in the following sections.

Proposed Method
The present work proposes a new method of automated virtual marker placement on a subject's face that can be used to detect six basic emotional facial expressions and compares the emotion recognition performance of this new method with manual marker placement. Eight virtual markers were automatically placed at specific locations on each subject's face while a webcam captured emotional facial expression sequences. Initially, subjects were requested to place the markers manually on their face based on guidelines from the instructor, which were derived from the FACS; these marker positions were then used to develop an algorithm for automated marker placement. The flow of the manual and automated marker placement methods for facial emotion detection is given in S3 Fig. Our complete algorithm was implemented in Microsoft Visual Studio with the OpenCV library using the C++ programming language on a desktop computer with an Intel i3 processor, 2 GB RAM, and the Windows 8 operating system. The Haar cascade database in the Open Computer Vision (OpenCV) library was used to detect each subject's face from video sequences captured via webcam. Initial marker positions (x-y coordinates) were passed through the Lucas-Kanade OFA for predicting future marker positions. The distance of each marker from the center point of each subject's face defined the features for facial expression detection. Extracted features were then mapped to corresponding emotions using two nonlinear classifiers, K-nearest neighbor (KNN) and probabilistic neural network (PNN).
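To illustrate the tracking step, the following is a minimal sketch of a single Lucas-Kanade optical-flow solve on a synthetic patch. The patch, window size, and true shift (u0, v0) are illustrative assumptions; a real implementation would run on webcam frames, e.g. with OpenCV's pyramidal Lucas-Kanade tracker.

```python
# Synthetic quadratic patch and a copy shifted by the motion to recover.
u0, v0 = 0.5, 0.3                    # true sub-pixel motion
W, H = 11, 15                        # window centred on (5, 7)

def I(x, y):                         # frame at time t
    return (x - 5.0) ** 2 + (y - 7.0) ** 2

def J(x, y):                         # frame at time t+1, shifted by (u0, v0)
    return I(x - u0, y - v0)

# Accumulate the normal equations [[Sxx, Sxy], [Sxy, Syy]] [u, v]^T = -[Sxt, Syt]
Sxx = Sxy = Syy = Sxt = Syt = 0.0
for y in range(1, H - 1):
    for x in range(1, W - 1):
        Ix = (I(x + 1, y) - I(x - 1, y)) / 2.0   # spatial gradients
        Iy = (I(x, y + 1) - I(x, y - 1)) / 2.0   # (central differences)
        It = J(x, y) - I(x, y)                   # temporal difference
        Sxx += Ix * Ix; Sxy += Ix * Iy; Syy += Iy * Iy
        Sxt += Ix * It; Syt += Iy * It

# Cramer's rule for the 2x2 least-squares solve.
det = Sxx * Syy - Sxy * Sxy
u = (-Syy * Sxt + Sxy * Syt) / det
v = (-Sxx * Syt + Sxy * Sxt) / det   # (u, v) recovers (u0, v0)
```

In the proposed system this solve would be applied around each of the eight virtual markers per frame, and each marker's (x, y) coordinates would be advanced by the estimated (u, v).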

Manual Marker Placement
Manual marker placement was carried out to determine the mean position (the distance from the center of the face to the marker's location) of each marker on the subject's face. This position was used to develop the automated marker placement algorithm for facial emotion recognition. In this experiment, subjects were requested to digitally place eight markers on their face in specified locations. The number of markers used for facial expression detection was devised by trial and error. The background was set with a black and white screen, and the room light intensity was maintained at 41 lx. All subjects were seated comfortably on a chair placed in front of the computer monitor at a distance of 0.95 m. In total, 10 subjects with a mean (± standard deviation) age of 24 ± 0.84 years were assisted with manually placing the eight markers at defined locations on their facial image using the FACS [18]. Markers were placed by clicking the cursor at each position on the facial image. The system then automatically computed the center of the face [31], calculated each marker's position, and subsequently saved the newly acquired information. Manually clicking the cursor at each of the eight defined facial positions allowed the system to record the exact x-y coordinates of each spot and insert a virtual marker (pink). Using the Pythagorean theorem, the distance between each marker and the center point of the face was calculated [31]. Each subject underwent three marker placement trials for each emotional facial expression, and the mean marker position distance was calculated with respect to the center of the face. S4 and S5 Figs show the experimental setup and manual marker placement on one subject, respectively, and Fig 2 shows the marker positions recorded by the system. The marker positions calculated with reference to the center of the face via manual marker placement were then used to develop the automatic marker placement algorithm.

Automatic Marker Placement
The eight virtual markers positioned according to the previous section were then used for automated facial expression detection, which is extremely convenient, computationally efficient (less computational time and memory), and works with the OFA for tracking markers. Liu et al. proposed a geometric facial model that creates a rectangle around the subject's face using eye positions [35]. The distance between the eyes is used to identify the center point of the face, followed by the mouth; this geometric model is shown in S6A Fig. The system places a center marker after computing the distance between the two eyes. According to Liu et al. [35], the center point is located a quarter of the distance (in cm) from the eye to the mouth if the facial geometry is rectangular. In the current study, the mean marker position calculated after manual marker placement across 10 subjects was used for computing the center point of the subject's face in the automated marker placement algorithm. Hence, half the distance (in cm) from the eye to the mouth was used to position the center point for all subjects in the study. An ellipse was then created around each subject's face with reference to the center point. The radius of the ellipse was taken from the facial features previously detected by the Haar cascade classifiers.
In general, facial shapes are not constant at all times; each face has its own shape (e.g., ellipsoidal, circular, etc.) and radius [36]. Thus, our new method uses ratios to calculate the radius of the ellipse. Initially, a vertical line was drawn from the center point to the intersection point between the ellipse and the x-axis of the center point of the face (S6B Fig). The eight virtual markers were automatically placed at certain angles and distances from the center marker. The angle and distance (position) of each marker were computed from manual marker placement. The first and second markers of the upper face (p_e1 and p_e2) were placed at a 45° angle from the x-axis on the left and right sides of the face, respectively. The radius of the ellipse at the 45° angle was then calculated. From manual marker placement, it was determined that a mean marker distance ratio of 6.5:9 with the ellipse radius at a 45° angle provided the best positions for markers p_e1 and p_e2 on the upper face. The second set of markers on the upper face (p_e3 and p_e4) was placed at a 65° angle from the x-axis on the left and right sides, respectively. The mean distance ratio of the radius of the ellipse at an angle of 65° was found to be 5:9. The method used for upper face marker placement was then applied to the lower face. A complete description of the ratio calculation and placement of markers on the upper and lower face of each subject is given in Fig 3. Lower face markers were placed to the left of, to the right of, above, and below the subject's mouth. Points p_m1 and p_m2 were placed at a 130° angle from the x-axis and had a mean distance ratio of 11:15; points p_m3 and p_m4 were fixed on the y-axis with mean distance ratios of 3:9 and 7:9, respectively.
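The angle-and-ratio placement described above can be sketched as follows. A marker is placed at a given fraction (the mean distance ratio) of the ellipse radius along a given angle from the center; the center coordinates and ellipse semi-axes below are illustrative assumptions, while the 45° angle and 6.5:9 ratio for p_e1/p_e2 come from Table 1.

```python
import math

def ellipse_radius(a, b, theta):
    """Distance from the centre to the ellipse boundary at angle theta
    (a, b = semi-axes along the x- and y-axis)."""
    return a * b / math.hypot(b * math.cos(theta), a * math.sin(theta))

def place_marker(cx, cy, a, b, angle_deg, ratio):
    """Marker at ratio * ellipse_radius along angle_deg from the x-axis."""
    t = math.radians(angle_deg)
    r = ratio * ellipse_radius(a, b, t)
    return (cx + r * math.cos(t), cy - r * math.sin(t))  # image y-axis points down

# Upper-face markers p_e1 / p_e2 at 45 deg with the 6.5:9 ratio from Table 1.
cx, cy, a, b = 160.0, 120.0, 60.0, 80.0   # assumed face centre and semi-axes
p_e1 = place_marker(cx, cy, a, b, 180 - 45, 6.5 / 9)  # left side of the face
p_e2 = place_marker(cx, cy, a, b, 45, 6.5 / 9)        # right side of the face
```

The same routine, called with the 65°, 130°, and y-axis parameters from Table 1, would yield the remaining markers; only the angle and ratio change per marker.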
The placement of markers after running the automated marker placement algorithm is shown in Fig 4. The distance between the center point and each marker is referred to as a distance feature and is considered an important feature for facial expression classification. A total of nine features were used: p_e1, p_e2, p_e3, p_e4, p_m1, p_m2, p_m3, p_m4, and p_m5. Eight features were the distances of the markers from the center point, while the ninth (p_m5) was the distance between the points to the left (p_m1) and right (p_m2) of the mouth (S7 Fig). In the current study, distance feature m1 was calculated using the Pythagorean theorem [31]. Every marker was assigned its own x-y coordinates [e.g., center point, (x_c, y_c); p_m1, (x_m1, y_m1)].
In Fig 3, in the left mouth column, line m1 is the hypotenuse of a right triangle, wherein the line parallel to the x-axis is dx [the difference between the x-coordinates of the center point (x_c) and p_m1 (x_m1)], and the line parallel to the y-axis is dy [the difference between the y-coordinates of the center point (y_c) and p_m1 (y_m1)]. Therefore, the formula for the computation of feature m1 is given in Eq (2):

m1 = sqrt(dx^2 + dy^2), where dx = x_c - x_m1 and dy = y_c - y_m1    (2)

In a similar fashion, the distance of each marker from the center point was calculated using Eq (2). The coordinates of each marker were calculated using trigonometric formulas, and the (x, y) position of each marker was found after calculating the feature distances at specific angles. Markers p_m3 and p_m4 were placed on the y-axis; thus, their x-coordinates were the same as that of the center point, and their y-coordinates were found from the ratio of the y-axes. The coordinates of each feature were subjected to the OFA for future coordinate prediction. The initial coordinate values of each marker were replaced with the future coordinate values, and the new distance from the center point was evaluated during facial expression. S2 Table presents the changes in distance of each marker for different emotions. Distance features e1, e2, e3, e4, m1, m2, m3, m4, and m5 indicate the initial positions of the markers before facial expression, while features e1', e2', e3', e4', m1', m2', m3', m4', and m5' show the new positions of the markers after facial expression.
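The two feature definitions, marker distance (MD) from Eq (2) and change in marker distance (CMD) after the optical-flow update, can be sketched as below. The coordinate values are illustrative assumptions.

```python
import math

def marker_distance(center, marker):
    """Eq (2): Euclidean (Pythagorean) distance from the face centre."""
    dx = center[0] - marker[0]
    dy = center[1] - marker[1]
    return math.sqrt(dx * dx + dy * dy)

center = (160.0, 120.0)
p_m1_before = (120.0, 150.0)   # left-mouth marker at the neutral frame
p_m1_after = (112.0, 156.0)    # position returned by the optical-flow step

m1 = marker_distance(center, p_m1_before)       # MD feature (initial)
m1_new = marker_distance(center, p_m1_after)    # MD feature (after expression)
cmd_m1 = m1_new - m1                            # CMD feature: m1' - m1
```

The same computation is repeated for all eight markers (plus the mouth-width feature p_m5), giving the per-frame feature vector from which the statistical features are later extracted.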

Results and Discussion
Facial expression recognition has been considered a major research topic over the past several decades for developing intelligent systems [16,17,20,37]. Most early work focused on AUs, and little attention was paid to manual and virtual marker-based facial expression detection. AU-based facial expression detection is computationally intensive due to the large number and combination of AUs. On the other hand, manual marker placement is highly intrusive, as subjects must wear the markers (stickers) at all times. This work therefore demonstrated automated placement of a smaller set of eight virtual markers. Most previous research has focused on recognizing six basic emotions (happiness, sadness, anger, fear, disgust, and surprise) through different numbers of AUs [20,21,38-42].

Data Collection for Emotion Recognition
The performance and reliability of emotion recognition systems are mainly based on the data samples that are used to train the system. Participants provided written informed consent to participate in this study. The individual in this manuscript has given written informed consent to publish these case details. All research involving human participants was approved by the Human Research Ethics Committee of Universiti Malaysia Perlis (HREC-UniMAP), and written consent was obtained from the participants. In the present study, a total of 30 subjects (14 male, 16 female) with a mean (± standard deviation) age of 22.73 ± 1.68 years were recruited to collect data on the six different emotional facial expressions in a video sequence. The subjects included in this study were of mixed ethnicities and religions (Hindu, Muslim, and Chinese) in Malaysia. All of the subjects were asked to express specific emotions following an instructor's command in a controlled environment (lighting intensity: 41 lx; room temperature: 26°C; distance between the subject and camera: 0.95 m). The lighting intensity and distance between the subject and camera for this experiment were selected based on studies of different light intensities (3.00, 23.00, and 41.00 lx) and distances (near, 0.75 m; middle, 0.95 m; long, 1.15 m). Each emotional facial expression lasted 6 seconds, and each expression was performed twice by each subject (two trials). Data collection was performed in a laboratory environment with two different backgrounds (one completely black in color, and another with a wall poster display). The total time required for one subject to complete the six emotional facial expressions (including instructions and baseline state) was 20 min. All subjects were healthy university students without any previous history of muscular, cognitive, or emotional disorders.
Different emotional facial expressions for one subject with the eight virtual markers are shown in Fig 5. Initially, manual marker placement on 10 subjects (in triplicate) for each emotion was analyzed to identify the marker positions (the distance from the center of the face to each marker) on each subject's face. Each subject was asked to place eight markers at defined facial locations based on the FACS. The mean value of each marker position with reference to the center point and its position angle over the 10 subjects were used to develop the automated marker placement algorithm, and the results are shown in Table 1. Different sets of subjects were used for automated and manual marker placement, and all subjects were treated as unknown. Because subjects from different ethnic groups were tested, the expressions given for each emotion varied between subjects; this issue was addressed by choosing eligible subjects and explaining the task thoroughly to each subject. Table 1 shows the marker distance ratios and angles of three trials of six emotional expressions for 10 subjects using manual marker placement. As a result, the angles of deviation of the left eye_1 and right eye_1 markers were approximately 45°, of the left eye_2 and right eye_2 markers 65°, of the left mouth and right mouth markers 130°, and of the upper and lower mouth markers 90°. Similarly, the marker distance ratios from the center point to the left eye, right eye, mouth, and above and below the mouth were approximately 0.72, 0.55, 0.73, and 0.33, respectively. These mean marker angles and distance ratios were used for marker positioning by the automated marker placement algorithm, which was later used to evaluate marker positioning on the same 10 subjects who underwent manual marker placement. S3 Table shows the differences (error) in marker distance ratios and position angles between the manual and automated marker placement methods.
In some cases, the marker angle over the eight markers gave an error value <0.05° and a distance ratio error <0.2 between the automated and manual marker placement methods (S3 Table). This indicates that automated marker placement successfully located markers on subjects' faces and effectively recognized emotional facial expressions. Hence, the marker distance ratios and angles reported in Table 1 were used to develop and test our proposed emotional facial expression recognition system with a greater number of subjects.

Evaluation with Classifier
Next, a new set of 30 subjects was recruited to test the six emotional facial expressions using the automated marker placement algorithm to develop a facial emotion recognition system. The same experimental setup described for manual marker placement was used to develop the facial expression recognition system. The proposed algorithm placed markers on each subject's face using the distance ratios and position angles shown in Table 1. Each marker passed its original position (x, y) to the OFA to trace future marker movement and direction. When any facial expression occurred after automated marker placement, the new position of each marker for that emotion was saved by the system. The data were collected in real-time with automated marker placement, and facial expressions were recorded for every subject. Most existing facial expression recognition systems in the literature perform offline analysis rather than real-time system development [37,38]. In the current study, three simple statistical features [mean, root mean square (RMS), and variance] of the marker distance (the distance from each marker to the center of the face) and of the change in marker distance (between the original (neutral) and new marker positions during emotional facial expression) were extracted and normalized using binary and bipolar normalization methods [43]. Finally, these normalized features were fed into two nonlinear classifiers (KNN [44] and PNN [45]) to classify the emotional facial expressions. A 10-fold cross-validation was used to segregate the training and testing data for facial expression classification [20]. In the KNN classifier, the value of K was varied from 2 to 10, and K = 5 provided the highest mean emotion recognition rate; Euclidean distance was used as the distance measure.
Therefore, only the emotion recognition rate with K = 5 is reported in the present study. In the PNN, the spread value (σ) was varied from 0.01 to 0.1.
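The three statistical features and the two min-max normalizations (binary to [0, 1], bipolar to [-1, 1]) described above can be sketched as follows; the sample distance sequence is an illustrative assumption.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def rms(xs):
    """Root mean square of the sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def normalize_binary(xs):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def normalize_bipolar(xs):
    """Min-max normalization to the range [-1, 1]."""
    return [2.0 * b - 1.0 for b in normalize_binary(xs)]

seq = [48.0, 50.0, 52.0, 50.0]   # marker distances across one expression
features = (mean(seq), variance(seq), rms(seq))
```

One such triple is computed per marker-distance (or change-in-distance) sequence, and the normalized feature vectors are what the KNN and PNN classifiers receive.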
Marker distance (the distance from each marker to the center of the face) (MD). This feature was extracted as the distance of each marker from the center point. Table 2 and S4 Table show the KNN and PNN classifier results for the MD features, respectively.
Changes in marker distance (the small change in distance between the original and new marker positions) (CMD). This feature captures the small changes in each marker's position when a facial expression takes place. The system calculates the distance moved by each marker and classifies the expression accordingly. S5 Table and Table 3 show the KNN and PNN classifier results for the CMD features, respectively.
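The KNN classification step (K = 5, Euclidean distance, majority vote) can be sketched on toy feature vectors; the training samples and labels below are synthetic assumptions, not the paper's dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=5):
    """train: list of (feature_vector, label); majority vote over k nearest."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Synthetic 2-D feature vectors standing in for normalized MD/CMD features.
train = [((0.10, 0.20), "happy"), ((0.20, 0.10), "happy"),
         ((0.15, 0.15), "happy"), ((0.20, 0.20), "happy"),
         ((0.90, 0.80), "sad"), ((0.80, 0.90), "sad"), ((0.85, 0.85), "sad")]
label = knn_predict(train, (0.12, 0.18), k=5)
```

With 10-fold cross-validation as in the paper, the train list would be rebuilt per fold and the prediction accuracy averaged over folds.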
Based on the current experimental results, the RMS feature of the changes in marker distance gave a slightly higher mean accuracy rate (96.94%) than the marker distance (96.81%) using the PNN. This result is likely due to the fact that changes in mean marker distance (i) effectively reflect the effect of different emotional expressions on selective markers, compared to all markers being used for analysis of marker distance, and (ii) measure subtle changes in marker position with each emotional expression. Researchers have previously analyzed marker distance for facial expression recognition and achieved a maximum mean classification rate of 94% using 19 facial features and the Random Forests classifier [46]. Herein, the simple nonlinear classifiers KNN and PNN were used to classify emotional facial expressions. Statistical features extracted by the two feature extraction methods and normalized (bipolar and binary) were used to map corresponding emotional expressions using the KNN and PNN. Bipolar normalization gave a slightly higher mean emotional expression classification rate (96.94%) compared with binary normalization (96.81%) and unnormalized data (93.83%) in the PNN. In the case of the KNN, binary normalization gave a higher mean emotional expression recognition rate (92.36%) than bipolar normalization (90.00%) and unnormalized data (91.39%).
The current experimental results indicate that the RMS gives a higher facial expression recognition rate than the other statistical features (mean and variance). For the changes in marker distance and marker distance methods, the RMS gave maximum mean facial expression recognition rates of 96.94% and 96.81%, respectively; the mean feature performed better than the variance but worse than the RMS in emotional facial expression classification. In both methods of feature extraction, variance provided a much lower mean emotional facial expression recognition accuracy (81.81%). However, the variance feature with the KNN classifier performed better than with the PNN on emotional facial expression recognition. Researchers have used these statistical features for emotional facial expression recognition [44,46,47] and shown that variance provided a lower mean facial expression recognition rate with raw data than other statistical features (i.e., mean and RMS) [46]. Table 4 shows a comparison of emotional facial expression classification from the present work with previous reports [20,32,37-40,46,48-51]. Most previous studies have used a greater number of facial features or AUs to classify emotional facial expressions. A maximum mean classification rate of 96.00% was achieved by classifying six emotional facial expressions using 26 facial features and the Random Forests classifier. The CK database [52] and geometric facial features were commonly used in earlier studies, and facial expression analysis was performed offline. The numbers of facial features used in previous studies range from a low of 12 to a high of 122.
Most previous research has utilized virtual markers to analyze the movement of facial muscles during emotion recognition tasks [20,21,38-42]. Recently, virtual marker-based facial emotion recognition has become popular in addition to AUs [4]. Different sets of markers, from 12 to 62 [39], have been used to detect facial expressions in laboratory and real-time environments. In these works, virtual markers were placed on the subject's face manually, and no automated marker placement procedure was reported. In contrast to AUs, virtual markers are highly flexible for investigating marker movement when facial expressions take place, convenient (no physical stickers/labels are worn on the face), and therefore more suitable for real-time facial expression detection [38]. However, virtual marker-based face detection is affected by poor lighting (light intensities <30 lx) and camera pixel resolution. Virtual markers used in real-time applications are more stable if the lighting intensity is >30 lx and the minimum camera pixel resolution is 640 × 480 [38]. Besides facial features, geometric feature-based emotion recognition is also popular in human expression detection [53].
In contrast to earlier studies, the present work requires fewer facial features and a simple feature extraction and classification method to classify emotional facial expressions in real-time with a higher recognition rate. Most earlier studies adopted manual marker placement and facial AU methods for identifying different facial expressions, and very few studies have addressed real-time facial expression detection. Our proposed automated marker placement algorithm works effectively in real-time scenarios with less computational complexity (memory and computation time) than previously reported methods. However, it also has the following limitations: (i) emotional facial expression recognition was performed with a limited number of subjects, and the accuracy of the emotion recognition rate might differ if our method were tested with more unknown subjects; and (ii) the currently proposed algorithm should be tested with other international databases for standardization purposes. In the future, we hope to analyze different types of statistical features that more efficiently reflect subtle changes in marker position with each emotion to improve the mean classification accuracy. In addition, we intend to implement more intelligent statistical learning algorithms, such as SVM and ANN, to enhance the mean emotional facial expression classification rate.

Conclusion
Facial expression recognition is an intense research topic with several applications. This paper presents a novel automated marker placement algorithm for emotional facial expression classification using marker distance ratios and angles. A small set of eight virtual markers was placed at particular locations on a subject's face in the proposed automated manner to identify their emotional facial expressions. The OFA was used to track the future positions of the markers during emotional expression. A simple set of statistical features was used to classify the six different emotions tested (happiness, sadness, anger, fear, surprise, and disgust) using two nonlinear classifiers (KNN and PNN). The proposed automated marker placement method gives a maximum mean emotional facial expression recognition rate of 96.94% relative to earlier studies, with a computational time of approximately 0.3 s. Our proposed automated marker placement algorithm will be highly useful for developing intelligent diagnostics for clinical investigation and analysis of emotional behaviors in patients with facial muscle disorders and other clinical situations. Furthermore, it may also prove useful for developing intelligent automated systems that can assist the elderly with HMI devices that enable communication with their surroundings.