Facial expressions of Asian people exposed to constructed urban forests: Accuracy validation and variation assessment

An outcome of building sustainable urban forests is that people’s well-being is improved when they are exposed to trees. Facial expressions directly represent one’s inner emotions and can be used to assess real-time perception. The emergence and change of facial expressions in forest visitors is an implicit process. As such, the reserved character of Asians requires an instrument rating to accurately recognize expressions. In this study, a dataset was established with 2,886 randomly photographed faces from visitors at a constructed urban forest park and at a promenade during summertime in Shenyang City, Northeast China. Six experts were invited to choose 160 photos in total, with 20 images representing each of eight typical expressions: angry, contempt, disgusted, happy, neutral, sad, scared, and surprised. The FireFACE ver. 3.0 software was used to test hit-ratio validation as an accuracy measurement (ac.) to match machine-recognized photos with those identified by experts. According to the Kruskal-Wallis test on the difference from averaged scores in 20 recently published papers, contempt (ac. = 0.40%, P = 0.0038) and scared (ac. = 25.23%, P = 0.0018) expressions did not pass the validation test. Both happy and sad expression scores were higher in forests than in promenades, but there was no difference in net positive response (happy minus sad) between locations. Women had a higher happy score but lower disgusted score in forests than in promenades. Men had a higher angry score in forests. We conclude that FireFACE can be used for analyzing facial expressions in Asian people within urban forests. Women are encouraged to visit urban forests rather than promenades to elicit more positive emotions.


Introduction
The purpose of planting, growing, and managing urban tree populations is to provide the ecological services of urban forests and to promote human well-being in urban green spaces [1]. The principle of the construction of sustainable urban forests is that a community should analysis for people in urban forests, although most expressions were subtle. It is necessary to further increase matching accuracy by validating facial expression scores. The change in setting along the urbanization gradient serves as a cue for people to respond with varied emotional expressions [27,28]. Perceptions of infrastructure and openness are determinants of emotional variation at different places [23,28]. When urban forests are considered an objective infrastructure, people will show particular expressions different from those in city settings [24,25]. In this study, FireFACE ver. 3.0 was used to test the difference in emotional perceptions between people in constructed urban forests and people in promenades. Only facial expressions that passed the validation test were used for geographical comparison. Based on the current success of subtle-expression analysis using FireFACE, we hypothesized that: (i) at least five out of the eight matching scores can meet the validation accuracy of commercial software, and (ii) people in urban forests will show a significant difference in emotions, not only in basic expressions (happy, sad, and neutral) but also in implicit ones (angry, surprised, scared, disgusted, and contempt).

Field data collection
Field data were collected from an urban forest park and a promenade in Shenyang City (41˚11'-42˚17' N, 122˚21'-123˚48' E). Shenyang is located in the transitional belt between the Changbai Mountains and the alluvial plain of the Liaohe River. Shenyang had 8.3 million permanent residents distributed across an area of 6.3 million km² built-up region in 2018. Shenyang is located in a semi-humid, temperate continental climate zone with annual average temperatures of 6.2-9.7˚C and extremes of -32.9˚C and 38.4˚C. Annual rainfall in Shenyang ranged between 600 and 800 mm with a historical maximum precipitation of 716.2 mm. Yearly frost-free periods lasted for 155-180 d. Climatic data spanned from 1951 to 2018 [29].
Shenyang Expo Garden (SEG) (41˚49' N, 123˚37' E) was chosen as the urban forest site and Shenyang Middle Street (SMS) (41˚48' N, 123˚25' E) as the promenade (Fig 1). SEG was established in February of 1959 with an open area of 211 ha, including 196 ha of green lands and 6.5 ha of watershed. Urban forests in SEG have been constructed since 1988. The daily number of visitors in SEG ranged between 0.3 million and 0.7 million, which is the highest record for all green spaces around the plains of the Liaohe River. SMS has a length of 579.3 m and a width of 11.7 m and is the longest promenade in mainland China. SMS has had a long history of use from 1625 up to now. SMS is rarely greened along the sidewalk and has areas fully occupied by groceries, markets, and plazas, which attract anywhere from 0.4 to 2 million daily visitors. Therefore, SEG and SMS are two typical infrastructures with contrasting green spaces and constructed landscapes.

Participants
Eight students from the College of Forestry, Shenyang Agricultural University were recruited as data collectors in this study. They were assembled as a group of volunteers on 19 June, 2020. All had been informed about the aim, process, and possible obstacles of the study. Only those who agreed to all the details of the study were recruited. The consent of participants has been documented in the S1 Raw data, where participants provided written informed consent. Candidates with smoking or alcohol consumption habits were excluded during recruitment. The eight students were randomly assigned to two groups with four in each group. One group investigated SEG and the other investigated SMS on the first day. On the following day, places were exchanged. Two students in each group took photos and the other two asked participants for consent to use the photos for scientific work. Photographers used an imx-586 camera (Sony NEC Optiarc Inc., Tokyo, Japan) with 4 million px, which was embedded in a cellphone.
The Ethics Committee of the Research Group of Urban Forests and Wetlands, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, provided approval for this study. On the weekend of the 20th and 21st of June, 2020, all visitors with typical Asian faces were photographed at SEG and SMS. All visitors whose faces were photographed and recorded for this study had been informed about the aim of the study and provided their oral consent. This procedure was approved by the Ethics Committee. Participants for photo collection supplied written informed consent. Visitors with faces characteristic of Chinese populations were subjects to be photographed [30]. Faces were easily identified through subjective recognition by students, but it was hard to distinguish between those of Chinese ethnicity and those of other East Asian countries. Therefore, we extended the standard to Asians in general. Both days were sunny and cloudless except for June 21st from 12:00 to 14:00. The temperature ranged between 21˚C and 32˚C in daytime with southwesterly winds at a velocity of Beaufort force 4 (24 km/h average speed). Photos at both sites were taken from 09:00 am to 05:00 pm (GMT+8) in accordance with the opening time of SEG. The route in SEG started at the entrance and ended at the exit with 4 repeated cycles of data collection along the sidewalks, while the route in SMS started at the northern entrance along the western side of the sidewalk in the morning and the eastern side in the afternoon to avoid building shadows.

Available photos with facial expressions
All photos that fulfilled the standard for further analysis needed to contain at least one visitor's face with the five facial organs (eyebrows, eyes, nose, mouth, and ears), no matter which angle the face was photographed from. Photos were labeled as potential candidates when only one ear could be seen, with the rest of the facial organs visible. All photos were cropped so that the subject's face was in the center and all organs were clearly exposed. A single photo with all attributes is best for facial expression analysis, but multiple photos with some attributes can be pieced together. A total of 2,886 photos met the criteria for further analysis. The approval for the ethics statement has been documented in the S1 Raw data.

Validation of matching accuracy
A dataset of facial photos was generated from all documented photos for validation. Groups of 20 photos were selected from the pool of both SEG and SMS, with each group demonstrating a particular emotion: angry, contempt, disgusted, happy, neutral, sad, scared, and surprised. A total of 160 photos were reviewed by six experts in the domain of urban ecology from four affiliations. The consent of experts for the dataset review has been documented in the S1 Raw data. The final edition of the dataset was revised according to suggestions from all experts, and selections received unanimous agreement.
Validation was determined by the 'matching accuracy' variable, which is the matching percentage of the number of photos that were correctly recognized by the instrument for the predicted emotional expressions of prototypical faces for each of the 20 images [31][32][33]. Therefore, matching accuracy can be regarded as the percentage for correct matching. It is possible that facial photos may contain multiple expressions in different emerging values. Only the expression with highest value was considered for matching [20].
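The matching rule described above can be illustrated with a minimal sketch. It assumes that each photo carries an expert label and a set of per-expression scores from the recognition software; the function names, the data layout, and the example scores are hypothetical and are not taken from FireFACE or from the study's dataset.

```python
from collections import defaultdict


def top_expression(scores):
    """Return the expression with the highest score for one photo."""
    return max(scores, key=scores.get)


def matching_accuracy(photos):
    """Compute the percentage of correctly matched photos per expression.

    photos: list of (expert_label, machine_scores) pairs, where
    machine_scores maps expression names to numeric scores.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expert_label, scores in photos:
        totals[expert_label] += 1
        # Only the expression with the highest score is considered a match.
        if top_expression(scores) == expert_label:
            hits[expert_label] += 1
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


# Hypothetical scores for three photos (illustration only).
example = [
    ("happy", {"happy": 0.70, "neutral": 0.20, "sad": 0.10}),
    ("happy", {"happy": 0.30, "neutral": 0.60, "sad": 0.10}),
    ("sad", {"happy": 0.05, "neutral": 0.25, "sad": 0.70}),
]
accuracy = matching_accuracy(example)
print(accuracy)
```

In the study's design, each expression group would contain 20 expert-selected photos, so the denominator for every expression would be 20.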
Given that the matching accuracy of validation for facial expressions varied widely depending on the choice of database, methodology, and instrument, we established a set of standards to screen for the validation of each of the eight expressions in our photos from SEG or SMS. A combination of the keywords 'validation' + 'accuracy' + 'facial expression' was checked in the search engine of Web of Science (Clarivate Analytics, Philadelphia, Pennsylvania, USA). The 20 most relevant studies with specific sources of data (from figures, text, or tables) were documented for data extraction. The screening criterion for each expression was taken as the mean of the 20 studies (Table 1) [16,30-48]. The specific process is shown in Fig 2. Only expressions that passed the criteria were used for further assessment.
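Because not every published study reports an accuracy for every expression, a literature benchmark of this kind has to be averaged over the studies that actually report a value. The sketch below shows one way to do this; the function name and the numbers are placeholders for illustration and are not the values in Table 1.

```python
from statistics import mean


def benchmark_means(studies):
    """Average reported accuracy (%) per expression across studies.

    studies: list of dicts mapping expression name to accuracy in percent.
    An expression missing from a dict means that study did not report it,
    so it is left out of that expression's average.
    """
    expressions = sorted({name for study in studies for name in study})
    return {
        name: mean(study[name] for study in studies if name in study)
        for name in expressions
    }


# Placeholder accuracies from two imaginary studies (illustration only).
placeholder_studies = [
    {"happy": 90.0, "sad": 70.0},
    {"happy": 80.0, "contempt": 30.0},
]
benchmark = benchmark_means(placeholder_studies)
print(benchmark)
```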

Assessment of variation and statistics
The eight students were invited again to create demographic categories. All photos were classified by gender (man vs woman) and age (senior [over 60 years-old], middle-aged [35-50 years-old], youth [15-25 years-old], toddler [0-5 years-old]). We categorized gender according to visually identifiable, biological characteristics. Age was categorized by empirical identification to visual standards of the median of each age group. Some age categorizations were ambiguous for identification between senior and middle ages or between young and toddler ages. All eight students assembled together, discussed, and voted for the final choice.
Each of the validated expressions, the dependent variable, was analyzed in response to the combined independent variables of gender (n = 2), age (n = 4), and location (n = 2). Only facial expressions that passed the validation with no difference from the average in other literature were used for next-step analysis.
Data were analyzed using SAS software (SAS Institute, Cary, NC, USA). A new parameter, termed positive response index (PRI) [23][24][25], was employed to evaluate the net difference between happy and sad expression scores. In validation, the Kruskal-Wallis test was repeatedly used to detect the difference between the critical standard for expressions from literature (n = 20) and expressions from our database. The basic probability of significance was taken at the 0.05 level and adjusted by the Bonferroni method to 0.00625 due to eight repeated comparisons. Note that scores for contempt, neutral, scared, and surprised expressions were not reported in all of the 20 documented studies used for validation (Table 1). Only the expressions that did not show a significant difference between our database and previous studies were used as parameters for further assessment. In variation assessment, all data were rank-transformed because non-normally distributed data would invalidate the use of a general linear model. Every expression was tested for its response to a three-way analysis of variance (ANOVA) across gender, age, and location. When a significant effect was found, data were ranked and compared by a one-way ANOVA with all combined factors together as the single source of variance (α = 0.05). Principal component analysis (PCA) was used to bridge the grouped tendency of correlation, as in several former studies [49,50].
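The two simplest quantities in this procedure, the Bonferroni-adjusted threshold and the PRI, can be written out as a short sketch. The analysis itself was run in SAS; the Python below is only an illustration, and the function names are hypothetical. The two P values are those reported in the abstract for contempt and scared expressions.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Divide the base significance level by the number of comparisons."""
    return alpha / n_comparisons


def positive_response_index(happy, sad):
    """PRI: the net difference between happy and sad expression scores."""
    return happy - sad


# Eight repeated comparisons, one per expression: 0.05 / 8 = 0.00625.
ALPHA_ADJUSTED = bonferroni_alpha(0.05, 8)

# P values reported in the abstract for the two expressions that failed.
reported_p = {"contempt": 0.0038, "scared": 0.0018}

# An expression fails validation when its accuracy differs significantly
# from the literature average, i.e. when P is below the adjusted threshold.
failed = [name for name, p in reported_p.items() if p < ALPHA_ADJUSTED]
print(ALPHA_ADJUSTED, failed)
```

Both reported P values fall below 0.00625, which is consistent with contempt and scared expressions being excluded from the variation assessment.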

Validation of recognition accuracy
As shown in Table 1, the selected 20 publications reporting facial expression accuracy did not supply data for all eight expressions. For example, only three out of the 20 publications evaluated accuracy for contempt expression and 15 out of 20 for neutral expression. The highest accuracy was found for happy expression scores, followed by neutral and surprised expressions. The lowest accuracy was found for contempt expression scores, and the rest were all above 50%. Results on FireFACE's accuracy in recognizing the eight facial expressions are shown in Table 2. The accuracy in recognizing contempt and scared expressions was significantly lower than the average from the 20 publications. Although averaged accuracy, when comparing our database to historical ones, was lower by 18-69% for the rest of the facial expressions, repeated Kruskal-Wallis tests did not indicate any significant difference because raw data showed a large variation in both our and previous databases. Therefore, we accept FireFACE's accuracy in recognizing facial expressions, but only for the six expressions of anger, disgust, happiness, sadness, neutral, and surprise.

Analysis of variance on facial expressions in Shenyang
As shown in Table 3, excluding scores for anger and PRI, scores for the rest of the facial expressions demonstrated significant responses to variation between forest and urban locations. All facial expression scores, except scores for surprise, were different by gender. Happy expression scores did not vary by visitor age, but responded significantly to the interaction between location and age. Happy expression scores also responded to the interaction between gender and age. In addition, angry, surprised, and disgusted expression scores showed a significant response to the interaction between location and gender as well.

Response of happy expression scores
Women had higher happy expression scores than men in both the urban forest and the promenade (Fig 3A). Women had the highest happy expression score in the forest locations. Youths and senior women had higher happy expression scores than toddlers and senior men (Fig 3B).
Forest visitors had higher happy expression scores than those in promenades by 13%. Women had higher happy expression scores than men by 8%. Young visitors had higher happy expression scores than toddlers by 12%. The happy expression scores of youths were not different from middle-aged or old-aged visitors.

Responses of angry, surprised, and disgusted expression scores
Men in forest areas had the highest angry expression score in response to the interaction between gender and location (Fig 4A). In contrast, women in promenades had the highest surprised and disgusted expression scores (Fig 4B and 4C). Although the surprised expression scores of men in promenades were lower than those of women, men's scores were higher than those of women in forest areas (Fig 4B). Angry expression scores did not differ between visitors in different locations (Table 3). Women generally had a higher angry expression score than men by 4%. Middle-aged visitors had higher angry scores than toddlers and senior citizens by 22% and 11%, respectively. Both surprised and disgusted expression scores were higher in the promenade than in forests. Both expression scores were also higher for youths than for toddlers and senior citizens. Women had higher disgusted expression scores than men by 9%.

Responses of neutral and sad expression scores
Both neutral and sad expression scores were higher in the urban forests than in cities (Fig 5A and 5B). Women had higher neutral and sad expression scores than men (Fig 5C and 5D). The neutral expression scores generally decreased as age increased, but older visitors had higher sad expression scores than toddlers and young visitors (Fig 5E and 5F).

Response of PRI
PRI did not show any significant response to the difference between forest and urban locations (Table 3). Men had higher PRI than women by 8%. Toddlers and youths had higher PRI than older visitors.

PCA analysis
The data pool of the six facial expressions and PRI combined indicates a variation in which the first PC explains 35.3% and the first two PCs cumulatively explain 53.93%. Because the cumulative explained variance exceeds 50%, further analysis on the synthesis of the first two PCs is supported. The sad expression score had an inverse relationship with the happy expression score and PRI (Fig 6). Both the disgusted and angry expression scores had an inverse relationship with the neutral expression score, but the relationship was weaker for anger than for disgust. The surprised expression score did not show any obvious relationship with any other emotional expressions.

Validation of recognizing accuracy
In our study, contempt and scared expression scores evaluated by FireFACE failed to pass the validation test. The scared expression was also recognized at a low accuracy for the faces of Chinese people even when using the three-dimensional paradigm technique [30]. The contempt emotion is difficult to detect, not only with FireFACE but also with other instruments with a wider range of users. This is further corroborated by the fact that only three out of the 20 recent relevant publications reported accuracy in recognizing the contempt expression. The matching accuracy in these three cases was around 30%, which was much lower than that for other expressions. In addition, two of these accuracies were achieved through artificial rating [31,48], and only one case published a matching accuracy that was given by an instrument, E-Prime [41]. Therefore, further improvement is needed for FireFACE to recognize the contempt expression because scores were too low for accurate determination. Machine learning techniques need improvement to recognize the exhibition of the contempt emotion in different groups of people.
Chinese people, characteristic of how Asians typically display emotions, show emotions implicitly rather than explicitly. For example, Chinese people's expression of fear forms more slowly than other negative expressions when depicting pain, even after sad emotions priming [51]. Chinese people's tendency to suppress expressions of fear by adopting a self-reserved character can be extended to the Korean population's Yonsei database [47]. Krumhuber et al. [32] compared human and machine (FACET software) validations across 14 datasets of dynamic facial expressions and only obtained a 34% accuracy in recognizing a scared expression. They further found that the scared expression was easily confused with the surprised expression. Matuszewski et al. [37] checked a dataset of facial expressions from 80 clinic patients and, again, found a low recognition of the scared expression and corroborated the easy confusion with the surprised expression. Matuszewski et al. [37] further compared different levels of the scared scores and indicated that patients directly expressed fear only when exposed to extreme pain. Otherwise, they would choose to reserve their expressions to avoid perception by others. Overall, more precisely distinguishing between scared and surprise expressions is suggested to increase the accuracy of recognition.
Our matching accuracy was generally lower than those found in previous studies. This can be explained by two reasons. First, our dataset that was used to train FireFACE contained subjective errors when artificially documenting different facial photos into any of the eight types of expressions. Second, our objects receiving the tests were collected by randomly photographing visitors and subjectively labeling the type of facial expressions; hence, the precision was limited in addition to subjective error. In contrast, both machine training and object testing in the 20 studies reviewed in this paper employed models with instructions to exhibit aimed expressions. Even so, our results on the matching accuracies for the six facial expressions of anger, disgust, happiness, sadness, neutral, and surprise were not statistically different from current ones. Therefore, FireFACE recognition for Asian facial expressions has an acceptable accuracy for six expressions. Thus, we can accept our first hypothesis.

Facial expression in constructed forests and promenade
It is unexpected that both happy and sad expression scores were higher for visitors in constructed forests than in urban promenades. However, there was no difference in net positive score between the two locations that differed in greenspace. Therefore, the experience in the forest did not result in expressions that were extremely different from experience in the promenade. Instead, forest visitors showed fewer disgusted expressions than those in the promenade. Disgust is a type of negative emotion that is less extreme than sadness. Therefore, our results concur with findings that more negative emotions can be reduced in forests than in built-up regions [10][11][12], because people in cities elicited more negative emotions in the form of disgust than those in forests. Wei et al. [24] also found that people in an urban forest park near the center of a city would show more negative expressions than those in forest parks in remote rural regions. Eigenvalues of the neutral scores were positively correlated with happy scores and both were inverse to sad scores, which suggests that people generally perceived net positive emotion scores in forests. Results of Wei et al. [25] concur with our finding in that people in forests showed more happiness than in an urban street and PRI. We can accept our second hypothesis.
Both men and women showed fewer happy expressions in urban locations than in forests, supporting the above-mentioned higher happy scores in the forest. Women also showed more positive emotions from a forest experience than in the city. These results concur with findings of Wei et al. [24], who also reported that women showed more positive expressions than men. Our results also revealed that women in forests showed lower surprised and disgusted expressions than those in the city. Negative expressions of men were not statistically different. These findings suggest that women are more sensitive and responsive than men, and thus showed more positive and negative expressions in forests and promenades, respectively. This corroborates a previous investigation which indicated that healthy men showed reduced emotion processing efficiency relative to women [52]. However, men reacted to forest locations by processing highly efficient angry expressions while women did not show any differences in angry expressions between urban and forest locations. This did not lead to a divergence of angry expressions between urban and forest locations; hence men's higher angry score in forests was the result of their psychological response. A review using a functional-evolutionary analysis indicated that it is more advantageous for men to show angry facial expressions as this signals dominance, averts aggression, and deters mate poaching; it is more advantageous for women to display happy facial expressions as this signals their willingness for childcare, tending, and befriending [53].
There was a negative relationship between neutral expressions and disgusted and surprised expressions. The neutral expression score was higher in forests than in promenades, in accordance with fewer disgusted and surprised expressions. From this, we concluded that people in forests are calmer than in the city, whereas the excited emotion was easily confused with, and represented as, a disgusted expression. It is the peaceful environment with abundant green color and moisture in the forests that caused the calm feeling expressed as neutral faces for visitors in forests [25].

Limits of the study
Because of low accuracy, we excluded ratings on contempt and scared expressions from the analysis. It may be hard to obtain the desired accuracy in recognizing contempt expressions until deep learning techniques can improve upon current limitations. A scared emotion is an important facial expression that frequently emerges in daily life. Some other instruments, such as E-Prime [44] and FaceReader [16], were reported to recognize fear in face reading at an accuracy as high as 80%, and deserve to be used in future studies to test the hit ratio for our database on Asian faces in urban forest parks. We turned to experts to help classify the typical facial expressions from our dataset. This is a useful approach to document different types of expressions. However, a more reasonable way is to classify the facial expressions by running the same photo through different software. The average score across results from different machines will provide a more reliable identification of emotions than human perception can. Furthermore, we did not discuss the interaction between gender and age on facial expressions because it does not match the theme of this study and our study results cannot support a deeper analysis of this relationship. Finally, only two locations were employed in this study, which is enough to support a frontier study on validation and assessment. Although we employed practical methodology with validation, our data may still suffer uncertainties from collection and unexpected errors. It is likely that more tests on datasets from more cities and locations would increase the accuracy of the results. The inaccuracy of matching human and machine-recognized scores will be reduced with the increase of object number. Future work is encouraged to build upon this study and include additional demographic data.

Conclusions
We compared the accuracy of FireFACE in matching recognized facial expressions with the accuracies reported for other instruments or artificial approaches. Facial expressions of angry, disgusted, happy, sad, neutral, and surprised emotions passed the validation test because their scores were within a level of statistical acceptance. However, contempt and scared expression scores were too low and were excluded from further analysis. We collected a total of 2,886 photos from visitors in constructed urban forests and in a promenade during summertime in Shenyang, Northeast China. There were no extreme differences in emotional expressions between forest and urban locations. A gender interaction with location showed that women exhibited more positive, but less negative, expressions in forests than in the promenade. Synthesizing these findings, we suggest that women visit constructed urban forest parks more often to elicit greater happiness and decreased negative emotions.