TECLA: A temperament and psychological type prediction framework from Twitter data

Temperament and psychological types can be defined as innate psychological characteristics associated with how we relate to the world, and they often influence our study and career choices. Furthermore, understanding these features helps us manage conflicts, develop leadership, improve teaching and many other skills. Temperament and psychological types are usually assigned by filling in specific questionnaires. However, it is possible to identify temperamental characteristics from a linguistic and behavioral analysis of a user’s social media data. Thus, machine-learning algorithms can be used to learn from a user’s social media data and infer his/her behavioral type. This paper initially provides a brief historical review of theories on temperament and then presents a survey of research aimed at predicting temperament and psychological types from social media data. It follows with the proposal of a framework to predict temperament and psychological types from a linguistic and behavioral analysis of Twitter data. The proposed framework infers temperament types following David Keirsey’s model, and psychological types based on the MBTI model. Various data modelling techniques and classifiers are used. The results showed that Random Forests with the LIWC technique can predict the Artisan temperament with 96.46% accuracy, the Guardian temperament with 92.19%, the Idealist with 78.68%, and the Rational temperament with 83.82%. The MBTI results also showed that Random Forests achieved the best performance, with an accuracy of 82.05% for the E/I pair, 88.38% for the S/N pair, 80.57% for the T/F pair, and 78.26% for the J/P pair.


Introduction
The study of psychological types or temperament leads us to an understanding of how a person relates to the world, either by the choices he makes or the way he absorbs information. For a long time, this theme has been researched and associated with well-being, lifestyle, employment, leadership, study, etc. One way of knowing a person's psychological type is by submitting him to questionnaires about his habits and choices, for example the MBTI (Myers-Briggs Type Indicator), which returns the psychological type of a person and is based on the studies of Jung and Myers-Briggs, and the Keirsey Temperament Sorter (KTS), which returns a profile associated with the temperament taxonomy created by David Keirsey. In general, such forms involve many questions and can be biased by the environment in which the respondent is. One way to reduce this bias would be to extract information passively, for example from the interactions (posts, likes, etc.) within social media, a service increasingly present in our daily lives. Social media can be seen as repositories of actions, behaviors and preferences that can be mapped onto psychological features. This occurs because users create content freely, and each person has a role in creating and sharing content [1]. Wiszniewski and Coyne [2] argue that whenever an individual interacts in a social sphere he paints before himself a mask of his identity that becomes even more pronounced as the individual needs to fill in a profile.
The goal of this research is to identify if there are behavioral patterns in the information shared in social media that can be mapped with high precision into the psychological types of the MBTI or the temperaments of Keirsey. This is, therefore, an exploratory paper on the ability of traditional text mining techniques and natural language processing to assist in the extraction and classification of patterns. From our literature review we expand the combinations of text pre-processing techniques and classification algorithms in relation to the papers presented here. We also mapped a database of MBTI results in the Artisan, Guardian, Idealist and Rational types in order to demonstrate the applicability also in the concept of temperament proposed by David Keirsey. In terms of application, it is useful for the preparation of marketing campaigns, more accurate hiring and promotion processes, turnover reduction, improvement of working environment quality, and many other applications related to human capital recruitment, selection and maintenance.
The goal of automating the prediction of temperament and psychological type is not to replace tests that are already validated but, instead, to provide a new tool, based on completely different and passively collected data, to support specialists. More specifically, this research uses Twitter data as a case study, mainly due to its flexibility in providing open data for collection and analysis. This paper presents a series of classifier evaluations that map the behavior of social media users, based on their Twitter posts, to temperament and psychological type, and summarizes the methodology in a structure called the Temperament Classification Framework (TECLA).
To assess the performance of the proposed framework we used a dataset from the literature containing over a million tweets from 1,500 users. Five classification algorithms were evaluated: Naïve Bayes (NB); Support Vector Machines (SVM); Decision Tree (J48); Multilayer Perceptron (MLP); and K-Nearest Neighbors (KNN). We compare these algorithms with Twitter features and three text representation schemes (MRC, LIWC, Apache OpenNLP) to find a suitable combination to determine the temperament and psychological types based on Twitter messages.
The paper is organized as follows. Section 2 provides a brief historical perspective on temperament theories, emphasizing the models proposed by Myers-Briggs and later Keirsey. Section 3 brings a brief review of the works in the literature dealing with the automatic classification of temperament and psychological types. The Temperament Classification Framework (TECLA) is presented in Section 4, and its performance is analyzed in Section 5. The paper is concluded in Section 6 with a general discussion and perspectives for future research.

A Brief historical perspective on temperament theories
Temperament characterizes a set of mental tendencies related to the way someone perceives, analyzes and makes daily decisions [23]. It represents the uniqueness and intensity of psychic affects and the dominant structure of mood and motivation in each individual. It is a form of reaction and sensitivity of a person to the world, which is revealed by his/her attitudes and behaviors, thus composing his/her organic basis [24]. This set of tendencies is innate, that is, it appears from birth, and is closely linked to biological or physiological determinants, which therefore change relatively little with development [25]. It can change and weaken throughout life, but it is never eliminated [24]. In the present research, temperament is defined as a set of innate and hereditary tendencies, responsible for how one perceives and interacts with the world.
The literature is filled with different terminologies to refer to temperament, based on the authors' view of such characteristics. For instance, Hippocrates called it the four humors; Carl Jung, Isabel Briggs Myers and Katharine Cook Briggs called it psychological types; and Galen and David Keirsey called it temperament [25,26,27]. We summarize temperament as a concept that converges to a set of innate characteristics of an individual, closely linked to biological or physiological determinants, which change relatively little during personal development [25].
We adopted the temperament model proposed by David Keirsey [27] and the psychological types introduced by Myers and Briggs [28]. Keirsey's model maps temperament into four types: artisan; guardian; idealist; and rational. This model is widely accepted for the understanding of professional trends, thus being potentially applicable in recruitment and selection processes, promising areas for social media data analysis. The Myers and Briggs' model has a set of 16 psychological types that were investigated and defined from the studies of Carl Jung on the psychological types.
Carl Gustav Jung proposed one of the most comprehensive and well-known temperament typologies in his book Psychological Types [29]. Jung analyzed temperament according to the workings of the mind. For him the mind is composed of an association between attitudes and functions. The attitudes (extroversion (E) and introversion (I)) would be the source of psychic energy, and the functions correspond to the way each individual acquires and processes information. Jung described four functions, two referring to obtaining information: sensation (S) and intuition (N); and two to decision-making: thought (T) and feeling (F) [25]. Then, Isabel Briggs Myers and Katharine Cook Briggs added a new pair of functions: judgment (J) and perception (P), which assess whether an individual's orientation to the outside world comes from a rational (judging) or irrational (perceiving) function. D. Keirsey [27] focused his research on the parallel between the Myers-Briggs taxonomy and the observation of temperament in action at the time of choices, behavior patterns, logic and consistency. He assumed that temperament associated with character forms the personality of the individual; the temperament being innate and the character emergent, developed by the interaction of temperament with the environment. Thus, the types are driven by aspirations and interests, which are what motivate us to live, act, move and play a role in society [27]. He noted that interests and aspirations are related more to perception (S-N), which is totally instinctive, than to decision-making (T-F), which is fully rational. Sensation (S) can combine with judgment (J) or perception (P), while intuition (N) can combine with feeling (F) or thinking (T). This observation resulted in four temperament types: Guardian (SJ); Artisan (SP); Idealist (NF); and Rational (NT) [23,27].
Although the characteristics of the Myers-Briggs model are binary (dichotomous), there are studies suggesting that a better representation would be continuous, with degrees of belonging to each function and attitude [30,31,32]. The inventory provided by Myers and Briggs aims to determine which of two functions or attitudes is preferred. The score indicates the tendency in the dichotomy. Results with low scores suggest a tension between the opposite poles rather than an indication of equal preference. However, it is unclear whether this tension represents equal strength in both poles, equal weakness in both, or equal neutrality in both [33]. We have adopted the binary standard because of our methodology for acquiring a dataset: a social media user discloses his or her MBTI result through the label (ENTJ, INFP, etc.), without any direct association with the score in each pair.

Automatic temperament classification: A literature review
Understanding social media users involves the analysis of their behaviors and interactions in social media, like their followers, mentions, messages, friends, photos, videos and comments. Understanding the users means being able to quantify and qualify how they present themselves [34]. The automatic recognition of temperament by means of computational techniques can help many business sectors and social researchers in understanding social media users. To date, there are only a few works related to the automatic temperament/psychological types classification in the literature, that is, Keirsey and MBTI labels. The main reason for the scarcity of works in this area is the difficulty in finding data for training classifiers. This section provides a review of the specific works found in the literature related with these two topics. Although there are many other works addressing the prediction of user characteristics from social media data, these are out of the scope of the present paper.
Luyckx and Daelemans [35] created the 200,000-word Personae corpus, consisting of 145 essays written in Dutch by undergraduate students about an Artificial Life documentary. In addition, the students submitted their MBTI profiles. In this work, the authors performed authorship attribution and personality prediction. The Memory-Based Shallow Parser (MBSP), n-grams and lexical features were used to extract the text features. For personality prediction, training with 10-fold cross-validation was performed with a method based on the K-NN algorithm, called TiMBL (memory-based learning). The experiments contained 84 binary classification tasks, each one for the MBTI dichotomy. The authors concluded that the predictions of introverted-extraverted and intuitive-sensing were fairly accurate, with average F-measures of 65.38% and 61.81%, respectively.
Komisin and Guinn [36] developed a system based on document classification to determine the psychological type according to the Myers-Briggs model. In their experiments, they used a Naïve Bayes classifier and Support Vector Machines. Data were collected as part of a postgraduate course in conflict management offered to undergraduate students, in which students took the MBTI and Best Possible Future Self (BPFS) tests. The BPFS contains self-descriptive elements, in present and future, in different contexts (e.g., work, school, family, finances). Data were collected over three semesters between 2010 and 2011. N-grams and Linguistic Inquiry and Word Count (LIWC) were used to represent the texts. The authors concluded that the Thinking/Feeling (T/F) dichotomy was predicted with precision and recall above 75% using Naïve Bayes with leave-one-out cross-validation. For the Intuitive/Sensing (N/S) dichotomy, the LIWC features resulted in less successful predictions. Introversion/Extroversion (I/E) and Judgement/Perception (J/P) did not achieve good precision and recall results.
Brinks and White [37] used various algorithms to detect the Myers-Briggs temperament types in tweets. The aim of the project was to develop a computer system capable of performing the function of a human analyst trained to apply the MBTI based on textual communication. The authors argued that although the results of the MBTI are confidential, many individuals openly reveal their type in a variety of ways and media, including Twitter. They showed that a search on Twitter for the term "#INFP" returned messages such as: "I just reread the Myers-Briggs description of my #INFP personality type. It's scary accurate". Thus, the data were collected from users who revealed their temperament profiles. A total of 6,358 Twitter users were observed and two hundred tweets were collected from each. In total, 960,715 tweets were analyzed. On average, the classifiers achieved a precision of 66.25%.
Plank and Hovy [38] collected 1.2 million tweets classified according to the Myers-Briggs system. To do so, the authors monitored messages that mentioned any of the 16 types together with the words Briggs or Myers. They thus obtained 1,500 different users and collected between 100 and 2,000 of the latest tweets of each, resulting in a corpus of 1.2 million tweets. The authors structured the messages using n-grams, in addition to gender information, tweet count, number of followers and number of followings, among other service features. One goal was to find out which attributes would be more characteristic in each dimension of the Myers-Briggs model. They used logistic regression to analyze the attributes in each dimension and concluded that the data can provide enough linguistic evidence to predict two dimensions reliably: Introversion/Extroversion and Feeling/Thinking. Verhoeven et al. [39] created an MBTI dataset in six languages (Dutch, German, French, Italian, Portuguese and Spanish) with 18,168 users and approximately 34 million tweets in total, distributed among the languages. They used the same methodology presented in [38] to collect the data. After the construction of the database, the authors performed classification tests to predict both gender and the Myers-Briggs personality dimensions (I/E, N/S, T/F and J/P). For the experiments the authors used 200 tweets per user and discarded users who had fewer than 200 messages. The authors used LinearSVC with standard parameters and n-grams. The classification was performed using 10-fold cross-validation. Considering all languages, the average F-measure was 67.87% for the I/E dimension, 73.01% for the N/S dimension, 58.45% for the T/F dimension, and 56.06% for the J/P dimension.
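The self-report strategy used in [37,38] to obtain labelled users can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the regular expressions, the function name self_reported_type and the rule of rejecting tweets that mention more than one type are hypothetical choices, not the procedure of the cited papers.

```python
import re
from typing import Optional

# Any of the 16 Myers-Briggs labels, built from the four letter pairs.
MBTI_TYPE = re.compile(r"\b([EI][NS][TF][JP])\b", re.IGNORECASE)
# Trigger words: the type must co-occur with "Myers" or "Briggs".
TRIGGER = re.compile(r"\b(myers|briggs)\b", re.IGNORECASE)


def self_reported_type(tweet: str) -> Optional[str]:
    """Return the MBTI label a tweet reports, or None.

    Assumption: a tweet is accepted only if it contains a trigger word
    and exactly one distinct type label.
    """
    if not TRIGGER.search(tweet):
        return None
    found = {label.upper() for label in MBTI_TYPE.findall(tweet)}
    return found.pop() if len(found) == 1 else None


example = ("I just reread the Myers-Briggs description of my "
           "#INFP personality type. It's scary accurate")
print(self_reported_type(example))
```

A user identified in this way would then have his or her recent tweets downloaded, and the reported label would serve as the class for that user.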
Lukito et al. [40] used Twitter as data source in Indonesia to predict personality and performed an MBTI psychological test with a user base of 97 people. Approximately 240,000 tweets were collected, an average of 2,500 tweets per Twitter user. They selected 15 users for testing and changed the training set size according to the experiment. The classification algorithm used was Naïve Bayes and the messages were structured by n-gram and POS-tag. The best result was achieved for the I/E dichotomy with 80% accuracy, the other dichotomies had the same 60% accuracy levels. The authors compared their results with the work proposed by [38], concluding that their proposal was superior for the pairs I/E and J/P, being the latter one of the most difficult to predict.
Lima & de Castro [1] developed a framework called TECLA to predict temperament types (Artisan, Guardian, Idealist, and Rational). The dataset, with approximately 29,200 tweets, was collected from Twitter. They used the LIWC text representation and Twitter user account information (such as tweet count, number of followers, and number of followings). The authors used the NB, KNN, SVM and Decision Tree algorithms to evaluate the proposal. The best accuracy results were for Artisan and Guardian, with 87.67% and 83.56%, respectively. The accuracy did not exceed 60.27% for the Idealist temperament and 58.90% for Rational. Table 1 shows a summary of the papers found, highlighting the classification algorithms, main features and performance measures used. It also presents the best results obtained based on the measures adopted. The results of [37,38,40], all based on tweets, suggest that the I/E and N/S pairs are more readily predicted. The F-measure in [36] was obtained from the Precision and Recall.

TECLA: The temperament classification framework
The Temperament Classification Framework (TECLA) was developed to apply text mining and natural language processing techniques to classify the temperament or psychological type of social media users. The goal is to provide a modular structure that allows us to use and evaluate different techniques quickly and intuitively. Furthermore, it follows the main steps of KDD (Knowledge Discovery in Databases) [41]. Hence, TECLA has the following modules: data acquisition module; message pre-processing module; temperament classification module; and evaluation module. Each of them is detailed in the following.

Data acquisition module
The data acquisition module is responsible for monitoring and receiving information from the users to be classified. For example, in the case of Twitter, it is necessary to obtain usage information, such as number of tweets, number of followers and followed, plus a set of tweets.

Message pre-processing module
The TECLA framework does not work directly with the tweets, but uses information extracted from them, called meta-attributes. Such information can be divided into two categories: grammatical and behavioral. The behavioral category extracts information about the social media use and is specific to each type of media. In the case of Twitter, it includes the number of tweets, number of followers, followed, favorites, number of listings and number of times the user was favorited. The grammar category considers information from LIWC [42,43], MRC [44], sTagger [45], or oNLP [46], extracted from the user's set of messages, similarly to what was proposed in the Polarity Analysis Framework introduced by the authors [14]. Therefore, the message pre-processing module is responsible for extracting meta-attributes from the data (usage and message corpus) and building a new base, called meta-base, from the extracted meta-attributes. The list of meta-attributes used in TECLA is summarized in Table 2.
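As a rough illustration of how a meta-base row could be assembled from behavioral and grammatical meta-attributes, consider the sketch below. It rests on several assumptions: LIWC, MRC and the other resources are not reproduced here, so a tiny hypothetical lexicon (TOY_LEXICON) stands in for them, and the dictionary keys and function names are illustrative rather than those of TECLA.

```python
import re
from collections import Counter

# Hypothetical stand-in for a LIWC-style lexicon: category -> word set.
TOY_LEXICON = {
    "pronoun_i": {"i", "me", "my", "mine"},
    "social": {"friend", "friends", "family", "we", "us"},
    "negation": {"no", "not", "never"},
}

# Assumed names for the behavioral (usage) counters of a Twitter account.
BEHAVIOR_KEYS = ("statuses", "followers", "followed", "favorites", "listed")


def grammatical_attributes(tweets):
    """Share of a user's tokens that fall in each lexicon category."""
    tokens = [tok for tweet in tweets
              for tok in re.findall(r"[a-z']+", tweet.lower())]
    counts = Counter()
    for tok in tokens:
        for category, words in TOY_LEXICON.items():
            if tok in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero for an empty corpus
    return {f"lex_{c}": counts[c] / total for c in TOY_LEXICON}


def meta_attributes(user):
    """Build one meta-base row from a user's usage counts and tweets."""
    row = {f"beh_{k}": user["usage"][k] for k in BEHAVIOR_KEYS}
    row.update(grammatical_attributes(user["tweets"]))
    return row
```

Each user contributes a single row, so the classifiers in the next module never see the raw tweets, only these aggregated values.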

Temperament classification module
The temperament classification module infers a temperament from the characteristics (meta-attributes) extracted in the previous module. In principle, this module is based on the application of a specific algorithm and can incorporate any kind of classifier. For the classification of the MBTI model, the system was designed with four classifiers (Fig 1) that receive the same data, but are trained to identify the opposing pairs of attitudes and functions. One classifier is trained to define the attitude (Extroversion/Introversion-E/I) and the others to define the functions (Intuition/Sensation-N/S, Thinking/Feeling-T/F, Judgment/Perception-J/P), all trained in isolation. These classifiers were called decomposing classifiers. Each of these classifiers is binary, so the answer is either Extroversion or Introversion, Intuition or Sensation, Thought or Feeling, Judgment or Perception. After training, the responses of the four classifiers define the psychological type, e.g., ISTJ or ENFP (Section 2). Therefore, the psychological type of each user was split into four binary classes. The user may be extroverted or introverted, intuitive or sensory, thinker or sentimental, and judgmental or perceptive, as illustrated in Fig 2. For the classification based on the Keirsey model a sequence of classifiers was constructed. As pointed out in [47], one of the strategies for working with multiclass classifiers is to combine classifiers generated for binary subproblems. With this, the problem is decomposed into binary problems. Separating the problem into binary classifiers can reduce the computational complexity involved, because the total problem is solved through simpler subtasks. In this case, the classifier has the same scheme shown in Fig 1; however, the first classifier that returns the result "1" determines the class of the object.
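The two combination schemes described above can be sketched as follows. This is a minimal illustration, not the TECLA implementation: each trained classifier is represented by a plain function returning 0 or 1, the 0/1 letter encoding in MBTI_PAIRS is an assumption, and the order of the Keirsey cascade is left to the caller because the text does not specify it.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Dict[str, float]
BinaryClassifier = Callable[[Features], int]  # returns 0 or 1

# Letter emitted by each pair for outputs 0 and 1 (encoding assumed).
MBTI_PAIRS = (("E", "I"), ("N", "S"), ("T", "F"), ("J", "P"))


def predict_mbti(x: Features,
                 classifiers: Sequence[BinaryClassifier]) -> str:
    """Concatenate the letters chosen by four independent classifiers."""
    if len(classifiers) != len(MBTI_PAIRS):
        raise ValueError("one classifier per MBTI pair is required")
    return "".join(pair[clf(x)]
                   for pair, clf in zip(MBTI_PAIRS, classifiers))


def predict_keirsey(
    x: Features,
    cascade: Sequence[Tuple[str, BinaryClassifier]],
) -> Optional[str]:
    """Return the temperament of the first classifier that outputs 1."""
    for temperament, clf in cascade:
        if clf(x) == 1:
            return temperament
    return None  # no classifier in the sequence claimed the user
```

In the cascade, the position of each temperament matters: a user accepted by an early classifier is never shown to the later ones, which is a design consequence of the "first result 1" rule.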

Evaluation module
To measure the performance of TECLA, we used the accuracy, the F-measure, which involves precision and recall, and the area under the ROC curve (AUC). Accuracy is the number of objects correctly classified over the total number of objects. The F-measure is the harmonic mean of precision and recall, where precision is the number of objects correctly assigned to a class over the total number of objects assigned to that class, and recall is the number of objects correctly classified over the total number of objects that really belong to that class [48,49].
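For concreteness, the measures above can be computed as in the following sketch for a binary problem. The function names are illustrative; the AUC is obtained here through its rank interpretation (the probability that a randomly chosen positive object receives a higher score than a randomly chosen negative one, with ties counted as one half), which is one standard way of computing the area under the ROC curve and is not necessarily the routine used in the experiments.

```python
def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one class."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Calling f_measure with positive=0 and positive=1 gives the per-class values reported later as F-measure (No) and F-measure (Yes).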

Performance assessment
The goal of this study is to design a temperament predictor that can infer the temperament of a given individual (social media user) based on what he writes in social media, instead of having him take a specific temperament test. This is an interesting and promising approach, because it allows one to know someone's temperament in spontaneous situations. To assess the performance of TECLA we used a recent, public dataset with over one million tweets.

Data acquisition
The database used comes from [38], in which the Twitter users are classified according to the psychological types of Myers-Briggs. The dataset contains 1.2 million tweets from 1,500 users. The number of tweets varies from one user to another. To be part of the database a user needs to have at least 100 tweets, and at most 2,000 tweets were downloaded per user. The available and useful attributes are: MBTI; gender; number of followers; number of tweets; number of favorites; and number of listings. Table 3 shows the user distribution for each psychological type of the Myers-Briggs taxonomy. Although considered rare, the intuitive types, especially INFP and INTJ, were the most common types within the collected database. The ratio between the elements of each of the E/I, N/S, T/F, and J/P pairs can be seen in Table 4. There is a clear imbalance in the N/S pair, which may be reflected in the classification results. However, for this study, no class balancing was performed because it would imply a reduction in the number of users in other pairs.
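Splitting each four-letter label into its four binary classes, and measuring the ratio within each pair, can be sketched as below. The helper names split_type and pair_ratios are hypothetical, and the short list used in the usage lines is made-up illustrative input, not data from the corpus.

```python
from collections import Counter

# Position of each pair in a type label, e.g. "INFP" -> I, N, F, P.
PAIRS = ("EI", "NS", "TF", "JP")


def split_type(mbti):
    """Split a label such as 'INFP' into one letter per pair."""
    mbti = mbti.upper()
    if len(mbti) != 4 or any(ch not in pair
                             for ch, pair in zip(mbti, PAIRS)):
        raise ValueError(f"not an MBTI type: {mbti!r}")
    return dict(zip(PAIRS, mbti))


def pair_ratios(types):
    """Proportion of users on each side of every pair."""
    letters = [split_type(t) for t in types]
    ratios = {}
    for pair in PAIRS:
        counts = Counter(row[pair] for row in letters)
        total = sum(counts.values())
        ratios[pair] = {letter: counts[letter] / total for letter in pair}
    return ratios


# Illustrative input only; these four labels are not taken from the dataset.
print(pair_ratios(["INFP", "INTJ", "ENFP", "ISTJ"]))
```

A strongly uneven ratio in one pair, as reported for N/S, means that a classifier can reach a high accuracy on that pair simply by favoring the majority letter, which is why the per-class F-measure is also reported.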
To evaluate the Keirsey model, each MBTI type was mapped into the corresponding Keirsey temperament (Artisan, Guardian, Idealist or Rational). Table 5 describes the number of users by temperament. The Artisan and Guardian classes have the smallest number of users, because of the predominance of intuitive types (Idealists and Rationals) in the database.
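The mapping follows the rule given in Section 2: Guardian (SJ), Artisan (SP), Idealist (NF) and Rational (NT). A minimal sketch is shown below; the function name keirsey_temperament is an illustrative choice.

```python
def keirsey_temperament(mbti):
    """Map a four-letter MBTI type to its Keirsey temperament.

    Sensing types are split by the J/P letter (SJ, SP) and intuitive
    types by the T/F letter (NF, NT).
    """
    mbti = mbti.upper()
    if len(mbti) != 4 or any(ch not in pair for ch, pair
                             in zip(mbti, ("EI", "NS", "TF", "JP"))):
        raise ValueError(f"not an MBTI type: {mbti!r}")
    _, perception, decision, orientation = mbti
    if perception == "S":
        return "Guardian" if orientation == "J" else "Artisan"
    return "Idealist" if decision == "F" else "Rational"


for label in ("ISTJ", "ESFP", "INFP", "ENTJ"):
    print(label, "->", keirsey_temperament(label))
```

Note that the E/I letter plays no role in the mapping, so each temperament groups four of the sixteen MBTI types.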

Pre-processing
The attributes provided by the Plank dataset are called behavior attributes, in reference to the behavior of users in the microblog. Table 6 shows the average value of the behavior attributes for each temperament (followers, statuses, favorites, listed and gender). In all temperaments/psychological types the predominant gender was female. In the N/S pair we emphasize the fact that the sensory types have, on average, more followers and tweet more frequently, although this is the function with fewer representatives in the database (only 22.53%). The difference between Guardians and Artisans, both sensory, is greatest in the number of followers and the listed count. On the other hand, among the intuitive types there is a greater balance in the way of using the microblog.

Experimental results
All tests were performed with 10 runs of k-fold cross-validation (k = 5). The results are presented first for the Keirsey model and then for the MBTI model. In both cases, the aim is to show the ability of the classifiers to infer each of the classes, that is, whether from the input data it is possible to identify an Artisan, Guardian, Idealist or Rational person or, based on the MBTI model, the pairs E/I, N/S, T/F, J/P. In all cases the measures adopted to evaluate the classifiers were the accuracy per class (percentage of correct classification per class, ACC), the F-measure (F), which is the harmonic mean between precision and recall, as discussed previously, and the area under the curve (AUC). The AUC is a summary of the ROC curve (sensitivity versus 1 - specificity), and high AUC levels indicate that, on average, the true positive rate is higher than the false positive rate. The first set of results is for the Keirsey model, with one classifier for each temperament: Artisan; Guardian; Idealist; and Rational. The goal is to answer the following question: "Is it possible to infer the user's temperament based on his posts?". Our tests began with an attribute analysis to understand the best possible configuration. We ranked the attributes by importance based on information gain, performed attribute selection tests, and analyzed the best results by observing the accuracy and F-measure. Note that our technique uses separate binary classifiers for each temperament, so the results are divided into ACC, F-measure for class 0 ("No", which means the user does not have the temperament), F-measure for class 1 ("Yes", which means the user has the temperament), and AUC of the positive result ("Yes").
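The attribute ranking by information gain can be illustrated with the sketch below. It makes a simplifying assumption that is not stated in the text: each numeric attribute is discretized by a single split at its median before the gain is computed. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Gain of one numeric attribute after a median split (assumed)."""
    median = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < median]
    right = [y for v, y in zip(values, labels) if v >= median]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Sort attribute names by decreasing information gain."""
    gains = {name: information_gain([r[name] for r in rows], labels)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

Attribute selection then amounts to keeping the top-ranked attributes and repeating the cross-validated evaluation for each subset size, which is how the attribute counts quoted in the results (for example, 25 LIWC attributes) would arise.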
Our first analysis refers only to the Twitter attributes and Table 7 below summarizes these results. It is possible to note that, in general, there is a tendency for the classifiers to choose the "No" class, which is the predominant class. Thus, the F-measure for the "Yes" is low. By comparing the ACC and AUC the best result was achieved with the Random Forest using the 5 attributes (total number of tweets posted by the user so far, number of followers, number of followed, number of times the user was listed, and number of times the user was favorited).
We proceeded to test scenarios in which the Twitter attributes would not be available, but only the text of the tweets. For this case, we tested three text structuring techniques separately, as mentioned in the pre-processing section: MRC, LIWC and oNLP. Note that the performance behaved as in the previous evaluation, with a low F-measure for the "Yes" class, indicating a bias of the classifiers towards one of the classes. Table 8 presents the best performance with 9 attributes, obtained with the Random Forest (87.48%±0.25%) and Bagging (83.23%±0.42%) algorithms. The combination SVM + MRC features was not successful, because the algorithm could not identify patterns for the class Yes (0.00±0.00%). LIWC showed a better performance, which could be noticed in the AUC measure. By analyzing the results we observed that the Random Forest performance is usually superior; the best accuracy was 87.99%±0.29% with 25 attributes (Table 9). In general, there was no significant change in accuracy, and the choice of these attributes was due to the F-measure (Yes). However, there was substantial improvement in the AUC value. Thus, the best performance was obtained by the Random Forest with 25 attributes: 91.14%±0.13% for the F-measure (No); and 70.52%±0.81% for the F-measure (Yes).
Similarly to the Twitter and MRC results, for oNLP the Bagging and Random Forest (and also J48) algorithms achieved an AUC above 70%, indicating a better identification of "Yes". Considering the other measures, the Random Forest algorithm again had the best performance, with 24 oNLP attributes (Table 10). In this case, the average accuracy was 87.60%±0.33%, the average F-measure (No) was 90.95%±0.31%, the average F-measure (Yes) was 69.68%±0.63% and the AUC was 86.12%±0.76%.
Because of its superior performance for all text representation mechanisms, the Random Forest algorithm is detailed in Table 11. Across the different text representations, the best classification result occurred for the Artisan temperament, with 96.46%±0.27% accuracy for LIWC (25 attributes). These results suggest that the system is more precise at finding what is not Artisan: for all feature sets the average F-measure (No) was high, at 97.60%±0.24% for Twitter, 98.09%±0.22% for MRC, 98.11%±0.14% for LIWC and 98.08%±0.13% for oNLP. This can also be observed for the Guardian, with an F-measure (No) of 94.66%±0.30% for Twitter, 95.42%±0.24% for MRC, 95.61%±0.24% for LIWC and 95.51%±0.25% for oNLP. For the Idealist, the classifier was able to better discriminate the two classes. In the best scenario (LIWC, 25 features) the F-measure (No) was 81.47%±0.50% and the F-measure (Yes) 74.89%±0.87%.

Results for the MBTI model
The second set of results presented here is for the decomposed classifiers of the MBTI model. Each classifier is responsible for one of the MBTI pairs. In all cases, the same classification algorithm is run for all classifiers. The goal is to answer the following question: Is it possible to identify the user's psychological attitudes and functions based on what he/she writes in social media? If it is possible, then a deeper understanding of the virtual persona can be achieved by analyzing social media data. As in our previous analysis for the Keirsey model prediction, we also performed an attribute analysis for the MBTI model prediction. Table 12 summarizes the results of the evaluation of the Twitter attributes.
Both F-measure (No) and F-measure (Yes) are below 70%, except for Bagging (71.97%±0.34%) and Random Forest (79.29%±0.23%) with 5 attributes, that is, with all the original Twitter attributes. The results for the LIWC attribute analysis, presented in Table 14, show a better performance (AUC) for 24-28 attributes, with the best result for 27 attributes associated with the Bagging (82.85±0.43), J48 (77.04±0.43) and Random Forest (87.79±0.56) algorithms. This suggests that the LIWC attributes may characterize the problem better than the previous attribute sets, also for the Naïve Bayes, AdaBoost and SVM algorithms.
As with LIWC, the oNLP results (Table 15) were satisfactory for Bagging, J48 and Random Forest, mainly for 22 attributes. The highest accuracy was 82.15%±0.14% for the Random Forest. The Naïve Bayes classifier had the worst accuracy, with only 60.69%±0.13%. AdaBoost and SVM achieved, respectively, 65.02%±0.16% and 64.75%±0.15% accuracy. Comparing the AUC results, the SVM had the worst performance. Table 16 details the Random Forest results, due to the overall superior performance of this algorithm. For the studied database it was possible to predict the E/I pair with an average accuracy of 82.05%±0.65% for the oNLP features. The F-measure (No) indicates the first letter in the pair. In the E/I case, the Random Forest with oNLP features had an F-measure of 87.12%±0.44% for Extroversion and 70.38%±1.26% for Introversion. The S/N pair achieved 88.38%±0.68% accuracy, also with oNLP. The F-measure was 92.66%±0.41% for N (Intuition) and 72.13%±1.94% for S (Sensation). In T/F the accuracy was 80.57%±0.80% for LIWC with 27 attributes, with an F-measure of 84.49%±0.63% for F (Feeling) and 74.01%±1.15% for T (Thinking). The J/P pair had the lowest accuracy, 78.26%±0.79% (LIWC with 27 attributes). The performance was better for J (Judging), with an F-measure of 81.49%±0.66%. As in the Keirsey type prediction, the AUC indicates a good true positive rate in relation to the false positive rate.

Comparing with results from the literature
Finally, in Table 17 we compare our Keirsey and MBTI results with the literature. Compared with the results from [38], our performance was superior for all MBTI pairs. We have also been more effective in the I/E and N/S pairs; however, the use of Random Forest combined with other forms of text representation has promoted better performance for the T/F and J/P pairs. For the classification of Keirsey temperaments, we compared our results with those obtained previously, in the first steps toward building this tool. In this case, we have also achieved an increase in performance. The F-measure in [36] was obtained from the Precision and Recall.
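An F-measure can be derived from a reported Precision and Recall as their weighted harmonic mean. The Python sketch below assumes the balanced case (beta = 1), since the text does not state which weighting was applied; the function name and the example values are hypothetical and are not figures from [36].

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the F-measure as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values, for illustration only.
example = f_measure(0.8, 0.6)
```

Because the harmonic mean is dominated by the smaller of the two values, an F-measure computed this way is never higher than the arithmetic mean of Precision and Recall.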

Discussion and future trends
The purpose of this paper was threefold: to provide a brief historical review of temperament theories; to present a brief survey of machine-learning research on temperament and psychological type prediction; and to investigate the prediction of temperament and psychological types based on data produced by social media users. In this latter contribution, the hypothesis this work tries to validate is whether it is possible to predict the temperament of a virtual persona without using a questionnaire, that is, to use artificial intelligence techniques to understand and classify the profile of users based on what they share and how they behave in social media. The importance of this tool lies in trying to lessen a possible bias introduced by questionnaires, when a user knows he is being explicitly evaluated. Based on the literature review, we sought to extend our previous results [14], both in text processing techniques and in algorithms for building predictive models. With this, we presented a set of results based on the combination of different text structuring techniques and classification algorithms. Building on the proposals presented in [37] and [38] for Twitter data, we aimed to identify the ability of the models to estimate the temperament typology proposed in [27]. We believe in the importance of understanding the behavior of users on social media, and we also believe that information such as psychological types can help in this regard. This information can serve as input to many profiling systems in various areas. Here, we conducted an exploratory study aimed at understanding the potential of machine learning techniques for temperament identification.
We would like to expand this research to new databases, both from Twitter and from other social media, in order to explore the potential of the framework. We would also like to present case studies applying TECLA to different groups of users, and thus answer questions such as: What are the profiles of people who talk about the same subject? What is the profile of people who watch a TV show, movie or series? What is the profile of people who consume a specific product or service?
Finally, further research will also assess the computational scalability of TECLA when using High Performance Computing (HPC) platforms. We performed some preliminary experiments in this direction with one of the best scenarios obtained in this paper (i.e., Random Forest for the Keirsey model and Twitter 5 features) using an Intel Xeon Platinum 8160 processor @ 2.10 GHz, each one with 24 physical cores (48 logical) and 33 MB of cache memory, and 190 GB of RAM, and obtained a significant gain in performance. As social media data arrives continually, a comprehensive set of experiments will be performed to assess the scalability of TECLA on HPC platforms.