Analysis of English free association network reveals mechanisms of efficient solution of Remote Association Tests

We study correlations between the structure of the free association network of the English language and the solutions of psycholinguistic Remote Association Tests (RATs). We show that the average hardness of an individual RAT is largely determined by the relative positions of the test words (stimuli and response) on the free association network. We argue that solving a RAT can be interpreted as a first-passage search problem on a network whose vertices are words and whose links are associations between words. We propose several heuristic search algorithms and demonstrate that in easy RATs (those solved within 15 seconds by more than 64% of test subjects) the solution is governed by "strong" network links (i.e. strong associations) directly connecting stimuli and response, so that the efficient strategy consists in activating such strong links. In turn, the most efficient mechanism of solving medium and hard RATs consists in preferentially following sequences of "moderately weak" associations.


I. INTRODUCTION
Representation of a large number of interacting agents by a network is one of the most powerful ways of treating various types of data in biological, technological, and social systems [1,2], as well as in cognitive processes. A network is a set of nodes (the elementary indivisible units of a distributed system) and binary relations (links) between them. There are plenty of ways to build networks in cognitive science, with setups tailored to problem-specific conditions. Historically, semantic networks were used to represent "knowledge" by establishing directed or undirected semantic relations (graph links) between "concepts" (graph nodes) [3]. Such networks are useful for studying the "intermediate" (or "mesoscopic") scale of organization in human cognition. However, in attempts to model cognitive processes, it has been realized that the "microscale" network organization, i.e. the structure of the detailed concept-to-concept connections, is very important.
Advances in graph-theoretic methods of studying cognitive functions are inextricably linked with the pioneering works [4-6]. Since then, the number of investigations in the field has grown rapidly; in particular, a lot of attention has been paid to the study of large-scale semantic networks. In such networks, words (e.g., nouns) are nodes connected by links indicating semantic relations between them. There is a variety of possible measures of semantic proximity: one can connect the nearest neighboring words in sentences (so-called syntactic networks), or one can connect words according to standard linguistic relations between them (synonymy, hyper- or hyponymy, etc.). Finally, one can assemble networks of words based on various psycholinguistic experimental data.
Large-scale semantic networks possess a specific pattern of connectivity, presumably imposed by the growth processes by which these networks are formed. Typically, such networks demonstrate a power-law distribution of links across the nodes: most network nodes have a low vertex degree, while a few nodes of very high degree play the role of hubs.
An important and fast-growing area in the field of linguistic networks is related to so-called "word embedding" [7]. Word embedding is a set of language-modeling techniques based on mapping words to vectors of numbers, usually in a multidimensional Euclidean space. The semantic similarity of two words is defined as a scalar product of the corresponding vectors. Such a procedure results in the construction of a complete weighted network of words (each pair of words is connected by a weighted edge, whose weight is the semantic proximity), in which the majority of edges have very small weights. Removing all links with weights less than a preset threshold results in a network with nontrivial topological properties. It might be very productive to generalize the word-embedding ideology to non-Euclidean spaces, in particular to spaces of constant negative curvature, which are natural target spaces for scale-free networks [8,9]. A detailed discussion of these questions goes beyond the scope of the current work and will be published separately [10]. To complete this short overview of theoretical approaches, let us mention that several recent attempts have been made to treat semantic networks as multiplexes (i.e. multilayer networks). Such approaches seem to give deeper insight into the formation of the mental lexicon [11] and early word acquisition [12].
One particularly interesting class of semantic networks is the network of free associations [13-17], obtained in the following real experiment. Participants ("test subjects") receive words ("stimuli") and are asked to return, for each stimulus, the first word that comes to their mind in response. The responses of many test subjects are aggregated, and a directed network of weighted links between the words (stimuli and responses), reflecting the frequencies of the answers, is constructed. The study of these networks has a long history [13,14]. In what follows, we use the free association network obtained in the "English Small World of Words" project (SWOW-EN) [15]. The online data collection procedure allowed the authors of [15] to aggregate data for more than 12 000 stimulus words. The data was collected in 2011-2018 and consists of responses of more than 90 000 test subjects. As a result, the network contains many weak (i.e. rare) associations, which were not registered in earlier experiments.
In our work we use the free association network to propose heuristic mechanisms of solving the so-called Remote Associates Test (RAT). The RAT was invented by S. Mednick in 1962 [21] and has since been repeatedly used in cognitive neuroscience and psychology [18-20] to study insight, problem solving, and creative thinking. The RAT test subjects are given sets of three stimulus words (e.g. "surprise", "line", "birthday") and are requested to find a fourth word, the "response", which is simultaneously associatively related to all three stimuli (in our example it is the word "party").
Possible mechanisms of RAT solving have been extensively discussed in the literature. In [22] the authors analyzed sequences of guesses which came to mind during RAT solving. They measured the similarity between guesses, stimuli, and responses using Latent Semantic Analysis [22] and concluded that there are two systematic strategies of solving multiply constrained problems: in the first strategy, the generation of guesses is primarily based on just one of the three stimuli, while in the second, the test subject makes new guesses based partially on his/her previous guesses. In [23] a Metropolis-Hastings search model has been used, with transition probabilities based on geodesic (shortest) distances along the network from the stimuli to the response; the authors underline the importance of association strength between words in the process of RAT solving. The work [24] is devoted to the design, implementation and analysis of a computational solver which can answer RAT queries in a cognitively inspired manner: its authors developed an artificial cognitive system based on a unified framework of knowledge organization and processing, taking into account the associative links between the concepts in the knowledge base and the frequency of their appearance. In a later work [25], it has been shown that the association strength and the number of associations have important separate effects on the performance of RAT solving. Finally, a spiking neural network model is proposed in [26], where RAT solving is simulated as a superposition of two cognitive processes: the first generates potential responses, while the second filters them.
In our study we address two main questions. First, we study the connection between the average hardness of a particular RAT and the positions of its stimuli and response on the free association network. We show that the RAT hardness can be predicted reasonably well by examining the network structure. Second, we discuss possible heuristic cognitive search algorithms for solving RATs, and study ways of optimizing them.
The paper is organized as follows. In Section II we briefly characterize the datasets used: the structural properties of the free association network, and the quantitative definition of RAT hardness. In Section III we study correlations between the RAT hardness and the relative positions of stimuli and response on the free association network. We show that the RAT hardness significantly correlates with the aggregated weight of the directed bonds stimuli → response, as well as with the aggregated weight of multi-step chains of associations. On the other hand, there is essentially no correlation between the RAT hardness and the weights of the reverse (response → stimuli) bonds. We argue that such an asymmetry can be interpreted as a sign that solving a RAT is a first-passage problem: the correct response is easy to identify as soon as one finds it along a directed path on the network. In Section IV we study various ways of enhancing the probability of a fast solution of a RAT. We argue that search strategies with resetting seem to be preferable both for the nearest-neighbor search and for the search by infinitely long chains of associations. We discuss in detail the role of weak associations in the search. In particular, we show that the best strategy for solving easy RATs implies removing all weak associations and following only the strong ones. In turn, solving medium and especially hard RATs in the same way is often impossible. Instead, the probability of finding a solution of hard RATs gets enhanced when the search runs preferentially along moderately weak bonds (associations). In the Discussion we summarize the obtained results and propose possible directions of further investigation.

II. DATA ANALYSIS
We use the free association network described in [15] and known as the "English Small World of Words" project (SWOW-EN). It is a weighted directed network with N = 12 217 stimulus words. A brief summary of the network's topological characteristics is presented in Table I. In Fig. 1a we show the distributions of in- and out-degrees of the network. The out-degree distribution (blue) is Poissonian; its average is controlled by the experimental setup: the bigger the number of test subjects per stimulus word, the larger the average degree. In turn, the in-degree distribution ρ(k) (orange), where k is the number of incoming bonds of a vertex, has a power-law tail, i.e. ρ(k) ∼ k^{−γ} with γ ≈ 3. Interestingly, such a shape of the in-degree distribution seems to be quite universal: similar values of γ are observed in other experiments with English networks [13,14], and also for Russian free association networks [16,17]. In Fig. 1b we show the cumulative distribution P(w) of the link weights w of SWOW-EN (all weights lie within the interval [0.01, 1.00]), and in Fig. 1c we depict the size of the largest strongly connected component of SWOW-EN as a function of the link weight cutoff, w_cut. Note that the strongly connected component of SWOW-EN collapses at a link cutoff of about w*_cut = 0.08, which corresponds to the removal of 95% of the network edges. Above w*_cut, SWOW-EN no longer percolates and splits into disjoint components.

We use the data on the hardness of RAT solutions provided in [27]. We restrict ourselves to the 138 problems (combinations of three stimuli and a response), out of the 144 studied in [27], for which all four words are present in the SWOW-EN network and have non-zero out-degree. The hardness of a RAT is quantitatively characterized by the value H (0 ≤ H ≤ 1), measuring the fraction of test subjects who correctly solved it within 15 seconds (for each of the 138 problems under consideration).
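The collapse of the strongly connected component under a weight cutoff is easy to reproduce programmatically. Below is a minimal sketch (not the authors' code): Kosaraju's algorithm applied to a hypothetical five-word toy graph; the real analysis would load the published SWOW-EN edge list instead.

```python
from collections import defaultdict

def largest_scc_size(edges, w_cut):
    """Size of the largest strongly connected component after removing
    every directed edge with weight < w_cut (Kosaraju's algorithm)."""
    g, gr = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v, w in edges:
        nodes.update((u, v))
        if w >= w_cut:
            g[u].append(v)
            gr[v].append(u)

    seen = set()

    def dfs(adj, start, out):
        # Iterative DFS that appends vertices in post-order.
        stack = [start]
        while stack:
            u = stack[-1]
            seen.add(u)
            nxt = next((v for v in adj[u] if v not in seen), None)
            if nxt is None:
                stack.pop()
                out.append(u)
            else:
                stack.append(nxt)

    order = []
    for n in nodes:
        if n not in seen:
            dfs(g, n, order)

    seen = set()
    best = 0
    for n in reversed(order):          # decreasing finish time
        if n not in seen:
            component = []
            dfs(gr, n, component)      # one DFS tree = one SCC
            best = max(best, len(component))
    return best

# Toy graph: a strong 3-cycle a->b->c->a plus weak links c->d->a.
toy_edges = [("a", "b", 0.5), ("b", "c", 0.5), ("c", "a", 0.5),
             ("c", "d", 0.05), ("d", "a", 0.05)]
```

Raising the cutoff past the weak-link weights shrinks the largest component from all four words to the strong 3-cycle, mirroring the collapse seen in Fig. 1c.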
Additionally, we divide the problems into three broad categories: "easy", "medium", and "hard". A problem is easy if it has been solved within 15 seconds by more than 64% of the test subjects (0.64 ≤ H ≤ 1), medium if it was solved by 32%-64% of the test subjects (0.32 ≤ H < 0.64), and hard otherwise (0 ≤ H < 0.32). There are 15 easy, 38 medium, and 85 hard problems. For completeness, we have reproduced in Appendix A the dataset of 138 problems used here (out of the 144 presented in [27]).
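The banding above translates into a one-line classifier; a minimal sketch (the treatment of the exact boundary values 0.32 and 0.64 is our convention, since the quoted brackets overlap at the edges):

```python
def hardness_category(H):
    """Map the empirical hardness H (fraction of test subjects who
    solved the RAT within 15 seconds) onto the three bands used here."""
    if H >= 0.64:
        return "easy"
    if H >= 0.32:
        return "medium"
    return "hard"
```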

III. CORRELATION BETWEEN AVERAGE RAT HARDNESS AND WEIGHTS OF EDGES IN A FREE ASSOCIATION NETWORK
The strength of an association between two given words (vertices) in a free association network, G, is described quantitatively by the weight of the corresponding directed link. The whole set of weights is encoded in the weighted adjacency matrix W(G). The element w_{ij} of the matrix W is equal to the strength of the directed association i → j if such an association exists, and 0 otherwise (i.e. if the directed link i → j is absent).
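As an illustration of this encoding, one can build W from raw stimulus → response counts; a minimal sketch with hypothetical words and counts (SWOW-EN provides analogous per-stimulus response frequencies):

```python
import numpy as np

# Hypothetical stimulus -> response counts (illustrative words only;
# SWOW-EN stores analogous per-stimulus response frequencies).
counts = {
    ("birthday", "party"): 40, ("birthday", "cake"): 60,
    ("line", "queue"): 70, ("line", "party"): 30,
}
words = sorted({w for pair in counts for w in pair})
idx = {w: i for i, w in enumerate(words)}

W = np.zeros((len(words), len(words)))
for (s, r), c in counts.items():
    W[idx[s], idx[r]] = c

# Normalize each row, so that w_ij is the fraction of test subjects
# who responded j to stimulus i; rows without responses stay zero.
row_sums = W.sum(axis=1, keepdims=True)
W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
```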
Our main heuristic assumption is that to solve a RAT problem, a test subject performs a search on the free association network, which reflects a search process happening in memory. Such a search process might imply, for example, exploring all direct associations of all three stimulus words, or following chains of consecutively extracted associations starting from the stimulus words (such a chain may or may not be limited in length). More sophisticated search strategies can be used as well: for example, one may follow paths on the network with a preference for weak associations, or one may use some synergy between the stimulus words (e.g. choosing words with strong associations to two or more stimuli), etc. Finally, there exists the possibility that the solution is found but not recognized as the right one.
In order to test the basic hypothesis that the RAT solution is governed by some search process on the free association network, we study correlations between the RAT hardness and the probability of finding a solution under various simple search strategies. We begin with the simplest possible one-step strategy: (i) choose one of the stimulus words at random, (ii) jump to one of its neighbors along a directed link of the free association network (the jump probability is given by the link weight), (iii) check whether the reached word is the correct response. The probability of finding the correct answer in such a strategy is, obviously,

p_{λ=0}(α) = (1/3) Σ_{j=1}^{3} w_{s_α^j, r_α},    (1)

where α enumerates the different RAT problems (1 ≤ α ≤ 138), the indices s_α^1, s_α^2, s_α^3 designate the three stimulus words (vertices of the network) of problem α, and the correct response is the network vertex with index r_α. Thus, w_{s_α^j, r_α} is the weight of the directed bond s_α^j → r_α, where j = 1, 2, 3.

Another simple hypothetical model is as follows. Consider the search on the network as a sequence (Markov chain) of associations: one generates a random walk trajectory with jump probabilities equal to w_{ij}, starting from one of the stimulus words. In that case, the probability π_{s,r} of reaching the response word r from the stimulus s is

π_{s,r} = w_{s,r} + Σ_{k≠r} w_{s,k} w_{k,r} + Σ_{k,l≠r} w_{s,k} w_{k,l} w_{l,r} + … = [Σ_{n=1}^{∞} (W^-)^n]_{s,r},    (2)

where W^- is the adjacency matrix W in which all matrix elements w_{r,i} are set to zero, which guarantees that only the first passage of the response word is counted (the inverse elements w_{i,r} are kept in the matrix W^-). If the starting stimulus word is chosen at random, the resulting probability of solving the task by the proposed mechanism reads

p_{λ=1}(α) = (1/3) Σ_{j=1}^{3} π_{s_α^j, r_α}.    (3)

Every search is restricted in time. Therefore, the Markov chain representing the search on the network should be finite.
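The first-passage construction can be checked numerically: zeroing the out-links of the response makes it absorbing, and the series over powers of W^- sums to a matrix inverse. A minimal sketch (not the authors' code) on a hypothetical 3-word network:

```python
import numpy as np

def first_passage_prob(W, s, r):
    """Probability that the associative random walk started at s ever
    reaches r.  Zeroing row r of W makes the response word absorbing,
    so every path to r is counted exactly once, and the series
      sum_{n>=1} (W_minus)^n = (I - W_minus)^{-1} - I
    converges whenever every trajectory eventually reaches r or dies."""
    Wm = W.copy()
    Wm[r, :] = 0.0                       # response word absorbs the walk
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - Wm) - np.eye(n)
    return S[s, r]

# Toy 3-word network: 0 is the stimulus, 2 is the response.
W_toy = np.array([[0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
```

On this toy network the walk reaches the response either directly (probability 0.5) or via word 1, so the first-passage probability from 0 to 2 equals 1.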
Thus, it seems reasonable to truncate the maximal length of the search trajectories: if the search is not completed within the allowed time interval, we stop it and start a new one from the same stimuli. Such a strategy resembles random search with resetting [28]. In the case of random resetting, the probability of solution in one search, given a stimulus s and a response r, can be written as

π_{s,r}(λ) = Σ_{n=1}^{∞} λ^{n-1} [(W^-)^n]_{s,r} = [W^- (I − λW^-)^{-1}]_{s,r},    (4)

where λ is the probability of continuing the chain after each step (resetting occurs with probability 1 − λ): λ = 0 recovers the one-step probability (1), while λ = 1 recovers the unrestricted chain (3). The probability p_λ(α) is again obtained by averaging π_{s_α^j, r_α}(λ) over the three stimulus words.
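The resetting probability can likewise be evaluated in closed form; a sketch under our reading of the text (λ is the per-step continuation probability, so λ = 0 reduces to the one-step weight and λ = 1 to the unrestricted chain):

```python
import numpy as np

def solve_prob_with_reset(W, stimuli, r, lam):
    """Our reading of the resetting formula: the chain is continued
    with probability lam after each step and restarted otherwise, so
      pi_lam(s, r) = [ W_minus (I - lam * W_minus)^{-1} ]_{s, r},
    which reduces to the one-step weight w_{s,r} at lam = 0 and to the
    unrestricted first-passage series at lam = 1.  The result is
    averaged over a uniformly chosen stimulus."""
    Wm = W.copy()
    Wm[r, :] = 0.0                       # make the response absorbing
    n = W.shape[0]
    Pi = Wm @ np.linalg.inv(np.eye(n) - lam * Wm)
    return sum(Pi[s, r] for s in stimuli) / len(stimuli)

# Toy 3-word network: 0 is a stimulus, 2 is the response.
W_demo = np.array([[0.0, 0.5, 0.5],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
```

For the toy matrix, the three regimes give 0.5 (one step), 0.75 (λ = 1/2) and 1.0 (unlimited walk), illustrating how longer chains raise the solution probability.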
In Fig. 2a-c scatter plots showing correlations between various search strategies and the empirical hardness, H, are presented. In particular, Fig. 2a presents the correlation between the average association weight from the stimulus words to the response, p_{λ=0} (the λ = 0 limit of (4)), and H; Fig. 2b shows the correlation between the estimated probability of the random walk with resetting, p_{λ=1/2}, and H; Fig. 2c shows the same for p_{λ=1} and H. Dashed lines show the slopes of the linear regression, and the corresponding Pearson correlation coefficient is denoted by ρ. In all cases we observe sufficiently large values of ρ, which confirms our hypothesis that the RAT hardness correlates with the relative locations of the words on the associative network.
There is, meanwhile, another interesting question. The simplest strategies suggested above imply that solving a RAT is a first-passage problem, that is to say, if the solver finds a solution, she immediately recognizes it. Is this indeed the case? One can address this question indirectly by studying the correlation between the hardness of a RAT and the inverse association weights w_{r_α, s_α^i}. Indeed, if the problem of recognizing an already found solution is at all relevant, it is natural to expect that it is easier to recognize a solution with strong associations from the solution back to the stimuli, and much harder to recognize one without strong inverse associations. To check for such a relation, we plot in Fig. 2d the average inverse weight, w̄_α = (1/3) Σ_{i=1}^{3} w_{r_α, s_α^i}, against the empirical hardness; no significant correlation is observed.

Figure 2: Scatter plots of the empirical hardness of the RAT problems versus the following variables: (a) the average association weight from the stimulus words to the response, p_0(α); (b) the estimated probability of the random walk with resetting with λ = 1/2, p_{1/2}(α); (c) the estimated probability of the unlimited (λ = 1) random walk, p_1(α); (d) the average association weight from the response to the stimulus words, w̄_α. In all panels ρ is the Pearson correlation coefficient.
The simplest strategies above implicitly assume a single Markov chain search, regardless of its length. In reality, since the search is limited in time, a test subject might have enough time to try either ten 1-step searches or only one 10-step search. Moreover, one expects high variability in how different test subjects solve problems. Thus, the question "How to maximize the probability of solving a RAT?" seems more reasonable than the question "How do people solve RATs on average?"

IV. ENHANCING THE PROBABILITY OF CORRECT SOLUTION
Here we address the optimal strategy for maximizing the probability of solving a RAT problem. In particular, we are interested in whether the heuristic optimal strategy depends on the RAT hardness. Clearly, the two simplest strategies, (1) and (4), outlined above have significant drawbacks from that point of view. Searching only in the immediate proximity of the stimuli might be sufficient to solve easy RATs, but for hard RATs there are typically no direct associations (direct links) between the stimuli and the response, so solving a problem by such a strategy is simply impossible. In turn, searching via a random walk on the network might lead to excessively long solution times. Therefore, it seems natural to construct a search algorithm on the SWOW-EN network in such a way that the search trajectories, while not artificially constrained to the nearest neighbors of the stimuli, are still not fully random walks. One can think of these trajectories as random walks in an external attractive potential, which guarantees that the test subject never loses the stimulus words from his/her mental view. Such a strategy seems to be in agreement with the experimental data on sequences of guesses provided in [22] and discussed briefly in the Introduction.

A. Search with an attraction to the stimuli
The proposed heuristic search algorithm on SWOW-EN is organized as follows. At time t = 0 the three stimuli (nodes of the network) s_i, i = 1, 2, 3, are considered active. At the next time step, t = 1, one of the nearest neighbors of the active nodes, x, is activated with probability P(x) proportional to the sum of the weights of the links from the active nodes towards it, i.e.

P(x = k) = Σ_a w_{a,k} / Σ_{k' ∈ NN({a})} Σ_a w_{a,k'},    (5)

where the index a enumerates the active nodes, while the index k enumerates all possible target nodes from the set NN({a}) of nearest neighbors of the active ones.
Thus, at time t = 1 there are four activated words. If the newly activated word is the correct response, r, the search is completed. If not, at the next step, t = 2, we activate a new neighboring word with the probability (5), with the modification that there are now 4 instead of 3 active words in the set {a}. Simultaneously, we deactivate the word activated at the previous step and mark it as checked, so that it will never be activated again. We then check whether the newly activated word is the correct response, r. If yes, we exit the search; if not, we proceed recursively with the procedure described above. At all times except t = 0 there are exactly 4 active words, and by time t exactly t different candidate response words have been checked.
By such rules we mimic a search strategy in which an activated word, even if it is not the response (i.e. the correct answer), can still affect the search trajectory leading to the answer. The fact that the three stimuli remain active at each time step, while intermediate guess words are activated and deactivated during the search, guarantees an effective "attraction" of the search trajectory to the set of stimuli, which can be interpreted as a permanent "memory" of the initial stimuli.
The search algorithm stops when either the correct answer is found or t_max search attempts are exceeded. We performed 10^4 runs of the algorithm for each RAT and counted the fraction of runs leading to the correct answer; this fraction is taken as the measure of the search accuracy. The resulting accuracy is a monotonically increasing function of t_max (see Fig. 3a), and at t_max = 20 the average accuracy for hard, medium and easy RATs numerically coincides with the corresponding typical fractions of RATs correctly solved within 15 seconds [27].
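The rules of Section IV A can be turned into a short simulation; a sketch of our reading of the algorithm (the dict-of-dicts network and the word names are illustrative, not the authors' code):

```python
import random
from collections import defaultdict

def attraction_search(W, stimuli, response, t_max, rng=random):
    """One run of the search with attraction to the stimuli, following
    our reading of the rules above.  W is a dict of dicts with
    W[u][v] = association weight of the link u -> v.  The stimuli stay
    active forever; the latest guess is the fourth active word; checked
    words are never activated again.  Returns True if the response is
    activated within t_max steps."""
    checked = set(stimuli)
    guess = None
    for _ in range(t_max):
        # Activation score of each candidate: total weight of links
        # from the currently active words (cf. eq. (5) in the text).
        score = defaultdict(float)
        active = list(stimuli) + ([guess] if guess is not None else [])
        for a in active:
            for v, w in W.get(a, {}).items():
                if v not in checked:
                    score[v] += w
        if not score:
            return False                 # no reachable unchecked word
        words, weights = zip(*score.items())
        new = rng.choices(words, weights=weights)[0]
        if new == response:
            return True
        checked.add(new)                 # deactivate and mark as checked
        guess = new
    return False
```

Averaging the boolean outcomes over many runs (the paper uses 10^4 per RAT) gives the model accuracy as a function of t_max.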
In Fig. 3b the scatter plot of the empirical RAT hardness versus the model search accuracy is shown. Note that the strategy presented in this section not only makes more sense than those discussed in the previous one (the vast majority of RATs are solvable by this procedure without relying on excessively long search trajectories), but also the correlation between the experimental and model accuracy (Pearson correlation coefficient ρ = 0.742) is larger than for those naive strategies. We therefore believe that this heuristic strategy is a better approximation to the way people solve RATs in reality.

B. Activation algorithm with a threshold
Consider now a modification of the heuristic search algorithm described in Section IV A. It is known that many activation processes need a certain threshold (a minimal activation impulse) to get triggered. In the psycholinguistic context, the importance of the association strength and of the number of associations in search processes is well known [25]. In the spirit of [25], we introduce an activation threshold into our model: we modify (5) by assuming that a target x can be activated only if the sum of the activation signals it receives exceeds a certain threshold τ, i.e. only if Σ_a w_{a,x} > τ.
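In code, the threshold amounts to one extra filtering step before a candidate is sampled; a hypothetical helper (the names are ours) acting on the candidate scores of the search algorithm:

```python
def apply_threshold(score, tau):
    """Hypothetical helper: keep only those candidate words whose
    total activation signal exceeds the threshold tau, as in the
    modified activation rule of Section IV B."""
    return {word: s for word, s in score.items() if s > tau}
```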
We study the predicted accuracy for RATs of different hardness as a function of the threshold τ. For each τ, the accuracy is averaged over 10^4 simulations of all RATs in a given hardness category (easy/medium/hard). In Fig. 4a,c,e we show the average predicted accuracy for the different hardness categories as a function of τ (τ ∈ [0, 0.1]), while in Fig. 4b,d,f we show the corresponding average times needed to solve easy/medium/hard RATs. The solvability of easy RATs (Fig. 4a) grows monotonically and approaches unity with increasing τ. This is easy to explain: in easy RATs there is at least one strong directed link from a stimulus to the response, and the elimination of weak bonds makes such a strong link relatively stronger, which enhances the probability of a correct solution. The situation is different for medium and hard RATs. Eliminating very weak links (small τ) increases solvability, similarly to easy RATs. This is to be expected: presumably, a significant fraction of very weak bonds is just experimental noise, and its removal helps to suppress random and irrelevant trajectories. However, a further increase of τ results in the solution accuracy passing through a maximum at around τ_m = 0.04 (which is still below the percolation threshold corresponding to w*_cut = 0.08; see Fig. 1c).
The probability of a correct solution at the maximum, P(τ_m), significantly exceeds the result of both the no-threshold model and a model in which only strong links are retained. Compared to the latter model (with only strong links left), the gain is a factor of 1.3 for medium RATs and almost 2 for hard RATs. This means that moderately weak links are instrumental for solving medium and hard RATs: eliminating these moderate links decreases the solvability and, as shown in Fig. 4b,d,f, increases the mean length of the search trajectories.

C. Enhancing the role of weak associations
The result of Section IV B raises the following natural question: is it possible to enhance the solvability of medium/hard RATs further by preferentially following moderately weak links? To check this, let us, apart from removing weak links, remove the strong ones as well. That is to say, we introduce a new adjacency matrix W̃ with matrix elements

w̃_{ij} = w_{ij} H(w_max − w_{ij}),    (6)

where H(x) is the Heaviside step function and w_max is the upper cutoff parameter. Studying the behavior of the model (6) for w_max = 0.05, we conclude that the maximal accuracy for hard and medium RATs significantly increases (by a factor of 1.1 for medium RATs and 1.3 for hard RATs), while the mean length of the solving trajectories substantially decreases, compared to the null model with w_max = 1 shown in Fig. 4c,e. The obtained result indicates the crucial role of moderately weak associations in the solution of medium and, especially, hard RATs.
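The upper cutoff is a one-line filter on the weight matrix; a sketch (with the Heaviside step implemented as a comparison; the very weak links are suppressed separately by the activation threshold τ during the search):

```python
import numpy as np

def upper_cutoff(W, w_max):
    """Keep a link only if its weight does not exceed w_max, zeroing
    the strong links, in the spirit of the Heaviside cutoff of the
    model (6)."""
    return np.where(W <= w_max, W, 0.0)
```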

V. DISCUSSION
We have applied the network approach to studying the psycholinguistic mechanisms of solving Remote Association Tests (RATs). Our treatment is based on the available open data on the network of free associations in the English language (SWOW-EN) [15], and on the standardized hardness data for RATs [27].
First, we have quantitatively characterized the correlation between the hardness of a particular RAT and the location of its stimuli on the directed network of free associations. We have argued that the hardness of a RAT is strongly correlated with the aggregated weight of the bonds from the stimuli to the response, as well as with the aggregated weight of multi-step chains of consecutive associations. On the other hand, we have not found any significant correlation between the RAT hardness and the weights of the reverse (response → stimuli) bonds.
Secondly, we have investigated the efficiency of RAT solutions using an activation algorithm which resembles a random walk in a potential well with attraction to the stimulus words of the RAT. We show that while for easy RATs the solution is mostly governed by strong associative bonds from the stimuli to the response, the solution of medium and especially hard RATs is mostly determined by moderately weak bonds, i.e. bonds with weights around w = 0.04 ± 0.01. Indeed, introducing an activation threshold, we have seen that while neglecting very weak bonds is beneficial for the solution efficiency, neglecting moderately weak bonds suppresses the efficiency of finding the correct response. Moreover, one can further enhance the solution probability for medium and hard RATs by removing strong bonds with weights larger than w = 0.05. We see that "very weak" and "moderately weak" bonds behave differently in our consideration. This could be related to the fact that the accuracy of measurement of very weak bonds in a free association experiment is rather poor, and a significant fraction of very weak bonds is just experimental noise. Thus, the efficiency of the solution might be additionally increased by replacing the experimental free association network with a "cleaned-up" one in the spirit of [29]. From a more general perspective, the importance of weak associations in the solution of RATs seems to be an example of the ubiquitous importance of weak ties in the social sciences [30].
To conclude, let us note that there are numerous other standard tools of network analysis whose application to linguistic and psychological problems seems very promising. Spectral analysis is among the most effective modern approaches. Since the majority of semantic networks are directed, the eigenvalues of the corresponding adjacency matrices are complex. The simplest objects attributed to the graph (network) spectrum are the spectral density and the level-spacing distribution. Recently, the standard tools for the investigation of real spectra of symmetric adjacency matrices have been extended to the complex spectra of non-symmetric matrices. Hence, we are hopefully well equipped to attack the spectral structure of RATs on directed networks. The corresponding spectral approach would provide answers to questions associated with dynamic problems on directed networks, such as diffusion, localization and synchronization. The spectral analysis of the RAT problem will be presented in a forthcoming publication [31].