Modeling learner-controlled mental model learning processes by a second-order adaptive network model

Learning knowledge or skills is usually considered to be based on the formation of an adequate internal mental model as a specific type of mental network. The learning process for such a mental model, conceptualised as a mental network, is a form of (first-order) mental network adaptation. Such learning often integrates learning by observation and learning by instruction. For an effective learning process, an appropriate timing of these different elements is crucial. By controlling their timing, the mental network adaptation process becomes adaptive itself, which is called second-order mental network adaptation. In this paper, a second-order adaptive mental network model addressing this is proposed. The first-order adaptation process models the learning of mental models and the second-order adaptation process controls the timing of the elements of this learning process. The model is illustrated by a case study of learner-controlled mental model learning in the context of driving a car, where the learner is in control of the integration of learning by observation and learning by instruction.


Introduction
To describe the mental processes involved in learning and problem solving in humans, mental models are often used, e.g., [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. As a specific case, mental models of devices and their usage are formed to be able to use these devices adequately, e.g., [17,18]. It is an interesting challenge to determine how mental models are formed or learnt, and how such learning processes are controlled. Computational models representing such processes are almost absent, e.g., [19][20][21]. One exception is [8], in which a production rule modeling format is used to simulate students' construction of energy models for learning physics. In general, however, research into how mental models develop or are learnt, and how that is controlled, is hard to find.
The current paper proposes such a computational model for mental model learning and its control, based on multi-order adaptive network-oriented modeling [22,23]. It is illustrated by a case study on learning how a car works and how to drive it. A driver's mental model and how it can be learnt in an effective manner can be a basis for the design of virtual pedagogical agents, and for support of a driver by adaptive automation in a car. Network-oriented modeling for adaptive networks [22][23][24] is an effective approach to model adaptive mental processes as an adaptive interplay of mental states. Here the connections between the mental states change based on specific adaptation principles such as Hebbian learning [25]. Learning mental models involves such adaptation, but it also involves controlling this learning; the latter is a form of second-order adaptation. The network-oriented modeling approach from [22][23][24] covers such multi-order adaptive processes.
Then the picture is that a mental model can be modeled as a base network and learning the mental model can be modeled as (first-order) adaptation of this base network. In addition, the controlling of this learning process can be modeled as second-order adaptation, which adapts the first-order adaptation. In this way, a three-level second-order adaptive network architecture for mental model development is obtained. It is illustrated here for learning a mental model, by a learner-controlled interplay of observational and instructional learning in a case study for learning how a car works and how to drive it.
Part of this work was addressed in a preliminary form in [26]. However, the current paper extends that work by more than 110%. In particular, (1) the description of the computational model and its background (Sections 2 and 4) has been substantially extended, so that a much more detailed design description is now provided, and (2) an extensive analysis of equilibria of the model is now described (Section 6), which was conducted to obtain a more solid basis for the implemented model by verification; this equilibrium analysis is completely new. None of this was addressed in [26].
In the paper, Section 2 presents a brief literature overview. In Section 3, the design of the proposed second-order adaptive network architecture is presented, addressing the controlled interplay between observational and instructional learning of mental models. Section 4 presents a refinement of this architecture, addressing the case study of learner-controlled integration of observational and instructional learning. Simulation results for an example scenario can be found in Section 5. In Section 6, a detailed analysis of equilibria is addressed by which verification of the model was performed. Section 7 is for discussion.

Literature overview
Within educational science, the term model-based learning is used for learning based on constructing coherent mental models [1, 9, 36-38]. Buckley formulates this as: 'Model-based learning is a dynamic, recursive process of learning by building mental models.' (Buckley [1]) More specifically, the following elements play an important role in such learning.

Learning by observation
Observational learning occurs when observation of others is the main source for the formation of a mental model. For example, trainees see someone else perform a target behavior and then attempt to imitate or reenact it; e.g., [36,39]. Demonstration is an often-applied method to let others learn a specific motor task; this type is called observational motor learning. Empirical research has found that observational motor learning improves both action perception and motor execution. From a neuroscientific perspective, mirror neurons are considered responsible for the ability to learn by observing and imitating others, e.g., [40][41][42].

Learning by instruction
Instructional learning assumes that instructions from an expert instructor support the learning. For a beginner, learning only by discovery or observation often involves much trial and error; e.g., [14,15]. Hence, instructions from an expert are considered a useful addition for building effective mental models. This is supported by a format of scaffolded model-based learning, in which many supporting actions, such as prompts, questions, hints, stories, conceptual models, and visualizations, are performed to facilitate a learner's progress during learning tasks, e.g., [43].

Learner-controlled learning
For the integration of observational and instructional learning, control is a crucial element. It is discussed, for example, by Gibbons and Gray [44] that instructions serve learning processes best when the learner has control over them. The scaffolded model-based learning format mentioned above supports this. Kozma [28] suggested that individuals actively use external information sources for mental model formation. Learners are sensitive to characteristics of the learning environment such as the availability of certain information at a given time, the structure of the information, and the ease with which it can be accessed. Thus, the learner's need for instruction and the ease of acquiring it are crucial for building effective mental models. In learning methods based on guided discovery, the learner seeks information to complete the initial mental model. This requires the learner to be proactive and in control of the learning process. In contrast, in expository teaching methods, an instructor aims at directing the mental model formation by providing adequate information according to some temporal sequence [33]. Meela and Yuenyong [29] demonstrated in their study that Model-Based Inquiry (MBI) can support a student's mental model formation in scientific learning. MBI focuses on students' formulations of questions and procedures [31]. Feedback on performance is a significant factor in learning [1,45]; many studies support that feedback is crucial in skill acquisition [46].
Thus, in the adaptive network model introduced in the current paper, the learner can seek instruction whenever it is useful or needed, or as feedback on what she/he has learnt by observation. The control for this was modeled by control states for instructions at a separate level within the adaptive network model. Using this, the learner controls the timing and content of incoming information by seeking it only when it seems appropriate to her/him. More specifically, in Section 3 it is shown how a learning process based on mental models can be modeled by a generic three-level adaptive network architecture. In this architecture, the mental models themselves can be modeled at the base level as networks. In addition, during learning the mental models change; this can be modeled by (first-order) network adaptation at a second level. Control of the learning process is a form of adaptation of the learning process; this can be modeled at a third level addressing adaptation of the first-order adaptive network for the learning process: second-order network adaptation.

Network architecture for controlled mental model learning
In this section, a global view on the architecture of the introduced network model for learner-controlled mental model learning is discussed. Following what was concluded in Section 2, this architecture must cover the following three types of processes in an integrated manner:
1. The mental models themselves, described by base networks
2. Learning as change of mental models, described by first-order network adaptation
3. Control of learning processes, described by second-order network adaptation
Using the notion of self-modeling network (also called reified network) [22][23][24], these three description levels can indeed be modeled adequately by a three-level second-order adaptive network architecture as depicted in Fig 1. Here, for any specific application, each plane contains a specific network, and the specific upward and downward connections define the interactions between the different levels.
More specifically, adding a self-model to a network model is done by adding additional network states (self-model states) for some of the network structure characteristics. In the network-oriented modeling approach [23] applied here, in particular for nodes (also called states) X and Y, the following network structure characteristics are used:
• ω X,Y for connectivity (connections X → Y with their connection weights)
• γ i,Y and π i,j,Y for aggregation (combination function choices and their parameters for each node Y)
• η Y for timing (speed factors for each node Y)
Then, to obtain adaptive networks, self-model nodes can be added to the network for any of these characteristics to make it adaptive:
• Connectivity self-model: self-model nodes W X,Y are added representing connection weights ω X,Y
• Aggregation self-model: self-model nodes C j,Y are added representing combination function weights γ i,Y and/or self-model states P i,j,Y representing combination function parameters π i,j,Y
• Timing self-model: self-model nodes H Y are added representing speed factors η Y
The notations W X,Y , C i,Y , P i,j,Y , H Y for the self-model states indicate the referencing relation with respect to the characteristics ω X,Y , γ i,Y , π i,j,Y , η Y : here W refers to ω, C refers to γ, P refers to π, and H refers to η, respectively. These W, C, P and H notations are considered to indicate the roles these W-, C-, P- and H-states play in the network, so that at the base level their values are used for the intended characteristics. Sometimes slightly different notations are used, for example by adding the letter R for representation to emphasize that a state represents some characteristic of the network: RW X,Y , RC i,Y , RP i,j,Y , RH Y . This construction can easily be applied iteratively to obtain multiple levels of self-models. For example, by adding a second-order self-model state H WX,Y for W X,Y , the adaptation speed η WX,Y of W X,Y can be made adaptive.
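To make this construction concrete, the following minimal sketch (in Python; the paper's own implementation uses the Matlab environment of [23], and all parameter values here are hypothetical) shows a two-state base network X → Y whose connection weight is given by a first-order self-model state W X,Y adapted by Hebbian learning, while a second-order self-model state H WX,Y provides the adaptation speed of W X,Y :

```python
# Minimal sketch: base states X, Y; first-order self-model state W_XY
# (plays the role of the connection weight omega_X,Y); second-order
# self-model state H_WXY (plays the role of the adaptation speed of W_XY).
# Euler integration with step dt; all parameter values are hypothetical.
dt = 0.5
X, Y = 1.0, 0.0      # base states (X kept fully active)
W_XY = 0.1           # connectivity self-model state
H_WXY = 0.5          # timing self-model state, here kept constant
eta_Y = 0.4          # speed factor of base state Y
mu = 1.0             # persistence parameter of Hebbian learning

for _ in range(400):
    # base level: the value of W_XY is used as the (now dynamic) weight
    Y += eta_Y * (W_XY * X - Y) * dt
    # first-order adaptation: Hebbian learning of W_XY, with its
    # adaptation speed given by the second-order state H_WXY
    hebbian = X * Y * (1 - W_XY) + mu * W_XY
    W_XY += H_WXY * (hebbian - W_XY) * dt

print(W_XY, Y)   # both approach 1: the connection has been learnt
```

The key point of the sketch is that the weight of X → Y is no longer a static parameter but the value of the state W_XY, which itself evolves at a speed given by the higher-order state H_WXY.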
Second-order adaptation therefore plays an important role here: it controls the adaptive processes, and it can itself be modeled easily in this way as well.
The more specific adaptive network model described in Section 4 is a refinement of the overall network architecture depicted in Fig 1. Tables 1 and 2 summarise the generic types of states and connections used at and between the three levels within this architecture. Note that the colours used in these tables indicate to which level the states belong, as they correspond to the colours of the planes in the 3D figures such as Fig 1. At the base level, the learner's (subjective) mental model is defined by connections BS X → BS Y between base states; in addition, the connections OS X → OS Y between observation states define the (objective) relations in the real world. Note that, following the quote of Craik [6], p. 51 in Section 2, the causal relations BS X → BS Y defining the mental model are in a one-to-one correspondence with the causal relations OS X → OS Y between the (observed) world states. Therefore, as can be seen in Figs 2 and 3, within the base plane the subnetwork for the BS-states has a connectivity structure that is isomorphic to the connectivity structure of the subnetwork for the OS-states.
Moreover, the connections from observation state to base state OS Y → BS Y define the mirroring process by which the observations affect the learner's own states. At the first-order self-model level, the self-model of the mental model from the base level is modeled by states RW X,Y that explicitly represent the connection weights of the mental model as used in the processing of this mental model at the base level. At first sight, this may seem a double (and therefore redundant) representation of the same mental model, but this explicit 'additional' representation in the form of the mental model's self-model is crucial for handling the learning process well: when adaptivity is addressed, a network characteristic (in this case a connection weight) is no longer one static parameter value but becomes a variable with values that change over time (as happens for every dynamical system model in which some of its parameters are made adaptive). The conceptualization applied here is used more often within neuroscience as a distinction between (1) activation propagation through a network of neurons, (2) plasticity of this network, and (3) metaplasticity as control over this plasticity; e.g., [25,[47][48][49][50]]. Inspired by this, in the current paper, (1) and (2) of this form of conceptualization are used to model use and adaptation of a mental model, and (3) for the control over this adaptation, as will be explained in more detail below. Table 2 summarises the types of connections in the introduced adaptive network architecture.

Intralevel connections
The intralevel connections comprise:
• the learner's (subjective) connections between the base states, indicating the current mental model of the learner
• the real world's (objective) connections between the observation states, indicating the real-world process
• the mirroring connections defining the mirroring process for the base states; these connections model the effect of observations on the learner
• being informed by the instructor: the connections IS X,Y → IW X,Y communicating the instruction concerning the connection from X to Y; these connections can be controlled by control states CIW X,Y at the second reification level
• integration of knowledge obtained by observational learning and by instructional learning

Interlevel connections
In addition to the states RW X,Y , also self-model states LW X,Y and IW X,Y are used as part of the mental model's self-model. Here, LW X,Y represents what has been learnt about the connection from X to Y within the mental model by observational learning and IW X,Y represents what has been acquired from instructional learning.
The intra-level connections LW X,Y → RW X,Y and IW X,Y → RW X,Y model the integration within RW X,Y of what is learnt by observational learning and what is learnt by instructional learning. The connections IS X,Y → IW X,Y model the instruction itself: the communication actions from instructor to learner.
These communication actions from the instructor to the learner depend on control. To this end, in the second-order self-model, states CIW X,Y are included. Such a state indicates that the learner wants to hear the instructor's knowledge about the connection from X to Y. It is assumed that the instructor will respond accordingly. This happens by giving CIW X,Y the role of connection weight representation W ISX,Y,IWX,Y for the intended connection IS X,Y → IW X,Y from the instructor to the learner: in the processing of the first-order self-model, the value of CIW X,Y is used for the connection weight ω ISX,Y,IWX,Y . Therefore, as long as the value of CIW X,Y is 0, no communication takes place, while as soon as this value of CIW X,Y is 1, this communication does take place. This represents the way in which that communication becomes controlled. The effect of activation of CIW X,Y can be interpreted in the sense that the communication channel from the instructor state IS X,Y to the learner state IW X,Y is opened, so that this information is transferred from the instructor state IS X,Y to the learner state IW X,Y . The only remaining piece then is to determine when exactly CIW X,Y should become active. This is done via its incoming observational learning monitoring connection LW X,Y → CIW X,Y , which causes the control state CIW X,Y to become active depending on the corresponding LW-state LW X,Y . This models the part where the learner asks the instructor for verification and confirmation of what was just learnt by observation (and the learner does not ask anything about what has not yet been observed). A more detailed explanation of the network's connectivity for a specific case study can be found in Section 4.
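This gating mechanism can be illustrated by a small sketch (hypothetical parameter values; the logistic-type activation used here is the alogistic function defined for the model in Section 4): the control state CIW follows the observational learning state LW via the monitoring connection, and instruction only flows from IS to IW once CIW has become high.

```python
import math

def alogistic(sigma, tau, s):
    # advanced logistic activation applied to the summed impact s (see Section 4)
    return ((1/(1+math.exp(-sigma*(s-tau))) - 1/(1+math.exp(sigma*tau)))
            * (1+math.exp(-sigma*tau)))

# hypothetical states for one mental-model connection X -> Y
IS = 1.0     # instructor's knowledge state (constant)
LW = 0.0     # what has been learnt by observation
IW = 0.1     # what has been learnt by instruction
CIW = 0.0    # control state: acts as weight of the channel IS -> IW
dt, eta_iw, eta_ciw = 0.5, 0.1, 0.1

for _ in range(3000):
    LW = min(1.0, LW + 0.002)   # stand-in for gradual observational learning
    # monitoring connection LW -> CIW: the learner asks for instruction
    # only after something has been learnt by observation
    CIW += eta_ciw * (alogistic(10, 0.4, LW) - CIW) * dt
    # CIW plays the role of the weight of the connection IS -> IW:
    # while CIW is 0, no communication takes place
    IW += eta_iw * (alogistic(10, 0.7, IW + CIW * IS) - IW) * dt

print(LW, CIW, IW)
```

In the sketch, IW stays low as long as CIW is near 0, and only rises after LW has activated CIW, mirroring the learner-controlled opening of the instruction channel.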

Detailed description of the second-order adaptive network model for a case study
In this section, a more detailed description can be found of the designed second-order adaptive network model for a realistic case study that was created for illustrative purposes. It is described by the following scenario: Person A has almost no knowledge about a car's components and their interplay, nor about how to drive a car. This person's mental model of the car and of driving it has to be learned during driving lessons. During person A's first driving lesson, instructor B demonstrates how to start a car and get it moving. By observing B, A learns an initial mental model of the car and how it can be operated (observational learning). During A's further learning, an iterative process of extending and/or modifying the mental model takes place, leading to a more accurate and complete mental model. Besides observational learning, learning from instruction also plays an important role (instructional learning). This instructional learning takes place by incorporating incoming information communicated by B. In this scenario, instructional learning only takes place upon request of the learner (learner-controlled instructional learning), as a form of verification and consolidation after A has learnt about the subject by observational learning.
The network-oriented modeling approach for adaptive networks [22][23][24] used here has been briefly introduced in Section 3. Some more details will follow here. Recall that for adaptive networks the notion of self-modeling network is used. For example, for adaptive connectivity characteristics, states RW X,Y are added representing adaptive connection weights ω X,Y . They form a self-model of the network's own structure in the form of a subnetwork within the network. To graphically distinguish them from states at the level of X and Y, these self-model states are depicted at one level higher (e.g., see the blue planes in Figs 1-3 with representations of weights of adaptive connections from the base planes).
As in this case the learning is controlled, it is adaptive itself; this is depicted by the third level (purple plane) for second-order adaptation in Figs 1-3, which includes second-order reification states CIW X,Y that represent the weight of the connection IS X,Y → IW X,Y of the middle level (see Section 3). The structure formed by the lowest two (interacting) levels distinguishes two types of processes (and their interaction): using the mental model by changing the BS-states at the base level (used for internal simulation of the mental model) versus adjusting the mental model by changing the representations at the self-model level. The different types of states for the detailed model are explained in Tables 4, 5 and 6. For better understanding, Fig 2 depicts the connectivity for only a small part of the states; Fig 3 shows the connectivity for the complete network model. The second-order self-model level (the purple plane) enables control of the learning process by changing some of the intra-level connections within the first-order self-model (which in turn affects the dynamics of these first-order self-model states), based on the second-level reification CIW-states (control states); this is used to model learner-controlled instruction, as discussed in Section 3.
The conceptual representation of a network model as mentioned above can easily be transformed in an automated manner into a numerical representation using a dedicated modeling environment; within the software, this results in the following difference equations ([22,23], Chapter 9):

Y(t+Δt) = Y(t) + η Y [aggimpact Y (t) − Y(t)] Δt
with aggimpact Y (t) = c Y (ω X1,Y X 1 (t), . . ., ω Xk,Y X k (t))     (1)

where X 1 , . . ., X k are the states from which Y has incoming connections.

PLOS ONE
Here the overall combination function c Y (‥) for state Y is the weighted average of the available basic combination functions c j (‥) (in the Combination Function Library), by specified weights γ j,Y and parameters π 1,j,Y , π 2,j,Y of c j (‥) for Y:

c Y (V 1 , . . ., V k ) = [γ 1,Y c 1 (V 1 , . . ., V k ) + . . . + γ m,Y c m (V 1 , . . ., V k )] / [γ 1,Y + . . . + γ m,Y ]     (2)

In case of self-models, the self-model states define the dynamics of state Y in a canonical manner according to (1), whereby the adaptive characteristics among ω X,Y , γ i,Y , π i,j,Y , η Y are replaced by the state values of the self-model states W X,Y , C i,Y , P i,j,Y , H Y at time t, respectively (for more details, see [22,23]).
In the model presented here, the following combination functions were used for the states, all generating values in [0, 1] (assuming that their arguments are in [0, 1]). The Euclidean combination function eucl n,λ (V 1 , . . ., V k ), where n is the order (any positive number) and λ the scaling factor, is defined by:

eucl n,λ (V 1 , . . ., V k ) = ((V 1 ^n + . . . + V k ^n) / λ)^(1/n)     (3)

where V 1 , . . ., V k ∈ [0, 1] indicate the impacts ω Xi,Y X i (t) from the states X 1 , . . ., X k from which Y has an incoming connection. In addition, the advanced logistic sum combination function alogistic σ,τ (. . .) with steepness σ and threshold τ is used (with similar V 1 , . . ., V k as above):

alogistic σ,τ (V 1 , . . ., V k ) = [1/(1 + e^(−σ(V 1 + . . . + V k − τ))) − 1/(1 + e^(στ))] (1 + e^(−στ))     (4)

Table 3 provides an overview of the base states used to model the mental model (the BS-states) and the base states for the observations (the OS-states). Table 4 summarises the first-order self-model states for the learner's learning (the RW-, LW- and IW-states) and Table 5 addresses the instructor's Information States (the IS-states).
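As a concrete check of these definitions, the two combination functions can be sketched in Python (with hypothetical argument values; note that eucl 1,2 reduces to the average of its arguments, and that alogistic 10,0.4 (1) yields the value used in the equilibrium analysis of Section 6):

```python
import math

def eucl(n, lam, *vs):
    # Euclidean combination function (3): n-th root of (sum of V_i^n) / lambda
    return (sum(v**n for v in vs) / lam) ** (1/n)

def alogistic(sigma, tau, *vs):
    # advanced logistic sum combination function (4); the shift and rescaling
    # make alogistic(0, ..., 0) = 0 while keeping values in [0, 1]
    s = sum(vs)
    return ((1/(1+math.exp(-sigma*(s-tau))) - 1/(1+math.exp(sigma*tau)))
            * (1+math.exp(-sigma*tau)))

print(eucl(1, 2, 0.4, 0.8))     # first-order Euclidean: the average
print(alogistic(10, 0.4, 1.0))  # approx. 0.997482089 (cf. Section 6)
```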
The Hebbian learning combination function hebb μ (‥) for learning of the connection from state X to state Y, used in particular for the LW-states, is defined by:

hebb μ (V 1 , V 2 , W) = V 1 V 2 (1 − W) + μW     (5)

where μ is the persistence parameter, V 1 stands for state value X(t), V 2 for Y(t), and W for the learnt connection weight reification state value LW X,Y (t), which all are in the [0, 1] interval. Hebbian learning is a well-known adaptation principle addressing adaptive connectivity, which can be explained by: 'When an axon of cell A is near enough to excite B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.' [25], p. 62. This is sometimes simplified (neglecting the phrase 'one of the cells firing B') to: 'What fires together, wires together' [51,52].

In formula (5), the condition 'what fires together' is modeled by the part with the product V 1 V 2 , as that is high if both states X and Y have a high value, and low otherwise. The factor (1 − W) provides a kind of normalisation that ensures that the value for W = LW X,Y (t) does not exceed 1. The term μW in (5) models persistence, where μ indicates the fraction of the previously learnt value W that persists per time unit; for example, if μ = 0.9, then every time unit 10% is lost (also called extinction).
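A short sketch (hypothetical parameter values) illustrates that repeated application of the Hebbian function drives the weight to the analytical equilibrium value V 1 V 2 /(V 1 V 2 + (1 − μ)), which follows from setting hebb μ (V 1 , V 2 , W) = W:

```python
def hebb(mu, v1, v2, w):
    # Hebbian combination function (5): 'what fires together, wires together',
    # with normalising factor (1 - w) and persistence mu
    return v1 * v2 * (1 - w) + mu * w

# iterate W(t+dt) = W(t) + eta * (hebb(...) - W(t)) * dt with both
# connected states fully active (V1 = V2 = 1); values are hypothetical
w, eta, dt, mu = 0.1, 0.5, 0.5, 0.9
for _ in range(2000):
    w += eta * (hebb(mu, 1.0, 1.0, w) - w) * dt

# solving hebb(mu, v1, v2, w) = w gives w = v1*v2 / (v1*v2 + (1 - mu)),
# here 1/1.1: with mu = 0.9, about 10% of the learnt value is lost per time unit
print(w)
```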
In Table 6 an overview can be found of the second-order self-model states. They all are CIW-states for the control of the instructional learning of certain connection weights. For these CIW-states the logistic sum combination function alogistic σ,τ (V 1 , . . .,V k ) is used.

Simulation results for an example scenario
The second-order adaptive network model was simulated using the dedicated software environment implemented in Matlab as described in [23], Ch 9, to study the learning of a mental model for a car's functioning and driving it; see Fig 4 and further. For the simulation, Δt = 0.5 was chosen and the total time is 800 (so 1600 simulation steps); the time scale is left abstract here. In the S1 Appendix, the full specification of the network characteristics can be found. The speed factors for the BS-states were set at 0.4, for the OS-states at 0.05, and for the IW-states at 0.1. As combination functions, the BS- and OS-states use the logistic sum function (4), all LW-states the Hebbian learning function (5), the IW-states the logistic sum function (4), the RW-states the first-order Euclidean function (3), and the CIW-states the logistic sum function (4). All BS-states have initial value 0. All OS-states have initial value 0, except the first OS-state X 15 , which has an initial value of 1. For all the IW-, LW- and RW-states, the initial value was set at 0.1.
The IS-states have constant value 1, as they refer to the knowledge of the instructor (see also Fig 5). Note that the model has been specified in such a way that either the IW-state or the LW-state alone is not enough to get a related RW-state to a high value close to 1. A typical pattern is that first, based on a learnt LW-state only, the RW-state gets a value somewhere in the middle of the 0-1 interval, and only after instructional learning makes the IW-state high does the RW-state value increase to a high value close to 1. Thus, first the value of the LW-state (i.e., observational learning) activates the second-order CIW-state (Fig 6), which in turn makes the IW-state get a value close to 1 (Fig 7). Only after the learner seeks instructional information, making the IW-state high, does the RW-state value increase to 1. This shows that the learner actively engages in seeking more information to confirm the accuracy of what he/she has learnt by observation.

Table 6. Explanation of the second-order self-model states for control of the mental model learning in the network model.
The learner hence controls the amount of information (s)he needs in addition, to complete her/his learning based on her/his current level of understanding by own observation (see also Section 3). The results in Fig 5 display the connection between the two BS-states X 6 and X 7 . Here it can be seen that, as the value of X 6 becomes 1 at time 210, the OS-state X 21 affects the value of X 7 together with the RW-state X 46 , which combines the weights of the related LW- and IW-states. The CIW-state controls the weight of the IW-state according to the LW-state's weight. State X 7 reaches value 1 at time 250 via an S-curve. The IS-state representing the knowledge of the instructor remains at 1 all the time (a knowledgeable instructor).
As a form of evaluation, in Figs 6 and 7 it is displayed how the activation of each CIW-state indeed follows the activation of the corresponding LW-state, and how in turn the activation of the CIW-state indeed is followed by the corresponding IW-state. This confirms that the model displays the intended behavior that first observational learning takes place, after which there is a learner initiative to request corresponding instructional information, and appropriate instructional learning indeed takes place after that.
The simulation results presented in these figures are in accordance with and illustrate the educational science literature such as [28,44] (as discussed in Section 2) on the use of learner control of the timing of instruction.

Verification of the network model by equilibrium analysis
To verify whether the introduced and implemented self-modeling network model behaves as expected from its design specification, the equilibrium values of a number of network states were analyzed for the example simulation.

Criterion for equilibria of self-modeling network models
A stationary point for a state Y occurs at time t if dY(t)/dt = 0. An equilibrium occurs when all states have a stationary point simultaneously. From Eq (1) in Section 4, for any state Y the following general criterion for being stationary can be derived in terms of the network characteristics:

aggimpact Y (t) = c Y (ω X1,Y X 1 (t), . . ., ω Xk,Y X k (t)) = Y(t)     (6)

where X 1 to X k are the states from which Y gets its incoming connections. The equation in (6) is also called an equilibrium equation. As a test, using the example simulation presented above for the apparent equilibrium at time 800, for 68 of the 113 network states it has been verified (independently of the implemented model) that the aggregated impact aggimpact Y (t) matches the state value for the equilibrium values observed in the example simulation. In particular, this has been done for all 4×17 = 68 LW-, IW-, RW- and CIW-states. The results are discussed in Sections 6.2 to 6.4.
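Such a check of criterion (6) can be sketched as follows (Python, with hypothetical weights and values; the combination function shown is the averaging eucl 1,2 used for the RW-states later in this section):

```python
def aggimpact(c, weights, values):
    # aggregated impact of a state Y: the combination function applied
    # to the weighted incoming impacts omega_Xi,Y * Xi(t)
    return c(*[w * v for w, v in zip(weights, values)])

def is_equilibrium_value(c, weights, values, y, tol=1e-6):
    # criterion (6): in an equilibrium, aggimpact_Y(t) = Y(t)
    return abs(aggimpact(c, weights, values) - y) < tol

# example: a state aggregating two unit-weight impacts by their average
avg = lambda v1, v2: (v1 + v2) / 2
print(is_equilibrium_value(avg, [1.0, 1.0], [0.8, 0.6], 0.7))   # True
print(is_equilibrium_value(avg, [1.0, 1.0], [0.8, 0.6], 0.5))   # False
```

Such a check is implementation-independent: it only uses the simulated end values and the network characteristics from the design specification.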

Equilibrium analysis of the LW-states and the CIW-states
The LW-states use the Hebbian learning function hebb μ (V 1 , V 2 , W) as combination function. Using this function, by (5) for any LW-state Y it holds that

aggimpact Y (t) = V X1 V X2 (1 − V LW ) + μ V LW

where V X1 , V X2 are the state values of the connected base states X 1 and X 2 and V LW is the state value of LW-state Y. So, for this case the equilibrium equation in (6) becomes

V X1 V X2 (1 − V LW ) + μ V LW = V LW     (7)

Assuming the denominator nonzero, this can also be rewritten into (also see [40], Section 3.6.1):

V LW = V X1 V X2 / (V X1 V X2 + (1 − μ))     (8)

For the example simulation, μ = 1 was set; therefore (8) is equivalent to

V LW = 1 (assuming V X1 V X2 is nonzero)     (9)

In the simulation, at t = 800 all LW-states have value 1 in a precision of 15 digits (and the values V X1 and V X2 are always nonzero). Therefore, for all LW-states criterion (6) is fulfilled with deviations < 10^−15. This provides a first piece of evidence that the implemented network model is correct with respect to its design specification.
The 17 CIW-states use the combination function alogistic 10,0.4 (‥) described by (4), and the weight of the connection from the related LW-state to the CIW-state is 1, so

aggimpact Y (t) = alogistic 10,0.4 (V LW )

where V LW is the value of the LW-state. Therefore, for this case the equilibrium equation from criterion (6) is

V CIW = alogistic 10,0.4 (V LW )

where V CIW is the value of the CIW-state. Now, as already found above, at t = 800 for all LW-states V LW = 1 in a precision of 15 digits, and alogistic 10,0.4 (1) = 0.997482089170521. Moreover, at t = 800 it is found that V CIW = 0.997482089170520 for all CIW-states. This makes a deviation of 0.997482089170520 − 0.997482089170521 = −10^−15. This very small deviation provides a second piece of evidence that the implemented network model is correct with respect to its design specification.

Equilibrium analysis of the IW-states
The IW-states use the combination function alogistic 10,0.7 (‥) described by (4) and have incoming connections from themselves (with weight 1) and from the related IS-state. Moreover, the connection from this IS-state to the IW-state has a weight represented by the related CIW-state, whereas the state values of the IS-states are constant 1. Therefore it holds that

aggimpact Y (t) = alogistic 10,0.7 (V IW + V CIW )

where V IW is the value of the IW-state itself and V CIW is the value of the CIW-state. So, for this case the equilibrium equation from criterion (6) is

V IW = alogistic 10,0.7 (V IW + V CIW )     (14)

These values have been computed (independently of the implemented model), as shown in Table 7. Here the second and fifth columns display the values for the CIW- and IW-state from the simulation at t = 800, and the values in the third and fourth columns were calculated based on that. The fourth column indicates the left-hand side of the above Eq (14), the fifth column the right-hand side, and the sixth column the difference between the two. It turns out that all deviations are < 10^−7, which is a third piece of evidence that the implemented network model is correct with respect to its design specification.
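Under the assumption that V CIW is (approximately) constant at its own equilibrium value from Section 6.2, Eq (14) can be solved numerically by fixed-point iteration; a sketch:

```python
import math

def alogistic(sigma, tau, s):
    # advanced logistic activation (4), applied to the summed impact s
    return ((1/(1+math.exp(-sigma*(s-tau))) - 1/(1+math.exp(sigma*tau)))
            * (1+math.exp(-sigma*tau)))

v_ciw = 0.997482089170521   # equilibrium value of the CIW-states (Section 6.2)
v_iw = 0.1                  # initial IW value, as in the simulation
for _ in range(200):        # fixed-point iteration of Eq (14)
    v_iw = alogistic(10, 0.7, v_iw + v_ciw)

print(v_iw)   # the resulting equilibrium value of the IW-state
```

Because the slope of the alogistic function is very small near this fixed point, the iteration converges after only a few steps.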

Equilibrium analysis of the RW-states
The RW-states use the combination function eucl_{1,2}(.,.) described by (3), which takes the average of its two arguments. They have incoming connections with weight 1 from the related LW-state and IW-state. Therefore it holds

dV_RW(t)/dt = η_RW [ (V_LW + V_IW)/2 − V_RW ]

where V_LW is the value of the LW-state and V_IW is the value of the IW-state.
Then for this case the equilibrium equation from criterion (6) is

V_RW = (V_LW + V_IW)/2

where V_RW is the value of the RW-state. As above, these values have been computed (independently of the implemented model) from the simulation values at t = 800 for the IW- and LW-states, and compared to the simulation values of the RW-states, as shown in Table 8. It turns out that all deviations are < 10^−8, which is a fourth piece of evidence that the implemented network model is correct with respect to its design specification.
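For completeness, a sketch of the Euclidean combination function eucl_{n,λ} from [40] (our naming), showing that for order n = 1 and scaling factor λ = 2 it indeed reduces to the average of its arguments, as used in the RW equilibrium equation:

```python
# Sketch: eucl_{n,lambda}(V1,...,Vk) = ((V1^n + ... + Vk^n) / lambda)^(1/n);
# with n = 1 and lambda = 2 this is simply the average of two arguments.

def eucl(n, lam, *values):
    return (sum(v ** n for v in values) / lam) ** (1.0 / n)

v_lw, v_iw = 1.0, 0.999998          # illustrative equilibrium values
v_rw = eucl(1, 2, v_lw, v_iw)
print(v_rw)                          # equals (v_lw + v_iw) / 2
```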

Discussion
In this paper, a computational network model was presented for controlled learning of a mental model. Learning of a mental model often involves observational learning and instructional learning. To obtain an effective learning process, appropriate timing of these types of learning is needed, which requires some form of control. For such control, the mental model adaptation process itself has to be made adaptive as well, which is a form of second-order adaptation for this mental model. So, all in all, a mental model can be used in three different manners: (1) it is executed to draw conclusions from it, (2) it is adapted to learn and improve it, and (3) these adaptation processes are controlled. These three properties and their interplay require three different types of modeling that interact with each other. In this paper, the network-oriented modeling approach for self-modeling adaptive networks described in [23] was applied to address these processes for mental models. A generic three-level self-modeling network architecture introduced in [26] was applied to support this. Based on this general architecture, a second-order adaptive mental network model was presented, in which the base level includes a mental model as it can be used, the second level models a first-order adaptation process for the learning of this mental model, and the third level models a second-order adaptation process that controls the focus and timing of the types of learning. It has turned out that the network self-modeling mechanism (also called network reification) fits very well with what is needed for (1), (2) and (3) for mental models. The idea of self-modeling networks was originally (in [22,23]) mainly inspired by the extensive neuroscience literature on plasticity versus metaplasticity in the brain; e.g., [47][48][49][50]. Paper [26] was the first to demonstrate the usefulness of the same conceptualisation for the social domain of teaching and learning. As far as the authors know, there is no other computational model covering (1), (2) and (3).
The introduced network model was illustrated by a case study of learner-controlled mental model learning for how a car works and how to drive it. Here the learner is in control of the use and timing of observational learning and instructional learning. Using the dedicated software environment described in [40], Ch. 9, the network model was implemented and simulated. In this way it was shown to work as expected from the literature. Moreover, by verification of the implemented model based on equilibrium analysis (for a representative test set of 68 of the 113 network states), it was found that all deviations are < 10^−7 (see Section 6). This provides strong evidence that the implemented model is correct with respect to its design specification. Further validation by comparison to empirical data would be interesting for future research; currently, such data are not available to the authors.
Much literature exists that describes the learning of mental models; it was discussed earlier in the paper. However, computational models addressing this are very rare; a few exceptions are [13,19,20,53]. For example, [20] addresses the simulation of students' construction of energy models in physics in a production rule modeling format, and in [13] the PDP modeling format was applied to model mental models. In [53] a mental God model was addressed, and in [19] the focus is on model-based learning to drive a car. In all four cases [13,19,20,53], no control of the learning processes is modeled, which is a main difference with the current paper, where the focus is on such control and it is addressed by designing a second-order adaptive mental network model. In the meantime, the self-modeling network perspective [22,23] and the general three-level second-order adaptive network architecture introduced in [26] (and described in more detail in the current paper) to model the dynamics, adaptation and control of mental models have been found to be very general and applicable for handling mental models in many other application cases where mental models are used. For example, in [54] this architecture and the learning mechanisms contributed by [26] have been applied successfully to model how shared mental models are used in hospital teamwork. A more detailed overview of this general approach to mental models, originating to a large extent from [26], and many of its applications will be presented in the forthcoming book [55].
However, note that the literature on mental models is very diverse. Therefore, although the approach has turned out to be applicable in many cases, it cannot be claimed that the network-oriented way in which mental models are addressed here is applicable to all forms of mental models addressed in the literature.
Supporting information

S1 Appendix. Full specification of the second-order adaptive network model.
(DOCX)