Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures

We investigate fundamental decisions in the design of instruction set architectures for linear genetic programs that are used as both model systems in evolutionary biology and underlying solution representations in evolutionary computation. We subjected digital organisms with each tested architecture to seven different computational environments designed to present a range of evolutionary challenges. Our goal was to engineer a general purpose architecture that would be effective under a broad range of evolutionary conditions. We evaluated six different types of architectural features for the virtual CPUs: (1) genetic flexibility: we allowed digital organisms to more precisely modify the function of genetic instructions, (2) memory: we provided an increased number of registers in the virtual CPUs, (3) decoupled sensors and actuators: we separated input and output operations to enable greater control over data flow. We also tested a variety of methods to regulate expression: (4) explicit labels that allow programs to dynamically refer to specific genome positions, (5) position-relative search instructions, and (6) multiple new flow control instructions, including conditionals and jumps. Each of these features also adds complication to the instruction set and risks slowing evolution due to epistatic interactions. Two features (multiple argument specification and separated I/O) demonstrated substantial improvements in the majority of test environments, along with versions of each of the remaining architecture modifications that show significant improvements in multiple environments. However, some tested modifications were detrimental, though most exhibit no systematic effects on evolutionary potential, highlighting the robustness of digital evolution. 
Combined, these observations enhance our understanding of how instruction architecture impacts evolutionary potential, enabling the creation of architectures that support more rapid evolution of complex solutions to a broad range of challenges.


Introduction
Over the past 50 years, the field of evolutionary computation has produced many successful tools, including genetic algorithms [1], genetic programming [2], and evolutionary strategies [3] (for a recent overview, see [4]). These evolutionary algorithms abstract the evolutionary process by alternating between selecting the most promising prospective solutions from a diverse population, and randomly mutating copies of those solutions to create further diversity. Evolutionary algorithms now rival human designers in wide-ranging problem domains, from controlling finless rockets [5] to automatically patching software bugs [6]. However, these methods abstract the evolutionary process and tend to be limited in the complexity of the solutions they produce while also losing some of the inherent robustness that occurs in naturally evolved organisms.
Digital evolution is a type of linear genetic programming that provides a rich setting in which to study evolution under more natural conditions; populations of self-replicating computer programs must survive in a computational world where they are subject to mutations, environmental effects, interactions with other programs, and the pressures of natural selection [7]. These ''digital organisms'' evolve in a less constrained manner, enabling biologists to explore questions that are difficult or impossible to study in natural systems (e.g., [8][9][10][11]). In turn, these more nuanced systems have proven their ability to produce effective algorithms for practical applications, such as distributed problem solving [12,13], software models for dynamic systems [14], and robot movement and decision making [15][16][17][18]. In short, digital evolution is becoming an essential model system for studying evolutionary mechanisms, while discerning these natural processes is equally crucial to constructing flexible and resilient computing systems [19].
The instruction set architecture is the core of every instance of digital evolution, defining the characters and syntax of the genetic language, as well as the virtual hardware upon which that language executes. The design of the instruction set architecture within an evolvable system plays an important role in influencing the robustness and flexibility of evolved solutions [20]. As the scope and complexity of research performed using digital evolution expands, it is important to ensure that our language is as general purpose as possible, as well as to understand how changes to architecture impact the evolutionary potential of the system. Our previous work has shown that digital evolution is surprisingly robust to poor design decisions [21]. Here we have investigated a series of engineered instruction set architecture modifications built upon the underlying von Neumann architecture of Avida, progressively identifying and integrating architectural features that enhance evolutionary potential. In order to test the effect of each modification, we utilized seven computational environments representing a wide range of desired capabilities for solving primarily static optimization problems. We evaluate the final results of experiments performed in each environment with each instruction set modification.

Methods
We performed all experiments using executables based on Avida version 2.12, with modifications to support each of the new instruction set architectures that we investigated. The full Avida 2.12 source code is available for download, without cost, from http://avida.devosoft.org/. We tested each instruction set architecture with 200 replicate populations in each of seven computational environments. The populations consisted of 3,600 individuals on a $60 \times 60$ toroidal grid, and were run for 100,000 updates, where an update is a unit of time in Avida equal to an average of 30 instructions executed per living organism; in practice this translates to a widely varying number of generations depending on the evolved complexity of the digital organisms (somewhere between 500 and 100,000 generations; a mean of 12,423 for the experiments presented here). Organisms were subject to mutations at a standard substitution rate of $2.5 \times 10^{-3}$ per site in the genome, along with a $5 \times 10^{-4}$ probability each for a single instruction insertion or deletion per site in the genome. All substitutions, insertions, and deletions occurred upon division of the offspring. We seeded each population with a single ancestral organism capable only of self-replication. Small variations in the initial genotype used in each architecture were often necessary, due to functional differences among the instruction sets, but we limited these variations specifically to neutral labeling instructions (nop-sequences, as described below) used in self-replication.
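To make the per-site mutation rates concrete, the following minimal Python sketch applies them to a genome at division. It is an illustration only, not Avida's implementation: the function names, the order in which the three mutation types are tested at each site, and the choice to draw replacement instructions uniformly from the instruction set are all assumptions.

```python
import random

SUB_RATE = 2.5e-3  # per-site substitution probability (from the text)
INS_RATE = 5e-4    # per-site insertion probability (from the text)
DEL_RATE = 5e-4    # per-site deletion probability (from the text)


def expected_mutations(genome_length):
    """Expected number of mutation events in one division."""
    return genome_length * (SUB_RATE + INS_RATE + DEL_RATE)


def mutate_on_divide(genome, instruction_set, rng,
                     sub=SUB_RATE, ins=INS_RATE, dele=DEL_RATE):
    """Return a mutated copy of genome (hypothetical helper).

    Assumption: each site is tested for deletion, then substitution,
    then an insertion after the site. A substitution may redraw the
    same instruction.
    """
    child = []
    for inst in genome:
        if rng.random() < dele:
            continue  # deletion: this site is dropped
        if rng.random() < sub:
            inst = rng.choice(instruction_set)  # substitution
        child.append(inst)
        if rng.random() < ins:
            child.append(rng.choice(instruction_set))  # insertion
    return child
```

Under these rates, a 100-instruction genome would be expected to carry roughly 0.35 mutations per division.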

Instruction Set Architectures
The HEADS instruction set architecture is the default virtual CPU configuration in all versions of Avida 2.x, consisting of a Turing complete, von Neumann style architecture. The virtual hardware that implements this instruction set is designed to operate on a genomic program within a circular memory space (as shown on the left side of Figure 1). By default, it has three registers, each capable of holding a 32-bit number, two stacks that can each hold ten values, four heads that point to positions in the genome, input and output channels, and the ability to execute 26 standard instructions (see Table 1 for a complete glossary of instructions). The default instructions include three no-operation instructions (nops): nop-A, nop-B, and nop-C, which can serve to modify the default behavior of other instructions, but do not otherwise affect the state of the virtual CPU when executed by themselves. Most instructions observe the value of one subsequent nop instruction and alter their behaviors accordingly. For example, the inc instruction increments the BX register by default, but if it were followed by a nop-A it would increment the AX register instead. In addition to instruction modification, nop instructions can serve as patterns that act as labels for genome locations. Label matching uses cyclic complementary matching, where nop-A matches to nop-B, nop-B matches to nop-C, and nop-C matches to nop-A.
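The two roles of nop instructions described above can be sketched in a few lines of Python. This is a simplified model rather than Avida code: the function names are hypothetical, and the mapping of nop-B and nop-C to the BX and CX registers is an assumption extrapolated from the nop-A/AX example in the text.

```python
NOPS = ["nop-A", "nop-B", "nop-C"]
# nop-A -> AX is stated in the text; the other two entries are assumed.
NOP_TO_REGISTER = {"nop-A": "AX", "nop-B": "BX", "nop-C": "CX"}


def target_register(genome, ip, default="BX"):
    """Register used by a single-nop-modifiable instruction at genome[ip]."""
    following = genome[(ip + 1) % len(genome)]  # memory space is circular
    return NOP_TO_REGISTER.get(following, default)


def complement(nop):
    """Cyclic complement: nop-A -> nop-B, nop-B -> nop-C, nop-C -> nop-A."""
    return NOPS[(NOPS.index(nop) + 1) % len(NOPS)]


def complement_label(label):
    """Complement every nop in a label (a list of nop names)."""
    return [complement(nop) for nop in label]
```

For example, `target_register(["inc", "nop-A"], 0)` selects AX, while an inc followed by any non-nop instruction falls back to BX.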
The HEADS instruction set has five flow-control instructions: h-search, jmp-head, mov-head, get-head, and set-flow. Each of these instructions can affect the position of one of the four architectural heads: the instruction pointer (IP), READ head, WRITE head, and FLOW head. The h-search instruction searches the genome, starting from the first executed instruction in the genome, for a label (a sequence of one or more nop instructions) that matches the cyclic complement of the label that follows the instruction, placing the FLOW head after the matching sequence; if the sought-after label is not found, it places the FLOW head on the instruction immediately subsequent to itself. Thus if the h-search instruction were followed by nop-A nop-A nop-B, it would search the genome for the sequence nop-B nop-B nop-C. This is one of only two instructions in the default HEADS instruction set that is affected by more than one nop instruction, the other being if-copied, described below. The mov-head instruction moves the IP to the current location of the FLOW head. The jmp-head instruction shifts the position of the IP by the amount specified in a register. The get-head instruction places the current location of the IP into a register. Finally, the set-flow instruction moves the FLOW head to the absolute genome location specified by the value in a register.
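The label search performed by h-search can be illustrated with the short Python sketch below. It is a simplified model under stated assumptions, not the Avida implementation: the genome is a list of instruction names, index 0 stands in for "the first executed instruction", an h-search with no following label is treated like a failed search, and the function names are hypothetical.

```python
NOPS = ["nop-A", "nop-B", "nop-C"]


def read_label(genome, start):
    """Collect the run of nops beginning at position start (circular)."""
    n = len(genome)
    label = []
    pos = start % n
    while genome[pos] in NOPS and len(label) < n:
        label.append(genome[pos])
        pos = (pos + 1) % n
    return label


def h_search(genome, ip):
    """Return the new FLOW head position for an h-search at genome[ip]."""
    n = len(genome)
    label = read_label(genome, ip + 1)
    if not label:
        return (ip + 1) % n  # assumption: no label behaves like "not found"
    # Cyclic complement: nop-A -> nop-B, nop-B -> nop-C, nop-C -> nop-A.
    target = [NOPS[(NOPS.index(nop) + 1) % len(NOPS)] for nop in label]
    for start in range(n):  # scan from the assumed genome start
        if all(genome[(start + k) % n] == t for k, t in enumerate(target)):
            return (start + len(target)) % n  # just past the matching label
    return (ip + 1) % n  # not found: the instruction right after h-search
```

With the genome `["h-search", "nop-A", "nop-A", "nop-B", "inc", "nop-B", "nop-B", "nop-C", "dec"]`, the search target is nop-B nop-B nop-C and the FLOW head lands on the dec instruction at index 8.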
The HEADS set also contains three conditional instructions that will skip a subsequent instruction if the test condition is false. The two basic conditional instructions, if-n-equ and if-less, perform a comparison between two registers. The if-copied instruction interacts with the READ head, evaluating to true if the last sequence of instructions copied matches the complement of the label that follows the instruction. This instruction is primarily for use in conjunction with the replication instructions described below to identify the portion of the genome most recently copied.
Seven arithmetic and logic operations are supported in the default HEADS instruction set: add, sub, inc, dec, nand, shift-l, and shift-r. All of these instructions operate on values stored within registers and accept a single nop modifier, which changes the source and destination registers depending on the operation. Five instructions in HEADS facilitate data movement and environmental interaction. The push, pop, and swap-stk instructions all operate on the two stacks within the architecture. Only one stack is accessible at a time, with the swap-stk instruction toggling the currently active stack, while push and pop copy numbers from registers to the top of the active stack and vice versa. Each of these instructions can be nop-modified to specify which register should be used. The swap instruction exchanges the values of two registers. The IO instruction interacts with the environment of the digital organism, outputting the current value in a register and replacing it with a value from the environmentally controlled input buffer. Values output via this instruction are evaluated by the environment, potentially triggering a reward or other action if they match one of the tasks in the environment as explained below.
Lastly, there are three instructions that facilitate self-replication. The h-alloc instruction allocates additional memory within which the digital organism can copy its offspring. Copying is performed by repeated execution of the h-copy instruction, which duplicates the current instruction found at the READ head to the position marked by the WRITE head and advances both heads. Once copying has been completed, the organism must execute the h-divide instruction to finalize the replication process, extracting the memory between the READ head and the WRITE head as the genome of the offspring.

Tested Architecture Modifications
In the default HEADS instruction set, most instructions can have one aspect of their function modified by a single nop instruction that follows in the genome. We aimed to improve the flexibility by which data could be accessed and modified in the virtual CPUs by implementing the FULLY-ASSOCIATIVE (FA) instruction set. We extended the nop modification system used by instructions so that most instructions could be modified by more than one nop. The default behavior of all instructions remains the same when not followed by any nop instructions. Instructions that affect only a single register or head retain identical behavior to the HEADS set in the presence of a nop. However, for arithmetic, logic, and conditional instructions that use multiple registers, the FA instruction set will shift all registers to correspond with a single nop given, as well as read subsequent nops, if present, to further specify those parameters. For example, an add instruction by default will perform regB = regB + regC. If it is followed by one nop-A, this will alter both the source and destination registers such that it performs regA = regA + regB. When followed by nop-A nop-C nop-B, the add instruction in the FULLY-ASSOCIATIVE set will perform regA = regC + regB. In this manner, very specific operations may be invoked, while retaining robust default behavior.
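One way to read the FULLY-ASSOCIATIVE operand rules for add is sketched below in Python. The three cases given in the text (no nop, one nop-A, and nop-A nop-C nop-B) are reproduced exactly; the behavior with exactly two nops is not specified in the text, so the choice here (the second source keeps the value implied by the first nop) is an assumption, as is the function name.

```python
REGS = ["regA", "regB", "regC"]
NOP_REG = {"nop-A": "regA", "nop-B": "regB", "nop-C": "regC"}


def next_reg(reg):
    """Register that cyclically follows reg (regC wraps to regA)."""
    return REGS[(REGS.index(reg) + 1) % len(REGS)]


def resolve_add_operands(nops):
    """Return (dest, src1, src2) for an add followed by the given nops."""
    dest, src1, src2 = "regB", "regB", "regC"  # default: regB = regB + regC
    regs = [NOP_REG[nop] for nop in nops[:3]]
    if len(regs) >= 1:
        # A single nop shifts all three operands together.
        dest = src1 = regs[0]
        src2 = next_reg(regs[0])
    if len(regs) >= 2:
        src1 = regs[1]  # second nop overrides the first source
    if len(regs) >= 3:
        src2 = regs[2]  # third nop overrides the second source
    return dest, src1, src2
```

Note that a lone nop-B resolves to the same operands as the unmodified instruction, which is consistent with the default being preserved.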
The REGISTER-series of instruction set architectures build upon the FULLY-ASSOCIATIVE architecture to increase the working register set beyond the three default registers, exposing one or more additional architectural registers, in sets R4, R5, R6, R7, R8, R12, up to a total of 16 in R16. The original design choice was made to minimize the number of registers in order to simplify the complexity of using them, but a larger number of registers has not previously been systematically tested. For each additional register, we added a corresponding nop instruction to the instruction set (nop-D, nop-E, nop-F, etc.). None of the default registers used by the instruction set were altered, meaning that these additional registers can be accessed only when the new nop instructions are used to modify an instruction. Since nop modification is also used for head selection, the additional nop-D in the R4 architecture provides direct access to the FLOW head. In the R5 through R16 architectures, extra unassigned heads that may be used as genome place-markers are available for each additional nop instruction.
The LABEL-series of instruction set architectures extends the R6 architecture (which proved to be the most effective, as described in the results below), explicitly separating genome labels from nop sequences used to modify instruction operands. The intent of this change was to prevent instruction argumentation as facilitated by the FULLY-ASSOCIATIVE architecture from otherwise conflicting with labeled genome positions, especially those used for self-replication. Instructions that operate on genome labels, search-seq-comp-s and if-copied-seq-comp, were extended with variants (search-lbl-comp-s and if-copied-lbl-comp) that recognize sequences of nop instructions only if they begin with the special label instruction (see Table 2 for details about the specific instructions included in each set). When executed directly, the label instruction performs no operation. The LABEL-DIRECT-series architectures alter the pattern matching algorithm from the default of cyclic-complementary to direct sequence matching. The LABEL-BOTH architectures include both pattern matching algorithm instruction variants. In order to increase the power of labeled execution flow, all LABEL-series instruction sets omit the set-flow instruction that performs absolute addressing.
Table 2. LABEL Instruction Sets Tested.
The SPLIT-IO instruction set architecture alters the LABEL-SEQ-DIRECT architecture, splitting the IO instruction into two separate input and output instructions. Both of the new instructions use the same default register location as the IO instruction and can each be modified by one nop.
The SEARCH-series of instruction set architectures extend the SPLIT-IO architecture with enhanced searching and jumping capabilities. The SEARCH-DIRECTIONAL set adds two pairs of directional search instructions that scan the genome forward or backward relative to the instruction pointer for a label or sequence match. The SEARCH-GOTO set adds a single goto instruction that reads the nop sequence that follows the instruction, if present, and will unconditionally jump to the first genome location following the matching label that begins with a label instruction. If no matching label is found, execution ignores the goto instruction. The SEARCH-GOTOIF group adds two conditional goto variants, goto-if-n-equ and goto-if-less, that execute the jump only if the conditional test evaluates to true.
The FLOW-series of instruction set architectures builds upon the flow control features of the SEARCH-DIRECTIONAL architecture, testing multiple combinations of additional flow control instructions (Table 3). The IF0 group adds four single argument if instructions, if-not-0, if-equ-0, if-gtr-0, and if-less-0, that conditionally execute the following instruction based on the comparison of the argument with 0. The IFX group adds two if variants, if-gtr-x and if-equ-x, that conditionally execute the following instruction based on the result of comparing regB with a nop-modified number. The default value used by if-gtr-x and if-equ-x is 1. For each nop in the label following a given if-gtr-x or if-equ-x instruction, the value is shifted left by 1, 2, or 3 bits for each nop-B, nop-C, or nop-D, respectively. Whenever a nop-A is found in the label sequence, the sign bit of the value is toggled. Finally, the MOVHEAD group adds two conditional mov-head variants, mov-head-if-n-equ and mov-head-if-less, that operate similarly to the conditional goto instructions.
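The way a nop label builds the comparison constant for if-gtr-x and if-equ-x can be sketched as follows. This is an illustrative Python model with hypothetical function names. It interprets "toggling the sign bit" as negating the value, which is an assumption, and it ignores 32-bit overflow.

```python
# Left-shift amounts per nop, as listed in the text.
SHIFT = {"nop-B": 1, "nop-C": 2, "nop-D": 3}


def comparison_value(label):
    """Build the constant that regB is compared against."""
    value = 1  # default when no nops follow the instruction
    for nop in label:
        if nop == "nop-A":
            value = -value  # assumed reading of "toggle the sign bit"
        else:
            value <<= SHIFT[nop]
    return value


def if_gtr_x(reg_b, label):
    """True if the following instruction would execute (regB > x)."""
    return reg_b > comparison_value(label)


def if_equ_x(reg_b, label):
    """True if the following instruction would execute (regB == x)."""
    return reg_b == comparison_value(label)
```

For instance, the label nop-C nop-D yields 1 shifted left by 2 and then by 3 bits, i.e. 32, so a short label is enough to test regB against a moderately large power of two.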

Environments
We use seven distinct computational environments to evaluate the effectiveness of all tested instruction set architectures. Each environment focuses on a different aspect of the virtual architecture. Environments contain a set of tasks that carry a metabolic reward associated with their performance. These metabolic rewards increase the computation speed of the digital organism's virtual CPU, making it possible to obtain a competitive advantage relative to other organisms in the population.
The Logic-9 environment consists of metabolic rewards for all possible 1- and 2-input binary logic operations; there are 9 unique operations after removing symmetries and the trivial function 'echo'. The tasks are rewarded multiplicatively, thus virtual CPU speed will increase exponentially as additional tasks are performed. The logic operations are grouped into five reward levels, ranked by difficulty. The easiest group will double computational speed, while the highest level increases execution speed by thirty-two times. Each task is rewarded only once during an organism's lifetime. This environmental setup is the default for Avida and has been used in most previous experiments (e.g. [10,23,24]).
The Logic-77 environment increases the size and complexity of the Logic-9 environment by adding a reward for all 68 unique three-input binary logic operations. Performance of each of the 77 operations provides an equal benefit, doubling the execution speed of the organism for the first time the computation is performed.
We designed the Match-12 environment to test the organisms' ability to build arbitrary numbers, a task that has been observed to be difficult for organisms to perform, as confirmed in the experiments described below. Rewards are granted additively for outputting any or all of twelve possible numbers. The numbers are spaced approximately exponentially throughout the 32-bit number space, but have no explicit pattern to them. Each number is rewarded only once during an organism's lifetime. Near matches are allowed, but the reward decays via a half-life function based upon the number of bits that are incorrect, with a minimum threshold of 22 bits correct.
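A near-match reward of this kind might be modeled as in the Python sketch below. The text gives the 22-bit threshold and the half-life form but not the half-life constant, so the `half_life` parameter (here defaulting to one bit) is purely a placeholder assumption, and the function names are hypothetical.

```python
def bits_wrong(output, target):
    """Number of differing bits between two 32-bit values."""
    return bin((output ^ target) & 0xFFFFFFFF).count("1")


def match_quality(output, target, half_life=1.0, min_correct=22):
    """Task quality in [0, 1] for a (near) match.

    half_life is an assumed parameter: the number of wrong bits that
    halves the quality. The threshold of 22 correct bits is from the text.
    """
    wrong = bits_wrong(output, target)
    if 32 - wrong < min_correct:
        return 0.0  # too many incorrect bits: no reward
    return 0.5 ** (wrong / half_life)
```

With this placeholder half-life, an output that differs from a target in a single bit would earn half the quality of a perfect match, and any output with more than ten incorrect bits would earn nothing.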
The Fibonacci-32 environment rewards organisms multiplicatively for each number in the Fibonacci sequence until the 32nd iteration of the sequence. After this target, an organism is penalized for additional numbers output, whereby outputting 64 additional numbers will effectively negate all benefit of the first 32. The purpose of this setup is to examine the capacity of an instruction set to support bounded recursion and conditional looping.
The Sort-10 environment supplies a list of 10 random inputs and offers a reward for outputting those values in descending order. Similar to the Match environment, the reward value decays via a half-life function for each incorrectly sorted value, based on the number of moves required to shift it to the correct order. Given the limited number of available registers in most of the instruction sets we tested, this task requires the use of the stacks and non-trivial flow control.
The Limited-9 environment is based on the Logic-9 environment, offering similar metabolic rewards for all possible 1- and 2-input binary logic operations. However, unlike the Logic-9 environment, a separate, consumable resource is associated with each task. Each of the resources flows into the environment at a fixed rate (100 units per update) and out proportional to current concentration (1% per update), creating an equilibrium concentration of 10,000 units when not consumed by organisms. Organisms may only consume 0.25% of an available resource at a given time, impacting the actual metabolic reward collected for performing the task associated with that resource. This property of Limited-9 makes it unique among our tested environments. Unlike our other test environments, which represent instances of static optimization, the fitness landscape of the Limited-9 environment is dynamic. The fitness measurements of a given genotype will be highly dependent upon current resource conditions, and indirect interaction between competing organism niches may lead to ecological complexity.
Finally, the Navigation environment rewards organisms for successfully navigating a circuitous path marked by sign posts, as described in [16]. This task requires an organism to use sensors to retrieve a cue from its local grid position and react to that cue by turning left, turning right, moving straight ahead, or repeating the action indicated by the previous cue (requiring the organisms to also evolve memory). Importantly, this environment also tests the robustness of instruction set architectures to the addition of several experiment-specific instructions, in this instance for sensing and moving in the virtual maze. The virtual maze is completely separate from the organism replication space, and varies randomly across replication cycles.
Note to Table 3: Instruction set by row, with marks in each column indicating that the set contains the relevant instruction group. doi:10.1371/journal.pone.0083242.t003
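The equilibrium concentration quoted for the Limited-9 resources follows directly from the inflow and outflow rates: setting inflow equal to outflow gives 100 / 0.01 = 10,000 units. The short Python sketch below checks this both in closed form and by iterating the update rule. The discrete, once-per-update form of the rule is an assumption about how the dynamics are applied, and the names are hypothetical.

```python
INFLOW = 100.0       # units added per update (from the text)
OUTFLOW_RATE = 0.01  # fraction removed per update (from the text)
MAX_UPTAKE = 0.0025  # fraction one organism may consume at once (from the text)


def equilibrium(inflow=INFLOW, outflow_rate=OUTFLOW_RATE):
    """Level at which inflow balances proportional outflow."""
    return inflow / outflow_rate


def simulate(updates, start=0.0, inflow=INFLOW, outflow_rate=OUTFLOW_RATE):
    """Iterate the assumed per-update rule with no consumption."""
    level = start
    for _ in range(updates):
        level += inflow - outflow_rate * level
    return level
```

At equilibrium, the 0.25% uptake cap corresponds to at most 25 units of a resource per consumption event; as organisms draw a resource down, the amount available to each subsequent consumer shrinks accordingly.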

Assessment of Evolutionary Potential
We have focused on two measures to evaluate how well populations solved the computational challenges of the environment when evolved with each instruction set architecture: mean fitness and task success. Both measure ability of the evolved organisms to perform tasks within the environment.
Mean fitness averages the fitness values of each living organism in the population at the moment the experiment finished. It takes into account both the computational capability of the organism and the efficiency of self-replication. We examined the distributions of these fitness values for all instruction set variants in each environment. For each modified instruction set, we compared the 200 population fitness values with those of a reference instruction set architecture using a Wilcoxon rank-sum test. We determined significance using $\alpha = 0.05$ with sequential Bonferroni correction. Confidence intervals, as shown in tables below, represent 2.5% and 97.5% quantiles that we generated using non-parametric bootstrap with 10,000 iterations. Since all seven environments present metabolic rewards that are exponential (base-2), all fitness values are shown in $\log_2$.
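The correction and interval steps of this procedure can be sketched with the Python standard library alone. The sketch assumes that "sequential Bonferroni" refers to the Holm step-down procedure and that the bootstrapped statistic is the median; neither detail is stated in the text. The p-values themselves would come from a rank-sum test computed elsewhere (for example with `scipy.stats.ranksums`), which is not reproduced here.

```python
import random
import statistics


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction; returns a reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value also fails
    return reject


def bootstrap_ci(values, stat=statistics.median, iterations=10_000, rng=None):
    """Percentile bootstrap interval (2.5% and 97.5% quantiles)."""
    rng = rng or random.Random()
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(iterations))
    lower = stats[int(0.025 * iterations)]       # simple percentile indexing
    upper = stats[int(0.975 * iterations) - 1]
    return lower, upper
```

In the Holm procedure the smallest p-value is compared against alpha divided by the number of tests, the next against alpha divided by one fewer, and so on, stopping at the first failure.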
Task success, in contrast to fitness, is a direct examination of the computational capabilities of the organisms within the final population, for the specific environment of the experiment. We measure the task success of a population as the sum of the qualities by which the average organism performs each task. To calculate the task success $t_p$ of population $p$, we determine each organism's quality at each task and then sum over these values, finally dividing by the total number of organisms in the population. More formally,
$$t_p = \frac{1}{N_p} \sum_{i=1}^{N_p} \sum_{j=1}^{T} q_{i,j},$$
where $N_p$ is the number of organisms in population $p$, $T$ is the number of tasks in the environment, and $q_{i,j}$ is the quality at which organism $i$ is performing task $j$. Task quality ($q$) is a value between 0 and 1, where 1 means the organism has found a perfect solution for a task. Environments that support near-matches use task quality to adjust the metabolic reward accordingly. The maximum task success for a given environment is equal to the total number of tasks rewarded in that environment; for example, the maximum task success of the Logic-9 environment is nine. Normalized task success, as presented in the following results, divides the observed task success by the maximum in each environment, thus constraining these values to be between zero and one. Similar to population mean fitness, we compared the distribution of task success of each instruction set to the control architecture using a Wilcoxon rank-sum test, sequential Bonferroni correction, and non-parametric bootstrap confidence intervals. In most environments task success will be highly correlated with fitness. Since organisms in digital evolution must self-replicate, it is possible for genotypes with identical task success to exhibit vastly different fitness measurements, so both metrics can be informative. Additionally, in some environments task success provides a more consistent measure of the evolutionary potential of the instruction set. For example, in the Limited-9 environment the reduction in resources due to additional task performance may actually reduce average fitness, even though more tasks are being performed.
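The task success measure defined above reduces to a double sum over organisms and tasks divided by the population size. A minimal Python sketch, assuming the qualities are available as a list of per-organism rows (the data layout and function names are illustrative choices):

```python
def task_success(qualities):
    """Sum of all task qualities divided by the number of organisms.

    qualities[i][j] is the quality, between 0 and 1, at which
    organism i performs task j.
    """
    n_orgs = len(qualities)
    return sum(sum(row) for row in qualities) / n_orgs


def normalized_task_success(qualities, num_tasks):
    """Task success divided by its maximum (the number of tasks)."""
    return task_success(qualities) / num_tasks
```

For a two-organism population with qualities `[[1, 0, 0.5], [1, 1, 0]]` on three tasks, the task success is (1.5 + 2) / 2 = 1.75, and the normalized value is 1.75 / 3, or about 0.58.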

Results
We evaluated each of the six tested types of hardware modifications in consecutive evolutions of the instruction set architecture. The first hardware modification tested was the FULLY-ASSOCIATIVE set, followed by the REGISTER, LABEL, SPLIT-IO, SEARCH, and FLOW sets.

Fully-Associative Argumentation
In our analysis, the FULLY-ASSOCIATIVE (FA) instruction set, which addresses the flexibility of register data flow, shows significant improvement in six of the seven environments (Tables 4 and 5). The logic-based environments (Logic-9, Logic-77, and Limited-9) all show substantially improved fitness and task success. The Logic-77 environment in particular benefits from the FA instruction set, with a nearly 2.9-fold increase in median task success and dramatically increased average fitness. The fully-associative capability, facilitating specific instruction formats, appears crucial within the highly diverse Logic-77 environment. Indeed, among the instructions able to use more than one nop modifier in the dominant genotype at the end of the FA experiments in the Logic-77 environment, an average of 9.2% used more than one nop. The Fibonacci-32 environment also sees a notable 44% improvement in task success, with a corresponding increase in fitness. Mean usage of multiple nop modifiers was 16.4% of multi-nop modifiable instructions in the final dominant genotype of the Fibonacci runs. The Sort-10 and Match-12 environments show statistically significant gains for both metrics, but none of these improvements are substantial in nature. The Navigation environment shows a slight, non-significant decline in fitness ($p < 0.054$, Wilcoxon rank-sum test) and task success ($p < 0.178$, Wilcoxon rank-sum test) when tested with the FA instruction set.

Number of Registers
The REGISTER-series instruction sets generally show little variation in performance (Tables 6 and 7). In the Logic-77 environment there is a slight positive trend as the number of registers increases, but none are significant after Bonferroni correction, and the magnitudes of the changes are not particularly notable. The only substantial differences observed among all tested configurations are a drop in task success and a drop in fitness with R16 in the Logic-9 environment, indicating a potential drag on the system due to the dramatic increase in instruction set size with the addition of 13 more nops, though not as severe as completely nonfunctional bloat [21]. The Sort-10 environment demonstrates significant loss of performance in all treatments, relative to the FA architecture, though none of the variation observed is substantial in nature (on the order of a 1% difference in task success). The Navigation environment does show what initially appears to be a substantial uptick in performance under R16, but with task success still well below 1%, it is not enough to allow the populations to complete this task. It does, however, indicate that we may wish to explore higher register counts again in configurations where populations have more success with this task.

Explicit Labels
The LABEL-series instruction sets show mixed results (Tables 8 and 9). The Logic-9, Limited-9, Sort-10, and Navigation environments show virtually no substantial differences in task success, regardless of the set used. The Limited-9, Sort-10, and Navigation environments show slight positive fitness trends as more labeling options are included in the instruction set. The Logic-77 environment shows significantly detrimental results for both fitness and task success when only the label-based instructions are included. When any form of sequence matching instruction is included in the Logic-77 environment, both metrics return to the reference levels. The Match-12 environment shows no significant difference for either metric among all but one instruction set. LABEL-SEQ-BOTH, the most complete instruction set in this group, shows a significant drop in both metrics in the Match-12 environment. The Fibonacci-32 environment shows positive gains in all LABEL-series instruction sets. These gains were both significant and substantial, with 24.6% and 27.2% improvement in fitness and task success, respectively, when using the LABEL-SEQ-BOTH instruction set. In the Navigation environment using the LABEL-SEQ-DIRECT instruction set, 8 outlier populations demonstrated task success greater than 0.10, with two at 0.141, indicating that substantial progress was made in those particular runs. No previous runs in this environment have exhibited such success in the short time period of 100,000 updates used [16,17].

Split Input/Output Operations
The SPLIT-IO instruction set shows improvements that are both significant and often substantial in the Logic-9 and Logic-77 environments, the Match-12 environment, and the Fibonacci-32 environment (Tables 10 and 11). Indeed, the Logic-77 and Fibonacci-32 environments show 21% and 17% improvements in median task success, respectively. The Sort-10 environment, on the other hand, completely collapses, showing effectively zero task success and correspondingly low fitness. The Limited-9 environment shows mixed results, with a small gain in task success but a drop in fitness. The Navigation environment shows marginal drops in both metrics, though neither is significant and, similar to previous instruction sets tested, task success remains well below 1% of the success possible.

Search
The three SEARCH-series instruction sets showed little measurable difference in performance for the Logic-9, Match-12, Fibonacci-32, Sort-10, Limited-9, and Navigation environments (Tables 12 and 13). The Logic-77 environment showed small, significant drops in fitness for all sets, with a corresponding drop in task success.
In the SEARCH-GOTO instruction set, we initially tested a variant of the jmphead instruction, which changed the default head it operated on to be the flow head. A notable and often significant drop in fitness was observed in all seven environments with these two instruction sets, leading to the architectures explored here.

Flow Control
The FLOW-series instruction sets tested three groups of flow control instructions separately and in several combinations (Tables 14 and 15). Throughout all instruction sets tested, the Fibonacci-32 environment showed no significant variation from the SEARCH-DIRECTIONAL instruction set performance. The Match-12 environment had some significant drops in fitness, but these were not substantial and also not coupled with a drop in task success. The Logic-9 environment showed significant, though again insubstantial, loss of fitness with all FLOW-series instruction sets. Three instruction sets, FLOW-IF0, FLOW-IFX, and FLOW-IF0-IFX-MOVHEAD, had corresponding small significant decreases in task success.
Individually, the IF0 instruction group made virtually no difference in performance among any of the seven environments. When tested in combination with the other instruction groups, there is no clear indication of interaction, positive or negative.
The IFX instruction group, both individually and in combination with other groups, shows positive gains in both fitness and task success in the Navigation environment. This outcome is likely due to the nature of the signposts in this environment [16], such that comparing against certain "magic" numbers for decision making is likely beneficial. The remaining six environments show no substantial variation attributable to these instructions.
The third instruction group, MOVHEAD, shows the greatest variation in performance among those tested. In the Logic-77 environment, all instruction sets containing the MOVHEAD group show substantial decreases in median fitness, 14.3% on average. The two combination sets containing MOVHEAD, FLOW-IFX-MOVHEAD and FLOW-IF0-IFX-MOVHEAD, also show corresponding decreases in task success in the Logic-77 environment. The Sort-10, Limited-9, and Navigation environments, on the other hand, show substantial improvements in task success, and often fitness, for all three instruction sets containing the MOVHEAD group. The Navigation environment, notably, approaches median task success around 1% when the IFX and MOVHEAD instruction groups are combined, indicating the importance of effective flow control for that environment. The Sort-10 environment improvements are difficult to observe from median values. Indeed, the greatest drivers of the improvements are infrequent outliers approaching 0.7% task success, the highest ever observed in the Sort-10 environment (see Figure 2).

Discussion
We have investigated the evolutionary potential of six groups of modified instruction set architectures of a digital evolution system, each within seven different computational environments (see Figure 3). Among the groups investigated there were three classes of outcomes: broad multi-environment improvement, mixed results, and no discernible trend. Notably absent from the observed classes were changes that were negative on balance, let alone broadly detrimental, although this was not entirely unexpected since the particular changes we chose to test were ones that we expected could help. Some instruction set architectures did demonstrate decreased performance in the mixed result grouping, yet only one example demonstrated highly substantial degradation: the SPLIT-IO instruction set in the Sort-10 environment. We explore potential explanations for this particular case below. In general, evolution has proven to be surprisingly robust to the explored genetic hardware changes, regardless of environment.
Two groups of instruction-set modifications yielded broadly beneficial changes in both fitness and task success. The FULLY-ASSOCIATIVE (FA) architecture's instruction data flow enhancements led to highly significant gains in five of the seven environments. The remaining two environments, Sort-10 and Navigation, show some slight improvement and no discernible difference, respectively. The second group that demonstrated broadly positive results was the SPLIT-IO instruction set. The separation of the input and output operations allows finer-grained data flow between the CPU and the environment. The control afforded by the SPLIT-IO architecture was beneficial to the same five environments as the FA architecture. The Navigation environment showed no particular change in fitness performance, and a small but insubstantial change in task success. The only major detriment from splitting the input and output operations was observed in the Sort-10 environment. As a whole, these two groups indicate that it is beneficial to maintain as much flexibility as possible with regard to instruction interactions. This flexibility allows evolution to finely tune interactions, yielding greater evolutionary potential.
The REGISTER-series, LABEL-series, and SEARCH-series architectures all demonstrated no discernible trend in performance, despite representing 17 of the 25 tested architectures. There were some particular environment/instruction set combinations that had significant variations, yet these were rarely substantial in nature. It is particularly surprising that the REGISTER-series instruction sets showed such minimal deviation, given that going from the FA architecture to the R16 architecture represents a greater than five-fold increase in working set and a 50% increase in instruction set size. Similarly, the LABEL-SEQ-BOTH instruction set represents a 20.6% increase in instruction set size, with no substantially negative effect. Taken together, these groups provide additional evidence that the evolutionary process is rather robust to genetic language dilution [21], maintaining the ability to adapt successfully to the environment despite searching a much larger genotype space.
The FLOW-series of instruction set architectures represents a third class of outcomes, yielding improved results in a subset of environments and degradation of performance in one environment. The Sort-10, Limited-9, and Navigation environments all show substantial gains in both fitness and task success metrics when using instruction sets containing the IFX and MOVHEAD instruction groups. The Logic-77 environment, on the other hand, shows a notable drop in performance. It is possible that this environment does not require a great deal of flow control, and is thus negatively affected by the disruptive nature of the additional flow control instructions. In environments where flow control decisions are critical for success, such as the Sort-10 and Navigation environments, the benefits of more flexible flow control outweigh its disruptive effects.
The Sort-10 environment stands out as the only example where a single, small change (splitting the input and output instructions) made a large destructive difference in performance. Median task success collapsed to be statistically indistinguishable from 0, and remained there despite further beneficial instruction set modifications. These results are likely an artifact of the environment itself, rather than a general trend. We set up the Sort-10 environment to control for random inputs and to, on average, provide no benefit unless active sorting was performed by an organism. However, the inputs for sorting are indeed a random sample of 10 integers. It is possible, due to chance, for a partial ordering of numbers to yield a positive metabolic reward even if the sequence of inputs is simply echoed back to the environment. When using instruction sets featuring the paired input-and-output instruction, simply mutating this instruction into the section of the genome responsible for replication may be enough to confer the echo capability, presenting an opportunity for lucky organisms to occasionally reap rewards. When the operations are split into two separate instructions, conferring the echo capability requires two coordinated mutations, and the execution cost of performing the task doubles. The combination of these factors most likely contributes to the observed drop in median performance.
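The chance-ordering argument can be made concrete with a small counting sketch. It assumes, purely for illustration, that the 10 inputs are distinct and that orderedness is measured by the number of inversions (out-of-order pairs). The actual reward function of the Sort-10 environment is not reproduced here, and the function names are hypothetical. The code counts how many of the 10! possible input orders have at most a given number of inversions, which under these assumptions is the probability that an unmodified echo of the inputs appears at least that well sorted.

```python
from math import factorial


def inversion_counts(n):
    """Return a list c where c[k] is the number of permutations of n
    distinct items that have exactly k inversions."""
    counts = [1]  # the empty permutation has zero inversions
    for m in range(1, n + 1):
        # Placing the m-th item can create between 0 and m-1 new inversions.
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for added in range(m):
                new[k + added] += c
        counts = new
    return counts


def prob_at_most(n, max_inversions):
    """Probability that a uniformly random order of n distinct inputs
    has at most max_inversions out-of-order pairs."""
    counts = inversion_counts(n)
    return sum(counts[: max_inversions + 1]) / factorial(n)


# Hypothetical thresholds; a fully sorted sequence has 0 inversions and a
# fully reversed one has 45 when n = 10.
for threshold in (5, 10, 15, 22):
    p = prob_at_most(10, threshold)
    print(f"P(inversions <= {threshold}) = {p:.6f}")
```

Because the inversion distribution is symmetric, exactly half of all input orders have 22 or fewer of the 45 possible inversions, and a small fraction are nearly sorted by chance alone. Any reward that credits partial order could therefore occasionally pay out for a pure echo, consistent with the explanation offered above.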
Instruction data flow, working set size, and flow control are the three main features addressed by the six groups of instruction set modifications presented here. All of these features play an important role in implementing a successful sorting algorithm. Despite the modifications in the instruction set architectures we tested, no significantly beneficial change was observed in either fitness or task success within the Sort-10 environment. Most likely, the highly constrained memory size of these architectures limits the potential within this environment. In fact, a hand-written organism that performs the task successfully with the HEADS architecture requires nearly every single stack location in both available stacks. Another factor limiting potential may simply be the time allotted for evolution, which was held constant in our current study. The additional flow control instructions tested in the FLOW-series architectures show some signs of improved success in this environment, with numerous outlier populations. Given additional time to evolve, these and other populations would likely be able to refine the emerging solutions. When features from all six instruction set groups are combined to form the HEADS-EX architecture, significant and substantial improvements relative to the base HEADS architecture are observed in six of the seven environments (Tables 16 and 17). Despite these improvements, there still remains a great deal of unexploited opportunity in five of the environments. Specific architectural changes to address these environments may yield greater results, such as the addition of an instruction capable of building arbitrary numbers for the Match-12 environment. However, such focused modifications could mask the need for more sweeping changes.
Even with significant gains under two instruction sets, the Logic-77 environment still shows room for substantial improvement, as median task success shows populations utilizing less than 55% of the opportunities present. Even more so, the Sort-10 and Navigation environments exploit less than 1% of the available potential.
It is clear from this present study that we have just started to identify the most effective genetic hardware for adaptive evolution in digital organisms and there remains room for significant future improvement. Indeed, our current study has focused on modifications within the framework of von Neumann machine code formalisms. We expect that further studies of instruction set architecture enhancements for evolvable systems, both within the limits of von Neumann architectures and the broader range of programming formalisms, will unlock this potential, facilitating advancements in the application of digital evolution and artificial life.