The authors have declared that no competing interests exist.

Conceived and designed the experiments: SW FF JK JW. Performed the experiments: FF JK JW. Analyzed the data: SW FF JK JW. Contributed reagents/materials/analysis tools: SW FF JK JW. Wrote the paper: SW FF JK JW.

Current Address: ul. Piotrowo 2, 60-965 Poznan, Poland

Crowdsourcing, understood as outsourcing work to a large network of people in the form of an open call, has been used successfully many times. One particularly interesting variant is the crowdsourced serious game: a computer game whose players solve a scientific problem by playing. Our main objective was to verify whether such an approach could be successfully applied to the discovery of mathematical equations that explain experimental data gathered during the observation of a given dynamic system. Moreover, we wanted to compare it with an approach based on artificial intelligence that uses symbolic regression to find such formulae automatically. To achieve this, we designed and implemented an Internet game in which players attempt to design a spaceship representing an equation that models the observed system. The game was designed to be easy to use for people without strong mathematical backgrounds. Moreover, we tried to make use of the collective intelligence observed in crowdsourced systems by enabling many players to collaborate on a single solution. The idea was tested with several hundred players, who played almost 10,000 games, and with a user opinion survey. The results indicate that the proposed solution has very high potential. The function generated during weeklong tests was almost as precise as the analytical solution of the model of the system and, up to a certain complexity level of the formulae, it explained the data better than the solution generated automatically by Eureqa, the leading software application for symbolic regression. Moreover, we observed the benefits of crowdsourcing: the chain of consecutive solutions that led to the best solution was obtained through the continuous collaboration of several players.

Among the many interesting trends associated with the development of technology, we can distinguish two that result directly from its popularization. First, the rapid increase in the number of computers, mobile devices, sensors, and other electronics has caused an exponential increase in the amount of data collected, which, unfortunately, is not accompanied by an equally rapid development in techniques for data processing and knowledge discovery [

The trends described above are probably the main reason for the rapid growth in the popularity of crowdsourcing. The term “crowdsourcing” was introduced in 2006 by Jeff Howe [

An interesting application of the crowdsourcing concept is in the implementation of computer games whose objective is to solve some scientific problem by playing the game. These games, which are called crowdsourced serious games [

The great success of crowdsourcing in so many disciplines suggests that it could also be successfully applied to the automatic discovery of models to explain data. For many centuries, scientists and mathematicians have attempted to find natural laws that explain the world surrounding them and models of various dynamic systems. Given the development of artificial intelligence, especially with regard to evolutionary algorithms [

The main objective of the present study was to verify whether crowdsourcing could be successfully applied to the discovery of mathematical equations that explain data gathered from dynamic systems. To achieve this goal, we designed and implemented a game in which players attempt to design a spaceship that represents an equation that models a given system. Additional objectives during the design of the game were to prepare a game targeted at people without advanced mathematical backgrounds and to make use of the collective intelligence observed in these crowdsourced systems. The game was tested by several hundred players in almost 10,000 games. Finally, we compared the results with the approach based on symbolic regression and analysed the users’ opinions of the game.

The game was designed as a web application, “Throw the hamster” (available at

The tree of upgrades to the spaceship represents the function

If the tested solution turns out to be one of the best, it appears in the ranking list. Ranked models are publicly available for everyone to see. Moreover, anyone can use them for further modifications and, thus, incrementally construct a new solution based on them. As a result, players do not need to build their own models from scratch; they can improve the most accurate solutions, thus enabling them to boost the score creatively but with minimal work and time invested.

Each solution prepared by players was evaluated via an objective function. For this purpose, we used the standard mean absolute error, defined using the following formula:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - f(t_i) \right|$$

Here, $n$ is the number of data points, $y_i$ denotes the value of the data point recorded at time $t_i$, and $f(t_i)$ is the value predicted by the player’s solution. As the solution becomes more accurate, the value of the objective function decreases, eventually reaching a very small value close to 0. The scoring procedure defined in
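The mean absolute error used as the objective function can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the function name, the sample data and the two candidate models are hypothetical.

```python
from typing import Callable, Sequence


def mean_absolute_error(
    times: Sequence[float],
    values: Sequence[float],
    model: Callable[[float], float],
) -> float:
    """Average absolute difference between recorded values and model predictions."""
    if len(times) != len(values) or len(times) == 0:
        raise ValueError("times and values must be non-empty and of equal length")
    total = sum(abs(y - model(t)) for t, y in zip(times, values))
    return total / len(times)


# Hypothetical measurements (not the HCV data used in the study).
times = [0.0, 1.0, 2.0, 3.0]
values = [1.0, 3.0, 5.0, 7.0]


def exact_model(t: float) -> float:
    # Reproduces the hypothetical data exactly.
    return 2.0 * t + 1.0


def shifted_model(t: float) -> float:
    # Overestimates every data point by 0.5.
    return 2.0 * t + 1.5


error_exact = mean_absolute_error(times, values, exact_model)
error_shifted = mean_absolute_error(times, values, shifted_model)
```

A more accurate candidate yields a smaller error, so a ranking built on this function places `exact_model` ahead of `shifted_model`.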

A second function measures the complexity of a solution so that it can be compared with the results of the Eureqa software (described later). For this reason, we adopted the method that Eureqa uses to calculate the complexity of mathematical formulae: we assigned a weight to each operation that could appear in a formula and then summed the weights of all operations in it. The weights are the same as the default weights used in the Eureqa software, and they are presented in

Operation | Weight | Operation | Weight |
---|---|---|---|
addition | 1 | negation | 1 |
subtraction | 1 | exponentiation | 5 |
multiplication | 1 | logarithm | 4 |
sine | 3 | natural logarithm | 4 |
cosine | 3 | division | 2 |
tangent | 4 | constant | 1 |
cotangent | 4 | variable | 1 |
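The weighted-sum complexity measure can be illustrated with a short Python sketch. The weights are copied from the table; the tree representation (dictionaries with `kind` and `children` keys) is a hypothetical choice made for this example, and the sketch assumes that every node of the tree, including each occurrence of a constant or variable, contributes its weight once.

```python
# Weights taken from the table of operation weights.
WEIGHTS = {
    "addition": 1, "subtraction": 1, "multiplication": 1, "division": 2,
    "negation": 1, "exponentiation": 5,
    "logarithm": 4, "natural logarithm": 4,
    "sine": 3, "cosine": 3, "tangent": 4, "cotangent": 4,
    "constant": 1, "variable": 1,
}


def complexity(node: dict) -> int:
    """Sum the weight of this node and of all nodes below it."""
    own = WEIGHTS[node["kind"]]
    return own + sum(complexity(child) for child in node.get("children", []))


# Hypothetical formula: sin(t) / (2 + t)
formula = {
    "kind": "division",
    "children": [
        {"kind": "sine", "children": [{"kind": "variable"}]},
        {
            "kind": "addition",
            "children": [{"kind": "constant"}, {"kind": "variable"}],
        },
    ],
}

formula_complexity = complexity(formula)
```

For this example the sum is 2 (division) + 3 (sine) + 1 (variable) + 1 (addition) + 1 (constant) + 1 (variable) = 9.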

The testing procedure was conducted based on the data on Hepatitis C Virus (HCV) infections provided by Dahari et al. [

The tests were conducted using three iterations. After each iteration, the feedback from users was collected, the game design was analysed, bugs were fixed, and some new features were implemented. The statistics summarizing each iteration are presented in

The number of games played equals the number of hamster launches. The Time column presents the total time spent by all players on playing the game. The average duration of a single game was 29.8 s.

Iteration | Players | Sessions | Games played | Games per user | Time |
---|---|---|---|---|---|
#0 | ∼20 | ∼40 | ∼400 | ∼20 | ∼3.3 h |
#1 | 616 | 928 | 7,628 | 12.38 | 63.1 h |
#2 | 90 | 212 | 1,525 | 16.94 | 12.6 h |
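The derived columns of the table follow from the raw counts and the average game duration of 29.8 s. The short Python check below recomputes them for the two large iterations; the variable and function names are chosen for this example only.

```python
AVERAGE_GAME_SECONDS = 29.8  # average duration of a single game, from the table note

# Raw counts from the table: iteration -> (players, games played).
ITERATIONS = {
    "#1": (616, 7628),
    "#2": (90, 1525),
}


def total_hours(games_played: int) -> float:
    """Total playing time in hours, assuming every game lasts the average duration."""
    return games_played * AVERAGE_GAME_SECONDS / 3600.0


def games_per_user(players: int, games_played: int) -> float:
    return games_played / players


summary = {
    name: (round(games_per_user(players, games), 2), round(total_hours(games), 1))
    for name, (players, games) in ITERATIONS.items()
}
```

Both rows reproduce the reported values: 12.38 games per user and 63.1 h for iteration #1, and 16.94 games per user and 12.6 h for iteration #2.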

The preliminary iteration of tests (#0) consisted of internal tests on a small group of players. It helped to detect several bugs and a few problems related to the user experience. Statistics for this iteration are only estimates because bugs in the code and frequent database updates prevented their precise calculation. The first large-scale iteration (#1) served mainly to verify the proposed concept. The game was published on the Internet and presented to a wider group of people. We found several minor bugs, but, most importantly, we identified several misconceptions in the game’s design. We corrected them and proceeded to the final iteration of tests (#2). The results of this final iteration were also compared to the output of the Eureqa software. Details on the results are presented in the following sections.

The study involved Internet users who played the online game and were recruited through messages published on social networks. All statistics used in the research were collected anonymously, and all players were informed about this, so, according to Polish law, there was no need to collect consent from participants or obtain approval from the institutional review board. The authors did not have access to any potentially identifying information at any point of the study (including user IP addresses).

We used version 0.99.9 of the Eureqa software, which was the most recent version when the experiment was conducted. The search was executed using the default values of parameters and the same mathematical operations, complexity weights and objective function as in the final iteration of the game (#2). The search was executed for 18 h using four cores of a 64-bit i7 CPU and, during that time, evaluated 1.5 × 10^{11} formulae.

During the two large test iterations, a survey was carried out to collect feedback on the game. The survey consisted of seven questions. We asked players how they liked the game and what they wanted to change about it. We also checked whether they were aware that this game was designed not only to be fun for the players but also to solve an important scientific problem. Finally, we asked about the players’ attitudes toward mathematics to check whether there was a correlation between mathematical skills and the results in the game. Fifty-seven players out of 726 completed the survey. Detailed questions and all collected answers are presented in

The main technical objective of the project was to make the application available on the largest possible number of platforms. The game was implemented using the latest portable technologies, including HTML5, CSS3, JavaScript, PHP and MySQL. Players can sign in with a Facebook or Google+ account or play anonymously. All screens follow the single-page-application pattern to provide a fluid user experience. The backend of the game stores all solutions in a database, which supports advanced analysis using SQL. Every solution has the structure of a tree in which the nodes are mathematical operations, variables or constant values. The tree is stored in JSON format, which supports interoperability.
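To illustrate how a solution tree stored as JSON can be turned back into a function, the following Python sketch parses and evaluates one. The JSON schema (the keys `op`, `args`, `var` and `const`, and the operation names) is an assumption made for this example; the format actually used by the game is not described here.

```python
import json
import math

# Hypothetical stored solution representing f(t) = 6 / (t + 1).
STORED_SOLUTION = """
{"op": "div", "args": [
    {"const": 6.0},
    {"op": "add", "args": [{"var": "t"}, {"const": 1.0}]}
]}
"""

# Hypothetical operation names mapped to their implementations.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "neg": lambda a: -a,
    "sin": math.sin,
    "cos": math.cos,
    "ln": math.log,
}


def evaluate(node: dict, t: float) -> float:
    """Recursively evaluate a solution tree at time t."""
    if "const" in node:
        return node["const"]
    if "var" in node:
        return t  # the only variable in this sketch is time
    arguments = [evaluate(child, t) for child in node["args"]]
    return OPERATIONS[node["op"]](*arguments)


tree = json.loads(STORED_SOLUTION)
value_at_two = evaluate(tree, 2.0)
```

Because the tree is plain JSON, the same stored solution could be read by the browser frontend, the PHP backend or an offline analysis script.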

As presented in

In the first phase, the influence of trigonometric functions is easily visible. Moreover, when the denominator of the component containing the cosine function approaches 0, we can observe spikes in function values.

This section compares the best functions constructed by players in each test iteration with the results found by the Eureqa software. Eureqa does not return a single solution but a set of them: it stores the best solution for each level of formula complexity. From this set, we report two solutions: the best one overall and the best one with a complexity less than or equal to that of the solution constructed by players during the second iteration of tests. For each function, we present the mean absolute error, the score presented to players in the game, and the complexity of the function. The functions can usually be simplified using basic arithmetic operations, but we avoided this to present the raw form of the formulae generated by players or Eureqa.
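The selection of the two Eureqa solutions used in the comparison can be sketched as follows. The candidate list is hypothetical (it does not reproduce Eureqa's output), and the class and function names are chosen for this example.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    complexity: int
    error: float  # mean absolute error; lower is better
    formula: str


# Hypothetical set with one best solution per complexity level.
CANDIDATES = [
    Candidate(5, 0.90, "a*t + b"),
    Candidate(12, 0.40, "a*exp(b*t)"),
    Candidate(30, 0.15, "a*exp(b*t) + c/(t + d)"),
    Candidate(55, 0.05, "a*exp(b*t) + c*exp(d*t) + e/(t + g)"),
]


def best_overall(candidates: List[Candidate]) -> Candidate:
    """Lowest-error solution regardless of complexity."""
    return min(candidates, key=lambda c: c.error)


def best_within(candidates: List[Candidate], max_complexity: int) -> Optional[Candidate]:
    """Lowest-error solution whose complexity does not exceed the limit."""
    eligible = [c for c in candidates if c.complexity <= max_complexity]
    return min(eligible, key=lambda c: c.error) if eligible else None


# Suppose the players' best solution had complexity 30 (a made-up value).
unconstrained = best_overall(CANDIDATES)
constrained = best_within(CANDIDATES, 30)
```

Comparing at equal or lower complexity keeps the comparison fair, because a more complex formula can usually fit the data more closely.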

The best solution constructed by players. Mean absolute error:

The best solution constructed by players. Mean absolute error:

The best solution found by Eureqa with complexity less than or equal to the best solution from the second iteration of tests. Mean absolute error:

The best solution found by Eureqa. Mean absolute error:

The best result was obtained during the first iteration of tests thanks to the use of trigonometric functions; it outperformed even the best solution generated by Eureqa. In the much more realistic case modelled during the second iteration of tests, the result is much worse; however, it is still slightly better than the result generated by Eureqa for the same complexity. When not constrained by the complexity level, however, the result generated by Eureqa is better than the solution constructed by the players. The probable reason is that players have difficulty handling complex solutions, which is discussed in detail later.

The time spent on calculating solutions in both approaches was comparable. Eureqa calculated the solution in approximately 18 h. In the second half of this time, improvements to the solution were very small, and by the end the solution had stopped improving. In iteration #1, the total time spent by all players on playing the game was approximately 63 h, more than three times longer. However, in iteration #2, when more dedicated players were playing the game, the total time spent on solving the problem was approximately 13 h, a few hours shorter than the Eureqa run. Details are presented in

Figs

Most of the players constructed their solutions based on some other solution—their own solution or one of the best solutions constructed by other players. We define an increment as the difference between the score of the new solution and the base score.
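The increment, and the chain of base solutions behind any given solution, can be computed directly from records that store each solution's score and the solution it was based on. The sketch below uses hypothetical records and field order; it is not the schema of the published data set.

```python
from typing import List, Optional

# Hypothetical records: solution id -> (player id, score, id of the base solution or None).
SOLUTIONS = {
    1: ("p1", 500, None),
    2: ("p2", 560, 1),
    3: ("p3", 540, 1),
    4: ("p1", 900, 2),
}


def increment(solution_id: int) -> Optional[int]:
    """Score of the new solution minus the score of its base (None if built from scratch)."""
    _, score, base_id = SOLUTIONS[solution_id]
    if base_id is None:
        return None
    return score - SOLUTIONS[base_id][1]


def improvement_chain(solution_id: int) -> List[int]:
    """Follow base links back to the first solution; return ids oldest first."""
    chain = []
    current: Optional[int] = solution_id
    while current is not None:
        chain.append(current)
        current = SOLUTIONS[current][2]
    chain.reverse()
    return chain


best_id = max(SOLUTIONS, key=lambda sid: SOLUTIONS[sid][1])
chain = improvement_chain(best_id)
contributors = {SOLUTIONS[sid][0] for sid in chain}
```

In this made-up example, the best solution (id 4) was reached through solutions 1 and 2, with increments of 60 and 340 points, and two different players contributed to the chain.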

Most increments were small, in the range of 0–100 points. Only a few players modified their functions in such a way as to achieve a score much higher than that of the base solution (by at least 300 points). This is consistent with the analysis of how the best solution was constructed (detailed data can be found in

The survey confirmed that the appropriate game design and method for sharing information about the game ensured that people were aware of the scientific objective of the game. Eighty-two percent of them confirmed that they read the description explaining the scientific background of the game, and another 16% admitted that, although they had not read it, they were aware that there was some scientific aim. Many players declared that they played the game several times regardless of the fact that the game was not very interesting to them. This could suggest that they really understood the significance of the game and continued to play because of the scientific objective. Moreover, most players were more likely to recommend the game to others than not.

The survey also suggested some improvements that could be introduced to the game. Many people complained that they did not understand how adding upgrades to the spaceship influences its flight. This problem was partially solved during the second phase of tests by adding a description to each of the upgrades, which explained its influence and significantly improved the reception of the game. In this iteration, for each upgrade, we provided the mathematical operation that it represents and an explanation for people with less mathematical knowledge, for example,

All collected opinions are presented in

The objective of the research was to verify whether it is possible to use a crowdsourced game to solve the problem of finding mathematical formulae to explain experimental data. To answer this question, we implemented a simple web game and integrated it with social networks. Based on the large number of games played (almost 10,000), we can conclude that the verification was successful. A group of people could, in a relatively short time, construct a solution better than that found by the leading software application that uses artificial intelligence algorithms based on symbolic regression. However, it should be noted that both approaches work only for the discovery of a single equation. Their application would be much more interesting if they could find a set of equations or, even better, a set of differential equations, because such a model could be used more easily to explain how the system described by these equations works in reality. Nevertheless, these are much more complex problems that require separate consideration.

However, the artificial intelligence methods performed better for very complex formulae. The reason for this is probably the problem with controlling complex solutions manually by humans. This can be clearly seen in

The key observed advantage of crowdsourcing is the collaboration of many players. The creation of the best solution was possible thanks to the cooperation of 17 players who constructed a chain of more than 100 improvements (see

Another interesting conclusion from the tests is the observation of how quickly players realized that using trigonometric functions could easily improve the solution’s score. This was an obvious error in the game design that was successfully corrected before the second phase of tests. It is also worth noting that, for players, the biggest problem in the game was understanding how a change in the design of the spaceship influenced its flight trajectory. According to the user survey, it was a problem or a large problem for 40% of players. This was partially solved before the second phase of tests by adding a more detailed description of each upgrade; however, this solution can be improved further to obtain better results by including some type of tutorial.

One of the important conclusions from the tests and user opinion survey is that the game itself should be more interesting and engaging to attract players. The many volunteers that participated in the tests of the game have already proved that the implemented approach could be successful from a scientific point of view. However, they emphasized that the game should be more attractive to players to stimulate them to play the game for a longer time. This is why we have currently suspended the search for new players and are preparing a new version of the game that will be more interesting for users to play while addressing the most serious concern observed during the tests—difficulties in understanding how the design of the spaceship influences its flight.

To summarize the article, we presented a novel approach for finding mathematical formulae that explain experimental data gathered from the analysis of dynamic systems. The solution is a crowdsourced serious game, which proved to be very successful in solving this problem. There are still some drawbacks that must be solved before widespread implementation of this method, but they were identified during the research and well defined, and we have some ideas on how to solve them. Currently, the game can be classified as a very difficult puzzle game, but adding some action elements to the simulation of the hamster’s flight could make it much more entertaining. The best proof of how a minimal, score-based action game can engage millions of players can be provided, for example, by the success of the simple Flappy Bird game [

Video that demonstrates an example interaction with the game: construction of the spaceship, its flight and selecting another user’s solution as a starting solution to improve the spaceship.

(MP4)

The table contains all answers to the closed-ended questions inside the survey. It also contains the description of each question and possible answers. The information about the game was spread among our friends; that is why most of the open-ended questions were answered in Polish, so we do not include them. The analysis of answers to open-ended questions is included in the article.

(XLSX)

Game results generated during the first and the second iterations of tests. Each row presents the solution designed by the player, its value, the player’s id and the base solution that was used.

(XLSX)

Chain of improvements that led to the best solution. It presents the sequence of solutions, each of which is based on the previous one, together with the value of the increment and the id of the player.

(XLSX)