Ten Simple Rules for Effective Computational Research

To understand the complexity inherent in nature, researchers in the life sciences increasingly employ mathematical, statistical and computational techniques. In particular, the use and development of software tools is becoming vital for investigating scientific hypotheses, and a wide range of scientists are finding that software development plays a more central role in their day-to-day research. In fields such as biology and ecology, there has been a noticeable trend towards the use of quantitative methods both for making sense of ever-increasing amounts of data [1] and for building or selecting models [2].
 
As Research Fellows of the “2020 Science” project (http://www.2020science.net), funded jointly by the EPSRC (Engineering and Physical Sciences Research Council) and Microsoft Research, we have firsthand experience of the challenges associated with carrying out multidisciplinary computation-based science [3]–[5]. In this paper we offer a jargon-free guide to best practice when developing and using software for scientific research. While many guides to software development exist, they are often aimed at computer scientists [6] or concentrate on large open-source projects [7]; the present guide is aimed specifically at the vast majority of scientific researchers: those without formal training in computer science. We present our ten simple rules with the aim of enabling scientists to be more effective in undertaking research and therefore maximise the impact of this research within the scientific community. While these rules are described individually, collectively they form a single vision for how to approach the practical side of computational science. 
 
Our rules are presented in roughly the chronological order in which they should be undertaken, beginning with things that, as a computational scientist, you should do before you even think about writing any code. For each rule, guides on getting started, links to relevant tutorials, and further reading are provided in the supplementary material (Text S1).


Introduction
This supplementary material, for the paper 'Ten Simple Rules for Effective Computational Research', is designed to provide more detailed guides to getting started with the rules specified in the paper.
It is broken up into sections, which map onto the rules. Each section contains the following information: a short guide/tips for 'Getting started'; 'Useful links' to articles and websites of interest for the general reader to help with getting started; and a list of more in-depth references for 'Further reading'.
While we have striven to be as complete as possible, we could not hope to provide an exhaustive list of software and links. Drawing on our varied interests, we have compiled a set of links and references that should serve as a solid starting point.

Rule 1: Look before you leap
Getting started
• Talk to other people in your group to discover the sorts of software they use and whether you can make use of it.
• Be aware of licensing restrictions on software you want to use; the Open Source Initiative (http://opensource.org/licenses/category) and OSS Watch (http://www.oss-watch.ac.uk/) are good websites to consult.
• An article on conducting systematic literature reviews in software engineering:
○ P.

Rule 2: Develop a prototype first
Getting started
• Consider prototyping your code by implementing a simplified version first, and build up the functionality over several steps.
• While programs written in lower-level languages such as C++ will usually run faster in the long run, high-level languages such as those listed below offer useful debugging tools and other inbuilt functionality to help speed up prototype development.
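As an illustration of building up functionality over several steps, here is a minimal sketch in Python. The model (population growth integrated with a forward Euler update) and the names `simulate_v1` and `simulate_v2` are our own hypothetical choices, not taken from any particular project. The second version adds one feature, a carrying capacity, and reduces to the first when that feature is switched off, so each step can be checked against the previous one.

```python
def simulate_v1(n0, rate, steps, dt=0.1):
    """Step 1: the simplest version, unconstrained exponential growth."""
    n = n0
    trajectory = [n]
    for _ in range(steps):
        n = n + dt * rate * n  # forward Euler update
        trajectory.append(n)
    return trajectory


def simulate_v2(n0, rate, steps, dt=0.1, capacity=None):
    """Step 2: add an optional carrying capacity (logistic growth).

    With capacity=None the growth is unconstrained, exactly as in
    simulate_v1, so the earlier version acts as a check on this one.
    """
    n = n0
    trajectory = [n]
    for _ in range(steps):
        limit = 1.0 if capacity is None else 1.0 - n / capacity
        n = n + dt * rate * n * limit
        trajectory.append(n)
    return trajectory


if __name__ == "__main__":
    # The extended prototype must reproduce the simple one.
    assert simulate_v1(10.0, 0.5, 50) == simulate_v2(10.0, 0.5, 50)
    print("final population with capacity:",
          simulate_v2(10.0, 0.5, 200, capacity=100.0)[-1])
```

Only once a prototype like this behaves as expected is it worth considering whether any part of it needs rewriting in a faster language.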

Rule 3: Make your code understandable to others (and yourself)
Getting started
• Making your code easily understood will help you, and others, maintain and debug it. This involves not just the code itself, but also descriptive comments. Consider what you would want to find if you were looking at someone else's code for the first time: your code should tell you how something is done and your comments should tell you why.
• Start with the basics: name and date each section of code you write (or consider using the date it was last edited

Rule 4: Don't underestimate the complexity of your task
Getting started
There are some basic computational tools that will pay dividends if they are learnt early. Some are specific to coding practice, while others are also useful for presenting your work:
• The Linux command line, as well as a command line text editor such as emacs or vi.
• Simple text file processing tools, such as sed and awk, and grep for searching for phrases in files (useful when you are debugging).
• Scripting languages such as bash (or more usefully perl or python).
• A build utility such as make, CMake or SCons.
• A job scheduler (if you don't want to be sat at your computer overnight!) such as cron.
• LaTeX for presenting mathematics in written documents.
• A literature reference (bibliography) manager.
• And of course the appropriate manual and help pages for the packages you are using.
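To give a flavour of what a scripting language offers, the following sketch uses Python to search a set of files for a fixed phrase and report the line numbers on which it occurs, which is roughly what `grep -n -F` does on the command line. The function name `find_phrase` and the log file in the usage example are hypothetical.

```python
def find_phrase(paths, phrase):
    """Return (path, line_number, line) for every line containing phrase."""
    matches = []
    for path in paths:
        # errors="replace" keeps the search going on files with odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if phrase in line:
                    matches.append((str(path), line_number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python find_phrase.py ERROR run1.log run2.log
    phrase, *files = sys.argv[1:]
    for path, line_number, line in find_phrase(files, phrase):
        print(f"{path}:{line_number}: {line}")
```

For a one-off search grep itself is quicker to type; the advantage of the scripted version is that the matches come back as data that the rest of your analysis can use.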

Further reading
• The act of reviewing your code, looking for repeated functionality, and moving it into useful functions is commonly known as

Rule 5: Understand the mathematical, numerical and computational methods underpinning your work
Getting started
The development of any computational method should address a number of important issues:
• Ensure that your scientific approach is sound (modelling assumptions, suitability of analyses).
• Ensure that your chosen methods are accurate and stable.
• Ensure that your implementation is efficient and bug free.
While balancing these can be time-consuming and tricky, these issues underpin the 'correctness' of your approach and you should be aware of them in the development of your models and programs. For example, testing (Rule 8) can be useful for checking that you (or the library you are using) have implemented an algorithm correctly, by validating results against known solutions.
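Validation against a known solution can be sketched as follows. We assume, purely for illustration, the decay equation dy/dt = -k y, whose exact solution is y(t) = y0 exp(-k t), solved with the forward Euler method; the function names are hypothetical. Forward Euler is first-order accurate, so halving the step size should roughly halve the error, and a correct implementation should show that behaviour.

```python
import math


def euler_decay(y0, k, t_end, n_steps):
    """Approximate y(t_end) for dy/dt = -k*y using forward Euler."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * (-k * y)
    return y


def exact_decay(y0, k, t_end):
    """Known analytical solution of the same equation."""
    return y0 * math.exp(-k * t_end)


def errors_for(step_counts, y0=1.0, k=1.0, t_end=1.0):
    """Absolute error at t_end for each number of steps."""
    exact = exact_decay(y0, k, t_end)
    return [abs(euler_decay(y0, k, t_end, n) - exact) for n in step_counts]


if __name__ == "__main__":
    step_counts = [100, 200, 400]
    for n, err in zip(step_counts, errors_for(step_counts)):
        print(f"{n:4d} steps: error = {err:.2e}")
```

If the error failed to shrink at the expected rate, that would point to a bug in the implementation or to a method that is not stable for the chosen step size.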

Useful links
• MIT Open Courseware in Numerical Analysis: http://ocw.mit.edu/courses/mathematics/18-03sc-differential-equations-fall-2011/unit-i-first-order-differential-equations/numericalmethods/
• Consider profiling to quantify the computational cost of different parts of your programs.
• Investing more time and effort in visualisation and graphics can enhance everything you do, from exploring data and debugging code/models all the way to conference presentations, so reassess the amount of effort and time you will invest. Find some galleries of useful visualisations/graphics and reconsider what you could achieve.
• From the outset think about what you want to achieve and what the visualisation's purpose is. Who is it for? What is the key message? Writing a design brief (even quickly) can formalise these goals and provide a set of requirements against which you can evaluate your visualisation. Consider the audience and end-users at the start.
• Understand your workflow and what role your visualisations and graphics will have. Is it a throw-away graph to check for outliers in the data? Or will it really serve a purpose later on as well, e.g. in a presentation, software, a publication, or tutorial?
• Start with pen and paper before you start coding a visualisation and try different variations for the main axes of the figure, and the secondary axes. Understanding and thinking about the visual encodings you will use will save time later. If you can't say why you have made one design choice over another then go back to the drawing board.
• When implementing your design ensure that you write the visualisation as a function so that it can be re-used with any data set, and so any changes are easy to implement. For instance, if you are editing a paper and want to change the colour of some points you don't want to have to run all your models again. You should be able to supply all the data and code for a publication's figures as independent units.
• Treat your visualisations in the same way as you would text and test them out on your colleagues, friends and family. If it requires a PhD to read and understand your visualisations then you may need to go back to the drawing board. 'Cool' graphics serve a different purpose to 'informative' graphics so make sure that you are true to your design brief.
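The advice on writing a visualisation as a function can be sketched as below. The sketch assumes that model results are saved to a JSON file once, and that the figure is drawn by a separate function that takes the data and the point colour as arguments, so that changing a colour never requires rerunning a model. To stay free of dependencies it writes a bare SVG scatter plot using only the Python standard library; in practice your usual plotting library would take that role. All function and file names here are hypothetical.

```python
import json
from pathlib import Path


def save_results(path, xs, ys):
    """Store model output once, so figures can be redrawn without rerunning the model."""
    Path(path).write_text(json.dumps({"x": list(xs), "y": list(ys)}), encoding="utf-8")


def load_results(path):
    """Read back the stored model output as two lists."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["x"], data["y"]


def scatter_svg(xs, ys, colour="steelblue", size=300, margin=20, radius=3):
    """Return an SVG scatter plot of the points as a string."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero for constant data
    y_span = (max(ys) - y_min) or 1.0
    span = size - 2 * margin
    circles = []
    for x, y in zip(xs, ys):
        cx = margin + span * (x - x_min) / x_span
        cy = size - margin - span * (y - y_min) / y_span  # SVG y axis points down
        circles.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{radius}" fill="{colour}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(circles)
        + "</svg>"
    )


if __name__ == "__main__":
    save_results("results.json", [0, 1, 2, 3], [0, 1, 4, 9])  # done once, after the model run
    xs, ys = load_results("results.json")
    # Restyling the figure only touches this call, not the model.
    Path("figure.svg").write_text(scatter_svg(xs, ys, colour="firebrick"), encoding="utf-8")
```

Keeping the stored data and the plotting function together is one way of supplying a publication's figures as independent units.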

Useful links
Give your training in visualisation a reboot and read:
• http://processing.org/ - 'Processing is an open source programming language and environment for people who want to create images, animations, and interactions.'
• http://www.paraview.org/ - 'ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView's batch processing capabilities.'
• http://d3js.org/ - 'D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.'
• http://research.microsoft.com/enus/um/cambridge/groups/science/tools/datasetviewer/datasetviewer.htm - 'DataSet Viewer is a simple standalone menu-driven tool for quickly exploring and comparing time series, geographic distributions and other patterns within scientific data. DataSet Viewer combines selection, filtering and slicing tools, with various chart types (scatter plots, line graphs, heat maps, as well as tables), and geographic mapping (using Bing Maps). The resulting views can be exported as images or movies, or bundled into an interactive package that [can] be shared with colleagues.'

Further reading
• A book about visualisation in practice and the 'Processing' visualisation language: B.