Easyreporting simplifies the implementation of Reproducible Research layers in R software

In recent years, “irreproducibility” has become a general problem in omics data analysis due to the use of sophisticated and poorly described computational procedures. To avoid misleading results, it is necessary to inspect and reproduce the entire data analysis as a unified product. Reproducible Research (RR) provides general guidelines for public access to the analytic data and related analysis code combined with natural language documentation, allowing third parties to reproduce the findings. We developed easyreporting, a novel R/Bioconductor package, to facilitate the implementation of an RR layer inside reports and tools. We describe its main functionalities and illustrate the organization of an analysis report using a typical case study concerning the analysis of RNA-seq data. Then, we show how to use easyreporting in other projects to trace R functions automatically. This latter feature helps developers implement procedures that automatically keep track of the analysis steps. Easyreporting can support the reproducibility of any data analysis project and is particularly advantageous for the implementation of R packages and GUIs. It is especially helpful in bioinformatics, where the complexity of the analyses makes it extremely difficult to trace all the steps and parameters used in a study.

a second part which uses these results with one or more web services or third party tools, getting additional results, and a last part of R code which uses as input a selection of the previous result files? I guess you could generate a report for the R dependent parts, but I'm missing your recommendation for this kind of scenario in the manuscript or in supplementary materials.
I'm also missing some clearer hints to the readers of your manuscript related to provenance tracking or provenance injection in the generated report, which would help to obtain all the supplementary data needed to successfully reproduce the analysis.
R1 -We thank the referee for the interesting comments. Easyreporting is an R package. Therefore, it supports reproducible research when the analyses are performed within the R environment. In a mixed scenario, preserving full reproducibility is more challenging. In the Conclusion section, we added a discussion about how to preserve reproducibility in such cases and when reproducibility might be lost.
"However, in omics data analysis, it is common to use mixed scenarios and combine different programming languages depending on the availability of specific methods and functionalities in the community, or to integrate the analysis with queries to on-line databases. \textit{Easyreporting} can also provide support for these cases, at least up to a certain level. Using interface functions between R and other programming languages, we can execute under R a few steps of an analysis implemented in other languages. For example, thanks to the R package \textit{reticulate}, it is possible to run some Python functions under R. Similarly, using suitable API functions, we can easily interface R with on-line databases and execute the queries within the R environment. For example, using the R packages \textit{GEOquery} or \textit{TCGAbiolinks}, we can query and process data from the Gene Expression Omnibus (GEO) and the Genomic Data Commons (GDC) Data Portal, respectively. Nowadays, many API packages in the Bioconductor repository allow interfacing with the most common biological databases. By contrast, we should underline that if the analyses are not performed within the R environment, the user must manually document the external steps (for example, in the current release, using comments, adding .bib files, or linking to external resources such as database identifiers). Unfortunately, if the manual documentation is not detailed, full reproducibility might be lost. Future releases of \textit{easyreporting} will provide additional tools for handling the mixed-case scenario."
Q2-Page 3, paragraph starting at line 73. I agree that it is relevant for the end user to realize that an analysis report is an instance of `easyreporting`, but is it relevant for the end user of the package to learn the technical detail of `easyreporting` being structured as an S4 class (http://adv-r.had.co.nz/S4.html)?
R2 -The class provided is a schematic representation of an rmarkdown file. For end-users who want to create a report, it is not relevant to know that it is an S4 class. However, for end-users who are developers and want to use `easyreporting` in other products, it is important, since an S4 class is treated differently from R5 or R6 classes.
Q3. I have also had a look at Supplementary Material 1 and its associated repository https://github.com/drighelli/easyreporting_supplementary , and I have found an issue that hinders reproducibility outside the installation where the report was generated. The file `rnaseq_report_live.Rmd` in the `Report_files` directory, generated from `report.R` (in the same directory), has the next sentence in each of its code blocks, which limits reproducibility, as the code blocks cannot be run as is. It would not work on a Linux or a Windows installation without extensive revision, for instance. Also, the report could disclose sensitive data related to these absolute paths.
R3 -We thank the reviewer for raising this issue with our original approach. We improved the methods that depend on external source scripts: they now automatically copy the imported scripts into the report folder, which solves this problem locally. This is reflected in the final supplementary analysis report, where the absolute paths in the source calls are replaced by the script name alone (the script having previously been copied into the same folder as the report).
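The copy-and-rename idea described in R3 can be illustrated with a minimal base R sketch. It is not the code used inside `easyreporting`: the helper `portable_source_call`, the script name `importData.R`, and the folder name are assumptions made only for this illustration.

```r
# Hypothetical helper (not part of easyreporting): copy a sourced script next
# to the report and return a source() call that uses only the file name,
# so that no absolute path ends up in the generated report.
portable_source_call <- function(script_path, report_dir) {
  stopifnot(file.exists(script_path))
  dir.create(report_dir, showWarnings = FALSE, recursive = TRUE)
  copied <- file.copy(script_path,
                      file.path(report_dir, basename(script_path)),
                      overwrite = TRUE)
  if (!copied) stop("could not copy ", script_path, " to ", report_dir)
  sprintf('source("%s")', basename(script_path))
}

# Usage with temporary files standing in for a real analysis script.
script <- file.path(tempdir(), "importData.R")  # hypothetical script name
writeLines("x <- 1 + 1", script)
report_dir <- file.path(tempdir(), "report_files_demo")
call_text <- portable_source_call(script, report_dir)
print(call_text)
```

Because the call refers only to the file name, the code block works on any machine as long as the report is compiled from the folder that contains the copied script.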
Q4-I have several recommendations for future releases of `easyreporting`. In analyses with more than one author, the package should allow showing all of them in the generated reports. I'm also missing richer ways to attach metadata related to a specific analysis: `easyreporting` should provide a mechanism to attach the optional ORCID, Researcher Id or similar of each author, in order to avoid ambiguity among researchers with similar names.
R4 -We thank the reviewer for these recommendations.
Following these suggestions, since version 1.3.2 it is possible to add more than one author, each accompanied by an email, ORCID, affiliation, affiliation website, and personal URL. We added this information in Section "General Description and Initialization", and a practical example is in the first chunk of code of Supplementary File 1. We also updated Table 1 with the new attributes.

Q5 -The report could optionally contain a list of the files used in the analysis, both sourced scripts and input files, along with their digest. This could help identify whether the report is stale because either the scripts or the input files have changed.
R5 -As described in point 3, we copy each script used into the report directory.
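For readers interested in the digest idea raised in Q5, the following base R sketch shows one way a file manifest could be recorded and later checked. It is only an illustration and is not a feature of `easyreporting`: the helpers `file_manifest` and `stale_files` and the file names are hypothetical, and MD5 (via `tools::md5sum`) is chosen simply because it ships with R.

```r
# Hypothetical helper (not part of easyreporting): build a manifest of the
# files used by an analysis, with an MD5 digest for each one.
file_manifest <- function(paths) {
  stopifnot(all(file.exists(paths)))
  data.frame(
    file = basename(paths),
    md5 = unname(tools::md5sum(paths)),
    stringsAsFactors = FALSE
  )
}

# Return the names of the files whose digest differs from the recorded one.
stale_files <- function(manifest, paths) {
  current <- file_manifest(paths)
  merged <- merge(manifest, current, by = "file",
                  suffixes = c("_old", "_new"))
  merged$file[merged$md5_old != merged$md5_new]
}

# Usage with temporary files standing in for an input file and a script.
counts <- file.path(tempdir(), "counts.csv")
script <- file.path(tempdir(), "normalize.R")
writeLines("gene,count\nA,1", counts)
writeLines("normalize <- function(x) x / sum(x)", script)
inputs <- c(counts, script)

manifest <- file_manifest(inputs)        # recorded when the report is built
writeLines("gene,count\nA,2", counts)    # the input changes afterwards
changed <- stale_files(manifest, inputs)
print(changed)
```

Storing such a manifest next to the report would let a reader detect that an input or script has changed since the report was compiled.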

Q6 -I'm missing some specific method which allows injecting high level annotations about the inputs.
If an analysis depends on specific public or available-on-request data which are not provided by R / Bioconductor packages, there should be a standardized or, at least, recommended way to tell where to find those inputs, or publicly accepted identifiers associated with the data (i.e. EGA or COSMIC ids, DOIs, etc.). These annotations in the generated report would help a lot when a researcher tries to reproduce the described analysis, but this depends on the report authors' collaboration.
R6 -We thank the referee for this suggestion.
Since version 1.3.2 on Bioconductor 3.13, we provide two methods for adding further resources to the final report. Please note that Bioconductor 3.13 is currently the development version and will be released as stable at the beginning of next April.
The first one is based on a bibliography (.bib) file that needs to be specified during the report creation; the articles cited from it are included in the final report when it is compiled.
We added this information in the "General description and Initialization" Section: "Additionally, during the class creation, it is possible to define a bibliography latex file through the \textit{bibliography} argument, that will be compiled with the report (see table \ref{tab1} for additional details)."
The second one is a more free-form approach in which the user can provide additional resources in the form of source-reference-description with the method addResource(). This method collects the resources in a data.frame that is added to a specific section named "Resources Availability" in the final report during the compilation.
We referred to this approach in the "General Description and Implementation" Section: "Additionally, this process creates two other optional sections, one with the cited references when a bibliography file has been specified during the easyreporting instance creation, and another named \textit{Resources Availability} with the resources specified with the \textit{addResource} method (see Supplementary Section 3.9 for more detail)." We updated Table 1 with the new attributes and functionalities.
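The source-reference-description idea can be pictured with the base R sketch below. It is a stand-alone illustration and does not reproduce the package's `addResource()` method or its signature: the helpers `add_resource_row` and `resources_section`, the column names, and the placeholder entries are all assumptions.

```r
# Hypothetical stand-in for the idea behind addResource(); the function and
# column names are illustrative and are not the package's API.
add_resource_row <- function(resources, source, reference, description) {
  rbind(resources,
        data.frame(source = source, reference = reference,
                   description = description, stringsAsFactors = FALSE))
}

# Render the collected resources as markdown lines for a report section.
resources_section <- function(resources) {
  header <- c("# Resources Availability", "",
              "| Source | Reference | Description |",
              "|---|---|---|")
  rows <- sprintf("| %s | %s | %s |", resources$source,
                  resources$reference, resources$description)
  c(header, rows)
}

# Usage with placeholder identifiers (not real accessions).
resources <- data.frame(source = character(), reference = character(),
                        description = character(), stringsAsFactors = FALSE)
resources <- add_resource_row(resources, "GEO", "ACCESSION-PLACEHOLDER",
                              "raw count matrix")
resources <- add_resource_row(resources, "DOI", "DOI-PLACEHOLDER",
                              "annotation file")
section_lines <- resources_section(resources)
writeLines(section_lines)
```

Keeping the resources in a data.frame until compilation means that the section can be regenerated whenever a resource is added.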

Reviewer #2:
The authors present a new R package to facilitate the creation of reproducible research reports. Their package can be added to a growing number of resources for this important task, and it's commendable that researchers from within the *omics are investing time and ingenuity to improve the mechanics of the modern, largely computational scientific process in these fields.
So while this is by itself a great goal, the paper did not convince me that the suggested workflow for reporting script-based analysis is superior to using simple RMarkdown reports. RMarkdown is not (!) a complicated markup language, and learning it appears to me less involved than using the easyreporting wrapper functions, which add another layer of complexity to the already long RMarkdown rendering pipeline.
Q1-So I would suggest to the authors not to focus on this first (rather theoretical) use case of easyreporting, but to go all in on the second one, which is the ability to add a reporting level to Shiny applications. Here the paper indeed presents a novel solution which might be relevant AND practical. Of course the paper then only addresses the admittedly small group of R Shiny developers, but again I do not see how normal R users outside of this circle would benefit from the suggested additional layer on top of RMarkdown.
R1 -We thank the reviewer for this comment. We agree that the main advantage of "easyreporting" is to support developers in implementing RR layers in other tools such as GUIs or other R packages. Despite that, we do not think that the example of using "easyreporting" for writing an analysis report should be removed, or that the use of "easyreporting" should be limited to the development of Shiny interfaces.
We agree with the referee that if the user is only interested in writing analysis reports, there might be other alternatives to "easyreporting". We also agree that learning rmarkdown is not a problem. Therefore, we toned down the presentation of "easyreporting" in that context, and we made a clearer statement about the advantages of "easyreporting" for developers.
In this context, we believe that "easyreporting" can be used not only inside Shiny applications (as shown in our example), but in principle also in any general Graphical User Interface. Moreover, it can also be used in any R package or any user-defined R script. We modified the manuscript to make this statement clearer.
Q2 -So accordingly I suggest changing the title, wording and section order of the paper to emphasise the Shiny-enhancing feature of the package.
R2 -As mentioned in the previous point, we tried to make this point clearer without changing the structure of the manuscript.
Q3 -Here I would also ask the authors to state more clearly that this is only intended to work with Shiny apps, not any GUI framework. Shiny is of course only one of many GUI systems and it should be clear from the beginning that easyreporting can only provide a human-readable logging system for this specific one. Shiny is simple and popular at the moment, but also too slow and fickle for many of the large-data applications in the *omics.