PLOS Pathogens

Fig 1.

Workflow for aggregating and transforming laboratory-derived experimental results for data science use.
Prior to formal analyses, diverse experimental data must be aggregated and organized into tidy data files (see Fig 2 for examples of different data types that may be sourced for this purpose). Next, additional data transformation steps will likely take place, concurrent with initial exploratory analyses to identify and refine key parameters (see Fig 2 for examples of different considerations that can modulate data transformation outcomes). Once these steps (outlined in green) are completed, hypothesis-driven research questions and model development can take place (in isolation or in tandem), such as generation of simple statistical models (outlined in blue) and ML models (outlined in purple). Establishing ML models may require optimizing additional experimental considerations before models are trained, tested, and refined. Best practices include validating models with new, independent data and using cross-validation methods so that predictive performance reflects the underlying signal rather than noise in the data.
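The cross-validation step mentioned in this caption can be sketched in a few lines. The example below is a minimal illustration only, assuming a single numeric predictor and a simple least-squares line as the model; the function names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical and are not taken from the article or from any particular library.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k spreads any remainder across the first folds.
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training rows, then score on rows the model never saw.
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return fold_errors
```

Each observation is held out exactly once, so the per-fold errors estimate how the model behaves on data it was not fitted to. Validation with new, independent experimental data, as the caption recommends, remains a separate step that this sketch does not replace.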


Fig 2.

Considerations when aggregating and tidying in vivo and molecular/laboratory data.
In vivo-generated data can encapsulate a wide range of serially collected and/or discrete (stand-alone) specimens, observations, and experimental outcomes. Results from in vivo experimentation are frequently contextualized with a diversity of pathogen sequence-based information and laboratory-based assays. Examples of data types within these groupings are shown on the left-hand side of this figure. Depending on the data type, there is a range of options available for distilling complex laboratory-based readouts into the discrete values that many data science applications require; these decisions can meaningfully impact the conclusions drawn from the work. Examples of how complex data can be tidied for this purpose for each data type are shown on the right-hand side of this figure. AUC, area under the curve; RBS, predicted receptor binding preference; PA, predicted polymerase activity. Data types and analysis considerations are representative only and do not encapsulate all potential parameters used in data science applications employing in vivo data. Image generated entirely by CDC illustrators by hand.
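As one illustration of distilling serially collected readouts into discrete values, the sketch below computes an AUC with the trapezoidal rule and a peak value per animal. It is a rough example under stated assumptions: the record layout (`animal`, `day`, `log10_titer`), the function names, and all numbers are invented for illustration and do not come from the article.

```python
def trapezoid_auc(times, values):
    """Area under the piecewise-linear curve through (time, value) points."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    points = sorted(zip(times, values))  # order observations by time
    area = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        area += (t1 - t0) * (v0 + v1) / 2.0
    return area


def summarize(records):
    """Collapse tidy per-day rows into one AUC and one peak value per animal."""
    by_animal = {}
    for row in records:
        by_animal.setdefault(row["animal"], []).append(
            (row["day"], row["log10_titer"])
        )
    summary = {}
    for animal, series in by_animal.items():
        days, titers = zip(*series)
        summary[animal] = {"auc": trapezoid_auc(days, titers), "peak": max(titers)}
    return summary


# Made-up tidy records (one row per animal per sampling day), for illustration only.
records = [
    {"animal": "A1", "day": 1, "log10_titer": 4.0},
    {"animal": "A1", "day": 3, "log10_titer": 6.0},
    {"animal": "A1", "day": 5, "log10_titer": 2.0},
    {"animal": "A2", "day": 1, "log10_titer": 3.0},
    {"animal": "A2", "day": 3, "log10_titer": 5.0},
    {"animal": "A2", "day": 5, "log10_titer": 5.0},
]

summary = summarize(records)
```

In this invented example both animals have the same AUC (18.0) but different peak values (6.0 and 5.0), which shows how the choice of summary statistic can change which animals look alike.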
