An R Package for Computing Canadian Assessment of Physical Literacy (CAPL) scores and interpretations from raw data

The Canadian Assessment of Physical Literacy (CAPL) is the first comprehensive protocol designed to assess a child’s level of physical literacy. Current approaches to analyzing CAPL-2 raw data are tedious, inefficient, and/or can lead to computation errors. In this paper we introduce the capl R package (open source), designed to compute and visualize CAPL-2 scores and interpretations from raw data. The capl package takes advantage of the R environment to provide users with a fast, efficient, and reliable approach to analyzing their CAPL-2 raw data and a “quiet” user experience, whereby “noisy” error messages are suppressed via validation. We begin by discussing several preparatory steps that are required prior to using the capl package. These steps include preparing, formatting, and importing CAPL-2 raw data. We then use demo data to show that computing the CAPL-2 scores and interpretations is as simple as executing one line of code. This one line of code uses the main function in the capl package (get_capl()) to compute 40 variables within a matter of seconds. Next, we showcase the helper functions that are called within the main function to compute individual variables and scores for each test element within the four domains as well as an overall physical literacy score. Finally, we show how to visualize CAPL-2 results using the ggplot2 R package.


Introduction
Physical literacy is defined as "the motivation, confidence, physical competence, knowledge, and understanding to value and take responsibility for engagement in physical activities for life" [1]. It has been recognized as the foundation for lifelong healthy active living [2] and has subsequently influenced the work of numerous sectors, including physical activity, sport, recreation, education, and public health [1]. Although the construct of physical literacy has gained significant attention in recent years [3][4][5], early advocates emphasized its importance and highlighted the need for a comprehensive and objective measure of physical literacy as a means to understand the state of physical literacy in children, evaluate the effectiveness of physical activity programming initiatives, and increase the robustness of physical education assessment [6].
The Canadian Assessment of Physical Literacy (CAPL) was the first comprehensive protocol designed to assess a broad spectrum of skills and abilities that contribute to and characterize the physical literacy level of a participating child [2]. The CAPL was developed on the premise that a physically active child is more likely to possess adequate knowledge and understanding of physical activity, motivation and confidence, and physical competence than a physically inactive child. The first version of CAPL was developed and refined between 2009 and 2013 [7] and later revised (CAPL-2) in 2017. The CAPL-2 reflects revisions based on assessments of over 10,000 Canadian children and is the culmination of test development efforts, with input from well over 100 researchers and practitioners within related fields of study [8]. The CAPL-2 comprises four domains: physical competence, daily behaviour, motivation and confidence, and knowledge and understanding. Each domain consists of different test elements. These test elements can be scored and interpreted independently to provide an assessment of each attribute of physical literacy or can be combined to provide comprehensive scores for each domain. An overall physical literacy score can also be calculated using each of the four domain scores, with suggested interpretations, based on normative data from over 10,000 Canadian children [7], by age and gender. Numerical CAPL-2 scores are assigned to one of four categories: beginning, progressing, achieving, and excelling. The beginning and progressing categories include children who have not yet achieved the optimal level of physical literacy, the achieving category identifies children who have achieved a score associated with sufficient physical literacy, and the excelling category reflects children with a high level of physical literacy.
The number of published research studies using the CAPL/CAPL-2 continues to grow. In Canada alone, 14 papers from the Royal Bank of Canada Learn to Play–Canadian Assessment of Physical Literacy study (RBC Learn to Play–CAPL) were published in a supplemental issue of BMC Public Health (bmcpublichealth.biomedcentral.com/articles/supplements/volume-18-supplement-2). Data in each paper included approximately 10,000 children aged 8 to 12 years, recruited from several provinces across Canada. The CAPL-2 manual and materials have been translated into five languages and have been used internationally [9].
In this paper we introduce the capl R package, designed to compute and visualize CAPL-2 scores and interpretations from raw data. R is a programming language and free software environment for statistical computing and graphics (https://www.r-project.org/about.html). R is widely used among statisticians and data analysts, and is among the top 10 most popular programming languages according to the TIOBE Programming Community index (www.tiobe.com/tiobe-index). The capl package is open source and was built to provide users with a fast, efficient, and reliable approach to analyzing CAPL-2 raw data. Currently, users can analyze CAPL-2 raw data either manually or through the CAPL-2 website (www.capl-eclp.ca). Manually analyzing CAPL-2 raw data requires users to navigate through approximately 60 variables and perform dozens of tedious calculations which are derived from different cut-off criteria, existing variables, and newly created variables. Hence, this method is often time-intensive and can lead to errors in scores and interpretations due to incorrect calculations or data entry errors. The data entry feature on the CAPL-2 website reduces user burden associated with data manipulation and analysis by computing scores and interpretations for the user. The primary disadvantage of this feature, however, is that only one participant's data can be analyzed at a time, making this approach monotonous and time-intensive. As shown in Supplementary File A, users of the data entry feature on the CAPL-2 website are required to enter the raw data of every test element for each participant. Another disadvantage associated with the website is that some users from academic institutions seeking to analyze their raw data via the website are often prohibited from doing so because of privacy and ethical concerns raised by institutional research ethics boards. The capl package was specifically designed to address these issues.
As shown below, the capl package was developed to analyze raw data from hundreds or thousands of observations (i.e., participants) all at once, in one simple line of code. A number of helper functions in the package serve to validate the raw data in order to minimize errors and nonsensical scores and interpretations (see section 2.7.1).

Installation
Users can download and install the most recent version of the capl package directly from GitHub (www.github.com/barnzilla/capl) using the devtools R package.
Once the capl package is loaded, any available tutorials for the package can be accessed by calling the browseVignettes() function.
The name and description of each function included in the capl package are outlined in Table 1.

Importing raw data
Users must first import their raw data before using the capl package to compute CAPL-2 scores and interpretations. The import_capl_data() function enables users to import data from an Excel workbook into the R global environment.

Required variables
The capl package requires 60 variables in order to compute CAPL-2 scores and interpretations. Users can use the get_missing_capl_variables() function to retrieve a list of the required variables. The required variables are outlined in the Details section of the documentation, which can be opened from the R console:
?get_missing_capl_variables

Table 1. Name and description of each function included in the capl package.
get_adequacy_score(): This function computes an adequacy score (adequacy_score) for responses to items 2, 4 and 6 of the CSAPPA (Children's Self-Perceptions of Adequacy in and Predilection for Physical Activity; Hay, 1992) Questionnaire as they appear in the CAPL-2 Questionnaire. This score is used to compute the motivation and confidence domain score (mc_score).
get_binary_score(): This function computes a binary score (0 = incorrect answer, 1 = correct answer) for a response to a questionnaire item based on the value(s) set as answer(s) to the item.
get_camsa_score(): This function selects the maximum CAMSA (Canadian Agility and Movement Skill Assessment) skill + time score for two trials (camsa_score) and then divides by 2.8 so that the score is out of 10. This score is used to compute the physical competence domain score (pc_score).
get_capl_domain_status(): This function computes the status ("complete", "missing interpretation", "missing protocol" or "incomplete") of a CAPL domain (e.g., pc_status, db_status, mc_status, ku_status, capl_status).
get_capl_score(): This function computes an overall physical literacy score (capl_score) based on the physical competence (pc_score), daily behaviour (db_score), motivation and confidence (mc_score), and knowledge and understanding (ku_score) domain scores. If one of the scores is missing or invalid, a weighted score will be computed from the other three scores.
get_db_score(): This function computes a daily behaviour domain score (db_score) based on the step and self-reported physical activity scores. This score is used to compute the overall physical literacy score (capl_score).
get_fill_in_the_blanks_score(): This function computes a score (fill_in_the_blanks_score) for responses to the fill in the blanks items (story about Sally) in the CAPL-2 Questionnaire. This score is used to compute the knowledge and understanding domain score (ku_score).
get_intrinsic_motivation_score(): This function computes an intrinsic motivation score (intrinsic_motivation_score) for responses to items 1-3 of the Behavioral Regulation in Exercise Questionnaire (BREQ) as they appear in the CAPL-2 Questionnaire. This score is used to compute the motivation and confidence domain score (mc_score).
get_ku_score(): This function computes a knowledge and understanding domain score (ku_score) based on the physical activity guideline (pa_guideline_score), cardiorespiratory fitness means (crf_means_score), muscular strength and endurance means (ms_means_score), sports skill (sports_skill_score) and fill in the blanks (fill_in_the_blanks_score) scores. If one of the scores is missing or invalid, a weighted domain score will be computed from the other four scores. This score is used to compute the overall physical literacy score (capl_score).
get_mc_score(): This function computes a motivation and confidence domain score (mc_score) based on the predilection (predilection_score), adequacy (adequacy_score), intrinsic motivation (intrinsic_motivation_score) and physical activity competence (pa_competence_score) scores. If one of the scores is missing or invalid, a weighted domain score will be computed from the other three scores. This score is used to compute the overall physical literacy score (capl_score).
get_missing_capl_variables(): This function adds required CAPL-2 variables (see Details for a full list) to a data frame of raw data if they are missing. When missing variables are added, the values for a given missing variable are set to NA. This function is called within get_capl() so that CAPL-2 score and interpretation computations will run without errors in the presence of missing variables.
get_pa_competence_score(): This function computes a physical activity competence score (pa_competence_score) for responses to items 4-6 of the Behavioral Regulation in Exercise Questionnaire (BREQ) as they appear in the CAPL-2 Questionnaire. This score is used to compute the motivation and confidence domain score (mc_score).
get_pacer_20m_laps(): This function converts PACER (Progressive Aerobic Cardiovascular Endurance Run) shuttle run laps to their equivalent in 20-metre laps (pacer_laps_20m). If laps are already 20-metre laps, they are returned unless outside the valid range. This variable is used to compute the PACER score (pacer_score).
get_pc_score(): This function computes a physical competence domain score (pc_score) based on the PACER (Progressive Aerobic Cardiovascular Endurance Run), plank and CAMSA (Canadian Agility and Movement Skill Assessment) scores. If one protocol score is missing or invalid, a weighted domain score will be computed from the other two protocol scores. This score is used to compute the overall physical literacy score (capl_score).
get_pedometer_wear_time(): This function computes pedometer wear time in decimal hours for a given day (e.g., wear_time1). This variable is used to compute the step_average variable and the step score (step_score).
get_plank_score(): This function computes a plank score (plank_score) based on the duration of time (in seconds) for which a plank is held. This score is used to compute the physical competence domain score (pc_score).
get_predilection_score(): This function computes a predilection score (predilection_score) for responses to items 1, 3 and 5 of the CSAPPA (Children's Self-Perceptions of Adequacy in and Predilection for Physical Activity; Hay, 1992) Questionnaire as they appear in the CAPL-2 Questionnaire. This score is used to compute the motivation and confidence domain score (mc_score).
get_self_report_pa_score(): This function computes a score (self_report_pa_score) for a response to "During the past week (7 days), on how many days were you physically active for a total of at least 60 minutes per day? (all the time you spent in activities that increased your heart rate and made you breathe hard)?" in the CAPL-2 Questionnaire. This score is used to compute the daily behaviour domain score (db_score).
get_step_average(): This function computes the daily arithmetic mean of a week of steps taken as measured by a pedometer (step_average). This variable is used to compute the step score (step_score).
get_step_score(): This function computes a step score (step_score) based on the average daily steps taken as measured by a pedometer. This score is used to compute the daily behaviour domain score (db_score).
import_capl_data(): This function imports CAPL-2 data from an Excel workbook on a local computer.

Loading the pre-installed dataset
The capl package comes with a demo (fake) dataset of raw data, capl_demo_data, which contains 500 rows of participant data on the 60 variables that are required by the capl package. Users can load the demo dataset and start exploring.
The base R str() function allows users to get a sense of how the CAPL-2 raw data should be structured and named for upstream use in the capl package (see Fig 1).

PLOS ONE
The 60 required variables can also be quickly accessed by calling the base R colnames() function.

Generating demo raw data
The capl package is also equipped with the get_capl_demo_data() function. This function allows users to randomly generate demo raw data and takes one parameter, n (set to 500 by default). This parameter specifies how many rows of demo raw data to generate and must therefore be an integer greater than zero. For example, users can randomly generate demo raw data for 10,000 participants by executing a single line of code, get_capl_demo_data(n = 10000). The base R str() function can then be called to verify how many rows of data were created.

Exporting data to Excel
If users prefer to examine the CAPL-2 demo raw data in a workbook, the export_capl_data() function allows them to export data objects to Excel.

Renaming variables
If users import their own raw data and plan to use the main function (get_capl()) in the capl package to compute CAPL-2 scores and interpretations, they must ensure their variable names match the names of the 60 required variables. Users can rename their variables by calling the rename_variable() function (see Fig 2). This function takes three parameters: x, search, and replace. The x parameter must be the raw data object, the search parameter must be a character vector representing the variable name(s) to be renamed, and the replace parameter must be a character vector representing the new names for the variables specified in the search parameter. Below we show how to rename variables using a fake dataset called raw_data.
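The renaming step can be sketched in base R. The helper below, rename_columns_sketch(), is a hypothetical stand-in written for illustration and is not the capl implementation; it mirrors only the three parameters named above (x, search, replace). The column names in the fake raw_data object are invented.

```r
# Hypothetical base R stand-in for the renaming step (not the capl source code).
rename_columns_sketch <- function(x, search, replace) {
  stopifnot(is.data.frame(x), length(search) == length(replace))
  # Position of each current column name within `search` (NA if not listed)
  idx <- match(names(x), search)
  # Swap in the new name wherever a match was found
  names(x)[!is.na(idx)] <- replace[idx[!is.na(idx)]]
  x
}

# Fake dataset whose column names do not match the required names
raw_data <- data.frame(
  Age = c(8, 10, 12),
  Sex = c("girl", "boy", "girl")
)

raw_data <- rename_columns_sketch(
  x = raw_data,
  search = c("Age", "Sex"),
  replace = c("age", "gender")
)
renamed_columns <- colnames(raw_data)

# With capl installed, the corresponding call would use the package function:
# raw_data <- rename_variable(x = raw_data, search = c("Age", "Sex"), replace = c("age", "gender"))
```

Columns that are not listed in search are left untouched, so the same call can be repeated safely as more mismatched names are discovered.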

Eliminating noisy errors with validation
One of the coding philosophies behind the capl package is to create a "quiet" user experience by suppressing "noisy" error messages via validation. That is, the capl package returns missing or invalid values as NA values instead of throwing "noisy" errors that halt code execution. As important as error messages are, there is potential for many error messages to be thrown in the capl package due to the large number of computations performed across a diverse set of variables. This might discourage some users who are not able to eliminate these error messages in a timely manner. We have, therefore, opted to develop a package that offers a "quiet" user experience. If any variable is missing, for example, the get_capl() function will continue to execute without throwing error messages. The get_missing_capl_variables() function will create required variables that are missing and populate these variables with NA values. In order to implement the validation philosophy, every capl function enlists helper functions to validate the data. If a given value is not of the correct class or out of range, an NA will be returned.
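The idea of filling in missing variables with NA rather than stopping can be illustrated with a minimal base R sketch. The function add_missing_variables_sketch() and the short list of variable names are hypothetical and chosen only for illustration; the real list of 60 required variables is in the package documentation.

```r
# Hypothetical sketch: add any required column that is absent and fill it with NA,
# so that later computations can run without halting.
add_missing_variables_sketch <- function(raw_data, required) {
  missing_names <- setdiff(required, names(raw_data))
  for (name in missing_names) {
    raw_data[[name]] <- NA
  }
  raw_data
}

# Illustrative subset of variable names (assumed, not the full required list)
required_subset <- c("age", "gender", "pacer_laps", "plank_time")

raw_data <- data.frame(age = c(9, 11), gender = c("girl", "boy"))
completed <- add_missing_variables_sketch(raw_data, required_subset)
```

Downstream scores that depend on an all-NA column would then come back as NA instead of triggering an error.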

Users can learn more about these functions by accessing the documentation within the R environment.

Validation of age.
The CAPL-2 is currently validated with 8- to 12-year-old children. However, when a function requires the age variable to execute a computation (e.g., get_capl_interpretation()), the age variable is validated via the validate_age() function.
Notice the NA values in the results.
The first element is NA because the original value is 7, and the next five elements are identical to their original values because they are integers between 8 and 12. Recall that the CAPL-2 is validated with children aged 8 to 12 years, which is why the first value is NA (i.e., outside the validated range). The next two elements are NA because the original values ("" and NA) are obviously invalid. The last element is 8, but notice that the original value is a decimal. Because 8.5 is between 8 and 12, it is considered valid, but the floor of the value is returned since CAPL-2 performs age-specific computations based on integer age.
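The behaviour described above can be approximated with a short base R sketch. validate_age_sketch() is a hypothetical re-implementation for illustration, not the package's validate_age() code, and the input vector is reconstructed from the description. One assumption is made explicit: the sketch takes the floor first and then checks that the integer age lies between 8 and 12.

```r
# Hypothetical sketch of quiet age validation (not the capl source code).
validate_age_sketch <- function(x) {
  # Coerce to numeric; values that cannot be parsed (e.g., "") become NA
  age <- suppressWarnings(as.numeric(as.character(x)))
  # Age-specific computations use integer age, so keep the floor
  age <- floor(age)
  # Assumption: valid integer ages are 8 through 12; everything else is NA
  age[is.na(age) | age < 8 | age > 12] <- NA
  age
}

# Mixed input as described in the text: 7, the integers 8 to 12, "", NA and 8.5
ages <- c(7, 8, 9, 10, 11, 12, "", NA, 8.5)
validated_ages <- validate_age_sketch(ages)
```

Because the input vector mixes numbers and strings, R stores it as character, which is why the sketch converts to numeric before any comparison.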

Validation of gender.
The CAPL-2 is currently validated for children who identify as boys or girls. When a function requires the gender variable to execute a computation, the gender variable is validated via the validate_gender() function.
Notice the results again. This function accepts a number of case-insensitive options (e.g., "Girl", "G", "female", "F", 1) for the female gender and returns a standardized "girl" value. The two elements that are returned as NA have original values that are obviously invalid ("" and NA). The validate_gender() function behaves in a similar fashion for the male gender; it also accepts a number of case-insensitive options and returns a standardized "boy" value.
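A minimal base R sketch of this standardization follows. validate_gender_sketch() is hypothetical and not the package code. The accepted female options come from the text; the male options ("boy", "b", "male", "m", 0) are an assumption made by analogy.

```r
# Hypothetical sketch of case-insensitive gender standardization (not the capl source code).
validate_gender_sketch <- function(x) {
  value <- tolower(trimws(as.character(x)))
  result <- rep(NA_character_, length(value))
  # Female options listed in the text
  result[value %in% c("girl", "g", "female", "f", "1")] <- "girl"
  # Male options assumed by analogy (the numeric code 0 in particular is a guess)
  result[value %in% c("boy", "b", "male", "m", "0")] <- "boy"
  result
}

genders <- c("Girl", "G", "female", "F", 1, "", NA, "Boy", "male")
validated_genders <- validate_gender_sketch(genders)
```

Anything outside the two lists, including empty strings and NA, falls through to NA, which matches the quiet validation approach described earlier.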

The 40 new computed variables related to/including the CAPL-2 scores and interpretations can be confirmed by calling the base R str() function (see Fig 5). As illustrated on the first line of the output, there are now 500 rows of participant data on 100 variables.

Forty new variables computed by get_capl()
The 40 new variables related to/including the CAPL-2 scores and interpretations that are outputted from the get_capl() function include:

Computing CAPL-2 scores and interpretations manually
Some users may want to validate and compute individual variables and scores. The following sections introduce the helper functions in the order they appear when called in the get_capl() function.

Physical competence functions
As illustrated in Fig 4 and in the CAPL-2 manual on page 43 (www.capl-eclp.ca/capl-manual), the physical competence score is computed by summing the plank, PACER and CAMSA scores.
4.1.1 PACER 20-metre laps. The get_pacer_20m_laps() function is used to convert PACER (Progressive Aerobic Cardiovascular Endurance Run) 15-metre shuttle run laps to 20-metre shuttle run laps. If laps are already 20-metre laps, the data are returned as is unless they fall outside the valid range. This variable is used to compute the PACER score.

4.1.2 PACER score. The get_pacer_score() function computes a PACER score that ranges from zero to 10 based on the number of PACER laps run at a 20-metre distance. This score is used to compute the physical competence domain score.
4.1.7 CAMSA skill + time score. The get_camsa_skill_time_score() function computes the CAMSA skill + time score for a given trial that ranges from one to 28 (see Fig 7). This score is used to compute the CAMSA score.
4.1.8 CAMSA score. The get_camsa_score() function selects the maximum CAMSA skill + time score for two trials and then divides by 2.8 so that the score is out of 10. This score is used to compute the physical competence domain score.
4.1.9 CAMSA interpretation. The get_capl_interpretation() function computes an age- and gender-specific CAPL-2 interpretation for a given CAPL-2 protocol or domain score.
4.1.10 Physical competence score. The get_pc_score() function computes a physical competence domain score that ranges from zero to 30 based on the PACER, plank and CAMSA scores. If one protocol score is missing or invalid, a weighted domain score is computed from the other two protocol scores. This score is used to compute the overall physical literacy score.
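The CAMSA score arithmetic (best of two trials, rescaled from 28 to 10) can be sketched in base R. camsa_score_sketch() is a hypothetical illustration, not the package's get_camsa_score() code. Two behaviours are assumptions: values outside 1 to 28 are treated as missing, and when only one trial is valid that trial is used.

```r
# Hypothetical sketch of the CAMSA score computation (not the capl source code).
camsa_score_sketch <- function(trial1, trial2) {
  trials <- cbind(as.numeric(trial1), as.numeric(trial2))
  # Skill + time scores range from 1 to 28; anything else is treated as missing
  trials[!is.na(trials) & (trials < 1 | trials > 28)] <- NA
  # Take the better trial per participant (assumption: a single valid trial is used as is)
  best <- apply(trials, 1, function(row) {
    if (all(is.na(row))) NA_real_ else max(row, na.rm = TRUE)
  })
  # Divide by 2.8 so that the score is out of 10
  best / 2.8
}

camsa_scores <- camsa_score_sketch(
  trial1 = c(14, 28, NA, 30),
  trial2 = c(21, 7, NA, 14)
)
```

Dividing by 2.8 maps the maximum skill + time score of 28 onto the maximum CAMSA score of 10.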

Daily behaviour
The formula for computing the daily behaviour score is illustrated in Fig 8 and in the CAPL-2 manual on page 26 (www.capl-eclp.ca/capl-manual).

4.2.1 Step average. The get_step_average() function computes the daily arithmetic mean of a week of steps measured by pedometry. This variable is used to compute the step score.
There must be at least four valid days of pedometer step counts for an arithmetic mean to be computed. If there are fewer than four valid days, one of the step values from a valid day is randomly sampled and used for the fourth valid day before computing the mean. Other important capl functions called by the get_step_average() function include get_pedometer_wear_time() and validate_steps() (see Fig 10).
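The four-valid-day rule can be sketched in base R. step_average_sketch() is hypothetical and not the package code. It assumes that day-level validity (step counts and wear time) has already been checked and that invalid days are passed in as NA. It also assumes that the random fill-in applies only when exactly three valid days are available, with NA returned for fewer.

```r
# Hypothetical sketch of the step average rule (not the capl source code).
step_average_sketch <- function(valid_steps) {
  # Keep only days already judged valid (invalid days are coded NA)
  steps <- valid_steps[!is.na(valid_steps)]
  # Assumption: with fewer than three valid days no mean is computed
  if (length(steps) < 3) {
    return(NA_real_)
  }
  if (length(steps) == 3) {
    # Reuse one randomly chosen valid day as the fourth day
    steps <- c(steps, steps[sample.int(3, 1)])
  }
  mean(steps)
}

four_days <- step_average_sketch(c(10000, 12000, 8000, 10000, NA, NA, NA))
three_days <- step_average_sketch(c(9000, 12000, 15000, NA, NA, NA, NA))
two_days <- step_average_sketch(c(9000, 12000, NA, NA, NA, NA, NA))
```

Because the fourth day is sampled at random, the three-day case can return different means across runs unless a seed is set with set.seed().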

4.2.2 Step score. The get_step_score() function computes a step score that ranges from zero to 25 based on the average daily steps taken as measured by a pedometer. This score is used to compute the daily behaviour domain score.

Self-reported physical activity score.
The get_self_report_pa_score() function computes a score that ranges from zero to five based on the response to "During the past week (7 days), on how many days were you physically active for a total of at least 60 minutes per day? (all the time you spent in activities that increased your heart rate and made you breathe hard)?" in the CAPL-2 Questionnaire (www.capl-eclp.ca/wp-content/uploads/2018/02/CAPL-2-questionnaire.pdf). This score is used to compute the daily behaviour domain score.

Motivation and confidence functions
The formula for computing the motivation and confidence score is illustrated in Fig 11 and in the CAPL-2 manual on page 79 (www.capl-eclp.ca/capl-manual).
4.3.1 Predilection score. The get_predilection_score() function computes a predilection score that ranges from 1.8 to 7.5 based on responses to three items from the Children's Self-Perception of Adequacy in and Predilection for Physical Activity as they appear in the CAPL-2 Questionnaire (www.capl-eclp.ca/wp-content/uploads/2018/02/CAPL-2-questionnaire.pdf). This score is used to compute the motivation and confidence domain score.

Adequacy score.
The get_adequacy_score() function computes an adequacy score that ranges from 1.8 to 7.5 based on responses to three items from the Children's Self-Perception of Adequacy in and Predilection for Physical Activity as they appear in the CAPL-2 Questionnaire (www.capl-eclp.ca/wp-content/uploads/2018/02/CAPL-2-questionnaire.pdf). This score is used to compute the motivation and confidence domain score.

4.3.4 Physical activity competence score. The get_pa_competence_score() function computes a physical activity competence score that ranges from 1.5 to 7.5 based on responses to three items from the Behavioural Regulation in Exercise Questionnaire as they appear in the CAPL-2 Questionnaire (www.capl-eclp.ca/wp-content/uploads/2018/02/CAPL-2-questionnaire.pdf). This score is used to compute the motivation and confidence domain score.
The get_mc_score() function computes a motivation and confidence domain score that ranges from zero to 30 based on the predilection, adequacy, intrinsic motivation and physical activity competence scores. If one of the scores is missing or invalid, a weighted domain score is computed from the other three scores. This score is used to compute the overall physical literacy score.

Motivation and confidence interpretation.
The get_capl_interpretation() function computes an age- and gender-specific CAPL-2 interpretation for a given CAPL-2 protocol or domain score.

Knowledge and understanding functions
The formula for computing the knowledge and understanding score is illustrated in Fig 12 and in the CAPL-2 manual on page 75 (www.capl-eclp.ca/capl-manual).
4.4.1 Physical activity guideline score (Q1). The get_binary_score() function computes a binary score (0 = incorrect answer, 1 = correct answer) for a response to a questionnaire item based on the value(s) set as answer(s) to the item.
capl_demo_data$pa_guideline_score <- get_binary_score(capl_demo_data$pa_guideline, c(3, "60 minutes or 1 hour"))
capl_demo_data$pa_guideline_score

The get_binary_score() function is also called to analyze responses for Q2, Q3, and Q4.
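The binary scoring rule described above (1 when a response matches one of the accepted answers, 0 otherwise) can be sketched in base R. binary_score_sketch() is hypothetical and not the package code; exact, case-sensitive matching on the character form of each value is an assumption.

```r
# Hypothetical sketch of binary scoring for a questionnaire item (not the capl source code).
binary_score_sketch <- function(x, answers) {
  # Compare as character so numeric codes and text labels can both be accepted
  as.integer(as.character(x) %in% as.character(answers))
}

# Responses coded either as option numbers or as the option text
responses <- c(3, 1, "60 minutes or 1 hour", 4)
pa_guideline_scores <- binary_score_sketch(
  responses,
  answers = c(3, "60 minutes or 1 hour")
)
```

Accepting both the numeric code and the label lets the same rule work whether the raw data were entered as option numbers or as text.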

Muscular strength definition score (Q3)
capl_demo_data$crf_means_score <- get_binary_score(capl_demo_data$crf_means, c(2, "How well the heart can pump blood and the lungs can provide oxygen"))
capl_demo_data$crf_means_score
Knowledge and understanding score. The get_ku_score() function computes a knowledge and understanding domain score that ranges from zero to 10 based on the physical activity guideline (Q1), cardiorespiratory fitness means (Q2), muscular strength and endurance means (Q3), sports skill (Q4) and fill in the blanks (Q5) scores. If one of the scores is missing or invalid, a weighted domain score is computed from the other four scores. This score is used to compute the overall physical literacy score.
capl_demo_data$ku_score <- get_ku_score(
  pa_guideline_score = capl_demo_data$pa_guideline_score,
  crf_means_score = capl_demo_data$crf_means_score,
  ms_means_score = capl_demo_data$ms_means_score,
  sports_skill_score = capl_demo_data$sports_skill_score,
  fill_in_the_blanks_score = capl_demo_data$fill_in_the_blanks_score
)

Overall physical literacy score
The get_capl_score() function computes an overall physical literacy score that ranges from zero to 100 based on the physical competence, daily behaviour, motivation and confidence, and knowledge and understanding domain scores. If one of the scores is missing or invalid, a weighted score is computed from the other three scores.
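One plausible reading of the weighted score is a proportional rescaling of the available domain scores to the full 100-point range. The sketch below illustrates that reading only; weighted_total_sketch() is hypothetical and the rescaling rule is an assumption, not a documented formula. The domain maxima of 30, 30 and 10 for physical competence, motivation and confidence, and knowledge and understanding are stated in the text, while the maximum of 30 for daily behaviour is inferred from the 100-point total.

```r
# Hypothetical sketch of a weighted overall score (the rescaling rule is an assumption).
weighted_total_sketch <- function(scores, maxima) {
  available <- !is.na(scores)
  # Only one missing component is tolerated, as described in the text
  if (sum(!available) > 1) {
    return(NA_real_)
  }
  # Rescale the points earned on the available components to the full range
  sum(scores[available]) * sum(maxima) / sum(maxima[available])
}

# Domain maxima: pc, mc and ku from the text; db inferred from the 100-point total
domain_maxima <- c(pc = 30, db = 30, mc = 30, ku = 10)

complete_score <- weighted_total_sketch(c(20, 18, 22, 6), domain_maxima)
missing_ku_score <- weighted_total_sketch(c(20, 18, 22, NA), domain_maxima)
two_missing_score <- weighted_total_sketch(c(20, NA, 22, NA), domain_maxima)
```

With all four domains present the rescaling factor is one, so the result is simply the sum of the domain scores.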

Data visualization
The capl package makes use of the ggplot2 R package to provide custom functions that render plots for visualizing CAPL-2 results.

Plots
CAPL-2 scores can be grouped by their associated interpretative categories and visualized in a bar plot by calling the get_capl_bar_plot() function. The mean score for each interpretative category appears above each bar (see Fig 13).
The color palette can be customized by setting the colors argument (see Fig 14).
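The summary behind such a bar plot, the mean score within each interpretative category, can be computed in base R. The scores and interpretations below are invented for illustration, and this sketch does not reproduce get_capl_bar_plot() or its arguments.

```r
# Invented example scores and interpretations (not real CAPL-2 data)
categories <- c("beginning", "progressing", "achieving", "excelling")
results <- data.frame(
  score = c(10, 14, 18, 20, 22, 24, 27, 29),
  interpretation = c("beginning", "beginning", "progressing", "progressing",
                     "achieving", "achieving", "excelling", "excelling")
)

# Order the categories from lowest to highest so bars would appear in that order
results$interpretation <- factor(results$interpretation, levels = categories)

# Mean score per category: the values a bar plot would display above each bar
category_means <- tapply(results$score, results$interpretation, mean)
```

Setting the factor levels explicitly keeps the categories in their natural order rather than alphabetical order.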

Export results
If users want to export their data, the export_capl_data() function allows them to export their data to Excel or SPSS.

Conclusion
In this paper we introduce the capl package developed for use in the R environment. The primary motivation for developing the capl package was to offer interested users, most likely researchers, a fast, efficient, and reliable approach to analyzing CAPL-2 raw data. We begin this paper by discussing several preparatory steps that are required prior to using the capl package. These steps include preparing, formatting, and importing CAPL-2 raw data. We then use demo data to show that computing the CAPL-2 scores and interpretations is as simple as executing one line of code. This one line of code uses the main (wrapper) function in the capl package (get_capl()) to compute 40 variables. Next, we introduce each helper function that is called within the main function to explain how to compute individual variables and scores for each test element within the four domains as well as how to calculate an overall physical literacy score. Finally, we show how to visualize CAPL-2 results using the ggplot2 R package.
One limitation of the current capl package is that it is specifically built for CAPL-2 raw data and is therefore not fully accessible to users with earlier versions of the CAPL. In the future, we intend to make the capl package available to users across all versions of the CAPL. The future version of the capl package will also include more data visualization features. We also plan to release an R Shiny application that runs locally in a web browser, providing users with a web-based interface for the capl package (e.g., a form for uploading CAPL raw data into R; a form for downloading CAPL raw and computed data out of R into various output formats [CSV, Excel, SAS, SPSS]; a reactive table that updates on the fly as data are uploaded, sorted or filtered, or as new columns are computed or renamed; reactive plots that update on the fly as data and/or variable selections change).
With the development of the capl package, users are no longer required to perform a large number of computations nor are they burdened with the monotonous task of entering data individually for each participant via the CAPL-2 website. Furthermore, we carefully crafted the package to create a "quiet" user experience, whereby "noisy" error messages are suppressed via validation. Instead of throwing noisy errors that halt code execution, the capl package returns missing or invalid values as NAs. The release of the capl package will contribute to the growing and popular topic of physical literacy, and will not only support current users of CAPL-2 but may also attract new users to this area of research.