The authors have declared that no competing interests exist.
This paper describes the development of an RStudio (now known as Posit) dashboard derived from the Integrated Postsecondary Education Data System, the United States Census Bureau, and the Bureau of Labor Statistics that provides users with institutional, community, and career information for IPEDS-reporting higher education institutions in the United States and its territories. With this dashboard, users can select and learn about institutions, explore enrollment trends and demographics, compare outcomes, and correlate community and institutional variables. Users can also link degrees to career projections and wages. This paper explains how the dashboard was developed, with examples in the R programming language.
R is a statistical programming and graphics software that allows for powerful analytics and data visualizations, including the creation of interactive dashboards [
The purpose of this paper is to demonstrate how to query and munge publicly available IPEDS, United States Census Bureau (USCB), and Bureau of Labor Statistics (BLS) data to then program and develop a Shiny application. The resulting dashboard provides users with information about enrollment, graduation rates, demographics, and correlations between college and community factors, as well as employment and income projections by Classification of Instructional Programs (CIP) code and programs by institution. This paper will teach data scientists, data analysts, and other professionals how to extract data from these data sets and program similar applications. In so doing, readers will learn about the datasets themselves as well as the relevant R packages and programming techniques. This paper presents a dashboard using variables of interest to higher education professionals, but the strategies and programming code presented here will also be useful with other public data sets in such fields as transportation, healthcare, and law enforcement.
These datasets contain thousands of variables. IPEDS has 250 variables [
First, we present an overview of the databases used for the study including where to locate these databases, and if relevant, how to access them directly through R statistical software using the
This section provides an overview of the datasets used for this demonstration. First, it covers the background and public availability of IPEDS, BLS, and USCB data. These are just three examples of publicly available datasets that are accessible to anyone with an internet connection and an interest in studying them. Public datasets are available from many sources, nations, and organizations across a variety of content areas and topics.
The Integrated Postsecondary Education Data System (IPEDS) has been collecting data since the 1980s [
Few peer-reviewed studies have used IPEDS data for institutional analyses. Some literature discusses the use of IPEDS data to help institutions with benchmarking, or making comparisons with other institutions [
No peer-reviewed scholarly literature was found that combined IPEDS data with U.S. Census and Bureau of Labor Statistics data in publicly available dashboards. A search of non-peer-reviewed sources on Google also yielded no such dashboards or search tools outside of what NCES provides individually in IPEDS.
The Bureau of Labor Statistics (BLS) was founded in 1884 under the Department of the Interior, and became independent from 1888 to 1903 when it was housed under the Department of Commerce and Labor. In 1913, it found its current home under the Department of Labor [
The BLS is housed under the Department of Labor [
The United States Census Bureau (USCB) collects geocoded data, or data coded and related to geographical locations, every decade on a vast number of demographic and economic metrics [
Little peer-reviewed literature demonstrates the use of publicly available data outside of benchmarking. Crellin et al. [
This study was approved as exempt by the institution’s review board. The data in this study and dashboard are all publicly available. This section provides an overview of the R packages used and then moves to data collection and then to programming and analyses of the dashboard application.
This section provides examples of the coding strategies used to query and munge the data. This link provides the programming code for all the data querying and munging:
When linking multiple data sets, we stress the importance of making sure that the dates and concepts align logically. Misaligned data sets, even if joinable, produce invalid results and, worse, could lead to poor decisions based on the bad data.
The IPEDS data system consists of multiple tables on a Microsoft Access file, each table consisting of numerous variables. Since these tables are separate, it is necessary to query individual variables from specific tables and then join those variables using the institution identifier, a unique code assigned to every IPEDS participating institution. The following code demonstrates how to request a specific dataset using the
As shown in the code, using the
IPEDSDatabase <- odbcDriverConnect("Driver = {Microsoft Access Driver (*.mdb, *.accdb)};DBQ = C:/Users/mperki17/Documents/IPEDS201920.accdb")
institutioninformation <- sqlFetch(IPEDSDatabase, "HD2019")
institutioninformation <- subset(institutioninformation, CONTROL == 1)
institutioninformation <- institutioninformation %>%
select(UNITID, COUNTYCD, STABBR, INSTNM, IALIAS, F1SYSNAM, LONGITUD, LATITUDE, LOCALE, C18SZSET)
Once the data are queried, it may be necessary to recode variables using
institutioninformation <- mutate(institutioninformation, locale = case_when(LOCALE == 11 ~ "City",
LOCALE == 12 ~ "City",
LOCALE == 13 ~ "City",
LOCALE == 21 ~ "Suburb",
LOCALE == 22 ~ "Suburb",
LOCALE == 23 ~ "Suburb",
LOCALE == 31 ~ "Town",
LOCALE == 32 ~ "Town",
LOCALE == 33 ~ "Town",
LOCALE == 41 ~ "Rural",
LOCALE == 42 ~ "Rural",
LOCALE == 43 ~ "Rural",
LOCALE == -3 ~ "Unknown"))
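The thirteen `case_when` lines above can be condensed with the `%in%` operator, since the first digit of LOCALE encodes the category. The following is an equivalent sketch, assuming `dplyr` is loaded:

```r
# More compact equivalent of the recode above: group LOCALE codes by range
library(dplyr)

institutioninformation <- mutate(institutioninformation,
  locale = case_when(LOCALE %in% 11:13 ~ "City",
                     LOCALE %in% 21:23 ~ "Suburb",
                     LOCALE %in% 31:33 ~ "Town",
                     LOCALE %in% 41:43 ~ "Rural",
                     LOCALE == -3 ~ "Unknown"))
```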
We also provide an example of how we recoded another variable called “C18SZSET”, the variable that determines institution type. IPEDS classifies institutions into two general categories of four-year (bachelor degree) and two-year (associate degree) institutions, plus a category for institutions that only teach graduate students. These general categories are split into 19 specific categories ranging from “Two-year, very small” to “Not applicable”. The following code illustrates how we reduced these to the categories of “Two-year”, “Four-year”, “Exclusively Grad.”, and “Not Applicable”. These 19 categories were derived from the “valuesets18” table of the IPEDS Access file’s data dictionary by filtering for the HD2018 table on the C18SZSET variable.
institutioninformation <- mutate(institutioninformation,
Type = case_when(C18SZSET == 1 ~ "Two_Year",
C18SZSET == 2 ~ "Two_Year",
C18SZSET == 3 ~ "Two_Year",
C18SZSET == 4 ~ "Two_Year",
C18SZSET == 5 ~ "Two_Year",
C18SZSET == 6 ~ "Four_Year",
C18SZSET == 7 ~ "Four_Year",
C18SZSET == 8 ~ "Four_Year",
C18SZSET == 9 ~ "Four_Year",
C18SZSET == 10 ~ "Four_Year",
C18SZSET == 11 ~ "Four_Year",
C18SZSET == 12 ~ "Four_Year",
C18SZSET == 13 ~ "Four_Year",
C18SZSET == 14 ~ "Four_Year",
C18SZSET == 15 ~ "Four_Year",
C18SZSET == 16 ~ "Four_Year",
C18SZSET == 17 ~ "Four_Year",
C18SZSET == 18 ~ "Exclusively_Grad",
C18SZSET == -2 ~ "Not_Applicable"))
It is important to consider the objectives of a dashboard when conducting research with IPEDS or other data, because a particular dataset may not include every category of higher education institution. For example, institutions that are not public, that award only less-than-two-year certificates, or that grant other types of credentials may not be classified in the desired way by IPEDS or other data sources. It is therefore imperative that researchers choose datasets that meet their dashboard’s objectives.
It may also be necessary to recode variable names so that they are easier for the end user of the final interface to understand. The following example uses the “rename” function to change “INSTNM” to “Institution”, thus changing the name of the column [
ipedsdashdata <- ipedsdashdata %>% rename("Institution" = INSTNM)
Building the dataset may require pulling several columns from various tables within the IPEDS database and then joining them using the institution identifier. The
ipedsdashdata <- left_join(institutioninformation, enrollmentinformationgender, by = "UNITID")
Producing the full table requires joining several IPEDS variables from several tables in this way. Once completed, a single unduplicated table of institution information with all desired variables is available for further analyses and programming of dashboard features.
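As a sanity check after each join (our suggestion, not part of the original code), one can confirm that the table remains unduplicated, i.e., that every institution identifier appears exactly once:

```r
# Verify the join produced exactly one row per institution (UNITID)
library(dplyr)

stopifnot(nrow(ipedsdashdata) == n_distinct(ipedsdashdata$UNITID))
```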
Combining institution information with USCB county data requires the use of
install.packages("tidycensus")
library(tidycensus)
census_api_key("key_goes_here")
After entering the key, we recommend downloading the USCB variable dictionary and writing it as a .csv file. This file serves as a data dictionary for all of the metrics that are provided. The following code shows how the variable key for the 2019 census data was loaded as a table and then written to a .csv file. In this case, the file is saved in the same folder as the RMarkdown file and is called “censuskey.csv”. The user can open this file in Excel and explore the different variables and how they are coded.
key <- load_variables(2019, "acs5", cache = TRUE)
write.csv(key, "censuskey.csv")
It is necessary to study and familiarize oneself with the USCB data. Each individual metric, or variable, comes with a code, and the USCB does not calculate proportions in the
This example pulls health insurance data by county. First, it creates a table called “HealthTot” and then queries the county-level data by category. For example, the total number of white people with health insurance is “C27001A_007” from the
#Total with health
HealthTot <- get_acs(geography = "county", variables = c(Whitet = "C27001A_007",
Blackt = "C27001B_007",
Nativet = "C27001C_007",
Asiant = "C27001D_007",
Pacifict = "C27001E_007",
Othert = "C27001F_007",
TwoMoret = "C27001G_007",
WhiteNott = "C27001H_007", Hispanict = "C27001I_007"))
HealthTot <- HealthTot %>% select(GEOID, NAME, variable, estimate)
HealthTot <- HealthTot %>% spread(variable, estimate)
HealthTot <- HealthTot %>% replace_na(
list(Whitet = 0,
Blackt = 0,
Nativet = 0,
Asiant = 0,
Pacifict = 0,
Othert = 0,
TwoMoret = 0,
WhiteNott = 0,
Hispanict = 0))
HealthTot <- HealthTot %>% mutate("HTot" = Whitet + Blackt + Nativet + Asiant + Pacifict + Othert + TwoMoret + WhiteNott + Hispanict)
#Join the health variables
Health <- left_join(HealthTot, HealthPro, by = "GEOID")
Health <- rename.variable(Health, "NAME.x", "County")
Health <- select(Health, GEOID, County, HPro, HTot)
Health <- Health %>% mutate("NoHealth" = HTot/HPro)
Health <- Health %>% mutate("WithHealth" = 1-NoHealth)
To obtain all the data required, several iterations like this may be necessary. Once all metrics are obtained, a final join of all the tables on county code will build an unduplicated USCB dataset that can then be joined with the master IPEDS dataset for a final table with all desired variables. The following code shows what this dashboard used. Each join was done individually to check each iteration. The final tables were also written into .csv files, but this is not always necessary.
#Final join for Census data
census <- left_join(employ, Health, by = "GEOID") %>%
rename.variable("GEOID.x", "GEOID") %>%
left_join(income, by = "GEOID") %>%
left_join(TotalWhite, by = "GEOID") %>%
left_join(Veteran, by = "GEOID") %>%
left_join(HousePercent, by = "GEOID") %>%
left_join(Married, by = "GEOID") %>%
left_join(Education, by = "GEOID") %>%
left_join(Tribes, by = "GEOID") %>%
left_join(Parenting, by = "GEOID") %>%
left_join(Citizen, by = "GEOID") %>%
left_join(Renters, by = "GEOID") %>%
rename.variable("County.x", "County")
write.csv(census, "census.csv")
#Join with IPEDS Database
ipedscensusdata <- left_join(ipedsdashdata, census, by = c("COUNTYCODE" = "GEOID"))
write.csv(ipedscensusdata, "ipedscensusdata.csv")
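When many county-level tables share the same key, the chain of `left_join()` calls can also be folded with `purrr::reduce()`. The following is a sketch, assuming each table has already been cleaned so that it carries a single “GEOID” column:

```r
# Alternative to the chained joins: fold a list of county-level tables
# into one frame on the shared "GEOID" key
library(dplyr)
library(purrr)

county_tables <- list(employ, Health, income, TotalWhite, Veteran,
                      HousePercent, Married, Education, Tribes,
                      Parenting, Citizen, Renters)
census <- reduce(county_tables, left_join, by = "GEOID")
```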
R has a Bureau of Labor Statistics (BLS) package called
First, the CIP to SOC crosswalk file was pulled from the same folder as the RMarkdown file. Then, four variables were renamed using the
CIPSOC <- read.csv("CIP_SOC.csv")
CIPSOC <- CIPSOC %>% rename("CIPCODE" = CIP2020Code)
CIPSOC <- CIPSOC %>% rename("Degree_Title" = CIP2020Title)
CIPSOC <- CIPSOC %>% rename("Career_Title" = SOC2018Title)
CIPSOC <- CIPSOC %>% rename("SOCCODE" = SOC2018Code)
Next, similar code was used with the wage dataset where 15 different variables were renamed. The
Wage <- read.delim("wage.txt")
Wage <- Wage %>% rename("SOCCODE" = OCC_CODE)
Wage <- Wage %>% rename("Total_Employed" = TOT_EMP)
Wage <- Wage %>% rename("Standard_Error" = EMP_PRSE)
Wage <- Wage %>% rename("Mean_Hourly" = H_MEAN)
Wage <- Wage %>% rename("Mean_Annual" = A_MEAN)
Wage <- Wage %>% rename("Mean_Standard_Error" = MEAN_PRSE)
Wage <- Wage %>% rename("Tenth_%ile_Hourly" = H_PCT10)
Wage <- Wage %>% rename("Twenty_fifth_%ile_Hourly" = H_PCT25)
Wage <- Wage %>% rename("Hourly_Median" = H_MEDIAN)
Wage <- Wage %>% rename("Seventy_Fifth_%ile_Hourly" = H_PCT75)
Wage <- Wage %>% rename("Ninetieth_%ile_Hourly" = H_PCT90)
Wage <- Wage %>% rename("Tenth_%ile_Annual" = A_PCT10)
Wage <- Wage %>% rename("Twenty_Fifth_%ile_Annual" = A_PCT25)
Wage <- Wage %>% rename("Median_Annual" = A_MEDIAN)
Wage <- Wage %>% rename("Seventy_Fifth_%ile_Annual" = A_PCT75)
Wage <- Wage %>% rename("Ninetieth_%ile_Annual" = A_PCT90)
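As a design note, `dplyr::rename()` accepts any number of `new_name = old_name` pairs, so the repeated calls above can be collapsed into a single call. A partial sketch covering the first few columns:

```r
# Collapse the repeated rename() calls into one; rename() takes
# multiple new_name = old_name pairs at once
library(dplyr)

Wage <- Wage %>% rename("SOCCODE" = OCC_CODE,
                        "Total_Employed" = TOT_EMP,
                        "Standard_Error" = EMP_PRSE,
                        "Mean_Hourly" = H_MEAN,
                        "Mean_Annual" = A_MEAN,
                        "Hourly_Median" = H_MEDIAN,
                        "Median_Annual" = A_MEDIAN)
# ...the remaining percentile columns follow the same pattern
```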
Next, the wage data were merged with the CIP and SOC code data file and named “Wage”. Then, the CIP codes of every institution in the United States were downloaded from the IPEDS dataset and written to a table called “instdegree”. A third table was built from the “institution information” table that included identifiers, names, states, counties, locale, and type and named “instname.” Variables were renamed in that table as “Institution Name”, “Select a State,” and “Community Type”.
Wage <- left_join(CIPSOC, Wage, by = "SOCCODE")
instdegree <- sqlFetch(IPEDSDatabase, "C2018DEP")
instdegree <- instdegree %>% select(UNITID, CIPCODE, PTOTAL)
instdegree <- instdegree %>% rename("Total_Programs" = PTOTAL)
instname <- institutioninformation %>% select(UNITID, INSTNM, STABBR, COUNTYCD, locale, Type)
instname <- instname %>% rename("Institution_Name" = INSTNM)
instname <- instname %>% rename("Select_a_State" = STABBR)
instname <- instname %>% rename("Community_Type" = locale)
The “instname” and “instdegree” tables were joined with the UNITID variable to create a “degree” table, which was then joined with the wage table by CIP code. A final table called “Degrees_and_Jobs” was generated from “degree” using the
degree <- left_join(instname, instdegree, by = "UNITID")
degree <- left_join(degree, Wage, by = "CIPCODE")
Degrees_and_Jobs <- degree %>%
select(Institution_Name, Select_a_State, Degree_Title, Career_Title, CIPCODE, SOCCODE, Total_Programs, Total_Employed, Mean_Hourly, Mean_Annual, Hourly_Median, Median_Annual)
write.csv(Degrees_and_Jobs, "Degrees_and_Jobs.csv", fileEncoding = "UTF-8")
The combined IPEDS, USCB, and BLS CIP/SOC data sets serve as platforms for the different elements of the dashboard. Querying and munging data, in our view, is the most laborious but crucial element of any form of analysis, whether it be dashboard creation or inferential statistics. However, once the foundational data are obtained, one must not assume that the data munging has ended. As will be seen, it is often necessary to further customize datasets to meet the needs of the dashboard. To develop this dashboard, we first developed each tab on the dashboard in an RMarkdown file. This allowed us to test our programming code’s functionality. After the elements were coded in RMarkdown, they were transferred to a Shiny application file, which required additional coding and logic before the actual dashboard was ready to be published.
A Shiny dashboard is created both holistically and in parts, which makes dashboard programming in R a challenge to learn. When learning this code, it is important first to study the pattern of a Shiny dashboard. The first requirement is to load all necessary libraries. The next is to load and label all the data. This dashboard loaded data from .csv files that were custom generated and written from RMarkdown. Each section will show how these files were generated and written, but it is important to know that these .csv files were saved in the same folder as the Shiny application file, making them easier to load. In addition, some datasets can be further customized as needed in the application file.
After the data are loaded, the user interface (ui) is programmed. The ui includes the dashboard header, sidebar menu items, and dashboard body content within each tab, including code to link specific graphs and interfaces to specific tabs. After the interface comes the server, which builds the graphics and interfaces for the users. This dashboard starts with all the user input controls, with names linked to specific graphs. The following sections include code for generating these input controls. Within the server are the outputs, which are the specific graphics of the dashboard.
Programming a dashboard on Shiny requires grit and patience as well as strong organization and documentation skills.
When programming a Shiny web application, the first thing to do is open a Shiny file. This is the space to program the application. This link provides the programming code for the demonstrated application:
Below these paragraphs is an excerpt of the programming code for the user interface tab. The first term in the demonstrated programming code, “ui”, stands for user interface. Here Shiny is saying that the following programming will consist of the content that the end user of the dashboard will see. This is why it is followed by “<-”. As shown, the next sections of the code list elements of the user interface in succession. The first element is the skin of the dashboard, which is purple. Then it contains header elements. A main sidebar function,
The link to the RPubs page gives the complete programming code for the welcome tab as well as the other tabs. The welcome tab is labeled as “tabName = ‘intro’” under the
ui <- dashboardPage(skin = "purple",
dashboardHeader(title = "IPEDS Dashboard (2019 Data)"),
dashboardSidebar(
sidebarMenu(
menuItem("Introduction", tabName = "intro", icon = icon("user")),
menuItem("Institution Map", tabName = "instmap", icon = icon("user")),
menuItem("Historic Enrollment", tabName = "histenroll", icon = icon("user")),
menuItem("Demographics", tabName = "demos", icon = icon("user")),
menuItem("Graduation and Retention", tabName = "ccgraduation", icon = icon("user")),
menuItem("Dynamic Scatterplot", tabName = "correlations", icon = icon("user")),
menuItem("Correlation Coefficients", tabName = "matrix", icon = icon("user")),
menuItem("Degrees and Careers", tabName = "jobs", icon = icon("user")),
menuItem("Degrees and Job Projections", tabName = "projections", icon = icon("user")),
menuItem("Data Dictionary", tabName = "dictionary", icon = icon("user")))),
dashboardBody(
tabItems(
tabItem(tabName = "intro",
img(src = "image.png", height = 180, width = 320),
h1("IPEDS Dashboard", align = "center"),
h2("About this Dashboard"),
##Program the rest of your UI
))
Shiny offers users a variety of features and widgets that allow the end user to explore the data, one of which consists of different input controls that allow the user to filter and slice the data. For example, a user may click on an input control and select a category to change the graph to only see that category. Shiny allows for several types of input controls, but this application uses two of them. The first,
In the “server <-” section of the RPubs link, the input-control programming is done in succession, starting after the correlations tab was programmed, with the code below. In this paper, we explain how each of these was programmed as we go through each tab. However, the reader can refer to the link to see the layout of our programming code in the application. The point is that the input controls should be programmed on the server and linked to specific names in the ui. The example shows the first lines of programming code in the server section of the Shiny application file. The input control shown allows a person to filter by state and colleges. We cover this later in the paper as we go through each tab of the application.
server <- function(input, output, session) {
df0 <- eventReactive(input$stateInput, {
GradRate %>% filter(State %in% input$stateInput)
})
output$instInput <- renderUI({
selectInput("instInput", "Next Select One or More Colleges:", sort(unique(df0()$Institution)), selected = "Casper College", multiple = TRUE)
})
In addition, this demonstrated application also makes use of tooltips, which allow the user to hover their mouse over graphs or other data features and get additional information. It also has features that allow the user to filter with legends. Finally, there are widgets that allow the user to download data, which we use on the first tab of our ui (the code for which is on the RPubs link), and other features available in their literature [
The first page that appears when users visit the dashboard is the welcome page. It was coded in the Shiny app file’s interface using hypertext markup language (HTML) to include written text and pictures. After the
dashboardBody(
tabItems(
tabItem(tabName = "intro",
img(src = "image.png", height = 180, width = 320),
h1("IPEDS Dashboard", align = "center"),
h2("About this Dashboard"),
h3("This dashboard is still under development…"),
h2("Using the Dashboard"),
h3("The left bar…."),
h2("R Packages"),
h3("The development of this dashboard…"),
h2("Data Sets"),
h3("The buttons below..."),
h3(HTML("<p> If you would like to see the complete programming code, the link to the RPubs page
<a href ='
fluidRow(
box(width = 10,
downloadButton("dataset", "Download Institution Data"),
downloadButton("Degrees_and_Jobs", "Download Degree and Job Data"),
downloadButton("Career_Projections", "Download Career Projection Data"))),
h2("This dashboard will be updated…")),
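Each `downloadButton()` in the ui needs a matching `downloadHandler()` on the server. The following is a minimal sketch for the first button, assuming the combined `ipedscensusdata` table from the data-munging section is the object being served:

```r
# Server-side counterpart to the "dataset" download button; the object
# name ipedscensusdata is assumed from the munging section
output$dataset <- downloadHandler(
  filename = function() "ipedscensusdata.csv",
  content = function(file) {
    write.csv(ipedscensusdata, file, row.names = FALSE)
  })
```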
The
This dashboard required a custom dataset from the master data file as shown on following code. The data were then written into a.csv file for use in the Shiny application. We place the.csv file into a “ShinyFiles” folder. This is where the Shiny application file resides. When we load the data into the application file, this file will be readily readable to the application.
mapdata <- select(ipedscensusdata, Institution, State, County, Type, Community_Type, Longitude, Lattitude, Tot_Enrolled, TwoYGradRate150, FourYGradRate150, Cost_Off_Campus)
write.csv(mapdata, "../Dashboard/ShinyFiles/mapdata.csv")
The map itself was coded as shown below under the server function. The resulting map allows the user to locate an institution of interest and quickly learn basic information about it. The following code begins with “output$map <- renderLeaflet({”. This tells Shiny to render the leaflet output linked to the map content as programmed in the ui (i.e., “menuItem("Institution Map", tabName = "instmap", icon = icon("user"))”). The rest of the code details different options and features of the map, including search options, marker options, and tooltip options. The
output$map <- renderLeaflet({
m <-
leaflet(mapdata) %>%
addTiles() %>%
addSearchOSM(options = searchOptions(zoom = 10, collapsed = TRUE, hideMarkerOnCollapse = TRUE)) %>%
addCircleMarkers(group = "name", color = ~pal(mapdata$Type), fillOpacity = .8, lng = mapdata$Longitude, lat = mapdata$Lattitude, popup =
paste0("Name:", '\n', mapdata$Institution, '<br/>', "State:", '\n', mapdata$State, '<br/>', "Fall Enrollment:", '\n', comma(mapdata$Tot_Enrolled, digits = 0),
'<br/>', "Cost off Campus:", '\n', paste0("$", comma(mapdata$Cost_Off_Campus, digits = 0)),
'<br/>', "Bachelor Grad Rate:", '\n', mapdata$FourYGradRate150 %>% paste0("%"),
'<br/>', "Associate/Cert Grad Rate:", '\n', mapdata$TwoYGradRate150 %>% paste0("%"))) %>%
addLegend("bottomright", pal = pal, values = ~mapdata$Type, title = "College Type", opacity = 1) %>%
setView(lng = -98, lat = 38.87216, zoom = 3) %>%
addResetMapButton()
m
})
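Note that the map code references a palette object `pal` that is defined elsewhere in the application file. One plausible definition (a sketch, not the authors’ exact code) uses `leaflet::colorFactor()` to map institution Type to marker colors:

```r
# Sketch: a categorical palette over the institution Type column,
# usable by both addCircleMarkers() and addLegend()
pal <- colorFactor(palette = "Dark2", domain = mapdata$Type)
```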
The historical enrollment tab was developed using a custom .csv file that was created by combining several years of IPEDS data. This tab allows the user to select a state and then one or more institutions to track their historic enrollment. Thus, it required a line plot using
selectInput("stateInput4", "First Select a State (Use 'Delete' Key to Deselect):",
choices = sort(unique(EnrollmentDB$State)),
selected = "WY", multiple = TRUE),
uiOutput("instInput4"))
This next chunk of code shows how the input control to filter for institution was integrated with the input control to filter by state. The
df6 <- eventReactive(input$stateInput4, {
EnrollmentDB %>% filter(State %in% input$stateInput4)
})
output$instInput4 <- renderUI({selectInput("instInput4", "Next Select One or More College:", sort(unique(df6()$Institution)),
selected = "Casper College", multiple = TRUE)})
df7 <- eventReactive(input$instInput4, {
df6() %>% filter(Institution %in% input$instInput4)})
enroll <-
ggplot(df7(), aes(x = factor(Year), y = Enrollment, group = Institution, color = Institution, text = paste("Institution:", Institution, "<br />State:",
State, "<br />Year:", Year, "<br />Enrollment Total:", Enrollment)))+
geom_line(stat = "summary", fun = "mean")+
geom_point(stat = "summary", fun = "mean")+
ggtitle("Fall Enrollment")+
xlab("")+
ylab("Enrollment")+
geom_text(aes(label = Enrollment), position = position_nudge(y = 10), size = 3, stat = "summary")+
scale_color_brewer(palette = "Dark2")+
theme(axis.text.x = element_text(angle = 45))
ggplotly(enroll, tooltip = 'text')
This tab includes a side bar chart that includes demographic information for each selected institution. The input controls are similar to those in the historical enrollment tab; they are simply given different names. Thus, it is important to know which names match which plots as they use the same dataset. This plot also uses a.csv file that is loaded in Shiny. The following provides the programming code. The input control code is similar to the historic enrollment input control code. For this plot, the data name used is “df9”.
demo <- ggplot(df9(), aes(x = Demographic, y = Percent, group = Institution, fill = Institution,
text = paste("Institution:", Institution, "<br />State:", State, "<br />Demographic:", Demographic,
"<br />Percent:", Percent %>% paste0("%"))))+
geom_bar(stat = "summary", fun = "mean")+
ggtitle("Demographics")+
xlab("")+
ylab("Percent")+
facet_grid(vars(Institution))+
geom_text(aes(label = paste0(Percent, "%")), position = position_nudge(y = 3.5), size = 3, stat = "summary")+
scale_y_continuous(labels = function(x) paste0(x, "%"))+
scale_fill_brewer(palette = "Dark2")+
coord_flip()+
theme(axis.text.x = element_text(angle = 45))
ggplotly(demo, tooltip = 'text')
This tab is essentially the same as the demographic tab. It uses a similar interface, similar input controls and programming code, and pulls from a.csv file. It provides graduation and retention information for selected institutions. The code is provided here.
grad <- ggplot(df1(), aes(x = RateLevel, y = Rate, group = Institution, fill = Institution,
text = paste("Institution:", Institution, "<br />State:", State, "<br />Rate Level:", RateLevel, "<br />Graduation
Rate:", Rate %>% paste0("%"))))+
geom_bar(stat = "summary", fun = "mean")+
ggtitle("Graduation Rates")+
xlab("")+
ylab("Rate")+
facet_grid(vars(Institution))+
geom_text(aes(label = paste0(Rate, "%")), position = position_nudge(y = 1), size = 3, stat = "summary")+
scale_y_continuous(labels = function(x) paste0(x, "%"))+
scale_fill_brewer(palette = "Dark2")+
coord_flip()+
theme(axis.text.x = element_text(angle = 45))
ggplotly(grad, tooltip = 'text')
The dynamic scatterplot provides the correlation between institutional and county factors. Thus, it uses a custom dataset built from the master IPEDS data file, which creates a file called “corrdata” and writes it as a .csv file as shown. The code starts by loading a package called
library(shinyWidgets)
Community <-
ipedscensusdata %>%
select(Institution, State, Community_Type, Type, County, FT_Retention, PT_Retention, TwoYGradRate100, TwoYGradRate150, TwoYGradRate200, FourYGradRate100, FourYGradRate150, FourYGradRate200, Cost_Off_Campus, Cost_on_Campus, Percent_Women, Percent_FT, Percent_White, Median_Household_Income, County_Percent_Veteran, County_Percent_in_Same_House, County_Percent_Never_Married, County_Percent_Married, County_Percent_Divorced, County_Percent_Separated, County_Percent_Widowed, County_Percent_Single, County_Percent_Less_than_HS, County_Percent_HS, County_Percent_Some_or_AS, County_Percent_Bach, County_Percent_Grad_or_Pro, County_Percent_Single_Parent, County_Percent_Not_Citizen, County_Percent_Imigrant, County_Percent_Rent, County_Percent_Unemployed, County_Percent_White)
Community$County <- iconv(Community$County, from = 'UTF-8', to = 'ASCII//TRANSLIT')
write.csv(Community, "../Dashboard/ShinyFiles/Community.csv", row.names = FALSE)
Drawing from the “Community” file, we write a “corrdata” file to be used in the Shiny application. This may be an unnecessary step as both processes could be combined. We prefer to do some processes iteratively in case steps need to be retraced.
corrdata <-
Community %>%
select(Institution, State, Community_Type, Type, County, FT_Retention, PT_Retention, TwoYGradRate100, TwoYGradRate150, TwoYGradRate200, FourYGradRate100, FourYGradRate150, FourYGradRate200, Cost_Off_Campus, Cost_on_Campus, Percent_Women, Percent_FT, Percent_White, Median_Household_Income, County_Percent_Veteran, County_Percent_in_Same_House, County_Percent_Never_Married, County_Percent_Married, County_Percent_Divorced, County_Percent_Separated, County_Percent_Widowed, County_Percent_Single, County_Percent_Less_than_HS, County_Percent_HS, County_Percent_Some_or_AS, County_Percent_Bach, County_Percent_Grad_or_Pro, County_Percent_Single_Parent, County_Percent_Not_Citizen, County_Percent_Imigrant, County_Percent_Rent, County_Percent_Unemployed, County_Percent_White)
write.csv(corrdata, "../Dashboard/ShinyFiles/corrdata.csv")
These data are then used to generate a scatterplot where the user can select the variables for the X and Y axes, and where each institution is colored as a dot by locale. An additional filter is added to select specific states. The variable selection code is given in the
varSelectInput(
inputId = "xvar",
label = "Select an X variable",
data = Community,
selected = "County_Percent_Unemployed"),
varSelectInput(
inputId = "yvar",
label = "Select a Y variable",
data = Community,
selected = "Median_Household_Income"),
pickerInput("stateInput6", "Select a State:",
choices = sort(unique(Community$State)),
options = list(’actions-box’ = TRUE), multiple = TRUE,
selected = Community$State))
The first thing to do is to program and label the state input control that will be linked in the plot. The following provides that code, which is named “ab”.
ab <- reactive({
Community %>%
filter(State %in% input$stateInput6)
})
An additional feature is added below the scatterplot that gives the regression results of each variable combination for every institution in the dataset (so it does not change when states are filtered). This feature includes slope, intercept,
model <- eventReactive(c(input$xvar, input$yvar), {
req(c(input$xvar, input$yvar))
lm(as.formula(paste(input$yvar, "~", input$xvar)), data = ab())
})
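The formula-building idea above can be tried standalone: paste() assembles a string such as "y ~ x" from the selected variable names, and as.formula() turns it into a formula that lm() can fit. The data frame and variable names below are made up for illustration.

```r
# Hypothetical standalone sketch; "xvar" and "yvar" stand in for
# the input$xvar and input$yvar selections in the application.
df <- data.frame(x = 1:10, y = 2 * (1:10) + 3)
xvar <- "x"
yvar <- "y"
f <- as.formula(paste(yvar, "~", xvar))  # y ~ x
fit <- lm(f, data = df)
coef(fit)  # intercept 3, slope 2 on this exactly linear toy data
```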
Finally, the scatterplot itself is built with ggplot2 and rendered interactively with ggplotly:
com <-
ggplot(ab(), aes_string(x = input$xvar, y = input$yvar))+
geom_point(aes(color = Community_Type, label3 = State, label4 = County, label5 = Institution))+
geom_smooth(method = "lm")+
scale_color_discrete(name = " ")+
theme(axis.text.x = element_text(angle = 45))
ggplotly(com)
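A note on the nonstandard label3, label4, and label5 aesthetics above: ggplot2 warns about unknown aesthetics, but ggplotly() carries every mapped aesthetic into the hover tooltip, which is why they are included. A minimal sketch of the same trick using the built-in mtcars data:

```r
library(ggplot2)
library(plotly)

# ggplot2 ignores the unrecognized "label3" aesthetic (with a
# warning), but ggplotly() includes it in the plotly hover text.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl), label3 = rownames(mtcars)))
ggplotly(p)  # hovering a point now shows the car name from label3
```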
The next tab of the dashboard allows the user to examine a matrix of correlation coefficients between all the variables. The following code shows the variables that were loaded into the data set, and it also shows how a Pearson's correlation matrix was computed.
corrdata <- read.csv("../Dashboard/ShinyFiles/corrdata.csv")
corrdata <- select(corrdata, Institution, State, Community_Type, Type, County, FT_Retention, PT_Retention, TwoYGradRate100, TwoYGradRate150, TwoYGradRate200, FourYGradRate100, FourYGradRate150, FourYGradRate200, Cost_Off_Campus, Cost_on_Campus, Percent_Women, Percent_FT, Percent_White, Median_Household_Income, County_Percent_Veteran, County_Percent_in_Same_House, County_Percent_Never_Married, County_Percent_Married, County_Percent_Divorced, County_Percent_Separated, County_Percent_Widowed, County_Percent_Single, County_Percent_Less_than_HS, County_Percent_HS, County_Percent_Some_or_AS, County_Percent_Bach, County_Percent_Grad_or_Pro, County_Percent_Single_Parent, County_Percent_Not_Citizen, County_Percent_Imigrant, County_Percent_Rent, County_Percent_Unemployed, County_Percent_White)
corrmatrix <-
round(cor(corrdata[sapply(corrdata, is.numeric)], use = "pairwise"), 2)
write.csv(corrmatrix, "../Dashboard/ShinyFiles/corrmatrix2.csv")
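A toy illustration of the matrix computation above, with made-up data: sapply(..., is.numeric) keeps only the numeric columns, and use = "pairwise" computes each coefficient from pairwise-complete observations, so a single NA does not wipe out a whole row.

```r
# Made-up data: one character column and two numeric columns,
# one of which contains a missing value.
toy <- data.frame(
  County = c("A", "B", "C", "D"),   # character column, filtered out
  Retention = c(0.70, 0.85, NA, 0.60),
  Income = c(45, 61, 52, 39)
)
m <- round(cor(toy[sapply(toy, is.numeric)], use = "pairwise"), 2)
dim(m)  # 2 x 2: only the numeric columns survive the filter
```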
This next section of code provides a method of shading the cells in the correlation matrix depending on the magnitude of the correlation, using the colorRampPalette() function.
brks <- seq(-1, 1, .01)
clrs <- colorRampPalette(c("white", "#6baed6"))(length(brks) + 1)
dataCol_df <- ncol(corrmatrix) - 1
dataColRng <- 1:dataCol_df
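The reason the palette gets length(brks) + 1 colors: styleInterval() (used later to shade the table) needs one color for each interval the breakpoints define, and n breakpoints cut the real line into n + 1 intervals.

```r
# 201 breakpoints from -1 to 1 in steps of .01 define 202
# intervals, so the ramp must supply 202 colors, white through blue.
brks <- seq(-1, 1, .01)
clrs <- colorRampPalette(c("white", "#6baed6"))(length(brks) + 1)
length(brks)  # 201
length(clrs)  # 202
```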
After setting the color scale for the correlation coefficients, a table was programmed in which the user can select the X and Y variables to custom build the table, with the coefficients shaded according to the strength of the correlation using the DT library [
server <- function(input, output, session){
varfilter <- reactive({
filtered <- corrmatrix %>%
filter(variable %in% input$varInput)
})
output$corrtable <- DT::renderDataTable(datatable({
if (length(input$columnInput) == 0) return(varfilter())
varfilter() %>%
dplyr::select(!!!input$columnInput)
}, rownames = TRUE, extensions = "FixedColumns",
options = list(paging = TRUE, searching = FALSE, info = FALSE,
sort = TRUE, scrollX = TRUE, fixedColumns = list(leftColumns = 2))) %>%
formatStyle(columns = dataColRng, backgroundColor = styleInterval(brks, clrs)))
}
The degrees and careers tab includes a table with picker input controls that allow the user to select multiple states, institutions, and degrees. The input controls are linked so they limit each other’s selection choices when one or more input control is selected. The data for this table is derived from the CIP/SOC data table that was generated in RMarkdown and exported to the folder with the Shiny application file. The following code shows how the picker input controls were generated in the ui.
pickerInput("stateInputd", "Select or type one or more states",
choices = sort(unique(Degrees_and_Jobs$Select_a_State)),
options = list('actions-box' = TRUE),
multiple = TRUE,
selected = "AK"),
pickerInput("instInputd", "Select or type one or more institutions:",
choices = sort(unique(Degrees_and_Jobs$Institution_Name)),
options = list('actions-box' = TRUE),
multiple = TRUE),
pickerInput("degreeInput", "Select or type one or more degrees:",
choices = sort(unique(Degrees_and_Jobs$Degree_Title)),
options = list('actions-box' = TRUE),
multiple = TRUE)
The following code shows how the input controls are linked in the server section of the Shiny application file. The filtering logic is assigned the name "state_deg", which is used to render the data table, linking the picker input controls and their desired behaviors to the table. The table is rendered as output using the DT package's renderDataTable() function.
state_deg <- reactive({
filter(Degrees_and_Jobs, Select_a_State %in% input$stateInputd)
})
observeEvent(state_deg(), {
choices <- sort(unique(state_deg()$Institution_Name))
updatePickerInput(session = session, inputId = "instInputd", choices = choices, selected = Degrees_and_Jobs$Institution_Name)
})
institution_deg <- reactive({
req(input$instInputd)
filter(state_deg(), Institution_Name %in% input$instInputd)
})
observeEvent(institution_deg(), {
choices <- sort(unique(institution_deg()$Degree_Title))
updatePickerInput(session = session, inputId = "degreeInput", choices = choices, selected = Degrees_and_Jobs$Degree_Title)
})
output$degrees <- DT::renderDataTable(options = list(autoWidth = TRUE, scrollX = TRUE, searching = FALSE), {
req(input$degreeInput)
institution_deg() %>%
filter(Degree_Title %in% input$degreeInput) %>%
select(Institution_Name, Degree_Title, Career_Title, Mean_Hourly, Mean_Annual, Hourly_Median, Median_Annual)
})
}
This table and its linked input controls were programmed using code similar to that of the degrees and careers table on the previous tab menu [
Microsoft Excel was used to manually enter all the variables used in the dashboard, including each variable's name as it is used, its data source, and a description. This dictionary was developed to help users better understand the elements of the dashboard. The Microsoft Excel file was saved as a .csv file in the folder with the Shiny application file, loaded into the application, and included as the last tab of the dashboard with the following simple code [
output$dictionarytable <- DT::renderDataTable(datadictionary, options = list(scrollX = TRUE))
This dashboard was generated with free and open-source software provided by RStudio (now known as Posit), using RMarkdown to munge the data and Shiny to deploy it. To deploy the dashboard, we first recommend running the Shiny application by pressing the "Run" button. Shiny will process the code and report specific errors that can be looked up and located by line number. Once you troubleshoot and the application runs, go through it tab by tab and make sure it is functional. After it is satisfactory, press "Publish". If you do not have a Shiny account yet, it will ask you to set one up. Shiny offers good directions on how to link your account to your application and also how to deploy it here:
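Deployment can also be scripted rather than done through the "Publish" button, using the rsconnect package. The account name, token, and secret below are placeholders to be copied from your shinyapps.io account page, and the directory and application names are illustrative.

```r
library(rsconnect)

# One-time account linking; values come from shinyapps.io.
rsconnect::setAccountInfo(
  name   = "your-account",
  token  = "YOUR_TOKEN",
  secret = "YOUR_SECRET"
)

# Deploy the folder containing the Shiny application file.
rsconnect::deployApp(appDir = "../Dashboard", appName = "ipeds-dashboard")
```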
Shiny offers a free account with a limit of five applications and 25 active hours per month. If a Shiny application becomes more popular with more user hits, there are options to upgrade to more applications (i.e. dashboards) and more hours [
Maintaining the dashboard and keeping the data current requires annual activities. First, one must understand the data collection and refresh schedule of each of the data sources. IPEDS updates its data early every summer [
To update USCB data, simply change the programming language to the desired date. For example, instead of 2020, change it to 2021. This is achieved by meticulously going through the code and changing each dated variable or table, sometimes with a search and replace. To update IPEDS, first obtain the latest Access file and then update the years of your tables and variables. For example, the “HD2019” table will change to “HD2020”. BLS data can be updated by obtaining the latest data from their website. If you name the new data file the same as the previous one, it should load and run, but be sure to first check the variable names and other details of the file. Once you feel your data are refreshed, run the application and go through your updated dashboard and randomly check numbers and functions for errors before deploying it. It is also a good idea to have somebody else look at it with fresh eyes, preferably somebody with content expertise of your dashboard.
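One way to reduce the search-and-replace burden described above is to keep the data year in a single variable and build table names and API calls from it; the object names below are illustrative, not taken from our code.

```r
# A single year variable drives both the IPEDS table name and
# (commented out, since it needs a Census API key) the USCB call.
data_year <- 2021
hd_table <- paste0("HD", data_year)  # "HD2021"

# e.g., with the tidycensus package:
# acs <- tidycensus::get_acs(geography = "county",
#                            variables = "B19013_001",
#                            year = data_year)
```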
The information available in federal datasets is invaluable to stakeholders and analysts. Despite the availability of national data through systems such as the USCB, BLS, and IPEDS, people may find it difficult to navigate the vast choices and spreadsheets to make these data useful for research and decision making. User-friendly data dashboards that join multiple public data sets can be used by researchers, data scientists, analysts, administrators, and other professionals to inform decision making and policy and to engage in continuous improvement in their respective fields. Though some research has combined the USCB, BLS, and IPEDS datasets [
PONE-D-21-37932
Using RShiny to Develop a Dashboard using IPEDS, U.S. Census, and Bureau of Labor Statistics Data
PLOS ONE
Dear Dr. Perkins,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript within two months. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at
Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see:
We look forward to receiving your revised manuscript.
Kind regards,
Barbara Szomolay
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at
2. Please note that PLOS ONE has specific guidelines on software sharing (
3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.
Reviewers' comments:
Reviewer's Responses to Questions
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: No
Reviewer #2: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: N/A
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: I thank the authors for the opportunity to read their paper. Overall, I appreciate their contribution and all the effort placed in the development of this dashboard. I agree with their assessment, so far, I am not aware of any interface dashboards using IPEDS, USCB, and BLS. In the following lines I offer some reactions but overall, I agree with their coding schemes.
My main feedback would be to clarify the purpose and scope of the study. Their dashboard is functional, but it is not clear why would this paper need to be published at PONE? Are the authors interested in teaching researchers to develop similar applications? Are they interested in easing access to these data sources? If the latter is true, I recommend adding a functionality to download the data merged/compiled from their platform. This would be useful for the incorporation of these indicators from a multiplicity of data sources. I can definitely see master’s students taking advantage of this resource.
Other questions that emerge are, why are their merging approaches limited to two institutional types (public 2- and 4-year colleges)?
I am also wondering if the authors have considered relying on the college scorecard (
In sum, I believe that the ability to download these data may be an important addition to their dashboard.
Reviewer #2: I found this application very interesting and I applaud the effort to publish the methods behind developing this {shiny} application. These types of dashboards are vital for the research community and often do not receive the academic credit they deserve. Below are some of my comments and concerns.
1. I'm not sure the purpose of the paper is entirely clear: is it to describe how someone might create a dashboard like this? Or is it to describe the contents of this particular dashboard? At the moment it seems like a bit of a blend, but I found that hard to follow since the audience for the two uses would likely be quite different. If possible, I would try to narrow the focus to one or the other. If trying to describe *how* someone could build a dashboard like this, there needs to be more focus on how to develop a shiny dashboard. For example, the basic concepts like the `ui` and `server` were not well explained in the text. Similarly the `input$` `output$` system in {shiny} was not explained. Additionally, if this is the focus, less is needed on the background about the particular datasets chosen to integrate here (since presumably the next user would not be building this exact dashboard), and more attention could be paid to the general concepts of downloading, munging, and merging data together (with these datasets as a particular example, but not the main focus). If, on the other hand, the purpose of this paper is to *describe* the contents of this particular dashboard, then the focus on the code is not as necessary.
2. It is not entirely clear to me what the purpose of this application is. Is it just to allow users to explore this data? If it is for analysis (but presumably for people who are less keen on pulling all of the data themselves) it seems like it would need to have the ability to subset the large datasets for download. Currently, it seems all the user can do is calculate the correlation between two variables or compare a few variables between institutions.
3. The R code was hard to read and a bit inconsistent style-wise. I recommend using a linter to keep the code consistent (for example:
4. I'm not sure "RShiny" is how RStudio would refer to this product (I think just Shiny for the product {shiny} for the package?).
5. Since the purpose of the paper is not entirely clear, I'm not sure if this review includes a review of the dashboard itself, but if so, here are a few comments:
* On the "Institution map" page it says to "use the search box", however I do not see a search box on this page (other than the magnifying glass on the map itself?)
* On the "Historic Enrollment" page, the x-axis could be cleaned up to just say 2015, 2016, 2017, 2018 (instead of `2015_Enrollment` etc.). Additionally, when more than one school is added, the numbers are completely obscured by the points on the graph. Either the points could be removed (and replaced with just the numbers), the numbers could be moved up a bit, or the user could just hover to see the numbers.
* On the "Demographics" page, the y-axis could be cleaned up (to remove the underscore and also the word "Percent" since it is redundant with the x-axis). The ordering of the bars could be improved (maybe ordered by frequency based on the top selection?) to make it easier to read for a viewer. It is also hard to compare categories between institutions - maybe position_dodge rather than a facetted chart would make sense? Otherwise, perhaps flipping the direction so the categories are aligned across institutions to make it easier to compare between them. (The same for the Graduation and Retention tab)
* In the "Dynamic Scatterplot" page, it is not clear why the points are colored by location type? It also would be nice to have a direct link on this page to the Data Dictionary since that is necessary to know what the X and Y variables are. Additionally, it would be nice to output the results in a Table rather than the `lm` output directly, if possible, since the intended audience is likely not familiar with R (it seems).
* In the "Correlation Coefficients" page "variable" is forced to stay in the selection. It is trivial to force this in the dataset without relying on the user to not delete it (create a string of the names to pass to the selectizeInput that removed "variable" from the choice and similarly when picking the variables on the server side add "variable" back in). It would also be nice to have a direct link to the Data Dictionary from this page.
**********
6. PLOS authors have the option to publish the peer review history of their article (
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Reviewer #1: No
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool,
April 29, 2022
PLoS One Editorial Board
1265 Battery Street
Suite 200
San Francisco, CA 94111
To the Editors and Peer-Reviewers:
This letter is regarding manuscript PONE-D-21-37932R1. First, we would like to thank the editors and reviewers for their time and focus on our paper titled, “Using RStudio to Develop a Dashboard using IPEDS, U.S. Census, and Bureau of Labor Statistics Data.” We would like to emphasize that this letter does not dispute any of the feedback we received on the paper or on the dynamic dashboard application. In fact, this letter acknowledges all suggestions and edits and addresses how each were approached in the attached revised manuscript. Further, we would like to note that the feedback was greatly appreciated as we are always looking for ways to improve this work and have found submitting to PLoS One rewarding because of the feedback we have received. Therefore, we give our sincerest thanks to the reviewers and editors for the time and thought they put into examining our work.
Second, we would like to state that we have created an RPubs page on which to share our programming and code. The submitted version of the manuscript and the updated RStudio application both contain links to this page where the reader will be able to view all of the programming code that was used to generate the dashboard. We mention this because this will not require any accession numbers or DOIs. All available data and programming language will be accessible through the application, or through the link to RPubs which is here:
In addition, we have included download buttons on the application itself where the user can download all datasets from the application. The combination of the programming code and the data download buttons allows for open-source data access.
We found all of your feedback valuable and used it to improve our paper and the dashboard itself. The following provides a point-by-point overview of our responses to your feedback.
Formatting feedback from editor:
Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at
Response:
We reformatted the manuscript to use Vancouver Brackets style of formatting and followed the PLOS guidelines.
Software feedback from editor:
Please note that PLOS ONE has specific guidelines on software sharing (
Response:
This paper presents a web application that was programmed using R open-source software. We reviewed this policy and find the application to comply: it is open source and available through a publicly accessible web address.
Software feedback from editor:
We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.
Response:
We have included a link to the data in the application and a link to the programming code to get the data on an RPubs web page. Both the application and the web page are publicly accessible. We also address this in a draft of our new cover letter and in the manuscript.
Reviewer 1 Comment 1:
My main feedback would be to clarify the purpose and scope of the study. Their dashboard is functional, but it is not clear why would this paper need to be published at PONE? Are the authors interested in teaching researchers to develop similar applications? Are they interested in easing access to these data sources? If the latter is true, I recommend adding a functionality to download the data merged/compiled from their platform. This would be useful for the incorporation of these indicators from a multiplicity of data sources. I can definitely see master’s students taking advantage of this resource.
Response:
We identified with this feedback as we grappled with that question ourselves, and after much discussion, have decided to emphasize the main purpose of the paper as a demonstration of how to develop the application including data querying, munging, and programming. You will see this change of focus in the revised manuscript. We also emphasize that the programming and approach may be applicable to other datasets.
Reviewer 1 Comment 2:
Why are their merging approaches limited to two institutional types (public 2- and 4-year colleges)?
Response:
We explained how and why two categories were used and that other programmers could code more by studying the accompanying website with our code on it. We coded two categories because IPEDS generally classifies institutions at several levels, and by reducing these to two categories we are able to demonstrate the recoding programming language. Again, the reader could use this approach to code any number of categories desired. We point this out in the paper as well.
Reviewer 1 Comment 3:
I am also wondering if the authors have considered relying on the college scorecard (
Response:
This is an excellent resource, and many of the data elements available on the scorecard (e.g., ACT/SAT scores) are actually (and coincidentally) the source for a project in the works to develop an additional dashboard to examine those data. We considered these data when first developing this project and agree that the college scorecard is an excellent resource that does use some of the IPEDS data elements that we used in our analysis. The most recent data dictionary of the scorecard notes that many of the Treasury elements have been discontinued. We found it easier to use the tidycensus package to get county-level elements regarding income and other demographics. In addition, we note your comment about standardization and did not find that necessary for this project, as we just needed descriptive values for our outputs. However, our future project (as noted above) will need this and thus requires us to consider these data elements. We are specifically interested in the rscorecard package and API, and that project is under way. This will lead to a later paper and project.
Reviewer 2 Comment 1:
I'm not sure the purpose of the paper is entirely clear: is it to describe how someone might create a dashboard like this? Or is it to describe the contents of this particular dashboard? At the moment it seems like a bit of a blend, but I found that hard to follow since the audience for the two uses would likely be quite different. If possible, I would try to narrow the focus to one or the other. If trying to describe *how* someone could build a dashboard like this, there needs to be more focus on how to develop a shiny dashboard. For example, the basic concepts like the `ui` and `server` were not well explained in the text. Similarly the `input$` `output$` system in {shiny} was not explained. Additionally, if this is the focus, less is needed on the background about the particular datasets chosen to integrate here (since presumably the next user would not be building this exact dashboard), and more attention could be paid to the general concepts of downloading, munging, and merging data together (with these datasets as a particular example, but not the main focus). If, on the other hand, the purpose of this paper is to *describe* the contents of this particular dashboard, then the focus on the code is not as necessary.
Response:
The first reviewer’s concern aligns with that of the second reviewer. Therefore, we re-clarified the purpose of this dashboard in the revised manuscript. We do provide an overview of the datasets for context, but we emphasize that the user could apply this to other datasets.
We also went through the paper and made significant additions in the methods section to describe many of the functions and the logic of shiny. We also added the entirety of our programming code as a link.
Reviewer 2 Comment 2:
It is not entirely clear to me what the purpose of this application is. Is it just to allow users to explore this data? If it is for analysis (but presumably for people who are less keen on pulling all of the data themselves) it seems like it would need to have the ability to subset the large datasets for download. Currently, it seems all the user can do is calculate the correlation between two variables or compare a few variables between institutions.
Response:
We have decided to focus the emphasis on how to program dashboards using publicly available data sets. We retain information about the datasets used but articulate that our methods are applicable to other datasets. We also included the ability to pull the data and examine the code if the user desires to use it for their own purposes. We also demonstrate how to include a download handler in the programming of the application.
Reviewer 2 Comment 3:
The R code was hard to read and a bit inconsistent style-wise. I recommend using a linter to keep the code consistent (for example:
Response:
We ran both the data munging file and the application through lintr and addressed several formatting details given the specific suggestions of Hadley Wickham. This helped us clean up our code on such details as visible bindings for global variables, spacing issues (particularly around infix operators), and trailing whitespace. We incorporated these changes into the paper, but more importantly, wrote them on the RPubs page using knitr on RMarkdown. We did not enforce the 80-character line limit in our HTML code, and we did not make all variable and function names snake case, the latter mainly for functionality purposes. In addition, we utilized page breaks to keep the chunks of code together so they do not split pages. We did not use track changes when adjusting the code in the revised manuscript as we felt it was too confusing.
Reviewer 2 Comment 4:
I'm not sure "RShiny" is how RStudio would refer to this product (I think just Shiny for the product {shiny} for the package?).
Response:
We have adjusted this throughout the manuscript.
Reviewer 2 Comment 5:
Since the purpose of the paper is not entirely clear, I'm not sure if this review includes a review of the dashboard itself, but if so, here are a few comments:
Response:
We addressed all of these comments in the following list. These were very helpful in improving our application.
• On the "Institution map" page it says to "use the search box", however I do not see a search box on this page (other than the magnifying glass on the map itself?)
o This is a great observation and our original directions could well have been confusing to users. We have therefore changed the text to say “magnifying glass” instead of “search box”.
• On the "Historic Enrollment" page, the x-axis could be cleaned up to just say 2015, 2016, 2017, 2018 (instead of `2015_Enrollment` etc.). Additionally, when more than one school is added, the numbers are completely obscured by the points on the graph. Either the points could be removed (and replaced with just the numbers), the numbers could be moved up a bit, or the user could just hover to see the numbers.
o The dates have been addressed by recoding the value labels in the dataset. We agree that this improves the look and feel of the application.
o The obscured numbers was an excellent observation. We eliminated this problem as suggested by eliminating them and leaving it to the tool tip and examination of the axis.
• On the "Demographics" page, the y-axis could be cleaned up (to remove the underscore and also the word "Percent" since it is redundant with the x-axis). The ordering of the bars could be improved (maybe ordered by frequency based on the top selection?) to make it easier to read for a viewer. It is also hard to compare categories between institutions - maybe position_dodge rather than a facetted chart would make sense? Otherwise, perhaps flipping the direction so the categories are aligned across institutions to make it easier to compare between them. (The same for the Graduation and Retention tab)
o The y-axis labels have been cleaned as suggested and it is much improved.
o We used “position = ‘dodge2’” and replaced that with the facet_grid. We also got rid of the numbers like the enrollment report and kept them in the tooltip. This feedback significantly improved the graph.
o We tried ordering by frequency with the reorder() function, which worked well for a single institution, but it generated an error when the top category differed between institutions (for example, when the highest category was percent women for institution A but percent white for institution B).
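One possible workaround for this within-facet ordering problem is tidytext's reorder_within(), which orders categories independently inside each facet. The sketch below is hypothetical: the data frame and column names (demodata, Institution, Category, Percent) are illustrative, not the dashboard's actual objects.

```r
# Hypothetical sketch: order bars by frequency within each institution
# independently, avoiding the error reorder() raises when the top
# category differs between institutions. Names are illustrative.
library(ggplot2)
library(tidytext)  # provides reorder_within() and scale_x_reordered()

demodata <- data.frame(
  Institution = rep(c("A", "B"), each = 2),
  Category    = c("Women", "White", "Women", "White"),
  Percent     = c(60, 40, 30, 70)
)

ggplot(demodata,
       aes(x = reorder_within(Category, Percent, Institution),
           y = Percent)) +
  geom_col() +
  scale_x_reordered() +  # strips the within-group suffix from labels
  facet_wrap(~ Institution, scales = "free_x")
```

Because reorder_within() appends a per-group suffix to each level, every institution gets its own ordering without any cross-institution conflict.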
• In the "Dynamic Scatterplot" page, it is not clear why the points are colored by location type? It also would be nice to have a direct link on this page to the Data Dictionary since that is necessary to know what the X and Y variables are. Additionally, it would be nice to output the results in a Table rather than the `lm` output directly, if possible, since the intended audience is likely not familiar with R (it seems).
o We created a download button to get the data dictionary on this page.
o We were unable to put the output results in a table. However, we re-coded the text and simplified the results so that the r-squared values are much easier to read and the intercepts and coefficients are cleaner, without programming output in the way to distract the reader.
o However, with the position-dodge adjustments described earlier, this is improved as well.
• In the "Correlation Coefficients" page "variable" is forced to stay in the selection. It is trivial to force this in the dataset without relying on the user to not delete it (create a string of the names to pass to the selectizeInput that removed "variable" from the choice and similarly when picking the variables on the server side add "variable" back in). It would also be nice to have a direct link to the Data Dictionary from this page.
o We implemented code to remove “variable” from the selectable choices and pinned it as an unfilterable set of row labels using this coding logic: rownames(corrmatrix) <- corrmatrix$variable
o This was an element that bothered us, so we are glad that it was pointed out.
o We included a button to download the data dictionary as well as a link.
o We also included this coding logic in the manuscript and on the RPubs site.
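For readers replicating this fix, a minimal sketch of the logic might look like the following. The object and column values (corrmatrix, the input ID "vars") are hypothetical stand-ins, not the dashboard's actual code.

```r
# Hypothetical sketch: exclude the label column "variable" from the
# user-selectable choices, then pin it as row names so it can never
# be filtered out. Object, values, and input names are illustrative.
corrmatrix <- data.frame(
  variable   = c("Enrollment", "Income"),
  Enrollment = c(1.0, 0.4),
  Income     = c(0.4, 1.0)
)

choices <- setdiff(names(corrmatrix), "variable")
# In the UI: selectizeInput("vars", "Variables:", choices = choices,
#                           multiple = TRUE)

# On the server side, "variable" is restored as row labels:
rownames(corrmatrix) <- corrmatrix$variable
```

Since "variable" never appears among the choices passed to selectizeInput, the user cannot delete it, and the row labels survive any filtering.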
Thank you once again for your feedback.
Sincerely,
Authors
Submitted filename:
PONE-D-21-37932R1
Using RStudio to Develop a Dashboard using IPEDS, U.S. Census, and Bureau of Labor Statistics Data
PLOS ONE
Dear Dr. Perkins,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Nov 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at
Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see:
We look forward to receiving your revised manuscript.
Kind regards,
Sathish A.P. Kumar
Academic Editor
PLOS ONE
Journal Requirements:
Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
Reviewers' comments:
Reviewer's Responses to Questions
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: (No Response)
Reviewer #3: (No Response)
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #2: Yes
Reviewer #3: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #2: N/A
Reviewer #3: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #2: Yes
Reviewer #3: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #2: # Review
I appreciate the updates, I have a few additional comments, please see below.
1. In the abstract, it looks like "R Shiny" was replaced with "RStudio" -- this may have been due to my previous comment being unclear -- I think this should say "R shiny dashboard", not "RStudio dashboard"
2. There are inconsistencies in how Shiny is referred to (for example RShiny, R shiny, RStudio, shiny etc) I would recommend replacing these all with "Shiny" when referring to the application and "shiny" when referring to the package.
3. The detailed background on the datasets does not seem necessary, particularly the historical context. The information on what the datasets provide, however, seems relevant.
4. I found the following explanation of a pipe confusing:
"You’ll also notice the use of “%>%” in the code. That is a pipe that basically tells R to use the dataset that was previously stated, or to continue processing given what was done before."
Perhaps something like:
"The %>%, known as the "pipe" is a function from the magrittr package. It takes the object on the left hand side and "pipes" it into the first argument of the subsequent function. For example institutioninformation %>% select(UNITID) is equivalent to select(institutioninformation, UNITID). A benefit of the pipe is it can allow the code to be more readable than a series of nested functions."
5. Perhaps explain what the tidyverse is, i.e. "Once the data are queried, it may be necessary to recode variables using tidyverse, a suite of R packages used to manipulate data frames".
6. There are several times the authors refer to assigning an object in R as "names the table" -- technically this should be something like "a table is created called x" since the process is actually creating the table and naming the object something rather than just assigning a name.
7. The code on page 17 could be reduced to a series of left joins connected by the pipe (rather than creating something named "census" over and over again, ie:
census <- left_join(employ, Health, by = "GEOID") %>%
rename(GEOID = GEOID.x) %>%
left_join(income, by = "GEOID") %>%
left_join(TotalWhite, by = "GEOID") %>%
left_join(Veteran, by = "GEOID")
The same is true for all of the code on pages 18 and 19. It seems unnecessary to use the pipe if you aren't going to chain the functions together (I like the pipe, but if you are using it I would recommend coding like above).
8. In the getting started with shiny section, instead of saying you need to open a "shiny" file (this is not a file type), I would say you need to create a .R file.
Small typos:
1. When referring to a function in text, it generally should have the open and closed brackets (or no brackets at all) (i.e obcDriverConnect( should be obcDriverConnect() or obcDriverConnect)
2. Page 15 says "install. Packages", which should be "install.packages"
3. Page 22 "page gives the complete programming language" should read "page gives the complete programming code"
4. Page 25 HTLM -> HTML
Reviewer #3: The authors are to be commended on producing a polished, well-thought-out shiny application using R.
The manuscript provides a discussion of how the authors coded a dashboard and what coding choices facilitated this.
I have a few questions that if answered in the text would strengthen this manuscript as a resource for individuals who would replicate the development of these dashboards:
1. What process did the authors use to select these variables and not others to appear in the dashboards? There are many datapoints left out.
2. A main claim in the paper is that there isn’t a dashboard cited in the literature that combines US federal census, labor and education. Are there any studies that combine this data themselves without a dashboard? This would strengthen the argument that this dashboard was needed and provide a basis to evaluate the effectiveness of this dashboard in future years.
3. Some discussion is needed about the logistics of hosting and maintaining this dashboard. The authors seem to be using a free account on shinyapps.io. What are the limitations as the dashboard is used more frequently for this approach? Did the authors consider other deployment paths? (i.e. was cost the only factor, or were other deployment options evaluated).
4. What plans are there to maintain the data set and what is the workload? i.e. what effort is needed to add 2020 data and beyond? How long will that take given the initial code is in place?
The response to a prior reviewer comment about 2-year or 4-year institutions I think misses the mark. This is a somewhat misleading variable in IPEDS that can lead to wrong conclusions if a researcher is unaware of what it represents. IPEDS codes institutions as 4-year institutions if they have any bachelor’s degree programs; 2-year institutions (in the IPEDS data set) are those with only associate degrees. Most times this is not what a researcher actually wants, and it excludes nearly all community and technical colleges in some states that award primarily associate degrees and less-than-two-year certificates but have a few bachelor’s degree programs.
**********
7. PLOS authors have the option to publish the peer review history of their article (
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Reviewer #2: No
Reviewer #3: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool,
October 23, 2022
PLOS One Editorial Board
1265 Battery Street
Suite 200
San Francisco, CA 94111
To the Editors and Peer-Reviewers:
We would like to thank you all for the time and attention you have given to our project and paper. We have reviewed your recommendations and feedback carefully and feel we now have a much better paper and dashboard. You will find our responses to your feedback in this document.
Responses to Feedback
Journal Requirements:
Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
Response:
We have reviewed all our references and found nothing retracted. We only had to change the order. This is tracked.
Reviewer 2
1) In the abstract, it looks like "R Shiny" was replaced with "RStudio" -- this may have been due to my previous comment being unclear -- I think this should say "R shiny dashboard", not "RStudio dashboard"
Response:
We changed it to say “Using R Shiny to Develop a Dashboard . . .”
2) There are inconsistencies in how Shiny is referred to (for example RShiny, R shiny, RStudio, shiny etc) I would recommend replacing these all with "Shiny" when referring to the application and "shiny" when referring to the package.
Response:
We went through it and changed all of this to Shiny.
3) The detailed background on the datasets does not seem necessary, particularly the historical context. The information on what the datasets provide, however, seems relevant.
Response:
We removed historical content from the document.
4) I found the following explanation of a pipe confusing:
"You’ll also notice the use of “%>%” in the code. That is a pipe that basically tells R to use the dataset that was previously stated, or to continue processing given what was done before."
Perhaps something like:
"The %>%, known as the "pipe" is a function from the magrittr package. It takes the object on the left hand side and "pipes" it into the first argument of the subsequent function. For example institutioninformation %>% select(UNITID) is equivalent to select(institutioninformation, UNITID). A benefit of the pipe is it can allow the code to be more readable than a series of nested functions."
Response:
We really liked how you put that and adapted your words. This was very helpful. You’ll notice we wrote:
“The “%>%” is known as a pipe and is a function of the magrittr package. This function takes the object on the left hand side and “pipes” it into the first argument of the subsequent function. For example, institutioninformation %>% select(UNITID) is equivalent to select(institutioninformation, UNITID). The pipe allows the code to be more readable than a series of nested functions.”
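The equivalence described in this passage can be checked directly. The sketch below is illustrative: the toy institutioninformation data frame and its values are hypothetical, not the actual IPEDS table.

```r
# Small sketch verifying the pipe equivalence described above.
# The data frame and its values are illustrative only.
library(dplyr)

institutioninformation <- data.frame(
  UNITID = c(100654, 100663),
  STABBR = c("AL", "AL")
)

piped  <- institutioninformation %>% select(UNITID)
nested <- select(institutioninformation, UNITID)

identical(piped, nested)  # TRUE
```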
5) Perhaps explain what the tidyverse is, i.e. "Once the data are queried, it may be necessary to recode variables using tidyverse, a suite of R packages used to manipulate data frames".
Response:
We adopted your language here again, changing only one word:
“Once the data are queried, it may be necessary to recode variables using tidyverse, a suite of R packages used to clean and munge data frames.”
6) There are several times the authors refer to assigning an object in R as "names the table" -- technically this should be something like "a table is created called x" since the process is actually creating the table and naming the object something rather than just assigning a name.
Response:
We went through the paper and changed the language accordingly.
7) The code on page 17 could be reduced to a series of left joins connected by the pipe (rather than creating something named "census" over and over again, ie:
census <- left_join(employ, Health, by = "GEOID") %>%
rename(GEOID = GEOID.x) %>%
left_join(income, by = "GEOID") %>%
left_join(TotalWhite, by = "GEOID") %>%
left_join(Veteran, by = "GEOID")
Response:
The code has been changed to this. As a side note, we have started piping our newest code this way; we used to write it the longer way to detect errors more easily, but decided your suggestion was better.
We also fixed this with the IPEDS data joins.
ipedsdashdata <- left_join(institutioninformation, enrollmentinformationgender, by = "UNITID")%>%
left_join(enrollmentinformationrace, by = "UNITID")%>%
etc.
We also changed it in the RMarkdown and republished it.
8) Small typos:
a. When referring to a function in text, it generally should have the open and closed brackets (or no brackets at all) (i.e obcDriverConnect( should be obcDriverConnect() or obcDriverConnect)
b. Page 15 says "install. Packages", which should be "install.packages"
c. Page 22 "page gives the complete programming language" should read "page gives the complete programming code"
d. Page 25 HTLM -> HTML
Response:
a. We went through all the brackets and closed any open ones.
b. We changed it to “install.packages”.
c. We changed “language” to “code”.
d. We changed “HTLM” to “HTML”
Reviewer 3
1) What process did the authors use to select these variables and not others to appear in the dashboards? There are many data points left out.
Response:
The purpose of this paper was to demonstrate the development of this dashboard. The variables chosen for inclusion in the dashboard are meant to inform higher education administrators, institutional researchers, and postsecondary policymakers. However, this paper is intended to inform the reader how a similar dashboard using large, publicly available datasets might be created. We encourage the readers of this paper to study all the data sources and dictionaries of any dataset they are interested in using. To address this, we added this language to the paper:
“These datasets contain thousands of variables. IPEDS has up to 250 variables [4], the USCB over 18,000 [3], and BLS has several datasets containing tens of thousands of variables (we could find no final number) [5]. For this demonstration, we chose variables related to demographics, educational outcomes, and labor outcomes. However, other programmers may choose any number of variables from these datasets depending on the goals of their project. No matter what variables are chosen, it is imperative that the programmer studies the variable sources, code books, and logic. This is the case whether they choose the datasets demonstrated here or other datasets. It is our goal to use the variables in this paper to demonstrate basic skills that could be applied to any number of variables from any number of data sources.”
2) A main claim in the paper is that there isn’t a dashboard cited in the literature that combines US federal census, labor and education. Are there any studies that combine this data themselves without a dashboard? This would strengthen the argument that this dashboard was needed and provide a basis to evaluate the effectiveness of this dashboard in future years.
Response:
On our first submission, we included a short literature review of the few sources we found that combine these types of data. We have re-introduced a short paragraph about those, as we agree it adds context about the need for this dashboard. You will note a new section titled “These Data Sets Combined”.
3) Some discussion is needed about the logistics of hosting and maintaining this dashboard. The authors seem to be using a free account on shinyapps.io. What are the limitations as the dashboard is used more frequently for this approach? Did the authors consider other deployment paths? (i.e. was cost the only factor, or were other deployment options evaluated).
Response:
We have added a section on deployment and hosting of the dashboard called “Deployment of the Dashboard”. We note that the free shinyapps.io account comes with a data limit and that paid subscriptions are available if more data needs to be stored. We also note that this demonstration covers the creation and deployment of a Shiny application; R offers other deployment options where this code may be useful, but the focus of this project was Shiny since it is free and open source.
4) What plans are there to maintain the data set and what is the workload? i.e. what effort is needed to add 2020 data and beyond? How long will that take given the initial code is in place?
Response:
We have added a section at the end called “Maintenance and Updates” where we discuss the need to update the data each year. We provide a schedule of when Census, BLS, and IPEDS data are updated and suggest updating all of them simultaneously with the IPEDS data. We note that updating the data requires adjusting term codes in the programming code and obtaining the most recent data sets from each entity. We also note that, with practice, updating the data need not take more than a few hours a year.
5) The response to a prior reviewer comment about 2-year or 4-year institutions I think misses the mark. This is a somewhat misleading variable in IPEDS that can lead to wrong conclusions if a researcher is unaware of what it represents. IPEDS codes institutions as 4-year institutions if they have any bachelor’s degree programs; 2-year institutions (in the IPEDS data set) are those with only associate degrees. Most times this is not what a researcher actually wants, and it excludes nearly all community and technical colleges in some states that award primarily associate degrees and less-than-two-year certificates but have a few bachelor’s degree programs.
Response:
We agree with you and have added language that encourages the user, if wanting to use this variable, to study the IPEDS data dictionary in the ACCESS file. Under the “valuesetsYY” tab, one can look up the CYYSZSET variable in the HDYYYY table and note the different categories as they are coded by IPEDS; these range from “Two-year, very small” (coded value 1) to “Not applicable” (coded value 20). We thus agree that this is a misleading variable if the researcher wants to use IPEDS data for other purposes. We have added language in our paper to emphasize that the researcher or coder should consider the limitations of variable names in IPEDS or any other source; for purposes of demonstrating how to reprogram variables in R, however, we have used this variable.
You will also note that we included our coding logic for that variable in the paper, which is followed by this statement:
“It is important to consider the objectives of a dashboard when conducting research with IPEDS or other data. This particular dataset may not include every category of higher education institution; for example, some institutions that are not public, that only award less-than-two-year certificates, or that are of other types may not be classified in the desired way by IPEDS or other data sources. Therefore, it is imperative that the researcher choose datasets that meet their dashboard’s objectives.”
Thank you once again for your feedback.
Sincerely,
Authors
Submitted filename:
Using R Shiny to Develop a Dashboard using IPEDS, U.S. Census, and Bureau of Labor Statistics Data
PONE-D-21-37932R2
Dear Dr. Perkins,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact
Kind regards,
Sathish A.P. Kumar
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #3: (No Response)
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #3: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #3: N/A
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The
Reviewer #3: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #3: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #3: The authors have addressed my feedback in the prior review. I have no other feedback for this paper.
**********
7. PLOS authors have the option to publish the peer review history of their article (
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Reviewer #3: No
**********
PONE-D-21-37932R2
Using R Shiny to Develop a Dashboard using IPEDS, U.S. Census, and Bureau of Labor Statistics Data
Dear Dr. Perkins:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact
If we can help with anything else, please email us at
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Sathish A.P. Kumar
Academic Editor
PLOS ONE