The link to the dashboard described in this case study is here.

To access the GitHub Repository for this case study see here: https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/.

You may also access and download the data using our OCSdata package. To learn more about this package including examples, see this link. Here is how you would install this package:

install.packages("OCSdata")

This case study is part of a series of public health case studies for the Bloomberg American Health Initiative.

For users or instructors who only wish to look at the basics of how to create a dashboard in R with the flexdashboard package, please see the Dashboard Basics Section.

Disclaimer: The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts.

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

To cite this case study please use:

Wright, Carrie and Ontiveros, Michael, and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com//opencasestudies/ocs-bp-school-shootings-dashboard. Open Case Studies: School Shootings in the United States (Version v1.0.0).

The total reading time for this case study is calculated via koRpus and shown below:

Reading Time Method
110 minutes koRpus

Readability Score:

A readability index estimates the reading difficulty level of a particular text. Flesch-Kincaid, FORCAST, and SMOG are three common readability indices that were calculated for this case study via koRpus. These indices provide an estimation of the minimum reading level required to comprehend this case study by grade and age.

Text language: en 
index grade age
Flesch-Kincaid 9 14
FORCAST 10 15
SMOG 11 16

Please help us by filling out our survey.

Motivation


This case study is motivated by this article:

Flannery, D. J., Modzeleski, W. & Kretschmar, J. M. Violence and School Shootings. Curr Psychiatry Rep 15, 331 (2013). DOI: 10.1007/s11920-012-0331-6

The article explores characteristics of school shootings and violence in schools and discusses why these events may occur, as well as their impact on the communities in which they occur.

The article also states that the shooters are most commonly white males, but that many previous studies of shooter characteristics could not identify any particular “profile” of shooters.

“To date, studies of school shootings have concluded that no consistent and reliable profile of school shooters exist…”

However previous studies note some commonalities such as:

“…most shooters were depressed, had experienced some significant loss, felt persecuted or bullied by others, and had prior difficulty coping or had previously tried suicide.”

Therefore in our dashboard we will examine how often a shooter was male or attempted or committed suicide during an event.

Photo by Joshua Hoehne on Unsplash

"School shootings are not all the same and may require different approaches to prevention and treatment, especially with respect to identifying risk factors at the individual, school or community levels, and particularly with regard to examining the role that mental health issues may play to increase risk for perpetration.

The field needs to know more about shooting incidents that are averted, those that result in injury but not death and about the characteristics of the more common occurrence of single homicide school shootings."

Photo by Andre Hunter on Unsplash

Given this need for more research to better understand why these events occur and how they could be averted, in this case study we will demonstrate how to create a resource for others to more easily and interactively access data about school shootings. To do so we will create what is called a dashboard, which is a website that displays a report for a database. Dashboards summarize the data in a database and typically allow for users to interact with the data in some way.

Here you can see an example of a dashboard created in R about downloads of packages on CRAN.

On the website the tabs and plots are interactive. The above dashboard allows for users to get to know the data in a simple and quick way.

The data about package downloads is succinctly summarized in an impactful manner.

We can quickly get a sense that the magrittr package is among the top most widely downloaded packages on CRAN.

[source]

Now let’s learn how to create a dashboard with our data of interest.

Main Questions


Our main questions:

  1. What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)?

  2. How many individuals are typically killed in a school shooting?

  3. What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?

Learning Objectives


In this case study, we will demonstrate how to create a dashboard, which is a website that displays a report about a database. In doing so, we will focus on packages and functions from the tidyverse for the data wrangling and visualization sections. The tidyverse is a library of packages created by RStudio. While some students may be familiar with previous R programming packages, these packages make data science in R more human-readable and intuitive.

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Importing text from a Google Sheets document (googlesheets4)
  2. Converting date formats (lubridate)
  3. Geocoding data (ggmap) and creating a jitter for geocoded data on a map (SF)
  4. How to reshape data by pivoting between “long” and “wide” formats and drop rows with NA values (tidyr)
  5. How to create data visualizations with ggplot2
  6. An introduction to the basics of R Markdown
  7. How to create an interactive table (DT)
  8. How to create a map (leaflet)
  9. How to create an interactive dashboard with flexdashboard and shiny

Statistical Learning Objectives:

  1. Calculating percentages for data with missing values
  2. Creating summary statistics

Note: statistics is a part of data science


We will begin by loading the packages that we will need:

library(here)
library(readr)
library(googlesheets4)
library(tibble)
library(dplyr)
library(stringr)
library(magrittr)
library(tidyr)
library(ggmap)
library(sf)
library(lubridate)
library(DT)
library(htmltools)
library(ggplot2)
library(forcats)
library(ggforce)
library(waffle)
library(poliscidata)
library(flexdashboard)
library(shiny)
library(leaflet)
library(maps)
library(vembedr)
library(OCSdata)

Note some of these packages are part of the tidyverse and can be loaded together like so:

library(tidyverse)

Packages used in this case study:

Package Use in this case study
here to easily load and save data
readr to import the data as a csv file
googlesheets4 to import directly from Google Sheets
tibble to create tibbles (the tidyverse version of dataframes)
dplyr to filter, subset, join, add rows to, and modify the data
stringr to manipulate character strings within the data (collapsing strings together, replace values, and detect values)
magrittr to pipe sequential commands
tidyr to change the shape or format of tibbles to wide and long, to drop rows with NA values, and to see the last few columns of a tibble
ggmap to geocode the data (which means get the latitude and longitude values)
sf to modify the geocoded data so that overlapping points did not overlap
lubridate to work with the data-time data
DT to create the interactive table
htmltools to add a caption to our interactive table
ggplot2 to create plots
ggforce to create a plot zoom
forcats to reorder factor for plot
waffle to make waffle proportion plots
poliscidata to get population values for the states
flexdashboard to create the dashboard
shiny to allow our dashboard to be interactive
leaflet to implement the leaflet (a JavaScript library for maps) to create the map for our dashboard
maps to create the simple leaflet map example
vembedr to include a video in our case study
OCSdata to access and download OCS data files

The first time we use a function, we will use the :: to indicate which package we are using. Unless we have overlapping function names, this is not necessary, but we will include it here to be informative about where the functions we will use come from.

Context


School shootings get a lot of attention in the the media, but it would be helpful to see all the data on them at once to better understand them. A dashboard can help with this, so that people get a boarder understanding of the issue rather than hearing about singular specific incidences from the media.

In addition to injuries and deaths, shooting events can also have broad and lasting impacts for those who witness but are not directly involved.

According to the Center for Injury Research and Prevention at the Children’s Hospital of Philadelphia:

The most common shootings on school grounds rarely involve large numbers of victims, but even a shooting of just one student at school has ramifications far beyond those directly involved.

Students and staff that witness school shootings are likely to suffer from traumatic stress symptoms, become anxious or depressed and have general concerns about their safety.

While many witnesses will have temporary symptoms, others will be symptomatic for a much longer period of time and even develop chronic psychiatric disorders.

Even short-term impairments can cause severe distress and have profound effects on academic achievement and the social and emotional growth of impacted students.

Furthermore, school shootings can have vast and lasting impacts because many students can witness a single event.

Another recently published article indicates that:

Over 240,000 American students experienced a school shooting in the last two decades.

[source]

This study followed students who experienced a school shooting the United States between 2008 and 2013 and assessed their mental well-being. They found that:

Fatal school shootings have large and persistent impacts on the mental health of local youth. In the two years following a fatal school shooting, the monthly number of antidepressant prescriptions written to individuals under age 20 is 21.3 percent higher in the shooting-exposed relative to the reference areas.

Rossin-Slater, M., Schnell, M., Schwandt, H., Trejo, S. & Uniat, L. Local Exposure to School Shootings and Youth Antidepressant Use. w26563 http://www.nber.org/papers/w26563.pdf (2019) doi:10.3386/w26563.

Thus, it is useful to better understand the characteristics of these shootings. Having better data on what they look like nationwide can help with identifying associations of shootings with key characteristics. Better descriptive information such as this may then lead to more knowledge about factors that predict school shootings, which could help develop preventive interventions. This way, we might not only prevent the direct involvement of students in future events, but also to prevent students and staff from witnessing these events.

Photo by Fernando @cferdo on Unsplash

Limitations


There are some important considerations regarding this data analysis to keep in mind:

This dashboard only uses one source of data. There may be school shooting events that are not listed in this data or errors in this data.

According to the database website itself:

“This database was developed from open-source information and may include reporting errors.”

Furthermore, according to this article, schools in 2013, schools were not required to report school shootings unless they resulted in a suicide or homicide. Therefore there may be more events that result in only injury or no injuries or death that may not be included.

There are indeed events in the dataset that include zero deaths and zero injuries, but it is very likely that many of these events are not listed.

What are the data?


We will use data from the open-source K-12 Shool Shooting Database from the Center for Homeland Defense and Security at the Naval Postgraduate School(NPS) in Monterey, California. This data is updated daily. The data used in this case study was downloaded in June of 2020.

Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, www.chds.us/ssdb.

This database includes information about school shooting events for students in grades K-12 in the United States dating back to 1970. The database has additional information not shown on our dashboard including but not limited to:

  • Location of the event at the school
  • If the event occurred during a sporting event
  • Time of day of the event
  • Day of the week of the event
  • Source for the shooting information
  • If the event was pre-planned or not
  • Shooter’s actions immediately following the shooting
  • Shooter characteristics (affiliation with the school, if they had accomplices, if they took hostages, and their age and race)
  • Victim characteristics (affiliation with the school, if they were targeted, their age and race)

According to the K-12 Shool Shooting Database website:

The School Shooting Database Project is conducted as part of the Advanced Thinking in Homeland Security (HSx) program at the Naval Postgraduate School’s Center for Homeland Defense and Security (CHDS).

The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. All of the information is based on open-source information and 3rd party reporting.

Data Import


Previously, the website for this data was located at “https://www.chds.us/ssdb/dataset/” (which is no longer an active link), which displayed an active Google Sheets document and a link to download a csv file of the data. At the time that we created this case study (June of 2020) we downloaded the data from this website.

Now the data can be found at this link and a file of the raw data can be downloaded by clicking the “DOWNLOAD RAW DATA” button. This file was previously a .csv file, but it is now an .xlsx file.

To account for changes with this website, we have made the previous .csv file available for you to download using the OCSdata package:

# library(OCSdata)
raw_data("ocs-bp-school-shootings-dashboard", outpath = getwd())

If you have trouble using the package, you may also download this .csv file here.

In our case, we downloaded this data and put it within a “raw” subdirectory of a “data” directory for our project. If you use an RStudio project, then you can use the here() function from the here package to make the path for importing this data simpler. The here package automatically starts looking for files based on where you have a .Rproj file which is created when you start a new RStudio project. We can specify that we want to look for the "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv" file within the “raw” directory within the “data” directory within a directory where our .Rproj file is located by separating the names of these directories using commas and listing “data” first.

Click here to see more about creating new projects in RStudio.

You can create a project by going to the File menu of RStudio like so:

You can also do so by clicking the project button:

See here to learn more about using RStudio projects.

We can import the raw .csv file using the read_csv() function from the readr package. Let’s start by only importing the first five rows with the n_max argument which is the max number of rows to read in from the file. By doing this, we can check for errors before reading in the entire file. Note that you would need to modify the file argument if you set your data files up differently.

shooting_data <- 
  readr::read_csv(file = 
                    here::here("data", "raw",
                         "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv"), 
                  n_max = 5)
shooting_data
# A tibble: 5 x 47
  `Updated 6/2/202~` ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11
  <chr>              <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Date               Scho~ City  State Reli~ Kill~ Woun~ Tota~ Gend~ Vict~ Vict~
2 1/5/1970           Hine~ Wash~ DC    3     1     0     1     Male  Stud~ 15   
3 1/5/1970           Sous~ Wash~ DC    3     0     1     1     Male  Stud~ <NA> 
4 1/5/1970           Unna~ Wash~ DC    2     0     0     0     No V~ No V~ <NA> 
5 2/6/1970           John~ Clev~ OH    2     0     1     1     Male  Stud~ 18   
# ... with 36 more variables: ...12 <chr>, ...13 <chr>, ...14 <chr>,
#   ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
#   ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
#   ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
#   ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
#   ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
#   ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>, ...

We see the first row is a sentence that states:

“Updated 6/2/2020 - View graphs and research methodology on www.chds.us/ssdb If you have information about other incidents, please email .”

We do not need this information, so we can skip it using the skip argument of read_csv() function. Specifically, we specify that we wish to only skip 1 row with skip = 1. We can also specify that the next row should be used for column names using the col_names = TRUE argument.

shooting_data <- 
  readr::read_csv(file = here::here("data", "raw",
                                    "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv"), 
                  col_names = TRUE, skip = 1)

We can use the glimpse function from the dplyr package to take a look at columns within the database:

# Scroll through the output!
dplyr::glimpse(shooting_data)
Rows: 1,556
Columns: 47
$ Date                                                                                                                                                 <chr> ~
$ School                                                                                                                                               <chr> ~
$ City                                                                                                                                                 <chr> ~
$ State                                                                                                                                                <chr> ~
$ `Reliability Score (1-5)`                                                                                                                            <dbl> ~
$ `Killed (includes shooter)`                                                                                                                          <dbl> ~
$ Wounded                                                                                                                                              <dbl> ~
$ `Total Injured/Killed Victims`                                                                                                                       <dbl> ~
$ `Gender of Victims (M/F/Both)`                                                                                                                       <chr> ~
$ `Victim's Affiliation w/ School`                                                                                                                     <chr> ~
$ `Victim's age(s)`                                                                                                                                    <dbl> ~
$ `Victims Race`                                                                                                                                       <chr> ~
$ `Victim Ethnicity`                                                                                                                                   <chr> ~
$ `Targeted Specific Victim(s)`                                                                                                                        <chr> ~
$ `Random Victims`                                                                                                                                     <chr> ~
$ `Bullied (Y/N/ N/A)`                                                                                                                                 <chr> ~
$ `Domestic Violence (Y/N)`                                                                                                                            <chr> ~
$ `Suicide (Shooter was only victim) Y/N/ N/A`                                                                                                         <chr> ~
$ `Suicide (shot self immediately following initial shootings) Y/N/ N/A`                                                                               <chr> ~
$ `Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A` <chr> ~
$ `Suicide (or attempted suicide) by Shooter (Y/N)`                                                                                                    <chr> ~
$ `Shooter's actions immediately after shots fired`                                                                                                    <chr> ~
$ `Pre-planned school attack`                                                                                                                          <chr> ~
$ Summary                                                                                                                                              <chr> ~
$ Category                                                                                                                                             <chr> ~
$ `School Type`                                                                                                                                        <chr> ~
$ `Narrative (Detailed Summary/ Background)`                                                                                                           <chr> ~
$ Sources                                                                                                                                              <chr> ~
$ `Time of Occurrence (12 hour AM/PM)`                                                                                                                 <time> ~
$ `Duration (minutes)`                                                                                                                                 <dbl> ~
$ `Day of week (formula)`                                                                                                                              <chr> ~
$ `During School Day (Y/N)`                                                                                                                            <chr> ~
$ `Time Period`                                                                                                                                        <chr> ~
$ `During a Sporting Event (Y/N)`                                                                                                                      <chr> ~
$ `During a school sponsored event (school dance, concert, play, activity)`                                                                            <chr> ~
$ Location                                                                                                                                             <chr> ~
$ `Number of Shots Fired`                                                                                                                              <dbl> ~
$ `Firearm Type`                                                                                                                                       <chr> ~
$ `Number of Shooters`                                                                                                                                 <dbl> ~
$ `Shooter Name`                                                                                                                                       <chr> ~
$ `Shooter Age`                                                                                                                                        <dbl> ~
$ `Shooter Gender`                                                                                                                                     <chr> ~
$ Race                                                                                                                                                 <chr> ~
$ `Shooter Ethnicity`                                                                                                                                  <chr> ~
$ `Shooter's Affiliation with School`                                                                                                                  <chr> ~
$ `Shooter had an accomplice who did not fire gun (Y/N)`                                                                                               <chr> ~
$ `Hostages Taken (Y/N)`                                                                                                                               <chr> ~

We can also use the utils str() function, which is short for “structure” to see more details about the internal structure of the data. Therefore, the str() function will give us more information about the actual values for each column within the data, not just the columns themselves. Typically we would be able to see some of the values with glimpse() function as well, but some of the columns have very long names, thus obscuring the first few values in the output.

# Scroll through the output!
str(shooting_data)
spec_tbl_df [1,556 x 47] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Date                                                                                                                                              : chr [1:1556] "1/5/1970" "1/5/1970" "1/5/1970" "2/6/1970" ...
 $ School                                                                                                                                            : chr [1:1556] "Hine Junior High School" "Sousa Junior High" "Unnamed High School" "John F. Kennedy High School" ...
 $ City                                                                                                                                              : chr [1:1556] "Washington" "Washington" "Washington" "Cleveland" ...
 $ State                                                                                                                                             : chr [1:1556] "DC" "DC" "DC" "OH" ...
 $ Reliability Score (1-5)                                                                                                                           : num [1:1556] 3 3 2 2 2 3 3 2 2 2 ...
 $ Killed (includes shooter)                                                                                                                         : num [1:1556] 1 0 0 0 0 1 1 0 0 0 ...
 $ Wounded                                                                                                                                           : num [1:1556] 0 1 0 1 2 5 0 1 2 1 ...
 $ Total Injured/Killed Victims                                                                                                                      : num [1:1556] 1 1 0 1 2 6 1 1 2 1 ...
 $ Gender of Victims (M/F/Both)                                                                                                                      : chr [1:1556] "Male" "Male" "No Victims" "Male" ...
 $ Victim's Affiliation w/ School                                                                                                                    : chr [1:1556] "Student" "Student" "No Victims" "Student" ...
 $ Victim's age(s)                                                                                                                                   : num [1:1556] 15 NA NA 18 NA NA 18 19 NA 15 ...
 $ Victims Race                                                                                                                                      : chr [1:1556] NA NA "No Victims" NA ...
 $ Victim Ethnicity                                                                                                                                  : chr [1:1556] NA NA "No Victims" NA ...
 $ Targeted Specific Victim(s)                                                                                                                       : chr [1:1556] "No" "No" "Yes" "Yes" ...
 $ Random Victims                                                                                                                                    : chr [1:1556] "Yes" "Yes" "No" "No" ...
 $ Bullied (Y/N/ N/A)                                                                                                                                : chr [1:1556] "No" "No" "No" "No" ...
 $ Domestic Violence (Y/N)                                                                                                                           : chr [1:1556] "No" "No" "No" "No" ...
 $ Suicide (Shooter was only victim) Y/N/ N/A                                                                                                        : chr [1:1556] "N/A" "N/A" "N/A" "N/A" ...
 $ Suicide (shot self immediately following initial shootings) Y/N/ N/A                                                                              : chr [1:1556] "N/A" "N/A" "N/A" "N/A" ...
 $ Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A: chr [1:1556] "N/A" "N/A" "N/A" "N/A" ...
 $ Suicide (or attempted suicide) by Shooter (Y/N)                                                                                                   : chr [1:1556] "No" "No" "No" "No" ...
 $ Shooter's actions immediately after shots fired                                                                                                   : chr [1:1556] "Unknown if Subdued Surrendered or Fled" "Immediately Surrendered" "Fled" "Unknown if Subdued Surrendered or Fled" ...
 $ Pre-planned school attack                                                                                                                         : chr [1:1556] "No" "No" "No" "No" ...
 $ Summary                                                                                                                                           : chr [1:1556] "Didn't know how to operate pistol, cocked hammer and couldn't get it to safely release causing accidental discharge" "Occurred during horseplay in the school" "Student shot at twice during attempted robbery on playground" "Argument in school hallway escalated into shooting" ...
 $ Category                                                                                                                                          : chr [1:1556] "Accidental" "Accidental" "Robbery" "Escalation of Dispute" ...
 $ School Type                                                                                                                                       : chr [1:1556] "High" "Junior High" "High" "High" ...
 $ Narrative (Detailed Summary/ Background)                                                                                                          : chr [1:1556] "Student showing off gun cocked hammer and could not get it to release causing accidental discharge and killing "| __truncated__ "14YOM student shot during \"horseplay\" in the school hallway. Friend of the victim surrendered to police." "Group of 10 teens attempted to rob 16YOM (James Owens) on school playground. When victim ran, unknown teen susp"| __truncated__ "Argument between shooter and victim escalated into shooting in school hallway." ...
 $ Sources                                                                                                                                           : chr [1:1556] "https://news.google.com/newspapers?id=AfRYAAAAIBAJ&pg=3025,1894998" "https://news.google.com/newspapers?id=AfRYAAAAIBAJ&pg=3025,1894998 https://www.newspapers.com/image/156467116/?"| __truncated__ "https://www.newspapers.com/image/156467116/?terms=school%2Bshooting" "https://www.newspapers.com/image/18059538/?terms=school%2Bshooting" ...
 $ Time of Occurrence (12 hour AM/PM)                                                                                                                : 'hms' num [1:1556] NA NA NA NA ...
  ..- attr(*, "units")= chr "secs"
 $ Duration (minutes)                                                                                                                                : num [1:1556] 1 1 1 1 NA 8 1 1 NA 1 ...
 $ Day of week (formula)                                                                                                                             : chr [1:1556] "Mon" "Mon" "Mon" "Fri" ...
 $ During School Day (Y/N)                                                                                                                           : chr [1:1556] "Yes" "Yes" "Yes" "Yes" ...
 $ Time Period                                                                                                                                       : chr [1:1556] NA NA NA NA ...
 $ During a Sporting Event (Y/N)                                                                                                                     : chr [1:1556] "No" "No" "No" "No" ...
 $ During a school sponsored event (school dance, concert, play, activity)                                                                           : chr [1:1556] "No" "No" "No" "No" ...
 $ Location                                                                                                                                          : chr [1:1556] "Inside School Building" "Inside School Building" "Outside on School Property" "Inside School Building" ...
 $ Number of Shots Fired                                                                                                                             : num [1:1556] 1 1 2 4 NA NA 1 1 2 NA ...
 $ Firearm Type                                                                                                                                      : chr [1:1556] "Handgun" "Handgun" "Handgun" "Handgun" ...
 $ Number of Shooters                                                                                                                                : num [1:1556] 1 1 1 1 2 8 1 1 1 2 ...
 $ Shooter Name                                                                                                                                      : chr [1:1556] "Minor" "Minor" "Unknown" "Gertis J. Perry" ...
 $ Shooter Age                                                                                                                                       : num [1:1556] 15 NA NA 18 NA NA 16 18 15 NA ...
 $ Shooter Gender                                                                                                                                    : chr [1:1556] "Male" "Male" "Male" "Male" ...
 $ Race                                                                                                                                              : chr [1:1556] NA NA NA NA ...
 $ Shooter Ethnicity                                                                                                                                 : chr [1:1556] NA NA NA "Not Hispanic or Latino" ...
 $ Shooter's Affiliation with School                                                                                                                 : chr [1:1556] "Student" "Student" "Student" "Student" ...
 $ Shooter had an accomplice who did not fire gun (Y/N)                                                                                              : chr [1:1556] "Yes" "No" "Yes" "No" ...
 $ Hostages Taken (Y/N)                                                                                                                              : chr [1:1556] "No" "No" "No" "No" ...
 - attr(*, "spec")=
  .. cols(
  ..   Date = col_character(),
  ..   School = col_character(),
  ..   City = col_character(),
  ..   State = col_character(),
  ..   `Reliability Score (1-5)` = col_double(),
  ..   `Killed (includes shooter)` = col_double(),
  ..   Wounded = col_double(),
  ..   `Total Injured/Killed Victims` = col_double(),
  ..   `Gender of Victims (M/F/Both)` = col_character(),
  ..   `Victim's Affiliation w/ School` = col_character(),
  ..   `Victim's age(s)` = col_double(),
  ..   `Victims Race` = col_character(),
  ..   `Victim Ethnicity` = col_character(),
  ..   `Targeted Specific Victim(s)` = col_character(),
  ..   `Random Victims` = col_character(),
  ..   `Bullied (Y/N/ N/A)` = col_character(),
  ..   `Domestic Violence (Y/N)` = col_character(),
  ..   `Suicide (Shooter was only victim) Y/N/ N/A` = col_character(),
  ..   `Suicide (shot self immediately following initial shootings) Y/N/ N/A` = col_character(),
  ..   `Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A` = col_character(),
  ..   `Suicide (or attempted suicide) by Shooter (Y/N)` = col_character(),
  ..   `Shooter's actions immediately after shots fired` = col_character(),
  ..   `Pre-planned school attack` = col_character(),
  ..   Summary = col_character(),
  ..   Category = col_character(),
  ..   `School Type` = col_character(),
  ..   `Narrative (Detailed Summary/ Background)` = col_character(),
  ..   Sources = col_character(),
  ..   `Time of Occurrence (12 hour AM/PM)` = col_time(format = ""),
  ..   `Duration (minutes)` = col_double(),
  ..   `Day of week (formula)` = col_character(),
  ..   `During School Day (Y/N)` = col_character(),
  ..   `Time Period` = col_character(),
  ..   `During a Sporting Event (Y/N)` = col_character(),
  ..   `During a school sponsored event (school dance, concert, play, activity)` = col_character(),
  ..   Location = col_character(),
  ..   `Number of Shots Fired` = col_double(),
  ..   `Firearm Type` = col_character(),
  ..   `Number of Shooters` = col_double(),
  ..   `Shooter Name` = col_character(),
  ..   `Shooter Age` = col_double(),
  ..   `Shooter Gender` = col_character(),
  ..   Race = col_character(),
  ..   `Shooter Ethnicity` = col_character(),
  ..   `Shooter's Affiliation with School` = col_character(),
  ..   `Shooter had an accomplice who did not fire gun (Y/N)` = col_character(),
  ..   `Hostages Taken (Y/N)` = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

We can see from this that many of the variables have Yes or No values, while others have relatively long descriptions. You may also notice that the State values are state abbreviations, not full state names. This is something that we will add to the data later.

Alternatively, if we wanted to make a dashboard that continually updated as data got updated, we could do the following to import the data directly from a live Google Sheets document as previously this was available for this data.

To do so we would use the read_sheet() function from the googlesheets4 package. Typically authentication is required, (meaning that you would need to sign in with your Google account using a username and or password), but since this was a public sheet we did not need to worry about authentication. To avoid being asked about this we used the gs4_deauth() function which puts the package into a de-authorized state that will not ask for users to sign in.

googlesheets4::gs4_deauth()

Great, now we would need to get the shared link from the document. We could previously do so by clicking on the link to the actually Google Sheets document like so:

Then we can click on the “share” button to get access to the link:

Finally we can click on “copy link” button to copy the link:

Once you copy a link like this, you can use the read_sheet() function to import the data by simply pasting the link in quotes, like so:

data_url <- "https://docs.google.com/spreadsheets/d/1HqbfMxnk9X3_mQvLyW_LEUe3Yyr7cXMPfwqUVfdq7sY/edit?usp=sharing"

googlesheet_data <- 
  read_sheet(data_url)

This is a great option, however, we chose not to do this for this case study to allow this tutorial to be more easily maintained over time. This was evidently a good choice since the data is no longer accessible in the same way.

To allow users to skip import we will save the data as an RDA file:

save(shooting_data, file = here::here("data", "imported", "shooting_data.rda"))

Data Exploration and Wrangling


If you have been following along but stopped, we could load our imported data like so:

load(here::here("data", "imported", "shooting_data.rda"))

If you skipped the data import section click here.

First you need to install the OCSdata package:

install.packages("OCSdata")

Then, you may download the imported data .rda file using the following function:

# library(OCSdata)
imported_data("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(here::here("OCSdata", "data", "imported", "shooting_data.rda"))

To load the downloaded data into your environment, you may double click on the .rda file in RStudio or using the load() function.

If the package does not work for you, an RDA file (stands for R data) of the data can be found here or slightly more directly here. Download this file and then place it in your current working directory. We recommend using an RStudio project and the here package to navigate to your file more easily.

We have put this file in a directory called “imported” within a directory called “data” within our working directory (which has a .Rproj file).

load(here::here("data", "imported", "shooting_data.rda"))

Click here to see more about creating new projects in RStudio.

You can create a project by going to the File menu of RStudio like so:

You can also do so by clicking the project button:

See here to learn more about using RStudio projects.


Luckily, our data is already in pretty good shape, but we want to make our data more useful for our dashboard.

Adding state name


It would be useful to have the full state name in our data, rather than just the abbreviation.

We can do so by using data related to the US 50 states in a dataset called state that is automatically loaded with R sessions in the datasets package. The state.abb object is a list of the state abbreviations and state.name is a list of the state names.

state.abb
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
[16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
[31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
[46] "VA" "WA" "WV" "WI" "WY"
state.name
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"      
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"      
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"         
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"        
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"      
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"        
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"      
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"          
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia" 
[49] "Wisconsin"      "Wyoming"       

We will combine these using the tibble() function from the tibble() package.

state_df <- 
  tibble(State_abb = state.abb, 
         State = state.name)

slice_head(state_df, n=4)
# A tibble: 4 x 2
  State_abb State   
  <chr>     <chr>   
1 AL        Alabama 
2 AK        Alaska  
3 AZ        Arizona 
4 AR        Arkansas

Now we will combine this with our data using the left_join() function from the dplyr package. There are several ways to join data using the dplyr package.

[source]

Here is a visualization of these options:

[source]

See here for more details about joining data.

We probably have data for all fifty states, but there may not have been school shootings in all 50 states in this dataset, therefore we don’t want to use the full_join() function.

We also don’t want the inner_join() function because DC does not have a state name. According to Wikipedia:

The U.S. Constitution provides for a federal district under the exclusive jurisdiction of Congress; the district is therefore not a part of any U.S. state (nor is it one itself)

Thus we will use the left_join(x,y) function where x in this case will be the shooting_data (as it is introduced to this code first through the %<>% compound assignment pipe operator) and y is the state_df. Thus, we add the state_df values where they match to the shooting_data.

The %<>% compound operator allows us to use the an input and reassign it at the end after all the subsequent steps have been performed. We can therefore use data_input %<>% instead of data_input <- data_input %>%. We will demonstrate this in the code below.

shooting_data %<>%
  rename("State_abb" = State) %>%
  left_join(state_df, by = c("State_abb" = "State_abb"))

In contrast, we can just use the %>% pipe operator to select a set of columns and peek at the first four rows of the new data frame.

Click here if you are unfamiliar with piping in R, which uses this %>% operator.

By piping we mean using the %>% pipe operator which is accessible after loading the tidyverse or several of the packages within the tidyverse like dplyr because they load the magrittr package. This allows us to perform multiple sequential steps on one data input. The object on the left side is used as input to any commands to the right or below.

shooting_data %>%
  select(School, City, State_abb, State) %>%
  slice_head(n = 4)
# A tibble: 4 x 4
  School                      City       State_abb State
  <chr>                       <chr>      <chr>     <chr>
1 Hine Junior High School     Washington DC        <NA> 
2 Sousa Junior High           Washington DC        <NA> 
3 Unnamed High School         Washington DC        <NA> 
4 John F. Kennedy High School Cleveland  OH        Ohio 

Reformatting dates


We also want to reformat our date values and create a Date_year variable based on the year in each date. We can use the lubridate package for this.

The mdy() function converts dates into a format where dates are listed as month, date, and year with hyphens in between. The year() function can then be used to extract just the year from each date.

shooting_data %<>%
  mutate(Date = lubridate::mdy(Date)) %>%
  mutate(Date_year = lubridate::year(Date))

shooting_data %>% 
  select(Date, Date_year)
# A tibble: 1,556 x 2
   Date       Date_year
   <date>         <dbl>
 1 1970-01-05      1970
 2 1970-01-05      1970
 3 1970-01-05      1970
 4 1970-02-06      1970
 5 1970-03-23      1970
 6 1970-04-15      1970
 7 1970-04-22      1970
 8 1970-05-08      1970
 9 1970-05-15      1970
10 1970-08-28      1970
# ... with 1,546 more rows

Looks good!

Reformatting data types


If you recall, in our dataset we have many variables that have either Yes or No values or Y and N values.

names(shooting_data)
 [1] "Date"                                                                                                                                              
 [2] "School"                                                                                                                                            
 [3] "City"                                                                                                                                              
 [4] "State_abb"                                                                                                                                         
 [5] "Reliability Score (1-5)"                                                                                                                           
 [6] "Killed (includes shooter)"                                                                                                                         
 [7] "Wounded"                                                                                                                                           
 [8] "Total Injured/Killed Victims"                                                                                                                      
 [9] "Gender of Victims (M/F/Both)"                                                                                                                      
[10] "Victim's Affiliation w/ School"                                                                                                                    
[11] "Victim's age(s)"                                                                                                                                   
[12] "Victims Race"                                                                                                                                      
[13] "Victim Ethnicity"                                                                                                                                  
[14] "Targeted Specific Victim(s)"                                                                                                                       
[15] "Random Victims"                                                                                                                                    
[16] "Bullied (Y/N/ N/A)"                                                                                                                                
[17] "Domestic Violence (Y/N)"                                                                                                                           
[18] "Suicide (Shooter was only victim) Y/N/ N/A"                                                                                                        
[19] "Suicide (shot self immediately following initial shootings) Y/N/ N/A"                                                                              
[20] "Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A"
[21] "Suicide (or attempted suicide) by Shooter (Y/N)"                                                                                                   
[22] "Shooter's actions immediately after shots fired"                                                                                                   
[23] "Pre-planned school attack"                                                                                                                         
[24] "Summary"                                                                                                                                           
[25] "Category"                                                                                                                                          
[26] "School Type"                                                                                                                                       
[27] "Narrative (Detailed Summary/ Background)"                                                                                                          
[28] "Sources"                                                                                                                                           
[29] "Time of Occurrence (12 hour AM/PM)"                                                                                                                
[30] "Duration (minutes)"                                                                                                                                
[31] "Day of week (formula)"                                                                                                                             
[32] "During School Day (Y/N)"                                                                                                                           
[33] "Time Period"                                                                                                                                       
[34] "During a Sporting Event (Y/N)"                                                                                                                     
[35] "During a school sponsored event (school dance, concert, play, activity)"                                                                           
[36] "Location"                                                                                                                                          
[37] "Number of Shots Fired"                                                                                                                             
[38] "Firearm Type"                                                                                                                                      
[39] "Number of Shooters"                                                                                                                                
[40] "Shooter Name"                                                                                                                                      
[41] "Shooter Age"                                                                                                                                       
[42] "Shooter Gender"                                                                                                                                    
[43] "Race"                                                                                                                                              
[44] "Shooter Ethnicity"                                                                                                                                 
[45] "Shooter's Affiliation with School"                                                                                                                 
[46] "Shooter had an accomplice who did not fire gun (Y/N)"                                                                                              
[47] "Hostages Taken (Y/N)"                                                                                                                              
[48] "State"                                                                                                                                             
[49] "Date_year"                                                                                                                                         

Note that in this case study, we will mostly be visualizing the data. However, for more intensive analysis, it would be better to make names more tidy, such as using lowercase and no spaces etc.

These are the variables that have Y/N in the name or the Targeted Specific Victim(s), Random Victims, Pre-planned school attack variables.

We can make these consistently TRUE and FALSE by using the case_when() function from the dplyr package. This function allows us to specify new values for existing values.

If you are familiar with the recode() function from dplyr, click here for an explanation of why case_when() is better in this case.

The benefit of the case_when() function, is that changing the values to TRUE or FALSE also results in the class type of the variable changing to type logical (which is interpreted as a binary variable with TRUE and FALSE values) otherwise, with recode() the variables would remain as class type character.


Click here for an explanation about data types in R.

There are several classes of data in R programming. Character is one of these classes. A character string is an individual data value made up of characters. This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter "a" or the number "3". If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense.

If you want your numeric values to be interpreted that way, they need to be converted to a numeric class. The options typically used are integer (which has no decimal place) and double precision (which has a decimal place).

Similarly if your data is of class character and are values of TRUE and FALSE they will be interpreted as two different strings.

However, logical data is interpreted slightly differently where a FALSE value indicates the absence of something, while a TRUE indicates the presence of something.


Click here for more details about the differences between the recode() and case_when() functions.

Note that with recode() there is the option that other values be recoded to NA although this is not the default, however with case_when() other values not explicitly assigned in the case_when() statement will be assigned to NA. Further more only values can be used on the left side when using recode() whereas case_when() accepts expressions.

OK let’s start by looking at the columns of interest by using the select() function and asking for any patterns that match the character string “Y/N” or “Specific” or “Random” or “Pre-planned”.

Formally, we can search for these using the | symbol, which is interpreted as an or, thus any variables that has a name that matches any of these patterns will be changed.

shooting_data %>% 
  select(matches("Y/N|Specific|Random|Pre-planned"))
# A tibble: 1,556 x 13
   `Targeted Specific Victi~` `Random Victims` `Bullied (Y/N/~` `Domestic Viol~`
   <chr>                      <chr>            <chr>            <chr>           
 1 No                         Yes              No               No              
 2 No                         Yes              No               No              
 3 Yes                        No               No               No              
 4 Yes                        No               No               No              
 5 No                         No               No               No              
 6 Yes                        Yes              No               No              
 7 Yes                        No               No               No              
 8 Yes                        No               No               No              
 9 No                         Yes              <NA>             No              
10 Yes                        No               No               No              
# ... with 1,546 more rows, and 9 more variables:
#   `Suicide (Shooter was only victim) Y/N/ N/A` <chr>,
#   `Suicide (shot self immediately following initial shootings) Y/N/ N/A` <chr>,
#   `Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A` <chr>,
#   `Suicide (or attempted suicide) by Shooter (Y/N)` <chr>,
#   `Pre-planned school attack` <chr>, `During School Day (Y/N)` <chr>,
#   `During a Sporting Event (Y/N)` <chr>, ...

We see the Yes and No values. Let’s look closer at one of these columns.

shooting_data %>% 
  count(`Suicide (or attempted suicide) by Shooter (Y/N)`)
# A tibble: 6 x 2
  `Suicide (or attempted suicide) by Shooter (Y/N)`     n
  <chr>                                             <int>
1 N                                                     1
2 N/A                                                  18
3 No                                                 1308
4 Officer Involved                                     26
5 Yes                                                 202
6 <NA>                                                  1

We see there are six different values in this column. To recode this column, we need to consider what we recode all the values.

To implement the case_when() recoding of values, the existing values are written on the left of the ~ sign (quotation marks are necessary around the existing values) and new values are written on the right (quotations marks are not necessary as these are TRUE and FALSE statements).

shooting_data %>%
       select(`Suicide (or attempted suicide) by Shooter (Y/N)`) %>%
       mutate(type = dplyr::case_when(. == "Yes" ~ TRUE,
                                      . == "No" ~ FALSE,
                                      . == "Y" ~ TRUE,
                                      . == "N" ~ FALSE,
                                      . == "Officer Involved" ~ TRUE))
# A tibble: 1,556 x 2
   `Suicide (or attempted suicide) by Shooter (Y/N)` type 
   <chr>                                             <lgl>
 1 No                                                FALSE
 2 No                                                FALSE
 3 No                                                FALSE
 4 No                                                FALSE
 5 No                                                FALSE
 6 No                                                FALSE
 7 No                                                FALSE
 8 No                                                FALSE
 9 No                                                FALSE
10 No                                                FALSE
# ... with 1,546 more rows

In the above code chunk, we did this for one of the columns, but now let’s do for all the columns that matched our string “Y/N|Specific|Random|Pre-planned” as above.

To do this, we will use the across() function from the dplyr package and the matches() function from the tidyselect package to allow us to apply this to all of the variables that have a pattern that that matches any of those of the variables we want to change.

The across() function then applies the case_when() function to all of these variables. Notice that the ~ symbol is necessary before the function that is applied using across().

shooting_data %<>%
  mutate(dplyr::across(.cols = matches("Y/N|Specific|Random|Pre-planned"),
                       ~ dplyr::case_when(. == "Yes" ~ TRUE,
                                          . == "No" ~ FALSE,
                                          . == "Y" ~ TRUE,
                                          . == "N" ~ FALSE,
                                          . == "Officer Involved" ~ TRUE)))

Finally, we can check out what happened after recoding the variables.

shooting_data %>% 
  select(matches("Y/N|Specific|Random|Pre-planned"))
# A tibble: 1,556 x 13
   `Targeted Specific Victi~` `Random Victims` `Bullied (Y/N/~` `Domestic Viol~`
   <lgl>                      <lgl>            <lgl>            <lgl>           
 1 FALSE                      TRUE             FALSE            FALSE           
 2 FALSE                      TRUE             FALSE            FALSE           
 3 TRUE                       FALSE            FALSE            FALSE           
 4 TRUE                       FALSE            FALSE            FALSE           
 5 FALSE                      FALSE            FALSE            FALSE           
 6 TRUE                       TRUE             FALSE            FALSE           
 7 TRUE                       FALSE            FALSE            FALSE           
 8 TRUE                       FALSE            FALSE            FALSE           
 9 FALSE                      TRUE             NA               FALSE           
10 TRUE                       FALSE            FALSE            FALSE           
# ... with 1,546 more rows, and 9 more variables:
#   `Suicide (Shooter was only victim) Y/N/ N/A` <lgl>,
#   `Suicide (shot self immediately following initial shootings) Y/N/ N/A` <lgl>,
#   `Suicide (e.g., shot self at end of incident - time period between first shots and suicide, different location, when confronted by police) Y/N/ N/A` <lgl>,
#   `Suicide (or attempted suicide) by Shooter (Y/N)` <lgl>,
#   `Pre-planned school attack` <lgl>, `During School Day (Y/N)` <lgl>,
#   `During a Sporting Event (Y/N)` <lgl>, ...

Looks good!

Geocoding with the ggmap package

For the purpose of our dashboard, we are interested in creating a map.

To do this, we need to perform a process called geocoding. Geocoding is the process of converting addresses into latitude and longitude coordinates.

To perform the geocoding we need the address of each school in the data set. The data currently does not list the actual address, but does have information about the school where the event occurred.

Since some schools have the same name, we need the city and state data as well. So we will create a new variable in our data called address using the mutate() function from the dplyr package.

This variable will collapse the values in the School, City, and State columns but with spaces in between. It is specified such that there will be space in between by the sep = " " argument.

Note: a space is typed between the quotation marks.

In this way, we then can use the address variable to look up the latitude and longitude for each school.

shooting_data %<>%
  dplyr::mutate(address = 
                  stringr::str_c(School, City, State_abb, sep = " "))

We can take a look at just this new address variable using the pull() function from the dplyr() package.

shooting_data %>%
  dplyr::pull(address) %>% 
  head()
[1] "Hine Junior High School Washington DC"       
[2] "Sousa Junior High Washington DC"             
[3] "Unnamed High School Washington DC"           
[4] "John F. Kennedy High School Cleveland OH"    
[5] "David Starr Jordan High School Long Beach CA"
[6] "Pine Bluff Coleman High School Pine Bluff AR"

Now we can use these addresses to find the latitude and longitude coordinates for each school where a school shooting occurred.

To do this, we will use the geocode() function from the ggmap package to look up these addresses on Google Maps to get the latitude and longitude values. In the geocode() function, we also need to specify that we want to use google as the source using the source argument and that we want latitude and longitude using the output = c("latlon") argument.

This step requires registering with the Google Cloud Platform to get an API key, which currently requires registering your payment information and agreeing to the Google Maps API Terms of Service.

You are not required to do this yourself! We will give you the data.

Click here to see how this process works in general.
Click here to see how we registered with the Google Cloud Platform.

If you were to do this process yourself, you could get an API key here. Again this requires registering your payment information, but it is free to got an API key and enable the APIs, however you can be billed based on how many addresses you look up using the APIs. You need to look up thousands before getting billed.

Then you need to enable the maps and places APIs, by clicking on the boxes next to each:

Then you would register like so after copying the API key: (Note this is a fake key)

ggmap::register_google(key = "mQkzTpiaLYjPqXQBotesgif3EfGL2dbrNVOrogg") 

Once we have obtained an API key and are registered, we can geocode our data.

Note that this step is time intensive, as there are many addresses to look up! Therefore, we will just show how this is done and will not evaluate the code for the next few code chunks. Again we will use the geocode() function from the ggmap package to perform this step.

shooting_data <- 
  shooting_data %>%
  mutate(coords = ggmap::geocode(address, 
                                 output = c("latlon"), 
                                 source = c("google")))

This results in tibble called coords being added to our shooting_data tibble. That’s right, we can have a tibble as a column or variable within a tibble. Using the glimpse function again, and looking at the last few variables, we can see that now the last variable listed is coords of class <tibble>.

If we take a look at the first couple of values of the coords tibble, we see a tibble that looks like this:

It would be better if each of these were their own columns in the tibble, so we will create new longitude and latitude variables again using the mutate function like so:

shooting_data <- 
  shooting_data %>%
  mutate(longitude = pull(coords,lon),
         latitude = pull(coords,lat))

In this case we use the pull() function to grab the lat and lon variables within the coords tibble which is a variable of the shooting_data tibble. This can also be done using the unpack() function from the tidyr package.

We can now remove the coords tibble like so, using the select() function from the dplyr package:

shooting_data <- 
  shooting_data %>%
  dplyr::select(-coords)

Now using glimpse() and looking at the last several variables, we can see that we no longer have a coords variable, but we do have two variables called longitude and latitude that are of class double as indicated by the <dbl>:

Now we will save the geocoded data in the wrangled directory of our data directory using the save function.

This requires listing the R object, followed by the path for where the file should be saved and what it should be called. In this case it will be called shooting_data_wrangled_pre_map.rda. First let’s create a new object called shooting_data_wrangled_pre_map so it is clear in the future what we are working with when we load the data. We will also write this data to a csv file, which can be convenient for collaborators. To do this we will use the write_csv() function from the readr package.

shooting_data_wrangled_pre_map <- shooting_data
save(shooting_data_wrangled_pre_map, file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.rda"))

readr::write_csv(shooting_data_wrangled_pre_map, 
                 file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.csv"))

You can download the wrangled data with latitude and longitude values using the OCSdata package:

# library(OCSdata)
wrangled_rda("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(file = here("OCSdata", "data", "wrangled", "shooting_data_wrangled_pre_map.rda"))

You can also access this data here.

To load the data, you may double click the downloaded .rda file in RStudio, or put the downloaded file in the appropriate directory and use the following command.

load(file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.rda"))

Geometry lists with the sf package

In this section, we will use the sf (which stands for simple features) package to create what is called a geometry list of our latitude and longitude information for the schools where shootings occurred. As some school shootings occurred in the same location, we need to alter all of the locations a bit so that when we plot the data on a map, the spots indicating where school shootings occurred will not overlap for the same location.

Let’s learn how to do this.

shooting_data_wrangled_pre_map %>% 
  select(latitude, longitude) %>% 
  slice_head(n = 5)
# A tibble: 5 x 2
  latitude longitude
     <dbl>     <dbl>
1     38.9     -77.0
2     38.9     -77.0
3     38.9     -77.0
4     41.4     -81.6
5     33.9    -118. 

First, let’s remind ourselves how many rows we have in our dataset.

dim(shooting_data_wrangled_pre_map)
[1] 1556   52

Since tibbles give dimensions, instead of using the function dim(), we might also check the dimensions of our dataset by simply doing so:

shooting_data_wrangled_pre_map
# A tibble: 1,556 x 52
   Date       School   City  State_abb `Reliability S~` `Killed (inclu~` Wounded
   <date>     <chr>    <chr> <chr>                <dbl>            <dbl>   <dbl>
 1 1970-01-05 Hine Ju~ Wash~ DC                       3                1       0
 2 1970-01-05 Sousa J~ Wash~ DC                       3                0       1
 3 1970-01-05 Unnamed~ Wash~ DC                       2                0       0
 4 1970-02-06 John F.~ Clev~ OH                       2                0       1
 5 1970-03-23 David S~ Long~ CA                       2                0       2
 6 1970-04-15 Pine Bl~ Pine~ AR                       3                1       5
 7 1970-04-22 Pierre ~ Wilm~ DE                       3                1       0
 8 1970-05-08 Carver ~ Delr~ FL                       2                0       1
 9 1970-05-15 Ben Lom~ Ogden UT                       2                0       2
10 1970-08-28 Riversi~ El P~ TX                       2                0       1
# ... with 1,546 more rows, and 45 more variables:
#   `Total Injured/Killed Victims` <dbl>, `Gender of Victims (M/F/Both)` <chr>,
#   `Victim's Affiliation w/ School` <chr>, `Victim's age(s)` <dbl>,
#   `Victims Race` <chr>, `Victim Ethnicity` <chr>,
#   `Targeted Specific Victim(s)` <lgl>, `Random Victims` <lgl>,
#   `Bullied (Y/N/ N/A)` <lgl>, `Domestic Violence (Y/N)` <lgl>,
#   `Suicide (Shooter was only victim) Y/N/ N/A` <lgl>, ...

This is important because not all rows have a recorded latitude and longitude.

shooting_data_wrangled_pre_map %>% 
  filter(is.na(latitude)) %>% 
  select(longitude, latitude, address)
# A tibble: 5 x 3
  longitude latitude address                                    
      <dbl>    <dbl> <chr>                                      
1        NA       NA John Marshall High School Los Angeles CA   
2        NA       NA John Marshall High School Los Angeles CA   
3        NA       NA John Marshall High School Los Angeles CA   
4        NA       NA Buell Elementary School Flint MI           
5        NA       NA Country Day High School Estate Concordia VI

Therefore, before we can proceed, we need to remove rows with NA values for the latitude and longitude variables. In other words, we need to remove rows of events that happened at schools with locations that were not identified by Google.

We can remove these rows using the drop_na() function from the tidyr package. We will use a . to indicate that we want to use the data that we are using as an input with our pipe, but then we will specify that we want to only drop rows were there is an NA value for either the latitude or longitude variables.

shooting_data_wrangled_for_map <- shooting_data_wrangled_pre_map %>%
 tidyr::drop_na(c(latitude, longitude))

How many did we remove? Let’s look at the dimension of our new dataset.

dim(shooting_data_wrangled_for_map)
[1] 1551   52

We see that there were 5 events that occurred at schools with unidentified complete locations (missing either latitude, longitude, or both) that were removed from our dataset.

Next, we are ready to convert our coordinates variables (latitude and longitude) into a coordinate simple feature (or sf object) using the st_as_sf() function (converts foreign object to an sf object).

To do this, we need to specify what our coordinate variables are and we will also specify what coordinate reference system,(crs) we would like to use. In our case we will use the ESPG reference number 4326, known as ESPG:4326 or the World Geodetic System (WGS) version 84 which is one of the most commonly used CPS and used by by most global positioning systems, known as GPS. This tells R to use the values for the variables called latitude and longitude as latitude and longitude coordinates.

shooting_data_wrangled_for_map %<>%
  sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

dim(shooting_data_wrangled_pre_map)
[1] 1556   52

We can see that our latitude and longitude variables were used to create a single new variable called geometry of class <POINT [\(^{\circ}\)]>, thus we have one less column.

In this case, this type of variable will always be shown. Even if we were to look at just the first 4 variables using indexes (like this: [1:4]), we will also see our last sf variable appended at the end.

So now we can see all variables related to location (which happen to be the first four variables and the geometry variable) by simply typing [1:4] next to the name of our tibble shooting_data_geocoded.

shooting_data_wrangled_for_map[1:4]
Simple feature collection with 1551 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -161.7705 ymin: 21.31061 xmax: -68.85822 ymax: 61.23202
Geodetic CRS:  WGS 84
# A tibble: 1,551 x 5
   Date       School                   City  State_abb             geometry
   <date>     <chr>                    <chr> <chr>              <POINT [°]>
 1 1970-01-05 Hine Junior High School  Wash~ DC        (-76.97829 38.89261)
 2 1970-01-05 Sousa Junior High        Wash~ DC        (-76.95315 38.88397)
 3 1970-01-05 Unnamed High School      Wash~ DC        (-76.98263 38.86993)
 4 1970-02-06 John F. Kennedy High Sc~ Clev~ OH         (-81.57341 41.4425)
 5 1970-03-23 David Starr Jordan High~ Long~ CA        (-118.1842 33.87129)
 6 1970-04-15 Pine Bluff Coleman High~ Pine~ AR        (-92.05407 34.21608)
 7 1970-04-22 Pierre S. Dupont High S~ Wilm~ DE        (-75.53298 39.76428)
 8 1970-05-08 Carver High School       Delr~ FL        (-80.11266 26.45975)
 9 1970-05-15 Ben Lomond High School   Ogden UT         (-111.951 41.25111)
10 1970-08-28 Riverside High School    El P~ TX        (-106.3723 31.73379)
# ... with 1,541 more rows

To allow our points to not overlap for events that took place in the same location, we will add a bit more range so that they do not overlap one another on our map.

To do this, we will transform the coordinates using the st_transform() function of the sf package into a two dimensional projection (called the Albers equal-area conic projection) with units in meters using the crs 102008 reference from the Environmental Systems Research Institute (ERSI) and then use the st_jitter() function from the sf package to allow a specified amount of range near the actual original GPS coordinates.

To learn more about geospatial coordinate systems see here and here.

So here we can see the output after transforming our data to the crs 102008 reference:

shooting_data_wrangled_for_map  %<>%
  st_transform(crs = "ESRI:102008") 

Notice how the class for the geometry variable is now <POINT [m]> as our data has been transformed into coordinates in meters.

shooting_data_wrangled_pre_map[1:5]
# A tibble: 1,556 x 5
   Date       School                         City     State_abb `Reliability S~`
   <date>     <chr>                          <chr>    <chr>                <dbl>
 1 1970-01-05 Hine Junior High School        Washing~ DC                       3
 2 1970-01-05 Sousa Junior High              Washing~ DC                       3
 3 1970-01-05 Unnamed High School            Washing~ DC                       2
 4 1970-02-06 John F. Kennedy High School    Clevela~ OH                       2
 5 1970-03-23 David Starr Jordan High School Long Be~ CA                       2
 6 1970-04-15 Pine Bluff Coleman High School Pine Bl~ AR                       3
 7 1970-04-22 Pierre S. Dupont High School   Wilming~ DE                       3
 8 1970-05-08 Carver High School             Delray ~ FL                       2
 9 1970-05-15 Ben Lomond High School         Ogden    UT                       2
10 1970-08-28 Riverside High School          El Paso  TX                       2
# ... with 1,546 more rows

And now we will add a jitter to the points using the st_jitter() function, meaning that we will randomly move the coordinates a little bit to allow for points at the same location to not overlap on the map.

You can see the tidyverse explanation about when to use a jitter plot here, they state that a jitter:

adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.

In this case we will allow for 50 meters of range using the amount = 50 argument.

shooting_data_wrangled_for_map %<>%
   sf::st_jitter(amount = 50)

We can now see that the coordinates are slightly modified.

shooting_data_wrangled_for_map[1:4]
Simple feature collection with 1551 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -5937651 ymin: -1635649 xmax: 2001949 ymax: 3621440
Projected CRS: North_America_Albers_Equal_Area_Conic
# A tibble: 1,551 x 5
   Date       School                   City  State_abb            geometry
   <date>     <chr>                    <chr> <chr>             <POINT [m]>
 1 1970-01-05 Hine Junior High School  Wash~ DC         (1544769 25000.97)
 2 1970-01-05 Sousa Junior High        Wash~ DC         (1546940 24330.63)
 3 1970-01-05 Unnamed High School      Wash~ DC         (1544914 22217.39)
 4 1970-02-06 John F. Kennedy High Sc~ Clev~ OH         (1129302 256394.4)
 5 1970-03-23 David Starr Jordan High~ Long~ CA         (-1933757 -492216)
 6 1970-04-15 Pine Bluff Coleman High~ Pine~ AR         (345323 -672074.4)
 7 1970-04-22 Pierre S. Dupont High S~ Wilm~ DE         (1638335 149770.5)
 8 1970-05-08 Carver High School       Delr~ FL         (1533744 -1445853)
 9 1970-05-15 Ben Lomond High School   Ogden UT        (-1251326 253167.8)
10 1970-08-28 Riverside High School    El P~ TX        (-937745.4 -916826)
# ... with 1,541 more rows

Note: the geometry values have changed.

Now we will transform our coordinates back into the 3D latitude and longitude degree system again using the st_transform() function and the ESPG:4326, coordinate system.

shooting_data_wrangled_for_map  %<>%
  st_transform(crs = 4326)

shooting_data_wrangled_for_map[1:4]
Simple feature collection with 1551 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -161.7708 ymin: 21.31092 xmax: -68.85789 ymax: 61.23175
Geodetic CRS:  WGS 84
# A tibble: 1,551 x 5
   Date       School                   City  State_abb             geometry
   <date>     <chr>                    <chr> <chr>              <POINT [°]>
 1 1970-01-05 Hine Junior High School  Wash~ DC        (-76.97776 38.89293)
 2 1970-01-05 Sousa Junior High        Wash~ DC        (-76.95337 38.88368)
 3 1970-01-05 Unnamed High School      Wash~ DC        (-76.98281 38.86953)
 4 1970-02-06 John F. Kennedy High Sc~ Clev~ OH        (-81.57356 41.44271)
 5 1970-03-23 David Starr Jordan High~ Long~ CA        (-118.1841 33.87105)
 6 1970-04-15 Pine Bluff Coleman High~ Pine~ AR        (-92.05433 34.21595)
 7 1970-04-22 Pierre S. Dupont High S~ Wilm~ DE        (-75.53289 39.76448)
 8 1970-05-08 Carver High School       Delr~ FL        (-80.11234 26.45968)
 9 1970-05-15 Ben Lomond High School   Ogden UT         (-111.9505 41.2509)
10 1970-08-28 Riverside High School    El P~ TX         (-106.3719 31.7336)
# ... with 1,541 more rows

Notice how the geometry variables are different from what they were originally with this coordinate system:

Next, we separate the geometry variable into longitude and latitude variables again. We can use the st_coordinates() function from the sf package to extract the coordinates from our tibble as a matrix.

shooting_data_wrangled_for_map %<>% 
  mutate(coordinates = as_tibble(st_coordinates(.)))

shooting_data_wrangled_for_map %>%
  pull(coordinates) %>%
  slice_head(n = 4)
# A tibble: 4 x 2
      X     Y
  <dbl> <dbl>
1 -77.0  38.9
2 -77.0  38.9
3 -77.0  38.9
4 -81.6  41.4

Now, just as we did previously we will create new variables called latitude and longitude from the X and Y variables within the coordinates tibble that is part of our shooting_data_wrangled_for_map using the pull() function.

We will also convert our shooting_data_wrangled_for_map object which is currently a sf into a tibble using the as_tibble() function from the tibble package and then we will remove the geometry and coordinates variables using the select() function from the dplyr package with a minus operator in front of the names of the variables to remove.

shooting_data_wrangled_for_map %<>%
  mutate(longitude = pull(coordinates,X),
          latitude = pull(coordinates,Y)) %>%
  tibble::as_tibble() %>%
  select(-geometry) %>%
  select(-coordinates)

And now we can take a look at our last 3 variables using the last_col() function, which is a select() helper function tidyr package (See here for other select() helper functions).

The last_col() function allows us to select either the last column, or with a specified offset we can select a number of columns before the last column. Thus, 2 columns before the last column would be last_col(offset = 2) and then the : symbol is interpreted as through, thus we are selecting for the third to last column through the last column with last_col(offset = 2):last_col().

shooting_data_wrangled_for_map %>% 
  select(tidyr::last_col(offset = 2):last_col()) %>% 
  slice_head(n = 4)
# A tibble: 4 x 3
  address                                  longitude latitude
  <chr>                                        <dbl>    <dbl>
1 Hine Junior High School Washington DC        -77.0     38.9
2 Sousa Junior High Washington DC              -77.0     38.9
3 Unnamed High School Washington DC            -77.0     38.9
4 John F. Kennedy High School Cleveland OH     -81.6     41.4

Great! That looks like we expected.

Finally, we will save our wrangled data, again using save() and we will also write to a CSV file using write_csv().

save(shooting_data_wrangled_for_map, 
          file = here("data", "wrangled",
                      "shooting_data_wrangled_for_map.rda"))

write_csv(shooting_data_wrangled_for_map, 
          file = here("data", "wrangled",
                      "shooting_data_wrangled_for_map.csv"))

Data Analysis and Visualization


If you have been following along but stopped, we could load our data like so:

load(here::here("data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
load(here::here("data", "wrangled", "shooting_data_wrangled_for_map.rda"))

We need the wangled data that is prepared both for the map and the data just before the last wrangling step Geometry lists with the sf package] to prepare for the map because we removed some events that did not have addresses that were identified by Google (had NA values for latitude or longitude). We want to use data for all events for our statistics, tables, and plots.

If you skipped the previous sections click here.

First you need to install the OCSdata package:

install.packages("OCSdata")

Then, you may download the wrangled data .rda files like so:

# library(OCSdata)
wrangled_rda("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(here::here("OCSdata", "data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
# load(here::here("OCSdata", "data", "wrangled", "shooting_data_wrangled_for_map.rda"))

To load the downloaded data into your environment, you may double click on each of the .rda files in Rstudio or using the load() function.

If the package does not work for you, two RDA files (stands for R data) of the data can be found here or slightly more directly here and here. Download these files and then place them in your current working directory. We recommend using an RStudio project and the here package to navigate to your files more easily.

We have put these files in a directory called “wrangled” within a directory called “data” within our working directory (which has a .Rproj file).

load(here::here("data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
load(here::here("data", "wrangled", "shooting_data_wrangled_for_map.rda"))

Click here to see more about creating new projects in RStudio.

You can create a project by going to the File menu of RStudio like so:

You can also do so by clicking the project button:

See here to learn more about using RStudio projects.


Luckily, our data is already in pretty good shape, but we want to make our data more useful for our dashboard.

Let’s shorten the name of the data that was wrangled up to the last step for the map. We will use shooting_data.

shooting_data <-shooting_data_wrangled_pre_map

We will also rename the data that is wrangled for the map to a shorter name:

shooting_data_for_map <- shooting_data_wrangled_for_map

Let’s double check that our data is expected:

dim(shooting_data)
[1] 1556   52
dim(shooting_data_for_map)
[1] 1551   52

Great, looks like we indeed have more rows in our shooting_data as we would expect.

There are several elements we would like to include in our dashboard.

One thing we would like is an interactive table.

Interactive Table


We can do this using the datatable() function from the DT package.

DT::datatable(shooting_data)

This creates a searchable table and the order in which the data is displayed can be toggled to change for each variable.

However, we have many variables or columns in our dataset, so this can be overwhelming. Instead of displaying all of the variables, let’s choose only some of the most interesting to display in our dashboard.

DT_table <- 
  shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)

DT::datatable(DT_table)

Next, we will make some data visualizations.

Yearly Shootings


We would like to create a plot of the number of school shootings per year.

To do this, we will count the number of school shootings per year using the count() function from the dplyr package. We specify that we want to count the unique values of the Date_year variable and name the new column Shootings.

shootings_per_year <-
  shooting_data %>%
  count(Date_year, name = "Shootings")

shootings_per_year
# A tibble: 51 x 2
   Date_year Shootings
       <dbl>     <int>
 1      1970        20
 2      1971        21
 3      1972        18
 4      1973        18
 5      1974        16
 6      1975        14
 7      1976        11
 8      1977        16
 9      1978        16
10      1979        14
# ... with 41 more rows

Good, this looks as expected.

Now to make a plot of this data we will use the ggplot2 package.

Click here for an introduction to ggplot2.

The ggplot2 package is generally intuitive for beginners because it is based on a grammar of graphics or the gg in ggplot2. The idea is that you can construct many sentences by learning just a few nouns, adjectives, and verbs. There are specific “words” that we will need to learn and once we do, you will be able to create (or “write”) hundreds of different plots.

The critical part to making graphics using ggplot2 is the data needs to be in a tidy format. Given that we have just spent time putting our data in tidy format, we are primed to take advantage of all that ggplot2 has to offer!

We will show how it is easy to pipe tidy data (output) as input to other functions that create plots. This all works because we are working within the tidyverse.

What is the ggplot() function? As explained by Hadley Wickham:

The grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinates system.

ggplot2 Terminology:

  • ggplot - the main function where you specify the dataset and variables to plot (this is where we define the x and y variable names)
  • geoms - geometric objects
    • e.g. geom_point(), geom_bar(), geom_line(), geom_histogram()
  • aes - aesthetics
    • shape, transparency, color, fill, line types
  • scales - define how your data will be plotted
    • continuous, discrete, log, etc

The function aes() is an aesthetic mapping function inside the ggplot() object. We use this function to specify plot attributes (e.g. x and y variable names) that will not change as we add more layers.

Anything that goes in the ggplot() object becomes a global setting. From there, we use the geom objects to add more layers to the base ggplot() object. These will define what we are interested in illustrating using the data.


For more of an introduction on creating plots with ggplot2 , see this case study


First, we start with the ggplot() function from the ggplot2 package.

This function requires that the aesthetics aes() be specified. This involves choosing what variable will be plotted on the x-axis and the y-axis.

shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings))

Using the ggplot() function alone will create an empty plot area. To make our plot not empty, we need to select one of the geom_* functions of the ggplot2 package to specify what type of plot we want to create.

Assuming the ggplot2 library is loaded, type geom into the RStudio console and you will see many options to scroll through.

Here, we use a geom_col() plot, which is a particular type of bar plot that uses the actual values to plot, rather than counts, which is the default of geom_bar(). We will specify with the fill argument, that we want our bars to be filled with the color black.

shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings)) +
    geom_col(fill = "black")

We also modify the x-axis using the scale_x_continuous() function. This function allows for specification of the range or limits of the axis using the limits argument. We can use the base seq() function to create a sequence of numbers for each tick mark.

We can add labels to our plot using the labs() function from ggplot2. This has arguments such as x and y for the axes and title and subtitle for titles. We can use NULL to remove a label. For example to remove the x-axis label we can use x = NULL

We will also modify the overall aesthetics of the plot using a theme_* function. See here for a list of options.

start <- 1970
end <- 2020

shootings_per_year_p <- 
  shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings)) +
      geom_col(fill = "black") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      theme_minimal() +
      labs(title = "Yearly School Shootings",
           subtitle = "United States",
           x = NULL,
           y = "School Shootings")

shootings_per_year_p 

Yearly Deaths


Let’s make a similar plot for the number of deaths

deaths_per_year<-
  shooting_data %>% 
  group_by(Date_year) %>%
  summarize(Deaths = sum(`Killed (includes shooter)`))

deaths_per_year_p <- 
  deaths_per_year %>%
    ggplot(aes(x = Date_year, y = Deaths)) +
      geom_col(fill = "black") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      theme_minimal() +
      labs(title = "Yearly Deaths Attributable to School Shootings",
           subtitle = "United States",
           x = NULL)

deaths_per_year_p

Note: When using the summarize() function, we don’t need to use the mutate() function here.

Next, for the purposes of the dashboard, we actually want to create just one plot that shows both the number of school shootings per year and the number of deaths.

We can do so by combining our shootings_per_year and deaths_per_year tibbles together and making what is called a faceted plot, using the facet_wrap() function to create two plots next to one another.

To combine our data we will use the full_join() function from the dplyr package. This maintains all values from both tibbles.

To do so we will be making our table “longer”, meaning that it will have fewer columns and more rows. See here for more information about different table formats, typically referred to as wide and long or sometimes narrow.

We will use the pivot_longer() function from the tidyr package to change the shape of our table.

There are 3 main arguments in this function:

  1. cols - which specifies what columns to collapse
  2. names_to - which specifies the name of the new column that will be created that will contain the column names of the columns you are collapsing
  3. values_to - which specifies the name of the new column that will be created that will contain the values from the columns you are collapsing

To specify that we want to collapse all the columns that have year values, we can choose all those except the Date_year variable by using the - negative operator.

per_year <- 
  full_join(shootings_per_year, deaths_per_year)

per_year %<>%
  pivot_longer(cols = -Date_year, 
               values_to = "events", 
               names_to = "id")

per_year
# A tibble: 102 x 3
   Date_year id        events
       <dbl> <chr>      <dbl>
 1      1970 Shootings     20
 2      1970 Deaths         8
 3      1971 Shootings     21
 4      1971 Deaths         9
 5      1972 Shootings     18
 6      1972 Deaths         6
 7      1973 Shootings     18
 8      1973 Deaths         6
 9      1974 Shootings     16
10      1974 Deaths        12
# ... with 92 more rows

Hmmm, we see the data type of the id column is a character (<chr>). Let’s convert it to a factor, so that the order in which Shootings and Deaths appear is the order in which they appear first rather than by alphabetical order (which is default).

Using the fct_inorder() function from the forcats package, we can easily reorder the id variable`.

per_year %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year
# A tibble: 102 x 3
   Date_year id        events
       <dbl> <fct>      <dbl>
 1      1970 Shootings     20
 2      1970 Deaths         8
 3      1971 Shootings     21
 4      1971 Deaths         9
 5      1972 Shootings     18
 6      1972 Deaths         6
 7      1973 Shootings     18
 8      1973 Deaths         6
 9      1974 Shootings     16
10      1974 Deaths        12
# ... with 92 more rows

Now since we the new variable for the names is called id we will use this as the variable to create the facet like so: facet_wrap(~id). We can also specify that we want both plots to have their own y-axis with the scales = "free" argument. This causes each to have the y-axis automatically scaled for the data in each plot. We can then use the scale_y_continuous() function to set both of the y-axes to be the same.

per_year %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col() +
    facet_wrap(~id, scales = "free") +
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                       labels = seq(0, 120, by = 30),
                       limits = c(0, 121))+
    theme_minimal() +
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = "Number of events",
         x = "Year")+
    scale_fill_manual(values = c("black", "black"))+
    theme(legend.position = "none", 
          legend.title = element_blank(),
          axis.text.x = element_text(angle = 90),
          strip.background =element_rect(fill="cornflowerblue"),
          strip.text = element_text(colour = 'white', face = "bold", size = 14))

Next, we can modify the plot further so that it is more obvious what each plot is showing. We can update the names of the y-axis for each plot by changing the strip.position argument of the facet_wrap() function to be placed on the left rather than above. Currently it is the label in blue that says what the value of the id variable is for each plot. This also requires some modification of the theme() function to place the strip.text outside the plot area and to remove the background.Furthermore, we also change the text using the labeller argument of the facet_wrap() function. The as_labeller() function from the ggplot2 package can change out the id values for other text like in the following code:

per_year %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col() +
    facet_wrap(~id, 
               scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left") +
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                       labels = seq(0, 120, by = 30),
                       limits = c(0, 121))+
    theme_minimal() +
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year")+
    scale_fill_manual(values = c("black", "black"))+
    theme(legend.position = "none", 
          legend.title = element_blank(),
          axis.text.x = element_text(angle = 90),
          strip.background = element_blank(),
          strip.placement = "outside",
          strip.text = element_text(face = "bold", size = 14))

Good, Now this is much easier to interpret.

Our last step in this section is to save the style settings of this plot as theme so we can reuse it for future plots. To do this, we use the base function() function:

theme_dashboard <- function(){ 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))
}

Yearly Cumulative Shootings


Now let’s make another plot of the cumulative deaths each year including those of the previous years. In this case we can use the shootings_per_year object that we previously made.

shootings_per_year
# A tibble: 51 x 2
   Date_year Shootings
       <dbl>     <int>
 1      1970        20
 2      1971        21
 3      1972        18
 4      1973        18
 5      1974        16
 6      1975        14
 7      1976        11
 8      1977        16
 9      1978        16
10      1979        14
# ... with 41 more rows

However, we want to add a new variable using the mutate function called n_cum_sum by using the cumsum() function to calculate a cumulative sum based on the yearly count.

shootings_per_year_cum <- shootings_per_year %>%
    mutate(Shootings = cumsum(Shootings))

deaths_per_year_cum <- deaths_per_year %>%
    mutate(Deaths = cumsum(Deaths))

shootings_per_year_cum
# A tibble: 51 x 2
   Date_year Shootings
       <dbl>     <int>
 1      1970        20
 2      1971        41
 3      1972        59
 4      1973        77
 5      1974        93
 6      1975       107
 7      1976       118
 8      1977       134
 9      1978       150
10      1979       164
# ... with 41 more rows

Next, we join these tables together

per_year_cum <- 
  full_join(shootings_per_year_cum, deaths_per_year_cum)

per_year_cum %<>% 
  pivot_longer(cols = c(Shootings,Deaths ), 
               values_to = "events", 
               names_to = "id")

per_year_cum
# A tibble: 102 x 3
   Date_year id        events
       <dbl> <chr>      <dbl>
 1      1970 Shootings     20
 2      1970 Deaths         8
 3      1971 Shootings     41
 4      1971 Deaths        17
 5      1972 Shootings     59
 6      1972 Deaths        23
 7      1973 Shootings     77
 8      1973 Deaths        29
 9      1974 Shootings     93
10      1974 Deaths        41
# ... with 92 more rows

Good, this looks like we would expect.

Now let’s make a plot like we did before:

per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
    ggplot(aes(x = Date_year, y = events, fill = id)) +
      geom_col() +
      facet_wrap(~id, scales = "free", 
                 labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                          Deaths = "Deaths (# of people)")), 
                 strip.position = "left") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      scale_y_continuous(breaks = seq(0, 1500, by = 500),
                         labels = seq(0, 1500, by = 500),
                         limits = c(0, 1500)) +
      theme_minimal() +
      labs(title = "Cumulative Yearly Shootings and Deaths\nAttributable to School Shootings",
           subtitle = "United States",
           y = NULL,
           x = "Year") +
      scale_fill_manual(values = c("black", "black")) +
      theme_dashboard()

Note: the limits for the y-axis were determined by first plotting without the scale_y_continuous() function.

Deaths per Shooting


Next, we will make a plot of the number of deaths per school shooting based on the Killed (includes shooter) variable.

Question Opportunity

See if you can come up with the code for the plot.


Click here to reveal the answer.
deaths_per_event <-
  shooting_data %>%
  group_by(`Killed (includes shooter)`) %>%
  count() %>%
  ungroup()

per_shooting_plot <-deaths_per_event %>%
  ggplot(aes(y = `Killed (includes shooter)`, x = n)) +
    geom_col(fill = "black")+
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "School Shootings",
         y = "")

per_shooting_plot 

This plot could also have been made using geom_bar() instead of geom_col() this makes a similar plot but automatically uses the count for one of the axes, thus it is not required to first summarize the data using the count() function.

shooting_data %>%
  ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_bar(fill = "black") +
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "School Shootings",
         y = "")

Because of the skewed distribution, it is difficult to see the school shootings that had more numerous deaths, so we will add a facet that zooms in on this portion of the plot. We can do so, using the facet_zoom() function from the ggforce package.

shooting_data %>%
  ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_bar(fill = "black") +
    scale_x_continuous(breaks = seq(0, max(pull(shooting_data, 
                                    `Killed (includes shooter)`)), by = 1),
                       labels = seq(0, max(pull(shooting_data,
                                    `Killed (includes shooter)`)), by = 1)) +
    ggforce::facet_zoom(xlim = c(4, max(pull(shooting_data, 
                                    `Killed (includes shooter)`))), 
                        ylim = c(0,20)) +
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "Deaths per shooting",
         y = "Number of events with given number of deaths") +
  theme(axis.text.x = element_text(angle = 90))

It is still difficult to see. Let’s try some other options.

The geom_freqpoly() function creates a graph that makes it very easy to see that most school shootings result in zero or one death and that the maximum number of deaths in this data for a single event is in the upper twenties.

shooting_data %>%
    ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_freqpoly()

This really shows that most school shooting events luckily result in no deaths, but what are the actual proportions of school shootings that end in 0 deaths, 1 death, 2 deaths, etc. One way to look at this is to calculate the percentage of events that resulted in each number of deaths. We can do this by dividing the number of events by the overall sum of events and multiplying by 100. The base round() function can round this value to the nearest 1 decimal place by specifying that we want 1 digit after the decimal with digits = 1.

deaths_perc_event <-
  shooting_data %>%
  count(`Killed (includes shooter)`) %>%
  rename("num_events"= n) %>%
  mutate(percent = round(num_events/sum(num_events)*100, digits =1))

deaths_perc_event
# A tibble: 11 x 3
   `Killed (includes shooter)` num_events percent
                         <dbl>      <int>   <dbl>
 1                           0        969    62.3
 2                           1        505    32.5
 3                           2         58     3.7
 4                           3         12     0.8
 5                           4          1     0.1
 6                           5          3     0.2
 7                           6          3     0.2
 8                          10          2     0.1
 9                          15          1     0.1
10                          17          1     0.1
11                          28          1     0.1
deaths_perc_event %>%
  ggplot(aes(x =`Killed (includes shooter)`, y = percent)) +
    geom_col()

We can see that greater than 60% of the events had no deaths. It is however, the plot is still unsatisfactory because there is such a long tail.

Next, we can try collapsing the events that resulted in 4 or more deaths together and create a pie chart which you are likely familiar with as well as alternative plot called a waffle plot.

First to collapse the percentage for the events that had 4 or more deaths, we need to do a bit of wrangling.

We will start with filtering the data to only these events and then we will sum each of the columns using the base R function colSums() with the goal of creating a new row in the deaths_perc_event object that will contain information about all events with 4 or more deaths. We will use the >= greater than or equal to operator.

greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

greater_than4
Killed (includes shooter)                num_events                   percent 
                     85.0                      12.0                       0.9 

Good, now we know the overall percentage for the events that unfortunately resulted in more than 4 deaths.

Next, we combine this with the rest of our data using the bind_rows() function from the dplyr package which appends a tibble to another.

[source]
deaths_perc_event %<>%
  bind_rows(greater_than4)

deaths_perc_event 
# A tibble: 12 x 3
   `Killed (includes shooter)` num_events percent
                         <dbl>      <dbl>   <dbl>
 1                           0        969    62.3
 2                           1        505    32.5
 3                           2         58     3.7
 4                           3         12     0.8
 5                           4          1     0.1
 6                           5          3     0.2
 7                           6          3     0.2
 8                          10          2     0.1
 9                          15          1     0.1
10                          17          1     0.1
11                          28          1     0.1
12                          85         12     0.9

Next, we add a new variable so that it is easy to plot and interpret the number of deaths for each percentage.

We will add the word “deaths” to each value in the Killed (includes shooter) variable using the base paste0() function. Note that this function automatically will result in no space or any other character between pasted elements. The paste() function can alternatively be used for those cases.

deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, 
                           " deaths ", "(", percent, "%)")) 

deaths_perc_event
# A tibble: 12 x 4
   `Killed (includes shooter)` num_events percent category        
                         <dbl>      <dbl>   <dbl> <chr>           
 1                           0        969    62.3 0 deaths (62.3%)
 2                           1        505    32.5 1 deaths (32.5%)
 3                           2         58     3.7 2 deaths (3.7%) 
 4                           3         12     0.8 3 deaths (0.8%) 
 5                           4          1     0.1 4 deaths (0.1%) 
 6                           5          3     0.2 5 deaths (0.2%) 
 7                           6          3     0.2 6 deaths (0.2%) 
 8                          10          2     0.1 10 deaths (0.1%)
 9                          15          1     0.1 15 deaths (0.1%)
10                          17          1     0.1 17 deaths (0.1%)
11                          28          1     0.1 28 deaths (0.1%)
12                          85         12     0.9 85 deaths (0.9%)

We can change the value for the last row about the events that resulted in more than 4 deaths.

We can use the last() function from the dplyr package combined with the pull() function to specifically grab this value.

last(pull(deaths_perc_event, category))
[1] "85 deaths (0.9%)"

Using the case_when() function, we can change this value:

deaths_perc_event %<>% 
  mutate(category =
           case_when(category == last(pull(deaths_perc_event, category)) ~ 
                        paste0("4+ deaths ", "(", percent, "%)"),
                     category == "1 deaths" ~ "1 death",
                     TRUE ~ category))

deaths_perc_event
# A tibble: 12 x 4
   `Killed (includes shooter)` num_events percent category        
                         <dbl>      <dbl>   <dbl> <chr>           
 1                           0        969    62.3 0 deaths (62.3%)
 2                           1        505    32.5 1 deaths (32.5%)
 3                           2         58     3.7 2 deaths (3.7%) 
 4                           3         12     0.8 3 deaths (0.8%) 
 5                           4          1     0.1 4 deaths (0.1%) 
 6                           5          3     0.2 5 deaths (0.2%) 
 7                           6          3     0.2 6 deaths (0.2%) 
 8                          10          2     0.1 10 deaths (0.1%)
 9                          15          1     0.1 15 deaths (0.1%)
10                          17          1     0.1 17 deaths (0.1%)
11                          28          1     0.1 28 deaths (0.1%)
12                          85         12     0.9 4+ deaths (0.9%)

Question Opportunity

We could of used thestr_replace() function from the stringr package to replace the value for the last row. This function would directly change the value of “85 deaths” to “4+deaths”, but this would not be as reproducible. Why is that?


Click here to reveal the answer.

Say we used this code again after the data got updated. Then there may be more deaths in this category and therefore this value would no longer be “85 deaths”. Instead, by using case_when(), we can use an expression for the last value of the deaths_perc_event tibble and replace that, regardless of what the value is, with “4+deaths”. Recall that case_when() replaces all other values that are not specified with NA. We do not want to lose the other values for the category variable. So to avoid this, we assign each of the values that are not the last value or the "1 deaths" value to what they currently are for the category variable, using TRUE ~ category (Note that all remaining unassigned values are indicated as TRUE).

We could also actually type out the percentage of 4+death cases, but it is always more reproducible to instead use an expression that will evaluate to the value we want. This way if we were to update our data with additional school shooting events, this evaluation would also update.


OK, this looks as we hoped. OK, now we are ready to make plots.

Let’s start with the pie chart. Historically, this has become a bit controversial type of plot. However, it can be very useful when you are actually looking at percentages and the goal is to see major trends in the data, such as all the groups are roughly equal or one group is particularly larger than the rest. When this is the case and you are presenting the data to an audience that is less familiar with data science, they may expect to see a pie chart. Thus it is useful to know how to make one. However, in most other cases pie charts do a poor job at allowing us to see more subtle differences, and they are particularly confusing when we are not looking at proportions, but raw counts. In those cases it is better to use a bar chart as we have already done.

There is no geom_* function that allows you to create a pie chart directly. Instead we will create our bar plot as we have and then use the coord_polar() function to wrap our y axis into a circular shape.

deaths_perc_event %>%
  filter(percent>0.5) %>%
  ggplot(aes(x = "", y = percent, fill = category)) +
      # adding color here adds a black outline
    geom_col(color = "black") +
    coord_polar("y", start = 0) +
    scale_y_continuous(breaks= NULL) +
    theme_minimal() +
    theme(axis.title = element_blank()) +
    scale_fill_viridis_d() +
    labs(title = "Percentages of school shooting deaths\n(including the shooter)")

This is actually a fairly easy plot to interpret. We can see that most events resulted in zero deaths and that the next largest proportion resulted in one death, while a sizable but small proportion resulted in two deaths. A very small proportion resulted in three or four or more deaths.

We also can create a waffle plot. This plot offers one advantage over the pie chart, in that it also allows for easier interpretation of more subtle proportion differences while also showing big picture differences in efficient manner.

First, we filter for only the data that we want to plot. We only want the 0,1,2,3, or 4+ categories. We can do so by using the str_detect() function from the stringr package. This allows us to find the values that match multiple patterns. The patterns are separated by the | or operator. Thus any value matching any of the patterns should be kept. Notice that the \\ is necessary before the + so that is not interpreted as a mathematical plus sign.

The waffle() function requires that the data be in wide format. Thus we need to use pivot_wider() of the tidyr package to do so. This is very similar to the pivot_longer() function, however in this case we need to specify what existing column contains the names for the new columns using names_from and what existing column contains the values for the new columns using values_from.

deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                     values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title="Deaths Per School Shooting", 
                 xlab="1 square ~ 1%") +
  scale_fill_viridis_d()

Percentages

We are also interested in including statistics in our dashboard. For example, we are interested in how many shooters committed or attempted suicide.

We previously converted variables with yes or no answers because they were inconsistently coded as yes/ y and no/n. Furthermore, logical variables are easier to work with in terms of performing calculations because TRUE values are treated like a 1 while FALSE values are treated like a 0.

We can calculate the percentage of shooters that committed or attempted suicide out of all entries that have data for this information. Thus we do not want to include NA values in the calculation, otherwise this might give us a distorted picture of the truth.

Let’s take a look at the data for this variable:

shooting_data %>% 
  count(`Suicide (or attempted suicide) by Shooter (Y/N)`)
# A tibble: 3 x 2
  `Suicide (or attempted suicide) by Shooter (Y/N)`     n
  <lgl>                                             <int>
1 FALSE                                              1308
2 TRUE                                                203
3 NA                                                   45

We can see that there are 45 NA values.

If we calculate a sum of the TRUE values, (which are those that are equivalent to 1), we can do so by just summing this variable, which is equivalent to summing values that are greater than 0.

sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`), 
    na.rm = TRUE)
[1] 203
sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`) > 0, 
    na.rm = TRUE)
[1] 203

In contrast, FALSE values are those that are equivalent to 0. Thus if we want to divide by the sum of all values that are FALSE are TRUE, then we can sum all values greater than or equal to 0.

sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`) >= 0, 
    na.rm = TRUE)
[1] 1511

Thus, we can calculate the percentage of all reporting values like so, where the TRUE values are divided by the sum of all TRUE and FALSE values: (We also multiply by 100 using *100 to get the percentage value.)

suicide <- 
  (sum(pull(shooting_data,`Suicide (or attempted suicide) by Shooter (Y/N)`), na.rm = TRUE) /
   sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE))*100

suicide
[1] 13.43481

We can use the round() function to round this value and the format() to make sure that the value has the correct number of digits.

suicide <- round(suicide, 2)
suicide
[1] 13.43

If after rounding we wanted zeros after the decimal so that the number of digits after the decimal was consistent for the different statistics we were displaying, we could use the format() function to specify this.

So we can add a zero after 13.4 like so:

format(suicide, nsmall = 2)
[1] "13.43"

To calculate the percentage of school shootings where this information was reported we can do the following, by calculating all values that are not NA using >=0 and calculating the number all possible values using the base length() function.

reporting_suic <- 
  (sum(pull(shooting_data, 
            `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, 
       na.rm = TRUE) /
   length(pull(shooting_data, 
               `Suicide (or attempted suicide) by Shooter (Y/N)`))
   )*100

reporting_suic <- round(reporting_suic, 1)
reporting_suic
[1] 97.1

We can see that 97% of the school shootings have information about this variable.

It is important to check and report this percentage so that people can better understand if our percentages are reliable. If only 2% of the school shootings had this information and in all cases of the 2% the school shootings involved a suicide (or attempt), then this would lead people to believe that 100% of school shootings involve a shooter suicide (or attempt). This would clearly be misleading! In our case the majority of the school shootings included this information, so we will indeed report the percentage and we will also let people know how much of the school shooting data had this information.

Question Opportunity

Now try to perform variations of these calculations to calculate other statistics for our dashboard, such as the percentage of the shooters that were male or the percentage of events where a single handgun was used, (hint the Firearm Type value will be Handgun).


Shooter Was Male


Click here to reveal the code.
gender <- paste(as.character(round(100 * (sum(
    case_when(pull(shooting_data,`Shooter Gender`) == "Male" ~ TRUE,
                                                        TRUE ~ FALSE),
                                      na.rm = TRUE)
    /
      sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_male <- (sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Shooter Gender`)))*100
reporting_male <- round(reporting_male, 1)


gender
[1] "95.1 %"
reporting_male
[1] 88.6

Use of a Single Handgun


Click here to reveal the code.
handgun <-paste(as.character(round(100 *(sum(case_when(
      pull(shooting_data,`Firearm Type`) == "Handgun" ~ TRUE,
                                                 TRUE ~ FALSE), na.rm = TRUE)
    /
      sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_gun <- (sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Firearm Type`)))*100
reporting_gun <- round(reporting_gun, 1)

handgun
[1] "81.7 %"
reporting_gun
[1] 83.4

Dashboard Basics


We are now ready to build our dashboard!

Let’s introduce some basics about creating dashboards in R in the flexdashboard package.

Note that you can also start the case study at this point, we will let you know how to get the data that you need.

Dashboard packages


To make our dashboard we will use three very useful packages:

  1. flexdashboard

Flexdashboard is a package that was created by RStudio and released in May of 2016. This package allows for users to more easily create dashboards using R Markdown.

See here for a video about flexdashboard and here for a more information on how to use this package.

  1. leaflet

Leaflet is the leading open-source JavaScript library for interactive maps and is used by many websites. The leaflet R package allows for users to more easily integrate leaflet maps in R, to create maps like the one below. We will use this package to create a map of where school shootings have occurred in the US.

Here is an example of an interactive map made with leaflet

  1. shiny

Shiny is an R package that makes it easier to create interactive web applications in R. See here for a gallery of examples. People have created a variety of diverse applications using this package- from interactive websites to games.

Here is an screenshot of a shiny app.

[source]

See here for a list of other packages that are useful for adding elements to dashboards created with the flexdashboard package.


R Markdown


The case study that you are reading right now was created using an R Markdown document. This means that it is a document that uses the Markdown language syntax with enhanced capabilities of executing R code in the document.

In fact, if you click the button that says “code” on the upper right corner at the top of the HTML you will download the R Markdown document for this case study.

R Markdown (Rmd) is a file format that contains Markdown syntax and embedded R code (it can also incorporate code from some other languages like Python and SQL).

Click here to see how this video was embedded in this R Markdown.

This video was included using the vembedr package. Videos on Vimeo or YouTube can be added like so, where a url is added within quotation marks and the following two lines of code allow for the video to be centered in the R Markdown output. See [here](vembedr to learn more about embedding videos with this package.

library(vembedr)
embed_url("https://vimeo.com/178485416") %>%
  div(class = "vembedr") %>%
  div(align = "center")

These Rmd files can be rendered into a variety of file outputs like PDF, word, HTML etc. by the knitr and rmarkdown packages.

This relies on conversion of the Rmd file into the Markdown language by software called Pandoc.

Markdown (which has been implemented by many languages, such as Perl, Java, Python, C#, Ruby, etc.) is a language of a particular class of programming languages called lightweight markup languages (LML).

LMLs have relatively simple and intuitive syntax, and are therefore relatively easy to write and read and are converted by software into some type of less human-friendly language to create an output document like a PDF or an HTML file. In fact, multiple output files can be created from the same LML file!

In our case we are interested in rendering our Rmd document into a website. The code in our R Markdown document will be interpreted and converted ultimately into HTML code.

Although LMLs tend to be quite similar, here you can see some of the differences in syntax:

[source]

See this book for more information on working with R Markdown files.

The RStudio cheatsheet for R Markdown and this tutorial are great for getting started.

Flexdashboard


There are several important features about the R Markdown language that the flexdashboard package leverages.

These features are used to specify the layout and elements of the dashboard.

Here are some major R Markdown features to keep in mind for flexdashboard:

  1. The beginning of an R Markdown document is what is called the YAML header. This is delineated by --- three dash marks before and after the header YAML code.

Like so:

[source]

YAML is yet another language, but unlike Markdown it is a data-oriented language and is often used for the configuration of software or to set up how a software program should work.

Whatever code you put in the YAML header will influence the rest of the document and essentially set up how the R Markdown document will render. In the example above, the type of output is specified.

Other more complicated features can be included. For example, we can specify that we are creating a dashboard with flexdashboard and we can specify how we want the layout of our dashboard to be displayed like so:

[source]

We will describe this in more detail soon.

  1. To add a page to a navigation bar (also called a navbar) the following syntax is used =======. The number of dashes does not matter. (This is a level 1 header in Markdown, just like #)

  2. To add columns or rows the following syntax is used ---------. By default this notation will create new columns, however if the YAML is modified to specify to create rows, than this same syntax will be used to create rows. The number of dashes does not matter. (This is a level 2 header in Markdown, just like ##)

  3. Components within the dashboard are delineated by using ### - if you are familiar with Markdown notation, this is a level 3 Markdown header.

If this includes text like so: ### text, this adds header text to the component, however this is not required.

  1. To include a plot or any output from R, use the following syntax: "```{r}" on it’s own line followed by your code, followed by "```". This creates what is called a code chunk.

[source]
  1. Another component of flexdashboard is value boxes. These are essentially text boxes for statistics or text that you might like to feature or emphasize. To do this again the ### syntax is used to put a text label describing what the value box contains followed by a code chunk that uses the valueBox() function from the flexdasboard package. The value to display is specified using the value argument, as well as optional other aspects using additional arguments, such as the color of the value box using the color argument like the example below:
### ValueBoxText

'''{r}
valueBox(value = 10
  color = "white")

'''

Note: in our examples of code we will use "'''" instead of "```". This is only to allow for easy viewing of examples. All code chunks require "```".

Here you can see a more thorough example which includes icons:

[source]
  1. Instead of value boxes you can also include a slight variation called a gauge. These are created with the guage() function from the flexdashboard package. This requires numeric values for a value, a min, and a max argument. Optionally, a symbol can also be added with the symbol argument. The value argument does not have to be explicitly called though, which is also true of the valueBox() function.

Here is a simple example:

### GuageText

'''{r}
flexdashboard::gauge(value = 10, 
                       min = 0, 
                       max = 100, 
                    symbol = "%")

'''

This creates the following output:

Here is a more complicated example:

Layout


Adding Columns


To add multiple columns the following syntax is used --------- for each column and nothing additional is required in the header.

Additional features about the columns, such as the width can be specified using brackets{}like in the example below. Note that the word Column isn’t necessary. In this example two columns are created that will be oriented next to one another and elements within the columns will be placed top to bottom.

Adding Rows


To add multiple rows - the yaml needs to state that the orientation is for rows instead of for columns (see the image below), and then the same syntax is used --------- for each row instead of columns. In this example, two rows are created that will be oriented on top of one another and elements within the rows will be placed next to each other.

Again the word Row is not actually necessary.

[source]

See here for template options.

Tabs


To add tabs columns/rows we can use the following:

Column {.tabset}

In this example, two columns are created and then two tabs are added to the second column.

Shiny


Interactive elements can be added to dashboards. In our dashboard, we will use packages such as DT and leaflet that have shiny functionality. This requires that shiny is enabled in the YAML header by including runtime:shiny in the YAML.

Here is an example of a YAML that includes this:


Deployment


You have a few options to publish your dashboard:

  1. If your dashboard is not interactive (does not use shiny) or uses certain widgets like the datatable() function from the DT package, you just need to knit your R Markdown file into an html file.

Then you can host this on GitHub if you choose by changing the GitHub Pages settings of your repository:

  1. If your dashboard is interactive (uses shiny), you can host it on https://www.shinyapps.io/ after making an account. To do this you need to install the rsconnect package and after you have made an account and configured it, you can use the publish button of the RStudio IDE which looks like this on the upper right corner:

Note that this also requires authentication using tokens.

See this link for a getting start for this process.

  1. You can also publish using RStudio Connect. This also involves creating an account and pushing the publish button.

Question Opportunity

Let’s take a minute to test your knowledge about flexdashboard basics:

  1. How do we create multiple pages?
  2. How do we create multiple columns?
  3. How do we create multiple tabs?
  4. How do we start creating a dashboard?
  5. How do we enable our dashboard to be interactive?

Click here to reveal the answers.
  1. How do we create multiple pages? We use the === syntax.
  2. How do we create multiple columns? We use the --- syntax.
  3. How do we create multiple tabs? We use {.tabset} syntax combined with the column break --- syntax.
  4. How do we start creating a dashboard? We create an R Markdown document and we add output: flexdashboard::flexdashboard to the YAML.
  5. How do we enable our dashboard to be interactive? We add runtime:shiny to the YAML.

Our Dashboard


OK! Now that we know a bit about the basics of creating a dashboard, let’s create our own.

The link to the dashboard described in this section is located here.

We want to create a dashboard that has several tabs that will look like this:

Getting started


The first thing we need to do to create our dashboard is to create a new .Rmd document like so in R Studio:

YAML header


Next we need to update the YAML header to look like this:

As you might expect, title: indicates the title of our dashboard.

The output: line specifies what type of output we want the .Rmd file to be rendered.

We need to include flexdashboard::flex_dashboard: as the output to create a dashboard with the flexdashboard package. This can be included on the same line as output: or on the next line with a preceding tab.

Note: that YAML is sensitive to spacing, thus this tab is required to get the proper output.

The next four lines are arguments for how the dashboard should be created.

  1. logo: allows you to include a logo on top of your dashboard. With this theme this will be in the upper left corner. The logo we chose to use came from here, but you could theoretically use any png of appropriate size.

  2. theme: allows you to specify how the dashboard will look in general. Note that this can be used to modify the general look of any type of R Markdown output, not just dashboards created with flexdashboard. See here for a list of options. In our case, the theme is called readable and will create documents that look like this:

[source]
  1. orientation: the options are columns or rows and specifies if the -------- syntax creates rows or columns for the layout. This is not necessary if the option is columns.

  2. source_code: specifies if a URL will be included as a navigation bar item with access to the source code.

  3. vertical_layout: The options are fill or scroll. Fill causes the charts to re-size to fill the page, while the scroll option renders plots as their natural height which may or may not require scrolling the page.

There are many other argument options for how the dashboard is displayed.

You can run the following command in the console to see more information about the arguments in the help pane of the R Studio IDE.

?flexdashboard::flex_dashboard()

Also see the CRAN documentation for more details.

Loading the packages and data


Since we are creating our dashboard in a new Rmd file, we need to load the necessary packages and the wrangled data that we created in this Rmd file. In that Rmd file, it looks something like this.

Note: all the following code would be added to the Rmd file for the dashboard and are simply shown here for illustrative purposes.

library(here)
library(readr)
library(dplyr)
library(flexdashboard)
library(shiny)
library(magrittr)
library(forcats)
library(stringr)
library(waffle)
library(tidyr)
library(poliscidata)
library(leaflet)
library(htmltools)
library(DT)

For more information about what these packages were used for, see the beginning of this case study and the [Helpful Links] section. The data can be found and downloaded from our GitHub repository at this link. In our case we saved this to a subdirectory called wrangled within a directory called data of our working directory. We recommend using RStudio projects and the here package to make navigating to files easy and reproducible.

shooting_data <- 
  read_csv(here("data", "wrangled",
                "shooting_data_wrangled_pre_map.csv"))

shooting_data_for_map <- 
  read_csv(here("data", "wrangled",
                "shooting_data_wrangled_for_map.csv"))

Creating pages


Recall that === is used to designate elements that are part of the navigation bar.

We want 7 items besides the source code (which was added automatically based on the YAML code).

First, we create 7 divisions for these main pages. We add icons to each from Font Awesome.

Use this link to find other icon options. If you click on the “start using this icon” button it will take you to a page with HTML code like this:

[source]

Only the fa-database portion is required in the brackets after data-icon= to add the icon to the navigation bar.


About {data-icon="fa-question-circle"}
====================================

The Data {data-icon="fa-database"}
===================================== 

US Statistics {data-icon="fa-flag"}
=====================================

State Statistics {data-icon=fa-flag-checkered}
====================================

Map {data-icon="fa-map"}
====================================

Tutorial {.storyboard data-icon="fa-list-ol"}
====================================

Hotline {data-icon="fa-exclamation-triangle"}
====================================

The About Page


Here, we create content in the About page.

Look


This is what the page will look like:

Overall Structure


Here is the overall structure for this page:

Details



Click here if you would like to see all of the code for this page.

On this page we will have two columns - one which will be wider than the other. Size specifications on flexdashboard are unit-less; the width of any column included on a page is a function of the width set for a column against the sum of widths for all columns on that page. If we set columns sizes of 600 and 300 on a page with two columns, one column will be twice as large as the other column. We want the left column to be quite a bit larger than the right, so we will set the left as 70 and the right as 30.

Question Opportunity

Can you recall how we would make these columns?


Click here to reveal the code.
About {data-icon="fa-question-circle"}
===================================== 

Column {data-width = 70}
-------------------------------------

###

Column {data-width = 30}
-------------------------------------

###

Recall that ### is used to add elements to columns and rows. Note that there is no text next to the ### syntax that designates an element of our dashboard. In the previous examples, a header was used like so ### header:

[source]

We do not actually want a header now, so we can simply use ### without any text following it. Note that you can get away with not using the ###, but some elements will not render properly.

Next, we add a block of text describing the dashboard to the first column and we will add an image to the second column like the following. Notice that two asterisks ** around text makes them appear as bold and one * makes it appear as italic. See this RStudio cheatsheet for some basic Markdown syntax for stylizing text:

[source]

This is what the code for this page looks like (notice that there is an internal link to the Tutrial page):

About {data-icon="fa-question-circle"}
===================================== 

Column {data-width=70}
-------------------------------------

### 

**What is the purpose of this dashboard?**

This dashboard has two purposes:

1. To illustrate trends in school shooting events in the United States
2. To demonstrate how to create a dashboard using `R`

**The data**

This dashboard uses data from the open-source [K-12 Shool Shooting Database](https://www.chds.us/ssdb/dataset/) downloaded from the [Center for Homeland Defense and Security](https://www.chds.us/c/) at the at the [Naval Postgraduate School(NPS)](https://en.wikipedia.org/wiki/Naval_Postgraduate_School). This data was downloaded June of 2020.

<style>
div.green { background-color:#8FBC8F; border-radius: 5px; padding: 20px; font-size: 1em;color: white;}
</style>
<div class = "green">
Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, [www.chds.us/ssdb](www.chds.us/ssdb).
</div>


  
This database includes information about school shooting events for students in grades K-12 in the United States dating back to 1970. The database has additional information not shown on our dashboard including, but not limited to: location of the event at the school, source for the shooting information, shooter characteristics, and victim characteristics. 

### 


<u>**Want to learn how to create a dashboard just like this?**</u>

Visit the [*Tutorial*](#tutorial) page of this dashboard to first learn the basics about building a dashboard with the `flexdashboard` package.

At the end of the tutorial we provide a link to this [supplementary resource by the Open Case Studies project](https://opencasestudies.github.io/ocs-bp-school-shootings-dashboard/), which provides more detailed information about how ***this dashboard*** was created.

<style>
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px; font-size: .8 em;}
</style>
<div class = "blue">

 **Acknowledgements**

This was created as part of the [Open Case Studies](https://opencasestudies.github.io){target="_blank"} project. We would like to acknowledge the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/) for funding this work. 

 **Disclaimer**

This dashboard uses data from the [K-12 Shool Shooting Database](https://www.chds.us/ssdb/about/). We acknowledge (like their website) that there may be reporting errors. The trends and statistics shown do not account for the many other factors that may influence the occurrence of shooting events. The dashboard should not be used in the context of making policy decisions without external consultation from scientific experts. 


 **License**

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 [(CC BY-NC 3.0)](https://creativecommons.org/licenses/by-nc/3.0/us/){target="_blank"} United States License.
</div>

Column {data-width=30}
-------------------------------------

###


'''{r, echo=FALSE, fig.cap="[Photograph by Nathan Dumlao](https://unsplash.com/photos/xPHmmVKS8lM)"}
knitr::include_graphics(here::here("img", "nathan-dumlao-xPHmmVKS8lM-unsplash.jpg"))
'''

Note that we will use "'''" for to show code chunks of the actual code from the dashboard.

The image used in this second column is from a website called unsplash (https://unsplash.com/) which hosts images for free use but includes information about the photographer if you chose to credit them. A short link for this image was found by clicking on it and then clicking the share button.

Notice the echo = FALSE specification for the code chunk which causes the code to be evaluated but but not shown, while fig.cap adds the figure caption.

The image is included using the include_graphics() function from the knitr package. We need to specify where this image is located for this to work. You can do this without specifying a path if the image file is in the same directory as your .Rmd file that you are using to create your dashboard. However using the here() function from the here package we can organize our files a bit. This function will automatically start the path wherever we have included an RStudio project file, this can be done in RStudio like so:

If you are new to using RStudio projects, please see this link for more information.

Then if we create a directory or folder called img and place our image files in this directory, then we can specify the full path to this file on our computer, by just using here::here("img", "name_of_image.png"). The include_graphics() function works for a variety image file types.

Also you may have noticed the <style> html code to add a blue and green background to portions of the text.

The text that we want altered with this particular style is delineated by the <div> to start and the </div> to end the style.

Let’s take a look at the first one to explain what is happening here:

<style>
div.green { background-color:#8FBC8F; border-radius: 5px; padding: 20px; font-size: 1em; color: white;}
</style>
<div class = "green">
Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, [www.chds.us/ssdb](www.chds.us/ssdb).
</div>

The instructions for the style are within the <style> and </style> content dividers. Inside these dividers is CSS code, which is what is used to stylize HTML. The div.green is the name of this particular style which involves a particular background color (#8FBC8F - see here for more options), with a boarder radius of 5 pixel to round the edges of the background color around the text with a size 5 pixel radius. The code also states that a padding specification for the size of the margins of the text box around the text and it specifies that font should be of 1 em units (which stands for element - thus 1 unit relative to the size of the element) and that the font should be white.

The div.green specifies that green is the name of this style, thus we can then use <div class = green> (called a CSS selector) to style the text this way. This can then be used again any time we want this style like so:

<div class = "green">

text 

</div>

See this website to learn more about HTML and CSS.


The Data Page (Interactive)


Let’s create a page about the data that we are using.

Look


This is what the page will look like:

Overall Structure


Here is the overall structure for this page:

Details



Click here to see the code for this page.

To create the structure for this page that will display the data, we have two columns, with the first one (on the left) wider than the other. Again we have a block of text in the column on the left like so:

The Data {data-icon="fa-database"}
===================================== 


Column {data-width=70}
-------------------------------------

###

The data used in this dashboard is from the [**Center for Homeland Defense and Security (CHDS)**](Center for Homeland Defense and Security (CHDS)) [**K-12 Shool Shooting Database**](https://www.chds.us/ssdb/about/). 

Their methods for identifying and authenticating incidents are outlined [here](https://www.chds.us/ssdb/methods/).

According to their website: 

*"The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. **All of the information is based on open-source information and 3rd party reporting... and may include reporting errors.**"*

***


Column {data-width=30}
-------------------------------------

###

Now we will add our DT_table to the first column. First, we need to include the code that we previously used to create the DT_table in our dashboard .Rmd file:

DT_table <- shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)

We then include some code to render this interactive table in our dashboard. Since we have shiny enabled in our YAML header, we can use the renderDataTable() of the DT package to produce the output we desire.

We also want to use the options argument to specify how the data is rendered. The scroller = TRUE argument adds a scroll bar to the table, the scrollY argument specifies that the scroll bar should be for they Y axis direction (up and down) of the table and specifies how large the scroller should be, the pageLength argument specifies how many rows should be displayed simultaneously within the table, and the autoWidth = TRUE argument specifies that the table should fit the space of the column or page it is within.

We will also add a caption with a link to the original data using the tags() and withTags() functions of the htmltools package. Different options for types of tags can be selected using the $.

DT::renderDataTable({
  DT::datatable(DT_table,
                caption = htmltools::tags$caption(
                  style = 'caption-side: top; text-align: Left;',
                  htmltools::withTags(
                    div(HTML('<a href="https://www.chds.us/ssdb/about/)">Click here to be redirected to a page where this data can be downloaded.</a>')))),
                  options = list(autoWidth = TRUE,
                                 pageLength = 10,
                                 scroller = TRUE,
                                 scrollY = '450px'))
})

We will also add another image to the column on the right, overall the code looks like this:


The Data {data-icon="fa-database"}
===================================== 

Column {data-width=70}
-------------------------------------

###

The data used in this dashboard is from the [**Center for Homeland Defense and Security (CHDS)**](Center for Homeland Defense and Security (CHDS)) [**K-12 Shool Shooting Database**](https://www.chds.us/ssdb/about/). 

Their methods for identifying and authenticating incidents are outlined [here](https://www.chds.us/ssdb/methods/).

Previously, according to their website: 

*"The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. **All of the information is based on open-source information and 3rd party reporting... and may include reporting errors.**"*

***

'''{r, echo=FALSE}
# Create the DT table first
DT_table <- shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)
# Instead of depending on the st_jitter algorithm to generate random placement, a custom function placing the points side by side at a set distance could be used to make points occuring at the same location appear neatly apart.
'''

'''{r, echo=FALSE}
DT::renderDataTable({
  DT::datatable(DT_table,
                caption = htmltools::tags$caption(
                  style = 'caption-side: top; text-align: Left;',
                  htmltools::withTags(
                    div(HTML('<a href="https://www.chds.us/ssdb/about/)">Click here to be redirected to a page where this data can be downloaded.</a>')))),
                options = list(autoWidth = TRUE,
                               pageLength = 10,
                               scroller = TRUE,
                               scrollY = '450px'))
})
'''

Column {data-width=30}
-------------------------------------

###

'''{r, echo=FALSE, fig.cap="[Photograph by Rubén Rodriguez](https://unsplash.com/photos/IXTvnOOSTyU)"}
knitr::include_graphics(here::here("img", "ruben-rodriguez-IXTvnOOSTyU-unsplash.jpg"))
'''

The US Statistics Page


Let’s create a page for US Statistics we would like to share.

Look


This is what the page will look like:

Overall Structure


Here is the overall structure for this page which uses a tab layout:

Details



Click here to see the code for this page.

Here we use the .tabset and .tabset-fade options specified for our first column.

US Statistics {data-icon="fa-flag"}
===================================== 


Column {data-width=70 .tabset .tabset-fade}
-------------------------------------

After having specified the .tabset and .tabset-fade options, we can create new tabs in the same way we would add elements to our dashboard with the ### syntax. Just like in this example:

[source]

Let’s make a tab for yearly school shooting events and deaths, a tab for cumulative school shooting events and deaths, and a tab about the number of deaths per school shooting. In each tab, we will include the code for the plots that we have previously created.

US Statistics {data-icon="fa-flag"}
===================================== 

Column {data-width=700 .tabset .tabset-fade}
-------------------------------------

### Yearly Deaths and Shootings

'''{r}

start <- 1970
end <- 2020

shootings_per_year<- shooting_data %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

deaths_per_year<-shooting_data %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))


per_year<-full_join(shootings_per_year, deaths_per_year)
per_year %<>%pivot_longer( cols = (-Date_year), 
                           values_to = "events", 
                           names_to = "id")

per_year%<>% 
  mutate(id = forcats::fct_inorder(id))

per_year %>%
    ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                 labels = seq(0, 120, by = 30),
                 limits = c(0, 121))+
    theme_minimal() +
   labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year")+
  scale_fill_manual(values = c("black", "black"))+
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))


theme_dashboard <- function(){ 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        #title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))
}
'''

### Yearly Cumulative Deaths and Shootings

'''{r}
shootings_per_year_cum <- 
    shootings_per_year %>%
    mutate(Shootings = cumsum(Shootings))

deaths_per_year_cum <- 
    deaths_per_year %>%
    mutate(Deaths = cumsum(Deaths))

per_year_cum <- full_join(shootings_per_year_cum, deaths_per_year_cum)

per_year_cum %<>% 
  pivot_longer(cols = c(Shootings, Deaths ), 
               values_to = "events", 
               names_to = "id")

per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (cumulative # of events)", 
                                        Deaths = "Deaths(cumulative # of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_fill_manual(values = c("black", "black")) +
    theme_minimal() +
    labs(title = "Cumulative Yearly Shootings and Deaths Attributable to\nSchool Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year") +
    theme_dashboard() 
'''

### Deaths Per Shooting

'''{r}
deaths_perc_event <- 
   shooting_data %>%
   count(`Killed (includes shooter)`) %>%
   rename("num_events"= n) %>%
   mutate(percent = round(num_events/sum(num_events)*100, digits =1))

greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

deaths_perc_event %<>% bind_rows(greater_than4)

deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, " deaths ", "\n(", percent, "%)")) 

deaths_perc_event %<>% 
  mutate(category = case_when(
    category ==  last(pull(deaths_perc_event, category)) ~ paste0("4+ deaths ", "\n(", percent, "%)"),
    category == "1 deaths" ~ "1 death",
    TRUE ~ category))

deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                    values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title = "Deaths Per School Shooting", 
       xlab="1 square ~ 1%")+  scale_fill_viridis_d()

'''

In the second column, we will include what are called value boxes to contain statistics that will remain static as the user moves through the tabs of the first column.

Column {data-width=30}
------------------------------------- 

We want to display some important statistics, such as:

  • Total number of people wounded in a school shooting
  • Total number of deaths from a school shooting
  • Median number of shots fired
  • Percentage of school shootings where the shooter was the only victim
  • Percentage of school shootings where a single handgun was used
  • Percentage of school shootings where the shooter was male

To create a value box we will use the valueBox() function from the flexdashboard package. The text for the the value box is specified by the text following the ### syntax.

There are a few arguments to be aware of for this function:

  1. value - this is the value to be displayed in the box - this usually a number, but might be text
  2. caption - if desired, you can add text to be displayed under the value but keep in mind that you will also include text with the ### syntax
  3. icon - if you would like to add an icon you can specify it like so: icon = fa-flag
  4. color - this changes the color of the box
  5. href - if you would like to add a URL link you can do so with this argument

We can create a value box for the total number of people wounded as follows, where we use the base sum() function to calculate the sum of all the values for the Wounded variable which was extracted using the pull() function from the dplyr package. We need to remove NA values to be able to calculate the sum and we can do this using the na.rm = TRUE argument.

Column {data-width=300}
------------------------------------- 


### **Total Wounded**
    
'''{r}
valueBox(value = sum(pull(shooting_data, Wounded), na.rm = TRUE),
         color = "white")
'''
    
### **Total Deaths**

'''{r}
valueBox(value = sum(pull(
  shooting_data,`Killed (includes shooter)`), na.rm = TRUE),
         color = "white")
'''

To calculate the percentage of school shootings where the shooter committed or attempted suicide, we will use our calculation which was explained in the [Data Analysis and Visualization] section. The paste0 function is used to add the percentage symbol.


### **Shooter committed or attempted suicide**

'''{r}

suicide <- (sum(pull(shooting_data,`Suicide (or attempted suicide) by Shooter (Y/N)`), na.rm = TRUE) /
            sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE))*100
suicide <- round(suicide, 1)

reporting_suic <- (sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)))*100
reporting_suic <- round(reporting_suic, 1)

valueBox(value = paste0(suicide,"%"), 
         color = "white")
'''

For the value box of the percentage of school shootings where a single handgun was used was calculated by using the case_when() function to specify all cases where the Firearm Type variable was equal to "Handgun" as TRUE and all others as FALSE. This allows us to use the base sum() function as TRUE values will be counted as a value of 1 and FALSE values will be counted as a value of 0. This sum was then divided by the total number of school shooting events by getting the length of the Firearm Type variable using the base length() function. The next value box about the gender of the shooter was calculated in a similar manner.

    
### **Use of a Single Handgun**

'''{r}

handgun <-paste(as.character(round(100 *(sum(case_when(
      pull(shooting_data,`Firearm Type`) == "Handgun" ~ TRUE,
                                                 TRUE ~ FALSE), na.rm = TRUE)
    /
      sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_gun <- (sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Firearm Type`)))*100
reporting_gun <- round(reporting_gun, 1)


valueBox(value = handgun,
  color = "white")

'''

### **Shooter Was Male**
'''{r}


gender <- paste(as.character(round(100 * (sum(
    case_when(pull(shooting_data,`Shooter Gender`) == "Male" ~ TRUE,
                                                        TRUE ~ FALSE),
                                      na.rm = TRUE)
    /
      sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_male <- (sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Shooter Gender`)))*100
reporting_male <- round(reporting_male, 1)


valueBox(value = paste(gender),
  color = "white")
'''

Additional text about the reporting rate for these statistics was added using the ### syntax. Additionally inline code is evaluated using the notation "r " Again notice that "'" was used instead of ""` just for illustrative purposes to allow this R Markdown document to render the code from the dashboard file.


###

reporting rate of shooter suicide = 'r reporting_suic'%,  
reporting rate of gun type = 'r reporting_gun'%,  
reporting rate of shooter gender = 'r reporting_male'%

The State Statistics Page (Interactive)


Let’s create a page for State Statistics we would like to share. Importantly this page allows for the user to choose what state to look at.

Look


This is what the page will look like:

Overall Structure


Here is the overall structure for this page:

Note: the other value Boxes are not included in this image. You can see that the renderPlot() function is used for plots and the renderValueBox() function is used for value boxes.

Details


On this page we want the user to be able to select data for a specific state and render plots and get statistics just for the selected state. To do this we will utilize the renderPlot() and renderValueBox() functions of the flexdashboard package, as well as the selectInput() function from the shiny package. See this website for more information on using shiny to create interactive dashboards with flexdashboard.


Click here to see the code for this page.

The first thing we need to do to allow this page to be interactive is to add runtime: shiny to the YAML header at the top of the R Markdown file.

The next thing we want to do is add the {.sidebar} attribute to the first column of this page. This allows us to use shiny input functions in this column.

Then, we use the selectInput() function to create a menu for the user to interact with and add it to this column.

Finally, we use the renderPlot() function and renderValueBox() function to use the input from the user to render plots and value boxes based on their input.

The selectInput() function allows us to provide the user with a pull down menu of options for states. The main arguments for this function are:

  1. inputId - this is what the selection will be called in subsequent code
  2. label - this is what the user sees above the pull down menu
  3. choices - this is a list of options for the menu
  4. selected - this causes a particular option to be the default choice

This is placed in a column on the far left side that is more narrow than the others.

State Statistics {data-icon=fa-flag-checkered}
===================================== 

Column {.sidebar data-width=250}
-----------------------------------------------------------------------

Note that the statistics shown do not account for other possibly influential state specific features like population density or gun laws among others.


'''{r}
  
selectInput(inputId = "state_selected", 
            label = "Select a state to explore:",
            choices = shooting_data %>% 
            pull(State) %>% 
            unique() %>%
            sort(), selected = "Alabama")

#  Washington, D.C. gets excluded by this
'''

Note that we used the unique() function to select only unique values of the State variable of the shooting_data tibble. The sort() function was used to put the options in alphabetical order.

In the next column, we have our plots like we did on the last page. Again we will use tabset. However, the difference here is that we need to include the renderPlot() function around all of our code for each plot and we need to use the data that the user selected.

This will automatically be in a data object called input and it will be within a variable called state_selected" based on what we used for the inputID in the select_Input() function (this requires the base R way of selecting a specific variable using the $).

Notice that the renderPlot() function requires that the code be within brackets {}. The data is filtered first for just the state that was selected. The code for the plots is essentially the same with minor modifications to allow for all unique cases that the different states present. For example the deaths_perc_event %<>%filter (!duplicated(category)) is added to the last plot about the number of deaths per school shooting to avoid duplication of the rows in cases like Colorado where the there is only one event that had 4 or more deaths (because in the other cases this value is a sum of all school shooting with 4 or more deaths).

It’s always good to check as many possible input values as possible to make sure that your plot shows up as you expect!

Column {data-width=750 .tabset .tabset-fade}
-----------------------------------------------------------------------

### Yearly Deaths and Shootings

'''{r}
renderPlot({
shooting_data_state <- shooting_data %>% filter(State == input$state_selected)

shootings_per_year<- shooting_data_state  %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

deaths_per_year<-shooting_data_state  %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))


per_year <- full_join(shootings_per_year, deaths_per_year)
per_year %<>% pivot_longer(cols = (-Date_year), 
                           values_to = "events", 
                           names_to = "id")

per_year %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year %<>%
    ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    theme_minimal() +
  scale_fill_manual(values = c("black", "black"))+
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year") +
    theme_dashboard()+
    theme(title = element_text(size = 16, face = "bold"),
          axis.text = element_text(size = 14))
})
'''

### Yearly Cumulative Deaths and Shootings

'''{r}

renderPlot({

shooting_data_state <- shooting_data %>% filter(State == input$state_selected)

shootings_per_year<- shooting_data_state  %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

shootings_per_year_cum <- 
  shootings_per_year %>%
  mutate(Shootings = cumsum(Shootings))

deaths_per_year<-shooting_data_state  %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))

deaths_per_year_cum <- 
  deaths_per_year %>%
  mutate(Deaths = cumsum(Deaths))

per_year_cum <- full_join(shootings_per_year_cum, deaths_per_year_cum)


per_year_cum %<>% 
  pivot_longer(cols = c(Shootings, Deaths ), 
               values_to = "events", 
                names_to = "id")
                
per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_grid(~id)+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    scale_fill_manual(values = c("black", "black"))+
    theme_minimal() +
    labs(title = "Cumulative Yearly Shootings and Deaths\nAttributable to School Shootings",
         subtitle = input$state_selected,
         y = "Cumulative number of events",
         x = "Year") +
    theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90),
        strip.background = element_rect(fill="cornflowerblue"),
        strip.text = element_text(colour = 'white', face = "bold", size = 14))

})

'''

### Deaths Per Shooting

'''{r}

renderPlot({

shooting_data_state <- shooting_data %>% filter(State == input$state_selected)
library(tidyr)
deaths_perc_event <-shooting_data_state %>%
   count(`Killed (includes shooter)`) %>%
   rename("num_events"= n) %>%
     tidyr::drop_na() %>%
   mutate(percent = round(num_events/sum(num_events)*100, digits =1))

greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

deaths_perc_event %<>% bind_rows(greater_than4)

deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, " deaths ", "\n(", percent, "%)")) 

deaths_perc_event %<>% 
  mutate(category = case_when(
    category ==  last(pull(deaths_perc_event, category)) ~ paste0("4+ deaths ", "\n(", percent, "%)"),
    category == "1 deaths" ~ "1 death",
    TRUE ~ category))

deaths_perc_event %<>% 
  filter (!duplicated(category))

deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                    values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title = "Deaths Per School Shooting", 
       xlab="1 square ~ 1%")+  scale_fill_viridis_d()

})

'''

In the third column, the state specific statistics are displayed. Some of these are static, while others update for the state selected. To calculate some of these we will also use data form the poliscidata function to get the state population values in 2010. The pop2010_hun_thou variable is the population in terms of 100,000 people.

Column {data-width=450}
-----------------------------------------------------------------------

### **Total State Deaths**

'''{r}
renderValueBox({
shooting_data_state <- shooting_data %>% filter(State == input$state_selected)


valueBox(sum(pull(shooting_data_state,`Killed (includes shooter)`), na.rm = TRUE),
         color = "white")
})
'''

### **US State Average Death Count**

'''{r}
shooting_data_state <-shooting_data %>% 
  group_by(State_abb, State) %>%
  count(na.rm = TRUE) %>%
  rename(shootings = n) %>%
  ungroup() %>%
  mutate(state_sum = sum(shootings)) %>%
  mutate(state_avg = state_sum/50)

state_data <- poliscidata::states
state_data %<>%
  select(stateid, pop2010, pop2010_hun_thou) %>%
  mutate(stateid = as.character(stateid))%>%
  mutate(stateid = str_remove_all(stateid, pattern = " "))

shooting_data_state<-left_join(shooting_data_state, state_data, by = c("State_abb" = "stateid"))

deaths_State <-shooting_data %>% 
  group_by(State) %>%
  summarize( deaths = sum(`Killed (includes shooter)`, na.rm = TRUE))

state_data <- left_join(shooting_data_state, deaths_State)

USavg <- round(mean(pull(state_data, deaths), na.rm = TRUE), 2)
valueBox(USavg, color = "white")
'''


### **State Death Rate (per 100,000 people)**

'''{r}
state_data %<>%
  mutate(percapita_deaths  = deaths/pop2010_hun_thou)

renderValueBox({
  
  shooting_data_state <- state_data %>% filter(State == input$state_selected)

  valueBox(format(round(pull(shooting_data_state, percapita_deaths), digits = 3), nsmall = 3),
         color = "white")
})


'''

### **US National Death Rate (per 100,000 people)**

'''{r}
renderValueBox({

 US_percap <-summarize(state_data, sum(deaths, na.rm = TRUE))/ (summarize(state_data,sum(pop2010, na.rm = TRUE)) /100000)
 
valueBox(value = round(US_percap, digits = 3),
         color = "white")
})
'''

### **State Shooting Rate (per 100,000 people)**

'''{r}

state_data %<>%
  mutate(percapita_shootings  = shootings/pop2010_hun_thou)

renderValueBox({
  
  shooting_data_state <- state_data %>% filter(State == input$state_selected)

valueBox(format(round(pull(shooting_data_state, percapita_shootings), digits = 3), nsmall = 3),
         color = "white")
})


'''

### **US National Shooting Rate (per 100,000 people)**

'''{r}
renderValueBox({

 US_percap <-summarize(state_data, sum(shootings, na.rm = TRUE))/ (summarize(state_data,sum(pop2010, na.rm = TRUE)) /100000)
 
valueBox(value = round(US_percap, digits = 3),
         color = "white")
})
'''

###

Per capita calculations are based on 2010 population values.

The Map page (Interactive)


Next, we create our map page. Previously, in the Data Exploration and Wrangling section, we geocoded our data and modified the latitude and longitude variables so that events that occurred in the same location would have slightly different values so that they will not cover one another in our map.

To create our map, we will use the leaflet package which uses the Leaflet JavaScript library.

Leaflet


Leaflet works by provided by adding base data (such as a map) and then adding markers if desired in layers. This is very similar to how ggplot2 functions (pun intended).

The layers displayed can be controlled using a sort of legend. Depending on the type of layers, some information may be displayed mutually exclusive of the other layers; other layers (such as circles/general markers) can be toggled on and off.

Clustering options can also be applied to circles/markers. Some examples of this can be found on the bottom of this website.

The groups in leaflet can be thought of as layer-specific IDs that create labels for legends and allow specific layers to be referred to in separate functions.

Thus, if we called a group “Layer 1” and then in a subsequent layer refer to “Layer 1”, leaflet will correctly identify which layer is being referenced.

Note that leaflet can require a lot of computational power depending on the types of maps produced.

Look


This is what the page will look like:

Overall Structure


The overall structure for this page is simple. There is just one column ,which will contain the map.

Details



Click here to see the code for this page.

First, we create a smaller dataset that just includes the data that we want to use in the map. We will include the date, the name of the school and the narrative for each point as a popup that will be shown when the user hovers over a point.

We need to do this using HTML code as the leaflet package will ultimately render the map using this language.

We use the paste() function to combine these elements as well as HTML code to create line breaks and bold the name of the school.

To create line breaks in HTML, the <br> syntax is used. This is used to separate each part of the elements that are getting pasted together with the base paste() function by being specified as the separator with the sep argument.

To create bold font in HTML, the text is surrounded by <b> and </b> like so: <b> Bold text </b>. Thus only the school name is in bold.

Finally, the <div> and </div> are content dividers in HTML. They separate the individual school shooting event information sections that will be plotted on the map. The first divider can also take information about the style of the output. This uses CSS code, which is what is used to stylize HTML.

The code here states that the height of the text box for each event should have a height that is proportional to the text, that the height of each line should be of 1 em units (em stands for element). Hence, 1 unit relative to the size of the element. Therefore gaps between lines are the same height as the lines of text. The overflow:visible code specifies what to do in case the text box text is too large - in this case users can scroll (see here for more options), and the padding specification sets the size of the margins of the text box around the text.

See this website to learn more about HTML code.

Map {data-icon="fa-map"}
===================================== 
Column
------------------------------------- 
    
### 

This map shows where school shootings took place in the United States between January 1970 to June 2020 according to the the open-source [Center for Homeland Defense and Security](https://www.chds.us/c/) (CHDS) [K-12 School Shooting Database](https://www.chds.us/ssdb/dataset/). Click the circles for more information.

    
'''{r}
# specify the popups

shooting_information0 <- paste('<div style="height:auto;line-height:1em;overflow:scroll;padding:1em">',
                              shooting_data_geocoded$Date,
                              "<b>",
                              shooting_data_geocoded$School,
                              "</b>",
                              shooting_data_geocoded$`Narrative (Detailed Summary/ Background)`,
                              "</div>",
                              sep = "<br>")


'''

The next bit of code then uses this data and the shooting_data_geocded to actually create the map!

The leaflet() function from the leaflet package creates a Leaflet map widget using the htmlwidgets package, which allows the map to be rendered as an application within HTML websites.

This first line of code starts the process of making the widget, but just like the ggplot() function from ggplot2 it creates an empty map and layers need to be added.

The addProviderTiles() function from the leaflet() package does just that, by adding the map background. We will add three different kinds of map backgrounds. See here for all the options of providers which create a variety of distinct backgrounds and then the group argument names each of these layers to be referred to later. The last layer added will be the one shown by default.

At this point we still only have a map in general. Now we need to add the data about school shooting events.

To do this, we add markers to the plot using the addCircleMarkers() function. This function takes many different arguments. See details about them here.

Importantly, we need to specify what variables in our provided data shooting_data_geocoded contains the longitude values (lng) and the latitude values (lat).

We will also use the following arguments:

  • radius - argument specifies how large the circles for the points will be
  • color - argument specifies the color of the individual points
  • fillOpacity - argument allows for the filling of the points to a bit translucent if set below 1
  • clusterOptions - argument can be used to cluster points together into larger circles
  • group - argument specifies what the points should be called in the legend and what this layer should be referred to as for later use

We also add a mini map using the addMiniMap() function, which can be useful to see where you are on the map. The type of plot style to use for the mini map is specified with the tiles argument and the toggleDisplay argument allows for the user to remove this feature.

Importantly, the addLayersControl() function allows users to toggle between different backgrounds and markers. In our case we have three different background layers which are referred to as baseGroups and we have one overlayGroups which is our circle markers for school shooting events. The group names for these need to be identified to allow users to toggle between them.

The set_view() function allows for the starting position and zoom to be modified. This allows us to center the map around the continental US.

'''{r}
leaflet(shooting_data_for_map) %>%
  addProviderTiles(provider = providers$OpenStreetMap, group = "OpenStreetMap") %>%
  addProviderTiles(provider = providers$Esri.WorldImagery, group = "ESRI World Imagery") %>%
  addProviderTiles(provider = providers$Stamen.TonerLite, group = "Toner")%>%
  addCircleMarkers(popup = ~shooting_information0,
                     lng = ~longitude,
                     lat = ~latitude,
     radius = 5,
     color = "red",
     fillOpacity = 0.2,
     clusterOptions = markerClusterOptions(),
     group = "Circles") %>%
  addMiniMap(tiles = providers$Stamen.Toner,
              toggleDisplay = TRUE) %>%
  addLayersControl(
     baseGroups = c("Toner Lite",
                    "OpenStreetMap",
                    "ESRI World Imagery"),
     overlayGroups = c("Circles")) %>%
   setView(lng = -98.35, lat = 39.5, zoom = 4)
'''

The Tutorial Page


Here, we create a Tutorial page that links to this case study. This provides a simple overview of how we created the dashboard.

Look


This is what the page will look like:

Overall Structure


To create this page we will use a special layout called a storyboard. Story boards are used in many other fields, but the idea is that there are multiple images in a sequence. To create our storyboard page with flexdashboard we will use {.stroyboard} next to the page name. Each page name will be specified using this syntax: ###.

Here you see the top part of the overall structure:

Details


The code for this page is similar to the other pages, except for the story board structure.


Click here to see the code for this page.
Tutorial {.storyboard data-icon="fa-list-ol"}
=========================================   

### **1)** Load the `flexdashboard` package.

Install the package (and other supporting optional packages) if you don't have them installed already.

'''{r, echo=TRUE, eval=FALSE}
install.packages("flexdashboard")
install.packages("shiny")
install.packages("leaflet")
install.packages("ggplot2")
'''

Once installed, load the package(s) into the `R` environment.

'''{r, echo=TRUE}
library(flexdashboard)
library(shiny)
library(leaflet)
library(ggplot2)
'''

This all needs to be done separately in the `R` console.

### **2)** Create an `RMD` document.   

Dashboards can be created with `flexdashboard` in the `HTML` format. 

The`flexdashboard` package uses `RMarkdown` to produce dashboards that can contain `R` output.

This makes it possible to include several mediums in dashboards such as plots created with `ggplot2` or maps created with `leaflet`.
    
### **3)** Create an appropriate `YAML`.

The use of `flexdashboard` alters the way R Markdown documents function. 

R Markdown documents can be rendered into many different outputs, one of which is a dashboard. 
The `YAML` header sets up how the document output should be created.

Here is an example of a `YAML` header that creates an `HTML` document from an R Markdown document:


---
title: "Untitled"
author: "John Smith"
date: "8/12/2020"
output: html_document
---


We used the following `YAML` for this dashboard, which importantly includes `flexdahsboard::flex_dashboard`which specifies that a dashboard should be created and `runtime:shiny` which allows for the dashboard to be interactive:

output: 
  flexdashboard::flex_dashboard:
    logo: https://icons.iconarchive.com/icons/icons8/windows-8/48/Programming-Dashboard-icon.png
    theme: readable
    orientation: columns
    source_code: embed
    vertical_layout: fill
runtime: shiny


We also introduced an icon as a logo, provided a theme with a color scheme, defined the orientation (and thus order) of coded output, added a navigation bar item to give users easy access to the code used, and  limited scrolling with the `verticle_layout: fill` option.

### **4)** Design the layout of the dashboard.

Dashboards are inherently visual, making this step the most time intensive after content creation. To goal is to present the data in a way that is both meaningful and visually appealing.

On this dashboard, we wanted to present static plots of the United States and of individual states. We also wanted to display the locations of school shootings and provide some information about school shootings. Aside from being a dashboard, we wanted to create an educational resource that was reproducible for others. Lastly, as this is a sensitive topic, we wanted to raise awareness and provide information that could help others act.

Given these goals, we decided on the following page layout:

+ About
+ The Data
+ US Statistics
+ State Statistics
+ Map
+ Tutorial
+ Get Help

The first page gives users to the opportunity to look at the data themselves. More complicated components such as the map of each incident were left alone on a single page. US and state-level statistics were separated from one another. This short tutorial on how to create the dashboard and source code were included in the dashboard with programmers at all levels in mind.

### **5)** Add content to the dashboard.

You can begin adding content to the dashboard once you have an initial layout in mind. Keep in mind that this will likely be an iterative process. 

The R Markdown file used to create a dashboard with `flexdashboard` works similarly as it does in other cases, with a few exceptions.

R code chunks can be defined like so:


'''{r, echo = TRUE}
# Code chunks can be explicitly included
'''

'''{r, echo = FALSE}

# Code chunks are hidden by default 
'''

Pages and columns within pages can be defined like so:


Page
=========================================   

Column {data-width=500}
-------------------------------------

Column {data-width=500}
-------------------------------------


### **6)** Add content to the pages and columns.

Plots and other elements can be added within columns like so:

### Plot name

'''{r}
# include plot code here
'''

Value Boxes, which are essentially text boxes, can be defined like so:


### ValueBoxText

'''{r}
valueBox(value = 10
  color = "white")



Gauges, can be defined like so:

### GaugeText

'''{r}
flexdashboard::gauge(value = 10, 
                       min = 0, 
                       max = 100, 
                    symbol = "%")

'''

####
Which will produce output like this:
'''{r, out.width= "40%", echo = FALSE}
knitr::include_graphics(here::here("img", "gauge_output.png"))
'''

### Additional Info
As mentioned before, the `flexdashboard` metadata included in the `YAML` also alters how R Markdown documents are rendered. For more on how you can leverage both the `RMarkdown` package and the `flexdashboard` package to produce a dashboard, click [here](https://rmarkdown.rstudio.com/flexdashboard/index.html).

This [supplementary resource by the Open Case Studies project](https://opencasestudies.github.io/ocs-bp-school-shootings-dashboard/) provides a case study on how to create this very dashboard in more detail.

The Get Help page


We create a Get Help page to spread awareness on this important public health topic.

Look


This is what the page will look like:

Overall Structure


Details


This page has two columns. The first column is much wider than the second. This first column includes two colored backgrounds which were created using CSS code. See [The About Page] Details section to for more details about how this works.

The text that we want altered with this particular style is delineated by the
to start and the

to end the style.

The instructions for the style are within the

content dividers.

Inside these dividers is CSS code, which is what is used to stylize HTML. The div.blue is the name of the first particular style which involves a particular background color (#e6f0ffF) (see here for more color options), with a boarder radius of 5 pixels to round the edges of the background color around the text with a size 5 pixel radius. The code also states that a padding specification for the size of the margins of the text box around the text and it specifies that font should be of 20 pixels.

The div.blue specifies that blue is the name of this style, thus we can then use

(called a CSS selector) to style the text this way. This can then be used again any time we want this style like so:

text

See this website to learn more about HTML and CSS.

We also see a list on this page, + signs are used to indicate new items. Importantly two spaces are necessary after each item to start a new line.

The other unique aspect about this page are the telephone links like so [+1-844-5-SAYNOW](tel:18445729669).

By using tel: and the number, users can click this link to directly call the telephone number from their computer or phone if their device has such capabilities.


Click here to see the code for this page.
Get Help {data-icon="fa-exclamation-triangle"}
=========================================   

Column {data-width=800}
-------------------------------------

###

**Warning Signs**

From [Sandy Hook Promise](https://www.sandyhookpromise.org/gun-violence/know-the-signs-of-gun-violence/)...

<style>
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
</style>
<div class = "blue">

Here is a list of potential warning signs that can signal an individual may be in crisis and/or need help:

+ Suddenly withdrawing from people and activities
+ Consistent bullying or intimidating others, or being bullied by others
+ Extreme mood or personality changes
+ Victim of constant social rejection
+ Talking about plans or actively making plans to harm themselves or others
+ Bringing a weapon to school – or threatening or talking about doing so
+ Bragging about or warning others about an upcoming act of violence
+ Recruiting others to join in a planned act of violence
+ Warning students to stay away from school or events
+ Expressing fascination with guns and/or school shootings
+ Expressing hopelessness about the future
+ Extreme, prolonged sadness or distress
+ Expressing or showing feelings of isolation
+ Bragging about access to guns

**This list is not a comprehensive list of warning signs nor does exhibiting one of these signs indicate imminent violence.**

According to the following article:

Flannery, D. J., Modzeleski, W. & Kretschmar, J. M. Violence and School Shootings. Curr Psychiatry Rep 15, 331 (2013). DOI: [10.1007/s11920-012-0331-6](https://doi.org/10.1007/s11920-012-0331-6)

"To date, studies of school shootings have concluded that no
consistent and reliable profile of school shooters exist, and
most researchers and clinicians would agree that predicting
violent behavior is a slippery slope that will usually result in
more false positives than false negatives."

"...most shooters were depressed, had experienced some significant
loss, felt persecuted or bullied by others, and had prior
difficulty coping or had previously tried suicide. Most of
the shooters did not, however, have a history of drug abuse
or violence or cruelty to animals, common psychiatric indicators of risk, nor did they report excessive exposure to
violence in the media (though many produced their own
violent themes in writings or drawings)."

</div>


<style>
div.red { background-color:#BC8F8F; border-radius: 5px; padding: 20px;}
</style>
<div class = "red">

According to the [National Institute of Mental Health (NIMH)](https://www.nimh.nih.gov/health/publications/teen-depression/index.shtml){target="_blank"}:

For youths who may be at risk for suicidal behavior, visit the **National Suicide Prevention Lifeline (NSPL)** website at [www.suicidepreventionlifeline.org](www.suicidepreventionlifeline.org){target="_blank"}.

Additionally, the **Crisis Text Line** is another free, confidential resource available 24 hours a day, seven days a week. Visit [www.crisistextline.org](www.crisistextline.org){target="_blank"} for more information.

Also see [here](https://www.mhanational.org/depression-teens-0){target="_blank"} for more information about how to recognize and help youths experiencing symptoms of depression and warning signs of suicide.

</div>


Column {data-width=200}
-------------------------------------

### 

**Respond to Warning Signs**

When concerned about troubling behaviors, tell a trusted adult.


Call **911** if you feel there is an immediate threat. 

Call [+1-844-5-SAYNOW](tel:18445729669) if you would like to submit an anonymous safety concern.

Text “HOME” to **741741** to text a trained crisis counselor 24 hours a day.

The **National Suicide Prevention Lifeline (NSPL)** is available 24 hours a day, every day at **[1-800-273-TALK (8255)](tel:18002738255)**. 

The deaf and hard of hearing can contact the **(NSPL)** via TTY at **[1-800-799-4889](tel:18007994889)**. All calls are confidential.

Summary


Synopsis


In this case study, we demonstrated the basics of R Markdown and how to create a dashboard with using the flexdashboard package. We also demonstrated how to include an interactive table with the DT package, how to include interactive plots using functions of the shiny package such as renderPlot(). We included interactive value boxes using the renderValueBox() function from the flexdashboard package, which works with the shiny package. Finally, we showed how to include interactive maps using the leaflet package.

This case study also explored how to properly calculate and interpret percentages when the data has missing values. We also discussed the benefits and limiting aspects of pie charts (using the ggplot2 package) and waffle plots (using the waffle package).

Overall, the dashboard we created which can be found here, shows that the number of school shootings per year has increased overtime. Further investigation is necessary to determine if this is simply due to increases in population alone or if the rate has increased due to other factors and if so, what those factors might be. It is also clear that the number of school shootings and the number of deaths per capita varies by state. There appears to be other aspects accounting for state differences.

Note the limitations of the dashboard in the Limitations section.

Suggested Homework


Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore what school events are reported to have more school shootings.

Additional Information


Session Info


sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] OCSdata_1.0.2             vembedr_0.1.5            
 [3] maps_3.4.0                leaflet_2.0.4.1          
 [5] shiny_1.7.1               flexdashboard_0.5.2      
 [7] poliscidata_2.3.0         waffle_0.7.0             
 [9] ggforce_0.3.3             forcats_0.5.1            
[11] htmltools_0.5.2           DT_0.20                  
[13] lubridate_1.8.0           sf_1.0-5                 
[15] ggmap_3.0.0               ggplot2_3.3.5            
[17] tidyr_1.1.4               stringr_1.4.0            
[19] dplyr_1.0.7               tibble_3.1.6             
[21] googlesheets4_1.0.0       readr_2.1.1              
[23] koRpus.lang.en_0.1-4      koRpus_0.13-8            
[25] sylly_0.1-6               read.so_0.1.1            
[27] wordcountaddin_0.3.0.9000 magrittr_2.0.2           
[29] knitr_1.37                here_1.0.1               

loaded via a namespace (and not attached):
  [1] backports_1.4.1     Hmisc_4.6-0         plyr_1.8.6         
  [4] sp_1.4-6            splines_4.1.2       crosstalk_1.2.0    
  [7] usethis_2.1.5       digest_0.6.29       gdata_2.18.0       
 [10] fansi_1.0.2         checkmate_2.0.0     cluster_2.1.2      
 [13] tzdb_0.2.0          remotes_2.4.2       extrafont_0.17     
 [16] vroom_1.5.7         extrafontdb_1.0     jpeg_0.1-9         
 [19] colorspace_2.0-2    mitools_2.4         xfun_0.29          
 [22] crayon_1.5.0        jsonlite_1.8.0      lme4_1.1-27.1      
 [25] survival_3.2-13     glue_1.6.1          polyclip_1.10-0    
 [28] gtable_0.3.0        gargle_1.2.0        car_3.0-12         
 [31] weights_1.0.4       Rttf2pt1_1.3.9      abind_1.4-5        
 [34] scales_1.1.1        DBI_1.1.2           Rcpp_1.0.8         
 [37] plotrix_3.8-2       viridisLite_0.4.0   xtable_1.8-4       
 [40] htmlTable_2.4.0     units_0.7-2         bit_4.0.4          
 [43] foreign_0.8-81      proxy_0.4-26        Formula_1.2-4      
 [46] survey_4.1-1        htmlwidgets_1.5.4   descr_1.1.5        
 [49] httr_1.4.2          gplots_3.1.1        RColorBrewer_1.1-2 
 [52] ellipsis_0.3.2      mice_3.14.0         pkgconfig_2.0.3    
 [55] farver_2.1.0        nnet_7.3-16         sass_0.4.0         
 [58] utf8_1.2.2          labeling_0.4.2      tidyselect_1.1.2   
 [61] rlang_1.0.1         later_1.3.0         munsell_0.5.0      
 [64] cellranger_1.1.0    tools_4.1.2         cli_3.2.0          
 [67] generics_0.1.1      broom_0.7.11        evaluate_0.15      
 [70] fastmap_1.1.0       yaml_2.3.5          bit64_4.0.5        
 [73] fs_1.5.2            caTools_1.18.2      purrr_0.3.4        
 [76] RgoogleMaps_1.4.5.3 nlme_3.1-153        mime_0.12          
 [79] compiler_4.1.2      rstudioapi_0.13     curl_4.3.2         
 [82] png_0.1-7           e1071_1.7-9         tweenr_1.0.2       
 [85] bslib_0.3.1         stringi_1.7.6       highr_0.9          
 [88] lattice_0.20-45     Matrix_1.3-4        classInt_0.4-3     
 [91] nloptr_1.2.2.3      vctrs_0.3.8         pillar_1.7.0       
 [94] lifecycle_1.0.1     jquerylib_0.1.4     data.table_1.14.2  
 [97] bitops_1.0-7        httpuv_1.6.5        sylly.en_0.1-3     
[100] R6_2.5.1            latticeExtra_0.6-29 promises_1.2.0.1   
[103] KernSmooth_2.23-20  gridExtra_2.3       boot_1.3-28        
[106] MASS_7.3-54         gtools_3.9.2        assertthat_0.2.1   
[109] rprojroot_2.0.2     rjson_0.2.21        withr_2.5.0        
[112] parallel_4.1.2      hms_1.1.1           grid_4.1.2         
[115] rpart_4.1-15        class_7.3-19        minqa_1.2.4        
[118] rmarkdown_2.11      carData_3.0-5       googledrive_2.0.0  
[121] base64enc_0.1-3    

Estimate of RMarkdown Compilation Time:

About 29 - 39 seconds

This compilation time was measured on a PC machine operating on Windows 10. This range should only be used as an estimate as compilation time will vary with different machines and operating systems.

Acknowledgments


We would like to acknowledge Elizabeth Stuart for assisting in framing the major direction of the case study.

We would like to acknowledge Michael Breshock for his contributions to this case study and developing the OCSdata package.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

---
title: "Open Case Studies: School Shootings in the United States"
css: style.css
output:
  html_document:
    includes:
       in_header: GA_Script.Rhtml
    self_contained: yes
    code_download: yes
    highlight: tango
    number_sections: no
    theme: cosmo
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
  word_document:
    toc: yes

---

<style>
#TOC {
  background: url("https://opencasestudies.github.io/img/icon-bahi.png");
  background-size: contain;
  padding-top: 240px !important;
  background-repeat: no-repeat;
}
</style>


<!-- Open all links in new tab-->  
<base target="_blank"/>  

<div id="google_translate_element"></div>

<script type="text/javascript" src='//translate.google.com/translate_a/element.js?cb=googleTranslateElementInit'></script>

<script type="text/javascript">
function googleTranslateElementInit() {
  new google.translate.TranslateElement({pageLanguage: 'en'}, 'google_translate_element');
}
</script>




```{r setup, include=FALSE}
knitr::opts_chunk$set(include = TRUE, comment = NA, echo = TRUE,
                      message = FALSE, warning = FALSE, cache = FALSE,
                      fig.align = "center", out.width = '90%')
library(here)
library(knitr)
library(magrittr)
remotes::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)
remotes::install_github("alistaire47/read.so")
library(wordcountaddin)
library(read.so)

rmarkdown:::perf_timer_reset_all()
rmarkdown:::perf_timer_start("render")
```

#### {.outline }

```{r, echo=FALSE, fig.link = "https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/"}
knitr::include_graphics(here("img", "aboutpagelook.png"))
```

####

The link to the dashboard described in this case study is [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/).  

To access the GitHub Repository for this case study see here: https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/.

You may also access and download the data using our `OCSdata` package. To learn more about this package including examples, see this [link](https://github.com/opencasestudies/OCSdata). Here is how you would install this package:

```{r, eval=FALSE}
install.packages("OCSdata")
```

This case study is part of a series of public health case studies for the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/open-case-studies).

For users or instructors who only wish to look at the basics of how to create a dashboard in R with the `flexdashboard` package, please see the [**Dashboard Basics**] Section. 

#### {.disclaimer_block}

**Disclaimer**: The purpose of the [Open Case Studies](https://opencasestudies.github.io){target="_blank"} project is **to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data**. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts. 

####

#### {.license_block}

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 [(CC BY-NC 3.0)](https://creativecommons.org/licenses/by-nc/3.0/us/){target="_blank"} United States License.

####

#### {.reference_block}

To cite this case study please use:

Wright, Carrie and Ontiveros, Michael, and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). [https://github.com//opencasestudies/ocs-bp-school-shootings-dashboard](https://github.com//opencasestudies/ocs-bp-school-shootings-dashboard). Open Case Studies: School Shootings in the United States (Version v1.0.0).

####

The total reading time for this case study is calculated via [koRpus](https://github.com/unDocUMeantIt/koRpus) and shown below: 

```{r, echo=FALSE}
readtable = text_stats("index.Rmd") # producing reading time markdown table
readtime = read.so::read.md(readtable) %>% dplyr::select(Method, koRpus) %>% # reading table into dataframe, selecting relevant factors
  dplyr::filter(Method == "Reading time") %>% # dropping unnecessary rows
  dplyr::mutate(koRpus = paste(round(as.numeric(stringr::str_split(koRpus, " ")[[1]][1])), "minutes")) %>% # rounding reading time estimate
  dplyr::mutate(Method = "koRpus") %>% dplyr::relocate(koRpus, .before = Method) %>% dplyr::rename(`Reading Time` = koRpus) # reorganizing table
knitr::kable(readtime, format="markdown")
```

***

**Readability Score: **

A readability index estimates the reading difficulty level of a particular text. Flesch-Kincaid, FORCAST, and SMOG are three common readability indices that were calculated for this case study via [koRpus](https://github.com/unDocUMeantIt/koRpus). These indices provide an estimation of the minimum reading level required to comprehend this case study by grade and age. 

```{r, echo=FALSE}
rt = wordcountaddin::readability("index.Rmd", quiet=TRUE) # producing readability markdown table
df = read.so::read.md(rt) %>% dplyr::select(index, grade, age) %>%  # reading table into dataframe, selecting relevant factors
  tidyr::drop_na() %>% dplyr::mutate(grade = round(as.numeric(grade)), # dropping rows with missing values, rounding age and grade columns
                                     age = round(as.numeric(age))
                                     )
knitr::kable(df, format="markdown")
```

***

Please help us by filling out our survey.

<div style="display: flex; justify-content: center;"><iframe src="https://docs.google.com/forms/d/e/1FAIpQLSfpN4FN3KELqBNEgf2Atpi7Wy7Nqy2beSkFQINL7Y5sAMV5_w/viewform?embedded=true" width="1200" height="700" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe></div>



# **Motivation**
*** 

This case study is motivated by this [article](https://link.springer.com/content/pdf/10.1007/s11920-012-0331-6.pdf):

#### {.reference_block}

Flannery, D. J., Modzeleski, W. & Kretschmar, J. M. Violence and School Shootings. Curr Psychiatry Rep 15, 331 (2013). DOI: [10.1007/s11920-012-0331-6](https://doi.org/10.1007/s11920-012-0331-6)

####

The article explores characteristics of school shootings and violence in schools and discusses why these events may occur, as well as their impact on the communities in which they occur.

The article also states that the shooters are most commonly white males, but that many previous studies of shooter characteristics could not identify any particular "profile" of shooters.

> "To date, studies of school shootings have concluded that no
consistent and reliable profile of school shooters exist..."

However previous studies note some commonalities such as:

> "...most shooters were depressed, had experienced some significant loss, felt persecuted or bullied by others, and had prior difficulty coping or had previously tried suicide." 

Therefore in our dashboard we will examine how often a shooter was male or attempted or committed suicide during an event.


```{r, echo = FALSE, out.width= "60%"}
knitr::include_graphics(here::here("img", "joshua-hoehne-CAokgx1GGKE-unsplash.jpg"))
```

<span>Photo by <a href="https://unsplash.com/@mrthetrain?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Joshua Hoehne</a> on <a href="https://unsplash.com/s/photos/high-school?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></span>

> "School shootings are not all the same and may require different approaches to prevention and treatment, especially with respect to identifying risk factors at the individual, school
or community levels, and particularly with regard to examining the role that mental health issues may play to increase risk for perpetration. 

> The field **needs to know more** about shooting incidents that are averted, those that result in injury but not death and about the characteristics of the more common occurrence of single homicide school shootings."


```{r, echo = FALSE, out.width= "60%"}
knitr::include_graphics(here::here("img", "andre-hunter-AQ908FfdAMw-unsplash.jpg"))
```

<span>Photo by <a href="https://unsplash.com/@dre0316?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Andre Hunter</a> on <a href="https://unsplash.com/s/photos/high-school?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></span>


Given this need for more research to better understand why these events occur and how they could be averted, in this case study we will demonstrate how to create a resource for others to more easily and interactively access data about school shootings. To do so we will create what is called a [dashboard](https://en.wikipedia.org/wiki/Dashboard_(business)), which is a website that displays a report for a database. Dashboards summarize the data in a database and typically allow for users to interact with the data in some way.

[Here](https://jjallaire.shinyapps.io/shiny-crandash/#dashboardl) you can see an example of a dashboard created in R about downloads of packages on [CRAN](https://cran.r-project.org/).


On the website the tabs and plots are interactive. The above dashboard allows for users to get to know the data in a simple and quick way.

The data about package downloads is succinctly summarized in an impactful manner.

```{r, echo = FALSE, fig.link = "https://jjallaire.shinyapps.io/shiny-crandash/#dashboardl"}
knitr::include_graphics(here::here("img", "dashboard.png"))
```

We can quickly get a sense that the `magrittr` package is among the top most widely downloaded packages on CRAN.


##### [[source]](https://jjallaire.shinyapps.io/shiny-crandash/#dashboardl)

Now let's learn how to create a dashboard with our data of interest.

# **Main Questions**
*** 

#### {.main_question_block}
<b><u> Our main questions: </u></b>

1) What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)? 

2) How many individuals are typically killed in a school shooting?

3) What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?

####


# **Learning Objectives** 
*** 

In this case study, we will demonstrate how to create a [dashboard](https://en.wikipedia.org/wiki/Dashboard_(business)), which is a website that displays a report about a database. In doing so, we will focus on packages and functions from the [`tidyverse`](https://www.tidyverse.org/){target="_blank"} for the data wrangling and visualization sections. The tidyverse is a library of packages created by RStudio. While some students may be familiar with previous R programming packages, these packages make data science in R more human-readable and intuitive.


```{r, out.width = "20%", echo = FALSE, fig.align ="center"}
include_graphics("https://tidyverse.tidyverse.org/logo.png")
```

The skills, methods, and concepts that students will be familiar with by the end of this case study are:


<u>**Data Science Learning Objectives:**</u>  
  
1. Importing text from a Google Sheets document (`googlesheets4`)  
2. Converting date formats (`lubridate`)  
3. Geocoding data (`ggmap`)  and creating a jitter for geocoded data on a map (`SF`)
4. How to reshape data by pivoting between "long" and "wide" formats and drop rows with `NA` values (`tidyr`)  
5. How to create data visualizations with `ggplot2` 
6. An introduction to the basics of R Markdown
7. How to create an interactive table (`DT`)  
8. How to create a map (`leaflet`)  
9. How to create an interactive dashboard with `flexdashboard` and `shiny`  

<u>**Statistical Learning Objectives:**</u>    

1. Calculating percentages for data with missing values  
2. Creating [summary statistics](https://en.wikipedia.org/wiki/Summary_statistics)

*Note: statistics is a part of data science*

*** 


We will begin by loading the packages that we will need:


```{r}
library(here)
library(readr)
library(googlesheets4)
library(tibble)
library(dplyr)
library(stringr)
library(magrittr)
library(tidyr)
library(ggmap)
library(sf)
library(lubridate)
library(DT)
library(htmltools)
library(ggplot2)
library(forcats)
library(ggforce)
library(waffle)
library(poliscidata)
library(flexdashboard)
library(shiny)
library(leaflet)
library(maps)
library(vembedr)
library(OCSdata)
```

Note [some of these packages](https://www.tidyverse.org/packages/) are part of the `tidyverse` and can be loaded together like so:

```{r, eval=FALSE}
library(tidyverse)
```


 <u>**Packages used in this case study:** </u>

Package   | Use in this case study                                                                      
---------- |-------------
[here](https://github.com/jennybc/here_here){target="_blank"}       | to easily load and save data  
[readr](https://readr.tidyverse.org/) |  to import the data  as a csv file  
[googlesheets4](https://googlesheets4.tidyverse.org/) | to import directly from Google Sheets
[tibble](https://tibble.tidyverse.org/) | to create tibbles (the tidyverse version of dataframes)
[dplyr](https://dplyr.tidyverse.org/){target="_blank"}      | to filter, subset, join, add rows to, and modify the data  
[stringr](https://stringr.tidyverse.org/){target="_blank"}      | to manipulate  character strings within the data (collapsing strings together, replace values, and detect values)
[magrittr](https://magrittr.tidyverse.org/){target="_blank"}      | to pipe sequential commands 
[tidyr](https://tidyr.tidyverse.org/){target="_blank"}      | to change the shape or format of tibbles to wide and long, to drop rows with `NA` values, and to see the last few columns of a tibble
[ggmap](https://cran.r-project.org/web/packages/ggmap/ggmap.pdf) | to geocode the data (which means get the latitude and longitude values)
[sf](https://r-spatial.github.io/sf/) | to modify the geocoded data so that overlapping points did not overlap
[lubridate](https://lubridate.tidyverse.org/) | to work with the data-time data    
[DT](https://rstudio.github.io/DT/) | to create the interactive table  
[htmltools](https://www.rdocumentation.org/packages/htmltools/versions/0.5.0) | to add a caption to our interactive table 
[ggplot2](https://ggplot2.tidyverse.org/){target="_blank"}      | to create plots  
[ggforce](https://cran.r-project.org/web/packages/ggforce/ggforce.pdf)   | to create a plot zoom
[forcats](https://forcats.tidyverse.org/){target="_blank"}      | to reorder factor for plot
[waffle](https://github.com/hrbrmstr/waffle) | to make waffle proportion plots  
[poliscidata](https://cran.r-project.org/web/packages/poliscidata/poliscidata.pdf) | to get population values for the states
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)     | to create the dashboard  
[shiny](https://shiny.rstudio.com/){target="_blank"}      | to allow our dashboard to be interactive   
[leaflet](https://rstudio.github.io/leaflet/shiny.html) | to implement the [leaflet](http://leafletjs.com/) (a JavaScript library for maps) to create the map for our dashboard   
[maps](https://cran.r-project.org/web/packages/maps/maps.pdf) | to create the simple leaflet map example   
[vembedr](https://github.com/ijlyttle/vembedr) | to include a video in our case study 
[OCSdata](https://github.com/opencasestudies/OCSdata) | to access and download OCS data files



The first time we use a function, we will use the `::` to indicate which package we are using. Unless we have overlapping function names, this is not necessary, but we will include it here to be informative about where the functions we will use come from.


# **Context**
*** 

School shootings get a lot of attention in the the media, but it would be helpful to see all the data on them at once to better understand them. A dashboard can help with this, so that people get a boarder understanding of the issue rather than hearing about singular specific incidences from the media.

In addition to injuries and deaths, shooting events can also have broad and lasting impacts for those who witness but are not directly involved. 

According to the [Center for Injury Research and Prevention at the Children's Hospital of Philadelphia](https://injury.research.chop.edu/violence-prevention-initiative/types-violence-involving-youth/school-shootings):

> The most common shootings on school grounds rarely involve large numbers of victims, but even a shooting of just one student at school has ramifications far beyond those directly involved. 

> Students and staff that witness school shootings are likely to suffer from [traumatic stress](https://en.wikipedia.org/wiki/Traumatic_stress) symptoms, become anxious or depressed and have general concerns about their safety. 

> While many witnesses will have temporary symptoms, others will be symptomatic for a much longer period of time and even develop chronic psychiatric disorders. 

> Even short-term impairments can cause severe distress and have profound effects on academic achievement and the social and emotional growth of impacted students. 
 

Furthermore, school shootings can have vast and lasting impacts because many students can witness a single event.

Another recently published [article](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf) indicates that:

> Over **240,000** American students experienced a school shooting in the last two decades.

```{r, echo=FALSE}
knitr::include_graphics(here::here("img", "exposed.png"))
```

##### [[source]](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf)
 
This study followed students who experienced a school shooting the United States between 2008 and 2013 and assessed their mental well-being. They found that:

> Fatal school shootings have large and persistent impacts on the mental health of local youth. In the two years following a fatal school shooting, the monthly number of antidepressant prescriptions written to individuals under age 20 is 21.3 percent higher in the shooting-exposed relative to the reference areas. 

#### {.reference_block}

Rossin-Slater, M., Schnell, M., Schwandt, H., Trejo, S. & Uniat, L. Local Exposure to School Shootings and Youth Antidepressant Use. w26563 http://www.nber.org/papers/w26563.pdf (2019) doi:10.3386/w26563.

####


Thus, it is useful to better understand the characteristics of these shootings. Having better data on what they look like nationwide can help with identifying associations of shootings with key characteristics. Better descriptive information such as this may then lead to more knowledge about factors that predict school shootings, which could help develop preventive interventions. This way, we might not only prevent the direct involvement of students in future events, but also to prevent students and staff from witnessing these events.


```{r, echo = FALSE, outwidth = "40%"}
knitr::include_graphics(here::here("img", "fernando-cferdo-6x2iKGi6SPU-unsplash.jpg"))
```

<span>Photo by <a href="https://unsplash.com/@cferdo?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Fernando @cferdo</a> on <a href="https://unsplash.com/s/photos/depression?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></span>

 
# **Limitations**
*** 
There are some important considerations regarding this data analysis to keep in mind: 

This dashboard only uses one source of data. There may be school shooting events that are not listed in this data or errors in this data.

According to the database website itself:

>"This database was developed from open-source information and may include reporting errors."

Furthermore, according to this [article](https://link.springer.com/article/10.1007/s11920-012-0331-6), schools in 2013, schools were not required to report school shootings unless they resulted in a suicide or homicide. Therefore there may be more events that result in only injury or no injuries or death that may not be included.

There are indeed events in the dataset that include zero deaths and zero injuries, but it is very likely that many of these events are not listed.

# **What are the data?**
*** 

We will use data from the open-source [K-12 Shool Shooting Database](https://www.chds.us/ssdb/data-map/) from the [Center for Homeland Defense and Security](https://www.chds.us/c/) at the [Naval Postgraduate School(NPS)](https://en.wikipedia.org/wiki/Naval_Postgraduate_School) in Monterey, California. This data is updated daily. The data used in this case study was downloaded in June of 2020. 

#### {.reference_block}

Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, [www.chds.us/ssdb](www.chds.us/ssdb).

####

This database includes information about school shooting events for students in grades K-12 in the United States dating back to 1970. The database has additional information not shown on our dashboard including but not limited to: 

- Location of the event at the school  
- If the event occurred during a sporting event  
- Time of day of the event  
- Day of the week of the event  
- Source for the shooting information  
- If the event was pre-planned or not  
- Shooter's actions immediately following the shooting  
- Shooter characteristics (affiliation with the school, if they had accomplices, if they took hostages, and their age and race)  
- Victim characteristics (affiliation with the school, if they were targeted, their age and race)  

According to the [K-12 Shool Shooting Database](https://www.chds.us/ssdb/about/) website:


> The School Shooting Database Project is conducted as part of the [Advanced Thinking in Homeland Security (HSx)](https://www.chds.us/c/academic-programs/hsx/) program at the Naval Postgraduate School’s [Center for Homeland Defense and Security (CHDS)](https://en.wikipedia.org/wiki/Center_for_Homeland_Defense_and_Security).

> The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. All of the information is based on open-source information and 3rd party reporting.


# **Data Import**
*** 

Previously, the website for this data was located at "https://www.chds.us/ssdb/dataset/" (which is no longer an active link), which displayed an active Google Sheets document and a link to download a csv file of the data. At the time that we created this case study (June of 2020) we downloaded the data from this website.

Now the data can be found at this [link](https://www.chds.us/ssdb/about/) and a file of the raw data can be downloaded by clicking the "DOWNLOAD RAW DATA" button. This file was previously a `.csv` file, but it is now an `.xlsx` file. 

To account for changes with this website, we have made the previous `.csv` file available for you to download using the `OCSdata` package:

```{r, eval=FALSE}
# library(OCSdata)
raw_data("ocs-bp-school-shootings-dashboard", outpath = getwd())
```

If you have trouble using the package, you may also download this `.csv` file [here](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/blob/master/data/raw/K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv).

In our case, we downloaded this data and put it within a "raw" subdirectory of a "data" directory for our project. If you use an RStudio project, then you can use the `here()` function from the `here` package to make the path for importing this data simpler. The `here` package automatically starts looking for files based on where you have a `.Rproj` file which is created when you start a new RStudio project. We can specify that we want to look for the "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv" file within the "raw" directory within the "data" directory within a directory where our `.Rproj` file is located by separating the names of these directories using commas and listing "data" first. 

#### {.click_to_expand_block}
<details> <summary> Click here to see more about creating new projects in RStudio. </summary>

You can create a project by going to the File menu of RStudio like so:


```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "New_project.png"))
```

You can also do so by clicking the project button:

```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "project_button.png"))
```

See [here](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) to learn more about using RStudio projects. 

</details>

####

We can import the raw `.csv` file using the `read_csv()` function from the `readr` package. Let's start by only importing the first five rows with the `n_max` argument which is the max number of rows to read in from the file. By doing this, we can check for errors before reading in the entire file. Note that you would need to modify the `file` argument if you set your data files up differently. 

```{r}
shooting_data <- 
  readr::read_csv(file = 
                    here::here("data", "raw",
                         "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv"), 
                  n_max = 5)
shooting_data
```

We see the first row is a sentence that states: 

> "Updated 6/2/2020 - View graphs and research methodology on www.chds.us/ssdb If you have information about other incidents, please email K12ssdb@chds.us."

We do not need this information, so we can skip it using the `skip` argument of `read_csv()` function. Specifically, we specify that we wish to only skip 1 row with `skip = 1`. We can also specify that the next row should be used for column names using the `col_names = TRUE` argument. 

```{r}
shooting_data <- 
  readr::read_csv(file = here::here("data", "raw",
                                    "K-12_SSDB_(Public)-K-12_SSDB_(Public)_Linked.csv"), 
                  col_names = TRUE, skip = 1)
```

We can use the `glimpse` function from the `dplyr` package to take a look at columns within the database:

#### {.scrollable }
```{r}
# Scroll through the output!
dplyr::glimpse(shooting_data)
```

####

We can also use the utils `str()` function, which is short for "structure" to see more details about the internal structure of the data. Therefore, the `str()` function will give us more information about the actual values for each column within the data, not just the columns themselves. Typically we would be able to see some of the values with `glimpse()` function as well, but some of the columns have very long names, thus obscuring the first few values in the output. 

#### {.scrollable }
```{r}
# Scroll through the output!
str(shooting_data)
```



We can see from this that many of the variables have Yes or No values, while others have relatively long descriptions. You may also notice that the `State` values are state abbreviations, not full state names. This is something that we will add to the data later. 


####

Alternatively, if we wanted to make a dashboard that continually updated as data got updated, we could do the following to import the data directly from a live Google Sheets document as previously this was available for this data.

To do so we would use the `read_sheet()` function from the `googlesheets4` package. Typically authentication is required, (meaning that you would need to sign in with your Google account using a username and or password), but since this was a public sheet we did not need to worry about authentication. To avoid being asked about this we used the `gs4_deauth()` function which puts the package into a de-authorized state that will not ask for users to sign in. 


```{r}
googlesheets4::gs4_deauth()
```

Great, now we would need to get the shared link from the document. We could previously do so by clicking on the link to the actually Google Sheets document like so:

```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "gettodoc.png"))
```

Then we can click on the "share" button to get access to the link:

```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "gettoshare.png"))
```

Finally we can click on "copy link" button to copy the link:

```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "getlink.png"))
```

Once you copy a link like this, you can use the `read_sheet()` function to import the data by simply pasting the link in quotes, like so:

```{r eval = FALSE}
data_url <- "https://docs.google.com/spreadsheets/d/1HqbfMxnk9X3_mQvLyW_LEUe3Yyr7cXMPfwqUVfdq7sY/edit?usp=sharing"

googlesheet_data <- 
  read_sheet(data_url)
```

This is a great option, however, we chose not to do this for this case study to allow this tutorial to be more easily maintained over time. This was evidently a good choice since the data is no longer accessible in the same way. 

####

To allow users to skip import we will save the data as an RDA file:

```{r, eval = FALSE}
save(shooting_data, file = here::here("data", "imported", "shooting_data.rda"))
```

# **Data Exploration and Wrangling**
***
If you have been following along but stopped, we could load our imported data like so:
```{r}
load(here::here("data", "imported", "shooting_data.rda"))
```

#### {.click_to_expand_block}
<details> <summary> If you skipped the data import section click here. </summary>

First you need to install the `OCSdata` package:

```{r, eval=FALSE}
install.packages("OCSdata")
```

Then, you may download the imported data `.rda` file using the following function:

```{r, eval=FALSE}
# library(OCSdata)
imported_data("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(here::here("OCSdata", "data", "imported", "shooting_data.rda"))
```

To load the downloaded data into your environment, you may double click on the `.rda` file in RStudio or using the `load()` function.

If the package does not work for you, an RDA file (stands for R data) of the data can be found [here](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/tree/master/data/imported) or slightly more directly [here](https://raw.githubusercontent.com/opencasestudies/ocs-bp-school-shootings-dashboard/master/data/imported/shooting_data.rda). Download this file and then place it in your current working directory. We recommend using an RStudio project and the [`here` package](https://github.com/jennybc/here_here) to navigate to your file more easily. 

We have put this file in a directory called "imported" within a directory called "data" within our working directory (which has a .Rproj file).

```{r}
load(here::here("data", "imported", "shooting_data.rda"))
```

<hr style="height:1px;border:none;color:#333;background-color:#333;" />
<details> <summary> Click here to see more about creating new projects in RStudio. </summary>

You can create a project by going to the File menu of RStudio like so:


```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "New_project.png"))
```

You can also do so by clicking the project button:

```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "project_button.png"))
```

See [here](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) to learn more about using RStudio projects. 

</details>
<hr style="height:1px;border:none;color:#333;background-color:#333;" />
</details>

####

Luckily, our data is already in pretty good shape, but we want to make our data more useful for our dashboard. 


## **Adding state name**
***

It would be useful to have the full state name in our data, rather than just the abbreviation.

We can do so by using data related to the US 50 states in a dataset called `state` that is automatically loaded with R sessions in the `datasets` package. The `state.abb` object is a list of the state abbreviations and `state.name` is a list of the state names.

```{r}
state.abb
state.name
```

We will combine these using the `tibble()` function from the `tibble()` package. 

```{r}
state_df <- 
  tibble(State_abb = state.abb, 
         State = state.name)

slice_head(state_df, n=4)
```

Now we will combine this with our data using the `left_join()` function from the `dplyr` package.
There are several ways to join data using the `dplyr` package.

```{r, echo = FALSE, outwidth = "50%"}
knitr::include_graphics(here::here("img", "join.png"))
```

##### [[source]](https://dplyr.tidyverse.org/reference/join.html)

Here is  a visualization of these options:

```{r, echo = FALSE, outwidth = "10%"}
# inner_join
knitr::include_graphics("https://raw.githubusercontent.com/gadenbuie/tidyexplain/master/images/inner-join.gif")
```

##### [[source]](https://github.com/gadenbuie/tidyexplain/blob/master/images/inner-join.gif)

```{r, echo = FALSE, outwidth = "10%"}
# left_join
knitr::include_graphics("https://raw.githubusercontent.com/gadenbuie/tidyexplain/master/images/left-join.gif")
```

##### [[source]](https://github.com/gadenbuie/tidyexplain/blob/master/images/left-join.gif)

```{r, echo = FALSE, outwidth = "10%"}
# right_join
knitr::include_graphics("https://raw.githubusercontent.com/gadenbuie/tidyexplain/master/images/right-join.gif")
```

##### [[source]](https://github.com/gadenbuie/tidyexplain/blob/master/images/right-join.gif)


```{r, echo = FALSE, outwidth = "10%"}
# full_join
knitr::include_graphics("https://raw.githubusercontent.com/gadenbuie/tidyexplain/master/images/full-join.gif")
```

##### [[source]](https://github.com/gadenbuie/tidyexplain/blob/master/images/full-join.gif)

See [here](https://dplyr.tidyverse.org/reference/join.html) for more details about joining data.

We probably have data for all fifty states, but there may not have been school shootings in all 50 states in this dataset, therefore we don't want to use the `full_join()` function. 

We also don't want the `inner_join()` function because `DC` does not have a state name. According to Wikipedia:

> The [U.S. Constitution](https://en.wikipedia.org/wiki/U.S._Constitution) provides for a [federal district](https://en.wikipedia.org/wiki/Federal_district) under the [exclusive jurisdiction](https://en.wikipedia.org/wiki/District_of_Columbia_home_rule) of [Congress](https://en.wikipedia.org/wiki/United_States_Congress); the district is therefore not a part of any [U.S. state](https://en.wikipedia.org/wiki/U.S._state) (nor is it one itself)

Thus we will use the `left_join(x,y)` function where `x` in this case will be the `shooting_data` (as it is introduced to this code first through the `%<>%` compound assignment pipe operator) and `y` is the `state_df`. Thus, we add the `state_df` values where they match to the `shooting_data`.


The `%<>%` compound operator allows us to use the an input and reassign it at the end after all the subsequent steps have been performed. We can therefore use `data_input %<>%` instead of `data_input <- data_input %>%`. We will demonstrate this in the code below.

```{r}
shooting_data %<>%
  rename("State_abb" = State) %>%
  left_join(state_df, by = c("State_abb" = "State_abb"))
```


In contrast, we can just use the `%>%` pipe operator to select a set of columns and peek at the first four rows of the new data frame.  

#### {.click_to_expand_block}

<details> <summary>Click here if you are unfamiliar with piping in R, which uses this `%>%` operator.</summary>  

By [piping](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html) we mean using the `%>%` pipe operator which is accessible after loading the `tidyverse` or several of the packages within the tidyverse like `dplyr` because they load the [`magrittr` package](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html). 
This allows us to perform multiple sequential steps on one data input.
The object on the left side is used as input to any commands to the right or below.

</details>  

####

```{r}
shooting_data %>%
  select(School, City, State_abb, State) %>%
  slice_head(n = 4)
```

## **Reformatting dates**
***

We also want to reformat our date values and create a `Date_year` variable based on the year in each date. We can use the `lubridate` package for this.

The `mdy()` function converts dates into a format where dates are listed as month, date, and year with hyphens in between.
The `year()` function can then be used to extract just the year from each date.

```{r}
shooting_data %<>%
  mutate(Date = lubridate::mdy(Date)) %>%
  mutate(Date_year = lubridate::year(Date))

shooting_data %>% 
  select(Date, Date_year)
```
Looks good!

## **Reformatting data types**
***

If you recall, in our dataset we have many variables that have either `Yes` or `No` values or `Y` and `N` values.  

```{r}
names(shooting_data)
```

*Note that in this case study, we will mostly be visualizing the data. However, for more intensive analysis, it would be better to make names more tidy, such as using lowercase and no spaces etc.*

These are the variables that have `Y/N` in the name or the `Targeted Specific Victim(s)`, `Random Victims`, `Pre-planned school attack` variables. 

We can make these consistently `TRUE` and `FALSE` by using the `case_when()` function from the `dplyr` package. This function allows us to specify new values for existing values.

#### {.click_to_expand_block}

<details> <summary> If you are familiar with the `recode()` function from `dplyr`, click here for an explanation of why `case_when()` is better in this case. </summary>

The benefit of the `case_when()` function, is that changing the values to `TRUE` or `FALSE` also results in the class type of the variable changing to type logical (which is interpreted as a binary variable with `TRUE` and `FALSE` values) otherwise, with `recode()` the variables would remain as class type character. 

<hr style="height:1px;border:none;color:#333;background-color:#333;" />
<details> <summary> Click here for an explanation about data types in R. </summary>

There are several classes of data in R programming. 
Character is one of these classes. 
A character string is an individual data value made up of characters. 
This can be a paragraph, like the legend for the table, or it can be a single letter or number like the letter `"a"` or the number `"3"`. 
If data are of class character, than the numeric values will not be processed like a numeric value in a mathematical sense. 

If you want your numeric values to be interpreted that way, they need to be converted to a numeric class. 
The options typically used are integer (which has no decimal place) and double precision (which has a decimal place). 

Similarly if your data is of class character and are values of `TRUE` and `FALSE` they will be interpreted as two different strings. 

However, **logical data** is interpreted slightly differently where a `FALSE` value indicates the absence of something, while a `TRUE` indicates the presence of something.

</details>

<hr style="height:1px;border:none;color:#333;background-color:#333;" />

<details><summary> Click here for more details about the differences between the `recode()` and `case_when()` functions. </summary>

Note that with `recode()` there is the option that other values be recoded to `NA` although this is not the default, however with `case_when()` other values not explicitly assigned in the `case_when()` statement will be assigned to `NA`. Further more only values can be used on the left side when using `recode()` whereas `case_when()` accepts expressions.

</details>

</details>

####

OK let's start by looking at the columns of interest by using the `select()` function and asking for any patterns that match the character string "Y/N" or "Specific" or "Random" or "Pre-planned". 

Formally, we can search for these using the `|` symbol, which is interpreted as an or, thus any variables that has a name that matches any of these patterns will be changed. 

```{r}
shooting_data %>% 
  select(matches("Y/N|Specific|Random|Pre-planned"))
```

We see the `Yes` and `No` values. Let's look closer at one of these columns. 

```{r}
shooting_data %>% 
  count(`Suicide (or attempted suicide) by Shooter (Y/N)`)
```
We see there are six different values in this column. To recode this column, we need to consider what we recode all the values. 

To implement the `case_when()` recoding of values, the existing values are written on the left of the `~` sign (quotation marks are necessary around the existing values) and new values are written on the right (quotations marks are not necessary as these are `TRUE` and `FALSE` statements).

```{r}
shooting_data %>%
       select(`Suicide (or attempted suicide) by Shooter (Y/N)`) %>%
       mutate(type = dplyr::case_when(. == "Yes" ~ TRUE,
                                      . == "No" ~ FALSE,
                                      . == "Y" ~ TRUE,
                                      . == "N" ~ FALSE,
                                      . == "Officer Involved" ~ TRUE))
```

In the above code chunk, we did this for one of the columns, but now let's do for all the columns that matched our string "Y/N|Specific|Random|Pre-planned" as above. 

To do this, we will use the `across()` function from the `dplyr` package and the `matches()` function from the `tidyselect` package to allow us to apply this to all of the variables that have a pattern that that matches any of those of the variables we want to change. 

The `across()` function then applies the `case_when()` function to all of these variables. Notice that the `~` symbol is necessary before the function that is applied using `across()`.


```{r}
shooting_data %<>%
  mutate(dplyr::across(.cols = matches("Y/N|Specific|Random|Pre-planned"),
                       ~ dplyr::case_when(. == "Yes" ~ TRUE,
                                          . == "No" ~ FALSE,
                                          . == "Y" ~ TRUE,
                                          . == "N" ~ FALSE,
                                          . == "Officer Involved" ~ TRUE)))
```

Finally, we can check out what happened after recoding the variables.  

```{r}
shooting_data %>% 
  select(matches("Y/N|Specific|Random|Pre-planned"))
```

Looks good!

## **Geocoding with the `ggmap` package**

For the purpose of our dashboard, we are interested in creating a map. 

To do this, we need to perform a process called [geocoding](https://en.wikipedia.org/wiki/Geocoding). Geocoding is the process of converting addresses into latitude and longitude coordinates.

To perform the geocoding we need the address of each school in the data set. The data currently does not list the actual address, but does have information about the school where the event occurred. 

Since some schools have the same name, we need the city and state data as well. So we will create a new variable in our data called `address` using the `mutate()` function from the `dplyr` package. 

This variable will collapse the values in the `School`, `City`, and `State` columns but with spaces in between. It is specified such that there will be space in between by the `sep = " "` argument. 

**Note**: a space is typed between the quotation marks. 

In this way, we then can use the address variable to look up the latitude and longitude for each school.

```{r}
shooting_data %<>%
  dplyr::mutate(address = 
                  stringr::str_c(School, City, State_abb, sep = " "))
```

We can take a look at just this new `address` variable using the `pull()` function from the `dplyr()` package.

```{r}
shooting_data %>%
  dplyr::pull(address) %>% 
  head()
```

Now we can use these addresses to find the latitude and longitude coordinates for each school where a school shooting occurred. 

To do this, we will use the `geocode()` function from the `ggmap` package to look up these addresses on Google Maps to get the latitude and longitude values. In the `geocode()` function, we also need to specify that we want to use google as the source using the `source` argument and that we want latitude and longitude using the `output = c("latlon")` argument.

This step requires registering with the Google Cloud Platform to get an API key, which currently requires registering your payment information and agreeing to the [Google Maps API Terms of Service](https://developers.google.com/maps/terms).

You are **not required to do this yourself**! We will give you the data. 

#### {.click_to_expand_block}

<details><summary> Click here to see how this process works in general.</summary>

<hr style="height:1px;border:none;color:#333;background-color:#333;" />

<details><summary> Click here to see how we registered with the Google Cloud Platform.</summary>

If you were to do this process yourself, you could get an API key [here](https://cloud.google.com/maps-platform/). Again this requires registering your payment information, but it is free to got an API key and enable the APIs, however you can be billed based on how many addresses you look up using the APIs. You need to look up thousands before getting billed.

Then you need to enable the maps and places APIs, by clicking on the boxes next to each:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "enable.png"))
```

Then you would register like so after copying the API key: (Note this is a fake key)

```{r, eval = FALSE}
ggmap::register_google(key = "mQkzTpiaLYjPqXQBotesgif3EfGL2dbrNVOrogg") 
```

</details>

<hr style="height:1px;border:none;color:#333;background-color:#333;" />

Once we have obtained an API key and are registered, we can geocode our data.

Note that this step is time intensive, as there are many addresses to look up! Therefore, we will just show how this is done and **will not evaluate the code for the next few code chunks**. Again we will use the `geocode()` function from the `ggmap` package to perform this step.


```{r, eval=FALSE}
shooting_data <- 
  shooting_data %>%
  mutate(coords = ggmap::geocode(address, 
                                 output = c("latlon"), 
                                 source = c("google")))
```

This results in tibble called `coords` being added to our `shooting_data` tibble. That's right, we can have a tibble as a column or variable within a tibble.   Using the `glimpse` function again, and looking at the last few variables, we can see that now the last variable listed is `coords` of class `<tibble>`.

```{r,echo = FALSE, out.width="90%"}
knitr::include_graphics(here::here("img", "shooting_data_coords.png"))
```

If we take a look at the first couple of values of the `coords` tibble, we see a tibble that looks like this:


```{r,echo = FALSE, out.width="30%"}
knitr::include_graphics(here::here("img", "coords.png"))
```

It would be better if each of these were their own columns in the tibble, so we will create new `longitude` and `latitude` variables again using the `mutate` function like so:

```{r, eval = FALSE}
shooting_data <- 
  shooting_data %>%
  mutate(longitude = pull(coords,lon),
         latitude = pull(coords,lat))
```

In this case we use the `pull()` function to grab the `lat` and `lon` variables within the `coords` tibble which is a variable of the `shooting_data` tibble. This can also be done using the [`unpack()` function](https://tidyr.tidyverse.org/reference/pack.html) from the `tidyr` package.

We can now remove the `coords` tibble like so, using the `select()` function from the `dplyr` package:

```{r, eval = FALSE}  
shooting_data <- 
  shooting_data %>%
  dplyr::select(-coords)
```

Now using `glimpse()` and looking at the last several variables, we can see that we no longer have a `coords` variable, but we do have two variables called `longitude` and `latitude` that are of class double as indicated by the `<dbl>`:

```{r,echo = FALSE, out.width="80%"}
knitr::include_graphics(here::here("img", "double.png"))
```

Now we will save the geocoded data in the `wrangled` directory of our `data` directory using the `save` function.

This requires listing the R object, followed by the path for where the file should be saved and what it should be called. In this case it will be called `shooting_data_wrangled_pre_map.rda`. First let's create a new object called `shooting_data_wrangled_pre_map` so it is clear in the future what we are working with when we load the data. We will also write this data to a csv file, which can be convenient for collaborators. To do this we will use the `write_csv()` function from the `readr` package.


```{r, eval = FALSE}
shooting_data_wrangled_pre_map <- shooting_data
save(shooting_data_wrangled_pre_map, file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.rda"))

readr::write_csv(shooting_data_wrangled_pre_map, 
                 file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.csv"))
```

####

You can download the wrangled data with latitude and longitude values using the `OCSdata` package:

```{r, eval=FALSE}
# library(OCSdata)
wrangled_rda("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(file = here("OCSdata", "data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
```

You can also access this data [here](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/blob/master/data/wrangled).

To load the data, you may double click the downloaded `.rda` file in RStudio, or put the downloaded file in the appropriate directory and use the following command.

```{r}
load(file = here("data", "wrangled",
                             "shooting_data_wrangled_pre_map.rda"))
```



## **Geometry lists with the `sf` package**

In this section, we will use the `sf` (which stands for simple features) package to 
create what is called a geometry list of our latitude and longitude information for the schools where shootings occurred. As some school shootings occurred in the same location, we need to alter all of the locations a bit so that when we plot the data on a map, the spots indicating where school shootings occurred will not overlap for the same location.

Let's learn how to do this. 

```{r}
shooting_data_wrangled_pre_map %>% 
  select(latitude, longitude) %>% 
  slice_head(n = 5)
```

First, let's remind ourselves how many rows we have in our dataset.

```{r}
dim(shooting_data_wrangled_pre_map)
```

Since tibbles give dimensions, instead of using the function `dim()`, we might also check the dimensions of our dataset by simply doing so:

```{r}
shooting_data_wrangled_pre_map
```

This is important because not all rows have a recorded latitude and longitude. 

```{r}
shooting_data_wrangled_pre_map %>% 
  filter(is.na(latitude)) %>% 
  select(longitude, latitude, address)
```

Therefore, before we can proceed, we need to remove rows with `NA` values for the `latitude` and `longitude` variables. In other words, we need to remove rows of events that happened at schools with locations that were not identified by Google. 

We can remove these rows using the `drop_na()` function from the `tidyr` package. We will use a `.` to indicate that we want to use the data that we are using as an input with our pipe, but then we will specify that we want to only drop rows were there is an `NA` value for either the `latitude` or `longitude` variables.

```{r}
shooting_data_wrangled_for_map <- shooting_data_wrangled_pre_map %>%
 tidyr::drop_na(c(latitude, longitude))
```

How many did we remove? Let's look at the dimension of our new dataset. 

```{r}
dim(shooting_data_wrangled_for_map)
```

We see that there were 5 events that occurred at schools with unidentified complete locations (missing either latitude, longitude, or both) that were removed from our dataset. 

Next, we are ready to convert our coordinates variables (`latitude` and `longitude`) into a coordinate simple feature (or `sf` object) using the `st_as_sf()` function (converts foreign object to an `sf` object). 

To do this, we need to specify what our coordinate variables are and we will also specify what [coordinate reference system](https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems),(crs) we would like to use. In our case we will use the [ESPG](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset) reference number [4326](https://spatialreference.org/ref/epsg/4326/), known as ESPG:4326 or the [World Geodetic System (WGS) version 84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84) which is one of the most commonly used CPS and used by by most global positioning systems, known as GPS. **This tells R  to use the values for the variables called `latitude` and `longitude` as latitude and longitude coordinates.**

```{r}
shooting_data_wrangled_for_map %<>%
  sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

dim(shooting_data_wrangled_pre_map)
```

We can see that our `latitude` and `longitude` variables were used to create a single new variable called `geometry` of class  `<POINT` [$^{\circ}$]`>`, thus we have one less column.

In this case, this type of variable will always be shown. Even if we were to look at just the  first 4 variables using indexes (like this: `[1:4]`), we will also see our last `sf` variable appended at the end.

So now we can see all variables related to location (which happen to be the first four variables and the `geometry` variable) by simply typing `[1:4]` next to the name of our tibble `shooting_data_geocoded`.

```{r}
shooting_data_wrangled_for_map[1:4]
```

To allow our points to not overlap for events that took place in the same location, we will add a bit more range so that they do not overlap one another on our map. 

To do this, we will transform the coordinates using the `st_transform()` function  of the `sf` package into a two dimensional projection (called the [Albers equal-area conic projection](https://en.wikipedia.org/wiki/Albers_projection#:~:text=The%20Albers%20equal%2Darea%20conic,that%20uses%20two%20standard%20parallels.&text=The%20Albers%20projection%20is%20used,the%20United%20States%20Census%20Bureau.)) with units in meters using the [crs 102008](https://spatialreference.org/ref/esri/102008/html/) reference from the [Environmental Systems Research Institute (ERSI)](https://en.wikipedia.org/wiki/Esri) and then use the `st_jitter()` function from the `sf` package  to allow a specified amount of range near the actual original GPS coordinates. 

To learn more about geospatial coordinate systems see [here](https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf) and [here](https://guides.library.duke.edu/r-geospatial/CRS).

So here we can see the output after transforming our data to the [crs 102008](https://spatialreference.org/ref/esri/102008/html/) reference:

```{r}
shooting_data_wrangled_for_map  %<>%
  st_transform(crs = "ESRI:102008") 
```


Notice how the  class for the `geometry` variable is now `<POINT [m]>` as our data has been transformed into coordinates in meters.

```{r}
shooting_data_wrangled_pre_map[1:5]
```

And now we will add a jitter to the points using the `st_jitter()` function, meaning that we will randomly move the coordinates a little bit to allow for points at the same location to not overlap on the map.

You can see the tidyverse explanation about when to use a jitter plot  [here](https://ggplot2.tidyverse.org/reference/geom_jitter.html), they state that a jitter:

> adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.

In this case we will allow for 50 meters of range using the `amount = 50` argument.

```{r}
shooting_data_wrangled_for_map %<>%
   sf::st_jitter(amount = 50)
```


We can now see that the coordinates are slightly modified. 

```{r}
shooting_data_wrangled_for_map[1:4]
```

**Note**: the `geometry` values have changed.

Now we will transform our coordinates back into the 3D latitude and longitude degree system again using the `st_transform()` function and the [ESPG:4326](https://spatialreference.org/ref/epsg/4326/), coordinate system.

```{r}
shooting_data_wrangled_for_map  %<>%
  st_transform(crs = 4326)

shooting_data_wrangled_for_map[1:4]
```

Notice how the `geometry` variables are different from what they were originally with this coordinate system:

```{r, echo = FALSE, out.width= "90%"}
knitr::include_graphics(here::here("img", "geometry.png"))
```

Next, we separate the `geometry` variable into `longitude` and `latitude` variables again. We can use the ` st_coordinates()` function from the `sf` package to extract the coordinates from our tibble as a matrix.

```{r}
shooting_data_wrangled_for_map %<>% 
  mutate(coordinates = as_tibble(st_coordinates(.)))

shooting_data_wrangled_for_map %>%
  pull(coordinates) %>%
  slice_head(n = 4)
```

Now, just as we did previously we will create new variables called `latitude` and `longitude` from the `X` and `Y` variables within the `coordinates` tibble that is part of our `shooting_data_wrangled_for_map` using the `pull()` function.

We will also convert our `shooting_data_wrangled_for_map` object which is currently a `sf` into a tibble using the `as_tibble()` function from the `tibble` package and then we will remove the `geometry` and `coordinates` variables using the `select()` function from the `dplyr` package with a minus operator in front of the names of the variables to remove.

```{r}
shooting_data_wrangled_for_map %<>%
  mutate(longitude = pull(coordinates,X),
          latitude = pull(coordinates,Y)) %>%
  tibble::as_tibble() %>%
  select(-geometry) %>%
  select(-coordinates)
```

And now we can take a look at  our last 3 variables using the `last_col()` function, which is a `select()` helper function `tidyr` package (See [here](https://tidyselect.r-lib.org/reference/select_helpers.html) for other `select()` helper functions). 

The `last_col()` function allows us to select either the last column, or with a specified offset we can select a number of columns before the last column. Thus, 2 columns before the last column  would be `last_col(offset = 2)` and then the `:` symbol is interpreted as through, thus we are selecting for the third to last column through the last column with `last_col(offset = 2):last_col()`.

```{r}
shooting_data_wrangled_for_map %>% 
  select(tidyr::last_col(offset = 2):last_col()) %>% 
  slice_head(n = 4)
```

Great! That looks like we expected.

Finally, we will save our wrangled data, again using `save()` and we will also write to a CSV file using `write_csv()`.
```{r, eval = FALSE}

save(shooting_data_wrangled_for_map, 
          file = here("data", "wrangled",
                      "shooting_data_wrangled_for_map.rda"))

write_csv(shooting_data_wrangled_for_map, 
          file = here("data", "wrangled",
                      "shooting_data_wrangled_for_map.csv"))
```


# **Data Analysis and Visualization**
*** 

If you have been following along but stopped, we could load our data like so:
```{r}
load(here::here("data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
load(here::here("data", "wrangled", "shooting_data_wrangled_for_map.rda"))

```

We need the wangled data that is prepared both for the map and the data just before the last wrangling step Geometry lists with the `sf` package] to prepare for the map because we removed some events that did not have addresses that were identified by Google (had `NA` values for latitude or longitude).  We want to use data for all events for our statistics, tables, and plots.

#### {.click_to_expand_block}

<details> <summary> If you skipped the previous sections click here. </summary>

First you need to install the `OCSdata` package:

```{r, eval=FALSE}
install.packages("OCSdata")
```

Then, you may download the wrangled data `.rda` files like so:

```{r, eval=FALSE}
# library(OCSdata)
wrangled_rda("ocs-bp-school-shootings-dashboard", outpath = getwd())
# load(here::here("OCSdata", "data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
# load(here::here("OCSdata", "data", "wrangled", "shooting_data_wrangled_for_map.rda"))
```

To load the downloaded data into your environment, you may double click on each of the `.rda` files in Rstudio or using the `load()` function.

If the package does not work for you, two RDA files (stands for R data) of the data can be found [here](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/tree/master/data/wrangled) or slightly more directly [here](https://raw.githubusercontent.com/opencasestudies/ocs-bp-school-shootings-dashboard/master/data/wrangled/shooting_data_wrangled_for_map.rda) and [here](https://raw.githubusercontent.com/opencasestudies/ocs-bp-school-shootings-dashboard/master/data/wrangled/shooting_data_wrangled_pre_map.rda). Download these files and then place them in your current working directory. We recommend using an RStudio project and the [`here` package](https://github.com/jennybc/here_here) to navigate to your files more easily. 

We have put these files in a directory called "wrangled" within a directory called "data" within our working directory (which has a .Rproj file).

```{r}
load(here::here("data", "wrangled", "shooting_data_wrangled_pre_map.rda"))
load(here::here("data", "wrangled", "shooting_data_wrangled_for_map.rda"))

```

<hr style="height:1px;border:none;color:#333;background-color:#333;" />
<details> <summary> Click here to see more about creating new projects in RStudio. </summary>

You can create a project by going to the File menu of RStudio like so:


```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "New_project.png"))
```

You can also do so by clicking the project button:

```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "project_button.png"))
```

See [here](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) to learn more about using RStudio projects. 

</details>
<hr style="height:1px;border:none;color:#333;background-color:#333;" />
</details>

####

Luckily, our data is already in pretty good shape, but we want to make our data more useful for our dashboard. 

Let's shorten the name of the data that was wrangled up to the last step for the map. We will use `shooting_data`.

```{r}
shooting_data <-shooting_data_wrangled_pre_map
```

We will also rename the data that is wrangled for the map to a shorter name:

```{r}
shooting_data_for_map <- shooting_data_wrangled_for_map
```

Let's double check that our data is expected:

```{r}
dim(shooting_data)
dim(shooting_data_for_map)
```

Great, looks like we indeed have more rows in our `shooting_data` as we would expect. 

There are several elements we would like to include in our dashboard. 

One thing we would like is an interactive table.

## **Interactive Table**
***
We can do this using the `datatable()` function from the `DT` package.

```{r, eval = FALSE}
DT::datatable(shooting_data)
```
This creates a searchable table and the order in which the data is displayed can be toggled to change for each variable.

However, we have many variables or columns in our dataset, so this can be overwhelming. Instead of displaying all of the variables, let's choose only some of the most interesting to display in our dashboard.

```{r}
DT_table <- 
  shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)

DT::datatable(DT_table)
```

Next, we will make some data visualizations. 


## **Yearly Shootings**
***

We would like to create a plot of the number of school shootings per year.

To do this, we will count the number of school shootings per year using the `count()` function from the `dplyr` package. We specify that we want to count the unique values of the `Date_year` variable and name the new column `Shootings`.


```{r}
shootings_per_year <-
  shooting_data %>%
  count(Date_year, name = "Shootings")

shootings_per_year
```

Good, this looks as expected.

Now to make a plot of this data we will use the `ggplot2` package.

#### {.click_to_expand_block}

<details><summary> Click here for an introduction to `ggplot2`. </summary>

The [ggplot2 package](http://ggplot2.tidyverse.org) is generally intuitive for beginners because it is based on a  [grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html) or the `gg` in `ggplot2`. 
The idea is that you can construct many sentences by learning just a few nouns, adjectives, and verbs. There are specific “words” that we will need to learn and once we do, you will be able to create (or “write”) hundreds of different plots.

The critical part to making graphics using `ggplot2` is the data needs to be in a _tidy_ format. 
Given that we have just spent time putting our data in _tidy_ format, we are primed to take advantage of all that `ggplot2` has to offer! 

We will show how it is easy to pipe _tidy_ data (output) as input to other functions that create plots. 
This all works because we are working 
within the _tidyverse_. 

**What is the `ggplot()` function?** 
As explained by Hadley Wickham:

> The grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinates system.

`ggplot2` Terminology: 

- **ggplot** - the main function where you specify the dataset and variables to plot (this is where we define the `x` and
`y` variable names)
- **geoms** - geometric objects
    - e.g. `geom_point()`, `geom_bar()`, `geom_line()`, `geom_histogram()`
- **aes** - aesthetics
    - shape, transparency, color, fill, line types
- **scales** - define how your data will be plotted
    - continuous, discrete, log, etc

The function `aes()` is an aesthetic mapping function inside the `ggplot()` object. 
We use this function to specify plot attributes (e.g. `x` and `y` variable names) that will not change as we add more layers.  

Anything that goes in the `ggplot()` object becomes a global setting. 
From there, we use the `geom` objects to add more layers to the base `ggplot()` object. 
These will define what we are interested in illustrating using the data.

</details>

####

***
For more of an introduction on creating plots with `ggplot2` , see this [case study](https://opencasestudies.github.io/ocs-bp-co2-emissions/)

***

First, we start with the `ggplot()` function from the `ggplot2` package.

This function requires that the aesthetics `aes()` be specified. This involves choosing what variable will be plotted on the x-axis and the y-axis. 

```{r}
shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings))
```

Using the `ggplot()` function alone will create an empty plot area. To make our plot not empty, we need to select one of the `geom_*` functions of the `ggplot2` package to specify what type of plot we want to create.

Assuming the `ggplot2` library is loaded, type `geom` into the RStudio console and you will see many options to scroll through.

Here, we use a `geom_col()` plot, which is a particular type of bar plot that uses the actual values to plot, rather than counts, which is the default of `geom_bar()`. We will specify with the `fill` argument, that we want our bars to be filled with the color black.

```{r}
shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings)) +
    geom_col(fill = "black")
```

We also modify the x-axis using the `scale_x_continuous()` function. This function allows for specification of the range or limits of the axis using the `limits` argument. We can use the base `seq()` function to create a sequence of numbers for each tick mark.

We can add labels to our plot using the `labs()` function from `ggplot2`. This has arguments such as `x` and `y` for the axes and `title` and `subtitle` for titles. We can use `NULL` to remove a label. For example to remove the x-axis label we can use `x = NULL`

We will also modify the overall aesthetics of the plot using a `theme_*` function. See [here](https://ggplot2.tidyverse.org/reference/ggtheme.html) for a list of options.

```{r}
start <- 1970
end <- 2020

shootings_per_year_p <- 
  shootings_per_year %>%
    ggplot(aes(x = Date_year, y = Shootings)) +
      geom_col(fill = "black") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      theme_minimal() +
      labs(title = "Yearly School Shootings",
           subtitle = "United States",
           x = NULL,
           y = "School Shootings")

shootings_per_year_p 
```

## **Yearly Deaths**
***

Let's make a similar plot for the number of deaths

```{r}
deaths_per_year<-
  shooting_data %>% 
  group_by(Date_year) %>%
  summarize(Deaths = sum(`Killed (includes shooter)`))

deaths_per_year_p <- 
  deaths_per_year %>%
    ggplot(aes(x = Date_year, y = Deaths)) +
      geom_col(fill = "black") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      theme_minimal() +
      labs(title = "Yearly Deaths Attributable to School Shootings",
           subtitle = "United States",
           x = NULL)

deaths_per_year_p
```

**Note**: When using the `summarize()` function, we don't need to use the `mutate()` function here.

Next, for the purposes of the dashboard, we actually want to create just one plot that shows both the number of school shootings per year and the number of deaths.

We can do so by combining our `shootings_per_year` and  `deaths_per_year` tibbles together and making what is called a faceted plot, using the `facet_wrap()` function to create two plots next to one another.

To combine our data we will use the `full_join()` function from the `dplyr` package. This maintains all values from both tibbles.

To do so we will be making our table "longer", meaning that it will have fewer columns and more rows. 
See [here](https://en.wikipedia.org/wiki/Wide_and_narrow_data) for more information about different table formats, typically referred to as wide and long or sometimes narrow.

We will use the `pivot_longer()` function from the `tidyr` package to change the shape of our table. 

There are 3 main arguments in this function:   

1. `cols` - which specifies what columns to collapse  
2. `names_to` - which specifies the name of the new column that will be created that will contain the column names of the columns you are collapsing  
3. `values_to` - which specifies the name of the new column that will be created that will contain the values from the columns you are collapsing 

To specify that we want to collapse all the columns that have year values, we can choose all those except the `Date_year` variable by using the `-` negative operator. 

```{r}
per_year <- 
  full_join(shootings_per_year, deaths_per_year)

per_year %<>%
  pivot_longer(cols = -Date_year, 
               values_to = "events", 
               names_to = "id")

per_year
```

Hmmm, we see the data type of the `id` column is a character (`<chr>`). Let's convert it to a factor, so that the order in which `Shootings` and `Deaths` appear is the order in which they appear first rather than by alphabetical order (which is default).

Using the `fct_inorder()` function from the `forcats` package, we can easily reorder the `id` variable`. 

```{r}
per_year %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year
```

Now since we the new variable for the names is called `id` we will use this as the variable to create the facet like so: `facet_wrap(~id)`. We can also specify that we want both plots to have their own y-axis with the `scales = "free"` argument. This causes each to have the y-axis automatically scaled for the data in each plot. We can then use the  `scale_y_continuous()` function to set both of the y-axes to be the same. 

```{r}
per_year %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col() +
    facet_wrap(~id, scales = "free") +
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                       labels = seq(0, 120, by = 30),
                       limits = c(0, 121))+
    theme_minimal() +
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = "Number of events",
         x = "Year")+
    scale_fill_manual(values = c("black", "black"))+
    theme(legend.position = "none", 
          legend.title = element_blank(),
          axis.text.x = element_text(angle = 90),
          strip.background =element_rect(fill="cornflowerblue"),
          strip.text = element_text(colour = 'white', face = "bold", size = 14))
```

Next, we can modify the plot further so that it is more obvious what each plot is showing. We can update the names of the y-axis for each plot by changing the `strip.position` argument of the `facet_wrap()` function to be placed on the left rather than above. Currently it is the label in blue that says what the value of the `id` variable is for each plot. This also requires some modification of the `theme()` function to place the `strip.text` outside the plot area and to remove the background.Furthermore, we also change the text using the `labeller` argument of the `facet_wrap()` function. The `as_labeller()` function from the `ggplot2` package can change out the `id` values for other text like in the following code:

```{r}
per_year %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col() +
    facet_wrap(~id, 
               scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left") +
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                       labels = seq(0, 120, by = 30),
                       limits = c(0, 121))+
    theme_minimal() +
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year")+
    scale_fill_manual(values = c("black", "black"))+
    theme(legend.position = "none", 
          legend.title = element_blank(),
          axis.text.x = element_text(angle = 90),
          strip.background = element_blank(),
          strip.placement = "outside",
          strip.text = element_text(face = "bold", size = 14))
```

Good,  Now this is much easier to interpret.

Our last step in this section is to save the style settings of this plot as theme so we can reuse it for future plots. To do this, we use the base `function()` function:

```{r}
theme_dashboard <- function(){ 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))
}
```



## **Yearly Cumulative Shootings**
***

Now let's make another plot of the cumulative deaths each year including those of the previous years. In this case we can use the `shootings_per_year` object that we previously made.

```{r}
shootings_per_year
```

However, we want to add a new variable using the `mutate` function called `n_cum_sum` by using the `cumsum()` function to calculate a cumulative sum based on the yearly count. 

```{r}
shootings_per_year_cum <- shootings_per_year %>%
    mutate(Shootings = cumsum(Shootings))

deaths_per_year_cum <- deaths_per_year %>%
    mutate(Deaths = cumsum(Deaths))

shootings_per_year_cum
```

Next, we join these tables together

```{r}
per_year_cum <- 
  full_join(shootings_per_year_cum, deaths_per_year_cum)

per_year_cum %<>% 
  pivot_longer(cols = c(Shootings,Deaths ), 
               values_to = "events", 
               names_to = "id")

per_year_cum
```

Good, this looks like we would expect.

Now let's make a plot like we did before:

```{r}
per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
    ggplot(aes(x = Date_year, y = events, fill = id)) +
      geom_col() +
      facet_wrap(~id, scales = "free", 
                 labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                          Deaths = "Deaths (# of people)")), 
                 strip.position = "left") +
      scale_x_continuous(breaks = seq(start, end, by = 5),
                         labels = seq(start, end, by = 5),
                         limits = c(start-1, end+1)) +
      scale_y_continuous(breaks = seq(0, 1500, by = 500),
                         labels = seq(0, 1500, by = 500),
                         limits = c(0, 1500)) +
      theme_minimal() +
      labs(title = "Cumulative Yearly Shootings and Deaths\nAttributable to School Shootings",
           subtitle = "United States",
           y = NULL,
           x = "Year") +
      scale_fill_manual(values = c("black", "black")) +
      theme_dashboard()
```

**Note**: the limits for the y-axis were determined by first plotting without the `scale_y_continuous()` function.



## **Deaths per Shooting**
***

Next, we will make a plot of the number of deaths per school shooting based on the `Killed (includes shooter)` variable. 

#### {.recall_code_question_block}

<b><u> Question Opportunity </u></b>

See if you can come up with the code for the plot.

***

<details> <summary> Click here to reveal the answer. </summary>


```{r}
deaths_per_event <-
  shooting_data %>%
  group_by(`Killed (includes shooter)`) %>%
  count() %>%
  ungroup()

per_shooting_plot <-deaths_per_event %>%
  ggplot(aes(y = `Killed (includes shooter)`, x = n)) +
    geom_col(fill = "black")+
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "School Shootings",
         y = "")
```

</details>

***

####

```{r}
per_shooting_plot 
```

This plot could also have been made using `geom_bar()` instead of `geom_col()` this makes a similar plot but automatically uses the count for one of the axes, thus it is not required to first summarize the data using the `count()` function. 


```{r}
shooting_data %>%
  ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_bar(fill = "black") +
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "School Shootings",
         y = "")
```



Because of the skewed distribution, it is difficult to see the school shootings that had more numerous deaths, so we will add a facet that zooms in on this portion of the plot. We can do so, using the `facet_zoom()` function from the `ggforce` package.

```{r}
shooting_data %>%
  ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_bar(fill = "black") +
    scale_x_continuous(breaks = seq(0, max(pull(shooting_data, 
                                    `Killed (includes shooter)`)), by = 1),
                       labels = seq(0, max(pull(shooting_data,
                                    `Killed (includes shooter)`)), by = 1)) +
    ggforce::facet_zoom(xlim = c(4, max(pull(shooting_data, 
                                    `Killed (includes shooter)`))), 
                        ylim = c(0,20)) +
    theme_minimal() +
    labs(title = "Deaths per School Shooting",
         subtitle = "United States",
         x = "Deaths per shooting",
         y = "Number of events with given number of deaths") +
  theme(axis.text.x = element_text(angle = 90))
```

It is still difficult to see. Let's try some other options.

The `geom_freqpoly()` function creates a graph that makes it very easy to see that most school shootings result in zero or one death and that the maximum number of deaths in this data for a single event is in the upper twenties. 

```{r}
shooting_data %>%
    ggplot(aes(x = `Killed (includes shooter)`)) +
    geom_freqpoly()
```

This really shows that most school shooting events luckily result in no deaths, but what are the actual proportions of school shootings that end in 0 deaths, 1 death, 2 deaths, etc. 
One way to look at this is to calculate the percentage of events that resulted in each number of deaths. 
We can do this by dividing the number of events by the overall sum of events and multiplying by 100. 
The base `round()` function can round this value to the nearest 1 decimal place by specifying that we want 1 digit after the decimal with `digits = 1`.

```{r}
deaths_perc_event <-
  shooting_data %>%
  count(`Killed (includes shooter)`) %>%
  rename("num_events"= n) %>%
  mutate(percent = round(num_events/sum(num_events)*100, digits =1))

deaths_perc_event

deaths_perc_event %>%
  ggplot(aes(x =`Killed (includes shooter)`, y = percent)) +
    geom_col()
```

We can see that greater than 60% of the events had no deaths. It is however, the plot is still unsatisfactory because there is such a long tail.

Next, we can try collapsing the events that resulted in 4 or more deaths together and create a pie chart which you are likely familiar with as well as alternative plot called a waffle plot.

First to collapse the percentage for the events that had 4 or more deaths, we need to do a bit of wrangling.

We will start with filtering the data to only these events and then we will sum each of the columns using the base R function `colSums()` with the goal of creating a new row in the `deaths_perc_event` object that will contain information about all events with 4 or more deaths. We will use the `>=` greater than or equal to operator.

```{r}
greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

greater_than4
```
Good, now we know the overall percentage for the events that unfortunately resulted in more than 4 deaths. 

Next, we combine this with the rest of our data using the `bind_rows()` function from the `dplyr` package which appends a tibble to another.

```{r, echo = FALSE, outwidth = "40%"}
knitr::include_graphics(here::here("img", "bindrows.png"))
```

##### [[source]](https://rstudio.com/resources/cheatsheets/)


```{r}
deaths_perc_event %<>%
  bind_rows(greater_than4)

deaths_perc_event 
```

Next, we add a new variable so that it is easy to plot and interpret the number of deaths for each percentage. 

We will add the word "deaths" to each value in the `Killed (includes shooter)` variable using the base `paste0()` function. Note that this function automatically will result in no space or any other character between pasted elements. The `paste()` function can alternatively be used for those cases. 

```{r}
deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, 
                           " deaths ", "(", percent, "%)")) 

deaths_perc_event
```

We can change the value for the last row about the events that resulted in more than 4 deaths. 

We can use the `last()` function from the `dplyr` package combined with the `pull()` function to specifically grab this value.

```{r}
last(pull(deaths_perc_event, category))
```

Using the `case_when()` function, we can change this value:

```{r}
deaths_perc_event %<>% 
  mutate(category =
           case_when(category == last(pull(deaths_perc_event, category)) ~ 
                        paste0("4+ deaths ", "(", percent, "%)"),
                     category == "1 deaths" ~ "1 death",
                     TRUE ~ category))

deaths_perc_event
```

#### {.think_question_block}

<b><u> Question Opportunity </u></b>

We could of used the`str_replace()` function from the `stringr` package to replace the value for the last row. This function would directly change the value of "85 deaths" to "4+deaths", but this would not be as reproducible. Why is that? 

***

<details> <summary> Click here to reveal the answer. </summary>

Say we used this code again after the data got updated. Then there may be more deaths in this category and therefore this value would no longer be "85 deaths". Instead, by using `case_when()`, we can use an expression for the last value of the `deaths_perc_event` tibble and replace that, regardless of what the value is, with "4+deaths". Recall that `case_when()` replaces all other values that are not specified with `NA`. We do not want to lose the other values for the `category` variable. So to avoid this, we assign each of the values that are not the last value or the `"1 deaths"` value to what they currently are for the `category` variable, using `TRUE ~ category` (Note that all remaining unassigned values are indicated as `TRUE`).

We could also actually type out the percentage of 4+death cases, but it is always more reproducible to instead use an expression that will evaluate to the value we want. This way if we were to update our data with additional school shooting events, this evaluation would also update.

</details>

***

####


OK, this looks as we hoped. OK, now we are ready to make plots. 

Let's start with the pie chart. Historically, this has become a bit controversial type of plot. However, it can be very useful when you are actually looking at percentages and the goal is to see major trends in the data, such as all the groups are roughly equal or one group is particularly larger than the rest. When this is the case and you are presenting the data to an audience that is less familiar with data science, they may expect to see a pie chart. Thus it is useful to know how to make one. However, in most other cases pie charts do a poor job at allowing us to see more subtle differences, and they are particularly confusing when we are not looking at proportions, but raw counts. In those cases it is better to use a bar chart as we have already done. 

There is no `geom_*` function that allows you to create a pie chart directly. Instead we will create our bar plot as we have and then use the `coord_polar()` function to wrap our y axis into a circular shape.

```{r}
deaths_perc_event %>%
  filter(percent>0.5) %>%
  ggplot(aes(x = "", y = percent, fill = category)) +
      # adding color here adds a black outline
    geom_col(color = "black") +
    coord_polar("y", start = 0) +
    scale_y_continuous(breaks= NULL) +
    theme_minimal() +
    theme(axis.title = element_blank()) +
    scale_fill_viridis_d() +
    labs(title = "Percentages of school shooting deaths\n(including the shooter)")
```

This is actually a fairly easy plot to interpret. We can see that most events resulted in zero deaths and that the next largest proportion resulted in one death, while a sizable but small proportion resulted in two deaths. A very small proportion resulted in three or four or more deaths.

We also can create a waffle plot. This plot offers one advantage over the pie chart, in that it also allows for easier interpretation of more subtle proportion differences while also showing big picture differences in efficient manner. 

First, we filter for only the data that we want to plot. We only want the 0,1,2,3, or 4+ categories. We can do so by using the `str_detect()` function from the `stringr` package. This allows us to find the values that match multiple patterns. The patterns are separated by the `|` or operator. Thus any value matching any of the patterns should be kept. Notice that the `\\` is necessary before the `+` so that is not interpreted as a mathematical plus sign. 

The `waffle()` function requires that the data be in wide format. Thus we need to use `pivot_wider()` of the `tidyr` package to do so. This is very similar to the `pivot_longer()` function, however in this case we need to specify what existing column contains the names for the new columns using `names_from` and what existing column contains the values for the new columns using `values_from`. 

```{r}
deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                     values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title="Deaths Per School Shooting", 
                 xlab="1 square ~ 1%") +
  scale_fill_viridis_d()
```

### Percentages

We are also interested in including statistics in our dashboard. For example, we are interested in how many shooters committed or attempted suicide.

We previously converted variables with `yes` or `no` answers because they were inconsistently coded as `yes`/ `y` and `no`/`n`.  Furthermore, logical variables are easier to work with in terms of performing calculations because `TRUE` values are treated like a `1` while `FALSE` values are treated like a `0`. 

We can calculate the percentage of shooters that committed or attempted suicide out of all entries that have data for this information. Thus we do not want to include `NA` values in the calculation, otherwise this might give us a distorted picture of the truth.

Let's take a look at the data for this variable:

```{r}
shooting_data %>% 
  count(`Suicide (or attempted suicide) by Shooter (Y/N)`)
```

We can see that there are 45 `NA` values. 

If we calculate a sum of the `TRUE` values, (which are those that are equivalent to `1`), we can do so by just summing this variable, which is equivalent to summing values that are greater than `0`. 

```{r}
sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`), 
    na.rm = TRUE)
sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`) > 0, 
    na.rm = TRUE)
```

In contrast, `FALSE` values are those that are equivalent to `0`. Thus if we want to divide by the sum of all values that are `FALSE` are `TRUE`, then we can sum all values greater than or equal to `0`.

```{r}
sum(pull(shooting_data, 
         `Suicide (or attempted suicide) by Shooter (Y/N)`) >= 0, 
    na.rm = TRUE)
```

Thus, we can calculate the percentage of all reporting values like so, where the `TRUE` values are divided by the sum of all `TRUE` and `FALSE` values: (We also multiply by 100 using `*100` to get the percentage value.) 

```{r}
suicide <- 
  (sum(pull(shooting_data,`Suicide (or attempted suicide) by Shooter (Y/N)`), na.rm = TRUE) /
   sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE))*100

suicide
```

We can use the `round()` function to round this value and the `format()` to make sure that the value has the correct number of digits. 

```{r}
suicide <- round(suicide, 2)
suicide
```

If after rounding we wanted zeros after the decimal so that the number of digits after the decimal was consistent for the different statistics we were displaying, we could use the `format()` function to specify this. 

So we can add a zero after  `13.4` like so:

```{r}
format(suicide, nsmall = 2)
```

To calculate the percentage of school shootings where this information was reported we can do the following, by calculating all values that are not `NA` using `>=0` and calculating the number all possible values using the base `length()` function.

```{r}
reporting_suic <- 
  (sum(pull(shooting_data, 
            `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, 
       na.rm = TRUE) /
   length(pull(shooting_data, 
               `Suicide (or attempted suicide) by Shooter (Y/N)`))
   )*100

reporting_suic <- round(reporting_suic, 1)
reporting_suic
```

We can see that 97% of the school shootings have information about this variable.

It is important to check and report this percentage so that people can better understand if our percentages are reliable. If only 2% of the school shootings had this information and in all cases of the 2% the school shootings involved a suicide (or attempt), then this would lead people to believe that 100% of school shootings involve a shooter suicide (or attempt). This would clearly be misleading!  In our case the majority of the school shootings included this information, so we will indeed report the percentage and we will also let people know how much of the school shooting data had this information.


#### {.think_question_block}
<b><u> Question Opportunity </u></b>

Now try to perform variations of these calculations to calculate other statistics for our dashboard, such as the percentage of the shooters that were male or the percentage of events where a single handgun was used, (hint the `Firearm Type` value will be `Handgun`).

***

**Shooter Was Male**

***

<details> <summary> Click here to reveal the code. </summary>

```{r}


gender <- paste(as.character(round(100 * (sum(
    case_when(pull(shooting_data,`Shooter Gender`) == "Male" ~ TRUE,
                                                        TRUE ~ FALSE),
                                      na.rm = TRUE)
    /
      sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_male <- (sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Shooter Gender`)))*100
reporting_male <- round(reporting_male, 1)


gender
reporting_male

```

</details>

***

**Use of a Single Handgun**

***

<details> <summary> Click here to reveal the code. </summary>


```{r}
handgun <-paste(as.character(round(100 *(sum(case_when(
      pull(shooting_data,`Firearm Type`) == "Handgun" ~ TRUE,
                                                 TRUE ~ FALSE), na.rm = TRUE)
    /
      sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_gun <- (sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Firearm Type`)))*100
reporting_gun <- round(reporting_gun, 1)

handgun
reporting_gun
```
</details>

***

####


# **Dashboard Basics**
***

We are now ready to build our dashboard!

Let's introduce some basics about creating dashboards in R in the `flexdashboard` package.

Note that you can also start the case study at this point, we will let you know how to get the data that you need.

## **Dashboard packages**
***

To make our dashboard we will use three very useful packages:

1. [flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)

Flexdashboard is a package that was created by RStudio and [released](https://blog.rstudio.com/2016/05/17/flexdashboard-easy-interactive-dashboards-for-r/) in May of 2016. This package allows for users to more easily create dashboards using [R Markdown](http://rmarkdown.rstudio.com/). 

See [here](https://rstudio.com/resources/webinars/introducing-flexdashboards/) for a video about flexdashboard and [here](https://rmarkdown.rstudio.com/flexdashboard/) for a more information on how to use this package.

2. [leaflet](https://rstudio.github.io/leaflet/)

[Leaflet](https://leafletjs.com/) is the leading open-source JavaScript library for interactive maps and is used by many websites. The [leaflet](https://rstudio.github.io/leaflet/) R package allows for users to more easily integrate leaflet maps in R, to create maps like the one below. We will use this package to create a map of where school shootings have occurred in the US.

Here is an example of an interactive map made with `leaflet`

```{r, echo=FALSE}
library(maps)
mapStates = map("state", fill = TRUE, plot = FALSE)
leaflet(data = mapStates) %>% addTiles() %>%
  addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)
```

3. [shiny](https://shiny.rstudio.com/)

[Shiny](https://shiny.rstudio.com/) is an R package that makes it easier to create interactive web applications in R. See [here](https://shiny.rstudio.com/gallery/) for a gallery of examples. People have created a variety of diverse applications using this package- from [interactive websites](https://shiny.rstudio.com/gallery/real-estate-investment.html) to [games](https://shiny.rstudio.com/gallery/hex-memory.html).

Here is an screenshot of a `shiny` app.

```{r, echo = FALSE, out.width= "60%"}
knitr::include_graphics(here::here("img", "game.png"))
```

##### [[source]](https://shiny.rstudio.com/gallery/hex-memory.html)

See [here](https://rmarkdown.rstudio.com/flexdashboard/using.html#components) for a list of other packages that are useful for adding elements to dashboards created with the `flexdashboard` package.

***

## **R Markdown**
***

The case study that you are reading right now was created using an [R Markdown document](https://rmarkdown.rstudio.com/). This means that it is a document that uses the `Markdown` language syntax with enhanced capabilities of executing R code in the document. 

In fact, if you click the button that says "code" on the upper right corner at the top of the HTML you will download the [R Markdown](https://rmarkdown.rstudio.com/articles_intro.html#:~:text=R%20Markdown%20is%20a%20file,code%2C%20like%20the%20document%20below.) document for this case study. 

[R Markdown (Rmd)](https://rmarkdown.rstudio.com/articles_intro.html#:~:text=R%20Markdown%20is%20a%20file,code%2C%20like%20the%20document%20below.) is a file format that contains Markdown syntax and embedded R code (it can also incorporate code from some other languages like [Python](https://en.wikipedia.org/wiki/Python_(programming_language)) and [SQL](https://en.wikipedia.org/wiki/SQL)).

```{r, echo = FALSE}
library(vembedr)
embed_url("https://vimeo.com/178485416") %>%
  div(class = "vembedr") %>%
  div(align = "center")
```

##### [[source]](https://rmarkdown.rstudio.com/lesson-1.html) 

#### {.click_to_expand_block}

<details> <summary> Click here to see how this video was embedded in this R Markdown. </summary>

This video was included using the `vembedr` package. Videos on [Vimeo](https://vimeo.com/) or [YouTube](https://youtube.com/) can be added like so, where a url is added within quotation marks and the following two lines of code allow for the video to be centered in the R Markdown output. See [here]([vembedr](https://github.com/ijlyttle/vembedr)  to learn more about embedding videos with this package.

```{r, eval = FALSE}
library(vembedr)
embed_url("https://vimeo.com/178485416") %>%
  div(class = "vembedr") %>%
  div(align = "center")
```

</details>

####

These Rmd files can be rendered into a variety of file outputs like PDF, word, HTML etc. by the  [`knitr`](https://yihui.org/knitr/) and [`rmarkdown`](https://cran.r-project.org/web/packages/rmarkdown/rmarkdown.pdf) packages.

This relies on conversion of the Rmd file into the [Markdown](https://en.wikipedia.org/wiki/Markdown) language by software called [Pandoc](https://en.wikipedia.org/wiki/Pandoc).

[Markdown](https://en.wikipedia.org/wiki/Markdown) (which has been implemented by many languages, such as [Perl](https://en.wikipedia.org/wiki/Perl), [Java](https://en.wikipedia.org/wiki/Java_(programming_language)), [Python](https://en.wikipedia.org/wiki/Python_(programming_language)), [C#](https://en.wikipedia.org/wiki/C_Sharp_(programming_language)), [Ruby](https://en.wikipedia.org/wiki/Ruby_(programming_language)), etc.) is a language of a particular class of programming languages called [lightweight markup languages (LML)](https://en.wikipedia.org/wiki/Lightweight_markup_language). 

LMLs have relatively simple and intuitive syntax, and are therefore relatively easy to write and read and are converted by software into some type of less human-friendly language to create an output document like a PDF or an HTML file. In fact, multiple output files can be created from the same LML file!

In our case we are interested in rendering our Rmd document into a website. The code in our R Markdown document will be interpreted and converted ultimately into HTML code.

Although LMLs tend to be quite similar, here you can see some of the differences in syntax:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "syntax.png"))
```

##### [[source]](https://en.wikipedia.org/wiki/Lightweight_markup_language)

See this [book](https://bookdown.org/yihui/rmarkdown/) for more information on working with R Markdown files. 

The RStudio [cheatsheet for R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and this [tutorial](https://ourcodingclub.github.io/tutorials/rmarkdown/) are great for getting started. 

## **Flexdashboard**
***

There are several important features about the R Markdown language that the `flexdashboard` package leverages. 

These features are used to specify the layout and elements of the dashboard.

Here are some major R Markdown features to keep in mind for `flexdashboard`:

1. The beginning of an R Markdown document is what is called the [YAML](https://en.wikipedia.org/wiki/YAML) header. This is delineated by `---` three dash marks before and after the header YAML code.

Like so:
```{r, echo = FALSE, out.width = "30%"}
knitr::include_graphics(here::here("img", "yaml.png"))
```

##### [[source]](https://ourcodingclub.github.io/tutorials/rmarkdown/)

[YAML](https://en.wikipedia.org/wiki/YAML) is yet another language, but unlike Markdown it is a data-oriented language and is often used for the [configuration](https://en.wikipedia.org/wiki/Configuration_file) of software or to set up how a software program should work.

Whatever code you put in the YAML header will influence the rest of the document and essentially set up how the R Markdown document will render. In the example above, the type of output is specified.

Other more complicated features can be included. For example, we can specify that we are creating a dashboard with `flexdashboard` and we can specify how we want the layout of our dashboard to be displayed like so:

```{r, echo = FALSE, out.width = "30%"}
knitr::include_graphics(here::here("img", "yaml_dashboard.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)

We will describe this in more detail soon.

2. To add a page to a navigation bar (also called a navbar) the following syntax is used `=======`. The number of dashes does not matter. (This is a level 1 header in Markdown, just like `#`)

3. To add columns or rows the following syntax is used `---------`. By default this notation will create new columns, however if the YAML is modified to specify to create rows, than this same syntax will be used to create rows. The number of dashes does not matter. (This is a level 2 header in Markdown, just like `##`)

4. Components within the dashboard are delineated by using `###` - if you are familiar with Markdown notation, this is a level 3 Markdown header.

If this includes text like so: `### text`, this adds header text to the component, however this is not required. 


5. To  include a plot or any output from R, use the following syntax:
`"```{r}"` on it's own line followed by your code, followed by `"```"`. This creates what is called a code chunk.

```{r, echo = FALSE, out.width = "30%"}
knitr::include_graphics(here::here("img", "code_chunk.png"))
```

##### [[source]](https://ourcodingclub.github.io/tutorials/rmarkdown/)

6. Another component of `flexdashboard` is value boxes. These are essentially text boxes for statistics or text that you might like to feature or emphasize. To do this again the `###` syntax is used to put a text label describing what the value box contains followed by a code chunk that uses the `valueBox()` function from the `flexdasboard` package. The value to display is specified using the `value` argument, as well as optional other aspects using additional arguments, such as the color of the value box using the `color` argument like the example below:

```
### ValueBoxText

'''{r}
valueBox(value = 10
  color = "white")

'''
```

**Note**: in our examples of code we will use `"'''"` instead of `"```"`. This is only to allow for easy viewing of examples. All code chunks require `"```"`.

Here you can see a more thorough example which includes icons:
```{r, echo = FALSE, out.width = "60%"}
knitr::include_graphics(here::here("img", "valuebox.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/using.html#value_boxes)

7. Instead of value boxes you can also include a slight variation called a gauge. These are created with the `guage()` function from the `flexdashboard` package. This requires numeric values for a `value`, a `min`, and a `max` argument. Optionally, a symbol can also be added with the `symbol` argument. The value argument does not have to be explicitly called though, which is also true of the `valueBox()` function.

Here is a simple example:

```
### GuageText

'''{r}
flexdashboard::gauge(value = 10, 
                       min = 0, 
                       max = 100, 
                    symbol = "%")

'''
```

This creates the following output:
```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "gauge_output.png"))
```

Here is a more complicated example:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "gauge.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/using.html#value_boxes)

## **Layout**
***

### **Adding Columns**
***

To add multiple columns the following syntax is used `---------` for each  column and nothing additional is required in the header.

Additional features about the columns, such as the width can be specified using brackets`{}`like in the example below. Note that the word `Column` isn't necessary. In this example two columns are created that will be oriented next to one another and elements within the columns will be placed top to bottom. 

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "columns.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)


### **Adding Rows**
***

To add multiple rows - the yaml needs to state that the orientation is for rows instead of for columns (see the image below),  and then the same syntax is used `---------`  for each row instead of columns. In this example, two rows are created that will be oriented on top of one another and elements within the rows will be placed next to each other.

Again the word `Row` is not actually necessary.

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "rows.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)

See [here](https://rmarkdown.rstudio.com/flexdashboard/layouts.html) for template options.


### **Tabs**
***

To add tabs columns/rows we can use the following: 

```
Column {.tabset}
```

In this example, two columns are created and then two tabs are added to the second column.

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "tab.png"))
```

##### [[source]](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)

### **Shiny**
***

Interactive elements can be added to dashboards. In our dashboard, we will use packages such as `DT` and `leaflet` that have shiny functionality. This requires that shiny is enabled in the YAML header by including `runtime:shiny` in the YAML.

Here is an example of a YAML that includes this:

```{r, echo = FALSE, out.width = "30%"}
knitr::include_graphics(here::here("img", "shiny_yml.png"))
```

***

## **Deployment**
***

You have a few options to publish your dashboard:  

1) If your dashboard is not interactive (does not use shiny) or uses certain widgets like the `datatable()` function from the `DT` package, you just need to knit your R Markdown file into an html file.

```{r, echo = FALSE, out.width="50%", fig.align="center"}
knitr::include_graphics(here::here("img", "knit.png"))
```

Then you can host this on GitHub if you choose by changing the GitHub Pages settings of your repository:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "githubpages.png"))
```

2) If your dashboard is interactive (uses shiny), you can host it on  https://www.shinyapps.io/ after making an account. To do this you need to install the `rsconnect` package and after you have made an account and configured it, you can use the publish button of the RStudio IDE which looks like this on the upper right corner:

```{r, echo = FALSE, out.width="40%", fig.align="center"}
knitr::include_graphics(here::here("img", "publish.png"))
```

Note that this also requires authentication using tokens. 

See this [link](https://shiny.rstudio.com/articles/shinyapps.html) for a getting start for this process.

3) You can also publish using [RStudio Connect](https://rstudio.com/products/connect/). This also involves creating an account and pushing the publish button. 


#### {.recall_code_question_block}
<b><u> Question Opportunity </u></b>

Let's take a minute to test your knowledge about `flexdashboard` basics:  

1) How do we create multiple pages?  
2) How do we create multiple columns?  
3) How do we create multiple tabs?  
4) How do we start creating a dashboard?  
5) How do we enable our dashboard to be interactive?  

***
<details> <summary> Click here to reveal the answers. </summary>

1) How do we create multiple pages? We use the `===` syntax.
2) How do we create multiple columns? We use the `---` syntax.
3) How do we create multiple tabs? We use `{.tabset}` syntax combined with the column break `---` syntax. 
4) How do we start creating a dashboard? We create an R Markdown document and we add `output: flexdashboard::flexdashboard` to the YAML.
5) How do we enable our dashboard to be interactive?  We add `runtime:shiny` to the YAML.

</details>

***

####



# **Our Dashboard**
***

OK! Now that we know a bit about the basics of creating a dashboard, let's create our own.

The link to the dashboard described in this section is located [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/).

We want to create a dashboard that has several tabs that will look like this:

```{r, echo = FALSE, fig.link = "https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/"}
knitr::include_graphics(here::here("img", "dashboard_school.png"))
```

## **Getting started**
***

The first thing we need to do to create our dashboard is to create a new .Rmd document like so in R Studio:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "newrmd.png"))
```


## **YAML header**
***

Next we need to update the YAML header to look like this:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "our_yaml.png"))
```


As you might expect, `title:` indicates the title of our dashboard.

The `output:` line specifies what type of output we want the .Rmd file to be rendered.

We need to include `flexdashboard::flex_dashboard:` as the output to create a dashboard with the `flexdashboard` package. This can be included on the same line as `output:` or on the next line with a preceding tab.

**Note**: that YAML is sensitive to spacing, thus this tab is required to get the proper output.

The next four lines are arguments for how the dashboard should be created.

1. `logo:` allows you to include a logo on top of your dashboard. With this theme this will be in the upper left corner. The logo we chose to use came from [here](https://iconarchive.com/), but you could theoretically use any png of appropriate size.

2. `theme:` allows you to specify how the dashboard will look in general. Note that this can be used to modify the general look of any type of R Markdown output, not just dashboards created with `flexdashboard`. See [here](https://www.datadreaming.org/post/r-markdown-theme-gallery/) for a list of options. In our case, the theme is called readable and will create documents that look like this:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "readable.png"))
```

##### [[source]](https://www.datadreaming.org/post/r-markdown-theme-gallery/)

3. `orientation:` the options are `columns` or `rows` and specifies if the `--------` syntax creates rows or columns for the layout. This is not necessary if the option is `columns`.

4. `source_code:` specifies if a URL will be included as a navigation bar item with access to the source code. 

5. `vertical_layout:` The options are `fill` or `scroll`. Fill causes the charts to re-size to fill the page, while the scroll option renders plots as their natural height which may or may not require scrolling the page.

There are many other argument options for how the dashboard is displayed.

You can run the following command in the console to see more information about the arguments in the help pane of the R Studio [IDE](https://en.wikipedia.org/wiki/Integrated_development_environment).

```{r, eval=FALSE}
?flexdashboard::flex_dashboard()
```

Also see the CRAN [documentation](https://cran.r-project.org/web/packages/flexdashboard/flexdashboard.pdf) for more details.

## **Loading the packages and data**
***

Since we are creating our dashboard in a new Rmd file, we need to load the necessary packages and the wrangled data that we created in this Rmd file. In that Rmd file, it looks something like this. 

**Note**: all the following code would be **added to the Rmd file for the dashboard** and are simply shown here for illustrative purposes.

```{r,eval = FALSE}
library(here)
library(readr)
library(dplyr)
library(flexdashboard)
library(shiny)
library(magrittr)
library(forcats)
library(stringr)
library(waffle)
library(tidyr)
library(poliscidata)
library(leaflet)
library(htmltools)
library(DT)
```

For more information about what these packages were used for, see the beginning of this case study and the [Helpful Links] section. The data can be found and downloaded from our GitHub repository at this [link](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/tree/master/data/wrangled). In our case we saved this to a subdirectory called `wrangled` within a directory called `data` of our working directory. We recommend using RStudio projects and the `here` package to make navigating to files easy and reproducible.

```{r, eval = FALSE}
shooting_data <- 
  read_csv(here("data", "wrangled",
                "shooting_data_wrangled_pre_map.csv"))

shooting_data_for_map <- 
  read_csv(here("data", "wrangled",
                "shooting_data_wrangled_for_map.csv"))
```


## **Creating pages**
***

Recall that `===` is used to designate elements that are part of the navigation bar.

We want 7 items besides the source code (which was added automatically based on the YAML code).

First, we create 7 divisions for these main pages. We add icons to each from [Font Awesome](https://fontawesome.com/icons?d=gallery).

Use this [link](https://fontawesome.com/icons?d=gallery) to find other icon options. If you click on the "start using this icon" button it will take you to a page with HTML code like this:  

```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "fontawesome.png"))
```

#### [[source]](https://fontawesome.com/icons/database?style=solid)

Only the `fa-database` portion is required in the brackets after `data-icon=` to add the icon to the navigation bar.

```

About {data-icon="fa-question-circle"}
====================================

The Data {data-icon="fa-database"}
===================================== 

US Statistics {data-icon="fa-flag"}
=====================================

State Statistics {data-icon=fa-flag-checkered}
====================================

Map {data-icon="fa-map"}
====================================

Tutorial {.storyboard data-icon="fa-list-ol"}
====================================

Hotline {data-icon="fa-exclamation-triangle"}
====================================

```

## **The About Page**
***

Here, we create content in the About page.

### **Look**
***

This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "aboutpagelook.png"))
```

### **Overall Structure**
***

Here is the overall structure for this page:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "about_page_structure.png"))
```

### **Details**
***
***
<details> <summary> Click here if you would like to see all of the code for this page. </summary>

On this page we will have two columns - one which will be wider than the other.  Size specifications on `flexdashboard` are unit-less; the width of any column included on a page is a function of the width set for a column against the sum of widths for all columns on that page. If we set columns sizes of 600 and 300 on a page with two columns, one column will be twice as large as the other column. We want the left column to be quite a bit larger than the right, so we will set the left as `70` and the right as `30`.


#### {.recall_code_question_block}
<b><u> Question Opportunity </u></b>

Can you recall how we would make these columns?

***
<details> <summary> Click here to reveal the code. </summary>



```
About {data-icon="fa-question-circle"}
===================================== 

Column {data-width = 70}
-------------------------------------

###

Column {data-width = 30}
-------------------------------------

###

```

</details>

***

####

Recall that `###` is used to add elements to columns and rows. Note that there is no text next to the `###` syntax that designates an element of our dashboard. In the previous examples, a header was used like so `### header`:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "tab.png"))
```

[[source](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)]


We do not actually want a header now, so we can simply use `###` without any text following it. Note that you can get away with not using the `###`, but some elements will not render properly. 

Next, we add a block of text describing the dashboard to the first column and we will add an image to the second column like the following. Notice that two asterisks `**` around text makes them appear as bold and one `*` makes it appear as italic. See [this RStudio cheatsheet](https://rmarkdown.rstudio.com/lesson-15.html) for some basic Markdown syntax for stylizing text:

```{r, echo = FALSE}
knitr::include_graphics(here::here("img", "markdownsyntax.png"))
```

[[source](https://rmarkdown.rstudio.com/lesson-15.html)] 


This is what the code for this page looks like (notice that there is an internal link to the `Tutrial` page):

```
About {data-icon="fa-question-circle"}
===================================== 

Column {data-width=70}
-------------------------------------

### 

**What is the purpose of this dashboard?**

This dashboard has two purposes:

1. To illustrate trends in school shooting events in the United States
2. To demonstrate how to create a dashboard using `R`

**The data**

This dashboard uses data from the open-source [K-12 Shool Shooting Database](https://www.chds.us/ssdb/dataset/) downloaded from the [Center for Homeland Defense and Security](https://www.chds.us/c/) at the at the [Naval Postgraduate School(NPS)](https://en.wikipedia.org/wiki/Naval_Postgraduate_School). This data was downloaded June of 2020.

<style>
div.green { background-color:#8FBC8F; border-radius: 5px; padding: 20px; font-size: 1em;color: white;}
</style>
<div class = "green">
Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, [www.chds.us/ssdb](www.chds.us/ssdb).
</div>


  
This database includes information about school shooting events for students in grades K-12 in the United States dating back to 1970. The database has additional information not shown on our dashboard including, but not limited to: location of the event at the school, source for the shooting information, shooter characteristics, and victim characteristics. 

### 


<u>**Want to learn how to create a dashboard just like this?**</u>

Visit the [*Tutorial*](#tutorial) page of this dashboard to first learn the basics about building a dashboard with the `flexdashboard` package.

At the end of the tutorial we provide a link to this [supplementary resource by the Open Case Studies project](https://opencasestudies.github.io/ocs-bp-school-shootings-dashboard/), which provides more detailed information about how ***this dashboard*** was created.

<style>
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px; font-size: .8 em;}
</style>
<div class = "blue">

 **Acknowledgements**

This was created as part of the [Open Case Studies](https://opencasestudies.github.io){target="_blank"} project. We would like to acknowledge the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/) for funding this work. 

 **Disclaimer**

This dashboard uses data from the [K-12 Shool Shooting Database](https://www.chds.us/ssdb/about/). We acknowledge (like their website) that there may be reporting errors. The trends and statistics shown do not account for the many other factors that may influence the occurrence of shooting events. The dashboard should not be used in the context of making policy decisions without external consultation from scientific experts. 


 **License**

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 [(CC BY-NC 3.0)](https://creativecommons.org/licenses/by-nc/3.0/us/){target="_blank"} United States License.
</div>

Column {data-width=30}
-------------------------------------

###


'''{r, echo=FALSE, fig.cap="[Photograph by Nathan Dumlao](https://unsplash.com/photos/xPHmmVKS8lM)"}
knitr::include_graphics(here::here("img", "nathan-dumlao-xPHmmVKS8lM-unsplash.jpg"))
'''
```

Note that we will use `"'''"` for to show code chunks of the actual code from the dashboard.

The image used in this second column is from a website called unsplash (https://unsplash.com/) which hosts images for free use but includes information about the photographer if you chose to credit them. A short link for this image was found by clicking on it and then clicking the share button.

Notice the `echo = FALSE` specification for the code chunk which causes the code to be evaluated but but not shown, while `fig.cap` adds the figure caption.

The image is included using the `include_graphics()` function from the `knitr` package. We need to specify where this image is located for this to work. You can do this without specifying a path if the image file is in the same directory as your `.Rmd` file that you are using to create your dashboard. However using the `here()` function from the `here` package we can organize our files a bit. This function will automatically start the path wherever we have included an RStudio project file, this can be done in RStudio like so:

```{r, echo=FALSE, out.width="60%"}
knitr::include_graphics(here::here("img" , "project.png"))
```

If you are new to using RStudio projects, please see this [link](https://r4ds.had.co.nz/workflow-projects.html) for more information.

Then if we create a directory or folder called `img` and place our image files in this directory, then we can specify the full path to this file on our computer, by just using `here::here("img", "name_of_image.png")`. The `include_graphics()` function works for a variety image file types. 

```{r, echo=FALSE, out.width="60%"}
knitr::include_graphics(here::here("img" , "newdir.png"))
```

Also you may have noticed the  `<style>` html code to add a blue and green background to portions of the text. 

The text that we want altered with this particular style is delineated by the `<div>` to start and the `</div>` to end the style. 

Let's take a look at the first one to explain what is happening here:

```
<style>
div.green { background-color:#8FBC8F; border-radius: 5px; padding: 20px; font-size: 1em; color: white;}
</style>
<div class = "green">
Riedman, David, and Desmond O’Neill. “CHDS – K-12 School Shooting Database.” Center for Homeland Defense and Security, June 2020, [www.chds.us/ssdb](www.chds.us/ssdb).
</div>
```

The instructions for the style are within the `<style>` and `</style>` content dividers. Inside these dividers is [CSS](https://developer.mozilla.org/en-US/docs/Web/CSS) code, which is what is used to stylize HTML. The `div.green` is the name of this particular style which involves a particular background color (#8FBC8F - see [here](https://www.w3schools.com/cssref/css_colors.asp) for more options), with a boarder radius of 5 pixel to round the edges of the background color around the text with a size 5 pixel radius. The code also states that a [padding](https://developer.mozilla.org/en-US/docs/Web/CSS/padding) specification for the size of the margins of the text box around the text and it specifies that font should be of 1 [em units](https://www.w3.org/Style/Examples/007/units.en.html) (which stands for element - thus 1 unit relative to the size of the element)  and that the font should be white.  

The `div.green ` specifies that `green` is the name of this style, thus we can then use `<div class = green>` (called a [CSS selector](https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector)) to style the text this way. This can then be used again any time we want this style like so:

```
<div class = "green">

text 

</div>

```

See this [website](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/div) to learn more about HTML and CSS.

</details>

***

## **The Data Page (Interactive)**
***

Let's create a page about the data that we are using.

### **Look**
***


This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "The_Data_Page.png"))
```

### **Overall Structure**
***

Here is the overall structure for this page:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "Data_page_structure.png"))
```

### **Details**
***

***
<details> <summary> Click here to see the code for this page. </summary>
 
To create the structure for this page that will display the data, we have two columns, with the first one (on the left) wider than the other. Again we have a block of text in the column on the left like so:

```
The Data {data-icon="fa-database"}
===================================== 


Column {data-width=70}
-------------------------------------

###

The data used in this dashboard is from the [**Center for Homeland Defense and Security (CHDS)**](Center for Homeland Defense and Security (CHDS)) [**K-12 Shool Shooting Database**](https://www.chds.us/ssdb/about/). 

Their methods for identifying and authenticating incidents are outlined [here](https://www.chds.us/ssdb/methods/).

According to their website: 

*"The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. **All of the information is based on open-source information and 3rd party reporting... and may include reporting errors.**"*

***


Column {data-width=30}
-------------------------------------

###

```

Now we will add our `DT_table` to the first column. First, we need to include the code that we previously used to create the `DT_table` in our dashboard `.Rmd` file:

```{r, eval = FALSE}
DT_table <- shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)
```

We then include some code to render this interactive table in our dashboard. Since we have `shiny` enabled in our YAML header, we can use the `renderDataTable()` of the `DT` package to produce the output we desire. 

We also want to use the `options` argument to specify how the data is rendered. The `scroller = TRUE` argument adds a scroll bar to the table, the `scrollY` argument specifies that the scroll bar should be for they Y axis direction (up and down) of the table and specifies how large the scroller should be, the `pageLength` argument specifies how many rows should be displayed simultaneously within the table, and the `autoWidth = TRUE` argument specifies that the table should fit the space of the column or page it is within. 

We will also add a caption with a link to the original data using the `tags()` and `withTags()` functions of the `htmltools` package. Different options for types of tags can be selected using the `$`.

```{r, echo = FALSE, out.width="60%"}
knitr::include_graphics(here::here("img", "tags.png"))
```

```{r, eval=FALSE}
DT::renderDataTable({
  DT::datatable(DT_table,
                caption = htmltools::tags$caption(
                  style = 'caption-side: top; text-align: Left;',
                  htmltools::withTags(
                    div(HTML('<a href="https://www.chds.us/ssdb/about/)">Click here to be redirected to a page where this data can be downloaded.</a>')))),
                  options = list(autoWidth = TRUE,
                                 pageLength = 10,
                                 scroller = TRUE,
                                 scrollY = '450px'))
})
```

We will also add another image to the column on the right, overall the code looks like this:

```

The Data {data-icon="fa-database"}
===================================== 

Column {data-width=70}
-------------------------------------

###

The data used in this dashboard is from the [**Center for Homeland Defense and Security (CHDS)**](Center for Homeland Defense and Security (CHDS)) [**K-12 Shool Shooting Database**](https://www.chds.us/ssdb/about/). 

Their methods for identifying and authenticating incidents are outlined [here](https://www.chds.us/ssdb/methods/).

Previously, according to their website: 

*"The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. **All of the information is based on open-source information and 3rd party reporting... and may include reporting errors.**"*

***

'''{r, echo=FALSE}
# Create the DT table first
DT_table <- shooting_data %>%
  dplyr::select(Date,
                School,
                City,
                State,
                `Killed (includes shooter)`,
                `Narrative (Detailed Summary/ Background)`) %>%
  rename("Deaths" = `Killed (includes shooter)`) %>%
  rename("Narrative" = `Narrative (Detailed Summary/ Background)`)
# Instead of depending on the st_jitter algorithm to generate random placement, a custom function placing the points side by side at a set distance could be used to make points occuring at the same location appear neatly apart.
'''

'''{r, echo=FALSE}
DT::renderDataTable({
  DT::datatable(DT_table,
                caption = htmltools::tags$caption(
                  style = 'caption-side: top; text-align: Left;',
                  htmltools::withTags(
                    div(HTML('<a href="https://www.chds.us/ssdb/about/)">Click here to be redirected to a page where this data can be downloaded.</a>')))),
                options = list(autoWidth = TRUE,
                               pageLength = 10,
                               scroller = TRUE,
                               scrollY = '450px'))
})
'''

Column {data-width=30}
-------------------------------------

###

'''{r, echo=FALSE, fig.cap="[Photograph by Rubén Rodriguez](https://unsplash.com/photos/IXTvnOOSTyU)"}
knitr::include_graphics(here::here("img", "ruben-rodriguez-IXTvnOOSTyU-unsplash.jpg"))
'''
```

</details>

***

## **The US Statistics Page**
***

Let's create a page for **US Statistics** we would like to share. 

### **Look**
***
This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "US_Statistics_page.png"))
```

### **Overall Structure**
***

Here is the overall structure for this page which uses a tab layout:

```{r, echo=FALSE, out.width="90%"}
knitr::include_graphics(here::here("img" , "US_stats_page_overview.png"))
```



### **Details**
***

***
<details> <summary> Click here to see the code for this page. </summary>

Here we use the `.tabset` and `.tabset-fade` options specified for our first column. 

```
US Statistics {data-icon="fa-flag"}
===================================== 


Column {data-width=70 .tabset .tabset-fade}
-------------------------------------
```

After having specified the `.tabset` and `.tabset-fade` options, we can create new tabs in the same way we would add elements to our dashboard with the `###` syntax. Just like in this example:

```{r, echo=FALSE, out.width="60%"}
knitr::include_graphics(here::here("img" , "tab.png"))
```

[[source](https://rmarkdown.rstudio.com/flexdashboard/layouts.html)]


Let's make a tab for yearly school shooting events and deaths, a tab for cumulative school shooting events and deaths, and a tab about the number of deaths per school shooting. In each tab, we will include the code for the plots that we have previously created. 

```
US Statistics {data-icon="fa-flag"}
===================================== 

Column {data-width=700 .tabset .tabset-fade}
-------------------------------------

### Yearly Deaths and Shootings

'''{r}

start <- 1970
end <- 2020

shootings_per_year<- shooting_data %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

deaths_per_year<-shooting_data %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))


per_year<-full_join(shootings_per_year, deaths_per_year)
per_year %<>%pivot_longer( cols = (-Date_year), 
                           values_to = "events", 
                           names_to = "id")

per_year%<>% 
  mutate(id = forcats::fct_inorder(id))

per_year %>%
    ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    scale_y_continuous(breaks = seq(0, 120, by = 30),
                 labels = seq(0, 120, by = 30),
                 limits = c(0, 121))+
    theme_minimal() +
   labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year")+
  scale_fill_manual(values = c("black", "black"))+
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))


theme_dashboard <- function(){ 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        #title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 90, face = "bold"),
        axis.title.x = element_text(face = "bold", size = 14),
        strip.background = element_blank(),
        strip.placement = "outside",
        strip.text = element_text(face = "bold", size = 14))
}
'''

### Yearly Cumulative Deaths and Shootings

'''{r}
shootings_per_year_cum <- 
    shootings_per_year %>%
    mutate(Shootings = cumsum(Shootings))

deaths_per_year_cum <- 
    deaths_per_year %>%
    mutate(Deaths = cumsum(Deaths))

per_year_cum <- full_join(shootings_per_year_cum, deaths_per_year_cum)

per_year_cum %<>% 
  pivot_longer(cols = c(Shootings, Deaths ), 
               values_to = "events", 
               names_to = "id")

per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
  ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (cumulative # of events)", 
                                        Deaths = "Deaths(cumulative # of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                       labels = seq(start, end, by = 5),
                       limits = c(start-1, end+1)) +
    scale_fill_manual(values = c("black", "black")) +
    theme_minimal() +
    labs(title = "Cumulative Yearly Shootings and Deaths Attributable to\nSchool Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year") +
    theme_dashboard() 
'''

### Deaths Per Shooting

'''{r}
deaths_perc_event <- 
   shooting_data %>%
   count(`Killed (includes shooter)`) %>%
   rename("num_events"= n) %>%
   mutate(percent = round(num_events/sum(num_events)*100, digits =1))

greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

deaths_perc_event %<>% bind_rows(greater_than4)

deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, " deaths ", "\n(", percent, "%)")) 

deaths_perc_event %<>% 
  mutate(category = case_when(
    category ==  last(pull(deaths_perc_event, category)) ~ paste0("4+ deaths ", "\n(", percent, "%)"),
    category == "1 deaths" ~ "1 death",
    TRUE ~ category))

deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                    values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title = "Deaths Per School Shooting", 
       xlab="1 square ~ 1%")+  scale_fill_viridis_d()

'''
```

In the second column, we will include what are called value boxes to contain statistics that will remain static as the user moves through the tabs of the first column.

```
Column {data-width=30}
------------------------------------- 
```

We want to display some important statistics, such as:

+ Total number of people wounded in a school shooting   
+ Total number of deaths from a school shooting  
+ Median number of shots fired  
+ Percentage of school shootings where the shooter was the only victim  
+ Percentage of school shootings where a single handgun was used  
+ Percentage of school shootings where the shooter was male

To create a value box we will use the `valueBox()` function from the `flexdashboard` package. The text for the the value box is specified by the text following the `###` syntax.

There are a few arguments to be aware of for this function:

1. `value` - this is the value to be displayed in the box - this usually a number, but might be text
2. `caption` - if desired, you can add text to be displayed under the value but keep in mind that you will also include text with the `###` syntax
3. `icon` - if you would like to add an icon you can specify it like so: `icon = fa-flag`
4. `color` - this changes the color of the box
5. `href` - if you would like to add a URL link you can do so with this argument

We can create a value box for the total number of people wounded as follows, where we use the base `sum()` function to calculate the sum of all the values for the `Wounded` variable which was extracted using the `pull()` function from the `dplyr` package. We need to remove `NA` values to be able to calculate the sum and we can do this using the `na.rm = TRUE` argument. 


```
Column {data-width=300}
------------------------------------- 


### **Total Wounded**
    
'''{r}
valueBox(value = sum(pull(shooting_data, Wounded), na.rm = TRUE),
         color = "white")
'''
    
### **Total Deaths**

'''{r}
valueBox(value = sum(pull(
  shooting_data,`Killed (includes shooter)`), na.rm = TRUE),
         color = "white")
'''

```

To calculate the percentage of school shootings where the shooter committed or attempted suicide, we will use our calculation which was explained in the [Data Analysis and Visualization] section. The `paste0` function is used to add the percentage symbol.

```

### **Shooter committed or attempted suicide**

'''{r}

suicide <- (sum(pull(shooting_data,`Suicide (or attempted suicide) by Shooter (Y/N)`), na.rm = TRUE) /
            sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE))*100
suicide <- round(suicide, 1)

reporting_suic <- (sum(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Suicide (or attempted suicide) by Shooter (Y/N)`)))*100
reporting_suic <- round(reporting_suic, 1)

valueBox(value = paste0(suicide,"%"), 
         color = "white")
'''

```

For the value box of the percentage of school shootings where a single handgun was used was calculated by using the `case_when()` function to specify all cases where the `Firearm Type` variable was equal to `"Handgun"` as `TRUE` and all others as `FALSE`. This allows us to use the base `sum()` function as `TRUE` values will be counted as a value of `1` and `FALSE` values will be counted as a value of `0`. This sum was then divided by the total number of school shooting events by getting the length of the `Firearm Type` variable using the base `length()` function. The next value box about the gender of the shooter was calculated in a similar manner. 

```
    
### **Use of a Single Handgun**

'''{r}

handgun <-paste(as.character(round(100 *(sum(case_when(
      pull(shooting_data,`Firearm Type`) == "Handgun" ~ TRUE,
                                                 TRUE ~ FALSE), na.rm = TRUE)
    /
      sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_gun <- (sum(pull(shooting_data, `Firearm Type`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Firearm Type`)))*100
reporting_gun <- round(reporting_gun, 1)


valueBox(value = handgun,
  color = "white")

'''

### **Shooter Was Male**
'''{r}


gender <- paste(as.character(round(100 * (sum(
    case_when(pull(shooting_data,`Shooter Gender`) == "Male" ~ TRUE,
                                                        TRUE ~ FALSE),
                                      na.rm = TRUE)
    /
      sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)),
    1)), "%")

reporting_male <- (sum(pull(shooting_data, `Shooter Gender`)>=0, na.rm = TRUE)/
              length(pull(shooting_data, `Shooter Gender`)))*100
reporting_male <- round(reporting_male, 1)


valueBox(value = paste(gender),
  color = "white")
'''
```

Additional text about the reporting rate for these statistics was added using the `###` syntax. Additionally inline code is evaluated using the notation `"`r `"` Again notice that `"'"` was used instead of `"`"` just for illustrative purposes to allow this R Markdown document to render the code from the dashboard file.

```

###

reporting rate of shooter suicide = 'r reporting_suic'%,  
reporting rate of gun type = 'r reporting_gun'%,  
reporting rate of shooter gender = 'r reporting_male'%

```

</details> 

***

## **The State Statistics Page (Interactive)**
***

Let's create a page for **State Statistics** we would like to share. Importantly this page allows for the user to choose what state to look at.

### **Look**
***

This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "State_Statistics_page.png"))
```

### **Overall Structure**
***


Here is the overall structure for this page:

```{r, echo=FALSE, out.width="90%"}
knitr::include_graphics(here::here("img" , "states_stats_page_overview.png"))
```

**Note**: the other value Boxes are not included in this image. You can see that the `renderPlot()` function is used for plots and the `renderValueBox()` function is used for value boxes. 

### **Details**
***

On this page we want the user to be able to select data for a specific state and render plots and get statistics just for the selected state. To do this we will utilize the `renderPlot()` and `renderValueBox()` functions of the `flexdashboard` package, as well as the `selectInput()` function from the `shiny` package. See this [website](https://rmarkdown.rstudio.com/flexdashboard/shiny.html) for more information on using `shiny` to create interactive dashboards with `flexdashboard`.

***
<details> <summary> Click here to see the code for this page. </summary>

The first thing we need to do to allow this page to be interactive is to add `runtime: shiny` to the YAML header at the top of the R Markdown file.

The next thing we want to do is add the `{.sidebar}` attribute to the first column of this page. This allows us to use `shiny` input functions in this column.

Then, we use the `selectInput()` function to create a menu for the user to interact with and add it to this column.

Finally, we use the `renderPlot()` function  and `renderValueBox()` function to use the input from the user to render plots and value boxes based on their input.

The `selectInput()` function allows us to provide the user with a pull down menu of options for states. The main arguments for this function are:

1. `inputId` - this is what the selection will be called in subsequent code
2. `label` - this is what the user sees above the pull down menu
3. `choices` - this is a list of options for the menu
4. `selected` - this causes a particular option to be the default choice

This is placed in a column on the far left side that is more narrow than the others. 

```
State Statistics {data-icon=fa-flag-checkered}
===================================== 

Column {.sidebar data-width=250}
-----------------------------------------------------------------------

Note that the statistics shown do not account for other possibly influential state specific features like population density or gun laws among others.


'''{r}
  
selectInput(inputId = "state_selected", 
            label = "Select a state to explore:",
            choices = shooting_data %>% 
            pull(State) %>% 
            unique() %>%
            sort(), selected = "Alabama")

#  Washington, D.C. gets excluded by this
'''
```

Note that we used the `unique()` function to select only unique values of the `State` variable of the `shooting_data` tibble. The `sort()` function was used to put the options in alphabetical order.

In the next column, we have our plots like we did on the last page. Again we will use `tabset`. However, the difference here is that we need to include the `renderPlot()` function around all of our code for each plot and we need to use the data that the user selected. 

This will automatically be in a data object called `input` and it will be within a variable called `state_selected"` based on what we used for the `inputID` in the `select_Input()` function (this requires the base R way of selecting a specific variable using the `$`). 

Notice that the `renderPlot()` function requires that the code be within brackets `{}`. The data is filtered first for just the state that was selected. The code for the plots is essentially the same with minor modifications to allow for all unique cases that the different states present. For example the `deaths_perc_event %<>%filter (!duplicated(category))` is added to the last plot about the number of deaths per school shooting to avoid duplication of the rows in cases like Colorado where the there is only one event that had 4 or more deaths (because in the other cases this value is a sum of all school shooting with 4 or more deaths). 

It's always good to check as many possible input values as possible to make sure that your plot shows up as you expect!

```
Column {data-width=750 .tabset .tabset-fade}
-----------------------------------------------------------------------

### Yearly Deaths and Shootings

'''{r}
renderPlot({
shooting_data_state <- shooting_data %>% filter(State == input$state_selected)

shootings_per_year<- shooting_data_state  %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

deaths_per_year<-shooting_data_state  %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))


per_year <- full_join(shootings_per_year, deaths_per_year)
per_year %<>% pivot_longer(cols = (-Date_year), 
                           values_to = "events", 
                           names_to = "id")

per_year %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year %<>%
    ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_wrap(~id, scales = "free", 
               labeller = as_labeller(c(Shootings = "Shootings (# of events)", 
                                        Deaths = "Deaths (# of people)")), 
               strip.position = "left")+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    theme_minimal() +
  scale_fill_manual(values = c("black", "black"))+
    labs(title = "Yearly Shootings and Deaths Attributable to School Shootings",
         subtitle = "United States",
         y = NULL,
         x = "Year") +
    theme_dashboard()+
    theme(title = element_text(size = 16, face = "bold"),
          axis.text = element_text(size = 14))
})
'''

### Yearly Cumulative Deaths and Shootings

'''{r}

renderPlot({

shooting_data_state <- shooting_data %>% filter(State == input$state_selected)

shootings_per_year<- shooting_data_state  %>%
    group_by(Date_year) %>%
    count() %>%
  rename("Shootings" = n) %>%
    ungroup()

shootings_per_year_cum <- 
  shootings_per_year %>%
  mutate(Shootings = cumsum(Shootings))

deaths_per_year<-shooting_data_state  %>% 
  group_by(Date_year) %>%
  summarize(Deaths =sum(`Killed (includes shooter)`))

deaths_per_year_cum <- 
  deaths_per_year %>%
  mutate(Deaths = cumsum(Deaths))

per_year_cum <- full_join(shootings_per_year_cum, deaths_per_year_cum)


per_year_cum %<>% 
  pivot_longer(cols = c(Shootings, Deaths ), 
               values_to = "events", 
                names_to = "id")
                
per_year_cum %<>% 
  mutate(id = forcats::fct_inorder(id))

per_year_cum %>%
ggplot(aes(x = Date_year, y = events, fill =id)) +
    geom_col()+
    facet_grid(~id)+
    scale_x_continuous(breaks = seq(start, end, by = 5),
                 labels = seq(start, end, by = 5),
                 limits = c(start-1, end+1)) +
    scale_fill_manual(values = c("black", "black"))+
    theme_minimal() +
    labs(title = "Cumulative Yearly Shootings and Deaths\nAttributable to School Shootings",
         subtitle = input$state_selected,
         y = "Cumulative number of events",
         x = "Year") +
    theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text.x = element_text(angle = 90),
        strip.background = element_rect(fill="cornflowerblue"),
        strip.text = element_text(colour = 'white', face = "bold", size = 14))

})

'''

### Deaths Per Shooting

'''{r}

renderPlot({

shooting_data_state <- shooting_data %>% filter(State == input$state_selected)
library(tidyr)
deaths_perc_event <-shooting_data_state %>%
   count(`Killed (includes shooter)`) %>%
   rename("num_events"= n) %>%
     tidyr::drop_na() %>%
   mutate(percent = round(num_events/sum(num_events)*100, digits =1))

greater_than4 <- 
  deaths_perc_event %>% 
  filter(`Killed (includes shooter)` >= 4) %>% 
  colSums()

deaths_perc_event %<>% bind_rows(greater_than4)

deaths_perc_event %<>% 
  mutate(category = paste0(`Killed (includes shooter)`, " deaths ", "\n(", percent, "%)")) 

deaths_perc_event %<>% 
  mutate(category = case_when(
    category ==  last(pull(deaths_perc_event, category)) ~ paste0("4+ deaths ", "\n(", percent, "%)"),
    category == "1 deaths" ~ "1 death",
    TRUE ~ category))

deaths_perc_event %<>% 
  filter (!duplicated(category))

deaths_perc_event %>% 
  select(-`Killed (includes shooter)`) %>%
  filter(str_detect(category, "0 deaths|1 death|2 deaths|3 deaths|4\\+")) %>%
  mutate(percent = round(percent)) %>%
  select(-num_events) %>%
  tidyr::pivot_wider(names_from = category, 
                    values_from = percent) %>%
  waffle::waffle(legend_pos = "bottom", title = "Deaths Per School Shooting", 
       xlab="1 square ~ 1%")+  scale_fill_viridis_d()

})

'''
```

In the third column, the state specific statistics are displayed. Some of these are static, while others update for the state selected. To calculate some of these we will also use data form the `poliscidata` function to get the state population values in 2010. The `pop2010_hun_thou` variable is the population in terms of 100,000 people.  

```
Column {data-width=450}
-----------------------------------------------------------------------

### **Total State Deaths**

'''{r}
renderValueBox({
shooting_data_state <- shooting_data %>% filter(State == input$state_selected)


valueBox(sum(pull(shooting_data_state,`Killed (includes shooter)`), na.rm = TRUE),
         color = "white")
})
'''

### **US State Average Death Count**

'''{r}
shooting_data_state <-shooting_data %>% 
  group_by(State_abb, State) %>%
  count(na.rm = TRUE) %>%
  rename(shootings = n) %>%
  ungroup() %>%
  mutate(state_sum = sum(shootings)) %>%
  mutate(state_avg = state_sum/50)

state_data <- poliscidata::states
state_data %<>%
  select(stateid, pop2010, pop2010_hun_thou) %>%
  mutate(stateid = as.character(stateid))%>%
  mutate(stateid = str_remove_all(stateid, pattern = " "))

shooting_data_state<-left_join(shooting_data_state, state_data, by = c("State_abb" = "stateid"))

deaths_State <-shooting_data %>% 
  group_by(State) %>%
  summarize( deaths = sum(`Killed (includes shooter)`, na.rm = TRUE))

state_data <- left_join(shooting_data_state, deaths_State)

USavg <- round(mean(pull(state_data, deaths), na.rm = TRUE), 2)
valueBox(USavg, color = "white")
'''


### **State Death Rate (per 100,000 people)**

'''{r}
state_data %<>%
  mutate(percapita_deaths  = deaths/pop2010_hun_thou)

renderValueBox({
  
  shooting_data_state <- state_data %>% filter(State == input$state_selected)

  valueBox(format(round(pull(shooting_data_state, percapita_deaths), digits = 3), nsmall = 3),
         color = "white")
})


'''

### **US National Death Rate (per 100,000 people)**

'''{r}
renderValueBox({

 US_percap <-summarize(state_data, sum(deaths, na.rm = TRUE))/ (summarize(state_data,sum(pop2010, na.rm = TRUE)) /100000)
 
valueBox(value = round(US_percap, digits = 3),
         color = "white")
})
'''

### **State Shooting Rate (per 100,000 people)**

'''{r}

state_data %<>%
  mutate(percapita_shootings  = shootings/pop2010_hun_thou)

renderValueBox({
  
  shooting_data_state <- state_data %>% filter(State == input$state_selected)

valueBox(format(round(pull(shooting_data_state, percapita_shootings), digits = 3), nsmall = 3),
         color = "white")
})


'''

### **US National Shooting Rate (per 100,000 people)**

'''{r}
renderValueBox({

 US_percap <-summarize(state_data, sum(shootings, na.rm = TRUE))/ (summarize(state_data,sum(pop2010, na.rm = TRUE)) /100000)
 
valueBox(value = round(US_percap, digits = 3),
         color = "white")
})
'''

###

Per capita calculations are based on 2010 population values.
```

</details>

***

##  **The Map page (Interactive)**
***

Next, we create our map page. Previously, in the  [**Data Exploration and Wrangling**] section, we geocoded our data and modified the `latitude` and `longitude` variables so that events that occurred in the same location would have slightly different values so that they will not cover one another in our map.

To create our map, we will use the `leaflet` package which uses the [Leaflet](https://leafletjs.com/)  JavaScript library. 

### **Leaflet**
***

`Leaflet` works by provided by adding base data (such as a map) and then adding markers if desired in layers. This is very similar to how `ggplot2` functions (pun intended).

The layers displayed can be controlled using a sort of legend. Depending on the type of layers, some information may be displayed mutually exclusive of the other layers; other layers (such as circles/general markers) can be toggled on and off. 

Clustering options can also be applied to circles/markers. Some examples of this can be found on the bottom of [this website](https://rstudio.github.io/leaflet/markers.html).

The `groups` in leaflet can be thought of as layer-specific IDs that create labels for legends and allow specific layers to be referred to in separate functions. 

Thus, if we called a group "Layer 1" and then in a subsequent layer refer to "Layer 1", `leaflet` will correctly identify which layer is being referenced.

Note that `leaflet` can require a lot of computational power depending on the types of maps produced.

### **Look**
***

This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "The_Map_Page.png"))
```

### **Overall Structure**
***

The overall structure for this page is simple. There is just one column ,which will contain the map.

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "map_page_structure.png"))
```

### **Details**
***

***
<details> <summary> Click here to see the code for this page. </summary>

First, we create a smaller dataset that just includes the data that we want to use in the map. We will include the date, the name of the school and the narrative for each point as a [popup](https://rstudio.github.io/leaflet/popups.html) that will be shown when the user hovers over a point. 

We need to do this using [HTML](https://developer.mozilla.org/en-US/docs/Web/HTML) code as the `leaflet` package will ultimately render the map using this language.

We use the `paste()` function to combine these elements as well as HTML code to create line breaks and bold the name of the school.

To create line breaks in HTML, the `<br>` syntax is used. This is used to separate each part of the elements that are getting pasted together with the base `paste()` function by being specified as the separator with the `sep` argument.

To create bold font in HTML, the text is surrounded by `<b>` and `</b>` like so: `<b> Bold text </b>`. Thus only the school name is in bold.

Finally, the `<div>` and `</div>` are content dividers in HTML.  They separate the individual school shooting event information sections that will be plotted on the map. The first divider can also take information about the style of the output. This uses [CSS](https://developer.mozilla.org/en-US/docs/Web/CSS) code, which is what is used to stylize HTML. 

The code here states that the height of the text box for each event should have a height that is proportional to the text, that the [height of each line](https://developer.mozilla.org/en-US/docs/Web/CSS/line-height) should be of 1 [em units](https://www.w3.org/Style/Examples/007/units.en.html) (em stands for element). Hence, 1 unit relative to the size of the element. Therefore gaps between lines are the same height as the lines of text. The `overflow:visible` code specifies what to do in case the text box text is too large - in this case users can scroll (see [here](https://developer.mozilla.org/en-US/docs/Web/CSS/overflow) for more options), and the [padding](https://developer.mozilla.org/en-US/docs/Web/CSS/padding) specification sets the size of the margins of the text box around the text.  

See this [website](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/div) to learn more about HTML code.

```
Map {data-icon="fa-map"}
===================================== 
Column
------------------------------------- 
    
### 

This map shows where school shootings took place in the United States between January 1970 to June 2020 according to the the open-source [Center for Homeland Defense and Security](https://www.chds.us/c/) (CHDS) [K-12 School Shooting Database](https://www.chds.us/ssdb/dataset/). Click the circles for more information.

    
'''{r}
# specify the popups

shooting_information0 <- paste('<div style="height:auto;line-height:1em;overflow:scroll;padding:1em">',
                              shooting_data_geocoded$Date,
                              "<b>",
                              shooting_data_geocoded$School,
                              "</b>",
                              shooting_data_geocoded$`Narrative (Detailed Summary/ Background)`,
                              "</div>",
                              sep = "<br>")


'''
```

The next bit of code then uses this data and the `shooting_data_geocded` to actually create the map!

The `leaflet()` function from the `leaflet` package creates a Leaflet map [widget](https://en.wikipedia.org/wiki/Web_widget) using the  [`htmlwidgets`](https://cran.r-project.org/web/packages/htmlwidgets/index.html) package, which allows the map to be rendered as an application within HTML websites.

This first line of code starts the process of making the widget, but just like the `ggplot()` function from `ggplot2` it creates an empty map and layers need to be added.

The `addProviderTiles()` function from the `leaflet()` package does just that, by adding the map background. We will add three different kinds of map backgrounds. See [here](http://leaflet-extras.github.io/leaflet-providers/preview/index.html) for all the options of providers which create a variety of distinct backgrounds and then the `group` argument names each of these layers to be referred to later. The last layer added will be the one shown by default.

At this point we still only have a map in general. Now we need to add the data about school shooting events. 

To do this, we add markers to the plot using the `addCircleMarkers()` function. This function takes many different arguments. See details about them [here](https://cran.r-project.org/web/packages/leaflet/leaflet.pdf). 

Importantly, we need to specify what variables in our provided data `shooting_data_geocoded` contains the longitude values (`lng`) and the latitude values (`lat`). 

We will also use the following arguments:  

- `radius` - argument specifies how large the circles for the points will be  
- `color` - argument specifies the color of the individual points  
- `fillOpacity` - argument allows for the filling of the points to a bit translucent if set below 1  
- `clusterOptions` - argument can be used to cluster points together into larger circles  
- `group` - argument specifies what the points should be called in the legend and what this layer should be referred to as for later use  

We also add a mini map using the `addMiniMap()` function, which can be useful to see where you are on the map. The type of plot style to use for the mini map is specified with the `tiles` argument and the `toggleDisplay` argument allows for the user to remove this feature.

Importantly, the `addLayersControl()` function  allows users to toggle between different backgrounds and markers. In our case we have three different background layers which are referred to as `baseGroups` and we have one `overlayGroups` which is our circle markers for school shooting events. The group names for these need to be identified to allow users to toggle between them. 

The `set_view()` function allows for the starting position and zoom to be modified. This allows us to center the map around the continental US.

```
'''{r}
leaflet(shooting_data_for_map) %>%
  addProviderTiles(provider = providers$OpenStreetMap, group = "OpenStreetMap") %>%
  addProviderTiles(provider = providers$Esri.WorldImagery, group = "ESRI World Imagery") %>%
  addProviderTiles(provider = providers$Stamen.TonerLite, group = "Toner")%>%
  addCircleMarkers(popup = ~shooting_information0,
                     lng = ~longitude,
                     lat = ~latitude,
     radius = 5,
     color = "red",
     fillOpacity = 0.2,
     clusterOptions = markerClusterOptions(),
     group = "Circles") %>%
  addMiniMap(tiles = providers$Stamen.Toner,
              toggleDisplay = TRUE) %>%
  addLayersControl(
     baseGroups = c("Toner Lite",
                    "OpenStreetMap",
                    "ESRI World Imagery"),
     overlayGroups = c("Circles")) %>%
   setView(lng = -98.35, lat = 39.5, zoom = 4)
'''
```

</details>
 
***
 

## **The Tutorial Page**
***

Here, we create a **Tutorial** page that links to this case study. This provides a simple overview of how we created the dashboard. 

### **Look**
***

This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "the_Tutorial_Page.png"))
```


### **Overall Structure**
***

To create this page we will use a special layout called a [storyboard](https://en.wikipedia.org/wiki/Storyboard). Story boards are used in many other fields, but the idea is that there are multiple images in a sequence. To create our storyboard page with `flexdashboard` we will use `{.stroyboard}` next to the page name. Each page name will be specified using this syntax: `###`. 

Here you see the top part of the overall structure:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "tutorialStructure.png"))
```

### **Details**
***

The code for this page is similar to the other pages, except for the story board structure. 

***
<details> <summary> Click here to see the code for this page. </summary>

```
Tutorial {.storyboard data-icon="fa-list-ol"}
=========================================   

### **1)** Load the `flexdashboard` package.

Install the package (and other supporting optional packages) if you don't have them installed already.

'''{r, echo=TRUE, eval=FALSE}
install.packages("flexdashboard")
install.packages("shiny")
install.packages("leaflet")
install.packages("ggplot2")
'''

Once installed, load the package(s) into the `R` environment.

'''{r, echo=TRUE}
library(flexdashboard)
library(shiny)
library(leaflet)
library(ggplot2)
'''

This all needs to be done separately in the `R` console.

### **2)** Create an `RMD` document.   

Dashboards can be created with `flexdashboard` in the `HTML` format. 

The`flexdashboard` package uses `RMarkdown` to produce dashboards that can contain `R` output.

This makes it possible to include several mediums in dashboards such as plots created with `ggplot2` or maps created with `leaflet`.
    
### **3)** Create an appropriate `YAML`.

The use of `flexdashboard` alters the way R Markdown documents function. 

R Markdown documents can be rendered into many different outputs, one of which is a dashboard. 
The `YAML` header sets up how the document output should be created.

Here is an example of a `YAML` header that creates an `HTML` document from an R Markdown document:


---
title: "Untitled"
author: "John Smith"
date: "8/12/2020"
output: html_document
---


We used the following `YAML` for this dashboard, which importantly includes `flexdahsboard::flex_dashboard`which specifies that a dashboard should be created and `runtime:shiny` which allows for the dashboard to be interactive:

output: 
  flexdashboard::flex_dashboard:
    logo: https://icons.iconarchive.com/icons/icons8/windows-8/48/Programming-Dashboard-icon.png
    theme: readable
    orientation: columns
    source_code: embed
    vertical_layout: fill
runtime: shiny


We also introduced an icon as a logo, provided a theme with a color scheme, defined the orientation (and thus order) of coded output, added a navigation bar item to give users easy access to the code used, and  limited scrolling with the `verticle_layout: fill` option.

### **4)** Design the layout of the dashboard.

Dashboards are inherently visual, making this step the most time intensive after content creation. To goal is to present the data in a way that is both meaningful and visually appealing.

On this dashboard, we wanted to present static plots of the United States and of individual states. We also wanted to display the locations of school shootings and provide some information about school shootings. Aside from being a dashboard, we wanted to create an educational resource that was reproducible for others. Lastly, as this is a sensitive topic, we wanted to raise awareness and provide information that could help others act.

Given these goals, we decided on the following page layout:

+ About
+ The Data
+ US Statistics
+ State Statistics
+ Map
+ Tutorial
+ Get Help

The first page gives users to the opportunity to look at the data themselves. More complicated components such as the map of each incident were left alone on a single page. US and state-level statistics were separated from one another. This short tutorial on how to create the dashboard and source code were included in the dashboard with programmers at all levels in mind.

### **5)** Add content to the dashboard.

You can begin adding content to the dashboard once you have an initial layout in mind. Keep in mind that this will likely be an iterative process. 

The R Markdown file used to create a dashboard with `flexdashboard` works similarly as it does in other cases, with a few exceptions.

R code chunks can be defined like so:


'''{r, echo = TRUE}
# Code chunks can be explicitly included
'''

'''{r, echo = FALSE}

# Code chunks are hidden by default 
'''

Pages and columns within pages can be defined like so:


Page
=========================================   

Column {data-width=500}
-------------------------------------

Column {data-width=500}
-------------------------------------


### **6)** Add content to the pages and columns.

Plots and other elements can be added within columns like so:

### Plot name

'''{r}
# include plot code here
'''

Value Boxes, which are essentially text boxes, can be defined like so:


### ValueBoxText

'''{r}
valueBox(value = 10
  color = "white")



Gauges, can be defined like so:

### GaugeText

'''{r}
flexdashboard::gauge(value = 10, 
                       min = 0, 
                       max = 100, 
                    symbol = "%")

'''

####
Which will produce output like this:
'''{r, out.width= "40%", echo = FALSE}
knitr::include_graphics(here::here("img", "gauge_output.png"))
'''

### Additional Info
As mentioned before, the `flexdashboard` metadata included in the `YAML` also alters how R Markdown documents are rendered. For more on how you can leverage both the `RMarkdown` package and the `flexdashboard` package to produce a dashboard, click [here](https://rmarkdown.rstudio.com/flexdashboard/index.html).

This [supplementary resource by the Open Case Studies project](https://opencasestudies.github.io/ocs-bp-school-shootings-dashboard/) provides a case study on how to create this very dashboard in more detail.

```

</details>

***

## **The Get Help page**
***

We create a **Get Help** page to spread awareness on this important public health topic.

### **Look**
***

This is what the page will look like:

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "The_Get_Help_Page.png"))
```


### **Overall Structure**
***

```{r, echo=FALSE}
knitr::include_graphics(here::here("img" , "The_Get_Help_Page_structure.png"))
```


### **Details**
***

This page has two columns. The first column is much wider than the second. This first column includes two colored backgrounds which were created using [CSS](https://developer.mozilla.org/en-US/docs/Web/CSS) code. See [The About Page] Details section to for more details about how this works. 

The text that we want altered with this particular style is delineated by the <div> to start and the </div> to end the style.

The instructions for the style are within the <style> and </style> content dividers. 

Inside these dividers is CSS code, which is what is used to stylize HTML. The div.blue is the name of the first particular style which involves a particular background color (#e6f0ffF) (see [here](https://www.w3schools.com/cssref/css_colors.asp) for more color options), with a boarder radius of 5 pixels to round the edges of the background color around the text with a size 5 pixel radius. The code also states that a [padding](https://developer.mozilla.org/en-US/docs/Web/CSS/padding) specification for the size of the margins of the text box around the text and it specifies that font should be of 20 pixels.

The div.blue specifies that blue is the name of this style, thus we can then use <div class = blue> (called a [CSS selector](https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector)) to style the text this way. This can then be used again any time we want this style like so:

<div class = "blue">

text 

</div>

See this [website](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/div) to learn more about HTML and CSS.


We also see a list on this page, `+` signs are used to indicate new items. Importantly two spaces are necessary after each item to start a new line. 

The other unique aspect about this page are the telephone links like so `[+1-844-5-SAYNOW](tel:18445729669)`.

By using `tel:` and the number, users can click this link to directly call the telephone number from their computer or phone if their device has such capabilities. 


***
<details> <summary> Click here to see the code for this page. </summary>

```
Get Help {data-icon="fa-exclamation-triangle"}
=========================================   

Column {data-width=800}
-------------------------------------

###

**Warning Signs**

From [Sandy Hook Promise](https://www.sandyhookpromise.org/gun-violence/know-the-signs-of-gun-violence/)...

<style>
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
</style>
<div class = "blue">

Here is a list of potential warning signs that can signal an individual may be in crisis and/or need help:

+ Suddenly withdrawing from people and activities
+ Consistent bullying or intimidating others, or being bullied by others
+ Extreme mood or personality changes
+ Victim of constant social rejection
+ Talking about plans or actively making plans to harm themselves or others
+ Bringing a weapon to school – or threatening or talking about doing so
+ Bragging about or warning others about an upcoming act of violence
+ Recruiting others to join in a planned act of violence
+ Warning students to stay away from school or events
+ Expressing fascination with guns and/or school shootings
+ Expressing hopelessness about the future
+ Extreme, prolonged sadness or distress
+ Expressing or showing feelings of isolation
+ Bragging about access to guns

**This list is not a comprehensive list of warning signs nor does exhibiting one of these signs indicate imminent violence.**

According to the following article:

Flannery, D. J., Modzeleski, W. & Kretschmar, J. M. Violence and School Shootings. Curr Psychiatry Rep 15, 331 (2013). DOI: [10.1007/s11920-012-0331-6](https://doi.org/10.1007/s11920-012-0331-6)

"To date, studies of school shootings have concluded that no
consistent and reliable profile of school shooters exist, and
most researchers and clinicians would agree that predicting
violent behavior is a slippery slope that will usually result in
more false positives than false negatives."

"...most shooters were depressed, had experienced some significant
loss, felt persecuted or bullied by others, and had prior
difficulty coping or had previously tried suicide. Most of
the shooters did not, however, have a history of drug abuse
or violence or cruelty to animals, common psychiatric indicators of risk, nor did they report excessive exposure to
violence in the media (though many produced their own
violent themes in writings or drawings)."

</div>


<style>
div.red { background-color:#BC8F8F; border-radius: 5px; padding: 20px;}
</style>
<div class = "red">

According to the [National Institute of Mental Health (NIMH)](https://www.nimh.nih.gov/health/publications/teen-depression/index.shtml){target="_blank"}:

For youths who may be at risk for suicidal behavior, visit the **National Suicide Prevention Lifeline (NSPL)** website at [www.suicidepreventionlifeline.org](www.suicidepreventionlifeline.org){target="_blank"}.

Additionally, the **Crisis Text Line** is another free, confidential resource available 24 hours a day, seven days a week. Visit [www.crisistextline.org](www.crisistextline.org){target="_blank"} for more information.

Also see [here](https://www.mhanational.org/depression-teens-0){target="_blank"} for more information about how to recognize and help youths experiencing symptoms of depression and warning signs of suicide.

</div>


Column {data-width=200}
-------------------------------------

### 

**Respond to Warning Signs**

When concerned about troubling behaviors, tell a trusted adult.


Call **911** if you feel there is an immediate threat. 

Call [+1-844-5-SAYNOW](tel:18445729669) if you would like to submit an anonymous safety concern.

Text “HOME” to **741741** to text a trained crisis counselor 24 hours a day.

The **National Suicide Prevention Lifeline (NSPL)** is available 24 hours a day, every day at **[1-800-273-TALK (8255)](tel:18002738255)**. 

The deaf and hard of hearing can contact the **(NSPL)** via TTY at **[1-800-799-4889](tel:18007994889)**. All calls are confidential.
```

</details>

***

# **Summary**
*** 

## **Synopsis**
***

In this case study, we demonstrated the basics of R Markdown and how to create a dashboard with using the `flexdashboard` package. We also demonstrated how to include an interactive table with the `DT` package, how to include interactive plots using functions of the `shiny` package such as `renderPlot()`. We included interactive value boxes using the `renderValueBox()` function from the `flexdashboard` package, which works with the `shiny` package. Finally, we showed how to include interactive maps using the `leaflet` package. 

This case study also explored how to properly calculate and interpret percentages when the data has missing values. We also discussed the benefits and limiting aspects of pie charts (using the `ggplot2` package) and waffle plots (using the `waffle` package).

Overall, the dashboard we created which can be found [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/), 
shows that the number of school shootings per year has increased overtime. Further investigation is necessary to determine if this is simply due to increases in population alone or if the rate has increased due to other factors and if so, what those factors might be. It is also clear that the number of school shootings and the number of deaths per capita varies by state. There appears to be other aspects accounting for state differences. 

*Note the limitations of the dashboard in the [Limitations](https://www.opencasestudies.org/ocs-bp-school-shootings-dashboard/#Limitations) section.*

# **Suggested Homework**
*** 

Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore what school events are reported to have more school shootings.


# **Additional Information**
***

## **Helpful Links**
***

The link to the dashboard described in this case study can be found  [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/).


[RStudio](https://rstudio.com/products/rstudio/features/){target="_blank"}  
[Cheatsheet on RStuido IDE](https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf){target="_blank"}  
[Other RStudio cheatsheets](https://rstudio.com/resources/cheatsheets/){target="_blank"}   
[RStudio projects](https://r4ds.had.co.nz/workflow-projects.html)

[Tidyverse](https://www.tidyverse.org/){target="_blank"}   

[Piping in R](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html){target="_blank"}   

[String manipulation cheatsheet](https://rstudio.com/resources/cheatsheets/){target="_blank"}  
[Table formats](https://en.wikipedia.org/wiki/Wide_and_narrow_data){target="_blank"}

[Geocoding](https://en.wikipedia.org/wiki/Geocoding)  
[Coordinate reference system (CRS)](https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems) [ESPG](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset)
[World Geodetic System (WGS) version 84 also called ESPG:4326 ](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84)   
[Albers equal-area conic projection](https://en.wikipedia.org/wiki/Albers_projection#:~:text=The%20Albers%20equal%2Darea%20conic,that%20uses%20two%20standard%20parallels.&text=The%20Albers%20projection%20is%20used,the%20United%20States%20Census%20Bureau.)   
[crs 102008](https://spatialreference.org/ref/esri/102008/html/)  

To learn more about geospatial coordinate systems see [here](https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf) and [here](https://guides.library.duke.edu/r-geospatial/CRS).


[`ggplot2` package](http://ggplot2.tidyverse.org){target="_blank"}    
Please see [this case study](https://opencasestudies.github.io/ocs-bp-co2-emissions/)  for more details on using `ggplot2`    
[grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html){target="_blank"}   
[`ggplot2` themes](https://ggplot2.tidyverse.org/reference/ggtheme.html){target="_blank"}   

[Motivating article for this case study about school shootings](https://link.springer.com/content/pdf/10.1007/s11920-012-0331-6.pdf)

Also see this [article](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf) to learn more about the impacts of school shootings.


[Lightweight markup languages(LML)](https://en.wikipedia.org/wiki/Lightweight_markup_language)  
[Markdown](https://en.wikipedia.org/wiki/Markdown)  
[R Markdown](http://rmarkdown.rstudio.com/)   
[`knitr`](https://yihui.org/knitr/)  
[`rmarkdown` (package)](https://cran.r-project.org/web/packages/rmarkdown/rmarkdown.pdf)

See this [book](https://bookdown.org/yihui/rmarkdown/) for more information on working with R Markdown files. 

The RStudio [cheatsheet for R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and this [tutorial](https://ourcodingclub.github.io/tutorials/rmarkdown/) are great for getting started. 

[Pandoc](https://en.wikipedia.org/wiki/Pandoc)  

[YAML](https://en.wikipedia.org/wiki/YAML)  
[Configuration](https://en.wikipedia.org/wiki/Configuration_file)  

[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)  

See [here](https://rstudio.com/resources/webinars/introducing-flexdashboards/) for a video about flexdashboard and [here](https://rmarkdown.rstudio.com/flexdashboard/) for a more information on how to use this package.   
See [here](https://rmarkdown.rstudio.com/flexdashboard/using.html#components) for a list of other packages that are useful for adding elements to dashboards created with the `flexdashboard` package.   
See [here](https://www.datadreaming.org/post/r-markdown-theme-gallery/) for a list of R Markdown themes which can be used with `flexdashbard`.   
See [Font Awesome](https://fontawesome.com/icons?d=gallery) for icons.  

To learn more about using `shiny` with the `flexdashboard` package to create interactive dashboards, see this [tutorial](https://rmarkdown.rstudio.com/flexdashboard/shiny.html).   

[leaflet (R package)](https://rstudio.github.io/leaflet/)   
[Leaflet (JavaScript Library)](https://leafletjs.com/)   

[shiny](https://shiny.rstudio.com/)  
See [here](https://shiny.rstudio.com/gallery/) for a gallery of `shiny` examples.

See this [website](https://rstudio.github.io/shinydashboard/) to learn about a more flexible and slightly more challenging option for creating dashboards in R using a package called `shinydashboard`.


<u>**Packages used in this case study:** </u>

Package   | Use in this case study                                                                      
---------- |-------------
[here](https://github.com/jennybc/here_here){target="_blank"}       | to easily load and save data  
[readr](https://readr.tidyverse.org/) |  to import the data  as a csv file  
[googlesheets4](https://googlesheets4.tidyverse.org/) | to import directly from Google Sheets
[tibble](https://tibble.tidyverse.org/) | to create tibbles (the tidyverse version of dataframes)
[dplyr](https://dplyr.tidyverse.org/){target="_blank"}      | to filter, subset, join, add rows to, and modify the data  
[stringr](https://stringr.tidyverse.org/){target="_blank"}      | to manipulate  character strings within the data (collapsing strings together, replace values, and detect values)
[magrittr](https://magrittr.tidyverse.org/){target="_blank"}      | to pipe sequential commands 
[tidyr](https://tidyr.tidyverse.org/){target="_blank"}      | to change the shape or format of tibbles to wide and long, to drop rows with `NA` values, and to see the last few columns of a tibble
[ggmap](https://cran.r-project.org/web/packages/ggmap/ggmap.pdf) | to geocode the data (which means get the latitude and longitude values)
[sf](https://r-spatial.github.io/sf/) | to modify the geocoded data so that overlapping points did not overlap
[lubridate](https://lubridate.tidyverse.org/) | to work with the data-time data    
[DT](https://rstudio.github.io/DT/) | to create the interactive table  
[htmltools](https://www.rdocumentation.org/packages/htmltools/versions/0.5.0) | to add a caption to our interactive table 
[ggplot2](https://ggplot2.tidyverse.org/){target="_blank"}      | to create plots  
[ggforce](https://cran.r-project.org/web/packages/ggforce/ggforce.pdf)   | to create a plot zoom
[forcats](https://forcats.tidyverse.org/){target="_blank"}      | to reorder factor for plot
[waffle](https://github.com/hrbrmstr/waffle) | to make waffle proportion plots  
[poliscidata](https://cran.r-project.org/web/packages/poliscidata/poliscidata.pdf) | to get population values for the states
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)     | to create the dashboard  
[shiny](https://shiny.rstudio.com/){target="_blank"}      | to allow our dashboard to be interactive   
[leaflet](https://rstudio.github.io/leaflet/shiny.html) | to implement the [leaflet](http://leafletjs.com/) (a JavaScript library for maps) to create the map for our dashboard   
[maps](https://cran.r-project.org/web/packages/maps/maps.pdf) | to create the simple leaflet map example   
[vembedr](https://github.com/ijlyttle/vembedr) | to include a video in our case study   


#### {.emphasis_block}

**Warning Signs**

From [Sandy Hook Promise](https://www.sandyhookpromise.org/gun-violence/know-the-signs-of-gun-violence/)...

Here is a list of potential warning signs that can signal an individual may be in crisis and/or need help:

+ Suddenly withdrawing from people and activities
+ Consistent bullying or intimidating others, or being bullied by others
+ Extreme mood or personality changes
+ Victim of constant social rejection
+ Talking about plans or actively making plans to harm themselves or others
+ Bringing a weapon to school – or threatening or talking about doing so
+ Bragging about or warning others about an upcoming act of violence
+ Recruiting others to join in a planned act of violence
+ Warning students to stay away from school or events
+ Expressing fascination with guns and/or school shootings
+ Expressing hopelessness about the future
+ Extreme, prolonged sadness or distress
+ Expressing or showing feelings of isolation
+ Bragging about access to guns

**NOTE**

This list is not a comprehensive list of warning signs nor does exhibiting one of these signs indicate imminent violence.

When concerned about seeing troubling behaviors, tell a trusted adult or call 911, if there is an immediate threat.

**Respond to Warning Signs**

Call 911 if you feel there is an immediate threat. 

Call [+1-844-5-SAYNOW](tel:18445729669) if you would to submit an anonymous safety concern.


If you or your child or student experienced a shooting please see this [website](https://kidshealth.org/en/parents/ptsd.html) and this [website](https://www.verywellmind.com/shooting-ptsd-from-a-shooting-2797200) for guidance about dealing with the trauma.


####

## **Session Info**
***

```{r}
sessionInfo()
```

**Estimate of RMarkdown Compilation Time: **

```{r, echo=FALSE}
rmarkdown:::perf_timer_stop("render")
pts = rmarkdown:::perf_timer_summary()
cat("About", round(pts$time[1]/1000 + 5), "-", round(pts$time[1]/1000 + 15),"seconds")
```

This compilation time was measured on a PC machine operating on Windows 10. This range should only be used as an estimate as compilation time will vary with different machines and operating systems.

## **Acknowledgments**
***

We would like to acknowledge [Elizabeth Stuart](https://www.jhsph.edu/faculty/directory/profile/1792/elizabeth-a-stuart) for assisting in framing the major direction of the case study.

We would like to acknowledge [Michael Breshock](https://mbreshock.github.io/) for his contributions to this case study and developing the `OCSdata` package. 

We would also like to acknowledge the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/) for funding this work. 



