1. Setup
- Create a new repo on GitHub called
eds221-day7-activities
- Clone to make a version controlled R Project
- Add subfolders
data
, R
and figs
- Familiarize yourself with the contents, data files, and variables from this data package on EDI
- Download the entire Zip Archive for the package
- Copy all 4 files to your
data
folder
Task 1: Joins on birds
In this section, you’ll test and explore a number of different joins.
- Create a new .qmd in your
R
folder saved as bird_joins.qmd
- Read in the data sets and store the data frames as
bird_observations
, sites
, surveys
, and taxalist
(it should be clear from the raw file names which is which)
- Create a subset of
bird_observations
called birds_subset
that only contains observations for birds with species id “BHCO” and “RWBL”, and from sites with site ID “LI-W” and “NU-C”
Left join practice
- Use left join(s) to update
birds_subset
so that it also includes sites
and taxalist
information. For each join, include an explicit argument saying which variable you are joining by (even if it will just assume the correct one for you). Store the updated data frame as birds_left
. Make sure to look at the output - is what it contains consistent with what you expected it to contain?
Full join practice
First, answer: what do you expect a full_join()
between birds_subset
and sites
to contain?
Write code to full_join
the birds_subset
and sites
data into a new object called birds_full
. Explicitly include the variable you’re joining by. Look at the output. Is it what you expected?
Task 2: Data wrangling and visualization with birds
Continue in your same .qmd that you created for Task 1
Starting with your birds
object, rename the notes
column to bird_obs_notes
(so this doesn’t conflict with notes
in the surveys
dataset
Then, create a subset that contains all observations in the birds
dataset, joins the taxonomic, site and survey information to it, and is finally limited to only columns survey_date
, common_name
, park_name
, and bird_count
. You can decide the order that you want to create this in (e.g. limit the columns first, then join, or the other way around).
Use lubridate::month()
to add a new column called survey_month
, containing only the month number. Then, convert the month number to a factor (again within mutate()
)
Learn a new function on your own! Use dplyr::relocate()
to move the new survey_month
column to immediately after the survey_date
column. You can do this in a separate code chunk, or pipe straight into it from your existing code.
Find the total number of birds observed by park and month (i.e., you’ll group_by(park_name, survey_month)
)
Filter to only include parks “Lindo”, “Orme”, “Palomino” and “Sonrisa”
Task 3: Practice with strings
- Create a new .qmd in your
R
folder called string_practice.qmd
- Copy all contents of the html table below to your clipboard:
2020-03-14 |
Engineering-North |
10:02am -- HVAC system down, facilities management alerted |
2020-03-15 |
Bren Hall |
8:24am -- Elevator North out of service |
2020-04-10 |
Engineering-South |
12:41am -- Fire alarm, UCSB fire responded and cleared |
2020-04-18 |
Engr-North |
9:58pm -- Campus point emergency siren, UCPD responded |
Back in your string_practice.Rmd
, create a new code chunk
With your cursor in your code chunk, go up to Addins in the top bar of RStudio. From the drop-down menu, choose ‘Paste as data frame’. Make sure to add code to store the data frame as alarm_report
Practice working with strings by writing code to update alarm_report
as follows (these can be separate, or all as part of a piped sequence):
- Replace the “Engr” with “Engineering” in the
building
column
- Separate the
building
column into two separate columns, building
and wing
, separated at the dash
- Only keep observations with the word “responded” in the
alarm_message
column
- Separate the message time from the rest of the message by separating at
--
- Convert the date column to a Date class using
lubridate
End Activity Session (Day 7)