Day 7 Tasks & Activities

1. Setup

Create a new repo on GitHub called eds221-day7-activities
Clone the repo to create a version-controlled R Project
Add subfolders data, R and figs
Familiarize yourself with the contents, data files, and variables from this data package on EDI
Download the entire Zip Archive for the package
Copy all 4 files to your data folder
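
If you would rather create the subfolders from the R console than from the RStudio Files pane, the sketch below is one way to do it. It assumes your working directory is the root of the new R Project; project_root is a hypothetical name used only for illustration.

```r
# Assumption: the working directory is the root of the cloned R Project.
project_root <- "."

# Create each subfolder; showWarnings = FALSE keeps this quiet if it already exists.
for (folder in c("data", "R", "figs")) {
  dir.create(file.path(project_root, folder), showWarnings = FALSE)
}
```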

Task 1: Joins

In this section, you’ll test and explore a number of different joins.

Create a new .qmd in your R folder saved as bird_joins.qmd
Read in the data sets and store the data frames as bird_observations, sites, surveys, and taxalist (it should be clear from the raw file names which is which)
Create a subset of bird_observations called birds_subset that contains only observations of birds with species ID “BHCO” or “RWBL”, recorded at sites with site ID “LI-W” or “NU-C”
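
The subsetting step can be sketched on a small toy data frame, as shown here. The rows are invented for illustration, and the column names species_id and site_id are assumptions, so check them against the real files (which you would typically read in with something like readr::read_csv()).

```r
library(dplyr)

# Toy stand-in for bird_observations (invented rows; column names assumed).
bird_observations <- tibble(
  survey_id  = c(1, 1, 2, 3, 4),
  site_id    = c("LI-W", "NU-C", "LI-W", "AA-1", "NU-C"),
  species_id = c("BHCO", "RWBL", "HOSP", "BHCO", "RWBL"),
  bird_count = c(2, 1, 5, 3, 4)
)

# Keep rows that match one of the species AND one of the sites.
birds_subset <- bird_observations %>%
  filter(
    species_id %in% c("BHCO", "RWBL"),
    site_id %in% c("LI-W", "NU-C")
  )
```

Using %in% rather than == avoids the silent recycling that happens when a column is compared against a vector of several values.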

Use left join(s) to update birds_subset so that it also includes sites and taxalist information. For each join, include an explicit argument saying which variable you are joining by (even if the function would pick the correct one by default). Store the updated data frame as birds_left. Make sure to look at the output: is what it contains consistent with what you expected?
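
As a rough illustration of chained left joins with an explicit by argument, here is a sketch on toy tables. All rows are invented, and the key columns site_id and species_id are assumed names; only the join pattern is the point.

```r
library(dplyr)

# Toy tables (invented values; key column names are assumptions).
birds_subset <- tibble(
  site_id    = c("LI-W", "NU-C", "NU-C"),
  species_id = c("BHCO", "RWBL", "RWBL"),
  bird_count = c(2, 1, 4)
)
sites <- tibble(
  site_id   = c("LI-W", "NU-C", "AA-1"),
  park_name = c("Toy Park L", "Toy Park N", "Toy Park A")
)
taxalist <- tibble(
  species_id  = c("BHCO", "RWBL", "HOSP"),
  common_name = c("Brown-headed Cowbird", "Red-winged Blackbird", "House Sparrow")
)

# left_join() keeps every row of the left table and adds matching columns.
birds_left <- birds_subset %>%
  left_join(sites, by = "site_id") %>%
  left_join(taxalist, by = "species_id")
```

Because each key is unique in the lookup tables, the result has the same number of rows as birds_subset; only columns are added.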

First, answer: what do you expect a full_join() between birds_subset and sites to contain?
Write code to full_join the birds_subset and sites data into a new object called birds_full. Explicitly include the variable you’re joining by. Look at the output. Is it what you expected?
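
To check your expectation against something small, the sketch below runs a full join on toy tables (invented rows, assumed key column site_id). One site has no matching observations, so you can see how full_join() treats it.

```r
library(dplyr)

# Toy tables (invented values; key column name is an assumption).
birds_subset <- tibble(
  site_id    = c("LI-W", "NU-C", "NU-C"),
  species_id = c("BHCO", "RWBL", "RWBL"),
  bird_count = c(2, 1, 4)
)
sites <- tibble(
  site_id   = c("LI-W", "NU-C", "AA-1"),
  park_name = c("Toy Park L", "Toy Park N", "Toy Park A")
)

# full_join() keeps all rows from both tables, filling gaps with NA.
birds_full <- full_join(birds_subset, sites, by = "site_id")
```

In this toy case the unmatched site "AA-1" still appears in birds_full, with NA in the columns that come from birds_subset.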

Task 2: More wrangling and summarizing

Continue in the same .qmd that you created for Task 1

Starting with your bird_observations object, rename the notes column to bird_obs_notes (so this doesn’t conflict with notes in the surveys dataset)
Then, create a subset that contains all observations in bird_observations, joins the taxonomic, site and survey information to it, and is finally limited to only the columns survey_date, common_name, park_name, and bird_count. You can decide the order in which you do this (e.g. limit the columns first, then join, or the other way around).
Use lubridate::month() to add a new column called survey_month, containing only the month number. Then, convert the month number to a factor (again within mutate())
Learn a new function on your own! Use dplyr::relocate() to move the new survey_month column to immediately after the survey_date column. You can do this in a separate code chunk, or pipe straight into it from your existing code.
Find the total number of birds observed by park and month (i.e., you’ll group_by(park_name, survey_month))
Filter to only include parks “Lindo”, “Orme”, “Palomino” and “Sonrisa”
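
The steps above can be sketched end to end on toy data as follows. All rows are invented, and the join keys (species_id, site_id, survey_id) are assumed column names; birds_joined, bird_totals and total_birds are hypothetical names chosen for this example.

```r
library(dplyr)
library(lubridate)

# Toy tables (invented values; key column names are assumptions).
bird_observations <- tibble(
  survey_id  = c(1, 1, 2, 3),
  site_id    = c("LI-W", "LI-W", "OR-1", "ZZ-1"),
  species_id = c("BHCO", "RWBL", "BHCO", "RWBL"),
  bird_count = c(2, 3, 5, 7),
  notes      = c(NA, "flyover", NA, NA)
)
sites <- tibble(
  site_id   = c("LI-W", "OR-1", "ZZ-1"),
  park_name = c("Lindo", "Orme", "Elsewhere")
)
surveys <- tibble(
  survey_id   = c(1, 2, 3),
  site_id     = c("LI-W", "OR-1", "ZZ-1"),
  survey_date = as.Date(c("2020-01-15", "2020-02-20", "2020-02-21")),
  notes       = c("windy", NA, NA)
)
taxalist <- tibble(
  species_id  = c("BHCO", "RWBL"),
  common_name = c("Brown-headed Cowbird", "Red-winged Blackbird")
)

# Rename, join, limit columns, then add and reposition the month factor.
birds_joined <- bird_observations %>%
  rename(bird_obs_notes = notes) %>%
  left_join(taxalist, by = "species_id") %>%
  left_join(sites, by = "site_id") %>%
  left_join(surveys, by = c("survey_id", "site_id")) %>%
  select(survey_date, common_name, park_name, bird_count) %>%
  mutate(survey_month = as.factor(month(survey_date))) %>%
  relocate(survey_month, .after = survey_date)

# Total birds by park and month, limited to the four parks of interest.
bird_totals <- birds_joined %>%
  group_by(park_name, survey_month) %>%
  summarize(total_birds = sum(bird_count, na.rm = TRUE), .groups = "drop") %>%
  filter(park_name %in% c("Lindo", "Orme", "Palomino", "Sonrisa"))
```

Joining before selecting keeps the key columns available for each join; if you limit columns first, make sure the keys survive until the joins are done.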

Task 3: Practice with strings

date	building	alarm_message
2020-03-14	Engineering-North	10:02am -- HVAC system down, facilities management alerted
2020-03-15	Bren Hall	8:24am -- Elevator North out of service
2020-04-10	Engineering-South	12:41am -- Fire alarm, UCSB fire responded and cleared
2020-04-18	Engr-North	9:58pm -- Campus point emergency siren, UCPD responded

Back in your string_practice.Rmd, create a new code chunk
With your cursor in your code chunk, go up to Addins in the top bar of RStudio. From the drop-down menu, choose ‘Paste as data frame’. Make sure to add code to store the data frame as alarm_report
Practice working with strings by writing code to update alarm_report as follows (these can be separate, or all as part of a piped sequence):
- Replace the “Engr” with “Engineering” in the building column
- Separate the building column into two separate columns, building and wing, separated at the dash
- Only keep observations with the word “responded” in the alarm_message column
- Separate the message time from the rest of the message by separating at --
- Convert the date column to a Date class using lubridate
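
One possible piped sequence for the updates listed above is sketched here. The data frame is typed in directly from the table rather than pasted with the addin, and alarm_clean, message_time and message are hypothetical names; the functions come from stringr, tidyr and lubridate.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# The alarm table from above, entered by hand for a self-contained example.
alarm_report <- tibble(
  date = c("2020-03-14", "2020-03-15", "2020-04-10", "2020-04-18"),
  building = c("Engineering-North", "Bren Hall", "Engineering-South", "Engr-North"),
  alarm_message = c(
    "10:02am -- HVAC system down, facilities management alerted",
    "8:24am -- Elevator North out of service",
    "12:41am -- Fire alarm, UCSB fire responded and cleared",
    "9:58pm -- Campus point emergency siren, UCPD responded"
  )
)

alarm_clean <- alarm_report %>%
  # Replace the abbreviation "Engr" with "Engineering".
  mutate(building = str_replace(building, "Engr", "Engineering")) %>%
  # Split at the dash; fill = "right" gives NA for wing when there is no dash.
  separate(building, into = c("building", "wing"), sep = "-", fill = "right") %>%
  # Keep only messages containing the word "responded".
  filter(str_detect(alarm_message, "responded")) %>%
  # Split the time from the rest of the message at " -- ".
  separate(alarm_message, into = c("message_time", "message"), sep = " -- ") %>%
  # Parse the year-month-day strings into Date class.
  mutate(date = ymd(date))
```

The fill = "right" argument is a design choice: without it, separate() still works but warns about "Bren Hall", which has no dash to split on.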

End Activity Session (Day 7)