End Activity Session (Day 6)
Tidying & subsetting with tidyr and dplyr

Take ~15 minutes to read Broman & Woo’s evergreen paper, "Data organization in spreadsheets". As you read, think about data you have created or had to work with that did not follow these guidelines. Make notes of several examples to share: how did you input data previously? How would you change the way you input data?

Questions:
What are the major or most common ways you have seen these guidelines ignored?
What is your experience working with or creating data in spreadsheets that don’t follow these guidelines?

Data source: Santa Barbara Coastal LTER, D. Reed, and R. Miller. 2021. SBC LTER: Reef: Abundance, size and fishing effort for California Spiny Lobster (Panulirus interruptus), ongoing since 2012 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/0bcdc7e8b22b8f2c1801085e8ca24d59

Create a new R Project called eds221-day6-activities.
Add two subfolders, data and docs.
Download the lobster data (cited above) and save it in the data subfolder.
In docs, create a new .Rmd or .qmd saved with file prefix lobster_exploration.
Read in the data/Lobster_Abundance_All_Years_20210412.csv file. Take note of values that can be considered NA (see metadata) and update your import line to convert those to NA values.
Use tidyr::uncount() on the existing count column. What did this do? Add annotation in your code explaining tidyr::uncount().

Here’s code to read in your data, just to get you started:

```r
library(tidyverse)  # read_csv(), uncount(), and the %>% pipe
library(here)       # here() builds file paths relative to the project root
library(janitor)    # clean_names()
lobsters <- read_csv(here("data", "Lobster_Abundance_All_Years_20210412.csv"), na = c("-99999", "")) %>%
  clean_names() %>%  # column names to lower snake_case
  uncount(count)     # repeat rows according to the count column
```
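Before running uncount() on the full dataset, you can try it on a small data frame. This is a minimal sketch: the sites, sizes, and counts below are hypothetical and invented for illustration. Only the idea of a count column comes from the lobster data. Each row is repeated as many times as its count value. A row with a count of 0 disappears, and the count column itself is removed by default.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: each row is one size observation with a lobster count
toy <- tibble(
  site    = c("IVEE", "IVEE", "NAPL"),
  size_mm = c(65, 80, 72),
  count   = c(2, 0, 3)
)

# uncount() repeats each row `count` times (the count-0 row is dropped)
# and removes the count column, leaving one row per lobster
toy_long <- uncount(toy, count)
toy_long
```

After this step, n() counts individual lobsters rather than rows of the original file.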
Create a summary table of the total counts (see n()) and mean carapace lengths of lobsters observed in the dataset by site and year.

The legal lobster size (carapace length) in California is 79.76 mm.
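The site-by-year summary table described above usually follows a group_by() %>% summarize() pattern. The sketch below uses a hypothetical toy data frame in the shape of the cleaned, uncounted data. It assumes the columns are named site, year, and size_mm; check names(lobsters) for the real names.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data, one row per lobster; column names are assumed
toy <- tibble(
  site    = c("IVEE", "IVEE", "IVEE", "NAPL", "NAPL"),
  year    = c(2019, 2019, 2020, 2019, 2019),
  size_mm = c(70, 80, 90, 60, NA)
)

site_year_summary <- toy %>%
  group_by(site, year) %>%
  summarize(
    total_count  = n(),                         # number of lobsters (rows) per group
    mean_size_mm = mean(size_mm, na.rm = TRUE), # mean carapace length, ignoring NA
    .groups = "drop"
  )
site_year_summary
```

Note that n() counts every row, including a lobster with a missing size. By contrast, na.rm = TRUE only drops missing sizes from the mean.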
Create a subset that only contains lobster data from 2020 (note: this should be from the original data you read in, not the summary table you created above).
Write code to find the counts of lobsters observed at each site (using only site as the grouping factor) that are above and below the legal limit. You can decide how to do this; there are a number of ways. Hint: you may want to add a new column, legal, that contains “yes” or “no” based on the size of the observed lobster (see dplyr::case_when() for a really nice way to do this). Then use group_by() %>% summarize(n()) or dplyr::count() to get counts by group within variables.
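One way to combine the hints above is filter(), then case_when() inside mutate(), then count(). This is a minimal sketch on hypothetical toy data with assumed column names site, year, and size_mm. Two choices here are assumptions on our part, not from the task:

- A lobster exactly at 79.76 mm is counted as legal.
- A lobster with a missing size gets NA in legal, because case_when() returns NA when no condition matches.

```r
library(tibble)
library(dplyr)

legal_limit_mm <- 79.76  # legal carapace length stated in the text

# Hypothetical toy data, one row per lobster; column names are assumed
toy <- tibble(
  site    = c("IVEE", "IVEE", "NAPL", "NAPL", "AQUE"),
  year    = c(2020, 2020, 2020, 2019, 2020),
  size_mm = c(85, 70, 79.76, 95, 60)
)

legal_counts <- toy %>%
  filter(year == 2020) %>%                      # keep only 2020 observations
  mutate(legal = case_when(
    size_mm >= legal_limit_mm ~ "yes",          # at or above the limit (assumed boundary)
    size_mm <  legal_limit_mm ~ "no"
  )) %>%
  count(site, legal)                            # one row per site/legal combination, with n
legal_counts
```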
Create a stacked column graph that shows the proportion of legal and non-legal lobsters at each site. Hint: create a stacked column graph with geom_col(), then add the argument position = "fill" to convert from a graph of absolute counts to proportions.
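The position = "fill" hint can be checked on made-up numbers. This is a sketch using a hypothetical table in the shape that count(site, legal) returns. geom_col() stacks bars by default, and "fill" rescales every stack to a height of 1.

```r
library(tibble)
library(ggplot2)

# Hypothetical counts in the shape produced by count(site, legal)
legal_counts <- tibble(
  site  = c("IVEE", "IVEE", "NAPL", "NAPL"),
  legal = c("no", "yes", "no", "yes"),
  n     = c(30, 10, 5, 15)
)

# position = "fill" turns each stacked bar into proportions that sum to 1
p <- ggplot(legal_counts, aes(x = site, y = n, fill = legal)) +
  geom_col(position = "fill") +
  labs(x = "Site", y = "Proportion of lobsters", fill = "Legal size?")
p
```

Here IVEE would show 25% legal (10 of 40) and NAPL 75% legal (15 of 20), even though NAPL has fewer lobsters in total.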
Which two sites had the largest proportion of legal lobsters in 2020? Explore the metadata to come up with a hypothesis about why that might be.
Starting with the original data that you read in as lobsters, complete the following tasks separately; they are not meant to be done in sequence. You can store each of the outputs as ex_a, ex_b, etc. for the purposes of this task.
filter() practice
Create and store a subset that only contains lobsters from sites “IVEE”, “CARP” and “NAPL”. Check your output data frame to ensure that only those three sites exist.
Create a subset that only contains lobsters observed in August.
Create a subset with lobsters at Arroyo Quemado (AQUE) OR with a carapace length greater than 70 mm.
Create a subset that does NOT include observations from Naples Reef (NAPL).
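The four filter() patterns above are membership, equality, OR, and negation. Here is a sketch of each on a hypothetical toy data frame. It assumes the columns are named site, month, and size_mm, and that month is stored as a number (8 for August); check this against the real data.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data; column names and numeric months are assumptions
toy <- tibble(
  site    = c("IVEE", "CARP", "NAPL", "AQUE", "MOHK", "AQUE"),
  month   = c(8, 7, 8, 9, 8, 8),
  size_mm = c(65, 75, 90, 50, 72, 68)
)

# Membership: %in% keeps rows whose site is one of the listed values
ex_a <- toy %>% filter(site %in% c("IVEE", "CARP", "NAPL"))
unique(ex_a$site)  # check that only those three sites remain

# Equality: August only
ex_b <- toy %>% filter(month == 8)

# OR: AQUE lobsters, or any lobster longer than 70 mm
ex_c <- toy %>% filter(site == "AQUE" | size_mm > 70)

# Negation: drop Naples Reef observations
ex_d <- toy %>% filter(site != "NAPL")
```

filter() drops rows where the condition evaluates to NA. As a result, site != "NAPL" would also remove any rows with a missing site.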
group_by() %>% summarize() practice
Find the mean and standard deviation of lobster carapace length, grouped by site.
Find the maximum carapace length by site and month.
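Both group_by() %>% summarize() tasks differ only in the grouping columns and the summary function. This sketch uses hypothetical toy data with assumed column names site, month, and size_mm.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data; column names are assumed
toy <- tibble(
  site    = c("IVEE", "IVEE", "IVEE", "NAPL", "NAPL"),
  month   = c(8, 8, 9, 8, 8),
  size_mm = c(60, 80, 70, 90, 100)
)

# Mean and standard deviation of carapace length for each site
ex_e <- toy %>%
  group_by(site) %>%
  summarize(mean_size_mm = mean(size_mm, na.rm = TRUE),
            sd_size_mm   = sd(size_mm, na.rm = TRUE))

# Maximum carapace length for each site-month combination
ex_f <- toy %>%
  group_by(site, month) %>%
  summarize(max_size_mm = max(size_mm, na.rm = TRUE), .groups = "drop")
```

With real data, a group containing only missing sizes makes max(..., na.rm = TRUE) return -Inf with a warning. That is worth checking for.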
mutate() practice
Add a new column that contains lobster carapace length converted to centimeters. Check output.
Update the site column to all lowercase. Check output.
Convert the area column to a character (not sure why you’d want to do this, but try it anyway). Check output.
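mutate() either creates a new column or overwrites an existing one, depending on the name you assign to. This is a minimal sketch on a hypothetical toy data frame. The area values are invented integer codes, and the real column may be stored differently.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data; area values are invented for illustration
toy <- tibble(
  site    = c("IVEE", "NAPL"),
  size_mm = c(75, 82),
  area    = c(1L, 3L)
)

ex_g <- toy %>% mutate(size_cm = size_mm / 10)     # new column: mm to cm
ex_h <- toy %>% mutate(site = tolower(site))       # overwrite site with lowercase text
ex_i <- toy %>% mutate(area = as.character(area))  # overwrite area as character
```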
case_when() practice
Use case_when() to add a new column called size_bin that contains “small” if carapace size is <= 70 mm, or “large” if it is greater than 70 mm. Check output.
Use case_when() to add a new column called designation that contains “MPA” if the site is “IVEE” or “NAPL”, and “not MPA” for all other outcomes.
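case_when() checks conditions in order and uses the first match. This sketch on hypothetical toy data (assumed columns site and size_mm) shows both tasks. The catch-all uses TRUE ~, which works across dplyr versions. Newer dplyr releases also offer a .default argument. A missing size matches neither size condition, so it ends up as NA.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data; column names are assumed
toy <- tibble(
  site    = c("IVEE", "NAPL", "CARP", "AQUE"),
  size_mm = c(70, 71, 65, NA)
)

ex_j <- toy %>%
  mutate(size_bin = case_when(
    size_mm <= 70 ~ "small",  # 70 mm exactly counts as small
    size_mm >  70 ~ "large"
  ))

ex_k <- toy %>%
  mutate(designation = case_when(
    site %in% c("IVEE", "NAPL") ~ "MPA",
    TRUE ~ "not MPA"          # catch-all for every other site
  ))
```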