End Interactive Session 2A
Data in R: accessing / updating elements continued
These materials are modified from the following source:
Wickham, Hadley. Advanced R.
http://adv-r.had.co.nz/Subsetting.html
Let’s explore the different types of subsetting with a simple vector, x, using [.
x <- c(2.1, 4.2, 3.3, 5.4)
Note that the number after the decimal point represents the original position in the vector.
There are six things that you can use to subset a vector:
Positive integers return elements at the specified positions:
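A short sketch of positive-integer subsetting with the vector x from above. The particular index choices are illustrative.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Select the third and first elements, in that order
x[c(3, 1)]
#> [1] 3.3 2.1

# order(x) gives the positions that sort x, so this returns x sorted
x[order(x)]
#> [1] 2.1 3.3 4.2 5.4

# Repeated indices return repeated values
x[c(1, 1)]
#> [1] 2.1 2.1

# Non-integer indices are silently truncated toward zero
x[c(2.1, 2.9)]
#> [1] 4.2 4.2
```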
Negative integers exclude elements at the specified positions:
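For instance, with the same vector x, a minimal example might look like this:

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Drop the third and first elements; the remaining ones keep their order
x[-c(3, 1)]
#> [1] 4.2 5.4
```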
Note that you can’t mix positive and negative integers in a single subset:
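The sketch below shows the failure. It wraps the call in tryCatch() so a script keeps running. The variable name result and the "error" placeholder are illustrative, and the exact error message is omitted because its wording differs between R versions.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Mixing a negative and a positive index is an error
result <- tryCatch(
  x[c(-1, 2)],
  error = function(e) "error"
)
result
#> [1] "error"
```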
Logical vectors select elements where the corresponding logical value is TRUE. This is probably the most useful type of subsetting because you can write an expression that creates the logical vector:
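A brief illustration with the vector x, first with a hand-written logical vector and then with one produced by a comparison:

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# An explicit logical vector: keep positions 1 and 2
x[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 2.1 4.2

# More commonly, a comparison generates the logical vector
x[x > 3]
#> [1] 4.2 3.3 5.4
```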
In x[y], what happens if x and y are different lengths? The behaviour is controlled by the recycling rules, where the shorter of the two is recycled to the length of the longer. This is convenient and easy to understand when one of x and y is length one, but I recommend avoiding recycling for other lengths because the rules are inconsistently applied throughout base R.
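As a small illustration of recycling, a length-two logical index applied to the length-four vector x behaves as if it had been written out in full:

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# c(TRUE, FALSE) is recycled to c(TRUE, FALSE, TRUE, FALSE)
x[c(TRUE, FALSE)]
#> [1] 2.1 3.3

# Writing the index out in full gives the same result
x[c(TRUE, FALSE, TRUE, FALSE)]
#> [1] 2.1 3.3
```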
Note that a missing value in the index always yields a missing value in the output:
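A minimal example with the vector x, where the third index value is NA:

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# The NA in position three produces an NA in the result
x[c(TRUE, TRUE, NA, FALSE)]
#> [1] 2.1 4.2  NA
```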
Nothing returns the original vector. This is not useful for 1D vectors, but, as you’ll see shortly, is very useful for matrices, data frames, and arrays. It can also be useful in conjunction with assignment.
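A quick sketch of both uses. The copy named y is introduced here only for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# An empty index returns every element
x[]
#> [1] 2.1 4.2 3.3 5.4

# With assignment, y[] overwrites the contents while keeping the length
y <- x
y[] <- 0
y
#> [1] 0 0 0 0
```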
Zero returns a zero-length vector. This is not something you usually do on purpose, but it can be helpful for generating test data.
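For example, with the vector x:

```r
x <- c(2.1, 4.2, 3.3, 5.4)

# Index zero selects nothing, but the result keeps the vector's type
x[0]
#> numeric(0)
```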
All subsetting operators can be combined with assignment to modify selected values of the input vector.
You just need to make sure that the lengths of the left- and right-hand sides of the assignment match.
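A short sketch of subsetting combined with assignment. It uses a fresh integer vector, x <- 1:5, chosen here for illustration rather than the earlier x.

```r
x <- 1:5

# Two positions on the left, two replacement values on the right
x[c(1, 2)] <- 2:3
x
#> [1] 2 3 3 4 5

# Four positions remain after excluding the first, so supply four values
x[-1] <- 4:1
x
#> [1] 2 4 3 2 1
```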
You can’t combine integer indices with NA in an assignment:
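The sketch below triggers the error and captures it with tryCatch(). The names outcome, "assigned", and "error" are illustrative. One nuance: R does permit an NA in the index when the replacement value has length one, because there is then no ambiguity about which value goes where.

```r
x <- 1:5

# With two replacement values, R cannot tell whether the NA position
# should consume one of them, so the assignment is rejected
outcome <- tryCatch(
  {
    x[c(1, NA)] <- c(1, 2)
    "assigned"
  },
  error = function(e) "error"
)
outcome
#> [1] "error"

# x is unchanged because the assignment never completed
x
#> [1] 1 2 3 4 5
```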
But you can combine logical indices and NAs! (The NAs will be treated as false.)
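A minimal example, again assuming an illustrative vector x <- 1:5 and a full-length logical index:

```r
x <- 1:5

# Positions 1 and 4 are TRUE; the NA in position 3 is skipped like a FALSE
x[c(TRUE, FALSE, NA, TRUE, FALSE)] <- 0
x
#> [1] 0 2 3 0 5
```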
This becomes really useful because you can conditionally modify vectors.
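Here is a sketch of a conditional update. The data frame df and its column a are made up for the example.

```r
df <- data.frame(a = c(1, 10, NA))

# df$a < 5 evaluates to c(TRUE, FALSE, NA); only the TRUE position changes
df$a[df$a < 5] <- 0
df$a
#> [1]  0 10 NA
```

Because the NA in the condition is treated as false, the missing value is left alone rather than overwritten.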
We can use the same operations to subset data based on conditions.
For example, suppose we want to find all the cars with 5 gears:
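The printed rows in this section match R's built-in mtcars data, so the sketch assumes that dataset. The name five_gears is illustrative.

```r
# mtcars ships with R in the datasets package, so nothing needs loading.
# The logical vector picks rows; the empty column index keeps every column.
five_gears <- mtcars[mtcars$gear == 5, ]
five_gears
```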
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Or, we could subset based on conditions for multiple columns.
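A sketch of a two-column condition, again assuming the built-in mtcars data. The name five_gear_four_cyl is illustrative.

```r
# Use the vectorised & (not &&) so the test is applied row by row
five_gear_four_cyl <- mtcars[mtcars$gear == 5 & mtcars$cyl == 4, ]
five_gear_four_cyl
```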
               mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa  30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
The subset() function is a specialized shorthand function for subsetting data frames.
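A call along these lines, assuming the built-in mtcars data, would select the five-gear cars. The name by_subset is illustrative.

```r
# Inside subset(), column names can be used directly, without mtcars$
by_subset <- subset(mtcars, gear == 5)
by_subset
```

Unlike bracket subsetting with a logical vector, subset() drops rows where the condition evaluates to NA.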
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
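The same shorthand accepts compound conditions. A sketch, assuming the built-in mtcars data and an illustrative variable name:

```r
# Combine conditions on two columns with the vectorised &
by_subset_two <- subset(mtcars, gear == 5 & cyl == 4)
by_subset_two
```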
               mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa  30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
To remove columns from a data frame, you can…
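The sentence above is cut off in these materials, so the following is only a sketch of three approaches that are standard in base R, not necessarily the ones the original listed. The data frame df and the names df1, df2, and df3 are made up for the example.

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

# Approach 1: assign NULL to the column you want to drop
df1 <- df
df1$z <- NULL
names(df1)
#> [1] "x" "y"

# Approach 2: keep only the columns you want, by name
df2 <- df[c("x", "y")]
names(df2)
#> [1] "x" "y"

# Approach 3: use setdiff() to work out which columns to keep
df3 <- df[setdiff(names(df), "z")]
names(df3)
#> [1] "x" "y"
```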
End Interactive Session 2A