The score and gender vectors really belong together
as one object. You can tie them together into a data frame
this way:
> info <- data.frame(gender,score)
> info
   gender score
1       M    47
2       M    82
3       F    65
4       F    81
5       M    39
6       F    72
7       M    89
8       M    49
9       F    52
10      F    76
With vectors, you specified one or more items in square brackets to select them from the vector. Since data frames have rows and columns, you specify both in the square brackets, with the row number(s) first and the column number(s) second. Try the following R commands.
> info[1,1] # row 1, column 1
[1] M
Levels: F M
> info[1,2] # row 1, column 2
[1] 47
When one of the vectors making up a data frame consists of character data (the gender vector in this case), R converts it to a factor with levels corresponding to each unique value. (This is the default in versions of R before 4.0.0; from 4.0.0 on, character columns stay as character data unless you pass stringsAsFactors = TRUE to data.frame.) In this case, gender is a factor with two levels: F and M. In the first example, when you accessed the gender, R also showed you the levels for that factor.
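To see the factor conversion for yourself, here is a minimal sketch. It assumes the gender and score vectors hold the ten values shown in the printed data frame, and it asks for the conversion explicitly with stringsAsFactors = TRUE so the result does not depend on your R version.

```r
# Values copied from the printed data frame above (assumed to match the
# original gender and score vectors).
gender <- c("M", "M", "F", "F", "M", "F", "M", "M", "F", "F")
score  <- c(47, 82, 65, 81, 39, 72, 89, 49, 52, 76)

# Request the character-to-factor conversion explicitly.
info <- data.frame(gender, score, stringsAsFactors = TRUE)

class(info$gender)    # "factor"
levels(info$gender)   # "F" "M" (levels are sorted alphabetically)
table(info$gender)    # how many rows fall into each level
str(info)             # compact summary of every column and its type
```

If you leave out stringsAsFactors = TRUE in R 4.0.0 or later, class(info$gender) reports "character" instead, and the "Levels:" line no longer appears when you print the column.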
You can also extract whole rows by leaving out a column number, or extract whole columns by leaving out the row number. Notice what happens in the last example when you give only one number in the square brackets.
> info[1, ] # all of row 1
  gender score
1      M    47
> info[ , 1] # column 1 as a vector
[1] M M F F M F M M F F
Levels: F M
> info[ , 2] # column 2 as a vector
[1] 47 82 65 81 39 72 89 49 52 76
> info[1:3, 2] # rows 1 to 3, column 2
[1] 47 82 65
> info[1:3, 1:2] # rows 1 to 3, columns 1 to 2
  gender score
1      M    47
2      M    82
3      F    65
> info[2] # column 2 as a one-column data frame
   score
1     47
2     82
3     65
4     81
5     39
6     72
7     89
8     49
9     52
10    76
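The difference between info[2] and info[ , 2] is easy to miss, so the following sketch checks the type of each result. It rebuilds the data frame from the values printed above; the double-bracket form and the drop = FALSE argument are standard R features that this section has not introduced, shown here only for comparison.

```r
# Rebuild the example data frame (values copied from the printed output).
info <- data.frame(
  gender = c("M", "M", "F", "F", "M", "F", "M", "M", "F", "F"),
  score  = c(47, 82, 65, 81, 39, 72, 89, 49, 52, 76)
)

col_df   <- info[2]                   # one number: a one-column data frame
col_vec  <- info[, 2]                 # row slot left empty: a plain vector
col_vec2 <- info[[2]]                 # double brackets also give a vector
col_keep <- info[, 2, drop = FALSE]   # drop = FALSE keeps the data frame

class(col_df)     # "data.frame"
class(col_vec)    # "numeric"
dim(col_keep)     # 10 1
```

Which form you want depends on what comes next: functions such as mean() expect a vector, while a one-column data frame keeps the column name attached.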
You can also do conditional selection in a data frame. Here is how you would get all the scores for the males:
> info[info[1]=="M", 2]
[1] 47 82 39 89 49
Why does this work? Remember that the first entry in the square brackets tells R which row or rows you want; the second entry tells it which column or columns you want. info[1]=="M" produces a TRUE or FALSE value for every row: TRUE where column 1 (gender) is M, and FALSE otherwise. R keeps only the rows marked TRUE, and the 2 then picks the score column from those rows.
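It can help to look at the TRUE/FALSE values on their own before using them to select rows. This sketch rebuilds the data frame from the values printed earlier and stores the comparison in a variable; the name is_male is just an illustrative choice.

```r
# Rebuild the example data frame (values copied from the printed output).
info <- data.frame(
  gender = c("M", "M", "F", "F", "M", "F", "M", "M", "F", "F"),
  score  = c(47, 82, 65, 81, 39, 72, 89, 49, 52, 76)
)

# One TRUE or FALSE per row: is column 1 equal to "M"?
is_male <- info[, 1] == "M"
is_male

which(is_male)     # the row numbers that are TRUE: 1 2 5 7 8
info[is_male, 2]   # scores from those rows: 47 82 39 89 49
```

Breaking the command into two steps gives the same scores as the one-line version, and it lets you check the condition before you rely on it.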
For this data frame, it’s easy to remember that column 1 is
the gender and column 2 is the score. When you have a larger data
frame with many columns, using just numbers is not the best way;
you end up with a piece of paper that tells which variable is in which
column. Since the columns in a data frame are already labeled, wouldn’t
it be nice if you could use those labels? And, of course, you can, by
using the dollar sign $ symbol, which you can read as
“column” or “variable.” Try these:
> info$gender[1] # info column gender, row 1
[1] M
Levels: F M
> info$score[2] # info column score, row 2
[1] 82
> mean(info$score) # mean of info column score
[1] 65.2
> info[info$gender == "M", ] # see following
  gender score
1      M    47
2      M    82
5      M    39
7      M    89
8      M    49
> info$score[info$gender=="F"] # see following
[1] 65 81 72 52 76
The next-to-last one is read as “from the info data frame, select all rows where info column gender equals M.”
The last one is read as “from info column score, select all entries where info column gender equals F.”
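The same pattern combines naturally with functions such as mean(), and conditions can be joined with &. The sketch below rebuilds the data frame from the values printed earlier; the variable names male_mean, female_mean, and high_f are illustrative only.

```r
# Rebuild the example data frame (values copied from the printed output).
info <- data.frame(
  gender = c("M", "M", "F", "F", "M", "F", "M", "M", "F", "F"),
  score  = c(47, 82, 65, 81, 39, 72, 89, 49, 52, 76)
)

# Mean score for each group, using $ and a condition.
male_mean   <- mean(info$score[info$gender == "M"])   # 61.2
female_mean <- mean(info$score[info$gender == "F"])   # 69.2

# Two conditions joined with & (both must be TRUE):
# rows where gender is F and the score is above 70.
high_f <- info[info$gender == "F" & info$score > 70, ]
high_f   # rows 4, 6, and 10
```

Read the last selection as “from the info data frame, select all rows where gender equals F and score is greater than 70.”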
Presume you have an experiment to measure the reaction time of people in two groups, group A and group B. Figure out the following in R. See the solution.