Introduction to R: Reading External Data

For data consisting of a few numbers, we can type the data by hand. But for any reasonably large data set, typing lots of c() lines is just not feasible. Instead, either you or the software you are using (as with most survey software) creates a CSV file. CSV stands for Comma-Separated Values: each row, or observation, consists of several values separated by commas. Here is a CSV file that gives demographic data from a group of students. The first line gives the column names, and the remaining lines are the data.

To read in the data, you use the read.csv() function. If you have the file on your local disk, you give its path name, in quotes, inside the parentheses. Or, as in this case, you can read it straight from the website. (In the example below, we show only a few of the rows to save space on this page.)

> student <- read.csv("http://evc-cit.info/psych018/r_intro/demographics.csv")
> student
   gender age feet inches weight
1       M  22    5      4  145.0
2       M  18    5      6  140.0
3       M  31    5     11  240.0
# ...
36      F  18    5      4  145.0
37      F  32    5      2   97.0
38      F  21    4      7  110.0
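
To try read.csv() without relying on the website, here is a minimal sketch that writes a tiny CSV file to a temporary location and reads it back. The three data rows are copied from the output above; the names csv_lines, csv_path, and sample_students are made up for this illustration and are not part of the course materials.

```r
# Hypothetical three-row CSV, using the first rows shown above.
csv_lines <- c(
  "gender,age,feet,inches,weight",
  "M,22,5,4,145.0",
  "M,18,5,6,140.0",
  "M,31,5,11,240.0"
)

# Write the text to a temporary file so the example needs no network access.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# read.csv() treats the first line as column names by default (header = TRUE).
sample_students <- read.csv(csv_path)

str(sample_students)      # column names and types
head(sample_students, 2)  # first two rows only
```

str() and head() are handy for checking a large file without printing every row. Note that in R 4.0 and later, read.csv() reads text columns such as gender as character vectors unless you ask for stringsAsFactors = TRUE, so printed output for such columns may differ from output produced by older versions of R.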

Using feet and inches as separate columns is inconvenient; it would be nicer to have a single column named height that gives the total height in inches. Create that column, and then get rid of the feet and inches columns by typing the following commands:

> student$height <- student$feet * 12 + student$inches
> student
   gender age feet inches weight height
1       M  22    5      4  145.0     64
2       M  18    5      6  140.0     66
3       M  31    5     11  240.0     71
#...
36      F  18    5      4  145.0     64
37      F  32    5      2   97.0     62
38      F  21    4      7  110.0     55
> student$feet <- NULL # eliminate the feet column
> student$inches <- NULL # and inches too
> student
   gender age weight height
1       M  22  145.0     64
2       M  18  140.0     66
3       M  31  240.0     71
#...
36      F  18  145.0     64
37      F  32   97.0     62
38      F  21  110.0     55

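When you eliminate a column by setting it to the special value NULL, it is removed from the data frame for good; there is no undo. The CSV file itself is not changed, so to get the column back you would have to read the file in again.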
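
Because a deleted column cannot be restored within the session, one cautious habit is to copy the data frame before removing anything. The following is only a sketch under that assumption; student_demo and student_backup are hypothetical names, and the two rows are taken from the output above.

```r
# Hypothetical two-row data frame standing in for the student data.
student_demo <- data.frame(
  gender = c("M", "F"),
  age    = c(22, 21),
  feet   = c(5, 4),
  inches = c(4, 7),
  weight = c(145, 110)
)

# Keep an untouched copy before removing anything.
student_backup <- student_demo

# Same steps as above: compute height in inches, then drop the old columns.
student_demo$height <- student_demo$feet * 12 + student_demo$inches
student_demo$feet   <- NULL
student_demo$inches <- NULL

names(student_demo)    # feet and inches are no longer listed
names(student_backup)  # the copy still has them
```

Assigning a data frame to a new name gives you an independent copy, so changes to student_demo leave student_backup alone.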

Another Shortcut

If you get tired of typing student$ before all the column names, you can attach the data frame, so that when R looks for an object, it will look in the data frame’s column names.

> attach(student)
> gender # from student data frame
 [1] M M M F M F F M F M F F F F F F M F M F F F M F M M F F M F M F F F F F F F
Levels: F M
> mean(weight)
[1] 138.25
> sd(height)
[1] 3.769780
> mean(student$weight) # you can always do it the long way
[1] 138.25

When you are done using the shortcut from attach( ), you use detach( ). After you detach the data frame, you have to use the frame$column notation to access the data.

> detach(student)
> mean(weight)
Error in mean(weight) : object "weight" not found
> mean(student$weight)
[1] 138.25

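attach( ) is not without problems, however. The attached columns are a copy, so later changes to the data frame do not show up in them, and an attached name can be hidden by another object with the same name. See some cautionary notes.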
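
One of the problems with attach( ) can be seen in a short sketch: the attached columns are a copy made at the moment of attaching, so they go stale if the data frame changes afterward. The example also shows with(), a base R function that is one possible alternative (not necessarily the one the course recommends). The data frame student_demo and its values are hypothetical.

```r
# Hypothetical stand-in for the student data frame.
student_demo <- data.frame(
  weight = c(145, 140, 110),
  height = c(64, 66, 55)
)

attach(student_demo)

# Change the data frame after attaching it.
student_demo$weight <- student_demo$weight * 2

stale_mean   <- mean(weight)               # uses the copy made by attach()
current_mean <- mean(student_demo$weight)  # uses the updated data frame

detach(student_demo)

# with() looks up column names in the data frame only for the duration
# of one expression, so nothing is left attached afterward.
with_mean <- with(student_demo, mean(weight))

stale_mean
current_mean
with_mean
```

Here stale_mean still reflects the original weights, while current_mean and with_mean both reflect the doubled ones.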