## Exercises - Data Frames

1. Create a data frame named dframe in accordance with the following table. Note, person should be a vector of strings, while sex and funny should be factors with the observed nominal levels and ordinal levels, respectively.

$$\begin{array}{l|l|l} \textrm{person} & \textrm{sex} & \textrm{funny}\\\hline \textrm{Stan} & M & \textrm{High}\\ \textrm{Francine} & F & \textrm{Med}\\ \textrm{Steve} & M & \textrm{Low}\\ \textrm{Roger} & M & \textrm{High}\\ \textrm{Abigail} & F & \textrm{High}\\ \textrm{Klaus} & M & \textrm{Med} \end{array}$$

Now do or find the following:

1. Add an age column where the ages of Stan, Francine, Steve, Roger, Hayley, and Klaus are 41, 41, 15, 1600, 21, and 60, respectively. (Yes, Roger is very old.)

2. Reorder the columns of the data frame so that they occur in the following order: person, age, sex, funny.

3. Make a new data frame similarly constructed and named dframe2 in accordance with the table below.

$$\begin{array}{l|l|l|l} \textrm{person} & \textrm{age} & \textrm{sex} & \textrm{funny}\\\hline \textrm{Fredrico} & 42 & M & \textrm{High}\\ \textrm{Roberta} & 37 & F & \textrm{Med}\\ \textrm{Alfred} & 19 & M & \textrm{Low}\\ \textrm{Bruce} & 35 & M & \textrm{High} \end{array}$$

Then combine these two data frames into a single data frame called mydataframe.

4. Write a single line of code that will extract from mydataframe just the names and ages of any records where the individual is female and has a level of funniness of "Med" or "High".

# (a)
> p = c("Stan","Francine","Steve","Roger","Abigail","Klaus")
> s = factor(c("M","F","M","M","F","M"))
> f = factor(c("High", "Med", "Low", "High", "High", "Med"),
ordered = TRUE,
levels=c("Low","Med","High"))

> dframe = data.frame(person=p, sex=s, funny=f)
> dframe
person sex funny
1     Stan   M  High
2 Francine   F   Med
3    Steve   M   Low
4    Roger   M  High
5  Abigail   F  High
6    Klaus   M   Med

> # (a)
> dframe$age = c(41,41,15,1600,21,60) > dframe person sex funny age 1 Stan M High 41 2 Francine F Med 41 3 Steve M Low 15 4 Roger M High 1600 5 Abigail F High 21 6 Klaus M Med 60 > # (b) > dframe = dframe[TRUE,c(1,4,2,3)] > dframe person age sex funny 1 Stan 41 M High 2 Francine 41 F Med 3 Steve 15 M Low 4 Roger 1600 M High 5 Abigail 21 F High 6 Klaus 60 M Med > # (c) > p = c("Fredrico","Roberta","Alfred","Bruce") > a = c(42,37,19,35) > s = factor(c("M","F","M","M")) > f = factor(c("High","Med","Low","High"), + ordered=TRUE, + levels=c("Low","Med","High")) > dframe2 = data.frame(person=p, age=a, sex=s, funny=f) > dframe2 person age sex funny 1 Fredrico 42 M High 2 Roberta 37 F Med 3 Alfred 19 M Low 4 Bruce 35 M High > mydataframe = rbind(dframe,dframe2) > mydataframe person age sex funny 1 Stan 41 M High 2 Francine 41 F Med 3 Steve 15 M Low 4 Roger 1600 M High 5 Abigail 21 F High 6 Klaus 60 M Med 7 Fredrico 42 M High 8 Roberta 37 F Med 9 Alfred 19 M Low 10 Bruce 35 M High > # (d) > mydataframe[(mydataframe$sex == "F") & (mydataframe$funny > "Med"),1:2] person age 5 Abigail 21  2. A data frame named diet has components gender (M or F), age (an integer), treatment.group (diet or placebo), weight.before (possibly a decimal value), and weight.after (also possibly a decimal value). 1. Write a statement in R that adds a component weight.loss to the data frame diet whose values are the amounts of weight each subject lost during the program. 2. Describe what the following code will do: tapply(diet$weight.loss, diet$treatment.group, hist) 1. diet$weight.loss = diet$before - diet$after

2. This will split the data in the diet$weight.loss vector into two vectors -- one containing the values associated with the diet treatment and another containing the values associated with the placebo treatment group. Then, the hist() function will be applied to each, producing two histograms of weight loss -- one for each treatment group. 3. Import the Excel file "mydata.xlsx" into R as a data frame named apples. 1. Download the the "mydata.xlsx" file to your local machine 2. Click on the "File" tab in the panel in the lower right corner of R-Studio 3. Navigate to the "mydata.xlsx" file 4. Left-click the file and choose "Import Dataset..." 5. Click the "Update" button to see the spreadsheet in the "Import Excel Data" dialog box 6. Change the "Name:" field to apples 7. Click the "Import" button 8. Type "apples" in the console window to verify the import happened correctly 4. Import the text file people.txt into R as a data frame named people. First copy the link and save it to a variable named url. Then, use the read.table() function to create a data frame. Importantly, as the text file has column headers in the first line, use the argument header=TRUE so that R treats these as component names in the data frame, rather than the data itself. url = "https://math.oxford.emory.edu/site/math117/..." # <-- full address # not shown as # it is quite # long people = read.table(url,header=TRUE)  5. After creating the data frame people above, display the age and birth month (in that order) of all males in the dataset. people[people$Sex == "M",c("Age","BirthMonth")]

6. Points earned for several students are given below. Construct a data frame to hold this data, and then add a column to that data frame called grade where the grades are assigned in the following way: A for the top 10% of point totals, B for the next 20%, C for the middle 40%, D for the 20% below that, and F for the bottom 10%.

 studentId pts
1  86
2  84
3  81
4  90
5  85
6  79
7  79
8  75
9  66
10  86
11  73
12  84
13  83
14  68
15  78
16  93
17  82
18  75
19  80
20  87
21  82
22  66
23  91
24  79
25  81

# first, create the data frame...

> studentId = 1:25
> pts = c(86, 84, 81, 90, 85, 79, 79, 75, 66, 86, 73, 84, 83,
68, 78, 93, 82, 75, 80, 87, 82, 66, 91, 79, 81)
> df = data.frame(studentId, pts)

# then, use cut to create a factor for the grades

> mini = min(pts)-1
> maxi = max(pts)+1
> grade = cut(pts,breaks=c(mini,quantile(df$pts,c(0.10,0.30,0.70,0.90)),maxi), labels=c("F","D","C","B","A")) # finally, add the grades to the data frame > df$grade = grade
> df
1          1  86     B
2          2  84     C
3          3  81     C
4          4  90     A
5          5  85     C
6          6  79     D
7          7  79     D
8          8  75     D
9          9  66     F
10        10  86     B
11        11  73     F
12        12  84     C
13        13  83     C
14        14  68     F
15        15  78     D
16        16  93     A
17        17  82     C
18        18  75     D
19        19  80     C
20        20  87     B
21        21  82     C
22        22  66     F
23        23  91     A
24        24  79     D
25        25  81     C