| ![]() |
Create a data frame named dframe in accordance with the following table. Note, person should be a vector of strings, while sex and funny should be factors with the observed nominal levels and ordinal levels, respectively.
Now do or find the following:
Add an age column where the ages of Stan, Francine, Steve, Roger, Hayley, and Klaus are 41, 41, 15, 1600, 21, and 60, respectively. (Yes, Roger is very old.)
Reorder the columns of the data frame so that they occur in the following order: person, age, sex, funny.
Make a new data frame similarly constructed and named dframe2 in accordance with the table below.
Then combine these two data frames into a single data frame called mydataframe.
Write a single line of code that will extract from mydataframe just the names and ages of any records where the individual is female and has a level of funniness of "Med" or "High".
# (a)
> p = c("Stan","Francine","Steve","Roger","Abigail","Klaus")
> s = factor(c("M","F","M","M","F","M"))
> f = factor(c("High", "Med", "Low", "High", "High", "Med"),
ordered = TRUE,
levels=c("Low","Med","High"))
> dframe = data.frame(person=p, sex=s, funny=f)
> dframe
person sex funny
1 Stan M High
2 Francine F Med
3 Steve M Low
4 Roger M High
5 Abigail F High
6 Klaus M Med
> # (a)
> dframe$age = c(41,41,15,1600,21,60)
> dframe
person sex funny age
1 Stan M High 41
2 Francine F Med 41
3 Steve M Low 15
4 Roger M High 1600
5 Abigail F High 21
6 Klaus M Med 60
> # (b)
> dframe = dframe[TRUE,c(1,4,2,3)]
> dframe
person age sex funny
1 Stan 41 M High
2 Francine 41 F Med
3 Steve 15 M Low
4 Roger 1600 M High
5 Abigail 21 F High
6 Klaus 60 M Med
> # (c)
> p = c("Fredrico","Roberta","Alfred","Bruce")
> a = c(42,37,19,35)
> s = factor(c("M","F","M","M"))
> f = factor(c("High","Med","Low","High"),
+ ordered=TRUE,
+ levels=c("Low","Med","High"))
> dframe2 = data.frame(person=p, age=a, sex=s, funny=f)
> dframe2
person age sex funny
1 Fredrico 42 M High
2 Roberta 37 F Med
3 Alfred 19 M Low
4 Bruce 35 M High
> mydataframe = rbind(dframe,dframe2)
> mydataframe
person age sex funny
1 Stan 41 M High
2 Francine 41 F Med
3 Steve 15 M Low
4 Roger 1600 M High
5 Abigail 21 F High
6 Klaus 60 M Med
7 Fredrico 42 M High
8 Roberta 37 F Med
9 Alfred 19 M Low
10 Bruce 35 M High
> # (d)
> mydataframe[(mydataframe$sex == "F") & (mydataframe$funny > "Med"),1:2]
person age
5 Abigail 21
A data frame named diet has components gender (M or F), age (an integer), treatment.group (diet or placebo), weight.before (possibly a decimal value), and weight.after (also possibly a decimal value).
Write a statement in R that adds a component weight.loss to the data frame diet whose values are the amounts of weight each subject lost during the program.
Describe what the following code will do:
tapply(diet$weight.loss, diet$treatment.group, hist)
diet$weight.loss = diet$before - diet$after
This will split the data in the diet$weight.loss vector into two vectors -- one containing the values associated with the diet treatment and another containing the values associated with the placebo treatment group. Then, the hist() function will be applied to each, producing two histograms of weight loss -- one for each treatment group.
Import the Excel file "mydata.xlsx" into R as a data frame named apples.
applesImport the text file people.txt into R as a data frame named people.
First copy the link and save it to a variable named url. Then, use the read.table() function to create a data frame. Importantly, as the text file has column headers in the first line, use the argument header=TRUE so that R treats these as component names in the data frame, rather than the data itself.
url = "https://math.oxford.emory.edu/site/math117/..." # <-- full address
# not shown as
# it is quite
# long
people = read.table(url,header=TRUE)
people above, display the age and birth month (in that order) of all males in the dataset.
people[people$Sex == "M",c("Age","BirthMonth")]
Points earned for several students are given below. Construct a data frame to hold this data, and then add a column to that data frame called grade where the grades are assigned in the following way: A for the top 10% of point totals, B for the next 20%, C for the middle 40%, D for the 20% below that, and F for the bottom 10%.
studentId pts
1 86
2 84
3 81
4 90
5 85
6 79
7 79
8 75
9 66
10 86
11 73
12 84
13 83
14 68
15 78
16 93
17 82
18 75
19 80
20 87
21 82
22 66
23 91
24 79
25 81
# first, create the data frame...
> studentId = 1:25
> pts = c(86, 84, 81, 90, 85, 79, 79, 75, 66, 86, 73, 84, 83,
68, 78, 93, 82, 75, 80, 87, 82, 66, 91, 79, 81)
> df = data.frame(studentId, pts)
# then, use cut to create a factor for the grades
> mini = min(pts)-1
> maxi = max(pts)+1
> grade = cut(pts,breaks=c(mini,quantile(df$pts,c(0.10,0.30,0.70,0.90)),maxi),
labels=c("F","D","C","B","A"))
# finally, add the grades to the data frame
> df$grade = grade
> df
studentId pts grade
1 1 86 B
2 2 84 C
3 3 81 C
4 4 90 A
5 5 85 C
6 6 79 D
7 7 79 D
8 8 75 D
9 9 66 F
10 10 86 B
11 11 73 F
12 12 84 C
13 13 83 C
14 14 68 F
15 15 78 D
16 16 93 A
17 17 82 C
18 18 75 D
19 19 80 C
20 20 87 B
21 21 82 C
22 22 66 F
23 23 91 A
24 24 79 D
25 25 81 C