## Exercises - Lists

1. Predict the output if the following is executed in R.

my.list=list(a=1:3,b=2:4,c=3:5)
factor(rep(names(my.list),sapply(my.list,length)))

[1] a a a b b b c c c
Levels: a b c
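
The result is easier to predict when the nested call is evaluated one piece at a time. The following is a minimal sketch that reuses my.list from the exercise; the intermediate names lens, labels, and f are introduced here only for illustration.

```r
my.list <- list(a = 1:3, b = 2:4, c = 3:5)

# Length of each component, returned as a named integer vector
lens <- sapply(my.list, length)

# Repeat each component name as many times as that component is long
labels <- rep(names(my.list), lens)

# Convert the character vector to a factor; the levels are the sorted unique values
f <- factor(labels)
print(f)
```

Each component has length 3, so each name is repeated three times before the conversion to a factor.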

2. Create a list mylist that contains, in this order, a sequence of 20 evenly spaced numbers between $-4$ and $4$; the $3 \times 3$ matrix shown below; a vector containing two strings "Don" and "Quixote"; and a factor made from the vector c("LOW","MED","LOW","MED","MED","HIGH").

$$\begin{bmatrix}5 & 9 & 1\\4 & 3 & 2\\7 & 6 & 8\end{bmatrix}$$

Then do the following:

1. Write R code to display column elements 2 and 1 of rows 2 and 3, in that order, of the matrix inside mylist.

2. Write R code that prints the two strings in the vector inside mylist.

3. Write R code to display all values from the sequence between $-4$ and $4$ inside mylist that are greater than 1.

4. Using which(), determine which indices of the factor are assigned the "MED" level.

mylist = list(seq(from=-4,to=4,length.out=20),
matrix(c(5,4,7,9,3,6,1,2,8),nrow=3),
c("Don","Quixote"),
factor(c("LOW","MED","LOW","MED","MED","HIGH")))

# (a)
mylist[[2]][c(2,3),c(2,1)]

# (b)
cat(mylist[[3]])

# (c)
mylist[[1]][mylist[[1]] > 1]

# (d)
which(mylist[[4]] == "MED")
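
The solution lists the matrix entries column by column because matrix() fills columns first by default. As a sketch of an equivalent construction, byrow = TRUE lets the values be typed in the order they appear in the displayed matrix; the names m.col and m.row are hypothetical and used only for this comparison.

```r
# Column-major fill (the default), as in the solution above
m.col <- matrix(c(5, 4, 7, 9, 3, 6, 1, 2, 8), nrow = 3)

# Row-major fill: values typed in reading order of the displayed matrix
m.row <- matrix(c(5, 9, 1, 4, 3, 2, 7, 6, 8), nrow = 3, byrow = TRUE)

# Both constructions give the same matrix
same <- identical(m.col, m.row)
print(same)

# Rows 2 and 3, columns 2 and 1, in that order
sub <- m.row[c(2, 3), c(2, 1)]
print(sub)
```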

3. Create a new list with the factor named grades made from the vector c("A","C","B","B","D","A"); the vector named nums identical to c(3,2.1,3.3,4,1.5,4.9); and a nested list comprised of the first three members from the list created in the previous question, named oldlist. Then do the following:

1. Write R code to display the elements of grades that correspond to elements of nums that are greater than or equal to $3$.

2. Add a new member to the list, named third.col.twice, that is a vector of length $6$ and a twofold repetition of the third column of the matrix in the oldlist component.

3. Replace the vector containing the two words in the oldlist component with only the single string of text "Don Quixote".

4. Add a logical vector named above.avg to the list that indicates (in the same order) if each grade in grades was above a "C".

newlist = list()
newlist$grades = factor(c("A","C","B","B","D","A"))
newlist$nums = c(3,2.1,3.3,4,1.5,4.9)
newlist$oldlist = mylist[1:3]

# (a)
newlist$grades[which(newlist$nums >= 3)]

# (b)
newlist$third.col.twice = rep(newlist$oldlist[[2]][,3],times=2)

# (c)
newlist$oldlist[[3]] = "Don Quixote"

# (d)
newlist$above.avg = factor(newlist$grades,
ordered=TRUE,
levels=c("F","D","C","B","A"))>"C"
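
The comparison in (d) only works because the factor is ordered: with an unordered factor, > returns NA with a warning. The sketch below isolates that step, assuming the level ordering F < D < C < B < A used in the solution; the names ordered.grades and above are hypothetical.

```r
grades <- factor(c("A", "C", "B", "B", "D", "A"))

# Re-create the factor as ordered, with levels running from worst to best
ordered.grades <- factor(grades,
                         levels = c("F", "D", "C", "B", "A"),
                         ordered = TRUE)

# Element-wise comparison against the level "C"
above <- ordered.grades > "C"
print(above)
```

Only the grades "A" and "B" sit above "C" in this ordering, so the "C" and "D" entries come back FALSE.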

4. Use R to display the following list with its components ordered by name:

my.list = list(b=3,a="bob",c=c(1,2,3),f=TRUE,e=c("x","y","z"),d=37)

> my.list = list(b=3,a="bob",c=c(1,2,3),f=TRUE,e=c("x","y","z"),d=37)

> names(my.list)                 # <-- this is how we can access the names as a
                                 #     vector of strings (of text)
[1] "b" "a" "c" "f" "e" "d"

> order(names(my.list))          # <-- this produces a vector we can use
                                 #     to permute the list elements
                                 #     through a sublist
[1] 2 1 3 6 5 4

> my.list[order(names(my.list))]
$a
[1] "bob"

$b
[1] 3

$c
[1] 1 2 3

$d
[1] 37

$e
[1] "x" "y" "z"

$f
[1] TRUE
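
Because a list can also be indexed by a character vector of names, sorting the names directly is one possible alternative to order(). The following is a sketch only; by.order and by.name are hypothetical names introduced to compare the two approaches.

```r
my.list <- list(b = 3, a = "bob", c = c(1, 2, 3), f = TRUE,
                e = c("x", "y", "z"), d = 37)

# Index by the permutation that sorts the names
by.order <- my.list[order(names(my.list))]

# Index by the sorted names themselves
by.name <- my.list[sort(names(my.list))]

# The two sublists have the same components in the same order
same <- identical(by.order, by.name)
print(same)
```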

5. Write R code that finds the relative frequency of occurrence of each letter in the alphabet for a given string of text. Note, only the frequencies associated with letters of the alphabet should be reported -- spaces and any punctuation should be ignored. Also, both the uppercase and lowercase versions of a letter should be counted for any given letter, when calculating these frequencies.

You may find the following functions helpful:

• strsplit(v,"") splits each string in $v$ into a vector of the characters that comprise it, and returns a list of these vectors.
• tolower(v) takes a vector of strings of text (i.e., a character vector) and returns a vector of those same strings converted to lowercase.
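
A small illustration of the two helper functions, using a made-up input vector rather than the text from the exercise:

```r
v <- c("Hi!", "R")

# strsplit() returns a list with one character vector per input string
chars <- strsplit(v, "")
print(chars)

# tolower() converts every string in the vector to lowercase
lower <- tolower(v)
print(lower)
```

Since strsplit() returns a list even for a single input string, the characters of that string are extracted with [[1]].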

For the following string of text:

message = "While the individual man is an insoluble puzzle, in the aggregate
he becomes a mathematical certainty.  You can, for example, never
foretell what any one man will be up to, but you can say with
precision what an average number will be up to.  Individuals vary,
but percentages remain constant.  So says the statistician.
~Arthur Conan Doyle"


The relative frequencies reported should be:

          a           b           c           d           e           f
0.114391144 0.025830258 0.036900369 0.018450185 0.114391144 0.007380074
          g           h           i           l           m           n
0.018450185 0.036900369 0.077490775 0.055350554 0.029520295 0.084870849
          o           p           r           s           t           u
0.051660517 0.022140221 0.047970480 0.047970480 0.084870849 0.044280443
          v           w           x           y           z
0.018450185 0.022140221 0.003690037 0.029520295 0.007380074
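
One possible approach, offered as a sketch rather than as the intended solution, is to lowercase the text, split it into single characters, keep only those found in the built-in constant letters, and divide the resulting counts by their total. The function name letter.freq is hypothetical, and the short test string stands in for message.

```r
# Relative frequency of each letter that occurs in a single string of text
letter.freq <- function(txt) {
  # Lowercase first so that "A" and "a" are counted together
  chars <- strsplit(tolower(txt), "")[[1]]

  # Keep letters only; this drops spaces, digits, and punctuation
  chars <- chars[chars %in% letters]

  # Count each letter, then divide by the total number of letters
  counts <- table(chars)
  counts / sum(counts)
}

freqs <- letter.freq("Abba, ab!")
print(freqs)
```

With this approach, letters that never occur do not appear in the result, which matches the reported table above, where j, k, and q are absent. Calling letter.freq(message) applies the same steps to the text from the exercise.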