## Exercises - Measures of Center and Spread

1. Given the data set $1, 1, 2, 3, 5, 8$, find the mean, median, mode, and midrange.

$\overline{x} \doteq 3.3333$
Median $= 2.5$
Mode $=1$
Midrange $=4.5$

On a TI-83 calculator, you can find the mean and median as follows:

1. STAT : EDIT : Edit...

2. Enter values in the list marked "L1"

3. STAT : CALC : 1-Var Stats

4. The mean is listed as $\overline{x}$, while you may have to scroll to see the median, which is labeled "Med"

In R, you can calculate the mean, median, and midrange easily -- the mode takes a little more effort:

    # Define this function if you absolutely must find modes in R:

    getmode = function(v) {
      uniques = unique(v)
      counts = tabulate(match(v, uniques))
      max.count = max(counts)
      return(uniques[counts == max.count])
    }

    # From there, things get easy:

    data = c(1, 1, 2, 3, 5, 8)

    mean(data)                  # <-- calculates the mean
    median(data)                # <-- calculates the median
    getmode(data)               # <-- calculates the mode, presuming you defined
                                #     getmode() as shown above
    (min(data) + max(data))/2   # <-- calculates the midrange


2. For the data set $1, 1, 2, 3, 5, 8$, what are the range, variance, and IQR?

Range $=8-1=7$

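Variance (noting that the mean is $10/3$): $$\frac{(1-10/3)^2 + (1-10/3)^2 + (2-10/3)^2 + (3-10/3)^2 + (5-10/3)^2 + (8-10/3)^2}{5} \doteq 7.4667$$

IQR $=5-1=4$, where $Q_1 = 1$ and $Q_3 = 5$ are the medians of the lower half $1, 1, 2$ and the upper half $3, 5, 8$.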

On a TI-83 calculator, assuming the data values have been entered into the list L1 already, simply use the "1-Var Stats" option again:

1. STAT : CALC : 1-Var Stats

2. The sample standard deviation is listed as Sx; square this value to find the sample variance. If the contents of L1 actually represent an entire population (this is rare), use the population standard deviation, listed as σx, instead. The range is found by subtracting minX from maxX, and the IQR by subtracting Q1 from Q3 (you may have to scroll to see these values).

In R, calculating these statistics is very direct:

    data = c(1, 1, 2, 3, 5, 8)
    max(data) - min(data)    # <-- calculates the range
    var(data)                # <-- calculates the (sample) variance
    IQR(data)                # <-- calculates the IQR (interquartile range); R's default
                             #     quartile rule gives 3.25 here rather than 4
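
One caveat when checking these answers in R: `IQR()` interpolates between data values by default (quantile `type = 7`), so it does not reproduce the hand-computed IQR of 4 for this data set. The sketch below computes the variance straight from its definition and then compares the default IQR with `type = 2`, which happens to agree with the median-of-halves quartiles here. That agreement is an observation about this data set, not a general guarantee.

```r
data <- c(1, 1, 2, 3, 5, 8)

# Sample variance from the definition: squared deviations divided by n - 1
manual_var <- sum((data - mean(data))^2) / (length(data) - 1)

# Default quartile rule (type = 7) interpolates: Q1 = 1.25, Q3 = 4.5
iqr_default <- IQR(data)

# type = 2 selects Q1 = 1 and Q3 = 5 for this data set
iqr_type2 <- IQR(data, type = 2)

print(c(manual_var = manual_var, built_in_var = var(data)))
print(c(iqr_default = iqr_default, iqr_type2 = iqr_type2))
```

The two variance figures agree, which confirms that `var()` uses the $n - 1$ denominator shown in the worked fraction.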


3. For the following data set: $$\begin{array}{ccccc} 171 & 186 & 191 & 204 & 235\\ 173 & 186 & 193 & 204 & 239\\ 174 & 186 & 197 & 209 & 240\\ 181 & 187 & 199 & 210 & 242\\ 182 & 188 & 200 & 211 & 243\\ 184 & 191 & 200 & 218 & 320\\ \end{array}$$

1. What is the standard deviation?
2. What percentage of the data lies within 1 standard deviation of the mean?
3. What percentage of the data lies within 2 standard deviations of the mean?
4. Is the result of part 3 consistent with Chebyshev's rule?
5. What percentage of the data lies within 3 standard deviations of the mean?
6. Is the result of part 5 consistent with Chebyshev's rule?
7. Are there any outliers? (Use both rules for determining outliers.)

The sample mean is $\overline{x} = 204.8$, which leads to the following answers:

1. $s \doteq 30.3058$
2. $22$ values are between $\overline{x} - s \doteq 174.49$ and $\overline{x} + s \doteq 235.11$, so $22/30 \doteq 0.7333$, or about $73.3\%$
3. $29$ values are between $\overline{x} - 2s \doteq 144.19$ and $\overline{x} + 2s \doteq 265.41$, so $29/30 \doteq 0.9667$, or about $96.7\%$
4. Yes. Chebyshev's rule guarantees a proportion of at least $1 - 1/2^2 = 0.75$, and $0.9667 \ge 0.75$
5. $29$ values are between $\overline{x} - 3s \doteq 113.88$ and $\overline{x} + 3s \doteq 295.72$, so $29/30 \doteq 0.9667$, or about $96.7\%$
6. Yes. Chebyshev's rule guarantees a proportion of at least $1 - 1/3^2 \doteq 0.8889$, and $0.9667 \ge 0.8889$
7. $320$ is an outlier by both rules. With $Q_1 = 186$, $Q_3 = 211$, and $IQR = 25$, the fences are $Q_1 - 1.5 \cdot IQR = 148.5$ and $Q_3 + 1.5 \cdot IQR = 248.5$, and $320$ lies outside this range. It is also more than 3 standard deviations from the mean, based on the calculations in part 5.
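
These counts can be double-checked with a short R sketch. The helper name `within_k` is made up for this illustration; it reports the proportion of values within $k$ sample standard deviations of the mean, shown alongside the Chebyshev lower bound $1 - 1/k^2$ (which is $0$, and so uninformative, for $k = 1$).

```r
x <- c(171, 173, 174, 181, 182, 184,
       186, 186, 186, 187, 188, 191,
       191, 193, 197, 199, 200, 200,
       204, 204, 209, 210, 211, 218,
       235, 239, 240, 242, 243, 320)

xbar <- mean(x)
s <- sd(x)   # sample standard deviation (n - 1 denominator)

# Hypothetical helper: proportion of values within k standard deviations of the mean
within_k <- function(k) mean(abs(x - xbar) <= k * s)

k <- 1:3
observed <- sapply(k, within_k)
chebyshev <- 1 - 1 / k^2

print(round(data.frame(k, observed, chebyshev), 4))

# Values more than 3 standard deviations from the mean
far_out <- x[abs(x - xbar) > 3 * s]
print(far_out)
```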

4. For a distribution with a mean of 80 and a standard deviation of 10, at least what percentage of values will fall

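1. between 60 and 100?
2. between 65 and 95?

Both intervals are centered at the mean, so Chebyshev's rule applies with $k$ equal to the half-width of the interval divided by the standard deviation:

1. The interval is $80 \pm 2(10)$. By Chebyshev's rule, the proportion within 2 standard deviations of the mean is at least $1-1/2^2 = 0.75$, or $75\%$.
2. The interval is $80 \pm 1.5(10)$. By Chebyshev's rule, the proportion within 1.5 standard deviations of the mean is at least $1-1/1.5^2 \doteq 0.5556$, or about $55.6\%$.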
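
A minimal sketch of the same reasoning in R, assuming the interval is symmetric about the mean. The function name `chebyshev_bound` and its arguments are invented for this example.

```r
# Hypothetical helper: Chebyshev lower bound for an interval centered at the mean
chebyshev_bound <- function(lower, upper, mu, sigma) {
  # The rule is stated for intervals of the form mu +/- k * sigma
  stopifnot(isTRUE(all.equal(mu - lower, upper - mu)), sigma > 0)
  k <- (upper - mu) / sigma
  if (k <= 1) return(0)   # the rule gives no information for k <= 1
  1 - 1 / k^2
}

bound_60_100 <- chebyshev_bound(60, 100, mu = 80, sigma = 10)
bound_65_95 <- chebyshev_bound(65, 95, mu = 80, sigma = 10)
print(round(c(bound_60_100, bound_65_95), 4))
```

Note that $k$ does not need to be a whole number; the bound holds for any $k > 1$.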

5. In the simple random sample of lengths (in hours) of space shuttle flights given below, is there a time that is unusual? How might this flight time be explained? $$0, \ 73, \ 95, \ 235, \ 192, \ 165, \ 262, \ 191, \ 376, \ 259, \ 235, \ 381, \ 331, \ 221, \ 244$$

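With $Q_1 = 165$, $Q_3 = 262$, and $IQR = 97$, the lower fence is $Q_1 - 1.5 \cdot IQR = 19.5$. Since $0 \lt 19.5$, the flight time of $0$ hours is an outlier. This data value corresponds to the catastrophic explosion of the Challenger shortly after take-off.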
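
The fence check can be reproduced with a minimal R sketch. The helper `half_quartiles` is a made-up name; it takes the quartiles to be the medians of the lower and upper halves of the sorted data (leaving out the middle value when $n$ is odd), which is the convention assumed here. R's built-in `quantile()` uses a different default rule and may give slightly different fences.

```r
hours <- c(0, 73, 95, 235, 192, 165, 262, 191, 376, 259, 235, 381, 331, 221, 244)

# Hypothetical helper: quartiles as medians of the lower and upper halves
half_quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- floor(n / 2)   # size of each half; the middle value is skipped if n is odd
  c(Q1 = median(x[1:half]), Q3 = median(x[(n - half + 1):n]))
}

q <- half_quartiles(hours)
iqr <- q[["Q3"]] - q[["Q1"]]
lower_fence <- q[["Q1"]] - 1.5 * iqr
upper_fence <- q[["Q3"]] + 1.5 * iqr

# Values outside the fences are flagged as outliers
flagged <- hours[hours < lower_fence | hours > upper_fence]
print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(flagged)
```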

6. Of the following playing times (in seconds) of a random selection of pop songs, is there one that seems significantly different from the others? If there is, and it is deleted, comment on whether or not the mean, median, standard deviation, and IQR change by a significant amount. $$448, \ 242, \ 231, \ 246, \ 246, \ 293, \ 280, \ 227, \ 244, \ 213, \ 262, \ 239, \ 213, \ 258, \ 255, \ 257$$

With $Q_1 = 235$, $Q_3 = 260$, and $IQR = 25$, the upper fence is $Q_3 + 1.5 \cdot IQR = 297.5$, making $448$ an outlier. If $448$ is removed, the mean decreases from approximately $259.6$ to $247.1$, while the median remains unchanged at $246$. The effect on the standard deviation is even more dramatic: it drops from approximately $54.5$ to $22.0$ upon removal of the outlier, while the IQR changes only slightly, from $25$ to $27$. Clearly, the median and IQR are not nearly as affected by outliers as the mean and standard deviation.
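
The before-and-after comparison can be sketched in R as follows. The helper name `spread_summary` is invented for this example, and `type = 2` is an assumption: it is the `quantile()` rule that reproduces the median-of-halves quartiles for these two sample sizes, whereas R's default rule would report different IQR values.

```r
times <- c(448, 242, 231, 246, 246, 293, 280, 227, 244, 213,
           262, 239, 213, 258, 255, 257)

# Hypothetical helper: the four statistics the exercise asks about
spread_summary <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x, type = 2))
}

# One row per version of the data set: with and without the suspected outlier
comparison <- rbind(with_448 = spread_summary(times),
                    without_448 = spread_summary(times[times != 448]))
print(round(comparison, 1))
```

Reading down each column shows which statistics are resistant: the median and IQR columns barely move, while the mean and standard deviation columns change substantially.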

7. Nicotine amounts (in mg per cigarette) for random samples of filtered and nonfiltered cigarettes are given below. Use appropriate statistics to compare these two samples. $$\begin{array}{rl} \textrm{Nonfiltered:} & 1.1, \ 1.1, \ 1.7, \ 1.6, \ 1.1, \ 1.2, \ 1.1, \ 1.3, \ 1.0, \ 1.3, \ 1.1, \ 1.1\\ \textrm{Filtered:} & 0.4, \ 0.2, \ 1.2, \ 1.0, \ 0.8, \ 1.0, \ 1.1, \ 1.1, \ 1.1, \ 0.6, \ 0.8, \ 1.1 \end{array}$$

The mean nicotine amount for filtered cigarettes is lower than the mean for nonfiltered cigarettes ($\overline{x}_{\textrm{filtered}} \doteq 0.8667 \lt \overline{x}_{\textrm{nonfiltered}} = 1.225$), but the filtered sample is also more variable, with a higher standard deviation ($s_{\textrm{filtered}} \doteq 0.3172 \gt s_{\textrm{nonfiltered}} \doteq 0.2179$).
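
As a quick check, both samples can be summarized side by side in R. This is only a sketch; the variable names are chosen for illustration.

```r
nonfiltered <- c(1.1, 1.1, 1.7, 1.6, 1.1, 1.2, 1.1, 1.3, 1.0, 1.3, 1.1, 1.1)
filtered <- c(0.4, 0.2, 1.2, 1.0, 0.8, 1.0, 1.1, 1.1, 1.1, 0.6, 0.8, 1.1)

samples <- list(nonfiltered = nonfiltered, filtered = filtered)

# One column per sample, one row per statistic
stats <- sapply(samples, function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 4))
```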

8. Suppose a population has mean 161 and a standard deviation of 7. What can one say about the percentage of values in the population that are within 14 units of the mean?

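Since $14 = 2 \cdot 7$, values within 14 units of the mean are within 2 standard deviations of it. By Chebyshev's rule, at least $1 - 1/2^2 = 0.75$, or $75\%$, of the values in the population lie in this range.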