In R, finding means and medians can be found with similarly named functions, as seen below.

> data = c(1,2,2,20,4,5,7,3) > mean(data) [1] 5.5 > median(data) [1] 3.5

There is no built-in function for identifying the mode, although one can write a custom function to this end. Importantly, avoid naming this custom function "mode" (although "Mode" with a capital "M" is ok, as R is case-sensitive). This is due to the fact that there is another function named "mode" that is used for something else in R. An example of such a function is given below, along with an example of its application:

> data = c(1,2,2,20,4,5,7,3) > Mode = function(x) {} > Mode = function(x) { + ux = unique(x) + ux[which.max(tabulate(match(x,ux)))] + } > Mode(data) [1] 2

In Excel, the mean, median, and mode of a given range $R$ can be quickly calculated with the following functions, respectively: $$AVERAGE(R), \quad MEDIAN(R), \quad \textrm{and} \quad MODE(R)$$

In R, one can calculate the variance, standard deviation, and interquartile range (IQR) with functions `var()`

, `sd()`

, and `IQR()`

, as seen in the examples below:

> data = c(1,2,2,20,4,5,7,3) > var(data) [1] 38 > sd(data) [1] 6.164414 > IQR(data) [1] 3.5

Of course, if one needed the population variance or population standard deviation, one could multiply the variance by the appropriate constant:

> data = c(1,2,2,20,4,5,7,3) > n = length(data) > n [1] 8 > pop.var = var(data) * (n-1) / n > pop.var [1] 33.25 > pop.sd = sqrt(pop.var) > pop.sd [1] 5.766281

The range of values in a data set can also be easily found with the `min()`

, `max()`

, and `range()`

, as shown below:

> data = c(1,2,2,20,4,5,7,3) > min(data) [1] 1 > max(data) [1] 20 > range(data) [1] 1 20

In Excel, similar statistics related to the spread of data in a given range $R$ are given by the following functions or expressions:

Statistic | Function |
---|---|

Population Variance | `VAR.P(R)` |

Sample Variance | `VAR.S(R)` |

Population Standard Deviation | `STDEV.P(R)` |

Sample Standard Deviation | `STDEV.S(R)` |

Inter-quartile Range (IQR) | `QUARTILE(R,3) - QUARTILE(R,1)` |

Maximum | `MAX(R)` |

Minimum | `MIN(R)` |

`quantile()`

function, which gives the cutoff data value under which some percentage of the data lie.
As an example of how this function works, suppose we have the following variable defined:

> grades = c(86,92,100,93,89,95,79,98,68,62,71,75,88,86,93,81,100,86,96,52,15)We can find $Q_1$ with

`quantile(grades,0.25)`

, since 25% of the data lies below $Q_1$.
Similarly, `quantile(grades,0.75)`

will give $Q_3$.

Using these, we can quickly identify any outliers with the following:

> lower.bound = quantile(grades,0.25)-1.5*IQR(grades) > upper.bound = quantile(grades,0.75)+1.5*IQR(grades) > c(lower.bound,upper.bound) 25% 75% 48 120 > grades[grades < lower.bound | grades > upper.bound] [1] 15Thus, $15$ is an outlier as it falls below $Q_1 = 48$.

Of course, one can also identify outliers under appropriate circumstances by finding those data values that fall outside of 3 standard deviations from the mean. This is done in a straight-forward way, as shown in the example below:

> grades = c(86,92,100,93,89,95,79,98,68,62,71,75,88,86,93,81,100,86,96,52,15) > lower.bound = mean(grades) - 3*sd(grades) > upper.bound = mean(grades) + 3*sd(grades) > c(lower.bound,upper.bound) [1] 21.58463 140.79632 > grades[grades < lower.bound | grades > upper.bound] [1] 15