You can make your own functions in R to perform any calculations or tasks you might wish. This can be done with the keyword `function`

.

Let's consider a simple example. Suppose we want to create a function to calculate the two real solutions (assuming they exist) to an arbitrary quadratic $ax^2 + bx + c = 0$. We would like our function to take in a vector of three values (i.e., $a$, $b$, and $c$) and produce a vector of two values that represent our solutions.

A function that does this is defined below. Take a look, and then we'll discuss the various parts involved with its definition...

solve.quadratic = function(coefficients) { a = coefficients[1] b = coefficients[2] c = coefficients[3] discriminant = sqrt(b^2-4*a*c) solution1 = (-b-discriminant)/(2*a) solution2 = (-b+discriminant)/(2*a) return (c(solution1,solution2)) }

If you wish to be able to use this function later, you can make R aware of it in one of two ways. First, you can copy it into a script file, select all of it, and either click the run "Run" button, or hit ⌘-enter (OS X) or Ctrl-Enter (windows), as you would to evaluate any expression. Alternatively, you can type the whole function directly in the console and hit enter.

Once the function has been evaluated, you can use/invoke it whenever you like. As an example of using the function above:

> solutions = solve.quadratic(c(1,5,6)) > solutions [1] -3 -2

In the first line of the first code box above, `solve.quadratic`

is the name we have given to this function. This is the name with which we will invoke it later -- as demonstrated by what is typed in the second code box immediately above.

Then in the second line of the first code box above comes the assignment operator. "`=`

", followed by the keyword "`function`

", and a list of parameters we want this function to use inside parentheses. Here, we require only a single parameter -- a vector that inside this function will be referred to as `coefficients`

.

Then comes the body of the function, delimited by curly braces. Everything inside the body of the function will execute every time the function is invoked. Here, we simply create a couple of more conveniently named variables to help calculate the two solutions.

Last, but not least, we see a `return`

statement. The value returned -- here, that would be `c(solution1,solution2)`

-- is the value the function will take on when invoked later. This is seen to be the case when we type `solutions`

in the last line of the second code box above, to display its elements.

A function can have more than one parameter if desired. One needs only separate them by commas inside the parentheses. This allows us to write the function above in a different way. Before, we lumped the three values $a$, $b$, and $c$ into a single vector that we could pass as the lone input to our function. While this is a useful technique when writing a function that takes as input a vector representing some data (potentially a large amount of data) -- for a function that relies on just a few values, it might be better to give each input its own name, as shown below:

solve.quadratic = function(a,b,c) { discriminant = sqrt(b^2-4*a*c) solution1 = (-b-discriminant)/(2*a) solution2 = (-b+discriminant)/(2*a) return (c(solution1,solution2)) }

This time, we invoke the function in the following way:

> solve.quadratic(1,5,6) [1] -3 -2

When you have more than one parameter for a function, you may find it useful to identify parameters by name when you invoke the function, as shown below. In this way, you can list the parameters in whatever order you wish (even backwards):

> solve.quadratic(c=6,b=5,a=1) [1] -3 -2

This is particularly helpful when there are many parameters and it is difficult to remember which one comes before another.

One has the option to give parameters default values in a function's definition -- whereby one only has to provide values for those parameters not using their default values.

As an example, consider the definition of the `root`

function below that computes a square root by default, unless one provides an `index`

parameter. If an index is provided, that root is calculated instead (e.g., if the index is 3, a cube root should be calculated):

root = function(x,index=2) { return(x^(1/index)) }And below, we invoke the function in a couple of different ways:

> root(25) [1] 5 > root(index=3,8) [1] 2

When writing functions, it is important to understand the concept of variable *scope*.

Variables that are seen only within the body of a function are called **local variables** and these are created when the function is invoked and disappear when the function returns.

Variables created outside of functions are called **global variables** and can be referenced both outside *and* inside functions.

Importantly, when a function is invoked, the values supplied for its parameters are copied to the corresponding local variables. When these local variables are modified inside the function, it does not affect any global variables used in the function's invocation.

See if you can follow all the different assignments below and make sense of the output shown in the following example:

x = 1 # <-- this is a global variable d = c(1,2,3) # <-- this is a global variable y = 15 # <-- this is a global variable mystery = function(x,d,e) { # <-- Inside this function x, d, and e # are variables local to the mystery() function. # # In particular, when x & d are now used inside # this function, they will refer to new and different # variables than the global variables x & d defined # earlier. # # The values given to this new x & d are copies of the # values given to them at invocation (see below). # # On the other hand, if y is used inside this function # it refers to the global y variable defined earlier x = x + y # <-- Here, the local x given at invocation and the global y are # added together and their sum is stored in the local x. print(x) # Then, this local x is printed. d[2]=5 # <-- Here, the local vector d's second element is assigned the value 5 print(d) # and the local d is printed print(e+y) # <-- Here, the sum of the local e given at invocation and the # global y are summed and printed y = 200 # <-- Curiously, this statement has no real effect. This actually # creates a local y -- but as the function is about to end, its # value disappears before it is used. If one had wanted to assign # a value to the global y, one needs to use a special operator # called the "superassignment operator, <<-" y <<- 300 # <-- Here, we use the superassignment operator just mentioned to # actually change the global variable y. } mystery(7,c(8,9,10),100) print(x) print(d) print(y)The first three assignments above (i.e., the initial assignments of the global variables x,d, and y) and the definition of the function

`mystery()`

produce easily predictable outputs. Starting with the function's invocation, predicting what R will do requires a bit more thinking. Below are the corresponding outputs:
> mystery(7,c(8,9,10),100) [1] 22 # <-- printed local x equal to 7 + 15 [1] 8 5 10 # <-- printed local d with 2nd element changed to 5 [1] 115 # <-- printed sum of local e (i.e., 100) and global y (15) > print(x) [1] 1 # <-- printed global x, which was not changed by mystery() > print(d) [1] 1 2 3 # <-- printed global d, also not changed by mystery() > print(y) [1] 300 # <-- printed global y, explicitly changed by mystery()

`sapply(v,f)`

to apply a function $f$ to every element of a vector $v$. As an example, suppose one has defined the function given below:
square = function(n) { return(n^2) }We can apply this function to every integer from 1 to 10 with the following:

sapply(1:10,square) [1] 1 4 9 16 25 36 49 64 81 100Of course, the following also works in this case:

square(1:10) [1] 1 4 9 16 25 36 49 64 81 100Given this last example, one might wonder why R provides

`sapply()`

at all. The answer to that questions lies in the fact that some operations (like exponentiation via the "^") can be applied to any vector of values and work just fine -- but others don't.
While we will save discussing the particulars of it for a later time, the use of an `if`

-conditional statement counts itself among these things that cause problems when applied to vectors of length greater than one.

As an example, consider the function below that returns the value $n/2$ if $n$ is even, and $(3n+1)$ otherwise:

collatz = function(n) { if (n %% 2 == 0) { return(n/2) } else { return(3*n+1) } }

Note what happens when we try to apply the function to the vector `1:10`

directly:

collatz(1:10) > collatz(1:10) [1] 4 7 10 13 16 19 22 25 28 31 Warning message: In if (n%%2 == 0) { : the condition has length > 1 and only the first element will be usedversus using

`sapply()`

to do the same:
sapply(1:10,collatz) [1] 4 1 10 2 16 3 22 4 28 5Only in the second case was the output what we actually wanted. Looks like

`sapply()`

might be valuable after all!