<- function(data_1, data_2 = NULL,
name_of_function argument_2 = TRUE, argument_3 = NULL,
argument_1,
...){# Conduct Task
# Conduct Task
<- Tasks
output_object return(output_object)
}
3 Functional Programming
3.1 Functions
The functionality in R is what makes it completely powerful compared to other statistical software. There are several pre-built functions, and you can extend R’s functionality further with the use of R Packages.
3.1.1 Built-in Functions
There are several available functions in R to conduct specific statistical methods. The table below provides a set of commonly used functions:
Functions | Description |
---|---|
aov() |
Fits an ANOVA Model |
lm() |
Fits a linear model |
glm() |
Fits a general linear model |
t.test() |
Conducts a t-test |
Several of these functions have help documentation that provide the following sections:
Section | Description |
---|---|
Description | Provides a brief introduction of the function |
Usage | Provides potential usage of the function |
Arguments | Arguments that the function can take |
Details | An in depth description of the function |
Value | Provides information of the output produced by the function |
Notes | Any need to know information about the function |
Authors | Developers of the function |
References | References to the model and function |
See Also | Provide information of supporting functions |
Examples | Examples of the function |
To obtain the help documentation of each function, use the ?
operator and function name in the console pane.
3.1.2 Generic Functions
Commonly used functions, such as summary()
and plot()
functions, are considered generic functions where their functionality is determined by the class of an R object. For example, the summary()
function is a generic function for several types of functions: summary.aov()
, summary.lm()
, summary.glm()
, and many more. Therefore, the appropriate function is needed depending the type of R object. This is where generic functions come in. We can use a generic function, ie summary()
, to read the type of object and then apply to correct procedure to the object.
3.1.3 User-built Functions
While R has many capable functions that can be used to analyze your data, you may need to create a custom function for specific needs. For example, if you find yourself writing the same to repeat a task, you can wrap the code into a user-built function and use it for analysis.
To create a user-built function, you will using the function()
to create an R object that is a function. To use the function Inside the funtion()
parentheses, write the arguments that need to specified for your function. These are arguments you choose for the function.
3.1.3.1 Anatomy
In general function we construct a function with the following anatomy:
Here, we are creating an R function called name_of_function
that will take the following arguments: data_1
, data_2
, argument_1
, argument_2
, argument_3
, and ...
. From this function, it requires us to supply data for data_1
and argument_1
. Arguments data_2
and argument_3
are not required, but can be utilized in the function if necessary. Argument argument_2
is also required for the function, but it it has a default setting (in this case TRUE
) if it is not specified. Lastly, the ...
argument allows you to pass other arguments to R built in functions if they are present. For example, we may use the plot()
to create graphics and want to manipulate the output plot further, but do not want to specify the arguments in the user-based function. In the function itself, we will complete the necessary tasks and then use the return()
to return the output.
3.1.3.2 Example
To begin, let’s create a function that squares any value:
<- function(x){x^2} x_square
Above, I am creating a new function called x_square
and it will take values of x
and square it. Here are a couple of examples of x_square()
:
x_square(4)
[1] 16
x_square(5)
[1] 25
The mtcars
data set has several numeric variables that can be used for analysis. Let’s say we want to apply a function (x_square()
) to the sum of a specific variable and return the value. Then let’s further complicate the function by allowing the sum of 2 variables, take the log of the sum and dividing the value if necessary. Below is the code for such function called summing
:
<- function(vec1, vec2 = NULL, FUN, log_val = FALSE, divisor_val = NULL){
summing <- match.fun(FUN)
FUN <- c(vec1, vec2)
wk_vec <- FUN(sum(wk_vec))
fun_sum_val <- NULL
lval if (isTRUE(log_val)){
<- log(fun_sum_val)
lval else {
} <- fun_sum_val
lval
}if (!is.null(divisor_val)){
<- divisor_val
dval else {dval <- 1}
} <- lval/dval
output return(output)
}
Now let’s try obtaining the
sum(mtcars$mpg)^2
[1] 413320.4
summing(mtcars$mpg, FUN = x_square)
[1] 413320.4
log(sum(c(mtcars$mpg,mtcars$disp))^2)
[1] 17.98088
summing(mtcars$mpg, mtcars$disp, x_square, T)
[1] 17.98088
log(sum(c(mtcars$mpg,mtcars$disp))^2)/5
[1] 3.596177
summing(mtcars$mpg, mtcars$disp, x_square, T, 5)
[1] 3.596177
3.2 *apply Functions
*apply functions are used to iterate a function through a set of elements in a vector, matrix, or list. This will then return a vector or list depending on what is requested.
3.2.1 apply()
The apply()
function is used to apply a function to the margins of an array or matrix. It will iterate between the elements, apply a function to the data, and return a vector, array or list if necessary. To use the apply()
function, you will need to specify three arguments, X
or the array, MARGIN
which margin to apply the function on, and FUN
the function.
Below we calculate the row means and column means using the apply function for a \(5 \times 4\) matrix containing the elements 1 through 20:
<- matrix(1:20, nrow = 5, ncol = 4)
x
# Row Means
apply(x, 1, mean)
[1] 8.5 9.5 10.5 11.5 12.5
# Col Means
apply(x, 2, mean)
[1] 3 8 13 18
3.2.2 lapply()
The lapply()
function is used to apply a function to all elements in a vector or list. The lapply()
function will then return a list as the output.
3.2.3 sapply()
The sapply()
function is used to apply a function to all elements in a vector or list. Afterwards, the sapply()
will return a “simplified” version of the list format. This could be a vector, matrix, or array.
3.3 Anonymous Functions
Anonymous functions are functions that R temporarily creates to conduct a task. They are commonly used in the *apply functions, piping or within functions. To create an anonymous function, we use the function()
to create a function .
For example, let x
be a vector with the values 1 through 15. Let’s say we want to apply the function \(f(x) = x^2+\ln(x) + e^x/x!\). We can evaluate the function as the expression in the function:
<- 1:15
x ^2 + log(x) + exp(x)/factorial(x) x
[1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
[7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
[13] 171.565020 198.639071 227.708053
Let’s say we could not do that, we need to evaluate the function for each value of x
. We can use the sapply()
function with an anonymous function:
sapply(x, function(x) x^2 + log(x) + exp(x) / factorial(x))
[1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
[7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
[13] 171.565020 198.639071 227.708053
In R 4.1.0, developers introduce a shortcut approach to create functions. You can create a function using \()
expression, and specify the arguments for your function within the parenthesis. Reworking the previous code, we can use \()
instead of function()
:
sapply(x, \(x) x^2 + log(x) + exp(x)/factorial(x))
[1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
[7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
[13] 171.565020 198.639071 227.708053
sapply(x, \(.) .^2 + log(.) + exp(.)/factorial(.))
[1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
[7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
[13] 171.565020 198.639071 227.708053
Notice that the argument in the anonymous function can be anything.