Creating your own functions

Although many programmers have created many thousands of functions for R users, it is sometimes useful to write your own. For example, you might want to calculate a specific value with an equation you found in a paper, or you might want to create a custom graph and use it to make many graphs. Regardless of the reason, below are the general steps you would take to create your own function.

1 Arguments

Funtions are code (or mathematical equations) that do something. A function might always do exactly the same things, in which it would not need additional information from the user. However, if the function is intended to provide a different answer in different situations, then we must include at least one argument (but we might want to include more than one). You can think of arguments as user inputs for the function.

Let’s creat a simple function to illustrate these ideas. Below we create a function that adds 2 to the number that is provided by the user. So, we will need an argument in our function so we can get from the user the number to add two. We can create new functions with function(). The arguments (in this case just one argument) we want to create for our function go inside the parentheses. I can make up the name of arguments, but in this case I will use “x”. I will call the new function “add_two”, but I could use used another name. Here we go.

add_two <- function(x) x + 2

Let’s see if it worked. If I provide a value of 4 to the argument “x”, then the function should add 2 and return the number 6. In other words, you can think of the value for the argument “x” is “passed onto” the code after or below the parentheses.

add_two(4)
## [1] 6

Nice!

Of course we can provide different numbers to our new function or even a vector of numbers.

add_two(10)
## [1] 12
add_two(1:10)
##  [1]  3  4  5  6  7  8  9 10 11 12

I know, pretty cool.

Let’s modify our function so the user can determine whether we should add or subtract the number two from “x”–I will call this function “add_sub_two” because the user can add or subtract 2. In this case, we need another argument. I will call it “add”. If the user gives the value TRUE for the argument “add” then we will add 2 and if the user provides the value FALSE then we will subtract two. Notice that I use the curly brackets because want to include code on the following lines for this function. To add the logic or determining whether to add or subtract, I use if().

add_sub_two <- function(x, add) {
  if(add == TRUE) x + 2 else x - 2
}

2 Let’s make an easy button

Today a student asked me whether there was function that would calculate a bunch of descriptive stats for numerical data. I suggested summary(), but they countered with “but what about the measures of spread?”. So, let’s start with how you might create this function. I won’t create a function that will do it all for you, but I will provide you with the tools so you can create your own (you are welcome).

2.1 Use the function function()

Like always, we can create our own custom function with function(). We can create the function with any number of arguments, from zero to a lot, and default or user-required values for the those arguments. Let’s start with creating a function that will print something in the console.

desc_stats <- function() {
  cat("Mean =")
}
#The function has no arguments right now
desc_stats()
## Mean =

We can use cat() to print stuff out, without quotations marks, in the console. Of course, we now want to add the mean of data, because just printing “Mean =” is a pretty lame function. So, we need to add an argument to our new function so we can get the data from the user

desc_stats <- function(x) {
  cat("Mean =", mean(x))
}
#We now need to provide the function with some data
desc_stats(1:10)
## Mean = 5.5

Hopefully you noticed that we added the argument x to our function. Of course, we could have named the argument anything, but x (as is y) is a common letter to represent data.

Let’s now add the median. We want to print the median on the next line, so we use the “” to tell R to go to a new line.

desc_stats <- function(x) {
  cat("Mean =", mean(x), "\n")
  cat("Median =", median(x))
}

desc_stats(1:10)
## Mean = 5.5 
## Median = 5.5

Maybe you want to include a couple metrics of spread. We just need to add a few lines to calculate and print these metrics.

desc_stats <- function(x, center = TRUE, spread = TRUE) {
  cat("Mean =", mean(x), "\n")
  cat("Median =", median(x), "\n")
  cat("Variance =", var(x), "\n")
  cat("Standard deviation =", sd(x))
}
desc_stats(1:10)
## Mean = 5.5 
## Median = 5.5 
## Variance = 9.166667 
## Standard deviation = 3.02765

2.2 Adding more arguments

I am sure you can see how to add more descriptive stats to the function. So, I will now show you how you can add options for the user. We will add a new argument and slightly modify the printout. The new argument allows the user to select whether they want metrics of “location”, “spread”, or “both”. Notice that results = “both” provides a default value for this argument. So, if the user only provides the data, then the function will print out measures of location and spread by default. We use if() to execute the code base on what the user has selected. Notice that I use “ to tab the output. I also added the number of observations in the data to the bottom of the output.

desc_stats <- function(x, results = "both") {
     
  if(results == "both" | results == "location") {
    cat("Measures of location", "\n")
    cat("\t", "Mean =", mean(x), "\n")
    cat("\t", "Median =", median(x), "\n \n")    
  }

  if(results == "both" | results == "spread") {
    cat("Measures of spread", "\n")
    cat("\t", "Variance =", var(x), "\n")
    cat("\t", "Standard deviation =", sd(x), "\n \n")
  }  
  
  cat("n =", length(x))
}
desc_stats(1:10)
## Measures of location 
##   Mean = 5.5 
##   Median = 5.5 
##  
## Measures of spread 
##   Variance = 9.166667 
##   Standard deviation = 3.02765 
##  
## n = 10
desc_stats(1:10, results = "both")
## Measures of location 
##   Mean = 5.5 
##   Median = 5.5 
##  
## Measures of spread 
##   Variance = 9.166667 
##   Standard deviation = 3.02765 
##  
## n = 10
desc_stats(1:10, results = "location") 
## Measures of location 
##   Mean = 5.5 
##   Median = 5.5 
##  
## n = 10
desc_stats(1:10, results = "spread") 
## Measures of spread 
##   Variance = 9.166667 
##   Standard deviation = 3.02765 
##  
## n = 10

This is looking pretty good. But… Let’s add a little more code to provide the user with a snarky response if they don’t provide numeric data for the argument x, or give an invalid value for the argument results.

desc_stats <- function(x, results = "both") {
  if(!is.numeric(x)) stop("The data in x must be numeric. How do you expect me to calculate the mean or variance of a non-numeric value?")
  
  if(!(results %in% c("location", "spread", "both"))) stop("Options for the results are: 'location', 'spread', or 'both' -- and not whatever you put.")
     
  if(results == "both" | results == "location") {
    cat("Measures of location", "\n")
    cat("\t", "Mean =", mean(x), "\n")
    cat("\t", "Median =", median(x), "\n \n")    
  }

  if(results == "both" | results == "spread") {
    cat("Measures of spread", "\n")
    cat("\t", "Variance =", var(x), "\n")
    cat("\t", "Standard deviation =", sd(x), "\n \n")
  }  
  
  cat("n =", length(x))
}
desc_stats("testing") #Give x a non-numeric object
## Error in desc_stats("testing"): The data in x must be numeric. How do you expect me to calculate the mean or variance of a non-numeric value?
desc_stats(1:10, results = "stats") #Give an invalid value for results
## Error in desc_stats(1:10, results = "stats"): Options for the results are: 'location', 'spread', or 'both' -- and not whatever you put.