Introduction to R

1 What is R

R is an open-source, free program for statistics that is maintained by a small group of statisticians. The program comes with a core set of packages that enable certain functions. You can add additional functions by installing and loading other packages. Although the program seems very simple, it is able to do many things. For example, you can perform statistical tests, create complex mathematical models, graph, documents, and static and interactive webpages in R. For graphing, there are basic graphing functions that will build graphs for you. However, you can also build very complex graphs from scratch and modify every aspect of a graph.

2 R Studio

The user interface that comes with the program R is very simple. As a result, some very kind people have developed other programs to make using R much easier. R Studio is one of these programs, and is the one that I recommend you use. The program is free, and available for Windows, Mac, and Linux operating systems.

Open R Studio, which will also start R. Now go to File, New file > R script. You should see four windows. The upper left is the R script you just opened. This window is where you can type R code and save it for later use. The lower left window is the Console and is the equivolent to R Console in R. You can either type code directly into the console and run it or send code from your R script to the console and run it.

In the upper right window there are two tabs: Environment and History. The environment tab provides information about variables or data that you have created during your R session. The history tab provides information about what code you have run during your R session. In the lower right window there are five tabs: Files, Plots, Packages, Help, and Viewer. The file tab provides information about folders and files on our computer. The plots tab shows any plots that you create. The packages tab provides information about the packages that are installed and loaded. The help tab provides help information. The viewer tab can display local web content. I use the plots, packages, and help tabs all the time.

There is also a bunch of other utilities that R Studio provides to help users more easily use R. Please ask me questions, if you are curious about something you see in R Studio.

3 Today you will learn…

  • How to use the help
  • How to create and name objects
  • About functions
  • How to use scripts

4 Getting started

Open up R or RStudio on your computer. There are two general ways to tell R what to do. The first is to just type into the ``R Console’’ window. When you complete your code, hit enter and R will execute your code. The other is to open a file called a script and type the code into this window, highlight the code you want R to run, and then tell R to execute that code (Ctr Enter in Windows and Cmd return in Mac OS).

The R Console and scripts each have advantages. In the R Console window you can quickly run and recall previous commands. With scripts you can easily organize and save your code to run later. Let’s begin by just working with the R Console.

To begin, let’s use R like a calculator to get an idea of how the program works. Type in the following and observe the output. The # is used to add comments (R will not execute anything after #), so you don’t have to type the number signs or what follows them on a line.

2 + 2
6 - 4
3 * 6
4 / 8
2^3 #exponent
2 - 3 * 4
(2 - 3) * 4 #parentheses change the order of operations
sqrt(2) #square root
2^(1/2) #also square root
pi
sin(pi)
exp(1) #e^1
log(10) #natural log
log(10, 10) #log base 10

The number sign, #, tells R to not execute the code following the number sign on a line. Anything that follows a number sign on a line is called a comment.

5 Errors and warnings

R will provide errors or warnings when you do something that it doesn’t like. Typically the error or warning messages are helpful. Thus, read them carefully and use the information to help correct the problem. If you don’t understand what the error means you can often get help by copying the error or warning message and pasting it in Google. Let’s create a couple of errors and warnings so you can see what they look like–by the end of the class you will have seen thousands.

squareroot(2)
sqrt 2
sqrt(-2)

If you forget to add the closing parentheses and hit return then R will just assume that you are going to continue to type more on the next line. You can use ESC to exit and get to a new line. Try the following.

sin(3     #hit ENTER
)         #hit ENTER
sin(3     #hit ENTER
          #hit ESC

ESC will abort a calculation.

7 Assigning names

Probably the most important thing to learn in R is how to create, name, and assign values to objects (sometimes called variables). So, let’s start by creating some objects and assigning a name and value to them. There are four ways to do this.

a = 1
b <- 2
c <- 3
4 -> d

If you type a in the console and hit return, R should return 1. What do you think R will return if you type b, c, or d and hit return? You have now created 4 objects, which you should see in the environment tab in the upper right window (in R Studio). Notice that all four ways produce the same result. The <- or -> syntax is often used because = also represents the assignment of values to arguments in functions and can make your code confusing. You will learn about functions and arguments next.

Now that you have assigned names to values, you can just use the object names. In this case, you can use the letters, a, b, c, or d, to perform calculations. For example, try multiplying a by b. Did you get the answer 4? I hope not because the correct answer is 2.

When creating objects in R, it is worthwhile to use names that are meaningful but short. This way you can remember what the object represents but do not have to type a lot. For naming objects in R, periods or underscores are often used to separate words. For example, if I wanted to create an object with the value of a population mean, I might name it pop.mean. The value of an object can take the form of a number, word, vector, list of numbers or words, matrix, or data frame for example. You can also create functions and assign them names to do simple or complex calculations. You will learn about functions next, and data types shortly.

8 Functions

Although it is fun to create silly names for objects and perform simple calculations, R is much more advanced because the developers have included many functions to do all kinds of things. You will use functions to do just about everything in R, and learning how to use R is basically learning what functions to use and when.

In computer programming, a function is a set of commands that do something, typically return an answer to the user. A function has two important parts to it, the name of the function and the arguments. The name of a function is what you type to tell R which function you want to use. The arguments, which there might be zero or many, are variables that are used in a function. For example, let’s use the sum function as an example. The name of the function is sum, and you include parentheses after the name of the function. You provide values for required arguments or change the default values of arguments by typing information between the parantheses. The sum function requires the numbers that you want to sum up (as you might expect, this is a required argument). The form of the argument is a vector of numbers, which we will create with the c function. Don’t confuse the object you created earlier and named c with the function c.

sum(c(1, 2, 3, 4))
## [1] 10

Notice that there two functions, sum and c, and therefore two sets of parentheses.

Now you try to use the functions mean, min, and max to calculate the mean, minimum, and maximum value of the vector c(1, 2, 3, 4).

We are now going to create our own function to see how the developers of R create functions and the associated arguments. There is a function to create a function in R, and its name is function. Let pretend that we are in the third grade and have a crush on someone. In my case, I will have a crush on Melissa. But I want to create a function that my classmates can use to indicate their crushes. So, I am going to use a few functions to create my new function. The function function() will create my new function, the function paste() will stick some words together, and the function print() will return the words we stick together.

crush <- function(name_1, name_2) {
  print(paste(name_1, "LOVES", name_2))
}
crush(name_1 = "Ben", name_2 = "Melissa")
## [1] "Ben LOVES Melissa"

As you can see, I named my new function crush, and it has two required arguments, which are named name_1 and name_2 (I just made these names up). I put quotes around LOVES, Ben, and Melissa to tell R that these are strings (i.e., letters). If I didn’t put quotes around LOVES, Ben, and Melissa, then R would tell us that it cannot find the objects LOVES, Ben, and Melissa. Currently the function crush() has two required arguments, but we can provide default values for these arguments so that the function will work even if we don’t specify the arguments. First run the crush(). It should return an error telling you that you forgot to provide values for the required arguments. Now let’s modify our function and provide default values for the functions.

crush <- function(name_1 = "Ben", name_2 = "Melissa") {
  print(paste(name_1, "LOVES", name_2))
}
crush()
## [1] "Ben LOVES Melissa"

Now the arguments in the function default to Ben and Melissa, but you can still change the defults if you want to.

crush("Elton", "David")
## [1] "Elton LOVES David"

Notice that I didn’t write the names of the arguments and then an equal sign, like crush(name_1 = “Elton”, name_2 = “David”). In R, if you provide the arguments in the order in which the function was created, then you do not need to provide the name of the argument. R will just understand that the first argument is name_1 and the second argument is name_2. R will also understand if you rearrange the order of the functions, but provide the name of the argument and the its value. For example, crush(name_2 = “David”, name_1 = “Elton”) will return the same result as above. Lastly note that arguments are always separated by commas.

9 Using help files

It is important that you quickly learn how to get help in R and understand what the help files tell you. So ask questions if you are confused or curious about something. You type the a question mark before the name of a function to pull up the help file for that function. Type in ?sum into the console and hit return. The help file will come up. The first bit of information in the upper left corner is the name of the function and package in which the function is located. In this case, the sum function is in the base package. The help file provides a description, usage, arguments, details, and often examples (though not for this function). The usage, arguments, and examples are typically the most useful information. For the sum function, you can see there two arguments, and na.rm. The first tells us that R is not sure how many arguments the user might include because it doesn’t know how many numbers we would like to add up, and it can take the form of a numeric or complex number or a vector of numeric or complex numbers. Notice there is no default value for the first argument (because there is not an equal sign with a value following the name of the argument), so you need to supply one. The second argument tells R whether to remove missing values. In R, missing values are indicated by NA. The argument requires a logical value of TRUE or FALSE, and has the default value of FALSE. So, if you do not specify the value for the na.rm argument, the function will work but not remove missing values. Now call up the help file for the mean and paste functions. Discuss the help files with others at your table.