Isaac Vock
January 21, 2025
This worksheet will walk you through some basic concepts in R. I would suggest copying the code shown here into an R script and running it yourself so that you can play around with the examples.
The simplest use case of R is using it to do math:
1 + 1 # Addition
[1] 2
10 - 1.5 # Subtraction
[1] 8.5
2 * 3 # Multiplication
[1] 6
10.12 / 17.99 # Division
[1] 0.5625347
5 ^ 2 # Exponentiation
[1] 25
You can store numbers in “variables”. A variable is like a special box in your computer’s memory labeled with a name (like my_number). When you put a number into this box (for example, 10), we say you have assigned the value 10 to the variable my_number.
In R, you’d do this by writing:
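Based on the name my_number and the value 10 from the paragraph above, the assignment would look something like this:

```r
# Put the value 10 into the "box" labeled my_number
my_number <- 10

# Either of these lines prints the stored value to the console
print(my_number)
my_number
```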
Typing and executing print(my_number), or just my_number, will print the value of the variable to your console.
Here is what’s happening in this code:
- my_number is the label on the box in memory.
- <- is like an arrow pointing from the value to the box, meaning “put this value in that box”.
You can then do math with variables just like you would with regular numbers:
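For example, if my_number holds 10 as above, here are a few calculations with it. Each line produces a new value but leaves my_number itself untouched:

```r
my_number <- 10

my_number + 5    # returns 15
my_number * 2    # returns 20
my_number / 4    # returns 2.5
my_number ^ 2    # returns 100
```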
my_number does not change value in any of the above lines. To change the value of my_number, you would have to assign it a new value with <- again:
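Here is a small sketch of that reassignment. Doing math with my_number alone leaves it unchanged. Assigning the result back with <- replaces the old value:

```r
my_number <- 10
my_number + 5               # returns 15, but my_number is still 10

# Reassign: the box now holds the new value
my_number <- my_number + 5
my_number                   # now 15
```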
You can store more than numbers in variables. For example, you can store text, which is referred to as a “string”:
You tell R that you are storing text by wrapping that text in double quotes ("") or single quotes ('').
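A short example of storing strings. The variable names greeting and name are just illustrative:

```r
# Double quotes and single quotes both create a string
greeting <- "Hello"
name <- 'friend'

print(greeting)
class(greeting)   # "character", R's name for the string type
```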
Below are some useful tools that R provides for working with strings. These are called functions, a concept discussed later.
- paste(..., sep = " "): paste() allows you to stitch together multiple strings, with a chosen separator text between them (the sep argument). Having no separator (sep = "") is identical to using a different function, paste0():
[1] "Hello friend."
[1] "Hellofriend."
[1] "Hellofriend."
[1] "Hello friend. It's been too long"
[1] "Hello friend."
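Calls along these lines produce outputs like the ones shown above. The inputs first and second are illustrative, since the original calls are not shown:

```r
first <- "Hello"
second <- "friend."

paste(first, second)              # "Hello friend." (default sep = " ")
paste(first, second, sep = "")    # "Hellofriend."
paste0(first, second)             # "Hellofriend." (same as sep = "")
paste(first, second, sep = "-")   # "Hello-friend."
```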
- nchar(): This will give you the number of individual characters in your text string.
- gsub(pattern, replacement, x): This allows you to look for the string pattern in the query string x, and replace it with the string replacement.
- grepl(pattern, x): This is similar to gsub(), but just searches for the string pattern in the string x and spits out TRUE if it finds it (and FALSE if it doesn't).
There is a whole R package called stringr devoted to making working with strings in R easier and more intuitive, so you might want to look into that as well!
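Here is a quick illustration of nchar(), gsub(), and grepl() on one example string. The string itself is made up for the example:

```r
text <- "Hello friend."

nchar(text)                                             # 13 (spaces and punctuation count)
gsub(pattern = "friend", replacement = "pal", x = text) # "Hello pal."
grepl(pattern = "friend", x = text)                     # TRUE
grepl(pattern = "enemy", x = text)                      # FALSE
```

One thing to watch out for is that gsub() and grepl() treat pattern as a regular expression by default. This means characters like . have a special meaning. Setting fixed = TRUE makes them match the text literally.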
Another thing that is commonly stored in variables is logical values (TRUE or FALSE), otherwise known as “booleans”:
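Storing a boolean works just like storing a number. The variable names here are illustrative:

```r
is_sunny <- TRUE
is_raining <- FALSE

class(is_sunny)   # "logical", R's name for the boolean type
```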
You can do a sort of math with booleans, referred to as “boolean logic”. This takes as input two (in the case of AND and OR) or one (in the case of NOT) boolean variables and outputs a new boolean. The most common examples are:
AND (&)
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
OR (|)
[1] TRUE
[1] TRUE
[1] TRUE
[1] FALSE
NOT (!)
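All four combinations for AND and OR, plus NOT, can be written out as below. The AND and OR lines follow the same pattern of results listed above:

```r
# AND: TRUE only if both sides are TRUE
TRUE & TRUE     # TRUE
TRUE & FALSE    # FALSE
FALSE & TRUE    # FALSE
FALSE & FALSE   # FALSE

# OR: TRUE if at least one side is TRUE
TRUE | TRUE     # TRUE
TRUE | FALSE    # TRUE
FALSE | TRUE    # TRUE
FALSE | FALSE   # FALSE

# NOT: flips a single boolean
!TRUE           # FALSE
!FALSE          # TRUE
```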
Finally, you can compare the values of two variables to see if they are the same. If they are, variable_1 == variable_2 will return TRUE; otherwise it will return FALSE.
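Here is a small example of == using the variable names from the sentence above. The stored values are made up:

```r
variable_1 <- 5
variable_2 <- 5
variable_3 <- 7

variable_1 == variable_2   # TRUE
variable_1 == variable_3   # FALSE
"cat" == "cat"             # TRUE (works for strings too)
```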
A function in R is like a “recipe” for a mini “machine” that does one specific job. You give it some inputs (called arguments), it follows the steps you’ve defined, and then it gives you a result.
Functions help you organize your code so you can reuse it instead of writing the same steps again and again. Here is a simple example:
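The body of my_function is not shown here. The sketch below assumes it adds its two inputs, which fits the later examples that pass in numbers "that we can add":

```r
my_function <- function(x, y){
  # The steps the function performs on its inputs (assumed here: addition)
  result <- x + y
  # Send the result back to whoever called the function
  return(result)
}
```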
- my_function is the name of the function (like a label on the mini machine).
- function(x, y) { ... } says “I am creating a function that expects two inputs, called x and y.”
- Inside the { ... }, you can write as much code as you want; these are the instructions for what you want the function to do with the inputs.
- return(result) sends the output of the function back to you.
After creating my_function, you can call it (computer science lingo meaning “use the function”) by typing its name followed by its inputs in parentheses:
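Here is an example of calling the function, again assuming the add-the-inputs version of my_function sketched for illustration:

```r
my_function <- function(x, y){
  result <- x + y
  return(result)
}

my_function(2, 3)            # returns 5
my_function(x = 2, y = 3)    # same call, with the arguments named explicitly
```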
Sometimes, you want one (or more) of your function’s inputs to have a “fallback” value if the user doesn’t supply one. That’s where default arguments come in. For example:
my_new_function now only needs you to supply x. You can supply both x and y, but if you don’t supply y, it will give y a default value of 10:
[1] 11
[1] 22
[1] 22
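A version of my_new_function that fits those outputs could look like the sketch below. It assumes, like my_function, that the function adds x and y. The specific calls are illustrative:

```r
my_new_function <- function(x, y = 10){
  # If the caller doesn't supply y, it falls back to 10
  result <- x + y
  return(result)
}

my_new_function(1)               # 11 (y defaults to 10)
my_new_function(2, 20)           # 22 (y supplied explicitly)
my_new_function(x = 2, y = 20)   # 22
```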
Sometimes, one of the arguments of your function may have a set number of possible values that you intend for a user to input. You can specify this as such:
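The original code for this is not shown. One common base-R way to do it is match.arg(), sketched below with a hypothetical function describe_size(). You list the allowed values as the argument's default, and match.arg() checks the user's input against that list:

```r
describe_size <- function(size = c("small", "medium", "large")){
  # match.arg() errors if size is not one of the allowed values;
  # if the caller leaves size out, the first option ("small") is used
  size <- match.arg(size)
  return(paste("You chose", size))
}

describe_size()          # "You chose small"
describe_size("large")   # "You chose large"
# describe_size("huge")  # throws an error, since "huge" is not an allowed value
```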
In all of our examples so far, we have assumed that the user has supplied a particular kind of data for each argument. Mostly, we have assumed that numbers are being passed to many of our example functions, numbers that we can add. What if they messed up though and passed a string, for example? We can catch this and throw an error message:
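A sketch of this idea uses stopifnot() with inputs named a and b. The function name add_numbers is hypothetical. Giving each condition a name turns that name into the error message, a feature available since R 4.0.0:

```r
add_numbers <- function(a, b){
  # Stop with an informative message if either input is not a number
  stopifnot("a must be a number" = is.numeric(a),
            "b must be a number" = is.numeric(b))
  return(a + b)
}

add_numbers(2, 3)       # returns 5
# add_numbers(2, "3")   # error with the message "b must be a number"
```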
This function will work as normal if a and b are numbers, but will throw informative error messages if not. You will also get an error in the old version of this function that didn’t have the stopifnot() lines, but that error might be far more cryptic and hard to understand. You will also get a different error depending on what is wrong with a and/or b, further confusing you or other users of this function.
An if-else statement is one of the most common ways to control the flow of a program. It lets your code make decisions based on whether a condition is TRUE or FALSE.
- if checks whether something is TRUE.
- else covers what happens if it is not TRUE.
- You can add else if to handle additional possible conditions.
The basic structure looks like:
Think of this code as asking a set of questions:
- If condition 1 is TRUE, do something.
- Otherwise, if condition 2 is TRUE, do something else.
- If neither condition is TRUE, do a default thing.
A real example might look like:
x <- 5
if(x >= 5){
print("x is greater than or equal to 5")
}else if(x > 3){
print("x is between 3 and 5")
}else{
print("x is less than or equal to 3")
}
[1] "x is greater than or equal to 5"
Conditions in R must evaluate to a single TRUE or FALSE. Common ways to form conditions are comparison operators:
- ==: Check if two things are equal (e.g., a == b). a and b can be numbers, strings, booleans, etc.
- !=: Check if two things are not equal.
- <, >, <=, >=: Less than, greater than, less than or equal to, or greater than or equal to, respectively.
Here is an example of how you might use control flow in a function:
greetUser <- function(user_input){
# Check if user_input equals "Hello"
if (user_input == "Hello"){
return("Hi there! Nice to meet you.")
}else if(user_input == "Goodbye"){
return("See you later! Take care.")
}else{
return("I'm not sure how to respond to that...")
}
}
greetUser("Hello")
[1] "Hi there! Nice to meet you."
[1] "I'm not sure how to respond to that..."
In R, a vector is a container that holds multiple values of the same data type (such as numbers, strings, or booleans). You can think of it like a row of boxes, each containing a value of the same kind.
You can create a vector with the c() function (short for “combine” or “concatenate”). Here are a few examples:
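A few illustrative vectors, one of each common type. The last line shows what happens if you mix types: R converts everything to a single common type, here strings:

```r
numbers <- c(10, 20, 30, 40)          # numeric vector
animals <- c("cat", "dog", "bird")    # character (string) vector
flags   <- c(TRUE, FALSE, TRUE)       # logical (boolean) vector

# Mixing types forces everything to one type
mixed <- c(1, "two", TRUE)
mixed                                 # "1" "two" "TRUE"
```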
Often, you will want to access specific elements or sets of elements of a vector. To do this, you can use square brackets [ ]:
[1] 10
[1] "dog"
[1] 10 30
[1] FALSE TRUE
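Indexing calls along these lines produce outputs like those above. The vectors are illustrative, since the original ones are not shown:

```r
numbers <- c(10, 20, 30, 40)
animals <- c("cat", "dog", "bird")
flags   <- c(TRUE, FALSE, TRUE)

numbers[1]          # 10 (R counts positions starting at 1)
animals[2]          # "dog"
numbers[c(1, 3)]    # 10 30
flags[2:3]          # FALSE TRUE
numbers[-1]         # 20 30 40 (a negative index drops that element)
```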
You can also change values of specific elements:
[1] 10 20 30 40
[1] 10 99 30 40
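Code that produces the before/after outputs above could look like this. The vector name numbers is illustrative:

```r
numbers <- c(10, 20, 30, 40)
numbers             # 10 20 30 40

# Assign a new value to the second position only
numbers[2] <- 99
numbers             # 10 99 30 40
```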
Sometimes, it will be useful to check what kind of data is in a vector. This can be done with the class() function:
[1] "numeric"
[1] "character"
[1] "logical"
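These calls return the three classes listed above. The example vectors are made up:

```r
class(c(1.5, 2, 3))     # "numeric"
class(c("a", "b"))      # "character"
class(c(TRUE, FALSE))   # "logical"
```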
You can also check the type with functions like is.numeric(), is.character(), or is.logical():
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
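Here is an illustrative set of checks. Each is.*() function answers a yes/no question about a vector's type:

```r
numbers <- c(10, 20, 30)
animals <- c("cat", "dog")

is.numeric(numbers)     # TRUE
is.character(numbers)   # FALSE
is.logical(numbers)     # FALSE
is.numeric(animals)     # FALSE
is.character(animals)   # TRUE
```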
Below are some useful functions that allow you to create vectors or look up some information about a vector:
- length(v): returns the number of elements in the vector v.
- seq(from, to, length.out) or seq(from, to, by): Creates a vector starting at the number from (default value of 1) and ending at the number to (default value of 1). If you set length.out, you will get a vector with length.out elements. If you set by, you specify the distance between adjacent elements.
- rep(x, times): Creates a vector containing the value x repeated times times.
- start:end: Same as seq(from = start, to = end, by = 1) when start is less than or equal to end (if start is larger, : counts down instead).
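A quick tour of these helpers with illustrative inputs:

```r
v <- c(5, 10, 15)
length(v)                               # 3

seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(from = 0, to = 10, by = 2)          # 0 2 4 6 8 10
rep(7, times = 3)                       # 7 7 7
2:6                                     # 2 3 4 5 6
```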
A loop is a way to tell R to “do something multiple times”. This unlocks one of the most powerful aspects of computers: their ability to repeat tasks quickly.
There are two commonly used types of loops: for loops and while loops.
A for loop in R iterates (or “loops”) over each element of a vector and does something with it. For example, if we want to print every element of a numeric vector:
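Based on the outputs below and the for (value in numbers) loop explained later, the first version likely looks something like this:

```r
numbers <- c(10, 20, 30, 40)

# Loop directly over the values stored in the vector
for(value in numbers){
  print(value)
}
```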
[1] 10
[1] 20
[1] 30
[1] 40
# Loop over the vector 1 to the length of the vector
for(i in 1:length(numbers)){
print(numbers[i])
}
[1] 10
[1] 20
[1] 30
[1] 40
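The third set of outputs corresponds to the seq_along() version discussed below. Here is a sketch of it, plus a check of why seq_along() is the safer choice when a vector might be empty:

```r
numbers <- c(10, 20, 30, 40)

# seq_along(numbers) gives the positions 1, 2, 3, 4
for(i in seq_along(numbers)){
  print(numbers[i])
}

# With an empty vector, the two approaches differ
empty <- numeric(0)
1:length(empty)     # 1 0 (a loop over this would run twice by mistake)
seq_along(empty)    # integer(0) (a loop over this never runs)
```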
[1] 10
[1] 20
[1] 30
[1] 40
What’s happening here?
- for (value in numbers) means “go through each element of numbers and temporarily call that element value.”
- for(i in 1:length(numbers)) creates a vector, 1:length(numbers), of whole numbers from 1 to the length of the vector numbers. Each of these whole numbers is then temporarily called i.
- seq_along(numbers) does pretty much the same thing as 1:length(numbers), but it also behaves correctly when numbers is empty.
- print(value) displays the current value on the screen.
- This repeats for every element of numbers.
A while loop keeps going as long as some condition is TRUE. Suppose we want to keep adding numbers from a vector until the total sum exceeds 50:
numbers <- c(10, 20, 30, 40, 50)
total <- 0 # Start total at 0
i <- 1 # Start index at 1
while(i <= length(numbers) && total <= 50){
# Add to total
total <- total + numbers[i]
# Move on to the next element
i <- i + 1
}
print(total)
[1] 60
print(i)
[1] 4
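What’s going on here?
- while(i <= length(numbers) && total <= 50): The loop will continue running while two conditions are both TRUE: we haven’t run past the end of the vector (i <= length(numbers)), and total hasn’t exceeded 50 (total <= 50).
- Each time through the loop, we add the i-th element of numbers to total.
- We then move i to the next element by adding 1.
- As soon as either condition is FALSE, the loop stops. Because the conditions are only checked at the start of each pass, total can end up above 50 (here, 60).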
When you work in R, you’ll often deal with files (like CSV files) that sit in folders (directories) on your computer. To load these files into R so that you can work with and analyze them, you need to tell R where they are. In addition, it is important to know where you are while working in R.
When I say “know where you are”, I am referring to your “working directory”. When you open up RStudio, there is some folder on your computer that R will call your “working directory”. You can see what this folder is at any time with getwd():
[1] "C:/Users/isaac/Documents/Simon_Lab/isaacvock.github.io/posts/Rintro_day1"
If you want to change this directory, you can switch to a new directory with setwd("/path/to/new/working/directory").
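You can specify a file path to setwd() in one of two ways:
- An absolute path, which spells out the full location of the folder starting from the top of your file system.
- A relative path, which describes where the folder is relative to your current working directory. For example, to move into a folder called “data” inside “Documents”, you could run setwd("data"), assuming “Documents” is your current working directory.
The readr package (part of the tidyverse collection of packages) provides user-friendly functions for reading in data. For example, you can read a CSV file like so: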
- read_csv("path/to/mydata.csv") reads the CSV file located at the specified path (either a relative or absolute path) and creates a data frame (more on those soon).
- <- stores that data frame in the variable my_data.
A data frame is a table-like structure with rows and columns, commonly used for storing datasets in R. Each column is usually a vector of a particular type (numeric, character, logical, etc.), and all columns have the same length.
To create a data frame you can run code like this:
ages <- c(30, 25, 35)
# You can either specify the vector directly
# or provide the name of a vector you previously created
people_df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = ages,
Score = c(100, 95, 90)
)
people_df
Name Age Score
1 Alice 30 100
2 Bob 25 95
3 Charlie 35 90
Here are some ways you can interact with the data inside of a data frame:
- Access a column by name with $ or [[<col name as string>]]:
[1] "Alice" "Bob" "Charlie"
[1] "Alice" "Bob" "Charlie"
- Access columns by position with [ , <column number>]:
[1] "Alice" "Bob" "Charlie"
Name Age
1 Alice 30
2 Bob 25
3 Charlie 35
- Access rows by position with [<row number>, ].
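Using the people_df data frame from above, here are calls that show each access pattern. The exact calls in the original are not shown, so treat these as one reasonable version:

```r
people_df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 25, 35),
  Score = c(100, 95, 90)
)

# Columns by name
people_df$Name            # "Alice" "Bob" "Charlie"
people_df[["Name"]]       # same result

# Columns by position
people_df[ , 1]           # "Alice" "Bob" "Charlie"
people_df[ , c(1, 2)]     # a data frame with just the Name and Age columns

# Rows by position
people_df[2, ]            # the row for Bob
```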
A list is like a container in R that can hold a mix of different types of items, like a data frame. Lists are more flexible, though, and can hold things of different sizes. A list can hold numbers, strings, vectors, data frames, and even other lists, all at once!
Here is how you can create a list:
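Here is a small list holding several kinds of items, along with the access methods described next. The element names are illustrative:

```r
my_list <- list(
  name = "Alice",               # a string
  scores = c(100, 95, 90),      # a numeric vector
  passed = TRUE,                # a boolean
  info = data.frame(x = 1:2)    # even a data frame
)

my_list$name          # "Alice"
my_list[["scores"]]   # 100 95 90
my_list[[3]]          # TRUE (the third element, by position)
```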
To access elements of a list, you can use:
- the $ operator (if the elements have names), or
- the [[ ]] operator with the element’s name (if it has one) or its position.
Often, you will want to go element by element through a list and do something with each element. In addition, data frame columns are equivalent to elements of a list (under the hood, a data frame is just a list whose elements are forced to be the same length). You could write a for loop, but there are popular alternatives that can make your code cleaner and easier to read. Base R has its own versions of these (the apply family, such as lapply() and sapply()), but the R package purrr has improved versions that I prefer.
map(): takes a single list as input
library(purrr)
numbers <- list(
c(1, 2, 3),
c(4, 5, 6),
c(10, 20, 30, 40, 50)
)
# Outputs a list, one element per original element of the list
map(numbers, function(x) sum(x))
[[1]]
[1] 6
[[2]]
[1] 15
[[3]]
[1] 150
# Outputs a numeric vector, one element per original list element
# Also using an alternative notation
map_dbl(numbers, ~ sum(.x))
[1] 6 15 150
map2(): takes two lists as input
numbers <- list(
c(1, 2, 3),
c(4, 5, 6),
c(10, 20, 30, 40, 50)
)
numbers2 <- list(
c(-1, -2, -3),
c(12, 13),
c(2, 4, 6, 8, 10, 12)
)
# Outputs a list, one element per original element of the list
map2(numbers, numbers2, ~ sum(.x) + sum(.y))
[[1]]
[1] 0
[[2]]
[1] 40
[[3]]
[1] 192
# Outputs a numeric vector, one element per original list element
map2_dbl(numbers, numbers2, function(x, y) sum(x) + sum(y))
[1] 0 40 192
pmap() allows you to provide a named list of inputs:
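Here is a minimal pmap_dbl() sketch with illustrative inputs. The names of the list elements (a, b, c) are matched to the function's argument names. pmap_dbl() then walks along all three vectors in parallel, element by element:

```r
library(purrr)

inputs <- list(
  a = c(1, 2, 3),
  b = c(10, 20, 30),
  c = c(100, 200, 300)
)

# First call uses a = 1, b = 10, c = 100; second uses 2, 20, 200; and so on
pmap_dbl(inputs, function(a, b, c) a + b + c)   # 111 222 333
```

Like map() and map2(), plain pmap() returns a list, while pmap_dbl() returns a numeric vector.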