Welcome to R!
- R is an incredibly powerful tool; knowing how to use it is an incredibly valuable skill. However, learning it takes the same diligence as learning a human language.
- R is a very capable calculator; you can use it to do math.
- R, like human languages, has punctuation marks (operators) that have specific meanings and usage rules.
- R “sentences” are called commands. They include inputs as well as instructions for operations we want R to perform for us.
- Script files are text files that allow us to write and store code and annotations and then “teleport” this code to the Console when we’re ready.
- “Nouns” in R are called objects. These are impermanent until we use assignment commands to name them, in which case they persist until we overwrite them or close R.
- Our environment is everything we have named since starting our R session.
- There are rules about what we can and can’t (and should and shouldn’t) name objects in R.
- R is case-sensitive, so capital and lowercase letters are distinct.
- It’s good to follow a consistent naming convention when naming objects.
- Objects in R take many different shapes, and the values they store can be of many different types (the “adjectives” of the R language).
- In R, “verbs” are called functions. Functions take inputs, perform operations, and produce outputs. Some functions do math; others might create new objects.
- Functions have slots for specific inputs. These slots are called parameters, and they have names. The inputs we provide to these slots are called arguments. Arguments should be given in a specific order, and they should be of specific types.
- Some function inputs are optional; others are required. Optional inputs are like R’s “adverbs” in that they often control how R performs a specific operation.
- If we want help with a function, we can use the `?` operator to open its help page.
- The `=` symbol has many different uses in R, which can be confusing. As such, consider using `<-` for assignment.
- Logical tests are like “questions” in R. There are many different logical operators to ask a variety of questions.
- Use the square bracket operators `[ ]` to peek inside objects in indexing commands. Indexing commands can also be used to update or subset objects, and their format differs for 1D vs. 2D object types.
- Installing packages is necessary to have access to them, but even then, packages must be turned on to use their features.
- Your working directory is the folder R assumes it should interact with on your computer when loading/saving files.
- R Project folders are handy for keeping organized when working on a large, important, or complex project.
- Loading files in R typically requires a `read` function; saving files typically requires a `write` function.
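Several of the points above (assignment, functions and their arguments, logical tests, indexing, and reading/writing files) can be sketched in a few lines of base R. The object names, values, and file name here are made up for illustration and are not taken from the lesson:

```r
# Assignment with <- names an object so it persists in the environment.
heights_cm <- c(150, 162, 175, 181)   # a 1D object (a numeric vector)

# Functions take arguments in named parameter slots; na.rm is an optional input.
mean_height <- mean(x = heights_cm, na.rm = TRUE)

# Logical tests ask "questions" and return TRUE/FALSE answers.
is_tall <- heights_cm > 170

# 1D indexing: a single value (or vector) inside the brackets.
second_height <- heights_cm[2]
tall_heights  <- heights_cm[is_tall]

# 2D indexing: [rows, columns].
people <- data.frame(name = c("Ana", "Ben", "Cy", "Di"), height = heights_cm)
ben_height <- people[2, "height"]

# Indexing can also update values in place.
heights_cm[1] <- 152

# Saving and loading typically use a write/read function pair.
out_file <- file.path(tempdir(), "people.csv")   # temporary, hypothetical file
write.csv(people, out_file, row.names = FALSE)
people_again <- read.csv(out_file)

# ?mean   # in an interactive session, this opens the help page for mean()
```

Note that `people` was built before `heights_cm[1]` was updated, so the data frame still holds the original value; objects only change when we reassign them.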
Exploring the Tidyverse, a modern R "dialect"
- Use the `dplyr` package to manipulate data frames in efficient, clear, and intuitive ways.
- Use `select()` to retain specific columns when creating subsetted data frames.
- Use `filter()` to create a subset by rows using logical tests to determine whether a row should be kept or gotten rid of.
- Use `group_by()` and `summarize()` to generate summaries of categorical groups within a data set.
- Use `mutate()` to create new variables using old ones as inputs.
- Use `rename()` to rename columns.
- Use `arrange()` to sort your data set by one or more columns.
- Use pipes (`%>%`) to string `dplyr` verbs together into “paragraphs.”
- Remember that order matters when stringing together `dplyr` verbs!
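As a rough illustration of the `dplyr` verbs above chained with pipes, here is a sketch using the `mtcars` data set that ships with R. The choice of data set, the `power_to_weight` column, and the `hp > 100` cutoff are assumptions made for the example, not part of the lesson:

```r
library(dplyr)

car_summary <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%              # keep only these columns
  filter(hp > 100) %>%                      # keep rows that pass the logical test
  mutate(power_to_weight = hp / wt) %>%     # new variable built from old ones
  rename(cylinders = cyl) %>%               # new_name = old_name
  group_by(cylinders) %>%                   # must use the new name from here on
  summarize(
    mean_mpg   = mean(mpg),
    mean_power = mean(power_to_weight),
    n_cars     = n()
  ) %>%
  arrange(desc(mean_mpg))                   # sort groups, highest mean mpg first
```

Order matters here: because `rename()` comes before `group_by()`, the grouping step has to refer to `cylinders`, and swapping the two lines without also changing the name would produce an error.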
- Every ggplot graph requires a `ggplot()` call, a data set, some mapped aesthetics, and one or more geometries. Mixing and matching data and their types, aesthetics, and geometries can result in a near-infinite number of different base graphs.
- Mapping an aesthetic means linking a visual component of your graph, such as colors or a specific axis, to either a column of data in your data set or to a constant value. To do the former, you must use the `aes()` function. To do the latter, you can use `aes()`, but you don’t need to.
- Aesthetics can be mapped (and data sets provided) globally within the `ggplot()` call, in which case they will apply to all graph components, or locally within individual `geom_*()` calls, in which case they will apply only to that element.
- If you want to adjust the appearance of any aesthetic, use the appropriate function from the `scale_*()` family.
- If you want to adjust the appearance of any text box, line, or rectangle, use the `theme()` function, the proper parameter, and the appropriate `element_*()` function.
- In ggplot commands, the `+` operator is used to (literally) add additional components or layers to a graph. If multiple layers are added, the layers added later in the command will appear on top of layers specified earlier in the command and may cover them up. If multiple, conflicting specifications are given for a property in the same ggplot command, whichever specification is given later will “win out.”
- Craft a single `theme()` command you can use to provide consistent base styling for every one of your graphs!
- Use faceting to automatically create a series of sub-panels, one for each member of a grouping variable, that will share the same aesthetics and design properties.
- Use the `ggsave()` function to programmatically save ggplots as image files with the desired resolution, size, and file type.
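The ggplot points above can be pulled together in one small sketch. It assumes the built-in `mtcars` data set, and the styling choices, palette, and output file name are arbitrary picks for illustration:

```r
library(ggplot2)

# A reusable theme() object for consistent base styling across graphs.
my_theme <- theme(
  axis.title       = element_text(size = 14),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "white", color = "black")
)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +  # global data and aesthetics
  geom_point(mapping = aes(color = factor(cyl)), size = 3) +  # color mapped locally; size set to a constant
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "black") +                  # added later, so drawn on top of the points
  scale_color_brewer(palette = "Dark2", name = "Cylinders") + # adjusts how the color aesthetic looks
  facet_wrap(~ am) +                                          # one sub-panel per value of am
  my_theme

# Save programmatically with a chosen size, resolution, and file type.
out_file <- file.path(tempdir(), "mpg_by_weight.png")         # hypothetical output path
ggsave(filename = out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Storing the `theme()` call in an object such as `my_theme` is one way to reuse the same styling: it can be added to any later graph with `+`.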
- Use the `tidyr` package to reshape the organization of your data.
- Use `pivot_longer()` to go towards a “longer” layout.
- Use `pivot_wider()` to go towards a “wider” layout.
- Recognize that “longness” and “wideness” form a continuum and that your data may not be as “long” or as “wide” as they could be.
- Recognize that there are advantages and disadvantages to every data layout.
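A minimal sketch of moving between layouts with `tidyr`, using a tiny made-up data set (the site names, year columns, and counts are invented for the example):

```r
library(tidyr)

# "Wide" layout: one row per site, one column per year.
counts_wide <- data.frame(
  site  = c("A", "B"),
  y2020 = c(10, 7),
  y2021 = c(12, 9)
)

# Go "longer": the year columns collapse into a name column and a value column.
counts_long <- pivot_longer(
  counts_wide,
  cols      = c(y2020, y2021),
  names_to  = "year",
  values_to = "count"
)

# Go "wider" again: the values in `year` become column names once more.
counts_wide_again <- pivot_wider(
  counts_long,
  names_from  = year,
  values_from = count
)
```

The long version has one row per site-year combination, which tends to suit grouping and plotting, while the wide version is often easier to read by eye.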
Control Flow--if() and for()
- Use `if` and `else` to have R make choices on the fly for you with respect to what operations it should do.
- Use `for` to repeat operations many times.
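As a small sketch of both ideas together, the loop below visits each value in a made-up vector of temperatures and uses `if`/`else` to choose a label. The thresholds and label names are arbitrary:

```r
temps_c <- c(-3, 12, 25)                     # made-up temperatures in Celsius
temp_labels <- character(length(temps_c))    # empty container for the results

for (i in seq_along(temps_c)) {              # repeat once per element
  if (temps_c[i] < 0) {
    temp_labels[i] <- "freezing"
  } else if (temps_c[i] < 20) {
    temp_labels[i] <- "mild"
  } else {
    temp_labels[i] <- "warm"
  }
}
```

Creating `temp_labels` at its full length before the loop avoids growing the vector one element at a time.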
Vectorization
- Use vectorized operations instead of loops.
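To make the contrast concrete, here is the same calculation written as a loop and as a vectorized call, on a made-up vector. `ifelse()` is shown as one vectorized counterpart to an `if`/`else` inside a loop:

```r
values <- c(1, 4, 9, 16)   # made-up input

# Loop version: handle one element at a time.
roots_loop <- numeric(length(values))
for (i in seq_along(values)) {
  roots_loop[i] <- sqrt(values[i])
}

# Vectorized version: the function operates on the whole vector at once.
roots_vec <- sqrt(values)

# Vectorized choice: one test, applied to every element.
size_labels <- ifelse(values > 5, "big", "small")
```

Both approaches give the same result here; the vectorized form is shorter and usually faster.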
Functions Explained
- Use `function` to define a new function in R.
- Use parameters to pass values into functions.
- Use `stopifnot()` to flexibly check function arguments in R.
- Load functions into programs using `source()`.
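A brief sketch of these four points follows. The function names `rescale01` and `double_it` and the file `my_functions.R` are hypothetical, chosen only for the example:

```r
# Define a function with one required parameter (x) and one optional one (na.rm).
rescale01 <- function(x, na.rm = TRUE) {
  # Check the arguments before doing any work; stopifnot() errors if any test is FALSE.
  stopifnot(is.numeric(x), length(x) > 0, is.logical(na.rm))
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])   # the last evaluated value is returned
}

scaled <- rescale01(c(2, 4, 6))

# source() runs the code in a script file, which loads any functions it defines.
fun_file <- file.path(tempdir(), "my_functions.R")   # hypothetical script file
writeLines("double_it <- function(x) x * 2", fun_file)
source(fun_file)
doubled <- double_it(21)
```

Calling `rescale01("a")` would stop with an error from `stopifnot()` rather than fail somewhere less obvious inside the function.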