- https://www.r-project.org
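- Part of the GNU project since 1997
- Named after the first names of its authors (Ross Ihaka and Robert Gentleman)
- Influenced by the S language
- S was originally implemented as Fortran libraries; R itself is written mostly in C, Fortran and R
- The idea is that you can start analysing data interactively, without having to adopt a “programming” mindset first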
Rmd
Resources to learn
- https://swirlstats.com/
- Starting data analysis/wrangling with R: Things I wish I’d been told
- https://r4ds.had.co.nz/data-visualisation.html
- https://r4ds.had.co.nz/graphics-for-communication.html
- https://www.r-graph-gallery.com/
Basic commands
ls()
- List all objects
rm(<v>)
- Removes an object
rm(list = ls())
- Removes every object named in the list (in this case, everything in the workspace)
class(<v>)
is.na(<v>)
is.nan(<v>)
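A minimal sketch of how these commands behave on small objects. The names x and tmp are only illustrative. Note that is.na() also flags NaN, while is.nan() does not flag NA.

```r
# Illustrative vector with one missing value (NA) and one "not a number" (NaN)
x <- c(1, NA, NaN)

class(x)    # "numeric"
is.na(x)    # FALSE  TRUE  TRUE  (NaN counts as missing too)
is.nan(x)   # FALSE FALSE  TRUE  (NA is not NaN)

# ls() lists object names, rm() removes them
tmp <- 42
"tmp" %in% ls()   # TRUE
rm(tmp)
"tmp" %in% ls()   # FALSE
```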
Atomic Objects
- Numeric (double)
- Integer
- Complex (complex numbers, e.g. 1 + 4i)
- Logical (boolean)
- Character (what other languages call a string)
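As a quick check of the list above, class() reports the atomic class of a value. This is a small sketch; the names values and kinds are illustrative.

```r
# One example value per atomic class
values <- list(1.5, 2L, 1 + 4i, TRUE, "text")

# Ask for the class of each value
kinds <- sapply(values, class)
kinds         # "numeric" "integer" "complex" "logical" "character"

typeof(1.5)   # "double": the storage type behind "numeric"
class(2)      # "numeric": without the L suffix a number literal is a double
```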
Vectors
- You can use a vector as a dictionary as well, by naming its elements (names)
> x <- c(1,2)
> names(x) <- c("aa", "bb")
> x
aa bb
1 2
> x <- c(1,2,3,4) # concat vectors
Factors
- “Categories”
- It is possible to define an order for the levels (an ordered factor), so that they can be compared
> factor(c("A", "B", "C"), labels = c("A", "B", "C"), ordered = T)
[1] A B C
Levels: A < B < C
- Usage:
factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)
> factor(c("A", "B", "C"))
[1] A B C
Levels: A B C
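A short sketch of an ordered factor where the level order differs from alphabetical order. The variable sizes and its labels are made up for illustration.

```r
# levels fixes the order of the categories; ordered = TRUE makes them comparable
sizes <- factor(c("medium", "small", "large"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

sizes[1] < sizes[3]   # TRUE: "medium" comes before "large"
levels(sizes)         # "small" "medium" "large"
as.integer(sizes)     # 2 1 3: the underlying integer codes
table(sizes)          # counts per level, in level order
```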
Matrices
- By default they are filled column by column (set byrow to TRUE to fill them row by row)
> matrix(data = 1:6, nrow = 2, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> attributes(matrix( data = 1:6, nrow = 2, ncol = 3))
$dim
[1] 2 3
> x <- c(1,2,3,4)
> dim(x) <- c(2,2)
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
rbind()
cbind()
- To name the rows and columns of a matrix, you can use dimnames
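The matrix functions mentioned above can be sketched together as follows. The row and column names are arbitrary examples.

```r
# Fill row by row instead of the default column by column
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
# m now holds 1 2 3 in the first row and 4 5 6 in the second

# Name the rows and the columns
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))
m["r2", "b"]   # 5

# rbind() stacks vectors as rows, cbind() as columns
by_row <- rbind(1:3, 4:6)   # 2 x 3
by_col <- cbind(1:3, 4:6)   # 3 x 2
```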
List
Data frames
- A special type of list
- All elements are vectors, each one representing a column. They must all have the same length.
- Different columns can have different types
- You can set stringsAsFactors. Prior to R 4.0 it was TRUE by default
- data.frame()
- read.table()
- read.csv()
- cbind/rbind can be used with data frames as well
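A small sketch of the points above, with made-up column names. stringsAsFactors is set explicitly so the result is the same before and after R 4.0.

```r
# Columns of different types, all with the same length
df <- data.frame(name = c("a", "b", "c"),
                 score = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)
sapply(df, class)   # name: "character", score: "numeric"

# cbind() adds a column, rbind() adds a row
with_flag <- cbind(df, passed = df$score > 2)
with_row  <- rbind(df, data.frame(name = "d", score = 4.5))

dim(with_flag)   # 3 3
nrow(with_row)   # 4
```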
Subcollections
- [] returns an object of the same class as the original object
- Negative indexes exclude the elements at those positions:
> x <- 1:4; x[-2]
[1] 1 3 4
- You can also use a boolean vector:
> x <- 1:4; x[x > 2]
[1] 3 4
- You can also index by name, much like a dictionary:
> x <- c(1,2); names(x) <- c("aa", "bb"); x["aa"]
aa
1
- For matrices you can use:
> m[2, ] # second row
> m[, 3] # third column, returned as a plain vector
> m[, 3, drop=FALSE] # third column, kept as a one-column matrix
- You can use partial names with $:
> x <- list(foo = 1:4, bar = 1); x$f
[1] 1 2 3 4
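To make the difference between the subsetting operators concrete, here is a minimal sketch. The matrix m is defined here only for illustration.

```r
x <- list(foo = 1:4, bar = 1)

class(x["foo"])     # "list": [ ] keeps the class of the parent
class(x[["foo"]])   # "integer": [[ ]] returns the element itself
x$f                 # 1 2 3 4: $ accepts an unambiguous partial name

m <- matrix(1:6, nrow = 2)
m[, 3]                      # 5 6, the dimensions are dropped
dim(m[, 3, drop = FALSE])   # 2 1, still a matrix
```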
Dates
- Date type
- Stored as the number of days since 1970-01-01
> unclass(as.Date('1976/01/01'))
[1] 2191
Timestamps
- POSIXct and POSIXlt types
- POSIXct is just a number (seconds since 1970-01-01 UTC), useful to save space
- POSIXlt is a list with further information (day of the week, day of the year, month and so on)
- [[]] extracts a single element of a list or data frame (the result has the class of the element, not of its “parent”)
- $ extracts an element of a list or data frame by name (similar to [[]])
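The date and date-time classes can be compared with a short sketch. The timestamp is an arbitrary example, and the time zone is fixed to UTC so the numbers do not depend on the local settings.

```r
# Date: days since 1970-01-01
d <- as.Date("1976-01-01")
unclass(d)   # 2191
d + 30       # "1976-01-31"

# POSIXct: a single number of seconds since 1970-01-01 UTC
ct <- as.POSIXct("2020-03-01 12:30:00", tz = "UTC")
as.numeric(ct)   # 1583065800

# POSIXlt: a list-like structure with named components
lt <- as.POSIXlt(ct)
lt$hour   # 12
lt$mon    # 2   (months count from 0, so this is March)
lt$year   # 120 (years since 1900)
lt$wday   # 0   (0 is Sunday)
lt$yday   # 60  (days since 1 January, counting from 0)
```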
Curious stuff
- %*% is matrix multiplication
- round uses the ISO “round half to even” rule for x.5 numbers: if the integer part is even it rounds towards zero, if it is odd it rounds away from zero. (https://en.wikipedia.org/wiki/ISO_80000-1)
> 1/0
[1] Inf
> 0/0
[1] NaN
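A few lines to try these out. The values are chosen so that every x.5 is exactly representable, and the matrix a is only an example.

```r
# Round half to even
round(0.5)    # 0
round(1.5)    # 2
round(2.5)    # 2
round(3.5)    # 4
round(-2.5)   # -2

# Matrix product versus element-wise product
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
a %*% a   # rows: 7 15 and 10 22
a * a     # rows: 1 9 and 4 16

# Division by zero does not raise an error
-1 / 0          # -Inf
is.nan(0 / 0)   # TRUE
```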
Random
> 10:20
[1] 10 11 12 13 14 15 16 17 18 19 20
> seq(from=7, by=4, to=20)
[1] 7 11 15 19
order(x)
returns the indexes that would sort the elements in ascending order:
> x <- runif(5);
> x
[1] 0.03387367 0.53094936 0.26855677 0.96293228 0.01368555
> order(x)
[1] 5 1 3 2 4
cor()
computes the correlation between two vectors (or between the columns of a matrix or data frame)
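A small sketch of order() and cor() with fixed values instead of random ones, so the result is reproducible. It uses the built-in mtcars dataset for the correlation.

```r
x <- c(0.3, 0.1, 0.2)

order(x)      # 2 3 1: positions of the smallest, next, and largest value
x[order(x)]   # 0.1 0.2 0.3, the same as sort(x)

# Correlation between fuel efficiency and weight: strongly negative
cor(mtcars$mpg, mtcars$wt)
```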
Functions
- https://bookdown.org/rdpeng/rprogdatascience/scoping-rules-of-r.html
- Functions always return the value of the last evaluated expression, like Ruby (you can also use return())
- function() creates function “objects”: f <- function(<args>) { }
- formals() returns the list of formal arguments
- Not all arguments are required, even if they don't have a default value: arguments are evaluated lazily, so a missing one only causes an error when the body actually uses it
- You can check whether an argument was given with missing()
- You can pass arguments by position, in the right order, or explicitly by name
- You can also use the ... argument; to access its values inside the function you need to put them in a vector or a list (e.g. c(...), list(...))
- Arguments behave as if passed by value: changing one inside the function does not change the caller's object
- To vectorise a function that does not know how to handle vectors you can use Vectorize(function)
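The points above can be sketched with a few toy functions. All function names here (area, lazy, total, bump, repeat_text) are hypothetical and exist only for this example.

```r
# missing() lets the body react to an argument that was not supplied
area <- function(width, height) {
  if (missing(height)) height <- width   # treat it as a square
  width * height                         # last expression is the return value
}
area(3)                       # 9
area(height = 2, width = 5)   # 10, arguments matched by name
names(formals(area))          # "width" "height"

# b has no default but is never used, so omitting it is fine
lazy <- function(a, b) a * 2
lazy(4)   # 8

# Collect ... into a vector to work with it
total <- function(...) sum(c(...))
total(1, 2, 3)   # 6

# Changing an argument does not change the caller's object
bump <- function(v) { v[1] <- 100; v }
original <- c(1, 2, 3)
bumped <- bump(original)   # 100 2 3, while original is still 1 2 3

# Vectorize() wraps a function that only handles one value at a time
repeat_text <- function(s, times) paste(rep(s, times), collapse = "")
repeat_each <- Vectorize(repeat_text)
repeat_each(c("a", "b"), 2:3)   # "aa" "bbb" (named after the inputs)
```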
Good to know
- lapply: applies a function to each element of a list; the result is a list
- sapply: similar to lapply, but the result is simplified (if every result has length one, the result is a vector; if every result has the same length, the result is a matrix; otherwise it is a list)
- apply: the input is a matrix or a data frame; applies a function over one dimension (rows or columns)
- mapply: mapply(log, 2:5, 2:3) calls log with one element of the first argument and one element of the second argument at a time. The result is also simplified, like sapply. From the docs: mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each … argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
- tapply: Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors. (tapply(mtcars$mpg, mtcars$cyl, mean) groups the cars by cyl and gets the mean of mpg for each group)
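The apply family described above can be compared side by side in a short sketch. The input list and matrix are arbitrary examples, and mtcars is the built-in dataset.

```r
parts <- list(a = 1:3, b = 4:6)

as_list   <- lapply(parts, sum)     # list with a = 6 and b = 15
as_vector <- sapply(parts, sum)     # named vector: a = 6, b = 15
as_matrix <- sapply(parts, range)   # 2 x 2 matrix: one column per element

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)   # 9 12
col_sums <- apply(m, 2, sum)   # 3 7 11

# mapply walks over several arguments in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9

# Mean mpg for each number of cylinders
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
names(mpg_by_cyl)   # "4" "6" "8"
```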
Reading data
- read.csv is just a thin wrapper around read.table.
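To illustrate the wrapper relationship, the sketch below writes a tiny CSV to a temporary file and reads it both ways. The file contents are invented, and the read.table arguments spell out the defaults that read.csv sets.

```r
# Write a small example file
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1.5", "b,2.5"), path)

# read.csv with its defaults
via_csv <- read.csv(path)

# The same call expressed through read.table
via_table <- read.table(path, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

identical(via_csv, via_table)   # TRUE

unlink(path)   # clean up the temporary file
```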
ggplot2
- https://ggplot2.tidyverse.org/
- You can add “layers” on top of previous layers to make the plot richer. From the documentation:
“It’s hard to succinctly describe how ggplot2 works because it embodies a deep philosophy of visualisation. However, in most cases you start with ggplot(), supply a dataset and aesthetic mapping (with aes()). You then add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).”
- aes() lets you define the aesthetic mappings of your chart. If you define aes when you call ggplot, the following layers inherit it and can extend it.
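A minimal sketch of layering and aes() inheritance, assuming the ggplot2 package is installed. The choice of the built-in mtcars dataset and of these particular geoms is only an example.

```r
library(ggplot2)

# x and y are mapped once in ggplot() and inherited by every layer
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # this layer extends the inherited mapping with a colour aesthetic
  geom_point(aes(colour = factor(cyl))) +
  # this layer reuses x and y as they are to draw a straight fitted line
  geom_smooth(method = "lm", se = FALSE)

# Printing p (for example by typing p at the console) draws the plot
```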