# R lang

• https://www.r-project.org
• GNU project since 1997
• Named after the first initials of its authors (Ross Ihaka and Robert Gentleman)
• Influenced by the S language
• S was originally implemented as a set of Fortran libraries
• The idea behind S was that you can start analysing data interactively, without having to adopt a “programming” mindset first

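## Basic commands

• `ls()`: list all objects in the current environment
• `rm(<v>)`: remove the object `<v>`
• `rm(list = ls())`: remove every object named in the list (in this case, everything)
• `class(<v>)`: return the class of an object
• `is.na(<v>)`: test for missing values (`NA`; also `TRUE` for `NaN`)
• `is.nan(<v>)`: test for `NaN` only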
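
A small sketch of the commands above. It runs inside a scratch environment (the name `env` is just an illustration) so that trying it out does not delete anything from your own workspace; at the prompt you would normally call `ls()` and `rm()` without the `envir` argument.

```r
# Scratch environment, so the example never touches the global workspace.
env <- new.env()
env$a <- 1
env$b <- c(1, NA, 0/0)            # a number, a missing value, and NaN

before    <- ls(env)              # "a" "b"
b_class   <- class(env$b)         # "numeric"
na_flags  <- is.na(env$b)         # FALSE TRUE TRUE  (NaN also counts as NA)
nan_flags <- is.nan(env$b)        # FALSE FALSE TRUE (NA is not NaN)

rm("a", envir = env)              # remove a single object
after_one <- ls(env)              # "b"

rm(list = ls(env), envir = env)   # remove everything that is left
after_all <- ls(env)              # character(0)
```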

## Atomic Objects

• Numeric (double)
• Integer
• Complex (complex numbers, e.g. `1 + 4i`)
• Logical (boolean)
• Character (what other languages call a string)
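
To see the types listed above in practice, here is a minimal sketch that builds one literal of each kind and asks `class()` what it is. The variable names are only illustrative.

```r
# One literal per atomic type from the list above.
values <- list(
  numeric   = 3.14,
  integer   = 2L,        # the L suffix makes an integer instead of a double
  complex   = 1 + 4i,
  logical   = TRUE,
  character = "hello"
)

# class() reports the type of each value.
classes <- sapply(values, class)
print(classes)
```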

## Vectors

• You can use it as a dictionary as well (`names`)
```
> x <- c(1, 2)
> names(x) <- c("aa", "bb")
> x
aa bb
 1  2
```
```
> x <- c(1, 2, 3, 4) # c() combines values into one vector
```

## Factors

• “Categories”
• It is possible to define the “size” (rank) of each level, which gives an ordered factor
```
> factor(c("A", "B", "C"), labels = c("A", "B", "C"), ordered = TRUE)
[1] A B C
Levels: A < B < C
```
• Usage:
```
factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

> factor(c("A", "B", "C"))
[1] A B C
Levels: A B C
```
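
As a sketch of why ordered levels are useful: once the order is declared through `levels`, comparisons follow that order instead of the alphabet. The data and the name `sizes` are made up for illustration.

```r
# Levels are given explicitly, from smallest to largest (hypothetical data).
sizes <- factor(c("medium", "small", "large", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

lv       <- levels(sizes)      # declared order, not alphabetical
is_small <- sizes < "large"    # comparisons follow the level order
codes    <- as.integer(sizes)  # underlying integer codes: 2 1 3 1
counts   <- table(sizes)       # number of observations per level
```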

## Matrices

• By default they are filled by column (set `byrow = TRUE` to fill by row instead)
```
> matrix(data = 1:6, nrow = 2, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

> attributes(matrix(data = 1:6, nrow = 2, ncol = 3))
$dim
[1] 2 3

> x <- c(1, 2, 3, 4)
> dim(x) <- c(2, 2)
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
```
• `rbind()`: combine vectors or matrices by row
• `cbind()`: combine vectors or matrices by column
• To name the rows and columns of a matrix, you can use `dimnames`
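
A short sketch of `rbind()`, `cbind()` and `dimnames` together. The row and column labels are arbitrary examples.

```r
a <- 1:3
b <- 4:6

by_row <- rbind(a, b)   # 2 x 3 matrix, rows named "a" and "b"
by_col <- cbind(a, b)   # 3 x 2 matrix, columns named "a" and "b"

# byrow = TRUE fills row by row: first row is 1 2 3, second row is 4 5 6.
m <- matrix(1:6, nrow = 2, byrow = TRUE)
dimnames(m) <- list(c("r1", "r2"), c("x", "y", "z"))

cell <- m["r2", "y"]    # look up by name instead of position: 5
```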

## Data frames

• A special type of list

• All elements are vectors, each one representing a column. They must all have the same length.
• Different columns can have different types

• You can set `stringsAsFactors`. Prior to R 4.0.0 it was `TRUE` by default

• `data.frame()`

• `read.table()`

• `read.csv()`

• `cbind`/`rbind` can be used with Data Frames as well
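
A minimal sketch of the points above, using made-up data. `read.csv()` is fed an inline string through its `text` argument here only to keep the example self-contained; normally you would pass a file path.

```r
# Columns of different types, all of the same length.
df <- data.frame(
  name = c("ana", "bob", "cy"),
  age  = c(31L, 45L, 27L),
  stringsAsFactors = FALSE      # explicit, so the result is the same before and after R 4.0.0
)

df_is_list  <- is.list(df)        # TRUE: a data frame is a list of columns
col_classes <- sapply(df, class)  # name: "character", age: "integer"

# cbind adds a column, rbind adds a row (given as a one-row data frame).
df2 <- cbind(df, member = c(TRUE, FALSE, TRUE))
df3 <- rbind(df2, data.frame(name = "dee", age = 52L, member = FALSE,
                             stringsAsFactors = FALSE))

# Reading CSV data from an inline string instead of a file.
csv <- read.csv(text = "name,age\nana,31\nbob,45")
```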

## Subcollections

• `[]` returns an object of the same class as the original object
• Negative indices exclude the elements at those positions:
```
> x <- 1:4; x[-2]
[1] 1 3 4
```
• You can also use a logical (boolean) vector:
```
> x <- 1:4; x[x > 2]
[1] 3 4
```
• You can also use it kinda like a dictionary:
```
> x <- c(1, 2); names(x) <- c("aa", "bb"); x["aa"]
aa
 1
```
• For matrices you can use:
```
> m <- matrix(1:6, nrow = 2)
> m[2, ]                # second row
> m[, 3]                # third column, returned as a plain vector
> m[, 3, drop = FALSE]  # third column, kept as a one-column matrix
```
• You can use partial names with `$`:
```
> x <- list(foo = 1:4, bar = 1); x$f
[1] 1 2 3 4
```

## Dates

• `Date` type
• Stored as the number of days since 1970-01-01
```
> unclass(as.Date('1976/01/01'))
[1] 2191
```
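
Because a `Date` is just a day count underneath, arithmetic on dates works in days. A quick sketch, reusing the date from the example above:

```r
d <- as.Date("1976-01-01")

days_since_epoch <- as.numeric(d)      # 2191, same as unclass(d)
later            <- d + 30             # adding a number moves forward by days
gap              <- as.Date("1976-03-01") - d   # difference in days (1976 is a leap year)

format(later)      # "1976-01-31"
as.numeric(gap)    # 60
```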

## Timestamps

• `POSIXct` and `POSIXlt` types

• `POSIXct` is just a number (seconds since 1970-01-01 UTC), useful to save space
• `POSIXlt` is a list with other information (day of the week, day of the year, month and so on)
• `[[]]` extracts a single element of a list or data frame (the result has the class of the element, not of its “parent”)

• `$` extracts an element of a list or data frame by name (similar to `[[]]`)
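
A sketch of the two timestamp classes, followed by the difference between `[]`, `[[]]` and `$` on a plain list. The timestamp and the `person` list are arbitrary examples; note that the fields of a `POSIXlt` are zero-based for month, weekday and day of the year.

```r
ct <- as.POSIXct("2024-03-15 10:30:00", tz = "UTC")
lt <- as.POSIXlt(ct)

secs <- as.numeric(ct)   # POSIXct: seconds since 1970-01-01 UTC
wday <- lt$wday          # 5  (0 = Sunday, so 5 = Friday)
yday <- lt$yday          # 74 (0-based day of the year)
mon  <- lt$mon           # 2  (0-based month, so 2 = March)

# [] keeps the class of the parent, [[]] and $ return the element itself.
person <- list(name = "ana", scores = c(9, 7))
class_single <- class(person["scores"])     # "list"
class_double <- class(person[["scores"]])   # "numeric"
same <- identical(person$scores, person[["scores"]])   # TRUE
```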

## Curious stuff

• `%*%` is matrix multiplication
• `round` follows the “round half to even” rule for x.5 numbers (IEC 60559, also described in ISO 80000-1). If the integer part is even it rounds towards zero, if it’s odd it rounds away from zero. (https://en.wikipedia.org/wiki/ISO_80000-1)
```
> 1/0
[1] Inf
```
```
> 0/0
[1] NaN
```
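
A quick sketch of both curiosities. The inputs to `round()` are chosen so that the .5 values are exactly representable in floating point; other values such as 0.15 may behave differently because of representation error.

```r
# Round half to even: .5 cases go to the nearest even integer.
rounded <- round(c(0.5, 1.5, 2.5, 3.5, -2.5))   # 0 2 2 4 -2

m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
elementwise <- m * m         # each cell squared: 1 9 / 4 16
product     <- m %*% m       # true matrix product: 7 15 / 10 22
```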

## Random

```
> 10:20
[1] 10 11 12 13 14 15 16 17 18 19 20
```
```
> seq(from = 7, by = 4, to = 20)
[1]  7 11 15 19
```
• `order(x)` returns the permutation of indices that sorts `x`:
```
> x <- runif(5)
> x
[1] 0.03387367 0.53094936 0.26855677 0.96293228 0.01368555
> order(x)
[1] 5 1 3 2 4
```
• `cor()`: correlation between two vectors (or between the columns of a matrix or data frame)
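
A small sketch of what `order()` is typically used for, plus a trivial `cor()` call. Fixed values are used instead of `runif()` so the result is reproducible.

```r
x <- c(0.53, 0.03, 0.96, 0.27)

idx      <- order(x)    # 2 4 1 3: positions of the smallest, next smallest, ...
sorted_x <- x[idx]      # indexing with the permutation sorts the vector

# A perfectly linear relationship has correlation 1.
r <- cor(1:10, 2 * (1:10) + 1)
```

The same index vector can reorder another vector (or the rows of a data frame) so that it follows the order of `x`.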

## Functions

• https://bookdown.org/rdpeng/rprogdatascience/scoping-rules-of-r.html
• Functions always return the value of the last expression evaluated, like Ruby (you can also use `return()`)
• `function()` creates function “objects”
• `f <- function(<args>) { }`
• `formals()` returns the list of formal arguments
• Not all arguments are required, even if they don’t have a default value: arguments are evaluated lazily, so a missing one only causes an error if the function actually uses it
• You can check whether an argument was given with `missing()`
• You can pass arguments by position (in the right order) or explicitly by name
• You can also use the `...` argument; to inspect its contents you need to capture them in a vector or a list (e.g. `c(...)`, `list(...)`)
• All arguments are passed by value (R only makes the copy when the function modifies the value)
• To vectorise a function that originally does not know how to handle vectors you can use `Vectorize(<function>)`
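
The points above can be sketched with a few toy functions. All function names here (`describe`, `ignore_second`, `bump`, `pick`) are hypothetical and exist only for this illustration.

```r
# missing(), `...` and the implicit return value.
describe <- function(x, label, ...) {
  extras <- list(...)                       # capture the extra arguments
  if (missing(label)) label <- "unnamed"    # label was not supplied by the caller
  paste0(label, ": ", x, " (", length(extras), " extra)")   # last value is returned
}
d1 <- describe(1)                           # "unnamed: 1 (0 extra)"
d2 <- describe(label = "n", x = 2, 10, 20)  # "n: 2 (2 extra)"
arg_names <- names(formals(describe))       # "x" "label" "..."

# Lazy evaluation: b has no default and is never used, so omitting it is fine.
ignore_second <- function(a, b) a * 2
doubled <- ignore_second(21)                # 42

# Pass by value: modifying v inside the function leaves the original alone.
bump <- function(v) { v[1] <- 99; v }
original <- c(1, 2, 3)
bumped   <- bump(original)

# Vectorize: `if` only handles a single value, so wrap the function.
pick  <- function(n) if (n > 0) "pos" else "non-pos"
vpick <- Vectorize(pick)
signs <- vpick(c(-1, 2))                    # "non-pos" "pos"
```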

### Good to know

• `lapply`: applies a function to each element of a list; the result is a list
• `sapply`: similar to `lapply`, but the result is simplified (if every result has length one, you get a vector; if every result has the same length greater than one, you get a matrix; otherwise you get a list)
• `apply`: the input is a matrix or a data frame; applies a function over one dimension (rows or columns)
• `mapply`: `mapply(log, 2:5, 2:3)` (calls `log` with an element of the first argument and an element of the second argument). The result is also simplified like `sapply`. From the docs: mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each `...` argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
• `tapply`: Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors. (`tapply(mtcars$mpg, mtcars$cyl, mean)` groups the cars by `$cyl` and gets the mean of `$mpg` for each group)
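
A side-by-side sketch of the apply family described above, on small inputs so the shapes of the results are easy to check. `mtcars` is the example data set that ships with R.

```r
squares <- lapply(1:3, function(i) i^2)     # list of 1, 4, 9
flat    <- sapply(1:3, function(i) i^2)     # simplified to the vector 1 4 9
ranges  <- sapply(list(1:3, 4:9), range)    # each result has length 2, so a 2 x 2 matrix

m <- matrix(1:6, nrow = 2)                  # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)                # over rows:    9 12
col_sums <- apply(m, 2, sum)                # over columns: 3 7 11

# Pairs up the arguments, recycling the shorter one:
# log(2, 2), log(3, 3), log(4, 2), log(5, 3)
logs <- mapply(log, 2:5, 2:3)

# Mean mpg for each number of cylinders (groups "4", "6" and "8").
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
```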

• `read.csv` is just a thin wrapper around `read.table`.
• `aes()` lets you define the aesthetic mappings of your chart (ggplot2). If you define `aes` when you call `ggplot`, the next layers inherit it and can extend it.