
# R lang

• https://www.r-project.org
• GNU project since 1997
• Named after the first initial of its authors (Ross Ihaka and Robert Gentleman)
• Influenced by the S language (S itself was originally implemented as Fortran libraries)
• The idea is that you can start analysing data interactively without first adopting a “programming” mindset

## Basic commands

• `ls()`: list all objects
• `rm(<v>)`: remove an object
• `rm(list = ls())`: remove every object named in the list (in this case, everything)
• `class(<v>)`: class of an object
• `is.na(<v>)`: test for missing values (`NA`)
• `is.nan(<v>)`: test for `NaN` (“not a number”)
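
A minimal sketch of these commands. It works inside a scratch environment (`demo`, a name made up for this example) so that nothing in your own workspace is removed.

```r
# Scratch environment, so the example does not touch the global workspace.
demo <- new.env()
assign("v", c(1, NA, 0/0), envir = demo)   # 0/0 is NaN
assign("w", "hello", envir = demo)

objs      <- sort(ls(demo))     # "v" "w"
v_class   <- class(demo$v)      # "numeric"
na_flags  <- is.na(demo$v)      # FALSE TRUE TRUE  (NaN also counts as missing)
nan_flags <- is.nan(demo$v)     # FALSE FALSE TRUE (NA is not NaN)

rm("w", envir = demo)                 # remove a single object
after_rm <- ls(demo)                  # "v"
rm(list = ls(demo), envir = demo)     # remove everything that is left
remaining <- ls(demo)                 # character(0)
```

At the console you would normally drop the `envir` argument and call `ls()` and `rm()` directly.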

## Atomic Objects

• Numeric (double)
• Integer
• Complex (complex numbers, e.g. `1 + 4i`)
• Logical (boolean)
• Character (what other languages call a string)
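
A quick way to see the five atomic types is to ask `class()` for one value of each. The sample values below are arbitrary.

```r
# One sample value per atomic type.
values  <- list(1.5, 2L, 1 + 4i, TRUE, "text")
classes <- sapply(values, class)
# classes: "numeric" "integer" "complex" "logical" "character"

dbl_type <- typeof(1.5)    # "double": numeric values are doubles by default
int_type <- typeof(2L)     # "integer": the L suffix asks for an integer
```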

## Vectors

• You can also use a vector like a dictionary by naming its elements (`names`)
```r
> x <- c(1,2)
> names(x) <- c("aa", "bb")
> x
aa bb 
 1  2 
```
```r
> x <- c(1,2,3,4) # c() combines values into a vector
```
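
A small sketch of named vectors and of `c()` used for concatenation. The names `aa`, `bb` and `cc` are only illustrative.

```r
x <- c(aa = 1, bb = 2)     # names can be given directly inside c()
y <- c(x, cc = 3)          # c() also concatenates existing vectors

bb_value <- y[["bb"]]      # look up a single element by name: 2
has_cc   <- "cc" %in% names(y)
```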

## Factors

• “Categories”
• It is possible to define an order between the levels (`ordered = TRUE`)
```r
> factor(c("A", "B", "C"), labels = c("A", "B", "C"), ordered = TRUE)
[1] A B C
Levels: A < B < C
```
• Usage:
```r
factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

> factor(c("A", "B", "C"))
[1] A B C
Levels: A B C
```
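
As a sketch of what the level order buys you, the example below builds an ordered factor with explicit `levels` and then compares and counts values. The size labels are made-up data.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels  = c("small", "medium", "large"),
                ordered = TRUE)

codes      <- as.integer(sizes)   # position of each value in the levels: 1 3 2 1
is_smaller <- sizes < "large"     # comparisons follow the level order
counts     <- table(sizes)        # counts per level, in level order
```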

## Matrices

• By default they are filled column by column (set `byrow = TRUE` to fill row by row)
```r
> matrix(data = 1:6, nrow = 2, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> attributes(matrix(data = 1:6, nrow = 2, ncol = 3))
$dim
[1] 2 3

> x <- c(1,2,3,4)
> dim(x) <- c(2,2)
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
```
• `rbind()`: combine by rows
• `cbind()`: combine by columns
• To name the rows and columns of a matrix, use `dimnames`
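
The sketch below puts `byrow`, `dimnames`, `rbind()` and `cbind()` together. The row and column names are arbitrary.

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))

m2 <- rbind(m, r3 = 7:9)          # adds a third row named "r3"
m3 <- cbind(m, d = c(10, 11))     # adds a fourth column named "d"

cell <- m["r2", "b"]              # names can be used for indexing: 5
```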

## Data frames

• A special type of list

• All elements are vectors, each one representing a column; they must all have the same length.
• Different columns can have different types

• You can set `stringsAsFactors`. Prior to R 4.0 it defaulted to TRUE

• `data.frame()`

• `read.table()`

• `read.csv()`

• `cbind`/`rbind` can be used with Data Frames as well
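
A minimal sketch of building and growing a data frame. The column names and values are invented, and `stringsAsFactors = FALSE` is spelled out so the result is the same before and after R 4.0.

```r
df <- data.frame(name = c("ana", "bob"),
                 age  = c(31, 45),
                 stringsAsFactors = FALSE)

# rbind() appends rows, cbind() appends columns.
df2 <- rbind(df, data.frame(name = "cy", age = 28, stringsAsFactors = FALSE))
df3 <- cbind(df2, member = c(TRUE, FALSE, TRUE))

col_classes <- sapply(df3, class)   # each column keeps its own type
```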

## Subcollections

• `[]` returns an object of the same class as the original object
• Negative indices exclude the elements at those positions:
```r
> x <- 1:4; x[-2]
[1] 1 3 4
```
• You can also use a boolean vector:
```r
> x <- 1:4; x[x > 2]
[1] 3 4
```
• You can also index by name, somewhat like a dictionary:
```r
> x <- c(1,2); names(x) <- c("aa", "bb"); x["aa"]
aa 
 1 
```
• For matrices you can use:
```r
> m[2, ]                # second row
> m[, 3]                # third column
> m[, 3, drop = FALSE]  # third column, kept as a one-column matrix
```
• `$` allows partial matching of names:
```r
> x <- list(foo = 1:4, bar = 1); x$f
[1] 1 2 3 4
```
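
To make the class-preserving behaviour of `[]` concrete, here is a short sketch on a list and on a matrix. The objects are illustrative only.

```r
x <- list(foo = 1:4, bar = 1)
sub_class <- class(x["foo"])     # "list": `[` keeps the class of x
partial   <- x$f                 # partial match on "foo": 1 2 3 4

m <- matrix(1:6, nrow = 2)
dropped_dim <- dim(m[, 3])                 # NULL: result was dropped to a plain vector
kept_dim    <- dim(m[, 3, drop = FALSE])   # 2 1: still a matrix
```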

## Dates

• Date type
• Stored as the number of days since 1970-01-01
```r
> unclass(as.Date('1976/01/01'))
[1] 2191
```
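
Because a `Date` is a day count underneath, arithmetic on dates works in days. A small sketch with arbitrary dates:

```r
d     <- as.Date("1970-01-10")
days  <- unclass(d)                  # 9 days after the origin
later <- d + 30                      # adding a number moves forward by days
later_text <- format(later, "%Y-%m-%d")   # "1970-02-09"

# Subtracting two dates gives the number of days between them.
span <- as.numeric(as.Date("1976-01-01") - as.Date("1970-01-01"))   # 2191
```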

## Timestamps

• POSIXct and POSIXlt types

• POSIXct is just a number (seconds since 1970-01-01 UTC), useful to save space
• POSIXlt is a list with additional information (day of the week, day of the year, month and so on)
• `[[]]` extracts an element of a list or data frame (the result has the class of the element, not of its “parent”)

• `$` extracts elements of a list or data frame by name (similar to `[[]]`)
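
The sketch below contrasts the two timestamp classes and uses `$` and `[[]]` to pull fields out of the POSIXlt list. The timestamp is arbitrary and the time zone is fixed to UTC so the result does not depend on the machine.

```r
ct <- as.POSIXct("2024-03-15 10:30:00", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

secs        <- as.numeric(ct)                                      # seconds since the epoch
one_day     <- as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))    # 86400
ct_is_num   <- is.numeric(unclass(ct))    # TRUE: a number underneath
lt_is_list  <- is.list(unclass(lt))       # TRUE: a list of fields underneath

weekday <- lt$wday                 # 5: Friday (0 is Sunday)
yearday <- lt$yday                 # 74: zero-based day of the year
month   <- lt$mon                  # 2: zero-based month (March)
hour    <- unclass(lt)[["hour"]]   # 10, extracted with [[ ]] from the plain list
```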

## Curious stuff

• `%*%` is matrix multiplication
• `round` rounds x.5 values to the nearest even number (“round half to even”, see https://en.wikipedia.org/wiki/ISO_80000-1): if the integer part is even it rounds towards zero, if it is odd it rounds away from zero.
```r
> 1/0
[1] Inf
> 0/0
[1] NaN
```
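
A short sketch of these behaviours, using values that are exactly representable in binary so the rounding rule is visible without floating-point noise:

```r
rounded <- round(c(0.5, 1.5, 2.5, 3.5, -2.5))   # 0 2 2 4 -2

a <- matrix(1:4, nrow = 2)     # columns are (1, 2) and (3, 4)
mat_prod  <- a %*% a           # matrix product: columns (7, 10) and (15, 22)
elem_prod <- a * a             # element-wise product: 1 4 9 16

inf_check <- is.infinite(1/0)  # TRUE
nan_check <- is.nan(0/0)       # TRUE
```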

## Random

```r
> 10:20
 [1] 10 11 12 13 14 15 16 17 18 19 20
> seq(from=7, by=4, to=20)
[1]  7 11 15 19
```
• `order(x)` returns the permutation of indices that sorts `x`:
```r
> x <- runif(5);
> x
[1] 0.03387367 0.53094936 0.26855677 0.96293228 0.01368555
> order(x)
[1] 5 1 3 2 4
```
• `cor()`: correlation
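
A deterministic sketch of `order()` and `cor()`, using fixed values instead of `runif()` so the results are reproducible:

```r
x <- c(0.3, 0.9, 0.1, 0.5)
idx    <- order(x)     # 3 1 4 2: positions of the smallest to the largest value
sorted <- x[idx]       # indexing with the permutation sorts the vector

steps <- seq(from = 7, to = 20, by = 4)   # 7 11 15 19

up   <- cor(1:10, 2 * (1:10) + 1)   # perfectly linear, increasing: 1
down <- cor(1:10, 10:1)             # perfectly linear, decreasing: -1
```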

## Functions

• https://bookdown.org/rdpeng/rprogdatascience/scoping-rules-of-r.html
• Functions return the value of the last evaluated expression, like Ruby (you can also use `return()`)
• `function()` creates function “objects”
• `f <- function(<args>) { }`
• `formals()` returns the list of formal arguments
• Not all arguments are required, even if they don't have a default value: a missing argument only raises an error when it is actually used
• You can verify if an argument was given with `missing()`
• You can pass arguments by position or explicitly by name
• You can also accept extra arguments with `...`; to work with them, collect them into a vector or a list (e.g. `c(...)`, `list(...)`)
• Arguments behave as if passed by value (copy): modifying one inside the function does not change the caller's object
• To vectorise a function that originally does not know how to handle vectors you can use `Vectorize(function)`
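
The points above can be sketched as follows. All function names (`describe`, `total`, `bump`, `sign_label`) are hypothetical and exist only for this example.

```r
# Optional argument without a default, detected with missing().
describe <- function(x, label) {
  if (missing(label)) {
    label <- "value"
  }
  paste(label, x)          # last evaluated expression is the return value
}
by_position <- describe(1)                    # "value 1"
by_name     <- describe(label = "n", x = 2)   # "n 2"
arg_names   <- names(formals(describe))       # "x" "label"

# Collect `...` into a vector before using it.
total <- function(...) {
  args <- c(...)
  sum(args)
}
six <- total(1, 2, 3)

# Modifying an argument does not affect the caller's object.
bump <- function(v) {
  v[1] <- 99
  v
}
orig    <- c(1, 2)
changed <- bump(orig)      # orig is still 1 2

# `if` only handles a single value; Vectorize() wraps the function in a loop.
sign_label   <- function(n) if (n > 0) "pos" else "non-pos"
v_sign_label <- Vectorize(sign_label)
labels <- v_sign_label(c(-1, 2))   # "non-pos" "pos"
```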

### Good to know

• `lapply`: apply function to each element of a list, result is a list
• `sapply`: similar to `lapply`, but the result is simplified: if every result has length one, you get a vector; if every result has the same length (greater than one), you get a matrix; otherwise you get a list
• `apply`: the input is a matrix or a data frame; applies a function over one dimension (rows or columns)
• `mapply`: `mapply(log, 2:5, 2:3)` (calls `log` passing an element of the first argument and an element of the second argument). The result is also simplified like `sapply`. From the docs: mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each `...` argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
• `tapply`: Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors. (`tapply(mtcars$mpg, mtcars$cyl, mean)` groups the cars by `cyl` and gets the mean `mpg` of each group)
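
A compact sketch of the whole family on small made-up inputs, so each result can be checked by hand:

```r
lst <- list(a = 1:3, b = 4:6)
as_list   <- lapply(lst, sum)     # list(a = 6, b = 15)
as_vector <- sapply(lst, sum)     # named vector: a = 6, b = 15
as_matrix <- sapply(lst, range)   # 2 x 2 matrix, one column per list element

m <- matrix(1:6, nrow = 2)        # rows are 1 3 5 and 2 4 6
row_sums <- apply(m, 1, sum)      # 9 12
col_sums <- apply(m, 2, sum)      # 3 7 11

pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9

# Group the values by the labels and average each group.
group_means <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)   # a = 20, b = 30
```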

• `read.csv` is just a thin wrapper around `read.table`.
• `aes()` lets you define the aesthetic mappings of your chart (ggplot2). If you define `aes` when you call `ggplot`, the next layers inherit it and can extend it.
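
The note that `read.csv` wraps `read.table` can be checked with a small in-memory CSV (the `text` argument avoids needing a file; the data is made up). This is a sketch showing that the two calls agree once `header` and `sep` are set explicitly.

```r
csv_text <- "name,age\nana,31\nbob,45"

via_csv   <- read.csv(text = csv_text)
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")

same_result <- isTRUE(all.equal(via_csv, via_table))
```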