Random Numbers in R Science 24.05.2016

Random numbers are critical to many analyses, and consequently R has a wide variety of functions for sampling from different distributions.
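As a quick illustration of that variety, here is a minimal sketch using a few standard generators from base R's stats package: runif (uniform), rnorm (normal) and rbinom (binomial). The seed value 42 is arbitrary and is used only to make the draws repeatable.

```r
# A few of R's built-in random-variate generators (base 'stats' package)
set.seed(42)                           # arbitrary seed so the draws can be repeated
u <- runif(5, min = 0, max = 1)        # 5 uniform values between 0 and 1
z <- rnorm(5, mean = 0, sd = 1)        # 5 standard normal values
b <- rbinom(5, size = 10, prob = 0.5)  # 5 binomial counts, each out of 10 trials
print(u)
print(z)
print(b)
```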

The sample function is an important workhorse for generating random values, and its behavior has a few quirks, so it's worth getting to know it well. If you pass it a single number, n, it returns a random permutation of the integers from 1 to n:

sample(7)
[1] 1 2 5 7 4 6 3
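The output above changes on every run. A small sketch (the seed value 1 is an arbitrary choice) shows that the result really is a permutation: sorting it always gives back 1 to 7. It also shows that fixing the seed with set.seed makes the same permutation reproducible.

```r
set.seed(1)            # fix the RNG state so the draw can be repeated
p <- sample(7)         # random permutation of 1:7
print(p)

# Each integer from 1 to 7 appears exactly once; only the order is random
print(sort(p))

# Resetting the seed reproduces the same permutation
set.seed(1)
print(identical(sample(7), p))  # TRUE
```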

If you give it a second value, it returns that many random integers between 1 and n:

sample(7, 5)
[1] 7 2 3 1 5
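The properties of that result can be checked directly. This is a minimal sketch: the returned vector has the requested length, every value lies in 1 to 7, and (because sampling is without replacement by default) no value repeats.

```r
x <- sample(7, 5)              # 5 values drawn from 1:7
print(x)
print(length(x) == 5)          # TRUE: exactly 5 values returned
print(all(x %in% 1:7))         # TRUE: every value lies in 1..7
print(anyDuplicated(x) == 0)   # TRUE: no value appears twice
```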

Notice that all those random numbers are different: by default, sample draws without replacement, so each value can appear at most once. To allow sampling with replacement, pass replace = TRUE:

sample(7, 10, replace = TRUE)
[1] 4 6 1 7 5 3 6 7 4 2
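One practical consequence of the default: without replacement you cannot ask for more values than the population holds, so sample(7, 10) raises an error, while the same call with replace = TRUE succeeds. The sketch below catches the error with tryCatch so the script keeps running.

```r
# Without replacement, asking for 10 values out of 7 is an error
res <- tryCatch(sample(7, 10), error = function(e) conditionMessage(e))
print(res)                # the error message, not a sample

# With replacement, each draw is independent, so the size may exceed n
y <- sample(7, 10, replace = TRUE)
print(length(y))          # 10
print(all(y %in% 1:7))    # TRUE
```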

You can also select n items at random from a vector:

# general form: sample(vec, n)
# assumes a data frame world.series with a year column is already loaded
sample(world.series$year, 10)
[1] 1906 1963 1966 1928 1905 1924 1961 1959 1927 1934
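This is where one of sample's quirks appears. If the vector happens to be numeric with length 1 (for example, a filtered subset that contains only one year), sample treats it as n and draws from 1:n instead of returning that single element. R's own ?sample help page suggests a small wrapper that indexes into the vector instead; the name resample below follows that help page and is not a built-in function.

```r
# Quirk: a numeric vector of length 1 is treated as n, not as a one-element set
v <- c(10)
print(sample(v, 1))   # a number from 1:10, not necessarily 10

# Safer helper: sample positions with sample.int(), then index into x,
# so a length-1 vector is "sampled" as itself
resample <- function(x, ...) x[sample.int(length(x), ...)]

print(resample(c(10), 1))         # always 10
print(resample(c(3, 8, 12), 2))   # 2 of the 3 values, without replacement
```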

The following example generates a random sequence of 10 simulated flips of a coin:

sample(c("H","T"), 10, replace=TRUE)
[1] "H" "H" "H" "T" "T" "H" "T" "H" "H" "T"
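Ten flips say little about fairness. A short sketch with a larger sample (1000 flips, with an arbitrary seed) shows the proportion of heads settling near 0.5, using table to count outcomes and mean on a logical vector to get the proportion.

```r
set.seed(123)                                    # arbitrary seed for repeatability
flips <- sample(c("H", "T"), 1000, replace = TRUE)
print(table(flips))                              # counts of H and T
p_heads <- mean(flips == "H")                    # proportion of heads
print(p_heads)                                   # close to 0.5 for a fair coin
```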

By default, sample chooses uniformly among the elements, so the probability of selecting H or T is 0.5 each. In a Bernoulli trial, however, the probability p of success is not necessarily 0.5. You can bias the sample with the prob argument of sample: a vector of probabilities with one entry per element, given in the same order as the elements.

Suppose we want to generate 10 Bernoulli trials with a probability of success p = 0.8, treating T as a success. We set the probability of H to 0.2 and the probability of T to 0.8:

sample(c("H","T"), 10, replace=TRUE, prob=c(0.2,0.8))
[1] "T" "T" "T" "T" "T" "H" "H" "T" "H" "T"
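To see the bias take effect, a larger run helps. The sketch below (seed chosen arbitrarily) checks that the share of T comes out near 0.8. It also shows that prob is treated as a set of weights that sample normalizes, so c(1, 4) gives the same 0.2/0.8 split. Finally, as an alternative technique not used in the text above, rbinom with size = 1 produces Bernoulli trials directly as 0/1 values.

```r
set.seed(2016)
trials <- sample(c("H", "T"), 10000, replace = TRUE, prob = c(0.2, 0.8))
p_T <- mean(trials == "T")
print(p_T)                      # close to 0.8

# prob holds weights; c(1, 4) is normalized to the same 0.2 / 0.8 split
w <- sample(c("H", "T"), 10000, replace = TRUE, prob = c(1, 4))
p_w <- mean(w == "T")
print(p_w)                      # also close to 0.8

# Alternative: rbinom() with size = 1 gives Bernoulli outcomes (1 = success)
bern <- rbinom(10, size = 1, prob = 0.8)
print(bern)
```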