Random Variables
A random variable is a numerical characteristic that takes on different values due to chance.
Examples
Random variables are classified into two broad types: discrete and continuous. A discrete random variable has a countable set of distinct possible values. A continuous random variable is such that any value (to any number of decimal places) within some interval is a possible value. So, if a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable.
A binomial random variable is a specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials.
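For a variable to be a binomial random variable, all of the following conditions must be met: the number of trials is fixed, each trial has only two possible outcomes (success or failure), the trials are independent, and the probability of success is the same on every trial.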
Examples
Discrete Random Variables:
Continuous Random Variables:
Probability Distribution
The probability of an event can be estimated from observed data by dividing the number of trials in which the event occurred by the total number of trials. For instance, if it rained on 3 out of 10 days with conditions similar to today's, the probability of rain today can be estimated as 3 / 10 = 0.30 or 30 percent. Similarly, if 10 out of 50 prior email messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20 or 20 percent.
To denote these probabilities, we use notation in the form P(x), which signifies the probability of event x. For example, P(rain) = 0.30 and P(spam) = 0.20.
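As a minimal sketch of this kind of estimate in R, the relative frequency can be computed as the mean of a logical vector. The vector spam below is hypothetical data, built only to match the 10-out-of-50 example above.

```r
# Hypothetical record of 50 prior messages: TRUE marks a spam message.
spam <- c(rep(TRUE, 10), rep(FALSE, 40))

# The mean of a logical vector is the proportion of TRUE values,
# which is the estimated probability P(spam).
p_spam <- mean(spam)
print(p_spam)  # 0.2
```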
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Consider a simple experiment in which we flip a coin two times. An outcome of the experiment might be the number of heads that we see in two coin flips.
If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution.
An example will make this clear. Suppose you flip a coin two times. This simple statistical experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the random variable X represent the number of Heads that result from this experiment. The random variable X can only take on the values 0, 1, or 2, so it is a discrete random variable.
The probability distribution for this statistical experiment appears below.
Number of heads | Probability |
---|---|
0 | 0.25 |
1 | 0.50 |
2 | 0.25 |
The above table represents a discrete probability distribution because it relates each value of a discrete random variable with its probability of occurrence.
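The table can also be reproduced by brute-force enumeration. The sketch below assumes a fair coin, so that all four outcomes are equally likely; the variable names are illustrative only.

```r
# Enumerate the four equally likely outcomes of two fair coin flips.
outcomes <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"),
                        stringsAsFactors = FALSE)

# X = number of heads in each outcome (HH -> 2, HT or TH -> 1, TT -> 0).
heads <- rowSums(outcomes == "H")

# Share of outcomes that produce each value of X.
coin_dist <- table(heads) / nrow(outcomes)
print(coin_dist)
```

Counting outcomes this way only works because every outcome has the same probability; for a biased coin each outcome would need to be weighted by its own probability.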
There are any number of discrete probability distributions. A discrete random variable can take on only clearly separated values, such as heads or tails, or the number of spots on a six-sided die. The categories must be mutually exclusive and exhaustive. Every event belongs to one and only one category, and the sum of the probabilities is 1.
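The mean of any discrete probability distribution is the sum of each possible value weighted by its probability:
$$\mu = \sum_{x} x \, P(x)$$
The variance of a discrete probability distribution is the probability-weighted sum of the squared deviations from the mean:
$$\sigma^2 = \sum_{x} (x - \mu)^2 \, P(x)$$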
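As a worked example, the sketch below applies the usual definitions (the mean as the probability-weighted sum of the values, and the variance as the probability-weighted sum of squared deviations from the mean) to the two-coin-flip distribution tabulated earlier. The variable names are illustrative.

```r
# Values of X (number of heads in two flips) and their probabilities.
x <- c(0, 1, 2)
p <- c(0.25, 0.50, 0.25)

# A valid probability distribution must sum to 1.
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Mean: each value weighted by its probability.
mu <- sum(x * p)

# Variance: squared deviations from the mean, weighted by probability.
sigma2 <- sum((x - mu)^2 * p)

print(mu)      # 1
print(sigma2)  # 0.5
```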
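R has functions for all of the well-known probability distributions. For each distribution there are four functions, named by adding a one-letter prefix to the distribution's abbreviation (for example binom, pois, or norm): d for the density or probability function, p for the cumulative distribution function, q for the quantile function, and r for random number generation.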
The probability density function (pdf) and cumulative distribution function (cdf) are two ways of specifying the probability distribution of a random variable.
The pdf is denoted f(x) and gives the relative likelihood that the value of the random variable will be equal to x. The total area under the curve is equal to 1.
The cdf is denoted F(x) and gives the probability that the value of a random variable will be less than or equal to x.
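The relationship between the two functions can be checked numerically. The sketch below uses base R's d- and p-prefixed functions (dbinom, pbinom, dnorm, pnorm) together with integrate; the particular distributions and the cut-off value 1 are arbitrary choices made for illustration.

```r
# Discrete case: the cdf is the running sum of the probability function.
k <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.5)
cdf_from_pmf <- cumsum(pmf)
print(all.equal(cdf_from_pmf, pbinom(k, size = 10, prob = 0.5)))

# Continuous case: the cdf at x is the area under the pdf to the left of x.
area_left_of_1 <- integrate(dnorm, lower = -Inf, upper = 1)$value
print(c(area_left_of_1, pnorm(1)))

# The total area under a pdf is 1 (up to numerical integration error).
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(total_area)
```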
The Binomial Distribution
The discrete binomial distribution is very useful for modeling processes in which the binary outcome can be either a success (1) or a failure (0). The random variable X is the number of successes in N independent trials, for each of which the probability of success, p, is the same. The number of successes can range from 0 to N. The expected value of X is Np, the number of trials times the probability of success on a given trial, and the variance of the binomial distribution is Npq, where q = 1 − p. The probability of exactly k successes is
$$P(X = k) = \binom{N}{k} p^{k} q^{N-k}, \qquad k = 0, 1, \ldots, N$$
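As a quick check, the standard binomial probability, choose(N, k) * p^k * q^(N - k), can be computed by hand and compared with R's dbinom. The values N = 10 and p = 0.5 below are chosen only for illustration; the same sketch also confirms the stated mean Np and variance Npq.

```r
N <- 10      # number of trials (illustrative)
p <- 0.5     # probability of success on each trial (illustrative)
q <- 1 - p
k <- 0:N     # every possible number of successes

# Binomial probability computed directly from the formula.
by_hand <- choose(N, k) * p^k * q^(N - k)
print(all.equal(by_hand, dbinom(k, N, p)))

# Mean and variance computed from the distribution itself.
mean_x <- sum(k * by_hand)
var_x <- sum((k - mean_x)^2 * by_hand)
print(c(mean_x, N * p))       # both 5
print(c(var_x, N * p * q))    # both 2.5
```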
Here is a binomial distribution for the number of successes (heads) in 10 tosses of a fair coin, in which the probability of success for each independent trial is .50. We establish a vector of the number of successes (heads), which can range from 0 to 10, with 5 being the most likely value. The dbinom function returns a vector of probabilities, one for each value in x.
x = 0:10                           # possible numbers of heads
binomDist = dbinom(x, 10, 0.50)    # probability of each number of heads
plot(x, binomDist, type = "h")     # vertical line at each value of x
points(x, binomDist)               # point marker at the top of each line
abline(h = 0)                      # baseline at zero
lines(x, binomDist)                # connect the points
Example 1
Suppose that a fair die is rolled 10 times. What is the probability of throwing exactly two sixes?
dbinom(2, 10, 1/6)
The probability of throwing two sixes is approximately 0.29 or 29 percent.
Example 2
If you were to roll a fair six-sided die 100 times, what is the probability of rolling a six no more than 10 times? The number of sixes in 100 dice rolls follows a binomial distribution, so you can answer the question with the pbinom function:
pbinom(10, 100, 1/6)
From the output, you can see that the probability of rolling no more than 10 sixes is 0.043 (4.3%).
What is the probability of rolling a six more than 20 times?
pbinom(20, 100, 1/6, lower.tail=FALSE)
The probability of rolling more than 20 sixes is approximately 0.15, or 15 percent.
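The upper-tail probability can be cross-checked in two other ways: as the complement of the lower tail, and as the sum of the individual probabilities for 21 through 100 sixes. This is only a consistency sketch; the variable names are illustrative.

```r
# Upper tail requested directly: P(X > 20).
upper <- pbinom(20, 100, 1/6, lower.tail = FALSE)

# Complement of the lower tail: 1 - P(X <= 20).
via_complement <- 1 - pbinom(20, 100, 1/6)

# Sum of the individual probabilities P(X = 21), ..., P(X = 100).
via_sum <- sum(dbinom(21:100, 100, 1/6))

print(c(upper, via_complement, via_sum))
```

All three values agree up to floating-point error. Note that "more than 20" excludes 20 itself, which is why the sum starts at 21.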
Example 3
To simulate the number of sixes thrown in 10 rolls of a fair die, use the command:
rbinom(1, 10, 1/6)
[1] 3
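Repeating the simulation many times gives relative frequencies that should be close to the theoretical probabilities from dbinom. The sketch below assumes 100,000 repetitions and an arbitrary seed, both chosen only so the illustration is reproducible.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
n_sims <- 100000   # number of simulated experiments (illustrative)

# Each element is the number of sixes in one experiment of 10 rolls.
sixes <- rbinom(n_sims, size = 10, prob = 1/6)

# Share of experiments with exactly two sixes, versus the exact probability.
empirical <- mean(sixes == 2)
theoretical <- dbinom(2, 10, 1/6)
print(c(empirical = empirical, theoretical = theoretical))
```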
The Poisson Distribution
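The Poisson distribution can be viewed as a limiting case of the binomial distribution in which the number of trials becomes very large while the probability of success on each trial becomes very small. We define success and failure in the usual way as 1 and 0, respectively, and as with the binomial distribution, the distinction is often arbitrary.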
The Poisson distribution, unlike the binomial distribution, has no theoretical upper bound on the number of occurrences that can happen within a given interval. We assume the number of occurrences in each interval is independent of the number of occurrences in any other interval. We also assume the probability that an occurrence will happen is the same for every interval of the same size. As the interval size decreases, we assume the probability of an occurrence in the interval becomes smaller. In the Poisson distribution, the count of the number of occurrences, X, can take on whole numbers 0, 1, 2, 3, ... The mean number of successes per unit of measure is the value μ. If k is any whole number 0 or greater, then
$$P(X = k) = \frac{\mu^{k} e^{-\mu}}{k!}$$
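The standard Poisson probability, mu^k * exp(-mu) / factorial(k), can be computed by hand and compared with R's dpois. The mean of 4 used below is a hypothetical value chosen only for illustration.

```r
mu <- 4      # hypothetical mean number of occurrences per interval
k <- 0:10    # a few possible counts

# Poisson probability computed directly from the formula.
by_hand <- mu^k * exp(-mu) / factorial(k)
print(all.equal(by_hand, dpois(k, mu)))

# There is no upper bound on the count, but the probabilities still sum to 1;
# the terms beyond k = 100 are negligible for this mean.
total <- sum(dpois(0:100, mu))
print(total)
```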
Example 1
The number of lobsters ordered in a restaurant on a given day is known to follow a Poisson distribution with a mean of 20. What is the probability that exactly eighteen lobsters will be ordered tomorrow?
dpois(18, 20)
The probability that exactly eighteen lobsters are ordered is approximately 0.084, or 8.4 percent.
Example 2
The number of lobsters ordered on any given day in a restaurant follows a Poisson distribution with a mean of 20. To simulate the number of lobsters ordered over a seven-day period, use the command:
rpois(7, 20)
[1] 19 10 13 23 21 13 25
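The link between the Poisson and binomial distributions can be checked numerically. In the sketch below, the binomial mean is held at 20 while the number of trials n grows and the per-trial probability 20 / n shrinks; the specific values of n are arbitrary choices for illustration.

```r
mu <- 20                           # mean held fixed, as in the lobster example
n <- c(100, 1000, 10000, 100000)   # increasing numbers of trials (illustrative)

# Binomial probability of exactly 18 successes with p = mu / n.
binom_prob <- dbinom(18, size = n, prob = mu / n)

# Poisson probability of exactly 18 occurrences with the same mean.
pois_prob <- dpois(18, mu)

# Absolute gap between the two; it shrinks as n grows.
abs_err <- abs(binom_prob - pois_prob)
print(data.frame(n = n, binom_prob = binom_prob, abs_err = abs_err))
```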
The Normal Distribution
Continuous variables can take on any value within some specified range. Continuous probability distributions are therefore described by a probability density function (PDF) rather than a discrete probability mass function (PMF).
In contrast to discrete probability distributions, the probability of any single point is zero, so we rarely examine such probabilities and focus instead on areas under the curve. In statistics, the four most commonly used continuous probability distributions are the normal distribution and three other distributions theoretically related to the normal distribution, namely, the t distribution, the F distribution, and the chi-square distribution.
The normal distribution serves as the backbone of modern statistics. Because the distribution is continuous, we are usually interested in finding areas under the normal curve. In particular, we are often interested in left-tailed probabilities, right-tailed probabilities, and the area between two given scores on the normal distribution. There is a different normal distribution for every combination of the population mean μ and positive population standard deviation σ, so we often find it convenient to work with the unit or standard normal distribution. The standard normal distribution has a mean of 0 (not to be confused in any way with a zero indicating the absence of a quantity) and a standard deviation of 1. Every normal distribution is symmetrical and mound shaped, with its mean, mode, and median all equal; for the standard normal distribution they are all 0. Any normal distribution can be converted to the standard normal distribution as follows:
$$z = \frac{x - \mu}{\sigma}$$
which is often called z-scoring or standardizing. The empirical rule tells us that for mound-shaped symmetrical distributions like the standard normal distribution, about 68% of the observations will lie between plus and minus 1 standard deviation from the mean. Approximately 95% of the observations will lie within plus or minus 2 standard deviations, and about 99.7% of observations will lie within plus or minus 3 standard deviations. We can use the built-in functions for the normal distribution to see how accurately this empirical rule describes the normal distribution. We find the rule is quite accurate.
pnorm(3) - pnorm(-3)
[1] 0.9973002
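The same calculation can be done for all three parts of the empirical rule at once, since pnorm is vectorized. This is a small sketch using the standard normal distribution; the variable names are illustrative.

```r
# Number of standard deviations on either side of the mean.
k <- 1:3

# Area under the standard normal curve between -k and +k.
within_k_sd <- pnorm(k) - pnorm(-k)
print(round(within_k_sd, 4))  # 0.6827 0.9545 0.9973
```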
To find the value of the pdf at x=2.5 for a normal distribution with a mean of 5 and a standard deviation of 2, use the command:
dnorm(2.5, mean=5, sd=2)
[1] 0.09132454
To find a probability for a nonstandard normal distribution, add the mean and sd arguments. For example, if a random variable is known to be normally distributed with a mean of 5 and a standard deviation of 2 and you wish to find the probability that a randomly selected member will be no more than 6, use the command:
pnorm(6, mean=5, sd=2)
[1] 0.6914625
To find the complementary probability that the value will be greater than 6, set the lower.tail argument to FALSE:
pnorm(6, 5, 2, lower.tail=FALSE)
[1] 0.3085375
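These results tie back to standardizing: converting a score to z = (x - mean) / sd and using the standard normal distribution gives the same probability as passing the mean and sd arguments. The sketch below reuses the mean of 5 and standard deviation of 2 from above; the interval from 3 to 7 is an illustrative choice showing an area between two scores.

```r
mu <- 5
sigma <- 2

# Standardize the score 6, then use the standard normal distribution.
z <- (6 - mu) / sigma            # 0.5
p_via_z <- pnorm(z)
p_direct <- pnorm(6, mean = mu, sd = sigma)
print(c(p_via_z, p_direct))      # both 0.6914625

# Area between two scores: P(3 < X < 7), one standard deviation either side.
p_between <- pnorm(7, mu, sigma) - pnorm(3, mu, sigma)
print(p_between)
```

Because 3 and 7 lie exactly one standard deviation from the mean, the last value matches the "about 68%" figure from the empirical rule.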
Example of generating random numbers from a normal distribution
Hand span in a particular population is known to be normally distributed with a mean of 195 millimeters and a standard deviation of 17 millimeters. To simulate the hand spans of three randomly selected people, use the command:
rnorm(3, 195, 17)
[1] 186.376 172.164 195.504
One of the most important applications of the normal distribution is its ability to describe the distributions of the means of samples from a population. The central limit theorem tells us that as the sample size increases, the distribution of sample means approaches a normal distribution, regardless of the shape of the parent distribution (provided the parent distribution has a finite variance).
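A small simulation can illustrate the theorem. The sketch below assumes a strongly right-skewed parent distribution (exponential with rate 1, generated with rexp) and compares the skewness of sample means for sample sizes 2 and 50. The parent distribution, sample sizes, seed, and helper functions are all illustrative choices, not part of the original material.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
n_samples <- 5000     # number of samples drawn for each sample size

# Means of n_samples samples, each of size n, from an exponential parent.
sample_means <- function(n) {
  replicate(n_samples, mean(rexp(n, rate = 1)))
}

# Simple moment-based skewness: 0 for a perfectly symmetric distribution.
skewness <- function(v) {
  mean((v - mean(v))^3) / sd(v)^3
}

means_small <- sample_means(2)
means_large <- sample_means(50)

skew_small <- skewness(means_small)
skew_large <- skewness(means_large)
print(c(n2 = skew_small, n50 = skew_large))
```

The skewness for samples of size 50 should be much closer to 0 than for samples of size 2, which is what we would expect if the distribution of sample means is becoming more symmetric and mound shaped. Calling hist(means_large) gives a visual check.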
Vocabulary
Discrete is data that can take on only a countable set of separate values.
Continuous is quantitative data that can take on any value between the minimum and maximum, including any value between two other values.
Probability is the likelihood of an event occurring.
Mean is the numerical average; calculated as the sum of all of the data values divided by the number of values.
Standard deviation is roughly the average distance between the individual data values and the mean.
Sample is a subset of the population from which data is actually collected.
Population is the entire set of possible observations in which we are interested.
Parameter is a measurable characteristic of a population, such as a mean or standard deviation.
Statistic is a measurable characteristic of a sample, such as a mean or standard deviation.