Probability and Statistics Fact Sheet
Introduction
A sample space containing n outcomes is called a simple sample space if the probability assigned to each outcome is 1/n. If event A contains m outcomes (n ≥ m), then Pr(A) = m/n.
Example: assume a deck of 52 cards. What is the probability of getting a spade?
Answer: n = 52, m = 13, Pr(spade) = 13/52 = 1/4.
This simple fact underlies the main interpretation of what a probability statement is, often called the relative frequency approach. It says that Pr(A) = 3/4 is interpreted as an estimate that 3/4 of the outcomes in the sample space are As; alternatively, given a sample space of N outcomes in which we can count f As, the ratio f/N is the relative frequency of A, and its probability is estimated as f/N. One can examine other interpretations, but this one is the closest to intuition.
Combinatorics
Given n objects, there are n! ways of arranging them.
Example: how many ways are there of arranging 6 books on a shelf?
Answer: 6! = 720
Permutations
Given n distinct objects, if you choose k objects in order (i.e., sampling without replacement), there are
$$P_{n,k} = \frac{n!}{(n-k)!}$$
ways of doing so.
Example: given 10 applicants and two distinct positions, assistant and president, there are $P_{10,2} = 10!/8! = 90$ ways of filling the positions.
In contrast, with replacement the problem looks like this: given n distinct objects, sampling with replacement, and k selections, there are $n^k$ ways of choosing k objects.
Example: Given 365 days in a year, what is the smallest number of people such that the probability that at least two share a birthday is greater than 0.5?
Answer: The number of ways in which k people can all have different birthdays is $P_{365,k}$. The total number of ways that k people can have birthdays is $365^k$. The probability that at least two of the people will have the same birthday is therefore
$$p = 1 - \frac{P_{365,k}}{365^{k}}.$$
It can be shown that the smallest integer k for which this probability exceeds 0.5 is k = 23.
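As a quick check, a minimal Python sketch of this calculation (the function name is illustrative; math.perm requires Python 3.8+):

```python
from math import perm

# Probability that at least two of k people share a birthday,
# assuming 365 equally likely birthdays.
def birthday_collision_prob(k: int) -> float:
    return 1 - perm(365, k) / 365**k

# Smallest k for which the probability exceeds 0.5.
k = 1
while birthday_collision_prob(k) <= 0.5:
    k += 1
print(k, birthday_collision_prob(k))  # 23 0.507...
```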
Combinations
Assume we are uninterested in the order of the grouping of k objects from a sample of n. The number of ways of choosing k people from a group of n is
$$C_{n,k} = \frac{n!}{k!\,(n-k)!}.$$
This is often shown as $\binom{n}{k}$.
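The Python standard library covers all three counting rules above; a small sketch:

```python
from math import comb, factorial, perm

print(factorial(6))  # 720: arrangements of 6 books on a shelf
print(perm(10, 2))   # 90: ordered choice of 2 positions from 10 applicants
print(comb(52, 5))   # 2598960: unordered 5-card hands from a 52-card deck
```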
Independence
If A and B are independent, then Pr(AB) = Pr(A)Pr(B) (defining the joint occurrence of A and B as $AB = A \cap B$). The converse is not true.
In general, the probability of the joint occurrence of two events comes from the equation
$$\Pr(AB) = \Pr(A \mid B)\,\Pr(B);$$
alternatively,
$$\Pr(AB) = \Pr(B \mid A)\,\Pr(A).$$
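A small exact-counting sketch over two dice (the particular events are chosen for illustration) verifies both the multiplication rule and independence:

```python
from fractions import Fraction
from itertools import product

# Simple sample space: all 36 equally likely ordered rolls of two dice.
space = list(product(range(1, 7), repeat=2))
A = {s for s in space if s[0] == 6}          # first die shows 6
B = {s for s in space if s[0] + s[1] == 7}   # total is 7

def pr(event):
    return Fraction(len(event), len(space))

pr_A_given_B = Fraction(len(A & B), len(B))  # Pr(A|B): count within B
assert pr(A & B) == pr_A_given_B * pr(B)     # multiplication rule
assert pr(A & B) == pr(A) * pr(B)            # these A and B are independent
print(pr(A & B))                             # 1/36
```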
Conditional probability
$$\Pr(A \mid B) = \frac{\Pr(AB)}{\Pr(B)}, \qquad \Pr(B) > 0.$$
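A worked example (die chosen for illustration): rolling a fair die, the probability that the roll is a 2 given that it is even is
$$\Pr(X = 2 \mid X \text{ even}) = \frac{\Pr(X = 2 \text{ and } X \text{ even})}{\Pr(X \text{ even})} = \frac{1/6}{1/2} = \frac{1}{3}.$$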
Bayes Theorem
Let events $A_1, \ldots, A_k$ form a partition such that $\Pr(A_j) > 0\ \forall j$, and $\Pr(B) > 0$. Then
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\sum_{j=1}^{k} \Pr(A_j)\,\Pr(B \mid A_j)},$$
or
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\,\Pr(B \mid A_i)}{\Pr(B)}.$$
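A minimal sketch of the two-state case, with made-up numbers for a diagnostic test (prevalence, sensitivity, and false-positive rate are hypothetical):

```python
# Partition: A1 = "has condition", A2 = "does not". B = "test positive".
pr_A1 = 0.01              # hypothetical prevalence
pr_A2 = 1 - pr_A1
pr_B_given_A1 = 0.95      # hypothetical sensitivity
pr_B_given_A2 = 0.05      # hypothetical false-positive rate

pr_B = pr_A1 * pr_B_given_A1 + pr_A2 * pr_B_given_A2  # denominator
pr_A1_given_B = pr_A1 * pr_B_given_A1 / pr_B          # Bayes theorem
print(round(pr_A1_given_B, 3))                        # 0.161
```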
Markov chains
A Markov chain is a special type of stochastic process. At any given time n, when the current state $X_n$ and all previous states $X_1, \ldots, X_{n-1}$ of the process are known, the probabilities of all future states $X_j$ ($j > n$) depend only on the current state $X_n$ and do not depend on the earlier states $X_1, \ldots, X_{n-1}$. Formally,
$$\Pr(X_{n+1} = s_j \mid X_1 = s_{i_1}, \ldots, X_n = s_{i_n}) = \Pr(X_{n+1} = s_j \mid X_n = s_{i_n}).$$
The conditional probability $\Pr(X_{n+1} = s_j \mid X_n = s_i)$ that the Markov chain will be in state $s_j$ at time n+1 if it is in state $s_i$ at time n is called the transition probability. If for a given Markov chain this transition probability has the same value for every n, then it is said that the Markov chain has stationary transition probabilities. In other words, a Markov chain has stationary transition probabilities if, for any states $s_i$ and $s_j$, there is a transition probability $p_{i,j}$ such that
$$\Pr(X_{n+1} = s_j \mid X_n = s_i) = p_{i,j} \quad \text{for all } n.$$
Consider a finite Markov chain with k possible states $s_1, \ldots, s_k$ and stationary transition probabilities. For $i = 1, \ldots, k$ and $j = 1, \ldots, k$, we shall let $p_{i,j}$ denote the conditional probability that the process will be in state $j$ given it was in state $i$ previously. The transition matrix of the Markov chain is defined to be the $k \times k$ matrix $P$ with elements $p_{i,j}$. Thus
$$P = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,k} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ p_{k,1} & p_{k,2} & \cdots & p_{k,k} \end{pmatrix};$$
note that each row of P sums to 1; that is, the probability of moving from a given state to some state, including remaining in the current state, is 1.
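Because the n-step transition probabilities are given by powers of P, the state distribution after n steps is the initial distribution times $P^n$. A minimal numpy sketch (the two-state matrix is arbitrary):

```python
import numpy as np

# Illustrative two-state transition matrix; each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

x0 = np.array([1.0, 0.0])                # start in state 1 with certainty
x3 = x0 @ np.linalg.matrix_power(P, 3)   # distribution after 3 steps
print(x3)                                # [0.844 0.156]
```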
Normal distribution
A random variable X has a normal distribution with mean $\mu$ and variance $\sigma^2$ if X has a continuous distribution for which the p.d.f. is as follows:
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty.$$
This is also called a Gaussian distribution. If X has a normal distribution with mean $\mu$ and variance $\sigma^2$ and if $Y = aX + b$, where a and b are given constants and $a \neq 0$, then Y has a normal distribution with mean $a\mu + b$ and variance $a^2\sigma^2$.
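Python's statistics.NormalDist (3.8+) supports this linear transformation directly; a quick check with arbitrary parameter values:

```python
from statistics import NormalDist

X = NormalDist(mu=2.0, sigma=3.0)  # mean 2, variance 9
Y = 5 * X + 1                      # Y = aX + b with a = 5, b = 1
print(Y.mean, Y.variance)          # 11.0 225.0, i.e. a*mu + b and a**2 * sigma**2
```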
The normal distribution with mean 0 and variance 1 is called the standard normal distribution. The p.d.f. of the standard normal distribution is usually denoted by the symbol $\varphi$ and the d.f. is denoted by the symbol $\Phi$. This gives the density function
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}.$$
This distribution is used to generate standard tables, such as $\Phi^{-1}(0.95) = 1.645$, $\Phi^{-1}(0.99) = 2.33$, $\Phi^{-1}(0.995) = 2.58$.
Note that given the symmetry of this distribution around zero (since $\varphi(-z) = \varphi(z)$), we have $\Phi(z) = 1 - \Phi(-z)$.
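These table values can be reproduced with statistics.NormalDist:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sigma 1
for p in (0.95, 0.99, 0.995):
    print(p, round(Z.inv_cdf(p), 3))
# 0.95 1.645
# 0.99 2.326
# 0.995 2.576
```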
To transform a normal variable into a standard normal variable, we need only apply
$$z = \frac{x - \mu}{\sigma},$$
where by convention z denotes a value pulled from a standard normal distribution.
The sample mean of a random sample from a normal distribution has the following property:
$$\bar{X}_n \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right).$$
This implies that the mean will be measured more precisely the larger the sample, since its variance is proportional to 1/n.
The Central Limit Theorem (Lindeberg and Lévy) for the sample mean states that the sample mean from any distribution with finite variance will be asymptotically normally distributed:
$$\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^{2}) \quad \text{as } n \to \infty.$$
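A small simulation sketch of the CLT (sample size, replication count, and seed are arbitrary), starting from decidedly non-normal uniform draws:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 10_000
# Uniform(0, 1) has mean 1/2 and variance 1/12.
means = rng.random((reps, n)).mean(axis=1)

# Standardized sample means should be approximately N(0, 1).
z = np.sqrt(n) * (means - 0.5) / np.sqrt(1 / 12)
print(z.mean(), z.std())  # close to 0 and 1, respectively
```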
The above gives rise to the Law of Large Numbers:
Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the mean is $\mu$, and let $\bar{X}_n$ denote the sample mean. Then
$$\bar{X}_n \xrightarrow{p} \mu \quad \text{as } n \to \infty.$$
That is, the sample mean converges to the population mean as the sample grows.
Note that a different reading of the Central Limit Theorem is that, regardless of the distribution of the underlying variable, the (suitably normalized) sum will be approximately normally distributed. This is why so many quantities can be analyzed with a normal distribution.
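A running-mean sketch of the Law of Large Numbers (die rolls; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)  # fair die, population mean 3.5
running_mean = rolls.cumsum() / np.arange(1, rolls.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])         # drifts toward 3.5 as n grows
```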
Covariance
Assume X and Y have means $\mu_1$ and $\mu_2$, respectively, as well as variances $\sigma_1^{2}$ and $\sigma_2^{2}$, respectively. The covariance of X and Y is given by
$$\operatorname{Cov}(X, Y) = E\left[(X - \mu_1)(Y - \mu_2)\right].$$
The correlation coefficient is given by
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_1 \sigma_2}.$$
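numpy computes both quantities directly; a small sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

print(np.cov(x, y)[0, 1])       # sample covariance of x and y
print(np.corrcoef(x, y)[0, 1])  # correlation coefficient
```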
The Gamma Distribution
X has a gamma distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$ and $\beta > 0$) if X has a continuous distribution for which the p.d.f. is specified as follows:
$$f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \qquad x > 0,$$
where the gamma function $\Gamma(\alpha)$ is defined as
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt.$$
For a gamma distribution the mean and variance are:
$$E(X) = \frac{\alpha}{\beta} \quad \text{and} \quad \operatorname{Var}(X) = \frac{\alpha}{\beta^{2}}.$$
Sometimes (e.g., Hogg and Craig, p. 104) the parameter $\beta$ is written as $\beta^{-1}$, i.e., as a scale rather than a rate parameter, so that
$$E(X) = \alpha\beta \quad \text{and} \quad \operatorname{Var}(X) = \alpha\beta^{2}.$$
Thus the meaning of the parameter $\beta$ of the gamma distribution is ambiguous without a full specification of the density function.
The exponential distribution is the special case where $\alpha = 1$, while the chi-squared distribution with r degrees of freedom is the special case where $\beta = 1/2$ and $\alpha = r/2$ (using the former p.d.f.). The chi-squared distribution arises as a sum of squared standard normal variables, and thus is often used in test statistics. It is often noted as $\chi^{2}(r)$.
The exponential distribution is often used in practical problems to represent the distribution of the time that elapses before the occurrence of some event; for example, the time a machine operates without breaking down.
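A quick numpy check of the scale-convention moments (shape and scale values arbitrary; note that numpy's gamma sampler takes a shape $\alpha$ and a scale $\theta = 1/\beta$ in the rate convention):

```python
import numpy as np

alpha, scale = 3.0, 2.0  # shape and scale, so mean = 6, variance = 12
rng = np.random.default_rng(2)
x = rng.gamma(alpha, scale, size=1_000_000)

print(x.mean(), alpha * scale)     # ~6.0 vs exact 6.0
print(x.var(), alpha * scale**2)   # ~12.0 vs exact 12.0
```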
Beta Distribution
X has a beta distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$ and $\beta > 0$) if X has a continuous distribution for which the p.d.f. is as follows:
$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad 0 < x < 1.$$
The mean and variance of the beta distribution are as follows:
$$E(X) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}.$$
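And a matching numerical check for the beta moments (parameter values arbitrary):

```python
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(3)
x = rng.beta(a, b, size=1_000_000)

print(x.mean(), a / (a + b))                        # ~0.2857
print(x.var(), a * b / ((a + b)**2 * (a + b + 1)))  # ~0.0255
```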