Probability and Statistics Fact Sheet

Introduction

A sample space containing n outcomes is called a simple sample space if the probability assigned to each of the outcomes is 1/n. If event A contains m outcomes (n ≥ m), then Pr(A) = m/n.

Example: assume a deck of 52 cards. What is the probability of getting a spade?

Answer: n = 52, m = 13, so Pr(spade) = 13/52 = 1/4.

This simple fact underlies the main interpretation of what a probability statement is, often called the relative frequency approach. It says that a statement such as Pr(A) = 3/4 is an estimate that three-quarters of the outcomes in the sample space are A’s; alternatively, given a sample space of N outcomes in which we can count f A’s, the ratio f/N is the relative frequency of A, and its probability is estimated as f/N. One can examine other interpretations, but this one appeals most readily to intuition.
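The relative frequency approach can be sketched in a few lines of Python; the deck encoding and the seed are assumptions for illustration only.

```python
import random

# Relative-frequency sketch (illustrative): estimate Pr(spade) by drawing
# repeatedly, with replacement, from a simulated 52-card deck in which
# cards 0-12 are the spades.
random.seed(42)  # fixed seed so the run is reproducible

N = 100_000
spades = sum(1 for _ in range(N) if random.randrange(52) < 13)
rel_freq = spades / N
print(rel_freq)  # close to 13/52 = 0.25
```

As N grows, the relative frequency settles near the assigned probability 1/4.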

Combinatorics

Given n objects, there are n! ways of arranging them.

Example: how many ways are there of arranging 6 books on a shelf?

Answer: 6!=720
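The factorial count can be checked directly with the standard library:

```python
import math

# 6! ways of arranging 6 books on a shelf
arrangements = math.factorial(6)
print(arrangements)  # 720
```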

Permutations

Given n distinct objects, if you choose k objects (i.e., sampling without replacement), there are P(n,k) = n!/(n−k)! ordered arrangements.

Example: given 10 applicants and two distinct positions, assistant and president, there are P(10,2) = 10!/8! = 90 ways of filling the positions.

In contrast, with replacement the problem looks like this: given n distinct objects, sampling with replacement, and k selections, there are n^k ways of choosing k objects.
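A minimal sketch contrasting the two counts (math.perm requires Python 3.8+):

```python
import math

# Without replacement: P(n, k) = n!/(n - k)!
# 10 applicants, 2 distinct positions (assistant and president)
ordered = math.perm(10, 2)
print(ordered)  # 90

# With replacement: n**k ways of making k selections from n objects
with_replacement = 10 ** 2
print(with_replacement)  # 100
```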

Example: Given 365 days in a years, what is the lowest number of people such that the probability of sharing a birthday is greater than .5?

Answer: The number of ways in which k people can all have different birthdays is P(365,k). The total number of ways that k people can have birthdays is 365^k. The probability that at least two of the people will have the same birthday is therefore

Pr(at least one shared birthday) = 1 − P(365,k)/365^k.

It can be shown that the smallest integer k for which this probability exceeds 0.5 is k = 23.
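The birthday calculation above can be reproduced directly:

```python
import math

def prob_shared_birthday(k):
    # Pr(at least two of k people share a birthday)
    # = 1 - P(365, k) / 365**k
    return 1 - math.perm(365, k) / 365 ** k

# Smallest k for which the probability exceeds 0.5
k = 1
while prob_shared_birthday(k) <= 0.5:
    k += 1
print(k)  # 23
```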

Combinations

Assume we are uninterested in the order of the grouping of k items from a sample of n. The number of ways of choosing k people from a group of n is C(n,k) = n!/(k!(n−k)!). This is often written as the binomial coefficient "n choose k".
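The relation between ordered and unordered counts can be verified with the standard library:

```python
import math

# C(10, 2): unordered pairs chosen from 10 people
print(math.comb(10, 2))  # 45

# Compare with the ordered count P(10, 2) = 90: each pair is counted
# k! = 2 times when order matters, so C(n, k) = P(n, k) / k!
assert math.comb(10, 2) == math.perm(10, 2) // math.factorial(2)
```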

Independence

If A and B are independent, then Pr(A∩B) = Pr(A)Pr(B) (writing A∩B for the joint occurrence of A and B). Note that for more than two events, pairwise independence of every pair does not imply mutual independence of the whole collection.

In general the probability of the union of two events comes from the equation:

Pr(A∪B) = Pr(A) + Pr(B) − Pr(A∩B)

alternatively,

Pr(A∩B) = Pr(A) + Pr(B) − Pr(A∪B)

Conditional probability

The conditional probability of A given that B has occurred (with Pr(B) > 0) is

Pr(A|B) = Pr(A∩B)/Pr(B)

Bayes’ Theorem

Let events A1, …, Ak form a partition of the sample space such that Pr(Aj) > 0 for all j, and Pr(B) > 0. Then

Pr(Ai|B) = Pr(Ai)Pr(B|Ai) / Σj Pr(Aj)Pr(B|Aj)

or, for a single event A,

Pr(A|B) = Pr(A)Pr(B|A)/Pr(B)
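Bayes’ theorem can be worked through numerically. The probabilities below (a rare condition A and an imperfect test B) are illustrative assumptions, not values from the text:

```python
# Bayes' theorem with a two-event partition (A and not-A).
# Assumed illustrative inputs:
# Pr(A) = 0.01, Pr(B|A) = 0.99, Pr(B|not A) = 0.05.
pr_a = 0.01
pr_b_given_a = 0.99
pr_b_given_not_a = 0.05

# Denominator: total probability Pr(B) = sum_j Pr(Aj) Pr(B|Aj)
pr_b = pr_a * pr_b_given_a + (1 - pr_a) * pr_b_given_not_a
pr_a_given_b = pr_a * pr_b_given_a / pr_b
print(round(pr_a_given_b, 4))  # 0.1667
```

Even a fairly accurate test leaves Pr(A|B) small here, because the partition weights Pr(A) and Pr(not A) are so lopsided.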

Markov chains

A Markov chain is a special type of stochastic process. At any given time n, when the current state Xn and all previous states X1, …, Xn−1 of the process are known, the probabilities of all future states Xj (j > n) depend only on the current state Xn and do not depend on the earlier states X1, …, Xn−1. Formally,

Pr(Xn+1 = xn+1 | X1 = x1, …, Xn = xn) = Pr(Xn+1 = xn+1 | Xn = xn)

The conditional probability Pr(Xn+1 = sj | Xn = si) that the Markov chain will be in state sj at time n+1 if it is in state si at time n is called a transition probability. If for a given Markov chain this transition probability has the same value for every n, then the Markov chain is said to have stationary transition probabilities. In other words, a Markov chain has stationary transition probabilities if, for any states si and sj, there is a transition probability pij such that

Pr(Xn+1 = sj | Xn = si) = pij for n = 1, 2, …

Consider a finite Markov chain with k possible states s1, …, sk and stationary transition probabilities. For i = 1, …, k and j = 1, …, k we shall let pij denote the conditional probability that the process will be in state sj given that it was in state si previously. The transition matrix of the Markov chain is defined to be the k×k matrix P with elements pij. Thus

P = [ p11  p12  …  p1k ]
    [ p21  p22  …  p2k ]
    [  …    …        …  ]
    [ pk1  pk2  …  pkk ]

Note that each row sums to 1. That is, the probability of moving from a given state to some state, including back to the initial state, is 1.
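A small transition matrix can be represented and applied directly; the matrix entries below are illustrative assumptions.

```python
# A two-state Markov chain with stationary transition probabilities.
# Row i holds the probabilities of moving from state i to each state,
# so each row sums to 1.
P = [[0.9, 0.1],
     [0.4, 0.6]]
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

def step(dist, P):
    # One transition: new_dist[j] = sum_i dist[i] * p_ij
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]

dist = [1.0, 0.0]   # start in state 0 with certainty
dist = step(dist, P)
print(dist)         # [0.9, 0.1]
```

Repeated calls to `step` propagate the state distribution forward one period at a time.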

Normal distribution

A random variable X has a normal distribution with mean μ and variance σ² if X has a continuous distribution for which the p.d.f. is as follows:

f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),  −∞ < x < ∞

This is also called a Gaussian distribution. If X has a normal distribution with mean μ and variance σ² and if Y = aX + b, where a and b are given constants and a ≠ 0, then Y has a normal distribution with mean aμ + b and variance a²σ².

The normal distribution with mean 0 and variance 1 is called the standard normal distribution. The p.d.f. of the standard normal distribution is usually denoted by the symbol φ and the d.f. by the symbol Φ. This gives the density function

φ(x) = (1/√(2π)) exp(−x²/2).

This distribution is used to generate standard tables, such as Φ⁻¹(0.95) = 1.645, Φ⁻¹(0.99) = 2.33, Φ⁻¹(0.995) = 2.58.

Note that given the symmetry of this distribution around zero (since φ(x) = φ(−x)), we have Φ(z) = 1 − Φ(−z).

To transform a variable into a standard normal variable, we need only apply z = (x − μ)/σ; by convention, z denotes a value drawn from a standard normal distribution.
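The standardization, and an evaluation of Φ via the error function, can be sketched as follows; the values of μ, σ, and x are illustrative assumptions.

```python
import math

# Standardizing: z = (x - mu) / sigma
mu, sigma = 100.0, 15.0
x = 130.0
z = (x - mu) / sigma
print(z)  # 2.0

# The standard normal d.f. Phi(z) via the error function:
# Phi(z) = (1 + erf(z / sqrt(2))) / 2
phi_z = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(round(phi_z, 4))  # 0.9772
```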

The sample mean of a normal distribution has the following property: X̄n ~ N(μ, σ²/n). This implies that the mean will be measured more precisely the larger the sample, since the variance of the sample mean is proportional to 1/n.

 

The Central Limit Theorem (Lindeberg and Lévy) for the sample mean states that a sample mean from any distribution with finite variance will be asymptotically normally distributed, such that

√n (X̄n − μ)/σ → N(0, 1) in distribution as n → ∞.

The above gives rise to the Law of Large Numbers:

Suppose that X1, …, Xn form a random sample from a distribution for which the mean is μ, and let X̄n denote the sample mean. Then

X̄n → μ in probability as n → ∞.

That is, the sample mean converges to the population mean as the sample grows.
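The convergence can be watched directly in a simulation; the uniform(0, 1) distribution (population mean 0.5) and the seed are illustrative assumptions.

```python
import random

# Law-of-large-numbers sketch: sample means of uniform(0, 1) draws
# approach the population mean 0.5 as n grows.
random.seed(1)
for n in (10, 1_000, 100_000):
    xs = [random.random() for _ in range(n)]
    print(n, sum(xs) / n)
```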

Note that a different way of interpreting this result is that, regardless of the distribution of the underlying variable, the (suitably scaled) sum will be approximately normally distributed. This is why many quantities can be analyzed with a normal distribution.

Covariance

Assume X and Y have means μ1 and μ2, respectively, as well as variances σ1² and σ2², respectively. The covariance of X and Y is given by

Cov(X, Y) = E[(X − μ1)(Y − μ2)]

The correlation coefficient is given by

ρ(X, Y) = Cov(X, Y)/(σ1 σ2)
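Both quantities can be computed straight from their definitions on small illustrative data (here ys = 2·xs, so the correlation is 1):

```python
# Population covariance and correlation computed from their definitions.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
rho = cov / (sx * sy)
print(cov)  # 4.0; rho is 1.0 up to floating-point rounding
```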

The Gamma Distribution

X has a gamma distribution with parameters α and β (α > 0 and β > 0) if X has a continuous distribution for which the p.d.f. is specified as follows:

f(x) = (β^α / Γ(α)) x^(α−1) e^(−βx)  for x > 0

where the gamma function Γ(α) is defined as

Γ(α) = ∫0∞ x^(α−1) e^(−x) dx

For a gamma distribution (in this parameterization) the mean and variance are

E(X) = α/β

and

Var(X) = α/β²

Sometimes (e.g., Hogg and Craig, p. 104) the parameter β is replaced by β⁻¹, i.e., β is treated as a scale parameter, so that

E(X) = αβ

and

Var(X) = αβ²

Thus it is ambiguous what the parameter β of the gamma distribution implies without a full specification of the density function.

The exponential distribution is the special case where α = 1, while the chi-squared distribution is the special case where β = 1/2 and α = r/2 (using the former, rate-based p.d.f.). The chi-squared distribution arises as the sum of squares of independent standard normal (Gaussian) variables, and thus is often used in test statistics. It is often denoted χ²(r).

This distribution is often used in practical problems to represent the distribution of the time that elapses before the occurrence of some event, for example, the time a machine operates without breaking down.
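The gamma moments in the rate parameterization above, and the exponential special case, can be sketched as follows; the parameter values α = 3, β = 2 are illustrative assumptions.

```python
import math

# Gamma-distribution moments in the rate parameterization
# (mean = alpha/beta, variance = alpha/beta**2).
alpha, beta = 3.0, 2.0
mean = alpha / beta        # 1.5
var = alpha / beta ** 2    # 0.75

# The gamma function generalizes the factorial: Gamma(n) = (n - 1)!
assert math.gamma(5) == math.factorial(4)   # both equal 24

# alpha = 1 gives the exponential distribution with rate beta;
# its p.d.f. at x >= 0 is beta * exp(-beta * x)
def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)

print(mean, var, exp_pdf(0.0, 2.0))  # 1.5 0.75 2.0
```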

Beta Distribution

X has a beta distribution with parameters α and β (α > 0 and β > 0) if X has a continuous distribution for which the p.d.f. is as follows:

f(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1)  for 0 < x < 1

The mean and variance of the beta distribution are as follows:

E(X) = α/(α + β)

and

Var(X) = αβ/((α + β)²(α + β + 1))
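The beta moments follow directly from the parameters; α = 2, β = 3 below are illustrative assumptions.

```python
# Beta-distribution mean and variance from alpha and beta.
alpha, beta = 2.0, 3.0
mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
print(mean, var)  # 0.4 0.04
```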