Application to Binomial Probability Distribution

Let us now apply what we have just learned about the mean, variance, and standard deviation of a general probability distribution to the specific case of the binomial probability distribution. Recall, from Section 5.1.2, that if a simple system has just two possible outcomes, with respective probabilities $p$ and $q$, then the probability of obtaining $n$ occurrences of the first outcome in $N$ observations is

$\displaystyle P_N(n) = \frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}.$

(5.27)
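As a quick numerical illustration of Equation (5.27), the following Python sketch evaluates $P_N(n)$ directly and checks that the probabilities sum to unity. The function name `binomial_probability` and the values $N=10$, $p=0.3$ are illustrative assumptions, not taken from the text.

```python
from math import comb

def binomial_probability(n, N, p):
    """Probability of n occurrences in N observations, Equation (5.27)."""
    q = 1.0 - p  # assumes p + q = 1
    return comb(N, n) * p**n * q**(N - n)

# Example values, chosen only for illustration.
N, p = 10, 0.3
probabilities = [binomial_probability(n, N, p) for n in range(N + 1)]

# The sum over all n is the binomial expansion of (p + q)^N,
# so it should equal 1 up to rounding error.
total = sum(probabilities)
print(total)
```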

Thus, making use of Equation (5.21), the mean number of occurrences of the first outcome in $N$ observations is given by

$\displaystyle \langle n\rangle= \sum_{n=0,N} P_N(n)\,n = \sum_{n=0,N} \frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\, n.$

(5.28)

We can see that if the final factor, $n$, were absent on the right-hand side of the previous expression then it would just reduce to the binomial expansion, which we know how to sum. [See Equation (5.16).] We can take advantage of this fact using a rather elegant mathematical sleight of hand. Observe that because

$\displaystyle n\,p^{n} \equiv p\,\frac{\partial}{\partial p}\,p^{n},$

(5.29)

the previous summation can be rewritten, with $q$ held fixed during the differentiation, as

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{\,N-n}\, n \equiv p\,\frac{\partial}{\partial p}\!\left[\sum_{n=0,N} \frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n} \right].$

(5.30)

The term in square brackets is now the familiar binomial expansion, and can be written more succinctly as $(p+q)^{N}$ . Thus,

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{\,N-n}\, n =p\,\frac{\partial}{\partial p} \,(p+q)^{N}= p\,N\,(p+q)^{N-1}.$

(5.31)

However, $p+q=1$ for the case in hand [see Equation (5.9)], so

$\displaystyle \langle n\rangle = N\,p.$

(5.32)
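The result $\langle n\rangle = N\,p$ can be checked numerically. The sketch below compares the direct sum in Equation (5.28) with the derivative form in Equation (5.31), where the partial derivative at fixed $q$ is approximated by a central finite difference. The function names, the step size `h`, and the values $N=20$, $p=0.25$ are assumptions made for illustration only.

```python
from math import comb

def mean_by_direct_sum(N, p, q):
    """Right-hand side of Equation (5.28): sum of n * P_N(n)."""
    return sum(n * comb(N, n) * p**n * q**(N - n) for n in range(N + 1))

def mean_by_derivative(N, p, q, h=1e-6):
    """p * d/dp of (p + q)^N with q held fixed, as in Equation (5.31).

    The derivative is approximated by a central finite difference,
    so the result is accurate only to roughly the size of h.
    """
    derivative = ((p + h + q)**N - (p - h + q)**N) / (2 * h)
    return p * derivative

# Example values, chosen only for illustration.
N, p = 20, 0.25
q = 1.0 - p

print(mean_by_direct_sum(N, p, q))   # direct summation
print(mean_by_derivative(N, p, q))   # derivative trick
print(N * p)                         # Equation (5.32)
```

All three printed values should agree to within the finite-difference error.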

In fact, we could have guessed the previous result. By definition, the probability, $p$, is the number of occurrences of the first outcome divided by the number of observations, in the limit as the number of observations goes to infinity:

$\displaystyle p= \lim_{N\rightarrow\infty}\frac{n}{N}.$

(5.33)

[See Equation (5.1).] If we think carefully, however, we can appreciate that taking the limit as the number of observations goes to infinity is equivalent to taking the mean value, so that

$\displaystyle p = \left\langle\frac{n}{N}\right\rangle= \frac{\langle n\rangle}{N}.$

(5.34)

But this is just a simple rearrangement of Equation (5.32).
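The convergence of the observed fraction $n/N$ toward $p$ can be illustrated with a small simulation. This is only a sketch: the function name `observed_fraction`, the fixed random seed, and the value $p=0.3$ are assumptions introduced here, and the pseudo-random generator stands in for a real sequence of observations.

```python
import random

def observed_fraction(N, p, seed=0):
    """Simulate N two-outcome observations and return n / N.

    Each observation yields the first outcome with probability p.
    The seed is fixed only so that the run is reproducible.
    """
    rng = random.Random(seed)
    n = sum(1 for _ in range(N) if rng.random() < p)
    return n / N

p = 0.3  # illustrative value
for N in (100, 10_000, 100_000):
    print(N, observed_fraction(N, p))
```

As $N$ grows, the printed fractions should settle ever closer to $p$.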

Let us now calculate the variance of $n$. Recall, from Equation (5.25), that

$\displaystyle \left\langle({\mit\Delta} n)^{2}\right\rangle= \left\langle n^{2}\right\rangle- \left\langle n\right\rangle^2.$

(5.35)

We already know $\langle n\rangle$ , so we just need to calculate $\left\langle n^2\right\rangle$ . This average is written

$\displaystyle \left\langle n^2\right\rangle =\sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\, q^{N-n}\,n^{2}.$

(5.36)

The sum can be evaluated using a simple extension of the mathematical trick that we used previously to evaluate $\langle n\rangle$ . Because

$\displaystyle n^2 \,p^{\,n} \equiv \left(p\,\frac{\partial}{\partial p}\right)^{2} p^{n},$

(5.37)

we can write

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\,n^2$	$\displaystyle \equiv \left(p\,\frac{\partial}{\partial p}\right)^2\sum_{n=0,N} \frac{N!}{n!\,(N-n)!}\,p^{n}q^{N-n}$
	$\displaystyle = \left(p\,\frac{\partial}{\partial p}\right)^2(p+q)^{N}$
	$\displaystyle =\left(p\,\frac{\partial}{\partial p}\right)\left[p\,N\, (p+q)^{N-1}\right]$
	$\displaystyle = p\left[N\,(p+q)^{N-1}+p\,N\,(N-1)\,(p+q)^{N-2}\right].$	(5.38)

Using $p+q=1$, we obtain

$\displaystyle \left\langle n^2\right\rangle$	$\displaystyle = p\left[N+p\,N\,(N-1)\right]= N\,p\left(1+p\,N-p\right)$
	$\displaystyle = (N\,p)^{2} + N\,p\,q = \langle n\rangle^2 + N\,p\,q,$	(5.39)

because $1-p=q$ and $\langle n\rangle= N\,p$. [See Equation (5.32).] It follows that the variance of $n$ is given by

$\displaystyle \left\langle ({\mit\Delta} n)^2\right\rangle= \left\langle n^2\right\rangle -\left\langle n\right\rangle^2 = N\,p\,q.$

(5.40)
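Equation (5.40) can be verified by summing the distribution directly. The following sketch computes $\langle n\rangle$ and $\langle n^2\rangle$ from Equations (5.28) and (5.36) and compares their combination with $N\,p\,q$. The function name `binomial_moments` and the values $N=12$, $p=0.4$ are illustrative assumptions.

```python
from math import comb

def binomial_moments(N, p):
    """Return (<n>, <n^2>) by direct summation over the binomial distribution."""
    q = 1.0 - p  # assumes p + q = 1
    probs = [comb(N, n) * p**n * q**(N - n) for n in range(N + 1)]
    mean = sum(n * P for n, P in enumerate(probs))
    mean_sq = sum(n**2 * P for n, P in enumerate(probs))
    return mean, mean_sq

# Example values, chosen only for illustration.
N, p = 12, 0.4
mean, mean_sq = binomial_moments(N, p)
variance = mean_sq - mean**2   # Equation (5.35)

print(variance)                # direct summation
print(N * p * (1 - p))         # Equation (5.40)
```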

The standard deviation of $n$ is the square root of the variance [see Equation (5.26)], so that

$\displaystyle \sigma_n= \sqrt{N\,p\,q}.$

(5.41)
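As a concrete illustration (the numbers are chosen here for convenience and do not come from the text), consider $N=100$ observations with $p=q=1/2$, as for tosses of a fair coin. Equations (5.32) and (5.41) then give

$\displaystyle \langle n\rangle = 100\times\frac{1}{2} = 50, \qquad \sigma_n = \sqrt{100\times\frac{1}{2}\times\frac{1}{2}} = 5.$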

Now, the standard deviation is essentially the width of the range of probable values over which $n$ is distributed around its mean value, $\langle n\rangle$. The relative width of the distribution is characterized by

$\displaystyle \frac{\sigma_{n}}{\langle n\rangle }= \frac{\sqrt{N \,p\,q}}{N\,p} = \sqrt{\frac{q}{p}}\frac{1}{\sqrt{N}}.$

(5.42)

It is clear, from the previous formula, that the relative width decreases with increasing $N$ like $N^{-1/2}$. So, the greater the number of observations, the more likely it is that an observation of $n$ will yield a result that is relatively close to the mean value, $\langle n\rangle$.
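The $N^{-1/2}$ scaling in Equation (5.42) is easy to see numerically. In the sketch below, the function name `relative_width` and the choice $p=0.5$ are assumptions made for illustration; increasing $N$ by a factor of 100 should reduce the relative width by a factor of 10.

```python
from math import sqrt

def relative_width(N, p):
    """Ratio sigma_n / <n> from Equation (5.42)."""
    q = 1.0 - p  # assumes p + q = 1
    return sqrt(N * p * q) / (N * p)

p = 0.5  # illustrative value
for N in (100, 10_000, 1_000_000):
    print(N, relative_width(N, p))
```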