Application to Binomial Probability Distribution

Let us now apply what we have just learned about the mean, variance, and standard deviation of a general probability distribution to the specific case of the binomial probability distribution. Recall, from Section 5.1.2, that if a simple system has just two possible outcomes, denoted $P$ and $Q$, with respective probabilities $p$ and $q=1-p$, then the probability of obtaining $n$ occurrences of outcome $P$ in $N$ observations is

$\displaystyle P_N(n) = \frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}.$ (5.27)
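Equation (5.27) is straightforward to evaluate numerically. The following is a minimal Python sketch (the function name `binomial_pmf` and the parameter values are our own, not the text's) that tabulates $P_N(n)$ and confirms that the probabilities sum to unity, since $(p+q)^{N}=1$:

```python
from math import comb

def binomial_pmf(n, N, p):
    """Evaluate P_N(n) = N!/(n! (N-n)!) * p^n * q^(N-n), with q = 1 - p."""
    q = 1.0 - p
    return comb(N, n) * p**n * q**(N - n)

# Example: N = 10 observations with p = 0.3.
N, p = 10, 0.3
probs = [binomial_pmf(n, N, p) for n in range(N + 1)]
print(sum(probs))  # 1.0 (to rounding error), since (p + q)^N = 1
```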

Thus, making use of Equation (5.21), the mean number of occurrences of outcome $P$ in $N$ observations is given by

$\displaystyle \langle n\rangle= \sum_{n=0,N} P_N(n)\,n = \sum_{n=0,N}
\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\, n.$ (5.28)

We can see that if the final factor $n$ were absent on the right-hand side of the previous expression then it would just reduce to the binomial expansion, which we know how to sum. [See Equation (5.16).] We can take advantage of this fact using a rather elegant mathematical sleight of hand. Observe that because

$\displaystyle n\,p^{n} \equiv p\,\frac{\partial}{\partial p}\,p^{n},$ (5.29)

the previous summation can be rewritten as

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\, n
\equiv p\,\frac{\partial}{\partial p}\!\left[\sum_{n=0,N}
\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}
\right].$ (5.30)
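The interchange of the sum and the derivative in Equation (5.30) is permissible because the sum is finite. The underlying identity, Equation (5.29), follows from $p\,\partial(p^{n})/\partial p = p\,n\,p^{n-1} = n\,p^{n}$; as a sanity check, here is a minimal SymPy sketch (ours, not the text's) verifying it symbolically:

```python
import sympy as sp

p, n = sp.symbols('p n', positive=True)

# Check the identity (5.29): p * d/dp (p^n) = n * p^n.
lhs = p * sp.diff(p**n, p)
print(sp.simplify(lhs - n * p**n))  # 0
```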

The term in square brackets is now the familiar binomial expansion, and can be written more succinctly as $(p+q)^{N}$. Thus,

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\, n
=p\,\frac{\partial}{\partial p}\,(p+q)^{N}= p\,N\,(p+q)^{N-1}.$ (5.31)
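Note that Equation (5.31) is a formal identity in $p$ and $q$, valid before the constraint $p+q=1$ is imposed. A quick numerical spot-check (the parameter values below are arbitrary choices of ours):

```python
from math import comb

# Deliberately choose p + q != 1, to test the raw identity (5.31).
N, p, q = 12, 0.3, 0.5

lhs = sum(comb(N, n) * p**n * q**(N - n) * n for n in range(N + 1))
rhs = p * N * (p + q)**(N - 1)
print(lhs, rhs)  # the two values agree to floating-point accuracy
```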

However, $p+q=1$ for the case in hand [see Equation (5.9)], so

$\displaystyle \langle n\rangle = N\,p.$ (5.32)

In fact, we could have guessed the previous result. By definition, the probability, $p$, is the number of occurrences of the outcome $P$ divided by the number of observations, in the limit as the number of observations goes to infinity:

$\displaystyle p= \lim_{N\rightarrow\infty}\,\frac{n}{N}.$ (5.33)

[See Equation (5.1).] If we think carefully, however, we can appreciate that taking the limit as the number of observations goes to infinity is equivalent to taking the mean value, so that

$\displaystyle p = \left\langle\frac{n}{N}\right\rangle= \frac{\langle n\rangle}{N}.$ (5.34)

But, this is just a simple rearrangement of Equation (5.32).
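As a concrete example, in $N=100$ tosses of a fair coin, for which $p=1/2$, Equation (5.32) gives $\langle n\rangle = 100\times 1/2 = 50$ heads, on average.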

Let us now calculate the variance of $n$. Recall, from Equation (5.25), that

$\displaystyle \left\langle({\mit\Delta} n)^{2}\right\rangle= \left\langle n^{2}\right\rangle- \left\langle n\right\rangle^{2}.$ (5.35)

We already know $\langle n\rangle$, so we just need to calculate $\left\langle n^2\right\rangle$. This average is written

$\displaystyle \left\langle n^2\right\rangle =\sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,
q^{N-n}\,n^{2}.$ (5.36)

The sum can be evaluated using a simple extension of the mathematical trick that we used previously to evaluate $\langle n\rangle$. Because

$\displaystyle n^{2}\,p^{n} \equiv \left(p\,\frac{\partial}{\partial p}\right)^{2} p^{n},$ (5.37)

we can write

$\displaystyle \sum_{n=0,N}\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}\,n^{2}$ $\displaystyle \equiv \left(p\,\frac{\partial}{\partial p}\right)^{2}\sum_{n=0,N}
\frac{N!}{n!\,(N-n)!}\,p^{n}\,q^{N-n}$
  $\displaystyle = \left(p\,\frac{\partial}{\partial p}\right)^{2}(p+q)^{N}$
  $\displaystyle = \left(p\,\frac{\partial}{\partial p}\right)\left[p\,N\,(p+q)^{N-1}\right]$
  $\displaystyle = p\left[N\,(p+q)^{N-1}+p\,N\,(N-1)\,(p+q)^{N-2}\right].$ (5.38)
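Both the operator identity (5.37) and the final line of Equation (5.38), which again holds for arbitrary $p$ and $q$, can be verified symbolically; a minimal SymPy sketch:

```python
import sympy as sp

p, q, n, N = sp.symbols('p q n N', positive=True)

def op(f):
    """Apply the operator p d/dp to the expression f."""
    return p * sp.diff(f, p)

# Identity (5.37): applying (p d/dp) twice to p^n gives n^2 * p^n.
print(sp.simplify(op(op(p**n)) - n**2 * p**n))  # 0

# Final line of (5.38): (p d/dp)^2 applied to (p + q)^N.
lhs = op(op((p + q)**N))
rhs = p * (N * (p + q)**(N - 1) + p * N * (N - 1) * (p + q)**(N - 2))
print(sp.simplify(lhs - rhs))  # 0
```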

Using $p+q=1$, we obtain

$\displaystyle \left\langle n^2\right\rangle$ $\displaystyle = p\left[N+p\,N\,(N-1)\right]= N\,p\left(1+p\,N-p\right)$    
  $\displaystyle = (N\,p)^{2} + N\,p\,q = \langle n\rangle^2 + N\,p\,q,$ (5.39)

because $\langle n\rangle= N\,p$. [See Equation (5.32).] It follows that the variance of $n$ is given by

$\displaystyle \left\langle ({\mit\Delta} n)^2\right\rangle= \left\langle n^2\right\rangle -\left\langle n\right\rangle^2 = N\,p\,q.$ (5.40)
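The results $\langle n\rangle = N\,p$ and $\left\langle({\mit\Delta} n)^2\right\rangle = N\,p\,q$ are easily confirmed by direct simulation. A minimal sketch using NumPy's binomial sampler (the sample size and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 0.3

samples = rng.binomial(N, p, size=1_000_000)  # 10^6 independent draws of n
print(samples.mean(), N * p)                  # ~30.0  vs  N p = 30.0
print(samples.var(), N * p * (1 - p))         # ~21.0  vs  N p q = 21.0
```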

The standard deviation of $n$ is the square root of the variance [see Equation (5.26)], so that

$\displaystyle \sigma_n= \sqrt{N\,p\,q}.$ (5.41)
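For the coin-tossing example above ($N=100$, $p=q=1/2$), Equation (5.41) gives $\sigma_n=\sqrt{100\times 1/2\times 1/2}=5$, so a typical observation yields somewhere between roughly 45 and 55 heads.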

Now, the standard deviation is essentially the width of the range of probable values over which $n$ is distributed around its mean value, $\langle n\rangle$. The relative width of the distribution is characterized by

$\displaystyle \frac{\sigma_{n}}{\langle n\rangle }= \frac{\sqrt{N \,p\,q}}{N\,p} =
\sqrt{\frac{q}{p}}\frac{1}{\sqrt{N}}.$ (5.42)

It is clear, from the previous formula, that the relative width decreases with increasing $N$ like $N^{-1/2}$. So, the greater the number of observations, the more likely it is that an observation of $n$ will yield a result that is relatively close to the mean value, $\langle n\rangle$.
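This $N^{-1/2}$ scaling is easily illustrated numerically; a short sketch (the parameter values are ours):

```python
from math import sqrt

p, q = 0.5, 0.5
for N in (100, 10_000, 1_000_000):
    rel_width = sqrt(N * p * q) / (N * p)  # sigma_n / <n> = sqrt(q/p) / sqrt(N)
    print(N, rel_width)                    # 0.1, 0.01, 0.001
```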