
Application to the binomial distribution

Let us now apply what we have just learned about the mean, variance, and standard deviation of a general distribution function to the specific case of the binomial distribution function. Recall that if a simple system has just two possible outcomes, denoted 1 and 2, with respective probabilities $p$ and $q=1-p$, then the probability of obtaining $n_1$ occurrences of outcome 1 in $N$ observations is
\begin{displaymath}
P_N(n_1) = \frac{N!}{n_1 !\,(N-n_1)!} \,p^{n_1}\,q^{N-n_1}.
\end{displaymath} (38)
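For readers who want to experiment, here is a short Python sketch of Eq. (38); the parameter values ($N=10$, $p=0.3$) are illustrative choices of mine, not taken from the text. It also confirms numerically that the probabilities sum to $(p+q)^N=1$, the binomial-expansion fact exploited below.

```python
from math import comb

def binomial_pmf(N, n1, p):
    """P_N(n1) of Eq. (38): the probability of n1 occurrences
    of outcome 1 in N observations."""
    q = 1.0 - p
    return comb(N, n1) * p**n1 * q**(N - n1)

# Illustrative values (assumed): N = 10 observations, p = 0.3.
N, p = 10, 0.3
probs = [binomial_pmf(N, n1, p) for n1 in range(N + 1)]

# By the binomial expansion, the distribution is normalized:
# sum over n1 of P_N(n1) = (p + q)^N = 1.
total = sum(probs)
```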

Thus, the mean number of occurrences of outcome 1 in $N$ observations is given by
\begin{displaymath}
\overline{n_1} = \sum_{n_1=0}^N P_N(n_1)\,n_1 = \sum_{n_1=0}^N
\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1}\, n_1.
\end{displaymath} (39)

This is a rather nasty-looking expression! However, we can see that if the final factor $n_1$ were absent, it would just reduce to the binomial expansion, which we know how to sum. We can take advantage of this fact by using a rather elegant mathematical sleight of hand. Observe that since
\begin{displaymath}
n_1\,p^{n_1} \equiv p\,\frac{\partial}{\partial p}\,p^{n_1},
\end{displaymath} (40)

the summation can be rewritten as
\begin{displaymath}
\sum_{n_1=0}^N\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1}\,n_1 =
p\,\frac{\partial}{\partial p}\left[\sum_{n_1=0}^N
\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1}
\right].
\end{displaymath} (41)

This is just algebra, and has nothing to do with probability theory. The term in square brackets is the familiar binomial expansion, and can be written more succinctly as $(p+q)^N$. Thus,
\begin{displaymath}
\sum_{n_1=0}^N\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1}\,n_1 =
p\,\frac{\partial}{\partial p} \,(p+q)^N\equiv p\,N\,(p+q)^{N-1}.
\end{displaymath} (42)

However, $p+q=1$ for the case in hand, so
\begin{displaymath}
\overline{n_1} = N\,p.
\end{displaymath} (43)

In fact, we could have guessed this result. By definition, the probability $p$ is the number of occurrences of the outcome 1 divided by the number of trials, in the limit as the number of trials goes to infinity:

\begin{displaymath}
p = \lim_{N\rightarrow\infty}\frac{n_1}{N}.
\end{displaymath} (44)

If we think carefully, however, we can see that taking the limit as the number of trials goes to infinity is equivalent to taking the mean value, so that
\begin{displaymath}
p = \overline{\left(\frac{n_1}{N}\right)} = \frac{\overline{n_1}}{N}.
\end{displaymath} (45)

But this is just a simple rearrangement of Eq. (43).
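As a quick numerical sanity check (a Python sketch with illustrative values of my own choosing, not part of the original text): evaluating the defining sum of Eq. (39) directly reproduces $\overline{n_1}=N\,p$, and a simulated frequency $n_1/N$ approaches $p$ when the number of trials is large, in the spirit of Eq. (44).

```python
import random
from math import comb

# Illustrative parameters (assumed): N = 20 observations, p = 0.25.
N, p = 20, 0.25
q = 1.0 - p

# Mean of n_1 computed directly from the defining sum, Eq. (39).
mean = sum(n1 * comb(N, n1) * p**n1 * q**(N - n1) for n1 in range(N + 1))
# The closed form of Eq. (43) gives N * p = 5.0.

# Frequency interpretation of Eq. (44): n_1 / N -> p for many trials.
random.seed(1)  # fixed seed so the estimate is reproducible
trials = 200_000
n1_observed = sum(1 for _ in range(trials) if random.random() < p)
frequency = n1_observed / trials
```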

Let us now calculate the variance of $n_1$. Recall that

\begin{displaymath}
\overline{({\mit\Delta} n_1)^2}= \overline{(n_1)^2} - (\overline{n_1})^2.
\end{displaymath} (46)

We already know $\overline{n_1}$, so we just need to calculate $\overline{(n_1)^2}$. This average is written
\begin{displaymath}
\overline{(n_1)^2}=\sum_{n_1=0}^{N}\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,
q^{N-n_1}\,(n_1)^2.
\end{displaymath} (47)

The sum can be evaluated using a simple extension of the mathematical trick we used earlier to evaluate $\overline{n_1}$. Since
\begin{displaymath}
(n_1)^2 \,p^{n_1} \equiv \left(p\,\frac{\partial}{\partial p}\right)^2 p^{n_1},
\end{displaymath} (48)

then
\begin{eqnarray*}
\sum_{n_1=0}^{N}\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1}\,(n_1)^2
& \equiv & \left(p\,\frac{\partial}{\partial p}\right)^2\sum_{n_1=0}^N
\frac{N!}{n_1!\,(N-n_1)!}\,p^{n_1}\,q^{N-n_1} \\
& \equiv & \left(p\,\frac{\partial}{\partial p}\right)^2(p+q)^N \\
& \equiv & \left(p\,\frac{\partial}{\partial p}\right)\left[p\,N\,(p+q)^{N-1}\right] \\
& \equiv & p\left[N\,(p+q)^{N-1}+p\,N\,(N-1)\,(p+q)^{N-2}\right].
\end{eqnarray*} (49)

Using $p+q=1$ yields
\begin{eqnarray*}
\overline{(n_1)^2} & = & p\left[N+p\,N\,(N-1)\right]= N\,p\left[1+p\,N-p\right] \\
& = & (N\,p)^2 + N\,p\,q = (\overline{n_1})^2 + N\,p\,q,
\end{eqnarray*} (50)

since $\overline{n_1}= N\,p$. It follows that the variance of $n_1$ is given by
\begin{displaymath}
\overline{({\mit\Delta} n_1)^2}= \overline{(n_1)^2}- (\overline{n_1})^2 = N\,p\,q.
\end{displaymath} (51)
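Again, a hedged numerical check in Python (the parameter values are my own illustration): evaluating the sum of Eq. (47) directly and subtracting $(\overline{n_1})^2$ as in Eq. (46) reproduces the variance $N\,p\,q$, and its square root gives the standard deviation of Eq. (52).

```python
from math import comb

# Illustrative parameters (assumed): N = 15, p = 0.6, so N*p*q = 3.6.
N, p = 15, 0.6
q = 1.0 - p

pmf = [comb(N, n1) * p**n1 * q**(N - n1) for n1 in range(N + 1)]

mean = sum(n1 * P for n1, P in enumerate(pmf))        # Eq. (39)
mean_sq = sum(n1**2 * P for n1, P in enumerate(pmf))  # Eq. (47)

variance = mean_sq - mean**2  # Eq. (46); should equal N*p*q = 3.6
std_dev = variance ** 0.5     # Eq. (52): sqrt(N*p*q)
```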

The standard deviation of $n_1$ is just the square root of the variance, so

\begin{displaymath}
{\mit\Delta}^\ast n_1 = \sqrt{N\,p\,q}.
\end{displaymath} (52)

Recall that this quantity is essentially the width of the range over which $n_1$ is distributed around its mean value. The relative width of the distribution is characterized by
\begin{displaymath}
\frac{{\mit\Delta}^\ast n_1}{\overline{n_1}}= \frac{\sqrt{N \,p\,q}}{N\,p} =
\sqrt{\frac{q}{p}}\frac{1}{\sqrt{N}}.
\end{displaymath} (53)

It is clear from this formula that the relative width decreases like $N^{-1/2}$ with increasing $N$. So, the greater the number of trials, the more likely it is that an observation of $n_1$ will yield a result which is relatively close to the mean value $\overline{n_1}$. This is a very important result.
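The $N^{-1/2}$ scaling can be seen at a glance with a short Python sketch of Eq. (53); the choice $p=q=0.5$ (the fair-coin case, so $\sqrt{q/p}=1$) is an assumption of mine for illustration.

```python
from math import sqrt

# Relative width of Eq. (53) for an assumed p = q = 0.5 (fair-coin case).
p, q = 0.5, 0.5

def relative_width(N):
    """Delta* n1 / mean(n1) = sqrt(q/p) / sqrt(N)."""
    return sqrt(q / p) / sqrt(N)

# Quadrupling the number of trials halves the relative width,
# the N^(-1/2) behaviour noted in the text.
w_100 = relative_width(100)
w_400 = relative_width(400)
```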


Richard Fitzpatrick 2006-02-02