Section 8.5.1 of Rice discusses multinomial cell probabilities.
Data consisting of $X_1, X_2, \ldots, X_m$
are counts in cells $1, \ldots, m$ and follow a multinomial distribution
$$f(x_1,\ldots,x_m \mid p_1,\ldots,p_m) = \frac{n!}{\prod_{j=1}^m x_j!} \prod_{j=1}^m p_j^{x_j}$$
where
$(p_1,\ldots,p_m)$ is the vector of cell probabilities with $\sum_{j=1}^m p_j = 1$.
$n = \sum_{j=1}^m x_j$ is the total count.
These data arise from a random sample of single-count Multinomial random variables, which are a generalization of Bernoulli random variables (m distinct outcomes versus 2 distinct outcomes).
Suppose
$W_1, W_2, \ldots, W_n$ are iid $W \sim \text{Multinomial}(1, \text{probs} = (p_1,\ldots,p_m))$ random variables:
The sample space of each $W_i$ is $\mathcal{W} = \{1, 2, \ldots, m\}$, a set of $m$ distinct outcomes.
$P(W_i = k) = p_k$, $k = 1, 2, \ldots, m$.
Define
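$X_k = \sum_{i=1}^n 1(W_i = k)$ (sum of indicators of outcome $k$), $k = 1, \ldots, m$
$X = (X_1, \ldots, X_m)$
Then $X \sim \text{Multinomial}(n, \text{probs} = (p_1,\ldots,p_m))$.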
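The construction of the counts from single-count draws can be sketched in code. This is a minimal illustration, not taken from Rice: the function name `simulate_cell_counts`, the example probabilities, and the seed are all assumptions made for the sketch. It draws $n$ iid outcomes $W_i$ from $\{1, \ldots, m\}$ and sums the indicators to obtain $(X_1, \ldots, X_m)$.

```python
import random


def simulate_cell_counts(probs, n, seed=None):
    """Draw n iid single-count outcomes W_i and tally them into cell counts.

    probs: cell probabilities (p_1, ..., p_m), assumed to sum to 1.
    Returns the list [X_1, ..., X_m] with X_k = number of draws equal to k.
    """
    rng = random.Random(seed)
    m = len(probs)
    # Each W_i takes a value in {1, ..., m} with P(W_i = k) = p_k.
    outcomes = rng.choices(range(1, m + 1), weights=probs, k=n)
    counts = [0] * m
    for w in outcomes:
        counts[w - 1] += 1  # adds the indicator 1(W_i = k) to X_k
    return counts


# Hypothetical example: m = 3 cells, n = 1000 draws.
example_counts = simulate_cell_counts([0.2, 0.3, 0.5], n=1000, seed=1)
print(example_counts, sum(example_counts))
```

The counts always sum to $n$, which is why only $m - 1$ of them are free.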
Log-likelihood:
$$\operatorname{lik}(p_1,\ldots,p_m) = \log[f(x_1,\ldots,x_m \mid p_1,\ldots,p_m)] = \log(n!) - \sum_{j=1}^m \log(x_j!) + \sum_{j=1}^m x_j \log(p_j)$$
The MLE of $(p_1,\ldots,p_m)$ maximizes $\operatorname{lik}(p_1,\ldots,p_m)$ (with $x_1,\ldots,x_m$ fixed!)
Maximum achieved when differential is zero
Constraint: $\sum_{j=1}^m p_j = 1$
Apply method of Lagrange multipliers
Solution: $\hat p_j = x_j / n$, $j = 1, \ldots, m$.
Note: if any $x_j = 0$, then $\hat p_j = 0$, obtained as a limit
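The Lagrange multiplier step can be written out explicitly. The following is a sketch of the standard argument, assuming all $x_j > 0$; the symbol $\lambda$ for the multiplier and $L$ for the Lagrangian are introduced here and do not appear in the notes.

```latex
\begin{aligned}
L(p_1,\ldots,p_m,\lambda)
  &= \sum_{j=1}^m x_j \log(p_j)
     + \lambda\Big(1 - \sum_{j=1}^m p_j\Big)
     + (\text{non-}p\text{ terms}) \\
\frac{\partial L}{\partial p_j}
  &= \frac{x_j}{p_j} - \lambda = 0
     \implies p_j = \frac{x_j}{\lambda}, \quad j = 1,\ldots,m \\
\sum_{j=1}^m p_j = 1
  &\implies \lambda = \sum_{j=1}^m x_j = n
     \implies \hat p_j = \frac{x_j}{n}
\end{aligned}
```

Because $\sum_j x_j \log(p_j)$ is concave in $(p_1,\ldots,p_m)$, this stationary point is the maximum.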
Hardy-Weinberg equilibrium: equilibrium frequencies of genotypes AA, Aa, and aa
$P(a) = \theta$ and $P(A) = 1 - \theta$
Equilibrium probabilities of genotypes: $(1-\theta)^2$, $2\theta(1-\theta)$, and $\theta^2$.
Multinomial Data: $(X_1, X_2, X_3)$ corresponding to counts of AA, Aa, and aa in a sample of size $n$.
See, e.g.
http://www.nature.com/scitable/definition/hardy-weinberg-equation-299
http://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122
Sample Data
Genotype     AA       Aa       aa       Total
Count        $X_1$    $X_2$    $X_3$    $n$
Frequency    342      500      187      1029
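Since $P(aa) = \theta^2$, a simple estimate uses only the aa count:
$\tilde\theta = (\tilde p_3)^{1/2} = (X_3/n)^{1/2} = \sqrt{187/1029} = 0.4263$.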
Solve for MLE
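$$\begin{aligned}
l(\theta) &= \log(f(x_1,x_2,x_3 \mid p_1(\theta),p_2(\theta),p_3(\theta))) \\
&= \log\left(\frac{n!}{x_1!\,x_2!\,x_3!}\, p_1(\theta)^{x_1} p_2(\theta)^{x_2} p_3(\theta)^{x_3}\right) \\
&= x_1\log((1-\theta)^2) + x_2\log(2\theta(1-\theta)) + x_3\log(\theta^2) + (\text{non-}\theta\text{ terms}) \\
&= (2x_1+x_2)\log(1-\theta) + (2x_3+x_2)\log(\theta) + (\text{non-}\theta\text{ terms})
\end{aligned}$$
$$\implies \hat\theta = \frac{2x_3+x_2}{2x_1+2x_2+2x_3} = \frac{2x_3+x_2}{2n} = 0.4247$$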
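The two estimates can be reproduced numerically from the sample data. The sketch below is illustrative only: the function names and the grid search are assumptions added here, not part of the notes. It evaluates the closed-form expressions and, as a sanity check, confirms that a coarse grid search over the $\theta$-dependent part of the log-likelihood lands next to the closed-form MLE.

```python
import math

# Sample data from the table: counts of AA, Aa, aa.
counts = (342, 500, 187)


def theta_tilde(counts):
    """Estimate based only on the aa count: sqrt(X_3 / n)."""
    n = sum(counts)
    return math.sqrt(counts[2] / n)


def theta_hat(counts):
    """Closed-form MLE: (2*x_3 + x_2) / (2*n)."""
    n = sum(counts)
    return (2 * counts[2] + counts[1]) / (2 * n)


def log_lik(theta, counts):
    """Theta-dependent part of the log-likelihood (non-theta terms dropped)."""
    x1, x2, x3 = counts
    return (2 * x1 + x2) * math.log(1 - theta) + (2 * x3 + x2) * math.log(theta)


def grid_argmax(counts, steps=1000):
    """Maximize log_lik over the grid {1/steps, ..., (steps-1)/steps}."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda t: log_lik(t, counts))


print(round(theta_tilde(counts), 4), round(theta_hat(counts), 4))
print(grid_argmax(counts))
```

The grid search is only a check on the algebra; the closed form is what one would use in practice.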
Which estimate is better?
Conduct Parametric Bootstrap Simulation!
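One way such a simulation might look is sketched below. This is an assumption-laden illustration rather than the procedure from Rice: it resamples genotype counts from the fitted Hardy-Weinberg model at the MLE $\hat\theta$, recomputes both estimators on each simulated sample, and compares their bootstrap standard deviations. The number of replicates (`n_boot=1000`), the seed, and all function names are arbitrary choices made for the sketch.

```python
import math
import random
import statistics


def genotype_probs(theta):
    """Hardy-Weinberg probabilities of AA, Aa, aa given P(a) = theta."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]


def theta_tilde(counts):
    """Estimate based only on the aa count: sqrt(X_3 / n)."""
    return math.sqrt(counts[2] / sum(counts))


def theta_hat(counts):
    """MLE: (2*x_3 + x_2) / (2*n)."""
    return (2 * counts[2] + counts[1]) / (2 * sum(counts))


def simulate_counts(theta, n, rng):
    """Draw n genotypes under the model and return counts of AA, Aa, aa."""
    draws = rng.choices(range(3), weights=genotype_probs(theta), k=n)
    return [draws.count(k) for k in range(3)]


def parametric_bootstrap(counts, n_boot=1000, seed=0):
    """Bootstrap standard deviations of both estimators under the fitted model."""
    rng = random.Random(seed)
    n = sum(counts)
    theta0 = theta_hat(counts)  # plug-in parameter used to generate samples
    tilde_reps, hat_reps = [], []
    for _ in range(n_boot):
        boot = simulate_counts(theta0, n, rng)
        tilde_reps.append(theta_tilde(boot))
        hat_reps.append(theta_hat(boot))
    return {
        "sd_tilde": statistics.stdev(tilde_reps),
        "sd_hat": statistics.stdev(hat_reps),
    }


result = parametric_bootstrap((342, 500, 187))
print(result)
```

A smaller bootstrap standard deviation indicates the more precise estimator. Large-sample theory suggests the MLE should come out ahead here, since it uses all three counts and not only the aa count.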