
18.443 File Rproject3_rmd_multinomial_theory.html

Maximum Likelihood Estimates of Multinomial Cell Probabilities

Definition: Multinomial Distribution (generalization of Binomial)

Section 8.5.1 of Rice discusses multinomial cell probabilities.

Data consisting of counts $X_1, X_2, \ldots, X_m$ in cells $1, \ldots, m$ follow a multinomial distribution

$$f(x_1, \ldots, x_m \mid p_1, \ldots, p_m) = \frac{n!}{\prod_{j=1}^m x_j!} \prod_{j=1}^m p_j^{x_j}$$

where

  • $(p_1, \ldots, p_m)$ is the vector of cell probabilities, with $\sum_{i=1}^m p_i = 1$.

  • $n = \sum_{j=1}^m x_j$ is the total count.

These data arise from a random sample of single-count Multinomial random variables, which are a generalization of Bernoulli random variables (m distinct outcomes versus 2 distinct outcomes).

Suppose

  • $W_1, W_2, \ldots, W_n$ are iid $W \sim \text{Multinomial}(1, \text{probs} = (p_1, \ldots, p_m))$ (single-count) random variables:

  • The sample space of each $W_i$ is $\mathcal{W} = \{1, 2, \ldots, m\}$, a set of $m$ distinct outcomes.

  • $P(W_i = k) = p_k$, $k = 1, 2, \ldots, m$.

Define

  • $X_k = \sum_{i=1}^n \mathbf{1}(W_i = k)$ (sum of indicators of outcome $k$), $k = 1, \ldots, m$

  • $\mathbf{X} = (X_1, \ldots, X_m)$
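The construction of the counts from single-count outcomes can be sketched in Python. The sketch draws $n$ outcomes $W_i \in \{1, \ldots, m\}$ with $P(W_i = k) = p_k$ and tallies the indicators $\mathbf{1}(W_i = k)$ to form each $X_k$. It assumes illustrative cell probabilities (0.33, 0.49, 0.18) and an arbitrary seed. The helper `simulate_counts` is a hypothetical name.

```python
import random
from collections import Counter

def simulate_counts(n, probs, rng):
    """Hypothetical helper: draw W_1..W_n iid with P(W_i = k) = probs[k-1],
    then return X_k = sum_i 1(W_i = k) for k = 1..m."""
    outcomes = list(range(1, len(probs) + 1))
    w = rng.choices(outcomes, weights=probs, k=n)  # the single-count draws W_i
    tally = Counter(w)                             # missing keys count as 0
    return [tally[k] for k in outcomes]            # (X_1, ..., X_m)

rng = random.Random(18443)  # arbitrary seed for reproducibility
x = simulate_counts(1029, [0.33, 0.49, 0.18], rng)  # illustrative probabilities
print(x, sum(x))
```

Every $W_i$ lands in exactly one cell, so the counts always sum to $n$.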

Maximum Likelihood Estimation

Log-likelihood function of Multinomial

$$\text{lik}(p_1, \ldots, p_m) = \log\big[f(x_1, \ldots, x_m \mid p_1, \ldots, p_m)\big] = \log(n!) - \sum_{j=1}^m \log(x_j!) + \sum_{j=1}^m x_j \log(p_j)$$
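The log-likelihood above can be evaluated directly in Python. This sketch uses `math.lgamma(k + 1)` for $\log(k!)$ to avoid overflow with large counts. It assumes the usual convention $0 \cdot \log(0) = 0$ for empty cells and returns $-\infty$ when a positive count falls in a cell with $p_j = 0$. The function name `multinomial_loglik` is hypothetical.

```python
import math

def multinomial_loglik(x, p):
    """lik(p) = log(n!) - sum_j log(x_j!) + sum_j x_j log(p_j)."""
    n = sum(x)
    value = math.lgamma(n + 1) - sum(math.lgamma(xj + 1) for xj in x)
    for xj, pj in zip(x, p):
        if xj == 0:
            continue              # 0 * log(p_j) contributes nothing
        if pj == 0:
            return -math.inf      # positive count in an impossible cell
        value += xj * math.log(pj)
    return value

# With m = 2 cells this is the Binomial(10, 0.4) log-pmf at 3 successes.
print(multinomial_loglik([3, 7], [0.4, 0.6]))
```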

Maximum Likelihood Estimate (MLE)

The MLE of $(p_1, \ldots, p_m)$ maximizes $\text{lik}(p_1, \ldots, p_m)$ (with $x_1, \ldots, x_m$ fixed!)

  • The maximum is attained where the differential is zero

  • Constraint: $\sum_{j=1}^m p_j = 1$

  • Apply the method of Lagrange multipliers

  • Solution: $\hat{p}_j = x_j/n$, $j = 1, \ldots, m$.
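The Lagrange-multiplier step can be written out as follows. The terms $\log(n!)$ and $\sum_j \log(x_j!)$ do not involve the $p_j$, so they are dropped. All $x_j > 0$ is assumed here, so that the stationary point lies in the interior of the simplex. Because $\sum_j x_j \log(p_j)$ is concave in $(p_1, \ldots, p_m)$, this stationary point is the maximum.

$$
\begin{aligned}
L(p_1, \ldots, p_m, \lambda) &= \sum_{j=1}^m x_j \log(p_j) + \lambda\Big(\sum_{j=1}^m p_j - 1\Big)\\
\frac{\partial L}{\partial p_j} &= \frac{x_j}{p_j} + \lambda = 0 \quad\Longrightarrow\quad p_j = -\frac{x_j}{\lambda}\\
\sum_{j=1}^m p_j = 1 \quad&\Longrightarrow\quad -\frac{1}{\lambda}\sum_{j=1}^m x_j = 1 \quad\Longrightarrow\quad \lambda = -n\\
\hat{p}_j &= \frac{x_j}{n}, \qquad j = 1, \ldots, m.
\end{aligned}
$$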

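Note: if any $x_j = 0$, then $\hat{p}_j = 0$. This is obtained as a limit, because the maximum then lies on the boundary $p_j = 0$ of the parameter space.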
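A quick numerical check of the closed-form MLE, written as a Python sketch. It uses hypothetical counts (12, 0, 8) that include an empty cell. It compares $\sum_j x_j \log(p_j)$ at $\hat{p} = x/n$ against 1000 random points on the probability simplex, generated as normalized exponentials. The helper names and the convention $0 \cdot \log(0) = 0$ are assumptions of the sketch.

```python
import math
import random

def loglik_kernel(x, p):
    """sum_j x_j log(p_j), skipping empty cells (0 * log 0 = 0)."""
    total = 0.0
    for xj, pj in zip(x, p):
        if xj == 0:
            continue
        if pj == 0:
            return -math.inf
        total += xj * math.log(pj)
    return total

def mle(x):
    """Closed-form MLE p_hat_j = x_j / n."""
    n = sum(x)
    return [xj / n for xj in x]

x = [12, 0, 8]              # hypothetical counts with an empty cell
p_hat = mle(x)              # the empty cell gets probability 0
best = loglik_kernel(x, p_hat)

rng = random.Random(1)      # arbitrary seed
for _ in range(1000):
    e = [rng.expovariate(1.0) for _ in x]
    s = sum(e)
    q = [v / s for v in e]  # uniform random point on the simplex
    assert loglik_kernel(x, q) <= best + 1e-12
```

Any probability placed on the empty cell is wasted, which is why the maximum sits at $\hat{p}_2 = 0$.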

Example 8.5.1.A

Hardy-Weinberg Equilibrium

  • Equilibrium frequency of genotypes: AA, Aa, and aa

  • $P(a) = \theta$ and $P(A) = 1 - \theta$

  • Equilibrium probabilities of genotypes: $(1-\theta)^2$, $2\theta(1-\theta)$, and $\theta^2$.

  • Multinomial Data: $(X_1, X_2, X_3)$ corresponding to counts of AA, Aa, and aa in a sample of size $n$.
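The three genotype probabilities can be computed from $\theta$ with a small Python sketch. The function name `hardy_weinberg_probs` is hypothetical, and $\theta = 0.4$ is only an illustrative value. The probabilities always sum to one because $(1-\theta)^2 + 2\theta(1-\theta) + \theta^2 = ((1-\theta) + \theta)^2 = 1$.

```python
def hardy_weinberg_probs(theta):
    """Cell probabilities (AA, Aa, aa) when P(a) = theta and P(A) = 1 - theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return ((1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2)

probs = hardy_weinberg_probs(0.4)  # illustrative theta
print(probs)       # approximately (0.36, 0.48, 0.16)
print(sum(probs))  # 1 up to floating-point rounding
```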

See, e.g.,

http://www.nature.com/scitable/definition/hardy-weinberg-equation-299

http://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122

Sample Data

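$$
\begin{array}{l|cccc}
\text{Genotype} & AA & Aa & aa & \text{Total}\\
\hline
\text{Count} & X_1 & X_2 & X_3 & n\\
\text{Frequency} & 342 & 500 & 187 & 1029
\end{array}
$$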
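Before imposing the Hardy-Weinberg structure, the general result $\hat{p}_j = x_j/n$ gives unrestricted estimates of the three genotype probabilities. A minimal Python sketch using the counts in the table:

```python
counts = {"AA": 342, "Aa": 500, "aa": 187}       # X1, X2, X3 from the table
n = sum(counts.values())                         # total sample size, 1029
p_hat = {g: c / n for g, c in counts.items()}    # unrestricted MLE x_j / n
for g, p in p_hat.items():
    print(f"{g}: {p:.4f}")
```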

Estimation of θ

  • $X_3 \sim \text{Binomial}(n, p = \theta^2)$

$$\tilde{\theta} = (\tilde{p})^{1/2} = (X_3/n)^{1/2} = \sqrt{187/1029} = 0.4263.$$
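As an arithmetic check, this estimate uses only the aa count from the sample data table:

```python
import math

x3, n = 187, 1029                 # aa count and sample size from the table
theta_tilde = math.sqrt(x3 / n)   # (X3 / n)^(1/2)
print(round(theta_tilde, 4))      # 0.4263
```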

  • $(X_1, X_2, X_3) \sim \text{Multinomial}\big(n, p = ((1-\theta)^2, 2\theta(1-\theta), \theta^2)\big)$

Solve for MLE

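$$
\begin{aligned}
\ell(\theta) &= \log\big(f(x_1, x_2, x_3 \mid p_1(\theta), p_2(\theta), p_3(\theta))\big)\\
&= \log\left(\frac{n!}{x_1!\,x_2!\,x_3!}\, p_1(\theta)^{x_1}\, p_2(\theta)^{x_2}\, p_3(\theta)^{x_3}\right)\\
&= x_1 \log\big((1-\theta)^2\big) + x_2 \log\big(2\theta(1-\theta)\big) + x_3 \log\big(\theta^2\big) + (\text{non-}\theta\text{ terms})\\
&= (2x_1 + x_2)\log(1-\theta) + (2x_3 + x_2)\log(\theta) + (\text{non-}\theta\text{ terms})\\
\ell'(\theta) &= -\frac{2x_1 + x_2}{1-\theta} + \frac{2x_3 + x_2}{\theta} = 0\\
\Longrightarrow\quad \hat{\theta} &= \frac{2x_3 + x_2}{2x_1 + 2x_2 + 2x_3} = \frac{2x_3 + x_2}{2n} = 0.4247
\end{aligned}
$$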
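The closed form can be checked numerically. This Python sketch plugs the sample counts into $\hat{\theta} = (2x_3 + x_2)/(2n)$. It then evaluates the derivative $\ell'(\theta) = -(2x_1 + x_2)/(1-\theta) + (2x_3 + x_2)/\theta$ at that point, which should be zero up to rounding. The name `score` is a hypothetical helper.

```python
x1, x2, x3 = 342, 500, 187          # AA, Aa, aa counts from the table
n = x1 + x2 + x3
theta_hat = (2 * x3 + x2) / (2 * n)  # MLE under Hardy-Weinberg

def score(theta):
    """Derivative of l(theta): -(2x1 + x2)/(1 - theta) + (2x3 + x2)/theta."""
    return -(2 * x1 + x2) / (1 - theta) + (2 * x3 + x2) / theta

print(round(theta_hat, 4))   # 0.4247
print(score(theta_hat))      # zero up to floating-point error
```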

  • Which estimate is better?

  • Conduct Parametric Bootstrap Simulation!
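One way to carry out such a simulation, sketched in Python under explicit assumptions:

  • The MLE $\hat{\theta} = 874/2058 \approx 0.4247$ is treated as the true parameter.
  • $B = 1000$ samples of size $n = 1029$ are drawn from the Hardy-Weinberg multinomial.
  • Both $\tilde{\theta}$ and $\hat{\theta}$ are recomputed on each sample, and their bias, standard deviation, and mean squared error are compared.

The replicate count, the seed, and all function names are illustrative choices, not taken from the original project.

```python
import math
import random
import statistics

def hw_probs(theta):
    """Hardy-Weinberg cell probabilities for (AA, Aa, aa)."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def draw_counts(n, probs, rng):
    """One Multinomial(n, probs) draw, built from n single-count draws."""
    w = rng.choices([0, 1, 2], weights=probs, k=n)
    return [w.count(0), w.count(1), w.count(2)]

def theta_tilde(x):
    """Estimator based on the aa count alone: (X3 / n)^(1/2)."""
    return math.sqrt(x[2] / sum(x))

def theta_hat(x):
    """Multinomial MLE: (2 X3 + X2) / (2n)."""
    return (2 * x[2] + x[1]) / (2 * sum(x))

def bootstrap(theta0, n, B, seed):
    """Return {name: (bias, sd, mse)} for both estimators."""
    rng = random.Random(seed)
    probs = hw_probs(theta0)
    estimates = {"tilde": [], "hat": []}
    for _ in range(B):
        x = draw_counts(n, probs, rng)
        estimates["tilde"].append(theta_tilde(x))
        estimates["hat"].append(theta_hat(x))
    summary = {}
    for name, est in estimates.items():
        bias = statistics.fmean(est) - theta0
        sd = statistics.stdev(est)
        mse = statistics.fmean([(t - theta0) ** 2 for t in est])
        summary[name] = (bias, sd, mse)
    return summary

# Assumption of this sketch: plug in the sample MLE as the "true" theta.
summary = bootstrap(theta0=874 / 2058, n=1029, B=1000, seed=443)
for name, (bias, sd, mse) in summary.items():
    print(f"{name:>5}: bias={bias:+.5f}  sd={sd:.5f}  mse={mse:.3e}")
```

Under Hardy-Weinberg, $2X_3 + X_2$ counts the $a$ alleles among $2n$ independent alleles. So $\text{Var}(\hat{\theta}) = \theta(1-\theta)/(2n)$ exactly. The delta method gives $\text{Var}(\tilde{\theta}) \approx (1-\theta^2)/(4n)$. The first is smaller for every $0 < \theta < 1$, so the simulated `sd` for `hat` would be expected to come out below that for `tilde`.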