18. Bayesian Statistics (cont.)
Using the data, we want to update that belief and transform it into a posterior belief
⇒ We can model our prior belief using a distribution for p, as if p were random
→ This distribution is called the prior distribution
At the heart of Bayesian inference is Bayes' theorem:
Bayes’ theorem states that the posterior probability distribution is proportional to the product of the prior probability distribution and the likelihood function, which describes the probability of observing the data given the hypothesis or parameter. The normalization constant in Bayes’ theorem ensures that the posterior probability distribution integrates to 1.
Bayesian inference has several advantages over other statistical approaches. For example, Bayesian inference allows for the incorporation of prior knowledge or beliefs, which can improve the accuracy of statistical inference, especially in situations where the sample size is small. Bayesian inference also provides a flexible framework for modeling complex data structures and can handle missing data and other types of uncertainties. However, Bayesian inference can be computationally intensive, and the choice of prior distribution can have a significant impact on the posterior distribution.
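The role of the normalization constant can be seen concretely in a grid approximation of Bayes' theorem: multiply the prior by the likelihood pointwise, then divide by the sum so the posterior adds up to 1. A minimal sketch (the coin-flip setting and all names are illustrative assumptions, not from the notes):

```python
import numpy as np

# Grid of candidate values for a Bernoulli parameter p
grid = np.linspace(0.01, 0.99, 99)

# Prior: uniform over the grid (any positive weights would work)
prior = np.ones_like(grid)

# Likelihood of observing 7 successes in 10 trials, for each candidate p
successes, trials = 7, 10
likelihood = grid**successes * (1 - grid)**(trials - successes)

# Posterior ∝ prior × likelihood; normalizing makes it a proper distribution
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior.sum())             # 1.0 — the normalization constant at work
print(grid[np.argmax(posterior)])  # posterior mode, near 0.7
```

With a uniform prior the posterior mode coincides with the maximum likelihood estimate, here 7/10.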
In general, we can still define a posterior distribution using an improper prior, using Bayes’s formula.
If $p \sim U(0,1)$ and, given $p$, $X_1,\dots,X_n \stackrel{\text{i.i.d.}}{\sim} \mathrm{Ber}(p)$:
$\pi(p|X_1,\cdots,X_n)\propto p^{\sum_{i=1}^n X_i} (1-p)^{n-\sum_{i=1}^n X_i}$
⇒ The posterior distribution is:
$\mathrm{Beta}\left(1+\sum_{i=1}^n X_i,\; 1+n-\sum_{i=1}^n X_i\right)$
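This conjugacy can be checked numerically. A sketch (variable names and the seed are my own choices): simulate Bernoulli data, form the Beta posterior above, and compare its mean to the true p:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n = 0.3, 1000
x = rng.binomial(1, p_true, size=n)   # X_1, ..., X_n ~ Ber(p)

# Uniform prior U(0,1) is Beta(1,1); the posterior is Beta(a, b) with:
a = 1 + x.sum()
b = 1 + n - x.sum()

# The mean of a Beta(a, b) distribution is a / (a + b)
post_mean = a / (a + b)
print(post_mean)  # close to p_true = 0.3
```

For large n the posterior mean $(1+\sum X_i)/(n+2)$ is close to the sample mean, so the prior's influence washes out.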
If $\pi(\theta) = 1\ \forall \theta \in \mathbb{R}$ and, given $\theta$, $X_1,\dots,X_n \stackrel{\text{i.i.d.}}{\sim} N(\theta,1)$:
$\pi(\theta|X_1,…,X_n) \propto \exp(-{1\over 2}\sum_{i=1}^n(X_i-\theta)^2)$
⇒ The posterior distribution is:
\[N\!\left(\bar X_n,\frac{1}{n}\right)\]

Jeffreys priors, $\pi(\theta) \propto \sqrt{\det I(\theta)}$ (where $I(\theta)$ is the Fisher information), are invariant under re-parametrization: if $\eta$ is a re-parametrization of $\theta$ (i.e., $\eta = \phi(\theta)$ for some one-to-one map $\phi$), then the pdf $\widetilde{\pi}(\cdot)$ of $\eta$ satisfies $\widetilde{\pi}(\eta)\propto \sqrt{\det \widetilde{I}(\eta)}$,
where $\widetilde{I}(\eta)$ is the Fisher information of the statistical model parametrized by $\eta$ instead of $\theta$.
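The Gaussian result above can be checked numerically. A quick sketch (the true parameter, sample size, and seed are illustrative): with the flat improper prior, the posterior $N(\bar X_n, 1/n)$ concentrates around the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, n = 2.0, 400
x = rng.normal(theta_true, 1.0, size=n)   # X_i ~ N(theta, 1)

# Flat prior pi(theta) = 1  =>  posterior is N(mean(x), 1/n)
post_mean = x.mean()
post_var = 1.0 / n

# A 95% posterior credible interval for theta
lo = post_mean - 1.96 * post_var**0.5
hi = post_mean + 1.96 * post_var**0.5
print(post_mean, (lo, hi))  # interval of width 2 * 1.96 / sqrt(n) around the sample mean
```

Note the interval shrinks at rate $1/\sqrt{n}$, matching the frequentist confidence interval for a known-variance Gaussian mean.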
⇒ This is a frequentist confidence region at level 75%
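The 75% coverage can be checked by simulation. A sketch, assuming the standard setup behind this example (not stated explicitly here): $X_i = \theta + \varepsilon_i$ with $\varepsilon_i = \pm 1$ equally likely, and the confidence set $\{\bar X_2\}$ when $X_1 \neq X_2$, $\{X_1 - 1\}$ when $X_1 = X_2$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 6
trials = 100_000

eps = rng.choice([-1, 1], size=(trials, 2))   # epsilon_i = +/-1, prob 1/2 each
x1, x2 = theta + eps[:, 0], theta + eps[:, 1]

# Confidence set: the average when the observations differ, else x1 - 1
guess = np.where(x1 != x2, (x1 + x2) / 2, x1 - 1)

coverage = np.mean(guess == theta)
print(coverage)  # ~0.75
```

Intuitively: when $X_1 \neq X_2$ (probability 1/2) the average hits $\theta$ with certainty; when $X_1 = X_2$ (probability 1/2) the guess $X_1 - 1$ is right half the time, giving $1/2 + 1/2 \cdot 1/2 = 3/4$.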
Likelihood: $P(X_1,X_2|\theta)$
Let us assume that $X_1=5$, $X_2=7$.
$P(5,7|\theta) = \begin{cases} 1/4, & \text{if } \theta=6 \\ 0, & \text{otherwise} \end{cases}$
Posterior: $\pi(\theta|5,7) = \dfrac{\pi(\theta)\,p(5,7|\theta)}{\sum_{t\in \mathbb{Z}}\pi(t)\,p(5,7|t)}$
⇒ $\pi(\theta|5,7) = \begin{cases} 1, & \text{if } \theta=6 \\ 0, & \text{otherwise} \end{cases}$
The posterior is a point mass at $\theta = 6$, so the posterior mean is 6.
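This discrete posterior can be reproduced by direct enumeration. A sketch: I assume the $\pm 1$ noise model consistent with the stated likelihood ($X_i = \theta + \varepsilon_i$, $\varepsilon_i = \pm 1$ with probability 1/2 each), and truncate the flat prior on $\mathbb{Z}$ to a finite range for computation:

```python
import numpy as np

x1, x2 = 5, 7
thetas = np.arange(-100, 101)   # truncated stand-in for theta in Z

# Likelihood P(x1, x2 | theta): each X_i is theta +/- 1 with prob 1/2
def lik(theta):
    p1 = 0.5 if abs(x1 - theta) == 1 else 0.0
    p2 = 0.5 if abs(x2 - theta) == 1 else 0.0
    return p1 * p2

prior = np.ones_like(thetas, dtype=float)          # flat prior over the grid
post = prior * np.array([lik(t) for t in thetas])
post /= post.sum()                                  # normalize (Bayes' formula)

print(thetas[post.argmax()], post.max())  # 6 1.0 — a point mass at theta = 6
```

Only $\theta = 6$ is compatible with both observations (5 = 6 - 1 and 7 = 6 + 1), so normalization concentrates all posterior mass there regardless of the truncation range.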