%%%%%%%%%%%%%%%%%%%%
%I) PRELIMINARIES
%%%%%%%%%%%%%%%%%%%%
\documentclass[11pt,reqno]{amsart}
% Packages
%%%%%%%%%%%%%%%%%%
\usepackage{graphicx}%preferred package for inclusion of graphics
\usepackage{comment}
\usepackage{setspace}
\usepackage{enumerate}%for easy choice of enumerator symbol
\usepackage{tabularx}%for tables width user-defined width
\usepackage{ctable} %for toprule, midrule etc. in tables
\usepackage{multirow} %for more flexibility with tables
\usepackage{textcomp}%for cent-symbol
\usepackage[colorlinks=true, urlcolor=blue]{hyperref}%for inserting hyperlinks
\usepackage{caption} %flexibility with tables
\usepackage{subcaption} %flexibility with tables
\usepackage{html}%to get harvard to work, insert immediately before \usepackage{harvard}
\usepackage{url}%to get harvard to work, insert immediately before \title{...}
\usepackage[dcucite]{harvard} %bibliography style, dcu gives commas before year, semicolon between references, and "and" between authors
\usepackage{amssymb} %for the more esoteric math expressions, such as \approxeq
\usepackage{lineno}%for line numbers
\usepackage{lscape}%for inserting landscape format pages
\usepackage{float} % for more flexibility with tables
\usepackage{appendix}%allows for turning appendices on and off
\usepackage{epstopdf} %to allow import of .eps graphics
% Page formatting
%%%%%%%%%%%%%%%%%%%%
\pagestyle{plain} %puts page number center bottom
\setlength{\topmargin}{0in}
\setlength{\textheight}{8.5in}
\setlength{\oddsidemargin}{.0in}
\setlength{\evensidemargin}{.0in}
\setlength{\textwidth}{6.5in}
\setlength{\footskip}{.5in}
% Customized commands
%%%%%%%%%%%%%%%%%%%%%
% Math
\newcommand{\mlt}[1]{\mathbf{#1}} %matrix bold for Latin symbols
\newcommand{\mgr}[1]{\boldsymbol{#1}}%matrix bold for Greek symbols
\newcommand{\kl}{\left(}
\newcommand{\kr}{\right)}
\newcommand{\kll}{\left\{}
\newcommand{\krr}{\right\}}
\newcommand{\kmu}{\mgr{\mu}}
\newcommand{\kpsi}{\mgr{\psi}}
\newcommand{\kphi}{\mgr{\phi}}
\newcommand{\kgam}{\mgr{\gamma}}
\newcommand{\ktheta}{\mgr{\theta}}
\newcommand{\kbeta}{\mgr{\beta}}
\newcommand{\kdelta}{\mgr{\delta}}
\newcommand{\kt}{^{\prime}}
\newcommand{\kdel}{\partial}
\newcommand{\kdot}{\kl . \kr}
\newcommand{\keps}{\epsilon}
\newcommand{\kx}{\mlt{x}}
\newcommand{\kX}{\mlt{X}}
\newcommand{\kV}{\mlt{V}}
\newcommand{\ky}{\mlt{y}}
\newcommand{\kb}{\mlt{b}}
\newcommand{\ki}{\mlt{i}}
\newcommand{\klam}{\lambda}
\newcommand{\kp}{\mlt{p}}
\newcommand{\kprob}{\text{prob}}
\newcommand{\kz}{\mlt{z}}
\newcommand{\ksig}{\sigma^2}
\newcommand{\klog}{\text{log}}
%Special font within regular document
\newcommand{\chp}[1]{\textbf{\textsl{#1}}}
\newcommand{\km}[1]{\textsf{\small{#1}}} %special font for my own comments
\newcommand{\mlab}{\textbf{\texttt{Matlab }}}
%Tables
\newcolumntype{C}{>{\centering\arraybackslash}X} %for centered columns within tabularx,instead of justified (the default)
%Others
\newcommand{\kpm}{PM_{2.5}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%
\citationmode{abbr} %use only "et al" citations
%III) TOP MATTER INFORMATION
\renewcommand{\harvardurl}{URL: \url}%to get harvard to work, insert immediately before \title{...}
\title{Principles of Bayesian Analysis}
\author{AAEC 6564 \\ Instructor: Klaus Moeltner}
\maketitle %this comes at the end of the top matter to set it.
\begin{flushleft}
\begin{tabbing}
\hspace{1.2in}\= \\\\%sets the tab stop
Textbooks:\> \citeasnoun{koop2003}, Ch.1; \citeasnoun{koopetal2007}, Ch.1-2; \citeasnoun{hoff2009}, Ch. 1\\
\mlab scripts:\> \texttt{mod1s1a, mod1s1apublish}\\
\end{tabbing}
\end{flushleft}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Bayesian vs. Classical Estimation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Bayesian data analysis differs distinctly from classical (or ``frequentist'') analysis in its treatment of probabilities and, as a result, in its treatment of model parameters.\footnote{Throughout this course we will largely remain within the realm of parametric analysis. However, students should note that there also exists a variety of Bayesian methods for non- and semiparametric modeling.}\\
Bayesian analysts formulate probabilistic statements about uncertain events before collecting any additional evidence (i.e. ``data''). These ex-ante probabilities (or, more generally, probability distributions plus underlying parameters) are called \emph{priors}. This notion of \emph{subjective probabilities} is absent in classical estimation. In the classical world, all estimation and inference is based solely on observed data.\\
Both Bayesian and classical econometricians aim to learn more about a set of parameters, say $\ktheta$. In the classical mindset, $\ktheta$ contains fixed but unknown elements, usually associated with an underlying population of interest (e.g. the mean and variance of credit card debt amongst U.S. college students). Bayesians share with Classicals the interest in $\ktheta$ and the definition of the population of interest. However, they assign ex ante a prior probability to $\ktheta$, labeled $p\kl \ktheta \kr$, which usually takes the form of a probability distribution with ``known'' moments. For example, Bayesians might state that the debt amount described above has a normal distribution with mean \$3000 and standard deviation of \$1500. This prior may be based on previous research, related findings in the published literature, or it may be completely arbitrary. In any case, it is an inherently subjective construct.\\
Both schools then develop a theoretical framework that relates $\ktheta$ to observed data, say a ``dependent variable'' $\ky$, and a matrix of explanatory variables $\kX$. This relationship is formalized via a likelihood function, say $p\kl \ky | \ktheta,\kX\kr$ to stay with Bayesian notation. Importantly, this likelihood function takes exactly the same analytical form for both schools.\\
The Classical analyst then collects a sample of observations from the underlying population of interest and, combining these data with the formulated structural model, produces an estimate of $\ktheta$, say $\hat{\ktheta}$. Any and all uncertainty surrounding the accuracy of this estimate is solely related to the notion that results are based on a sample, not data for the entire population. A different sample (of equal size) may produce slightly different estimates. Classicals express this uncertainty via ``standard errors'' assigned to each element of $\hat{\ktheta}$. They also have a strong focus on the behavior of $\hat{\ktheta}$ as the sample size increases. The behavior of estimators under increasing sample size falls under the heading of ``Asymptotic Theory''. The properties of most estimators in the Classical world can only be assessed ``asymptotically'', i.e. are only understood for the hypothetical case of an infinitely large sample. Also, virtually all specification tests used by Frequentists hinge on Asymptotic Theory.\\
Bayesians, in turn, combine prior and likelihood via Bayes' Rule to derive the \emph{posterior distribution} of $\ktheta$ as
\begin{equation}
\label{equ1}
p\kl \ktheta | \ky,\kX \kr =
\frac{p\kl \ktheta,\ky |\kX \kr}{p\kl \ky|\kX\kr}=
\frac{p\kl \ktheta\kr p\kl \ky|\ktheta,\kX\kr}{p\kl \ky|\kX\kr}\propto
p\kl \ktheta\kr p\kl \ky|\ktheta,\kX\kr
\end{equation}
Simply put, the posterior distribution is just an updated version of the prior. If the data have high informational content (i.e. allow for substantial learning about $\ktheta$), the posterior will generally look very different from the prior. In most cases, it is much ``tighter'' (i.e. has a much smaller variance) than the prior. There is no room in Bayesian analysis for the Classical notion of ``sampling uncertainty'', and there is less a-priori focus on the ``asymptotic behavior'' of estimators.\footnote{However, at times Bayesian analysis does rest on asymptotic results - see \citeasnoun{koopetal2007}, Ch. 9. Naturally, the general notion that a larger sample, i.e. more empirical information, is better than a small one also holds for Bayesian analysis.}\\
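This prior-to-posterior updating can be made concrete with the conjugate normal case, using the credit card debt prior from the example above (mean \$3000, standard deviation \$1500). The sampling standard deviation, sample size, and sample mean below are hypothetical values chosen purely for illustration, and the sampling variance is assumed known so the update has a closed form:

```python
import math

# Illustrative conjugate normal-normal update (sampling variance assumed known).
# Prior from the text's example: debt ~ N(3000, 1500^2).
mu0, tau0 = 3000.0, 1500.0   # prior mean and prior standard deviation
sigma = 2000.0               # hypothetical known sampling std. deviation
n, ybar = 50, 2400.0         # hypothetical sample size and sample mean

# Posterior precision = prior precision + data precision
prec_post = 1.0 / tau0**2 + n / sigma**2
mu_post = (mu0 / tau0**2 + n * ybar / sigma**2) / prec_post
sd_post = math.sqrt(1.0 / prec_post)

print(f"posterior mean = {mu_post:.1f}, posterior sd = {sd_post:.1f}")
# The posterior sd is far below the prior sd of 1500: the data dominate
# the prior, and the posterior is much "tighter", as described in the text.
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, which is exactly the "updating" intuition: more data precision pulls the posterior toward the sample evidence.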
The term in the denominator of (\ref{equ1}) is called the ``marginal likelihood''; it is not a function of $\ktheta$ and can usually be ignored for most components of Bayesian analysis. Thus, we usually work only with the numerator (i.e. prior times likelihood) for inference about $\ktheta$. From (\ref{equ1}) we know that this expression is proportional (``$\propto$'') to the actual posterior. However, the marginal likelihood is crucial for model comparison, so we'll learn a few methods to derive it as a by-product of, or subsequent to, the actual posterior analysis. For some choices of prior and likelihood there exist analytical solutions for this term.\\
In summary, Frequentists start with a ``blank mind'' regarding $\ktheta$. They collect data to produce an estimate $\hat{\ktheta}$. They formalize the characteristics and uncertainty of $\hat{\ktheta}$ for a finite sample context (if possible) and a hypothetical large sample (asymptotic) case. Bayesians collect data to \emph{update a prior}, i.e. a pre-conceived probabilistic notion regarding $\ktheta$.\\
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Practical Implications of Choosing a Classical or Bayesian Estimation Framework}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
If the sample size is large and the likelihood function ``well-behaved'' (which usually means a simple function with a clear maximum, plus a small dimension for $\ktheta$), Classical and Bayesian analysis are essentially on the same footing and will produce virtually identical results. This is because the likelihood function and empirical data will dominate any prior assumptions in the Bayesian approach.\\
If the sample size is large but the dimensionality of $\ktheta$ is high and the likelihood function is less tractable (which usually means highly nonlinear, with local maxima, flat spots, etc.), a Bayesian approach may be preferable purely from a computational standpoint. It can be very difficult to get good and reliable estimates via Maximum Likelihood (MLE) techniques, but it is usually straightforward to derive a posterior distribution for the parameters of interest using Bayesian estimation approaches, which usually operate via sequential draws from known distributions.\\
If the sample size is small, Bayesian analysis can have substantial advantages over a Classical approach. First, Bayesian results do not rely on asymptotic theory for their validity or interpretability. Second, the Bayesian approach combines the sparse data with subjective priors. If these priors are well informed, this can truly add value (i.e. gain in accuracy and efficiency) to the analysis. Conversely, of course, poorly chosen priors\footnote{For example, priors that place substantial probability mass on practically infeasible ranges of $\ktheta$ - this often happens inadvertently when parameter transformations are involved in the analysis.} can produce misleading posterior inference in this case. Thus, under small sample conditions, the choice between Bayesian and Classical estimation often boils down to a choice between trusting the asymptotic properties of estimators and trusting one's priors.
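The small-sample trade-off can be illustrated with the same conjugate normal update. All numbers below are hypothetical: a ``good'' prior centered near the sample mean is compared with a ``bad'' prior centered on an implausible value, at a small and a large sample size:

```python
import math

def posterior_mean_sd(mu0, tau0, sigma, n, ybar):
    """Conjugate normal-normal posterior (sampling variance assumed known)."""
    prec = 1.0 / tau0**2 + n / sigma**2
    mean = (mu0 / tau0**2 + n * ybar / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)

sigma, ybar = 2000.0, 2400.0          # hypothetical known sd and sample mean
priors = {"good": (3000.0, 1500.0),   # prior centered near the sample mean
          "bad": (15000.0, 1000.0)}   # tight prior centered far away

results = {}
for n in (5, 500):
    for label, (mu0, tau0) in priors.items():
        results[(n, label)] = posterior_mean_sd(mu0, tau0, sigma, n, ybar)
        m, s = results[(n, label)]
        print(f"n={n:3d}  {label:4s} prior -> posterior mean {m:7.0f}, sd {s:5.0f}")
# With n=5 the bad prior drags the posterior mean to 8000, far from the
# sample mean of 2400; with n=500 the data dominate both priors.
```

With only five observations the poorly centered prior dominates the posterior, while at $n=500$ both priors yield posterior means close to the sample mean, mirroring the discussion above.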
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Model Comparison}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Bayesian setting also offers a very flexible framework for the comparison of competing models. The models don't have to be ``nested'' in the Classical sense. All that's required is that the competing specifications share the same dependent variable, i.e. $\ky$. In contrast, model comparison can be quite tricky in the Classical setting when competing specifications are not nested.\\
Assume you're considering two models, say $M_1$ and $M_2$, each associated with a respective set of parameters, say $\ktheta_1$ and $\ktheta_2$. We want to know which model is more ``probable'', given the observed data. We start by re-writing (\ref{equ1}) with explicit inclusion of model indexes (we'll drop $\kX$ since the exact composition of explanatory data is implicitly covered by model index $M_i$):
\begin{equation}
\label{equ2}
p\kl \ktheta_i | \ky,M_i \kr =
\frac{p\kl \ktheta_i |M_i\kr p\kl \ky|\ktheta_i,M_i\kr}{p\kl \ky|M_i\kr}\quad i=1,2
\end{equation}
This expression shows that differences across models can occur due to differing priors for $\ktheta_i$ and/or differences in the likelihood function. The marginal likelihood in the denominator will usually also differ across models. We now re-apply Bayes' Rule to derive an expression for the \emph{posterior model probability}
\begin{equation}
\label{equ3}
p\kl M_i |\ky\kr =
\frac{p\kl M_i\kr p\kl \ky|M_i\kr}{p\kl \ky\kr}\quad i=1,2
\end{equation}
where the numerator is the product of the \emph{prior model probability} (often set to equal values across models in the absence of strong priors) and the marginal likelihood from (\ref{equ2}). We can now construct the \emph{posterior odds ratio} for the two models as
\begin{equation}
\label{equ4}
\frac{p\kl M_1 |\ky\kr}{p\kl M_2 |\ky\kr}=
\frac{p\kl M_1\kr p\kl \ky|M_1\kr}{p\kl M_2\kr p\kl \ky|M_2\kr}
\end{equation}
Under equal model priors (i.e. $p\kl M_1 \kr = p\kl M_2\kr$) this reduces to the \emph{Bayes Factor} for model 1 vs. 2, i.e.
\begin{equation}
\label{equ5}
BF_{1,2}=
\frac{p\kl \ky|M_1\kr}{p\kl \ky|M_2\kr}
\end{equation}
which is simply the ratio of marginal likelihoods for the two models. Since Bayes Factors can become quite large, we usually prefer to work with their logged version
\begin{equation}
\label{equ6}
\klog BF_{1,2}=
\klog p\kl \ky|M_1\kr -\klog p\kl \ky|M_2\kr
\end{equation}
The derivation of BFs, and thus model comparison, is straightforward if expressions for the marginal likelihoods are known analytically or can be derived easily. Often, however, this can be quite tricky, and we'll learn a few techniques to compute marginal likelihoods in this course.\\
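A minimal case where the marginal likelihoods are known exactly is two competing models that each fix the parameter at a single value, so that $p\kl \ky|M_i\kr$ is just the likelihood evaluated at that value. The Bernoulli setup and the data counts below are hypothetical, chosen only to make the log Bayes Factor of (\ref{equ6}) computable by hand:

```python
import math

# Toy Bayes Factor: two Bernoulli models that fix the success probability
# at hypothetical values, so marginal likelihoods are exact.
theta1, theta2 = 0.5, 0.7   # M1 and M2 fix theta at these values
n, s = 20, 13               # hypothetical data: 13 successes in 20 trials

def log_marg_lik(theta, n, s):
    # log p(y | M_i): Bernoulli likelihood with theta fixed by the model
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

log_bf = log_marg_lik(theta1, n, s) - log_marg_lik(theta2, n, s)
print(f"log BF_12 = {log_bf:.3f}, BF_12 = {math.exp(log_bf):.3f}")
# log BF_12 < 0 here, so the data favor M2 (theta = 0.7) over M1.
```

With genuine priors over the parameters the marginal likelihood instead requires integrating the likelihood against the prior, which is where the computational difficulty mentioned above arises.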
On a final note, marginal likelihoods can also be used to derive \emph{model weights} in \emph{Bayesian Model Averaging (BMA)}. Simply put, the intuition behind BMA is that we're never fully convinced that a single model is the correct one for our analysis at hand. There are usually several (and often millions of) competing specifications. To explicitly incorporate this notion of ``model uncertainty'', one can estimate every model separately, compute relative probability weights for each model, and then generate model-averaged posterior distributions for the parameters (and predictions) of interest. This would be awkward to accomplish in a Classical framework, and thus constitutes another advantage of employing a Bayesian estimation approach.\footnote{That said, there is now an emerging literature on ``Frequentist Model Averaging''. See, for example, \citeasnoun{hjortclaeskens2003} and \citeasnoun{hansen2007}.}
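Under equal prior model probabilities, the BMA weights follow directly from (\ref{equ3}): each model's weight is its marginal likelihood divided by the sum across models. The log marginal likelihood values below are hypothetical; subtracting the maximum before exponentiating is a standard numerical-stability device:

```python
import math

# Hypothetical log marginal likelihoods for three competing models.
log_ml = [-130.2, -128.7, -133.5]

# Posterior model probabilities under equal model priors:
# p(M_i|y) is proportional to p(y|M_i); subtract the max log value
# before exponentiating to avoid numerical underflow.
m = max(log_ml)
scaled = [math.exp(l - m) for l in log_ml]
weights = [s / sum(scaled) for s in scaled]
print([round(w, 3) for w in weights])
# The model with the largest marginal likelihood gets the largest weight;
# model-averaged posteriors then mix each model's results with these weights.
```

Note that the weights depend only on differences in log marginal likelihoods, so the unknown normalizing constant $p\kl \ky \kr$ from (\ref{equ3}) never needs to be computed.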
\bibliography{AAEC6984bib}
\bibliographystyle{dcu}
\end{document}