vignettes/hlm.Rmd
hlm.Rmd
The Heteroscedastic Linear Model (HLM) is of the form \[\begin{equation} \label{eq:hlm} y_i \mid {\boldsymbol{x}}_i, {\boldsymbol{z}}_i \stackrel{\textrm{ind}}{\sim}\mathcal N\big({\boldsymbol{x}}_i'{\boldsymbol{\beta}}, \exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})\big), \end{equation}\] where \(y_i\) is the response for observation \(i\), \({\boldsymbol{x}}_i \in \mathbb R^p\) and \({\boldsymbol{z}}_i \in \mathbb R^q\) are the mean and variance covariates, respectively. Inference is over the unknown parameters \({\boldsymbol{\beta}}\) and \({\boldsymbol{\gamma}}\).
Let \(\boldsymbol{\mathcal D}= ({\boldsymbol{y}}, {\boldsymbol{X}}, {\boldsymbol{Z}})\) denote the observed data, where \({\boldsymbol{y}}= (y_{1},\ldots,y_{n})\), \({\boldsymbol{X}}= (\boldsymbol{x}_{1},\ldots,\boldsymbol{x}_{,} n)\), and \({\boldsymbol{Z}}= (\boldsymbol{z}_{1},\ldots,\boldsymbol{z}_{n})\), such that the loglikelihood function is \[ \ell({\boldsymbol{\beta}}, {\boldsymbol{\gamma}}\mid \boldsymbol{\mathcal D}) = - \frac 1 2 \sum_{i=1}^n \frac{(y_i - {\boldsymbol{x}}_i'{\boldsymbol{\beta}})^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} + {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}. \] Then a relatively stable and straightforward algorithm for maximizing \eqref{eq:hlm} is blockwise coordinate ascent, alternating between updates of \({\boldsymbol{\beta}}\) and \({\boldsymbol{\gamma}}\). That is, for fixed \({\boldsymbol{\gamma}}\) the conditional likelihood corresponds to \[ y_i \stackrel{\textrm{ind}}{\sim}\mathcal N\big({\boldsymbol{x}}_i'{\boldsymbol{\beta}}, 1/w_i\big), \] where \(w_i = \exp(-{\boldsymbol{z}}_i'{\boldsymbol{\gamma}})\). The conditional maximum likelihood estimate \(\hat {\boldsymbol{\beta}}({\boldsymbol{\gamma}})\) is recognized as the weighted least-squares solution, \[ \hat {\boldsymbol{\beta}}({\boldsymbol{\gamma}}) = ({\boldsymbol{X}}'\operatorname{diag}({\boldsymbol{w}})\ {\boldsymbol{X}})^{-1}{\boldsymbol{X}}'\operatorname{diag}({\boldsymbol{w}}) \ {\boldsymbol{y}}, \] where \({\boldsymbol{w}}= (w_{1},\ldots,w_{n})\). Similarly, fo fixed \({\boldsymbol{\beta}}\) the conditional likelihood corresponds to \[\begin{equation}\label{eq:llvm} \tilde y_i \stackrel{\textrm{ind}}{\sim}\mathcal N\big(0, \exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})\big), \end{equation}\] where \(\tilde y_i = y_i - {\boldsymbol{x}}_i'{\boldsymbol{\beta}}\). We shall refer to \eqref{eq:llvm} as a Log-Linear Variance Model (LLVM) and present two algorithms for estimating its parameters in the following section.
Let \(\delta_i = 1\) express the fact that the response \(y_i\) for observation \(i\) is fully observed, and \(\delta_i = 0\) indicate that it has been right-censored. Assuming that the censoring mechanism and the response are conditionally independent given the covariates, the loglikelihood for the data \(\boldsymbol{\mathcal D}= ({\boldsymbol{y}}, {\boldsymbol{\delta}}, {\boldsymbol{X}}, {\boldsymbol{Z}})\) is given by \[\begin{equation}\label{eq:chlm}
\ell({\boldsymbol{\beta}}, {\boldsymbol{\gamma}}\mid \boldsymbol{\mathcal D}) = - \frac 1 2 \sum_{i \in \mathcal S_\delta} \left\{\frac{(y_i - {\boldsymbol{x}}_i'{\boldsymbol{\beta}})^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} + {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}\right\} + \sum_{i \in \mathcal S_\delta^c} \log\left\{1 - \Phi\left(\frac{y_i - {\boldsymbol{x}}_i'{\boldsymbol{\beta}}}{\exp(\tfrac 1 2 {\boldsymbol{z}}_i'{\boldsymbol{\gamma}})}\right)\right\},
\end{equation}\] where \(\mathcal S_\delta= \{i: \delta_i = 1\}\) and \(\mathcal S_\delta^c= \{i: \delta_i = 0\}\) are the sets of uncensored and censored observations, respectively, and \(\Phi(\cdot)\) is the CDF of the standard normal \(\mathcal N(0, 1)\) distribution.
Finding the MLE of \(({\boldsymbol{\beta}}, {\boldsymbol{\gamma}})\) in \eqref{eq:chlm} can be done with an Expectation-Conditional-Maximization (ECM) algorithm. That is, if \(({\boldsymbol{\beta}}^{(t)}, {\boldsymbol{\gamma}}^{(t)})\) denotes the value of the parameters at step \(t\), the E-step function is given by \[
\begin{aligned}
Q_t({\boldsymbol{\beta}}, {\boldsymbol{\gamma}}) & = E\left[-\frac 1 2 \sum_{i=1}^n \frac{(y_i - {\boldsymbol{x}}_i'{\boldsymbol{\beta}})^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} + {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}\mid \boldsymbol{\mathcal D}, {\boldsymbol{\beta}}^{(t)}, {\boldsymbol{\gamma}}^{(t)}\right] \\
& = - \frac 1 2 \sum_{i=1}^n \frac{s_i^{(t)} - 2 r_i^{(t)} {\boldsymbol{x}}_i'{\boldsymbol{\beta}}+ ({\boldsymbol{x}}_i'{\boldsymbol{\beta}})^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} + {\boldsymbol{z}}_i'{\boldsymbol{\gamma}},
\end{aligned}
\] where…
Dropping the “tilde” to simplify notation, the LLVM model \eqref{eq:llvm} is written as \[ y_i \stackrel{\textrm{ind}}{\sim}\mathcal N\big(0, \exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})\big). \] Its loglikelihood is given by \[ \ell({\boldsymbol{\gamma}}) = -\frac 1 2 \sum_{i=1}^n \left[\frac{ y_i^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} + {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}\right], \] and the score function and Hessian are given by \[ \begin{aligned} {\boldsymbol{g}}({\boldsymbol{\gamma}}) & = \frac{\partial \ell({\boldsymbol{\gamma}})}{\partial {\boldsymbol{\gamma}}} = \frac 1 2 \sum_{i=1}^n \left[\frac{y_i^2}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} - 1\right]{\boldsymbol{z}}_i \\ {\boldsymbol{H}}({\boldsymbol{\gamma}}) & = \frac{\partial^2 \ell({\boldsymbol{\gamma}})}{\partial {\boldsymbol{\gamma}}\partial {\boldsymbol{\gamma}}'} = - \frac 1 2 \sum_{i=1}^n \frac{y_i^2 {\boldsymbol{z}}_i{\boldsymbol{z}}_i'}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})}. \end{aligned} \]
The expected Fisher information is \[
\boldsymbol{\mathcal I}({\boldsymbol{\gamma}}) = -E[{\boldsymbol{H}}({\boldsymbol{\gamma}})] = \frac 1 2 \sum_{i=1}^n \frac{E[y_i^2] {\boldsymbol{z}}_i{\boldsymbol{z}}_i'}{\exp({\boldsymbol{z}}_i'{\boldsymbol{\gamma}})} = \tfrac 1 2 {\boldsymbol{Z}}'{\boldsymbol{Z}}.
\] Therefore, the Fisher scoring algorithm to obtain the MLE of \({\boldsymbol{\gamma}}\) is \[
{\boldsymbol{\gamma}}_{m+1} = {\boldsymbol{\gamma}}_m + \boldsymbol{\mathcal I}({\boldsymbol{\gamma}}_m)^{-1} {\boldsymbol{g}}({\boldsymbol{\gamma}}_m).
\] An initial value can be found by noting that \[
r_i = \log(y_i^2) = {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}+ \varepsilon_i, \qquad \exp(\varepsilon_i) \stackrel{\textrm{iid}}{\sim}\chi^2_{(1)}.
\] Since \[
\rho = E[\varepsilon_i] = \psi(\tfrac 1 2) + \log 2,
\] where \(\psi(x)\) is the digamma function, an initial value for \({\boldsymbol{\gamma}}\) is given by the linear regression estimator \[
{\boldsymbol{\gamma}}_0 = ({\boldsymbol{Z}}'{\boldsymbol{Z}})^{-1}{\boldsymbol{Z}}'({\boldsymbol{r}}- \rho).
\] Following the stats::glm.fit()
function in R, the stopping criterion for the algorithm is either when more than \(M_{\textrm{max}}\) steps have been taken, or when \[
\frac{|\ell({\boldsymbol{\gamma}}_{m} ) - \ell({\boldsymbol{\gamma}}_{m-1} )|}{0.1 + |\ell({\boldsymbol{\gamma}}_m )|} < \epsilon.
\]
A different estimation algorithm is obtained by noting that the score function may be written as \[ {\boldsymbol{g}}({\boldsymbol{\gamma}}) = \frac 1 2 \sum_{i=1}^n w_i \cdot (r_i - {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}), \qquad w_i = \frac{\exp(r_i - {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}) - 1}{r_i - {\boldsymbol{z}}_i'{\boldsymbol{\gamma}}}. \] Thus we have \({\boldsymbol{g}}({\boldsymbol{\gamma}}) = \tfrac 1 2 {\boldsymbol{W}}({\boldsymbol{r}}- {\boldsymbol{Z}}{\boldsymbol{\gamma}})\) where \({\boldsymbol{W}}= \operatorname{diag}(w_1, \ldots, w_n)\). Setting the score function to zero produces the iteratively reweighted least squares (IRLS) algorithm \[ {\boldsymbol{\gamma}}_{m+1} = ({\boldsymbol{Z}}'{\boldsymbol{W}}_m {\boldsymbol{Z}})^{-1}{\boldsymbol{Z}}'{\boldsymbol{W}}_m {\boldsymbol{r}}, \qquad {\boldsymbol{W}}_m = {\boldsymbol{W}}({\boldsymbol{\gamma}}_m). \]