
class: center, middle, inverse, title-slide

# Time-Varying Coefficient Models
### Isaac Quintanilla
### April 21, 2020

---

## Presentation Access

This presentation is available at:

[gitlab.com/iquintanilla/gss](https://gitlab.com/iquintanilla/gss)

---
## Thank You to all essential workers!

- Healthcare workers

- Farmworkers

- Grocery workers

- All other essential workers

---
## UCR Land Acknowledgement

We at UCR would like to respectfully acknowledge and recognize our responsibility to the original and current caretakers of this land, water, and air: the Cahuilla [ka-wee-ahh], Tongva [tong-va], Luiseño [loo-say-ngo], and Serrano [se-ran-oh] peoples and all of their ancestors and descendants, past, present, and future. Today this meeting place is home to many Indigenous peoples from all over the world, including UCR faculty, students, and staff, and we are grateful to have the opportunity to live and work on these homelands.

---
## Table of Contents

* Time-Varying Coefficient Models

* Generalized Time-Varying Coefficient Models

* Bandwidth and Kernel Function

* Simulation Study
  * Normal Outcome
  * Binary Outcome
  
---
## Varying-Coefficient Models

- Fan and Zhang (2008)
- Hastie and Tibshirani (1993)
- Hoover, Rice, Wu, and Yang (1998)
- Cai, Fan, and Li (2000)
- Zhang and Lee (2000)
- Kürüm, Li, Wang, and Şentürk (2014)
- Kürüm, Li, Shiffman, and Yao (2016)

---
layout: false
class: inverse, middle, center

# Time-Varying Coefficient Models
---
## Model
Parametric Model:
.size130[
$$
Y=\boldsymbol X^\mathrm T\boldsymbol\beta+\epsilon
$$
]
--

Time-Varying Coefficient Model:
.size130[
$$
Y=\boldsymbol X^\mathrm T\boldsymbol\beta(t)+\epsilon
$$
]

???
- `$\boldsymbol \beta$` vector of coefficients
- `$\boldsymbol X$` vector of predictors
- `$Y$` response variable
- `$\epsilon$` error term
- `$\boldsymbol \beta(t)$` varying coefficient

---

## Estimation

- `$\boldsymbol \beta (t)$` is unknown

### Approximation techniques

- Polynomial Splines

- Smoothing Splines

- Local Polynomials

???

We will focus on local polynomials

- directly estimate the function at a grid point

- The function is described by a set of gridpoints

- Use Local Linear models

---

## Local Linear Model

For each grid point `$t_0$`, the varying coefficient is approximated around `$t_0$` with a first-order Taylor expansion
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol \beta(t_0)+\boldsymbol \beta^\prime(t_0)(t-t_0)
$$
]

The model can be rewritten as
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol a+\boldsymbol b (t-t_0)
$$
]

???

* `$\boldsymbol \beta(t_0)$` is the value of the function at `$t_0$`
* `$\boldsymbol \beta^\prime(t_0)$` is the first derivative with respect to `$t$`
* As long as we are in the neighborhood of `$t_0$`, this approximation is accurate.
* No higher-order polynomials are necessary
* Lessens the chance of being affected by the curse of dimensionality
* You need to choose a bandwidth and kernel function
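
---

## Sketch: Local Linear Approximation

A minimal numerical check of the Taylor approximation above. The coefficient function `$\beta(t)=\sqrt t$` and the evaluation points are chosen only for illustration; the error is small near `$t_0=0.5$` and grows away from it.

```python
import math

def beta(t):
    """Illustrative coefficient function: beta(t) = sqrt(t)."""
    return math.sqrt(t)

def beta_prime(t):
    """First derivative of sqrt(t)."""
    return 0.5 / math.sqrt(t)

def local_linear(t, t0):
    """First-order Taylor approximation a + b * (t - t0) around t0."""
    a = beta(t0)        # a plays the role of beta(t0)
    b = beta_prime(t0)  # b plays the role of beta'(t0)
    return a + b * (t - t0)

t0 = 0.5
errors = {t: abs(beta(t) - local_linear(t, t0)) for t in (0.45, 0.50, 0.55, 0.90)}
for t, err in errors.items():
    print(f"t = {t:.2f}  |beta(t) - approximation| = {err:.5f}")
```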

---

### Estimation Procedure

---

### Estimating value `$t_0=0.5$`

Bandwidth and Kernel Function

---
## Local Least Squares

For `$n$` subjects, each subject containing `$n_i$` measurements, the local least squares is formulated as

`\begin{equation}
L(\boldsymbol a,\boldsymbol b)=\sum^n_{i=1}\sum^{n_i}_{j=1}\left[Y_{ij}-\boldsymbol X_i^\mathrm T \{\boldsymbol a+\boldsymbol b(t_{ij}-t_0)\} \right]^2K_h(t_{ij}-t_0)
\end{equation}`

--
- `$t_{ij}$`: time point

--
- `$Y_{ij}$`: outcome

--
- `$\boldsymbol X_{i}$`: time-invariant predictors

--
- `$K_h(\cdot)$`: kernel function with associated bandwidth `$h$`
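
---

## Sketch: Local Least Squares Objective

A minimal sketch of how `$L(\boldsymbol a,\boldsymbol b)$` could be evaluated. It assumes the common convention `$K_h(u)=K(u/h)/h$` with the Epanechnikov kernel; the data layout (one `(x, times, ys)` tuple per subject) and the toy values are hypothetical.

```python
def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def local_ls_loss(a, b, subjects, t0, h):
    """Kernel-weighted sum of squared residuals L(a, b) at grid point t0."""
    loss = 0.0
    for x, times, ys in subjects:        # one entry per subject i
        for t, y in zip(times, ys):      # repeated measurements j
            fitted = sum(xk * (ak + bk * (t - t0)) for xk, ak, bk in zip(x, a, b))
            loss += kernel_weight(t, t0, h) * (y - fitted) ** 2
    return loss

# Toy data (hypothetical): two subjects, noise-free outcomes from known a and b.
t0, h = 0.5, 0.1
a_true, b_true = [1.0, -0.5], [0.3, 0.2]
times = [0.42, 0.46, 0.50, 0.54, 0.58]
subjects = []
for x in ([1.0, -1.0], [1.0, 2.0]):
    ys = [sum(xk * (ak + bk * (t - t0)) for xk, ak, bk in zip(x, a_true, b_true))
          for t in times]
    subjects.append((x, times, ys))

loss_at_truth = local_ls_loss(a_true, b_true, subjects, t0, h)
loss_elsewhere = local_ls_loss([0.0, 0.0], [0.0, 0.0], subjects, t0, h)
print(loss_at_truth, loss_elsewhere)
```

The loss vanishes at the generating values and is positive elsewhere, which is what the estimator exploits.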

---
## Weighted Least Squares Estimator

The estimates for `$\boldsymbol a (t_0)$` are found with the weighted least squares estimator:

`\begin{equation}
\hat {\boldsymbol a} (t_0)=(\boldsymbol I_p, \boldsymbol 0_p)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal Y_i\right)
\end{equation}`

--
- `$\mathcal Y_i$`: vector of repeated measurements for `$i^{th}$` subject

- `$\mathcal K_i$`: `$n_i \times n_i$` diagonal matrix accounting for the weights

- `$\mathcal X_i$`: `$n_i \times 2p$` design matrix of local linear model

- `$p$`: number of predictors
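
The closed form above can be sketched in plain Python. This is an illustration under assumptions, not the code behind the results: it assumes `$K_h(u)=K(u/h)/h$` with the Epanechnikov kernel, a hypothetical `(x, times, ys)` layout per subject, and a small hand-written linear solver.

```python
def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def local_linear_wls(subjects, t0, h):
    """Return (a_hat, b_hat) at grid point t0 from the weighted normal equations."""
    p = len(subjects[0][0])
    q = 2 * p
    xtkx = [[0.0] * q for _ in range(q)]  # sum_i X_i^T K_i X_i
    xtky = [0.0] * q                      # sum_i X_i^T K_i Y_i
    for x, times, ys in subjects:
        for t, y in zip(times, ys):
            w = kernel_weight(t, t0, h)
            if w == 0.0:
                continue
            row = list(x) + [xk * (t - t0) for xk in x]  # one row of the n_i x 2p design
            for k in range(q):
                xtky[k] += w * row[k] * y
                for l in range(q):
                    xtkx[k][l] += w * row[k] * row[l]
    coef = solve(xtkx, xtky)
    return coef[:p], coef[p:]  # (I_p, 0_p) keeps the first p entries

# Toy check (hypothetical data): noise-free outcomes, so the fit recovers a and b.
t0, h = 0.5, 0.1
a_true, b_true = [1.0, -0.5], [0.3, 0.2]
times = [0.42, 0.46, 0.50, 0.54, 0.58]
subjects = []
for x in ([1.0, -1.0], [1.0, 0.5], [1.0, 2.0]):
    ys = [sum(xk * (ak + bk * (t - t0)) for xk, ak, bk in zip(x, a_true, b_true))
          for t in times]
    subjects.append((x, times, ys))

a_hat, b_hat = local_linear_wls(subjects, t0, h)
print(a_hat, b_hat)
```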

???

Since we only care about the function itself, we do not need to know what the first derivative looks like.

Next slide is messy

---
## Asymptotic Theory
According to Zhang and Lee (2000), the asymptotic distribution is
`\begin{equation}
cov^{-1/2}\{\hat{\boldsymbol a}(t_0)\}[\hat{\boldsymbol a}(t_0)-\boldsymbol a (t_0)-bias\{\hat{\boldsymbol a}(t_0)\}]\xrightarrow{D} N(\boldsymbol 0,\boldsymbol I_{p})
\end{equation}`

`\begin{equation}
bias\left\lbrace\hat{\boldsymbol a}(t_0)\right\rbrace=2^{-1}h^2\mu_2\boldsymbol a^{\prime\prime}(t_0)
\end{equation}`

`\begin{equation}
cov\{\hat{\boldsymbol a}(t_0)\}=\lbrace nh f(t_0)E(XX^\mathrm T|T=t_0)\rbrace^{-1}\nu_0\sigma^2(t_0)
\end{equation}`

--

- `$\nu_i=\int t^i K^2(t)dt$`

- `$\mu_i=\int t^iK(t)dt$`

- `$f(t_0)$`: Density function of `$T$`

???

The main thing to take away is that the estimators are asymptotically normal

---

## Asymptotic Covariance

.size60[
`\begin{eqnarray*}
& \widehat{cov} \{\hat {\boldsymbol a}(t_0) \} \approx \\
& (\boldsymbol I_p, \boldsymbol 0_p)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal Q_i \mathcal K_i \mathcal X_i\right)\left(\sum_{i=1}^n\mathcal X_i^\mathrm T \mathcal K_i \mathcal X_i\right)^{-1}(\boldsymbol I_p, \boldsymbol 0_p)^\mathrm T
\end{eqnarray*}`
]

- `$\mathcal Q_i$`: a diagonal matrix of the squared residuals

- The sandwich estimator is used

???

The sandwich estimator has been shown to provide consistent results.
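
---

## Sketch: Sandwich Covariance

A minimal sketch of the sandwich formula for `$\hat{\boldsymbol a}(t_0)$`. It assumes `$K_h(u)=K(u/h)/h$` with the Epanechnikov kernel and a hypothetical `(x, times, ys)` layout per subject; in the example the generating coefficients stand in for the fitted ones.

```python
def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def sandwich_cov(subjects, a_hat, b_hat, t0, h):
    """Top-left p x p block of bread^{-1} * meat * bread^{-1}."""
    p = len(a_hat)
    q = 2 * p
    coef = list(a_hat) + list(b_hat)
    bread = [[0.0] * q for _ in range(q)]  # sum_i X_i^T K_i X_i
    meat = [[0.0] * q for _ in range(q)]   # sum_i X_i^T K_i Q_i K_i X_i
    for x, times, ys in subjects:
        for t, y in zip(times, ys):
            w = kernel_weight(t, t0, h)
            if w == 0.0:
                continue
            row = list(x) + [xk * (t - t0) for xk in x]
            resid = y - sum(rk * ck for rk, ck in zip(row, coef))
            for k in range(q):
                for l in range(q):
                    bread[k][l] += w * row[k] * row[l]
                    meat[k][l] += w * w * resid * resid * row[k] * row[l]
    identity = [[1.0 if r == k else 0.0 for r in range(q)] for k in range(q)]
    inv_columns = [solve(bread, e) for e in identity]       # columns of bread^{-1}
    bread_inv = [list(col) for col in zip(*inv_columns)]    # transpose into rows
    half = [[sum(bread_inv[r][k] * meat[k][c] for k in range(q)) for c in range(q)]
            for r in range(q)]
    full = [[sum(half[r][k] * bread_inv[k][c] for k in range(q)) for c in range(q)]
            for r in range(q)]
    return [full_row[:p] for full_row in full[:p]]

# Toy data (hypothetical): exact outcomes, and a copy with deterministic +/- 0.1 errors.
t0, h = 0.5, 0.1
a_true, b_true = [1.0, -0.5], [0.3, 0.2]
times = [0.42, 0.46, 0.50, 0.54, 0.58]
exact_subjects, noisy_subjects = [], []
for x in ([1.0, -1.0], [1.0, 0.5], [1.0, 2.0]):
    ys = [sum(xk * (ak + bk * (t - t0)) for xk, ak, bk in zip(x, a_true, b_true))
          for t in times]
    exact_subjects.append((x, times, ys))
    noisy_subjects.append((x, times, [y + 0.1 * (-1) ** j for j, y in enumerate(ys)]))

cov_exact = sandwich_cov(exact_subjects, a_true, b_true, t0, h)
cov_noisy = sandwich_cov(noisy_subjects, a_true, b_true, t0, h)
print(cov_noisy)
```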

---
layout: false
class: inverse, middle, center

# Generalized Time-Varying Coefficient Models
---
## Model

`\begin{equation}
g\lbrace m(t,\boldsymbol X)\rbrace=\boldsymbol X^\mathrm T\boldsymbol{\beta}(t), \qquad m(t,\boldsymbol X)=E(Y|\boldsymbol X,t)
\end{equation}`

- `$\boldsymbol X$`: vector of predictors

- `$Y$`: outcome

- `$t$`: time point

- `$g(\cdot)$`: canonical link-function

???

g is a canonical link function

---
## Local Linear Model

The model can be rewritten as
.size130[
$$
\boldsymbol \beta(t)\approx \boldsymbol a+\boldsymbol b (t-t_0)
$$
]

???

---
## Local Log-Likelihood Function

For `$n$` subjects, each subject containing `$n_i$` measurements, the local log-likelihood function is constructed as

.size90[
`\begin{equation}
\mathcal{L} (\boldsymbol a,\boldsymbol b)=\sum^n_{i=1}\sum^{n_i}_{j=1}\ell (g^{-1}[ \boldsymbol X_i^\mathrm T\lbrace \boldsymbol a+\boldsymbol b(t_{ij}-t_0)\rbrace],Y_{ij})K_h(t_{ij}-t_0)
\end{equation}`
]

- `$\ell(\cdot,\cdot)$`: log-likelihood function

--
- `$t_{ij}$`: time point

--
- `$Y_{ij}$`: outcome

--
- `$\boldsymbol X_{i}$`: time-invariant predictors

--
- `$K_h(\cdot)$`: kernel function with associated bandwidth `$h$`

---
## Estimator

The estimator that minimizes `$-\mathcal L(\boldsymbol a,\boldsymbol b)$` is found via a Newton-Raphson algorithm with the update

`\begin{equation}
\boldsymbol c^{(it+1)}=\boldsymbol c^{(it)}-\{\mathcal H^{(it)}\}^{-1}\mathcal G^{(it)} 
\end{equation}`

- `$\boldsymbol c = (\boldsymbol a^\mathrm T,\boldsymbol b^\mathrm T)^\mathrm T$`

- `$\boldsymbol c^{(it)}$`: current iteration of `$\boldsymbol c$`

- `$\mathcal H^{(it)}=-\mathcal L^{\prime\prime}(\boldsymbol a,\boldsymbol b)$`

- `$\mathcal G^{(it)}=-\mathcal L^{\prime}(\boldsymbol a,\boldsymbol b)$`

???

Initial estimates can be obtained from a GLMM
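
---

## Sketch: Newton-Raphson at One Grid Point

A minimal sketch of the update for a binary outcome. Several pieces are assumptions made for illustration: a logit link, outcomes simulated directly from that logistic model, independent observations, zeros as starting values (rather than GLMM estimates), and `$K_h(u)=K(u/h)/h$` with the Epanechnikov kernel.

```python
import math
import random

def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def expit(s):
    """Numerically stable inverse logit."""
    return 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))

def gradient_hessian(c, data, t0, h):
    """Return G = -L'(c) and H = -L''(c) for the local Bernoulli log-likelihood."""
    q = len(c)
    G = [0.0] * q
    H = [[0.0] * q for _ in range(q)]
    for x, t, y in data:
        w = kernel_weight(t, t0, h)
        if w == 0.0:
            continue
        row = list(x) + [xk * (t - t0) for xk in x]
        mu = expit(sum(rk * ck for rk, ck in zip(row, c)))
        for k in range(q):
            G[k] -= w * (y - mu) * row[k]
            for l in range(q):
                H[k][l] += w * mu * (1.0 - mu) * row[k] * row[l]
    return G, H

def newton_raphson(c0, data, t0, h, tol=1e-10, max_iter=50):
    """Iterate c <- c - H^{-1} G until the step is negligible."""
    c = list(c0)
    for _ in range(max_iter):
        G, H = gradient_hessian(c, data, t0, h)
        step = solve(H, G)
        c = [ck - sk for ck, sk in zip(c, step)]
        if max(abs(s) for s in step) < tol:
            break
    return c

# Hypothetical data: intercept plus one predictor, beta_0(t) = sin(t), beta_1(t) = sqrt(t).
random.seed(1)
data = []
for _ in range(200):
    x = (1.0, random.gauss(0.0, 1.0))
    for j in range(25):
        t = j / 24
        eta = x[0] * math.sin(t) + x[1] * math.sqrt(t)
        data.append((x, t, 1 if random.random() < expit(eta) else 0))

c_hat = newton_raphson([0.0] * 4, data, t0=0.5, h=0.2)
print(c_hat[:2])  # estimates of beta_0(0.5) and beta_1(0.5)
```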

---
## Asymptotic Theory

Based on the regularity conditions provided by Cai, Fan, and Li (2000) and Kürüm, Li, Shiffman, et al. (2016), the asymptotic distribution for `$\boldsymbol c_{ML}$` is given as

`\begin{equation}
\sqrt{nh}\left\{\boldsymbol H\left( \boldsymbol c_{ML}-\boldsymbol c\right)-\mathrm{bias}(\boldsymbol c)+o_P(h^2) \right\}\sim N(0,\boldsymbol \Sigma)
\end{equation}`

- `$\boldsymbol H=diag(1,h)\otimes\boldsymbol I_{p}$`

- `$\boldsymbol \Sigma$`: covariance of `$\boldsymbol c$`

---
## One-Step Estimator

To reduce computational burden, Cai, Fan, and Li (2000) propose the one-step estimator

`\begin{equation}
\boldsymbol c_{OS}=\boldsymbol c^{(0)}-\{\mathcal H^{(0)}\}^{-1}\mathcal G^{(0)} 
\end{equation}`

- `$\boldsymbol c^{(0)}$`: initial value of `$\boldsymbol c$`

- `$\mathcal H^{(0)}=-\mathcal L^{\prime\prime}(\boldsymbol a,\boldsymbol b)$`

- `$\mathcal G^{(0)}=-\mathcal L^{\prime}(\boldsymbol a,\boldsymbol b)$`

---

## One-Step Theorem

Cai, Fan, and Li (2000) show that if the initial estimator satisfies

`\begin{equation}
diag(1,h)\otimes \boldsymbol I_{p}\{ \boldsymbol c^{(0)}-\boldsymbol c\}=O_p\{h^2+(nh)^{-1/2}\},
\end{equation}`

then `$\boldsymbol c_{OS}$` has the same asymptotic distribution as `$\boldsymbol c_{ML}$`. In other words, one Newton-Raphson step is enough as long as the initial estimate is close to the truth.

---

## Using OS Estimator

- Find the estimates for the first grid point using a Newton-Raphson algorithm

- Use the estimates of the first grid point as the initial values for the next grid point's estimates

- Repeat until all grid points' estimates are found
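
---

## Sketch: One-Step Estimates over a Grid

A minimal sketch of the three steps above: full Newton-Raphson at the first grid point, then a single update at each later grid point, started from the previous estimate. The logit link, the simulated data, the grid, the bandwidth, and the zero starting values are all assumptions made for illustration.

```python
import math
import random

def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def expit(s):
    """Numerically stable inverse logit."""
    return 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))

def newton_step(c, data, t0, h):
    """One update c - H^{-1} G with G = -L'(c) and H = -L''(c)."""
    q = len(c)
    G = [0.0] * q
    H = [[0.0] * q for _ in range(q)]
    for x, t, y in data:
        w = kernel_weight(t, t0, h)
        if w == 0.0:
            continue
        row = list(x) + [xk * (t - t0) for xk in x]
        mu = expit(sum(rk * ck for rk, ck in zip(row, c)))
        for k in range(q):
            G[k] -= w * (y - mu) * row[k]
            for l in range(q):
                H[k][l] += w * mu * (1.0 - mu) * row[k] * row[l]
    return [ck - sk for ck, sk in zip(c, solve(H, G))]

def newton_raphson(c, data, t0, h, tol=1e-10, max_iter=50):
    """Repeat newton_step until the change is negligible."""
    for _ in range(max_iter):
        new_c = newton_step(c, data, t0, h)
        converged = max(abs(n - o) for n, o in zip(new_c, c)) < tol
        c = new_c
        if converged:
            break
    return c

def one_step_over_grid(data, grid, h, c_start):
    """Full Newton-Raphson at grid[0], then one warm-started step per grid point."""
    estimates = [newton_raphson(list(c_start), data, grid[0], h)]
    for t0 in grid[1:]:
        estimates.append(newton_step(estimates[-1], data, t0, h))
    return estimates

# Hypothetical data: intercept plus one predictor, beta_0(t) = sin(t), beta_1(t) = sqrt(t).
random.seed(1)
data = []
for _ in range(200):
    x = (1.0, random.gauss(0.0, 1.0))
    for j in range(25):
        t = j / 24
        eta = x[0] * math.sin(t) + x[1] * math.sqrt(t)
        data.append((x, t, 1 if random.random() < expit(eta) else 0))

h = 0.2
grid = [0.3 + 0.05 * k for k in range(9)]  # illustrative interior grid
estimates = one_step_over_grid(data, grid, h, [0.0] * 4)
```

Because neighbouring grid points share most of their kernel window, the previous estimate is usually a close starting value, which is the situation the one-step theorem covers.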

---
## Standard Error

`\begin{equation}
\widehat{cov}\{\hat{\boldsymbol a}(t_0)\}=
 (\boldsymbol I_{p},\boldsymbol 0_{p})
\hat{\boldsymbol \Gamma}_1^{-1}\hat{\boldsymbol \Gamma}_2\hat{\boldsymbol \Gamma}_1^{-1}
(\boldsymbol I_{p},\boldsymbol 0_{p})^\mathrm T,  
\end{equation}`
where

.size60[
`\begin{equation}
\hat{\boldsymbol \Gamma}_1=\sum_{i=1}^n \sum_{j=1}^{n_i} z_2\left[\boldsymbol X_i^\mathrm T\left\lbrace\hat{\boldsymbol a}(t_0)+\hat{\boldsymbol b}(t_0)(t_{ij}-t_0)\right\rbrace,Y_{ij}\right]\boldsymbol T_{ij} \otimes(\boldsymbol X_i^\mathrm T\boldsymbol X_i)K_h(t_{ij}-t_0),
\end{equation}`
]

.size60[
`\begin{equation}
\hat{\boldsymbol \Gamma}_2=\sum_{i=1}^n \sum_{j=1}^{n_i} z_1^2\left[\boldsymbol X_i^\mathrm T\left\lbrace\hat{\boldsymbol a}(t_0)+\hat{\boldsymbol b}(t_0)(t_{ij}-t_0)\right\rbrace,Y_{ij}\right]\boldsymbol T_{ij} \otimes(\boldsymbol X_i^\mathrm T\boldsymbol X_i)K_h(t_{ij}-t_0),
\end{equation}`
]

- `$z_j=\frac{\partial^j}{\partial s^j}\ell\{g^{-1}(s),y\}$`

- `$\boldsymbol T_{ij}=(1, t_{ij}-t_0)^\mathrm T(1, t_{ij}-t_0)$` for `$j=1,...,n_i$`.

---
layout: false
class: inverse, middle, center

# Bandwidth and Kernel Function

---

## Bandwidth

The choice of bandwidth has an effect on the bias-variance trade-off

- Smaller `$h$`, smaller bias, larger variance

- Larger `$h$`, larger bias, smaller variance

- Need to find a bandwidth that balances bias and variance

- The ideal bandwidth can be found via a cross-validation approach
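
---

## Sketch: Cross-Validation for `$h$`

A minimal sketch of one possible cross-validation scheme. Leave-one-subject-out with squared prediction error is an assumption here, as are the small simulated dataset, the error standard deviation, and the candidate bandwidths.

```python
import math
import random

def kernel_weight(t, t0, h):
    """K_h(t - t0) with the Epanechnikov kernel, assuming K_h(u) = K(u / h) / h."""
    z = (t - t0) / h
    return 0.75 * (1.0 - z * z) / h if abs(z) <= 1.0 else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_a(subjects, t0, h):
    """Local linear WLS estimate of a(t0)."""
    p = len(subjects[0][0])
    q = 2 * p
    xtkx = [[0.0] * q for _ in range(q)]
    xtky = [0.0] * q
    for x, times, ys in subjects:
        for t, y in zip(times, ys):
            w = kernel_weight(t, t0, h)
            if w == 0.0:
                continue
            row = list(x) + [xk * (t - t0) for xk in x]
            for k in range(q):
                xtky[k] += w * row[k] * y
                for l in range(q):
                    xtkx[k][l] += w * row[k] * row[l]
    return solve(xtkx, xtky)[:p]

def cv_score(subjects, h):
    """Leave-one-subject-out sum of squared prediction errors for bandwidth h."""
    score = 0.0
    for i, (x, times, ys) in enumerate(subjects):
        training = subjects[:i] + subjects[i + 1:]  # drop subject i
        for t, y in zip(times, ys):
            a_hat = fit_a(training, t, h)
            score += (y - sum(xk * ak for xk, ak in zip(x, a_hat))) ** 2
    return score

# Small hypothetical dataset: beta_0(t) = sqrt(t), beta_1(t) = -sin(t), error sd 0.5.
rng = random.Random(2020)
times = [j / 10 for j in range(11)]
subjects = []
for _ in range(15):
    x = (1.0, rng.gauss(-2.0, 1.0))
    ys = [x[0] * math.sqrt(t) - x[1] * math.sin(t) + rng.gauss(0.0, 0.5) for t in times]
    subjects.append((x, times, ys))

# Candidates start above the time spacing so every window holds two distinct time points.
candidates = [0.15, 0.25, 0.4, 0.8]
scores = {h: cv_score(subjects, h) for h in candidates}
best_h = min(scores, key=scores.get)
print(best_h)
```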

---

## Kernel Function

Use the Epanechnikov Kernel Function:

`$K(z) = \frac{3}{4}(1-z^2)_+$`
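
---

## Sketch: Kernel Weights

A minimal sketch of the Epanechnikov kernel and the weights it produces around a grid point. The scaling `$K_h(u)=K(u/h)/h$` is a common convention and is assumed here; the time points are illustrative.

```python
def epanechnikov(z):
    """Epanechnikov kernel K(z) = 0.75 * (1 - z^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0

def scaled_kernel(t, t0, h):
    """K_h(t - t0) = K((t - t0) / h) / h: weight of an observation at time t."""
    return epanechnikov((t - t0) / h) / h

# Weights around t0 = 0.5 with bandwidth h = 0.1
time_points = (0.38, 0.45, 0.50, 0.55, 0.62)
weights = [scaled_kernel(t, 0.5, 0.1) for t in time_points]
for t, w in zip(time_points, weights):
    print(f"t = {t:.2f}  weight = {w:.3f}")

# Midpoint Riemann sum over [-1, 1]: the kernel should integrate to about 1.
n = 2000
integral = sum(epanechnikov(-1.0 + (k + 0.5) * 2.0 / n) for k in range(n)) * 2.0 / n
```

Observations more than `$h$` away from `$t_0$` receive zero weight, so only nearby measurements enter the local fit.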

---
layout: false
class: inverse, middle, center

# Simulation Study
---
## Normal Simulation Parameters

- 250 Monte Carlo Datasets

- 250 participants

- 25 equally spaced time points from 0 to 1

- 1 predictor from `$N(-2,1)$`

- `$\beta_0(t)=\sqrt t$`

- `$\beta_1(t)=-\sin(t)$`

- Outcome was generated from a normal distribution
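
---

## Sketch: One Simulated Dataset

A minimal sketch of generating one dataset with the parameters above. The error standard deviation, the independence of errors within a subject, and the added intercept column are assumptions; the slide does not specify them.

```python
import math
import random

def simulate_normal_dataset(n_subjects=250, n_times=25, sigma=1.0, seed=None):
    """Return a list of (x, times, ys) tuples, one per participant."""
    rng = random.Random(seed)
    times = [j / (n_times - 1) for j in range(n_times)]  # equally spaced on [0, 1]
    subjects = []
    for _ in range(n_subjects):
        x1 = rng.gauss(-2.0, 1.0)  # predictor from N(-2, 1)
        ys = [
            math.sqrt(t) - math.sin(t) * x1 + rng.gauss(0.0, sigma)  # beta_0 + beta_1 * x1 + error
            for t in times
        ]
        subjects.append(((1.0, x1), times, ys))
    return subjects

dataset = simulate_normal_dataset(seed=2020)
print(len(dataset), len(dataset[0][1]))
```

Repeating the call with 250 different seeds would give the Monte Carlo datasets.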

---
## Normal Estimation VCM

- The WLS estimator was used to obtain VCM estimates at 100 grid points equally spaced from 0 to 1

- `$h=0.1$`

- Epanechnikov Kernel Function was used

---
## Normal Data Results
<img src="Presentation_GSS_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
---
## Normal Data Results
<img src="Presentation_GSS_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
---
## Binary Simulation Parameters

- 250 Monte Carlo Datasets

- 250 participants

- 25 equally spaced time points from 0 to 1

- 2 predictors from `$N\{(-1,1)^\mathrm T,diag(1.5^2,.5^2)\}$`

- `$\beta_0(t)=\sin (t)$`

- `$\beta_1(t)=\sqrt t$`

- `$\beta_2(t)=-\cos(t)$`

- Outcome was generated from a latent normal distribution

---
## Binary Estimation VCM

- Initial values obtained from GLMM

- The OS estimator was used to obtain VCM estimates at 100 grid points equally spaced from 0 to 1

- `$h=0.1$`

- Epanechnikov Kernel Function was used

---
## Binary Data Results

---
## Binary Data Results

---

## Binary Data Results

---

## Thank You!

Be Safe and Healthy!
---
## References
.size30[
Cai, Z., J. Fan, and R. Li (2000). "Efficient Estimation and Inferences
for Varying-Coefficient Models". In: _Journal of the American
Statistical Association_ 95.451, pp. 888-902. ISSN: 0162-1459. DOI:
[10.2307/2669472](https://doi.org/10.2307%2F2669472).

Fan, J. and W. Zhang (2008). "Statistical Methods with Varying
Coefficient Models". In: _Statistics and its interface_ 1.1, pp.
179-195. ISSN: 1938-7989.

Hastie, T. and R. Tibshirani (1993). "Varying-Coefficient Models".
In: _Journal of the Royal Statistical Society: Series B
(Methodological)_ 55.4, pp. 757-779. ISSN: 2517-6161. DOI:
[10.1111/j.2517-6161.1993.tb01939.x](https://doi.org/10.1111%2Fj.2517-6161.1993.tb01939.x).

Hoover, D. R., J. A. Rice, C. O. Wu, et al. (1998). "Nonparametric
Smoothing Estimates of Time-Varying Coefficient Models with
Longitudinal Data". In: _Biometrika_ 85.4, pp. 809-822. ISSN:
0006-3444. DOI:
[10.1093/biomet/85.4.809](https://doi.org/10.1093%2Fbiomet%2F85.4.809).

Kürüm, E., R. Li, S. Shiffman, et al. (2016). "Time-Varying Coefficient
Models for Joint Modeling Binary and Continuous Outcomes in
Longitudinal Data". In: _Statistica Sinica_ 26.3, pp. 979-1000. ISSN:
1017-0405.

Kürüm, E., R. Li, Y. Wang, et al. (2014). "Nonlinear
Varying-Coefficient Models with Applications to a Photosynthesis
Study". In: _Journal of Agricultural, Biological, and Environmental
Statistics_ 19.1, pp. 57-81. ISSN: 1537-2693. DOI:
[10.1007/s13253-013-0157-7](https://doi.org/10.1007%2Fs13253-013-0157-7).

Zhang, W. and S. Lee (2000). "Variable Bandwidth Selection in
Varying-Coefficient Models". In: _Journal of Multivariate Analysis_
74.1, pp. 116-134. ISSN: 0047-259X. DOI:
[10.1006/jmva.1999.1883](https://doi.org/10.1006%2Fjmva.1999.1883).

Aden-Buie, G. (2020). _xaringanthemer: Custom 'Xaringan' CSS Themes_.
https://pkg.garrickadenbuie.com/xaringanthemer,
https://github.com/gadenbuie/xaringanthemer.

Bates, D., M. Maechler, B. Bolker, et al. (2019). _lme4: Linear
Mixed-Effects Models using 'Eigen' and S4_. R package version 1.1-21.
URL:
[https://CRAN.R-project.org/package=lme4](https://CRAN.R-project.org/package=lme4).

Genz, A., F. Bretz, T. Miwa, et al. (2020). _mvtnorm: Multivariate
Normal and t Distributions_. R package version 1.1-0. URL:
[https://CRAN.R-project.org/package=mvtnorm](https://CRAN.R-project.org/package=mvtnorm).

McLean, M. W. (2019). _RefManageR: Straightforward 'BibTeX' and
'BibLaTeX' Bibliography Management_. R package version 1.2.12. URL:
[https://CRAN.R-project.org/package=RefManageR](https://CRAN.R-project.org/package=RefManageR).

R Core Team (2020). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing. Vienna, Austria.
URL: [https://www.R-project.org/](https://www.R-project.org/).

Wickham, H. (2019). _tidyverse: Easily Install and Load the
'Tidyverse'_. R package version 1.3.0. URL:
[https://CRAN.R-project.org/package=tidyverse](https://CRAN.R-project.org/package=tidyverse).

Xie, Y. (2020). _xaringan: Presentation Ninja_. R package version 0.15.
URL:
[https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]