---
title: "Get Started"
description: "A brief introduction to civ."
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This article is a brief introduction to ``civ``.


```r
library(civ)
library(AER)
set.seed(517938)
```

To illustrate ``civ`` with a simple example, consider the data generating process from the simulation in Wiemann (2023). The code snippet below draws a sample of size $n=800$.


```r
# Set seed
set.seed(51944)
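# Sample parameters
nobs <- 800 # sample size
C <- 0.858 # first stage coefficient
sgm_V <- sqrt(0.81) # first stage error parameter (enters the covariance matrix below)
tau_X <- c(-0.5, 0.5) + 1 # second stage effects for X = 1 and X = 2
# Sample controls and instrument
X <- sample(1:2, nobs, replace = TRUE)
# One indicator column per (category, X) cell
Z <- model.matrix(~ 0 + as.factor(sample(1:20, nobs, replace = TRUE)):as.factor(X))
# Collapse the indicators into a single categorical instrument (the column index)
Z <- Z %*% seq_len(ncol(Z))
# Create the low-dimensional latent instrument
Z0 <- Z %% 2 # underlying latent instrument (0 or 1)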
# Draw first and second stage errors
U_V <- matrix(rnorm(2 * nobs, 0, 1), nobs, 2) %*%
  chol(matrix(c(1, 0.6, 0.6, sgm_V), 2, 2))
# Draw treatment and outcome variables
D <- Z0 * C + U_V[, 2]
y <- D * tau_X[X] + U_V[, 1]
```
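The construction of ``Z`` above can be unpacked on a smaller sample. The sketch below is a minimal illustration in base R, not part of the package: the names ``W``, ``Wf``, ``Xf``, and ``Zmat`` are introduced here for exposition, and the factor levels are fixed explicitly so that the indicator matrix always has 40 columns. It shows that the matrix product simply returns the index of the active indicator column, and that the latent instrument is the parity of the underlying 20-valued draw.

```r
set.seed(1)
n <- 400
X <- sample(1:2, n, replace = TRUE)
W <- sample(1:20, n, replace = TRUE) # underlying 20-valued draw (hypothetical name)

# Fix the levels so that every (W, X) cell gets its own indicator column
Wf <- factor(W, levels = 1:20)
Xf <- factor(X, levels = 1:2)
Zmat <- model.matrix(~ 0 + Wf:Xf)

# Each row has exactly one active indicator, so the product with the
# column indices returns the index of that column
Z <- as.vector(Zmat %*% seq_len(ncol(Zmat)))

# The first factor varies fastest across columns, so Z = W + 20 * (X - 1)
all(Z == W + 20 * (X - 1))

# Because 20 is even, the latent instrument is the parity of W
Z0 <- Z %% 2
all(Z0 == W %% 2)

# Number of observations per value of the observed instrument
head(table(Z))
```

Read this way, the observed instrument has 40 categories, but only the odd or even status of each category matters for the first stage.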

In the generated sample, the observed instrument takes 40 values, with varying numbers of observations per value. Using only the observed instrument ``Z``, the goal is to estimate the in-sample average treatment effect:

```r
mean(tau_X[X])
```

```
## [1] 1.0325
```


The code snippet below estimates CIV where the first stage is restricted to ``K=2`` support points. The ``AER`` package is used to compute heteroskedasticity-robust standard errors.


```r
# Compute CIV with K=2 and conduct inference
civ_fit <- civ(y = y, D = D, Z = Z, X = as.factor(X), K = 2)
civ_res <- summary(civ_fit, vcov = vcovHC(civ_fit$iv_fit, type = "HC1"))
```

The CIV estimate and the corresponding standard error are shown below. The associated 95% confidence interval covers the in-sample average treatment effect: the absolute _t_-statistic for the difference between the estimate and ``mean(tau_X[X])`` is less than 1.96.

```r
c(Estimate = civ_res$coef[2, 1], "Std. Error" = civ_res$coef[2, 2],
  "t-val." = abs(civ_res$coef[2, 1]-mean(tau_X[X]))/civ_res$coef[2, 2])
```

```
##   Estimate Std. Error     t-val. 
##  1.0063143  0.1086868  0.2409285
```
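The link between the _t_-value and coverage can be checked directly. The sketch below copies the rounded numbers from the output above and assumes the usual normal approximation for the confidence interval. The variable names are illustrative only.

```r
# Values copied from the printed output above (rounded)
est <- 1.0063143
se <- 0.1086868
target <- 1.0325 # in-sample average treatment effect, mean(tau_X[X])

# Normal-approximation 95% confidence interval
crit <- qnorm(0.975)
ci <- est + c(-1, 1) * crit * se

# Coverage check on the interval itself
covers <- ci[1] <= target && target <= ci[2]

# Equivalent check: the absolute t-statistic is below the critical value
t_val <- abs(est - target) / se
c(lower = ci[1], upper = ci[2], t_val = t_val)
```

The interval runs from roughly 0.79 to 1.22, so it contains 1.0325, and the two checks agree by construction.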

CIV uses a K-Conditional-Means (KCMeans) estimator in a first step to estimate the optimal instrument. To understand the estimated mapping from values of the observed instrument to the support points of the latent instrument, it is useful to print the ``cluster_map`` component of the first-stage ``kcmeans_fit`` object (see also [``kcmeans``](https://thomaswiemann.com/kcmeans/) for details). The code snippet below prints the results for the first 10 values of the instrument. Here, ``x`` denotes the value of the observed instrument, while ``cluster_x`` denotes the estimated support point (cluster) to which that value is assigned.

```r
t(head(civ_fit$kcmeans_fit$cluster_map[, c(1, 4)], 10))
```

```
##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x           26   20   10   32   23   12    7   25   33    21
## cluster_x    1    1    1    1    2    1    2    2    2     2
```

# References
Wiemann T (2023). "Optimal Categorical Instruments." https://arxiv.org/abs/2311.17021