Standardization simplified

Simulate some data

Here we simulate a sample of 20 participants (10 males and 10 females) with observations on:

* sex (M or F)
* treatment group (trt or placebo), randomly assigned with 10 participants in each group
* height in cm, drawn from a normal distribution with mean 180 cm and standard deviation 10 cm, plus 5 cm for males
* weight, the outcome variable, equal to 0.2 * height, minus 2 for the treatment group, plus noise from a standard normal distribution, N(0, 1)
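Written as equations, the data-generating model coded below is as follows. The true effect of treatment on weight is -2 for every participant, which is the value the estimates below are trying to recover.

$$
\begin{aligned}
\text{height}_i &= 180 + 5\,\mathbb{1}[\text{sex}_i = \text{M}] + u_i, & u_i &\sim N(0, 10^2) \\
\text{weight}_i &= 0.2\,\text{height}_i - 2\,\mathbb{1}[\text{group}_i = \text{trt}] + \varepsilon_i, & \varepsilon_i &\sim N(0, 1)
\end{aligned}
$$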

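set.seed(1)
df <- data.frame(
  sex = c(rep("M", 10), rep("F", 10)),
  # randomly assign 10 participants to treatment and 10 to placebo
  group = sample(c(rep("trt", 10), rep("placebo", 10)))
)
# height: N(180, 10^2) cm, plus 5 cm for males
df$height <- rnorm(20, mean = 180, sd = 10) + ifelse(df$sex == "M", 5, 0)
# weight: 0.2 * height, minus 2 for treated participants, plus N(0, 1) noise
df$weight <- df$height * 0.2 - ifelse(df$group == "trt", 2, 0) + rnorm(20)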

Estimate the effect of treatment using regression

mod <- lm(weight ~ group + height + sex, data = df)
summary(mod)

## 
## Call:
## lm(formula = weight ~ group + height + sex, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2803 -0.6890  0.2467  0.5510  1.0991 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.00218    5.44217   0.368   0.7178    
## grouptrt    -2.65104    0.38109  -6.956 3.23e-06 ***
## height       0.18876    0.03046   6.197 1.28e-05 ***
## sexM         0.82686    0.37541   2.203   0.0426 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8304 on 16 degrees of freedom
## Multiple R-squared:  0.8349, Adjusted R-squared:  0.8039 
## F-statistic: 26.97 on 3 and 16 DF,  p-value: 1.703e-06

Estimate the effect of treatment using standardization

We need to create two new datasets: one in which everyone is treated and one in which everyone receives placebo. All other covariates (height and sex) stay as observed.

newdatatrt <- df
newdatatrt$group <- "trt" #everyone treated
newdataplacebo <- df
newdataplacebo$group <- "placebo" #everyone placebo
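Averaging the model's predictions over each of these datasets estimates the mean weight that would have been observed had all participants received that treatment, with everyone keeping their own observed height and sex. In symbols, with n = 20 participants and the fitted regression model written as \(\hat{E}\):

$$
\hat{\mu}_a = \frac{1}{n} \sum_{i=1}^{n} \hat{E}\left[\text{weight} \mid \text{group} = a,\ \text{height}_i,\ \text{sex}_i\right], \qquad a \in \{\text{trt}, \text{placebo}\}
$$

$$
\widehat{\text{effect}} = \hat{\mu}_{\text{trt}} - \hat{\mu}_{\text{placebo}}
$$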

What would the average weight be if everyone received treatment?

(trtmean <- mean(predict(mod, newdata = newdatatrt)))

## [1] 33.8674

What would the average weight be if everyone received placebo?

(placebomean <- mean(predict(mod, newdata = newdataplacebo)))

## [1] 36.51843

What is the causal effect of treatment compared to placebo in this sample?

trtmean - placebomean

## [1] -2.651035

Discussion

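How does this compare to the estimate from the regression table? It is exactly the same as the grouptrt coefficient (-2.65104). Because the model has no interactions with treatment, switching group from placebo to trt changes every participant's prediction by the same amount, the grouptrt coefficient, so the difference between the two averages equals that coefficient. But if there were any interactions with treatment, we would not find the standardized mean treatment effect anywhere in the table of regression coefficients.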
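As a quick check of this reasoning, the sketch below re-creates the simulated data and model from above (so it runs on its own) and confirms that every participant's predicted trt-minus-placebo difference equals the grouptrt coefficient. The object name indiv_diff is illustrative and not part of the original analysis.

```r
# Re-create the simulated data and model from above so this block runs on its own
set.seed(1)
df <- data.frame(
  sex = c(rep("M", 10), rep("F", 10)),
  group = sample(c(rep("trt", 10), rep("placebo", 10)))
)
df$height <- rnorm(20, mean = 180, sd = 10) + ifelse(df$sex == "M", 5, 0)
df$weight <- df$height * 0.2 - ifelse(df$group == "trt", 2, 0) + rnorm(20)
mod <- lm(weight ~ group + height + sex, data = df)

# Individual-level predicted differences, trt minus placebo
newdatatrt <- transform(df, group = "trt")
newdataplacebo <- transform(df, group = "placebo")
indiv_diff <- predict(mod, newdata = newdatatrt) - predict(mod, newdata = newdataplacebo)

# With no treatment interactions, every difference equals the grouptrt coefficient
all.equal(unname(indiv_diff), rep(unname(coef(mod)["grouptrt"]), nrow(df)))  # returns TRUE
```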

Example: if the treatment had a different effect on males than on females, the regression table would show the treatment effect only for the reference sex (F, the first level alphabetically). Adding the coefficient of the treatment-by-sex interaction term would give the effect of treatment for the non-reference sex (M). The standardized effect would be the average of these two sex-specific effects, weighted by the proportions of males and females in this sample. The average treatment effect could also be recalculated for any other composition of males and females.
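A minimal sketch of the interaction case, assuming a hypothetical model weight ~ group * sex + height (the simulated data contain no real interaction, so the interaction estimate here is just noise). It checks that the sample-standardized effect equals the sex-specific effects weighted by the sample's sex composition, then reweights the same predictions to an illustrative target population with 30% males. The names mod_int, effect_F, effect_M and the 30% figure are assumptions for illustration only.

```r
# Re-create the simulated data from above
set.seed(1)
df <- data.frame(
  sex = c(rep("M", 10), rep("F", 10)),
  group = sample(c(rep("trt", 10), rep("placebo", 10)))
)
df$height <- rnorm(20, mean = 180, sd = 10) + ifelse(df$sex == "M", 5, 0)
df$weight <- df$height * 0.2 - ifelse(df$group == "trt", 2, 0) + rnorm(20)

# Hypothetical model: let the treatment effect differ by sex
mod_int <- lm(weight ~ group * sex + height, data = df)

# Standardize to this sample: predict everyone under each treatment, then average
pred_trt <- predict(mod_int, newdata = transform(df, group = "trt"))
pred_placebo <- predict(mod_int, newdata = transform(df, group = "placebo"))
ate_sample <- mean(pred_trt - pred_placebo)

# Sex-specific treatment effects read from the coefficient table
b <- coef(mod_int)
effect_F <- b[["grouptrt"]]                         # reference sex (F)
effect_M <- b[["grouptrt"]] + b[["grouptrt:sexM"]]  # non-reference sex (M)

# Sample-standardized effect = sex-specific effects weighted by the sample composition
p_male <- mean(df$sex == "M")
all.equal(ate_sample, p_male * effect_M + (1 - p_male) * effect_F)  # returns TRUE

# Standardize to a different (hypothetical) composition: 30% males, 70% females
w <- ifelse(df$sex == "M", 0.3 / p_male, 0.7 / (1 - p_male))
ate_30pct_male <- weighted.mean(pred_trt - pred_placebo, w)
ate_30pct_male
```

The reweighting step keeps the individual predictions and only changes how much each sex contributes to the average, which is why standardization still gives a single interpretable effect when interactions are present.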

Why standardization then?

* If there were interactions with treatment, the treatment coefficient in the regression output would be the effect for the reference group only, not an average treatment effect for the sample.
* We can standardize to different samples or to the population, where the distribution of covariates differs from that in our sample, even in the presence of interactions.
* Standardization lends itself easily to bootstrap resampling for estimating confidence intervals.
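A minimal nonparametric bootstrap sketch, assuming a percentile interval and 1000 resamples (both arbitrary choices): participants are resampled with replacement, the model is refit, and the standardized effect is recomputed on each resample. The helper std_effect() is hypothetical; the boot package's boot() and boot.ci() functions provide the same procedure with more interval types.

```r
# Re-create the simulated data from above
set.seed(1)
df <- data.frame(
  sex = c(rep("M", 10), rep("F", 10)),
  group = sample(c(rep("trt", 10), rep("placebo", 10)))
)
df$height <- rnorm(20, mean = 180, sd = 10) + ifelse(df$sex == "M", 5, 0)
df$weight <- df$height * 0.2 - ifelse(df$group == "trt", 2, 0) + rnorm(20)

# Hypothetical helper: refit the model on dataset d and return the standardized effect
std_effect <- function(d) {
  fit <- lm(weight ~ group + height + sex, data = d)
  mean(predict(fit, newdata = transform(d, group = "trt")) -
       predict(fit, newdata = transform(d, group = "placebo")))
}

# Resample participants with replacement and recompute the effect each time.
# (A resample containing only one sex or one group would make lm() fail;
# with 20 participants this is extremely unlikely.)
set.seed(2)  # arbitrary seed for the bootstrap
n_boot <- 1000
boot_effects <- replicate(n_boot, std_effect(df[sample(nrow(df), replace = TRUE), ]))

# Percentile 95% confidence interval for the standardized treatment effect
ci <- quantile(boot_effects, c(0.025, 0.975))
ci
```

Refitting the model inside each resample is what makes the interval reflect uncertainty in the regression coefficients, not just in the averaging step.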

Levi Waldron

2023-05-15
