Standardization simplified
Levi Waldron
2023-05-15
standardization_simplified.Rmd
Simulate some data
Here we simulate a sample of 20 participants with observations on:
* sex (M or F)
* height in cm (from a random normal distribution with mean 180 cm and standard deviation 10 cm), plus 5 cm for males
* weight is the outcome variable, equal to 0.2 * height, minus 2 for the treatment group, plus some noise with standard deviation equal to 1 (N(0, 1))
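The simulation code itself is not shown here. A minimal sketch of one way to generate data matching the description follows; the seed, the balanced layout of sex and group, and the column names other than those seen in the regression formula are assumptions, so the numbers it produces will not match the output in this document.

```r
# Sketch only: the seed and the balanced sex/group layout are assumptions.
set.seed(1)
n <- 20
df <- data.frame(
  sex   = rep(c("F", "M"), times = n / 2),        # alternating F, M
  group = rep(c("placebo", "trt"), each = n / 2)  # first half placebo, second half trt
)
# Height: N(180, 10) in cm, plus 5 cm for males
df$height <- rnorm(n, mean = 180, sd = 10) + 5 * (df$sex == "M")
# Weight: 0.2 * height, minus 2 if treated, plus N(0, 1) noise
df$weight <- 0.2 * df$height - 2 * (df$group == "trt") + rnorm(n, mean = 0, sd = 1)
head(df)
```

Crossing sex and group deterministically guarantees that every sex-by-group cell is populated, which random assignment with only 20 participants would not.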
Estimate the effect of treatment using regression
##
## Call:
## lm(formula = weight ~ group + height + sex, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.2803 -0.6890  0.2467  0.5510  1.0991
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.00218    5.44217   0.368   0.7178
## grouptrt    -2.65104    0.38109  -6.956 3.23e-06 ***
## height       0.18876    0.03046   6.197 1.28e-05 ***
## sexM         0.82686    0.37541   2.203   0.0426 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8304 on 16 degrees of freedom
## Multiple R-squared: 0.8349, Adjusted R-squared: 0.8039
## F-statistic: 26.97 on 3 and 16 DF, p-value: 1.703e-06
Estimate the effect of treatment using standardization
We need to create two new datasets: one where everyone was treated, and one where everyone received placebo.
newdatatrt <- df
newdatatrt$group <- "trt" #everyone treated
newdataplacebo <- df
newdataplacebo$group <- "placebo" #everyone placebo
What would the average weight be if everyone received treatment?
## [1] 33.8674
What would the average weight be if everyone received placebo?
## [1] 36.51843
What is the causal effect of treatment compared to placebo in this sample?
trtmean - placebomean
## [1] -2.651035
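The code that computes `trtmean` and `placebomean` is not shown above. One plausible way to obtain them is to average the model's predictions over each counterfactual dataset. The sketch below assumes the fitted model is stored in an object called `fit` (a hypothetical name) and re-simulates data with an arbitrary seed, so its numbers will differ from those printed here.

```r
# Sketch only: seed, balanced layout, and the object name `fit` are assumptions.
set.seed(1)
n <- 20
df <- data.frame(
  sex   = rep(c("F", "M"), times = n / 2),
  group = rep(c("placebo", "trt"), each = n / 2)
)
df$height <- rnorm(n, mean = 180, sd = 10) + 5 * (df$sex == "M")
df$weight <- 0.2 * df$height - 2 * (df$group == "trt") + rnorm(n)

# Outcome model without interactions, as in the regression above
fit <- lm(weight ~ group + height + sex, data = df)

# Counterfactual datasets: same covariates, group set for everyone
newdatatrt <- df
newdatatrt$group <- "trt"          # everyone treated
newdataplacebo <- df
newdataplacebo$group <- "placebo"  # everyone placebo

# Standardized means: average predicted weight under each scenario
trtmean <- mean(predict(fit, newdata = newdatatrt))
placebomean <- mean(predict(fit, newdata = newdataplacebo))

trtmean - placebomean
coef(fit)["grouptrt"]  # same value when the model has no interactions
```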
Discussion
How does this compare to the estimate from the regression table? It’s the same: the standardized difference of -2.651035 matches the grouptrt coefficient of -2.65104. But if there were any interactions with treatment, we would not find the standardized mean treatment effect anywhere in the table of regression coefficients.
Example: if the treatment had a different effect on males than on females, the treatment coefficient in the regression table would give only the effect for the reference sex. You could add the coefficient of the treatment-by-sex interaction term to get the effect of treatment on the non-reference sex. The standardized effect would be the treatment effect averaged over the composition of males and females found in this sample. The average treatment effect could be recalculated for any other composition of males and females.
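To make the interaction example concrete, here is a hedged sketch using a hypothetical simulation in which treatment lowers weight by 1 in females and by 3 in males. The effect sizes, the seed, the helper `std_effect()`, and the 80% male target composition are all assumptions chosen for illustration.

```r
# Sketch only: effect sizes, seed, helper name, and target mix are assumptions.
set.seed(2)
n <- 20
sim <- data.frame(
  sex   = rep(c("F", "M"), times = n / 2),
  group = rep(c("placebo", "trt"), each = n / 2)
)
sim$height <- rnorm(n, mean = 180, sd = 10) + 5 * (sim$sex == "M")
effect <- ifelse(sim$sex == "M", -3, -1)  # sex-specific treatment effect
sim$weight <- 0.2 * sim$height + effect * (sim$group == "trt") + rnorm(n)

# Model with a treatment-by-sex interaction
fit_int <- lm(weight ~ group * sex + height, data = sim)
b <- coef(fit_int)
effect_F <- unname(b["grouptrt"])                       # reference sex (F)
effect_M <- unname(b["grouptrt"] + b["grouptrt:sexM"])  # non-reference sex (M)

# Standardized effect: mean prediction if all treated minus if all on placebo
std_effect <- function(fit, data) {
  trt <- data
  trt$group <- "trt"
  pla <- data
  pla$group <- "placebo"
  mean(predict(fit, newdata = trt)) - mean(predict(fit, newdata = pla))
}

# Averaged over this sample's composition (50% male by construction)
ate_sample <- std_effect(fit_int, sim)

# Averaged over a different composition: each male row repeated 4 times (80% male)
target <- sim[rep(seq_len(n), times = ifelse(sim$sex == "M", 4, 1)), ]
ate_target <- std_effect(fit_int, target)

c(effect_F = effect_F, effect_M = effect_M,
  ate_sample = ate_sample, ate_target = ate_target)
```

In this model the standardized effect is a weighted average of the two sex-specific effects, with weights given by the proportion of each sex in the dataset being standardized to.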
Why standardization then?
If there were interactions with treatment, the treatment coefficient in the regression output would apply only to the reference group; it would not be an average treatment effect for the sample.
We can standardize to a different sample or to the population, where the distribution of covariates differs from that in our sample, even in the presence of interactions.
Standardization lends itself to bootstrap resampling for estimating confidence intervals.
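As a rough illustration of the bootstrap point, the sketch below resamples participants with replacement, refits the outcome model, and recomputes the standardized difference each time, then takes percentile limits. The seed, the number of resamples `B`, the helper `std_diff()`, and the choice of a percentile interval are assumptions, not part of the original analysis.

```r
# Sketch only: seed, B, helper name, and percentile method are assumptions.
set.seed(3)
n <- 20
df <- data.frame(
  sex   = rep(c("F", "M"), times = n / 2),
  group = rep(c("placebo", "trt"), each = n / 2)
)
df$height <- rnorm(n, mean = 180, sd = 10) + 5 * (df$sex == "M")
df$weight <- 0.2 * df$height - 2 * (df$group == "trt") + rnorm(n)

# Fit the outcome model on `data` and return the standardized difference
std_diff <- function(data) {
  # Skip degenerate resamples that lack either group or either sex
  if (length(unique(data$group)) < 2 || length(unique(data$sex)) < 2) {
    return(NA_real_)
  }
  fit <- lm(weight ~ group + height + sex, data = data)
  trt <- data
  trt$group <- "trt"
  pla <- data
  pla$group <- "placebo"
  mean(predict(fit, newdata = trt)) - mean(predict(fit, newdata = pla))
}

estimate <- std_diff(df)

# Resample rows with replacement and recompute the estimate each time
B <- 1000
boot_est <- replicate(B, std_diff(df[sample(n, replace = TRUE), ]))

# Percentile 95% confidence interval
ci <- quantile(boot_est, probs = c(0.025, 0.975), na.rm = TRUE)
estimate
ci
```

With only 20 participants the interval will be wide and sensitive to the seed; the point is that the whole standardization procedure, not just one coefficient, is repeated inside each resample.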