13.10.2022 (Thursday)

Dr Tamara Broderick (Massachusetts Institute of Technology, MIT)
13 Oct at 14:00 - 15:00
KCL, Strand - Webinar

One hopes that data analyses will be used to make beneficial
decisions regarding people's health, finances, and well-being. But the
data fed to an analysis may systematically differ from the data where
these decisions are ultimately applied. For instance, suppose we
analyze data in one country and conclude that microcredit is effective
at alleviating poverty; based on this analysis, we decide to
distribute microcredit in other locations and in future years. We
might then ask: can we trust our conclusion to apply under new
conditions? If we found that a very small percentage of the original
data was instrumental in determining the original conclusion, we might
expect the conclusion to be unstable under new conditions. So we
propose a method to assess the sensitivity of data analyses to the
removal of a very small fraction of the data set. Analyzing all
possible data subsets of a certain size is computationally
prohibitive, so we provide an approximation. We call our resulting
method the Approximate Maximum Influence Perturbation. Our
approximation is automatically computable, theoretically supported,
and works for common estimators --- including (but not limited to)
OLS, IV, GMM, MLE, MAP, and variational Bayes. We show that any
non-robustness our metric finds is conclusive. Empirics demonstrate
that while some applications are robust, in others the sign of a
treatment effect can be changed by dropping less than 0.1% of the data
--- even in simple models and even when standard errors are small.
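To make the idea concrete, here is a minimal illustrative sketch (not the authors' actual implementation) of the drop-a-small-fraction sensitivity check for OLS. It uses the standard influence-function approximation: removing point i shifts the estimate by roughly -(X'X)^{-1} x_i r_i, where r_i is the residual. We rank points by their approximate effect on the slope, drop the most influential 0.1%, refit exactly, and compare. The function name `amip_sketch` and the synthetic heavy-tailed data are assumptions for illustration only.

```python
import numpy as np

def amip_sketch(X, y, frac=0.001):
    """Illustrative sketch of the drop-data sensitivity check for OLS.

    Removing point i changes beta by approximately -(X'X)^{-1} x_i r_i
    (the leave-one-out formula without the 1/(1 - h_i) factor). We rank
    points by this approximate influence on the slope, drop the top
    `frac` fraction, and refit exactly on the remaining data.
    """
    n = len(y)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    # Per-point linear influence on each coefficient (n x p matrix).
    infl = (np.linalg.inv(X.T @ X) @ X.T).T * resid[:, None]
    # Dropping point i changes the slope by ~ -infl[i, 1]; to push the
    # slope down, drop the points with the largest positive influence.
    k = max(1, int(np.ceil(frac * n)))
    keep = np.argsort(infl[:, 1])[: n - k]
    beta_drop = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    return beta[1], beta_drop[1], k

# Synthetic example: a small positive effect with heavy-tailed noise,
# where a handful of points can carry much of the estimate.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 0.05 * x + rng.standard_t(df=2, size=n)
X = np.column_stack([np.ones(n), x])
slope, slope_drop, k = amip_sketch(X, y, frac=0.001)
print(f"slope = {slope:.3f}; after dropping {k} points = {slope_drop:.3f}")
```

Whether the slope's sign actually flips depends on the data; the point of the talk's metric is that this check is cheap, since the linear approximation avoids refitting on all possible subsets.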

Posted by maria.kalli@kcl.ac.uk