Parametric analyses such as t tests and ANOVAs are the norm, if not the default, among statistical tests in quantitative applied linguistics research (Gass 2009). Applied statisticians and one applied linguist (Larson-Hall 2010, 2012; Larson-Hall and Herrington 2010), however, have argued that this approach may not be appropriate for small samples and/or nonnormally distributed data (e.g. Wilcox 2003), both common in second language (L2) research. They recommend instead ‘robust statistics’ such as bootstrapping, a nonparametric procedure that repeatedly resamples at random, with replacement, from an observed data set to produce a simulated but more stable and statistically accurate outcome. The present study tests the usefulness of bootstrapping by reanalyzing raw data from 26 applied linguistics studies. We found no evidence of Type II error (false negative). However, 4 out of 16 statistically significant results were not replicated (i.e. a Type I error ‘misfit’ of 25%, five times higher than an alpha of .05). We discuss empirically justified suggestions for the use of bootstrapping in the context of broader methodological issues and reforms in applied linguistics (see Plonsky 2013, 2014).
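To make the resampling idea concrete, the following is a minimal sketch of one common variant, a percentile bootstrap confidence interval for a difference in group means. It is an illustration only and not necessarily the procedure used in the reanalysis: the function name `bootstrap_mean_diff_ci`, the number of resamples, and the two sets of scores are all hypothetical. The last lines reproduce the arithmetic behind the reported Type I 'misfit'.

```python
import random
import statistics


def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Hypothetical helper for illustration; each group is resampled
    independently, with replacement, at its original size.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled differences.
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return diffs[lower_index], diffs[upper_index]


# Invented scores for two small groups (not data from any reanalyzed study).
treatment = [14, 17, 12, 19, 15, 21, 13, 18]
control = [11, 13, 10, 15, 12, 14, 9, 16]

observed = statistics.fmean(treatment) - statistics.fmean(control)
low, high = bootstrap_mean_diff_ci(treatment, control)

# Under this approach, an interval that excludes zero is read as a
# statistically significant difference at the chosen alpha.
excludes_zero = low > 0 or high < 0

# Arithmetic behind the reported misfit: 4 of 16 results not replicated.
non_replication_rate = 4 / 16                # 0.25
misfit_ratio = non_replication_rate / 0.05   # roughly five times alpha
```

In a reanalysis along these lines, a parametric result would count as not replicated when the original test was significant but the bootstrapped interval includes zero.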