Just because there are potential adjustments to the models does not mean that:
- Statisticians can use them; and
- Policy-makers will use them.
For instance, Rothstein’s paper depends on having data from Grades 3-5, which is impossible when you are setting up your system for the first time and is never available for children in earlier grades (“though unavoidable data limitations would prevent its widespread adoption. Most importantly, this VAM is not available for the assessment of teachers in the first three grades in which students are tested.”). He ends with “Although some assumptions about the assignment process permit nearly unbiased estimation, other plausible assumptions yield large biases.”
This leads to the second point, which is that policy-makers, even the most lovely of them, have to use a model that works across the legislated or mandated scope and is interpretable by non-technical folks. Their incentive to have well-adjusted models may be much weaker than their incentive to have a model now that covers all of the teachers, so that they can implement the requested changes to the system, e.g. establish a penalty system for poor performance or re-allocate resources. This is often where most of the problems occur. Any global bias (and Rothstein completely agrees with Genovese that this is a large problem, see his self-referenced 2008 paper) is going to be a large and persistent source of errors that PhD-level economists will be forced to shake their heads about for generations to come, since fixing it requires a lot of work.