Our course evaluations are completed near the end of the course but before the final. My response rates still hover around 70% most of the time, but we went to online evaluations a while back, and response rates for some courses are routinely below 50%. In a small class, that's a low enough number of responses that it only takes one disgruntled student to tank the averages, and far too many schools summarize course evaluation data as a mean and standard deviation, even though the data meet none of the assumptions required for those statistics to be valid.
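It's easy to see how fragile the mean is in a small section. Here's a minimal sketch in Python; the ratings are entirely made up for illustration, not real evaluation data:

```python
import statistics

# Hypothetical 1-5 Likert ratings from a small class (illustrative only).
ratings = [5, 5, 4, 5, 4, 5, 4]       # seven generally satisfied students
with_outlier = ratings + [1]          # one disgruntled student also responds

print(statistics.mean(ratings))       # 4.571428571428571
print(statistics.mean(with_outlier))  # 4.125

# The median barely moves, because it doesn't treat ordinal
# categories as if they were interval-scale numbers.
print(statistics.median(ratings))       # 5
print(statistics.median(with_outlier))  # 4.5
```

A single response drags the mean down by almost half a point on a five-point scale, which can be the difference between "meets expectations" and "needs improvement" in a promotion file. Reporting the full distribution of responses, or at least the median, is far more defensible for ordinal data like this.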
My ratemyprofessor listing has 48 responses, and I've taught 2,600 students at this job, so my response rate there is under 2%.
I saw the original Twitter assault and resulting crescendo of supporting voices happen in real time two weeks ago, and it was marvelous. To their credit, ratemyprofessor responded and took down the chili pepper ratings within a day (albeit accompanied by a tepid and transparently absurd denial that physical attractiveness was what the chili peppers were intended to convey). In any case, no college or university pays any attention to the site, so its ability to influence anybody's professional reputation or career is minimal. But this speaks to the larger question of whether student course evaluations actually measure the thing university administrations use them to assess, and the evidence on that is unequivocal: institutional course evaluations are hot garbage:
Oh, more than one, and those studies date back 20 years.
Here’s a review article from last year:
Money quotes from the abstract:
“Adams (1997) mentioned many problems with the use of SETs that have since appeared in the literature such as validity, reliability, gender bias, and a number of other related issues (Beecham, 2009; Boring, Ottoboni, & Stark, 2016; Braga, Paccagnella, & Pellizzari, 2014; Hoefer, Yurkiewicz, & Byrne, 2012; Spooren, Brockx, & Mortelmans, 2013; Stark & Freishtat, 2014; Wright, 2006; Yunker & Yunker, 2003)”
“As is shown in this paper, the persistent practice of using student evaluations as summative measures to determine decisions for retention, promotion, and pay for faculty members is improper and depending on circumstances could be argued to be illegal.”
And that's going to be the lever that kills them. There is clear evidence that course evaluations don't measure what they are being used to measure, that their data are routinely interpreted and analyzed in statistically invalid ways, and that they are grossly and irredeemably biased and inequitable. An arbitration decision in Ontario just held that course evaluations in their current form cannot be used for tenure and promotion decisions:
“According to the evidence, which was largely uncontested, and which came in the form of expert testimony and peer reviewed publications, numerous factors, especially personal characteristics – and this is just a partial list – such as race, gender, accent, age and “attractiveness” skew SET results. It is almost impossible to adjust for bias and stereotypes. Student and faculty gender affects outcomes, as does grade expectation.”
“That evidence … was virtually uncontradicted. It establishes, with little ambiguity, that a key tool in assessing teaching effectiveness is flawed, while the use of averages is fundamentally and irreparably flawed. It bears repeating: the expert evidence called by the [Faculty] Association was not challenged in any legally or factually significant way.”
Student evaluations still have an important role to play in formative assessment and in professional development, and I learn a great deal from mine, but they simply can’t be used ethically to make promotion and tenure decisions.