Professor rating site finally drops creepy "hotness" rating


Our course evaluations are completed near the end of the course but before the final. My response rates are still around 70% most of the time, but we went to online evaluations a while back and response rates for some courses are routinely below 50%. In a small class, that’s a low enough number of responses that it only takes one disgruntled student to tank the averages, and far too many schools summarize course evaluation data as a mean+std dev, even though the data meets none of the criteria necessary for such a statistical analysis to be valid.
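To put rough numbers on the "one disgruntled student" problem, here is a minimal sketch. The ratings are made up for illustration (a hypothetical small class on a 5-point scale), and `summarize` is just a helper name I picked, not anything a school actually uses. It compares the mean with the median before and after a single angry response:

```python
from statistics import mean, median

def summarize(ratings):
    """Return (mean rounded to 2 places, median) for a list of 1-5 ratings."""
    return round(mean(ratings), 2), median(ratings)

# Hypothetical data: six responses from a small class with a low response rate.
ratings = [5, 5, 4, 5, 4, 5]

# The same class after one disgruntled student submits a 1.
with_outlier = ratings + [1]

print("without outlier:", summarize(ratings))
print("with outlier:   ", summarize(with_outlier))
```

With these invented numbers the mean drops by about half a point on a five-point scale from one response, while the median doesn't move. That is only one of the problems with treating ordinal ratings as if they were interval data, but it's the easiest one to see.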

My ratemyprofessor listing has 48 responses, and I’ve taught 2600 students at this job, so more like under 2% for me.

I saw the original Twitter assault and resulting crescendo of supporting voices happen in real time two weeks ago, and it was marvelous. To their credit, ratemyprofessor responded and took down the chili pepper ratings within a day (albeit accompanied by a tepid and transparently absurd denial that physical attractiveness is what the chili peppers were intended to convey). In any case, no college or university pays any attention to the site, so the site’s ability to influence anybody’s professional reputation or career is minimal. But this speaks to the larger question of whether student course evaluations actually measure the thing they are used by university administrations to assess, and the evidence on that is unequivocal: institutional course evaluations are hot garbage.

Here’s a review article from last year:

Money quotes from the abstract:
“Adams (1997) mentioned many problems with the use of SETs that have since appeared in the literature such as validity, reliability, gender bias, and a number of other related issues (Beecham, 2009; Boring, Ottoboni, & Stark, 2016; Braga, Paccagnella, & Pellizzari, 2014; Hoefer, Yurkiewicz, & Byrne, 2012; Spooren, Brockx, & Mortelmans, 2013; Stark & Freishtat, 2014; Wright, 2006; Yunker & Yunker, 2003)”

“As is shown in this paper, the persistent practice of using student evaluations as summative measures to determine decisions for retention, promotion, and pay for faculty members is improper and depending on circumstances could be argued to be illegal.”

And that’s going to be the lever that kills them. There is clear evidence that course evaluations don’t measure what they are being used to measure, that their data is inevitably interpreted and analyzed in ways that are statistically invalid, and that they are grossly and irredeemably biased and inequitable. Government arbitration in Ontario just decided that course evaluations in their current form cannot be used for tenure and promotion decisions:

“According to the evidence, which was largely uncontested, and which came in the form of expert testimony and peer reviewed publications, numerous factors, especially personal characteristics – and this is just a partial list – such as race, gender, accent, age and “attractiveness” skew SET results. It is almost impossible to adjust for bias and stereotypes. Student and faculty gender affects outcomes, as does grade expectation.”

“That evidence … was virtually uncontradicted. It establishes, with little ambiguity, that a key tool in assessing teaching effectiveness is flawed, while the use of averages is fundamentally and irreparably flawed. It bears repeating: the expert evidence called by the [Faculty] Association was not challenged in any legally or factually significant way.”

Student evaluations still have an important role to play in formative assessment and in professional development, and I learn a great deal from mine, but they simply can’t be used ethically to make promotion and tenure decisions.


“We” seem to be operating on the assumption that evaluating an instructor on the basis of sexual attractiveness is a destructive, awful feature to have on a ratings site. I don’t see how the information you’re hoping for about raters would change that assumption.


Why do you believe that information would be relevant? There’s no assumption being made. The assessment of the physical attractiveness of any teacher, instructor, or professor, by anyone, is obnoxious and insulting and demeaning, and has nothing to do with their ability to promote student learning. Whatever it is that ratemyprofessor claims to be assessing, and whatever service it claims to be providing, the deliberate sexualization of instructors is irrelevant and inappropriate. The sexual orientation of the rater and the gender of the ratee don’t change that.


One of mine says “… the epitome of excellent, middle-aged teaching”, which I love so much I’ve used it in various bylines.


You’ve been asked, twice, why that information impacts your opinion. It’s been explained to you, twice, why you shouldn’t need that information to assess the inappropriateness of the chili pepper rating at ratemyprofessor or the correctness of their decision to remove it.

Why do you think that the gender or sexual orientation of those employing the chili pepper rating makes any difference? How does it help you form an opinion? What assumptions do you feel are being made?


Congratulations on getting a TT, though! May I ask what field he’s in?


I can say that every time I wrap up a semester and go look at my evals, I have a pretty hard time, because I actually DO care that they got something out of the class at the end of the semester. Maybe departments should regularly send a higher-ranked prof out to rate newer profs, but then again, you have the problem of inter-office politics. What if someone just doesn’t like you and has an ax to grind?

So much of what happens in a college classroom is hard to measure. Everyone has different styles and what looks like a failure to one prof looks like success to another. So unless a department has an enforced set of syllabi that everyone has to use, it’s hard to judge another professor (except in some few cases, I think).


Last semester someone just wrote “I LOVE YOU” in their eval comment! :joy:


I was traveling with a group of students on a long bus ride. After a couple hours, one of the students decided to relieve the tedium by reading my ratemyprofessor evaluations out loud to the rest of the students.

I was so tempted to jump out of the window…

Never got a chili pepper, but I’m okay with that.


Yup and yup and yup.

I’m on our Senate Academic Policies committee, and we’re going to start working this fall to burn to the ground our current course evaluations and the manner in which they are used. Most reasonable people, given the data at hand, concede that they aren’t viable in their current form. So the question becomes: what do we replace them with and how do we use them better?

The most important ideas I’m seeing so far include:

  • You can’t average the numerical data and you cannot compare that data across courses of different size, type, function, or discipline
  • For tenure/promotion decisions, student evaluation data reduced to a list of numbers is useless
  • For tenure/promotion decisions, peer evaluation including classroom observation is absolutely required
  • For tenure/promotion decisions, teaching portfolios, including instructor reflection and sample course materials, are absolutely required
  • Student feedback and comments have value for formative evaluation of some aspects of teaching, but not usually for pedagogy

You try to avoid ax grinding the same way you do with peer review of publications or with external referees of tenure files: the candidate gets to make recommendations, and has, if not veto power, a voice in the conversation regarding who gets to do the assessments. (Hopefully the department head doesn’t let the sexist asshats among the faculty play any role in important decisions. Or, you know, talk. To people.) Our peer teaching evaluation process involves meetings between instructor and observer both before and after the observed class, to discuss learning outcomes and pedagogy and other factors, and offers the instructor an opportunity to write a reflection/rebuttal post-observation, which goes in the file.

But yes, the most important stuff that happens in a classroom is hard to measure, because it’s squishy and involves human beings. Universities have been seduced by five-point Likert scales because they produce numbers, so it looks “real”, even though they don’t measure what they’re supposed to. To fix this will require a cultural shift with regard to the role and value of teaching at universities, and a lot of education of university administrators.


My boyfriend is a gay latino and he also got a chili pepper AND a tenure-track position. So it has less to do with “privilege (sic) straight white guy” and more to do with how he is viewed by his peers and students.


Except that privileged straight white guys tend to have an automatic level of respect that women in the classroom often have to fight to receive, no matter how smart or talented we are. That undermines you from day one.

Men (especially those who are straight and white) have much less of a struggle in that regard, even ones who are gay or people of color (yes, I’ve worked with profs who are gay and those who are people of color). Not only have I seen that myself (I have worked as a TA for both men and women at all levels while an MA/PhD candidate, and have my own experience teaching for the past few years), but plenty of studies have confirmed it. Men just get more respect before even opening their mouths.


Said every person in denial of the existence of White privilege, ever.


Anecdotal evidence of “exceptions to the rule” doesn’t negate reality.


And some people who smoke cigarettes live past the age of 100. That doesn’t mean smoking isn’t likely to shorten your lifespan.


And how he is viewed by his peers and students is male. Study after study after study shows, over and over, that what numerical student course evaluations measure, more than anything else, and certainly more than teaching effectiveness, is gender.

Does your boyfriend have a tenure track position in a discipline that understands the difference between anecdotes and data? Did he ever explain it to you?

Yes, obviously, gay male faculty members exist. So what? At my school, queer female faculty act as Associate Deans, and chair important Senate committees. My campus’ engineering school is directed by a woman from Iran who has won awards for both her teaching and research. The two highest ranking administrators on my campus and the pro tem Dean of my faculty and the highest-paid faculty member in my department are all women. Yes, obviously, some university faculty can succeed without the benefit of being straight and white and male.

That’s not the point. The point is that every one of those people gets teaching evaluations that almost certainly would be higher if they were men, because course evaluations, like much of our society, are systemically biased against women. The point is that those evaluations have been used to assess those women’s professional ability, without correcting for that bias. They are almost certainly even better than the numbers show them to be.

Which is part of the reason that (the one data point of your boyfriend and the success of my female colleagues notwithstanding) every single demographic that is not “straight white male” remains hopelessly underrepresented among university faculty, as they are in every position of power in our society.






