This guy keeps saying stuff that is blatantly, factually wrong and it’s bugging me.
Primarily the stuff about the robot preferring the “quick, easy” solution. If you’re already adding secondary reward modifiers so the robot doesn’t just seek “button or tea” but also weighs secondary concerns like speed and ease, then you already have the answer to the conundrum: add one more modifier whereby any recognized attempt by the robot to press the button drops its tea reward to zero.
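To make that concrete, here’s a minimal sketch of the kind of composite reward being described. Every name and weight here is hypothetical (`tea_delivered`, `robot_attempted_button`, the penalty constants, etc.); it’s just meant to show how a button-attempt predicate could gate the tea term to zero alongside the “quick, easy” modifiers.

```python
# Hypothetical sketch only -- none of these names or weights come from the
# original discussion; they illustrate the proposed reward structure.

def composite_reward(tea_delivered: bool,
                     button_pressed: bool,
                     robot_attempted_button: bool,
                     time_taken: float,
                     effort: float) -> float:
    """Base goals plus 'quick, easy' modifiers, with the tea reward
    gated to zero on any recognized button attempt by the robot."""
    TEA_REWARD = 1.0
    BUTTON_REWARD = 1.0      # the original setup rewards button and tea equally
    TIME_PENALTY = 0.01      # secondary concern: prefer quick solutions
    EFFORT_PENALTY = 0.01    # secondary concern: prefer easy solutions

    tea_term = TEA_REWARD if tea_delivered else 0.0
    # The proposed extra modifier: any attempt by the robot to press the
    # button zeroes the tea term, so going for the button itself can never
    # outscore actually making the tea.
    if robot_attempted_button:
        tea_term = 0.0

    button_term = BUTTON_REWARD if button_pressed else 0.0
    penalties = TIME_PENALTY * time_taken + EFFORT_PENALTY * effort
    return tea_term + button_term - penalties
```

The point of the sketch: if the framework already supports shaping terms like the time and effort penalties, the button-attempt gate is just one more term of the same kind, not a new capability.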