MOHS was a mistake

I remember reading a Paul Graham essay about how people can’t think clearly about things that have become part of their identity. I have never seen this more clearly in my students than when they argue about the difficulty of problems.

Some years ago I published a chart of my ratings of problem difficulty, using a scale called MOHS. When I wrote this I had two goals in mind. One was that I thought the name “MOHS” for a Math Olympiad Hardness Scale was the best pun of all time, because there’s a geological scale of mineral hardness that happens to have the same name. The other was that I thought it would be useful for beginner students and coaches in finding problems suitable for practice.

I think it did accomplish those goals. The problem is that I also inadvertently catalyzed an endless stream of students arguing heatedly about whether so-and-so problem is 10 MOHS or 30 MOHS or whatever.

I think there is an inductive chain of failure modes here. To start, it’s hard to reason about the difficulty of a problem, because it threatens your identity as a strong problem-solver if you miss an “easy” problem. Going one step further, if you claim a problem is easier than the consensus, people might attack you as insensitive, out of touch with reality, miscalibrated, elitist, and so on. Since “out of touch with reality” is not something most people want as part of their identity, people also start saying things like “this problem is not as easy as you think” to send the relevant tribal signals that they’re not one of the head-in-the-clouds humblebraggers. Which then leads to the pendulum swinging all the way across to “you’re just saying that, students aren’t as dumb as you think”, and so on ad infinitum.

That’s how you get beautiful Internet phenomena such as flame wars about whether the IMO bronze cutoff is going to be 15 or 16 points.

That particular example illustrates another thing: although problem difficulty is obviously subjective, adjacent questions like “how many people will solve this problem on this exam?” are completely objective, yet they generate just as much controversy.

And more importantly, they generate wrong answers. I see so many examples of students who boldly assert “I bet 75% of students solved this problem”, only for the statistics to come out a week later and show the prediction was in completely the wrong ballpark. Sometimes there is emotion attached, but other times there isn’t: people will casually try to predict outcomes in passing conversation and still find themselves totally off the mark.

So there is something deeper going on. These students are normally pretty smart (because they’re math olympiad students) and also often under-confident (because they’re math olympiad students). So why would they suddenly be so confidently wrong in their own field of expertise?

Perhaps judging problem difficulty is just hard? After thinking about it, I’ve begun to suspect that it’s not actually as intrinsically hard as it’s made out to be; instead, most of the trouble[1] is actually just self-imposed by people’s egos being tied up in the answer.

This leads me to my latest piece of advice: if you are an intermediate-advanced student who doesn’t need help picking practice problems anymore, do not use the MOHS hardness scale. It’s fine to ask questions like “what is the hardest step of this problem?” or “what makes this problem difficult?”, because that kind of reflection does help you improve.[2] But going further and trying to place that difficulty on a scale from 0 to 50 in multiples of 5 seems to be largely a waste of time, because at that point there is too much emotional baggage attached that isn’t easy to disentangle.


  1. OK, there is one other factor: it’s time-consuming. It is true that it’s difficult to judge a problem unless you try it yourself, and olympiad problems take a lot of time. This is an issue, for example, at the IMO, where the Jury that votes on which problems to use on the exam gets only a few days to work through an entire shortlist of 30+ problems. I’ve felt for a while that this is simply not enough time, and it leads to a lot of haphazard decisions. 
  2. In my case, when students find a problem harder than I predicted, I’ve sometimes been able to use that to guide my teaching. For a concrete example, see the story at the end of this blog post, where lower-than-expected scores on TSTST 2016/4 gave me an idea for a new lesson plan. 

5 thoughts on “MOHS was a mistake”

  1. I’ve been using the MOHS hardness scale a bit for picking problems. Mostly it is a filter of the form “I’m looking for a hard geo problem, so look for something >= 35.” I use it similarly to how I would use shortlist numbers: not as an objective measure of difficulty, but as a rough ballpark of how hard the problem is. Would you say that this is a bad use of MOHS?

    Also, I know you think (and I would probably agree) that MOHS is net negative, all things considered. But do you think there are any positive uses of it for individuals?

    Also, what do you think about using shortlist numbers / olympiad position numbers to judge difficulty? Before MOHS, debates about something being “too easy for a G6”, for example, were also common.

    Would you endorse the statement “Most of the harm from MOHS comes from people talking about it too much / taking it too seriously”?

    Sorry for the firehose of questions; please don’t feel obligated to answer each one individually.

    Asking for a friend.


    1. > Would you say that this is a bad use of MOHS?

      nah that seems fine

      > But do you think there are any positive uses of it for individuals?

      yes, what you described sounds fine too

      > Would you endorse the statement “Most of the harm from MOHS comes from people talking about it too much / taking it too seriously”?

      that seems reasonable to me


  2. I definitely got benefit out of the MOHS hardness scale. I remember one time I had started to feel like I wasn’t making any progress, but a friend helped me out by suggesting I try a problem which he reckoned I could do, and which had a MOHS rating higher than what I realised I was capable of. I solved the problem, then kept trying problems of the same difficulty, and it made me realise I had made a lot of progress: I wouldn’t have been able to do problems of that difficulty 6 months to a year before that.

    I think difficulty ratings probably shouldn’t be a regularly used tool (their negative sides mostly come into effect when they are overused), but using them every so often to see if you’ve made progress is nice, though you have to be careful not to set your expectations too high when measuring progress. Checking in a few times a year is probably a good way to see your hard work paying off, assuming you have been putting a fair bit of work into olympiads between these checks; otherwise you obviously aren’t likely to see much progress. (And if you haven’t been making progress, then maybe having confirmation of that is useful, even if it’s not so pleasant to find out.)

    Another time I found it useful and reassuring was just after I competed at the IMO. Going in, I was expecting a silver if I was a bit lucky, a bronze if I was a bit unlucky, or a gold if I was very lucky. I ended up getting an honourable mention with a couple of extra marks, which made me wonder whether I’d made any progress in the 1-2 years leading up to the IMO, and sort of made me feel like I must’ve just been wasting my time. A short while after this, I got back into doing olympiad problems and used MOHS ratings to filter for fairly hard ones; being able to do these helped me confidently believe that my IMO result wasn’t a sign that my training had been pointless, because I knew for certain that I couldn’t have done problems of that difficulty a year before.

    Anyway, I reckon a lot of people get benefit from using it to pick suitable problems or to gauge their progress, but you won’t see as much about this on the internet because it can’t turn into a fun heated argument. I certainly agree with what you’ve said in the blog, but I’m not sure how much of this stuff is catalyzed by the MOHS scale specifically (I wonder whether, if MOHS didn’t exist, alternatives would spring up with the same negatives but without the positives).

