Ed.: Day 3 of the “blog every weekday for a month” challenge. Feeling strong so far.

Back in the summer I was teaching Calculus online – the first time I’d ever taught an online course, and the first time my department had ever offered one. I was using specifications grading for the course because I’m sold on specs grading, and because I think it’s an excellent fit for online courses. But not everyone shares my outlook.

At one point I got a call from the chair of our IRB, who had received a call from a parent. The parent was complaining about specs grading. Specifically, she said I was “experimenting” on her child, and she wanted to know whether I had gotten the proper IRB approval for experimenting on people.

I have a lot of thoughts about that situation, but one question stuck in my mind: Is specs grading really all that experimental? It’s certainly different from the norm, but why is “the norm” what it is? Is there demonstrable evidence that the standard A/B/C/D/F system using points is anything more than a convention we keep around because we can’t imagine anything else?

So I did some research, starting with the first chapter of Linda Nilson’s book on specs grading, which has a historical overview. Somehow I happened upon this short but mind-blowing paper [PDF], more a literature review than anything, about the history of grading in higher education. It sounds dry, but the facts in it are fascinating:

• The diary of Ezra Stiles, president of Yale University in the late eighteenth century, contains the first evidence of “grading” in American higher education, in the form of classifications of students: Optimi, Inferiores, and Pejores. But not A, B, C, D, or F.
• The method for assigning these “grades” is questionable – it might have been based more on social class than on academic performance. However, the early nineteenth century saw the first use of a marking system on a 4-point scale to determine the classifications. Thus, the “4.0” is actually older than the “A”.
• After this point, the way students were classified for “grades” was inconsistent, to put it mildly. At Yale, the system, such as it was, read:

No. 1. (Names listed) The first in their respective classes; No. 2. Orderly, correct, and attentive; No. 3. They have made very little improvement; No. 4. They have learnt little or nothing.

• Meanwhile at Harvard, a 20-point scale (!) was being used in rhetoric while a 100-point scale was used in mathematics and philosophy. Yale changed to a 9.0 scale in 1813, only to change back to 4.0 in 1832.
• Other universities had no numerical scales at all. William and Mary didn’t use one before 1850. Michigan held out until 1860 and used three letter grades: P (passing), C (conditioned), and A (absent).
• At Harvard, things got weirder and weirder. In 1877, they changed to a six-grade system on a 100-percent basis. In 1883, the first recorded use of a “B” to represent a grade showed up. Then in 1884, the faculty did away with the six-division system in favor of a five-level system without “minute percentages” to determine the rankings. Then in 1895, there were three levels of classification: “failed”, “passed”, and “passed with merit”. And at some point before 1896, Harvard dropped decimal points altogether and went with an integer-only system on which a 225 was required to pass.
• Elsewhere, at Mount Holyoke College in 1897, we have the first instance of the A/B/C/D/F system – except that “F” was “E” – with the letters tied to a 100-point scale.

Here’s what I learned from this paper.

1. The system of grading we have right now – A/B/C/D/F using points – is not sacred. Rather, it is the result of about 200 years of unscientific trial and error, with no particular evidence of systematic improvement along the way. Universities simply threw stuff at the wall to see what would stick, and some things hit better parts of the wall than others. There is no particular reason we ended up with the system we have, other than dumb luck and the influence of a few universities over all the others.
2. The A/B/C/D/F-with-points system is only a little over 100 years old. That makes it a relative newcomer to higher education, seeing as the first university was established in 1088. Not only is it neither sacred nor particularly scientific, it’s not even that much of a tradition, even in the US.
3. Specifications grading is not only not experimental; it is more grounded in the history and practice of higher ed than “traditional” grading. If you went looking for a grading system resembling what the great American universities used 200 years ago, you’d end up at the systems Yale and Harvard used in the 1800s. Those systems classified student work as Passing, Good, or Not Passing based on broad, non-numerical measures. Sound familiar?
4. Therefore, if we professors – or our entire profession! – want to change the way we grade, we have every right to do so. That goes double if there is some evidence that doing so will promote student learning and make faculty’s lives easier.