The longer I use specifications grading, and the more I see how differently students experience college courses that use mastery grading compared to courses that don't, the more I believe that the reform of our grading practices is an urgent ethical imperative. Like I said on Twitter last week:

I switched from traditional, points-based, no-revision grading to specifications grading a few years ago because I had a strong sense that traditional grading was not only uninformative (large numbers of false positives and false negatives, and no clear link between the grade and what the students can do) but also actively harmful to many students in many ways, one of the biggest being motivation. When I used traditional grading, students always seemed motivated not by the promise of learning the subject but by the inner game of scoring enough points in the right ways to get the grade they needed to move on --- or else they had no motivation at all.

This intuition that traditional grading is demotivating was just that: an intuition. But a study I came across recently offers evidence about the real effects of traditional grading on motivation.

Chamberlin, K., Yasué, M., & Chiang, I. C. A. (2018). The impact of grades on student motivation. Active Learning in Higher Education, 1469787418819728.

Link to paper: https://journals.sagepub.com/doi/pdf/10.1177/1469787418819728

The authors of this study investigate how "multi-interval" grades (read: the A/B/C/D/F system) affect the basic psychological needs and academic motivation of students when compared with "narrative evaluation", where the instructor gives students verbal feedback either instead of or in addition to multi-interval grades.

The theoretical basis of the study is self-determination theory (SDT). This framework is where we get the concepts of extrinsic and intrinsic motivation, where people are motivated to complete a task either by an external reward or for the sake of the task itself, respectively. (For more background, I wrote about SDT and flipped learning in this post.) According to SDT, there are three basic psychological needs that learners have while they are involved in a learning process: competence (the need to be good, or at least feel that they are good, at what they are learning), autonomy (having choice and agency), and connectedness or "relatedness" (being psychologically connected to others while doing the task). Essentially, the more these three needs are met in a learning process, the more intrinsic motivation the learner will experience; the lack of satisfaction of these needs leads to less intrinsic motivation, either in the form of extrinsic motivation or no motivation at all.

The authors studied 394 students at three different universities. One of those universities gave exclusively multi-interval grades in its classes. Another had institutionally eschewed multi-interval grades and used only narrative evaluations in its courses: instead of a grade, students get verbal feedback (that is honest, detailed, constructive, and actionable) on what they did and what they need to do. The third used a mix of narrative evaluation and letter grades. The students were given two surveys on academic motivation, and a subset of them took part in semi-structured interviews to dig deeper.

The results are a sobering indictment of traditional grading. Here are just a few that stood out.

Students were asked, among other things, about what information (if any) they got from their grades, whether their grades affected their decisions on what classes to take, and whether their relationship with grades had changed since high school. The prevailing opinion was that grades do not convey "competence-enhancing feedback" that can be used to improve; most students could not give any examples of how they used grades to improve their learning. Worse, the information that grades did give students tended to be negative signals about the students' self-worth. High-achieving students experienced pressure to achieve high grades; low-achieving students felt condemned by their low grades. All students associated the word "stress" with grades far more frequently than any other concept.

Moreover, traditional grades actively eroded students' sense of autonomy, because the grade they got and what they had learned often seemed unrelated. As one student said:

And it was actually pretty frustrating because it felt like even in classes where I was really into the content and worked really hard I came out with a B+. And in classes that I didn’t care about and didn’t work very hard I still got a B+.

Grades worked against relatedness as well, as expressed by some students who described how their relationships with their parents suffered when their grades were poor.

The authors also noted that when discussing traditional grades, students readily adopted capitalist-style business language, for example referring to "cost-benefit analysis" and "payoffs" in describing how they approach class. That's strategic learning and extrinsic motivation taking hold.

The results from students who experienced narrative evaluation were almost completely the opposite of the results from multi-interval grading. Every "narrative evaluation" student interviewed said that narrative evaluation gave them usable information about their competence and was more useful than multi-interval grades. The study found strong links between narrative evaluation and enhanced competence, autonomy, and connectedness, and many of the students commented on how narrative evaluation built trust between the student and the instructor --- even if the feedback was largely negative.

These results came not just from the interviews but also from the quantitative results of the surveys, with statistically significant differences in measures of academic motivation found between students from traditional grading backgrounds and those from narrative backgrounds (with narrative grading leading to higher indicators of motivation). Students from the university that used mixed grading methods experienced some of the benefits of narrative evaluation, but also some of the drawbacks of traditional grading --- and although the study didn't say this directly, it seems clear to me that the drawbacks happen because of the letter grades. (If you put a student in a "mixed" environment and give them good narrative evaluations followed by a "B+" grade, guess what the student will tend to focus on?)

So what do we do about this? For me, the course of action is clear: We need to walk away from traditional grading --- in which I include not only multi-interval letter grades but also grades based on statistical point accumulation. We've seen enough. Grades are harmful to students' well-being; they do not provide accurate information for employers, academic programs, or even students themselves; and they steer student motivations precisely where we in higher education do not want those motivations to go. There is no coherent argument you can make any more that traditional grading is the best approach, in terms of what's best for students, to evaluating student work. If we value our students, we'll start being creative and courageous in replacing traditional grading with something better.

Cue the objections about how this can't be done because of transfer credit issues, making non-traditional grading work at scale, etc. I agree partially, in the sense that this move is a long sequence of small steps. The article here is similarly pragmatic and gives some good advice:

Few universities are likely to abolish grades. However, universities should question the conventional use of multi-interval grades and consider their advantages and disadvantages in different departments, years of study, courses and learner types. For example, there may be specific courses or programs [...] in which cultivating deep learning and motivation may be more important than standardized communication of performance to external audiences. For such courses, greater use of narrative evaluations (as opposed to multi-interval grades) may be warranted. In addition, withholding grades from students or providing narrative/ written feedback several days prior to the grades may help students focus on mastery-related learning goals rather than extrinsic rewards.

I'd add the following ideas that I've learned from using specifications grading and hearing about how others use this and other forms of mastery grading:

  • It's possible to keep the A/B/C/D/F system for reporting semester grades, but use narrative evaluation and mastery grading instead of points and statistics to determine students' grades. Here's an example.[1]
  • Do what the article suggests and start changing your grading practices over to something less focused on letters and points, in those courses where narrative evaluation and mastery grading make the most sense: graduate courses, seminars, proof-oriented upper-level math courses, honors sections of courses, and so on.
  • I think you could also make a strong case that introductory courses are fertile ground for trying out narrative evaluation and/or mastery grading, because these are where student motivation tends to be at its lowest point.
  • Treat student work --- as much of it as possible --- like submissions to a journal. When we academics submit articles to a journal (or tenure portfolios, etc.) we don't get a point value or letter grade attached. We get verbal feedback with a brief summary ("Accept", "Reject", "Major revision", "Minor revision", etc.) followed by details. Then assign course grades, if you must have them, based on how much acceptable work the student was able to produce.
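
For the more computationally minded, here is a minimal sketch of what that last idea could look like as a procedure. Everything specific in it is an assumption made for illustration: the verdict labels, the rule that only "accept" counts as acceptable work, the thresholds, and the function name `course_grade` are all hypothetical, and any real course would set its own.

```python
# Hypothetical: which verdicts count as acceptable work.
# Here only an outright "accept" counts; a course could choose differently.
ACCEPTABLE = {"accept"}

# Hypothetical thresholds: minimum number of accepted items for each letter,
# listed from highest grade to lowest.
GRADE_THRESHOLDS = [("A", 9), ("B", 7), ("C", 5), ("D", 3)]


def course_grade(verdicts):
    """Return a letter grade based on how many items earned an acceptable verdict."""
    accepted = sum(1 for v in verdicts if v.strip().lower() in ACCEPTABLE)
    for letter, minimum in GRADE_THRESHOLDS:
        if accepted >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    # One student's final verdicts after all revisions are in.
    verdicts = ["Accept"] * 7 + ["Major revision", "Minor revision", "Reject"]
    print(course_grade(verdicts))
```

Note that no points are added up or averaged anywhere: the only number involved is a count of work that met the standard, and the narrative feedback attached to each verdict is where the real information lives.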

There are some practical issues at work here that can't be minimized, for example (and especially) large sections. The issue of scaling is a tough one, but it's not insurmountable. In my experience with specs grading, doing narrative evaluation takes no more time per student than traditional grading (which involves endless hair-splitting on how many points to give a response), so I don't think there's any reason to believe that nontraditional grading can't scale up.

Moving away from traditional grading could be one of those Pareto principle concepts where focusing intently on this one idea could usher in outsized improvements in many other areas of student learning. I think it could be a fulcrum for bringing about wholesale, even revolutionary change in higher education. Let's give it a try.


  1. Although: I have to admit that recently, I've noticed that students in my specs-graded classes tend to focus laser-like on their grading checklist where they keep track of how many Learning Targets they've passed, rather than on what those Targets represent. In other words the specs end up becoming a proxy for letter grades and students fixate on those accordingly. I'm still thinking about how to handle this.