Mastery Grading - Robert Talbert, Ph.D.

Taming the snowball

Robert Talbert — Fri, 10 Nov 2023 21:18:21 GMT

This post first appeared at Grading For Growth on November 6, 2023.

In alternative grading systems that follow the Four Pillars framework, students have clearly defined standards for what they need to learn and how to demonstrate that they’ve learned it; and reattempts without penalty so that they can take the helpful feedback they receive from us, process it, and iterate on it. It’s crucial to have this feedback loop at the heart of our grading and assessment, because learning takes time.

But what happens if we use an alternative grading setup and a student gets stuck on an early topic? It may be difficult or impossible to halt the flow of the course to wait for the student to catch up, and so the availability of reattempts turns into a snowball: The student is still trying to demonstrate skill on earlier topics while new ones are coming into the queue. So the student not only has to demonstrate skill on the old topics but also the new ones, and every time a new one enters the queue it makes it harder to demonstrate skill on any of them.

It seems like a death spiral for many of the students who might otherwise benefit from an extended time scale and reattempts without penalty. Is this an unavoidable bug in alternative grading? How can we as instructors help mitigate the snowball effect? In this post, I’d like to explore these and other related questions.

This is real life

First, let’s acknowledge that the “snowball effect” in alternative grading can happen and often does. I’m seeing it happen right now in my Discrete Structures course. That course has 15 Learning Targets that are listed on the syllabus in order of appearance in the course. And like a lot of math classes (and math is not unique in this sense), the course builds on itself: Students really need to understand how conditional statements work (Learning Target 3) in order to grasp core ideas of set theory (Learning Targets 8 and 9) which are then needed for combinatorics problems (Learning Targets 12 and 13).

We’re finishing up that module on combinatorics right now, and I am using terms like “subset” and “power set” freely in activities, with the assumption that students are solid on these concepts. But the fact is that many are not. A significant portion of both my sections have not yet demonstrated skill on the assessment for conditional statements; for the assessment on basic set operations the pass rate is less than 50%.

There’s a lot to unpack there. But the immediate issue is that I have only so much ability to hit the pause button on the semester to allow students to drill into the basic core concepts that go into combinatorics problems. I’ll do what I can (see below) but at some point, because there’s a schedule I have to follow in order to prepare students for the second semester of this course, we have to move on, and backfilling skill on earlier concepts becomes something that the student is at least partially responsible for in their practice time.

And that’s where the snowball starts. We just introduced two core Learning Targets on combinatorics requiring students to complete two separate successful assessments in order to pass the course. But those depend partially on mastery of 1-2 other Learning Targets that also need two attempts, which in turn depend on even earlier ones. If you fall behind on these, it could get ugly.

It’s ugly but not a bug

I think this experience, where you have to move on to concept N+1 before you have fully mastered concept N, is normal in our own experiences as learners. Sometimes we can’t fully grasp a concept or topic until we move on to something higher, requiring us to put that earlier concept on the back burner for a while.

Here’s a recent personal example. As a bass guitarist, I am currently trying to learn how to play walking bass lines like you hear in jazz and blues music. These are deceptively, and devilishly, hard to play well. I have a book on this that I am working through that’s broken into 55 exercises that are roughly in increasing order of difficulty. I was stuck on Exercise 16 for a week before I finally decided to just move on. Last night I tried Exercises 20 and 21, which build upon the techniques from Exercise 16, and nailed both of them, somehow. I don’t know how this happened, but studying a later exercise when I hadn’t mastered the “prerequisite” exercises just worked. And afterwards I went back to Exercise 16 and nailed that one too, even though I hadn’t tried it in a week.

The lesson here is that while the “snowball effect” might be inevitable in situations where you get more than one opportunity to demonstrate learning, and while it might not be ideal for you while it happens, it’s not a critical flaw in alternative grading – it’s just a normal part of the nonlinear nature of learning things.

Traditional methods almost by definition do not have this snowball effect (unless you count studying for a final exam). But this is because they don’t have feedback loops. By removing the loop, you remove the snowballing. That is a partial win for students, because they never have to deal with shoring up skill on old topics while new ones are emerging; but it’s also a big loss, for the same reason. On balance, students are better off having feedback loops with the possibility of a snowball than they are without feedback loops and no snowball.

How do we help?

So the question isn’t really How do we avoid the snowball effect in alternative grading? Because it might not be possible to avoid it entirely. Instead, how can we instructors help students through it?

First, communicating with students about this situation is important. When explaining your grading system, alert students to the possibility of the snowball. They should strive not to fall behind in the process of demonstrating skill on course concepts because this snowball on assessments can easily happen. But also let them know that, if the snowball happens, it’s normal and not a sign of a deficiency in their intellect. Connect it back to their everyday experiences: Can you think of a time where you had to continue learning a basic concept while also learning something built on that concept? Chances are they have a full store of those experiences because learning takes time and is inherently nonlinear. So, normalize nonlinearity.

Second, while you are communicating about it, provide concrete ways for students to optimize their time on the older topics. For example, make sure students are clear on the expectations for how they will demonstrate skill, and give practice (either in class or not, or both, your choice) that will help them prepare for those demonstrations. You might not even have to make the practice opportunities yourself; there are some great internet resources for this, like this website I use to generate practice for my Learning Target 10 (about determining whether a mapping is a function). You may need to explain how to practice. In my course we also have a Learning Target about doing arithmetic in binary; I explain to students that they can practice by making up two random 8-bit strings, then adding and subtracting them, then checking their work with this online calculator. For topics that aren’t so straightforward, you might have to get more personally involved with the practice (e.g. have students bring you samples of writing for inspection).
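For the binary arithmetic practice described above, a short script can stand in for both the problem generator and the answer checker. What follows is only a minimal sketch in Python, not the online calculator mentioned in the post; the function name `make_practice_problem` and the choice to subtract the smaller number from the larger (so the answer is never negative) are assumptions made for illustration.

```python
import random

def make_practice_problem(bits=8, rng=random):
    """Return two random bit strings plus their sum and difference in binary."""
    a = rng.randrange(2 ** bits)
    b = rng.randrange(2 ** bits)
    hi, lo = max(a, b), min(a, b)
    return {
        "a": format(a, f"0{bits}b"),
        "b": format(b, f"0{bits}b"),
        # The sum can need one extra bit (a carry out of the top position).
        "sum": format(a + b, "b"),
        # Assumption: subtract the smaller from the larger to avoid negatives.
        "difference": format(hi - lo, "b"),
    }

if __name__ == "__main__":
    problem = make_practice_problem()
    print("Add and subtract:", problem["a"], problem["b"])
    print("Answers:", problem["sum"], problem["difference"])
```

A student could hide the "Answers" line, work the problem by hand, and then compare.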

Third, try to loosely couple topics that build on each other. That is, try to make your system so that absolute mastery of topic N is not necessary for topic N+1. A non-example would be if I had my Discrete Structures students compute set operations (Learning Target 9) with sets that are in a complicated format like set-builder notation (which requires demonstration of skill of Learning Target 8). That coupling is too tight because now Learning Target 9 is really also an assessment of Learning Target 8. Instead, just give simple sets for Learning Target 9 so that only that topic is being assessed. (This is really just a corollary of the first pillar about having clear content standards — i.e. that when you assess a standard you assess that standard and not some inextricable mashup of that plus other standards.) If the topics are loosely coupled, it becomes easier to work on a new topic while still working out the details of the older ones.

Fourth, consider providing alternative forms of assessment that don’t occupy the same time/space coordinates as your main assessments. For example, my Discrete Structures students assess on Learning Targets through in-class checkpoints (example). As we near the end of the semester, and the snowball really picks up speed, I’ll start offering the limited ability to reassess on past Learning Targets through office hours visits or Zoom appointments. This way, while students may still have to work on old topics alongside new ones, at least the time pressure of assessing on those old topics can be relieved somewhat. Remember that the snowball is not just about studying old topics; the logistics of assessment and reassessment play a big part too, so make sure to count reassessment time as part of the overall normal course workload.

To excel in an alternative grading setup, students need to strike a balance between the past and the present. They should appreciate the value of revisiting old learning objectives while eagerly embracing new ones. We play a critical role in facilitating this balance by designing courses that guide students through a logical progression from old to new objectives and by providing the necessary support and feedback.

Ultimately, alternative grading encourages students to view learning as an ongoing journey rather than a destination. In this dynamic environment, they learn to appreciate the past, embrace the present, and prepare for the future, ensuring that their knowledge remains relevant and deep-rooted.

A growth-focused icebreaker

Robert Talbert — Fri, 13 Oct 2023 17:26:13 GMT

This is a repost from the Grading For Growth blog, where David Clark and I write about alternative grading methods every Monday. It resonated with a lot of folks, so here it is again – with some additional new thoughts at the end.

I hate icebreakers. I don’t use that word “hate” often or lightly. But here, it’s justified. The purpose of icebreakers is, supposedly, to get people involved in some activity to make them more relaxed when working together. But the idea of going through some contrived activity, with a group of people I don’t know, specifically to expose myself to them on a personal level, usually through some horrifyingly embarrassing action or fact I’m forced to share — does not in any way relax me. And most of the time, icebreakers accomplish nothing except a lot of eye rolling and spiked anxiety levels.

So I’m not a fan of icebreakers. Therefore, when I say that I’ve finally found an icebreaker activity that really works, and one that I can live with, it’s a big deal. I’d like to describe it to you today. I use it on the first day of every class, as well as in the talks and workshops I give. You can run this activity with students at any time of the semester and get good results. In fact, it benefits from being given repeatedly throughout the semester. And it reinforces all the ideas about learning and growth that we write about on this blog.

Here’s how it works.

Two questions

The activity is very simple: I ask students (or audiences, workshop participants, etc.) two questions:

What is something that you are good at doing?
How did you get good at the thing you are good at doing?

I like to give these one at a time as think-pair-share questions. I will display the first question on a slide (or just say it out loud), then give everyone a little time (30-90 seconds, this usually doesn’t take long) to think about their response. Then they are instructed to pair off and share responses with their partner. If they are in groups of four, I might have the pairs pair off and have a 4-way sharing session. Then we do a large-group share of the responses. (In a class, you might have each student in a pair introduce the other to the class, using the other person’s “thing they are good at doing” as an interesting fact.)

For the second question, I also do this as a think-pair-share, but people respond using a polling tool like PollEverywhere to make a word cloud of the responses. Having a visual representation of the responses with the most common ones being the largest is important for this question, for reasons you’ll see below.

This activity can take as little as 5 minutes, or you could build an entire hour-long class unpacking the results and telling stories about how people got good at various things. There are a lot of variations you might try. However it’s done, these two simple questions not only teach you some things about your students, they reveal a great deal to students about your class, particularly if you are using alternative grading.

What are you good at doing?

When I ask this question, the only instructions I give with it are that the “thing” you’re good at doing has to actually be something that takes nonzero effort to do — sleeping, watching TV, biological functions, etc. aren’t allowed. Only rarely do students respond with academic things, even if it’s in a high-level academic environment. They don’t say things like chemistry, studying, or taking tests. They say things like snowboarding, soccer, guitar, Super Smash Bros, and the like.

In fact, I just gave this activity to my students on the first day of class, using a word cloud tool, and here are the responses from one of the sections (the other had similarly varied responses):

On its own, this question does good work for students and for us: It reinforces the fact that every student is good at something and not only “something” but a thing that requires focused effort and pushing through discomfort. I find students need to be reminded sometimes that there are things in their lives that they care about so much that they put up with the process of getting good at them — maybe even came to enjoy that process.

As an instructor, this question reminds me that students are human beings with lives that have greater depth than I often realize. There’s so much hand-wringing about the so-called “disengagement crisis” in higher education, but insofar as that crisis is real, it’s of our own making. Students know perfectly well how to be engaged with something difficult. If they don’t engage with us or our classes, that can’t be completely the student’s fault.

How did you get good at it?

Here’s the word cloud from my class, for the second question:

Every time I give this activity — whether it’s with students or faculty or ordinary people — the top responses are always the same: Practice. Determination. Effort. Time. Failure. Consistency. (The students and I decided “consityis” is supposed to be “consistency”; some people were responding on smartphones.) This word cloud could be from any class, keynote, or workshop that I’ve facilitated in the last several years since I’ve been doing this activity.

When I debrief students on these responses, I point out that the main responses are always the same no matter the audience. I also point out that there are certain ideas that never show up in these responses: Lectures. Exams. Points. Grades. The traditional structures we set up in the name of learning are never the ones that real people link with real learning. Occasionally, someone will reply with Teacher; when I ask for clarification, “Teacher” always comes to mean something like “Coach”, as in I got good because I had a teacher who was really responsive and gave excellent feedback. It’s never, I got good because I had a teacher who gave great lectures.

What we learn from this exercise

Students know perfectly well how to be engaged with difficult things, and they are crystal-clear on the process by which mastery of such things happens. They are not only clear on this process, they can point to concrete examples in their lives where this process produced many of the most important facets of their lives. That process, of course, is the feedback loop.

The feedback loop is the central organizing principle of all alternative grading systems and the roof of the Four Pillars model. This is the case because, as students are very well aware according to Question 2, all significant human learning experiences happen through engagement with a feedback loop. And the biggest thing we learn through this exercise is that students have an intuitive grasp of this fact. It’s not something that requires extensive explanation or “buy-in”. It’s a fundamental part of students’ lives, and they know it.

These two questions very economically get to the heart of the matter in a college class: We’re trying to get good at something hard; how does that happen? Well, we can ask in reply, how has it ever happened? Answer: Through engagement with a feedback loop. Therefore to the extent that feedback loops are driving the work of the class, we can expect meaningful learning to happen. And this goes both ways: If there are no feedback loops to engage with, we can’t expect real learning to happen except maybe by accident.

If you are using alternative grading, the responses to these questions and the lessons we learn from them give you all the tools you need to address pushback, concern, doubt, or misunderstanding about the grading system. Just pause and ask: What are you good at doing, and how did you get that way?

Students, and all other humans, have a deep store of failure narratives that also have happy endings. It’s part of what makes every person uniquely interesting and invested with dignity. Once you, as an instructor, can tap into it and hold it up as a mirror to your students, I’ve found there’s very little convincing you must do to get students on board with your grading system, because your grading system is the most natural thing in the world now. It’s just real life, which doesn’t flinch from the hard work of growth.

And that’s why this is an icebreaker that I can live with.

Bonus thoughts

This activity works really well with faculty too, especially those who might have some resistance to the idea of alternative grading.
This is the first time I’ve seen “Covid” or “lockdown” show up in the responses, which I found very interesting. When I asked for context, what people meant was that during the Covid lockdowns, people had the time and availability to really focus on the activity they’re (now) good at doing, which allowed them to get good at it. There are a lot of lessons we can learn here about minimizing course structures, keeping things simple, and providing students with time and space to work and to breathe. I explored this a little in my post about the 12-week plan for course building.
Students know intuitively that all significant learning happens through a feedback loop and I think they generally know that growth-oriented grading is better for them than traditional methods. But getting students to realize that they know it, and therefore that feedback loops and not points plus one-and-done assessments are the most appropriate basis for a grading system, can require sustained effort, to put it mildly.

Grading for growth in an engineering math class: Part 1

Robert Talbert — Thu, 15 Jun 2023 13:57:42 GMT

This post first appeared at Grading For Growth earlier this month. There is a second part as well which I'll repost here later, or you can just click and read it now. I've added some new thoughts to the end of this post just for rtalbert.org readers!

With summer officially underway, I'm going to be writing for the next two weeks about the grading system I had in place for the semester that just ended, in my Linear Algebra and Differential Equations classes. David wrote a couple of posts on his experiences (part 1, part 2) and mine will be along the same lines.

The first installment today is about the "theory" of the class -- all the background information about the class and my philosophy of teaching it that led to the grading process that I used. Next week, I'll focus on the "practice" -- how it was received by students, what worked and did not work in real-life practice, and what I'll do again and do differently next time.

For reference, here are some links for the class:

The class GitHub repository (which contains everything). It’s all Creative Commons licensed, so help yourself to whatever you like; just give attribution to me (Robert) if you use something.
The syllabus
Standards for Student Work document

What was the class?

The class that I taught was MTH 302: Linear Algebra and Differential Equations. This is a four-credit course primarily intended for students in engineering that combines topics from these two gigantic areas of mathematics. I taught two sections, each two hours long, back-to-back on Tuesdays and Thursdays. Enrollment hovered around 30 students in each section, and all the students were second- or third-year engineering majors (with the exception of one lone Accounting major who was working on a math minor).

You'll find versions of this course at many universities that offer engineering degrees, and it's clear that many profs who teach it don't quite know what to make of it. Linear Algebra and Differential Equations, as separate subjects, are significant enough to merit two required courses each, but in MTH 302 we put them both into a single class. There are two ways to handle this kind of compression. One is to build the course around a small, reasonable core of content and learning outcomes that necessarily leaves out some interesting math, and then set up structures to help students learn those things deeply. The other way, which is seemingly far more common, is to leave nothing out, and just Cover All The Things at an enhanced speed and diminished depth, trusting that the "good students" will somehow keep up.

The Cover All The Things approach makes the course tend toward procedural rather than conceptual knowledge (because conceptual knowledge is slow cooking), so the course ends up as a hyper-accelerated flythrough of a cookbook, with a lot of the best recipes missing. The experience for students and instructors alike becomes impoverished, uninspiring, and aimless.

Those three adjectives also describe many of my students' past experiences with math courses, including MTH 302's prerequisites of Calculus 1, 2, and 3. Every student in MTH 302 has completed these; a good portion of my students "completed" them as though escaping through a fire. Don't get me wrong: MTH 302 students are high-achieving and highly capable. But many were simply Calculus survivors, with a survivor mentality about Calculus and other forms of math.

And not to get ahead of myself, but when you've had experiences akin to Covering All the Things in a traditionally-graded system, the survivor mentality goes into overdrive. If you have to learn how to compute things correctly by hand, assessed by one-and-done tests with no feedback loops, every day in class is about survival.

My approach to the class

I'd taught Linear Algebra before, and Differential Equations before, but never this weird mashup. I certainly wanted to avoid the negative experiences with the class I'd seen elsewhere. My first order of business, then, was to make the class less weird, by answering the question: What is this class about? I don't mean, What topics does it cover or What does the catalog say. Instead I mean the same thing as when we ask someone what a book or a movie is about.

As I looked at the individual "halves" of the course -- linear algebra, and differential equations -- and how those two subjects interact, I decided that MTH 302 is about modeling systems that undergo change, and seeing what we can learn about those systems from the models. This felt right: Both topics grow out of the need to model real-life systems like ecosystems and spring-mass systems, that change and evolve over time and whose behavior we want to predict.

When you decide what a class is about, you are also deciding what it is not about. In my case, for example:

While there are mathematical computations to learn in MTH 302, the class is not about mathematical computation: It's about using the results of computation to say something insightful about a model of a system.
While some problems in MTH 302 may have right answers, the class is not about right answers: It's about understanding and communicating what the answers tell you, and evaluating the assumptions used in the model that led to those answers.

I was beginning to get a glimpse of the learning objectives for this course, and what students might do to provide evidence of learning. Pretty soon the grading approach would be on the horizon.

By the way, if this process sounds familiar, it’s because I’m following the workflow that I wrote about last summer in the mini-series “Planning for Grading for Growth”, which will also be appearing in David’s and my forthcoming book as a workbook chapter. You can read that series here; the link goes to the final article in the series, which has links to all the preceding articles.

On a more immediately practical level, I had three factors to incorporate into the design of the class at this point. First, as I said, I'd never taught this particular class before, so I wasn't too interested in any radically different approach from what I had used in the past elsewhere. Second, and related, this was a service course for the Engineering school, and I didn't (and still don't) have a great sense of their tolerance level for "far out" pedagogical practices. Third, the students in the course, being engineering majors, were maxed out with a demanding set of courses in their discipline, and I was very hesitant to create a wild new course structure that demanded more cognitive load than necessary.

In particular, I decided at the outset that I would not use ungrading in the course. Since that term is so ambiguous, what I mean is that I would not do what David did in his geometry course or what I did in my Winter 2022 abstract algebra course, where student work got no grades but just feedback, and we collaboratively decided on course grades at the end. I don't think it would have been absolutely wrong to do this with MTH 302; but relative to my newness with this course and what I understood about the constraints on students, it didn't seem like the right call. However: Come back next week for more thoughts on this.

How did the class work?

I mentioned earlier that each class session was 2 hours long. Students were doing things before, during, and after class:

Before class: Students would complete a Class Prep assignment in which they did some initial reconnaissance work on the new upcoming topics, usually by watching some video or reading text and then completing some basic exercises, and asking questions along with those exercises. Class Preps were due the night before class, so I could scan them in the morning to look for trouble spots and frequently asked questions.
During class: We'd start each meeting with 15 or so minutes of Q&A time over the Class Prep, clearing up questions and the like. Then there would usually be a short demo from me to set up an activity. Then, students would work in groups at their tables on activities to provide practice with the new topics. Here is a folder at the GitHub repository with many of the activities. As you can see, some of the activities are starred, indicating that students will work on those in groups during class but write up their own solutions separately for later turn-in. The writeups of select group activity problems were called Application/Analysis assignments. We'd typically run the activities for 30 minutes or so, then spend 10 minutes debriefing and fielding questions, at which point we'd be at the one-hour mark. We'd take a 5-minute break, then come back and do the same thing for another hour, then end with questions and announcements and go home.
After class: Outside of class, students were prepping for the next class, finishing up their Application/Analysis writeups that they began in class, and completing Miniprojects which I'll say more about in a minute.

What you'd see if you walked into a typical class would be 6-8 groups of 3-4 students doing heads-down work on a problem that guides them toward understanding of some important learning objective of the course. Sometimes students went to the board to work; other times they worked quietly, but very often not quietly, with their friends. My role was to be everywhere all at once, going from group to group to answer questions, prod people along with questions, and check to make sure everyone was OK and progressing.

How did students provide evidence of learning?

I've always felt that there were three more-or-less independent axes along which success in a course should be determined: mastery of basic skills, mastery of applying those basic skills to new situations, and what you might call "engagement" or "being in the course". An "A" student is one who can demonstrate consistent excellence along all three axes. A "C" student is one who is "just good enough" on the first two axes and makes a reasonable effort on the third. That 3D axis model seemed to fit particularly well in MTH 302, where there were a lot of basic skills that are important to master, along with a large helping of applications.

Students gave evidence of their progress along these axes via five major forms of assessment in the class.

Class Preps: I mentioned these above. Here's a typical example. These are mainly in place so students will get the gist of the basics of new ideas, so that we can get a running start in class. They are graded on the basis of completeness and effort: Put in a good-faith attempt at a right answer for each non-optional item, and you receive a mark of "Success" which means full credit. Otherwise it's a mark of "Incomplete".
Skill quizzes: Although the class isn't about computation, it was still valuable to isolate a few Foundational Skills for students to learn how to do by hand. I ended up settling on eleven Foundational Skills which you can find in this appendix of the syllabus. The vast majority of times we needed to use one of these skills, we'd do it with a computer; but they are central enough that I wanted students to demonstrate they could do them in simple situations by hand with no significant mistakes. Every week on Thursday, we set aside the last 30 minutes of the class for quizzes over these skills. Here's a typical quiz showing some typical problems for some of the skills. Each skill appeared on three consecutive quizzes, then disappeared until one big "last chance" quiz at the end. Students had to complete a problem for a skill once, meeting all the "success criteria" shown on the quiz, at which point they had demonstrated sufficient skill.
Application/Analysis: Also mentioned above. These were turned in weekly and consisted of selected parts of in-class group work. So students worked on them in groups along with other problems, then wrote up individual solutions to the selected problems.
Practice problems: We didn't focus much on very basic computation during class. That was the subject of practice problems, which were weekly problem sets on our online homework system.
Miniprojects: Finally, the heart of the course was a set of eight Miniprojects covering various applications and extensions of linear algebra and differential equations concepts learned in class. Here's one; here's another. The focus of miniprojects is on applying basic concepts to new problems and on communicating results and processes in written form in a coherent and professional way. These were done outside of class, and had a flexible deadline structure.
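The Skill Quiz rule described in the list above (each skill shows up on three consecutive quizzes plus one last-chance quiz, and a single successful attempt is enough) can be written out as a small sketch. This is only an illustration in Python, not anything used in the course; the function names and the quiz numbers in the example are hypothetical.

```python
def quizzes_offering(first_quiz, last_chance_quiz, run_length=3):
    """Quiz numbers on which a skill appears: a run of consecutive quizzes plus the last-chance quiz."""
    regular = list(range(first_quiz, first_quiz + run_length))
    return regular + [last_chance_quiz]

def skill_demonstrated(marks):
    """A skill is done once any single attempt earns "Success"."""
    return any(mark == "Success" for mark in marks)

# Hypothetical numbers: a skill first quizzed on quiz 4 of a course
# whose last-chance quiz is quiz 14.
print(quizzes_offering(4, 14))                    # [4, 5, 6, 14]
print(skill_demonstrated(["Retry", "Success"]))   # True
print(skill_demonstrated(["Retry", "Retry"]))     # False
```

Seen this way, every skill comes with four built-in attempts, and earlier unsuccessful attempts carry no penalty once one attempt succeeds.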

Practice problem sets and Foundational Skills via the quizzes assessed along the “basic skills” axis; Miniprojects assessed along the “apply basic skills to new things” axis; and I considered Class Preps to be more “engagement” than anything else. Application/Analysis was a late addition to the syllabus, and I almost didn’t add it at all in order to keep things as simple as possible. But without those, it felt like something was missing, namely an assessment in the space in between the lower and upper thirds of Bloom’s Taxonomy. That middle third of Bloom is labelled “Application” and “Analysis”, hence the name of the assessment. I would have second thoughts on several occasions about including this; see next week’s article for more.

How individual work was graded

With the context, philosophy, and assessments all laid out, we can now talk about the grading system. All the criteria for grading different forms of work are spelled out in this document called Standards for Student Work in MTH 302. To summarize:

Class Preps were graded either "Success" or "Incomplete" based on completeness and effort. There were no reattempts allowed since Class Preps are time-sensitive and only viable assessments if done before class.
Practice problems were auto-graded by the online homework system, with each problem receiving 1 point for a correct answer and 0 for incorrect with partial credit sometimes available.
Problems on Skill Quizzes were graded either "Success" or "Retry" based on whether the student's work met the "success criteria" listed on the problem. Each skill appeared on three consecutive quizzes, so if you got a "Retry" you'd try again on a new version of that problem in the following week.
Application/Analysis submissions were graded "Success", "Retry", or "Incomplete" based on completeness, effort, and "overall correctness". (Some minor mistakes were allowed but anything serious required redoing.) These were turned in on our LMS; work marked "Retry" or "Incomplete" would get lots of written feedback on the submission, and students were allowed to submit a single revision.
Miniprojects were also graded "Success", "Retry", or "Incomplete" based on completeness, effort, correctness, and organization. Specific requirements for "Success" were given on the Miniproject forms. These usually involved a combination of math, English, and Python code and were written up in Jupyter notebooks, and the notebooks submitted in the LMS. They'd then get written feedback, and revisions could be done if needed.

On the last two, the difference between "Retry" and "Incomplete" is mainly terminology. Either mark could be removed through a reattempt or revision. An "Incomplete" indicates that there was something serious missing from the submission that made it impossible to grade the work: A missing problem, a tangle of significant semantic or math errors, code that won't compile, a Google Doc with the permissions incorrectly set, and so on. If a student's work was incomplete, I'd stop grading immediately, assign the mark, and tell them I'll look closely at their work once they submit something that's complete.

How course grades were assigned

I said we shouldn't try to label the form of grading I was doing here, but if you must add a label, this is pretty much specifications grading, such as I've used in most of my courses since 2017. It features 2- or 3-level grade rubrics on each item, with those items being graded holistically according to specifications that are clearly spelled out.

My course grade assignment method was specs grading-like as well, with course grades being determined by counting up accomplishments. The syllabus uses this table:

A student's course grade is the highest row for which all the requirements of the row are satisfied. That establishes the "base grade" of A, B, C, D, or F. There are some rules in the syllabus guiding the assignment of plus and minus grades as well that involve a final exam. The only effect the final exam had on the course grade was to potentially add plus or minus modifiers to the base grade.

I turned this syllabus table into a checklist for students to use to track their grades:

They just had to print it out or keep it on their tablets, and check boxes as they accomplished things.

What’s next

That's more than enough detail for now, although if you have specific questions you can probably find the answers in the syllabus; ask a question in the comments if not.

Next week, I'll continue this story by writing about what happened when this system made contact with students. Would they like it? Would they be confused by it? Would it make them question their existence and see beyond the universe itself? You'll need to tune in next time to find out.

Updated thoughts

There's a question running in the background of this article that I think is interesting: How far do you push innovation in a course that is a service course for another department? Here, MTH 302 serves our Engineering school. I have a good relationship with the students and faculty in this school, but as I mentioned, to this day I don't have a great read on how tolerant they are of teaching innovations. My sense is that most of the faculty are quite traditional, a sense that is supported by what my students describe from their other classes. So when I approached this class, I deliberately throttled back on the innovation – for example I still used a final exam and we still had homework – because I wasn't sure how some of my ideas would land. I'm not sure how you go about working more innovation into a course like this, besides building things up slowly over the course of several semesters and earning the trust of the other department.
There is an interesting comment thread at the original post about my choices for distinguishing between a "D" and an "F" in the course. I could go into this more, but it's pretty basic for me: I don't really think too hard about "D" grades. It's the "C" grade that matters to me: That grade should mean "minimum baseline competence that makes me feel comfortable the student can succeed at the next level" (whatever "next level" means). So if that's the case, what is a "D"? For me, it means that the student didn't totally abandon the class, and success at the next level is possible. So I won't try to prevent a student with a D from going on to the next course, but I would leave that choice up to the student and their advisor. (Although it's moot in many cases, for example our engineering school requires retaking the course if the grade is C- or less.)

A media guide to ungrading

Robert Talbert — Fri, 19 May 2023 12:01:36 GMT

This article originally appeared at Grading For Growth on May 1, 2023. My colleague and co-author David Clark had a number of important contributions. I've edited it slightly for reposting, and there are some updated thoughts at the end for you to check out.

Earlier this year, I wrote that 2022 seemed to be the year that ungrading really took hold in higher education, and that we can expect its prevalence and influence to increase through 2023. So far, that prediction seems to be right, especially if you gauge it by the number of media reports on ungrading beginning to percolate into our news feeds. For example:

Note especially that these aren’t in higher education-specific publications like the Chronicle or Inside Higher Ed but rather mainstream media or student publications. So it appears that ungrading is expanding from the realm of pedagogy nerds into the public consciousness.

Two high-profile articles seem to be driving the most recent news items: This whitepaper from the Hechinger Report, and an article from National Public Radio that summarizes and expands on it. Many local news articles about ungrading (like this one) are based on these two items. I (Talbert) was grateful to be interviewed for both articles. They have generated a lot of discussion and interest in alternative grading generally.

David and I see this as a good thing -- on balance. However, sometimes the discussions we find ourselves in can be unhelpful, because there are misconceptions at work. Some of the reports we've seen get things, sometimes really important things, wrong about alternative grading, which leads to misconceptions that go viral, which isn't helpful to anyone. So I’m writing this article as a “read this first” guide for journalists and anybody else wanting to learn about ungrading, and all forms of alternative grading, and tell its story.

What is ungrading?

There is no single widely accepted definition for “ungrading”. It is a term with several widely diverging meanings. So, if you are interviewing people about “ungrading”, it is critical to ask for a detailed explanation of what ungrading means to them. This is especially important if you’re writing an article involving multiple sources, or asking one source to critique the arguments of another: It’s quite likely that they aren’t talking about the same thing.

That said, in our book, here is how David and I define ungrading:

Ungrading rejects grades entirely (or at least to the extent that is possible). Ungraded classes eliminate grades from as many assignments as possible and focus on feedback. … Instructors often provide a list of criteria or a narrative description for final grades, and some instructors build this list collaboratively with students, leading to an increasingly common name: “collaborative grading.” Ungraded classes often involve two key features: First, instructors hold periodic meetings with students (or ask students to write reflective essays) in order to come to an agreement on a student’s current level of progress. Second, students construct a final portfolio of work that shows how they have grown and/or met key course objectives.

So, we define ungrading as a grading practice in which individual items of student work receive no marks (such as points, letter grades, etc.) if withholding a mark is possible. Instead, the student receives helpful feedback on their work relative to appropriately-scaled professional quality standards. Students also have the opportunity to reattempt and resubmit without penalty any items that they believe need further attention. At the end of the course, each student makes a case for the grade they believe they earned in the course, supported by a portfolio of work they have completed, in collaboration with the professor.

Our definition captures many of the more common features included in anything called “ungrading”, but it’s far from the only one. Again, make sure to ask anyone you interview what they mean by “ungrading”.

For David's and my own experiences with ungrading and some of the details of implementation, see this post or this post.

If you're a journalist, you can look to your own profession to understand ungrading. Let's say you have an article to submit. Your publication presumably has high standards for quality that are well-understood (because they are posted somewhere or because you've discussed them with your editor). When you are finished with your article, you submit it to your editor. The editor then reads it and likely marks it up and gives you helpful feedback on what works and what doesn't. Then you rework the article and resubmit it, get more feedback, etc. until the deadline arrives. At that point, you might collaborate with your editor to determine when and where the article is published.

What doesn't happen with your article is as important as what does. You do not get only one chance to submit it. It does not receive a point score that is averaged together with all the other point scores from your previous articles. Instead, the verbal feedback and the revise/resubmit process, along with the quality standards you can use for guidance, are enough.

What are some misconceptions about ungrading?

Ungrading is a pretty simple idea, but it's also easy to jump to incorrect conclusions about it. Here are some common misconceptions. You can find more in David's “mythbusters” post.

Misconception: Ungrading means getting rid of letter grades for courses. Ungrading, however the term is used, is about removing grades from student work within a course. But final letter grades are almost always still present. That final grade is determined in a different way than the traditional points-based approach, but students in the vast majority of ungraded courses still get an A, B, C, D, or F in the end. In most universities, every instructor is required to assign a letter grade to every student in the course, and David and I don't know of any university actively removing such a policy. Some (like Evergreen State College or New College of Florida) never used letter grades to begin with; and MIT has long experimented with giving Pass/Fail grades only in freshman-level courses. But this is a separate issue from ungrading.

Misconception: Ungrading means no feedback, or no concern for correctness. Go back up and read “What is ungrading?”: It’s entirely focused on feedback. Compared to a single number or letter grade, feedback is a much more effective way to communicate what’s good, correct, or well-done and what isn’t, and that’s what instructors focus on in ungrading. Single letter or number grades are a poor way to communicate such ideas.

Misconception: Ungrading lets students “pick their grade” in the course. In ungrading, students typically build a portfolio of work from throughout the course. Critically, the portfolio must make the case for a certain grade, that is, it needs to include concrete evidence of meeting the criteria for a grade. This is not “letting students pick their grade” as though they were given access to the registrar’s database.

Misconception: Students cannot fail an ungraded course. Students can, and sometimes do earn "D" or "F" grades in an ungraded course. This typically happens when the student does not produce evidence of learning that rises above a minimum threshold. This does happen, although not frequently in our experience because it requires significant disengagement with the class. Sometimes this is the student’s choice, but other times it isn’t. Ungrading tends to be more forgiving of late work, absences, and so on, as long as a student can eventually produce sufficient evidence of learning.

Misconception: Anything other than traditional points-based grading is ungrading. It's been increasingly common to use “ungrading” as an umbrella term to refer to any grading approach that does not adhere to traditional points-based grading. This is an oversimplification, and it causes a lot of confusion. It’s especially bad when, as often happens, different people interviewed in the same article use “ungrading” to mean significantly different things. If anything, it’s clearest to let “ungrading” refer only to the particular grading practice I described a little earlier in this article. There are several other forms of grading that are neither traditional nor ungrading: specifications grading and standards-based grading are two prominent ones. None of these is “better” or “worse” than another, and many instructors use a combination of approaches anyway. We prefer the term alternative grading to refer to grading practices that are something other than traditional, then speak in specific terms about specific strategies under that umbrella, including “ungrading”.

What are some common questions about ungrading?

Why would an instructor use ungrading? I will speak for myself here. I have only used ungrading once, and I did it because I wanted my students (in an upper-level abstract algebra course primarily taken by pre-service math teachers) to stop focusing so much on points and grades and focus instead on the (difficult!) math concepts in the class and, especially, on improving their communication skills. I wanted to hold high standards for both math and communication mastery, but in the past I'd found that traditional forms of grading -- and even some nontraditional forms, like specifications grading -- just didn't go far enough. The moment something got a mark on it, the mark became the laser-focus of students’ efforts. The only way to get the level of focus I wanted was to dispense with marks altogether. It was a matter of creating a learning environment where extrinsic distractions, like grades, were at a bare minimum.

Other instructors, I suspect, have a similar story. But additionally, others point to a desire to improve equity (David has a series of posts on this topic) or lessen the negative impact that grades have on student mental health.

Does ungrading turn students into entitled snowflakes? Remember that in ungrading, the focus is on detailed feedback and revision, and students' grades are not merely "picked" but are based on a portfolio of work that addresses clear requirements and is the result of multiple iterations of feedback and improvement. Every aspect of the grade can be traced back to student work that meets quality standards. So, no.

In fact I’d say that the students who are used to receiving grades as more or less an entitlement — who tend to come from well-resourced schools and affluent backgrounds — face the greatest amount of adaptation to alternative grading, which firmly grounds grades in concrete evidence of learning and nothing else. For many, it’s a culture shock. (Several of the instructors who we interviewed for our book, who are at “high power” institutions that were concerned about grade inflation under traditional grading, noted an overall decrease in final grades when they switched to alternative grading.) For many others, who know how to work hard but always felt disadvantaged in school, it’s a relief.

Does ungrading promote grade inflation? No, in fact, the opposite is true. "Grade inflation" doesn't simply mean "higher grades", it means higher grades without a corresponding improvement in quality or depth. There is no evidence that grade inflation, in that formulation, is any worse in ungraded courses than it is in non-ungraded courses. Historically speaking, grade inflation came into existence through traditional grading (particularly through the use of grades to secure draft deferments during the Vietnam War). Alternative grading's practice of connecting course grades directly to student work rather than to points and averages as an intermediary makes it less susceptible to grade inflation.

Does ungrading lower the academic rigor of a course? Beware using the term “rigor”, since we don’t think it has any semantic meaning. Insofar as it means anything, alternative forms of grading -- including but not limited to ungrading -- improve the "rigor" of a course by imbuing the course grade with more construct validity.

There are some professors on my local campus using ungrading and I’m writing an article about it, can I talk to you about this? The best person to talk to is the faculty member themselves. David and I may be experts on grading practices, but we are not experts on your students, faculty, or campus. The people using ungrading are likely experts on all three. Call one of us back once you have talked to a few of your own and can report to us what they said.

Updated thoughts

Journalists: The purpose of this article is to save you time by not having to ask the most common questions about ungrading if you want to interview me or anyone else about it. Seriously, please read it before doing that interview. You'll just be referred back to the article otherwise and it feels like working with a student who didn't read the assignment.
Especially seriously, for the love of God, please stop saying that "ungrading" means "getting rid of grades" (which is an oversimplification) and especially "getting rid of letter grades" (which is flat-out wrong).
I have been kicking around the idea of writing an article about why political conservatives should really like the idea of alternative grading. I've seen some hit pieces on conservative news sites (which I read) about alternative grading, and I think that's unfortunate. Alternative grading ticks a lot of boxes that matter to conservatives: A focus on hard work and effort, the ability to hold higher academic standards because of resubmissions, a direct link between a student's grade and evidence of learning rather than playing games with points, and more. I think there's a lot there that would be appealing to folks of that political persuasion.
David added a lot of language in this article to the effect of: Ungrading doesn't have one specific meaning and what it means depends on who is using it. I think that's wise to keep in mind. I've got my definition (above) and our book's definition (also above) is a little different, while others' vary widely from both of these. So again, before you try to talk to someone about ungrading, ask about what it means to them first. Especially, get them to describe their implementation in the classroom. If they don't have one... don't talk to them!

Building a specifications grading course, part 2

Robert Talbert — Thu, 09 Feb 2023 00:22:11 GMT

This is the second part of a two-parter where I am going into detail about how the grading system in my current course, Linear Algebra and Differential Equations, is set up. The first part, where I described the big picture of the course, its learning objectives, and what a "C" and "A" semester grade should mean, is here.

Today, we get into the weeds.

Assessments and marks

What will students do in the class, to demonstrate their progress toward the learning objectives I listed last time? It's complicated, because there are really three categories of learning outcomes that I'd like to see.

The first category is basic skills. These are tasks that you would typically find in the bottom one-third of Bloom's Taxonomy in the "Remember" and "Understand" regions:

For MTH 302, this includes tasks like being able to state important theorems, identify the order of a differential equation, and explain why a set of vectors is linearly dependent or linearly independent. I also include in this category any kind of basic mechanical computation that is to be done outside of an applied context, things like multiplying matrices, computing a determinant, or solving a separable differential equation.

Students show progress on basic skills through doing weekly Practice Sets on our online homework system WeBWorK. I assign 5-10 Practice Set problems per week, each auto-graded by the software, worth 1 point each purely on the basis of whether the answer is correct. At this rate, we should end up with around 100 Practice problems by the end of the course.

We also have eleven Foundational Skills, which I mentioned last time and which you can find here in the syllabus. These are the core mechanical competencies in the course. Students demonstrate their skill on these through weekly Skill Quizzes. Each Foundational Skill appears on three consecutive quizzes, so at any given time there are between 2 and 4 problems on each quiz, each problem focused on one skill, with one Skill being introduced for the first time and others showing up again for those who need to retake them. Here is the second quiz of the semester and here is the third one. You can see that new versions of a problem are very similar to older versions, with the details changed. The "success criteria" at the end of each problem explain what "acceptable work" looks like.

Quiz problems are graded not using points, but instead are marked either Success or Retry. Note that this is for quiz problems, not the entire quiz. A student taking the third quiz above could have Success on Skill LA.2 but not on LA.3 for example. A problem is marked Success if it meets the success criteria. Otherwise the student retries it on a later quiz, and I'll say more about that momentarily.

The general setup I'm currently describing uses specifications grading. The actual specifications -- the overall criteria for what is acceptable work -- are in this document called "Standards for Student Work in MTH 302". The Standards document is sort of the Bible for our class (and for my grading) and students can use it as a "pre-flight checklist" for self-assessment before they turn something in.

The second category of learning outcomes is applications, or whether you know how to take the basic skills and put them to use in various ways. These kinds of skills are in the middle and upper thirds of Bloom's Taxonomy, and we have two kinds of assessments for these as well.

Miniprojects are the main vehicle for applications. As the name suggests, these are small but extensive project-like problems that require coding (using the SymPy package in Python), problem-solving, and written communication. There will be eight of these overall, and here is the first one. Miniprojects really aim for the top third of Bloom, "Evaluate" and "Create". Students do these in Jupyter notebooks using Google Colab, and they are marked Success, Retry, or Incomplete using specifications that are found in the Standards document. Each miniproject has some additional specs that need to be met that are unique to that assignment.

I used to stop here with my specs-graded courses and just have Foundational Skills, Miniprojects, and maybe some practice homework that's done on the computer and auto-graded. Over time, I've realized I was missing something, namely the middle third of Bloom's Taxonomy, "Apply" and "Analyze". So this semester I am debuting a class of assignments I am calling Application/Analysis, to fill in the gap between absolute basics and high-level applications.

Application/Analysis sets are turned in weekly and consist of specially designated activities that are part of our daily active learning activities. Students are working on things in groups most of the time in class, and some of those are being tagged for turn-in later. Here's an example; the entire handout is done in groups but the items tagged (AA) are turned in. Students are free to discuss them at length in their groups but when it's time to turn them in, they must do individual writeups. These are graded Success, Retry, or Incomplete on the basis of completeness, effort, and whether most of the work is basically correct. Some errors are allowed but if there get to be too many of them, or they are particularly severe, I'll give feedback and mark the work Retry and the student can put in a revision.

The third and final category of learning outcome isn't really a learning outcome: Engagement. I don't really have a definition for this term (nobody does) but generally speaking I mean that students ought to be a living, breathing part of the course ecosystem. What it boils down to, for me, is whether students are preparing properly for class. (Work during class is more or less measured by Application/Analysis.) Especially since this is a flipped learning model I'm using, I need students to engage with the pre-class work that I call Class Prep. These assignments, due the night before a class session, involve doing reading and video-watching, leaving questions and replies on these using Perusall, and completing a diagnostic quiz. These are marked Success or Incomplete based on completeness and effort only (not correctness).

Feedback loops

With the exception of Class Prep, each of these five buckets of assessments has a feedback loop attached that allows for feedback and revision.

Practice Sets give immediate results about the correctness of an answer, and students can retry the problems as many times as they like until the due date.
Application/Analysis sets get written feedback from me, and if it's marked Retry then the student gets one revision, due in one week's time.
Foundational Skills, assessed on skill quizzes, appear on three consecutive quizzes. If a student's work is marked Retry then they take it again on a later quiz. If they try three times and never earn Success, there is a Mega-Quiz at the end of the semester: an entire class period set aside for a quiz with all eleven Skills on it, to give one "last chance" attempt to those who need it.
Miniprojects marked Retry can be revised as often as needed. There is a limit of two miniproject items submitted per week: two revisions, two new miniprojects, or one of each. These are posted whenever we have covered the necessary material to do them, and there's no deadline other than an initial due date by which first drafts must be submitted, and the last day of classes. The "two items per week" rule prevents students from dumping half a dozen miniprojects on me in the last week of classes. The initial due dates prevent students from submitting the first draft of an old miniproject late in the semester.
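
For readers who think in code, the miniproject submission rules in the last item can be sketched roughly as follows. This is only an illustration under some assumptions: weeks are treated as simple integers, and the names `Submission`, `is_accepted`, `first_draft_due_week`, and `last_week` are hypothetical and do not come from the syllabus.

```python
from dataclasses import dataclass

MAX_ITEMS_PER_WEEK = 2  # "two items per week" rule from the text


@dataclass(frozen=True)
class Submission:
    miniproject: int      # which miniproject this is (hypothetical numbering)
    week: int             # week of the semester in which it was submitted
    is_first_draft: bool  # True for a new miniproject, False for a revision


def is_accepted(new, accepted_so_far, first_draft_due_week, last_week):
    """Return True if `new` may be submitted under the rules described above."""
    # Nothing is accepted after the last day of classes (modeled as a week).
    if new.week > last_week:
        return False
    # First drafts must arrive by that miniproject's initial due date.
    if new.is_first_draft and new.week > first_draft_due_week[new.miniproject]:
        return False
    # At most two items (new miniprojects or revisions) per week.
    items_this_week = sum(1 for s in accepted_so_far if s.week == new.week)
    return items_this_week < MAX_ITEMS_PER_WEEK
```

Revisions have no deadline of their own in this sketch other than the end of classes, which matches the description above.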

Class Prep doesn't have revisions or feedback because revisions don't work with pre-class assignments. The fact they are graded only on the basis of completeness and effort means revisions don't really matter anyway.

But otherwise, the assessment in the class resembles a busy beehive, with lots of small items continuously darting in and out of my and my students' inboxes, some new items and others that are older but being shaped through feedback and revision. As I've used specs grading, I've come to greatly prefer the "numerous but small" approach to assessments over the standard "few but large" approach (three tests and a final, for instance) and I think students do too, once they learn how to manage their information (more on that below).

Course grades

You'll notice that none of the assessments other than Practice problems have point values. Course grades are instead determined by tallying up how many Successes you have over the course of the semester. In the syllabus it looks like this:

Grade	Total of Class Preps and Practice (100)	Application/Analysis (11)	Foundational Skills (11)	Miniprojects (8)
A	85	9	10	5
B	75	7	9	3
C	65	5	8	1
D	40	2	4	0

I count each Success-ful Class Prep as one point, and pool them together with Practice Sets, to form what is basically an amalgamated "engagement score". If you're behind on Class Preps, you can do some more Practice problems; and vice versa.
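
As a rough illustration of how this tallying works, here is a minimal Python sketch. The thresholds are copied from the table above; the function name `base_grade`, the dictionary keys, and the rule that anything below the D row is an F are my own illustrative assumptions, not wording from the syllabus.

```python
# Rows of the syllabus table, ordered from highest grade to lowest.
GRADE_ROWS = [
    ("A", {"engagement": 85, "application_analysis": 9,
           "foundational_skills": 10, "miniprojects": 5}),
    ("B", {"engagement": 75, "application_analysis": 7,
           "foundational_skills": 9, "miniprojects": 3}),
    ("C", {"engagement": 65, "application_analysis": 5,
           "foundational_skills": 8, "miniprojects": 1}),
    ("D", {"engagement": 40, "application_analysis": 2,
           "foundational_skills": 4, "miniprojects": 0}),
]


def base_grade(class_preps, practice_points, application_analysis,
               foundational_skills, miniprojects):
    """Return the highest row whose requirements are ALL met."""
    earned = {
        # Class Preps and Practice points are pooled into one total.
        "engagement": class_preps + practice_points,
        "application_analysis": application_analysis,
        "foundational_skills": foundational_skills,
        "miniprojects": miniprojects,
    }
    for grade, required in GRADE_ROWS:
        if all(earned[key] >= needed for key, needed in required.items()):
            return grade
    return "F"  # assumption: below the D row means F


# "A"-level work everywhere except a single successful Miniproject:
print(base_grade(20, 70, 10, 11, 1))  # C
```

Because every column in a row has to be satisfied, strong work in one category cannot offset a shortfall in another. Only Class Preps and Practice problems trade off against each other, since they share a column.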

Last time I described, in nontechnical terms, what a grade of "C" should look like: Basically minimum viable competency, the lowest possible level of achievement a student can have and be allowed to move on to the next course. I built the "C" row in this table with this description in mind, and I think what's there represents minimum viable competency in MTH 302. You could argue with the fact I only require one miniproject and you might be right.

For an A, I was shooting for excellence across the board and I think the requirements for an A capture this. Even so, I left a little room at the top so that I am not requiring anything like perfection from the A students. Everybody can simply punt on one Foundational Skill of their choice, for example. (This helps me avoid conundrums like when a student has maxed out all the achievements in the course but missed on one Foundational Skill.)

If it were up to me, I would only assign "whole" letter grades, but alas, I have to also have options for "plus" and "minus" grades. This is where the last kind of assessment comes in: the Final Exam. I've been moving away from giving final exams lately, but here I felt like they could serve a purpose, namely to be the primary method for determining plus/minus grades. Simply put: If you earn above an 85% on the comprehensive final exam, your basic grade from the table will have a "plus" added. If you earn below 50%, it gets a "minus". If it's in between 50% and 85%, the basic grade is unchanged. I like this approach: The final exam matters, but not too much, so students will study for it but hopefully not stress out about it.
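
The final exam rule is simple enough to write down as a short Python sketch. The function name `apply_final_exam` is hypothetical, and the sketch makes no attempt to handle grades an institution may not offer (such as an "F+"), nor any other plus/minus provisions in the syllabus.

```python
def apply_final_exam(basic_grade, exam_percent):
    """Add a plus or minus to the basic grade based on the final exam score."""
    if exam_percent > 85:   # strictly above 85%: add a plus
        return basic_grade + "+"
    if exam_percent < 50:   # strictly below 50%: add a minus
        return basic_grade + "-"
    return basic_grade      # 50% through 85% inclusive: unchanged


print(apply_final_exam("B", 90))  # B+
print(apply_final_exam("B", 70))  # B
print(apply_final_exam("B", 45))  # B-
```

Note that the exam can only nudge the grade within its letter; it never moves a student from one row of the table to another.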

There are also some provisions in the syllabus for earning a plus or minus on a grade if you partially complete requirements in the table rows. You can read those for yourself.

How it's going

During week 1, I took an hour of class time to train students on the grading system. They took a quiz on Blackboard (that didn't count toward their grade) where they were given hypothetical student accomplishments at the end of the semester, and they had to determine using the syllabus each student's grade. We also did a few of these in class and had discussions about the nuances.

For example, one of the situations involved a student who had "A" level work in everything but Miniprojects but "C" work in Miniprojects (i.e. they only turned in one or two of them with Success marks). That student's basic grade before plus/minus considerations is a "C", which counters students' intuition that stuff in a course should average together, with poor work in one area being balanced out by good work elsewhere. That's not how it works here: An "A" requires "A" level work in everything.
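One way to see the difference from averaging is to treat the basic grade as the lowest level reached in any category. The sketch below assumes that every row of the grade table works the way the "A" row does (all of its requirements must be met), and it takes the per-category levels as given rather than computing them from the actual table. The names `GRADE_ORDER` and `basic_grade` are hypothetical.

```python
# Letter levels from lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]


def basic_grade(levels_by_category: dict[str, str]) -> str:
    """Return the lowest level across categories (no averaging)."""
    return min(levels_by_category.values(), key=GRADE_ORDER.index)


# The quiz scenario: "A" level work everywhere except Miniprojects.
student = {
    "Foundational Skills": "A",
    "Application/Analysis": "A",
    "Engagement": "A",
    "Miniprojects": "C",
}
print(basic_grade(student))  # C
```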

During that first week, the initial reactions to the system were mostly very positive, with students commenting along the lines of "finally, a grading system that actually allows me some room to grow". Some students were cautious about it, with some questions about the details of operations. But none of the students voiced anything negative, and they had plenty of anonymous chances to do so.

I've been checking in with students every few days to ask them how things are going in the class and if there are any things we should stop, start, or continue doing especially regarding the grading system. And in all honesty, I don't think students are really thinking about the grading system that much. There are still some misconceptions and questions about the details. But most of the time it just never really comes up. It's just sort of running as a background process.

One explanation for this is that more and more of my students' other professors are using similar systems. A lot of us in the Math Department do; and there are some great case studies popping up among the Computer Science, Physics, and Chemistry departments at my university as well. So this is not something totally unheard-of before by my students.

If I were doing this over again

So far, so good as we enter week 6 of a 15-week semester. But if I could go back and make changes:

I'd allow alternative methods for showing proficiency on Foundational Skills -- not just quizzes but also oral quizzes in drop-in hours, or student-made videos. Some of my students freeze up when doing in-person quizzes and no amount of practice is ever going to change that.
I'm not sure I would include Application/Analysis as I currently do it, by tagging selected group work problems for turn in. It's OK, but what tends to happen is that students go straight to those tagged items and spend the entire group work time working on those to the exclusion of all the others. I have to tell students "Please don't work only the AA problems" and it never works.
These students are very high-performing, and although they are having few problems with the material they can sometimes be super grade-conscious. They don't have problems with my system; they just think about grades too much. So I wonder if I could or should have ungraded this course -- no graded work, just a portfolio for students to assemble through the semester that we discuss at the end for a collaborative grade. Maybe.

Thanks for reading all this! Again, you can follow the course's development through the stuff posted to the GitHub repository, and if you have questions for me, you can use the comment form on this website.

Building a specifications grading course, part 1

Robert Talbert — Wed, 01 Feb 2023 11:45:06 GMT

In this recent letter to the ungrading community, I challenged ungraders to share their processes through consistent, honest blogging. Being transparent about what's working, what isn't working, and how you're adapting will clarify ideas in your mind and demystify them for everyone else. I never ask somebody to do something I'm unwilling to do myself, so in the same spirit, today I wanted to share my grading setup and initial experiences in the class I'm teaching now, where I am using specifications grading.

This started as a single article about the specifications grading system in the course I am teaching now. Pretty quickly it morphed into two articles that cover not only the grading system but all the background work it took to get a system in place that makes sense. As I describe below, I think this is the correct order in which to build a grading system. It's impossible to understand how grades work in a course without having some sense of the course background and learning objectives, and this goes for alternatively-graded courses and traditionally-graded ones alike. Today, I'll describe my process for thinking about the context of the course and what role this plays, and how I determined the learning objectives for the course. Then next week, I'll get into the weeds about assessments and grades.

The course I am teaching is Linear Algebra and Differential Equations (MTH 302). It's a four-credit course primarily serving our School of Engineering, with nearly all of the 60 students across both sections being second- and third-year engineering majors. It hits the highlights of both subjects in the title, with an emphasis on the connections between them. I've taught linear algebra before, and differential equations before, but never this particular class with its hybrid point of view. I learned soon after I started building the course back in October that this course is different enough from past experiences that it requires fresh thinking.

You can find all the documentation for the course at its GitHub repository. In particular here is the syllabus, and I'll link to other documents in this post as needed.

The big picture

"Step 0" in the course build process, which I started back in October, is scoping the course: What's its purpose? Who takes it? How does it fit into the larger curriculum? I noted some of those situational factors above. Especially important is that this is a math course for engineers and the engineering school, which to me implies a particular approach to the subject matter: heavy on the applications and connections between concepts, a light touch with the theory, and a central role for technology.

Starting from here, I skimmed the entire textbook and wrote down all the things that I think students would learn, going section by section using the university's syllabus of record to tell me what we should and should not cover. From that list, a "module" structure for the course began to emerge organically around which I could build the particular learning objectives, and from there the system for assessment and grading.

That big-picture view also helped me frame the "motto" for the course, a single easy-to-remember message for what the class is about. I settled on:

MTH 302 is about systems, how we can model systems, and what we can learn about systems from the models.

The notion of systems as the central organizing principle really fits well. We start off with linear algebra, which is about systems of linear equations. Then we move to differential equations, which are also a kind of system where a function and one or more of its derivatives are intertwined. Then, the centerpiece of the course is systems of differential equations -- systems of systems -- where the two subjects in the class come together. Since engineers are all about systems, this emphasis not only neatly summarizes the course but does so in a way that's compelling to the students.

It didn't hurt that I'm studying systems thinking as part of my ongoing leadership development and work in the President's Office. In fact shortly after I had finished the initial build of this course, I read this overview of systems thinking that reinforced my course design and gave me some ideas of new concepts about systems to include in the class.

Keeping the end in mind

In our forthcoming book, David Clark and I suggest that once you have a course outlined and scoped, a good step toward building a grading system is to take the course grades of C and A, and write a narrative description for what those should look like -- not in terms of specific assessments or grades, but as a general overall description that would make sense to an outsider. Here's what I settled on:

In MTH 302, a grade of C means: There is evidence of skill on all the fundamental, can't-live-without-it ideas and at least minimal success with applying those basic ideas to real problems. And, the student has participated and prepared meaningfully more often than not. Basically a "C" student is good to go with the bottom one-third of Bloom's Taxonomy, has made at least a little headway in the middle third, and they've given a good faith effort to be part of the learning community in the class.

In MTH 302, a grade of A means: The student has mastered all the fundamental ideas and has a pattern of success in applying those to authentic problems involving modeling systems. An "A" student also shows consistent engagement with practice and preparation for class. They have mastered the bottom two-thirds of Bloom's Taxonomy and have evidence of consistent success with the top third.

This is what "minimal viable progress" and "excellence", respectively, look like to me. In case you aren't familiar with Bloom's Taxonomy, this picture is pretty self-explanatory:

I wrote a little more in this post about how viewing Bloom's Taxonomy in thirds helps with course design.

A tale of two lists

Once I had clarity on the context and overall story arc of the course, and what "minimal competency" and "excellence" looked like, the next steps were to flesh out the learning objectives in the course and decide what students would do to demonstrate skill on them, and how those would be assessed.

Before I say more about that, note that the ordering of the steps in this process matters. What many faculty do, and what I definitely used to do, in building a course is to start with the assessments (how many tests, how often homework is turned in, etc.) and then write up the grading system, and give little attention to learning objectives in the course or what the course is about. I've found that what the course ends up being about, in the end, is grades. If you lead with grades, and just assume that the meaning of the subject and the specific skills attained in learning it will just sort of "happen" along the way, then you can't really expect students to pay attention to anything but their grades. Given how common this approach to course design is, and how demoralizing traditional grading can be, it's no wonder so many students and faculty are burned out and exhausted.

Back to MTH 302: I had already made a long list with all of the micro-scale learning objectives for the entire scope of the course when I reviewed the textbook. I put those into a long note:

And on and on this note went. It's a list of all the ideas and tasks that students would encounter. It was not a list of things that I assess! This list was like a list of all the things you see on the road when you drive from one city to another. But it is not a list of directions of things you must do, to make that drive. I'd realized before that not every idea that students learn in a course rises to the level of needing to be assessed. Indeed it is not humanly possible to assess every single thing that a student has encountered in a course to see if they learned it. Instead, the big list above was a guide for lesson planning; and I used it to isolate the truly central learning objectives that I would eventually assess.

You can see some of those "assessable objectives" in my list, in bold-face (for example "Add, subtract, and multiply matrices"). At this stage my goal was to filter the list of "things to learn" down to a minimal number of assessable learning objectives that fits my vision for the course. There were close to 100 micro-scale objectives in the big list. I filtered this down to twelve central objectives:

And before it was all over, I ended up dropping the last one about Laplace transforms, ending with eleven Foundational Skills. (I really wanted to get down to ten, but I couldn't justify cutting any more to myself.)

The Foundational Skills are the "bottom third of Bloom's Taxonomy" that I referenced in my narrative about what a "C" and "A" look like. They form the basis for everything else students do in the course. Students will be doing more than just demonstrating skill on these; for example I said that both C and A students should show evidence of being able to apply these skills to real-life problems. But all of those applications come from these.

If you happen to know linear algebra or differential equations, you might wonder about some of the skills not on this list. What about determinants? Slope fields? And why did you get rid of Laplace transforms? To be clear – I didn't "get rid" of these topics. They are there, and we'll encounter them. But for this class and these students, I consider them not to rise to the level of needing explicit assessment.

Most of the topics that didn't make the final cut are calculations that are best done on a computer. Here in 2023, with a superabundance of free computer tools that will compute determinants and generate slope fields quickly and flawlessly, I see no reason to task students with doing these by hand when we could use that time and energy to learn how to use the concepts to model systems and learn what we can from the models. So next week, we're going to spend about 15 minutes practicing finding the determinant of a 2x2 and 3x3 matrix to get the gist about how it works; and then we will never do it again but instead use SymPy to find determinants for us.
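For readers who want the gist without the 15 minutes, here is a rough sketch of a by-hand determinant in plain Python. It assumes the by-hand method is cofactor expansion along the first row, which is one common way to teach it and not necessarily the one used in class. The function name `det` is just for illustration; in practice a library such as SymPy would do this job.

```python
def det(matrix):
    """Determinant of a square matrix (list of rows) by cofactor expansion."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    if n == 1:
        return matrix[0][0]
    if n == 2:
        # The 2x2 formula: ad - bc.
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    total = 0
    for col in range(n):
        # Minor: delete the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        sign = -1 if col % 2 else 1
        total += sign * matrix[0][col] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [0, 1, 4]]))  # 21
```

Each n x n determinant calls itself on n smaller ones, which is why this gets tedious by hand very quickly.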

You could make the case that there are Foundational Skills on this list that can and should be treated similarly, for example generating numerical solutions to first-order DE's, a process clearly best suited for a computer. My take is that there are some tasks that, while best suited for a computer in actual practice, are worth the time and effort needed to do by hand, on a small scale, to learn how the algorithm for the task works. When I teach computer science majors about big-O analysis of algorithms, for example, I have them act out a sorting algorithm by sorting a stack of paper cards with numbers on them using one of the algorithms. This is unlikely to ever happen in real life; but when you do merge sort on a stack of 8 cards, then 16, then 32, you internalize what $O(n \log n)$ really feels like.
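The card exercise can be imitated in code by counting comparisons as merge sort runs. This is only a sketch: the `merge_sort` function and the reversed "stack of cards" are illustrative choices, not the actual classroom activity.

```python
import math


def merge_sort(cards):
    """Return (sorted_cards, comparisons) for a list of numbers."""
    if len(cards) <= 1:
        return list(cards), 0
    mid = len(cards) // 2
    left, left_count = merge_sort(cards[:mid])
    right, right_count = merge_sort(cards[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    # Merge the two sorted halves, one comparison per card placed.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the rest needs no further comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


for n in (8, 16, 32):
    stack = list(range(n, 0, -1))  # a stack of cards in reverse order
    _, count = merge_sort(stack)
    print(n, count, n * math.log2(n))
```

Doubling the stack a little more than doubles the comparison count, and the count stays at or below $n \log_2 n$, which is the pattern the card exercise is meant to make tangible.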

Likewise, I think running Euler's method on a small scale helps you internalize what the computer is doing and what the result allows you to do. Finding determinants of 3x3 matrices, on the other hand, doesn't seem to me like it provides the same level of return on investment. Definitely not 4x4 matrices! And, I might feel differently if it were another course with another demographic of students. For a crowd of computer science majors, doing a 5x5 determinant might be useful for getting a feel of the recursion in the process. With engineers, we probably have better things to do.
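To make "what the computer is doing" concrete, here is a minimal sketch of Euler's method for a first-order equation $y' = f(t, y)$. The function name `euler` and the test problem $y' = y$, $y(0) = 1$ are chosen purely for illustration.

```python
def euler(f, t0, y0, h, steps):
    """Approximate the solution of y' = f(t, y) with Euler's method.

    Returns the list of (t, y) points, starting at (t0, y0).
    """
    t, y = t0, y0
    points = [(t, y)]
    for _ in range(steps):
        y = y + h * f(t, y)  # follow the tangent line for one step
        t = t + h
        points.append((t, y))
    return points


# Two steps of size 0.5 for y' = y, y(0) = 1.
for t, y in euler(lambda t, y: y, t0=0.0, y0=1.0, h=0.5, steps=2):
    print(t, y)
```

This is small enough to check by hand: the estimates are 1, 1.5, and 2.25, compared with the exact value $e \approx 2.718$ at $t = 1$. Seeing that gap, and watching it shrink as the step size gets smaller, is most of the point.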

Also, I made a scientific poll of practicing engineers — OK, it was my sister and my two brothers-in-law over Christmas break, but at least they are real engineers — about the last time they ever had to compute a determinant or make a slope field by hand in each of their 30+ year careers as professional engineers. The unanimous answer was "Never". I'm not into making students do what amounts to fancy nerd party tricks just so I can say that we did them.

Getting there

Next up is the system for assessing whether students are learning the things I want them to learn, as well as the feedback loops in place for helping them grow, and how all of this fits together into a course grade. With all of the above in place, these bits can now make sense — but not before! We'll get into all that next week.

A stop/start/continue for the ungrading community

Robert Talbert — Wed, 04 Jan 2023 13:07:55 GMT

Forget goblin mode or gaslighting: In education, the word of the year in 2022 was definitely ungrading. While the idea has been around for a while, becoming particularly well-known after this essay by Susan Blum in 2017 and a book edited by Blum in 2020, it seemed that in 2022 the idea of ungrading found its footing. Everyone seemed to be talking about it on social media. Faculty reading groups emerged everywhere. I taught an ungraded course myself last Winter, and ungrading plays a major role in the Grading For Growth book coming out this year.

Ungrading will likely continue this upward trajectory in 2023 as higher education comes to grips with new realities, and especially as the community behind it continues to grow. Although I am not currently using ungrading in my classes (for reasons described here), I have a great respect for the ungrading community. Their energy and passion makes me hopeful that maybe we are on the cusp of the large-scale positive change in higher education some of us have been working towards for decades.

It’s from this place of hopefulness and respect, and in the alternative-grading spirit of growth through feedback loops, that I want to start 2023 with a dose of unsolicited feedback for this community using the stop/start/continue format: One thing that the ungrading community is doing that it should stop doing, one thing that it is not doing that it should start doing, and one thing that it is doing that it should continue doing.

Terminology

Before I get started, let me be clear about my terms. I am referring to “ungrading” as a particular approach to the evaluation of student work in which:

There are no marks (grades) of any kind put on individual items of work;
Student work fits into a portfolio, curated by students, that will be used to document their growth and development through the course;
Instead of a mark, each item of work gets feedback — written, and perhaps oral, but not numerical or mark-based — from the instructor that contains observations and suggestions;
Typically, the student can revise and resubmit work for the portfolio using the feedback (at least, I’ve never seen a version of ungrading that doesn’t allow revision and resubmission of significant items of work); and
Perhaps ungrading’s signature feature is that while course grades are still assigned, it’s done collaboratively: students assign themselves grades, often in dialogue with the professor, based on a self-evaluation of their portfolios, against descriptive criteria for what constitutes various course grades.

So, for me, ungrading is a particular form of what I usually refer to as “alternative grading”: Grading methods that adhere to the “Four Pillars” framework that David Clark and I have written about. There are other forms of alternative grading that are not ungrading, such as specifications grading and standards-based grading.

Very importantly, I am not using “ungrading” as an umbrella term that encapsulates all forms of alternative grading, or as an overall mindset about grading. This use, which has become increasingly common, views methods like specifications grading as a species (or perhaps an embryonic form) of “pure ungrading” as I described above. I think this usage of the term is problematic for a number of reasons; for example, it’s confusing to refer to approaches such as specifications grading as “ungrading” when grades are in fact being given. I’ll save that rabbit hole for another post. But suffice to say, when I refer to “ungrading” or “the ungrading community” I am talking about something specific.

Stop: Ungrading absolutism

Ungrading practitioners are passionate about student success and growth. They tend to be equally and oppositely passionate against anything that gets in the way of growth, and grades are often in the latter category. One often hears or sees statements like “I got rid of grades because I truly care about students”. Or:

Agency, dialogue, self-actualization, and social justice are not possible in a hierarchical system that pits teachers against students and encourages competition by ranking students against one another. Grades…are currency for a capitalist system that reduces teaching and learning to a mere transaction. Grading is a massive co-ordinated effort to take humans out of the educational process.

That’s from this blog post from ungrading pioneer Jesse Stommel, for whom I have the utmost respect. Another person I respect is Alfie Kohn, who said in his seminal article “The Case Against Grades”:

[G]rading for learning is, to paraphrase a 1960’s-era slogan, rather like bombing for peace. Rating and ranking students (and their efforts to figure things out) is inherently counterproductive.

In that article, Kohn goes on to describe, in great detail, how any form of grading militates against learning, including Four-Pillars forms of alternative grading, like standards-based grading, that are not traditional but also not ungrading. He specifically singles out SBG to say that it’s “not enough”.

When you read Jesse’s work, Susan Blum’s book, Kohn’s essays, or the many articles or tweets from other ungraders, it’s hard to escape a certain message: That grades and the humanity of students are fundamentally incompatible with each other; that any form of grade-giving is at some level compromising student intellectual growth; and therefore, if you really and truly care about students, there is only one option: Ungrading, or at least a trajectory that is headed toward ungrading.

I am sure that in most cases where this message gets communicated, it’s not intentional and not even a view actually held by the person communicating it. But the message is real, and it’s a problem for a few reasons.

It can be a form of teacher-shaming. The message that if you really care about students, you’ll ungrade can make teachers who do care about students but who for whatever reason do not ungrade, feel worthless. A teacher might choose not to ungrade for many reasons: Specs grading is a better fit for the course, for example; or they're not sure how to make it work for large classes; or they’re contingent faculty and don’t get to decide such matters. It is possible to care deeply about student growth and success, and understand how ungrading works, and still choose not to do it. But the message that "care implies ungrading" is equivalent to "not ungrading implies you don't care".
It’s exclusionary. There’s a secondary message here as well: If there is no coexistence possible between student growth and grades, then ungrading is an all-or-nothing proposition. Nothing you do in the classroom is fully acceptable unless and until it’s ungrading. (I’ve seen firsthand a form of teacher-shaming in which an expert instructor doing specs grading was patronizingly told that they are “on their way”.) If the message is “all or nothing” then a lot of instructors are going to choose “nothing”. So it shuts people out. It also shuts people down, because if you’re not able or willing to “go the whole way” then why even try?
The jury is still mostly out on ungrading. There’s no shortage of testimonials on social media and elsewhere about how great ungrading is, but the fact that it works well for some people in some situations does not constitute data about its actual benefits, or an argument that it ought to be used in your class with your students. In fact, as I wrote here, I have serious questions about whether ungrading actually does more harm than good with students with significant background knowledge shortfalls. I believe generally that there’s sufficient evidence to say that traditional grading is full of intractable issues that all call for a repeal-and-replace approach to the entire concept. But there is more than one way to do it: Specifications grading is also good; so is standards-based grading (despite Kohn’s objections); so are all the many other Four Pillars-based approaches that center on feedback loops. (Reminder: I am not referring to ungrading as an umbrella term.)

Again, this message that I am referring to — that ungrading is the only choice of “grading system” that is compatible with student care — is one that I don't encounter often in real life interactions, and it’s not intentionally voiced by most ungrading folk I encounter (which is a lot). So rather than a thing to “stop doing”, this might be more along the lines of “please be careful not to do this”.

But it’s a message that has the unsavory taste of evangelism to it, and so to the extent that it appears, it needs to disappear.

Start: Getting into the weeds

I’d like to see more ungrading people start writing in detail about the day-to-day specifics about what they are doing with ungrading, how it’s working, and — especially — how it’s not working and the steps they are taking to adapt or renegotiate their practice.

I see tweets and longer-form articles about “how to ungrade” that have some good ideas on how to get started, but often they’re not much more than high-level suggestions or general descriptions of practices. Start by trusting students; Get rid of graded items that are just there for surveillance; Give lots of feedback. It all sounds good, and on some level this all is good. But getting this whole shebang to work on a daily basis, at scale, without collapsing under its own weight requires details. Blueprints. Failure narratives. All of it, not just the successes.

So I challenge the ungrading community to do something that sounds hard but is actually simple: Give us the details. Get unapologetically into the weeds through longer-form approaches of communicating your practice.

A straightforward way to do this is to start blogging about what you’re doing. Set yourself up a blog on some free platform, and once or twice a week, post a Captain’s Log about what’s happening with ungrading in your classes: What worked; what could have been done better; what totally did not work; how you will adjust, having done that thing that totally did not work; what you are hearing from your students; what you are hearing from your colleagues and administrators; how you are instantiating what you’re reading and thinking about into classroom practice; what you will do again next semester; what you swear you will never do again; your successes and realizations; your second thoughts.

You have more than enough material for a weekly update. Keep it short, unpolished, relatively unfiltered, and real. That’s what the rest of the world is all waiting for.

Continue: Being fearless

Higher education feels like it’s just a few steps away from a complete revolution right now. That’s why I have no intention of leaving this profession: I don’t want to miss this. History tells us that one essential part of every revolution is the existence of a few elites who are in positions to expend capital to push the revolution over the edge, into places where it would otherwise never reach. If you are ungrading, you are one of those elites.

I mean “elite” in a positive sense: Talented, skilled, and creative — and above all, courageous, otherwise you would just be thinking about ungrading rather than doing it. You also have the capital to expend; not in terms of money, but in professional, intellectual, and political areas. Again, if you didn’t have this, you wouldn’t be doing what you’re doing. You have more influence on other people than you think.

So keep iterating on your ungrading practice (and again, blog what you are doing so we can all learn) to make it better, simpler, more transparent, more supportive of student success. Be fearless about taking educated risks that are likely to benefit students. Share your creations. Engage people in conversations about grades, including and especially the skeptics (and not just skeptical students). Be a happy warrior in the cause of making higher education better, starting with grading.

Who was Horace Mann?

Robert Talbert — Fri, 16 Dec 2022 13:00:56 GMT

This is a repost from my most recent article at Grading For Growth, which I co-author with my colleague Prof. David Clark. Here's the original. Join us there every Monday for new content about alternative grading systems! (We're taking a break until the start of the new year; new stuff coming January 9.)

Have you ever wondered how we got the “traditional” system of grading we have now? Many people assume it was handed down to us from centuries past, having stood the test of time across the entire scope of higher education. But the truth is much different.

You might be surprised to learn that what we now recognize as the “traditional” grading system — including the 4.0 GPA, the A/B/C/D/F scale, and the widespread use of points for assessments — is only about 125 years old. It is not hard-coded into the DNA of higher education itself! In fact, given that universities have been around since at least 1088 (possibly even longer than that), and formal education itself much longer, “traditional” grading doesn’t seem like that much of a tradition.

So, how did we arrive at the system commonly used today? This is a topic that David and I take up in one of the early chapters of the Grading For Growth book, where we trace the evolution of grading from the beginnings of higher education up to the present day. In this article, I wanted to highlight a part of that story that, sadly, didn’t make it into the book. It’s about the outsized influence of one person on the development of school and grades today.

Education and grades in America, pre-1850

“Grades” as we now know them didn’t really exist in American education prior to 1800. American universities adopted the methods of their European counterparts, meaning that students attended lectures (or not, as there was not really a requirement or even an expectation to do so) and engaged in salon-like discourses with their professors and classmates. But examinations were very infrequent. Students at the earliest universities took only a single, all-or-nothing oral exam at the very end of their studies, and it did not receive a grade. Instead, student performance on the exam was evaluated by a panel, which either agreed to let the student graduate or not.

Primary and secondary schools followed a similar path. Public schooling in the US did not become mainstream until the second half of the 19th century. Prior to that time, schools were primarily private or religious, and did what the universities were doing in terms of curriculum and evaluation. Even students in the one-room schoolhouses of that time took only a single final exam over each subject.

The evaluation practices of the different schools in America at that time were not designed to be compatible with each other. There was no way to compare the academic progress of two students from different schools, even in the same geographic area. This wasn’t necessarily a problem at that time, because students did not typically migrate between schools. But as America grew through westward expansion and the rapid increase in immigrant populations, mobility between schools became more common — and the limitations of the current models became more apparent. Suddenly, school reform was an important subject.

Horace Mann and the Prussian system

One of the most vocal reformers was Horace Mann. Born in 1796, he studied at Brown University and became a practicing lawyer in 1823, then was elected to the Massachusetts legislature in 1827 and then to the Massachusetts State Senate in 1835.

In 1837, Mann was appointed Secretary of the Massachusetts Board of Education. In this position, Mann became intimately familiar with the work of teachers and the need for school reform. He is said to have personally visited every school in Massachusetts to examine the school grounds. Perhaps motivated by that experience, he undertook a fact-finding tour of European schools in 1843, where he took particular interest in the Prussian school system.

Established in the late 1700s, the Prussian system was one of the first tax-funded compulsory public education systems in the world, as well as one of the most innovative. It grew out of the needs of the Prussian military, which was transitioning from a rigid command-and-control structure where troops executed orders from superiors, to a more flexible model in which soldiers made more of their own tactical decisions based on real-time battlefield situations. In order to function within such a decentralized system, the Prussian army needed soldiers who could analyze a situation and make good decisions with imperfect information and under stressful conditions. In other words, they needed educated soldiers. And to ensure every able-bodied person could attain that education, the entire Prussian educational system received an overhaul.

The existing schooling system was expanded to require all young citizens, both boys and girls, be educated by government-funded schools from the ages of 5 through 13. The eight-year course of primary education included reading, writing, music, and religious education (though a hallmark of the Prussian system was its secular approach). And among the innovations of the Prussian system was a curriculum structured around a series of incremental “grades” that allowed students to proceed at their own pace.

The structure of the Prussian system stood in contrast to the American system, which was largely uncoordinated. There was no real need for coordination prior to that time, since students rarely changed schools; and anyway record-keeping was all but impossible in many cases since student attendance was irregular and students of different ages and abilities were lumped together. And insofar as there was a system in place, it was often based on comparing students with each other. In many urban schools, students were examined, ranked against their classmates, and physically relocated in the classroom based on the results, with the top students being moved closer to the front.

Bringing the Prussian system home

Mann and his colleagues considered the Prussian system to be an upgrade and sought to institute similar reforms in America.

While many at the time believed the constant examination and ranking of students would motivate them, Mann instead believed that these practices would do the opposite. He wrote: “If superior rank at recitation be the object, then, as soon as that superiority is obtained, the spring of desire and of effort for that occasion relaxes”.2 He proposed replacing the examine-and-rank method with something familiar to us today, but unheard-of then: A series of written examinations, and monthly report cards.

Report cards were to serve at least three purposes. First, they would provide a record of student progress so that the student could see their success over time. Second, they would provide parents with that information as well, and make it easier for parents to participate in their children’s education. Third, the report cards and the information they contained could serve as “an internal organizational device” as Schneider and Hutt put it, allowing schools to keep records and manage themselves.

By having stepped grades, written examinations, and report cards among other reforms, Mann hoped to transform American education from an endless demoralizing competition into a place where children could learn with each other without fear, and in which student success would be recorded in terms of carefully-curated information — a clockwork system much like the Prussian system that inspired it.

Soon, grades themselves began to take hold of higher education and made their way into secondary and primary schools. By the end of the Civil War, most schools gave grades to students. By the end of the 19th century, those grades were beginning to take the familiar form of letter grades, and then grade point averages, all awarded using written examinations graded using points. And by the early 20th century, fueled by the industrial revolution, we had essentially the system we have today.

Takeaways

So, we can trace some of the key components of “traditional grading” back to Horace Mann: grade levels in primary and secondary schools, written examinations, and grade reports. It would be easy to stop here and say that Horace Mann was a well-intentioned reformer whose innovations unwittingly paved the way for a system of grading and assessment in all levels of schooling that we can now recognize as irrelevant to modern education, perhaps even nonsensical and actively harmful. But I think that would miss some important lessons.

First, Mann’s reforms were legitimate improvements over what students were experiencing at the time. We can look back now at, say, the invention of the report card and wish it had never happened. But imagine schooling without report cards during that time: Kids (and their parents) had no reliable way of knowing what they had achieved or needed to achieve, or how they were growing over time. All they had was daily competitive oral exams whereby they were moved to the back of the room, or given a dunce cap, for poor performance. Grades, exams, and report cards may have mutated over time to be in opposition to student learning and growth, but it used to be worse. And perhaps giving students an explicit record of their academic work is not detrimental to motivation or learning, but a necessary feature for communicating with each other about growth.

Second, it’s important to note that traditions don’t just “happen”. Every tradition is an accumulation of choices made by people who are faced with a problem to solve3. Traditional grading, for all its flaws, was initiated to solve certain emerging and systemic problems in education, and for a while it solved those problems very well. And the system before Horace Mann, in turn, for all its flaws, solved many of the problems of the way things were before it (namely, having no educational system at all). At some point, we’ll all discover the flaws and issues of “alternative grading” and some future reformer will need to come along and start another tradition. Traditions are a choice. Do we make them with our hearts and heads in the right place?

Finally, I’m struck by just how much impact a single person can have on an entire institution. Horace Mann was one person with a passion for education reform, and through systematic inquiry and what we would today call “disruptive innovation” he changed the course of education. I bet many people out there now reading this blog, maybe you, can have a similar impact.

Note: Much of the historical content of this article is taken from an excellent paper by Schneider and Hutt.

The heart of the loop: Reattempts without penalty

Robert Talbert — Fri, 15 Apr 2022 13:06:23 GMT

This article originally appeared earlier this week at Grading for Growth, a blog about alternative grading practices that I co-author with my colleague David Clark. I post there every other Monday (David does the other Mondays). Check the end of this post for some extra thoughts that don't appear in the original.

This post is the final installment in a series on the Four Pillars of Alternative Grading. The first post focused on Clearly Defined Standards, the second one on Helpful Feedback, and the third one on Marks Indicate Progress. The final one is in some ways the most important, and the most controversial: Reattempts Without Penalty.

Every alternative grading scheme we have mentioned here on this blog has this idea in common: That students are allowed to reattempt work and resubmit it for feedback, and they incur no grade penalty for doing so. Grades are not primarily based on "one-and-done" assessments, and early missteps can be corrected and improved without cost to the student's grade.

You can see why this idea is so provocative. For students, it holds out the promise that there is no more final judgment based on single moments in time --- that they will be allowed to improve and not just be expected to perform. For some instructors, it provides hope that student growth will (finally!) be the primary measure of success in a course, and some measure of grace and flexibility will be included along with high standards and "rigor". And for other instructors, this concept raises more questions than answers. Won't this just allow students to not take assessments seriously? Won't this cause a massive workload for me? and more.

So let's unpack this idea, starting with what we know.

How it traditionally works

In traditional, points-based grading systems, the evidence that students present about their learning is almost always in the form of one-and-done assessments: Tests, exams, homework, presentations, and the like. Those assessments can take on various forms, and in well-constructed courses they do have varying forms, corresponding to different levels of Bloom's Taxonomy. But no matter the form, they are typically one-and-done: Students turn them in, and the work is graded, and that's that. Only rarely, and often only in certain disciplines like writing-intensive subjects, do you encounter the possibility of reattempts; and often those have penalties attached, by which we mean the reattempt does not earn the same credit as the original.

One-and-done assessment is clearly a terrible way to measure student learning. It captures a single moment in time, and this is taken to be representative. But if you were reading a research article and the author used a sample size of n = 1, how would you react? It makes sense as long as you don't think about it. But when you do start thinking about it, major issues arise. How do we know there were no confounding variables --- a.k.a. "life" --- contributing to a student's performance on a test? How do we know that Alice learns at the same speed as Bob, and that she wouldn't be better than Bob at the material if she had another week to work on it?

So the traditional one-and-done approach has many flaws. But there were smart and compassionate instructors who lived earlier than us, so how did this come to be the norm? I honestly don't know, but I suspect it's a combination of:

Convenience. One-and-done approaches are less work for everyone, especially the instructor. Nobody likes traditional grading because it is so soul-sucking and time-consuming, so why do it more often than necessary? This has a connection with the next point.
A misplaced trust in statistics. An argument for traditional grading goes like this: Sure, a single assessment might have a grade on it that doesn't accurately reflect student understanding. But factoring in all the assessments, the statistics done to compute the course grade will even out the noise; and in the end, the central tendency of student grades will be accurate. (Especially if "drop grades" are introduced to trim the outliers.) This is essentially an appeal to averaging out noise (the law of large numbers, often loosely described as regression to the mean), and it is a useful concept in certain contexts. But it has a fundamental flaw when attached to grading: The marks we feed into the stats are not really measurements. I promised last week to expand on this idea in the future, and I will. For now, I will claim: Although we put numerical points on student work, these are not truly numerical data but rather ordinal categorical data --- ordered labels, in other words. We shouldn't be performing statistics on these in the first place except for maybe modes, medians, and max values.
A fixation on rigor. The idea of letting students redo work strikes many people as soft. It leads to "grade inflation". It violates some kind of academic machismo code that views student assessment like competition in an arena. Now, we are all for high academic standards, and in fact alternative grading makes it possible to have higher standards than we've ever had, precisely because we don't give one-and-done assessments. But when academic standards and the related concept of "rigor" become the antagonist in your course and students the protagonists --- or maybe it's the other way around --- the environment becomes toxic, and the focus is moved from students and their growth to the abstracted concept of rigor, which we've written before is a meaningless term that should be abandoned, not celebrated.
Tradition itself. And of course, there's good old-fashioned inertia. Most of us instructors came up through courses that had one-and-done assessment, and it "worked for us", so we use it now. We never stop to think about the people for whom it didn't "work".
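
To make the point about ordinal data a little more concrete, here is a minimal Python sketch. The four mark labels, the two students, and the two point codings are all made up for illustration; none of them come from an actual course. It shows that the mode, median, and max only need the order of the labels, while an average depends on whichever point values we happen to attach to them.

```python
from statistics import mean, mode

# Hypothetical ordered mark labels, lowest to highest (not taken from the post).
LEVELS = ["Not yet", "Progressing", "Meets", "Exceeds"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def summarize(marks):
    """Mode, (lower) median, and max of ordinal marks, using only their order."""
    ordered = sorted(marks, key=RANK.__getitem__)
    return {
        "mode": mode(marks),
        "median": ordered[(len(ordered) - 1) // 2],  # always an actual label
        "max": ordered[-1],
    }


def mean_score(marks, coding):
    """Average after converting labels to points with one particular coding."""
    return mean(coding[m] for m in marks)


# Two hypothetical students and two point codings that both respect the label order.
student_x = ["Meets", "Meets"]
student_y = ["Exceeds", "Progressing"]
coding_a = {"Not yet": 0, "Progressing": 1, "Meets": 2, "Exceeds": 4}
coding_b = {"Not yet": 0, "Progressing": 1, "Meets": 3, "Exceeds": 4}

# Which student looks "better" by average depends on the arbitrary coding.
x_ahead_a = mean_score(student_x, coding_a) > mean_score(student_y, coding_a)
x_ahead_b = mean_score(student_x, coding_b) > mean_score(student_y, coding_b)

print(summarize(["Meets", "Not yet", "Meets", "Exceeds", "Progressing"]))
print(x_ahead_a, x_ahead_b)
```

Under the first coding student Y comes out ahead, and under the second student X does, even though neither student's work changed. The order-based summaries do not have this problem because they never turn the labels into numbers.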

No penalties? Really?

Many instructors can get behind the idea of reattempts, but the idea of not penalizing them seems hard to swallow. It feels "un-rigorous" or a contributor to grade inflation, or just plain unfair, especially to students who did adequate work on the first attempt. But we really mean it: Reattempts should not be penalized and none of the alternative approaches seen here do so.

The reason is simply that growth is what grading is about, or should be about. And why would you penalize growth? Engagement with a feedback loop and the growth that takes place as a result are a normal, healthy part of human learning. It's not a sign of a defect or a deficiency. Instead, we acknowledge that learning takes time and effort, and that means normalizing reattempts.

Does allowing reattempts without penalty lead to grade inflation? Not really. “Grade inflation” refers to increases in grade levels without a corresponding increase in the quality of the work. That second part is key; simply having higher grades by itself isn’t grade inflation. Reassessments let students demonstrate that they have actually learned. When we allow reassessments without penalty, grades do tend to increase, but they are tied to concrete evidence of improvements in learning. This isn’t “inflation” — it’s accuracy, and validity.

What about the objection that reattempts without penalty are unfair to students who do good-enough work on the first try? This seems to stem from a combination of two misplaced ideas about grades: that they are compensation, or that they are the result of a competition.

If you worked 8 hours at a job that pays $15 an hour, and I worked the same job for 4 hours, and we were both paid $120 at the end of the day, or if we were both paid $60, then that's unfair — because there is supposed to be a precise relationship between the amount of effort and the payment received, and it's broken. But grades are not like this, or at least they shouldn't be. If they are like rewards at all (a concept I am not comfortable with, but I'll accept it for this analogy) it's a reward for simply finishing a job, like paying someone $50 to mow my lawn. It doesn’t really matter to me how long it takes, as long as it’s done.

Many also view grades not so much as compensation, but as rewards for placing high in a grading competition. If you see the point of taking a class as beating everyone else in the class (an approach sadly common among American students), then allowing others to earn the same grade as you on an assessment or in a course, even though the other person struggled where you didn’t, can feel unfair. For those students, you can explain: What we are trying to do is not rank students but measure learning. There is no fixed, artificial amount of A, B, C, D, or F grades and so no need for competition to grab them, and one person’s success is not going to lessen another’s chances of success. There is no “curve” and every possibility that every student can earn an “A” through hard work, effort, and engagement with the feedback loop. So relax! Every student is working to meet the same standards and we are all on the same side.

Others might object that reattempts without penalty discourage students from doing good-enough work on the first try. There's something to this objection: Often the earlier objectives in a course need to be mastered in order for students to get the most out of the subsequent concepts. If students have less incentive to do that, there's a danger they'll fall behind if they don't give their best effort on the initial tries. It’s a legitimate concern; but this can be addressed through mindful course design (see the next section) and regular communication.

How to reassess without penalty

David wrote a post a while back that goes into detail on some of the mechanics of reassessment, and how to keep out of "grading jail" when you give reassessments. I'll try not to repeat his article. But I do want to stress that "reassessments without penalty" does not mean reassessment without responsibility. While we shouldn't penalize growth, we can design our courses so that students don't take the reassessment opportunities the wrong way.

Instead of penalizing, you can place reasonable limits on reassessment opportunities. For example, you can:

Limit the number of reassessments. In my Discrete Structures courses where I use specifications grading, for instance, when a new Learning Target is introduced, it appears on only three consecutive Learning Target quizzes, then it is "retired". That means that a student can continue to be assessed on it if needed, but only by request and with a cost (i.e. spending a token). This addresses the concern about students giving their best effort to do well on the first 1-2 tries; and it keeps the size of the quizzes down.
Limit the schedule of reassessments. For example, only do reassessments during your office hours; or only on Fridays, or every other Friday. An advantage of this approach is that it introduces time for thought and practice --- students can't typically take an assessment and then turn around mere hours later and do a reassessment.
Limit the frequency of reassessments. This approach works well with writing-intensive work. In my Modern Algebra class, which is primarily based on written mathematical proofs, students get two problems a week; they can revise any of these as often as needed, but with a cap of one problem per week. So revisions of problems aren't penalized, but they are scarce. The scarcity drives up the value, and it keeps my grading workload manageable. (Students can, if they want, submit multiple revisions of the same problem each week.)
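
The first of these limits (a target appears on three consecutive quizzes and is then retired, after which a reattempt costs a token) can be sketched in a few lines of Python. This is only an illustration under assumptions: the quiz numbering, the schedule in `target_intro`, the function names, and the cost of exactly one token per retired-target attempt are hypothetical and not the actual course setup.

```python
WINDOW = 3  # a new target appears on three consecutive quizzes (from the post)


def on_quiz(target_intro, quiz_number, window=WINDOW):
    """Targets that appear automatically on the given quiz."""
    return sorted(
        target
        for target, first in target_intro.items()
        if first <= quiz_number < first + window
    )


def request_attempt(target, quiz_number, target_intro, tokens, window=WINDOW):
    """Return the token balance after attempting a target on a given quiz.

    Targets still inside their window cost nothing; retired targets are
    assumed (hypothetically) to cost one token each.
    """
    first = target_intro[target]
    if quiz_number < first:
        raise ValueError("target has not been introduced yet")
    if quiz_number < first + window:
        return tokens
    if tokens < 1:
        raise ValueError("no tokens left for a retired target")
    return tokens - 1


# Hypothetical schedule: each Learning Target first appears on the quiz shown.
target_intro = {"LT1": 1, "LT2": 2, "LT3": 3, "LT4": 4}

print(on_quiz(target_intro, 4))                      # LT1 has been retired by quiz 4
print(request_attempt("LT1", 4, target_intro, 2))    # retired: spends a token
print(request_attempt("LT3", 4, target_intro, 2))    # still in its window: free
```

Keeping the window as a parameter makes it easy to experiment with a longer or shorter run of automatic attempts without touching the token logic.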

Not only can you place reasonable limits on reassessments, you can also require extras:

Require a metacognitive reflection. When a student submits a revision of a proof, require a brief but substantive reflection that summarizes the important items that caused issues on the previous submission, along with a specific explanation of what they did to improve their understanding and how they have demonstrated that improvement.
Require evidence of successful practice. In skills-oriented assessments, you might require students to work additional exercises related to the work being reattempted, and submit those as a down payment on a revision, or alongside a revision. You don't necessarily have to grade the additional practice, just give it a once-over to make sure it's mostly OK and done with good-faith effort. Then you evaluate the reattempt.

Finally, on a reattempt, you can ask students to mix up their methods. For example, you might require some (or all!) reattempts to be done orally in your office hours; or perhaps through a Flipgrid video. This isn't a penalty by any means, but it asks the student to try again but this time in a different way. That can help you be more sure that the reattempt is not just done mindlessly; it can also help the student, because changing up the approach can help them learn and retain the concepts better.

Above all: Feedback loops

What I've stressed through all of these Four Pillars articles is what's on the "ceiling" of the visual: Feedback loops.

All human learning happens by engagement with a feedback loop, and we are simply remodeling our grading practices to be more in line with that fact. Giving reattempts without penalty is, in my view, the visible sign that we are serious about that alignment. But all of the pillars are grounded in the idea that grading should be about growth, and this takes time, effort, communication, and patience.

Bonus extra thoughts

Beware of pushback from students if you use a policy of reattempts without penalty. Wait, what? Students? Why would they push back on this when there is virtually no downside for them? I know it seems weird, but it's real. The ones who push back are the highest-flying students, the ones who take great pride in getting things right the first time. They are the ones for whom this concept will seem singularly unfair. In fact the first time I tried a reattempt policy in a class — a Calculus 2 class for engineers at Vanderbilt, so high-flying students indeed — one of them accused me of practicing "academic communism". I'm no therapist, but I think there's a line between taking pride in work well-done and centering how you value yourself as a human being on getting good grades. In some ways I think those kinds of students need growth-focused grading practices more than the ones who struggle with the material.
There's another objection to reassessment policies related to something I mentioned above: If students can reassess (particularly if there's no penalty) then they might not put their best work into an early concept, which is then built upon by a later concept which is also assessed, and so on, creating a snowball effect where students are having to assess on newer topics before mastering the older ones. This snowballing is a little worse than simply falling behind in a class because it creates exponential growth in student assessments. This too is a legitimate concern. It merits a deeper dive, but for now: (1) I think you can address the snowballing issue pretty effectively just by frequent, high-quality communication with students and by keeping the individual standards simple, and (2) we have to remember humans don't learn in a linear way, where we master a subject first before moving on to something else that depends on the earlier subject. We're constantly having to go back to prior knowledge and practice, fill in gaps, unlearn bad habits, etc. So while the snowball effect can be serious, I don't see it as a glitch in alternative grading setups but an organic reflection of how lifelong learning takes place. It has to be managed but its presence doesn't mean the system is broken.

How specifications grading changed my view of academic dishonesty

Robert Talbert — Fri, 28 Jan 2022 13:25:31 GMT

This article originally appeared at Grading for Growth, a blog about alternative grading practices that I co-author with my colleague David Clark. I post there every other Monday (David does the other Mondays). Check the end of this post for some extra thoughts that don't appear in the original.

Let's discuss academic dishonesty, by playing a game of attribution. Guess who wrote the following:

Academic dishonesty is not only easy to catch, it’s a horrible miscarriage of the mutual trust upon which all of education is built, and students who willfully engage in it deserve all the punishment they receive, if not more. There’s simply no rationalizing it, and I don’t think we in higher ed do nearly enough to eradicate it.

And this:

[Academic dishonesty is] more than just youthful indiscretion, like drinking too much at a frat party or sleeping through an exam [...]. Academic dishonesty is a willful, intentional violation of trust, and if you are a professor and have a shred of respect for the life of the mind, you have to do something about it, even if it might earn you a reputation as a mean SOB among students.

I don't know what went through your mind as you read those, but I was thinking: This person isn't necessarily wrong, but they seem to care a lot more about the integrity of "academe" than they care about their students. (Assuming they’re an educator in the first place; from these snippets, who can tell?)

If you found those words hard to read, then imagine what it's like for me to read them. Because I'm the one who wrote them, in two separate blog posts (here, and here) on the same day, almost 14 years ago.

It gets better. I also wrote this two years earlier:

The question is really one of economics. If you cheat on a quiz, for instance, and “earn” yourself five points by short-cutting mastery of some material -- and then go and take a test that has 20 points of questions on the same material, and you lose all 20 because you didn’t master that material -- then you are 15 points in the hole. There is a net loss in the process of cheating or plagiarizing, even if you don’t get caught. And if you do get caught, the stakes go that much higher. It’s rational choice theory applied to the classroom.

Regardless of what you think of these blasts from the past, they illustrate an important point: How we grade and how we think about academic dishonesty are deeply interconnected. As my beliefs and practices about grades have evolved over the 25 years since I started my career, my beliefs about academic dishonesty have changed significantly. If you’re using alternative grading or thinking about it, perhaps those beliefs are changing for you as well.

That was then

The definition of academic dishonesty that I am using (and in my mind have always used) is from our student handbook: It is “any action or behavior that misrepresents one’s contributions to or the results of any scholarly product submitted for credit, evaluation, or dissemination.” This includes cheating, collusion, dual submission, falsification, and plagiarism — or enabling other students to engage in these.

Up until a few years ago, I used traditional grading 100% of the time. My view of grades as “really a question of economics”, while cringe-worthy, is perfectly rational in that context. In traditional usage, grading consists of a system of points, incentives, and transactions — driven by scarcity, supply, and demand. It’s a little economy, living there in my classes.

If grades and grading are a form of economics, then academic dishonesty is a form of fraud. It’s cheating the rules of the system to get something for yourself that you didn’t earn. This may not be a wrong way to think about it in any case. But if you take the economic point of view on grades, academic dishonesty is not just undesirable, it’s illegal, something to be prosecuted. And in fact, in my old syllabi I used to write about how I’d “prosecute” instances of academic dishonesty when (not if!) I discovered them.

In the past, I pursued academic dishonesty with the zeal of an inquisitor. I didn’t want to see academic dishonesty in my classes. But if you committed academic dishonesty in my classes, then God help you. Because I would spend hours on Google hunting down the precise Stack Exchange post, the exact paragraph on the exact page of the PDF posted elsewhere where the answer was copied. And then I would do the following:

I would call the offending student (or one student at a time if there was a group involved) into my office and politely ask them to talk about their work — and listen like a prosecuting attorney ready to pounce on an inconsistency.
At some point, I would make a clever and dramatic shift along the lines of: “I had something else to discuss with you about this work. Here’s your work. [Lay out the student’s work.] And here’s [another student’s work | a Wikipedia article | a website | whatever]. These are very similar as you can see. Can you give me some context for what happened here?” I used to call this “the reveal”, like they do on those house-makeover shows.
Then I would expect an answer, specifically either (1) a “confession” (that legal language again), (2) a legitimate explanation, or (3) a BS explanation.
Then I would take all this information and administer a penalty I thought would be fair under the circumstances. Sometimes it was no penalty. But often what would go through my mind is that I wanted to give the harshest penalty that I could justify.

Those days were not happy ones. I had academic dishonesty issues at least once a semester, often more. Sometimes those cases were fraught; at least one of them involved a parent showing up to my office with a gun. It’s difficult to know which came first, the unhappiness or the hard-line stance I was taking.

But even when the situation was clear-cut and someone who blatantly broke the rules got their just deserts, it wasn’t satisfying for long. Because what exactly did it accomplish? Did I have any evidence that this made the student any better at math, or at life? The answer is no. I can’t even prove that the students who were caught and punished ended up avoiding academic dishonesty in the future (that they’d learned their lesson). Indeed some did go on to do it again.

This unhappy and unproductive state is all because of points and the economies we create around them. I’m not sure how we instructors are supposed to expect students not to be tempted into academic dishonesty, when we monetize academic success and create artificial scarcity around it.

This is now

But if you think of grading as a means for growth, then you will take a very different path. The best way I can explain this is with a real-life example that happened to me recently, with a student in a class where I was using specifications grading.

My student had turned in a problem set with one solution blatantly copied and pasted from a Stack Exchange post. Literally, the only thing the student did was Control-C the work and Control-V it into their submission. The terminology was weird, there were mathematical symbols on the page that didn’t render in the student’s Word document… not one pixel of that webpage was processed in the student’s brain. It was the perfect cut-and-dried case that, circa 2008, would have had me buzzing with the anticipation of catching a cheater red-handed and suplexing them with the Student Code.

But then I started thinking about a couple of things.

First was the four pillars of alternative grading:

I thought, What’s the point of my grading system or of any grading system? The Four Pillars here don’t say it explicitly, but the point is growth. Whatever grade I give to student work or at the end of a course, it needs to signify and promote growth. Thinking about this (comically amateurish) picture made me think differently about the student’s situation. Rather than take sick pleasure in “prosecuting” an open-and-shut case of academic dishonesty, maybe this could be used as an opportunity to help the student learn something? To grow?

I also thought a lot about my friend Spencer Bagley, who on Twitter has said a lot about what he calls cop shit — the kinds of Law-and-Order cosplay that professors often engage in when they are out to “prosecute” something. This tweet in particular, and the thread it belongs to, stood out for me:

The whole reason why I, a person who fundamentally does not believe in cop shit, still want to check a student about """cheating""" is that it is work that does not produce learning. My whole thing is learning, and """cheating""" short-circuits this process. (10/n)
— Dr Spencer Bagley 🏳️‍🌈 (@sbagley) September 18, 2021

My university has an extensive policy for handling academic misconduct, that starts off with this:

If an instructor suspects any instance of academic misconduct, the instructor must notify and meet with the student to discuss the incident. Based on the outcome of that meeting, the instructor may find there was no act of academic misconduct and take no further action. If the instructor finds there was an act of academic misconduct and the instructor would like for corrective action to be taken, the instructor must report the matter to [the office that handles it] with sufficient evidence to substantiate their finding, and with a recommendation for a corrective action as listed below:

Following this is a menu of corrective actions that I could recommend, ranging from simply requiring the student to redo the work to imposing a failing grade for the course. Without going into details, once I make a recommendation, it’s not the end but rather the beginning of a process involving an independent facilitator, who is not a faculty member, to assess (and, if needed, adjust) my recommendation. And if the facilitator determines there was indeed academic misconduct but the student “does not accept responsibility”, the whole matter goes to a hearing involving multiple layers of review boards.

This policy is well intentioned and, in general, provides students with valuable protections against faculty members who were like me in 2008 and just want to impose the harshest penalty that can be justified, and get on with our lives. I thought carefully about how to follow this policy. In the end, I determined that there was academic misconduct, but I did not want “corrective action” to be taken. Instead, I just wanted to treat the student’s work like any other solution that was submitted without a sound explanation: I wanted to sit down alongside the student — to “assess” them — and get them to provide a better explanation. I wanted them to grow.

So instead, I did this:

I emailed the student with a link to the Stack Exchange post they copied and asked them to please come to the office to talk about it. When they did, I explained that they needed to revise the work and explain everything fully, right here and now.
Over the next half hour, I had the student go up to my whiteboard, and I coached them through a solution attempt (somewhat based on what was in the Stack Exchange post). It was a long way from me telling the student what to think. The student was having “aha” moments fast and furious. Oh, so THAT’s where the “n-3” term comes from in their formula. And so on.
When done, what I wanted from the student was for them to write up a complete and correct solution with everything explained in full detail, and no more copying! If they had any questions, they were to ask me, and not go hunting through the internet.
As the student was leaving, they said: So you’d rather see me learn stuff than turn me in to the Dean? And I said, yeah, basically.

The student ended up turning in good work on that problem set and likewise through the rest of the semester.

The moral of the story

When I first posted this story on Twitter, it got a lot of likes. I’m not interested in Fake Internet Points, however, and I don’t bring this up to pat myself on the back for being a good teacher or some kind of hero.

It also got a lot of pointed questions, such as these:

How do you know the student didn’t go on to tell everyone in the class how to get away with academic misconduct? I don’t. They may well have, and probably did. But, if being able to finally explain the solution to a problem so that they clearly demonstrate internalized understanding of the solution is “getting away with it”, then I suppose my entire career is all about helping students “get away with it”.
What if the student had committed academic dishonesty before and gotten caught, thereby deserving a harsher punishment? It’s possible. At our university, that’s not up to the professor to determine because FERPA would keep me from the student’s past records. But even if I did know that this wasn’t the student’s first rodeo, that wouldn’t change my thinking. Their growth is still, by far, more important than making sure they get their “just deserts”. I am just not ready to believe that a student who is willing to revise their thinking, learn, and grow deserves to be given an F in a course or kicked out of school. In fact that seems to be the exact opposite of what “school” is for.

The reason I bring this episode up is to drive home a couple of important points for all of us as we rethink grading.

First, our view of academic dishonesty — really of the entire process of student work — is deeply affected by how we view the purpose of grades. Maybe this is obvious. But I never really realized how a transactional, scarcity-driven approach to evaluation and grading can create a learning environment that has more in common with a police state than a classroom. Conversely, when you start thinking of student growth as the primary purpose of grades and switch from a scarcity model to an abundance model, then academic dishonesty can still be wrong, but for very different reasons than before. It’s wrong now because it inhibits student growth; and the path you take to deal with it is to find a way to inject growth into the process.

Second, you can change over time. As my early blog posts show, we are all works in progress in a constant state of revision — a lot like the work we have students revise in alternative grading systems. The longer I teach, the less simple it all seems, and the more I believe there’s much more to students and their work than a simple reduction into points and formulas can possibly indicate.

I’m not suggesting that all of us using alternative grading should turn a blind eye to academic dishonesty. Far from it! We need to address it head-on, because despite the annoying piety of my early blog posts, academic dishonesty is a breach of trust and it needs to be addressed in order for growth to occur. Your university may have ironclad procedures about how to address it, so read the footnotes in this post. But if or when you have to address it in your classes, think about the following questions:

What manner of addressing academic dishonesty leads to the greatest amount of student growth?
How can I leverage the grading policy (specs, ungrading, etc.), especially my revision policies, to prioritize student growth over punitive measures?
What conversations do I need to have with my students about the grading, and especially the revision policies in the course that ought to make academic dishonesty undesirable in the first place?
What do I need to change about my own teaching and relationships with students to help them choose honest growth over dishonest gain?

Bonus extra thoughts

I put the following as an important footnote in the original (this blog platform doesn't play nice with footnotes): I mention our academic misconduct policy not to criticize it, although I do have criticisms. I do it because faculty members need to think about their institution’s policies for academic misconduct before running off and doing what I describe here. In some universities, you may have significantly less freedom to interpret the rules. I certainly did not go the route I describe here until I was mostly certain that I wasn’t violating any significant faculty policies. I’m not generally a rules-follower regarding academic policies. But, when it comes to this kind of thing, those policies are usually there for a reason, and that reason often involves lawyers, and I’d rather not go there. So, as you read this, take some time to review your own institution’s policies and be clear on what you can do in this situation and distinguish it from what you must do.
A few people wanted to know more about the situation with the gun: This was at my previous institution with a different set of rules. I had two students in a class who I suspected of cheating on a homework assignment. At the end of a class meeting, I announced – in public! – that I needed to see the two students, calling them out by name, following class. At which point I had "the talk" that I described above. They denied any wrongdoing. I disagreed and submitted the incident to the dean. It turns out that the dad of one of the students was a police officer, and he showed up at my office hours unannounced, in uniform – including his police sidearm – to argue on behalf of his kid. Someone saw this happening, ran upstairs to the dean's office, and the dean came down and got into a shouting match with the dad about bringing a loaded gun on campus. The dad eventually left. So, this probably wouldn't have happened if I'd handled the situation better, and the dad was probably just swinging by while he was in the neighborhood and not with the purpose of threatening or intimidating. But... talk about "cop shit"!

Three steps for getting started with alternative grading

Robert Talbert — Thu, 30 Dec 2021 20:13:00 GMT

This article originally appeared at Grading for Growth, my blog about alternative grading practices that I co-author with my colleague David Clark. I post there every other Monday (David does the other Mondays). Check the end of this post for some extra thoughts that don't appear in the original.

Click here to subscribe and get Grading for Growth in your email inbox, free, once a week.

In the five months since David and I started this blog, one thing has become crystal clear to us about alternatives to traditional grading: There is a great hunger among educators to change how assessment and grading are done. We're closing in on 500 subscribers here at Grading for Growth, for which we're deeply grateful, but it goes much farther than just this blog. Everywhere we go, in person or online, David and I are finding that people in education have had enough of the failures of traditional grading and are ready to make the leap to something better.

Maybe you are one of those people, and you're ready to make the leap now. You've been doing traditional grading forever but, as was my case in 2014, something's finally clicked inside you, and you want to make a move to an alternative grading system, as soon as possible: starting next semester. Which starts in six weeks. And you're wondering, Where do I even start?

If that's your situation, then this article is for you. Here are three next steps to take, to get off to a great start transitioning to an alternative grading system.

Step 1: Start with learning objectives

The first concrete step toward transitioning to alternative grading systems is simple and will improve your courses, even if you end up changing your mind about switching grading systems: Write clear and measurable learning objectives for each module of your course.

A learning objective for a lesson (or module, etc.) is nothing more, or less, than an action that a student should be able to do to demonstrate that they have learned some important topic or concept in the course. For example, in my discrete structures course, being able to apply the binomial coefficient to solve counting problems is an important thing I want students to be able to do. So I turn it into a formal learning objective: I can compute a binomial coefficient and apply it to solve counting problems. If I were teaching a US History class and wanted students to be able to discuss the causes of the Great Depression, that might become a learning objective too: I can explain the causes of the Great Depression.

Learning objectives need to be clear and measurable in order to be of use to us or students. "Clear" means understandable from the student's perspective (not ours, although it should obviously be clear to us too). "Measurable" means there is some way of knowing whether or not the student can do the action that the objective describes. The "measurement" you administer might be something quantitative, or it might be something qualitative like an essay, a project, or a sit-down discussion with a student.

In fact, this is where learning objectives tie into grading: The "measurement" you give to students to gauge their attainment of a learning objective is an assessment, and what do we do with assessments? We grade them. In traditional grading, it might be the first and only time this measurement is taken. Alternative methods, on the other hand, use assessments as opportunities to engage in a virtuous feedback loop that eventually leads to real understanding, a.k.a. meeting the learning objective. A set of clear and measurable learning objectives is therefore the skeleton of all effective grading systems. Without those objectives, our assessments may not align with what we want to measure, and the feedback loop goes nowhere.

So that's why I say, start with the learning objectives. In the next couple of weeks, set aside an hour or two to look through what you want to do in your courses for next semester. Break the whole course into smaller modules. (If you're using a textbook, you can just use the chapters and sections you plan to cover.) For each of those modules, write a list of clear, measurable learning objectives that captures all the important items you want students to learn. (Note that word important. As David mentioned last time, resist the temptation to cram too many things into a course. Cut stuff out now, before things start!)

Then you can work backwards from there: With the list of learning objectives in hand, you can then ask, What evidence might students give that would demonstrate sufficient understanding of the objective? Next ask: What activities or assessments can students do to provide that evidence? This is where you can start thinking about how exactly you intend to assess and grade those activities and assessments. But again, it starts with the learning objectives; and it's a good thing to do regardless of your grading preferences. And getting that done now will make life much easier for you later.

I've written much more about the importance of learning objectives if you want to dive deeper. This article gives a tutorial on how to write good learning objectives; this one explains why good learning objectives are essential for good online and hybrid courses, and this one shows how I wrote the learning objectives for my Calculus 1 class in Fall 2020 (which used specifications grading).

Step 2: Build a professional network of alternative graders

The second step is not like the first one. Transitioning to an alternative grading system is exciting, but it's also challenging. The first time you use an alternative system, there will be times when you'll face difficulties: helping students understand your system and helping them onboard and buy in to the system; dealing with students who push back; loopholes in your system you overlooked and might need to change; questions you never considered and now need to answer. You need a support network of trusted colleagues to share ideas with, and lean on when things get tough. Building that network, and getting involved in a community of practice around alternative grading, is step 2.

Fortunately, there are several ways to go about building your support network:

The Alternative Grading Slack workspace, which David and I help administer, is an online community of practice with hundreds of practitioners from all over the planet and from all walks of academic life who practice, or are curious about, alternative grading. It costs nothing, and it's a strictly judgment-free zone for connecting with others and sharing ideas, materials, and questions.
If your college or university has a teaching/learning center, check with them to get connected with resources and people on your campus and with professional organizations. Many of these centers will help you set up faculty learning communities where you and some of your colleagues can band together for a semester and do alternative grading together. If you don't have a teaching/learning center, just start asking around. You might be surprised at how many people with whom you work may be interested in this.
Many disciplines have professional organizations, or branches of those, that focus on teaching and learning. The Mathematical Association of America is one example. Within those organizations might be colleagues at other institutions, but in your discipline, who either practice alternative grading or would be interested in doing so.

And don't forget to talk to your department chair and/or your dean about what you are thinking about doing next semester. You'll want to explain to them what your goals are and what your approach will be, to get them in your corner from the beginning. The first time they learn about your efforts should not be when a student goes to them to complain about it. Rather, you'd like them to be allies in your efforts.

Step 3: KEEP IT SIMPLE

The third and final step to get started with alternative grading goes back to your course design process. The first time you do alternative grading, you may be tempted to craft a complex system that covers every possible situation in the course. Don't. Instead, keep the whole thing as simple as humanly possible.

When I made the leap to specifications grading, I built a grading system that I felt really worked well, and, in my mind at the time, it seemed simple. It was fine --- except there were 68 (!) learning objectives, split into three groups (some of which overlapped!) with multiple assessment types on each. For all its benefits, which were numerous, my first-attempt grading system was kind of a disaster because it was so complicated.

What I learned from that experience is that in every system of grading, there's a tradeoff between accuracy on the one hand (the extent to which grades are valid measurements of learning) and simplicity on the other. It’s hard to hit that sweet spot where accuracy and simplicity are both maximized. If you must err on one side or the other, I believe it’s better to make your system simple rather than to go overboard trying to make it 100% accurate.

I also believe a simple system is more likely to be accurate than a complex one, because students will understand it better, therefore will be more relaxed, therefore will do better work in most cases. A clearer set of targets is simply easier to hit.

So while you should put in the work to make a grading system for your courses that is based on sound principles of learning, that produces valid information about what students are learning, and that helps students grow in their learning — you should also not stress out over making it perfect, because it won’t be. Instead, just keep it simple and make it something that you and your students will actually like using.

You don't really even need to aim for a full-on alternative grading system next semester if it seems like too much to take on. Remember the four pillars of alternative grading. Even if you can just address one or two of those pillars, you are making major progress and doing great things for students. For example, you could implement a scaled-back approach and just focus on having clearly defined standards and allowing reattempts on assessments without penalty --- even if you stick with using points, and can't yet commit to providing lots of feedback on student work.

If you're ready to make the jump to an alternative grading system, realize that it's entirely doable! In just a few short weeks, you'll be helping your students grow and wondering how you ever did grading differently. It's not easy and you'll need help along the way. I do think these three steps will get you moving in the right direction. You can do it!

Bonus extra thoughts

Do clear and measurable learning objectives kill a spirit of free inquiry in a class? Some people think so (example, another example), but I disagree. Absolutely, you can over-program a class so that the course becomes more focused on standards than people. But that's not the fault of the objectives. When it happens, it's our fault (or the fault of administrators who impose cookie-cutter standards from the outside in). It's completely avoidable and completely possible to have a course that embraces the full learning process in all its messy humanity, and yet provides clear guidelines that communicate to students what they should be learning and how we as instructors plan to tell if and to what extent they've learned it. See this older post for more.
A point I made here that needs signal boosting: Alternative grading is not a religion and you do not have to "convert" fully to it in order to explore it. Right now it's about two weeks before the start of people's semesters. It's probably not the best time to make wholesale changes to how you grade starting from zero. So just realize, any step you make, no matter how small, toward the four pillars (or a subset of those) is good. Even if you just commit to, say, providing clear and measurable learning objectives in your course, it's a great step and you'll be part of the solution.
Also important: The importance of simplicity. I have a quote from Leonardo da Vinci in my office: Simplicity is the ultimate sophistication. That goes for your course structure and grading as much as it does anything else.

A word about words

Robert Talbert — Fri, 19 Nov 2021 17:40:48 GMT

This is a repost from Grading for Growth, my blog about alternative grading practices that I co-author with my colleague David Clark. I post there every other Monday (David does the other Mondays) and usually repost here the next day.

I probably put in more time and work on this particular post than anything I've done for a blog before. I have some additional thoughts on this, not found in the original, at the bottom.

Words have power. If David and I didn’t think so, we wouldn’t be writing a book or this blog to advocate for changes in the way we grade in higher education. Over the last couple of years, we’ve been growing a global community of practice around alternative grading through our teaching, writing, and collaborations with others on the annual Grading Conference and the Mastery Grading Slack. Along the way, we’ve used a common umbrella term to refer to the alternative grading practices used in all of those places: mastery grading. It’s been handy to have a single term to refer to all of those practices. But there’s also been a growing realization that this term has issues that we can’t avoid any longer.

Today, David and I are making a choice to end the use of the term “mastery grading” to refer to alternative grading practices. We will not be using it in our book except as a historical reference, and we will be moving away from the use of that term in our own practice and in places where we’re involved, like the Slack workspace and the conference, where it’s currently used. In this post, we want to explain why, and give you some tools to think about the issues for yourselves.

The origins of the term "mastery grading" are (to us) unclear. But we have used it freely in the past with a clear intended meaning: grading practices that are not the traditional points-based, "one-and-done" assessments we're familiar with, but those which promote growth and individual mastery with concepts in a course. Whereas traditional grading audits a student's abilities and leaves it at that, the term "mastery grading" is intended to point toward grading philosophies that promote continued development of those abilities so that they are eventually honed into comprehensive skill.

As terminology, “mastery grading” has done good work for us as we have sought to address the pressing issues bound up with traditional grading systems. It has economically described a more valid and more just set of grading practices with the growth of the student in mind. It conveys the idea that all students have the ability — and should have the opportunity — to grow and develop comprehensive skill in what they are studying, by turning short-term failures into fuel for a feedback loop that eventually leads to success.

We still believe this. However, as the idea of “mastery grading” has evolved and its profile has risen, we have more and more frequently run into three major issues with that term. These three issues are why we are deciding to move on from this term.

Issue #1: Confusion with other ideas that use the word "mastery"

As alternative grading practices have received more and more awareness, we've seen “mastery grading” become confused or conflated with other pedagogical ideas that also use the word "mastery": mastery learning, flipped mastery, and mastery-based testing. David wrote about mastery-based testing in this post. You can read about the others at the links. They are all ideas with merit, and they have some roots in common with mastery grading. But, none of them are what we mean by “mastery grading”.

As alternative approaches to education across the board gain traction, the confusion between all of these is growing. For example, I (Robert) was contacted recently to give a workshop on mastery learning by someone who had been reading this blog — even though I have never practiced mastery learning personally. I felt bad for wasting the other person’s time, and I wonder how much confusion is caused by the names being so similar. So we are no longer using “mastery grading” partially to end that confusion.

Issue #2: Issues with expectations and growth mindset

The more we use alternative grading practices in our own classes and talk to others who use them in theirs, the more we think the word "mastery" is too strong of a word to describe what we're after. Of course we do want students to master the material they are learning — eventually. But in reality, true mastery of a subject is something that often takes a lifetime to achieve, and it can look like different things. Inside a single course, really what we want is to set students up for mastery in the future by giving them a solid foundation of skill and the tools to keep learning. “Mastery” is not an accurate description of what we really expect.

And in fact, if we do say a student has “mastered” a concept, this description doesn’t invite further growth or reflection. Once you've mastered a skill, where can you go from there? It actually seems more productive and more conducive to a growth mindset to say, to a successful learner, that they have begun the journey toward mastery but that there’s much left to explore. Calling a learner’s skill “mastery” prematurely, even if they finish strong in our courses, might actually work against what we are really hoping for: Growth.

Issue #3: Connections with slavery and systemic racism

There are two similar, but very different connotations of the word "mastery" in English usage.

One of those is "comprehensive skill" of a human being in a task or a subject of study. One "masters" proof by mathematical induction or the bass guitar, for example. This is found in common academic concepts like the “Master’s degree”, mastery learning, and others. In this sense, “mastery” is a deeply human word that encapsulates the dignity and abilities of every learner. This is the sense that we have always intended with “mastery grading”.

But there is another connotation: the domination of one human being over another. This connotation directly leads to, and comes directly from, the concept of slavery and all other places where a person is the “master” of another person. It is an odious and inhuman concept, the very opposite of what we hope to achieve with “mastery grading”. Sadly, this second meaning has found its way into common technical uses: the term “master bedroom” in real estate, for example, or the idea of “master/slave” in computing and other technical fields. Over the last few years, the use of the word “master” in those contexts has been stopped; most realtors now call it the “primary” bedroom instead, for example.

We were first made aware of objections to the term “mastery” in “mastery grading” on these grounds in summer 2020, during the inaugural Mastery Grading Conference. Confronting systemic racism was fresh on the minds of everyone that summer. I remember someone posting on the Slack channel during the conference, asking when we (the organizers) were going to deal with the connotations to slavery that “mastery grading” carried. To be honest, it was the first time this idea had ever crossed my mind, and I was dismissive of it.

But those questions have come and gone in waves over the 18 months since that conference, from people whose sincerity and commitment to learners is beyond reproach. Those questions demand to be taken seriously. In response, the Mastery Grading Conference was renamed to “The Grading Conference”. But as David and I have worked on early drafts for our book, we’ve slowly realized that we need to do more to come to terms (so to speak) with these points.

To help us get outside our own biases and bubbles, we reached out to over a dozen professionals in higher education — some in our personal networks, some who are in the networks of those people — all of whose judgment we trust, and who have strong diversity/equity/inclusion bona fides. We asked them: What should we do about this term, mastery grading?

Every one of the people we reached out to said we should get rid of the term “mastery grading”. They all brought up the three issues I have outlined above, as well as the point that words matter. While for some, the term “mastery” points directly to the idea of “comprehensive skill”, it can also be a constant reminder of the horrors of slavery and the continuing impact of systemic racism.

So we are dropping the term.

We are not doing it to score Fake Internet Points, or to pat ourselves on the back for being progressive, or any such thing. We want something more: to help create a setting where learners and instructors can all thrive and engage their humanness to the greatest degree.

So the replacement term is…

Just kidding, actually there isn’t one.

As issues with the term “mastery grading” have persisted, people everywhere have tried to suggest a replacement name. I won’t try to list the ones we’ve heard or seen because there are too many, and as I wrote here, terminology can be a distractor from the urgent issues at hand. There is definitely the temptation to spend more time and energy figuring out a catchy name for all these grading practices, than is spent on actually doing the practices and iterating on them to make them better.

So, for now, we make no attempt to invent a new term to take the place of "mastery grading". We have tried, with the help of many others, and none of the proposed replacements captures the essence of what we mean without giving up some other essential aspect or introducing new cultural or linguistic baggage we'd rather not have. So we will typically refer to "alternative grading practices", the specific flavors of which do have distinct names like "standards-based grading", "specifications grading", and so on. And we will focus on the common concepts that drive all of these, rather than trying to come up with a common name. You are free to call it whatever you wish; if something catches on, please let us know.

Other thoughts on this for rtalbert.org readers:

You might ask, "Why was this such a hard thing to do? Why didn't you just drop the term at the first sign of problems?" It's a fair question. The answer is that there are few things in the world I dislike more than performative virtue-signaling designed to earn Fake Internet Points. I was not convinced, at all, in the beginning that dropping the word "mastery" from "mastery grading" was anything but this kind of fake concern. To be clear, I am deeply interested in social justice – as a Catholic and therefore someone who subscribes to the Catholic social doctrines of solidarity and subsidiarity and simply as a human being living in the world. I am not interested at all in getting retweets or appearing woke. Unless and until I was convinced that a terminology change was going to do real work to make the world more just, I was not going to be on board with it. I must admit I am still not fully convinced. But, I am convinced that most of the people who care deeply about this issue are sincere; and I am also convinced that the issue of connotations to slavery when put together with the other two issues I mentioned definitely is enough to warrant a change.
And it took me this long to get to that point, and it wouldn't have happened without the help of colleagues who could help me understand the problem. I think that's an important lesson.
As I said, David and I are conspicuously avoiding having a naming contest. But if I had to propose a different name here... I have been liking the term iterative grading for an umbrella term, since all the grading systems we think and write about have that in common — iteration through a feedback loop.

Checking in with the system at mid-semester

Robert Talbert — Tue, 19 Oct 2021 17:19:57 GMT

This is a repost from Grading for Growth, my blog about alternative grading practices that I co-author with my colleague David Clark. I post there every other Monday (David does the other Mondays) and usually repost here the next day.

Last week David wrote about how to handle mid-semester evaluation of students. It turns out professors can be evaluated at mid-semester as well, and in fact I’ve written elsewhere that doing so is a really good idea. In addition to giving my own brand of frequent feedback surveys to my Discrete Structures for Computer Science students at three-week intervals this semester, last week I had a Mid-Semester Interview on Teaching (“MIT”) conducted by our Faculty Teaching and Learning Center.

Because David and I are not just pundits but actual instructors working out the day-to-day details of alternative grading systems with actual students — I thought it might be helpful, especially for newbies or those who are curious, to see what my students are actually thinking about my grading system and how I intend to make halftime adjustments to amplify what’s working and adapt to address their concerns.

What is an MIT?

In an MIT, a trained facilitator comes in, kicks me out of the room, then gets students into groups and asks them to respond individually to two questions:

What are the major strengths of this course? What is helping you learn?
What changes would you make in this course to assist you in learning?

Then students get into small groups and share their answers together and on the whiteboard. Then there’s a full-class discussion about what they said.

I find these MITs to be the most valuable thing I can do to see what’s really working in my classes and to surface potential problems. It’s especially helpful for students, since the discussions that take place tend to put their concerns in context. A student might think that “lots of people” are concerned about something in the class, but after discussions, it turns out that it was just a couple of people. And the results come early enough that I can make meaningful change in the course. MITs plus four rounds of my five-question summary data make course evaluations basically obsolete. If your institution offers such a thing, schedule one. If not, find a colleague and trade off doing DIY MITs for each other.

There was a lot to talk about with my class, not just the grading system, since I use a flipped learning approach and do some borderline-crazy things like teach students how to code in Python as part of the course. For this post, I’ll address only the stuff that students said about the grading system.

Recap of the system

Here’s my syllabus for the course. The grading system starts on page 3. In a nutshell:

There are three main kinds of assignments in the course: Daily Prep which contains recorded lectures and pre-class exercises used in the flipped structure; Learning Targets which are basic skills, assessed in three possible ways (quizzes, oral quizzes in office hours, or videos); and Weekly Challenges consisting of application and extension problems. Here’s the list of Learning Targets for the course. Here’s a sample Daily Prep, a sample Learning Target quiz, and a sample Weekly Challenge.
All of these are graded “Satisfactory/Unsatisfactory” using specifications that I set up in advance. For what follows, note that Weekly Challenges consist of multiple linked problems and the entire thing is graded as one unit, rather than each problem being graded separately. Learning Target quizzes are the opposite — each quiz contains a separate problem for each Learning Target, and students pick and choose which ones they want to try at each quiz (given once every two weeks, and more frequently than that near the end of the semester). Then each problem/target is graded separately.
Of the 20 Learning Targets in the course, eight of them are listed as Core targets which means they are the essential core skills of the class. By doing work on quizzes, or oral exams in the office, or videos — or some mixture of these — students demonstrate skill on the targets. Demonstrating skill on a Learning Target on two separate occasions means the student is fluent on that target.
To earn a course grade, students level up through various thresholds of accomplishments, summarized in this table from the syllabus:

Daily Preps do actually use points, a concession I made this time around to keep things simple. Each one is worth two points based on the outcome of pre-class work and the outcome of a group quiz at the start of class. There are 27 Daily Preps (total of 54 points possible); 20 Learning Targets, 8 of which are Core; and 10 Weekly Challenges.

There is also a system of tokens, a common feature of specifications grading. These are fake currency that can be used to bend the course rules. For example any deadline can be extended 24 hours by spending a token. Everyone starts with 5 of these and there are occasional opportunities to earn more.

And there is a system of revision and resubmission for everything other than Daily Prep. Learning Targets can be done and redone as many times as needed in any combination of the three methods (paper quiz, oral quiz, or video) students want. Weekly Challenges can be revised and resubmitted as often as needed — up to two revisions per week (three if you spend a token), and subject to last-chance deadlines for the early Weekly Challenges. Nothing is penalized; if you do work that isn’t up to specs, just do it again after studying the feedback.
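As a rough illustration of the bookkeeping described in this recap, here is a minimal Python sketch. It is not the gradebook actually used in the course: the class and method names (`StudentRecord`, `record_attempt`, `is_fluent`, `extend_deadline`) are hypothetical, and the sketch covers only the rules stated above (two successful demonstrations for fluency, five starting tokens, a 24-hour extension per token, 27 Daily Preps at two points each). The thresholds for course grades are left out because the syllabus table is not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Numbers taken from the recap above.
NUM_DAILY_PREPS = 27
POINTS_PER_DAILY_PREP = 2
STARTING_TOKENS = 5
DEMOS_FOR_FLUENCY = 2  # two separate successful demonstrations


@dataclass
class StudentRecord:
    """Hypothetical per-student record; not the author's actual system."""
    tokens: int = STARTING_TOKENS
    # Maps a Learning Target number to its count of successful demonstrations.
    demonstrations: dict[int, int] = field(default_factory=dict)

    def record_attempt(self, target: int, satisfactory: bool) -> None:
        """Log one attempt (quiz, oral quiz, or video); failures carry no penalty."""
        if satisfactory:
            self.demonstrations[target] = self.demonstrations.get(target, 0) + 1

    def is_fluent(self, target: int) -> bool:
        """A target counts as fluent after two successful demonstrations."""
        return self.demonstrations.get(target, 0) >= DEMOS_FOR_FLUENCY

    def extend_deadline(self, deadline: datetime) -> datetime:
        """Spend one token to push a deadline back by 24 hours."""
        if self.tokens < 1:
            raise ValueError("no tokens left to spend")
        self.tokens -= 1
        return deadline + timedelta(hours=24)


# Maximum Daily Prep points available in the semester.
max_daily_prep_points = NUM_DAILY_PREPS * POINTS_PER_DAILY_PREP
```

Note that an unsatisfactory attempt changes nothing in the record, which is the "nothing is penalized" rule expressed in code.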

Aside: What kind of grading is this?

One thing to point out parenthetically here is that I think my system here goes to show that mastery-based grading can be a mix-and-match of different approaches. What I have here is heavily influenced by specifications grading. But some of the actual grading, particularly on Weekly Challenges, is basically ungrading — I do put a grade on student work but only as a marker to let them know whether their work is “good enough yet” or not. The main action is in the feedback loop that takes place. And there are probably undertones of other grading approaches here as well. I don’t focus so much on the name of the thing I am doing, as much as on the underlying concepts of the thing itself. If you’re new, or curious, I think that can be quite freeing.

What students say about this after 7 weeks: The good

Here’s what my 24 students said helps their learning, specific to the grading system:

The fact they get many attempts at each assignment is, to them, “incredibly helpful”.
They also found that feedback helps them. In some ways that seems like a funny and obvious thing to say — I mean, of course feedback helps, doesn’t it? How could it hurt? But then I think about the many times that I, as a student, got feedback that was not only unhelpful but downright mean spirited — or the many more times that I got nothing at all but a number or a letter.
They also said that I give “fairly precise feedback”. When pressed on what “fairly precise” means and how I might give more precise feedback, nobody offered any suggestions. But actually I think “fairly precise” is the right thing to aim for. Feedback should be specific, but it also shouldn’t tell students exactly what they need to do next, at the level of “Change this sentence to say this instead…” My feedback tends to point out the exact spot where more work needs to be done and then give questions to think about.
On that last point, students suggested that a face-to-face meeting to discuss feedback might be even more helpful. I couldn’t agree more, and that’s why I have three office hours a week. Sounds like I need to be clearer with students that they have the right to ask for this kind of feedback if it’s really helpful.
Finally, students overall liked that the grading of Learning Target problems allows for two “stupid” (the specs call them “simple”) mistakes; that the entire system requires fluency (students like high standards if they are supported well); that it takes the stress off of grades and places it on learning instead; and generally all students feel it’s a better approach than traditional grading.

What students say about this system after 7 weeks: The not-as-good

But it’s not all sunshine and unicorns:

Approximately half the class indicated frustration with the grading system. It seems to be coming from various places.
Some wished for partial credit on Weekly Challenges. Others expressed concern about snowballing workloads as they are trying to revise old Weekly Challenges while new Weekly Challenges are coming in.
Students expressed that “forcing 100% correctness” doesn’t give “much leeway” in demonstrating skill.
And the Weekly Challenges, which focus not so much on computation but on problem solving, reasoning, and proof, are quite difficult because students are required to use general mathematical reasoning rather than just doing computations.
Finally, students were a little confused on tokens. What are you supposed to do with them? Will we have chances to earn more?

I think all of these are legitimate concerns, and I probably would have some of these too if I were a student in my class. One thing I notice is that many of the concerns are not about the system itself but about how clearly (or not) the system is explained. For example, it’s not the case that “100% correctness” is required on anything and students even express that they know this because of the “stupid mistake” allowance. But the binary grading scale of Satisfactory/Unsatisfactory might lead a person to think so — the common interpretation of that scale is “all or nothing”. And the use of tokens and the opportunities for earning more are listed in the syllabus — do students know it’s there? For these concerns, I need to be clearer in my explanations to students and I will probably take 10-15 minutes this week to address those.

I also need to do better with explaining the “why” behind some of the assignments. There’s a simple reason there’s no partial credit on Weekly Challenges: The entire assignment operates as a unit, like an essay. And as with an essay, you wouldn’t be “passed” if the intro and references are OK but the middle of the paper needs work. And the nature of Weekly Challenges needs to be better explained: These are not basic skills tests (that’s what the Learning Target quizzes are for) but opportunities to show one’s skill in overall mathematical reasoning. This is a core computer science skill, and my students have not often, or ever, been asked to work on it before. Explaining more clearly why they are being asked to do this week after week, including revisions, is on me.

Finally, I need to do better with helping my students manage the workload. Students raised a concern about snowballing workloads as past revisions collide with new assignments. But in fact, prior to this weekend, only about 15 revisions of previous Weekly Challenges had been submitted at all, over 48 students in two sections. So I wonder about that “snowball” effect. It’s definitely not coming from actual revisions. Maybe students are putting off revisions because of new work they have to turn in. If that’s the case, they should realize that most revisions are pretty minor and don’t take up a lot of time, and are very doable in a given week if you know how to budget time. Which is a very big if.

Finally, trust

When I was debriefing my MIT with the facilitator, the first thing she emphasized to me was actually not on the notes she took: Students expressed that they trust you. We could have stopped right there and I would have been very happy. When we talk about “getting buy-in”, we are really talking about trust. Making a system like this work with students requires that we earn their trust first (and I do mean “earn”, since students do not necessarily come into a class trusting professors). If students had expressed that they were on board with the grading system but didn’t fully trust me yet, that would be a red flag. But if they express trust, even if there are issues (and there are!), then we have a solid foundation for working those out.

And I think a core foundation for that trust is asking students what they think, taking what they say seriously, and making a good faith effort to adjust. After all, that’s exactly what we are asking them to do when we engage them in the kinds of feedback loops we are talking about in alternative grading systems.

Moving on from rigor

Robert Talbert — Tue, 21 Sep 2021 19:50:37 GMT

This is a repost from Grading for Growth. I post there every other Monday (my colleague David Clark does the other Mondays) and usually repost here the next day. This one is a follow-up to a post from last week that David and I co-authored, in which we argued that the concept of academic rigor has no inherent meaning and therefore we need to find a better term to describe whatever it is we are talking about when we say "rigor". This post starts where that one ends.

Later this week I'll be posting a follow-up to this follow-up that describes some more of my thoughts about rigor.

Last week we dove into the idea of rigor and decided that it’s not a useful term to describe what we’re looking for in learner assessments, because it has so many potential definitions that it has no definition at all. Instead, it tends to become a pathway for injecting our biases into our teaching. We ended that post on a cliffhanger: We have a proposal for a replacement term.

That term is validity.

As David and I have asked others about “rigor”, and as I’ve examined my own beliefs about it, it became clear that despite the issues with that term, there is some kind of shared conception of academic quality underneath the bravado of “rigorous academics”. It’s hard to get a fix on it, but it seems like what we really want when we talk about “rigor”, is that we can trust the outcomes of our grades. When we give a high grade to a student, we want this to mean that the student actually learned what the course said they would learn, and learned it “well” (whatever that means). This is the primary concern behind grade inflation, behind courses with “low standards”, and so on — that the grade assigned doesn’t accurately reflect the learning that took place, or didn’t take place.

Well, that’s what validity means. And the benefit of using “validity” to describe academic environments over “rigor” is that validity is a well-understood methodological concept from social science research that is ubiquitous in that field, and even has a huge body of research just studying itself. It’s everything “rigor” is not. (One might say it’s a more rigorous approach to rigor.)

Grading as research

Every assessment we give our students is a mini-experiment whose purpose is to collect data on the “research question” of whether they learned something. Like all experiments, they can be designed well or poorly. The “research question” has to be focused and clear — I can’t give an assessment on whether my students “learned about discrete mathematics”. But I can design an assessment about whether they learned about how to solve recurrence relations or whether they understand how to construct truth tables.

Like experiments, assessments are also subject to two kinds of error: Type I error where we encounter a false positive (the result of the assessment causes us to think that students really learned, but in fact they didn’t) and Type II error where we encounter a false negative (it looks like students didn’t learn, but they actually did). I’ve written before about how one-and-done testing greatly amplifies the probability of each kind of error, while non-traditional grading reduces it [1].

In either case, we have issues with validity. We may get data from these “experiments” but it leads to false conclusions. This is what all of us teaching in higher education, whether or not we’re converts to alternative grading, want to avoid. An “academically rigorous class” is one where true learning, or lack of it, is faithfully indicated by the grades that are assigned. Having valid assessments therefore is good common ground for discussions about grading while talking about “rigor” only seems divisive.
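To make the two kinds of error concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration, not data or a model from this post: it supposes that a student who has learned the material passes any single attempt with probability 0.8, that a student who has not learned it passes with probability 0.2, and that attempts are independent. It then compares a one-and-done assessment with a hypothetical policy that requires two successes within four attempts.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n independent attempts,
    each succeeding with probability p (binomial model)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Made-up per-attempt pass probabilities (assumptions, not measured values).
P_PASS_IF_LEARNED = 0.8      # a real learner can still have a bad day
P_PASS_IF_NOT_LEARNED = 0.2  # a non-learner can still get lucky


def error_rates(k: int, n: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) for a policy
    that requires k successful demonstrations within n attempts."""
    false_positive = prob_at_least(k, n, P_PASS_IF_NOT_LEARNED)
    false_negative = 1 - prob_at_least(k, n, P_PASS_IF_LEARNED)
    return false_positive, false_negative


one_and_done = error_rates(k=1, n=1)
two_of_four = error_rates(k=2, n=4)
```

Under these assumed numbers, both error rates come out lower for the two-of-four policy than for the single attempt. The model is deliberately crude: it ignores the fact that students learn from feedback between attempts, and requiring only one success over many attempts would raise the false positive rate, which is one reason to ask for more than one demonstration.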

The technical meaning of validity is wide-ranging and involves numerous flavors. I’m going to focus on two of those here.

Grading and construct validity

Construct validity is “the degree to which a test measures what it claims, or purports, to be measuring.” That word “test” in context means any kind of measurement; for us, it might literally be an in-class test or exam. But it really means assessment. So, do the assessments we give actually measure what we claim they measure? And how well?

But we need to back up first. What exactly are we claiming to measure when we give assessments?

I find it difficult to answer this question without using slippery words like “understanding” or “appreciation” or “knowledge”. But I still think this is correct: When we assess our learners, we want to see if they have “really learned” or “truly understood” something. Knowledge of how to solve recurrence relations; appreciation of how recursion can be used to model patterns; and so on. It’s exactly the opposite of what I preach about learning objectives. And I think this is OK, and one reason why teaching is hard. Our “construct” in learning is an abstraction like “knowledge” and “understanding”, while our assessments are how we concretely measure this abstract construct. We make clear, measurable learning objectives like “I can solve a recurrence relation” as an attempt to bridge the gap, to connect the test to the construct.

In fact, clear and measurable learning objectives are an essential piece of construct validity. If we are trying to measure a thing, then we have to clearly state what the thing is that we are trying to measure. If we write up assessments or assignments without linking those to criteria, then we’re attempting to access a pure abstraction — “know x”, “understand y”, etc. — and this is again ripe for bias and abuse.

There are several other ways to foul up the construct validity of an assessment or a grading scheme, including but not limited to:

Bias in the assessment itself. There’s an exercise in the Stewart Calculus book that reads: “Jason leaves Detroit at 2:00 PM and drives at a constant speed west along I-94. He passes Ann Arbor, 40 miles from Detroit, at 2:50 PM. Express the distance traveled in terms of the time elapsed.” I gave this exercise as homework once, and one of my international students replied, “It’s not possible to answer this question because we don’t know how fast Ann is traveling.” That’s a perfectly reasonable answer from someone who doesn’t realize “Ann Arbor” is a city, not a person. This exercise was intended to measure learners’ understanding of related rates problems, but what it really measured was their knowledge of Michigan geography. It had poor construct validity, in other words, because of the bias toward American citizens baked into the question.
Defining the criteria of the assessment too narrowly. This happens a lot when learning objectives are poorly defined. Overly narrow objectives can exclude a lot of relevant information, for example if we define success in solving recurrence relations (generally) as “I can use the characteristic root method to solve a linear homogeneous second-order recurrence relation”. It can also happen if the criteria are not aimed properly, for example testing for knowledge of solving recurrence relations by asking students to do something only tangentially related to this construct.
Presence of confounding variables. This might be the most prominent of all threats and the biggest issue with one-and-done testing. Using a one-and-done timed exam as an experiment to see if a student learned a topic is vulnerable to a vast number of confounding variables: physical health, mental health, whether the learner brought a calculator, whether the busses were running on time, whether the learner is a native English speaker or not, and on and on. The time constraint amplifies all these issues.

So, an assessment has good construct validity if it accurately measures the construct of learning/knowledge/understanding. And a “rigorous” course is one where the assessments, and the grading scheme itself, have good construct validity — the assessments actually measure “real learning” and aren’t just fluff or busy work. Likewise a “non-rigorous” course is one where you just can’t trust or believe the results of assessments, for example an abstract algebra class where proofs are never assessed and the course grade is just based on participation; or the proofs are assessed but only using word count, or some other means that don’t measure “real understanding”.

Grading and criterion validity

Criterion validity by contrast is “the extent to which an operationalization of a construct, such as a test, relates to, or predicts, a theoretical representation of the construct—the criterion.” Or as this book puts it, “[c]riterion validity compares responses to future performance or to those obtained from other, more well-established surveys.”

Criterion validity can be broken into further subcategories, but I’d like to dwell on the overall concept of predictive accuracy for now. An assessment that has good criterion validity is one that accurately predicts future performance under “real” conditions. I go back to my colleague’s question that I wrote about last month. That colleague was questioning whether the use of specifications grading in abstract algebra might produce students who can’t do work on their own without significant help. I think this is a question about the criterion validity of my assessments and my system. Sure, students can get good grades now on their proofs; but what about when they get into graduate school?

It was and still is a fair question. I’d turn it around and say we should be asking this question about every assessment and every syllabus we see. So for example, you might have a grading system where the grades are based on three tests, a final, and some timed quizzes (and there’s no retaking of any of these). Students and profs might like this because it’s uncomplicated. But I would have serious questions about criterion validity. A student earns a 92% on the final exam; great, but does this actually predict anything about future performance? A student got a “B” in the class by earning 80% on everything, and never getting a single exam/quiz problem totally right; does this accurately predict “above average” success in other contexts, like graduate school or a job?

An assessment has good criterion validity if it does predict future results well. And we think of a “rigorous” course as one where most or all of the assessments, and the grading system itself, have good criterion validity. A “non-rigorous” course would be one where sure, a student can get a good grade now, but it doesn’t translate into success in graduate school, on the job, or even the next course in the sequence.

Another way to view criterion validity is whether the assessment compares well to the same construct being measured by a “gold standard” assessment, one that is known to have good criterion validity already. The “gold standard” I hear kicked around most often in my discipline (mathematics) is the oral exam. Sure, your students can get good grades if they are allowed infinite reattempts, but if I sit them down and drill them in person, how will they hold up? I’m not sure we all mean the same thing by an “oral exam” (are we actually assessing student learning or just trying to make them squirm?) but again, I think it’s a question worth asking for all assessments and systems: If you took students who succeeded in your assessment and grading scheme and then submitted them to fair, unbiased oral examinations, how would they do?

Where we’re going with this

Let me put my cards on the table here. I honestly believe with all objectivity that validity rather than rigor is the correct framework for thinking about assessments and grading. “Do our assessments and grading schemes have validity?” is a better question than “Are they rigorous?” because validity has scientific meaning whereas rigor does not.

But I also honestly believe, from the heart, that alternative grading systems have much greater validity, no matter how you view it, than traditional systems. No, I do not have data yet to back this up [2]. I’ve spent almost 30 years doing both in multiple contexts. When I look back on it, I simply have much more trust in my specifications grading results than I do in my traditional grading results. The assessments in specs grading, being criterion-referenced, are much more likely to accurately measure the construct they are intended to measure, and the fact that they’re graded on the results of feedback-focused iteration makes them more believable as predictors of future results. Traditional grading, on the other hand, I never fully trusted, and that was a big reason why I ditched it when I did.

Ironically, if “rigor” really means “validity”, then this makes alternative-graded courses a lot more rigorous by definition than traditionally-graded courses. You have to work to overcome the lack of validity — the lack of rigor — inherent in traditional grading systems. And many of the efforts by a lot of self-important academics out there to “increase rigor” by simply making tests “harder”, imposing more restrictions on students, and so on actually make the course less rigorous because you lose trust in the validity of the assessments.

So let’s talk about validity instead of rigor from now on, and see where it leads us.

One last thing: While it’s not good form to hedge your bets when writing about something, I need to make clear that despite having some educational research under our belts, neither David nor I is actually a social science researcher, and we are presenting this idea of validity as people still learning the concept. (Maybe that’s obvious?) Any actual social science researchers out there are not only welcome to correct and clarify us; we encourage you to do so in the comments.

Finding common ground with grading systems

Robert Talbert — Tue, 31 Aug 2021 11:50:05 GMT

This is a repost from Grading for Growth. I post there every other Monday (my colleague David Clark does the other Mondays) and usually repost here the next day. Check out the bottom of this post for some additional thoughts that didn't appear in the original!

As David and I write and engage with others about grading, there’s definitely a sense that the time is coming, and maybe is already here, for a wholesale change in how we grade in higher education. When David wrote last week about the profusion of alternative grading techniques that are out there, I think the sheer variety signifies a deep and widespread desire to make this change. People are realizing that reforming assessment and grading can have outsized results in improving higher education as a whole. It’s one of those places where 20% of the effort will produce 80% of the results.

But the variety can also be overwhelming. Instructors might say, I want to change my grading practice, but should I go with specifications grading? Standards-based grading? Ungrading? Contract grading? Most real-life approaches to alternative grading don’t fit neatly into any of those boxes, and often none of these general categories will be a perfect fit to your students in your classes. And how are we supposed to keep up with all these terms? Do you have to be an expert even to get started?

It seems smarter to focus on the overall ideas that unify these different approaches. So this week, rather than introduce another kind of grading practice, we’re going to pull back to a higher altitude and try to distill what all these ideas have in common and come up with a general framework for these practices. Not a “definition” of anything (there are still too many idiosyncrasies and varied practices to hope for something that’s both precise and general) but instead a map, with room for interpretation, that stakes out some of the common ground that we seem to be walking together.

Common ground

Despite the differences in the ways that all these grading practices are worked out in real classrooms, what do they seem to have in common? Here’s what I see:

Student work is evaluated against clearly defined and context-appropriate standards for what constitutes “acceptable work”. In other words, the systems are rooted in students knowing what acceptable work looks like, using standards that are professionally appropriate but scaled to the level of the student. Standards-based grading and specifications grading are obviously built on this principle (just look at the names). Ungrading advocates might disagree (see Alfie Kohn’s famous essay “The Trouble with Rubrics”). But even when ungrading, although you might not use a concrete rubric, you are still making decisions about whether student work is “good enough” or not. Presumably those decisions aren’t just made by “gut feel” (which is one way of saying “personal bias”) but through standards that you, as a content expert, believe are appropriate for determining quality. In other words, we’re all using standards. Ethics and common decency would say we should externalize those and be up-front with students about it, and so that’s part of the system.
Student work, when evaluated, is given helpful, actionable feedback that the student can and should use to learn and improve their work. Feedback is the beating heart of all of these practices. Traditional grading looks at student work, assigns a number or a letter to it — and that’s all. It gives student work the silent treatment. In all these alternative practices, instead, the students’ work opens up a conversation and initiates a feedback loop.
Student work doesn’t have to receive a mark, but if it does, the mark is a progress indicator and not an arbitrary number. The alternative practices we’ve mentioned here all share the realization that marks, if given, are just at-a-glance summaries of what the feedback says — nothing more. They are there primarily for convenience and for entry into a gradebook. In particular, these grading practices do not pretend that numbers assigned to student work (75%, 8/10, etc.) are numerical data. They are not. They are categorical data disguised in numerical form, like zip codes, and the statistical contortions used by traditional grading to convert those numbers into letter grades are fundamentally irrelevant and merely give the illusion of objectivity. (“Objectivity theater” is how it’s been described.) It would probably be better to dispense with marks altogether, as ungrading typically does, given their tendency to distract and demotivate students. But if we must put marks in a gradebook, they should be informative. They should be informative categorical data rather than fake numerical data.
Students can revise, resubmit, or reattempt work without penalty, using the feedback they receive, until the standards are met or exceeded. All of these alternative frameworks are predicated on feedback loops. This seems to be their defining and essential ingredient. They don’t only have clear and appropriate standards and regular streams of feedback: They also allow students to combine their work, the standards, and the feedback and then try again. It’s in the trying again that grading turns into growth. And we don’t penalize this, because what kind of person penalizes growth?
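The claim in the third item above, that marks behave like categorical data rather than numerical data, can be illustrated with a short Python sketch. The zip codes and the list of marks are made-up examples, and the variable names are hypothetical; the sketch only shows that arithmetic on category labels runs without complaint yet tells you nothing, while counting categories gives a summary you can actually read.

```python
from collections import Counter
from statistics import mean

# Zip codes look numeric, but they are labels. Averaging them "works"
# arithmetically and produces a number that identifies no real place.
zip_codes = [49401, 49503, 10001]  # arbitrary example values
meaningless_average = mean(zip_codes)

# Marks treated as categories: summarize by counting, not by averaging.
marks = [
    "Satisfactory",
    "Unsatisfactory",
    "Satisfactory",
    "Satisfactory",
    "Unsatisfactory",
]
mark_counts = Counter(marks)  # how many pieces of work landed in each category
```

A count such as "three Satisfactory, two Unsatisfactory" is an at-a-glance summary of where the work stands, which is all a mark is meant to be in these systems.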

Not a definition

There is a temptation at this point to look to the four observations I’ve just made and turn them into a definition of a general category of grading, with a special name, of which SBG, specifications grading, etc. are all instances. (David and I are mathematicians, after all — abstraction is what we do.) But I am going to resist that temptation, and I think you should too, for two reasons.

First, definitions are exclusionary by nature. When you define a thing, you draw a line between instances of that thing and non-instances of it, and the “canonical” instances tend to receive pride of place. This is OK in some situations (e.g. defining terms in mathematics so you can meaningfully prove theorems about them) but in other situations, especially education, it tends to be highly counterproductive because it locks people out unnecessarily. If you’re thinking of instituting a grading system that involves a lot of feedback and revision, but for whatever reason you still want to assign points to things, you shouldn’t feel left out of this conversation or pressured to do things a different way because a definition said so. If you’re an ungrader and feel that some of the observations above don’t quite fit what you’re trying to accomplish, you should still feel welcome at the table and able to have a real conversation about student success with someone who does specifications grading.

Second, in my experience, definitions of educational ideas tend to derail people’s focus. I learned this when writing my flipped learning book. Flipped learning at the time needed an operational definition that made it possible for people to do research about it, and made it OK for instructors not to use video. So I came up with one; but a lot of faculty stopped asking good questions about flipped learning (What’s the best way to use class time if I’m not lecturing?) and instead focused on whether what they were doing was “real” flipped learning or not. So rather than give a definition of “Proficiency Grading” or “Awesome Grading” or whatever you might want to call it, let’s just not, for now, and focus instead on how best to do whatever it is we are describing here.

Four Pillars (beta version)

So we are setting up a big tent with a lot of room underneath for anybody who wants to think about the sort of grading approaches being described here. Stealing shamelessly from our friends in the IBL community, I’d like to close here by visualizing this “tent” as a building with four pillars.

(A graphic designer I am not.) As advertised, this is a beta version, not in any way guaranteed to be complete or even correct. In fact David has already informed me that I need to work on this some more. But that’s what the comment section is for, and anyway I think it’s more useful than a definition of a term.

In fact, what I hope is that in the near future, what we’re describing here won’t need a special term: it will just be “grading”, and grading using these practices will be so normative that it’s the departures from these practices that will need special terminology.

Some further thoughts:

About that definition of flipped learning: It was important to have some operational definition of the idea at the time, and I think it still is important now, because research was beginning to really ramp up on flipped learning but what people were actually studying was all over the map. In particular, there were emerging research definitions of flipped learning that insisted that students must watch video prior to group meetings, or else what's taking place isn't "really flipped". This was and still is misguided, but that didn't stop the idea from taking hold, even in one of the most cited early research reviews on flipped learning at the time. I don't think we're at that same point with alternative grading practices – yet.
As I noted in a footnote to the original article, in fact we have in places given this general concept a specific name: Mastery grading, or sometimes “mastery-based grading”. There are several issues with this term, none of which I am going to discuss here and now because every time it gets discussed it becomes politicized, which draws focus even further off the main point. Everybody wants to be the person who came up with "The Name" for this concept, but we're thinking way too hard about The Name and not nearly hard enough about how to explain the underlying idea, implement it, and make it work with students. So focus on that instead.
Regarding that last paragraph, credit where it's due: Sharona Krinsky, the main driver of the annual Grading Conference, is the one who’s said this the most about grading. I have said and still do say a very similar thing about flipped classrooms, that one day we'll just call it "the classroom". Again, maybe that day's arrived?
There may be another level of common ground to explore here, and that's the path that these kinds of grading systems share with the natural, human way of learning anything, in or outside of school. I met my Fall 2021 classes for the first time yesterday and asked them two questions: (1) How they got to be good at the thing they are best at doing, and (2) what they were excited, curious, or nervous about in the class. For the first question, as always happens, students pointed out without any prompting on my part that we learn things through mindful practice, informed by failure and feedback. But then, every single student said they were both curious and nervous about the grading system. Maybe there's room for both, but the first point ought to be used to alleviate the nervousness in the second. We use a grading system like this because it's how you've learned your whole life. It's not "new", "unusual", etc. — it's as old as human learning itself. It's just not how we've played school up to this point.

Mastery Grading - Robert Talbert, Ph.D.

Taming the snowball

This is real life

It’s ugly but not a bug

How do we help?

A growth-focused icebreaker

Two questions

What are you good at doing?

How did you get good at it?

What we learn from this exercise

Bonus thoughts

Grading for growth in an engineering math class: Part 1

What was the class?

My approach to the class

How did the class work?

How did students provide evidence of learning?

How individual work was graded

How course grades were assigned

What’s next

Updated thoughts

A media guide to ungrading

What is ungrading?

What are some misconceptions about ungrading?

What are some common questions about ungrading?

Updated thoughts

Building a specifications grading course, part 2

Assessments and marks

Feedback loops

Course grades

How it's going

If I were doing this over again

Building a specifications grading course, part 1

The big picture

Keeping the end in mind

A tale of two lists

Getting there

A stop/start/continue for the ungrading community

Terminology

Stop: Ungrading absolutism

Start: Getting into the weeds

Continue: Being fearless

Who was Horace Mann?

Education and grades in America, pre-1850

Horace Mann and the Prussian system

Bringing the Prussian system home

Takeaways

The heart of the loop: Reattempts without penalty

How it traditionally works

No penalties? Really?

How to reassess without penalty

Above all: Feedback loops

Bonus extra thoughts

How specifications grading changed my view of academic dishonesty

That was then

This is now

The moral of the story

Bonus extra thoughts

Three steps for getting started with alternative grading

Step 1: Start with learning objectives

Step 2: Build a professional network of alternative graders

Step 3: KEEP IT SIMPLE

A word about words

Issue #1: Confusion with related concepts

Issue #2: Issues with expectations and growth mindset

Issue #3: Connections with slavery and systemic racism

So the replacement term is…

Other thoughts on this for rtalbert.org readers:

Checking in with the system at mid-semester

What is an MIT?

Recap of the system

Aside: What kind of grading is this?

What students say about this after 7 weeks: The good

What students say about this system after 7 weeks: The not-as-good

Finally, trust

Moving on from rigor

Grading as research

Grading and construct validity

Grading and criterion validity

Where we’re going with this

Finding common ground with grading systems