*Welcome to another installment of the 4+1 Interview, where I track down someone doing cool and interesting things in math, technology, or education and ask them four questions along with a special bonus question at the end. This time I caught up with Kate Owens, a professor in the Department of Mathematics at the College of Charleston. Kate is an innovative and effective teacher whose work with students is well worth paying attention to, and she's someone I've enjoyed interacting with for several years on Twitter and elsewhere.*

*You can find more 4+1 interviews here.*

**1. What's your origin story? That is, how did you get into mathematics, what led you to earn a Ph.D. in the subject, and what led you to the College of Charleston?**

As a kid, I was often bored in math class at school because I didn’t find it particularly challenging or engaging. My dad has a Ph.D. in mathematics and he was always happy to give me new mathematical ideas to think about. In seventh grade, we were supposed to design posters featuring our favorite number, and I picked 43,252,003,274,489,856,000 -- the number of permutations of the Rubik’s cube. I had no idea how to solve the cube, but I was really interested in things like combinatorics and math that wasn’t the “boring stuff” they were making me do in algebra class.
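For the curious, that poster-worthy number can be verified with a quick computation. The standard counting argument (a known fact, not something from the interview) places and orients every piece freely, then divides by 12 because permutation parity links corners and edges and the last corner twist and edge flip are forced:

```python
from math import factorial

# Reachable configurations of a standard 3x3x3 Rubik's cube:
# the raw count 8! * 3^8 * 12! * 2^12 is cut by a factor of 12,
# which simplifies to 8! * 3^7 * 12! * 2^10.
positions = factorial(8) * 3**7 * factorial(12) * 2**10
print(positions)  # 43252003274489856000
```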

In high school, my plan was to study astrophysics or aerospace engineering. Inspired by images coming from the Hubble telescope, my dream job was to work for NASA. During my first few semesters of college, I was an astrophysics major. One day I realized that I was much happier in calculus than in physics; I spent most of my physics courses feeling confused. More than once I went to my calculus professor asking for physics insight. I got the sense that I spoke mathematician and not physicist, and I changed majors. Eventually I finished my degree in Pure Mathematics from U.C. San Diego. I decided to pursue graduate school in mathematics and I was accepted into the Mathematics Ph.D. program at the University of South Carolina. I finished my M.A. there in 2007 and completed my Ph.D. in 2009.

While in graduate school at the University of South Carolina, I fell in love with another graduate student. He finished his Ph.D. in 2007 and we married in 2009, right as I wrapped up my own dissertation. We spent a long time talking about how we could achieve both our family goals and our career goals, and eventually decided that we would follow his career path -- even if it meant giving up my own job search. My husband accepted a postdoc position in Texas; after a year, he transitioned to an industry job and we moved to Charleston, South Carolina. I had contacts from graduate school and spent a few years at the College of Charleston as a Visiting Assistant Professor before a permanent Instructor position became available. I’ve been teaching here since 2011.

**2. One of the innovations you've championed is the use of mastery-based grading. In your view, what is the purpose of mastery grading, and how well does it work with your students?**

Before I switched to mastery-based grading, I had concerns about how well grades correlated with student learning. Grades, even those given on assignments early in the semester, always seemed like a final judgment, since my students didn’t have a way to demonstrate growth in their understanding. I also realized that I couldn’t always diagnose knowledge gaps among my students; many students might earn the same grade on a test for very different reasons. After handing back their assignments, I wouldn’t know how to advise them on what topics to review or how to improve. I wanted my gradebook to reflect exactly what content a student knows right now, instead of what percentage of topics they knew at some point in the past.

Now that I’ve switched to mastery-based grading, my gradebook reflects what each student presently knows and what topics they still need to work on. Additionally, it gives me an overview about what the entire class knows already, what they’re still struggling with, and what ideas are most appropriate for us to tackle together next.

The reasons I switched to mastery-based grading are still there, but the two big reasons I won’t switch back to traditional grading are different. First, mastery-based grading has fundamentally changed the kinds of conversations I have with students. I no longer have conversations that begin with questions like, “Why did I get only 8 out of 13 points on this problem?” or “What percent do I need to make on Test 3 to have an average of 88% in the class?” Instead, conversations more often begin with things like, “I don’t understand how a quadratic equation can tell me whether its parabola has *x*-intercepts or not -- can you help?” Students are able to track what they’ve mastered and what they haven’t. Second, my system allows students to improve old scores, so they are incentivized to learn old material that they didn’t quite get the first time. I believe in the importance of having a growth mindset. Mastery-based grading is built on the belief that grades should reflect demonstrated knowledge and that providing many opportunities to demonstrate newly gained knowledge is important.

**3. College of Charleston is one of the oldest higher education institutions in the United States, founded in 1770. Have you perceived any tension between the history and tradition of the institution on the one hand, and your innovation in the classroom on the other? (If so, how do you make innovation work for you? If not, how does the culture of CofC support both tradition and innovation?)**

You’re right -- the College of Charleston is the 13th-oldest institution of higher education in the United States. We are a public liberal arts college with an undergraduate enrollment of around 11,000. The Math Department has over 30 faculty members, whose research areas encompass algebra, numerical methods, logic, number theory, statistics, and more. I believe that our differences in background, research, and instructional methods give us strength as a department. Because CofC is a liberal arts college, much of our mission is about delivering quality undergraduate instruction. Although each faculty member makes different choices in their courses, we have a supportive Department that allows each of us to make our own academic judgments about our courses.

In the Math Department, I’ve helped pilot a program turning traditional, lecture-based college algebra courses into emporium-style classes. In our program, students work only on topics they haven’t yet mastered, and they have the opportunity to get one-on-one help on a daily basis. Over the last several years, our data have shown that students are more successful in these college algebra courses than in the traditional format, both in terms of course grades and in their raw scores on our department-wide final exam. We are now researching longer-term trends in students’ paths through several linked courses (college algebra -> precalculus -> calculus I -> …), and we hope to find ways to raise student success throughout this course sequence. I’m also piloting an emporium-style approach in precalculus and gathering data about how it’s impacting students.

Outside of the Math Department, one way that CofC supports innovation is through our “Teaching and Learning Team (TLT) for Holistic Development” division. Part of TLT’s mission is to provide support and professional development for faculty interested in cultivating a culture of innovation on campus and in their courses. More than once, I have participated in Professional Learning Clubs about mastery-based grading. Each was both a reading group -- we read Linda Nilson’s book *Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time* -- and a support group, where we offered each other ideas about ways to implement mastery-based tasks or non-traditional grading schemes in our courses. I’ve also been a panelist talking about mastery-based grading at TLTCon, CofC’s “Teaching, Learning, and Technology Conference.” There are several faculty members here at CofC who are using non-traditional grading schemes, and I hope our group will continue to grow.

**4. What's something happening with your teaching and your students right now that you are excited about?**

Our semester is almost over here at CofC. Our last day of classes is April 23 -- only a couple of weeks from now! On the last day of my precalculus course, we have what I call a “Re-Assessment Carnival.” On this day, each student may choose to re-try as many problems as they can complete in the 50-minute class. This gives them a last opportunity to demonstrate knowledge of our course standards before the final examination. It’s an exciting thing to watch: Students are *thrilled* that they’re allowed to take six extra quizzes. From my viewpoint as the instructor, I am thrilled to give out high-fives as they finally master those tricky problems we’ve seen all semester. Mastery-based grading means students can’t get by relying on partial credit, so they really have to revisit the tricky topics several times -- but it’s a really great moment when students realize everything has finally clicked.

**+1: What question should I have asked you in this interview?**

What are some projects you’re involved in outside of the classroom?

- I’m very involved with our “Science and Mathematics for Teachers (SMFT)” Master of Education (M.Ed.) program. This is an interdisciplinary program designed for in-service middle school and high school teachers. At the end of this semester, two of our students will present their Capstone Projects and officially complete their degrees. I’m excited to see how their projects turn out and how what they’ve learned will impact their classrooms and students.

- Although most of my time is spent on teaching-related tasks, one of the best parts of my job is when I get to be a learner instead of an instructor. Graduate student Colin Alstad is defending his masters thesis (“Categorifications of Dihedral Groups”) later this month. Serving on Colin’s thesis committee has given me a great excuse to keep learning more math -- in this case, some category theory.
- Since 2015 I’ve been the co-Director for the College of Charleston’s “Math Meet,” an annual event held each February. The Math Meet attracts hundreds of students from the region -- this year was the 41st annual Math Meet and we hosted over 450 middle school and high school students from South Carolina, North Carolina, and Georgia. In one day, we offer more than a dozen different events, including three levels of a Written Test, a Math Team Elimination, a Math Team Relay, several Math Timed Sprints, a Physics Brainstorm, a Chemistry Brainstorm, and a trophy presentation in the afternoon. While it seems like the 2019 Math Meet just wrapped up, we have already begun planning for Math Meet 2020.
- Lastly, I’m a parent of three fantastic kids (ages 8, 5, and 3), so I spend a lot of time juggling work-related tasks with gymnastics practice, soccer games, swim lessons, playing outside, laundry, etc. I’m excited for the summer months since it means I’ll have more time to spend with my family. In particular, it’ll mean more time to share some mathematics with my 8-year-old son -- he has decided he wants to become a mathematician when he grows up!

The longer I use specifications grading, and the more I see how differently students experience college courses that use mastery grading compared to courses that don't, the more I believe that the reform of our grading practices is an urgent ethical imperative. Like I said on Twitter last week:

Not just less important - it's clearer every year to me that grades are increasingly corroding education and student well being. The alarm bells are getting louder.

— Robert Talbert (@RobertTalbert) January 18, 2019

I switched a few years ago from traditional, points-based, no-revision grading to specifications grading because I had a strong sense that traditional grading was not only uninformative (large numbers of false positives and false negatives, and no clear link between the grade and what the students can do) but actively harmful to many students in many ways, one of the biggest being *motivation*. When I used traditional grading, students always seemed motivated not by the promise of learning the subject but by the inner game of scoring enough points in the right ways to get the grade they needed to move on --- or else they had no motivation at all.

This intuition that traditional grading is demotivating was just that: an intuition. But a study I came across recently gives results about the real effects of traditional grading on motivation.

Chamberlin, K., Yasué, M., & Chiang, I. C. A. (2018). The impact of grades on student motivation. *Active Learning in Higher Education*, 1469787418819728.

Link to paper: https://journals.sagepub.com/doi/pdf/10.1177/1469787418819728

The authors of this study investigate how "multi-interval" grades (read: the A/B/C/D/F system) affect the basic psychological needs and academic motivation of students when compared with "narrative evaluation", where the instructor gives students verbal feedback both instead of, and in addition to, multi-interval grades.

The theoretical basis of the study is self-determination theory (SDT). This framework is where we get the concepts of *extrinsic* and *intrinsic* motivation, where people are motivated to complete a task either by an external reward or for the sake of the task itself, respectively. (For more background, I wrote about SDT and flipped learning in this post.) According to SDT, there are three basic psychological needs that learners have while they are involved in a learning process: **competence** (the need to be good, or at least feel that they are good, at what they are learning), **autonomy** (having choice and agency), and **connectedness** or "relatedness" (being psychologically connected to others while doing the task). Essentially, the more these three needs are met in a learning process, the more intrinsic motivation the learner will experience; when these needs go unsatisfied, intrinsic motivation gives way to extrinsic motivation or to no motivation at all.

They studied 394 students at three different universities. One of those universities gave exclusively multi-interval grades in its classes; another had institutionally eschewed multi-interval grades and used only **narrative evaluations** in its courses. This is where instead of a grade, students get verbal feedback (that is honest, detailed, constructive, and actionable) on what they did and what they need to do. The third used a mix of narrative evaluation and letter grades. The students were given two surveys on academic motivation, and a subset of those underwent semi-structured interviews to dig deeper.

The results are a sobering indictment of traditional grading. Here are just a few that stood out.

Students were asked, among other things, what information (if any) they got from their grades, whether their grades affected their decisions about which classes to take, and whether their relationship with grades had changed since high school. The prevailing opinion was that grades do *not* convey "competence-enhancing feedback" that can be used to improve; most students could not give any examples of how they had used grades to improve their learning. Worse, the information that grades *did* give students tended to be negative signals about their self-worth. High-achieving students experienced pressure to keep achieving high grades; low-achieving students felt condemned by their low grades. Students associated the word "stress" with grades far more frequently than any other concept.

Moreover, traditional grades actively eroded students' sense of autonomy, because the grade they received and what they had learned often seemed unrelated. As one student said:

And it was actually pretty frustrating because it felt like even in classes where I was really into the content and worked really hard I came out with a B+. And in classes that I didn’t care about and didn’t work very hard I still got a B+.

Grades worked against relatedness as well, as expressed by some students who described how their relationships with their parents suffered when their grades were poor.

The authors also noted that when discussing traditional grades, students readily adopted capitalist-style business language, for example referring to "cost-benefit analysis" and "payoffs" in describing how they approach class. That's strategic learning and extrinsic motivation taking hold.

The results from students who experienced narrative evaluation were almost completely the opposite of the results from multi-interval grading. Every "narrative evaluation" student interviewed said that narrative evaluations gave them usable information about their competence and were more useful than multi-interval grades. The study found strong links between narrative evaluation and enhanced competence, autonomy, and connectedness, and many of the students commented on how narrative evaluation built *trust* between the student and the instructor --- even when the feedback was largely negative.

These results came not just from the interviews but also from the quantitative results of the surveys, with statistically significant differences in measures of academic motivation found between students from traditional grading backgrounds versus narrative backgrounds (with narrative grading leading to higher indicators of motivation). Students from the university that used mixed grading methods experienced some of the benefits of narrative evaluation, but also some of the detractions of traditional grading --- and although the study didn't say this directly, it seems clear to me that the detractions happen because of the letter grades. (If you put a student in a "mixed" environment and give them good narrative evaluations followed by a "B+" grade, guess what the student will tend to focus on?)

So what do we do about this? For me, the course of action is clear: **We need to walk away from traditional grading** --- in which I include not only multi-interval letter grades but also grades based on statistical point accumulation. We've seen enough. Grades are harmful to students' well-being; they do not provide accurate information for employers, academic programs, or even students themselves; and they steer student motivations precisely where we in higher education do *not* want those motivations to go. There is no coherent argument you can make any more that traditional grading is the best approach, in terms of what's best for *students*, to evaluating student work. If we value our students, we'll start being creative and courageous in replacing traditional grading with something better.

Cue the objections about how this can't be done because of transfer credit issues, making non-traditional grading work at scale, etc. I agree partially, in the sense that this move is a long sequence of small steps. The article here is similarly pragmatic and gives some good advice:

Few universities are likely to abolish grades. However, universities should question the conventional use of multi-interval grades and consider their advantages and disadvantages in different departments, years of study, courses and learner types. For example, there may be specific courses or programs [...] in which cultivating deep learning and motivation may be more important than standardized communication of performance to external audiences. For such courses, greater use of narrative evaluations (as opposed to multi-interval grades) may be warranted. In addition, withholding grades from students or providing narrative/written feedback several days prior to the grades may help students focus on mastery-related learning goals rather than extrinsic rewards.

I'd add the following ideas that I've learned from using specifications grading and hearing about how others use this and other forms of mastery grading:

- It's possible to keep the A/B/C/D/F system for reporting *semester* grades, but use narrative evaluation and mastery grading instead of points and statistics to determine students' grades. Here's an example.^{[1]}
- Do what the article suggests and start changing your grading practices over to something less focused on letters and points, in those courses where narrative evaluation and mastery grading make the most sense: graduate courses, seminars, proof-oriented upper-level math courses, honors sections of courses, and so on.
- I think you could also make a strong case that introductory courses are fertile ground for trying out narrative evaluation and/or mastery grading, because these are where student motivation tends to be at its lowest point.
- Treat student work --- as much of it as possible --- like submissions to a journal. When we academics submit articles to a journal (or tenure portfolios, etc.) we don't get a point value or letter grade attached. We get verbal feedback with a brief summary --- "Accept", "Reject", "Major revision", "Minor revision" --- followed by details. Then assign course grades, if you must have them, based on how much acceptable work the student was able to produce.

There are some practical issues at work here that can't be minimized, for example (and especially) large sections. The issue of scaling is a tough one, but it's not impossible. In my experience with specs grading, doing narrative evaluation takes no more time per student than traditional grading (which involves endless hair-splitting on how many points to give a response), so I don't think there's any reason to believe that nontraditional grading can't scale up.

Moving away from traditional grading could be one of those Pareto principle concepts where focusing intently on this one idea could usher in outsized improvements in many other areas of student learning. I think it could be a fulcrum for bringing about wholesale, even revolutionary change in higher education. Let's give it a try.

Although: I have to admit that recently, I've noticed that students in my specs-graded classes tend to focus laser-like on their grading checklist, where they keep track of how many Learning Targets they've passed, rather than on what those Targets represent. In other words, the specs end up becoming a proxy for letter grades, and students fixate on them accordingly. I'm still thinking about how to handle this. ↩︎

Two and a half years ago, I decided that the traditional system of grading student work --- based on assigning point values to that work and then determining course grades based on the point values --- was working against my goals as a teacher, and I decided to replace it with specifications grading. I had just learned about specs grading through Linda Nilson's book on the subject. This happened right at the end of Fall semester 2014, and I spent the entire Christmas break doing a crash-course redesign of my Winter 2015 classes to install specs grading in them.

I've used specs grading sixteen times since then: once in Cryptography and Privacy, once in Abstract Algebra 2, twice in Calculus 1, four times in Discrete Structures 1, and eight times in Discrete Structures 2. It's fair to say that my implementation has been battle-tested and has undergone a fair bit of evolution in that time. The first attempt in Winter 2015 was pretty rough, but very promising. Every semester since then, I made changes and updates to try to address issues that students and I noticed.

But it was only this last semester, the one that just concluded this week, where I felt that at every point during the semester --- from day 1 all the way through turning in course grades yesterday --- the specs grading system I had in place was working the way I wanted. It's still not 100% there, of course, but I think I have a blueprint of how to use specs grading moving forward^{[1]} and of course, I want to share it with everyone.

In specifications grading, instead of using points to assess student work, the work is graded on a two-level rubric --- that is, some variation on Pass/Fail or Satisfactory/Unsatisfactory. Instructors craft a set of specifications or "specs" for assignments that define what Satisfactory work looks like. When the work is handed in, the instructor simply categorizes it as Satisfactory or Unsatisfactory depending on whether it meets the specs or doesn't. There are no points, so there is no partial credit. Instead, instructors give detailed feedback on student work, and specs grading includes giving students the opportunity to revise their work based on the feedback, and submit a revision as an attempt to meet specs.

Specs grading still uses an A/B/C/D/F course grade reporting approach, but the letter grades are earned differently. Rather than calculating complex weighted averages of points --- which you can't do because there are no points --- letter grades are earned by completing "bundles" of work which increase in size and scope as the letter grade being targeted goes higher. The idea is that students who want a "C" in the course have to do a certain amount of work that meets the specs; those wanting a "B" have to do everything the "C" people do, but more of it and of higher quality and/or difficulty level. Similarly the "A" students do everything the "B" students do plus even greater quantity and quality.
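The nested-bundle idea can be sketched in a few lines of code. To be clear, the function and the thresholds below are hypothetical illustrations of the mechanism, not the actual requirements from my syllabus:

```python
# Hypothetical "bundle" scheme: course grades come from how much
# Satisfactory work a student completed in each category, with each
# higher grade's bundle containing the one below it. The thresholds
# here are made up for illustration.
def course_grade(targets_passed, challenges_passed):
    """Map counts of Satisfactory work to a letter grade via nested bundles."""
    bundles = [          # (grade, min Learning Targets, min Challenge Problems)
        ("A", 18, 12),
        ("B", 15, 8),
        ("C", 12, 4),
        ("D", 8, 2),
    ]
    for grade, min_targets, min_challenges in bundles:
        if targets_passed >= min_targets and challenges_passed >= min_challenges:
            return grade
    return "F"

print(course_grade(16, 9))  # "B": clears the B bundle but not the A bundle
```

Note how the check walks the bundles from the top down, so a student earns the highest grade whose entire bundle they have completed --- there is no averaging anywhere.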

Done right, specs grading allows students choice and agency in how and when they are assessed; students are graded on what they can *eventually* show that they know, and they get to learn from mistakes and build upon failures; their grades are based on actual concrete evidence of learning; and the grades themselves convey actual meaning because they can be traced back to concrete evidence tied to detailed specifications of quality. The instructor often saves time too, because instead of determining how to allocate points (which takes more time than you think), she just determines whether the work is good enough or not, and gives feedback instead.

The specs grading setup I am going to describe here is for Discrete Structures 2, a junior-level mathematics course taken almost exclusively by Computer Science majors. It's the second semester of a year-long sequence and it focuses on mathematical proof and the theory of graphs, relations, and trees. I think that much of the structure I am going to describe here could be ported to other math classes, though.

My overall belief about the course grade in this class (and in others) is that grades should be based on *concrete evidence of student success* in three different areas:

- Mastery of **basic technical skills**;
- Ability to **apply basic technical skills and concepts to new problems**, both applied and theoretical; and
- **Engagement** in the course.

Some people may debate whether or not "engagement" ought to be part of the grade. Personal experience with this course tells me that it should, in this case. What I mean here is not just attendance in class, but also preparation for class, active participation during class, and engagement with the course outside of class. I want students to treat the course as a high priority and engage with it as such.

If these are the things I want from the course, then I need to set up stuff for students to do and submit to me that will allow me to measure whether or not they are progressing or succeeding in those areas.

For basic technical skills, I combed through the course and decided on a list of 20 basic skills that I felt were essential building blocks for the course. These are called **Learning Targets**, and they are keyed to the four major topics in the course (proof, graphs, relations, trees). Here are a couple:

P.2: I can identify the predicate being used in a proof by mathematical induction and use it to set up a framework of assumptions and conclusions for an induction proof.

G.6: I can give a valid vertex coloring for a graph and determine a graph's chromatic number.

Here's the full list. Notice these are phrased in terms of concrete action verbs that produce assessable results.
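To give a flavor of what a target like G.6 asks for, here is a minimal sketch of a vertex-coloring check in Python. The graph (a 5-cycle) and the coloring are illustrative examples of my own, not course materials:

```python
# A coloring is valid when no edge joins two vertices of the same color.
def is_valid_coloring(edges, coloring):
    """Return True if no edge has both endpoints the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # the cycle graph C5
coloring = {0: "red", 1: "blue", 2: "red", 3: "blue", 4: "green"}
print(is_valid_coloring(c5, coloring))  # True; C5, an odd cycle, needs 3 colors
```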

For the ability to apply these basic skills, students were given a series of **Challenge Problems**. These are problems that require students to apply what they learned from the basic skills, and they included a mix of "Theory" problems that involved writing proofs, programming assignments where students had to write Python code to solve a problem, and real-world applications. I started with a core of ten Challenge Problems but wrote more during the semester as I got inspired, and we ended up with 17 total. Here's one that involved doing some proofs by induction. Another had students write a Python function that would compute the composition of two relations on a finite set. Another had students use Python code to experiment with a class of graphs, make a conjecture about their clustering coefficients, and then prove their conjecture.
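For instance, the relation-composition problem admits a short solution like the following sketch. The function name, the composition convention (apply R first, then S), and the sample relations are my own choices, not the assigned problem's:

```python
# Compose two relations on a finite set: S o R contains (a, c) whenever
# there is some b with (a, b) in R and (b, c) in S.
def compose(R, S):
    """Return the composition S o R of relations R and S (sets of pairs)."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

R = {(1, 2), (2, 3)}
S = {(2, 5), (3, 6)}
print(compose(R, S))  # {(1, 5), (2, 6)}
```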

Finally, for engagement, I broke from the specs grading mold and used points, or what I called *engagement credits*. Students accumulated engagement credits through the course for doing things like completing their Guided Practice pre-class work on time and participating in class on certain days (especially the days close to breaks). Basically this was a way of incentivizing student work on the course for things that I needed them to do, especially outside of class.

I also had students take a final exam in the course, which I'll describe in the next section.

Each of the three areas above was assessed in a pretty different way.

The Learning Targets were assessed using short quizzes called *Learning Target assessments*. I set aside every other Friday in the course, plus a few extra days during the semester, for students to take Learning Target assessments. Here's an example of the assessment for Learning Target P.2, and here's the one for G.6. Notice they are simple tasks that deal directly with the action verb in the Learning Target.

Students could come on these Fridays and take as many or as few of these Learning Target assessments as they wanted. Only the Learning Targets we'd already discussed in class were available, but once a target became available it was *always* available, with a new version of the same kind of problem on each subsequent assessment day. So a student who didn't feel ready to be assessed on Learning Target G.6 didn't have to take the assessment for G.6; they could just wait two weeks and try it then.

Learning Target assessments were graded **Satisfactory/Unsatisfactory** according to specifications that I determined, and those specs are at the bottom of each Learning Target assessment so it's very transparent for everyone. If student work was Satisfactory, I just circled the "S" at the top of the page, and circled "U" otherwise.

Challenge Problems were *not* graded Satisfactory/Unsatisfactory but rather with the EMRN rubric, which is a modification of the EMRF rubric I wrote about here.^{[2]} "Pure" specs grading would say that I should not make things so complicated and should just use Satisfactory/Unsatisfactory with a high bar set for Satisfactory. But I've found that in math classes, written work is hard to get right and students get easily discouraged, so I felt there should be some detail added to the 2-level rubric to distinguish between Satisfactory work that is excellent versus merely "good enough", and Unsatisfactory work that is "getting there" versus work with major shortcomings. Students would submit their Challenge Problems as PDFs or Jupyter notebooks on Blackboard; I'd grade them there and leave feedback, and then students could revise (see below).

Importantly, there were no recurring deadlines on Challenge Problems. Instead, students were allowed up to two Challenge Problem submissions per week (Monday--Sunday) which could be two new submissions, a new submission and a revision, or two revisions. The only fixed deadline for Challenge Problems was 11:59pm on the last day of classes, after which no submissions of any kind were accepted. This helps keep students from procrastinating until the end of the semester and dumping a ton of Challenge Problems into the system all at once. (Although there were issues with this; keep reading.)

Engagement credits were given for a variety of tasks; whenever a task that could earn engagement credit was given out, I'd explain up front what it took to earn the credit and go from there.

At the end of the course students took a final exam. The final exam consisted of eight randomly selected Learning Target assessments that were given previously in the course along with a final question for feedback on the course. The Learning Targets were selected so that at least one Learning Target from each of the four main course topics was represented. I'd never given a final exam in a specs grading class before this semester; in past classes, the final exam period was set aside as one more session for any student who needed to pass Learning Targets to have a chance to do so. I instituted the final exam this time because I wasn't satisfied that the combo of Learning Target assessments plus Challenge Problems was giving me reliable data about student learning. I was getting students who would pass a Learning Target early in the course, then forget that they had done so, and "accidentally" retake that Learning Target later... and not pass it. So I wanted to have one additional layer of assessment to get students to recertify on the basic skills at the end of the course. The fact that all I was doing was recycling old Learning Target assessments made this easy to make up, by just randomly selecting the Learning Targets and assessment versions and then merging the PDFs. (I made four different versions for test security.)

I broke again from the specs grading mold and graded the final using points, grading each recycled Learning Target with either 0, 4, 8, or 12 points. A 12-point score was given if the work would have earned Satisfactory marks according to the original specs, 8 if it was "almost Satisfactory", and so on. The feedback questions were given 4 points, bringing the total to an even 100 points.
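The arithmetic of that point scheme is easy to sanity-check in code. Here's a minimal Python sketch (the function name and structure are my own illustration, not anything from the actual course):

```python
def final_exam_score(item_points, feedback_points=4):
    """Total the final exam score described above (illustrative sketch).

    item_points: eight scores, one per recycled Learning Target
    assessment, each 0, 4, 8, or 12. The feedback question is worth
    4 points, so a perfect exam totals 8 * 12 + 4 = 100.
    """
    assert len(item_points) == 8
    assert all(p in (0, 4, 8, 12) for p in item_points)
    return sum(item_points) + feedback_points
```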

As in all specs grading courses, almost all student work of consequence could be revised in some way.

Learning Target assessments could be "revised" by retaking them, either at a subsequent Learning Target assessment session on Friday, or by scheduling a 15-minute appointment in the office to do it orally. There was no limit on the number of times students could retake Learning Target assessments, but there *was* a limit on office hours appointments: no more than twice a week, for 15 minutes each, and no more than two Learning Targets attempted per 15-minute session, and appointments had to be scheduled 24 hours in advance. Also, students had to try the Learning Target on paper first before doing it in the office. This was a policy purely to keep the number of office hours visits for Learning Target revisions down to a reasonable level.

For Challenge Problems, students could revise any Challenge Problem that received an "R" or "N" grade just by submitting a new version on Blackboard that addressed the feedback I gave. There were no limitations on the number of times students could revise Challenge Problems other than the two-submissions-per-week rule, and the fact that any Challenge Problem work that earned "N" required students to spend a token to revise.

What's a "token"? A token in specs grading is a sort of "get out of jail free" card that a student can spend to bend the rules of the course a little. Every student in my course started with five tokens. By spending a token, a student could purchase a third submission of a Challenge Problem in a given week (but these couldn't be "stacked", for example to get four submissions in a week for two tokens), to purchase a third 15-minute oral revision session in a week, or to purchase five engagement credits.
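The interaction between the weekly quota and tokens is simple but worth making precise. A hedged Python sketch of the submission rule (the function and its arguments are hypothetical, used only to illustrate the policy):

```python
def can_submit(submissions_this_week, tokens, token_used_this_week):
    """Check the Challenge Problem quota (illustrative, not course code).

    Two submissions per week are free; one token buys a third, but
    tokens can't be stacked for a fourth.
    """
    if submissions_this_week < 2:
        return True
    if submissions_this_week == 2 and tokens > 0 and not token_used_this_week:
        return True  # a token would be spent for this third submission
    return False
```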

There were no revisions available for Guided Practice, the final exam, or any item that earned engagement credits.

The "specs" part of this system so far has come from the Satisfactory/Unsatisfactory rubric and EMRN rubric used for grading Learning Target assessments and Challenge Problems. Most of the engagement credit-earning items were also graded Satisfactory/Unsatisfactory. Specs grading also has to do with the assigning of course grades, and here is how it worked in my course.

First of all, let's distinguish between the **base grade** for the course and the **modified grade**. The base grade is just the A, B, C, D, or F that a student earns. The modified grade is the base grade adjusted up or down by a plus or minus. Course grades were determined by a simple two-step process.

The *base grade* in the course was determined using this table:

| To earn this grade: | Accomplish the following: |
| --- | --- |
| A | Earn Satisfactory marks on 19 Learning Targets; and complete 10 Challenge Problems with at least an M mark, including at least five "E" marks. |
| B | Earn Satisfactory marks on 17 Learning Targets; and complete 7 Challenge Problems with at least an M mark, including at least three "E" marks. |
| C | Earn Satisfactory marks on 15 Learning Targets; and complete 5 Challenge Problems with at least an M mark. (No "E" marks required.) |
| D | Earn Satisfactory marks on 13 Learning Targets. (No Challenge Problems required.) |

So the base grade in the course is entirely determined by three pieces of information: (1) how many Learning Targets you pass, (2) how many Challenge Problems you pass, and (3) how many Challenge Problems show excellent work. (An "F" grade is awarded if a student doesn't complete the requirements for a "D".)
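Because the bundles nest, the base grade amounts to a cascading check from the top bundle down. A minimal Python sketch of the table (function name and signature are hypothetical, not actual course machinery):

```python
def base_grade(targets_passed, cps_passed, cp_e_marks):
    """Look up the base grade from the bundle table (illustrative).

    targets_passed: Learning Targets with Satisfactory marks
    cps_passed: Challenge Problems earning at least an M
    cp_e_marks: Challenge Problems earning an E
    """
    if targets_passed >= 19 and cps_passed >= 10 and cp_e_marks >= 5:
        return "A"
    if targets_passed >= 17 and cps_passed >= 7 and cp_e_marks >= 3:
        return "B"
    if targets_passed >= 15 and cps_passed >= 5:
        return "C"
    if targets_passed >= 13:
        return "D"
    return "F"
```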

The grade of "C" is considered "baseline competency", and to earn that grade you have to complete the "C bundle", which is passing 75% of the Learning Targets and completing five Challenge Problems, with no excellent/exemplary work required. The "B bundle" is everything in the "C bundle" with more Learning Targets passed and more Challenge Problems completed plus some evidence of excellent/exemplary work. The "A bundle" is likewise everything in the "B bundle" with even more Learning Targets and Challenge Problems completed plus even more extensive evidence of excellent/exemplary work. Notice, students get to choose which Challenge Problems they attempt -- we had 17 Challenge Problems in all and students just picked the ones they liked^{[3]}.

Additionally, students targeting an A or B grade in the course had to complete at least one "theory"-oriented Challenge Problem with an E or M grade (i.e. successfully write a proof of a mathematical conjecture) or else the final grade was lowered by one-half letter.

If the only grade we awarded were these five letters, this would be extraordinarily simple. But we also award plus/minus grades, and so I had to add rules into the system for how this works. I chose to approach this by awarding plus/minus modifications on the basis of the final exam and on engagement. The base grade was raised by a plus, lowered by a minus, or lowered by an entire letter as follows:

- Add a **plus** to the base grade if you earn at least **60** engagement credits *and* earn **at least an 85%** on the final exam.
- Add a **minus** to the base grade if you earn between **30 and 39** engagement credits (inclusively) *or* earn **between 50% and 69%** (inclusively) on the final exam.
- Lower the base grade **one full letter** if you earn **fewer than 30** engagement credits *or* earn **lower than 50%** on the final exam.
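The modifier rules above can be expressed as one more lookup. A Python sketch of my reading of them (names are illustrative; I'm assuming the harshest applicable penalty wins, which is how I'd adjudicate an overlap between the minus and full-letter conditions):

```python
def modified_grade(base, engagement, final_pct):
    """Apply the plus/minus modifiers to a base grade (illustrative).

    base: base letter grade ("A" through "F")
    engagement: engagement credits earned
    final_pct: final exam percentage, 0-100
    """
    if engagement < 30 or final_pct < 50:
        # lower the base grade one full letter (F stays F)
        letters = "ABCDF"
        return letters[min(letters.index(base) + 1, 4)]
    if 30 <= engagement <= 39 or 50 <= final_pct <= 69:
        return base + "-"
    if engagement >= 60 and final_pct >= 85:
        return base + "+"
    return base  # the "safe zone": no modification either way
```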

So, students' *base grades* are determined by the really important stuff in the class --- basic skills and the ability to apply them. Those base grades are then modified up or down based on performance on the final and engagement in the course. There was some insulation in place in case a student did poorly on the final or had low levels of engagement, but these would still affect their grades. Note that there's a "safe zone" cutoff here of 70% on the final and 40 engagement credits. Students who do this well are immune from their base grade being penalized.

In my view, the whole plus/minus system here fouls up what is otherwise a beautifully simple grading system, but according to the Dean's office I *have* to have some plus/minus system in place. This is the best I could come up with.

To help students keep up with this, they were given two visual aids. First there was this scorecard that they could use to track their progress on their base grade --- just check off or fill in the boxes as the semester progressed. Then, to navigate the plus/minus system, there was this flowchart that got students from their base grade to the final grade in just a few questions.

I haven't gotten back evaluations for this course yet, so I just have verbal feedback and mid-semester surveys to go on. But based on what I have, students were totally thrilled by this system. They remarked about how it took a little while to get used to it, but once they "got it" they wished that all their other courses did the same thing. Computer science majors in particular --- who make a career out of determining how to debug their code from feedback given by the compiler --- really resonate with the idea of being able to debug their math work by using detailed feedback. I've been contacted by at least one other professor in the CS department here who's had students from my course talk about how much they appreciated it and how much it helped them learn.

For my part, I could definitely see students learn in ways that traditional grading, with its one-and-done approach to assessing skills, simply doesn't support. Students' ability with proof especially benefitted. Most of my students had *seen* proof in their first-semester class because that's a required topic, but their proof skills were virtually non-existent in my class. Their first attempts were usually pretty sad. But with feedback and office hours visits, they kept at it, and eventually almost everyone was able to whip at least one induction proof into shape, and could demonstrate skills with proof through Learning Targets P.1--P.4. A few became really fascinated with induction proofs and did several of these of their own free will.

I really liked the *simplicity* of this system as well, with the base grade determined by a simple lookup on a four-line table and the modifications done with another similar table. My past attempts at specs grading were detailed but tended to be like Rube Goldberg machines, sort of bloated and complicated. I was really going for simplicity and minimalism this time, and while again the plus/minus system messes this up somewhat, I was still very happy with the results.

I also liked that this system focused student conversations away from points and the grubbing of points and toward content and knowledge. I never heard anything like, "*I need to earn an 86.2 on the last test in order to bring my average up to a B*". Instead I heard conversations like, "*I need to complete one more Challenge Problem to get a B, should I focus on applications of graph theory or writing code to implement a relation?*" or "*I've taken Learning Target P.3 three times without success and it's because I don't get structural induction --- can we talk about that?*"

Likewise, this system inverts the way students tend to approach the course as a whole, that is by coming to class without a clear idea of what they want out of it and just hoping for the best. Here, students have to think about the grade they want to earn *first*, then this tells them which "bundle" to look at in the syllabus and this lays out an agenda for what they need to accomplish in the course. There is no "hoping for a grade"; the student is in control and we talk about *targeting the grade you wish to earn* instead of hoping for the grade you think you "deserve".

This was the first time I'd tried oral revisions of Learning Targets in the office and I thought that was a great success, especially on proof-related targets where students could just *talk* through the problems rather than writing so much down. I think in some cases I got much better information about student learning by talking face-to-face with them rather than reading their writing.

I also think the final exam was good for providing that extra layer of assessment that allowed me to triangulate my data about student performance from the Learning Targets and the Challenge Problems, and students couldn't just clock out of the course once they'd reached the requirements of their chosen grade bundle.

I continue to believe that doing away with recurring deadlines for Challenge Problems is a good idea and student work is better without them. At the same time, students really struggled with procrastination. Although I had measures in place to help with this^{[4]}, by the end of week 9 in a 14-week semester, the median number of *submissions* of Challenge Problems --- not Challenge Problems passed, but *submitted* --- was *one*. Therefore the vast majority of students were still cramming in Challenge Problems during the last three weeks of class; I received 75 different Challenge Problems to grade over the weekend of week 12 for instance. I eventually dug out from under the grading, but procrastination cost some students a passing grade. So while I'll continue this quota/single deadline system in the future, we all need to take procrastination more seriously.

The final exam helped eliminate false positives in the course, but next time I'm going to make the final contain not only old Learning Targets but also some conceptual questions, to get data on students' conceptual understanding and not just basic technical skill. For example I had some students who could perform Warshall's Algorithm but I am not sure they know what this algorithm does or why it works.
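(For readers who haven't seen it: Warshall's Algorithm computes the transitive closure of a relation, i.e. it adds the pair $(i,j)$ whenever there is some chain of related elements from $i$ to $j$. A standard textbook sketch in Python, not anything from the course materials:)

```python
def warshall(adj):
    """Warshall's Algorithm: transitive closure of a relation.

    adj: square boolean matrix where adj[i][j] is True iff i relates
    to j. Returns a new matrix for the transitive closure, built by
    allowing intermediate elements 0..k one at a time.
    """
    n = len(adj)
    closure = [row[:] for row in adj]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                closure[i][j] = closure[i][j] or (closure[i][k] and closure[k][j])
    return closure
```

The conceptual question I'd want students to answer is *why* it's safe to consider intermediate vertices one at a time, which is exactly the understanding that executing the algorithm by hand doesn't test.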

Finally, while I'm convinced this specs grading system isn't more complex than traditional grading systems, it's quite different, and it requires that students really take a consistently hands-on approach to the class by learning the system, reading the syllabus carefully, and paying attention to announcements and calendar events regarding graded work. Unfortunately this is by far my students' weakest link --- managing information streams, projects, calendar events, and tasks. I feel like there should be a course-within-a-course here that includes some basic GTD training and accountability for staying current with course info, because failure on this front absolutely destroyed some students.

To conclude here: I am a convert to specs grading and I do not see myself going back to traditional grading anytime soon. It's just too good. I still hate grading like everybody else, but at least now when I grade, I am giving feedback to students rather than splitting hairs over points, and it's really changed the dynamic of my classes for the better.

For those who are interested, here is the syllabus for the course with all the details (yes, I actually left stuff out in this post).

Now, ask me some questions about this.

My next class isn't until Fall 2018 because of my sabbatical, so I need this post for myself, too, to help me remember how to do this 15 months from now. ↩︎

I changed the "F" to "N" because the letter "F" in my opinion is so emotionally loaded for students that it simply cannot be used in any context for grading any more. I could make the top grade "F" for "Fabulous" rather than "E" for "Exemplary" and students would still think they failed. The letter "F" is done in education. ↩︎

Which in practice often turned out to be the ones they felt would be easiest. Some definitely looked for the easy way out. But most students found they gravitated toward certain kinds of problems and developed a real interest in those topics because *they picked them*. ↩︎

For example, I set up "incentive checkpoints" that awarded engagement credit in big chunks for finishing a certain number of Challenge Problems and Learning Targets by certain points in the course, and the only way to earn a plus grade in the course was to hit at least one of those checkpoints. ↩︎

In the past few days, two national op-ed pieces about grades and grading in higher education have appeared. Corinne Ruff wrote this piece for the Chronicle (paywall, sorry), and then Mark Oppenheimer wrote this Washington Post op-ed provocatively titled "There's nothing wrong with grade inflation". The fact that these appeared within a few days of each other possibly signals that there is a growing sense that something is wrong with grades in higher education, and it definitely affords opportunity to raise awareness about alternatives like **standards based and specifications grading** (SBSG).

The WaPo piece is about grade inflation, specifically about the pointlessness of trying to combat grade inflation any longer. Oppenheimer points out that all the major efforts to combat grade inflation in the elite schools have ended up causing more problems than they solve. And so, as Oppenheimer says, "Our goal should be ending the centrality of grades altogether. For years, I feared that a world of only A’s would mean the end of meaningful grades; today, I’m certain of it. But what’s so bad about that?" He goes on to point out many of the failings of traditional grades that I've mentioned here: grades promote extrinsic motivation and surface or strategic learning at best, they don't always measure learning accurately, and they don't measure certain important kinds of cognition at all.

Oppenheimer says "We need to move to a post-grading world. Maybe that means a world where there are no grades — or one where, if they remain, we rely more on better kinds of evaluation." He then proposes a system of "nuanced transcripts with comments" and gives several examples of schools taking this path. This proposed system of "transcripts with comments" will remind some readers of this article I posted in September where I proposed basically the exact same thing.

He points out that this "nuanced transcript" approach is being used at elite institutions and small schools, and that this can't necessarily be replicated by larger universities or by contingent faculty who don't have the time or resources for investing hours of time in writing detailed letters for each student's portfolio. His answer to this is that maybe the larger schools can make small steps toward change, for example by abolishing the use of the SAT for admissions and doing *something* about transcripts. To someone who might have been nodding in agreement along with this op-ed up to this point, that conclusion must be disappointing. Isn't there *something* that can be done about grades if you're not tenure-track at a small or elite institution?

Let's cut over to Corinne Ruff's article in the *Chronicle*. The article asks a question (Why do colleges still use grades?) but never seriously attempts to answer it. Instead Ruff, like Oppenheimer, raises the concern that grade inflation is so bad that grades themselves have become meaningless. Ruff also mentions a potential fix for this problem in the form of competency-based education as practiced by institutions such as Western Governor's University. But like the Oppenheimer piece, Ruff's article ends on a somewhat negative note. Quoting Woodrow Wilson National Fellowship Foundation president Arthur Levine, the article casts meaningful reform of grading in higher ed as something far off in the future:

"This isn’t all going to happen next week," [Levine] says, adding that most institutions still haven’t taken steps to move away from grades. "We’re talking about an evolution over time."

If the situation is so bad, then isn't there *something* that can be done about grades in higher ed that doesn't involve a wholesale revolution in higher education itself that would take decades, and quite frankly isn't likely to happen at all if it's framed as something that requires a revolution?

If you read this blog on any kind of regular basis you know that my answer to this question is "yes", and that the answer is SBSG. I think SBSG addresses the core concern of both of these articles -- that grades have become or are becoming meaningless -- and implements the actions implied by both of these articles (we need to replace traditional grading with something else) in a way that gives individual instructors and students control over the process, so that the change is closer to the ground and requires only some careful planning and marketing, rather than wholesale revolutionary change.

If grades have become meaningless -- and I think that they are getting to that point, if not there already -- then it's because grades have become decoupled from demonstrable student learning. What does a "B" in Calculus actually mean about what a student knows or doesn't know about Calculus? It *might* mean that the student knows considerably more than someone who has a "D" in the course. But beyond that, it's impossible to say. Even if we knew the assessments used in the course and the sorts of work that students were asked to do, it's impossible to say. Without having grades tied to concrete accomplishments of specific learning goals done to clear specifications of professional quality, we simply don't know what a grade means.

What about grade inflation? Oppenheimer suggests that the inflation of grades has caused, or is causing, grades to become meaningless. But it might well be the other way around -- that the meaninglessness of grades, by which I mean the inability to deduce information about learning from the grade itself, could be driving grade inflation. If professors and future employers don't believe that grades have meaning, why *shouldn't* we give students high grades for poor quality work, and let the "real" grade become -- as Oppenheimer suggests -- letters of recommendation and the like? On the other hand, if grades really *did* have meaning, then perhaps we'd be less likely to inflate them and give high grades for poor quality work, both out of a sense of professional ethics and also because the system that delineates what grades mean wouldn't allow it.

This is where SBSG comes in. In SBSG, we have

- Specific learning targets that undergird the whole course and spell out exactly what students need, eventually, to show proficiency in.
- Assessments that ask students to demonstrate specific evidence that those learning targets were met.
- High standards of professional quality for what constitutes acceptable evidence on each assessment.
- Opportunities for revision and learning from mistakes, so that the assessments of learning are less prone to false positives or false negatives.
- A course grading system that is tied specifically to the quantity and quality of evidence that students provide of their learning, relative to our targets and standards.

In short, in SBSG, grades *mean* something. When a student earns a B in my discrete structures course, I know what it means: the student demonstrated proficiency on 20 out of 20 learning targets that address core competencies; the student was able to demonstrate additional evidence on five of those 20 targets; that the student completed six short projects throughout the semester that met standards of quality for such work; and that they maintained an 80% completion rate of all course preparation and homework tasks. If needed, I can produce the quality standards and the learning targets themselves. And all of this is spelled out in the course syllabus -- it's not occult knowledge or a subjective opinion. Even if I *wanted* to give high grades for poor or insufficient work, the system itself works against that.

Last semester when I was teaching this discrete structures class, it turned out that around half of my class earned grades of A or A-. I was worried, to be honest. I felt that perhaps I had made the course too easy. But then I went back and looked at each student's track record in the course, and every student who earned those grades did so because of a concrete, specific body of work that they had worked hard to produce over the semester. I could point to specific work that showed that the students had given acceptable evidence -- acceptable on my terms -- that they had satisfied the learning objectives of the course at "A" or "A-" level. If the specifications for acceptable work themselves aren't too lax -- and I felt like they weren't in this case -- then this is not an instance of grade inflation. It's an instance of large-scale student success, something to be celebrated and not stigmatized.

And to reiterate, SBSG is not something that requires a massive systemic change to get started, as would be the case if a university wanted, say, to transition to the "nuanced transcript" system. We don't have to *wait* for our system of higher education to "evolve". SBSG is something that individual instructors and students can begin to use as early as next semester. We keep the usual way of *reporting* grades using the ABCDF system (although I would love to get rid of that, too, someday) -- just set up a backend for assessment that makes these letters actually mean something. I like the chances of SBSG being successful in the short term a lot more than those of competency-based or transcript-based "grading" simply because it's simpler, and especially because it's more organic. These kinds of changes are best done from the bottom up, where they enjoy the support of faculty and especially students.

So perhaps the answer to the problems raised in these articles is right under our noses and is a lot simpler and closer than we think. What do you think?

I haven't yet posted a complete rundown of what I call *specs grading iteration 4* -- the version of specifications grading that I am using in my classes this semester, the fourth semester after first rolling that system out last year. That would be more like an e-book than a post. So I am posting in bits and pieces. In this post I wanted to focus on an aspect of my assignments in my discrete structures course that is connected to the grading system: Namely, how I am handling deadlines for significant, untimed student work.

How I am handling deadlines is that I eliminated them.

Students in the class do three major kinds of work: timed assessments on learning targets, which are done in class; course management items that include guided practice assignments and weekly syllabus quizzes; and what we call *miniprojects*, which are like homework assignments targeted at applications of basic content to new problems. The miniprojects are what this no-deadline policy targets.

Miniprojects are significant assignments that are challenging in nature, graded using the EMRF rubric. Students are allowed to revise work that isn't "passing" (E or M grades) as well as to attempt to push "M" work up to "E" level. In fact students should *expect* to have to revise their work on these since an "M" is not always easy to get. I am planning on writing 10-12 of these for the semester and students have to pass 8 of them, including at least 2 "E" grades, to get an "A" in the course. (The full grading system is here in the syllabus starting on page 5.)

I used to have hard deadlines on these. In fact the first two this semester I assigned had hard deadlines. Students could spend a token to get a 24-hour extension on that deadline (and up to three tokens to get up to 72 hours of extension) but those deadlines were fixed. About two weeks into the semester, however, I decided that deadlines were not in harmony with the spirit of specs grading. More on this below. So I replaced the deadline policy with this:

- Students are allowed to submit **up to two miniproject-related submissions per week** (= Monday through Sunday). This can be two first submissions, a first submission and a revision, or two revisions. Their choice.
- **No submissions are accepted past 11:59pm EST on Friday, April 22** (the last day of classes).

I'm calling this the **quota/single deadline** system. Students get the freedom to choose what they submit on a weekly basis; and they cannot put it all off until the end of the semester because they can only submit up to two items a week, and there is a fixed no-exceptions single deadline for the whole semester.

Why did I do away with fixed deadlines and replace them with this?

- I don't think it's true that having to work with fixed deadlines on every assignment promotes the kind of behavior some people think it promotes. I've often heard the line that *having to work with deadlines prepares you for the working world.* After being in the working world for 20+ years, I think the value of deadlines as a means of personal growth is vastly overrated. When I look at my own work, the majority of the tasks that I need to do -- and I have hundreds of them at any given moment -- either have no deadlines at all, or the deadlines are self-imposed or can be re-negotiated if needed. And somehow, I learned to be a responsible adult anyway. I tend to think that this happened not because I was compliant, but because I had *freedom to choose my work* within reasonable guidelines. I changed the deadline structure on these assignments in my class because I want students to experience how cool and empowering it is to be invested in one's own work for a class, just like I learned it, by being given some freedom to study what they want and do it on something resembling their own schedule.
- I also disagree with the notion that *having to work with deadlines teaches responsibility and self-motivation*. If I complete a task against my will because there is a deadline attached, that *might* be considered "being responsible" but it is most certainly not being "self-motivated". It's sort of the *opposite* of being self-motivated, specifically being extrinsically motivated. Self-motivation -- or more importantly, self-regulation -- requires some kind of individual agency and a sense of self-efficacy. Getting the work done needs to be the student's idea.
- I also came to realize that most of the complaints that I was getting from students -- and I get several of them every time the semester starts -- had nothing to do with the class but rather were expressions of frustration and stress that were amplified by the presence of deadlines. Sometimes putting boundaries around tasks creates some productive energy. I do this myself sometimes by self-imposing deadlines on projects that have gotten stuck in neutral. But other times -- quite often, when you're a student working two jobs and commuting an hour to and from campus and carrying 16-18 credits of courses -- deadlines just cause stress that is completely *un*productive.
- Finally, I realized that having fixed deadlines on the assignments goes against the flipped learning design that I employ in the class. According to the "F" in the four pillars of FLIP, a good flipped learning environment is a *flexible* learning environment in a number of senses, including the flexibility to choose what and how and when you learn something if you're a student, within reason and within the instructor's framework for the course. Additionally, the whole point of having specs grading is to give students choices on when and how they are assessed, and fixed deadlines don't work in harmony with that idea.

So far the results have been great. Far from procrastinating, students have been very productive. I've been getting about 30-40 submissions a week from 60 students total. Many of them do the math and realize that they need to maintain forward motion on getting things done so as not to wind up in an untenable position at the end. Also, since no single miniproject is required -- they just have to pick from among the ones that are posted -- the students' investment and energy level on these has really improved. (They still have to pass a sequence of timed assessments on the core learning targets of the course, so there's no worry that by not choosing a particular miniproject that they'll miss out on demonstrating mastery on something.) I also have stopped getting those panicky emails at 11:58pm about SageMath Cloud or Blackboard not working. Everybody's stress level has dropped.

So the freedom they get to choose their work and their work schedule has made them exactly what deadlines did not make them: happy, productive, and interested in the material. Maybe deadlines are necessary on some level but I would caution against giving them too much credit for students' development.

In a previous post, I wrote about the EMRF rubric and how I am using it right now in my classes, which are using specifications grading. Here I want to discuss a few instances of how I've used it so far, and the kinds of effects this rubric has had on the narrative within, and about, those classes.

In one of my classes (Cryptography and Privacy) one of the learning targets is

I can find the value of $a \pmod n$ for any integer $a$ and positive integer $n$ and perform modular arithmetic using addition and multiplication.

The problem they get consists of eight basic modular arithmetic computations involving addition, multiplication, and exponentiation. Unlike most work I give students, I don't especially care whether they show their work on this problem. All I care about is whether they can compute the answer correctly or not. So the EMRF rubric is simply:

- E = All eight answers are correct.
- M = Either 6 or 7 out of 8 are correct.
- R = All parts are attempted but fewer than 6 answers are correct.
- F = Not all parts are attempted.

The idea is that if students can do modular arithmetic correctly 8 times in a row in a single sitting, that's pretty exemplary, and I am relatively certain they have mastered the concept. If they can do it correctly about 3/4 of the time, I consider that "good enough". Otherwise they need to practice some more and try again later.
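The cutoffs above are simple enough to express as a small function. Here's a minimal sketch (the function name and signature are my own, purely illustrative), along with the kind of computations the assessment covers, which Python handles directly with `%` and the three-argument `pow`:

```python
def emrf_grade(num_correct, num_attempted, total=8):
    """Map results on the eight-part assessment to an EMRF grade,
    per the rubric above. (Illustrative sketch, not the author's code.)"""
    if num_attempted < total:
        return "F"  # not all parts attempted
    if num_correct == total:
        return "E"  # all eight correct
    if num_correct >= 6:
        return "M"  # 6 or 7 out of 8 correct
    return "R"      # all parts attempted, fewer than 6 correct

# The computations themselves are one-liners in Python:
print(17 % 5)           # 17 mod 5 = 2
print((14 + 29) % 11)   # modular addition: 10
print(pow(7, 128, 13))  # modular exponentiation, computed efficiently
```

The three-argument form of `pow` does fast modular exponentiation, which matters once the exponents get large enough that computing the full power first would be impractical.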

The EMRF rubric gets a little more interesting when applying it to more complex work, like mathematical proofs. I have an activity I like to do in proof based courses where students "grade" a piece of work that I present to them. Students read through a proposition and proposed proof and then use clickers to rate the work according to the specifications for student work document.

Students were given a pre-class activity with an induction proof where the base case was left out. In the pre-class activity only 65% of students correctly identified that the proposition was true but that it had this major flaw. Most of the remaining 35% said the proof had only minor errors to be corrected that had to do with style and phrasing. In class, I put this proof up on the screen and asked students what grade they would give it, if they were me. About 1/3 of the students said either E or M, about 1/3 R, and about 1/3 F. We had a really interesting discussion then about what constitutes passing versus non-passing work, and what differentiates E from M. Once students saw the missing base case, all the E/M people switched to F! Students are much harsher graders than I am.

For me, this is an F grade because it's fragmentary. (One could make a good argument for R, though.) A nice teaching moment in the discussion was that **the grade of F does not mean catastrophic failure. It means "fragmentary"** and many times, work graded at an F is five minutes and two sentences away from E, which is the case for this proof.

In my discrete structures course (same class as Case 2) we have this learning target on which students have to demonstrate proficiency:

I can outline and analyze a mathematical proof using weak induction, strong induction, and structural induction.

(There's another learning target where they have to actually *write* a proof.) Here is one of the problems from a recent assessment on this target:

Consider the following proposition: *Every positive integer greater than 1 is divisible by at least one prime number.* Assuming we prove this using strong induction, write clear statements of the base case, induction hypothesis, and inductive step.
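For reference, one correct outline might read as follows (the phrasing is mine, not a model solution handed to students):

```latex
\textbf{Base case:} $n = 2$. Since $2$ is prime and $2 \mid 2$, the
statement holds for $n = 2$.

\textbf{Induction hypothesis (strong):} Suppose that for some integer
$k \geq 2$, every integer $j$ with $2 \leq j \leq k$ is divisible by
at least one prime.

\textbf{Inductive step:} Show that $k + 1$ is divisible by at least one
prime. If $k + 1$ is prime, it divides itself. Otherwise $k + 1 = ab$
for some integers $a, b$ with $2 \leq a \leq k$; by the induction
hypothesis some prime divides $a$, and that prime also divides $k + 1$.
```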

Here are some of the less-than-E responses and how I graded them with the rubric:

- A student used $n = 1$ as the base case and showed that since $1$ is divisible by itself and $1$ is prime, the base case holds. The rest of the proof outline went off without errors. **In this case I gave the student an M.** Does the work meet expectations? **Yes**: The student has provided evidence that they understand the most important aspects of strong induction. Is it complete and well-communicated? **No**: The base case is wrong. This is not a trivial error, otherwise this would be an E. But it's an error that can be corrected through written comments. If the student really wants to earn an E on this target, they can always take the assessment again later and get the base case right.
- A student set up the correct base case. For the inductive hypothesis, the student assumed that for some positive integer $k$, $k$ is divisible by a prime number; then stated that we want to prove that $k+1$ is divisible by a prime number. **In this case I gave the work an R.** Does the work meet expectations? **No:** The whole point of strong versus weak induction is that the inductive hypothesis is different, and work that doesn't demonstrate an understanding of this has not met expectations. Is there evidence of partial understanding? **Yes:** The rest of the outline is fine. The student just needs to try it again.
- A student set up the right base case. For the inductive hypothesis, the student said that we will assume that $2, 3, 4, \dots, k$ are divisible by a prime number *for all positive integers $k$*; then stated that we want to prove that $k+1$ is divisible by a prime number. **In this case I gave the work an R.** Does the work meet expectations? **No:** Outlines of induction proofs are expected to show understanding of the basic logic underlying the concept of induction, and getting the quantifier wrong in the inductive hypothesis casts doubt on that understanding. It's a significant logical error in which the proof assumes the conclusion. But is there evidence of partial understanding? **Yes:** The rest of the proof is fine. The student just needs to try it again and get the quantifier right.
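The two questions I walked through in each case are the whole rubric; as a decision tree it is tiny. A sketch (function and argument names are mine, purely illustrative):

```python
def emrf(meets_expectations, complete_and_well_communicated=False,
         evidence_of_partial_understanding=False):
    """EMRF as a two-question decision tree, following the cases above.
    (Illustrative only; the questions are the rubric's, the code is mine.)"""
    if meets_expectations:
        # Passing side: E if also complete and well-communicated, else M.
        return "E" if complete_and_well_communicated else "M"
    # Non-passing side: R if there is partial understanding, else F.
    return "R" if evidence_of_partial_understanding else "F"

# First case: wrong base case, otherwise a sound strong-induction outline.
print(emrf(meets_expectations=True))                  # M
# Second and third cases: flawed inductive hypothesis, rest of outline fine.
print(emrf(meets_expectations=False,
           evidence_of_partial_understanding=True))   # R
```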

One of the many things I like about this rubric and the process of continuous revision that it feeds is that the assessment process is now, in the words of Dee Fink, "educative" rather than "auditive". That is, the assessment process is helping students *learn* mathematics rather than simply telling them what they don't know.

It also saves a ton of time. In all these cases, the decision of what grade to attach took me less than 90 seconds, including the time it took to actually read the work. In a traditional grading setting, I would have to not only spot the error but then go on to agonize over whether a messed-up base case should get 10 out of 12 or 11 out of 12 or... And then repeat for the other two cases, and I can guarantee that would take an order of magnitude longer. It left me with enough time to write down meaningful feedback in complete sentences.

Another way this saves time is that I very rarely get work that evaluates to an F. Since students can redo assessments through the semester, if they find themselves taking an assessment and can't produce a coherent finished project, they just bail out, and the work never gets turned in. And that's exactly what they should do, and make plans to try again next time.

Yesterday I handed back some work on assessments, and a student pulled me aside and asked if he could argue for a higher grade. He had done work on a problem (solving a recurrence relation) where he made an initial algebra error that was serious, but then worked through the rest of the procedure correctly. I had given him an R; he was arguing for an M.

Sounds familiar, except this time the student was not grubbing for points -- what's a "point"? -- but rather presenting a coherent and well-considered explanation for why, in his opinion, his work meets the standards for M. It was an exchange between two people on the same level. I did not agree with his argument in the end -- the standards for this problem clearly state that you have to get the answer right in order to be eligible for a passing mark -- but after telling the student this, I could also say, "You definitely show evidence of understanding. You'll get this right next time." It wasn't about *points*, it was about *quality* and there's a world of difference here.

I suspect some of you reading my evaluations above are disagreeing with me, but probably the disagreement is on *quality* issues (standards), not *quantity* issues (points). That's a narrative that I want to support.

A little over one year ago, I made a decisive break with traditional percentage-based grading systems and embraced specifications grading. I was motivated by experiences in my calculus classes where, after over 20 years of using traditional grading, I was finally fed up with the way it gives false positives and false negatives, stresses students out, and disadvantages students who need more flexibility and more choices to show evidence of learning.

That first implementation in January 2015, which I call "Iteration 1", had lots of bugs, but it was still the best experience I'd ever had with grading up to that point. Iteration 2 was for an online course, and although I used specs grading, it was so different due to being online that it's hard to compare it with Iteration 1. Iteration 3 was in the fall, and as I noted here before, it was worse than either Iteration 1 or 2 because I made the system too complicated. What I want to talk about now is Iteration 4, which is what I am currently using, and I think it's the closest I've gotten so far to the ideal grading experience for both myself and my students.

In this post I want to focus on something I stumbled across that makes Iteration 4 work so far, and it's the grading rubric I am using for the major pieces of student work. It's known as the **EMRF rubric** and it's due to Rodney Stutzman and Kimberly Race. It came up in a discussion on the Standards-Based/Specifications Grading community on Google+ (which all the cool kids have joined, so get on that) and once I saw it, I knew that this was the rubric I had been looking for.

I've learned that in this style of grading, **it really pays to have a simple, visual standard grading rubric for all major assignments.** If you have one, then it's helpful for students because it provides transparency and a way for students to self-evaluate; and for you, it provides some measure of consistency in grading without having to agonize about giving the same number of points for similar work. Classic Linda Nilson-style specs grading uses a two-level rubric -- Pass/Fail. I instituted a three-level rubric for grading proofs and other complex problem solving tasks -- Pass/Progressing/Fail. (I changed "Fail" to "No Pass".) The middle layer of the rubric is for work that doesn't quite meet the specifications I set out, but it's close, and pragmatically the difference is that students have to spend a token to revise and resubmit "No Pass" work but they can revise "Progressing" work for free.

What I found was missing from this three-level rubric was a designation for really excellent work. The "Pass" level was synonymous with "good enough", and in my courses this is exactly what I was getting -- "good enough". There wasn't much incentive for excellence. Yes, I could set the bar very high for "good enough" and call that "Passing", but it always felt to me that there needed to be something like "Pass+" in my system, and then the requirements to earn an A in the course would require a certain number of instances of really excellent work.

The EMRF rubric does this. It is basically a Pass/Fail rubric with one extra layer on each side. In my specs grading system in Iteration 4, anything that registers as E or M is considered "Pass", and I have laid out in mind-numbing detail in a specifications document what it takes to attain these levels. Likewise anything that grades out to R or F is considered "No Pass" or (as I prefer) "Not Yet" or "Try Again".

In my Discrete Structures courses, students work on three kinds of assignments -- **Assessments** (short timed in-class quizzes that measure proficiency on a single one of 20 different learning targets for the semester), **Miniprojects** (which apply basic knowledge to new problems), and **Course Management** tasks that include preparation activities, daily homework, and weekly quizzes over the syllabus. Assessments and Miniprojects are graded on the EMRF rubric while course management tasks are graded Pass/Fail, usually on the basis of just completeness and effort. To get an "A" in the class, students must:

- Earn "Pass" grades (E or M) on all 20 learning target assessments, then provide a second item of evidence of proficiency for 10 of those 20 learning targets, and earn at least five grades of E in the process.
- Earn "Pass" grades (E or M) on 8 Miniprojects (out of 10--15 in all) including at least two grades of E.
- Pass at least 90% of all the course management tasks.

The "second item of evidence of proficiency" can be taking a timed assessment a second time and passing it, or doing an oral assessment in the office during office hours, or making a case that the work on a Miniproject shows evidence of proficiency.

For a "B", students need to Pass all 20 assessments, provide secondary evidence for 5 learning targets, and earn at least two E grades in the process. They must also Pass 6 Miniprojects and earn at least one E grade; and pass 80% of the course management tasks. For "C", students have to Pass all 20 assessments, and provide secondary evidence on at least 3 of these and there is no requirement for E grades. And "C" students also must Pass 4 miniprojects, again with no quota for E grades, and pass 60% of the course management tasks. There are also contingencies for D and F grades and a set of rules for determining plus/minus grades.

This, I think, captures what I want from grading -- students choose the grade they want to earn, and that grade sets the agenda for their work in the class. Baseline proficiency is considered to be showing one piece of evidence that you are proficient with all the major skills objectives in the course, turning in about 1/3 of the available miniprojects, and giving a reasonable amount of attention to course management. That's a "C". To get higher than a C, you have to demonstrate work that shows more depth (secondary evidence of proficiency on learning targets), more breadth (more miniprojects), and some evidence of true excellence (the "E" quotas).
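To make the structure concrete, the A/B/C requirements can be sketched as a table of thresholds. This is an illustrative rendering only: the data layout and names are mine, and the D/F contingencies and plus/minus rules are omitted.

```python
# Requirements from the grade specifications described above.
# targets:   learning-target assessments passed (E or M)
# secondary: learning targets with a second item of evidence
# target_es: E grades earned in the assessment process
# minis:     Miniprojects passed    mini_es: E grades on Miniprojects
# mgmt:      fraction of course management tasks passed
REQUIREMENTS = {
    "A": dict(targets=20, secondary=10, target_es=5, minis=8, mini_es=2, mgmt=0.90),
    "B": dict(targets=20, secondary=5,  target_es=2, minis=6, mini_es=1, mgmt=0.80),
    "C": dict(targets=20, secondary=3,  target_es=0, minis=4, mini_es=0, mgmt=0.60),
}

def meets(record, grade):
    """True if every threshold for `grade` is met by the student's record."""
    req = REQUIREMENTS[grade]
    return all(record[key] >= req[key] for key in req)

# A hypothetical student who has done exactly the baseline "C" work:
student = dict(targets=20, secondary=3, target_es=0, minis=4, mini_es=0, mgmt=0.65)
print(meets(student, "C"))  # True
print(meets(student, "B"))  # False: needs more secondary evidence, E's, minis
```

Notice that the grade levels are monotone: any record that satisfies the "A" row automatically satisfies "B" and "C", which is what lets students set a grade goal and work upward toward it.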

The difference between EMRF and straight Pass/Fail is the kind of feedback the letter communicates in EMRF. A grade of M means *This meets expectations*, but it also honestly communicates that it could still be better. For many students who get an M on an assignment, the existence of an E will impel them to retry an assignment to raise their grade even though it was "good enough" the first time. Likewise, a grade of R or F *means* the same thing in the grading system -- you still have to revise and resubmit if you want the work to be counted -- but it *communicates* two different things, an "R" saying *There is partial understanding here, but something important is missing* and an "F" saying *There was too much missing to really know if you understand*.

I do not like the letter "F" in this rubric, though, because of the emotional baggage attached to it. People assume it means "fail", and it sort of *does* mean that, but then students too often see failure with such negativity and finality that they miss the message that they can try again. I would probably rebrand that level, maybe "I" for "Incomplete" or "S" for "Significantly flawed".

In the next post I'll show some examples of how I've graded with this rubric and how we've included it in some class discussions about the quality of work and professional standards.

*Happy New Year everybody. This post is the first one I've made since November 2015, but I am making an effort to get back on the wagon and write here more often (like a lot of guilty bloggers are possibly doing). So here we go.*

Tomorrow (January 11) our new semester kicks off. Confession: I am not good with first-day activities. I don't enjoy icebreakers -- didn't like them as a student, don't like them as a professor. At the same time I don't like launching right into the course material on the first day because enrollments tend to remain in flux for a week or so, and I don't like putting new people behind at the outset. My solution for the last year or so is to use a variation on Dana Ernst's "Setting the Stage" presentation, which gets students thinking big-picture on the first day really effectively, and to gather some personal information about students that helps me get to know them better.

This time around I am doing the latter via a Google Form survey that I want students to complete before day 2. I've done this in the past, but this time somehow it turned out differently, because I've been using and thinking about specifications grading for a full year now. I want students to think about their grades on the first day -- to begin with the end in mind, as they say.

Actually I would rather students not think about grades at all. But until we get rid of grades entirely, their mind-altering influence will persist among students and faculty alike, and so insofar as students think about grades, I would like for them to think of grades as *goals*. I don't want them to think of grades in terms of "hope" -- as in, *I really hope I get at least a B in the class* -- but rather as outcomes. Not the outcomes of random processes, in which students are like ancient pagans sacrificing time and energy to the Grade Gods (i.e. professors) in hopes of a good harvest. Instead, these should be the outcomes of reasonable goal-setting, careful planning and personal management, and of concrete evidence of learning. There should be no need for "hope" to be involved.

Among other things on this survey, students respond to the following three items. First, this:

At the beginning of each semester I like to ask each student to set a goal for the grade they would like to earn. Please don't say "A" just because that's what you think you're supposed to say. Many students drive themselves crazy because they think they are supposed to earn A's in everything when actually a "B" is perfectly suited for their goals and far more reasonable. Think carefully about how far you want to go in the course: Think about your personal interests, your academic goals, your intervening work and life responsibilities, and your skill set and set a goal for a grade that is realistic and attainable, whether that's an "A" or a "C". Take 5 minutes to think it through. Once you are done: Check the grade that represents what you think is the most realistic, reasonable, and attainable grade for you given all the factors you considered. Whatever you choose (as long as it's passing!) I will support.

There's a pulldown menu below this item with the grades A through C- on it. Next, they answer:

Now explain your reasoning behind the grade you chose.

There's a paragraph to enter text below it. Finally there's this:

Now go to the syllabus and take a 3x5" notecard, and write down all the coursework you need to complete in order to earn the grade you chose. I may ask to see this in class. This card is important -- you can use it at any point in the course to check against your grade records to see how much further you need to go. Go and do that now.

Students are supposed to click "OK" once they read this. And yes, I do intend to spot-check people's cards -- whenever a student later in the semester wants to come and talk about their grade in the course, I will tell them to make sure to bring their card along.

I've had students do this in the past but only informally. This time I really want every student to *start* with the grade and then work *towards* it, rather than work like crazy to ace everything and then "hope for the best". My experience has been that most students still select "A" and most of the time this is just a reflexive action in my opinion. But there are some students who have never been given permission to aim for less than an A in a course, even though their life situation and skill set make earning an A an uphill battle that they are likely to lose. And it's very freeing for those students to have the prof say: *If a B is the best you can do in the course for whatever reason, that's OK, and I will have your back the whole way as you earn it.*

It seems to me that we have a problem in higher education with not setting our own goals. We are constantly trying to attain goals that we didn't set. Students deal with this because professors or parents or programs often insist on only the highest levels of attainment even when this doesn't necessarily make sense. And we faculty often have to deal with it as well through tenure and promotion goals that, in some places, are wildly optimistic and totally opposed to the natural skill sets and interests that faculty have. Very rarely have I seen universities where faculty are allowed to set their own goals for teaching, scholarship, and service within a reasonable and broad framework. It's so much like our ingrained system of grading that you realize that the apple doesn't fall too far from the tree.

What's hopeful for me is that standards-based and specifications grading sets up a natural structure for students to participate in their grades in a healthy and proactive way, where they are in control and they get to decide what they want from the course. To some extent, traditional grading *might* be amenable to this, but that seems to be the exception.

Also it's always interesting to see what students say for their rationale -- and a good point of reference for how to work with those students in the course.
