Negative results about flipped learning from a randomized trial: A critique (Part 1)

Back in August, this paper got a lot of headlines/retweets. It is a report from a study on flipped learning in a higher education setting that used a randomized trial with a significant number of students that arrived at some less-than-flattering results: namely, that the positive effects of a flipped learning environment were not widespread and did not persist over time, and not only did the flipped environment not close achievement gaps between white men and underrepresented groups, those achievement gaps were actually made worse.

The study was tweeted far and wide and was frequently held up as an "I told you so" moment by many, including one widely shared tweet referring to a talk given before the paper was finalized.

A lot of people have asked me to address this paper – August wasn't a great time for me to do a deep dive on it, since I had just been appointed department chair and was frantically trying to get my head above water. I read it during the semester and had a few isolated tweets about it. Over the holiday break, I had time to organize my thoughts about it, and now I'm sharing them with you.

But first...

I feel it's important to say that, while I have numerous issues with this study, I am not writing this as a takedown, or in an attempt to "defend" flipped learning against negative results from research. It's OK to be a proponent of flipped learning; but it's not OK to be an uncritical fanboy/fangirl or some kind of religious zealot who feels the need to go to war against any person who dares to point out shortcomings in our pet pedagogical method. That's not scholarship. We need to teach like scholars – and part of that means that when someone uncovers a blind spot or shortcoming of Our Favorite Method, we listen and analyze and adjust, or even walk away from that method if it comes to that.

I don't think flipped learning is flawless or even that it's the best method to use in every circumstance. To the extent that research points out legitimate issues with flipped learning, we need to sit up and take notice. So despite my critiques, I want to thank the authors of this study for contributing to the conversation about how best to educate our students.

What this paper is, and what it isn't

This paper is a "discussion paper" done for the School Effectiveness and Inequality Initiative (SEII) at MIT. The SEII website describes these papers like this:

SEII Discussion Papers report on our work in progress. Discussion Papers include results from SEII's ongoing experimental and econometric evaluations of school effectiveness, our studies of the American income distribution, and our research on student-school matching.

I noticed that the paper is also hosted by the Annenberg Institute at Brown University, which has a similar mission focused on equity in higher education.

The paper has four authors. The lead author, Elizabeth Setren, is an economist at Tufts University whose work centers on education and labor economics, and econometrics. The other three authors are economics or math professors at West Point, one of whom (Greenberg), along with Setren, is a member of SEII.

Note well: this is a working paper, i.e. a work in progress. Whatever its merits, it is not a study that has been published in a peer-reviewed scholarly journal. And that's the gist of my first criticism of the study: I can find no evidence at MIT or Brown that this study has undergone any form of scholarly peer review. (Not yet, at least; I would assume that by now the paper is somewhere in a pipeline for publication, but I don't know.)

Is this really a valid criticism? It's certainly true that a lot of peer-reviewed scholarship is bunk, and peer review is no guarantee of quality. Conversely, a paper doesn't need to be peer reviewed in order to contain good scholarship. However, without peer review, what you're getting is basically a preprint that has not undergone systematic review by experts who have applied their expertise to detect and point out flaws – and those flaws most certainly exist, simply because every study has flaws. Those flaws, having not been pointed out, are sitting there in the study, and they very likely affect the validity of the results.

In other words, for whatever merits it may have, this paper is not a finished product. It's still scholarship in a stage of development. Until it has undergone external peer review, its results have to be taken as preliminary findings whose validity is still in question. We need to keep that filter on while reading it.

The paper in a nutshell

The research question that this study addresses is never explicitly stated, but I'd frame it like this:

Does a flipped learning environment have a positive causal effect on student learning? And do such effects persist over time and across socioeconomic classifications?

That word "causal" might raise eyebrows, and rightfully so, since causality is a very strong claim requiring stringent conditions to be met. The vast majority of educational research studies are either qualitative or quasi-experimental and therefore lack true control and experimental groups; any significant differences between the "control" and "treatment" groups (for example, a section of a course taught traditionally vs. a section taught flipped) merely correlate with the treatment, so true causality cannot be concluded.

This study, on the other hand and as the title suggests, really did involve randomized trials, and that's the main strength of the research. Specifically:

  • Students at West Point enrolled in introductory Calculus or Economics courses (n = 1328, 80 sections, 29 instructors) were randomly assigned to either control (traditional lecture-based) or treatment (flipped) sections of their courses. Of the flipped sections, 26 were in math and 14 in economics.
  • By this we mean that the treatment sections had a segment that was flipped – three units of instruction, which means (I think) three class sessions.
  • Student performance was measured via a quiz over the unit -- the same quiz for the same unit in both control and treatment sections. Differences were also studied on two final exam questions over that unit and in the overall final exam scores. The authors looked at differences between flipped and non-flipped sections in the short term (quizzes) and longer term (exam) and across socioeconomic and gender groups.
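The design above is worth pausing on, because random assignment is exactly what lets the authors speak of causal effects. Here is a minimal, purely hypothetical simulation (the numbers are invented for illustration and are not the study's data) showing why: even when an unobserved confounder like prior ability drives most of the variation in scores, a coin-flip assignment balances it across groups, so a simple difference in group means recovers the true treatment effect.

```python
# Hypothetical sketch: why randomization supports causal claims.
# "ability" is an unobserved confounder; "TRUE_EFFECT" is an assumed,
# made-up treatment effect, not a figure from the study.
import random
import statistics

random.seed(42)

TRUE_EFFECT = 5.0   # assumed effect of treatment, in quiz points
N = 1328            # sample size borrowed from the study, for scale only

ability = [random.gauss(70, 10) for _ in range(N)]   # unobserved confounder
treated = [random.random() < 0.5 for _ in range(N)]  # coin-flip assignment

# Observed score = ability + treatment effect (if treated) + noise
scores = [a + (TRUE_EFFECT if t else 0.0) + random.gauss(0, 5)
          for a, t in zip(ability, treated)]

mean_t = statistics.mean(s for s, t in zip(scores, treated) if t)
mean_c = statistics.mean(s for s, t in zip(scores, treated) if not t)

estimate = mean_t - mean_c
print(f"estimated effect: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment is random, the treated and control groups have (in expectation) the same distribution of ability, so the estimate lands near the true effect; in a quasi-experimental design, where students self-select into sections, the same difference-in-means would be contaminated by ability differences.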

Now for the results, which were what really got people's attention:

  • The flipped sections for math showed significant gains in performance in the short term but no significant difference in the final exam. Economics students did not show even the short-term gains.
  • The flipped classes had a large positive effect for men, but smaller or even statistically insignificant effects for women.
  • The flipped environment led to significant effects for white students, but Black and Hispanic students experienced effects close to zero.
  • Students with ACT scores in the bottom quartile showed no significant improvements.

In other words, achievement gaps between white and non-white student groups, between men and women, and between high-achieving and lower-achieving students not only did not close as a result of flipped learning, they got worse. And even for those who appear to reap the benefits of flipped learning – which appears to be a highly privileged group – those benefits were short-term only.

What to make of this?

That's a pretty damning set of results for flipped learning if we take them at face value. They certainly got my attention, because one of the reasons I have invested so much time in practicing and studying flipped learning is these very issues of equity and justice. We have some very strong research results suggesting that active learning in general, and flipped learning in particular, can address equity issues in a powerful way:

  • One of the original implementations of flipped learning was aimed squarely at equity issues in a large-lecture course and found that those issues were ameliorated by flipped learning (what the authors at the time called the "inverted classroom").
  • One particular instantiation of flipped learning – peer instruction – has been found to eliminate gender differences in performance on the Force Concept Inventory.
  • Another instantiation – the SCALE-UP approach by Brooks et al. that mixes flipped pedagogy with a radically different approach to physical classroom space – found profound improvements in failure rates across gender and ethnic groups, particularly Hispanic students.
  • My own current (unpublished, so take my advice from above) research on flipped learning is showing signs that flipped environments are particularly helpful for students with learning disabilities and executive functioning disorders.

The Setren et al. paper begins by mentioning that "[a]dvocates of the flipped classroom claim the practice not only improves student achievement, but also ameliorates the achievement gap." Well, yes, we do, because this is backed up by 20 years of research and (in my case) 10+ years of frontline teaching practice. But could I maybe have been wrong all this time?

Words (definitions) matter

My first criticism of this paper is that it isn't peer reviewed. The second will sound familiar to anyone who reads my stuff regularly: Their operational definition of "flipped learning" is flawed. The way in which it is flawed is that it's never fully specified; and to the extent it is specified, it's significantly wrong. Here's what the paper says in its opening lines:

Technology plays an increasing role in education and opens up a myriad of possibilities for educators to innovate on the traditional lecture format. One option, called the "flipped classroom," involves students learning the material by watching video lectures prior to class. This frees up class time for more in-depth discussion and application of the concepts through practice problems, group work, and increased interaction with the instructor.

That's as detailed a definition of "flipped learning" as we get, and that's a big problem. What's specifically problematic is not the last sentence about freeing up time for more in-depth work and interaction – I think that idea is fine. It's the one before it: "...students learning the material by watching video lectures prior to class." It does say that flipped learning "involves" this but says nothing about what else it involves.

I really, really wish people would stop saying this.

Let's review what I wrote about in this post ("Defining flipped learning: Four mistakes and a suggested standard"). There are four distinct mistakes someone can make in positing an operational definition of flipped learning:

  1. Assuming that flipped learning requires videos
  2. Assuming that flipped learning requires lecturing (including "watching recorded lectures on video")
  3. Assuming that flipped learning requires an in-class face-to-face meeting (not so much of an issue here)
  4. Assuming flipped learning is "recent" (more on that in part 2 of this critique)

This study makes the first two mistakes within the first 100 words of the paper and sticks with those mistakes throughout.

Why does this matter? It matters because if you say that flipped learning is where "students watch recorded lectures on video prior to class" and then do more in-depth work in class – and say nothing more about that pre-class time other than it's spent with students watching video lectures – you are missing one of the most important parts of flipped learning: structure and guidance during the pre-class time.

Flipped learning is not about simply dropping big slabs of content on students and telling them to consume it prior to a class discussion. If it were, flipped learning would not be a distinct concept. It would be just the same thing that many humanities subjects have done since the eleventh century: Read Chapter 3 before class and come ready to discuss. In fact many flipped learning skeptics make this mistake and therefore conclude that flipped learning really isn't anything new.

The problem with the "Read Chapter 3 before class" approach – or the "watch this 20-minute video before class" approach – is that if you stop there and just give students content to consume, you assume that students know how to consume it meaningfully. This means knowing what they are supposed to learn from that content, and how, and having the self-regulation skills to know if they are learning well and how to adjust if they aren't. This skill of meaningful consumption of material is critically important for students' futures, at least as much as any of the specific content objectives we expect and probably more so. It's also incredibly difficult, and only the top 1% of college students will come into a class with any idea of how to go about it without coaching.

It makes sense, because up until probably the mid-20th century, higher education was intended precisely for those 1%-ers, so we never heard anything bad about the "read the chapter and come ready to discuss" approach. We didn't need to think about teaching students how to read the chapter, because you probably already knew (because you were a 1%-er) and if you couldn't adapt, then you'd just end up quitting school. It was not a particularly student-focused approach – unless by "students" you mean the 1%-ers – because higher education was not particularly student-focused. Not then, and to the extent that this practice still goes on, not now either. But it worked despite this because it fit the times and the homogeneous demographic higher education served.

But it doesn't work any more, given the sheer diversity of students who are coming into our classrooms today. They are not 1%-ers. They all have the capacity to learn because all humans can learn. But not without help. As much as we may not want to hear it, professors now have the responsibility not so much to "teach the material" as to teach students how to learn the material.

And this is why flipped learning cannot simply be described as "having students watch videos before class". Meaningful consumption of video content is not really that much different than meaningful consumption of text content, if you are trying to learn things from it. If all you are doing is dropping videos on students and not providing any kind of structure or guidance for how to learn from the videos, then you are not doing flipped learning – or any other kind of learning. You're just asking students to be in the 1% again.

This also sheds some light on the result from this study that white men benefited from "flipped learning" while women and non-white students had little to no benefit. Why should we expect them to benefit, when the approach to "flipped learning" is nothing but the same approach that has been used to educate only the privileged 1% for hundreds of years?

So this criticism matters because, by not explicitly addressing the need for guidance and structure during the pre-class activities – and by saying that those activities are just watching videos – the study sets itself up to ignore a critically important part of the learning process: what students do during the pre-class phase to learn the material, and how they self-evaluate the extent to which they believe they have learned it. Take this away, and you're not doing flipped learning – and the research results you come up with may not be about flipped learning, either. In other words, it raises serious issues about the construct validity of the research – issues that will come home to roost when I get to part 2 of this critique.

Next up

In part 2, I'll go through my remaining criticisms of the study, which have to do with:

  • The depth of the literature survey
  • The subject population and external validity
  • The way flipped learning was implemented in the classes (which will be in two parts: issues with what we know, and issues with what we don't know)

One last thing: I'm being fairly harsh in my criticisms here because I want flipped learning to get better, and to do this we need studies like this one that involve good research approaches like randomized control trials and large samples. I reserve the toughest criticisms for the things that have the potential to do the most good.

Robert Talbert

Mathematics professor who writes and speaks about math, research and practice on teaching and learning, technology, productivity, and higher education.