Classroom Observation Significantly Influenced by Context

“Despite the intense focus on the use of student test scores to gauge teacher performance, the majority of our nation’s teachers receive annual evaluation ratings based primarily on classroom observations (Steinberg & Donaldson, in press). These observation-based performance measures aim to capture teachers’ instructional practice and their ability to structure and maintain high-functioning classroom environments. However, little is known about the ways that classroom context—the settings in which teachers work and the students that they teach—shapes measures of teacher effectiveness based on classroom observations. Given the widespread adoption of high-stakes evaluation systems that rely heavily on classroom observations, it is critical that we have a clearer understanding of how the composition of teachers’ classrooms influences their observation scores.

. . . We find that teacher performance, based on classroom observation, is significantly influenced by the context in which teachers work. In particular, students’ prior year (i.e., incoming) achievement is positively related to a teacher’s measured performance captured by the FFT.”

—Matthew Steinberg, University of Pennsylvania, and Rachel Garrett, American Institutes for Research, “Panel Paper: Classroom Context and Measured Teacher Performance: What Do Teacher Observation Scores Really Measure?”

Public Debates on Education Are Ideological, Rather Than Sociological

“Yet it struck me that most of the tensions the struggling school experienced that year were sociological rather than ideological: They concerned the challenge of bringing together people of different races and backgrounds (most of the families were low-income and black whereas most of the teachers were young, white, and middle-class) around a shared vision of what education can and should be. Yet our public debate is centered squarely on the ideological rather than the sociological. We endlessly debate the overall “worth” of various institutions—from “no excuses” charter schools to teachers unions—with a political or ideological framing. But we rarely venture inside, scrutinizing the arguably more important question of how people relate, or fail to relate, within these realms. Venturing inside—at least in a meaningful way—takes time, trust, and an open mind.”

—Sarah Carr, “There Are No Simple Lessons About New Orleans Charter Schools After Katrina. Here’s How I Learned That.” on Slate

Pineapple Express: Tests Shortchanging Student Literary Analysis Skills

The infamous Pineapple Passage on the 8th grade NY state test is rightfully making the rounds online. It’s a prime example of something that has, surprisingly, thus far gone relatively unremarked*: as test-makers attempt to make test questions “higher order” in the form of inference and reading between the lines, they necessarily walk the fine line between what is easily quantifiable and what must be qualified by interpretation.

I noticed on my 5th graders’ ELA exam this past week that many of the questions were so subject to interpretation as to be perplexing as multiple-choice questions.

I was an English major in college. Though I can’t claim to have engaged in it extensively, I’m no stranger to literary analysis, and I know that literary criticism can be highly subjective (though not as subjective as folks outside the fuzzy majors may assert). Much like in the art critic’s world, consensus around a perspective on a particular work is arrived at via a long-form process of back-and-forth akin to peer review. Papers are written, professors stake careers on counterpoints, and over time paradigms shift, until the critique of a given work merges with the living history of a society.

It’s via this process of dialogue, therefore, that perspectives on literature evolve. It’s qualitative. You can’t assign a meaningful number to it without context.

Yet test-makers, under pressure from policymakers concerned foremost with the short-term and the political, are attempting to assign numbers to the process of deeper literary analysis, which simply can’t be done. Ostensibly they are measuring reading comprehension, but this dissociated push for “higher order thinking” mistakes simple comprehension of plot, setting, and character for deeper interpretation of what a text might mean.

For example, the 5th grade test included a story about a family living in a cabin when a blizzard suddenly strikes. Through a misunderstanding, the mother (or grandmother, I don’t remember) gets locked in the cellar by the father, who doesn’t know she’s in there. When he finally realizes what has happened, he opens it up, and the mother comes out, rubbing her hands and stomping her feet, and quips something along the lines of “If you were going to lock me up somewhere, it should have been in the barn.”

A question then asks (wording might not be exact): After coming out of the cellar, the mother MOST LIKELY felt:
A. amused
B. anxious
C. angry
D. relieved

I was perplexed by this one. As I train my students to do, I kept going back to the passage, re-reading the section where she comes out of the cellar and looking for evidence. One could argue she is somewhat amused, because she cracks a joke the moment she pops out. She may possibly be angry. And one can infer that she is relieved, because who wouldn’t be relieved to be released after being locked up in the darkness?

The answer they want, obviously, is that she is relieved. But given that one could argue, based on the evidence of her statement and a deeper inference about her evident dry wit, that she was also amused, it seems highly suspect to give a kid 0 points for one answer and 1 point for the other. In other words, if you are testing a kid’s ability to make inferences, then both answers are plausible applications of that very skill.

There were a number of questions like this throughout the test. They are certainly challenging questions, and interesting from a purely academic standpoint. But they are not amusing to me as I watch my students with exceptional learning needs sit with heads bowed for over two hours, grappling with passages well above their reading level. I witness children whisper “I can’t do this” and put their heads down in the middle of the test. As GothamSchools noted in a recent article, this is akin to torture, and this grand experiment by short-sighted adults clamoring for quick and easy data has real and very human consequences for children.

There is a value and purpose to multiple-choice questions. They’re like licking your finger and sticking it in the air: they give you a quick sense of which way the wind is blowing. But we need to stop pretending that they give us a true picture of an individual child’s ability to analyze and infer. So while NY may have “canned the pineapple,” we need to can the tests.

*This great opinion piece in the NY Times, “Teach the Books, Touch the Heart,” makes a parallel argument: tests are hardly culturally neutral, and we should scrap multiple-choice tests altogether in favor of written exams based on the passages and books children have read in class. Great advice, and a must-read. Thanks to @KellyDillon1 for tweeting out the link.