18 December 2009

"Edwards Deming meets No Child Left Behind"

Mark Kleiman, "Edwards Deming meets No Child Left Behind," at The Reality-Based Community:

One of the striking features about NCLB is the primitive evaluation mechanism it employs. It’s pure defect-finding: measuring the percentages of kids of different types who fail to achieve some standard, as measured by standardized tests. Henry Ford would recognize it. W. Edwards Deming would be appalled by it. [...]

One of the reasons Honda and Toyota ate General Motors’s lunch is that the Japanese car companies adopted statistical quality assurance while Detroit was still inspecting every part coming off the assembly line to see whether it was within tolerance. Why are we using those same outdated principles to manage the much more complicated problem of teaching children to read, write, and reckon?

Applying statistical QA to education would involve:
  • Selecting a sample of students for high-quality, expensive testing rather than settling for the level of observation we can afford to do on every student.
  • Using information about the whole range of performance rather than fixating on an arbitrary cutoff.
  • Taking measurements all through the school year, not just at the end, and getting the results back to the teachers promptly.
I think this is too clever by half.

The kind of statistical methods used in manufacturing rely on assumptions you just can't make about students. You put n widgets through a mechanical process, and it's safe to assume they're all going in more or less the same, they're all coming out more or less the same, deviations will follow a known probability distribution (typically Gaussian), and you can measure that distribution accurately with a sample much smaller than n. Widgets don't have any say in the outcome of the process they're being subjected to: they don't refuse to do their homework, or cut class, or get mono. Students don't go into the school year roughly interchangeable, and since students themselves and their families are such an important part of education, they're not all undergoing the same process. Most importantly, the goal of statistical QA is to turn out lots of items that are all the same. The goal of education is not uniformity; the goal is improvement. Or it had damn well better be.
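To make the manufacturing assumption concrete, here is a minimal sketch of why sampling works on an assembly line. Everything in it is hypothetical: 10,000 widget diameters assumed to be Gaussian around 25.0 mm with a standard deviation of 0.1 mm, and a random sample of 100. Under those assumptions, the sample's mean and standard deviation land very close to the values for the whole run. The whole point above is that students don't satisfy the "interchangeable and Gaussian" premise this relies on.

```python
import random
import statistics

def sample_estimate(population, k, seed=0):
    """Estimate mean and standard deviation of `population` from a random sample of size k."""
    rng = random.Random(seed)
    sample = rng.sample(population, k)  # draw k items without replacement
    return statistics.mean(sample), statistics.stdev(sample)

# Hypothetical widget run: 10,000 part diameters (mm), assumed Gaussian, mean 25.0, sd 0.1.
rng = random.Random(42)
widgets = [rng.gauss(25.0, 0.1) for _ in range(10_000)]

# Inspect only 100 widgets instead of all 10,000.
est_mean, est_sd = sample_estimate(widgets, k=100)
true_mean, true_sd = statistics.mean(widgets), statistics.pstdev(widgets)

print(f"sample of 100: mean={est_mean:.3f}, sd={est_sd:.3f}")
print(f"all 10,000:    mean={true_mean:.3f}, sd={true_sd:.3f}")
```

With a sample of 100, the standard error of the mean is about 0.1 / √100 = 0.01 mm, which is why a hundred inspections can stand in for ten thousand. That shortcut only holds because every widget is assumed to come from the same distribution.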

Schools already try hard enough to juke the stats when every kid has to get tested. I'm loath to give them the opportunity to fiddle with a sampling process too.

Yeah, the testing regime used for NCLB is bullshit. Wait, check that. The testing regimes I suffered through in school predate NCLB and they were still bullshit. We especially need to focus on the entire range of student aptitude, and not just meeting minimum standards. But I don't see how this sampling program is the answer. Especially this matter of testing continuously through the year. For one thing, we already use way too many school hours giving tests (both system-wide tests and the old-fashioned ones for class). Secondly, it sounds great to have rapid, standardized feedback for teachers, but shouldn't they be the one group of people who already have their fingers on the pulse of their classes? Don't students' grades and class behavior give them continuous feedback as it is?

(Via Megan McArdle)


  1. The reason the tests are bullshit is that they reference students, not the instruction the students have received in the past or the instructional options available to them in the future. Instruction remains a black box between "standards" and "standardized tests."

    There are some aspects of instruction in which there is no latitude--arithmetic is a function of the number system and reading and spelling are functions of the English Alphabetic Code. There may be a few alternative ways of going about teaching kids how to handle the complexities involved in each area of expertise, but there aren't "oodles" of ways.

    Students and teachers do get continuous feedback in terms of grades and behavior. What they don't get is any information pertaining to their status in terms of the instruction that has been delivered or that will be delivered to achieve any specified instructional accomplishment.

    Teachers and students drift from one assignment to another. Some students learn with little or no instruction and some learn despite shoddy instruction. The school takes credit for these accomplishments and attributes instructional failures to the student, parent, or "society."

    Constructing the analog to "Business Intelligence" to create Instructional Intelligence requires more effort than is required in the Corporate world to specify Key Performance Indicators. But the same logic is directly applicable, and it provides an alternative to the current "primitive evaluation mechanism" that you describe very well.

  2. "The reason the tests are bullshit is that they reference students, not the instruction the students have received in the past or on instructional options in the future. Instruction remains a black box between 'standards' and 'standardized tests.'"

    A fair number of the standardized tests I took actually never referenced me. They were amalgamated into school-wide results for state and county evaluation purposes. This is still entirely disconnected from the individual teachers I had, and it introduces a whole other set of problems I don't want to discuss. I take your general point that instruction is a black box, but the problems with testing aren't that the results are tied only to students.

    Regarding your last paragraph, I'll refer you to the first bullet of another post I just made on this topic:
    I am very willing to embrace better statistical models than we currently use. I just don't think QA is the best paradigm with which to build those models.