Standardized Testing: The New Phrenology?

We seem to have a preoccupation with measuring people’s heads. I guess if we find some way to measure people’s heads, then we can label those people. If we label them, then we can put them in a box. That’s a lot easier for us.


Phrenology and Its Origins

Phrenology is the now-discredited theory that scientists could read a person’s personality and intelligence by looking at the bumps on a person’s head.

German-born physician Franz Joseph Gall (1758–1828), who practiced in Vienna, is generally considered to be the father of phrenology. He said that the brain had localized centers, meaning that different parts of the brain were responsible for different functions. He was right about that, but unfortunately, he was wrong about everything else. He went on to say that these various parts of the brain, which he called “organs,” grew as they got stronger. So, a person who had a good memory for words would have a larger brain organ for verbal memory. This larger brain organ, according to Gall, would push on the skull, causing a bump on the head in the area of the “verbal memory” region of the brain. Gall, reading off his handy map of “regions of the brain,” could tell what a person was good at by looking at the bumps on a person’s head.

Some of the organs of the brain, according to Gall, were Combativeness (the disposition to fight and argue); Friendship (the disposition to be friendly and courteous); Secretiveness (the disposition to conceal and be sneaky); Causality (the propensity to understand the reasons why events occur); Hope (the disposition to be optimistic); and many other values, skills, and attributes.

Phrenology became known across Europe, and became even more popular in the United States. Its popularity in the United States was due to a machine that was invented (and sold!) that could give exact, numeric measurements on the shape of head bumps, thus avoiding any type of bias.

Phrenology was never fully accepted by mainstream scientific thought. Phrenological writings always had an evangelistic tone, as if their authors were trying to defend their “science” and make converts.

In 1808, a group of French physicians said that phrenology was a dubious proposition. Historians debate whether this group actually came to that conclusion on their own, or whether they were ordered to come to that conclusion by Napoleon Bonaparte. Napoleon, legend tells us, had his head bumps read, and was reportedly furious that the phrenologist missed some of the heroic qualities he believed he had.

Phrenology, as a science, had faded away by the early twentieth century—just as another way to measure heads began to emerge: testing.


Testing and Its Origins

While testing had moments of popularity through history, the current theory of testing might be traced to a confluence of several ideas from the end of the nineteenth century and beginning of the twentieth century.

One idea that emerged at that time was modern statistics, which became popular in the 1890s. The idea of reliability, or getting consistent results when a test is given over and over, became a useful one.

At the same time, the idea of mental measurements was growing in popularity. Alfred Binet developed a test for intelligence that measured the number of facts a person could remember.

An influential educational researcher named E.L. Thorndike became a fan of these kinds of tests. They fit in well with his own theories, most famously a theory he called the “Law of Effect.” Published in 1904, this theory stated that a behavior is more likely to be repeated if it is followed by pleasing stimuli. He believed that all mammals learned in the same way, which is why he studied learning in cats rather than children. Thorndike, who died in 1949, was a major influence on B.F. Skinner.

B.F. Skinner believed that little else was important besides observable behavior. Ideas about emotions, curiosity, and knowledge transfer were irrelevant. The only thing that was important was what could be measured.

More recently, books such as those in E.D. Hirsch’s series, including What Your Fourth Grader Needs to Know, became popular. The books, one for every grade, were essentially lists of words and concepts that “should” be known by all children of that age group. For many people, this was an easily understandable way to organize learning. Not only was it understandable, it was easily testable.

The preoccupation with measuring children became set into law with No Child Left Behind. Tests of every conceivable type, for every conceivable age group, were created to measure and sort children and their schools.


Seven Shared Assumptions of Phrenology and Testing

Shared Assumption #1: All People are the Same

Phrenology and testing both accept the idea that all people are the same. If you believe everyone has the same head bumps, then phrenology works. If you believe everyone learns and recalls information in the same way, then testing works.

The problem, of course, is that we’re not all the same, whether we’re talking about head bumps or learning. We don’t all learn in the same way, or at the same rate. We don’t remember facts in the same way. We can’t all recall facts in the same way, at the same speed, or in the same order. Some of us can recall random facts better than others. Some of us reason one way, others reason a different way. A few of us learn incrementally, but most of us learn in fits and starts, with long plateaus followed by large gains.

Shared Assumption #2: Numbers Don’t Lie

The American machines that measured head bumps gave clear, unambiguous numbers. The numbers were objective, unmarred by human subjectivity. We love numbers, because, it is said, “numbers don’t lie.”

The problem was that the numbers in phrenological head measurements didn’t mean anything. They were just numbers. Then, as today, we are often so enraptured by numbers that we assume that those numbers must mean something. But, in phrenology, the clear, objective, unambiguous numbers meant nothing.

Testing gives us lots of numbers as well. Many of us assume that the test results mean something, because they are, after all, numbers.

While tests are statistically reliable (they give the same spread of results each time), we still don’t know if they are valid (are they testing what we want them to test?). In fact, across all disciplines (not just education), when reliability has been achieved, people tend to forget about validity. We tend to assume that because tests are reliable, they must be valid. However, for educational testing, this has yet to be proved.
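The gap between reliability and validity can be made concrete with a toy simulation. The Python sketch below is an illustration under invented assumptions, not evidence about any real test: it imagines students with two unrelated traits (the names `understanding` and `recall_speed` and all of the numbers are hypothetical), and a test that happens to measure only recall speed. The same students sit the test twice.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_test(n_students=2000, seed=1):
    """Return (reliability, validity) for a hypothetical test.

    Assumption: the trait we care about (understanding) and the trait
    the test actually captures (recall speed) are independent.
    """
    rng = random.Random(seed)
    understanding = [rng.gauss(50, 10) for _ in range(n_students)]
    recall_speed = [rng.gauss(50, 10) for _ in range(n_students)]

    # Two sittings of the same test: recall speed plus a little noise.
    sitting1 = [r + rng.gauss(0, 2) for r in recall_speed]
    sitting2 = [r + rng.gauss(0, 2) for r in recall_speed]

    reliability = pearson(sitting1, sitting2)   # does the test agree with itself?
    validity = pearson(sitting1, understanding)  # does it track what we care about?
    return reliability, validity


reliability, validity = simulate_test()
print(f"test-retest correlation (reliability): {reliability:.2f}")
print(f"correlation with understanding (validity): {validity:.2f}")
```

In this made-up setup the two sittings agree closely with each other while saying almost nothing about understanding, which is the sense in which a test can be reliable without being valid.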

Shared Assumption #3: Intelligence is a Single, Monolithic Entity

When I drive down the road, I can glance down at my gas gauge, which tells me the amount of gasoline in my tank. I even have a handy digital readout that tells me how many miles I can go before I run out of gas. The amount of gasoline in my gas tank is a single, monolithic entity. Gasoline is not a multi-faceted phenomenon. The amount of gasoline can be measured with a single number, i.e., “56 miles to empty.”

In the nineteenth century, scientists believed intelligence was a single, monolithic entity. A person could have “above average intelligence,” be of “average intelligence,” or have “below average intelligence.” Because they believed intelligence was a single, monolithic entity, it could be measured with a single number. Consequently IQ tests were born. For the nineteenth-century scientist, intelligence was the same as the number of isolated facts a person could recall in a short amount of time.

The problem is that “intelligence” is an ambiguous, multi-faceted entity with many different definitions. Most tests have us simply remember facts, which is different than critical thinking. Critical thinking is different than being clever. Reason is different than knowledge. Being street smart is different than being book smart. The ability to remember facts correctly is different than the ability to remember facts quickly. Wisdom is different than intellect.

Phrenology and testing both rely on outdated models that regard intelligence as a single monolithic entity.

Shared Assumption #4: Intelligence is Innate, Fixed at Birth

Phrenologists believed that mental qualities were innate, and fixed at birth. You were pretty much stuck with the head you were born with. If you worked hard, you might be able to increase one faculty or another, but only within parameters.

Most testing advocates also view intelligence as fixed and innate. Many testing advocates have pushed for kids to be tested earlier and earlier, so we can label kids as early as possible. In education, we constantly hear which kids are “above average” and “below average.”

The other option is to understand that intelligence is dynamic (always changing) and malleable (able to be shaped and changed). In this view, children’s test scores vary with their interest in the topic, their boredom, their stress level, and their attention span. With a perspective of malleable intelligence, children can learn topics that interest them, and can become good at whatever skills and topics they choose.

In a final irony, research has demonstrated that teachers’ perspectives on intelligence create different learning environments for children. Teachers who believe intelligence is fixed tend to have kids who do poorly in school, even on tests. Teachers who believe intelligence is malleable tend to have kids who do much better.

Shared Assumption #5: The Brain is Like a Computer

Computers always work the same (at least, they’re supposed to). When you click an icon, the computer does what you have told it to do. It works the same way all the time. One of the dominant views of the brain is that it is like a computer. This view says that the brain will work the same way all the time.

Phrenologists believed that the brain grew in the same place each time the same kind of learning happened. Most testing advocates believe that the brain will work in the same way each time, too. Teachers are dumbfounded when children’s test scores are worse in the spring than they were in the fall. Teachers are surprised when a “gifted student” gets an average test score, or when a “below average student” gets a good test score.

Of course, the brain is not like a computer at all. We can’t instantly retrieve factoids. We can rarely remember something when it’s out of context (that’s why you can’t remember people’s names when you see them outside of what you consider their normal setting). We forget things that don’t have meaning to us. Our sense of learning is tied into our sense of place (which is why kids score better on tests when they’re in the same room in which they learned the information). Our sense of learning is tied into our relationships (which is why we don’t learn anything from teachers we don’t like). Our sense of learning is tied into our level of stress (which is why kids who come from difficult family situations have difficulty learning).

Phrenologists and many testing advocates assume that the brain should be able to retrieve information and conduct mental operations without difficulty or distraction. Unfortunately, human brains don’t work like that.

Shared Assumption #6: Learning Can Happen in Isolation

The process of long-term learning never happens in isolation. We can “learn” isolated bits of information for the short term, but we forget the information quickly. We can remember the name of our waitress while we are at the restaurant, but we’ll forget it quickly after we leave. We can’t remember a string of random letters for more than a few seconds, but we can still remember the floor plan of the house where we grew up, even though the floor plan is much more complex.

Learning always happens in a relationship with something else. Long-term learning is tied into meaning, significance, interpersonal relationships, and geographical places. In learning, context is always the key. Our brains rarely store any information by itself; we always attach some kind of story with it.

Phrenologists believed that brains grew in certain places no matter what else happened in life. Most testing advocates believe that children can recall information out of context from wherever it was learned.

Shared Assumption #7: Mental Measurements Work as Well Today as They Did in the 1800s

Testing and phrenology make sense if we are overconcerned with measuring people. Phrenologists originally had good intentions. They wanted to be able to find a way to identify criminals before they committed personal or property damage. When intelligence testing began in the early twentieth century, the military picked it up as a quick way to separate soldiers. They wanted to find the “smart” soldiers who could safely operate the cannons, as opposed to the “below average” soldiers who would blow themselves up. Many testing advocates have good intentions, too. They want to identify the kids who will go into the “gifted classes,” and those who need extra help. Those who need extra help go into remedial education, even though we know that remedial classes tend to keep kids performing at a low level.

Perhaps the most fundamental problem with educational testing is that it is based on a nineteenth-century understanding of intelligence. We know so much more about intelligence than we did in the late 1800s, but we continue to test people based on an outdated philosophy. In education, we continue to use the 1904 philosophy of E.L. Thorndike and company, who studied learning in cats, rather than humans.


Unintended Consequences of Standardized High-Stakes Testing

There are a host of unintended consequences of high-stakes testing in our educational system.

One of the obvious unintended consequences of high-stakes testing is that teachers and administrators (and parents and kids!) begin to accept the above assumptions. We begin to think of kids as “above average” or “below average.” We begin to accept that testing gives us meaningful information. We begin to accept that children should be able to recall random facts in isolation. Slowly, a high-stakes testing environment will start to change attitudes and beliefs about the teaching-learning process.

Another unintended consequence of high-stakes testing is that the current system forces us to come up with a new fad each year. The No Child Left Behind laws expect that schools will get better every year. They can raise the mean for a couple years, but inevitably they will have a slideback (this is a phenomenon called regression toward the mean). This slideback is simply the normal statistical process. So, schools are forced to create a new meme every year; forced to change to a new educational theory. Principals have to watch for the “next big thing.” We do whatever it takes to increase test scores—having workshops and classes for kids, handing out peppermint candies on test day, etc.
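Regression toward the mean is easy to see in a small simulation. The Python sketch below is a toy model under assumed numbers, not real school data: each hypothetical school has a stable underlying quality, and each year's score is that quality plus random luck. We pick the schools that scored best in year one and look at how the same schools do in year two, with nothing about them changed.

```python
import random
from statistics import mean


def simulate_regression(n_schools=1000, top_fraction=0.1, seed=0):
    """Return (top_year1, top_year2, all_year1, all_year2) mean scores.

    Assumption: score = stable school quality + independent yearly noise.
    The means and spreads used here are invented for illustration.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(70, 5) for _ in range(n_schools)]
    year1 = [q + rng.gauss(0, 5) for q in quality]
    year2 = [q + rng.gauss(0, 5) for q in quality]

    # Indices of the schools with the highest year-one scores.
    n_top = int(n_schools * top_fraction)
    top = sorted(range(n_schools), key=lambda i: year1[i], reverse=True)[:n_top]

    return (
        mean(year1[i] for i in top),
        mean(year2[i] for i in top),
        mean(year1),
        mean(year2),
    )


top1, top2, all1, all2 = simulate_regression()
print(f"top schools, year 1: {top1:.1f}")
print(f"same schools, year 2: {top2:.1f}")
print(f"all schools, year 1 vs year 2: {all1:.1f} vs {all2:.1f}")
```

In this model the top group's average drops in year two even though no school got worse, because part of what put those schools on top in year one was luck. The overall average barely moves.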

Another unintended consequence of high-stakes testing has to do with the 1904-era theory. Theories will always limit how we think. Theories will allow us to ask some questions, and not other questions. This is why many of the great discoverers in history were not schooled in the discipline where they made their discovery. They made the discovery because they were not socialized in the discipline—they were not taught how to think (Charles Darwin trained as a geologist; Alfred Wegener, the creator of the continental drift theory, was trained as a meteorologist).

So, we are always limited by the theories we have. And in high-stakes testing, we can ask, “How do we improve test scores?” But we can’t ask, “Do tests actually evaluate what we want them to evaluate?” We can ask, “Is this child gifted or not?” But we can’t ask, “Are all children gifted, in the area of their personal interest?”

Another unintended consequence of high-stakes testing is the emphasis on curriculum. We continue to think if we can just find the right curriculum, then kids will, ipso facto, learn. So curriculum becomes primary, and learning becomes secondary. We buy into the curriculum companies’ assertions that learning isn’t important; only having the right books is important. We buy into the curriculum companies’ beliefs that children should grow incrementally, even though we know they grow by long plateaus and then sudden leaps.

Ultimately, when we are overfocused on testing and measuring heads, then imagination and curiosity become unimportant. We focus on results and numbers, assuming that test results and numbers are the same as learning. Perhaps imagination and curiosity are the foundation of real, long-term learning. But as long as we are overfocused on testing, imagination and curiosity cannot be nurtured.


So, What Do We Do?

I’m not saying we shouldn’t assess kids. Of course we need assessment. However, I think we need alternate methods of assessment. The simplistic, 1904-era tools that we currently use are probably not the best tools for assessment.

Perhaps assessment shouldn’t be used to measure kids at all. Perhaps assessment should be used as information for the teacher, so that the teacher can see what the children are learning and where they are getting stuck. Teaching is a process that never goes the same way twice, so assessment can help teachers adjust their methods to help kids get unstuck from problematic issues.

I’m not saying that teachers shouldn’t be held accountable. The vast majority of teachers are dedicated and hard-working. A few, of course, are poor teachers, and they should find other professions. However, the idea of evaluating teachers based on children’s test scores is akin to measuring bumps on people’s heads. There is early, tantalizing research that suggests that drill-and-kill methods of teaching will produce better test scores in the short run. However, kids taught with those methods will, in the long term, hate the discipline and never learn another thing in that field. Research suggests that inquiry-based, curiosity-cultivating teaching methods will produce short-term results of lower test scores, but will bring a long-term love for the discipline, and the students will continue to learn and critically think about that discipline.

Finally, we need to rediscover the wonderful complexity of learning. The learning process should be filled with imagination and curiosity. Teachers should become experts not in delivering curriculum, but in seeing whether kids understand it, and figuring out ways to help them “get it.”

Measuring heads will not help the next generation develop a love for learning. Nurturing imagination and curiosity will.

—Jim Ollhoff, PhD, 2010


