Friday, May 9, 2014

Testing: A PUMP or a FILTER?

Parents, students and educators are upset about testing.  When did testing take over the lives of educators and their students?  Why did it happen? Has it helped or hurt?  Are there any unintended consequences of testing systems?  It would be of value to take a look at how standardized testing has evolved in Pennsylvania.  Pay close attention to how the tests are reported out.

1971 - 1983 Pennsylvania Department of Education
The first test mandated by the Pennsylvania Department of Education was the Educational Quality Assessment (EQA).  This test of basic skills was administered every three years (on a revolving basis) in 5th, 8th and 11th grade.  Individual student and teacher scores were not reported. The results were a tool to measure student achievement at the school level, which gave districts some information about how their schools were performing. Results on the EQA were highly correlated with the standardized achievement tests that districts often used (e.g., the California Achievement Test, the Iowa Test of Basic Skills).
1984 - 1992 Pennsylvania Department of Education
In 1984 PDE moved from the EQA to a mandatory standardized test called Testing for Essential Learning and Literacy Skills (TELLS) in third, fifth and eighth grades. There were two key aspects of TELLS that differed from the EQA.  First, results were provided for each student.  Students who were working at less than a proficient level were given the opportunity for remediation paid for with state funding. Second, results were reported in the media, listing the percentage of students in each district who were eligible for remediation. For the first time, testing became a methodology for comparing student achievement between districts.
1986 - 1992 The Pittsburgh Public Schools
In 1986, Pittsburgh implemented its own testing program called Monitoring Achievement in Pittsburgh (MAP). The goal of MAP was to articulate clear learning objectives (basic skills) and raise student achievement by providing parents, students, teachers and administrators with quarterly updates on how individual students were doing. Since the data could be analyzed at the student, classroom and school level, accountability was increased for all teachers and administrators.  The program was very successful at raising basic skill competencies at all levels.  This helped to boost student achievement on the state TELLS tests as well as on other standardized tests used in the District (the California Achievement Test).

*The 1980s was the decade when standardized testing became high stakes. During the 1980s, the Pittsburgh Public Schools administered six tests a year at all grade levels: the four quarterly MAP tests, TELLS and the California Achievement Test.  At the state level, the publication of the TELLS test results in the media created an enormous amount of tension within and between districts. Real estate values were affected by these results; better schools meant higher property values.  At the local level, the MAP test, with its comprehensive reporting to all stakeholders, increased tension within and between schools in Pittsburgh. Teachers were being called into their supervisors' offices to discuss low MAP results. Parents were calling teachers to ask why their children were not improving on certain test items. In addition, due to the increased accountability for teachers, the MAP test items became the curriculum in classrooms.  Thus, there was pressure for districts to teach to the TELLS test and for teachers to teach to the MAP test.  Using high-stakes tests and sharing the results had the unintended consequence of focusing the classroom curriculum almost exclusively on the items tested.
1992 - 2012 Pennsylvania Department of Education
In 1992 PDE moved from the TELLS exam to the Pennsylvania System of School Assessment (PSSA).  The PSSA was PDE's response to the issue of the test becoming the entire curriculum.  In order to move beyond basic skills, the PSSA was aligned with state standards for what should be taught in each subject area at each grade level.  The PSSA was a three-part system: Core Learning Objectives, Assessment Anchors that articulate what students should be able to demonstrate for each objective, and the assessment itself.  The test is criterion-referenced and has four levels of achievement: Advanced, Proficient, Basic and Below Basic. The test is currently offered in Reading/English, Writing, Mathematics and Science in grades 3-8 and 11. Results are reported by District and School externally and by Teacher and by Student internally.
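To make "criterion-referenced" concrete: a student's score is compared against fixed cut points, not against how other students scored. Here is a minimal Python sketch of that idea. The cut scores and score scale below are hypothetical placeholders for illustration; actual PSSA cut scores vary by grade and subject.

```python
# Minimal sketch of criterion-referenced scoring: a scaled score is
# compared against fixed cut points, not against other students.
# NOTE: these cut scores are hypothetical; real PSSA cut scores
# vary by grade and subject.
HYPOTHETICAL_CUTS = [
    (1500, "Advanced"),
    (1275, "Proficient"),
    (1100, "Basic"),
]

def performance_level(scaled_score: int) -> str:
    """Return the performance level for a scaled score."""
    for cut, level in HYPOTHETICAL_CUTS:
        if scaled_score >= cut:
            return level
    return "Below Basic"

print(performance_level(1300))  # -> Proficient
```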

2001 - 2014 US Department of Education
No Child Left Behind (NCLB) was a bipartisan program implemented under the Bush administration to increase student achievement. The program forced every state to create a testing system that was vetted by the US Department of Education and aligned with certain common structures. The tests were administered only in Reading/English and Mathematics. Pennsylvania used the PSSA to meet this requirement.  The federal government negotiated cut scores with each state and set yearly benchmarks that states (and schools) had to achieve. The benchmarks represented what percentage of students must be proficient in order to make Adequate Yearly Progress (AYP).  If a school or district did not meet the benchmark, it was placed in Warning.  Each additional year the benchmark was not reached moved the school or district through more negative distinctions (School Improvement or Corrective Action status). What exacerbated these distinctions was that the benchmark was raised year after year. District and school data were published in the media, and the bad publicity for a school was devastating. Not surprisingly, NCLB in America and the PSSA in Pennsylvania became the curriculum.  And because NCLB tested only Reading/English and Mathematics, many other content areas were marginalized in importance.
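To see how a rising benchmark plays out, here is a minimal sketch, with entirely invented numbers, of the escalation just described: each year the required proficiency percentage rises, and each consecutive missed benchmark moves a school to a more negative label. This is a simplification of the actual AYP rules, and the benchmark schedule and school results are hypothetical.

```python
# Hypothetical sketch of AYP escalation: each consecutive missed
# benchmark moves a school one step down the status ladder.
STATUSES = ["Making AYP", "Warning", "School Improvement", "Corrective Action"]

# Invented benchmark schedule rising toward 100% proficiency by 2014.
BENCHMARKS = {2009: 63, 2010: 72, 2011: 81, 2012: 89, 2013: 94, 2014: 100}

def track_status(percent_proficient_by_year: dict) -> None:
    """Print a school's status as the required benchmark rises each year."""
    misses = 0
    for year, required in sorted(BENCHMARKS.items()):
        actual = percent_proficient_by_year[year]
        misses = misses + 1 if actual < required else 0
        status = STATUSES[min(misses, len(STATUSES) - 1)]
        print(f"{year}: needed {required}%, scored {actual}% -> {status}")

# A school improving steadily still falls behind the rising benchmark.
track_status({2009: 70, 2010: 74, 2011: 78, 2012: 82, 2013: 85, 2014: 88})
```

Run against these invented numbers, the school improves every single year yet still slides from Making AYP into Corrective Action, which is the first issue raised below.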


*At this point three important issues were being raised in the assessment dialogue.  First, by raising the benchmark year after year, almost every school would eventually go into Warning. By the year 2014, 100% of the students in America were supposed to be proficient.  Thus, inherent to the design of the program, nearly every school in Pennsylvania would eventually be seen in a negative light.
Second, a quick review of which schools went into Warning early in the program found that they were schools serving students in poverty (urban and rural). Studies ranking districts and schools by PSSA results showed a significant correlation between individual PSSA scores and socio-economic status.  Poor kids got low scores.  In fact, going back to the days of TELLS, researchers found only two significant correlates of achievement on these tests: socio-economic status and the mother's educational attainment.  This created a backlash from schools working with students in poverty.  The ranking of districts was seen by educators as a punishment, not as a method for improving education.
Third, once districts or individual schools received negative distinctions such as Warning or School Improvement, a greater focus on the test occurred.  Not only did the test become the curriculum in the classroom, but other classes (such as Art, Physical Education and Foreign Languages) went by the wayside to double the time spent in English, Reading and Mathematics classes.  Simply put, the more negative publicity and accountability there was, the more schools focused on one thing - The TEST.  Frankly, in most schools in Pennsylvania, during the month or two before the PSSA exam, all other education stops and students work exclusively on practicing to take the test.

2005 - 2014 Pennsylvania Department of Education
Educators at low-SES schools demanded a value-added measure.  Basically, they were saying that they should be evaluated on how much their students improved, rather than on hitting some arbitrary benchmark.  The Pennsylvania Department of Education agreed and created PVAAS (Pennsylvania Value-Added Assessment System). This program took students' PSSA scores and looked for individual growth from one year to the next, providing a measure of the quality of the teacher and the school.  However, the federal government did not approve PVAAS as a valid measure for NCLB.  Thus the system was created and continues to be reported, but it carries little weight.
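As a sketch of the value-added idea, here is a minimal Python example that compares each student to their own prior-year score and averages the gains. This is only the core intuition, not PVAAS's actual methodology, which uses a far more sophisticated statistical projection model; the student names and scores are invented.

```python
# Minimal sketch of a growth (value-added) measure: compare each
# student's score to their own prior-year score, then average the gains.
# This is a simplification; PVAAS itself uses a more sophisticated
# statistical projection model.

def average_growth(prior: dict, current: dict) -> float:
    """Average score gain for students present in both years."""
    gains = [current[s] - prior[s] for s in prior if s in current]
    return sum(gains) / len(gains)

# Invented scores for illustration only.
last_year = {"student_a": 1180, "student_b": 1240, "student_c": 1050}
this_year = {"student_a": 1230, "student_b": 1255, "student_c": 1140}

print(f"Average growth: {average_growth(last_year, this_year):+.1f} points")
# A positive average suggests students grew, regardless of whether
# they cleared an absolute proficiency benchmark.
```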

The evolution of testing in Pennsylvania has moved from the PSSA to the Keystone Exams (in high school), and NCLB has changed somewhat in the last year. However, the use of standardized tests to rank school districts, evaluate teacher performance and evaluate administrators continues. So let's look at our initial questions.

When did testing take over the lives of educators and their students?  I would suggest this occurred the first time that test results were published in the newspaper.  In Pennsylvania this was around 1984.  In the United States this was in 2001.

Why did it happen? Legislators responded to what they considered to be a crisis.  The 1983 A Nation at Risk report had a profound effect on what was expected of our educational system. Educators were slow to respond to this crisis, which left a vacuum for others to take up the cause.  Unfortunately, when politicians and the media take up the cause of education, they tend to take a punitive approach: if you don't comply or produce the desired results, they will punish you in the public eye.

Has it helped or hurt?  Obviously this is a much harder question to answer.  In past posts I've talked about how the world has changed since 1983.  Steel mills are gone; the workplace demands educated workers; high school and college are now requirements.  High-stakes basic skills tests produce what you would expect: basic skills.  Harder tests (like the PSSA) promote higher-order thinking, but they are limited to two content areas.  Some schools got better at educating all students; some suffered due to the effects of poverty.

Are there any unintended consequences of testing systems?  Clearly, we have found that when a test is high-stakes (for the student, the teacher, the principal, the district), the curriculum gets limited to what is on the test. Increase the pressure, and non-tested content areas get marginalized. We have also found that poverty and other at-risk factors work against success on these tests. Thus high-stakes testing confirms the public's superficial opinions as to whether poor children can succeed.

I would suggest that our education institutions were presented with a challenge in 1983 with the A Nation at Risk report.  The education community did a very bad job of meeting it: it did not transition to a system that meets the needs of a 21st century economy, and it did not create schools that meet the needs of students who are at risk due to extreme poverty. Like any bureaucracy, educational institutions are not good at change. And I would suggest that the move from an industrial economy to an information economy demanded a complete rethinking of what is taught and how education is structured. If educators don't adapt (and we didn't during this time frame), politicians will fill the vacuum with alternatives. The first was a skewed and punitive testing/accountability system. The second was the creation of Charter Schools.

So is testing a PUMP or a FILTER?  That's an easy one.  If we create narrow high-stakes tests and publish the results in the paper, then they act as a FILTER.  They vindicate the people who move to the suburbs, attend the best schools and look down on low-achieving schools.  More importantly, they confirm to the poor, the disenfranchised and students of color that they cannot succeed in the United States of America.

If, on the other hand, we create quality tests that inform students, teachers and parents on how students are doing, then we can adjust our curriculums, pedagogy and programs to succeed.  But this requires two very significant initiatives.  First, educators have to do a better job of informing the public of how our schools are doing. If we don't create a quality measure, then politicians will create a simple-minded one.  Second, we have to commit ourselves to creating quality schools in the inner city and rural areas that meet the needs of all students.  That is the goal of this blog. And that is what it would mean to use testing as a PUMP.

The answer is G.