Massachusetts Comprehensive Assessment System

Guest Post by Richard P. Phelps

On November 17, the Massachusetts Board of Elementary and Secondary Education (BESE) will decide the fate of the Massachusetts Comprehensive Assessment System (MCAS) and the Partnership for Assessment of Readiness for College and Careers (PARCC) in the Bay State. MCAS is homegrown; PARCC is not. Barring unexpected compromises or subterfuges, only one program will survive.

Over the past year, PARCC promoters have released a stream of reports comparing the two testing programs. The latest arrives from the Thomas B. Fordham Institute in the form of a partial “evaluation of the content and quality of the 2014 MCAS and PARCC” “relative to” the “Criteria for High Quality Assessments”[i] developed by one of the organizations that developed Common Core’s standards. The rest of the report is to be delivered in January, it says.[ii]

PARCC continues to insult our intelligence. The language of the “special report” sent to Mitchell Chester, Commissioner of Elementary and Secondary Education, reads like a legitimate study.[iii] The research it purports to have done even incorporated some processes typically employed in studies with genuine intentions of objectivity.

No such intentions could validly be ascribed to the Fordham report.

First, Common Core’s primary private financier, the Bill & Melinda Gates Foundation, pays the Fordham Institute handsomely to promote the standards and their associated testing programs. A cursory search through the Gates Foundation web site reveals $3,562,116 granted to Fordham since 2009 expressly for Common Core promotion or “general operating support.”[iv] Gates awarded an additional $653,534 between 2006 and 2009 for forming advocacy networks, which have since been used to push Common Core. All of the remaining Gates-to-Fordham grants listed supported work promoting charter schools in Ohio ($2,596,812), reputedly the nation’s worst.[v]

The other research entities involved in the latest Fordham study either directly or indirectly derive sustenance at the Gates Foundation dinner table:

the Human Resources Research Organization (HumRRO), which will deliver another pro-PARCC report sometime soon,[vi]
the Council of Chief State School Officers (CCSSO), co-holder of the Common Core copyright and author of the “Criteria,”[vii]
the Stanford Center for Opportunity Policy in Education (SCOPE), headed by Linda Darling-Hammond, the chief organizer of the other federally-subsidized Common Core-aligned testing program, the Smarter-Balanced Assessment Consortium (SBAC),[viii] and
Student Achievement Partners, the organization that claims to have inspired the Common Core standards.[ix]

Fordham acknowledges the pervasive conflicts of interest it claims it faced in locating people to evaluate MCAS versus PARCC: “…it is impossible to find individuals with zero conflicts who are also experts.”[x] But the statement is false; hundreds, perhaps even thousands, of individuals experienced in “alignment or assessment development studies” were available.[xi] That they were not called reveals Fordham’s preferences.

A second reason Fordham’s intentions are suspect rests with their choice of evaluation criteria. The “bible” of test developers is the Standards for Educational and Psychological Testing, jointly produced by the American Psychological Association, National Council on Measurement in Education, and the American Educational Research Association. Fordham did not use it.

Instead, Fordham chose to reference an alternate set of evaluation criteria concocted by the organization that co-sponsored the development of Common Core’s standards (Council of Chief State School Officers, or CCSSO), drawing on the work of Linda Darling-Hammond’s SCOPE, the National Center for Research on Evaluation, Standards, and Student Testing (CRESST), and a handful of others. Thus, Fordham compares PARCC to MCAS according to specifications that were designed for PARCC.[xii]

Had Fordham compared MCAS and PARCC using the Standards for Educational and Psychological Testing, MCAS would have passed and PARCC would have flunked. PARCC has not yet accumulated the most basic empirical evidence of reliability, validity, or fairness, and past experience with similar types of assessments suggests it will fail on all three counts.[xiii]

Third, PARCC would have flunked had Fordham compared MCAS and PARCC using all 24+ of CCSSO’s “Criteria.” But Fordham chose to compare on only 15 of the criteria.[xiv] And those just happened to be the criteria favoring PARCC.

Fordham agreed to compare the two tests with respect to their alignment to Common Core-based criteria. With just one exception, the Fordham study avoided all the criteria in the groups “Meet overall assessment goals and ensure technical quality”, “Yield valuable report on student progress and performance”, “Adhere to best practices in test administration”, and “State specific criteria”.[xv]

Not surprisingly, Fordham’s “memo” favors the Bay State’s adoption of PARCC. However, the authors of How PARCC’s false rigor stunts the academic growth of all students[xvi], released one week before Fordham’s “memo,” recommend strongly against the official adoption of PARCC after an analysis of its test items in reading and writing. They also do not recommend continuing with the current MCAS, which is also based on Common Core’s mediocre standards, chiefly because the quality of the grade 10 MCAS tests in math and ELA has deteriorated in the past seven or so years for reasons that are not yet clear. Rather, they recommend that Massachusetts return to its effective pre-Common Core standards and tests and assign the development and monitoring of the state’s mandated tests to a more responsible agency.

Perhaps the primary conceit of Common Core proponents is that ordinary multiple-choice-predominant standardized tests ignore some, and arguably the better, parts of learning (the deeper, higher, more rigorous, whatever)[xvii]. Ironically, it is they—opponents of traditional testing regimes—who propose that standardized tests measure everything. By contrast, most traditional standardized test advocates do not suggest that standardized tests can or should measure any and all aspects of learning.

Consider this standard from the Linda Darling-Hammond, et al. source document for the CCSSO criteria:

“Research: Conduct sustained research projects to answer a question (including a self-generated question) or solve a problem, narrow or broaden the inquiry when appropriate, and demonstrate understanding of the subject under investigation. Gather relevant information from multiple authoritative print and digital sources, use advanced searches effectively, and assess the strengths and limitations of each source in terms of the specific task, purpose, and audience.”[xviii]

Who would oppose this as a learning objective? But, does it make sense as a standardized test component? How does one objectively and fairly measure “sustained research” in the one- or two-minute span of a standardized test question? In PARCC tests, this is done by offering students snippets of documentary source material and grading them as having analyzed the problem well if they cite two of those already-made-available sources.

But that is not how research works. It is hardly the type of deliberation that comes to most people’s minds when they think about “sustained research”. Advocates for traditional standardized testing would argue that standardized tests should be used for what standardized tests do well; “sustained research” should be measured more authentically.

The authors of the aforementioned Pioneer Institute report recommend, as their seventh policy recommendation for Massachusetts:

“Establish a junior/senior-year interdisciplinary research paper requirement as part of the state’s graduation requirements—to be assessed at the local level following state guidelines—to prepare all students for authentic college writing.”[xix]

PARCC and the Fordham Institute propose that they can validly, reliably, and fairly measure the outcome of what is normally a weeks- or months-long project in a minute or two.[xx] It is attempting to measure that which cannot be well measured on standardized tests that makes PARCC tests “deeper” than others. In practice, the alleged deeper parts of PARCC are the most convoluted and superficial.

Appendix A of the source document for the CCSSO criteria provides three international examples of “high-quality assessments” in Singapore, Australia, and England.[xxi] None are standardized test components. Rather, all are projects developed over extended periods of time—weeks or months—as part of regular course requirements.

Common Core proponents scoured the globe to locate “international benchmark” examples of the type of convoluted (i.e., “higher”, “deeper”) test questions included in PARCC and SBAC tests. They found none.

Dr. Richard P. Phelps is editor or author of four books: Correcting Fallacies about Educational and Psychological Testing (APA, 2008/2009); Standardized Testing Primer (Peter Lang, 2007); Defending Standardized Testing (Psychology Press, 2005); and Kill the Messenger (Transaction, 2003, 2005), and founder of the Nonpartisan Education Review (http://nonpartisaneducation.org).

[i] http://www.ccsso.org/Documents/2014/CCSSO%20Criteria%20for%20High%20Quality%20Assessments%2003242014.pdf

[ii] Michael J. Petrilli & Amber M. Northern. (2015, October 30). Memo to Dr. Mitchell Chester, Commissioner of Elementary and Secondary Education, Massachusetts Department of Elementary and Secondary Education. Washington, DC: Thomas B. Fordham Institute. http://edexcellence.net/articles/evaluation-of-the-content-and-quality-of-the-2014-mcas-and-parcc-relative-to-the-ccsso

[iii] Nancy Doorey & Morgan Polikoff. (2015, October). Special report: Evaluation of the Massachusetts Comprehensive Assessment System (MCAS) and the Partnership for the Assessment of Readiness for College and Careers (PARCC). Washington, DC: Thomas B. Fordham Institute. http://edexcellence.net/articles/evaluation-of-the-content-and-quality-of-the-2014-mcas-and-parcc-relative-to-the-ccsso

[iv] http://www.gatesfoundation.org/search#q/k=Fordham

[v] See, for example, http://www.ohio.com/news/local/charter-schools-misspend-millions-of-ohio-tax-dollars-as-efforts-to-police-them-are-privatized-1.596318 ; http://www.cleveland.com/metro/index.ssf/2015/03/ohios_charter_schools_ridicule.html ; http://www.dispatch.com/content/stories/local/2014/12/18/kasich-to-revamp-ohio-laws-on-charter-schools.html ; https://www.washingtonpost.com/news/answer-sheet/wp/2015/06/12/troubled-ohio-charter-schools-have-become-a-joke-literally/

[vi] HumRRO has produced many favorable reports for Common Core-related entities, including alignment studies in Kentucky, New York State, California, and Connecticut.

[vii] CCSSO has received 22 grants from the Bill & Melinda Gates Foundation from “2009 and earlier” to 2015 exceeding $90 million. http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/k=CCSSO

[viii] http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/k=%22Stanford%20Center%20for%20Opportunity%20Policy%20in%20Education%22

[ix] Student Achievement Partners has received four grants from the Bill & Melinda Gates Foundation from 2012 to 2015 exceeding $13 million. http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/k=%22Student%20Achievement%20Partners%22

[x] Doorey & Polikoff, p. 4.

[xi] To cite just one example, the world-renowned Center for Educational Measurement at the University of Massachusetts-Amherst has accumulated abundant experience conducting alignment studies.

[xii] For an extended critique of the CCSSO criteria employed in the Fordham report, see “Appendix A. Critique of Criteria for Evaluating Common Core-Aligned Assessments” in Mark McQuillan, Richard P. Phelps, & Sandra Stotsky. (2015, October). How PARCC’s false rigor stunts the academic growth of all students. Boston: Pioneer Institute, pp. 62-68. https://pioneerinstitute.org/news/testing-the-tests-why-mcas-is-better-than-parcc/

[xiii] Despite all the adjectives and adverbs implying newness to PARCC and SBAC as “Next Generation Assessment”, it has all been tried before and failed miserably. Indeed, many of the same persons involved in past fiascos are pushing the current one. The allegedly “higher-order”, more “authentic”, performance-based tests administered in Maryland (MSPAP), California (CLAS), and Kentucky (KIRIS) in the 1990s failed because of unreliable scores; volatile test score trends; secrecy of items and forms; an absence of individual scores in some cases; individuals being judged on group work in some cases; large expenditures of time; inconsistent (and some improper) test preparation procedures from school to school; inconsistent grading on open-ended response test items; long delays between administration and release of scores; little feedback for students; and no substantial evidence after several years that education had improved. As one should expect, instruction had changed as test proponents desired, but without empirical gains or perceived improvement in student achievement. Parents, politicians, and measurement professionals alike overwhelmingly rejected these dysfunctional tests.

See, for example, for California: Michael W. Kirst & Christopher Mazzeo, (1997, December). The Rise, Fall, and Rise of State Assessment in California: 1993-96, Phi Delta Kappan, 78(4); Committee on Education and the Workforce, U.S. House of Representatives, One Hundred Fifth Congress, Second Session, (1998, January 21). National Testing: Hearing, Granada Hills, CA. Serial No. 105-74; Representative Steven Baldwin, (1997, October). Comparing assessments and tests. Education Reporter, 141. See also Klein, David. (2003). “A Brief History Of American K-12 Mathematics Education In the 20th Century”, In James M. Royer, (Ed.), Mathematical Cognition, (pp. 175–226). Charlotte, NC: Information Age Publishing. For Kentucky: ACT. (1993). “A study of core course-taking patterns. ACT-tested graduates of 1991-1993 and an investigation of the relationship between Kentucky’s performance-based assessment results and ACT-tested Kentucky graduates of 1992”. Iowa City, IA: Author; Richard Innes. (2003). Education research from a parent’s point of view. Louisville, KY: Author. http://www.eddatafrominnes.com/index.html ; KERA Update. (1999, January). Misinformed, misled, flawed: The legacy of KIRIS, Kentucky’s first experiment. For Maryland: P. H. Hamp, & C. B. Summers. (2002, Fall). “Education.” In P. H. Hamp & C. B. Summers (Eds.), A guide to the issues 2002–2003. Maryland Public Policy Institute, Rockville, MD. http://www.mdpolicy.org/docLib/20051030Education.pdf ; Montgomery County Public Schools. (2002, Feb. 11). “Joint Teachers/Principals Letter Questions MSPAP”, Public Announcement, Rockville, MD. http://www.montgomeryschoolsmd.org/press/index.aspx?pagetype=showrelease&id=644 ; HumRRO. (1998). Linking teacher practice with statewide assessment of education. Alexandria, VA: Author. http://www.humrro.org/corpsite/page/linking-teacher-practice-statewide-assessment-education

[xiv] Doorey & Polikoff, p. 23.

[xv] MCAS bests PARCC according to several criteria specific to the Commonwealth, such as the requirements under the current Massachusetts Education Reform Act (MERA) that the test serve as a grade 10 high school exit exam, test students in several subject fields (and not just ELA and math), and provide specific and timely instructional feedback.

[xvi] McQuillan, M., Phelps, R.P., & Stotsky, S. (2015, October). How PARCC’s false rigor stunts the academic growth of all students. Boston: Pioneer Institute. https://pioneerinstitute.org/news/testing-the-tests-why-mcas-is-better-than-parcc/

[xvii] It is perhaps the most enlightening paradox that, among Common Core proponents’ profuse expulsion of superlative adjectives and adverbs advertising their “innovative”, “next generation” research results, the words “deeper” and “higher” mean the same thing.

[xviii] The document asserts, “The Common Core State Standards identify a number of areas of knowledge and skills that are clearly so critical for college and career readiness that they should be targeted for inclusion in new assessment systems.” Linda Darling-Hammond, Joan Herman, James Pellegrino, Jamal Abedi, J. Lawrence Aber, Eva Baker, Randy Bennett, Edmund Gordon, Edward Haertel, Kenji Hakuta, Andrew Ho, Robert Lee Linn, P. David Pearson, James Popham, Lauren Resnick, Alan H. Schoenfeld, Richard Shavelson, Lorrie A. Shepard, Lee Shulman, and Claude M. Steele. (2013). Criteria for high-quality assessment. Stanford, CA: Stanford Center for Opportunity Policy in Education; Center for Research on Student Standards and Testing, University of California at Los Angeles; and Learning Sciences Research Institute, University of Illinois at Chicago, p. 7. https://edpolicy.stanford.edu/publications/pubs/847

[xix] McQuillan, Phelps, & Stotsky, p. 46.

[xxi] Linda Darling-Hammond, et al., pp. 16-18. https://edpolicy.stanford.edu/publications/pubs/847