New article on exploring the validity of assessment items for the NGSS using machine learning

Joshua Rosenberg


Daniel Anderson, Brock Rowley, Sondra Stegenga, Shawn Irvin, and I published a paper on how a text-based machine learning technique, topic modeling, can be used to provide content validity evidence for Next Generation Science Standards (NGSS)-related assessment items.

While I have some experience with topic modeling, as well as with NGSS-related assessments, the measurement focus of this work was new to me, so it was a great opportunity to learn more. Dan and his colleagues conceptualized and carried out this work in an impressive way, and I was happy to contribute to it.

The paper appears in Educational Measurement: Issues and Practice and is available as a pre-print from here. The pre-print is part of a repository that contains all of the code used to carry out the analysis (and to write the paper, to boot).

Here is the abstract for the paper:

Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high‐stakes testing and accountability frameworks, content‐related validity evidence is typically gathered via alignment studies, with panels of experts providing qualitative judgments on the degree to which test items align with the representative content standards. Various summary statistics are then calculated (e.g., categorical concurrence, balance of representation) to aid in decision‐making. In this paper, we propose an alternative approach for gathering content‐related validity evidence that capitalizes on the overlap in vocabulary used in test items and the corresponding content standards, which we define as textual congruence. We use a text‐based, machine learning model, specifically topic modeling, to identify clusters of related content within the standards. This model then serves as the basis from which items are evaluated. We illustrate our method by building a model from the Next Generation Science Standards, with textual congruence evaluated against items within the Oregon statewide alternate assessment. We discuss the utility of this approach as a source of triangulating and diagnostic information and show how visualizations can be used to evaluate the overall coverage of the content standards across the test items.
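To give a rough sense of the idea in the abstract, here is a minimal Python sketch. It is not the analysis code from the repository. It assumes a tiny collapsed Gibbs sampler for a latent Dirichlet allocation (LDA) topic model, fit only on the standards, and a simplified scoring rule that averages each item word's topic membership. The standards and items are invented toy sentences, not NGSS or Oregon assessment text, and the names `fit_lda` and `score_item` and all parameter values are hypothetical.

```python
import random
from collections import Counter

# Toy stand-ins for content standards (invented; NOT actual NGSS text).
standards = [
    "plants need sunlight water and air to grow",
    "animals need food water and shelter to survive",
    "forces push and pull objects and change motion",
    "magnets pull some objects without touching them",
]
# Toy test items (also invented for illustration).
items = {
    "item_1": "which does a plant need to grow sunlight or a magnet",
    "item_2": "a push changes the motion of a ball",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=1):
    """Fit LDA by collapsed Gibbs sampling; return one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total words per topic
    assign = []                                        # topic of every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)                # random initial topic
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed probability of each vocabulary word under each topic.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta)
         for w in vocab}
        for k in range(n_topics)
    ]


def score_item(text, phi):
    """Average, over the item's in-vocabulary words, of each word's topic shares.

    This is a simplification chosen for the sketch, not the paper's measure.
    """
    n_topics = len(phi)
    words = [w for w in tokenize(text) if w in phi[0]]
    if not words:  # the item shares no vocabulary with the standards
        return [1.0 / n_topics] * n_topics
    totals = [0.0] * n_topics
    for w in words:
        denom = sum(phi[k][w] for k in range(n_topics))
        for k in range(n_topics):
            totals[k] += phi[k][w] / denom
    return [t / len(words) for t in totals]


phi = fit_lda([tokenize(s) for s in standards])
scores = {name: score_item(text, phi) for name, text in items.items()}
for name, dist in scores.items():
    print(name, [round(p, 2) for p in dist])
```

Each item ends up with a set of topic proportions that sum to one. An item whose words fall mostly in one topic sits close to that cluster of standards, while an item with little shared vocabulary gives only a weak signal, which is the kind of pattern that could prompt a closer look by reviewers.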

In addition to this project, I’m involved with related work in which machine learning is also used. In that project (with Christina Krist; a very early version of it was published in the proceedings for the 2016 International Conference of the Learning Sciences here), we use a text-based machine learning approach to inductively “flesh out” a construct for which coding frames did not (yet) exist. I hope to share more about this work soon.

Thanks to Dan and colleagues for the fun and productive collaboration.

The URL for the paper on the journal’s site is here: