New article on combining machine learning and qualitative methods to study students’ involvement in science practices

Joshua Rosenberg


My colleague Christina Krist and I recently published an article on our efforts to combine machine learning and qualitative methods to study students’ involvement in science practices. This project grew out of work we began wayyy back in 2015 (!!!), when we were both graduate students. At a meeting for the Science Practices project on which we both worked as graduate research assistants, Christina presented the results of a text-based automated analysis of students’ written responses. I was interested, and over the next few years we learned about machine learning, coded more responses, published a proceedings paper for the International Conference of the Learning Sciences, and chipped away at the work that (finally) led to this article.

We also received a few grants (one from the National Science Foundation, and one through the TIER-ED initiative at the University of Illinois) that emerged, to a lesser and greater extent respectively, from this collaboration. So I guess my takeaway is the benefit of sticking with ideas you think are promising, even if it takes years to see them through to fruition.

Here’s the abstract for the paper:

Assessing students’ participation in science practices presents several challenges, especially when aiming to differentiate meaningful (vs. rote) forms of participation. In this study, we sought to use machine learning (ML) for a novel purpose in science assessment: developing a construct map for students’ consideration of generality, a key epistemic understanding that undergirds meaningful participation in knowledge-building practices. We report on our efforts to assess the nature of 845 students’ ideas about the generality of their model-based explanations through the combination of an embedded written assessment and a novel data analytic approach that combines unsupervised and supervised machine learning methods and human-driven, interpretive coding. We demonstrate how unsupervised machine learning methods, when coupled with qualitative, interpretive coding, were used to revise our construct map for generality in a way that allowed for a more nuanced evaluation that was closely tied to empirical patterns in the data. We also explored the application of the construct map as a framework for coding used as a part of supervised machine learning methods, finding that it demonstrates some viability for use in future analyses. We discuss implications for the assessment of students’ meaningful participation in science practices in terms of their considerations of generality, the role of unsupervised methods in science assessment, and combining machine learning and human-driven approaches for understanding students’ complex involvement in science practices.
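To give a flavor of the kind of analysis the abstract describes, here is a minimal, self-contained sketch of combining similarity-based grouping of written responses with human-assigned codes. The sample responses, the "local"/"general" code labels, and the nearest-neighbor rule are all invented for illustration; they are not the study's data, construct map, or actual ML pipeline (which used more sophisticated unsupervised and supervised methods).

```python
from collections import Counter
import math

# Hypothetical short written responses about a model's generality
# (invented for illustration; not data from the study).
responses = [
    "our model explains only this lake",
    "the model works for this lake and other lakes too",
    "any ecosystem with producers and consumers fits the model",
    "it just shows what happened in our experiment",
]

# Bag-of-words vector over a shared vocabulary.
vocab = sorted({w for r in responses for w in r.split()})

def vectorize(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

vectors = [vectorize(r) for r in responses]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Unsupervised flavor: similarity groupings like these are what a human
# analyst would inspect to refine the construct map. Supervised flavor:
# once humans assign construct-map codes, a simple nearest-neighbor rule
# can extend those codes to new responses.
human_codes = {0: "local", 1: "general", 2: "general", 3: "local"}

def predict(new_text):
    v = vectorize(new_text)
    best = max(range(len(vectors)), key=lambda i: cosine(v, vectors[i]))
    return human_codes[best]

print(predict("the model applies to other lakes"))  # → general
```

In a real pipeline, the hand-rolled vectors and nearest-neighbor rule would be replaced by proper feature extraction, clustering, and a trained classifier, with iterative human coding in between, but the division of labor (machines surface patterns, humans interpret and code them) is the same.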

Thank you, Xiaoming Zhai, for editing the special issue on machine learning in science education assessment. Thank you, Christina Schwarz, for supporting my involvement in this project.

A “green” (always available) open-access version of the publication is here:

A version from the publisher, which may be openly accessed only 50 times, is available here: