This page provides a brief overview of a suite of automated assessment and formative feedback tools designed for virtual environment-based curricula. It is essentially an executive summary of my dissertation. The LENS suite of tools was designed to be embedded in EcoXPT, a curriculum and virtual environment developed by the EcoLEARN group at the Harvard Graduate School of Education.
Guided authentic scientific inquiry activities can give students a clear picture of the nature of science and how its fields operate in practice, but such activities are difficult to do well. Too much structure negates the benefit of open-ended activities and student-led investigation, while too little can result in frustration or unproductive floundering. Scaffolds are therefore essential to implementing such activities successfully, and virtual environments can feature built-in scaffolds such as feedback that reacts dynamically to the actions of learners in real time. The open-ended nature of inquiry-based activities makes it difficult to assess student actions and generate relevant feedback, but new techniques from the educational data mining field are being developed for analyzing student actions in such settings.
To infuse dynamic scaffolds and new types of supports into the virtual world, five student-facing features were added to the software portion of the EcoXPT unit. In addition, a daily teacher report was designed in consultation with teachers from previous classroom studies of EcoXPT. These new features range across several types of feedback (model-based, metacognitive, direct guidance) and delivery methods (automatic or upon request). While all features are available to all students in the new sample, analyses can reveal which are most used and which types teachers and students report to be the most helpful.
Two NPCs were modified to act as pedagogical agents that offer advice and feedback based on group progress through the curriculum and demonstrated understanding of the complex causal relationships at the pond.
Ranger Susan provides students with information about the pond from the perspective of a park ranger. In addition to her static information on the different virtual days of the activity, in the LENS modification to EcoXPT she delivers general advice to groups on what to do on a given day based on what tools are unlocked and what they have done thus far. This type of direct guidance is available upon request and is intended to echo instruction delivered by teachers in EcoXPT. The intention of this tool is to reduce the load on the instructor in the classroom, thereby freeing them to focus on groups who may be struggling with deeper issues than what to do next. What advice is delivered depends on what tools are currently unlocked (which is linked to specific days of the curriculum), as well as what logged events are present for different groups.
The other NPC turned pedagogical agent is Dr. Jabir Hatami, an ecosystem scientist stationed at the lab building that houses the experimental tools. His previous role was to guide groups to the correct experimental tool to answer different types of questions. This is an example of the static scaffolding present in EcoXPT. For his expanded role in the modification for LENS, Dr. Hatami is now able to look over group concept maps and provide feedback based on the current state of the concept map. He evaluates concept maps in progress and identifies common errors without specific “next step” advice. The algorithm that selects advice checks whether nodes are present without connections, whether known incorrect claims are present, whether “Does Not Affect” claims are being made, whether evidence and reasoning are present, and whether factors are connected to multiple other factors. This type of model-based, upon-request feedback again echoes teacher instruction on causality and how to use the concept mapping tool.
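The checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LENS implementation: the data layout (a list of factor names plus a list of claim dictionaries), the function name `concept_map_issues`, the issue labels, the placeholder factor names, and the "Increases" relation are all hypothetical. Only the "Does Not Affect" relation and the list of checks come from the description.

```python
def concept_map_issues(nodes, claims, known_incorrect=frozenset()):
    """Return (issue, detail) pairs for a concept map in progress.

    Hypothetical layout: `nodes` is a list of factor names; each claim is a
    dict with 'source', 'target', 'relation', 'evidence' (list) and
    'reasoning' (str). `known_incorrect` holds (source, target) pairs.
    """
    issues = []

    # For each factor, collect the distinct other factors it is linked to.
    neighbors = {node: set() for node in nodes}
    for claim in claims:
        neighbors.setdefault(claim["source"], set()).add(claim["target"])
        neighbors.setdefault(claim["target"], set()).add(claim["source"])

    # Check 1: nodes placed on the map without any connection.
    for node in nodes:
        if not neighbors[node]:
            issues.append(("unconnected_node", node))

    for claim in claims:
        pair = (claim["source"], claim["target"])
        # Check 2: claims known to be incorrect.
        if pair in known_incorrect:
            issues.append(("known_incorrect_claim", pair))
        # Check 3: "Does Not Affect" claims.
        if claim["relation"] == "Does Not Affect":
            issues.append(("does_not_affect_claim", pair))
        # Check 4: missing evidence or reasoning.
        if not claim.get("evidence"):
            issues.append(("missing_evidence", pair))
        if not claim.get("reasoning", "").strip():
            issues.append(("missing_reasoning", pair))

    # Check 5: no factor is connected to more than one other factor.
    if claims and not any(len(linked) > 1 for linked in neighbors.values()):
        issues.append(("no_multiply_connected_factor", None))

    return issues


# Toy map with placeholder factor names (hypothetical).
nodes = ["factor A", "factor B", "factor C"]
claims = [
    {
        "source": "factor A",
        "target": "factor B",
        "relation": "Increases",
        "evidence": ["comparison tank result"],
        "reasoning": "",
    }
]
issues = concept_map_issues(nodes, claims)
```

In a sketch like this, the agent would pick one issue to turn into advice; how LENS prioritizes among several detected issues is not described here.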
Experiment Tool Supports
Two additional supports were added to the comparison tank tool, which allows students to set up experiments to explore what effect different features of the pond ecosystem have on measurable factors. The first support is a worked example tutorial for the tool that shows students how to use the user interface as well as how to frame a question, to set up an experiment to answer questions, and to save the results. In EcoXPT, all tutorials for the experimental tools take the form of multiple screenshots of the tool with a large volume of accompanying text. Analyses of log files from historical EcoXPT data show that groups look at tutorial messages for an average of 32.5 seconds, which may be insufficient time to read and deeply process the information presented. Because this direct guidance is provided automatically to all groups when they first encounter the tool, groups are introduced to all aspects of the tool that are necessary to use it effectively. In order, the tutorial shows students how to fill the tanks with water, how to add factors to the water to address an example question, how to measure an attribute of the tanks, and how to save their data to their notebook.
The other modification to the comparison tank tool is less structured and requires students to use help-seeking behaviors to request metacognitive support. Rather than repeating the tutorial when they have difficulty with the comparison tank tool, students can click a new help button that frames their questions in terms of causality to highlight the importance of experimental evidence in establishing causal relationships. When a student asks for help, a new window appears where students can frame their investigations by selecting what effect adding a factor to the pond water might have on a measurable quantity. The advice echoes classroom instruction on experimentation strategy, such as varying one thing at a time and writing explanations for saved experimental results.
The final student-facing support added to the EcoXPT virtual world is a new type of notebook entry specifically for recording reflections electronically within the virtual world. Aligned with the self-explanation and reflection literature, this support provides an automatic, metacognitive chance for groups to pause their investigations and think deeply about the nature of the problem they are investigating, how their investigations have gone thus far, and what unanswered questions they have. The standard EcoXPT curriculum provides similar reflection questions as exit ticket activities or optional homework activities. These reflections have not previously been separately analyzed by researchers. In the LENS modifications, additional opportunities for reflection were created for each day of the curriculum, and time for them was set aside at the end of each lesson plan.
A daily report for teachers visualizes how groups spent their time over the course of an entire period. While not useful for detecting off-task behavior in real time, this visualization can aid teachers in planning how to allocate their time among groups during the next session. Additionally, a series of actionable observations from the groups’ log files is provided beneath the visualization to address certain issues and outliers that can be automatically detected. Between two and four pieces of advice are provided to teachers based on the contents of their students’ notebooks and their current concept maps. Advice is contextualized to the day of the curriculum and is generally intended to highlight features that have been missed by most of the class or groups that have not yet completed tasks that are typical by a certain day. Advice centers on common errors in concept mapping, underutilization of certain tools, and progress relative to others in the class.
Data for this study were collected in the Fall of 2019 from 595 7th-grade students across seven teachers and two schools in a suburban school district in New England. After filtering for signed permission slips, 587 students working in 304 groups remained in the sample. Average middle school class size in the district is 21.7 pupils, with 15% of students eligible for free and reduced-price lunch, 18% of students receiving special education services, and 7% of students classified as English language learners. The student population of the district is 62% white, 19% Asian, 8% Hispanic or Latino, and 5% African American. Historical data used in this dissertation came from two prior implementations of EcoXPT carried out in the Spring semester of 2018.
To closely match past iterations of the EcoXPT curriculum and facilitate comparisons, classes progressed through the 13-day curriculum after taking a pre-intervention survey. Lesson plans and accompanying presentations were provided for each day of the curriculum along with short videos and handouts that framed groups’ investigations of the virtual ecosystem. After the completion of the 13 lessons, teachers distributed individual links to the post-intervention survey. Soon thereafter, a member of the research team interviewed each teacher.
Data Sources and Measures
Three types of quantitative data were analyzed for this study (individual pre-post surveys, log files from groups of 2-3 students using the software, and group concept maps), and qualitative interviews were conducted with all teachers after completion of the curriculum.
Individual students took a pre- and post-assessment designed for EcoXPT. Assembled from a mixture of pre-validated instruments from science education literature, the survey contained six constructs: affective dimensions, ecosystem science content, understanding of causality, correlation versus causation, use of experimental methods, and epistemology of science. The assessment comprised a variety of multiple choice and Likert items designed and tested by educational researchers, psychometricians, and an ecosystem scientist.
During a group’s use of EcoXPT, 68 unique logged events can be recorded by the software and are saved during every session in a PostgreSQL database on a server located on the Harvard University campus. Each row represents one event logged by one user in the virtual world and is tagged with its anonymous identifier, what anonymized class it is a member of, the type of event, any relevant details (e.g., settings for an experiment or what dialog option was selected), and a timestamp. To reduce the grain size of the data for several analyses, events were binned into five different meta-categories that encompassed the most common problem-solving behaviors seen in the world: exploring, collecting data, analyzing data, experimenting, and hypothesizing.
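The binning step can be illustrated with a simple lookup table. The sketch below is an assumption-laden example rather than the study's code: the raw event-type names in `EVENT_CATEGORY` are invented placeholders (the actual software logs 68 event types whose names are not listed here), and only the five meta-category labels come from the text.

```python
from collections import Counter

# Hypothetical raw event types mapped to the five meta-categories.
EVENT_CATEGORY = {
    "move_to_location": "exploring",
    "take_water_sample": "collecting",
    "open_graph": "analyzing",
    "run_tank_experiment": "experimenting",
    "edit_concept_map": "hypothesizing",
}


def bin_events(event_types, mapping=EVENT_CATEGORY):
    """Replace each raw event type with its meta-category.

    Event types without a mapping are skipped rather than guessed.
    """
    return [mapping[event] for event in event_types if event in mapping]


# One group's (made-up) event stream, in timestamp order.
log = [
    "move_to_location",
    "take_water_sample",
    "open_graph",
    "open_graph",
    "run_tank_experiment",
    "unmapped_event",
]
binned = bin_events(log)
counts = Counter(binned)
```

Keeping the mapping in one dictionary makes the coarse-graining decision explicit and easy to revise if an event type is reassigned to a different category.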
Concept maps are saved as part of a large JSON file that serves as the saved state of the software for each group. The concept map portion saves what connections are present, what evidence is linked to each claim, and what reasoning is provided. As part of the work done with EcoXPT concept mapping, a series of 16 claims were selected as core aspects of the fish kill scenario. In addition to the presence of evidence, the source of the evidence can be automatically evaluated. Sources of evidence for the concept map include experimental tools, observations, graphed data, the field guide, and NPC testimonial. A rubric was designed to assess the overall completeness and quality of the causal concept maps.
Teachers were interviewed in a semi-structured fashion in focus groups at each school location. Questions focused on efficacy of the new tools, what “stuckness” looks like while using the software, how teachers intervene when they detect unproductive struggle, and what could be added or modified to ease this process. Each interview took one 45-minute period and was held as soon as possible after the conclusion of the curriculum. Interviews were audio recorded with participant permission and transcribed. Thematic analysis was used to assign preliminary codes to participant responses, from which central themes across all interviews were identified. These themes served as triangulation for the efficacy of the student-facing features and as a primary data source for assessing the utility of the teacher-facing report.
Differences in Student Outcomes
Students in both the EcoXPT dataset and the LENS sample showed significant gains on all six survey constructs. Students in the LENS sample gained significantly more on the epistemology construct, and teacher assignment had no significant impact on those gains. Epistemology gains were sensitive to student reading level, engagement, and IEP/504 status when controlling for all other student-level covariates.
On overall concept map scores, LENS classes scored significantly higher than EcoXPT classes. Within each subcategory, LENS classes had higher average claim scores and evidence scores than baseline EcoXPT classes, although average reasoning scores did not differ significantly between conditions. Both evidence scores and total scores were positively correlated with average group normalized learning gain on the causality construct of the survey.
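This summary does not spell out how normalized learning gain was computed. For readers unfamiliar with the term, the sketch below assumes the common definition, in which the gain is the fraction of the available improvement that a student actually achieved. The function name and the example scores are hypothetical.

```python
def normalized_gain(pre, post, max_score):
    """Normalized gain: (post - pre) / (max_score - pre).

    Assumed definition; returns None when the pre-score is already at the
    maximum, because there is no room to improve and the ratio is undefined.
    """
    if pre >= max_score:
        return None
    return (post - pre) / (max_score - pre)


# A student moving from 4 to 7 on a 10-point construct closed
# half of the gap to a perfect score.
gain = normalized_gain(pre=4, post=7, max_score=10)
```

Under this definition, a group-level value would be the average of its members' gains.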
To explore how groups moved through the virtual world, Markov models were used to calculate transition matrices for groups to see if the increased prevalence of certain transitions was indicative of better understanding or a superior investigatory strategy. Unsupervised learning was first used to identify different types of behaviors exhibited by groups in their logged actions. Next, transition matrices from groups in different quartiles of learning gains and concept map quality were compared to detect any differences in behavior between those groups.
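A first-order transition matrix of the kind described above can be estimated by counting consecutive pairs of binned events and normalizing each row. The following is a minimal sketch, assuming each group's log has already been reduced to a sequence of meta-category labels; the function name and the toy sequence are hypothetical.

```python
CATEGORIES = ["exploring", "collecting", "analyzing", "experimenting", "hypothesizing"]


def transition_matrix(sequence, states=CATEGORIES):
    """Estimate first-order Markov transition probabilities.

    Entry [i][j] is the share of events in state i that were immediately
    followed by state j. Rows for states never left stay all zeros.
    """
    index = {state: i for i, state in enumerate(states)}
    counts = [[0] * len(states) for _ in states]

    # Count each consecutive (current, next) pair in the sequence.
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1

    # Normalize each row so that it sums to 1 (or stays at 0).
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([count / total if total else 0.0 for count in row])
    return matrix


sequence = ["exploring", "exploring", "collecting", "analyzing", "exploring"]
matrix = transition_matrix(sequence)
```

Each group then contributes one matrix, which can be compared across quartiles or flattened into a vector for clustering.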
Cluster analysis via the k-means algorithm identified two different types of groups from their transition matrices. The largest differences are that groups in cluster 0 go from hypothesizing to seeking feedback (while groups in cluster 1 never do), are much more likely to go from hypothesizing to experimenting, and are less prone to go from exploring to hypothesizing. Despite these differences, cluster membership is not significantly associated with normalized learning gain on any construct, nor is it associated with concept map quality.
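To make the clustering step concrete, here is a small plain-Python version of the standard k-means procedure (Lloyd's algorithm) applied to flattened transition matrices. It is a sketch under assumptions: the original analysis presumably relied on a library implementation, the deterministic "first k points" initialization is a simplification chosen for reproducibility, and the four toy vectors (flattened 2x2 matrices) are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Cluster vectors with Lloyd's algorithm.

    Initial centroids are simply the first k points (a simplification).
    Returns (labels, centroids).
    """
    centroids = [list(point) for point in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Each row is one group's transition matrix flattened to a vector (toy data).
group_vectors = [
    [0.9, 0.1, 0.2, 0.8],
    [0.1, 0.9, 0.7, 0.3],
    [0.8, 0.2, 0.3, 0.7],
    [0.2, 0.8, 0.8, 0.2],
]
labels, centroids = kmeans(group_vectors, k=2)
```

Inspecting the centroids entry by entry is what reveals which transitions differ most between clusters.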
To gain detail on what specific transitions might be more characteristic of a group that is performing well or poorly, sequences of actions for groups in the lowest and highest concept map quality quartiles were extracted and examined. The most frequent 3-grams (sequences of three logged actions) were extracted from group log file activity. The high quartile’s high incidence of ('experimenting', 'analyzing', 'analyzing') may indicate attempts to reconcile experimental evidence with correlational evidence observed in plotted weather and water quality data. Likewise, the ('analyzing', 'analyzing', 'experimenting') 3-gram that occurs almost as often shows that, while groups are still in the experimental trailer (since an ‘explore’ event would mark their movement back to the outdoor space), they observed the correlations between their different data and then attempted to test hypotheses or answer questions these data raise. In the lowest quartile, the most common 3-gram involving experimentation is ('exploring', 'exploring', 'experimenting'), meaning a significant amount of time was spent solely moving through the world before running an experiment. While it is possible to experimentally test relationships that are observed while exploring the world, exploration activities are typically logged less frequently by days 6, 7, and 8 of the curriculum. Behaviors such as these may indicate a lack of focus in investigations.
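Extracting and counting 3-grams from a sequence of binned events takes only a sliding window and a counter. The sketch below uses an invented toy sequence; the helper name `ngrams` is hypothetical.

```python
from collections import Counter


def ngrams(sequence, n=3):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Toy sequence of meta-category labels for one group.
sequence = [
    "experimenting", "analyzing", "analyzing",
    "experimenting", "analyzing", "analyzing",
]
trigram_counts = Counter(ngrams(sequence))
most_common = trigram_counts.most_common(1)
```

Summing such counters over all groups in a quartile gives the frequency tables that were compared between the highest and lowest quartiles.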
Several 3-grams including "feedback" events in the virtual world were correlated with dependent variables. ('collecting', 'feedback', 'feedback') and ('experimenting', 'experimenting', 'feedback') were both positively correlated with concept map quality, while ('collecting', 'feedback', 'hypothesizing') and ('exploring', 'experimenting', 'feedback') were negatively correlated. These feedback tools were intended to be used more often by struggling learners, so it is not surprising that some feedback uses are more common among groups with weaker concept maps. ('exploring', 'experimenting', 'feedback') is negatively associated with content gains and ('feedback', 'hypothesizing', 'collecting') is negatively associated with causality gains.
In an attempt to predict the quality of student concept maps from log file data, support vector machine (SVM), random forest (RF), and long short-term memory (LSTM)-based models were fit on group log data. With the addition of the feedback features generating new logged events, all models are able to make more accurate predictions. The accuracy of all models for the LENS condition classes is similar, although the SVM model slightly outperforms the LSTM model. The SVM model also has the highest recall (avoiding false negatives), while the RF model has superior precision (avoiding false positives). Taken together as an F1 score (the harmonic mean of precision and recall), all classifiers are again similar, but the RF model performs slightly better.
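For reference, the three metrics compared above can be computed from true and predicted labels as follows. This is a generic sketch with made-up labels, not the study's evaluation code; treating `1` as the positive class is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision penalizes false positives; recall penalizes false negatives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with high recall but weak precision (or the reverse) can end up with an F1 score close to that of a more balanced model.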
Use of New Features
The two pedagogical agents were used differently over the course of the curriculum: Ranger Susan was used heavily on Day 2 and consulted less frequently afterward, while Jabir Hatami was used more consistently after the experimental tools were unlocked. Classes in School 2 appear to have used the pedagogical agents much more than their peers in School 1.
All groups made use of the comparison tank tool and thus completed the interactive tutorial at least once. Groups in School 2 were the only ones to repeat the tutorial; groups in School 1 used it only the one mandatory time. The optional comparison tank help that groups could activate was used sparingly, with an average of 0.51 questions framed by the tool per group (SD = 2.38). While many elected not to use the tool, the 27 groups that did make use of it posed an average of 3.78 questions, with a maximum of 28 uses observed in the log files for one group. In keeping with observed trends, groups in School 2 were more likely to utilize this framing feature during their use of the comparison tank tool.
Group save states show that the number of reflection notes varied between groups and between teachers throughout the curriculum. A total of 587 reflection notes were saved by groups during the curriculum, with a minimum of 0 and a maximum of 21 notes. As there were only 11 prompts for reflection provided, it is likely that some groups inadvertently saved observational data as reflection notes or saved multiple notes per reflection in lieu of editing one note per prompt. A total of 162 groups made at least one reflection note (53% of participating groups), resulting in an overall mean of 2.13 notes per group and a mean of 3.62 notes per group for groups who completed at least one reflection. When examined by teacher, four of the seven teachers appear to have used the reflection tool more consistently than the remaining three.
Qualitative findings from teacher and student feedback on the new features are analyzed in depth in the main dissertation.
In general, the LENS suite of feedback tools helped students learn about epistemology and the nature of science, increased the number of claims groups made, increased the relevance of the evidence they used, and increased the accuracy of classifiers designed to predict student success by 37%. Student opinions on the tools suggest that some view them as helpful, while others view them as insufficient because they still crave direct instruction and are not used to authentic scientific inquiry (ASI) activities. Teachers used these tools in varying ways in their classrooms and provided very positive feedback on the visualization and daily report.
- Teachers found the daily reports helpful, but teacher variation in implementation led to them being used in unexpected ways.
- Variable teacher fidelity of implementation will require proper documentation and guidance for using formative feedback features.
- Higher epistemology gains in the LENS group may be due to some features more explicitly modeling aspects of the scientific method.
- Tools offered in the LENS suite do make a difference in how well students are able to model the particular complex causal relationships of the pond ecosystem.
- The balance between student and teacher desires for getting the "right" answer and presenting an authentic scientific inquiry task without one "right" answer requires a careful framing of any tool that might short circuit the natural flow of inquiry.
- Several sequences of logged events may indicate lack of focus or suboptimal uses of group time in the virtual world, but log file data is generally insufficient to explain exactly what each sequence signified for each group or to deterministically label each instance of these sequences as a negative.
- The newly added hint and feedback tools resulted in a much greater ability to predict concept map quality than previously possible.