Sapling Learning’s content is created by a team of educators and subject-matter experts. A typical question authored by Sapling Learning has passed through the hands of five people before it is placed in assignments and made available to professors and students. At least three of those individuals have master’s degrees or PhDs in the question’s subject area. We take care to ensure that questions placed in assignments and taken by students are of the highest quality, but the process of quality assurance does not end once an item has made it into active student assignments.
Periodic Review Process
Each year, the Content Team at Sapling Learning goes through a process we call periodic review. During periodic review, we gather, assess, and improve our most-used questions, as well as questions that are not performing as we expected.
Every time a student makes an attempt on a question, that data is stored. For example, if a student makes three attempts on a homework question, gets generic (default) feedback on the first attempt, specific feedback on the second attempt, and then gets the question correct on the third attempt, we don’t just record that the student got the answer correct; all of the tries are preserved.
During periodic review, we collect and organize all student data for questions we’ve authored. First, the Director of Innovation, Jon Harmon, compiles the summative question data, such as the number of students who got the question correct, the number of attempts needed to get the question correct, and the number of attempts students made before giving up on the question. Compiling the data takes multiple days on a very large machine and requires five separate processors (the equivalent of five computers). Jon monitors the process regularly, and the machine produces a report every 100 items so he can confirm that the compilation is going smoothly.
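The kind of summary described above can be sketched in a few lines of Python. This is only an illustration, not the script Sapling Learning runs: the `Attempt` record, its field names, and the rule that a student "gave up" if none of their attempts was correct are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Attempt:
    """One stored try (hypothetical record layout)."""
    student_id: str
    question_id: str
    attempt_number: int  # 1 for the first try, 2 for the second, ...
    correct: bool


def summarize(attempts):
    """Build per-question summary statistics from raw attempt records."""
    # Group every try by (question, student) so no attempt is lost.
    by_pair = defaultdict(list)
    for a in attempts:
        by_pair[(a.question_id, a.student_id)].append(a)

    to_correct = defaultdict(list)        # attempts needed to get it right
    before_giving_up = defaultdict(list)  # attempts made without success
    for (qid, _sid), tries in by_pair.items():
        correct_tries = [t.attempt_number for t in tries if t.correct]
        if correct_tries:
            to_correct[qid].append(min(correct_tries))
        else:
            # Assumption: no correct attempt means the student gave up.
            before_giving_up[qid].append(len(tries))

    summary = {}
    for qid in set(to_correct) | set(before_giving_up):
        solved, quit_ = to_correct[qid], before_giving_up[qid]
        summary[qid] = {
            "students": len(solved) + len(quit_),
            "num_correct": len(solved),
            "num_gave_up": len(quit_),
            "avg_attempts_to_correct": mean(solved) if solved else None,
            "avg_attempts_before_giving_up": mean(quit_) if quit_ else None,
        }
    return summary


# Made-up data: s1 succeeds on the third try, s2 gives up after two tries,
# and s3 succeeds on the first try.
log = [
    Attempt("s1", "q1", 1, False),
    Attempt("s1", "q1", 2, False),
    Attempt("s1", "q1", 3, True),
    Attempt("s2", "q1", 1, False),
    Attempt("s2", "q1", 2, False),
    Attempt("s3", "q1", 1, True),
]
summary = summarize(log)
```

Because every attempt is kept rather than only the final result, the same log can later be re-summarized in new ways without collecting anything again.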
Once Jon has compiled the data, he passes it to the Director of Content, Clairissa Simmons, in a large Excel file. Clairissa makes sure the data is well defined, is consistent with previous years, includes any data we may need in the future, and is available for all disciplines within the Content Team. Next, Clairissa loads all the data into Zoho Reports, a graphing and data organization tool that generates a standard set of graphs for comparison across disciplines. From there, the subject-matter experts in the Content Team further refine and analyze the data to determine which questions need evaluation and improvement.
This graph plots the questions with the highest average attempts made by students so our content experts can identify which questions students struggle to answer, even with our feedback. In this example, there is one question that has a much higher chance of a student giving up.
Organizing and Prioritizing Data
Each subject takes a slightly different approach to analyzing the data because of its discipline’s specific needs.
The members of the Chemistry team divided their questions into three groups: the most-taken questions from the subject’s own taxonomy, the questions from the subject’s taxonomy with the lowest overall scores, and the most-used questions from other taxonomies. At least 25 questions were updated for each subject based on that division: 10 of the most used, 10 of the lowest scoring, and 5 of the most used from other taxonomies.
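The 10/10/5 division described above could be sketched as follows. This is an illustrative assumption about how such a selection might be scripted, not the Chemistry team's actual tooling: the dictionary keys (`id`, `taxonomy`, `times_taken`, `avg_score`) are hypothetical, and skipping a question in the lowest-score group when it was already picked as most used is a choice made for the example.

```python
def pick_review_set(questions, home_taxonomy, n_top=10, n_low=10, n_other=5):
    """Split a subject's questions into the three review groups."""
    home = [q for q in questions if q["taxonomy"] == home_taxonomy]
    other = [q for q in questions if q["taxonomy"] != home_taxonomy]

    # Group 1: most-taken questions from the subject's own taxonomy.
    most_used = sorted(home, key=lambda q: q["times_taken"], reverse=True)[:n_top]
    already_chosen = {q["id"] for q in most_used}

    # Group 2: lowest-scoring questions from the same taxonomy
    # (assumption: skip anything already chosen in group 1).
    by_score = sorted(home, key=lambda q: q["avg_score"])
    lowest = [q for q in by_score if q["id"] not in already_chosen][:n_low]

    # Group 3: most-used questions borrowed from other taxonomies.
    borrowed = sorted(other, key=lambda q: q["times_taken"], reverse=True)[:n_other]

    return {
        "most_used": [q["id"] for q in most_used],
        "lowest_scores": [q["id"] for q in lowest],
        "other_taxonomies": [q["id"] for q in borrowed],
    }


# Made-up data, with small group sizes so the result is easy to follow.
sample_questions = [
    {"id": "a", "taxonomy": "gen_chem", "times_taken": 900, "avg_score": 0.55},
    {"id": "b", "taxonomy": "gen_chem", "times_taken": 800, "avg_score": 0.90},
    {"id": "c", "taxonomy": "gen_chem", "times_taken": 100, "avg_score": 0.40},
    {"id": "d", "taxonomy": "gen_chem", "times_taken": 50, "avg_score": 0.60},
    {"id": "e", "taxonomy": "intro_chem", "times_taken": 700, "avg_score": 0.80},
    {"id": "f", "taxonomy": "intro_chem", "times_taken": 20, "avg_score": 0.30},
]
review_set = pick_review_set(sample_questions, "gen_chem", n_top=2, n_low=2, n_other=1)
```

With the default sizes, the three groups add up to the minimum of 25 questions per subject.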
The two Biology subjects, Genetics and Introductory Biology, are newer to the market, so their teams organized the review slightly differently. Because this was the first year of data to analyze, the most important task was to fix outlier questions. The teams organized their questions into those most given up on, those most taken, and those with the most attempts. From there, any questions that appeared in more than one category were prioritized for review.
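Prioritizing questions that fall into more than one category amounts to counting category memberships. A minimal sketch, assuming each category is simply a list of question IDs (the category names and IDs below are made up):

```python
from collections import Counter


def prioritize_overlaps(categories, min_categories=2):
    """Return question IDs that appear in at least `min_categories` lists.

    Questions in more categories come first; ties are broken by ID.
    """
    counts = Counter()
    for ids in categories.values():
        counts.update(set(ids))  # set() guards against duplicate IDs in a list
    flagged = [qid for qid, n in counts.items() if n >= min_categories]
    return sorted(flagged, key=lambda qid: (-counts[qid], qid))


categories = {
    "most_given_up": ["q1", "q2", "q3"],
    "most_taken": ["q2", "q4"],
    "most_attempts": ["q1", "q2", "q5"],
}
priority = prioritize_overlaps(categories)
```

Here `q2` appears in all three lists and `q1` in two, so both are flagged, with `q2` first.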
For economics, the team wanted to know whether their problem questions needed to be updated or needed to be removed from one or more templates. For a given question, they compared the data from cases where the question was written for the text with the data from cases where it was not (that is, the question was originally written for one textbook but was added to a template for another textbook). This comparison was done for average attempts, average score, the average number of attempts when the answer is correct, and the average score when the answer is correct. Questions with poor ratios were then reviewed to determine whether they needed improvement or just needed to be taken out of some templates.
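The comparison above can be expressed as a ratio per metric. The sketch below is an assumption about what such a calculation might look like: the source does not say how the ratios were computed or what counted as "poor", so the adopted-over-native ratio, the metric names, and the 20% tolerance are all hypothetical.

```python
METRICS = (
    "avg_attempts",
    "avg_score",
    "avg_attempts_when_correct",
    "avg_score_when_correct",
)


def compare_usage(native, adopted):
    """Ratio of each metric: other-textbook use divided by native-textbook use.

    Assumes every native metric is nonzero.
    """
    return {m: adopted[m] / native[m] for m in METRICS}


def has_poor_ratio(ratios, tolerance=0.2):
    """Flag a question if students on other textbooks do noticeably worse.

    Hypothetical rule: more attempts (ratio above 1 + tolerance) or lower
    scores (ratio below 1 - tolerance) on any metric counts as poor.
    """
    for metric, ratio in ratios.items():
        if "attempts" in metric and ratio > 1 + tolerance:
            return True
        if "score" in metric and ratio < 1 - tolerance:
            return True
    return False


# Made-up numbers for a single question.
native = {"avg_attempts": 2.0, "avg_score": 0.8,
          "avg_attempts_when_correct": 1.5, "avg_score_when_correct": 0.9}
adopted = {"avg_attempts": 3.0, "avg_score": 0.5,
           "avg_attempts_when_correct": 1.8, "avg_score_when_correct": 0.85}

ratios = compare_usage(native, adopted)
needs_review = has_poor_ratio(ratios)
```

A flagged question still needs a human decision: the ratio only shows that the question behaves differently outside its original textbook, not whether it should be rewritten or removed from a template.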
For physics (including conceptual, algebra-based, and calculus-based) and astronomy questions, the team chose 1–2 questions with the lowest scores, highest number of attempts, and the highest percentage of giving up; 1–2 of the most-used questions; and 1–2 of the least-used questions to evaluate for suitability. Chemical engineering did not receive periodic review this year.
For all disciplines, much of the updating was done through deprecate and replace: authoring a fresh question and replacing the old item with the new one in assignments. To keep things fair among students, if some students in an assignment have already seen the older question, the change is deferred until before the start of the next semester. This separation of questions also helps confirm that the updates we make are helpful and improve the student experience, because next year we can compare the data for the old question with the data for the new one.
Additionally, all teams must consider that other disciplines use their questions, which is why most of the teams looked at the most-used questions from other disciplines. For example, an introductory chemistry course likely uses many questions from our general chemistry taxonomy. The student experience with those general chemistry questions could differ considerably between students in the introductory chemistry course and students in a general chemistry course, so it’s important to check that we’ve appropriately placed questions from outside the discipline.
Findings and Results
Data obtained in periodic review can indicate whether we need to fix an item and, if so, can sometimes indicate what we need to fix. For example, in the economics periodic review, several questions had good scores among students using the textbook for which the questions were written, whereas students using other textbooks did poorly. The fix for an issue like that is to remove the question from templates for texts we determine are a bad fit for the question.
For other questions, periodic review identifies an issue, but we need to look into the question to determine potential causes.
The following intro bio question had a high number of attempts per student and a high number of students who gave up on the question.
To improve the question, the biology team rewrote the instructions in the question stem to make them more explicit. They also changed the difficulty of the item from medium to hard to better reflect how challenging the question is. Finally, they revised the solution to be more approachable and easier to read by removing excess content and bolding key terms.
Introductory Chemistry Example
Sometimes the best fix for an item is to change the module type. In one intro chem item, the original question was multiple select, and all of the choices were correct.
The chemistry team reauthored the question as a multiple choice question in which one of the answers is an “all of the above”-style choice. This new setup is less tricky for students because it allows them to focus on the content of the question. We also think students are less likely to second-guess the “everything” response in a multiple choice question, since it is a specific choice, than they are to second-guess checking every checkbox, which is not commonly the correct answer in multiple select questions.
The solution was improved in this case as well. This time, however, the solution was expanded.
Finally, the most-used questions are looked over even if their stats indicate that they are good questions. These items often receive improved feedback, revised solutions, and updated art to improve the experience of the tens of thousands of students who see them each semester. For example, the following physics question was updated throughout.
The question stem was simplified and the labels in the figure were made clearer.
Improvement does not end with just question content. We have always been careful to store more data than we knew what to do with in the moment. We can use previous years’ data as a benchmark when we find new ways to compile and analyze data.
This year, we refined the process for pulling the data so that it is more efficient. We also characterized feedback in more detail this year. For example, tracking how often the default feedback tab is triggered per question is new, and it helps us identify items that need more specific feedback.
Eventually, we plan to allow the script to run continuously so we’ll have the most up-to-date information possible about the quality of our question libraries. For example, we might run periodic review on each new question once enough students have tried it for the data to be statistically meaningful, instead of waiting until the end of the academic year. As we improve the periodic review process, we gain even better tools to keep improving the content we provide for professors and students.