In collaboration with Prof. Neil Heffernan and Prof. Stacy Shaw at Worcester Polytechnic Institute and Cristina Heffernan at the ASSISTments Foundation. See project website for details.
In collaboration with Prof. Ryan Baker at University of Pennsylvania and Prof. Neil Heffernan at Worcester Polytechnic Institute. See project website for details.
More Recent Work
Fine-tuning Language Models to Generate Math Word Problems

We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. We develop a novel MWP generation approach that leverages i) pre-trained language models and a context keyword selection model to improve the language quality of the generated MWPs and ii) an equation consistency constraint for math equations to improve the mathematical validity of the generated MWPs. The paper can be found here:
Data-driven Computerized Adaptive Testing

Computerized adaptive testing (CAT) methods adaptively select the next most informative question/item for each student given their responses to previous questions, effectively reducing test length. Existing CAT methods use item response theory (IRT) models that are not highly predictive of performance and static question selection algorithms that cannot improve by learning from large-scale data. We propose BOBCAT, a Bilevel Optimization-Based framework for CAT to directly learn a data-driven question selection algorithm from training data. BOBCAT is agnostic to the underlying student response model and outperforms existing CAT methods (sometimes significantly) at reducing test length. The paper can be found here:
Representing Math Operations to Scale up Error Feedback

Feedback on student answers and even during intermediate steps in their solutions to open-ended questions is an important element in math education. Most existing approaches for automated student solution analysis and feedback are not scalable since they require manually constructing cognitive models and anticipating student errors for each question. Leveraging a recent math expression encoding method, we represent each math operation applied in solution steps as a transition in the math embedding vector space. We can learn implicit and explicit representations of math operations and use them to i) identify math operations a student intends to perform in each solution step, regardless of whether they did it correctly or not, and ii) select the appropriate feedback type for incorrect steps. The paper can be found here:
Meaningful Knowledge Tracing: Option Tracing and Attentive Knowledge Tracing


Knowledge tracing refers to a family of methods that estimate each student’s knowledge component/skill mastery level from their past responses to questions. One key limitation of most existing knowledge tracing methods is that they can only estimate an overall knowledge level of a student per knowledge component/skill since they analyze only the (usually binary-valued) correctness of student responses. Therefore, it is hard to use them to diagnose specific student errors. We extend existing knowledge tracing methods beyond correctness prediction to the task of predicting the exact option students select in multiple choice questions. We evaluate their ability in identifying common student errors in the form of clusters of incorrect options across different questions that correspond to the same error. We have also developed an interpretable, attention-based knowledge tracing method that was the state-of-the-art at the time. The papers can be found here:
Student Affect Detection and Intervention with Teachers in the Loop

"Sensor-free" detectors of student affect that use only student activity data and no physical or physiological sensors are cost-effective and have potential to be applied at large scale in real classrooms. These detectors are trained using student affect labels collected from human observers as they observe students learn within intelligent tutoring systems (ITSs) in real classrooms. We investigate whether active (machine) learning methods can improve the efficiency of the affect label collection process. We propose a new method that is ideally suited for the problem setting in affect detection, which outperforms other active learning methods on a real-world student affect dataset. We have also studied how using past data from a different student population affects the performance of active learning algorithms. The papers can be found here:
Career Path Modeling and Recommendation

The development of new technologies at an unprecedented rate is rapidly changing the landscape of the labor market. Therefore, for workers who want to build a successful career, acquiring new skills required by new jobs through lifelong learning is crucial. We propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles and provide actionable feedback and recommendations to users on how they can reach their career goals. Our model is interpretable and can be used for important tasks including skill gap identification and career path planning. It can provide i) actionable feedback to users and guide them through their upskilling and reskilling processes and ii) recommendations of feasible paths for users to reach their career goals. The paper can be found here:
Previous work
Learning and Content Analytics
SPARse factor analysis for learning and content analytics (SPARFA)
SPARFA is a purely data-driven framework for learning and content analytics. Under the observation that there are only a small number of latent factors (which we term ''concepts'') that control students' performance, SPARFA analyzes binary-valued (correct/incorrect) graded student responses to assessment questions, and jointly estimates i) question-concept associations, ii) student concept knowledge, and iii) question intrinsic difficulties. SPARFA performs learning analytics by providing personalized feedback to the students on their knowledge level on each concept, and performs content analytics by analyzing how every question is related to each concept and how difficult it is. The original SPARFA paper can be found here:
An extension to analyze ordinal responses (partial credits) can be found here:
An extension that jointly analyzes graded response data and question text to interpret the meaning of the latent concepts can be found here:
An extension that performs time-varying learning analytics by tracing students' knowledge evolution through time and also improves content analytics by analyzing the content and quality of learning resources (e.g., textbooks, lecture videos, etc.) can be found here:
Non-linear student-response models: Dealbreaker and BLAh
Most existing student-response models are linear and additive, which achieve good prediction performance but admits limited interpretability. We develop two non-linear student-response models, the Dealbreaker model, which models a students' chance in answering a question correctly as only dependent on their minimum concept knowledge among the concepts the question covers, and the Boolean logic analysis (BLAh) model, which models binary-valued graded student responses as outputs of Boolean logic functions.

Traditional compensatory student-response models, including SPARFA, characterizes a student's success probability when answering a question as dependent on a linear combination of their knowledge on different concepts. Such linear models can be used to predict unobserved responses, but offer limited interpretability since they allow students to make up for their lack of knowledge on certain concepts with high knowledge on other concepts. On the contrary, the Dealbreaker model is a non-linear model that characterizes a student's success probability on a question as only dependent on their weakest knowledge among all concepts tested in the question. The Dealbreaker paper can be found here:

The BLAh model goes beyond the "AND" family of models the Dealbreaker model belongs, and characterizes the graded response of a student on a question as the output of the Boolean logic function corresponding to the question, therefore being much more flexible and interpretable than the the Dealbreaker model. The BLAh paper can be found here:
Automatic question generation: QG-Net

The ever growing amount of educational content renders it increasingly difficult to manually generate sufficient practice or quiz questions to accompany it. We propose QG-Net, a recurrent neural network-based model specifically designed for automatically generating quiz questions from educational content such as textbooks. QG-Net outperforms state-of-the-art neural network-based and rules-based systems for question generation, both when evaluated using standard benchmark datasets and when using human evaluators. The paper can be found here:
Grading and Feedback
Mathematical language processing (MLP)
MLP is a framework for analyzing students' responses to open-response mathematical questions for grading and feedback. We featurize and cluster students' responses to open-ended mathematical questions, e.g., freelancing derivations that are common in science, technology, engineering and mathematics (STEM) fields. Then, we perform automatic grading and feedback using a small number of instructor-graded responses. The MLP paper can be found here:
Misconception detection

We developed a new natural language processing-based framework to detect the common misconceptions among students' textual responses to short-answer questions. Our framework excels at classifying whether a response exhibits one or more misconceptions. More importantly, it can also automatically detect the common misconceptions exhibited across responses from multiple students to multiple questions; this property is especially important at large scale, since instructors will no longer need to manually specify all possible misconceptions that students might exhibit. The paper can be found here:
Personalization
Personalized learning action selection
We study the problem of turning the insights gained from learning and content analytics into personalization -- providing personalized recommendations for each student on what learning actions (read a section of a textbook, watch a lecture video, work on a practice question, etc.) the should take. We make use of the contextual bandits framework; the papers can be found here:
An extension on taking uncertain context into account can be found here:
Safe personalization

We demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum a-posteriori estimation. More importantly, we derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators. Applying our linearization technique to IRT models (the Rasch model, in particular) yields much tighter bounds on learner and question parameter estimates, especially when the numbers of learners and questions are small. Therefore, our analysis has the potential to improve the safety of personalization. The papers can be found here:
Behavior Analysis
Measuring engagement from clickstream data
We propose a new model for learning that relates video-watching behavior to engagement level. One of the advantages of our method for determining engagement is that it can be done entirely within standard online learning platforms, serving as a more universal and less invasive alternative to existing measures of engagement that require the use of external devices. We also find that our model identifies key behavioral features (e.g., larger numbers of pauses and rewinds, and smaller numbers of fast forwards) that are correlated with higher learner engagement. The paper can be found here:
Instructor preference analysis


We propose a latent factor model that analyzes instructors' preferences in explicitly excluding particular questions from learners' assignments in a particular subject domain. We incorporate expert-labeled Bloom's Taxonomy tags on each question as a factor in our statistical model to improve model interpretability. Our model provides meaningful interpretations that help us understand why instructors exclude certain questions, thus helping automated learning systems to behave more "instructor-like". The paper can be found here:
Prerequisite structure extraction from user clickstreams

Existing approaches to automatically inferring prerequisite dependencies rely on analysis of either content (e.g., topic modeling of text) or performance (e.g., quiz results tied to content) data, they are not feasible in cases where courses have no assessments or only short content pieces (e.g., short video segments). We propose an algorithm that extracts prerequisite information using learner behavioral data instead, and apply it to an online short course. Our algorithm excels at both predicting learner behavior and revealing fine-granular insights into prerequisite dependencies between content segments, with validation provided by a course administrator. The paper can be found here:
Personalized thread recommendation in MOOCs

We propose a probabilistic model for the process of learners posting on such forums, using point processes. Different from existing works, our method integrates topic modeling of the post text, timescale modeling of the decay in post activity over time, and learner topic interest modeling into a single model, and infers this information from user data. Our method also varies the excitation levels induced by posts according to the thread structure, to reflect typical notification settings in discussion forums. We experimentally validate the proposed model on three real-world MOOC datasets, with the largest one containing up to 6,000 learners making 40,000 posts in 5,000 threads. Results show that our model excels at thread recommendation, achieving significant improvement over a number of baselines, thus showing promise of being able to direct learners to threads that they are interested in more efficiently. The paper can be found here:
Non-educational Applications
I also collaborate with other researchers on some non-educational applications.
Learning robust binary hash functions

We propose a new data-dependent method to learn binary hash functions. Inspired by recent progress in robust optimization, we develop a novel hashing algorithm, dubbed RHash, that minimizes the worst-case distortion among pairs of points in a dataset. We show that RHash achieves the same retrieval performance as the state-of-the-art algorithms in terms of average precision while using up to 60% fewer bits, using several large-scale real-world image datasets. The paper can be found here:
Sensor selection for biosensing and structural health monitoring

We develop a new sensor selection framework for sparse signals that finds a small subset of sensors (less than the signal dimension) that best recovers such signals. Our proposed algorithm, Insense, minimizes a coherence-based cost function that is adapted from classical results in sparse recovery theory. Using a range of datasets, including two real-world datasets from microbial diagnostics and structural health monitoring, we demonstrate that Insense significantly outperforms conventional algorithms when the signal is sparse. The paper can be found here:
Cloud dynamics and bidding strategy

We propose a nonlinear dynamical system model for the time-evolution of the spot price as a function of latent states that characterize user demand in the spot and on-demand markets. This model enables us to adaptively predict future spot prices given past spot price observations, allowing us to derive user bidding strategies for heterogeneous cloud resources that minimize the cost to complete a job with negligible probability of interruption. The paper can be found here:
Phase retrieval

We show that with the availability of an initial guess, phase retrieval can be carried out with an ever simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. We demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and nonconvex methods on synthetic and real-world data. The paper can be found here:
A method relying on a novel linear spectral estimator (LSPE) to obtain accurate initialization for phase retrieval: