In collaboration with Prof. Neil Heffernan and Prof. Stacy Shaw at Worcester Polytechnic Institute and Cristina Heffernan at the ASSISTments Foundation. See project website for details.

In collaboration with Prof. Ryan Baker at University of Pennsylvania and Prof. Neil Heffernan at Worcester Polytechnic Institute. See project website for details.

# More Recent Work

## Fine-tuning Language Models to Generate Math Word Problems

We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. We develop a novel MWP generation approach that leverages i) pre-trained language models and a context keyword selection model to improve the language quality of the generated MWPs and ii) an equation consistency constraint for math equations to improve the mathematical validity of the generated MWPs. The paper can be found here:

*"Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints,"* Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2021

## Data-driven Computerized Adaptive Testing

Computerized adaptive testing (CAT) methods adaptively select the next most informative question/item for each student given their responses to previous questions, effectively reducing test length. Existing CAT methods use item response theory (IRT) models that are not highly predictive of performance and static question selection algorithms that cannot improve by learning from large-scale data. We propose BOBCAT, a Bilevel Optimization-Based framework for CAT to directly learn a data-driven question selection algorithm from training data. BOBCAT is agnostic to the underlying student response model and outperforms existing CAT methods (sometimes significantly) at reducing test length. The paper can be found here:

*"BOBCAT: Bi-level Optimization-Based Computerized Adaptive Testing,"* International Joint Conference on Artificial Intelligence (IJCAI), Aug. 2021
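For context, the kind of static selection rule that data-driven methods are contrasted with can be sketched as follows. This is a minimal illustration of a conventional maximum-Fisher-information rule under a one-parameter (Rasch) IRT model, not BOBCAT itself; the function names and the toy item difficulties are assumptions made for this example.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (1PL) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def fisher_information(ability, difficulty):
    """Item information under the Rasch model: p * (1 - p)."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)

def select_next_item(ability_estimate, difficulties, asked):
    """Pick the unasked item that is most informative at the current ability estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in asked]
    return max(candidates,
               key=lambda i: fisher_information(ability_estimate, difficulties[i]))

# Hypothetical item bank (difficulties on the same scale as ability).
difficulties = [-2.0, -0.5, 0.3, 1.5]
next_item = select_next_item(ability_estimate=0.4, difficulties=difficulties, asked={1})
```

The rule is fixed in advance: it always prefers the item whose difficulty is closest to the current ability estimate, and nothing in it is learned from response data.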

## Representing Math Operations to Scale up Error Feedback

Feedback on student answers and even during intermediate steps in their solutions to open-ended questions is an important element in math education. Most existing approaches for automated student solution analysis and feedback are not scalable since they require manually constructing cognitive models and anticipating student errors for each question. Leveraging a recent math expression encoding method, we represent each math operation applied in solution steps as a transition in the math embedding vector space. We can learn implicit and explicit representations of math operations and use them to i) identify math operations a student intends to perform in each solution step, regardless of whether they did it correctly or not, and ii) select the appropriate feedback type for incorrect steps. The paper can be found here:

*"Math Operation Embeddings for Open-ended Solution Analysis and Feedback,"* International Conference on Educational Data Mining (EDM), June 2021

## Meaningful Knowledge Tracing: Option Tracing and Attentive Knowledge Tracing

Knowledge tracing refers to a family of methods that estimate each student’s mastery level on each knowledge component/skill from their past responses to questions. One key limitation of most existing knowledge tracing methods is that they can only estimate a student's overall knowledge level per knowledge component/skill, since they analyze only the (usually binary-valued) correctness of student responses. Therefore, it is hard to use them to diagnose specific student errors. We extend existing knowledge tracing methods beyond correctness prediction to the task of predicting the exact option students select in multiple-choice questions. We evaluate their ability to identify common student errors in the form of clusters of incorrect options across different questions that correspond to the same error. We have also developed an interpretable, attention-based knowledge tracing method that was the state of the art at the time. The papers can be found here:

*"Option Tracing: Beyond Correctness Analysis in Knowledge Tracing,"* International Conference on Artificial Intelligence in Education (AIED), June 2021

*"Context-Aware Attentive Knowledge Tracing,"* ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Aug. 2020

## Student Affect Detection and Intervention with Teachers in the Loop

"Sensor-free" detectors of student affect that use only student activity data and no physical or physiological sensors are cost-effective and have potential to be applied at large scale in real classrooms. These detectors are trained using student affect labels collected from human observers as they observe students learn within intelligent tutoring systems (ITSs) in real classrooms. We investigate whether active (machine) learning methods can improve the efficiency of the affect label collection process. We propose a new method that is ideally suited for the problem setting in affect detection, which outperforms other active learning methods on a real-world student affect dataset. We have also studied how using past data from a different student population affects the performance of active learning algorithms. The papers can be found here:

*"Active Learning for Student Affect Detection,"* International Conference on Educational Data Mining (EDM), July 2019

*"Using Past Data to Warm Start Active Machine Learning: Does Context Matter?"* International Conference on Learning Analytics and Knowledge (LAK), Apr. 2021, *best paper nominee*

## Career Path Modeling and Recommendation

The development of new technologies at an unprecedented rate is rapidly changing the landscape of the labor market. Therefore, for workers who want to build a successful career, acquiring the new skills required by new jobs through lifelong learning is crucial. We propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles. The model can be used for important tasks including skill gap identification and career path planning: it can provide i) actionable feedback to users that guides them through their upskilling and reskilling processes and ii) recommendations of feasible paths for users to reach their career goals. The paper can be found here:

*"Skill-based Career Path Modeling and Recommendation,"* IEEE International Conference on Big Data, Dec. 2020, *best student paper award*

# Previous Work

## Learning and Content Analytics

### SPARse factor analysis for learning and content analytics (SPARFA)

SPARFA is a purely data-driven framework for learning and content analytics. Based on the observation that only a small number of latent factors (which we term "concepts") control students' performance, SPARFA analyzes binary-valued (correct/incorrect) graded student responses to assessment questions, and jointly estimates i) question-concept associations, ii) student concept knowledge, and iii) question intrinsic difficulties. SPARFA performs learning analytics by providing personalized feedback to students on their knowledge level on each concept, and performs content analytics by analyzing how every question is related to each concept and how difficult it is. The original SPARFA paper can be found here:

*"Sparse Factor Analysis for Learning and Content Analytics,"* Journal of Machine Learning Research (JMLR), Vol. 15, pp. 1959–2008, June 2014
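To make the three estimated quantities concrete, here is a minimal sketch of how they could combine into a success probability for one student-question pair. It assumes a probit link, sparse non-negative question-concept loadings, and an additive difficulty offset where larger values mean an easier question; the variable names and toy values are hypothetical, and the estimation procedure itself is not shown.

```python
import math

def probit(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def success_probability(w_i, c_j, mu_i):
    """P(student j answers question i correctly) given loadings, knowledge, and offset."""
    z = sum(w * c for w, c in zip(w_i, c_j)) + mu_i
    return probit(z)

# Toy values (hypothetical): 3 concepts; the question involves concepts 0 and 2 only.
w_i = [1.0, 0.0, 0.5]    # question-concept associations (assumed sparse, non-negative)
c_j = [0.8, -1.2, 0.4]   # student's knowledge of each concept
mu_i = -0.3              # intrinsic difficulty offset (assumed: larger means easier)
p = success_probability(w_i, c_j, mu_i)
```

Because the loading on concept 1 is zero, the student's weak knowledge of that concept has no effect on `p`; this is the sense in which sparse associations keep each question tied to only a few concepts.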

An extension to analyze ordinal responses (partial credits) can be found here:

*"Tag-Aware Ordinal Sparse Factor Analysis for Learning and Content Analytics,"* Proc. International Conference on Educational Data Mining (EDM), pp. 90–97, July 2013

An extension that jointly analyzes graded response data and question text to interpret the meaning of the latent concepts can be found here:

*"Joint Topic Modeling and Factor Analysis of Textual Information and Graded Response Data,"* Proc. International Conference on Educational Data Mining (EDM), pp. 324–325, July 2013

An extension that performs time-varying learning analytics by tracing students' knowledge evolution through time and also improves content analytics by analyzing the content and quality of learning resources (e.g., textbooks, lecture videos, etc.) can be found here:

*"Time-Varying Learning and Content Analytics via Sparse Factor Analysis,"* Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 452–461, Aug. 2014

### Non-linear student-response models: Dealbreaker and BLAh

Most existing student-response models are linear and additive; they achieve good prediction performance but admit limited interpretability. We develop two non-linear student-response models: the Dealbreaker model, which models a student's chance of answering a question correctly as dependent only on their minimum concept knowledge among the concepts the question covers, and the Boolean logic analysis (BLAh) model, which models binary-valued graded student responses as outputs of Boolean logic functions.

Traditional compensatory student-response models, including SPARFA, characterize a student's success probability when answering a question as dependent on a linear combination of their knowledge on different concepts. Such linear models can be used to predict unobserved responses, but offer limited interpretability since they allow students to make up for their lack of knowledge on certain concepts with high knowledge on other concepts. In contrast, the Dealbreaker model is a non-linear model that characterizes a student's success probability on a question as dependent only on their weakest knowledge among all concepts tested in the question. The Dealbreaker paper can be found here:

*"Dealbreaker: A Nonlinear Latent Variable Model for Educational Data,"* Proc. International Conference on Machine Learning (ICML), pp. 266–275, June 2016
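The difference between a compensatory model and a minimum-based model can be seen on a toy example. The sketch below assumes a logistic link and a single difficulty parameter per question for both models; these choices, the function names, and the numbers are illustrative assumptions, and the paper's exact parameterization may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def compensatory_probability(knowledge, weights, difficulty):
    """Linear model: strong concepts can offset weak ones."""
    return sigmoid(sum(w * k for w, k in zip(weights, knowledge)) - difficulty)

def dealbreaker_probability(knowledge, covered, difficulty):
    """Min-based model: only the weakest covered concept matters."""
    weakest = min(knowledge[c] for c in covered)
    return sigmoid(weakest - difficulty)

# Hypothetical student: strong on concept 0, weak on concept 1.
knowledge = [2.0, -1.0]
p_linear = compensatory_probability(knowledge, weights=[1.0, 1.0], difficulty=0.0)
p_min = dealbreaker_probability(knowledge, covered=[0, 1], difficulty=0.0)
```

Under the linear model the strong concept pulls the predicted success probability above one half, while the min-based model predicts likely failure and points directly at concept 1 as the reason.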

The BLAh model goes beyond the "AND" family of models to which the Dealbreaker model belongs, and characterizes the graded response of a student on a question as the output of the Boolean logic function corresponding to the question; it is therefore much more flexible and interpretable than the Dealbreaker model. The BLAh paper can be found here:

*"BLAh: Boolean Logic Analysis for Graded Student Response Data,"* IEEE Journal of Selected Topics in Signal Processing (JSTSP), Vol. 11, Issue 5, pp. 754–764, Aug. 2017
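As a rough illustration of responses being outputs of per-question Boolean functions, consider the sketch below. The question names, the specific logic functions, and the binary mastery encoding are all hypothetical, and the inference of these functions from data is not shown.

```python
def predict_response(logic_fn, mastery):
    """Graded response (True = correct) as the output of a question's Boolean function."""
    return bool(logic_fn(*mastery))

# Hypothetical questions over two concepts (a, b).
questions = {
    "needs_both": lambda a, b: a and b,    # "AND"-type question
    "either_works": lambda a, b: a or b,   # two alternative solution routes
    "only_first": lambda a, b: a,          # second concept is irrelevant
}

mastery = (True, False)  # student has mastered concept a but not concept b
responses = {name: predict_response(fn, mastery) for name, fn in questions.items()}
```

An "AND"-only family can express the first question but not the second, which is one way to read the added flexibility.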

### Automatic question generation: QG-Net

The ever-growing amount of educational content renders it increasingly difficult to manually generate sufficient practice or quiz questions to accompany it. We propose QG-Net, a recurrent neural network-based model specifically designed for automatically generating quiz questions from educational content such as textbooks. QG-Net outperforms state-of-the-art neural network-based and rule-based systems for question generation, both when evaluated using standard benchmark datasets and when using human evaluators. The paper can be found here:

*"QG-Net: A Data-Driven Question Generation Model for Educational Content,"* ACM Conference on Learning at Scale (L@S), pp. 1–10, June 2018

## Grading and Feedback

### Mathematical language processing (MLP)

MLP is a framework for analyzing students' responses to open-response mathematical questions for grading and feedback. We featurize and cluster students' responses to open-ended mathematical questions, e.g., free-form derivations that are common in science, technology, engineering and mathematics (STEM) fields. Then, we perform automatic grading and feedback using a small number of instructor-graded responses. The MLP paper can be found here:

*"Mathematical Language Processing: Automatic Grading and Feedback for Open Response Mathematical Questions,"* Proc. ACM Conference on Learning at Scale (L@S), pp. 167–176, Mar. 2015

### Misconception detection

We developed a new natural language processing-based framework to detect the common misconceptions among students' textual responses to short-answer questions. Our framework excels at classifying whether a response exhibits one or more misconceptions. More importantly, it can also automatically detect the common misconceptions exhibited across responses from multiple students to multiple questions; this property is especially important at large scale, since instructors will no longer need to manually specify all possible misconceptions that students might exhibit. The paper can be found here:

*"Data-mining Textual Responses to Uncover Misconception Patterns,"* Proc. International Conference on Educational Data Mining (EDM), pp. 208–213, June 2017

## Personalization

### Personalized learning action selection

We study the problem of turning the insights gained from learning and content analytics into personalization: providing personalized recommendations for each student on what learning actions (read a section of a textbook, watch a lecture video, work on a practice question, etc.) they should take. We make use of the contextual bandits framework; the papers can be found here:

*"A Contextual Bandits Framework for Personalized Learning Action Selection,"* Proc. International Conference on Educational Data Mining (EDM), pp. 424–429, June 2016

An extension on taking uncertain context into account can be found here:

*"Contextual Multi-armed Bandit Algorithms for Personalized Learning Action Selection,"* Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6344–6348, Mar. 2017 (invited paper)
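To illustrate what selecting learning actions with a contextual bandit can look like, the sketch below implements a generic LinUCB-style rule with one linear model per action. It is a standard textbook algorithm used here for illustration only, not necessarily the algorithm from the papers above; treating the context as a vector of estimated concept knowledge, the reward as a subsequent performance signal, and all names and values are assumptions.

```python
import math

class LinUCBArm:
    """Per-action ridge-regression model with an upper-confidence-bound score."""

    def __init__(self, dim):
        # Inverse of A = I + sum of x x^T, maintained directly.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)           # ridge estimate of reward weights
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse (a_inv is symmetric).
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose_action(arms, context, alpha=1.0):
    """Return the index of the action with the highest upper confidence bound."""
    scores = [arm.ucb(context, alpha) for arm in arms]
    return scores.index(max(scores))

# Hypothetical actions: 0 = read a textbook section, 1 = watch a lecture video.
arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
context = [1.0, 0.0]  # assumed: the student's estimated knowledge on two concepts
for _ in range(5):
    arms[0].update(context, reward=1.0)  # reading helped in this context
    arms[1].update(context, reward=0.0)  # the video did not
best = choose_action(arms, context, alpha=0.1)
```

The `alpha` parameter trades off exploiting the action that has worked so far against exploring actions whose effect in the current context is still uncertain.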

### Safe personalization

We demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum a posteriori estimation. More importantly, we derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators. Applying our linearization technique to IRT models (the Rasch model, in particular) yields much tighter bounds on learner and question parameter estimates, especially when the numbers of learners and questions are small. Therefore, our analysis has the potential to improve the safety of personalization. The papers can be found here:

*"Linearized Binary Regression,"* Conference on Information Sciences and Systems (CISS), Mar. 2018, to appear

*"An Estimation and Analysis Framework for the Rasch Model,"* Proc. International Conference on Machine Learning (ICML), July 2018

## Behavior Analysis

### Measuring engagement from clickstream data

We propose a new model for learning that relates video-watching behavior to engagement level. One of the advantages of our method for determining engagement is that it can be done entirely within standard online learning platforms, serving as a more universal and less invasive alternative to existing measures of engagement that require the use of external devices. We also find that our model identifies key behavioral features (e.g., larger numbers of pauses and rewinds, and smaller numbers of fast forwards) that are correlated with higher learner engagement. The paper can be found here:

*"Behavior-Based Latent Variable Model for Learner Engagement,"* Proc. International Conference on Educational Data Mining (EDM), pp. 64–71, June 2017

### Instructor preference analysis

We propose a latent factor model that analyzes instructors' preferences in explicitly excluding particular questions from learners' assignments in a particular subject domain. We incorporate expert-labeled Bloom's Taxonomy tags on each question as a factor in our statistical model to improve model interpretability. Our model provides meaningful interpretations that help us understand why instructors exclude certain questions, thus helping automated learning systems to behave more "instructor-like". The paper can be found here:

*"A Latent Factor Model For Instructor Content Preference Analysis,"* Proc. International Conference on Educational Data Mining (EDM), pp. 290–295, June 2017

### Prerequisite structure extraction from user clickstreams

Existing approaches to automatically inferring prerequisite dependencies rely on analysis of either content (e.g., topic modeling of text) or performance (e.g., quiz results tied to content) data, so they are not feasible in cases where courses have no assessments or only short content pieces (e.g., short video segments). We propose an algorithm that extracts prerequisite information using learner behavioral data instead, and apply it to an online short course. Our algorithm excels at both predicting learner behavior and revealing fine-grained insights into prerequisite dependencies between content segments, with validation provided by a course administrator. The paper can be found here:

*"Behavioral Analysis at Scale: Learning Course Prerequisite Structures from Learner Clickstreams,"* International Conference on Educational Data Mining (EDM), pp. 66–75, July 2018

### Personalized thread recommendation in MOOCs

We propose a probabilistic model, based on point processes, for the process of learners posting on MOOC discussion forums. Unlike existing work, our method integrates topic modeling of the post text, timescale modeling of the decay in post activity over time, and learner topic interest modeling into a single model, and infers this information from user data. Our method also varies the excitation levels induced by posts according to the thread structure, to reflect typical notification settings in discussion forums. We experimentally validate the proposed model on three real-world MOOC datasets, with the largest one containing up to 6,000 learners making 40,000 posts in 5,000 threads. Results show that our model excels at thread recommendation, achieving significant improvement over a number of baselines, and thus shows promise for directing learners more efficiently to threads that they are interested in. The paper can be found here:

*"Personalized Thread Recommendation for MOOC Discussion Forums,"* European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Sep. 2018, to appear

## Non-educational Applications

I also collaborate with other researchers on some non-educational applications.

### Learning robust binary hash functions

We propose a new data-dependent method to learn binary hash functions. Inspired by recent progress in robust optimization, we develop a novel hashing algorithm, dubbed RHash, that minimizes the worst-case distortion among pairs of points in a dataset. We show that RHash achieves the same retrieval performance as the state-of-the-art algorithms in terms of average precision while using up to 60% fewer bits, using several large-scale real-world image datasets. The paper can be found here:

*"RHash: Robust Hashing via $\ell_\infty$-norm Distortion,"* Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 1386–1394, Aug. 2017

### Sensor selection for biosensing and structural health monitoring

We develop a new sensor selection framework for sparse signals that finds a small subset of sensors (fewer than the signal dimension) that best recovers such signals. Our proposed algorithm, Insense, minimizes a coherence-based cost function that is adapted from classical results in sparse recovery theory. Using a range of datasets, including two real-world datasets from microbial diagnostics and structural health monitoring, we demonstrate that Insense significantly outperforms conventional algorithms when the signal is sparse. The paper can be found here:

*"Insense: Incoherent Sensor Selection for Sparse Signals,"* Signal Processing, 2018

### Cloud dynamics and bidding strategy

We propose a nonlinear dynamical system model for the time-evolution of the spot price as a function of latent states that characterize user demand in the spot and on-demand markets. This model enables us to adaptively predict future spot prices given past spot price observations, allowing us to derive user bidding strategies for heterogeneous cloud resources that minimize the cost to complete a job with negligible probability of interruption. The paper can be found here:

*"Learning Cloud Dynamics to Optimize Spot Instance Bidding Strategies,"* IEEE International Conference on Computer Communications (INFOCOM), Apr. 2018

### Phase retrieval

We show that, given an initial guess, phase retrieval can be carried out with an even simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. We demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and nonconvex methods on synthetic and real-world data. The paper can be found here:

*"PhaseLin: Linear Phase Retrieval,"* Conference on Information Sciences and Systems (CISS), Mar. 2018

A method that relies on a novel linear spectral estimator (LSPE) to obtain an accurate initialization for phase retrieval can be found here:

*"Linear Spectral Estimators with an Application to Phase Retrieval,"* Proc. International Conference on Machine Learning (ICML), July 2018