Competency 6.1: Engineer both feature and training labels

My notes/learning

Behavior Detectors

Behavior detectors are automated (predictive) models that can infer from log files whether a student is behaving in a certain way. 
Disengaged behaviors 
-gaming the system: trying to succeed in the system without learning
-off-task behavior
-carelessness: giving a wrong answer despite having the required skills
-WTF behavior (Without Thinking Fastidiously): doing things within the system that are unrelated to the learning task
Metacognitive behaviors
-unscaffolded self-explanation
-exploration behaviors
Related Problem:
-sensor-free affect detection (without video capture, gesture capture, etc.)
– detecting boredom, frustration, engaged concentration, delight

Ground Truth

Ground truth refers to the reference labels (the "correct answers") that a supervised machine learning model is trained to predict and is evaluated against.
The big issue in developing behavior detectors is where to get these labels from.
E.g., how do we identify when a student is off-task or gaming the system?
Behavior labels are noisy; there is no perfect way to get indicators of student behavior.
Sources of ground truth:

  • Self-report: common for affect and self-efficacy; not common for labeling behavior (students may not admit to gaming)
  • Field observations
  • Text replays
  • Video coding

Field observations:
One or more observers watch students and take notes
– requires training to do it right
Text Replays:
Coders read pretty-printed segments of a student's log data (the student's inputs to the system) and label the behavior shown in each segment.
– Fast to conduct
– Decent inter-rater reliability
– Agrees with other measures of constructs
– Can be used to train behavior detectors
– Only limited constructs can be coded
– Lower precision than field observation due to lower bandwidth
Video Coding:
Coders analyze videos of live classroom behavior or screen-replay videos
– slowest, but replicable and precise
– challenges in camera positioning
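The text-replay approach described above can be sketched in a few lines of Python: cut a student's log into short consecutive segments ("clips") and print each one as readable text for a human coder to label. The `Action` fields, the fixed clip length of five actions, and the output layout are all assumptions for illustration; real text-replay tools define clips and their display in their own way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    # Hypothetical log fields; real tutor logs will differ.
    time: float      # seconds since the start of the session
    problem: str
    step: str
    answer: str
    correct: bool


def make_clips(actions: List[Action], clip_len: int = 5) -> List[List[Action]]:
    """Cut one student's time-ordered actions into consecutive clips of clip_len actions."""
    return [actions[i:i + clip_len] for i in range(0, len(actions), clip_len)]


def render_clip(clip: List[Action]) -> str:
    """Format a clip as plain text, with times relative to the clip's first action."""
    start = clip[0].time
    lines = []
    for a in clip:
        verdict = "RIGHT" if a.correct else "WRONG"
        lines.append(f"{a.time - start:6.1f}s  {a.problem}/{a.step}: {a.answer!r} {verdict}")
    return "\n".join(lines)


if __name__ == "__main__":
    log = [
        Action(0.0, "p1", "s1", "3", False),
        Action(1.2, "p1", "s1", "4", False),
        Action(2.0, "p1", "s1", "5", False),
        Action(2.9, "p1", "s1", "6", False),
        Action(3.5, "p1", "s1", "7", True),
        Action(30.0, "p2", "s1", "12", True),
    ]
    for clip in make_clips(log):
        print(render_clip(clip))
        print("---")  # a coder would label each clip, e.g. "gaming" / "not gaming"
```

A clip defined by a fixed number of actions is only one choice; defining clips by a time window is an equally plausible variant.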

A Cohen's kappa of 0.6 or higher is expected for expert coding.
However, 1,000 data points with kappa = 0.5 are more useful than 100 data points with kappa = 0.7.
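Cohen's kappa compares the observed agreement between two coders, p_o, with the agreement expected by chance from each coder's label frequencies, p_e: kappa = (p_o - p_e) / (1 - p_e). The minimal, dependency-free Python sketch below computes it; the coder labels in the example are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two observers: G = gaming, N = not gaming.
coder_1 = list("GGNNNNGNNN")
coder_2 = list("GNNNNNGNGN")
print(round(cohens_kappa(coder_1, coder_2), 3))  # → 0.524
```

Here the coders agree on 8 of 10 clips (p_o = 0.8), but because both label most clips N, chance agreement is already 0.58, so kappa is only about 0.52.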

Once we have ground truth, we can build the detector.

Feature Engineering

Feature engineering is the art of creating predictor variables. A model will not be good if its features (predictors) are not good. The process relies more on lore than on well-known, validated principles.

The big question is how to take the voluminous, ill-formed, and yet under-specified data that we now have in education and shape it, efficiently, into a reasonable set of predictive variables.


  1. Brainstorming features – IDEO tips for brainstorming
  2. Deciding what features to create – trade-off between effort and usefulness of feature
  3. Creating the features – Excel, OpenRefine, Distillation code
  4. Studying the impact of features on model goodness
  5. Iterating on features if useful – try close variants and test
  6. Go to 3 (or 1)
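To illustrate step 3 (creating features with distillation code), the Python sketch below aggregates action-level log rows into one feature vector per student. The row layout, the feature names, and the 3-second "fast" threshold are hypothetical choices for the example. The `fast_threshold` parameter is there so that close variants of a feature (step 5) can be generated and compared.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical log rows: (student_id, seconds_taken, action_type, correct).
# action_type is "attempt" or "hint"; correct is None for hint requests.
Row = Tuple[str, float, str, Optional[bool]]

LOG: List[Row] = [
    ("s1", 2.0, "attempt", False),
    ("s1", 1.5, "attempt", False),
    ("s1", 12.0, "hint", None),
    ("s1", 20.0, "attempt", True),
    ("s2", 15.0, "attempt", True),
    ("s2", 30.0, "attempt", True),
]


def distill_features(log: List[Row], fast_threshold: float = 3.0) -> Dict[str, Dict[str, float]]:
    """Aggregate action-level rows into one feature vector per student."""
    by_student: Dict[str, List[Row]] = {}
    for row in log:
        by_student.setdefault(row[0], []).append(row)

    features = {}
    for student, rows in by_student.items():
        attempts = [r for r in rows if r[2] == "attempt"]
        wrong = [r for r in attempts if r[3] is False]
        features[student] = {
            "n_actions": len(rows),
            "pct_wrong": len(wrong) / len(attempts) if attempts else 0.0,
            "mean_secs": mean(r[1] for r in rows),
            "n_hints": sum(1 for r in rows if r[2] == "hint"),
            # Wrong answers entered faster than fast_threshold seconds:
            # a candidate feature for gaming the system.
            "n_fast_wrong": sum(1 for r in wrong if r[1] < fast_threshold),
        }
    return features


# Step 5: try close variants of a feature and keep whichever helps the model.
for t in (2.0, 3.0, 5.0):
    print(t, distill_features(LOG, fast_threshold=t)["s1"]["n_fast_wrong"])
```

Which variant to keep would be decided by re-fitting the model with each one and comparing goodness under cross-validation (step 4), not by looking at the raw counts.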

Feature engineering can overfit. To guard against this, iterate with cross-validation and test on held-out data or newly collected data.
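One common way to hold out data in educational settings is to cross-validate at the student level, so that no student contributes rows to both the training and the test set of a fold. Otherwise the estimate can overstate how well the detector will work for new students. The Python sketch below is one assumed way to set this up, not a prescribed procedure. It only builds the folds; training and scoring a model on each split are left out, and the example rows are made up.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def student_folds(student_ids: Iterable[str], k: int = 5, seed: int = 0) -> Dict[str, int]:
    """Randomly assign each distinct student to one of k folds (round-robin after shuffling)."""
    students = sorted(set(student_ids))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(students)
    return {s: i % k for i, s in enumerate(students)}


def student_level_splits(
    rows: Sequence[T], student_of: Callable[[T], str], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield one (train_rows, test_rows) pair per fold; each student's rows stay together."""
    fold_of = student_folds((student_of(r) for r in rows), k, seed)
    for fold in range(k):
        train = [r for r in rows if fold_of[student_of(r)] != fold]
        test = [r for r in rows if fold_of[student_of(r)] == fold]
        yield train, test


# Hypothetical rows: (student_id, feature_value, label); 6 students, 3 rows each.
rows = [(f"s{i // 3}", i * 0.1, i % 2) for i in range(18)]
for train, test in student_level_splits(rows, student_of=lambda r: r[0], k=3):
    print(len(train), len(test), sorted({r[0] for r in test}))
```

If k exceeds the number of distinct students, some folds will have an empty test set, so k should be chosen with the number of students in mind.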

Thinking about our variables is likely to yield better results than using pre-existing variables from a standard set.

Knowledge Engineering and Data Mining:

In knowledge engineering, the model is created by a smart human being rather than by a computer searching exhaustively through all possibilities. It is also called rational modeling or cognitive modeling.

At its best:
Knowledge engineering is the art of a human being becoming deeply familiar with the target construct by carefully studying the data (including process data, where possible) and the relevant theory, and then thoughtfully crafting an excellent model.
-achieves higher construct validity than data mining, with comparable performance
-may even transfer better to new data (whereas a data-mined model may latch onto features specific to the training population)

E.g., Aleven et al.'s (2004, 2006) help-seeking model

It was developed based on scientific articles, experience in designing learning environments, log files of student interaction and experience watching students using educational software in classes.
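To show what a knowledge-engineered model can look like in code, here is a minimal Python sketch of a detector built from hand-written rules rather than rules learned from data. It is not Aleven et al.'s help-seeking model. The two rules, their names, the thresholds, and the assumption that an estimate of skill mastery (`p_known`) is available for each action are all hypothetical. In a real knowledge-engineering effort, each rule and threshold would come from theory, log study, and classroom observation, and would then be tested against data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAction:
    # Hypothetical fields for one action on a problem step.
    secs_before: float        # time the student took before this action
    kind: str                 # "attempt" or "hint"
    correct: Optional[bool]   # None for hint requests
    p_known: float            # assumed estimate that the student knows the skill


def label_step(
    actions: List[StepAction], fast_secs: float = 2.0, known: float = 0.6, unknown: float = 0.4
) -> str:
    """Apply hand-written rules, in order, to all actions on one step."""
    hints = [a for a in actions if a.kind == "hint"]
    wrong = [a for a in actions if a.kind == "attempt" and a.correct is False]
    # Rule 1: asking for a hint almost immediately on a skill the student likely knows.
    if any(h.secs_before < fast_secs and h.p_known > known for h in hints):
        return "rapid-hint-on-known-skill"
    # Rule 2: repeated errors on a likely-unknown skill without ever asking for help.
    if len(wrong) >= 2 and not hints and all(a.p_known < unknown for a in wrong):
        return "repeated-errors-without-help"
    return "ok"


print(label_step([StepAction(1.0, "hint", None, 0.8)]))  # → rapid-hint-on-known-skill
```

Nothing in the code itself shows whether these rules capture the construct, which is why such a model still has to be validated against data.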

At its worst:
Knowledge engineering at its worst means making up a simple model very quickly and calling the resulting construct by a well-known name, without testing it on data or having any evidence for it.
– poorer construct validity than data mining
– predicts desired constructs poorly 
– can slow scientific progress through false results
– can hurt student outcomes by wrong intervention

It is easier to tell that a data-mining model is bad, from its features, validation procedure, or goodness metrics. This is much harder for knowledge engineering, because the hard work happens invisibly in the researcher's brain.

To-dos for both methods:
– Test the models
– Use direct measures (training labels) or indirect measures (e.g., predicting student learning)
– Careful study of the construct leads to better features and better models
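For the direct-measure case, one common check is how well the detector's confidence separates clips that coders labeled positive from clips labeled negative. The sketch below computes the area under the ROC curve by pairwise comparison (the quantity educational data mining papers often report as A'). It assumes the detector outputs a confidence score for each labeled clip; the scores and labels in the example are made up.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen positive example receives a higher score
    than a randomly chosen negative one, counting ties as half (equals ROC AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector confidences vs. human-coded labels (1 = gaming).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(auc_pairwise(scores, labels), 3))  # → 0.833
```

A value of 0.5 means chance-level separation and 1.0 means perfect separation. The pairwise loop costs time proportional to the number of positive-negative pairs, which is fine for modest numbers of hand-coded clips; a sort-based computation scales better.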

Assignment – Critical Reflection:

Possible uses in education:
Behavior detection can be used to build automated learning management systems that give users hints or comment on their performance based on the behavior they detect. It can be used where a tutor is not available to help every student: the automated online tutor can step in with suggestions, and if the student's behavior is still detected as disengaged, an available human tutor can be assigned to the student.
