Q1) Using regressor-data-asgn2.csv, what is the Pearson correlation between data and predicted (model)? (Round to three significant digits; e.g. 0.24675 should be written as 0.247) (Hint: this is easy to compute in Excel)
Q2) Using regressor-data-asgn2.csv, what is the RMSE between data and predicted (model)? (Round to three significant digits; e.g. 0.24675 should be written as 0.247) (Hint: this is easy to compute in Excel)
Q3) Using regressor-data-asgn2.csv, what is the MAD between data and predicted (model)? (Round to three significant digits; e.g. 0.24675 should be written as 0.247) (Hint: this is easy to compute in Excel)
Calculate the absolute values of the previous residual values in an array, e.g. =ABS(RMSE!A2:A1001), and average them.
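If you prefer a script to a spreadsheet, here is a minimal Python sketch for Q1-Q3. The column names "data" and "predicted" are my assumptions about regressor-data-asgn2.csv and may need adjusting to match the actual file.

```python
# Sketch for Q1-Q3, assuming the CSV has columns "data" (actual values)
# and "predicted" (model values); adjust the names to match the file.
import pandas as pd
import numpy as np

df = pd.read_csv("regressor-data-asgn2.csv")
actual = df["data"]
pred = df["predicted"]

residuals = actual - pred
pearson_r = actual.corr(pred)               # Pearson correlation (Q1)
rmse = np.sqrt(np.mean(residuals ** 2))     # Root Mean Square Error (Q2)
mad = np.mean(np.abs(residuals))            # Mean Absolute Deviation (Q3)

print(round(pearson_r, 3), round(rmse, 3), round(mad, 3))
```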
Q4) Using classifier-data-asgn2.csv, what is the accuracy of the predicted (model)? Assume a threshold of 0.5. (Just give a rounded value rather than including the decimal; e.g. write 57.213% as 57) (Hint: this is easy to compute in Excel)
Q5) Using classifier-data-asgn2.csv, how well would a detector perform, if it always picked the majority (most common) class? (Just give a rounded value rather than including the decimal; e.g. write 57.213% as 57) (Hint: this is easy to compute in Excel)
Q6) Is this detector’s performance better than chance, according to the accuracy and the frequency of the most common class?
Q7) What is this detector’s value for Cohen’s Kappa? Assume a threshold of 0.5. (Just round to the first two decimal places; e.g. write 0.74821 as 0.75).
Q8) What is this detector’s precision, assuming we are trying to predict “Y” and assuming a threshold of 0.5? (Just round to the first two decimal places; e.g. write 0.74821 as 0.75).
Q9) What is this detector’s recall, assuming we are trying to predict “Y” and assuming a threshold of 0.5? (Just round to the first two decimal places; e.g. write 0.74821 as 0.75).
Q10) Based on the precision and recall, should this detector be used for strong interventions that have a high cost if mis-applied, or fail-soft interventions with low benefit and a low cost if mis-applied?
Q11) What is this detector’s value for A’? (Hint: There are some data points with the exact same detector confidence, so it is probably preferable to use a tool that computes A’, such as http://www.columbia.edu/~rsb2162/computeAPrime.zip — rather than a tool that computes the area under the ROC curve).
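For the classifier questions, here is a hedged Python sketch of the quantities asked for in Q4, Q5, and Q7-Q9. It assumes classifier-data-asgn2.csv has a "data" column with the true labels ("Y"/"N") and a "predicted" column with the model's confidence that the label is "Y"; adjust the column names if the file differs.

```python
# Sketch for Q4, Q5, Q7-Q9 at a threshold of 0.5.
import pandas as pd

df = pd.read_csv("classifier-data-asgn2.csv")
truth = df["data"] == "Y"              # true labels as booleans
pred = df["predicted"] >= 0.5          # apply the 0.5 threshold

tp = (truth & pred).sum()
fp = (~truth & pred).sum()
tn = (~truth & ~pred).sum()
fn = (truth & ~pred).sum()
n = len(df)

accuracy = (tp + tn) / n                                   # Q4
majority = max(truth.mean(), 1 - truth.mean())             # Q5: always-pick-majority baseline
expected = truth.mean() * pred.mean() + (1 - truth.mean()) * (1 - pred.mean())
kappa = (accuracy - expected) / (1 - expected)             # Q7: Cohen's Kappa
precision = tp / (tp + fp)                                 # Q8
recall = tp / (tp + fn)                                    # Q9

print(accuracy, majority, kappa, precision, recall)
```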
Building a simple text classification experiment – Training and evaluating a simple predictive model
Prominent Areas of Text Mining
Document Classification/Text Categorization: assigning documents to one or more predefined categories based on their content.
Trend Analysis: the process of discovering how different topics trend over a given period of time. It is widely applied to summarizing news events and social network trends. An example would be the prediction of stock prices based on news articles.
Sub-area of Text Mining
Collaborative Learning Process Analysis
- General indicators of interactivity
- Turn length
- Conversation Length
- Number of student questions
- Student to tutor word ratio
- Student initiative
- Features related to cognitive processes
Metrics for Classifiers
The easiest measure of model goodness is accuracy. It is also called agreement when measuring inter-rater reliability.
Accuracy = # of agreements/ Total # of assessments
It is generally not considered a good metric across fields, because assignment to categories is usually uneven, which makes raw accuracy misleading. E.g., a Kindergarten Failure Detector that always says Pass still achieves 92% accuracy in the extreme case where 92% of students pass.
Kappa = (Agreement – Expected Agreement) / (1 – Expected Agreement)
If the Kappa value is:
= 0, agreement is at chance
= 1, agreement is perfect
= negative infinity, agreement is perfectly inverse
> 1, something is wrong
< 0, agreement is worse than chance
Between 0 and 1, there is no absolute standard; for data-mined models, 0.3-0.5 is considered good enough for publishing.
Kappa is scaled by the proportion of each category, so it is influenced by the data set. We can compare Kappa values within the same data set, but not between two data sets.
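A small worked example may help. The confusion-matrix counts below are invented purely for illustration; the point is how expected agreement is computed from the marginals and then plugged into the Kappa formula above.

```python
# Hypothetical worked example of Cohen's Kappa (counts invented for illustration).
tp, fp, fn, tn = 20, 10, 5, 65          # out of 100 assessments
n = tp + fp + fn + tn

agreement = (tp + tn) / n               # 0.85
# Expected agreement: probability both say "1" by chance plus both say "0" by chance.
p_data_1, p_model_1 = (tp + fn) / n, (tp + fp) / n
expected = p_data_1 * p_model_1 + (1 - p_data_1) * (1 - p_model_1)
kappa = (agreement - expected) / (1 - expected)   # ~0.625 for these counts
print(kappa)
```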
The Receiver Operating Characteristic (ROC) curve is used when a model predicts something with two values (e.g. correct/incorrect, dropout/not dropout) and outputs a probability or other real value (e.g. "the student will drop out with 73% probability").
Any number can be taken as the cut-off (threshold); some number of predictions (possibly 0) are then classified as 1s and the rest as 0s. For a given classification threshold there are four possibilities:
True Positive (TP) – Model and the Data say 1
False Positive (FP) – Data says 0, Model says 1
True Negative (TN) – Model and the Data say 0
False Negative (FN) – Data says 1, Model says 0
The ROC curve has the percentage of false positives (vs. true negatives) on its X axis and the percentage of true positives (vs. false negatives) on its Y axis. The model is good if its curve lies above the diagonal chance line.
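As an illustration of the thresholding described above, here is a short sketch (with made-up truth/confidence arrays) that computes TP/FP/TN/FN at one threshold and sweeps over the distinct confidences to trace the ROC points.

```python
# Turning confidences into TP/FP/TN/FN at a threshold and sweeping thresholds
# to trace the ROC curve. "truth" and "confidence" are hypothetical arrays.
import numpy as np

truth = np.array([1, 0, 1, 1, 0, 0, 1, 0])
confidence = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])

def roc_point(threshold):
    pred = confidence >= threshold
    tp = np.sum((truth == 1) & pred)
    fp = np.sum((truth == 0) & pred)
    tn = np.sum((truth == 0) & ~pred)
    fn = np.sum((truth == 1) & ~pred)
    # X axis: percent false positives; Y axis: percent true positives.
    return fp / (fp + tn), tp / (tp + fn)

# Each distinct confidence is a candidate threshold; the resulting (x, y)
# points trace the ROC curve.
for t in sorted(set(confidence)):
    print(t, roc_point(t))
```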
A’ is the probability that, if the model is given one example from each category, it will accurately identify which is which. It is a close relative of the ROC curve and is mathematically equivalent to the Wilcoxon statistic. This is useful because we can compute statistical tests for:
– whether two A’ values are significantly different, in the same or different data sets;
– whether an A’ value is significantly different from chance.
A’ vs. Kappa:
A’ is more difficult to compute and works only for 2 categories. Its meaning is invariant across data sets, i.e. A’ = 0.6 is always better than A’ = 0.5. It is easy to interpret statistically, its values are almost always higher than Kappa values, and it takes confidence into account.
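Since A’ is equivalent to the Wilcoxon statistic, it can be sketched as a pairwise comparison over all (positive, negative) example pairs, with ties counting as half. The arrays below are made up; for the assignment data it is still safer to use the computeAPrime tool linked above.

```python
# A' as a pairwise comparison: for every (positive, negative) pair, score 1 if
# the positive example got higher confidence, 0.5 for a tie, 0 otherwise,
# then average. "truth" and "confidence" are hypothetical arrays.
import numpy as np

truth = np.array([1, 0, 1, 1, 0, 0, 1, 0])
confidence = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])

pos = confidence[truth == 1]
neg = confidence[truth == 0]

wins = 0.0
for p in pos:
    for q in neg:
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5      # ties count as half, matching the Wilcoxon convention
a_prime = wins / (len(pos) * len(neg))
print(a_prime)
```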
Precision and Recall:
Precision is the probability that a data point classified as true is actually true.
Precision = TP / (TP+FP)
Recall is the probability that a data point that is actually true is classified as true.
Recall = TP / (TP+FN)
They don’t take confidence into account.
Metrics for Regressors
Linear Correlation (Pearson correlation):
r(A,B) asks: when A’s value changes, does B’s value change in the same direction?
It assumes a linear relationship.
If the correlation value is:
1.0 : perfect
0.0 : none
-1.0 : perfectly negatively correlated
in between 0 and 1 : it depends on the field
In education, 0.3 is good enough, since many factors contribute to any given dependent measure.
Note that very different functions (and data with outliers) can have the same correlation.
R squared is the correlation squared. It measures what percentage of the variance in the dependent measure is explained by the model. When predicting A with B, C, D, and E, it is often used as the measure of model goodness rather than r.
Mean Absolute Error/Deviation (MAD) is the average of the absolute value of (actual value minus predicted value), i.e. the average of each data point’s difference between the actual and predicted values. It tells the average amount by which the predictions deviate from the actual values and is very interpretable.
Root Mean Square Error (RMSE) is the square root of the average of (actual value minus predicted value)^2. It can be interpreted similarly to MAD, but it penalizes large deviations more than small ones. It is generally preferred to MAD. A low RMSE is good.
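A toy comparison (invented numbers) makes the difference concrete: two sets of residuals with the same total absolute error get the same MAD, but the set containing one large miss gets a noticeably worse RMSE.

```python
# Why RMSE penalizes large deviations more than MAD does.
import numpy as np

def mad(err):  return np.mean(np.abs(err))
def rmse(err): return np.sqrt(np.mean(err ** 2))

small_errors = np.array([1.0, 1.0, 1.0, 1.0])   # four misses of 1
one_big_error = np.array([4.0, 0.0, 0.0, 0.0])  # one miss of 4

print(mad(small_errors), rmse(small_errors))    # 1.0, 1.0
print(mad(one_big_error), rmse(one_big_error))  # 1.0, 2.0
```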
Correlation and the error metrics catch different failures: a model can go in the right direction but be systematically biased (good correlation, poor RMSE/MAD), or its values can be in the right range without capturing relative change (good RMSE/MAD, poor correlation), so it is worth examining both.
Types of Validity
Does your model remain predictive when used in a new data set?
Generalizability underlies the cross-validation paradigm that is common in data mining. Knowing the context in which the model will be used drives the kind of generalization to be studied.
Fail: Model of boredom built on data from 3 students fails when applied to new students
Do your findings apply to real-life situations outside of research settings?
E.g. whether a behavior detector built in lab settings works in real classrooms.
Does your model actually measure what it was intended to measure?
Does your model fit the training data? (provided the training data is correct)
Does your model predict not just the present, but the future as well?
Do your results matter?
From testing: does your test cover the full domain it is meant to cover?
For behavior modeling, does the model cover the full range of behavior it is intended to?
Are your conclusions justified based on evidence?
I think that the lessons in Weeks 5 and 6 are very useful, especially when we want to get our hands dirty with predictive modeling and diagnosing its usefulness. I hope to use them in my predictive modeling work 🙂
My notes/ learning
- Self-report: common for affect and self-efficacy; not common for labeling behavior (students may not admit gaming)
- Field observations
- Text replays
- Video coding
1. Brainstorming features – IDEO tips for brainstorming
2. Deciding what features to create – trade-off between effort and usefulness of feature
3. Creating the features – Excel, OpenRefine, distillation code
4. Studying the impact of features on model goodness
5. Iterating on features if useful – try close variants and test
6. Go to 3 (or 1)
Knowledge Engineering and Data Mining:
– a wrong intervention from either can hurt student outcomes
Assignment – Critical Reflection:
– Developing a model that can infer an aspect of data (predicted variable) from a combination of other data (predictor variables)
– Inferences about the future/ present
Two categories of prediction model:
- Training set with labels – a data set where we already know the answer, used to train the model for prediction
- Test data – the data set for testing our model (see the sketch after the tool list below)
- SAS Enterprise Miner
- Step regression
- Logistic regression
- J48/C4.5 decision trees
- JRip decision rules
- K* instance-based classifiers
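As a rough illustration of the training/test split described above, here is a minimal scikit-learn sketch using logistic regression (one of the algorithms listed). The file name and column names ("student-data.csv", "feature1", "feature2", "label") are hypothetical placeholders.

```python
# Minimal train/test workflow sketch with logistic regression via scikit-learn.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

df = pd.read_csv("student-data.csv")          # hypothetical file
X = df[["feature1", "feature2"]]              # hypothetical predictor variables
y = df["label"]                               # hypothetical predicted variable

# Hold out part of the labeled data as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("kappa:", cohen_kappa_score(y_test, pred))
```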
Uses of Prediction Modeling
To work at scale on big data, big support is needed from top management. Top management should foresee the future possibilities of learning analytics and what it can achieve. Only with promised outcomes can they be expected to support it at a large scale. It is not a small change to bring about in a day.
A new department may be needed to manage what should be done in learning analytics. This will require funding, responsible experts, manpower and technical training. Do the institutions have what it takes to commit to this new venture?
Personal Data Protection is a growing concern these days. When data is analyzed, it has to pass through humans and systems. How safe can our data be? Could there be a possible breach in security, and what could be its implications?
When we have answers for all these questions, we could probably move forward to the next era of data analytics!