Competency 8.2: Build and evaluate models using alternative feature spaces.
I used the different feature spaces that I saved in the previous exercise to build models. My data set was very small and I intended to use it only for testing. Comparing the model built on POS features with the models built on unigrams and bigrams showed a large improvement in the metrics (from 42% accuracy and a kappa of 0.12 for POS grams to 73% and 0.59 for 12 grams in the table below). In my data, the n-grams were the most predictive of the categories.
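To make the feature-space labels in the table below more concrete, here is a minimal, stdlib-only Python sketch of how n-gram feature spaces of this kind can be built. It assumes simple lowercase whitespace tokenization and a tiny hypothetical stop-word list. `ngram_features` is an illustrative name, not the API of the tool used for these models, and reading "12 grams" as unigrams plus bigrams (and "123 grams" as unigrams through trigrams) is an assumption.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def ngram_features(text, n_max=2, remove_stop=False):
    """Count every n-gram with 1 <= n <= n_max (n_max=2: unigrams + bigrams)."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    if remove_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Three alternative feature spaces for the same sentence.
sentence = "The plot is not good"
unigrams = ngram_features(sentence, n_max=1)
uni_bi = ngram_features(sentence, n_max=2)
uni_bi_no_stop = ngram_features(sentence, n_max=2, remove_stop=True)
```

Removing stop words before forming bigrams creates pairs that never occur in the text (here "plot not"), so a "no stop" space contains different features, not just fewer of them.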
Many of the basic feature options did not noticeably improve the model metrics. I used Naive Bayes as the classification algorithm; I also tried other algorithms, but the metric values did not differ much. A few of the feature spaces I tried, with the metrics for their models, are below:
Feature space | Accuracy | Kappa
POS grams | 42% | 0.12
12 grams_count | 58% | 0.36
1 grams_pairs | 61% | 0.41
12 grams_length | 61% | 0.41
12 POS grams | 65% | 0.47
12 grams_no stop | 69% | 0.52
12 grams | 73% | 0.59
123 grams | 73% | 0.59
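Accuracy and kappa can diverge because Cohen's kappa discounts the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected from the label frequencies alone. That is why 42% accuracy for POS grams corresponds to a kappa of only 0.12. The stdlib-only Python sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on bag-of-features dictionaries and scores it with both metrics. The class, the function names and the toy sentences are hypothetical, and the tool actually used may differ in smoothing and tokenization details.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over {feature: count} dicts, with add-one smoothing."""

    def fit(self, feature_dicts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # class -> feature -> count
        self.vocab = set()
        for feats, label in zip(feature_dicts, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior P(c)
            for f, k in feats.items():
                if f in self.vocab:  # features never seen in training are skipped
                    p = (self.feature_counts[c][f] + 1) / (self.totals[c] + v)
                    score += k * math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of each label's marginal rates, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return p_o, float("nan")  # kappa is undefined when chance agreement is 1
    return p_o, (p_o - p_e) / (1 - p_e)


def tokens(text):
    """Hypothetical unigram feature space: whitespace tokens with counts."""
    return Counter(text.lower().split())


train = ["loved the acting", "great fun", "not good at all", "boring and slow"]
y_train = ["pos", "pos", "neg", "neg"]
model = MultinomialNB().fit([tokens(t) for t in train], y_train)

test = ["great acting", "slow and not fun", "loved it"]
y_test = ["pos", "neg", "pos"]
y_pred = [model.predict(tokens(t)) for t in test]
acc, kappa = accuracy_and_kappa(y_test, y_pred)
```

With real data, the metrics would be computed on held-out documents (or via cross-validation) rather than on a handful of toy sentences; swapping `tokens` for another feature function is all it takes to compare feature spaces with the same classifier.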
To test with a real data set, I tried the hands-on text feature extraction activity in Prosolo, which uses the sentiment_sentences data set. I extracted different feature spaces from the basic feature set and trained logistic regression models on them. Expanding the feature set improved the metrics considerably.
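The Prosolo activity's own code is not reproduced here. As a rough illustration of the same kind of comparison, the following stdlib-only Python sketch trains a binary logistic regression by stochastic gradient descent on a unigram feature space and on an expanded unigram-plus-bigram space. The function names, the hyperparameters (`epochs`, `lr`) and the toy sentences are hypothetical; the toy set is built so that word order matters, and it only shows the mechanics, not the improvement observed on sentiment_sentences.

```python
import math


def features(text, use_bigrams=False):
    """Sparse {feature: count} dict: unigrams, plus bigrams if requested."""
    tokens = text.lower().split()
    grams = list(tokens)
    if use_bigrams:
        grams += [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    feats = {}
    for g in grams:
        feats[g] = feats.get(g, 0) + 1
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(feature_dicts, labels, epochs=500, lr=0.1):
    """Fit weights by per-example SGD on the log-loss; labels must be 0 or 1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(feature_dicts, labels):
            z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
            grad = sigmoid(z) - y  # derivative of the log-loss with respect to z
            bias -= lr * grad
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) - lr * grad * v
    return weights, bias


def predict(model, feats):
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if z > 0 else 0


def accuracy(model, texts, labels, use_bigrams):
    preds = [predict(model, features(t, use_bigrams)) for t in texts]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: each negative sentence uses the same words as a
# positive one, so only word order (captured by bigrams) separates them.
texts = ["funny not boring", "boring not funny", "clever not dull", "dull not clever"]
labels = [1, 0, 1, 0]
results = {}
for use_bigrams in (False, True):
    X = [features(t, use_bigrams) for t in texts]
    model = train_logreg(X, labels)
    # Training-set accuracy, for illustration only; a real comparison
    # would use held-out data or cross-validation.
    results[use_bigrams] = accuracy(model, texts, labels, use_bigrams)
```

In the unigram space each positive/negative pair has identical features, so the model must give both the same prediction and cannot exceed 50% here; adding bigrams makes the pairs distinguishable.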