Competency 8.2 – Shibani’s blog

Competency 8.2: Build and evaluate models using alternative feature spaces.

I built models using the different feature spaces that I saved in the previous exercise. My data set was very small, and I intended to use it only for testing. Comparing the models, unigrams and bigrams gave clearly better metrics than POS features. From my data, the n-grams appeared to be the most predictive of the categories.
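
To make "unigrams and bigrams" concrete, here is a minimal sketch of how such a feature space could be extracted from one sentence. It is plain Python and not the tool I used for the exercise; the function name `ngram_features` and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams for each n in n_values (unigrams and bigrams by default)."""
    # Assumption: lowercase and split on whitespace; real tools tokenize more carefully.
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

features = ngram_features("the movie was not good")
print(sorted(features))
```

A bigram such as "not good" keeps information that the separate unigrams "not" and "good" lose, which is one plausible reason n-grams can be more predictive than coarser features.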

For many of the basic features, I couldn't find a significant improvement in model metrics. I used Naive Bayes as the classification algorithm. I also tried other algorithms, but the metrics did not differ much. A few of the feature spaces I tried, along with the metrics for their models, are below:

Feature Space	Accuracy	Kappa
POS grams	42%	0.12
12 grams_count	58%	0.36
1 grams_pairs	61%	0.41
12 grams_length	61%	0.41
12 POS grams	65%	0.47
12 grams_no stop	69%	0.52
12 grams	73%	0.59
123 grams	73%	0.59
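
For readers who want to see how accuracy and kappa figures like the ones in the table can be computed, here is a rough sketch: a tiny multinomial Naive Bayes with add-one smoothing, scored with accuracy and Cohen's kappa. It is not the tool that produced the table, and the toy documents, labels, and function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: one class label per document."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def predict(model, tokens):
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip tokens never seen in training
            # Add-one (Laplace) smoothing so unseen class/token pairs are not zero.
            score += math.log((token_counts[label][tok] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy_and_kappa(y_true, y_pred):
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance, from the label frequencies on each side.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = 0.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa

# Toy data (hypothetical), already tokenized into unigrams.
train_docs = [["great", "film"], ["loved", "it"], ["terrible", "film"], ["hated", "it"]]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = [["great", "it"], ["terrible", "it"], ["loved", "film"], ["great", "film"]]
test_labels = ["pos", "neg", "pos", "neg"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
acc, kappa = accuracy_and_kappa(test_labels, predictions)
print(predictions, acc, kappa)
```

Kappa corrects accuracy for the agreement expected by chance, which is why it is always lower than the accuracy in the table whenever the model is better than chance but not perfect.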

To test with a real data set, I tried the hands-on text feature extraction activity given in Prosolo, using the sentiment_sentences data set. I extracted different feature spaces from the basic feature set and used logistic regression. Expanding the feature set improved the model metrics significantly.
