Competency 7.1/ 7.2 Text Mining

Text mining is the process of identifying and extracting useful, meaningful information from sources of unstructured text data.

Prominent Areas of Text Mining

Information Retrieval:

Information Retrieval (IR) is the process of searching a collection of documents and retrieving those relevant to a given search query. Search engines such as Google and Yahoo use IR techniques to match and return documents relevant to the user’s query.
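As a rough illustration of query-to-document matching, the sketch below ranks a few made-up documents by a simple TF-IDF score. The tokenizer, the scoring formula, and the sample documents are all assumptions chosen for brevity; real search engines use far more elaborate ranking.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def rank_documents(query, documents):
    """Return document indices sorted by descending TF-IDF score for the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in term_counts:
                tf = term_counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

# Hypothetical mini-collection, for illustration only.
docs = [
    "text mining extracts information from unstructured text",
    "search engines retrieve documents relevant to a query",
    "stock prices moved after the news",
]
ranking = rank_documents("relevant documents", docs)
print(ranking)  # the second document (index 1) is ranked first
```

Only the second document contains the query terms, so it comes first; the other two score zero and keep their original order.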

Document Classification/ Text Categorization:

Classification is the process of identifying the category a new observation belongs to, on the basis of a training set consisting of data with pre-defined categories (supervised learning). An example is the classification of email into spam/non-spam.
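The spam example can be sketched with a small multinomial Naive Bayes classifier. Naive Bayes is only one of many possible classifiers and is chosen here as an assumption; the training messages and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect per-class word counts and class counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set with pre-defined categories.
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("please review the project report", "non-spam"),
]
model = train_naive_bayes(training)
print(classify("claim your free prize", model))
print(classify("project meeting on monday", model))
```

With a training set this small the result is only a toy; the point is that labels come from the pre-defined categories seen during training.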


Clustering:

Clustering is the unsupervised counterpart of classification: similar objects are grouped into clusters without any pre-defined categories. An example analysis would be the summarization of common complaints based on open-ended survey responses.
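To make the survey example concrete, here is a minimal sketch that groups responses by word overlap. It uses a single-pass greedy grouping with Jaccard similarity, which is a much simpler stand-in for standard algorithms such as k-means; the threshold value and the sample responses are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_responses(responses, threshold=0.3):
    """Assign each response to the first cluster it is similar enough to."""
    clusters = []  # each cluster keeps member indices and a pooled token set
    for idx, text in enumerate(responses):
        tokens = set(text.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster["tokens"]) >= threshold:
                cluster["members"].append(idx)
                cluster["tokens"] |= tokens
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append({"members": [idx], "tokens": set(tokens)})
    return [c["members"] for c in clusters]

# Hypothetical open-ended survey responses.
responses = [
    "delivery was late",
    "late delivery again",
    "app crashes on login",
    "login crashes the app",
]
groups = cluster_responses(responses)
print(groups)  # [[0, 1], [2, 3]]
```

No labels are supplied; the two complaint themes emerge purely from shared vocabulary.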

Trend Analysis:

Trend analysis is the process of discovering how different topics trend over a given period of time. It is widely applied in summarizing news events and social network trends. An example would be the prediction of stock prices based on news articles.
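A very simple way to observe a topic's trend is to count how many documents mention it in each period. The sketch below does this per month for a handful of invented headlines; the keyword-matching rule and the date format (YYYY-MM-DD strings) are assumptions, and real trend analysis would typically use topic models or richer features.

```python
from collections import Counter

def topic_trend(dated_docs, keyword):
    """Count documents mentioning `keyword` per month, given (date, text) pairs."""
    counts = Counter()
    for date, text in dated_docs:
        month = date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        mentioned = keyword.lower() in text.lower().split()
        counts[month] += int(mentioned)  # months with no mention still appear as 0
    return dict(sorted(counts.items()))

# Hypothetical news headlines, for illustration only.
headlines = [
    ("2024-01-05", "Markets calm as earnings season begins"),
    ("2024-01-20", "Inflation report surprises analysts"),
    ("2024-02-03", "Inflation fears weigh on stocks"),
    ("2024-02-17", "Central bank comments on inflation outlook"),
    ("2024-02-28", "Tech earnings beat expectations"),
]
trend = topic_trend(headlines, "inflation")
print(trend)  # {'2024-01': 1, '2024-02': 2}
```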

Sentiment Analysis:

Sentiment analysis is the process of categorizing opinions by sentiment, such as positive, negative, or neutral. Sample applications include identifying sentiments in movie reviews and gaining real-time awareness of users’ feedback.
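One of the simplest approaches is lexicon-based: count positive and negative words and compare. The sketch below assumes two tiny hand-made word lists and naive punctuation stripping; practical systems usually rely on much larger lexicons or trained classifiers.

```python
# Tiny hand-made lexicons (assumed for illustration, not a standard resource).
POSITIVE = {"good", "great", "excellent", "love", "enjoyable"}
NEGATIVE = {"bad", "boring", "terrible", "hate", "poor"}

def classify_sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical movie-review sentences.
print(classify_sentiment("A great film with an excellent cast."))
print(classify_sentiment("The plot was boring and the acting was poor."))
print(classify_sentiment("The film runs for two hours."))
```

This approach ignores negation and context ("not good" would count as positive), which is one reason trained models are often preferred.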

Sub-area of Text Mining

Collaborative Learning Process Analysis

Collaborative learning process analysis applies text mining techniques to study how students learn together. Different indicators and language features are used for this study. Some of them are:
  • General indicators of interactivity
      • Turn length
      • Conversation length
      • Number of student questions
      • Student to tutor word ratio
      • Student initiative
  • Features related to cognitive processes
      • Transactivity
Familiarity with the data and the domain is important for understanding and developing relevant features.
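Several of the interactivity indicators listed above can be computed directly from a transcript. The sketch below assumes a conversation is a list of (speaker, utterance) pairs, that conversation length means the number of turns, that turn length is measured in words, and that a question is any utterance ending in "?". These definitions are assumptions for illustration; student initiative and transactivity need richer annotation and are not covered.

```python
def interactivity_indicators(turns):
    """Compute simple indicators from a list of (speaker, utterance) pairs."""
    word_counts = [len(utterance.split()) for _, utterance in turns]
    student_words = sum(
        n for (speaker, _), n in zip(turns, word_counts) if speaker == "student"
    )
    tutor_words = sum(
        n for (speaker, _), n in zip(turns, word_counts) if speaker == "tutor"
    )
    return {
        "conversation_length": len(turns),  # number of turns
        "mean_turn_length": sum(word_counts) / len(turns) if turns else 0.0,
        "student_questions": sum(
            1 for speaker, utterance in turns
            if speaker == "student" and utterance.strip().endswith("?")
        ),
        # None when the tutor said nothing, to avoid dividing by zero.
        "student_to_tutor_word_ratio": (
            student_words / tutor_words if tutor_words else None
        ),
    }

# Hypothetical tutoring dialogue.
dialogue = [
    ("tutor", "What force acts on the ball?"),
    ("student", "Gravity pulls it down."),
    ("student", "Does air resistance matter here?"),
    ("tutor", "Yes, but we can ignore it for now."),
]
indicators = interactivity_indicators(dialogue)
print(indicators)
```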
