Notes: Visualizing sequential patterns for text mining


Wong, P. C., Cowley, W., Foote, H., Jurrus, E., & Thomas, J. (2000). Visualizing sequential patterns for text mining. In Information Visualization, 2000. InfoVis 2000. IEEE Symposium on (pp. 105-111). IEEE.


  • Sequential pattern mining aims to identify recurring patterns in data collected over a period of time.
  • A pattern is a finite series of elements from the same domain, e.g. A -> B -> C -> D.
  • Each pattern has a ‘support’ value: the percentage of sequences in the data in which the pattern occurs. A pattern counts as frequent only if its support reaches a user-specified minimum. (E.g. a support of 90% for "first process -> second process -> third process" means that 90% of people performed the first, second, and third processes in that order.)
  • Sequential pattern vs association rule:
    • Sequential pattern – studies the ordering of elements, e.g. A -> B -> C -> D
    • Association rule – studies co-occurrence (togetherness) regardless of order, e.g. A+B+C -> D
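The difference between the two views shows up directly in how support is counted. The sketch below is a minimal illustration, assuming support is the fraction of sequences that contain the pattern (the usual textbook definition; the paper's exact counting may differ). The toy database and the function names `contains_sequence`, `contains_itemset`, and `support` are hypothetical.

```python
from typing import Callable, List, Sequence, Set

def contains_sequence(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `p in it` consumes the iterator up to and including the match,
    # so each element must be found after the previous one.
    return all(p in it for p in pattern)

def contains_itemset(seq: Sequence[str], items: Set[str]) -> bool:
    """True if all `items` occur in `seq`, in any order."""
    return items <= set(seq)

def support(db: List[List[str]], pred: Callable[[List[str]], bool]) -> float:
    """Fraction of sequences in `db` that satisfy `pred`."""
    return sum(1 for s in db if pred(s)) / len(db)

# Hypothetical toy database of four sequences.
db = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
    ["A", "B", "D"],
]
# Sequential pattern A -> B -> C -> D: order matters.
seq_support = support(db, lambda s: contains_sequence(s, ["A", "B", "C", "D"]))
# Itemset {A, B, C, D} behind the rule A+B+C -> D: order ignored.
set_support = support(db, lambda s: contains_itemset(s, {"A", "B", "C", "D"}))
print(seq_support, set_support)
```

Only the first sequence keeps A, B, C, D in that order, while three of the four contain all four elements in some order, so the sequential support is lower than the itemset support.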


  • The paper presents a visual data mining system that combines pattern discovery with visualization.



Two corpora were used: an open-source corpus of 1,170 news articles from 1991 to 1997, and harvested news from 1990 from the TREC5 distribution.


  1. Topic extraction: identifies the topics in documents based on the co-occurrence of words. Words separated by white space are evaluated; stemming is applied, and prepositions, pronouns, adjectives, and gerunds are ignored.
  2. Multiresolution binning: groups articles whose timestamps fall in the same time interval into one bin, where the interval (day, week, month, year) sets the resolution.
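The binning step can be sketched as below. This is a minimal illustration, assuming each article already carries a date and a set of extracted topics and that a bin simply collects the union of its articles' topics; the paper's actual bin contents and labels may differ, and `bin_key`, `bin_articles`, and the sample articles are hypothetical.

```python
from collections import defaultdict
from datetime import date

def bin_key(d: date, resolution: str) -> str:
    """Map a timestamp to the label of the bin it falls in at `resolution`."""
    if resolution == "day":
        return d.isoformat()                  # e.g. 1991-01-02
    if resolution == "week":
        year, week, _ = d.isocalendar()       # ISO-8601 year and week number
        return f"{year}-W{week:02d}"          # e.g. 1991-W07
    if resolution == "month":
        return f"{d.year}-{d.month:02d}"      # e.g. 1991-01
    if resolution == "year":
        return str(d.year)
    raise ValueError(f"unknown resolution: {resolution!r}")

def bin_articles(articles, resolution):
    """Group (date, topics) pairs into bins; each bin holds the union of its topics."""
    bins = defaultdict(set)
    for d, topics in articles:
        bins[bin_key(d, resolution)].update(topics)
    # Zero-padded labels (four-digit years) sort chronologically as strings.
    return dict(sorted(bins.items()))

# Hypothetical articles whose topics were already extracted.
articles = [
    (date(1991, 1, 2), {"gulf", "oil"}),
    (date(1991, 1, 3), {"oil", "market"}),
    (date(1991, 2, 14), {"gulf"}),
]
for res in ("day", "week", "month", "year"):
    print(res, bin_articles(articles, res))
```

Switching `resolution` from "day" to "month" merges the first two articles into one bin, which is the kind of regrouping a resolution control exposes.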

Discovery of sequential patterns by Visualization:

  • Plots topics and topic combinations over time.
  • Strength: both the overall patterns and individual occurrences of events can be viewed quickly.
  • Weakness: does not reveal the exact connections that make up a pattern, nor the statistical support of individual patterns.
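The data behind such a plot is essentially a topic-by-time occurrence grid. The sketch below builds one from binned topic sets and prints it as text instead of drawing it; the bins, topic names, and the `occurrence_grid`/`render` helpers are hypothetical, and the paper's own visual encoding is richer than this.

```python
from typing import Dict, List, Sequence, Set, Tuple

def occurrence_grid(bins: Sequence[Set[str]],
                    combos: Sequence[Tuple[str, ...]]) -> Dict[str, List[bool]]:
    """For each topic combination, mark the time bins in which all its topics appear."""
    return {"+".join(c): [set(c) <= b for b in bins] for c in combos}

def render(grid: Dict[str, List[bool]]) -> str:
    """Text stand-in for a plot: one row per combination, one column per bin."""
    width = max(len(label) for label in grid)
    rows = []
    for label, hits in grid.items():
        marks = "".join("#" if hit else "." for hit in hits)
        rows.append(f"{label:<{width}} {marks}")
    return "\n".join(rows)

# Hypothetical binned timeline: four consecutive bins.
bins = [{"gulf", "oil"}, {"oil"}, {"gulf", "market"}, {"oil", "market"}]
grid = occurrence_grid(bins, [("gulf",), ("oil",), ("gulf", "oil")])
print(render(grid))
```

Reading along a row shows when a topic or combination occurs, but nothing in the grid says whether, say, "gulf" followed by "oil" is a statistically strong pattern.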

Discovery of sequential patterns by Data mining:

  • Patterns are built on an n-ary tree with elements as nodes.
  • A pattern is valid only if its support value exceeds a threshold (the minimum support).
  • Figure 2 of the paper shows an example of patterns mined from sample input data.
  • Strength: provides accurate statistical (support) values for all weak and strong patterns.
  • Weakness: loses temporal and locality information, and produces a large number of patterns in text form, which makes human interpretation harder.
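A tree-based miner of this kind can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes each input is a plain sequence of elements, that support is the fraction of sequences containing the pattern in order (gaps allowed), and that a node is kept when its support is at least the threshold. Each root-to-node path in the n-ary tree is one pattern, and a child is only tried under a parent that already passed. All names and the toy database are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

@dataclass
class Node:
    element: Optional[str]                      # None only for the root
    support: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)

def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    return all(p in it for p in pattern)

def support(db: List[List[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)

def grow(node: Node, prefix: List[str], db, alphabet, min_support: float) -> None:
    for e in alphabet:
        candidate = prefix + [e]
        s = support(db, candidate)
        # Extending a pattern can never raise its support, so a candidate
        # below the threshold is pruned together with all its extensions.
        if s >= min_support:
            child = Node(e, s)
            node.children[e] = child
            grow(child, candidate, db, alphabet, min_support)

def build_tree(db: List[List[str]], min_support: float) -> Node:
    # min_support > 0 guarantees termination: patterns longer than every
    # sequence have support 0 and are pruned.
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")
    alphabet = sorted({e for s in db for e in s})
    root = Node(None, 1.0)
    grow(root, [], db, alphabet, min_support)
    return root

def patterns(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], float]]:
    """Yield every root-to-node path (a pattern) with its support."""
    for e, child in node.children.items():
        path = prefix + (e,)
        yield path, child.support
        yield from patterns(child, path)

db = [
    ["A", "B", "C", "D"],
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["B", "C", "D"],
]
found = dict(patterns(build_tree(db, min_support=0.5)))
for pattern, sup in found.items():
    print(" -> ".join(pattern), sup)
```

With a threshold of 0.5, A -> B -> D survives (two of four sequences) while A -> B -> C is pruned (one of four), and no longer pattern is ever tried under the pruned node. The printed list is exact but purely textual, which is the interpretation problem noted in the weakness above.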

Visual Data Mining system:


  • Combines visualization and data mining so that each compensates for the other's weaknesses (see Figures 4 and 5 in the paper for the pattern visualizations).
  • The binning resolution can be changed to see different patterns by day, week, month, year, etc.
  • Patterns associated with a particular topic can be selected.
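To give a flavor of how the two halves can complement each other, the sketch below takes patterns that a miner has already scored with support, picks out the ones involving a chosen topic, and locates one occurrence of each on a binned timeline so its elements could be drawn at concrete time positions. It is an assumption-laden illustration, not the paper's encoding: greedy leftmost matching, the timeline, the support values, and the names `locate` and `patterns_with_topic` are all hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

def locate(timeline: Sequence[Set[str]], pattern: Sequence[str]) -> Optional[List[int]]:
    """Greedy leftmost match: the bin index of each pattern element, or None."""
    positions: List[int] = []
    start = 0
    for element in pattern:
        for i in range(start, len(timeline)):
            if element in timeline[i]:
                positions.append(i)
                start = i + 1  # the next element must appear in a later bin
                break
        else:  # inner loop ended without a match
            return None
    return positions

def patterns_with_topic(mined: List[Tuple[Tuple[str, ...], float]], topic: str):
    """Mined (pattern, support) pairs that involve `topic`, strongest first."""
    return sorted((m for m in mined if topic in m[0]), key=lambda m: m[1], reverse=True)

# Hypothetical binned timeline and hypothetical mined patterns with supports.
timeline = [{"gulf"}, {"oil", "gulf"}, {"market"}, {"oil"}]
mined = [(("gulf", "oil"), 0.6), (("oil", "market"), 0.4), (("gulf", "market"), 0.7)]

for pattern, sup in patterns_with_topic(mined, "gulf"):
    print(" -> ".join(pattern), f"support={sup}", "bins:", locate(timeline, pattern))
```

Each printed line pairs a statistic (support) with temporal positions (bin indices), the two kinds of information that the mining side and the visualization side each lack on their own.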


  • The strength of a pattern is hard to judge from the visualization alone, without statistical measures; conversely, mined patterns become easier to interpret when graphically encoded with their spatial and temporal information.
  • Combining statistical data mining with visualization aids knowledge discovery by humans.

Future Work:

  • Handling larger data sets using secondary-memory support, and improving the display.
  • Integrating more techniques, such as association rules, into the visual data mining environment.