Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

LARCA:Laboratory of Relational Algorithmics, Complexity and Learnability

PhD Thesis

View more documents from Albert Bifet.

Adaptive Learning and Mining for Data Streams and Frequent Patterns | [pdf] | [slides]

24 April 2009 | Advisors: Ricard Gavaldà and José Luis Balcázar.

Ph.D. Dissertation Committee: Joost Kok, Joao Gama, Minos Garofalakis, Christian Borgelt, and Lluis A. Belanche.

This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees.

In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm ADWIN for detecting change and keeping updated statistics from a data stream, and use it as a black-box in place or counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods as Naive Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks.

Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures.

And finally, using these results on evolving data streams mining and closed frequent tree mining, we present high performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental one, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.

Wordle:  Adaptive Learning and Mining for Data Streams and Frequent Patterns ADWIN Example

Publications

Evolving Data Stream Mining

Albert Bifet and Ricard Gavaldà: " Learning from Time-Changing Data with Adaptive Windowing". [pdf] | [pdf extended version] . Poster. In 2007 SIAM International Conference on Data Mining (SDM'07), Minneapolis, Minnesota.

Albert Bifet and Ricard Gavaldà: " Kalman Filters and Adaptive Windows for Learning in Data Streams ". [pdf] | [pdf extended version]| [slides] | Proc. 9th International Conference on Discovery Sicence (DS 2006). Springer-Verlag Lecture Notes in Artificial Intelligence 4265, 29-40.

Albert Bifet and Ricard Gavaldà: " Adaptive Learning from Evolving Data Streams". | 8th International Symposium on Intelligent Data Analysis 2009 | Extended Version Technical Report LSI R09-9

Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Richard Kirkby, and Ricard Gavaldà: " New ensemble methods for evolving data streams" . In 15th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining (KDD'09), Paris, France, June 2009.

Frequent Closed Tree Mining

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Intersection Algorithms and a Closure Operator on Unordered Trees ". [pdf] | Workshop Mining and Learning with Graphs MLG 2006.

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Closed and maximal tree mining using natural representations". [pdf] | Workshop on Mining and Learning with Graphs MLG 2007.

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Subtree Testing and Closed Tree Mining Through Natural Representations ". [pdf] | ACKE Workshop on "Advances in Conceptual Knowledge Engineering" 2007, Regensburg, Germany.

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Mining Frequent Closed Unordered Trees Through Natural Representations ". [pdf] | [slides] | In 2007 International Conference on Conceptual Structures, Sheffield UK.

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Mining Frequent Closed Rooted Trees ". [pdf] | To appear in Machine Learning Journal.

José Luis Balcázar, Albert Bifet and Antoni Lozano: " Mining Implications from Lattices of Closed Trees ". [pdf]| [slides] | Extraction et Gestion des Connaissances EGC'2008, Sophia Antipolis, France.

Evolving Tree Stream Mining

Albert Bifet and Ricard Gavaldà: " Mining adaptively frequent closed unlabeled rooted trees in data streams". [pdf] | [Software] | [Datasets] | [Slides] . In 14th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining (KDD'08), Las Vegas, USA, August 2008.

Albert Bifet and Ricard Gavaldà: " Adaptive XML Tree Classification on Evolving Data Streams". [pdf] | Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009

Some of the published articles may be covered by copyright. This the case, at least, for papers published by Springer, ACM, and IEEE. You may browse the articles at your convenience, in the same spirit as you may read a journal or a proceedings volume in a public library. Retrieving, copying, or distributing these files may violate the copyright protection law.

Links:

Contact information:

Research Fellow at University of Waikato in Hamilton, New Zealand.