Tackling Concept Drift by Temporal Inductive Transfer
Forman, George
HPL-2006-20R1
Keyword(s): text classification; topic identification; concept drift; time series; machine learning; inductive transfer; support vector machine
Abstract: Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural experimental framework, the Daily Classification Task, which can be applied to large time-based datasets, such as the Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifiers' learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently showed an improvement of more than 10 points in F-measure for each of the four Reuters categories tested.
Notes: Copyright 2006 ACM. Published in and presented at SIGIR '06, 6-11 August 2006, Seattle, WA, USA
9 Pages