Instant Interactive Data Mining - Workshop at ECML PKDD 2012

Instant and Interactive

The goal of the Instant and Interactive Data Mining workshop (IID) is to address the development of data mining techniques that allow users to interactively explore their data, receiving near-instant updates to every requested refinement. While instant mining and stream mining start from different perspectives and operate under different constraints, there is a significant overlap in techniques, and developments in either setting can have a significant impact on the other. Therefore, this workshop aims to bring together researchers interested in instant and adaptive data mining methods, whether for use in interactive systems or in the processing of large streams of evolving data.

Workshop Program

IID is a full-day workshop on September 24th, organized in conjunction with ECML PKDD 2012. The workshop will be located in room 3.30 of the Wills Memorial Building, Park Street, Bristol.

The program for IID is:


9:00	Workshop Opening
9:05	Keynote Presentation 'Real-World Interactive Machine Learning of Customer Support Logs at Hewlett-Packard' by George Forman
Abstract: We have built and internally deployed an interactive text mining tool that is regularly used to analyze our customer support logs. For some years now, it has been used by various analysts throughout Hewlett-Packard who are not machine learning specialists. Using this tool, they seek to: understand what the major patterns are, determine their trends over time, discover emerging types of issues, and quantify and rank the relative importance of the various issues in terms of their number of cases, overall cost, and the ability to take useful action. The tool provides various forms of clustering, classification, and quantification, i.e. estimating the number or total cost of cases that belong to a category. As the user works, they identify new categories, search for and label cases to train the classifiers (which are instantly updated), restructure the class hierarchies or their modes of mutual exclusion, evaluate classifier accuracy, correct labeling errors in the training set and/or classification errors, perform active learning for particular classes, etc. All in all, the interactive experience is very far removed from the simple "waterfall" model of traditional supervised machine learning. Whereas the experience of editing text in an unfamiliar editor is mostly a matter of learning how to activate familiar commands (such as boldfacing and centering text), interactive machine learning presents the non-expert user with mostly unfamiliar commands, which sometimes must be combined in unanticipated ways to accomplish their amorphous and evolving goals. These issues can be fully appreciated only with experience; toward this end, a live demo will be included to illustrate these issues concretely.
10:05	'A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining' by Esther Galbrun & Pauli Miettinen

10:30	Coffee Break

11:00	'Online Estimation of Discrete Densities using Classifier Chains' by Michael Geilke & Eibe Frank & Stefan Kramer
11:20	'iST-MRF: Interactive Spatio-Temporal Probabilistic Models for Sensor Networks' by Nico Piatkowski
11:40	'From Block-based Ensembles to Online Learners In Changing Data Streams: If- and How-To' by Dariusz Brzezinski & Jerzy Stefanowski

12:00	Lunch (on your own)

13:30	Keynote Presentation 'Real Data Mining for Real Users: Instant, Interactive – A Dream?' by Michael Berthold
14:30	'Towards Exploratory Search of Scientific Information' by Dorota Glowacka & Ksenia Konyushkova & Tuukka Ruotsalo & Samuel Kaski
14:55	'Towards Real-Time Machine Learning' by Andreas Hapfelmeier & Christian Mertes & Jana Schmidt & Stefan Kramer
15:20	'Instant Selection of High Contrast Projections in Multi-dimensional Data Streams' by Andrei Vanea & Emmanuel Müller & Fabian Keller & Klemens Böhm
15:45	Discussion & Closing

16:00	Coffee Break
16:30	Conference Opening

Invited Speakers

We are proud to have

Michael Berthold (Konstanz University)
George Forman (HP Labs)

as the keynote speakers at our workshop.

Michael Berthold will give a keynote with the title: 'Real Data Mining for Real Users: Instant, Interactive - A Dream?'. He holds the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University in Germany, where his research focuses on using data mining methods for the interactive analysis of large information repositories in the Life Sciences. Most of the research results are made available to the public via the open source data mining platform KNIME.

George Forman will present: 'Real-World Interactive Machine Learning of Customer Support Logs at Hewlett-Packard'. He is a senior research scientist at Hewlett-Packard Labs. His research interests stem from practical issues that arise in the application of machine learning to industrial problems, e.g. feature selection, robustness, small training sets, and novel problem formulations, such as interactive machine learning. With over 40 publications and 48 patents, he frequently serves as a journal reviewer and on program committees of conferences such as KDD and ECML PKDD. He received his PhD in Computer Science & Engineering from the University of Washington, Seattle, in 1996.


Important Dates

Submission Deadline	29th of June 2012, 23:59 PST
Notification to Authors	20th of July 2012, 23:59 PST
Camera-ready Deadline	3rd of August 2012, 23:59 PST
Workshop day	24th of September 2012

Organizers

Jilles Vreeken (Universiteit Antwerpen)
Nikolaj Tatti (Universiteit Antwerpen)
Bart Goethals (Universiteit Antwerpen)
Anton Dries (Katholieke Universiteit Leuven)
Matthijs van Leeuwen (Katholieke Universiteit Leuven)
Siegfried Nijssen (Katholieke Universiteit Leuven)

You can contact us at:
iid2012 (at) easychair.org

Program Committee

Bettina Berendt, KU Leuven
Michael Berthold, University of Konstanz
Albert Bifet, University of Waikato
Mario Boley, University of Bonn and Fraunhofer IAIS
Polo Chau, Georgia Tech
Tijl De Bie, University of Bristol
Jaakko Hollmén, Aalto University
Florian Mansmann, University of Konstanz
Naren Ramakrishnan, Virginia Tech
Thomas Seidl, RWTH Aachen University
Geoff Webb, Monash University
Indrė Žliobaitė, Bournemouth University

What's IID?

Today, we lack the technology to perform 'free-style' exploratory analysis on large amounts of data, allowing users to make discoveries by following their intuition. Standard data mining aims at finding highly interesting results, but this typically results in techniques that are computationally extremely demanding and therefore time-consuming. Consequently, these techniques are hardly useful for the interactive exploration of large databases. To tackle this problem, we propose instant, interactive and adaptive data mining as a new data mining paradigm.

By instant, we mean that good results should be presented to the user within a few seconds. Short waiting times are essential to keep the user's attention. By interactive, we mean that the user should be able to give feedback on-the-fly, allowing the user to influence the analysis. Identifying intermediate results as (un)interesting allows the algorithm to focus on specific parts of the database and certain types of results. By adaptive, we mean that the system should learn from previous interactions with the user. These elements are clearly intertwined: without near-instant results, there can be no true interactivity, and without techniques that can take feedback into account, it is unrealistic to expect good results fast. Being instant, interactive and adaptive are therefore key requirements for next-generation data mining. This will require a shift of focus compared to contemporary data mining work, in that instant algorithms will no longer be complete or optimal, while interactive algorithms will provide users with an easy means to influence their calculations.

IID and Stream Mining

Several key challenges of instant and adaptive data mining are shared with the field of stream mining, which starts from the premise that the flow of data is continuous and never-ending. The key limitation in stream mining is the lack of arbitrary access to the data; data is provided in a given order and at a given rate, and only limited amounts of data can be buffered for processing at a later time. This brings along the need for on-line and any-time algorithms, that is, algorithms that are capable of processing the data as it becomes available and that can quickly provide partial results based on the data seen so far, without accessing the complete data. These algorithms should also be capable of adapting to changing circumstances such as gradual drifts of distributions or sudden shifts of the concepts underlying the data. Hence, adaptability and instantaneousness are key for developing successful stream mining algorithms as well.
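
As a purely illustrative sketch of what "on-line" and "any-time" mean here, the following Python snippet keeps a running mean with exponential forgetting. It is not taken from any workshop contribution: the class name `FadingMean`, the helper `running_estimates` and the `decay` parameter are assumptions made for this example. Each item is processed once and then discarded, an estimate can be requested after every item, and older data is gradually down-weighted so that the estimate can follow a drift or a sudden shift.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FadingMean:
    """Running mean with exponential forgetting (hypothetical helper)."""

    decay: float = 0.9        # assumed forgetting factor in (0, 1]; 1.0 never forgets
    weighted_sum: float = 0.0
    weight: float = 0.0

    def update(self, x: float) -> None:
        # Constant time and memory per item: the item itself is not stored.
        self.weighted_sum = self.decay * self.weighted_sum + x
        self.weight = self.decay * self.weight + 1.0

    def estimate(self) -> float:
        # Any-time query: uses only the data seen so far.
        if self.weight == 0.0:
            raise ValueError("no data seen yet")
        return self.weighted_sum / self.weight


def running_estimates(stream: Iterable[float], decay: float = 0.9) -> Iterator[float]:
    """Yield a partial result after every item of the stream."""
    model = FadingMean(decay=decay)
    for x in stream:
        model.update(x)
        yield model.estimate()
```

With `decay=1.0` this reduces to the ordinary running mean; smaller values forget faster, trading stability for quicker adaptation. The right setting depends on the stream and is left open here.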

Even though instant mining and stream mining start from different perspectives and operate under different constraints, we strongly believe there is a significant overlap in techniques and that developments in either setting can have a significant impact on the other. Therefore, the goal of this workshop is to bring together researchers with a shared interest in instant and adaptive data mining methods, whether for use in interactive systems or in the processing of large streams of evolving data.