The main goal of the ODD workshop is to bring together researchers and practitioners from academia, industry, and government to discuss and reflect on outlier mining. The 1st ODD workshop (2013) focused on outlier detection and description, with particular emphasis on descriptive methods that could help make sense of the detected outliers. The 2nd ODD^2 workshop (2014) extended the focus areas to outlier detection and description under data diversity, with emphasis on challenges associated with mining outliers in heterogeneous data environments (graphs, text, streams, metadata, etc.).
This year, we broaden the scope to also include the translation of real-world applications to different outlier definitions. Our goal is to highlight challenges associated with (1) outlier mining with new theoretical models and efficient algorithms, (2) translating real-world problems to one or more of these definitions, and (3) comparing these definitions in their detection quality for unknown outlier instances. In all, the 3rd ODDx3 aims to raise the community's awareness of the following challenges of outlier mining:
- What is an outlier/anomaly?
- How can we define an anomaly in heterogeneous data environments?
- How do different definitions translate to real-world applications (spam, fraud, etc.)?
- How can real-world scenarios help shape new anomaly definitions?
- How can we build descriptive detection methods?
- How could data visualization aid anomaly mining?
Each keynote will be 35 minutes long, including questions.
The panel will consist of researchers from both academia and industry with expertise and experience in outlier mining and fraud detection (60 minutes, including a 5-minute presentation by each panelist, followed by Q&A and discussion).
Despite its immense popularity, anomaly mining remains an extremely challenging task for many real-world applications. For many practitioners, the task is poorly defined and under-specified, as existing definitions and solutions have often been too simplistic and do not directly correspond to the needs of modern applications.
The first goal of the panel is to have people from various domains (or people who collaborate closely with them) describe the kinds of anomaly problems they face in the real world. The second goal is to tie existing definitions in the literature to the problems encountered in the real world and, where no appropriate definitions exist, to brainstorm possible new formulations.
To kick-start the panel discussions, we will introduce typical scenarios and use cases from various domains, including network intrusion, insider trading, bank fraud, medical referral fraud, opinion spam, Web spam, computer malware dissemination, social malware, etc. The panelists will then elaborate on these scenarios with possible formulations and approaches. We expect these discussions to spark ideas as to how existing approaches for one problem domain (e.g. bank fraud) can be applied to those in other domains (e.g. Medicare fraud).
| Submission Deadline | June 5, 2015, 23:59 PST |
| Notification to Authors | June 30, 2015, 23:59 PST |
| Camera-ready Deadline | July 10, 2015, 23:59 PST |
| Workshop day | August 10, 2015 |
- Interleaved detection and description of outliers
- Description models for given outliers
- Pattern and local information based outlier description
- Subspace outliers, feature selection, and space transformations
- Ensemble methods for anomaly detection and description
- Descriptive local outlier ranking
- Identification of outlier rules
- Finding intensional knowledge
- Contextual and community outliers
- Human-in-the-loop modeling and learning
- Visualization techniques for interactive exploration of outliers
- Comparative studies on outlier description
- Related research fields
- Formal outlier mining models
  - Supervised, semi-supervised, and unsupervised models
  - Statistical models
  - Distance-based models
  - Density-based models
  - Spectral models
  - Constraint-based models
  - Ensemble models
- Outlier mining for complex databases
  - Graph data (e.g. community outliers)
  - Spatio-temporal data
  - Time series and sequential data
  - Online processing of stream data
  - Scalability to high dimensional data
- Applications of outlier detection and description
The maximum length of papers is 10 pages in this format. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages.
The papers should be in PDF format and submitted via the following EasyChair submission site.
Accepted papers will be included in the KDD 2015 Digital Proceedings, and made available in the ACM Digital Library.
- Fabrizio Angiulli, University of Calabria
- Ira Assent, Aarhus University
- James Bailey, University of Melbourne
- Arindam Banerjee, University of Minnesota
- Albert Bifet, Yahoo! Labs Barcelona
- Christian Böhm, LMU Munich
- Rajmonda Caceres, MIT
- Varun Chandola, Oak Ridge National Laboratory
- Polo Chau, Georgia Tech
- Sanjay Chawla, University of Sydney
- Feng Chen, SUNY Albany
- Tijl De Bie, University of Bristol
- Christos Faloutsos, Carnegie Mellon
- Jing Gao, University at Buffalo
- Manish Gupta, Microsoft, India
- Jaakko Hollmén, Aalto University
- Eamonn Keogh, UC Riverside
- Matthijs van Leeuwen, KU Leuven
- Daniel B. Neill, Carnegie Mellon University
- Hasan Timucin Ozdemir, Panasonic R&D
- Naren Ramakrishnan, Virginia Tech
- Spiros Papadimitriou, Rutgers University
- Koen Smets, University of Antwerp
- Hanghang Tong, CUNY
- Ye Wang, The Ohio State University
- Arthur Zimek, LMU Munich
- Leman Akoglu (Stony Brook University)
- Sanjay Chawla (University of Sydney)
- Emmanuel Müller (Karlsruhe Institute of Technology)
- Ted E. Senator (Leidos, previously SAIC)
odd15kdd (at) outlier-analytics.org