|9:30||Welcome & Remarks|
'Graph OLAP, Anomaly and Query-based Outlier Detection'
by Xifeng Yan
Research talks (15+5 minutes each)
|12:30||Lunch (on your own)|
'Identifying Rare Class in Absence of True Labels: Application to Monitoring Forest Fires from Satellite data'
by Vipin Kumar
This talk will present RAPT, a new predictive modeling framework for identifying rare classes in the complete absence of labeled data. The RAPT framework is designed to use imperfectly annotated training data to learn classification models when no expert-annotated training samples are available. Our results show that, under some reasonable assumptions, classifiers trained on imperfectly labeled data using the RAPT approach perform comparably to classification models trained on expert-annotated data. This ability to learn from imperfect supervision is advantageous in a wide range of applications where the target class of interest is relatively rare and precisely labeling even a small number of training samples is infeasible. The talk will present the application of the RAPT framework to creating historical maps of forest fires in tropical forests from satellite data. This new forest fire product identifies approximately 1 million sq. km. of burned areas in the tropical forests of South America and Southeast Asia during the years 2001-2014, which is more than double the total burned area reported by the state-of-the-art NASA products. We validate these results using burn scars visible in satellite images, including high-resolution Landsat images, to confirm the veracity of the previously unreported forest fires.
'What is an Anomaly?'
by Tiberio Caetano, Tina Eliassi-Rad, Vipin Kumar, Ted Senator, Jimeng Sun
Here is the list of discussion questions.
Research talks (15+5 minutes each)
|5:45||Discussion & Closing|
The main goal of the ODD workshop is to bring together academic, industry, and government researchers and practitioners to discuss and reflect on outlier mining. The 1st ODD workshop (2013) focused on outlier detection and description, with particular emphasis on descriptive methods that could help make sense of the detected outliers. The 2nd ODD^2 workshop (2014) extended the focus areas to outlier detection and description under data diversity, with emphasis on challenges associated with mining outliers in heterogeneous data environments (graphs, text, streams, metadata, etc.).
This year, we broaden the scope to also include the translation of real-world applications to different outlier definitions. Our goal is to highlight challenges associated with (1) outlier mining through new theoretical models and efficient algorithms, (2) translating real-world problems to one or more of these definitions, and (3) comparing these definitions in their detection quality for unknown outlier instances. In all, the 3rd ODDx3 aims to raise the community's awareness of the following challenges of outlier mining:
- What is an outlier/anomaly?
- How can we define an anomaly in heterogeneous data environments?
- How do different definitions translate to real-world applications (spam, fraud, etc.)?
- How can real-world scenarios help shape new anomaly definitions?
- How can we build descriptive detection methods?
- How could data visualization aid anomaly mining?
We are proud to have Vipin Kumar and Xifeng Yan as our keynote speakers.
Each keynote will be 45 minutes long, including questions.
The panel consists of researchers from both academia and industry with expertise or experience in outlier mining and fraud detection. It will run for 60 minutes, including a 5-minute presentation by each panelist followed by Q&A and discussion.
Despite its immense popularity, anomaly mining remains an extremely challenging task for many real-world applications. For many practitioners, the task is poorly defined and under-specified, as existing definitions and solutions have often been too simplistic and do not directly correspond to the needs of modern applications.
The first goal of the panel is to have people from various domains (or people who collaborate closely with them) describe the kinds of anomaly problems they are facing in the real world. The second goal is to tie existing definitions in the literature to those encountered in the real world and, where no appropriate definitions exist, to brainstorm possible new formulations.
To kick-start the panel discussions, we will introduce typical scenarios and use cases from various domains, including network intrusion, insider trading, bank fraud, medical referral fraud, opinion spam, Web spam, computer malware dissemination, social malware, etc. The panelists will then elaborate on these scenarios with possible formulations and approaches. We expect these discussions to spark ideas as to how existing approaches for one problem domain (e.g. bank fraud) can be applied to those in other domains (e.g. Medicare fraud).
|Camera-ready Deadline||July 19, 2015, 23:59 PST|
|Workshop day||August 10, 2015|
- Interleaved detection and description of outliers
- Description models for given outliers
- Pattern and local information based outlier description
- Subspace outliers, feature selection, and space transformations
- Ensemble methods for anomaly detection and description
- Descriptive local outlier ranking
- Identification of outlier rules
- Finding intensional knowledge
- Contextual and community outliers
- Human-in-the-loop modeling and learning
- Visualization techniques for interactive exploration of outliers
- Comparative studies on outlier description
- Related research fields
- Formal outlier mining models
  - Supervised, semi-supervised, and unsupervised models
  - Statistical models
  - Distance-based models
  - Density-based models
  - Spectral models
  - Constraint-based models
  - Ensemble models
- Outlier mining for complex databases
  - Graph data (e.g. community outliers)
  - Spatio-temporal data
  - Time series and sequential data
  - Online processing of stream data
  - Scalability to high-dimensional data
- Applications of outlier detection and description
The maximum length of papers is 10 pages in this format. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages.
The papers should be in PDF format and submitted via the following EasyChair submission site.
Accepted papers will be included in the KDD 2015 Digital Proceedings, and made available in the ACM Digital Library.
- Fabrizio Angiulli, University of Calabria
- James Bailey, University of Melbourne
- Albert Bifet, University of Waikato
- Petko Bogdanov, SUNY Albany
- Christian Böhm, LMU
- Rajmonda Caceres, MIT Lincoln Laboratory
- Sanjay Chawla, University of Sydney
- Feng Chen, SUNY Albany
- Tina Eliassi-Rad, Rutgers University
- Christos Faloutsos, Carnegie Mellon
- Jing Gao, University at Buffalo
- Arun Maiya, Institute for Defense Analyses
- Daniel B. Neill, Carnegie Mellon University
- Raymond Ng, University of British Columbia
- Spiros Papadimitriou, Rutgers University
- Mykola Pechenizkiy, Eindhoven U. of Tech.
- Naren Ramakrishnan, Virginia Tech
- Fabio Ramos, University of Sydney
- Joerg Sander, University of Alberta
- Oliver Schulte, Simon Fraser University
- Ambuj Singh, UC Santa Barbara
odd15kdd (at) outlier-analytics.org