Outlier Definition, Detection, and Description

The main goal of the ODD workshop is to bring together researchers and practitioners from academia, industry, and government to discuss and reflect on outlier mining challenges. Specifically, the 1st ODD workshop (2013) focused on outlier detection and description, with particular emphasis on descriptive methods that could help make sense of the detected outliers. The 2nd ODD^2 workshop (2014) extended the focus areas to outlier detection and description under data diversity, with emphasis on challenges associated with mining outliers in heterogeneous data environments (graphs, text, streams, metadata, etc.).
This year, we broaden the scope to also include the translation of real-world applications to different outlier definitions. Our goal is to highlight challenges associated with (1) outlier mining with new theoretical models and efficient algorithms, (2) translating real-world problems to one or more of these definitions, and (3) comparing these definitions by their detection quality for unknown outlier instances. In all, the 3rd workshop, ODDx3, aims to raise the community's awareness of the following challenges of outlier mining:

      • What is an outlier/anomaly?
      • How can we define an anomaly in heterogeneous data environments?
      • How do different definitions translate to real world applications (spam, fraud, etc.)?
      • How can real world scenarios help shape new anomaly definitions?
      • How can we build descriptive detection methods?
      • How could data visualization aid anomaly mining?

Invited Keynote Speakers

We are proud to have Vipin Kumar and Xifeng Yan as our keynote speakers.

Vipin Kumar is a William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. Dr. Kumar's research interests include data mining, high-performance computing, and their applications in Climate/Ecosystems and Biomedical domains. His research has resulted in the development of the isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES) and graph partitioning (METIS, ParMetis, hMetis). He has authored over 300 research articles, and has coedited or coauthored 11 books, including the widely used textbooks "Introduction to Parallel Computing" and "Introduction to Data Mining". Dr. Kumar co-founded the SIAM International Conference on Data Mining and served as a founding co-editor-in-chief of the Journal of Statistical Analysis and Data Mining (an official journal of the American Statistical Association). Dr. Kumar is a Fellow of the ACM, IEEE and AAAS. His foundational research in data mining and its applications to scientific data was honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD). His h-index is 90.
Xifeng Yan is an associate professor at the University of California at Santa Barbara. He holds the Venkatesh Narayanamurti Chair of Computer Science. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2006. He was a research staff member at the IBM T. J. Watson Research Center between 2006 and 2008. He has been working on modeling, managing, and mining graphs in information networks, computer systems, social media and bioinformatics. His work has been extensively cited, with over 9,000 citations per Google Scholar and thousands of software downloads. He received the NSF CAREER Award, the IBM Invention Achievement Award, the ACM-SIGMOD Dissertation Runner-Up Award, and the IEEE ICDM 10-year Highest Impact Paper Award.

Each keynote will be 35 minutes long, including questions.

ODDx3 Panel: "What is an anomaly?"

The panel consists of researchers from both academia and industry with expertise and experience in outlier mining and fraud detection. It will run for 60 minutes, including a 5-minute presentation by each panelist followed by Q&A and discussion.


Despite its immense popularity, anomaly mining remains an extremely challenging task for many real-world applications. For many practitioners, the task is poorly defined and under-specified, as existing definitions and solutions have often been too simplistic and do not directly correspond to the needs of modern applications.
The first goal of the panel is to have people from various domains (or people who collaborate closely with them) describe the kinds of anomaly problems they face in the real world. The second goal is to tie existing definitions in the literature to the problems encountered in the real world and, where no appropriate definition exists, to brainstorm possible new formulations.
To kick-start the panel discussions, we will introduce typical scenarios and use cases from various domains, such as network intrusion, insider trading, bank fraud, medical referral fraud, opinion spam, Web spam, computer malware dissemination, and social malware. The panelists will then elaborate on these scenarios with possible formulations and approaches. We expect these discussions to spark ideas as to how existing approaches for one problem domain (e.g. bank fraud) can be applied in other domains (e.g. Medicare fraud).

Panelists (under construction)

      • Tina Eliassi-Rad (Rutgers) (malware & fraud detection)
      • Ted E. Senator (Leidos) (insider threat detection)
      • Jimeng Sun (Georgia Tech) (outliers in medical data)
      • Weng-Keen Wong (Oregon State U.) (outbreak detection)

Workshop Program

TBD

ODDx3 is a half-day workshop, organized in conjunction with ACM SIGKDD 2015.

Important Dates

Submission Deadline: June 5, 2015, 23:59 PST
Notification to Authors: June 30, 2015, 23:59 PST
Camera-ready Deadline: July 10, 2015, 23:59 PST
Workshop Day: August 10, 2015

Call for Papers

Topics of interest for the workshop include, but are not limited to:

Submission Guidelines

We invite submissions of unpublished original research papers that are not under review elsewhere. All papers will be peer reviewed. If a paper is accepted, at least one of its authors must attend the workshop to present the work. Submitted papers must be written in English and formatted according to the ACM Proceedings Template (Tighter Alternate style).

The maximum length of papers is 10 pages in this format. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages.

Papers should be in PDF format and submitted via the EasyChair submission site.

Accepted papers will be included in the KDD 2015 Digital Proceedings, and made available in the ACM Digital Library.

Program Committee (under construction)

Organizers

You can contact us at:
odd15kdd (at) outlier-analytics.org