Welcome to The Bump Hunting Project by Patient Rule Induction Method. This website briefly describes the goal of the project and its software, PRIMsrc. It explains why and how you can use the software, and provides general remarks and links about it.


Overview

The general problem in "Bump Hunting" (BH) is to identify, characterize and predict hidden structures in the data that are informative and significant. In practice, "Bump Hunting" refers to the task of mapping out local regions of the input space where a target function of interest, usually unknown, assumes larger (or smaller) values than its average over the entire space. These sought-after regions of extreme values are also known as the supports of local/global extrema of the target function. The input space for the search may be of low or high dimension, and its inputs may be any variables (attributes, features, predictors, etc.). The target function may be any function of interest. See the Wiki page for details.

The picture below illustrates the idea. Sunshine over the mountain range reveals peaks, highlands and valleys, much as "Bump Hunting" aims to reveal structures in the target function.

Mountains (Bill Wight Photography, Copyright 2015, with permission)

"Bump Hunting" applies to mathematical / statistical problems such as:

PRIMsrc implements a unified treatment of the "Bump Hunting" task in high-dimensional space. It uses a generic rule-induction algorithm based on recursive peeling, derived from the Patient Rule Induction Method (PRIM) introduced by Friedman & Fisher in 1999 (see Wiki "References"). It generates simple decision rules delineating a region (or regions) of the multi-dimensional input space where the target function is unusually large (or small) compared with its average over the entire space.
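
The recursive peeling idea described above can be illustrated with a minimal Python sketch. This is not the PRIMsrc implementation or its interface: the function name `prim_peel`, the parameters `alpha` and `min_support`, and the synthetic data are all hypothetical, chosen only for illustration. The sketch also simplifies the method by stopping as soon as no peel raises the mean of the response inside the box, and it omits the pasting and cross-validation steps of PRIM.

```python
import numpy as np

def prim_peel(X, y, alpha=0.10, min_support=0.10):
    """Greedy top-down peeling (illustrative sketch, not the PRIMsrc code).

    Starting from a box that contains all points, repeatedly remove the
    alpha-fraction slice (lowest or highest values of one variable) whose
    removal gives the largest mean of y among the remaining points.
    The box is the set of points with lower < x < upper on every variable.
    """
    n, p = X.shape
    lower = np.full(p, -np.inf)
    upper = np.full(p, np.inf)
    inside = np.ones(n, dtype=bool)
    min_count = max(1, int(np.ceil(min_support * n)))

    while True:
        best_mean, best_move = y[inside].mean(), None
        for j in range(p):
            lo_cut, hi_cut = np.quantile(X[inside, j], [alpha, 1.0 - alpha])
            for side, cut in (("lower", lo_cut), ("upper", hi_cut)):
                if side == "lower":
                    keep = inside & (X[:, j] > cut)
                else:
                    keep = inside & (X[:, j] < cut)
                # Skip peels that remove nothing or leave too few points.
                if keep.sum() == inside.sum() or keep.sum() < min_count:
                    continue
                candidate_mean = y[keep].mean()
                if candidate_mean > best_mean:
                    best_mean, best_move = candidate_mean, (j, side, cut, keep)
        if best_move is None:
            break  # simplification: stop when no peel improves the box mean
        j, side, cut, keep = best_move
        if side == "lower":
            lower[j] = cut
        else:
            upper[j] = cut
        inside = keep
    return lower, upper, inside

# Synthetic example (assumed data): a "bump" in the unit square.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
in_bump = (X[:, 0] > 0.6) & (X[:, 0] < 0.9) & (X[:, 1] > 0.2) & (X[:, 1] < 0.5)
y = in_bump.astype(float) + 0.1 * rng.normal(size=2000)

lower, upper, inside = prim_peel(X, y, alpha=0.10, min_support=0.05)
print("decision rule: ", list(zip(lower, upper)))
print("mean in box:   ", y[inside].mean())
print("overall mean:  ", y.mean())
```

The returned bounds read directly as a decision rule of the form "lower[j] < x_j < upper[j]" for each variable, which is what makes this family of methods easy to interpret.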


Why Use PRIMsrc?

The method is attractive to users because it (i) makes minimal assumptions about the data, (ii) gives easily interpretable rules with estimated variance and (iii) can target any desired response, being supervised for Survival, Regression and Classification (SRC) settings.

Unlike classical regression, classification and clustering problems, "Bump Hunting" is interested in:

Applications exist in a growing range of fields, including Medicine, Engineering, Materials Research, Marketing, Business Analytics, Actuarial Science and Behavioral Science.


Readme

Visit the software Readme webpage to learn about License, Downloads, Branches, Requirements, Installation and Usage.


Wiki

Visit the project Wiki webpage for Roadmap, Documentation, Examples, Publications, Case Studies, Support and How to Contribute (code and documentation).


Authors/Contributors

Jean-Eudes Dazard, PhD.
Center for Proteomics and Bioinformatics (at the time of study/design)
Case Western Reserve University
Cleveland, Ohio, USA

J. Sunil Rao, PhD.
Division of Biostatistics
Department of Epidemiology and Public Health
The University of Miami
Miami, Florida, USA

Michael LeBlanc, PhD.
Fred Hutchinson Cancer Research Center
Public Health Sciences.
Department of Biostatistics, School of Public Health
The University of Washington
Seattle, Washington, USA

Michael Choe, MD.
Case Western Reserve University (at the time of study/design)
Cleveland, Ohio, USA

Tarn Duong, PhD.
Research scientist
Computer Science Laboratory (LIPN)
University of Paris 13
Paris, France 


Acknowledgements

Project funded in part by the National Institutes of Health - National Cancer Institute, Grant R01-CA160593, awarded to J. Sunil Rao and J-E. Dazard (co-PIs). This work was also made possible by the help of Alberto Santana, MBA (Analyst Programmer, CWRU) and the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. Thanks also to professional photographer Bill Wight CA for the illustration picture above.

