In conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12)
BIOKDD '12 Workshop
Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technology such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can monitor quantitatively the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the inherent uncertainties in data collection processes, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.
The past two decades have witnessed rapid technological advances in biological data collection and acquisition. These advances in biotechnology have enabled the interrogation of cellular systems at various levels, leading to the generation and collection of large-scale biological data (mostly in public databases) at an exponential rate. The explosion of biological data is leading to a paradigm shift in research methods in the life sciences: from hypothesis-driven research to data-driven research. In the last decade, sophisticated algorithms for knowledge discovery and data mining have demonstrated great promise in extracting novel biological information from complex, heterogeneous, and very high-dimensional biological datasets.
While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.
Data mining approaches seem ideally suited for bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues, we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2012), held annually or biannually in conjunction with the ACM SIGKDD Conference. This will be the 11th year for the workshop.
Past workshops attracted 50-100 participants, from academia, industry and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. The program of the workshops included 10-11 contributed papers, and 1-2 invited talks. Information on past workshops is available at the following web pages:
Over the last ten years, BIOKDD has established a tradition of providing a platform for the presentation and discussion of advances in data mining techniques that primarily target biological data. BIOKDD 2012 will target submissions on analyzing a broad range of biological, biochemical, and clinical datasets. The data of interest include “omic” datasets (genomic, transcriptomic, proteomic, metabolomic, interactomic), biochemical datasets, and clinical datasets (ranging from physiological measurements to free text). Papers that integrate multiple types of data to extract novel information will also be of great interest to the workshop. The topics of interest, classified according to data type, include the following:
Health Informatics/Translational Science:
Data Mining Methodologies:
Papers should be at most 10 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. Papers should be submitted in PDF/PS format through EasyChair at the following link: http://www.easychair.org/conferences?conf=biokdd12
For the camera-ready format, authors may refer to papers in previous BIOKDD proceedings (e.g., BIOKDD10).
All papers will be published in the workshop proceedings and in the ACM Digital Library.
Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scanned copy is acceptable) of the ACM copyright transfer form; otherwise the paper cannot be included in the final workshop proceedings.
Publication of proceedings and expanded papers. Authors of a selection of accepted papers will also be invited to submit to a special issue of IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). Each paper submitted to the special issue should contain "a sufficient amount of new material" relative to the workshop version, by TCBB's (and IEEE's) rules as specified here:
Workshop publication fees can be paid through Google Checkout.
8:30-8:35: Opening Remarks
8:35-9:30: Keynote presentation. Mining Genetic Interactions in Genome‐Wide Association Study
Session I (9:30 am – 10:30 am)
9:30-9:50 Detecting Protein Complexes from Noisy Protein Interaction Data
9:50-10:10 Globalized Bipartite Local Model for Drug‐Target Interaction Prediction
10:10-10:30 2D Similarity Kernels for Biological Sequence Classification
10:30-11:00 Coffee Break
Session II (11:00 am – 11:40 am)
11:00-11:20 Learning to Extract Chemical Names based on Random Text Generation and Incomplete Dictionary Direction for Our Field
11:20-11:40 Biomedical Text Categorization with Concept Graph Representations Using a Controlled Vocabulary
11:40 – 11:50 Closing Remarks