Computational Oncology
Introduction
Modern molecular methods in biology and medicine produce an unprecedented amount of data whose sheer volume defies manual analysis.
Computerized methods have so far mainly been developed for sequence data, while data from DNA chips, proteomic analyses by 2D gel electrophoresis or mass spectrometry, comparative genomic hybridization (CGH), multicolor fluorescent in-situ hybridization (FISH), loss-of-heterozygosity analysis (LOH), and single nucleotide polymorphism (SNP) analysis still present a major challenge to computational biology.
In our project, we are adapting methods from machine learning to these data and are developing new methods for their analysis, in particular with applications to oncology. We are seeking methods to find associations between such molecular data and clinical information like tumor stage or grade, disease progression, degree of malignancy or responsiveness to chemotherapy.
Description
DNA microarrays represent an important step towards understanding organisms at the genomic level. The transcriptional activity of tens of thousands of genes can be measured simultaneously, and even the whole genome of an organism can be investigated. This process yields huge quantities of data that can only be analyzed by computerized methods.
We are developing an integrated concept for storing microarray data in a central data warehouse and for accessing these data in a form suitable for analysis. Given the heterogeneity of experimental techniques and of the descriptions of microarray experiments, this includes reorganizing the data into combined data models. We develop analytical techniques to establish generalizable analysis processes such as normalization, identification of differentially transcribed genes, and clustering of samples and genes in databases.
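The normalization and differential-transcription steps named above can be illustrated with a minimal sketch. The text does not say which normalization or test statistic the project uses, so the choices here (median centering of log2 intensities per array, followed by Welch's t-statistic per gene) are assumptions made for illustration only, and the gene names and intensity values are hypothetical toy data.

```python
import math
import statistics

def median_center(array_intensities):
    """Log2-transform one array's intensities and subtract that array's median."""
    logs = [math.log2(v) for v in array_intensities]
    med = statistics.median(logs)
    return [x - med for x in logs]

def welch_t(a, b):
    """Welch's t-statistic comparing one gene's values in two groups."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical toy data: each row is one array, each column one gene.
GENES = ["geneA", "geneB", "geneC", "geneD"]
tumor = [[100, 800, 50, 200], [220, 1500, 110, 400], [150, 1300, 75, 310]]
normal = [[120, 130, 60, 250], [200, 190, 100, 390], [90, 100, 45, 180]]

# Step 1: normalize every array so that arrays become comparable.
tumor_n = [median_center(a) for a in tumor]
normal_n = [median_center(a) for a in normal]

# Step 2: score every gene for differential transcription between the groups.
scores = {
    gene: welch_t([a[j] for a in tumor_n], [a[j] for a in normal_n])
    for j, gene in enumerate(GENES)
}

# Step 3: rank genes by the absolute value of the statistic.
ranked = sorted(GENES, key=lambda g: abs(scores[g]), reverse=True)
for gene in ranked:
    print(f"{gene}: t = {scores[gene]:.2f}")
```

In a real analysis the statistic would be turned into a p-value and corrected for multiple testing, since thousands of genes are tested at once; this sketch stops at the ranking.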
In addition, we are applying data mining techniques, especially decision trees, neural networks and support vector machines, to microarray data. In this way, classes can be discovered and then predicted for new samples, either on the basis of newly discovered classes or based on a training set whose class assignment is known. Important examples are the classification of tumor samples on the basis of existing groupings (tumor stage, cytogenetic data) and the discovery of new subclasses in a set of tumor specimens based on their expression profiles.
We also port such analysis methods to other data from molecular genetics, in particular from CGH and matrix-CGH, multicolor FISH, LOH analysis, and proteome analysis.
The successful application of the described analysis techniques resulted in several publications.
For example, we have developed a classification method based on decision trees [8]. Decision trees have the advantage of yielding transparent, human-readable decision rules. Furthermore, no additional feature selection method is needed to pick out a small number of informative genes.
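Why a tree needs no separate feature selection can be seen in a minimal sketch of a single tree node. This is not the method published in [8]; it is a generic one-level tree (a decision stump) using Gini impurity, and the gene names, class labels and expression values are hypothetical. The split search inspects every gene and keeps only the most informative one, and the result reads directly as a rule.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(samples, labels, genes):
    """Return (gene, threshold, impurity) of the best single-gene split."""
    best = None
    n = len(samples)
    for j, gene in enumerate(genes):
        values = sorted(set(s[j] for s in samples))
        # Candidate thresholds lie midway between neighbouring observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for s, y in zip(samples, labels) if s[j] <= thr]
            right = [y for s, y in zip(samples, labels) if s[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (gene, thr, score)
    return best

def majority(labels):
    """Most frequent class label in a list."""
    return max(set(labels), key=labels.count)

# Hypothetical toy data: rows are tumor samples, columns are genes.
genes = ["geneA", "geneB", "geneC"]
samples = [
    [2.1, 0.3, 1.0], [1.9, 0.5, 1.4], [2.4, 0.2, 0.9],
    [2.0, 1.8, 1.2], [2.2, 2.1, 1.1], [1.8, 1.6, 1.3],
]
labels = ["low_grade"] * 3 + ["high_grade"] * 3

gene, thr, impurity = best_split(samples, labels, genes)
j = genes.index(gene)
left_class = majority([y for s, y in zip(samples, labels) if s[j] <= thr])
right_class = majority([y for s, y in zip(samples, labels) if s[j] > thr])

# The fitted node is itself a human-readable rule.
rule = f"IF {gene} <= {thr:.2f} THEN {left_class} ELSE {right_class}"
print(rule)
```

A full tree applies the same search recursively to each resulting subgroup, so the set of genes that appear in the final rules is a by-product of training.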
Using another approach, we combined a feature selection method with an artificial neural network in order to associate data from comparative genomic hybridization with the probability that metastases are present in breast cancer [3]. In addition, our analysis led to a new hypothesis on the formation of metastases in breast cancer.
Another project used Bayesian networks to construct a tumor progression model for urothelial carcinoma from data on genetic aberrations in this cancer type [1].
For discrimination of dedifferentiated and pleomorphic liposarcoma, we applied cluster analysis and support vector machines. We were able to identify the chromosomal locations relevant for the distinction of these tumor types and to define classification rules [7].
In an investigation of the effects of deleting different subunits of casein kinase II on the yeast cell cycle, we used correspondence analysis to identify genes with altered transcription [9].
As a last example, we were able to show that the observed accumulation of differentially expressed genes at certain chromosomal locations is statistically significant when two sub-populations of thymus epithelial cells are compared [2].
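One common way to assess whether differentially expressed genes accumulate in a chromosomal region is a hypergeometric (one-sided Fisher) test. The text does not state which test was used in [2], so the sketch below is an assumption chosen for illustration, and all counts in it are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes measured in total
    K: genes located in the chromosomal region of interest
    n: genes called differentially expressed
    k: differentially expressed genes observed inside the region
    """
    top = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)

# Hypothetical counts: 10000 genes measured, 50 of them in the region,
# 200 differentially expressed overall, 8 of those inside the region.
N, K, n, k = 10000, 50, 200, 8

expected = n * K / N  # number expected in the region by chance
p_value = hypergeom_upper_tail(k, K, n, N)
print(f"expected by chance: {expected:.1f}, observed: {k}, p = {p_value:.2e}")
```

When many regions are scanned, each region yields its own p-value, and these would need a multiple-testing correction before any cluster is called significant.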
Project Members
Dr. Benedikt Brors (head of project)
Peter Bewerunge
Dr. Alla Bulashevska
Dr. Svetlana Bulashevska
Theodora Manoli
Jasmin Müller
Yvonne Koch
Dr. Lars Kaderali
Dr. Marc Zapatka
former members:
associated: Dr. Thomas Kochmann
Gaëlle Dubois
Dr. Jan Wiemer (now Europroteome)
Falk Schubert
Dr. Patrick Warnat
Collaborations
Prof. Dr. T. Chakraborty, Institut für Medizinische Mikrobiologie, Universität Giessen
PD Dr. Torsten Haferlach, Leukämie-Diagnostiklabor, Universitätsklinikum Großhadern, München
Dr. Stefan Joos, Abt. Molekulare Genetik, DKFZ Heidelberg
Prof. Dr. Panthel, Institut für Tumorbiologie, Universitätsklinikum Hamburg
PD Dr. Gunhild Mechtersheimer, Pathologie, Universitätsklinikum Heidelberg
Dr. Christoph Klein, Institut für Immunologie, Ludwig-Maximilians-Universität München
Prof. Dr. G. Kovacs, Chirurgische Universitätsklinik Heidelberg
PD Dr. R. Kronenwett, Klinik für Hämatologie, Onkologie und Klinische Immunologie, Universität Düsseldorf
Prof. Dr. Lichter, Abt. Molekulare Genetik, DKFZ Heidelberg
Prof. Dr. Matthias Löhr, Innere Medizin, Universitätsklinikum Mannheim
Prof. Dr. Christoph Niehrs, Abt. Molekulare Embryologie, DKFZ Heidelberg
Prof. Walter Pyerin, Abt. Biochemische Zellphysiologie, DKFZ Heidelberg
Prof. Dr. Guido Sauter, Universität Basel, Switzerland
Prof. Dr. B. Schlegelberger, Mol. Pathologie, MH Hannover
Dr. Alexander Schramm, Pädiatrische Onkologie, Tumorzentrum, Universität Essen
Prof. Dr. F. Speleman, Mol. Cytogenetics, Universitair Ziekenhuis, Gent, Belgium
Dr. C. Thiede, Innere Medizin, Universitätsklinikum Dresden
Dr. F. Westermann, Abt. Cytogenetik, DKFZ Heidelberg
phase-IT intelligent solutions AG, Heidelberg
Boehringer Ingelheim Austria
Dr. Hiltrud Brauch, IKP Stuttgart
Publications
Vinayagam A., Koenig R., Moormann J., Schubert F., Eils R., Glatting K.H., Suhai S. (2004) Applying Support Vector Machines for Gene ontology based gene function prediction. BMC Bioinformatics. 5(1):116.
Bulashevska, S., Szakacs, O., Brors, B., Eils, R. and Kovacs, G. (2004) Pathways of urothelial cancer progression suggested by Bayesian network analysis of allelotyping data. Int. J. Cancer. 110:850-856.
Gotter, J., Brors, B., Hergenhahn, M. and Kyewski, B. (2004) Medullary epithelial cells of the human thymus express a highly diverse selection of tissue-specific genes colocalized in chromosomal clusters. J. Exp. Med. 199: pp. 155-166.
Schmidt-Kittler, O., Ragg, T., Daskalakis, A., Granzow, M., Ahr, A., Blankenstein, T.J.F., Kaufmann, M., Diebold, J., Arnhold, H., Müller, P., Bischoff, J., Harich, D., Schlimok, G., Riethmüller, G., Eils, R. and Klein, C.A. (2003) From latent disseminated cells to overt metastasis: genetic analysis of systemic breast cancer progression. Proc. Natl. Acad. Sci. U.S.A. 100: pp. 7737-7742.
Wiemer, J., Schubert, F., Granzow, M., Ragg, T., Fieres, J., Mattes, J. and Eils, R. (2003) Informatics united. Exemplary studies combining medical informatics, neuroinformatics and bioinformatics. Methods Inf. Med. 42: pp. 126-133.
Bulashevska, S., Groll, A., Eils, R. (2002) "ISCN Parser": a Software Tool for Interpreting Cytogenetic Data Notated in The International System for Human Cytogenetic Nomenclature (ISCN 1995). In Proc. of NETTAB "Network Tools and Applications in Biology" Conference, Genoa, Italy.
Fellenberg, K., Hauser, N.C., Brors, B., Hoheisel, J.D. and Vingron, M. (2002) Microarray data warehouse allowing for inclusion of experiment annotations in statistical analysis. Bioinformatics 18: pp. 423-433.
Fritz, B., Schubert, F., Wrobel, G., Schwaenen, C., Wessendorf, S., Nessling, M., Korz, C., Rieker, R.J., Montgomery, K., Kucherlapati, R., Mechtersheimer, G., Eils, R., Joos, S. and Lichter, P. (2002) Microarray-based copy number and expression profiling in dedifferentiated and pleomorphic liposarcoma. Cancer Res. 62: pp. 2993-2998.
Schoch, C., Kohlmann, A., Schnittger, S., Brors, B., Dugas, M., Mergenthaler, S., Kern, W., Hiddemann, W., Eils, R. and Haferlach, T. (2002) Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proc. Natl. Acad. Sci. U.S.A. 99: pp. 10008-10013.
Fellenberg, K., Hauser, N.C., Brors, B., Neutzner, A., Hoheisel, J.D. and Vingron, M. (2001) Correspondence analysis applied to microarray data. Proc. Natl. Acad. Sci. U.S.A. 98: pp. 10781-10786.
Dubitzky, W., Granzow, M. and Berrar, D. (2001) Data Mining and Machine Learning Methods for Microarray Analysis. In: Lin, S.M., Johnson, K.F. (eds.) Methods of Microarray Data Analysis - Papers from CAMDA 2000, Boston. Kluwer Academic Publishers.
Dubitzky, W., Berrar, D., Granzow, M. and Eils, R. (2001) Detecting Broad-Band and Selective Correlation Patterns among Gene Expression and Drug Activity Data. In: Proc. of Critical Assessment of Techniques for Microarray Data Mining: pp. 17-22, Duke Univ., NC, US.
Berrar, D., Dubitzky, W., Solinas-Toldo, S., Bulashevska, S., Granzow, M., Conrad, C., Kalla, J., Lichter, P. and Eils, R. (2001) Design and Implementation of a Database System for Comparative Genomic Hybridization Analysis. IEEE Eng. Med. Biol. 20: pp. 75-83.
Berrar, D., Dubitzky, W., Granzow, M. and Eils, R. (2001) Analysis of Gene Expression and Drug Activity Data by Knowledge-based Association Mining. In: Proc. of Critical Assessment of Techniques for Microarray Data Mining: pp. 23-28, Duke Univ., NC, US.
Granzow, M., Berrar, D., Dubitzky, W., Schuster, A., Azuaje, F. and Eils, R. (2001) Tumor Classification by Gene Expression Profiling: Comparison and Validation of Five Clustering Methods. ACM SIGBIO Newsl. 21: pp. 16-22.
Bulashevska, S., Dubitzky, W. and Eils, R. (2000) Mining Gene Expression Data using Rough Set Theory. In: Proceedings of Critical Assessment of Techniques for Microarray Data Analysis (CAMDA'00 Conference), Duke University, NC, US.
Dubitzky, W., Granzow, M., Berrar, D., Bulashevska, S., Conrad, C., Gerlich, D. and Eils, R. (2000) A Comparison of Symbolic and Subsymbolic Machine Learning Approaches to Molecular Classification of Cancer and Gene Identification. In: Proceedings of Critical Assessment of Techniques for Microarray Data Analysis (CAMDA'00 Conference), Duke University, NC, US.
Schuster, A., Dubitzky, W., Azuaje, F., Granzow, M., Berrar, D. and Eils, R. (2000) Tumor Identification by Gene Expression Profiles: A Comparison of Five Different Clustering Methods. In: Proceedings of Critical Assessment of Techniques for Microarray Data Analysis (CAMDA2000). Duke Univ., NC, US.
Beißbarth, T., Fellenberg, K., Brors, B., Arribas-Prat, R., Boer, J.M., Hauser, N.C., Scheideler, M., Hoheisel, J.D., Schütz, G., Poustka, A. and Vingron, M. (2000) Processing and quality control of DNA array hybridization data. Bioinformatics Vol. 16: pp. 1014-1022.
Brors, B. (2000) Qualitätsstandards für DNA Chip-Analysen (in German). Medizinische Genetik Vol. 12: pp. 301-303.
