Projects

Evolutionary genomics: Modeling and prediction of progression of breast and lung cancer

Grant NCN: Opus 2021/41/B/NZ2/04134
Total budget [PLN]: 1 474 980
Start date: 01/02/2022; End date: 01/02/2026
Status: 33 months left

Introduction
From a genomic viewpoint, cancer evolution is driven by two types of events: point mutations (along with insertions and deletions) and copy number alterations, including major genomic rearrangements. In bulk sequencing data, these events are reflected by changes in the numbers of reference and variant reads. Recent mathematical and computational approaches mostly estimate clusters of clones that represent major genome transformation events, and the evolution of those clusters. We recently published a methodological paper that provides a method for rigorously inferring the parameters characterizing tumor evolution, based on analysis of site frequency spectra (SFS) computed from sequencing data from human tumors. This sampling theory seems to be unique in the literature.

As detailed in the Preliminary Studies section of the proposal, we also have documented experience in building population genetics and population dynamics models of cancer and drawing medically relevant conclusions from them, as well as in data analysis and the development of the mathematical tools underlying the models. In the current proposal, we plan to follow up on this recent work and past experience in modeling cancer progression based on molecular, demographic and epidemiological evidence. The main methodological challenge involves mathematical transformations of the SFS under various genomic rearrangements and selection patterns. We plan the following specific aims, both methodological and applied.

Aim 1: To study SFS for the clonal selective sweeps model of Dinh et al. (2020), extended to include chromosomal rearrangements and copy number variants (Mode 1 selection). To extend the SFS study to Mode 2 selection models using two adaptations of the Tug-of-War model of McFarland.

Aim 2: To obtain high-coverage (100×) DNA whole exome sequencing (WES) data from specific types of breast cancer (primary tumor and cancerous lymph node) and lung cancer (specimens from different regions of the primary tumor), supplemented by low-coverage (1×) DNA whole genome sequencing (WGS) data. To obtain WES-based tables of somatic mutations and WGS-based estimates of copy number variation, as well as coordinated clone pedigrees of the specimens from different locations of the tumor. To compare these with data from public databases such as TCGA, in order to determine whether population-specific mutational and evolutionary patterns are present in the data at our disposal.

Aim 3: To develop methods for estimating the parameters of tumor clonal evolution (Modes 1 and 2), including the times of the mutations initiating selective sweeps and the clonal growth rates, based on analyses of SFS. To compute the estimates from the data obtained in Aim 2. To validate the estimates using current statistical models of breast and lung cancer progression based on demography and epidemiology.
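As an illustration of the basic quantity behind these aims, the sketch below shows one simple way an empirical SFS could be tabulated from reference and variant read counts in bulk sequencing data. It is a minimal sketch, not the project's estimation method: the function name `site_frequency_spectrum`, the minimum-depth filter `min_depth`, the number of bins `n_bins` and the example read counts are all assumptions made for illustration.

```python
from typing import List, Tuple


def site_frequency_spectrum(
    read_counts: List[Tuple[int, int]],
    n_bins: int = 10,
    min_depth: int = 10,
) -> List[int]:
    """Tabulate a binned SFS from (reference_reads, variant_reads) pairs.

    Each site contributes its variant allele frequency
    VAF = variant_reads / (reference_reads + variant_reads).
    Bin k counts sites with k/n_bins <= VAF < (k+1)/n_bins;
    VAF == 1 is placed in the last bin.
    min_depth and n_bins are illustrative choices (assumptions).
    """
    sfs = [0] * n_bins
    for ref_reads, var_reads in read_counts:
        depth = ref_reads + var_reads
        # Skip poorly covered sites and sites with no variant reads.
        if depth < min_depth or var_reads == 0:
            continue
        # Integer arithmetic avoids floating-point edge effects at bin borders.
        k = min(var_reads * n_bins // depth, n_bins - 1)
        sfs[k] += 1
    return sfs


# Hypothetical read counts, for illustration only.
example_counts = [
    (90, 10),   # VAF 0.10
    (75, 25),   # VAF 0.25
    (80, 20),   # VAF 0.20
    (50, 50),   # VAF 0.50
    (0, 40),    # VAF 1.00
    (5, 2),     # depth 7, below min_depth, skipped
    (100, 0),   # no variant reads, skipped
]
example_sfs = site_frequency_spectrum(example_counts)
print(example_sfs)
```

In practice, the read counts would come from the tables of somatic mutations described in Aim 2, and copy number alterations change how a given VAF relates to the fraction of cells carrying the variant, which is part of what the proposed SFS transformations are meant to address.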
Goal
We plan to develop new mathematical and computational machine-learning (ML) methods for estimating the evolution of solid tumors based on the distribution of genomic variants at diagnosis. The topic is of current interest, as it is believed that deciphering the past of tumors leads to an understanding of the causes of their growth and progression. The hypothesis we plan to demonstrate is that the timeline of cancer progression before diagnosis, which is not observed, can be estimated quantitatively from molecular data at diagnosis. The outcome may influence policies of early detection and prevention, which are of profound public health importance. In the planned research, we will focus on two of the most common and deadly cancers, lung cancer and breast cancer, which also seem to represent two different models of clonal evolution, as explained further on. To validate the mathematical methods, we plan to collect and sequence DNA, and we will also use publicly available databases such as TCGA.
Tasks
1. Studying SFS for the clonal selective sweeps model extended to include chromosomal rearrangements and copy number variants (Mode 1 selection)
2. Branching process models for different selection modes
3. Extension of the SFS study to models in which the non-driver mutations are slightly deleterious (Tug-of-War model of McFarland); Mode 2 selection
4. Obtaining WES and WGS DNA sequencing data from specific types of breast cancer (primary tumor and cancerous lymph node) and lung cancer (specimens from different regions of the primary tumor)
5. Processing data from WES and WGS sequencing. Obtaining tables of somatic mutations and estimates of copy number variation, as well as coordinated clone pedigrees of the specimens
6. Comparison against data from public databases, such as TCGA, to determine the presence of population-specific mutational and evolutionary patterns in the data at our disposal
7. Refinement of the scaling algorithm for lung cancer (data from the TCGA database and sequencing results from Tasks 4 and 5)
8. Computation of the estimates based on the sequencing data obtained for breast cancer (Tasks 4 and 5)
9. Validation using current statistical models of breast and lung cancer progression based on demography and epidemiology
Contractors

Project manager

Marek Kimmel

Contractors

Results
Articles