Projects

Development of methods for bias correction of gene expression measurements

Grant NCN: Sonata 2016/23/D/ST7/03665
Start date: 25/08/2017; End date: 25/08/2020
Status: 31 month/s left

Introduction
Modern medicine increasingly draws on the information stored in patients' genomes and associates the expression levels of certain genes with specific diseases and treatment plans. However, in both basic research and diagnostics one has to take into account that measurements of gene expression levels depend significantly on structural features of the RNA fragments studied and on the specifics of the measurement method used [1]. For this reason, comparing signal levels obtained for different RNA fragments, or for identical fragments in different samples, can lead to false conclusions. Differences in the structure of nucleic acids significantly reduce the accuracy of molecular biology experiments and may reduce the sensitivity of detecting differentially expressed genes [2]. They can also lead to false positives when detecting overrepresented sequence fragments in ChIP-on-chip and other experiments [3].

Factors that affect signal levels in gene expression profiling mainly include differences in the nucleotide composition and length of the tested RNA fragments, as well as differences in the degradation rate of the studied molecules. All of these factors affect both cDNA synthesis and amplification of the tested genetic material [1]. Factors associated with specific research methods are also of crucial importance, including differences in the nucleotide composition of microarray probes, which affect the level of non-specific hybridization, and the presence of certain motifs within the probe sequences. This group also includes the use of specific reagents, for example oligo-dT primers for cDNA synthesis, which introduce a bias of their own [4].

Existing methods of bias correction often focus on a single factor and are specific to a particular method and research platform [3, 5-8]. Because their correction algorithms differ, the various methods cannot be combined. In addition, most of the factors that affect signal levels are currently too poorly described to allow effective signal correction. Our preliminary studies show that the influence of the described factors should not be overlooked [4], and this also applies to experiments involving a large number of technical or biological replicates. Additionally, we were able to demonstrate that, when detecting differentially expressed genes, even simple statistical methods that take into account the structure of the probes and of the examined RNA fragments yield much better results [2].

In this project we plan to develop a comprehensive method for reducing bias in gene expression level estimates. The method will be based on mathematical models of the associated biochemical processes and will be implemented as publicly available software. We are planning to create a set of input-output models, one for each stage of the measurement procedure, with the possibility of estimating certain parameter values or determining them experimentally. These models will be used to create correction curves for subsequent processing steps. Finally, a comprehensive mathematical model will be built from the input-output models obtained for each step.

The result of the project will be a new data processing method applicable to experiments in which nucleic acid concentrations are estimated, as well as a methodology for assessing the impact of technical factors on the obtained measurements. Implementing these methods as publicly available software can help to improve the accuracy of such measurements, increasing their usefulness in basic research and diagnostic studies.

1. Jaksik, R., et al., Microarray experiments and factors which affect their reliability. Biology Direct, 2015. 10.
2. Jaksik, R., W. Bensz, and J. Smieja, Nucleotide Composition Based Measurement Bias in High Throughput Gene Expression Studies, in Man–Machine Interactions 4, A. Gruca, et al., Editors. 2015, Springer International Publishing. p. 205-214.
3. Royce, T.E., J.S. Rozowsky, and M.B. Gerstein, Assessing the need for sequence-based normalization in tiling microarray experiments. Bioinformatics, 2007. 23(8): p. 988-97.
4. Jaksik, R., et al., Sources of High Variance between Probe Signals in Affymetrix Short Oligonucleotide Microarrays. Sensors, 2014. 14(1): p. 532-548.
5. Hulsman, M., et al., Delineation of amplification, hybridization and location effects in microarray data yields better-quality normalization. BMC Bioinformatics, 2010. 11: p. 156.
6. Fasold, M. and H. Binder, AffyRNADegradation: control and correction of RNA quality effects in GeneChip expression data. Bioinformatics, 2013. 29(1): p. 129-31.
7. Benjamini, Y. and T.P. Speed, Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res, 2012. 40(10): p. e72.
8. Gao, L., et al., Length bias correction for RNA-seq data in gene set analyses. Bioinformatics, 2011. 27(5): p. 662-9.
Goal
The primary goal of this project is to develop a comprehensive method for reducing bias in gene expression level estimates. The method will be based on mathematical models of the associated biochemical processes and will be implemented as publicly available software. We are planning to create a set of input-output models, one for each stage of the measurement procedure, with the possibility of estimating certain parameter values or determining them experimentally. These models will be used to create correction curves for subsequent processing steps. Finally, a comprehensive mathematical model will be built from the individual input-output models obtained for each step.
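As a rough illustration of chaining per-stage input-output models, the sketch below composes two hypothetical stage models and inverts the chain to correct a measured signal. The stage names, the functional forms (a GC-dependent efficiency for cDNA synthesis and a length-dependent gain for amplification) and all numeric parameters are assumptions made for illustration only; they are not the models developed in the project.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fragment:
    """Hypothetical description of an RNA fragment (illustrative fields only)."""
    gc_content: float  # fraction of G/C nucleotides, between 0 and 1
    length: int        # fragment length in nucleotides


# A stage model maps (input amount, fragment) to an output amount.
StageModel = Callable[[float, Fragment], float]


def cdna_synthesis(amount: float, frag: Fragment) -> float:
    # Assumed form: efficiency drops linearly as GC content departs from 0.5.
    efficiency = 1.0 - 0.8 * abs(frag.gc_content - 0.5)
    return amount * efficiency


def amplification(amount: float, frag: Fragment) -> float:
    # Assumed form: longer fragments amplify less efficiently per cycle.
    cycles = 10
    per_cycle = 1.0 + 0.9 * math.exp(-frag.length / 2000.0)
    return amount * per_cycle ** cycles


def total_gain(stages: List[StageModel], frag: Fragment) -> float:
    """Gain of the whole chain for one unit of input.

    Valid here because every stage in this sketch is linear in the amount.
    """
    amount = 1.0
    for stage in stages:
        amount = stage(amount, frag)
    return amount


def correct(measured: float, stages: List[StageModel], frag: Fragment) -> float:
    """Estimate the input amount by dividing out the modelled chain gain."""
    return measured / total_gain(stages, frag)
```

With `stages = [cdna_synthesis, amplification]`, a signal simulated as `3.0 * total_gain(stages, frag)` is mapped back to approximately 3.0 by `correct`. Dividing by the chain gain only works because these toy stages are linear in the input amount; nonlinear stage models would need a numerical inversion instead.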
Tasks
Identification of factors that affect signal levels in gene expression level studies
Development of a mathematical model that describes the influence of individual factors on the signal levels
Estimation of the model parameters based on data obtained using custom oligonucleotide microarrays and RT-qPCR experiments
Development of a data correction algorithm based on the created mathematical model
Validation of the method using publicly available test data
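The parameter estimation and correction tasks can be pictured with a deliberately simple example: fit a straight line relating log signal to probe GC content, then subtract the fitted trend. This is only a sketch under a linear-bias assumption; the function names are hypothetical, and the project's actual model and estimation procedure are not specified here.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def gc_content(sequence: str) -> float:
    """Fraction of G and C nucleotides in a probe or fragment sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def correct_gc_bias(sequences: Sequence[str],
                    log_signals: Sequence[float]) -> List[float]:
    """Remove the fitted linear GC trend while keeping the mean signal level."""
    gc = [gc_content(s) for s in sequences]
    _, slope = fit_line(gc, log_signals)
    gc_mean = mean(gc)
    # Subtract only the GC-dependent part, centred on the mean GC content.
    return [y - slope * (g - gc_mean) for g, y in zip(gc, log_signals)]
```

For probes whose log signals depend on GC content exactly linearly, the corrected values all collapse to the mean signal. Real data would also carry the true expression differences, so in practice the trend would have to be estimated from probes expected to share the same target concentration.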
Contractors

Project manager

Roman Jaksik

Contractors

Anna Lalik, Krzysztof Puszynski

Results
Articles