Develop computational models to predict the regulatory effects of sequence variation genome-wide and gain insight into the mechanisms of regulatory variation in specific diseases.
Experimental data are continuously incorporated into our contextual analysis of transcription factor occupancy (CATO) scores for functional non-coding variation affecting TF activity (Maurano et al., Nat Genet. 2015). These data extend the modeling approach with quantitative information that links regulatory variants to expression levels and chromatin structure, providing a rich dataset connecting sequence to function. In turn, this should enable the identification of functional non-coding variants and accurate prediction of which nearby genes are affected and how.
We also apply linear regression models to explain and predict complex transcriptional behavior. Read more in Brosh et al., bioRxiv 2022.
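As a rough illustration of the kind of linear regression mentioned above, the sketch below fits an ordinary least-squares model that explains an expression readout from a binary feature matrix. It is a minimal sketch under stated assumptions, not the model from Brosh et al.: the function names, the features (presence or absence of hypothetical regulatory elements), the effect sizes, and the data are all synthetic and invented for illustration.

```python
import numpy as np


def fit_linear_model(features, expression):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return beta[0], beta[1:]


def predict(features, intercept, coefficients):
    """Predicted expression for each row of the feature matrix."""
    return intercept + features @ coefficients


# Synthetic data only (assumption): 200 hypothetical constructs, each carrying
# or lacking 3 hypothetical regulatory elements, with made-up effect sizes.
rng = np.random.default_rng(0)
n_constructs, n_elements = 200, 3
features = rng.integers(0, 2, size=(n_constructs, n_elements)).astype(float)
true_intercept = 1.0
true_effects = np.array([2.0, -0.5, 1.5])
noise = rng.normal(0.0, 0.1, size=n_constructs)
expression = true_intercept + features @ true_effects + noise

intercept, effects = fit_linear_model(features, expression)
predicted = predict(features, intercept, effects)

# Fraction of variance in expression explained by the fitted model.
residual = np.sum((expression - predicted) ** 2)
total = np.sum((expression - expression.mean()) ** 2)
r_squared = 1.0 - residual / total
```

Each fitted coefficient can be read as the estimated additive contribution of one element to expression, which is what makes a model of this form useful for explanation as well as prediction.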