Genome-wide association study on gene pathway identification and cognitive function prediction
- Li Xing, University of Saskatchewan
- Kyle Gardiner, University of Saskatchewan
Undergraduate Team Members
- Hanye Zhong, University of British Columbia
- Mathew Zbitniff, University of Saskatchewan
- Roham Asgari, University of Saskatchewan
We will work with large-scale genomic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which provides millions of genotype variables and multiple cognitive outcomes, along with information on participants' demographic characteristics and socioeconomic status.
Using this rich data source, we will pursue two aims: (1) to build a machine learning model that predicts multiple cognitive outcomes simultaneously; and (2) to identify gene pathways linked to participants' cognitive functions. To realize Aim 1, we will extend our previous research output (a peer-reviewed publication in Bioinformatics and an R package on CRAN) so that the model accommodates data of much higher dimension. To realize Aim 2, we will use gene network analysis toolkits to determine the pathways associated with the important genes identified by the prediction model.
The project offers both training value and practical impact. Students will learn cloud computing and how to manage large-scale genomic data, skills that regular courses do not cover but that are essential for analyzing real-world data. They will work in a team whose members have different expertise, and they are expected to share their knowledge and brainstorm new ideas to overcome difficulties. These technical and soft skills will help them grow into data scientists. The research outputs are intended to support the diagnosis and prevention of diseases related to cognitive function, such as Alzheimer’s.
This VXML project was completed by the participants listed above, as described in the following documents.
- Expected team size: 2
- Student Experience Level: Intermediate: students who have had an introduction to proofs
- STAT 345 Design and analysis of experiments
- STAT 344 Linear regression course
- R programming
- GitHub for version control
- RMarkdown for reproducible research
- Basic understanding of cloud computing