Genome wide association studies

Background

Genome Wide Association Studies (GWAS) make use of genetic data that are a collection of genetic variants such as single nucleotide polymorphism (SNP) across large population. Such data can be used to find which genetic variants are associated with certain phenotype traits.

Analysis requires two sources of information, genetic variant data for instance SNP for multiple individuals and data about their phenotype. The goal is to find which regions in the genome measured by means of SNPs co-vary with certain trait.

Data

I will be using HapMap data cite:HapMap2010 which can be downloaded from link. The map file contains SNP names, and ped file is for genotype information.

hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.map
hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.ped

I will use R package GenABEL cite:Aulchenko2007. First convert data to GenABEL format

  library(GenABEL)
  convert.snp.ped(
      pedfile = '/home/mateusz/Downloads/GWAS-Phaselll-01-2009/hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.ped',
      mapfile = '/home/mateusz/Downloads/GWAS-Phaselll-01-2009/hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.map',
      outfile = '/home/mateusz/Downloads/GWAS-Phaselll-01-2009/hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.out')

Load phenotype data, here sex and pedigree

  library(GenABEL)
  # Load tab-delimited file with phenotype data and fix factor levels
  pheno = read.table('/home/mateusz/Downloads/GWAS-Phaselll-01-2009/relationships_w_pops_121708.txt',
                     header = TRUE,stringsAsFactors = FALSE)
  pheno$sex = pheno$sex -1
  write.table(pheno,
              '/home/mateusz/Downloads/GWAS-Phaselll-01-2009/pheno.txt')

  # Load combined data
  data = load.gwaa.data(
      phenofile= "/home/mateusz/Downloads/GWAS-Phaselll-01-2009/pheno.txt",
      genofile = "/home/mateusz/Downloads/GWAS-Phaselll-01-2009/hapmap3_pop/hapmap3_r2_b36_fwd.CEU.qc.poly.out",
      id='IID')

  # Quality check
  qc = check.marker(data)

  # Select samples and SNPs that are ok by default filtering
  fdata = data[qc$idok,qc$snpok]
  mdata = phdata(fdata)

  nids(fdata)  # Number of individuals
  nsnps(fdata) # Number of SNPs

  # Fit logistic model, makes no sense, but just test the method
  result = scan.glm("sex~CRSNP",
                 data=fdata,
                 family=binomial(),
                 snps=(1:1000)) # Otherwise too long for this test

  result$map[result$P1df < 0.05] # P-values testing association to select SNPs

bibliographystyle:unsrtnat bibliography:gwas.bib

 Share!

 
comments powered by Disqus