
Applied Statistical Genetics with R
for Population-based Association Studies

Series: Springer Use R!
Author: Andrea S. Foulkes


About this book

This book is intended to provide fundamental statistical concepts and R tools relevant to the analysis of genetic data arising from population-based association studies. The statistical methods described are broadly relevant to the field of statistical genetics and include a large array of tools for a wide variety of medical and public health applications. Data analytic methods include approaches to handling multiplicity, ambiguity in haplotypic phase and underlying gene-gene and gene-environment interactions. Several publicly available data sets are used for illustration.

Table of contents

1 Genetic association studies 
    1.1  Overview of population-based investigations 
        1.1.1  Types of investigations 
        1.1.2  Genotype versus gene expression 
        1.1.3  Population versus family-based investigations
        1.1.4  Association versus population genetics 
    1.2  Data components and terminology
        1.2.1  Genetic information
        1.2.2  Traits
        1.2.3  Covariates  
    1.3  Analytic challenges 
        1.3.1  Complex disease association studies
        1.3.2  HIV genotype association studies
        1.3.3  Publicly available data used throughout the text
    Problems 

2  Elementary Statistical Principles
    2.1  Background
        2.1.1  Notation and basic probability concepts
        2.1.2  Important epidemiological concepts
    2.2  Measures and tests of association
        2.2.1  Contingency table analysis for a binary trait
        2.2.2  M-sample tests for a quantitative trait
        2.2.3  Generalized linear model
    2.3  Analytic challenges
        2.3.1  Multiplicity and high dimensionality
        2.3.2  Missing and unobservable data considerations
        2.3.3  Race and ethnicity
        2.3.4  Genetic models and models of association 
    Problems 

3  Genetic data concepts and tests
    3.1  Linkage disequilibrium (LD)
        3.1.1  Measures of LD: D' and r^2
        3.1.2  LD blocks and SNP tagging
        3.1.3  LD and population stratification
    3.2  Hardy-Weinberg equilibrium (HWE) 
        3.2.1  Pearson's χ^2 and Fisher's exact test
        3.2.2  HWE and population substructure 
    3.3  Quality control and preprocessing
        3.3.1  SNP chips 
        3.3.2  Genotyping errors
        3.3.3  Identifying population substructure
        3.3.4  Relatedness
        3.3.5  Accounting for unobservable substructure
    Problems

4  Multiple comparison procedures
    4.1  Measures of error
        4.1.1  Family-wise error rate
        4.1.2  False discovery rate
    4.2  Single-step and step-down adjustments
        4.2.1  Bonferroni adjustment 
        4.2.2  Tukey and Scheffé tests
        4.2.3  False discovery rate control
        4.2.4  The q-value
    4.3  Resampling-based methods
        4.3.1  Free step-down resampling
        4.3.2  Null unrestricted bootstrap
    4.4  Alternative paradigms
        4.4.1  Effective number of tests
        4.4.2  Global tests
    Problems

5  Methods for unobservable phase
    5.1  Haplotype estimation
        5.1.1  An expectation-maximization algorithm
        5.1.2  Bayesian haplotype reconstruction
    5.2  Estimating and testing for haplotype-trait association 
        5.2.1  Two-stage approaches
        5.2.2  Fully likelihood-based approach
    Problems
    Supplemental notes
    Supplemental R scripts

6  Classification and regression trees
    6.1  Building a tree
        6.1.1  Recursive partitioning 
        6.1.2  Splitting rules
        6.1.3  Defining inputs
    6.2  Optimal trees
        6.2.1  Honest estimates 
        6.2.2  Cost-complexity pruning
    Problems

7  Additional topics in high-dimensional data analysis
    7.1  Random forests
        7.1.1  Variable importance
        7.1.2  Missing data methods
        7.1.3  Covariates
    7.2  Logic regression
    7.3  Multivariate adaptive regression splines
    7.4  Bayesian variable selection
    7.5  Further readings
    Problems

Appendix R basics 
    A.1  Getting started
    A.2  Types of data objects
    A.3  Importing data
    A.4  Managing data
    A.5  Installing packages
    A.6  Additional help

References

Glossary of terms 

Glossary of select R packages

Subject Index

Index of R Functions and Packages 

Links

    A.S. Foulkes homepage
    ASG home
    Data
    Sample Chapter
    Examples (R code)
    UseR! 2009 Tutorial Handout

