About this book
This book is intended to provide fundamental statistical concepts and R tools for the analysis of genetic data arising from population-based association studies. The statistical methods described apply broadly to the field of statistical genetics and include a large array of tools for a wide variety of medical and public health applications. The data-analytic methods covered include approaches to handling multiplicity, ambiguity in haplotypic phase, and underlying gene-gene and gene-environment interactions. Several publicly available data sets are used for illustration.
Table of contents
1 Genetic association studies
1.1 Overview of population-based investigations
1.1.1 Types of investigations
1.1.2 Genotype versus gene expression
1.1.3 Population versus family-based investigations
1.1.4 Association versus population genetics
1.2 Data components and terminology
1.2.1 Genetic information
1.2.2 Traits
1.2.3 Covariates
1.3 Analytic challenges
1.3.1 Complex disease association studies
1.3.2 HIV genotype association studies
1.3.3 Publicly available data used throughout the text
Problems
2 Elementary statistical principles
2.1 Background
2.1.1 Notation and basic probability concepts
2.1.2 Important epidemiological concepts
2.2 Measures and tests of association
2.2.1 Contingency table analysis for a binary trait
2.2.2 M-sample tests for a quantitative trait
2.2.3 Generalized linear model
2.3 Analytic challenges
2.3.1 Multiplicity and high dimensionality
2.3.2 Missing and unobservable data considerations
2.3.3 Race and ethnicity
2.3.4 Genetic models and models of association
Problems
3 Genetic data concepts and tests
3.1 Linkage disequilibrium (LD)
3.1.1 Measures of LD: D′ and r²
3.1.2 LD blocks and SNP tagging
3.1.3 LD and population stratification
3.2 Hardy-Weinberg equilibrium (HWE)
3.2.1 Pearson's χ² and Fisher's exact test
3.2.2 HWE and population substructure
3.3 Quality control and preprocessing
3.3.1 SNP chips
3.3.2 Genotyping errors
3.3.3 Identifying population substructure
3.3.4 Relatedness
3.3.5 Accounting for unobservable substructure
Problems
4 Multiple comparison procedures
4.1 Measures of error
4.1.1 Family-wise error rate
4.1.2 False discovery rate
4.2 Single-step and step-down adjustments
4.2.1 Bonferroni adjustment
4.2.2 Tukey and Scheffé tests
4.2.3 False discovery rate control
4.2.4 The q-value
4.3 Resampling-based methods
4.3.1 Free step-down resampling
4.3.2 Null unrestricted bootstrap
4.4 Alternative paradigms
4.4.1 Effective number of tests
4.4.2 Global tests
Problems
5 Methods for unobservable phase
5.1 Haplotype estimation
5.1.1 An expectation-maximization algorithm
5.1.2 Bayesian haplotype reconstruction
5.2 Estimating and testing for haplotype-trait association
5.2.1 Two-stage approaches
5.2.2 Fully likelihood-based approach
Problems
Supplemental notes
Supplemental R scripts
6 Classification and regression trees
6.1 Building a tree
6.1.1 Recursive partitioning
6.1.2 Splitting rules
6.1.3 Defining inputs
6.2 Optimal trees
6.2.1 Honest estimates
6.2.2 Cost-complexity pruning
Problems
7 Additional topics in high-dimensional data analysis
7.1 Random forests
7.1.1 Variable importance
7.1.2 Missing data methods
7.1.3 Covariates
7.2 Logic regression
7.3 Multivariable adaptive regression splines
7.4 Bayesian variable selection
7.5 Further readings
Problems
Appendix R basics
A.1 Getting started
A.2 Types of data objects
A.3 Importing data
A.4 Managing data
A.5 Installing packages
A.6 Additional help
References
Glossary of terms
Glossary of select R packages
Subject Index
Index of R Functions and Packages