R Statistical Application Development by Example Beginner's Guide

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Data Characteristics

Questionnaire and its components

Experiments with uncertainty in computer science

R installation

Continuous distribution

Summary

Import/Export Data

data.frame and other formats

Time for action – understanding constants, vectors, and basic arithmetic

Time for action – matrix computations

Time for action – creating a list object

Time for action – creating a data.frame object

Summary

Data Visualization

Visualization techniques for categorical data

Time for action – bar charts in R

Time for action – dot charts in R

Time for action – the spine plot for the shift and operator data

Time for action – the mosaic plot for the Titanic dataset

Visualization techniques for continuous variable data

Time for action – using the boxplot

Time for action – understanding the effectiveness of histograms

Time for action – plot and pairs R functions

A brief peek at ggplot2

Time for action – qplot

Time for action – ggplot

Summary

Exploratory Analysis

Essential summary statistics

Time for action – the essential summary statistics for "The Wall" dataset

The stem-and-leaf plot

Time for action – the stem function in play

Letter values

Data re-expression

Bagplot – a bivariate boxplot

Time for action – the bagplot display for a multivariate dataset

The resistant line

Time for action – the resistant line as a first regression model

Smoothing data

Time for action – smoothing the cow temperature data

Median polish

Time for action – the median polish algorithm

Summary

Statistical Inference

Maximum likelihood estimator

Time for action – visualizing the likelihood function

Time for action – finding the MLE using mle and fitdistr functions

Confidence intervals

Time for action – confidence intervals

Hypothesis testing

Time for action – testing the probability of success

Time for action – testing proportions

Time for action – testing one-sample hypotheses

Time for action – testing two-sample hypotheses

Summary

Linear Regression Analysis

The simple linear regression model

Time for action – the arbitrary choice of parameters

Time for action – building a simple linear regression model

Time for action – ANOVA and the confidence intervals

Time for action – residual plots for model validation

Multiple linear regression model

Time for action – averaging k simple linear regression models

Time for action – building a multiple linear regression model

Time for action – the ANOVA and confidence intervals for the multiple linear regression model

Time for action – residual plots for the multiple linear regression model

Regression diagnostics

The multicollinearity problem

Time for action – addressing the multicollinearity problem for the Gasoline data

Model selection

Time for action – model selection using the backward, forward, and AIC criteria

Summary

The Logistic Regression Model

The binary regression problem

Time for action – limitations of linear regression models

Probit regression model

Time for action – understanding the constants

Logistic regression model

Time for action – fitting the logistic regression model

Time for action – the Hosmer-Lemeshow goodness-of-fit statistic

Model validation and diagnostics

Time for action – residual plots for the logistic regression model

Time for action – diagnostics for the logistic regression

Receiver operating characteristic curves

Time for action – ROC construction

Logistic regression for the German credit screening dataset

Time for action – logistic regression for the German credit dataset

Summary

Regression Models with Regularization

The overfitting problem

Time for action – understanding overfitting

Regression spline

Time for action – fitting piecewise linear regression models

Time for action – fitting the spline regression models

Ridge regression for linear models

Time for action – ridge regression for the linear regression model

Ridge regression for logistic regression models

Time for action – ridge regression for the logistic regression model

Another look at model assessment

Time for action – selecting lambda iteratively and other topics

Summary

Classification and Regression Trees

Recursive partitions

Time for action – partitioning the display plot

Time for action – building our first tree

The construction of a regression tree

Time for action – the construction of a regression tree

The construction of a classification tree

Time for action – the construction of a classification tree

Classification tree for the German credit data

Time for action – the construction of a classification tree

Pruning and other finer aspects of a tree

Time for action – pruning a classification tree

Summary

CART and Beyond

Improving CART

Time for action – cross-validation predictions

Bagging

Time for action – understanding the bootstrap technique

Time for action – the bagging algorithm

Random forests

Time for action – random forests for the German credit data

The consolidation

Time for action – random forests for the low birth weight data

Summary

References

Index