Statistical theory

The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics.^[1]^[2] The theory covers approaches to statistical-decision problems and to statistical inference, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures.^[2]^[3]

Apart from philosophical considerations about how to make statistical inferences and decisions, much of statistical theory consists of mathematical statistics, and is closely linked to probability theory, to utility theory, and to optimization.

Scope

Statistical theory provides an underlying rationale and a consistent basis for the choice of methodology used in applied statistics.

Modelling

Statistical models describe the sources of data and can have different types of formulation corresponding to these sources and to the problem being studied. Such problems can be of various kinds:

Sampling from a finite population
Measuring observational error and refining procedures
Studying statistical relations

Statistical models, once specified, can be tested to see whether they provide useful inferences for new data sets.^[4] Testing a hypothesis using the data that was used to specify the model is a fallacy, according to the natural science of Bacon and the scientific method of Peirce.
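The idea of testing a specified model on new data can be illustrated with a hold-out split. The following is a minimal sketch, not a prescribed procedure: the simulated data, the straight-line model, and the helper names `fit_line` and `mean_squared_error` are all assumptions made for illustration. The model is fitted on one half of the data and its prediction error is measured only on the other half.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_squared_error(a, b, xs, ys):
    """Average squared prediction error of the line a + b*x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Simulated data (an assumption for this sketch): y = 1 + 2x + unit-variance noise.
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Randomly split the observations into a fitting half and a held-out half.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:100], indices[100:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Specify and fit the model on the first half only ...
a, b = fit_line(train_x, train_y)
# ... and judge it on data that played no part in the fit.
holdout_mse = mean_squared_error(a, b, test_x, test_y)
print(f"intercept={a:.2f} slope={b:.2f} hold-out MSE={holdout_mse:.2f}")
```

Because the held-out observations are not used in fitting, the hold-out error is not flattered by the model having been tuned to the same data.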

Data collection

Statistical theory provides a guide to comparing methods of data collection, where the problem is to generate informative data using optimization and randomization while measuring and controlling for observational error.^[5]^[6]^[7] Optimization of data collection reduces the cost of data while satisfying statistical goals,^[8]^[9] while randomization allows reliable inferences. Statistical theory provides a basis for good data collection and the structuring of investigations in the topics of:

Design of experiments to estimate treatment effects, to test hypotheses, and to optimize responses.^[8]^[10]^[11]
Survey sampling to describe populations^[12]^[13]^[14]
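The role of randomization in both topics can be sketched in a few lines. The example below is illustrative only: the function names `randomize_assignment` and `simple_random_sample` and the unit labels are hypothetical, and real designs often add blocking, stratification, or unequal allocation.

```python
import random

def randomize_assignment(units, treatments, seed=None):
    """Completely randomized design: shuffle the units, then deal them out
    to the treatments in turn, so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

def simple_random_sample(population, n, seed=None):
    """Simple random sample of size n, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical experimental units and sampling frame.
plots = [f"plot-{i}" for i in range(1, 13)]
design = randomize_assignment(plots, ["treatment", "control"], seed=2024)
sampled_households = simple_random_sample(range(1, 1001), 25, seed=2024)
print(design)
print(sorted(sampled_households))
```

Fixing the seed makes the allocation reproducible for audit; in practice the seed would be chosen and recorded before the study begins.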

Summarising data

The task of summarising statistical data in conventional forms (also known as descriptive statistics) is considered in theoretical statistics as a problem of defining what aspects of statistical samples need to be described and how well they can be described from a typically limited sample of data. Thus the problems theoretical statistics considers include:

Choosing summary statistics to describe a sample
Summarising probability distributions of sample data while making limited assumptions about the form of distribution that may be met
Summarising the relationships between different quantities measured on the same items within a sample
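The three kinds of summary listed above can be sketched with Python's standard `statistics` module. The measurements are hypothetical, and `pearson_correlation` is a helper written for this example rather than a library function.

```python
import statistics

def pearson_correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical measurements taken on the same eight items.
heights_cm = [150, 160, 165, 170, 180, 175, 155, 185]
weights_kg = [52, 58, 63, 68, 80, 74, 54, 88]

summary = {
    # Summary statistics describing centre and spread of one sample.
    "mean": statistics.fmean(heights_cm),
    "median": statistics.median(heights_cm),
    "stdev": statistics.stdev(heights_cm),
    # Quartiles summarise the distribution without assuming its form.
    "quartiles": statistics.quantiles(heights_cm, n=4),
}
# Relationship between two quantities measured on the same items.
r = pearson_correlation(heights_cm, weights_kg)

print(summary)
print(f"correlation = {r:.3f}")
```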

Interpreting data

Besides the philosophy underlying statistical inference, statistical theory has the task of considering the types of questions that data analysts might want to ask about the problems they are studying and of providing data analytic techniques for answering them. Some of these tasks are:

Summarising populations in the form of a fitted distribution or probability density function
Summarising the relationship between variables using some type of regression analysis
Providing ways of predicting the outcome of a random quantity given other related variables
Examining the possibility of reducing the number of variables being considered within a problem (the task of dimension reduction)

When a statistical procedure has been specified in the study protocol, statistical theory provides well-defined probability statements for the method when it is applied to all populations that could have arisen from the randomization used to generate the data. This provides an objective way of estimating parameters, estimating confidence intervals, testing hypotheses, and selecting the best. Even for observational data, statistical theory provides a way of calculating a value that can be used to interpret a sample of data from a population. It can also provide a means of indicating how well that value is determined by the sample, and thus a means of saying whether corresponding values derived for different populations are as different as they might seem. However, the reliability of inferences from post-hoc^[15] observational data is often worse than for planned randomized generation of data.
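A randomization test shows how probability statements can follow directly from the randomization used to generate the data. The sketch below is one illustrative instance, assuming a completely randomized two-group experiment and a one-sided test on the difference in means; the function name `randomization_test` and the outcome values are hypothetical.

```python
from itertools import combinations

def randomization_test(treated, control):
    """Exact one-sided randomization test for a difference in means.

    Enumerates every way the pooled outcomes could have been split into
    groups of the observed sizes, and returns the observed difference
    together with the share of splits whose difference is at least as large.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    n = len(pooled)
    total = sum(pooled)
    observed = sum(treated) / k - sum(control) / (n - k)

    at_least_as_large = 0
    n_assignments = 0
    for idx in combinations(range(n), k):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / k - (total - t_sum) / (n - k)
        n_assignments += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / n_assignments

# Hypothetical outcomes from six randomly assigned units.
observed_diff, p_value = randomization_test([12, 14, 15], [8, 9, 10])
print(f"difference = {observed_diff:.3f}, p = {p_value:.2f}")
```

With three units per group there are 20 equally likely assignments, and only the observed one gives a difference this large, so the one-sided p-value is 1/20. The reference distribution comes from the design itself, not from an assumed population model.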

Applied statistical inference

Statistical theory provides the basis for a number of data-analytic methods that are common across scientific and social research. Interpreting data is an important objective of statistical research, and some of these methods are:

Estimating parameters
Testing statistical hypotheses
Providing a range of values instead of a point estimate
Regression analysis
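Point estimation and the reporting of a range of values can be sketched together. The example below assumes a large sample and uses a normal approximation for the interval, which is only one of several possible constructions; the simulated data and the function name `normal_approx_ci` are assumptions made for illustration.

```python
import random
import statistics

def normal_approx_ci(sample, level=0.95):
    """Large-sample confidence interval for a population mean, using the
    normal approximation: mean +/- z * (sample sd / sqrt(n))."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err

# Simulated sample (an assumption): 400 draws with mean 50 and sd 5.
rng = random.Random(42)
sample = [rng.gauss(50.0, 5.0) for _ in range(400)]

point_estimate = statistics.fmean(sample)   # estimating a parameter
lower, upper = normal_approx_ci(sample)     # a range instead of a point
print(f"estimate = {point_estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For small samples an interval based on Student's t distribution would normally be preferred; the normal quantile is used here only to keep the sketch within the standard library.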

Many of the standard methods for these tasks rely on certain statistical assumptions (made in the derivation of the methodology) actually holding in practice. Statistical theory studies the consequences of departures from these assumptions. In addition, it provides a range of robust statistical techniques that are less dependent on assumptions, and it provides methods for checking whether particular assumptions are reasonable for a given data set.
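The effect of a departure from assumptions, and the appeal of robust alternatives, can be seen in a small sketch. The readings are hypothetical, and the helper `median_absolute_deviation` is written for this example; it compares the mean and standard deviation with the median and the median absolute deviation when a single gross error enters the data.

```python
import statistics

def median_absolute_deviation(data):
    """Median of the absolute deviations from the sample median."""
    centre = statistics.median(data)
    return statistics.median(abs(x - centre) for x in data)

# Hypothetical repeated measurements, then the same data with one gross error.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(
        f"{label:>12}: mean={statistics.fmean(data):.2f} "
        f"sd={statistics.stdev(data):.2f} "
        f"median={statistics.median(data):.2f} "
        f"MAD={median_absolute_deviation(data):.2f}"
    )
```

The single outlying value moves the mean and standard deviation substantially, while the median and the median absolute deviation change very little.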

Notes

1. Cox & Hinkley (1974, p.1)
2. Rao, C. R. (1981). "Foreword". In Arthanari, T. S.; Dodge, Yadolah. Mathematical Programming in Statistics. New York: John Wiley & Sons. pp. vii–viii. ISBN 0-471-08073-X. MR 607328.
3. Lehmann & Romano (2005)
4. Freedman (2009)
5. Charles Sanders Peirce and Joseph Jastrow (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83. http://psychclassics.yorku.ca/Peirce/small-diffs.htm
6. Hacking, Ian (September 1988). "Telepathy: Origins of Randomization in Experimental Design". Isis. 79 (3): 427–451. JSTOR 234674. MR 1013489. doi:10.1086/354775.
7. Stephen M. Stigler (November 1992). "A Historical View of Statistical Concepts in Psychology and Educational Research". American Journal of Education. 101 (1): 60–70. doi:10.1086/444032.
8. Atkinson et al. (2007)
9. Kiefer, Jack Carl (1985). Brown, Lawrence D.; Olkin, Ingram; Sacks, Jerome; et al., eds. Jack Carl Kiefer: Collected Papers III: Design of Experiments. Springer-Verlag and the Institute of Mathematical Statistics. pp. 718+xxv. ISBN 0-387-96004-X.
10. Hinkelmann & Kempthorne (2008)
11. Bailey (2008)
12. Kish (1965)
13. Cochran (1977)
14. Särndal et al. (1992)
15. Ijsmi, Editor (2016-11-14). "Post-hoc and multiple comparison test – An overview with SAS and R Statistical Package". International Journal of Statistics and Medical Informatics. 1 (1): 1–9.

References

Atkinson, A. C.; Donev, A. N.; Tobias, R. D. (2007). Optimum Experimental Designs, with SAS. Oxford University Press. pp. 511+xvi. ISBN 978-0-19-929660-6.
Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
Cochran, William G. (1977). Sampling Techniques (Third ed.). John Wiley & Sons. ISBN 0-471-16240-X.
Cox, D.R., Hinkley, D.V. (1974) Theoretical Statistics, Chapman & Hall. ISBN 0-412-12420-3
Freedman, David A. (2009). Statistical Models: Theory and Practice (Second ed.). Cambridge University Press. ISBN 978-0-521-67105-7.
Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments. I, II (Second ed.). John Wiley & Sons. ISBN 978-0-470-38551-7.
Kish, L. (1965), Survey Sampling, John Wiley & Sons. ISBN 0-471-48900-X
Lehmann, E. L.; Romano, J. P. (2005), Testing Statistical Hypotheses (third ed.), Springer.
Särndal, Carl-Erik, Swensson, Bengt, and Wretman, Jan (1992). Model Assisted Survey Sampling. Springer-Verlag. ISBN 0-387-40620-4.

Peirce, C. S.
- (1876), "Note on the Theory of the Economy of Research" in Coast Survey Report, pp. 197–201 (Appendix No. 14). Reprinted 1958 in Collected Papers of Charles Sanders Peirce 7, paragraphs 139–157 and in 1967 in Operations Research 15 (4): pp. 643–648.
- (1967), "Note on the Theory of the Economy of Research". Operations Research. 15 (4): 643. doi:10.1287/opre.15.4.643.
- (1877–1878), "Illustrations of the Logic of Science"
- (1883), "A Theory of Probable Inference"
- and Jastrow, Joseph (1885), "On Small Differences in Sensation" in Memoirs of the National Academy of Sciences 3: pp. 73–83.
Bickel, Peter J. & Doksum, Kjell A. (2001). Mathematical Statistics: Basic and Selected Topics. I (Second (updated printing 2007) ed.). Pearson Prentice-Hall. ISBN 0-13-850363-X.
Davison, A.C. (2003) Statistical Models. Cambridge University Press. ISBN 0-521-77339-3
Lehmann, Erich (1983). Theory of Point Estimation.
Liese, Friedrich & Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer. ISBN 0-387-73193-8.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.