Sequential analysis

From Wikipedia, the free encyclopedia

In statistics, sequential analysis or sequential hypothesis testing is statistical analysis where the sample size is not fixed in advance. Instead, data are evaluated as they are collected, and further sampling is stopped in accordance with a pre-defined stopping rule as soon as significant results are observed. Thus a conclusion may sometimes be reached at a much earlier stage than would be possible with more classical hypothesis testing or estimation, and consequently at lower financial and/or human cost.
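As an illustration of what a pre-defined stopping rule can look like, the sketch below implements Wald's sequential probability ratio test for a stream of 0/1 outcomes. It is a minimal example under stated assumptions, not a reference implementation: the function name `sprt_bernoulli`, the two simple hypotheses (success probability `p0` versus `p1`), and the error rates `alpha` and `beta` are all chosen here for illustration, and the thresholds are Wald's usual approximations.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for 0/1 observations.

    Tests H0: success probability p0 against H1: success probability p1.
    Returns (decision, number_of_observations_used), where decision is
    "accept_h0", "accept_h1", or "continue" if the data ran out first.
    """
    # Wald's approximate thresholds on the log-likelihood ratio.
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))

    llr = 0.0  # cumulative log-likelihood ratio of H1 versus H0
    used = 0
    for x in observations:
        used += 1
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        # The stopping rule is checked after every single observation.
        if llr >= upper:
            return "accept_h1", used
        if llr <= lower:
            return "accept_h0", used
    return "continue", used

if __name__ == "__main__":
    # Illustrative data only: a short run of mostly successful outcomes.
    print(sprt_bernoulli([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], p0=0.5, p1=0.8))
```

The number of observations used is not known in advance: it depends on how quickly the accumulated evidence crosses one of the two thresholds.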

History

Sequential analysis was first developed by Abraham Wald^[1] with Jacob Wolfowitz, W. Allen Wallis, and Milton Friedman^[2] while at Columbia University's Statistical Research Group as a tool for more efficient industrial quality control during World War II. Its value to the war effort was immediately recognised, and led to its receiving a "restricted" classification. Another early contribution to the method was made by K.J. Arrow with D. Blackwell and M.A. Girshick.^[3]

A similar approach was independently developed at the same time by Alan Turing, as part of the Banburismus technique used at Bletchley Park, to test hypotheses about whether different messages coded by German Enigma machines should be connected and analysed together. This work remained secret until the early 1980s.^[4]

Applications of sequential analysis

Clinical trials

In a randomized trial with two treatment groups, group sequential testing may, for example, be conducted in the following manner. After n subjects in each group, i.e., a total of 2n subjects, are available, an interim analysis is conducted: a statistical test is performed to compare the two groups, and if the null hypothesis is rejected, the trial is terminated. Otherwise, the trial continues and another n subjects per group are recruited. The statistical test is performed again, now including all 4n subjects. If the null hypothesis is rejected, the trial is terminated. Otherwise, it continues with periodic evaluations until a maximum number of interim analyses have been performed. At this point, the last statistical test is conducted, and the trial is discontinued.^[5]
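The procedure above can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not part of the description: outcomes are taken to be normally distributed with a known common standard deviation, the comparison is a two-sample z-test, and the same critical value `z_crit` is used at every interim analysis. The names `z_statistic`, `group_sequential_trial`, `draw_a`, and `draw_b` are hypothetical. The value 2.413 in the usage example is the commonly tabulated Pocock constant for five analyses at a two-sided 5% level; it should be treated as an assumption and checked against a group sequential design reference before any real use.

```python
import math
import random

def z_statistic(group_a, group_b, sigma=1.0):
    """Two-sample z statistic, assuming equal group sizes and known sd sigma."""
    n = len(group_a)
    mean_diff = sum(group_a) / n - sum(group_b) / n
    return mean_diff / (sigma * math.sqrt(2.0 / n))

def group_sequential_trial(draw_a, draw_b, n_per_look, max_looks, z_crit):
    """Recruit n_per_look subjects per group before each interim analysis.

    draw_a and draw_b are zero-argument callables returning one outcome.
    Returns (null_rejected, analyses_performed, subjects_per_group).
    """
    group_a, group_b = [], []
    for look in range(1, max_looks + 1):
        group_a.extend(draw_a() for _ in range(n_per_look))
        group_b.extend(draw_b() for _ in range(n_per_look))
        # Each interim analysis uses all subjects recruited so far.
        z = z_statistic(group_a, group_b)
        if abs(z) >= z_crit:
            return True, look, len(group_a)  # stop early
    return False, max_looks, len(group_a)    # maximum number of looks reached

if __name__ == "__main__":
    rng = random.Random(0)
    outcome = group_sequential_trial(
        draw_a=lambda: rng.gauss(0.5, 1.0),  # simulated treatment group
        draw_b=lambda: rng.gauss(0.0, 1.0),  # simulated control group
        n_per_look=20,
        max_looks=5,
        z_crit=2.413,  # assumed Pocock-style constant, see the text above
    )
    print(outcome)
```

Using a critical value larger than the fixed-sample 1.96 is a design choice: it compensates for the fact that the data are tested several times.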

Other applications

Sequential analysis also has a connection to the problem of gambler's ruin, which has been studied by, among others, Huygens in 1657.^[6]

Step detection is the process of finding abrupt changes in the mean level of a time series or signal. It is usually considered a special case of the statistical method known as change point detection. Often the step is small and the time series is corrupted by noise, which makes the problem challenging because the step may be hidden by the noise. Statistical and/or signal processing algorithms are therefore often required. When the algorithms are run online as the data arrive, especially with the aim of producing an alert, this is an application of sequential analysis.
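One widely used online detector of this kind is the CUSUM (cumulative sum) scheme, shown below as a minimal sketch. The text above does not name a specific algorithm, so the choice of CUSUM is an assumption made for illustration, as are the function name `cusum_alarm` and its parameters: `target_mean` is the assumed known baseline level, `slack` is the allowance subtracted at each step so that small fluctuations do not accumulate, and `threshold` is the alarm level.

```python
def cusum_alarm(stream, target_mean, slack, threshold):
    """Two-sided CUSUM run online over an iterable of samples.

    Returns the index of the first sample at which an alarm is raised,
    or None if the stream ends without an alarm.
    """
    upper = 0.0  # accumulated evidence for an upward step
    lower = 0.0  # accumulated evidence for a downward step
    for i, x in enumerate(stream):
        # Each sum is reset to zero whenever it would go negative.
        upper = max(0.0, upper + (x - target_mean - slack))
        lower = max(0.0, lower + (target_mean - x - slack))
        if upper > threshold or lower > threshold:
            return i
    return None

if __name__ == "__main__":
    # Noise-free toy signal: the mean steps from 0 to 2 at index 10.
    signal = [0.0] * 10 + [2.0] * 10
    print(cusum_alarm(signal, target_mean=0.0, slack=0.5, threshold=4.0))  # prints 12
```

The alarm comes a few samples after the step because evidence has to accumulate past the threshold. A lower threshold shortens this delay at the cost of more false alarms on noisy data.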

Bias

The statistics of a trial that is stopped early after only n samples differ from those of a similar trial that is run for a predetermined number of samples, even if both end up collecting the same number of samples. If this is not accounted for when interpreting the sequential trial, the results will be biased. It is therefore important to follow proper methodology in order to avoid false conclusions. See ^[7] for a discussion.
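One consequence of ignoring the stopping rule can be illustrated by simulation. The sketch below generates trials in which there is no true effect and compares two ways of declaring significance: testing once at the planned end of the trial, and testing at every interim look with the same unadjusted critical value of 1.96. The setup is an assumption made for illustration (standard normal observations, a one-sample z-test, five equally spaced looks), and the function names are hypothetical.

```python
import math
import random

def simulate_null_trial(rng, n_per_look, looks, z_crit=1.96):
    """Simulate one trial with no true effect (mean 0, known sd 1).

    Returns (rejected_at_any_look, rejected_at_final_look).
    """
    total = 0.0
    n = 0
    z = 0.0
    rejected_any = False
    for _ in range(looks):
        for _ in range(n_per_look):
            total += rng.gauss(0.0, 1.0)
            n += 1
        z = total / math.sqrt(n)  # z statistic for H0: mean = 0
        if abs(z) >= z_crit:
            rejected_any = True   # a naive analyst would stop here
    return rejected_any, abs(z) >= z_crit

def false_positive_rates(trials=2000, n_per_look=20, looks=5, seed=1):
    """Estimate false positive rates: (naive repeated testing, single final test)."""
    rng = random.Random(seed)
    any_count = 0
    final_count = 0
    for _ in range(trials):
        any_hit, final_hit = simulate_null_trial(rng, n_per_look, looks)
        any_count += any_hit
        final_count += final_hit
    return any_count / trials, final_count / trials

if __name__ == "__main__":
    naive, fixed = false_positive_rates()
    print(f"repeated looks at 1.96: {naive:.3f}")
    print(f"single final test:      {fixed:.3f}")
```

In runs of this kind the single final test should reject in roughly 5% of the simulated trials, while the repeated unadjusted testing rejects noticeably more often. Sequential designs correct for this by using adjusted stopping boundaries.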

Notes

↑ Wald, Abraham (June 1945). "Sequential Tests of Statistical Hypotheses". The Annals of Mathematical Statistics 16 (2): 117–186. doi:10.1214/aoms/1177731118. JSTOR 2235829.
↑ Berger, James (2008). "Sequential Analysis". The New Palgrave Dictionary of Economics, 2nd Ed. doi:10.1057/9780230226203.1513.
↑ Kenneth J. Arrow, David Blackwell and M.A. Girshick (1949). "Bayes and minimax solutions of sequential decision problems". Econometrica 17 (3/4): 213–244. doi:10.2307/1905525. JSTOR 1905525.
↑ Randell, Brian (1980), "The Colossus", A History of Computing in the Twentieth Century, p. 30, retrieved 22 March 2011
↑ Korosteleva, Olga (2008). Clinical Statistics: Introducing Clinical Trials, Survival Analysis, and Longitudinal Data Analysis (First ed.). Jones and Bartlett Publishers. ISBN 0-7637-5850-7.
↑ Ghosh, B. K.; Sen, P. K. (1991). Handbook of Sequential Analysis. New York: Marcel Dekker. ISBN 9780824784089.
↑

References

Wald, Abraham (1947). Sequential Analysis. New York: John Wiley and Sons.
Ghosh, Bhaskar Kumar (1970). Sequential Tests of Statistical Hypotheses. Reading: Addison-Wesley.
Chernoff, Herman (1972). Sequential Analysis and Optimal Design. SIAM.
Siegmund, David (1985). Sequential Analysis. Springer Series in Statistics. New York: Springer-Verlag. ISBN 0-387-96134-8.
Bakeman, R.; Gottman, J. M. (1997). Observing Interaction: An Introduction to Sequential Analysis. Cambridge: Cambridge University Press.
Jennison, C.; Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC.
Whitehead, J. (1997). The Design and Analysis of Sequential Clinical Trials, 2nd Edition. John Wiley & Sons.

External links

Sequential Analysis: Design Methods & Applications Journal
Course given by Rebecca Betensky at Harvard University, lecture note slides
Software for conducting sequential analysis and applications of sequential analysis in the study of group interaction in computer-mediated communication by Dr. Allan Jeong at Florida State University


This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.