Bivariate analysis

Figure: Waiting time between eruptions plotted against eruption duration for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. The scatterplot suggests that there are generally two "types" of eruption: short-wait-short-duration and long-wait-long-duration.

Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis.^[1] It involves the analysis of two variables (often denoted X and Y) to determine the empirical relationship between them.^[1] To see whether the variables are related to one another, it is common to measure how they vary together (see also covariance).^[2]
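As a rough illustration of measuring how two variables vary together, the sketch below computes a sample covariance by hand. The function name `sample_covariance` and the data values are made up for this example and do not come from the cited sources; the choice of the n - 1 divisor (the usual sample estimate) is also an assumption.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations from each variable's mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical paired observations of X and Y (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
cov_xy = sample_covariance(x, y)  # positive: Y tends to rise when X rises
```

A positive value indicates that the two variables tend to move in the same direction and a negative value that they move in opposite directions. The magnitude depends on the units of both variables, which is one reason the correlation coefficient is often reported instead.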

Bivariate analysis can be helpful in testing simple hypotheses of association and causality, by checking to what extent knowing a case's value on the independent variable makes it easier to predict its value on the dependent variable (see also correlation).^[2]
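One common way to quantify this kind of association is the Pearson correlation coefficient. The following is a minimal sketch, assuming a linear relationship is of interest; `pearson_r` and the data are hypothetical names and values chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical independent (x) and dependent (y) values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)
r_squared = r ** 2  # share of the variance in y accounted for by a linear fit on x
```

Here `r` lies between -1 and 1, and `r_squared` gives one rough answer to "how much easier is prediction once x is known". A strong correlation on its own does not establish which variable, if either, is the cause.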

Bivariate analysis can be contrasted with univariate analysis, in which only one variable is analysed.^[1] The purpose of a univariate analysis is descriptive. Subgroup comparison, the descriptive analysis of two variables, can sometimes be seen as a very simple form of bivariate analysis (or as univariate analysis extended to two variables).^[1] The major difference between univariate and bivariate analysis, beyond the latter's looking at more than one variable, is that the purpose of a bivariate analysis goes beyond description: it is the analysis of the relationship between the two variables.^[1] Bivariate analysis is a simple (two-variable) special case of multivariate analysis, in which multiple relations between multiple variables are examined simultaneously.^[1]

Types of bivariate analysis

Common forms of bivariate analysis involve creating a percentage table or a scatterplot and computing a simple correlation coefficient.^[1] The type of analysis suited to a particular pair of variables depends on the level of measurement of the variables of interest (e.g. nominal/categorical, ordinal, interval/ratio).

If the dependent variable (the one whose value is determined to some extent by the other, independent variable) is categorical, such as the preferred brand of cereal, then probit or logit regression (or multinomial probit or multinomial logit) can be used. If both variables are ordinal, meaning they are ranked in a sequence as first, second, etc., then a rank correlation coefficient can be computed. If only the dependent variable is ordinal, ordered probit or ordered logit can be used. If the dependent variable is continuous (either interval level or ratio level, such as a temperature scale or an income scale), then simple regression can be used.
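Three of the forms mentioned above can be sketched in a few lines of plain Python: a percentage table for two categorical variables, a rank correlation for two ordinal variables, and a simple regression for a continuous dependent variable. The function names, the use of Spearman's coefficient as the rank correlation, and all data values are assumptions made for illustration.

```python
import math
from collections import Counter


def row_percentages(pairs):
    """Cross-tabulate (row_category, column_category) pairs as row percentages."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    return {
        (row, col): 100.0 * count / row_totals[row]
        for (row, col), count in cell_counts.items()
    }


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rx) ** 2 for a in rx)
    syy = sum((b - mean_ry) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data, for illustration only.
table = row_percentages([("urban", "A"), ("urban", "B"), ("rural", "A"), ("rural", "A")])
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])       # monotone but not linear
slope, intercept = simple_regression([0, 1, 2], [1, 3, 5])
```

The probit, logit and ordered models mentioned above need iterative maximum-likelihood fitting and are usually run with a statistics package, so they are not sketched here.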

If both variables are time series, one can test for a particular type of causality known as Granger causality, and a vector autoregression can be fitted to examine the intertemporal linkages between the variables.
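The idea behind a Granger causality test can be sketched as a comparison of two regressions: one predicting y from its own past, and one that also includes the past of x. The code below is a simplified illustration that uses a single lag and returns only the F statistic. The function names, the one-lag choice and the simulated data are all assumptions; a real analysis would normally choose the lag length from the data, check stationarity, compare the statistic with an F distribution, and use a statistics package.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit targets on rows by ordinary least squares; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, targets)
    )


def granger_f_lag1(x, y):
    """F statistic for 'lagged x helps predict y', using one lag of each series."""
    if len(x) != len(y) or len(y) < 5:
        raise ValueError("need two series of equal length >= 5")
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)  # one restriction in the numerator


# Simulated series (illustrative only): y depends on the previous value of x.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]
f_x_to_y = granger_f_lag1(x, y)  # large: past x improves forecasts of y
f_y_to_x = granger_f_lag1(y, x)  # small: past y adds little for x
```

A large F statistic in one direction and a small one in the other is the pattern described as "x Granger-causes y". It is a statement about predictive content rather than proof of a causal mechanism.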

References

[1] Earl R. Babbie, The Practice of Social Research, 12th edition, Wadsworth Publishing, 2009, ISBN 0-495-59841-0, pp. 436–440.
[2] Bivariate Analysis, Sociology Index.

