Row- and column-major order

In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory.

The difference between the orders lies in which elements of an array are contiguous in memory. In a row-major order, the consecutive elements of a row reside next to each other, whereas the same holds true for consecutive elements of a column in a column-major order. While the terms allude to the rows and columns of a two-dimensional array, i.e. a matrix, the orders can be generalized to arrays of any dimension by noting that the terms row-major and column-major are equivalent to lexicographic and colexicographic orders, respectively.

Data layout is critical for correctly passing arrays between programs written in different programming languages. It is also important for performance when traversing an array because modern CPUs process sequential data more efficiently than nonsequential data. This is primarily due to CPU caching. Contiguous access makes it also possible to use SIMD instructions that operate on vectors of data. In some media such as tape or NAND flash memory, accessing sequentially is orders of magnitude faster than nonsequential access.

Explanation and example

The terms row-major and column-major stem from the terminology related to grouping objects. A general way to order objects with many attributes is to first group and order them by one attribute, and then, within each such group, group and order them by another attribute, etc. If more than one attribute participate in ordering, the first would be called major and the last minor. If two attributes participate in ordering, it is sufficient to name only the major attribute.

In the case of arrays, the attributes are the indices along each dimension. For matrices, the first index indicates the row, and the second indicates the column, e.g., given a matrix A , a_1,2 is in its first row and second column. This convention is carried over to the syntax in programming languages,^[1] although often with indexes starting at 0 instead of 1.^[2]

Even though the row is indicated by the first index and the column by the second index, no grouping order between the dimensions is implied yet. We can then choose to group and order the indices either row-major or column-major. The terminology can be applied to even higher dimensional arrays. Row-major grouping starts from the leftmost index and column-major from the rightmost index, leading to lexicographic and colexicographics orders, respectively.

For example, for the array

A={\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{bmatrix}}

The two possible ways are:

Row-major order,
e.g., C (0-indexed)
Address	Access	Value
0	`A[0][0]`	$a_{11}$
1	`A[0][1]`	$a_{12}$
2	`A[0][2]`	$a_{{13}}$
3	`A[1][0]`	$a_{21}$
4	`A[1][1]`	$a_{{22}}$
5	`A[1][2]`	$a_{23}$

Column-major order,
e.g., Fortran (1-indexed)
Address	Access	Value
1	`A(1,1)`	$a_{11}$
2	`A(2,1)`	$a_{21}$
3	`A(1,2)`	$a_{12}$
4	`A(2,2)`	$a_{{22}}$
5	`A(1,3)`	$a_{{13}}$
6	`A(2,3)`	$a_{23}$

To select the correct element one must know whether an array A[a][b] in a particular language is stored in a row-major (a subscript indexes rows, b subscript indexes columns) or in a column-major mode (a subscript indexes columns, b subscript indexes rows).

Programming languages and libraries

Programming languages or their standard libraries that support multi-dimensional arrays typically have a native row-major or column-major storage order for these arrays.

Row-major order is used in C/C++/Objective-C (for C-style arrays), Mathematica, PL/I, Pascal, Speakeasy, SAS, Rasdaman,^[3] and C#/CLI/.Net.

Column-major order is used in Fortran, OpenGL and OpenGL ES, MATLAB,^[4] GNU Octave, S-Plus,^[5] R,^[6] Julia,^[7] Scilab, and Eigen^[8].

A typical alternative is to use Iliffe vectors, which store elements in the same row contiguously (like row-major order), but not the rows themselves. They are used in Java,^[9] Scala,^[10] and Swift.

Support for multi-dimensional arrays may also be provided by external libraries, which may even support arbitrary orderings, where each dimension has a stride value, and row-major or column-major are just two possible resulting interpretations.

Row-major order is the default in NumPy^[11] (for Python).

Torch (for Lua, which has no built-in support^[12]) changed from column-major^[13] to row-major^[14] default order.

Transposition

As exchanging the indices of an array is the essence of array transposition, an array stored as row-major but read as column-major (or vice versa) will appear transposed. As actually performing this rearrangement in memory is typically an expensive operation, some systems provide options to specify individual matrices as being stored transposed. The programmer must then decide whether or not to rearrange the elements in memory, based on the actual usage (including the number of times that the array is reused in a computation).

For example, the Basic Linear Algebra Subprograms functions are passed flags indicating which arrays are transposed.^[15]

Address calculation in general

The concept generalizes to arrays with more than two dimensions.

For a d-dimensional $N_1 \times N_2 \times \cdots \times N_d$ array with dimensions N_k (k=1...d), a given element of this array is specified by a tuple $(n_1, n_2, \ldots, n_d)$ of d (zero-based) indices $n_k \in [0,N_k - 1]$ .

In row-major order, the last dimension is contiguous, so that the memory-offset of this element is given by:

n_d + N_d \cdot (n_{d-1} + N_{d-1} \cdot (n_{d-2} + N_{d-2} \cdot (\cdots + N_2 n_1)\cdots))) = \sum_{k=1}^d \left( \prod_{\ell=k+1}^d N_\ell \right) n_k

In column-major order, the first dimension is contiguous, so that the memory-offset of this element is given by:

n_1 + N_1 \cdot (n_2 + N_2 \cdot (n_3 + N_3 \cdot (\cdots + N_{d-1} n_d)\cdots))) = \sum_{k=1}^d \left( \prod_{\ell=1}^{k-1} N_\ell \right) n_k

For a given order, the stride in dimension k is given by the multiplication value in parentheses before index n_k in the right hand-side summations above.

More generally, there are d! possible orders for a given array, one for each permutation of dimensions (with row-major and column-order just 2 special cases), although the lists of stride values are not necessarily permutations of each other, e.g., in the 2-by-3 example above, the strides are (3,1) for row-major and (1,2) for column-major.

References

↑ "Arrays and Formatted I/O". FORTRAN Tutorial. Retrieved 19 November 2016.
↑ "Why numbering should start at zero". E. W. Dijkstra Archive. Retrieved 2 February 2017.
↑ "Internal array representation in rasdaman". rasdaman.org. Retrieved 8 June 2017.
↑ MATLAB documentation, MATLAB Data Storage (retrieved from Mathworks.co.uk, January 2014).
↑ Spiegelhalter et al. (2003, p. 17): Spiegelhalter, David; Thomas, Andrew; Best, Nicky; Lunn, Dave (January 2003), "Formatting of data: S-Plus format", WinBUGS User Manual (Version 1.4 ed.), Robinson Way, Cambridge CB2 2SR, UK: MRC Biostatistics Unit, Institute of Public Health, PDF document
↑ An Introduction to R, Section 5.1: Arrays (retrieved March 2010).
↑ "Multi-dimensional Arrays". Julia. Retrieved 6 February 2016.
↑ "Eigen: Storage orders". eigen.tuxfamily.org. Retrieved 2017-06-27.
↑ "Java Language Specification". Oracle. Retrieved 13 February 2016.
↑ "object Array". Scala Standard Library. Retrieved 1 May 2016.
↑ "The N-dimensional array (ndarray)". SciPy.org. Retrieved 3 April 2016.
↑ "11.2 – Matrices and Multi-Dimensional Arrays". Retrieved 6 February 2016.
↑ "Tensor". Retrieved 6 February 2016.
↑ "Tensor". Torch Package Reference Manual. Retrieved 8 May 2016.
↑ "BLAS (Basic Linear Algebra Subprograms)". Retrieved 2015-05-16.

Sources

Donald E. Knuth, The Art of Computer Programming Volume 1: Fundamental Algorithms, third edition, section 2.2.6 (Addison-Wesley: New York, 1997).

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.