Video compression picture types

From Wikipedia, the free encyclopedia

The three major picture types found in typical video compression designs are I(ntra) pictures, P(redicted) pictures, and B(i-predictive) pictures (or B(i-directional) pictures). They are also commonly referred to as I frames, P frames, and B frames. In older material, the term "bi-directional" rather than "bi-predictive" is dominant.

In video compression formats, such as the ITU-T VCEG and ISO/IEC MPEG video standards, often only the differences between pictures are encoded. For example, in a scene in which a person walks past a stationary background, only the moving region needs to be represented, using motion compensation, image data, or a combination of the two, whichever requires fewer bits to represent the picture adequately. The parts of the scene that do not change need not be sent repeatedly.
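
As a rough illustration of how an encoder can locate the moving region, the Python sketch below performs exhaustive block-matching motion estimation. For one block of the current picture it tries every displacement within a small search window of a reference picture and keeps the one with the smallest sum of absolute differences (SAD). It assumes grayscale pictures stored as lists of rows, integer-pixel motion, and a block that lies inside the picture. The names `sad` and `motion_search`, the 4x4 block size, and the +/-2 pixel window are illustrative choices, not taken from any standard; real encoders use much faster search strategies and also weigh the bit cost of each candidate.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at (bx, by)
    in the current picture and the block displaced by (dx, dy) in the reference."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, size=4, search=2):
    """Full search over displacements within +/- `search` pixels.

    Returns (dx, dy, cost) for the lowest-SAD candidate; ties keep the
    first candidate found, starting from the zero vector.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, 0, 0, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


# A bright 4x4 object on a flat background moves one pixel to the right.
ref = [[10] * 8 for _ in range(8)]
cur = [[10] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        ref[y][x] = 200      # object position in the reference picture
        cur[y][x + 1] = 200  # same object, shifted right in the current picture

print(motion_search(cur, ref, bx=3, by=2))  # (-1, 0, 0)
```

A zero-cost match at displacement (-1, 0) means that, in this toy case, the block can be described by the motion vector alone, with no additional image data.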

Strictly speaking, the term picture is more general than frame, because a picture can be either a frame or a field. A frame is essentially an image captured at some instant in time, and a field is the set of every other line (either the odd-numbered or the even-numbered lines) that would form an image at some instant in time. When video is sent in interlaced-scan format, pictures are often coded as individual fields rather than as complete frames. Informally, the term "frame" is often used when the more general term "picture" is meant.
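
A small sketch makes the frame/field distinction concrete. Assuming a picture is stored as a list of rows, the top field is every even-numbered row and the bottom field every odd-numbered row; the helper names `split_fields` and `weave` are hypothetical. This only models the line structure: in genuinely interlaced video the two fields are captured at different instants, so weaving them back together can show combing artifacts on moving objects.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its two fields.

    The top field holds rows 0, 2, 4, ...; the bottom field holds rows 1, 3, 5, ...
    """
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Interleave two fields back into a full frame (inverse of split_fields)."""
    frame = []
    for i in range(max(len(top), len(bottom))):
        if i < len(top):
            frame.append(top[i])
        if i < len(bottom):
            frame.append(bottom[i])
    return frame


frame = [[row * 10 + col for col in range(4)] for row in range(6)]  # 6 rows x 4 columns
top, bottom = split_fields(frame)
print([r[0] for r in top])     # [0, 20, 40]
print([r[0] for r in bottom])  # [10, 30, 50]
```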

Typically, pictures are segmented into macroblocks, and individual prediction types can be selected on a macroblock basis rather than being the same for the entire picture, as follows:

I pictures can contain only intra macroblocks
P pictures can contain either intra macroblocks or predicted macroblocks
B pictures can contain intra, predicted, or bi-predicted macroblocks
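
The per-picture constraints in this list can be expressed as a small lookup table. The sketch below is a hypothetical model, not the syntax of any standard: the enum and function names are invented for illustration, and real codecs define many more macroblock modes (for example, skipped macroblocks and different partition sizes) within each of these broad categories.

```python
from enum import Enum


class PictureType(Enum):
    I = "I"
    P = "P"
    B = "B"


class MacroblockType(Enum):
    INTRA = "intra"                # coded from data within the same picture only
    PREDICTED = "predicted"        # predicted from one reference picture
    BI_PREDICTED = "bi-predicted"  # prediction combined from two references


# Macroblock types each picture type may contain, following the list above.
ALLOWED = {
    PictureType.I: {MacroblockType.INTRA},
    PictureType.P: {MacroblockType.INTRA, MacroblockType.PREDICTED},
    PictureType.B: {MacroblockType.INTRA, MacroblockType.PREDICTED,
                    MacroblockType.BI_PREDICTED},
}


def is_valid_picture(picture_type, macroblocks):
    """Return True if every macroblock type is permitted in this picture type."""
    return all(mb in ALLOWED[picture_type] for mb in macroblocks)


print(is_valid_picture(PictureType.P, [MacroblockType.INTRA, MacroblockType.PREDICTED]))  # True
print(is_valid_picture(PictureType.I, [MacroblockType.PREDICTED]))                        # False
```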

Furthermore, in the H.264 video coding standard, a picture can be segmented into smaller regions called slices, and instead of selecting a single I, P, or B type for the whole picture, the encoder can choose the prediction type separately for each slice.

Intra pictures (or slices or I-frames or key frames)

Are coded without reference to any picture other than themselves.
May be generated by an encoder to create a random access point (to allow a decoder to start decoding properly from scratch at that picture location).
May also be generated when the picture differs so much from its neighbors (for example, at a scene cut) that effective P or B pictures cannot be produced.
Typically require more bits to encode than other picture types.

Often, I pictures (I-frames) are used for random access and as references for the decoding of other pictures. Intra refresh periods of about half a second are common in applications such as digital television broadcast and DVD storage. Longer refresh periods may be used in some environments; for example, videoconferencing systems commonly send I pictures very infrequently.
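
The role of I pictures as random access points can be illustrated with a small seek calculation. The sketch assumes a hypothetical stream at 30 pictures per second with a half-second intra refresh period, so an I picture occurs every 15 pictures. To keep decoding order equal to display order it uses only I and P pictures, and it assumes no picture refers to anything before the most recent I picture; streams with B pictures, open GOPs, or H.264 long-term references need more bookkeeping. The function names `gop_pattern` and `seek_start` are illustrative.

```python
def gop_pattern(frame_rate, refresh_seconds, length):
    """Build a list of picture types ('I' or 'P') with an I picture
    every round(frame_rate * refresh_seconds) pictures."""
    period = max(1, round(frame_rate * refresh_seconds))
    return ["I" if i % period == 0 else "P" for i in range(length)]


def seek_start(types, target):
    """Return the index of the last I picture at or before `target`,
    i.e. where a decoder must begin in order to reconstruct `target`."""
    for i in range(target, -1, -1):
        if types[i] == "I":
            return i
    raise ValueError("no I picture precedes the target")


types = gop_pattern(frame_rate=30, refresh_seconds=0.5, length=60)
start = seek_start(types, target=40)
print(start)           # 30
print(40 - start + 1)  # 11 pictures must be decoded to display picture 40
```

A shorter refresh period bounds this worst-case decoding delay more tightly, at the cost of spending more bits on I pictures.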

Predicted pictures (or slices)

Require the prior decoding of some other picture(s) in order to be decoded.
May contain image data, motion vector displacements, or combinations of the two.
Can reference previous pictures in decoding order.
In older standard designs (such as MPEG-2), use only one previously decoded picture as a reference during decoding, and require that picture to also precede the P picture in display order.
In H.264, can use multiple previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
Typically require fewer bits for encoding than I pictures do.
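
The way a predicted block combines a motion vector displacement with image data can be sketched as follows: the encoder sends the motion vector and the difference (residual) between the actual block and the motion-compensated prediction, and the decoder adds that residual back to the same prediction. This is a simplified model under stated assumptions: integer-pixel motion vectors that stay inside the picture, 8-bit samples, and no transform or quantization of the residual, so reconstruction here is exact, whereas in real codecs it is only approximate. The function names are hypothetical.

```python
def predict_block(ref, bx, by, mv, size):
    """Copy the size x size block at (bx + mvx, by + mvy) from the reference picture."""
    mvx, mvy = mv
    return [[ref[by + mvy + y][bx + mvx + x] for x in range(size)]
            for y in range(size)]


def residual_for(cur_block, ref, bx, by, mv):
    """Encoder side: actual block minus its motion-compensated prediction."""
    size = len(cur_block)
    pred = predict_block(ref, bx, by, mv, size)
    return [[cur_block[y][x] - pred[y][x] for x in range(size)] for y in range(size)]


def reconstruct_block(ref, bx, by, mv, residual):
    """Decoder side: prediction plus residual, clipped to the 8-bit range [0, 255]."""
    size = len(residual)
    pred = predict_block(ref, bx, by, mv, size)
    return [[min(255, max(0, pred[y][x] + residual[y][x])) for x in range(size)]
            for y in range(size)]


ref = [[0, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 150]]
cur_block = [[51, 59], [90, 102]]  # 2x2 block at (0, 0) of the current picture

res = residual_for(cur_block, ref, 0, 0, mv=(1, 1))
print(res)  # [[1, -1], [0, 2]]
print(reconstruct_block(ref, 0, 0, (1, 1), res) == cur_block)  # True
```

Because the residual values are small, they typically cost far fewer bits than coding the block's samples directly, which is the source of the saving over I pictures.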

Bi-predictive pictures (or slices)

Require the prior decoding of some other picture(s) in order to be decoded.
May contain image data, motion vector displacements, or combinations of the two.
Include some prediction modes that form a prediction of a motion region (e.g., a macroblock or a smaller area) by averaging the predictions obtained using two different previously decoded reference regions.
In older standard designs (such as MPEG-2), B pictures are never used as references for the prediction of other pictures. As a result, such B pictures can be encoded at lower quality (using fewer bits than would otherwise be needed), because the loss of detail does not harm the prediction quality of subsequent pictures.
In H.264, may or may not be used as references for the decoding of other pictures (at the discretion of the encoder).
In older standard designs (such as MPEG-2), use exactly two previously decoded pictures as references during decoding, and require one of those pictures to precede the B picture in display order and the other one to follow it.
In H.264, can use one, two, or more than two previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
Typically require fewer bits for encoding than either I or P pictures do.
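
Two of the mechanisms listed above can be sketched in a few lines of Python. `bi_predict` forms a bi-predicted block by averaging two prediction blocks with round-half-up integer arithmetic, and `decoding_order` shows why MPEG-2-style B pictures force pictures to be transmitted out of display order: each B picture needs its following I or P picture to be decoded first. Both function names are hypothetical; the sketch ignores H.264 weighted prediction and the H.264 option of using B pictures as references, and it assumes the sequence ends on an I or P picture.

```python
def bi_predict(pred0, pred1):
    """Average two prediction blocks sample by sample, rounding halves up:
    (a + b + 1) >> 1, as in H.264's default (unweighted) bi-prediction."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


def decoding_order(display_types):
    """Map a display-order list of 'I', 'P' and 'B' pictures to MPEG-2-style
    decoding order, returned as a list of display indices.

    Each I or P anchor must be decoded before the B pictures displayed just
    ahead of it, because those B pictures use it as their following reference.
    """
    order, pending_b = [], []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)   # hold until the next anchor is decoded
        else:
            order.append(index)       # anchor first ...
            order.extend(pending_b)   # ... then the B pictures displayed before it
            pending_b = []
    if pending_b:
        raise ValueError("trailing B pictures need a following I or P picture")
    return order


print(bi_predict([[10, 11]], [[20, 20]]))  # [[15, 16]]
print(decoding_order(list("IBBPBBP")))    # [0, 3, 1, 2, 6, 4, 5]
```

Under these assumptions the B pictures at display positions 1 and 2 are decoded only after the P picture at position 3, so a decoder must hold that P picture in a buffer until its turn to be displayed.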