Fountain code
In coding theory, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the original source symbols can ideally be recovered from any subset of the encoding symbols of size equal to or only slightly larger than the number of source symbols. The term fountain or rateless refers to the fact that these codes do not exhibit a fixed code rate.
A fountain code is optimal if the original k source symbols can be recovered from any k encoding symbols. Fountain codes are known that have efficient encoding and decoding algorithms and that allow the recovery of the original k source symbols from any k’ of the encoding symbols with high probability, where k’ is just slightly larger than k.
LT codes were the first practical realization of fountain codes. Raptor codes and Online codes were subsequently introduced, and achieve linear time encoding and decoding complexity through a pre-coding stage of the input symbols.
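The recovery property described above can be illustrated with a random linear fountain over GF(2). This is a minimal sketch, not an LT, Raptor, or Online code: each encoding symbol is the XOR of a uniformly random non-empty subset of the source symbols, and the receiver decodes by Gaussian elimination once it holds k linearly independent symbols. The names `encode_symbol`, `Decoder`, and `demo`, as well as the choices of k, loss rate, and seed, are assumptions made for illustration only.

```python
import random


def encode_symbol(source, rng):
    """Return (mask, value): value is the XOR of the source symbols selected by mask."""
    k = len(source)
    mask = 0
    while mask == 0:                      # skip the useless all-zero combination
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k):
        self.k = k
        self.rows = {}                    # pivot index -> (mask, value); pivot = lowest set bit

    def add(self, mask, value):
        """Reduce a received equation; return True if it was linearly independent."""
        while mask:
            pivot = (mask & -mask).bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, value)
                return True
            row_mask, row_value = self.rows[pivot]
            mask ^= row_mask
            value ^= row_value
        return False                      # equation carried no new information

    def done(self):
        return len(self.rows) == self.k

    def solve(self):
        """Back-substitute from the highest pivot down (call only when done())."""
        out = [0] * self.k
        for pivot in sorted(self.rows, reverse=True):
            mask, value = self.rows[pivot]
            for j in range(pivot + 1, self.k):
                if (mask >> j) & 1:
                    value ^= out[j]
            out[pivot] = value
        return out


def demo(k=32, loss=0.3, seed=1):
    """Stream encoding symbols over a lossy channel until decoding succeeds."""
    rng = random.Random(seed)
    source = [rng.getrandbits(8) for _ in range(k)]
    decoder = Decoder(k)
    received = 0
    while not decoder.done():
        mask, value = encode_symbol(source, rng)
        if rng.random() < loss:
            continue                      # symbol erased in transit
        decoder.add(mask, value)
        received += 1
    return source, decoder.solve(), received
```

Calling `demo()` returns the source symbols, the decoded symbols, and the number of symbols that had to be received. The sender never needs to know which symbols were lost; it simply keeps generating new ones. Gaussian elimination is not linear time, which is the cost that the practical codes named above are designed to avoid.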
Applications
Fountain codes can be used at a fixed code rate, or where a fixed code rate cannot be determined a priori, and are suited to settings where large amounts of data must be encoded and decoded efficiently.
One example is that of a data carousel, where some large file is continuously broadcast to a set of receivers.[1] Using a fixed-rate erasure code, a receiver missing a source symbol (due to a transmission error) faces the coupon collector's problem: it must successfully receive an encoding symbol which it does not already have. This problem becomes much more apparent when using a traditional short-length erasure code, as the file must be split into several blocks, each being separately encoded: the receiver must now collect the required number of missing encoding symbols for each block. Using a fountain code, it suffices for a receiver to retrieve any subset of encoding symbols of size slightly larger than the set of source symbols. (In practice, the broadcast is typically scheduled for a fixed period of time by an operator based on characteristics of the network and receivers and desired delivery reliability, and thus the fountain code is used at a code rate that is determined dynamically at the time when the file is scheduled to be broadcast.)
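The difference between the two approaches can be made concrete with a small simulation. The sketch below rests on simplifying assumptions that are not taken from the cited work: the carousel is uncoded and simply cycles through the k source symbols, each transmission is lost independently with probability `loss`, and the fountain code is assumed to decode from any ceil(k * (1 + overhead)) received symbols. Both function names are hypothetical.

```python
import math
import random


def carousel_transmissions(k, loss, rng):
    """Cycle through the k source symbols until the receiver holds every one."""
    missing = set(range(k))
    sent = 0
    while missing:
        symbol = sent % k                 # the carousel repeats in a fixed order
        sent += 1
        if rng.random() >= loss:          # transmission survived the channel
            missing.discard(symbol)
    return sent


def fountain_transmissions(k, loss, overhead, rng):
    """Send fresh encoding symbols until enough have arrived to decode.

    Assumes decoding succeeds from any ceil(k * (1 + overhead)) symbols.
    """
    needed = math.ceil(k * (1 + overhead))
    sent = received = 0
    while received < needed:
        sent += 1
        if rng.random() >= loss:
            received += 1                 # any surviving symbol is useful
    return sent
```

For example, `carousel_transmissions(1000, 0.2, random.Random(1))` can be compared with `fountain_transmissions(1000, 0.2, 0.02, random.Random(1))`. The carousel receiver has to wait for the specific symbols it is missing to come around again, while the fountain receiver makes progress with every symbol that arrives.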
Another application is that of hybrid ARQ in reliable multicast scenarios: parity information that is requested by a receiver can potentially be useful for all receivers in the multicast group.
Fountain codes in standards
Raptor codes have been described as the most efficient fountain codes available,[2] with linear time encoding and decoding algorithms that require only a small constant number of XOR operations per generated symbol for both encoding and decoding.[3] IETF RFC 5053 specifies in detail a systematic Raptor code, which has been adopted into multiple standards beyond the IETF, such as the 3GPP MBMS standard for broadcast file delivery and streaming services, the DVB-H IPDC standard for delivering IP services over DVB networks, and DVB-IPTV for delivering commercial TV services over an IP network. This code can be used with up to 8,192 source symbols in a source block, and a total of up to 65,536 encoded symbols generated for a source block. It has an average relative reception overhead of 0.2% when applied to source blocks with 1,000 source symbols, and a relative reception overhead of less than 2% with probability 99.9999%.[4] The relative reception overhead is defined as the extra encoding data required beyond the length of the source data to recover the original source data, measured as a percentage of the size of the source data. For example, a relative reception overhead of 0.2% means that source data of size 1 megabyte can be recovered from 1.002 megabytes of encoding data.
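The definition of relative reception overhead amounts to a one-line calculation. The two helper functions below are hypothetical names used only to restate the arithmetic of the example; sizes may be given in any unit as long as both arguments use the same one.

```python
def required_encoding_data(source_size, relative_overhead):
    """Amount of encoding data needed to recover source data of the given size."""
    return source_size * (1 + relative_overhead)


def relative_reception_overhead(source_size, received_size):
    """Extra data received beyond the source size, as a fraction of the source size."""
    return (received_size - source_size) / source_size
```

With a source size of 1 megabyte and an overhead of 0.2% (that is, 0.002), `required_encoding_data(1.0, 0.002)` gives approximately 1.002 megabytes, up to floating-point rounding, which matches the figure quoted above.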
A more advanced Raptor code with greater flexibility and improved reception overhead, called RaptorQ, has been introduced into the IETF.[5] This code can be used with up to 56,403 source symbols in a source block, and a total of up to 16,777,216 encoded symbols generated for a source block. With high probability, it can recover a source block from any set of encoded symbols whose size equals the number of source symbols in the block; in rare cases it needs slightly more.
Fountain codes for data storage
Erasure codes are used in data storage applications because they offer massive savings in the number of storage units needed for a given level of redundancy and reliability. The requirements of erasure code design for data storage, particularly for distributed storage applications, can be quite different from those of communication or data streaming scenarios. One requirement of coding for data storage systems is the systematic form, i.e., the original message symbols are part of the coded symbols. Systematic form allows the message symbols to be read directly from a storage unit without decoding. In addition, since the bandwidth and communication load between storage nodes can be a bottleneck, codes that minimize communication are very beneficial, particularly when a node fails and the system must be reconstructed to restore the initial level of redundancy. In that respect, fountain codes are expected to allow an efficient repair process in case of a failure: when a single encoded symbol is lost, resurrecting it should not require too much communication and computation among the other encoded symbols. In fact, repair latency might sometimes be more important than storage space savings. Repairable fountain codes[6] have been proposed to address fountain code design objectives for storage systems. A detailed survey of fountain codes and their applications is also available.[7]
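The two storage requirements named above, systematic form and cheap repair, can be shown with a toy example. This is only an illustrative sketch and not the construction from the repairable fountain codes paper: the message symbols are stored unchanged, and each parity symbol is the XOR of a small random subset of them, so a lost message symbol can be rebuilt by reading one parity and the few other symbols it covers. The function names and the `degree` parameter are assumptions.

```python
import random
from functools import reduce
from operator import xor


def encode_systematic(message, num_parity, degree, rng):
    """Return (stored, parities).

    stored is the message itself (systematic part); each parity is a pair
    (indices, value) where value is the XOR of `degree` message symbols.
    """
    parities = []
    for _ in range(num_parity):
        indices = sorted(rng.sample(range(len(message)), degree))
        value = reduce(xor, (message[i] for i in indices))
        parities.append((indices, value))
    return list(message), parities


def repair(lost, stored, parities):
    """Rebuild stored[lost] from one parity that covers it.

    Returns (value, symbols_read), or None if no parity covers the lost index.
    stored[lost] itself is never read.
    """
    for indices, value in parities:
        if lost in indices:
            others = [i for i in indices if i != lost]
            for i in others:
                value ^= stored[i]        # cancel the surviving symbols
            return value, len(others) + 1  # other symbols plus the parity itself
    return None
```

Reading a message symbol requires no decoding because `stored` is the message. A repair touches only `degree` stored symbols (the parity and the other members of its subset) rather than the whole block, which is the kind of low repair traffic the paragraph above describes.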
See also
- Online codes
- Linear network coding
- Secret sharing
- Tornado codes, the precursor to fountain codes
Notes
- ↑ J. Byers, M. Luby, M. Mitzenmacher, A. Rege (1998). "A Digital Fountain Approach to Reliable Distribution of Bulk Data".
- ↑ "Qualcomm Raptor Technology - Forward Error Correction".
- ↑ (Shokrollahi 2006)
- ↑ T. Stockhammer, A. Shokrollahi, M. Watson, M. Luby, T. Gasiba (March 2008). "Application Layer Forward Error Correction for Mobile Multimedia Broadcasting". In Furht, B.; Ahson, S. Handbook of Mobile Broadcasting: DVB-H, DMB, ISDB-T and Media FLO (CRC Press).
- ↑ (Luby et al. 2010)
- ↑ M. Asteris, A. G. Dimakis (2012). "Repairable Fountain Codes". Proceedings of the 2012 IEEE International Symposium on Information Theory.
- ↑ Suayb S. Arslan (2014). "Incremental Redundancy, Fountain Codes and Advanced Topics".
References
- M. Luby (2002). "LT Codes". Proceedings of the IEEE Symposium on the Foundations of Computer Science: 271–280.
- A. Shokrollahi (2006), "Raptor Codes", Transactions on Information Theory (IEEE) 52 (6): 2551–2567.
- P. Maymounkov (November 2002). "Online Codes". (Technical Report).
- David J. C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1.
- M. Luby, A. Shokrollahi, M. Watson, T. Stockhammer (October 2007), "Raptor Forward Error Correction Scheme for Object Delivery", RFC 5053 (IETF).
- M. Luby, A. Shokrollahi, M. Watson, T. Stockhammer, L. Minder (May 2011), RaptorQ Forward Error Correction Scheme for Object Delivery, IETF.