Network coding

From Wikipedia, the free encyclopedia

Network coding is a field of information theory and coding theory and is a method of attaining maximum information flow in a network.

Contents

[edit] General principle

A network is a directed graph, where the edges represent pathways for information. Using the max-flow min-cut theorem, one can calculate the maximum amount of information that can be pushed through this network between two nodes. It was shown that this max-flow value is achievable, but that routing is not capable of achieving it.[1]

The core notion of network coding is to allow mixing of data at intermediate network nodes. A receiver sees these data packets and deduces from them the messages that were originally intended for that data sink.

In the butterfly example below, it is easy to see that no routing scheme can achieve the maximum flow, but that by using a simple coding scheme, the full flow can be attained.

[edit] Linear network coding

Although the max-flow was shown to be achievable, it was shown using a random code. In 2003, it was shown that linear codes will achieve the max-flow in single-source multicast (every destination node receives all the source information) networks, provided the alphabet size is large enough.[2]

In 2004, it was shown that linear network codes are not sufficient in general.[3] An example in the paper is not solvable using any linear code over any field size or in any dimension; further, it is not asymptotically linear nor is it linearly solvable using convolutional codes or time-sharing methods. They also showed that the insufficiency of linear codes applies to multiple unicast networks.

[edit] Theory

In a Linear Network coding problem a group of nodes P are involved in moving the data from S source nodes to K sink nodes. Each node generates a new packet, which is a linear combination of the earlier received packets on the link, by coefficients in the finite field.

A message generated so Xk is related to the received messages Mi by the relation:

X_k = \sum_{i=1}^{S}g_k^i\cdot M_i

Each node forwards the computed value Xk along with all the coefficients used in the kth level, g_k^i.

The values g_k^i are the coefficients from the Galois field GF(2s). Since the operations are computed in the finite field, the result of the operation is also of the same length, as the size of each vector M.

Each node produces a similar output, as computed above. This yields a linear problem of the type X = GM, where with the knowledge of the X,G we need to compute M. Each of the receivers in K, try to solve this linear equation, and for which at least N \ge S packets must be received. The received packets are continually used in the Gaussian elimination method to reduce G matrix into the row-echelon form. Finally the resulting values of X = GechelonM are solved to obtain M.

[edit] Example

Butterfly Network.
Butterfly Network.

In the butterfly network, there are two sources (at the top of the picture), each having knowledge of some value A and B. There are two destination nodes (at the bottom), which each want to know both A and B. Each edge can carry only a single value (we can think of an edge transmitting a bit in each time slot).

If we only used routing, then the central line would be able to carry A or B, but not both. Suppose we send A through the center; then the left destination would receive A twice and not know B at all. Sending B poses a similar problem for the right destination. We say that routing is insufficient because no routing scheme can transmit both A and B simultaneously to both destinations.

Using a simple code, as shown, we do get both A and B to both destinations simultaneously by sending the sum of the symbols through the center (in other words, we encode A and B using the formula "A+B"). The left destination receives A and A+B, and can find B by subtracting the two values. This is a linear code because the encoding and decoding schemes are linear operations.

[edit] Throughput

At the middle of the butterfly network, 3 messages are being transmitted (A, A+B, B). However 4 messages are being received at the endpoints at the bottom (receive 4 messages for the cost of 3 messages, a 33% improvement). Note that a message storage in the middle center router could store messages A and B and still provide all 4 messages to the endpoints (receive 4 messages for the cost of 2 messages, a 100% improvement).

A better example is Figure 2 of algo.epfl.ch/~christin/primer.ps which looks like 2 squares showing a shared network path on one side. This shared path carries message A+B. The network gets both messages to endpoints in 1 time slot rather than 2.

[edit] Coding-aware Routing

The example of coding-aware routing was first proposed in ROCX.[4] Consider the scenario in Fig. 2 with three flows f1 : 2→1, f2 : 1→3, f3 : 3→2. If all the links are perfect with no loss, the corresponding shortest paths are 2→5→7→4→1, 1→4→7→6→3, and 3→9→8→2 respectively. Without coding, the total number of transmissions needed to deliver one packet of each flow is 11. Under COPE, since nodes 4, 5 and 6 can not overhear each other, there is only one coding opportunity at node 4 as node 1 and node 7 exchange packets through node 4, reducing the total from 11 to 10. On the other hand, if the route of f3 is changed to 3→6→7→5→2, we need no more than 9 transmissions. Though this path has one more hop (one more transmission) than the shortest path, it creates coding opportunities at nodes 5 and 6. This leads us to investigate the extent of gain possible when routing is performed with the awareness of coding opportunities.

[edit] History

The fundamental concept of network coding was introduced for satellite communication networks in a paper by R. W. Yeung and Z. Zhang.[5] The concept was fully developed in a subsequent paper by R. Ahlswede, N. Cai, S.-Y. R. Li and R. W. Yeung,[1] where the term network coding was coined. In this paper, the advantage of network coding over routing, the traditional way of operating a network, was pointed out for the first time by means of a very simple example known as the butterfly network. In 2006, Chiu et al demonstrated that the maximum achievable throughput by a p2p system in a network without multicasting is exactly the same with or without any form of network coding,[6] provided that perfect routing information is available (which is not exactly the case in real situations).

[edit] Applications

Network coding is seen to be useful in the following areas:

  • Alternative to forward error correction and ARQ in traditional and wireless networks. eg: Multi-user ARQ[7]
  • Robust and resilient to network attacks like snooping, eavesdropping or replay attacks.
  • Digital file distribution and P2P file sharing. eg: Avalanche from Microsoft
  • Throughput increase in wireless mesh networks. eg : COPE,[8] Coding-Aware Routing[9]
  • Bidirectional low energy transmission in wireless sensor networks.
  • Many-to-many broadcast network capacity augmentations.
  • Buffer and Delay reduction in spatial sensor networks: Spatial Buffer Multiplexing [10]

[edit] Network Coding and P2P

Many scholars believe that Network Coding could help improve P2P network performance. But some other scholars still hold their views on this issue. Even one of the creators, Dr. Yeung mentioned in his paper Can Network Coding Help in P2P[11] that Network Coding does not necessarily help P2P network.

There are several encountering difficulties for Network Coding in P2P networks:

  • One peer may need to spend a huge amount of time on decoding data they receive
  • The resources (CPU, etc. ) one needs to spend on decoding
  • How to ensure the uniqueness coefficients when there are a lot of file pieces in the transferred file
  • The topology of a P2P network is changing
  • Peers may depart the network any time they want

Based on these, some discovered that a P2P network with Network Coding could be even worse than BitTorrent network.

[edit] References

  1. ^ a b Ahlswede, Rudolf; N. Cai, Shuo-Yen Robert Li, and Raymond Wai-Ho Yeung (2000). "Network Information Flow". IEEE Transactions on Information Theory, IT-46: 1204–1216. doi:10.1109/18.850663. 
  2. ^ Li, Yeung, and Cai
  3. ^ Dougherty, Freiling, and Zeger [1] and [2]
  4. ^ http://arena.cse.sc.edu/papers/rocx.secon06.pdf
  5. ^ Yeung, Raymond Wai-Ho; Zhen Zhang (1999). "Distributed Source Coding for Satellite Communications". IEEE Transactions on Information Theory, IT-45: 1111–1120. doi:10.1109/18.761254. 
  6. ^ http://personal.ie.cuhk.edu.hk/~dmchiu/p2pnetcoding.pdf
  7. ^ http://www.ericsson.com/technology/research_papers/wireless_access/doc/Multi-User%20ARQ.pdf
  8. ^ Sigcomm 2006 Form >> XORs in The Air: Practical Wireless Network Coding
  9. ^ http://www.cs.wisc.edu/~shravan/infocom-07-2.pdf
  10. ^ Welcome to IEEE Xplore 2.0: Looking at Large Networks: Coding vs. Queueing
  11. ^ http://personal.ie.cuhk.edu.hk/~dmchiu/p2pnetcoding.pdf

[edit] External links