Network performance management

From Wikipedia, the free encyclopedia

Network performance management is the discipline of optimizing how networks function, with the aim of delivering the lowest latency, highest capacity, and maximum reliability despite intermittent failures and limited bandwidth.

Reliable and unreliable networks

Networks connect users or machines to one another using sets of well-defined protocols to govern how data is transmitted. Depending on the type of network and the goals of the application, the protocols may be optimized for specific characteristics:

  • A best-effort network protocol tries to send data but makes no guarantee of delivery; some packets may be dropped along the way, for example when the network is congested. IP and UDP are popular examples of this. Often, this kind of protocol is used for isochronous traffic such as voice over IP (VoIP).
  • A reliable network protocol guarantees delivery of traffic, favoring correctness and completeness over speed. TCP, which underlies many Internet application protocols, including HTTP, over which web applications are delivered, is the most common example.
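The difference between the two approaches can be illustrated with a small simulation. The following Python sketch is not an implementation of UDP or TCP: the function names, the simulated lossy channel, and the stop-and-wait retry loop are assumptions made for illustration, and real TCP uses sliding windows, timers, and acknowledgements that can themselves be lost.

```python
import random

def lossy_send(packet, loss_rate, rng):
    """Return the packet if it survives the simulated network, else None."""
    return None if rng.random() < loss_rate else packet

def best_effort_transfer(packets, loss_rate, rng):
    """Send each packet exactly once; anything lost stays lost."""
    delivered = []
    for packet in packets:
        received = lossy_send(packet, loss_rate, rng)
        if received is not None:
            delivered.append(received)
    # One transmission per packet, regardless of the outcome.
    return delivered, len(packets)

def reliable_transfer(packets, loss_rate, rng, max_tries=50):
    """Resend each packet until it arrives (simplified stop-and-wait)."""
    delivered = []
    transmissions = 0
    for packet in packets:
        for _ in range(max_tries):
            transmissions += 1
            received = lossy_send(packet, loss_rate, rng)
            if received is not None:
                delivered.append(received)
                break
        else:
            # The retry budget ran out without a successful delivery.
            raise RuntimeError("gave up after max_tries attempts")
    return delivered, transmissions

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    data = list(range(20))
    got, sent = best_effort_transfer(data, 0.3, rng)
    print("best effort:", len(got), "of", len(data), "delivered in", sent, "sends")
    got, sent = reliable_transfer(data, 0.3, rng)
    print("reliable:   ", len(got), "of", len(data), "delivered in", sent, "sends")
```

In this sketch the best-effort transfer always costs one transmission per packet but may deliver fewer packets than were sent, while the reliable transfer delivers everything in order at the cost of extra transmissions.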

Factors affecting network performance

Unfortunately, not all networks are the same. As data is broken into component parts (often known as frames, packets, or segments) for transmission, several factors can affect their delivery.

  • Latency: A packet can take a significant amount of time to be delivered across intervening networks. In reliable protocols where a receiver acknowledges delivery of each chunk of data, this delay can be measured as round-trip time.
  • Packet loss: In some cases, intermediate devices in a network will lose packets. This may be due to errors, to overloading of the intermediate network, or to intentional discarding of traffic in order to enforce a particular service level.
  • Retransmission: When packets are lost in a reliable network, they are retransmitted. This incurs two delays: First, the delay from re-sending the data; and second, the delay resulting from waiting until the data is received in the correct order before forwarding it up the protocol stack.
  • Throughput: The amount of traffic a network can carry is measured as throughput, usually in terms such as kilobits per second. Throughput is analogous to the number of lanes on a highway, whereas latency is analogous to its speed limit.

These factors, and others (such as the performance of the network signaling on the end nodes, compression, encryption, concurrency, and so on) all affect the effective performance of a network. In some cases, the network may not work at all; in others, it may be slow or unusable. And because applications run over these networks, application performance suffers.
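How latency, throughput, and loss combine can be approximated with simple arithmetic. The Python sketch below is a rough model, not a measurement technique: the function name and parameters are hypothetical, it assumes one round trip of fixed delay, a constant link throughput, and independent packet loss (so each packet needs on average 1/(1 - loss rate) transmissions), and it ignores retransmission timeouts and reordering delays.

```python
def transfer_time(size_bytes, throughput_bps, rtt_s, loss_rate=0.0):
    """Estimate the seconds needed to move size_bytes across a link.

    Simplified model: one round trip of fixed delay plus the time to
    push the bits through, with usable throughput reduced by loss.
    """
    if throughput_bps <= 0:
        raise ValueError("throughput_bps must be positive")
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    # With independent loss, only a (1 - loss_rate) share of the
    # transmitted bits is new data; the rest is retransmission.
    goodput_bps = throughput_bps * (1.0 - loss_rate)
    return rtt_s + (size_bytes * 8) / goodput_bps

if __name__ == "__main__":
    link_bps = 10_000_000  # 10 megabits per second
    rtt = 0.100            # 100 milliseconds round trip
    for label, size in [("1 kB request", 1_000), ("1 GB file", 1_000_000_000)]:
        seconds = transfer_time(size, link_bps, rtt)
        print(f"{label}: about {seconds:.4f} s")
```

Under these assumptions, a small transfer is dominated by latency (the round trip accounts for nearly all of the time), whereas a large transfer is dominated by throughput, which matches the highway analogy: the speed limit matters most for a single car, the number of lanes for a long convoy.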

The performance management discipline

Network performance management consists of measuring, modeling, planning, and optimizing networks to ensure that they carry traffic with the speed, reliability, and capacity that are appropriate for the nature of the application and the cost constraints of the organization. Different applications warrant different blends of capacity, latency, and reliability. For example:

  • Streaming video or voice can tolerate some unreliability (brief moments of static) but needs very low latency so that lags don't occur.
  • Bulk file transfer or e-mail must be reliable and have high capacity, but need not be instantaneous.
  • Instant messaging doesn't consume much bandwidth, but should be fast and reliable.
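These differing blends can be expressed as requirement profiles against which measurements are checked. In the Python sketch below, the class, the profile names, and every threshold value are hypothetical and chosen only for illustration; they are not recommended targets, and the loss figure is assumed to be the loss seen by the application after any transport-level recovery.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    max_latency_ms: float
    max_loss_rate: float        # loss as seen by the application
    min_throughput_kbps: float

# Hypothetical thresholds, for illustration only.
PROFILES = {
    "streaming": Requirement(150, 0.02, 500),
    "bulk_transfer": Requirement(2000, 0.0, 5000),
    "instant_messaging": Requirement(300, 0.0, 16),
}

def unmet(profile_name, latency_ms, loss_rate, throughput_kbps):
    """Return the names of the requirements a measurement fails to meet."""
    req = PROFILES[profile_name]
    problems = []
    if latency_ms > req.max_latency_ms:
        problems.append("latency")
    if loss_rate > req.max_loss_rate:
        problems.append("loss")
    if throughput_kbps < req.min_throughput_kbps:
        problems.append("throughput")
    return problems

if __name__ == "__main__":
    # The same measured network judged against each profile.
    for name in PROFILES:
        print(name, unmet(name, latency_ms=80, loss_rate=0.01, throughput_kbps=800))
```

The same measured network can satisfy one profile and fail another, which is why performance targets are set per application rather than for the network as a whole.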

Network performance management tasks and classes of tool

Network managers perform many tasks; these include performance measurement, forensic analysis, capacity planning, and load-testing or load generation. They also work closely with application developers and IT departments who rely on them to deliver underlying network services.

  • For performance measurement, operators typically measure the performance of their networks at different levels. They either use per-port metrics (how much traffic on port 80 flowed between a client and a server, and how long did it take?) or rely on end-user metrics (how fast did the login page load for Bob?). The former are collected using flow-based monitoring and protocols such as NetFlow (the basis of the IETF's IPFIX standard) or RMON; the latter, through web logs, synthetic monitoring, or real user monitoring.
  • For forensic analysis, operators often rely on sniffers that break down the transactions by their protocols and can locate problems such as retransmissions or protocol negotiations.
  • For capacity planning, modeling tools that project the impact of new applications or increased usage are invaluable.
  • For load generation that helps to understand the breaking point, operators may use software or appliances that generate scripted traffic. Some hosted service providers also offer pay-as-you-go traffic generation for sites that face the public Internet.
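The per-port measurement described above amounts to grouping flow records by server port and summarizing them. The Python sketch below shows the idea on a handful of made-up records; the tuple layout, addresses, and figures are assumptions for illustration and do not follow the actual NetFlow, IPFIX, or RMON record formats.

```python
from collections import defaultdict

# Hypothetical flow records:
# (client, server, server_port, bytes_transferred, duration_seconds)
FLOWS = [
    ("10.0.0.5", "10.0.1.9", 80, 12_000, 0.40),
    ("10.0.0.6", "10.0.1.9", 80, 48_000, 1.10),
    ("10.0.0.5", "10.0.1.7", 443, 30_000, 0.75),
    ("10.0.0.8", "10.0.1.9", 80, 6_000, 0.30),
]

def summarize_by_port(flows):
    """Total the flow count, bytes, and mean duration per server port."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0, "seconds": 0.0})
    for _client, _server, port, nbytes, duration in flows:
        entry = totals[port]
        entry["flows"] += 1
        entry["bytes"] += nbytes
        entry["seconds"] += duration
    return {
        port: {
            "flows": t["flows"],
            "bytes": t["bytes"],
            "mean_duration_s": t["seconds"] / t["flows"],
        }
        for port, t in totals.items()
    }

if __name__ == "__main__":
    for port, stats in sorted(summarize_by_port(FLOWS).items()):
        print(port, stats)
```

A summary of this kind answers the per-port question (how much traffic flowed on a port and how long it took) but says nothing about what an individual user experienced, which is why it is usually paired with end-user metrics.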