Spanning tree protocol

From Wikipedia, the free encyclopedia

The Spanning Tree Protocol is an OSI layer-2 protocol which ensures a loop free topology for any bridged LAN. It is based on an algorithm invented by Radia Perlman while working for Digital Equipment Corporation[1][2]. Spanning tree allows a network design to include spare (redundant) links to provide automatic backup paths if an active link fails, without the danger of bridge loops, or the need for manual enabling/disabling of these backup links. Bridge loops must be avoided because they result in flooding the network.

The Spanning Tree Protocol (STP), is defined in the IEEE Standard 802.1D. As the name suggests, it creates a spanning tree within a mesh network of connected layer-2 bridges (typically Ethernet switches), and disables the links which are not part of that tree, leaving a single active path between any two network nodes.

Contents

[edit] Protocol operation

The collection of bridges in a LAN can be considered a graph whose nodes are the bridges and whose edges are the cables connecting the bridges. To break loops in the LAN while maintaining access to all LAN segments, the bridges collectively compute a spanning tree. The spanning tree is not necessarily a minimum cost spanning tree. A network administrator can reduce the cost of a spanning tree, if necessary, by altering some of the configuration parameters in such a way as to affect the choice of the root of the spanning tree.

The spanning tree that the bridges compute using the Spanning Tree Protocol can be determined using the following rules. The example network at the right, below, will be used to illustrate the rules.

1.  An example network.  The numbered boxes represent bridges (the number represents the bridge ID).  The lettered clouds represent network segments.
1. An example network. The numbered boxes represent bridges (the number represents the bridge ID). The lettered clouds represent network segments.
2.  The smallest bridge ID is 3.  Therefore, bridge 3 is the root bridge.
2. The smallest bridge ID is 3. Therefore, bridge 3 is the root bridge.
3.  Assuming that the cost of traversing any network segment is 1, the least cost path from bridge 4 to the root bridge goes through network segment c.  Therefore, the root port for bridge 4 is the one on network segment c.
3. Assuming that the cost of traversing any network segment is 1, the least cost path from bridge 4 to the root bridge goes through network segment c. Therefore, the root port for bridge 4 is the one on network segment c.
4.  The least cost path to the root from network segment e goes through bridge 92.  Therefore the designated port for network segment e is the port that connects bridge 92 to network segment e.
4. The least cost path to the root from network segment e goes through bridge 92. Therefore the designated port for network segment e is the port that connects bridge 92 to network segment e.
5.  This diagram illustrates all port states as computed by the spanning tree algorithm. Any active port that is not a root port or a designated port is a blocked port.
5. This diagram illustrates all port states as computed by the spanning tree algorithm. Any active port that is not a root port or a designated port is a blocked port.
6.  After link failure the spanning tree algorithm computes and spans new least-cost tree.
6. After link failure the spanning tree algorithm computes and spans new least-cost tree.

Elect a root bridge. The root bridge of the spanning tree is the bridge with the smallest bridge ID. Each bridge has a unique identifier (ID) and a configurable priority number; the bridge ID contains both numbers. To compare two bridge IDs, the priority is compared first. If two bridges have equal priority, then the MAC addresses are compared. A network administrator can determine which bridge is the root by configuring its priority to be higher (lower priority number) than any other bridge on the LAN.

Determine the least cost paths to the root bridge. The computed spanning tree has the property that messages from any connected device to the root bridge traverse a least cost path, i.e., a path from the device to the root that has minimum cost among all paths from the device to the root. The cost of traversing a single network segment is configurable; the cost of traversing a path is the sum of the costs of the segments on the path. Different technologies have different default costs for network segments. Also, an administrator can configure the cost of traversing a particular network segment.

The property that messages always traverse least-cost paths to the root is guaranteed by the following two rules.

Least cost path from each bridge. After the root bridge has been chosen, each bridge determines the cost of each possible path from itself to the root. From these, it picks the one with the smallest cost (the least-cost path). The port connecting to that path becomes the root port of the bridge.

Least cost path from each network segment. The bridges on a network segment collectively determine which bridge has the least-cost path from the network segment to the root. The port connecting this bridge to the network segment is then the designated port for the segment.

Disable all other root paths. Any active port that is not a root port or a designated port is a blocked port.

Modifications in case of ties. The above rules over-simplify the situation slightly, because it is possible that there are ties, for example, two or more ports on a single bridge are attached to least-cost paths to the root or two or more bridges on the same network segment have equal least-cost paths to the root. To break such ties:

Breaking ties for root ports. When multiple paths from a bridge are least-cost paths, the chosen path uses the neighbor bridge with the lower bridge ID. The root port is thus the one connecting to the bridge with the lowest bridge ID. For example, in figure 3, if switch 4 were connected to network segment e, there would be two paths of length 2 to the root, one path going through bridge 24 and the other through bridge 92. Because there are two least cost paths, the lower bridge ID (24) would be used as the tie-breaker in choosing which path to use.

Breaking ties for designated ports. When more than one bridge on a segment leads to a least-cost path to the root, the bridge with the lower bridge ID is used to forward messages to the root. The port attaching that bridge to the network segment is the designated port for the segment. In figure 4, there are two least cost paths from network segment d to the root, one going through bridge 24 and the other through bridge 92. The lower bridge ID is 24, so the tie breaker dictates that the designated port is the port through which network segment is connected to bridge 24.

The final tie-breaker. In some cases, there may still be a tie, as when two bridges are connected by multiple cables. In this case, multiple ports on a single bridge are candidates for root port or designated port. In this case, the port with the lowest port number is used.

[edit] Data rate and STP path cost

The table below shows the default cost of an interfaces for a given data rate.

Data rate STP Cost
4 Mbit/s 250
10 Mbit/s 100
16 Mbit/s 62
45 Mbit/s 39
100 Mbit/s 19
155 Mbit/s 14
200 Mbit/s 12
622 Mbit/s 6
1 Gbit/s 4
2 Gbit/s 3
10 Gbit/s 2

[edit] Bridge Protocol Data Units (BPDUs)

The above rules describe one way of determining what spanning tree will be computed by the algorithm, but the rules as written imply knowledge of the entire network. The bridges have to determine the root bridge and compute the port roles (root, designated, or blocked) with only the information that they have. To ensure that each bridge has enough information, the bridges use special data frames called Bridge Protocol Data Units (BPDUs) to exchange information about bridge IDs and root path costs.

A bridge sends a BPDU frame using the unique MAC address of the port itself as a source address, and a destination address of the STP multicast address 01:80:C2:00:00:00.

There are three types of BPDUs:

  • Configuration BPDU (CBPDU), used for Spanning Tree computation
  • Topology Change Notification (TCN) BPDU, used to announce changes in the network topology
  • Topology Change Notification Acknowledgment (TCA)

BPDUs are exchanged regularly (every 2 seconds by default) and enable switches to keep track of network changes and to start and stop forwarding at ports as required.

When a device is first attached to a switch port, it will not immediately start to forward data. It will instead go through a number of states while it processes BPDUs and determines the topology of the network. When a host is attached such as a computer, printer or server the port will always go into the forwarding state, albeit after a delay of about 30 seconds while it goes through the listening and learning states (see below). The time spent in the listening and learning states is determined by a value known as the forward delay (default 15 seconds and set by the root bridge). However, if instead another switch is connected, the port may remain in blocking mode if it is determined that it would cause a loop in the network. Topology Change Notification (TCN) BPDUs are used to inform other switches of port changes. TCNs are injected into the network by a non-root switch and propagated to the root. Upon receipt of the TCN, the root switch will set a Topology Change flag in its normal BPDUs. This flag is propagated to all other switches to instruct them to rapidly age out their forwarding table entries.

STP switch port states:

  • Blocking - A port that would cause a switching loop, no user data is sent or received but it may go into forwarding mode if the other links in use were to fail and the spanning tree algorithm determines the port may transition to the forwarding state. BPDU data is still received in blocking state.
  • Listening - The switch processes BPDUs and awaits possible new information that would cause it to return to the blocking state.
  • Learning - While the port does not yet forward frames (packets) it does learn source addresses from frames received and adds them to the filtering database (switching database)
  • Forwarding - A port receiving and sending data, normal operation. STP still monitors incoming BPDUs that would indicate it should return to the blocking state to prevent a loop.
  • Disabled - Not strictly part of STP, a network administrator can manually disable a port

To prevent the delay when connecting hosts to a switch and during some topology changes, Rapid STP was developed and standardized by IEEE 802.1w which allows a switch port to rapidly transition into the forwarding state during these situations.

[edit] BPDU fields

The bridge ID, or BID, is a field inside a BPDU packet. It is eight bytes in length. The first two bytes are the Bridge Priority, an unsigned integer of 0-65,535. The last six bytes are a MAC address supplied by the switch. In the event that MAC Address Reduction is used, the first two bytes are used differently. The first four bits are a configurable priority, and the last twelve bits carry either the VLAN ID or MSTP instance number.

[edit] Evolutions and extensions

The first spanning tree protocol was invented in 1985 at the Digital Equipment Corporation by Radia Perlman[1]. In 1990, the IEEE published the first standard for the protocol as 802.1D[3], based on the algorithm designed by Perlman. Subsequent versions were published in 1998[4] and 2004[5], incorporating various extensions.

Although the purpose of a standard is to promote interworking of equipment from different vendors, different implementations of a standard are not guaranteed to work, due for example to differences in default timer settings. The IEEE encourages vendors to provide a "Protocol Implementation Conformance Statement," declaring which capabilities and options have been implemented[5], to help users determine whether different implementations will interwork correctly.

Also, the original Perlman-inspired Spanning Tree Protocol, called DEC STP, is not a standard and differs from the IEEE version in message format as well as timer settings. Some bridges implement both the IEEE and the DEC versions of the Spanning Tree Protocol, but their interworking can create issues for the network administrator, as illustrated by the problem discussed in an on-line Cisco document[6].

[edit] Rapid Spanning Tree Protocol (RSTP)

In 1998, the IEEE with document 802.1w introduced an evolution of the Spanning Tree Protocol: Rapid Spanning Tree Protocol (RSTP), which provides for faster spanning tree convergence after a topology change. Standard IEEE 802.1D-2004 now incorporates RSTP and obsoletes STP.

RSTP bridge port roles:

  • Root - A forwarding port that has been elected for the spanning-tree topology
  • Designated - A forwarding port for every LAN segment
  • Alternate - An alternate path to the root bridge. This path is different than using the root port.
  • Backup - A backup/redundant path to a segment where another bridge port already connects.
  • Disabled - Not strictly part of STP, a network administrator can manually disable a port

RSTP is a refinement of STP and therefore shares most of its basic operation characteristics. However there are some notable differences as summarized below:

  • Detection of root switch failure is done in 3 hello times, which is 6 seconds if default hello times have not been changed.
  • Ports may be configured as edge ports if they are attached to a LAN which has no other bridges attached. These edge ports transition directly to the forwarding state. RSTP still continues to monitor the port for BPDUs in case a bridge is connected. RSTP can also be configured to automatically detect edge ports. As soon as the bridge detects a BPDU coming to an edge port, the port becomes a non-edge port.
  • Unlike in STP, RSTP will respond to BPDUs sent from the direction of the root bridge. An RSTP bridge will "propose" to its designated ports its spanning tree information. If another RSTP bridge receives this information, determines this is the superior root information, and sets all its other ports to discarding. The bridge may send an "agreement" to the first bridge confirming its superior spanning tree information. The first bridge, upon receiving this agreement, knows it can rapidly transition that port to the forwarding state bypassing the traditional listening/learning state transition. This essentially creates a cascading effect away from the root bridge where each designated bridge proposes to its neighbors to determine if it can make a rapid transition. This is one of the major elements which allows RSTP to achieve faster convergence times than STP.
  • As discussed in the port role details above, RSTP maintains backup details regarding the discarding status of ports. This avoids timeouts if the current forwarding ports were to fail or BPDUs were not received on the root port in a certain interval.

[edit] Per-VLAN Spanning Tree (PVST)

In Ethernet switched environments where multiple Virtual LANs exist, spanning tree can be deployed per Virtual LAN. Cisco's name for this is per VLAN spanning tree (PVST and PVST+ which is the default protocol used by Cisco switches). Both PVST and PVST+ protocols are Cisco proprietary protocols and they cannot be used on 3rd party switches, although Extreme Networks support PVST+ with two limitations (lack of support on ports where the VLAN is untagged/native and also on the VLAN with ID 1). PVST works only with ISL (Cisco's proprietary protocol for VLAN encapsulation) due to its embedded Spanning tree ID. Due to high penetration of the IEEE 802.1Q VLAN trunking standard and PVST's incompatibility with 802.1Q, Cisco redefined its PVST standard and called it PVST+. PVST+ can tunnel across a MSTP Region.

[edit] Multiple Spanning Tree Protocol (MSTP)

The Multiple Spanning Tree Protocol (MSTP), originally defined in IEEE 802.1s and later merged into IEEE 802.1Q-2003, defines an extension to the RSTP protocol to further develop the usefulness of virtual LANs (VLANs). This "Per-VLAN" Multiple Spanning Tree Protocol configures a separate Spanning Tree for each VLAN group and blocks the links that are redundant within each Spanning Tree.

If there is only one Virtual LAN (VLAN) in the network, single (traditional) STP works appropriately. If the network contains more than one VLAN, the logical network configured by single STP would work, but it is possible to make better use of the redundant links available by using an alternate spanning tree for different (groups of) VLANs.

MSTP allows formation of MST regions which can run multiple MST instances (MSTI). Multiple regions and other STP bridges are interconnected using one single common spanning tree (CST).

MSTP was inspired by Cisco Systems' Multiple Instances Spanning Tree Protocol (MISTP), and is an evolution of the Spanning Tree Protocol and the Rapid Spanning Tree Protocol. It was introduced in IEEE 802.1s as amendment to 802.1Q, 1998 edition. Standard IEEE 802.1Q-2003 now includes MSTP.

Unlike some proprietary per-VLAN spanning tree implementations,[citation needed] MSTP includes all of its spanning tree information in a single BPDU format. Not only does this reduce the number of BPDUs required on a LAN to communicate spanning tree information for each VLAN, but it also ensures backward compatibility with RSTP (and in effect, classic STP too). MSTP does this by encoding additional region information after the standard RSTP BPDU as well as a number of MSTI messages (from 0 to 64 instances, although in practice many bridges support less). Each of these MSTI configuration messages conveys the spanning tree information for each instance. Each instance can be assigned a number of configured VLANs and frames (packets) assigned to these VLANs operate in this spanning tree instance whenever they are inside the MST region. In order to avoid conveying their entire VLAN to spanning tree mapping in each BPDU, bridges encode an MD5 digest of their VLAN to instance table in the MSTP BPDU. This digest is then used by other MSTP bridges, along with other administratively configured values, to determine if the neighboring bridge is in the same MST region as itself.

MSTP is fully compatible with RSTP bridges, in that an MSTP BPDU can be interpreted by an RSTP bridge as an RSTP BPDU. This not only allows compatibility with RSTP bridges without configuration changes, but also causes any RSTP bridges outside of an MSTP region to see the region as a single RSTP bridge, regardless of the number of MSTP bridges inside the region itself. In order to further facilitate this view of an MST region as a single RSTP bridge, the MSTP protocol uses a variable known as remaining hops as a time to live counter instead of the message age timer used by RSTP. The message age time is only incremented once when spanning tree information enters an MST region, and therefore RSTP bridges will see a region as only one "hop" in the spanning tree. Ports at the edge of an MST region connected to either an RSTP or STP bridge or an endpoint are known as boundary ports. As in RSTP, these ports can be configured as edge ports to facilitate rapid changes to the forwarding state when connected to endpoints.

[edit] Rapid Per-VLAN Spanning Tree (R-PVST)

Cisco's proprietary protocol that combines the functionalites of RSTP and PVST. It based on per VLAN instance.creates tree for each VLAN

[edit] References

  1. ^ a b Perlman, Radia (1985). "An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN". ACM SIGCOMM Computer Communication Review 15 (4): 44–53. doi:10.1145/318951.319004. 
  2. ^ Perlman, Radia (2000). Interconnections, Second Edition. USA: Addison-Wesley. ISBN 0-201-63448-1. 
  3. ^ LAN/MAN Standards Committee of the IEEE Computer Society, ed. (1990), ANSI/IEEE Std 802.1D, IEEE 
  4. ^ LAN/MAN Standards Committee of the IEEE Computer Society, ed. (1998), ANSI/IEEE Std 802.1D, 1998 Edition, Part 3: Media Access Control (MAC) Bridges, IEEE 
  5. ^ a b Template:Citaton
  6. ^ Understanding Issues Related to Inter-VLAN Bridging, Cisco Systems, Inc., 11072, <http://www.cisco.com/warp/public/473/inter-vlan_11072.pdf> 

[edit] See also

[edit] External links