Instability of the OSI Layer-2 Bridging

Event Year: 2004
Reliability: Confirmed
Country: Switzerland
Industry Type: Other
Description:

The current campus network infrastructure allows non-IP protocols (ISO, AIX, ...) to be bridged in parallel with the IP protocols by using dedicated virtual networks. For the ISO protocol (OSI layer 2), the switches involved automatically configure a loop-free architecture, with one switch acting as master. The master is auto-negotiated by determining the switch with the highest (?) MAC address plus the highest internal “administrator level” (default is 128; master is 240; the maximum is 255, which, however, is not compatible with all switch models). A watchdog signal issued every 45 s verifies that the master is functioning properly. After a change of master, all switches need about 30 s to adjust to the new master configuration. No other traffic is exchanged during that time.
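The negotiation rule described above can be illustrated with a minimal sketch. It rests on assumptions: the report only says "highest (?) MAC address plus the highest administrator level", so the ordering used here (administrator level first, MAC address as tie-breaker) is a guess, and the `Switch` class, the switch names and the MAC addresses are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_ADMIN_LEVEL = 128  # default level quoted in the report


@dataclass(frozen=True)
class Switch:
    name: str                                # hypothetical label
    mac: str                                 # e.g. "00:00:0c:12:34:56"
    admin_level: int = DEFAULT_ADMIN_LEVEL   # 240 for the intended master


def mac_value(mac: str) -> int:
    """Convert a MAC address string to an integer so it can be compared."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def election_rank(switches):
    """Order candidates from most to least preferred master.

    Assumption: higher administrator level wins, and a higher MAC
    address breaks ties. The real switches may combine the two differently.
    """
    return sorted(
        switches,
        key=lambda s: (s.admin_level, mac_value(s.mac)),
        reverse=True,
    )


def elect_master(switches):
    """Return the switch at the top of the hierarchy."""
    return election_rank(switches)[0]


# Hypothetical example: one configured master and two default-level switches.
candidates = [
    Switch("edge-default", "00:10:bb:00:00:02"),
    Switch("cisco-master", "00:00:0c:12:34:56", admin_level=240),
    Switch("new-remote", "00:e0:aa:00:00:01"),
]
print([s.name for s in election_rank(candidates)])
```

Under these assumptions, a default-level switch with a high MAC address ends up directly behind the configured master in the hierarchy, so it is the first to claim mastership whenever it believes the master has failed.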

At the time of the incident, an old (2 yrs) Cisco switch was master. Because of increased traffic in a remote part of the production network, a hub had been replaced by a new switch. The MAC address of this switch was rather high, so it became number two in the hierarchy of master switches. The Cisco switch was still number one. Due to an instability of the latter (i.e. its alive signals were not strong enough to be received by the new switch, which was far away), the new switch took over mastership but was not able to determine the full network topology. It therefore went into “listen” mode, during which the Cisco switch took over again. This swapping of mastership continued and, because of the reconfiguration latency, blocked all other traffic until the problem was isolated and fixed.
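Why repeated mastership changes can block traffic completely follows from simple arithmetic, sketched below. This is a rough model, not a measurement from the incident: it assumes each change of master blocks traffic for the full 30 s reconfiguration time and that changes recur at a fixed interval. The function name and the example intervals are illustrative only.

```python
RECONFIG_SECONDS = 30.0   # reconfiguration time quoted in the report
WATCHDOG_SECONDS = 45.0   # watchdog interval quoted in the report


def blocked_fraction(seconds_between_changes: float,
                     reconfig_seconds: float = RECONFIG_SECONDS) -> float:
    """Fraction of time during which no traffic is exchanged.

    Assumes every change of master blocks traffic for `reconfig_seconds`
    and that changes recur every `seconds_between_changes`.
    """
    if seconds_between_changes <= 0:
        raise ValueError("interval between changes must be positive")
    return min(1.0, reconfig_seconds / seconds_between_changes)


# One change per watchdog period already blocks two thirds of the time;
# one change every 30 s or less blocks traffic continuously.
for interval in (3600.0, WATCHDOG_SECONDS, RECONFIG_SECONDS):
    print(f"{interval:7.0f} s between changes: "
          f"{blocked_fraction(interval):.1%} blocked")
```

In this model a single change of master per hour is barely noticeable, whereas mastership that swaps back and forth as fast as the switches can reconfigure leaves no window for ordinary traffic.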

Impact:

The traffic outage blocked control of important production facilities for several hours.

Action Description: As a temporary solution, a second dedicated switch was added nearby (so that it is able to detect the Cisco switch), with a high MAC address and an “administrator level” such that it will definitely take over from the Cisco switch if the latter fails. In addition, external production switches are excluded from the master auto-negotiation. The final solution will be the exclusion of all non-IP protocols from the campus network, which might come by the end of 2004.