Preventing Network Outages in L2VPN with Storm Control: A Real-World Guide
Preface:
How storm control settings help rate-limit traffic types and safeguard your LAN and L2VPN services from outages.
Storm control can prevent problems caused by broadcast storms. You can configure it to rate-limit broadcast traffic, multicast traffic (on some devices) and unknown unicast traffic to a specified level. When that level is exceeded, the switch drops the excess packets so they cannot propagate and degrade the LAN. You can also have the device shut down or temporarily disable an interface when the storm control level is exceeded.
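The behavior described here can be sketched as a toy per-second limiter. This is only an illustration in Python, not how any particular switch implements storm control: the class name `StormControl`, the `action` labels and the one-second interval are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StormControl:
    """Toy limiter for one traffic type on one port (illustrative only)."""
    threshold_pps: int      # packets allowed per one-second interval
    action: str = "drop"    # "drop" or "shutdown" (hypothetical labels)
    port_up: bool = True

    def process_interval(self, offered_pps: int) -> Tuple[int, int]:
        """Return (forwarded, dropped) packet counts for one interval."""
        if not self.port_up:
            # A disabled port forwards nothing until it is re-enabled.
            return 0, offered_pps
        if offered_pps <= self.threshold_pps:
            return offered_pps, 0
        if self.action == "shutdown":
            # Level exceeded: disable the port for later intervals.
            self.port_up = False
        return self.threshold_pps, offered_pps - self.threshold_pps


limiter = StormControl(threshold_pps=300)
print(limiter.process_interval(200))    # (200, 0)
print(limiter.process_interval(1000))   # (300, 700)
```

With `action="shutdown"`, the first interval that exceeds the threshold also marks the port as down, so later intervals forward nothing.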
A broadcast storm occurs when broadcast packets cause receiving devices to broadcast packets in response. This triggers further responses and creates a knock-on effect that floods the device with packets and causes some clients to lose service or go down completely.
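To get a feel for the knock-on effect, here is a deliberately simplified model in Python. It assumes every broadcast packet triggers a fixed number of new broadcasts in the next round. Real storms depend on topology and device behavior, so the function name and the numbers are purely illustrative.

```python
from typing import List


def storm_growth(initial_packets: int, responses_per_packet: int, rounds: int) -> List[int]:
    """Return the packet count per round under a fixed response factor."""
    counts = [initial_packets]
    for _ in range(rounds):
        # Every packet from the previous round triggers new broadcasts.
        counts.append(counts[-1] * responses_per_packet)
    return counts


print(storm_growth(1, 3, 5))  # [1, 3, 9, 27, 81, 243]
```

Even with a small response factor the count grows geometrically, which is why a storm can overwhelm a device within moments.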
- On ELS systems, storm control is enabled by default on all interfaces at 80% of the available bandwidth.
- On non-ELS systems, storm control is disabled by default on all interfaces. If you enable storm control, the default level is 80% of the available bandwidth.
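To see what an 80% level means in absolute terms, the arithmetic can be sketched as follows. The 1 Gbps port speed and the 64-byte frame size are assumptions chosen for the example. The 20 extra bytes per frame account for the Ethernet preamble and inter-frame gap on the wire.

```python
def storm_level_bps(port_speed_bps: int, level_percent: float) -> float:
    """Convert a percentage level into bits per second."""
    if not 0 <= level_percent <= 100:
        raise ValueError("level_percent must be between 0 and 100")
    return port_speed_bps * level_percent / 100


def approx_pps(rate_bps: float, frame_bytes: int = 64) -> float:
    """Approximate packets per second for a given frame size."""
    wire_bits_per_frame = (frame_bytes + 20) * 8  # preamble + inter-frame gap
    return rate_bps / wire_bits_per_frame


level = storm_level_bps(1_000_000_000, 80)   # assumed 1 Gbps port
print(level)                     # 800000000.0
print(round(approx_pps(level)))  # 1190476
```

At small frame sizes, 80% of a gigabit port still allows over a million packets per second, which is why much lower thresholds are often chosen on customer-facing ports.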
One day, a technician accidentally misconfigures the CE switch, creating a loop or turning off Spanning Tree Protocol. As a result:
- The CE begins flooding broadcast traffic continuously.
- This traffic enters the PE router via the CE-PE link and is replicated across the VPLS instance to all other customer sites.
Because storm control was not configured beforehand:
- The PE starts processing and replicating the excessive broadcast traffic.
- CPU and memory utilization on the PE rise.
- The broadcast load propagates across the MPLS core, affecting other PEs and customers.
- Multiple customer services go down.
- The NOC receives high-severity outage alerts from multiple regions.
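A rough estimate shows why the core suffers so quickly. The sketch below assumes the ingress PE sends one copy of every flooded frame to each remote PE in the VPLS instance, and it ignores MPLS and pseudowire overhead. The function name, the storm rate and the PE count are all hypothetical.

```python
from typing import Tuple


def core_flood_load(ingress_pps: int, remote_pe_count: int, frame_bytes: int = 64) -> Tuple[int, float]:
    """Estimate replicated packets per second and Mbps sent into the core."""
    copies_pps = ingress_pps * remote_pe_count  # one copy per remote PE
    mbps = copies_pps * frame_bytes * 8 / 1_000_000
    return copies_pps, mbps


print(core_flood_load(100_000, 10))  # (1000000, 512.0)
```

Under these assumptions, a 100,000 pps storm from one CE turns into a million packets per second that the PE must replicate.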
How can we prevent this outage using storm control?
- Storm control is configured on the CE-facing ports of both PE routers.
- The PE monitors broadcast, multicast and unknown unicast traffic rates.
- When broadcast traffic from the CE exceeds the configured threshold (e.g. 1% of the port bandwidth), the PE:
  - Starts dropping excess broadcast packets before they hit the MPLS core.
  - Logs the anomaly and optionally sends SNMP traps to the NOC.
- Other customers and the core infrastructure are not affected.
- The NOC investigates and finds the misconfigured CE switch.
- The issue is resolved with zero impact on the core network or other customers.
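The edge behavior in this scenario can be sketched as a small policing function that caps each traffic type and records an alert when it drops traffic. This is an illustrative model only: the function name and the alert text are made up, and the 300 pps limits simply mirror the values in the command logs later in this post.

```python
from typing import Dict, List, Tuple

# 300 pps mirrors the thresholds used in the command logs of this post;
# multicast is left unlimited here because the logs do not configure it.
THRESHOLDS_PPS = {"broadcast": 300, "unknown-unicast": 300}


def police(offered_pps: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Cap each traffic type at its threshold and collect alert messages."""
    forwarded: Dict[str, int] = {}
    alerts: List[str] = []
    for traffic_type, pps in offered_pps.items():
        limit = THRESHOLDS_PPS.get(traffic_type)
        if limit is not None and pps > limit:
            forwarded[traffic_type] = limit
            alerts.append(f"{traffic_type}: dropping {pps - limit} pps above {limit} pps")
        else:
            forwarded[traffic_type] = pps
    return forwarded, alerts


forwarded, alerts = police({"broadcast": 50_000, "multicast": 120, "unknown-unicast": 80})
print(forwarded)  # {'broadcast': 300, 'multicast': 120, 'unknown-unicast': 80}
print(alerts)     # ['broadcast: dropping 49700 pps above 300 pps']
```

Only the capped 300 pps reaches the core, and the alert gives the NOC a starting point for locating the misconfigured CE.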
Steps to perform:
1. Check whether OOB (out-of-band) access to the node is available.
2. Run a service validation pre-check for the affected CE node. This checks whether packet drops are observed in the policy-maps.
3. Remove storm control from the bridging instances.
4. Wait 10 minutes and check whether any anomaly appears at the service level. If so, reload the node.
5. Verify and observe any service disruptions.
6. Load the storm control configuration onto the sub-interfaces under the L2VPN instance.
7. After the configuration changes, commit them and reload the node again.
8. Confirm that there are no service disruptions.
9. If the above steps fail, execute the prepared rollback plan by reversing the steps.
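The "rollback in reverse steps" idea can be sketched as a small runner that remembers which steps succeeded and undoes them in reverse order on the first failure. This is a generic pattern, not a tool used in the procedure above. The step names and the lambda placeholders are hypothetical stand-ins for the real actions.

```python
from typing import Callable, List, Tuple

# Each step is (name, apply, undo); apply returns True on success.
Step = Tuple[str, Callable[[], bool], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> bool:
    """Run steps in order; on the first failure undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        apply = step[1]
        if apply():
            completed.append(step)
            continue
        for done in reversed(completed):
            done[2]()  # call the undo action of each completed step
        return False
    return True


log: List[str] = []
steps: List[Step] = [
    ("remove storm control", lambda: log.append("removed") is None, lambda: log.append("restored")),
    ("apply on sub-interface", lambda: log.append("applied") is None, lambda: log.append("unapplied")),
    ("post-check", lambda: False, lambda: None),  # simulated failure
]
print(run_with_rollback(steps))  # False
print(log)  # ['removed', 'applied', 'unapplied', 'restored']
```

Because the undo actions run last-in, first-out, the node ends up in the state it had before the first change.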
Command line logs:
Jio-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Jio-PE(config)# no l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB storm-control unknown-unicast pps 300
Jio-PE(config)# no l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB storm-control broadcast pps 300
Jio-PE# reload location all
Jio-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Jio-PE(config)# l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB interface GigabitEthernet0/0/0/1.1512 storm-control unknown-unicast pps 300
Jio-PE(config)# l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB interface GigabitEthernet0/0/0/1.1512 storm-control broadcast pps 300
Jio-PE(config)# show commit changes diff
Jio-PE(config)# commit
Jio-PE(config)# exit
Jio-PE# reload
Jio-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Jio-PE# show policy-map interface GigabitEthernet0/0/0/1.1512 | include drop
====================================================
Airtel-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Airtel-PE(config)# no l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB storm-control unknown-unicast pps 300
Airtel-PE(config)# no l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB storm-control broadcast pps 300
Airtel-PE# reload location all
Airtel-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Airtel-PE(config)# l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB interface GigabitEthernet0/0/0/1.1512 storm-control unknown-unicast pps 300
Airtel-PE(config)# l2vpn bridge group 100022311 bridge-domain 100022311 -16VLXM000441CB interface GigabitEthernet0/0/0/1.1512 storm-control broadcast pps 300
Airtel-PE(config)# show commit changes diff
Airtel-PE(config)# commit
Airtel-PE(config)# exit
Airtel-PE# reload
Airtel-PE# show policy-map interface GigabitEthernet0/0/0/1.1512
Airtel-PE# show policy-map interface GigabitEthernet0/0/0/1.1512 | include drop
====================================================
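Since the same lines are entered on both PEs, they can be generated from a few parameters to avoid typing mistakes. The helper below is a sketch that only builds strings in the shape shown in the logs above. It does not talk to any device, and the function name and the bridge-domain value `BD-EXAMPLE` are hypothetical placeholders.

```python
from typing import List, Optional, Sequence


def storm_control_commands(
    bridge_group: str,
    bridge_domain: str,
    pps: int,
    interface: Optional[str] = None,
    traffic_types: Sequence[str] = ("unknown-unicast", "broadcast"),
    remove: bool = False,
) -> List[str]:
    """Build config lines in the same shape as the logs above."""
    prefix = "no " if remove else ""
    scope = f"l2vpn bridge group {bridge_group} bridge-domain {bridge_domain}"
    if interface:
        # Scope the setting to one sub-interface instead of the bridge domain.
        scope += f" interface {interface}"
    return [f"{prefix}{scope} storm-control {t} pps {pps}" for t in traffic_types]


for line in storm_control_commands(
    "100022311", "BD-EXAMPLE", 300, interface="GigabitEthernet0/0/0/1.1512"
):
    print(line)
```

Calling it with `remove=True` and no `interface` produces the `no ...` form used to clear the bridge-domain-level setting.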
Be Attentive:
1. Storm control is a sticky service. Once a policy map is applied to the sub-interface, the node must be reloaded once; otherwise the change is wiped out and has no effect at the service level.
2. Before reloading a node, capture pre-check logs at the protocol, interface and service levels so they can be validated after the reload.
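The pre-check and post-check comparison can be sketched as a simple snapshot diff. The keys and values below are hypothetical examples of what might be captured. In practice they would come from whatever protocol, interface and service checks are recorded before the reload.

```python
from typing import Dict, Optional, Tuple


def compare_checks(pre: Dict[str, str], post: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every item whose value differs between the two snapshots."""
    changed = {}
    for key in sorted(set(pre) | set(post)):
        if pre.get(key) != post.get(key):
            # Store (before, after); None means the item was missing.
            changed[key] = (pre.get(key), post.get(key))
    return changed


pre = {"interface Gi0/0/0/1.1512": "up", "bridge-domain": "up", "ospf neighbors": "4"}
post = {"interface Gi0/0/0/1.1512": "up", "bridge-domain": "up", "ospf neighbors": "3"}
print(compare_checks(pre, post))  # {'ospf neighbors': ('4', '3')}
```

An empty result means the node came back in the same state that was recorded before the reload.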
Conclusion:
In summary, storm control in data center environments between CE and PE devices is a basic safeguard for L2VPN health and availability. Because L2VPNs extend Layer 2 domains across geographically dispersed sites, they are especially vulnerable to uncontrolled broadcast, multicast or unknown unicast traffic from misconfigured or faulty CE devices. Without storm control, this traffic can become a network-wide storm that causes congestion, control plane exhaustion and service outages, affecting not just one customer but potentially hundreds of tenants on the same provider infrastructure. Storm control acts as a proactive gatekeeper that enforces traffic thresholds at the network edge and ensures only legitimate, rate-limited Layer 2 traffic enters the provider’s core. This isolates faults at their source and helps providers meet SLAs by keeping the network stable, optimizing resources and reducing the risk of cascading failures. As shown in the example above, with storm control a critical event becomes a manageable one, keeping the business running and the provider’s reputation for delivering resilient and secure services intact. Integrating storm control with other network hygiene practices such as MAC limiting, loop detection and interface monitoring is therefore not just recommended but required for robust L2VPN infrastructures.