EVPN VXLAN Fundamentals: Complete Guide to Modern Data Center Networking
A comprehensive guide covering EVPN VXLAN fundamentals, use cases, design decisions, and practical deployment considerations.
What is EVPN VXLAN?
EVPN VXLAN is a combination of an encapsulation technology and a control plane technology:
- VXLAN (Virtual Extensible LAN): The data plane - encapsulation technology
- EVPN (Ethernet VPN): The control plane - BGP-based control protocol
- Concept: Ethernet in UDP - Ethernet frames are encapsulated in UDP datagrams with a VXLAN header, then forwarded across a Layer 3 network (sketched in the code below)
- Purpose: Stretch Layer 2 networks across Layer 3 boundaries
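To make the "Ethernet in UDP" idea concrete, here is a minimal Python sketch (standard library only) that packs the 8-byte VXLAN header - flags plus a 24-bit VNI - in front of an Ethernet frame, roughly what a VTEP does before handing the result to the IP/UDP stack. The frame bytes and VNI value are invented for illustration; real VTEPs do this in hardware.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend an 8-byte VXLAN header to an Ethernet frame.

    Header layout (RFC 7348): 8 bits of flags (0x08 = VNI valid),
    24 reserved bits, 24-bit VNI, 8 reserved bits. The result is then
    carried inside a UDP datagram to the remote VTEP's IP address.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags_word = 0x08 << 24                    # "I" flag set, reserved bits zero
    vxlan_header = struct.pack("!II", flags_word, vni << 8)
    return vxlan_header + inner_frame

# Illustrative use: a dummy Ethernet frame stretched across the fabric on VNI 10100
payload = vxlan_encapsulate(b"\xaa\xbb\xcc\xdd\xee\x01" + b"\x00" * 60, vni=10100)
print(len(payload), "bytes of VXLAN payload, destined for UDP port", VXLAN_UDP_PORT)
```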
The Historical Context: Why EVPN VXLAN?
The story begins around 2006 when AMD and Intel released processors with virtualization extensions, enabling virtualization with almost no performance penalty (1-2%). This triggered the most rapid technology adoption in IT history:
- 2006: Almost nothing in production was virtualized
- 2010: Roughly 90% of workloads in most data centers were virtualized
- Key Requirement: vMotion (or live migration) - ability to move workloads from one hypervisor to another without shutting down the VM
- Network Challenge: Needed Layer 2 everywhere to support vMotion across the data center
Earlier Technologies (and Why They Failed)
Several technologies attempted to solve the "Layer 2 everywhere" problem with more flexible network topologies:
| Technology | Year | Status | Notes |
|---|---|---|---|
| TRILL | ~2008 | Very limited use | Some large installations exist (~1% market share) |
| SPB | ~2008-2009 | Very limited use | Some service provider networks, ~1% market share |
| FabricPath | ~2009 | Rare | Cisco's TRILL-based protocol, modified so it is not interoperable with standard TRILL |
| VXLAN | 2011 | Initial release | No control plane - used multicast or head-end replication |
| EVPN VXLAN | ~2013+ | Winner | Added BGP-based control plane, integrated Layer 3 routing |
VMware has NOT removed the Layer 2 adjacency requirement for vMotion. Around 2012, they removed the requirement that hypervisors must be on the same subnet for the backend VM kernel network used during migration. However, the VM itself still requires Layer 2 adjacency between source and destination hypervisors. The same VLAN must exist on both hypervisors.
EVPN VXLAN Use Cases
1. Data Center at Scale - Primary Use Case
- Running a data center at scale with leaf-spine architecture
- Need to support thousands of nodes
- Avoid stretching native Layer 2 (and its spanning tree domain) everywhere, which reduces the blast radius
- Put whatever VLAN you need in whatever rack
- Still have safety and design benefits of Layer 3 architecture
- Support vMotion anywhere in the data center
2. Data Center Interconnect (DCI)
- Use EVPN VXLAN to extend Layer 2 between two data centers
- Traditional data centers still use core/aggregation/access with VLANs and SVIs
- Safer than extending VLANs with 802.1Q tags:
- Avoids extending spanning tree domain between data centers
- Prevents broadcast storms from taking down both data centers
- More controlled way to provide Layer 2 extension
- Note: Still has challenges with traffic flow (hairpinning), but provides "safe-ish" DCI
3. Wired Campus Networks
- Traditional design: VLAN per floor (VLAN 10 = first floor, VLAN 20 = second floor)
- VXLAN approach: VLAN based on function/role across multiple floors
- Sales = VLAN 10 across three floors
- Engineering = VLAN 20 across all floors
- Integration with NAC: Device gets placed into specific VLAN based on identity at login
- Security rules based on role instead of location
- Still relatively rare - most campus networks use traditional VLAN per floor design
4. Workload Placement Flexibility
- Provide every network everywhere in the data center
- Place workloads anywhere without subnet constraints
- Old way: "This application can only be in Rack 1 because that's where the Rack 1 subnet is"
- VXLAN way: Place any workload in any rack with appropriate security segmentation
The VNI Myth: 16 Million Segments
| Technology | Bits | Segments | Reality |
|---|---|---|---|
| VLAN (802.1Q) | 12 bits | 4096 (~4000 usable) | Traditional limitation |
| VXLAN (VNI) | 24 bits | 16,777,216 | Marketing number |
| Physical Switches | - | ~4000 limit | Each VXLAN segment needs local VLAN |
Key Point: With physical switches, every VXLAN segment needs a local VLAN to forward traffic, effectively limiting to ~4000 VLANs. Workarounds exist (ephemeral VLANs, switch-specific VLANs like Cisco ACI) but add complexity. Fortunately, most organizations don't come close to 4000 VLANs. Hyperscalers using virtual switches can benefit from 16M segments.
Key Benefits Beyond vMotion
1. More Than Two Spines
- Traditional limitation: Only two spines with MLAG/VPC/VSS configuration
- Losing one spine = no redundancy left
- Lost 50% of forwarding capacity
- Must buy expensive chassis switches for redundancy
- EVPN VXLAN: Can deploy 3, 4, 5, 6+ spines (typically 6-8 max)
- 3 spines: Lose one = still have redundancy, only lost 1/3 capacity
- 4 spines: Lose one = only lost 1/4 capacity
- Can use cheaper top-of-rack devices instead of chassis switches
- Makes smaller builds much more cost-effective
2. Built-in Multi-Tenancy
- Multi-tenancy built into EVPN
- Use cases beyond "Coke vs Pepsi" tenant separation:
- DMZ vs Internal network
- Sales vs Engineering departments
- Production vs Development environments
- Inter-tenant communication routes externally through firewall
- Extra separation inherent to VXLAN compared to VLANs
3. Enhanced Scalability
- First-hop routing at the leaf: Default gateway on leaf switches
- Traffic doesn't have to go to aggregation layer
- Can be switched right at leaf and go back down
- May only need to hit one spine
- Scale-out architecture:
- Pods: Set of leafs and spines
- Super spines: Aggregate multiple pods
- 3-stage Clos: Leaf → Spine → Leaf
- 5-stage Clos: Leaf → Spine → Super Spine → Spine → Leaf
- 7-stage Clos: Massive networks (rare)
- Predictable network forwarding characteristics across topology
- Any VLAN anywhere across entire fabric
Should Everyone Deploy EVPN VXLAN?
When to Consider EVPN VXLAN
- Network refresh time: When replacing all data center switches
- Larger networks: More than 6-10 switches where benefits justify complexity
- Need for 3+ spines: Want more than two aggregation/core switches
- Security segmentation requirements: Need strong multi-tenancy
- Massive scale requirements: Thousands of endpoints
When NOT to Deploy EVPN VXLAN
- Small networks: 6 switches or fewer - complexity may not justify benefits
- Only two core switches: Sticking with traditional collapsed core
- Higher learning curve: Much more complicated than SVIs and VLANs
- Troubleshooting complexity: "All fun and games until you can't ping your default gateway"
- Team skill level: Requires significant training investment
Hardware Requirements
VXLAN ASIC Support
- ~2012-2013: Broadcom Trident 2
- Could encapsulate or decapsulate VXLAN, but not both in a single pass
- Couldn't route between VXLAN segments (routing requires both)
- Trident 2 Plus and later: Full VXLAN support
- Can encap and decap simultaneously
- VXLAN routing supported
- Modern data center ASICs: VXLAN support is "table stakes"
- Almost every current switch supports VXLAN in hardware
- Still worth checking, but it's a foregone conclusion
- Exception: Old hardware from ~2013 era (e.g., used switches from eBay)
Switch Lifecycle Considerations
- Data center switch lifespans generally shorter than other network areas
- Old production switches may not support VXLAN at required scale
- Check older equipment for VXLAN-specific capabilities
- New equipment typically has full VXLAN support
Building an EVPN Fabric: Four-Step Process
Step 1: Topology
- Switch selection: Which hardware to use
- Architecture: Leaf-spine or leaf-spine-super spine
- 3-stage Clos (leaf-spine)
- 5-stage Clos (leaf-spine-super spine)
- 7-stage Clos (massive deployments)
- Number of spines: Recommend at least 3 for redundancy
- Cabling: How switches interconnect
- Special features: EVPN multi-homing support (some switches only)
- Vendor selection: Almost always single-vendor within data center
Note: Quick conversation - dictated by number of endpoints, port density requirements, and budget. Not much variability.
Step 2: Underlay
Provide IP connectivity from loopback to loopback. Every leaf and spine has loopback addresses. Leafs typically have two loopbacks:
- Loopback 0: All leafs and spines (by convention)
- Loopback 1: Additional loopback on leafs only
These loopbacks are used as VXLAN Tunnel Endpoint (VTEP) addresses. The underlay must learn these addresses via a routing protocol.
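As a rough illustration of the addressing plan the underlay needs (and that fabric automation tools typically generate for you), the sketch below uses Python's ipaddress module to hand out loopback 0 and loopback 1 (VTEP) addresses and carve a /24 into /31 point-to-point links for the leaf-spine interconnects. The pools and switch names are hypothetical.

```python
from ipaddress import ip_network

# Hypothetical pools for a small fabric (3 spines, 4 leafs)
LOOPBACK0_POOL = ip_network("10.255.0.0/24")   # loopback 0: all leafs and spines
LOOPBACK1_POOL = ip_network("10.255.1.0/24")   # loopback 1: VTEP address, leafs only
P2P_POOL       = ip_network("10.254.0.0/24")   # leaf-spine point-to-point links

spines = ["spine1", "spine2", "spine3"]
leafs  = ["leaf1", "leaf2", "leaf3", "leaf4"]

lo0   = list(LOOPBACK0_POOL.hosts())
lo1   = list(LOOPBACK1_POOL.hosts())
links = list(P2P_POOL.subnets(new_prefix=31))   # carve the /24 into /31s

plan = {name: {"loopback0": f"{lo0[i]}/32"} for i, name in enumerate(spines + leafs)}
for i, leaf in enumerate(leafs):
    plan[leaf]["loopback1_vtep"] = f"{lo1[i]}/32"

link_iter = iter(links)
for leaf in leafs:
    plan[leaf]["uplinks"] = {}
    for spine in spines:
        leaf_ip, spine_ip = list(next(link_iter))   # the two usable addresses in a /31
        plan[leaf]["uplinks"][spine] = str(leaf_ip)
        plan[spine].setdefault("downlinks", {})[leaf] = str(spine_ip)

print(plan["leaf1"])
```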
Underlay Routing Protocol Options
| Protocol | Advantages | Vendor Preference |
|---|---|---|
| OSPF | IP unnumbered support (no per-link IP addressing needed) • Simple config, identical per interface • Great for labbing (separates underlay/overlay) • Scales better than it did historically | Cisco default |
| EBGP | IPv6 unnumbered support • Unified protocol for underlay and overlay • The modern "data center hotness" | Arista default |
| IBGP | Alternative to EBGP • Can be combined with an OSPF underlay | Cisco option |
| ISIS | Was popular, now falling out of favor • Still supported | Less common now |
Underlay Characteristics
- Very stable: Routing tables extremely static
- Only loopback addresses: No endpoint routes in underlay
- Low requirements: Doesn't need to be super responsive
- All action in overlay: Day-to-day operations don't affect underlay
- High scalability: Modern control planes handle hundreds of routers
Flood Frame Handling Options
| Method | How It Works | Pros/Cons |
|---|---|---|
| Multicast | Flood frames are sent to a multicast group; the multicast infrastructure distributes them to the other VTEPs | Pros: better scaling at very large scale, more distributed • Cons: requires multicast infrastructure, more complex |
| Head-End Replication | Ingress VTEP makes copies and sends one unicast copy to each destination VTEP, using Type 3 routes to build the flood list | Pros: simpler, no multicast needed, more secure (BGP passwords) • Cons: many copies can congest links at extreme scale |
- Cisco: Typically recommends multicast
- Arista: Typically recommends head-end replication
- Both work: 99% of cases either method is fine
- With automation: Easy to switch between methods
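The difference between the two flood models is easiest to see in a sketch. The Python below mimics head-end replication: the ingress VTEP walks a per-VNI flood list (which EVPN builds automatically from Type 3 routes) and sends one unicast copy per remote VTEP. With multicast flooding, the frame would instead be sent once and the network would replicate it. All data structures and addresses here are invented.

```python
# Minimal head-end replication sketch (illustrative data, not a real VTEP)

# Flood list per VNI, learned from EVPN Type 3 routes: VNI -> remote VTEP IPs
flood_list = {
    10100: ["10.255.1.2", "10.255.1.3", "10.255.1.4"],
}

def flood_bum_frame(frame: bytes, vni: int, send_unicast) -> int:
    """Replicate a broadcast/unknown-unicast/multicast frame at the ingress VTEP."""
    copies = 0
    for remote_vtep in flood_list.get(vni, []):
        send_unicast(dest_vtep=remote_vtep, vni=vni, payload=frame)  # one copy per VTEP
        copies += 1
    return copies

# Example: pretend "send" just prints where each copy goes
sent = flood_bum_frame(
    frame=b"\xff\xff\xff\xff\xff\xff" + b"arp-request",
    vni=10100,
    send_unicast=lambda dest_vtep, vni, payload: print(f"copy -> {dest_vtep} (VNI {vni})"),
)
print(f"{sent} unicast copies made by the ingress VTEP")
```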
Step 3: Overlay
Use BGP with EVPN address family to exchange endpoint reachability information. This is Multi-Protocol BGP (MP-BGP) with the EVPN address family added.
Spine Switch Role
- Data plane: Spine knows nothing - just passing packets
- Control plane: Spines are aware of routes
- Operate as route reflectors (IBGP) or route servers (EBGP)
- Don't terminate VXLAN segments
- Can run show commands to troubleshoot
- Troubleshooting: Most problems are in control plane
- Leaf hasn't learned MAC address
- Leaf hasn't generated Type 2 route
- Type 2 route hasn't propagated through spines
Overlay BGP Options
| Configuration | Details | Vendor |
|---|---|---|
| IBGP Overlay | Separate BGP peering from underlay; peer from loopback 0 to loopback 0 | Cisco option |
| EBGP Overlay | Separate BGP peering; requires multi-hop (TTL > 1); peer loopback to loopback | Arista default |
EVPN Route Types
| Route Type | Purpose | Details |
|---|---|---|
| Type 1 | EVPN Multi-homing | Advanced topic - used with Type 4 |
| Type 2 | Endpoint Reachability | Most common route type • MAC address or MAC+IP combinations • Generated when a switch learns a new MAC • Installed in the forwarding tables of other leafs • MAC → Layer 2 table, IP → host table (/32) |
| Type 3 | Flood Distribution | Used for head-end replication • Tells VTEPs where to send copies of flood frames • Flood list automated via BGP |
| Type 4 | EVPN Multi-homing | Advanced topic - used with Type 1 |
| Type 5 | External Networks | Unknown host lookup • External route propagation • Where to find hosts not in the fabric • How to reach external networks via Layer 3 peering |
| Types 6-13 | Multicast | Tenant multicast in the overlay (not underlay multicast) |
Forwarding Table Entries
| MAC Address | Port |
|---|---|
| AA:BB:CC:DD:EE:01 | Port 1 (Local) |
| AA:BB:CC:DD:EE:02 | Port 2 (Local) |
| AA:BB:CC:DD:EE:03 | VXLAN (Remote - kicks to VXLAN engine) |
When a frame is destined for a MAC learned via EVPN, the Layer 2 forwarding table shows "VXLAN" as the port. This kicks the frame up to the VXLAN forwarding engine, which:
- Adds VXLAN encapsulation
- Looks up destination VTEP in Type 2 routing table
- Forwards encapsulated frame to remote VTEP
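A rough Python model of that decision, with invented MAC addresses and VTEP IPs: the Layer 2 table points a MAC either at a local port or at the VXLAN engine, and the VXLAN engine resolves the destination VTEP from state learned via Type 2 routes.

```python
# Illustrative forwarding state on one leaf (addresses invented)

l2_table = {
    "aa:bb:cc:dd:ee:01": ("local", "Ethernet1"),
    "aa:bb:cc:dd:ee:02": ("local", "Ethernet2"),
    "aa:bb:cc:dd:ee:03": ("vxlan", None),          # learned via EVPN Type 2
}

# Remote MAC -> (VNI, remote VTEP) as advertised in Type 2 routes
type2_table = {
    "aa:bb:cc:dd:ee:03": (10100, "10.255.1.3"),
}

def forward(dst_mac: str) -> str:
    kind, port = l2_table.get(dst_mac, ("flood", None))
    if kind == "local":
        return f"send out {port}"
    if kind == "vxlan":
        vni, vtep = type2_table[dst_mac]
        return f"encapsulate in VXLAN (VNI {vni}) and send to VTEP {vtep}"
    return "flood (head-end replication or multicast)"

print(forward("aa:bb:cc:dd:ee:03"))
```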
Step 4: EVPN Services
- Tenants: Logical grouping (Coke/Pepsi, Prod/Dev, Sales/Engineering)
- Concept - doesn't show up in configs
- Most automation includes tenant concept
- Can have one tenant or skip concept entirely
- VRFs (Virtual Routing and Forwarding): Attached to tenants
- Standard IP VRFs as always used
- Layer 2 networks attach to VRFs
- L2 networks in same VRF can route to each other unrestricted
- Different VRFs = no communication unless route leak or external firewall
- Layer 2 Segments: Stretched VLANs
- Layer 2 VNI: VXLAN segment
- Layer 3 VNI: Associated with VRF
- Symmetric IRB: Integrated Routing and Bridging (standard approach)
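The tenant → VRF → Layer 2 segment hierarchy is easiest to picture as a data model, which is roughly what EVPN automation tools consume. The sketch below is a hypothetical structure with invented tenant names, VNIs, and subnets; real tools (AVD, Nexus Dashboard, etc.) each define their own schema.

```python
# Hypothetical EVPN services data model (names, VNIs, and subnets invented)
services = {
    "tenants": {
        "production": {
            "vrfs": {
                "PROD": {
                    "l3_vni": 50001,  # Layer 3 VNI used for symmetric IRB
                    "networks": {
                        "web": {"vlan": 10, "l2_vni": 10010, "gateway": "10.1.10.1/24"},
                        "app": {"vlan": 20, "l2_vni": 10020, "gateway": "10.1.20.1/24"},
                    },
                }
            }
        },
        "development": {
            "vrfs": {
                "DEV": {
                    "l3_vni": 50002,
                    "networks": {
                        "dev-lab": {"vlan": 30, "l2_vni": 10030, "gateway": "10.2.30.1/24"},
                    },
                }
            }
        },
    }
}

# Networks inside the same VRF route to each other on the leaf; traffic between the
# PROD and DEV VRFs would have to be leaked or routed through an external firewall.
for tenant, tdata in services["tenants"].items():
    for vrf, vdata in tdata["vrfs"].items():
        nets = ", ".join(vdata["networks"])
        print(f"{tenant}/{vrf} (L3 VNI {vdata['l3_vni']}): {nets}")
```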
VRF Best Practices
- Isolate EVPN routes: Even with one tenant/VRF, put EVPN routing in separate VRF
- show ip route: Only shows underlay (loopbacks)
- show ip route vrf [name]: Shows EVPN services (host routes)
- Security zones: VRFs typically represent security boundaries
- Inter-VRF communication: Route through external firewall (recommended)
- Avoid route leaking: Makes communication paths harder to conceptualize
Automation is Mandatory
- Configuration complexity: Much more complex than SVIs and VLANs
- Manual = unmanageable: Very easy to make mistakes
- Multiple configuration aspects:
- VLANs may differ switch to switch
- Anycast gateways with same virtual MAC on every device
- Control plane setup (BGP peering, route reflectors)
- Data plane setup (VTEP addresses, loopbacks)
- VXLAN to local VLAN mapping
- MAC VRFs and IP VRFs
- Route distinguishers (unique per leaf)
- Route targets (common per L2 segment)
- VLAN-aware bundles
Automation Options
| Solution Type | Examples | Notes |
|---|---|---|
| Vendor-Specific | Cisco DCNM / Nexus Dashboard • Arista CloudVision • Juniper Apstra | Web-based configuration • Vendor-optimized • Can include studios/wizards |
| Open-Source | Ansible with Jinja2 templates • Custom YAML data models • Direct switch deployment | Full control • Requires template development • Manual data model creation |
| Hybrid | Arista AVD (Validated Designs): built on Ansible, can use CloudVision or bypass it | Pre-built data models • Auto-assigns IP addresses • Carves a /24 into /31s automatically |
Configuration Generation Process
- Data Model: Contains relevant information (YAML, JSON, database)
- Templates: Jinja2 or similar templating system
- Engine: Takes data model + templates
- Output: Generates unique config for every device
- Deployment: Push configs to switches or upload
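A stripped-down version of that pipeline, assuming the Jinja2 library is installed; the data model and template are entirely hypothetical, and the rendered output is generic pseudo-CLI rather than any specific vendor's syntax.

```python
from jinja2 import Template  # pip install jinja2

# Hypothetical per-device data model (would normally live in YAML files)
device = {
    "hostname": "leaf1",
    "loopback0": "10.255.0.4/32",
    "vtep_loopback": "10.255.1.1/32",
    "bgp_asn": 65101,
    "vlans": [
        {"id": 10, "l2_vni": 10010},
        {"id": 20, "l2_vni": 10020},
    ],
}

# Hypothetical template: generic pseudo-CLI, not a specific vendor's syntax
TEMPLATE = Template("""\
hostname {{ hostname }}
interface Loopback0
  ip address {{ loopback0 }}
interface Loopback1
  ip address {{ vtep_loopback }}
router bgp {{ bgp_asn }}
{% for vlan in vlans %}
vlan {{ vlan.id }}
  vxlan vni {{ vlan.l2_vni }}
{% endfor %}
""")

# Engine step: data model + template -> per-device configuration
print(TEMPLATE.render(**device))
```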
Advanced Features
ARP Suppression
- Traditional: "Who has 10.1.1.10?" broadcast floods throughout network
- With EVPN: Fabric already knows where 10.1.1.10 is (Type 2 routes)
- Suppression: Switch responds directly with MAC address
- Result: No flooding needed if address in routing table
- Status: Widely supported - "table stakes" for EVPN VXLAN
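In effect, the leaf performs something like the following lookup before it would otherwise flood the ARP request. The table contents are invented, and real switches do this in the EVPN control plane and hardware, not in Python.

```python
# Illustrative ARP suppression check on a leaf (entries invented)

# IP -> MAC bindings learned from EVPN Type 2 (MAC+IP) routes
host_table = {
    "10.1.1.10": "aa:bb:cc:dd:ee:03",
}

def handle_arp_request(target_ip: str) -> str:
    mac = host_table.get(target_ip)
    if mac:
        # Suppression: answer locally, no broadcast leaves this leaf
        return f"reply locally: {target_ip} is at {mac}"
    # Unknown host: fall back to flooding the request across the fabric
    return "flood ARP request (head-end replication or multicast)"

print(handle_arp_request("10.1.1.10"))
print(handle_arp_request("10.1.1.99"))
```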
Multicast in the Overlay
- Not underlay multicast - tenant/client multicast
- Client joins multicast group from connected switch
- Multicast routing (Layer 2 and Layer 3)
- Uses Type 6 through Type 13 routes
- Support: Not as universal as ARP suppression
- Recommendation: Check with vendor for support maturity
Multi-Vendor Considerations
- Standards: EVPN and VXLAN are open standards
- Theory: Should be vendor-interoperable
- Practice: Data centers are almost always single-vendor
- Common pattern: Different vendors in different data centers, not within same fabric
- Reasons for single-vendor:
- Homogeneous data center design
- Vendor-specific automation (DCNM, CloudVision, etc.)
- Single point of support (avoid finger-pointing)
- Easier troubleshooting with one vendor
- Interoperability has improved but still not ideal
Getting Started: Practical Steps
1. Decide at Network Refresh
- EVPN VXLAN decision typically happens during data center refresh
- Replacing all switches at once (or most of them)
- Evaluate: Do benefits justify complexity for your environment?
- Consider: Network size, team skills, budget, requirements
2. Choose Automation Approach
- Not optional - must have automation for EVPN VXLAN
- Vendor-specific tools (easiest)
- Open-source (most control)
- Hybrid solutions (balance)
3. Train Your Team
- Not optional: This is a paradigm shift, not "just more networking"
- Complex technology: Layered, difficult to troubleshoot without understanding
- High-profile project: Involves significant investment
- New skills required:
- BGP (many engineers have limited BGP experience)
- VXLAN encapsulation
- EVPN route types
- Control plane vs data plane troubleshooting
- Automation tools
- Learning curve: Higher than SVIs and VLANs
- Payoff: Once understood, operationally not much more difficult
4. Understand Before Troubleshooting
"EVPN is not more difficult to troubleshoot if you know it. But if you don't know it, it is extraordinarily frustrating because you have no idea what's going on. Any sort of technology that you don't understand how it works, if you're going to try to troubleshoot it, it's going to be very frustrating."
Troubleshooting Insights
- Most problems: Control plane issues (not data plane)
- Common scenarios:
- Leaf hasn't learned MAC address
- Leaf hasn't generated Type 2 route based on MAC address
- Type 2 route hasn't propagated to spines
- Type 2 route hasn't propagated from spines to other leafs
- Start with: "Can you ping your default gateway?"
- Verify:
- BGP peering established (underlay and overlay separate)
- Route types being generated and received
- Forwarding tables populated correctly
- VXLAN to VLAN mappings correct
Key Insights
"We've long complained about having to extend VLANs since 2006 to every rack. We must accept that we need to provide every network everywhere in our data center."
"Automation systems do exactly what you tell them to do - which is both their greatest strength and their most frustrating characteristic when things don't work as intended."
"Understanding the fundamentals is critical. You must know how the technology works at a deep level to troubleshoot effectively. Training and proper automation selection are non-negotiable requirements."
- EVPN VXLAN = Encapsulation (VXLAN) + Control Plane (EVPN BGP)
- Driven by virtualization needs: vMotion requires Layer 2 adjacency everywhere
- VXLAN won out over TRILL, SPB, and FabricPath
- Use cases: Data center scale, DCI, campus networks, workload flexibility
- VNI myth: 16M segments possible, but physical switches still limited to ~4K (need local VLANs)
- Major benefits: 3+ spines, multi-tenancy, massive scalability, any VLAN anywhere
- Not for everyone: Small networks (<10 switches) may not justify complexity
- Hardware: Modern ASICs support VXLAN - mostly table stakes now
- Four-step build: Topology → Underlay → Overlay → Services
- Underlay: IP connectivity for loopbacks (OSPF, BGP, ISIS)
- Overlay: MP-BGP with EVPN address family, Type 2/3/5 routes most common
- Services: Tenants → VRFs → L2 segments (symmetric IRB)
- Automation mandatory: Too complex for manual configuration
- Training critical: Requires understanding protocol stack for effective troubleshooting
- Single-vendor typical: Despite open standards, homogeneous fabrics are the norm
- Troubleshooting: Most issues are control plane (BGP, route propagation)
| Type 1 & 4: | EVPN Multi-homing (advanced) |
| Type 2: | Endpoint reachability (MAC, MAC+IP) |
| Type 3: | Flood frame distribution (head-end replication) |
| Type 5: | External networks and unknown hosts |
| Type 6-13: | Overlay multicast (tenant multicast) |