VXLAN EVPN DC Lab Guide
Configure Multi-Pod Interconnection
VXLAN Multi-Site Deployment Part 1
VXLAN Multi-Site Deployment Part 2
VXLAN Multi-Tenant Design 1 - Firewall Insert
Inter-Tenant Communication and Internet Access
Requirement 1: T1 and T3 Communications and T1 access Internet
Requirement 2: Suboptimal routing inter-tenant communication
Requirement 3: Inter-Tenants communications across DCs
Requirement 4: DC2-Leaf1 manipulates default route from T1 and
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Core Knowledge & Topology
VXLAN Forwarding Plane
If the source and destination MAC addresses live on the same switch, traffic is switched locally:
No VXLAN encapsulation or de-capsulation is performed
If the destination MAC is known
Frame is encapsulated by the local VTEP and unicast to the remote VTEP
Frame is decapsulated by the remote VTEP and delivered to the destination as a classical Ethernet frame
If the destination is unknown (or the traffic is broadcast/multicast)
See BUM Traffic Handling
VXLAN Control Plane Operation
The VXLAN standard does not define a control plane signaling protocol for determining the location of end points
For MAC address discovery and for forwarding multicast and broadcast traffic, IP multicast groups are used that map directly to the VXLAN IDs in use (VNIs)
No VXLAN Control Plane
Data driven flood-&-learn
BUM Traffic - multi-destination traffic
Broadcast
Unknown Layer-2 Unicast
Multicast
BUM Traffic transport mechanisms
Multicast replication in the underlay network
Requests the underlay network to run IP multicast
Ingress unicast replication
One unicast replica per remote VTEP
Increases the traffic load throughout the network
Challenges with Flood-&-Learn VXLAN Deployments
Scale, Mobility and Security Limitations
Limited Scale
Flood and learn (BUM) - inefficient bandwidth utilization
Resource Intensive - Large MAC tables
Limited Workload Mobility
Centralized Gateways - Traffic Hairpinning
sub-optimal traffic flow
Security Risk
No authentication of VXLAN devices (VTEPs)
What is VXLAN/EVPN
Standards-based overlay (VXLAN) with a standards-based control plane (BGP)
Layer-2 MAC and Layer-3 IP information distribution by the control plane (BGP)
Forwarding decisions based on the control plane (minimizes flooding)
Integrated Routing/Bridging (IRB) for optimized forwarding in the overlay
Function of VXLAN/EVPN
Advertise host/network reachability information through a control protocol (MP-BGP)
Authenticate VTEPs through BGP peer authentication
Seamless and optimal VM mobility
Early ARP termination
Localize ARP learning process
Minimize network flooding
Unicast Alternative to Multicast underlay
EVPN Primer - MP-BGP Review
Virtual Routing and Forwarding (VRF)
Layer-3 segmentation of tenants' routing space
Route distinguisher (RD):
8-byte field, a per-VRF parameter; unique value
Makes VPN IP routes unique: VPN RD + IP prefix
Selectively distribute VPN routes:
Route Target (RT): 8-byte field, VRF parameter, unique value to define the import/export rules for VPNv4 routes
VPN Address-family: L2VPN EVPN
Distribute the MP-BGP VPN routes
EVPN Control Plane - Reachability Distribution
Use MP-BGP with EVPN address family on leaf nodes to distribute internal host MAC/IP addresses, subnet routes and external reachability information
MP-BGP enhancements allow carrying hundreds of thousands of routes with reduced convergence time
EVPN Control Plane - Host Advertisement
Route-Reflector deployed for scaling purposes
Distributed Anycast Gateway in MP-BGP EVPN
Centralized Gateway
Extra bridging hop before and after routing
Centralized gateway(aggregation) for routing
Large amounts of state => convergence issues
Scale problem for large Layer-2 domains
Works with VXLAN Flood & Learn
Distributed Gateway
Route or Bridge at Leaf
Distributed Gateway (Anycast)
Disaggregate state by scale out
Optimal Scalability
Requires VXLAN/EVPN
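As a reference for the distributed anycast gateway model above, a minimal sketch of the pieces configured identically on every leaf (the MAC, VRF and SVI values are the ones used later in this lab; this is a sketch, not the full leaf configuration):
feature fabric forwarding
feature interface-vlan
fabric forwarding anycast-gateway-mac 0000.2222.3333 // identical virtual gateway MAC on every leaf
interface Vlan100
 vrf member Tenant-1
 ip address 192.168.1.254/24 // same gateway IP on every leaf
 fabric forwarding mode anycast-gateway
 no shutdown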
VXLAN Routing
VXLAN EVPN has two slightly different IRB semantics
Asymmetric
Routing on the ingress VTEP and bridging on the egress VTEP
Requires each VTEP to have all VNIs - can result in forwarding table resource wastage
Symmetric (Cisco)
Routing on both the ingress and the egress VTEP
A VTEP only needs to have VNIs in which they have local hosts. Optimal utilization of forwarding table resources
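For the symmetric IRB model used in this lab, routed traffic between VTEPs is carried in a per-tenant L3 VNI. A minimal sketch of the pieces involved on a leaf, assuming the Tenant-1 values that appear later in this guide:
vlan 111
 vn-segment 11111 // VLAN reserved for the L3 VNI
vrf context Tenant-1
 vni 11111 // L3 VNI for routed traffic between VTEPs
 rd auto
 address-family ipv4 unicast
  route-target both auto evpn
interface Vlan111
 vrf member Tenant-1
 no ip redirects
 ip forward // SVI has no IP address; it only routes VXLAN traffic in the L3 VNI
 no shutdown
interface nve1
 member vni 11111 associate-vrf // bind the L3 VNI to the VTEP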
ARP Suppression
Minimize flood-&-Learn behavior for host learning
hardware access-list tcam region arp-ether 256
show ip arp suppression-cache detail
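ARP suppression is enabled per L2 VNI under the NVE interface; a minimal sketch using the lab's VNI 10100, together with the verification command (the TCAM carving above must be in place first):
interface nve1
 member vni 10100
  suppress-arp // answer ARP locally from the EVPN-learned ARP cache instead of flooding
show ip arp suppression-cache detail // verify locally and remotely learned ARP entries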
Head-end Replication
Head-end replication (aka ingress replication)
Eliminate the need for underlay multicast to transport overlay BUM traffic
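A minimal sketch of BGP-signaled ingress (head-end) replication as the alternative to a per-VNI multicast group; the lab configuration below uses mcast-group instead, so this is only an illustration with the lab's VNI 10100:
interface nve1
 host-reachability protocol bgp
 member vni 10100
  ingress-replication protocol bgp // BUM is sent as one unicast copy per remote VTEP (signaled via EVPN route type 3)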
VXLAN EVPN Parameters
The network virtualization overlay requires a mechanism to know which end hosts are behind which overlay edge device. This allows the tunnel edge devices to build the location-identity mapping database.
The mapping information may be provided via a central SDN controller (such as a Cisco APIC or an OpenDaylight controller), via an overlay end-host distribution protocol (such as BGP EVPN or OVSDB), or via a data plane-based discovery scheme (such as Flood and Learn).
In a nutshell, the EVPN address family allows the host MAC, IP, network, VRF, and VTEP information to be carried over MP-BGP.
MP-BGP address families are used to transport specific reachability information. Popular address families include VPNv4 and VPNv6, which are widely used for Multiprotocol Label Switching (MPLS) Layer 3 VPNs between different data center sites. Other VPN address families (for example, Multicast VPN [MVPN]) are used to transport reachability of multicast group information within MPLS or other encapsulations in a multitenant environment. In the case of VXLAN, the focus is on the address family L2VPN EVPN, which describes the method of transporting tenant-aware Layer 2 (MAC) and Layer 3 (IP) information across a single MP-BGP peering session.
MP-BGP is well known for its multitenancy and routing policy capabilities. In order to differentiate between routes stored in the BGP tables, MP-BGP uses Route Distinguishers (RDs). An RD is 8 bytes (64 bits).
MP-BGP also uses route policies to prioritize routes in a particular logical instance. This is achieved via an attribute called a Route Target (RT). By using RTs, routes can be identified and placed in different VRF tables. RTs are also 8 bytes long, and apart from the prefix:suffix notation the formatting can be chosen rather freely. For automatic derivation of the RT, the format is ASN:VNI (e.g., RT 65501:50001).
MP-BGP EVPN has different NLRI definitions as part of RFC 7432. Likewise, MP-BGP EVPN route types have different defining sections in RFC 7432 for route types 1 through 4, while route type 5 is defined in draft-ietf-bess-evpn-prefix-advertisement. Route type 1 (Ethernet auto-discovery [A-D] route) and route type 4 (Ethernet segment route) are presently not used in Cisco's EVPN implementation for VXLAN, but route types 2, 3, and 5 are rather important.
L2 VNI
Automated derivation of the RD uses the type 1 format with RID loopback IP: internal MAC VRF ID (e.g., RD 10.10.10.1:32777). The internal MAC VRF ID is derived from the VLAN ID (mapped to the L2 VNI) plus 32767. For automatic derivation of the RTs, the format is ASN:VNI (e.g., RT 65501:30001).
L3 VNI
Automated derivation of the RD uses the type 1 format with RID loopback IP: internal IP VRF ID (e.g., RD 10.10.10.1:30). For automatic derivation of the Route Targets, the format is ASN:VNI (e.g., RT 65501:50001).
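A minimal sketch of where the automatic derivation applies, using the example values from the text (AS 65501, L2 VNI 30001 mapped to VLAN 10, L3 VNI 50001); the VRF name is a hypothetical placeholder and the derived values in the comments follow the rules described above:
vrf context TENANT-EXAMPLE // hypothetical IP VRF name
 vni 50001
 rd auto // derived as <BGP RID>:<internal IP VRF ID>
 address-family ipv4 unicast
  route-target both auto evpn // derived as ASN:VNI, e.g. 65501:50001
evpn
 vni 30001 l2 // MAC VRF for the L2 VNI (VLAN 10)
  rd auto // derived as <BGP RID>:<VLAN ID + 32767>, e.g. 10.10.10.1:32777
  route-target import auto // derived as ASN:VNI, e.g. 65501:30001
  route-target export auto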
rewrite-evpn-rt-asn
It should be noted that RTs cannot be auto-generated in some cases when external BGP (eBGP) is used for the underlay. Specifically, with the eBGP underlay, the autonomous system number may vary on different edge devices. Consequently, the eBGP use case requires manual configuration of the Route Targets, and they must match between the respective edge devices. This is the only situation where manual configuration of Route Targets is required.
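In this guide the manual-RT requirement is relaxed with rewrite-evpn-rt-asn (used later for the multi-pod and multi-site eBGP peerings), which rewrites the AS portion of EVPN route targets so that auto-derived ASN:VNI values still match across autonomous systems. A minimal sketch with hypothetical AS numbers and neighbor address:
router bgp 65501
 neighbor 192.0.2.1 // hypothetical eBGP EVPN peer in AS 65502
  remote-as 65502
  address-family l2vpn evpn
   send-community
   send-community extended
   rewrite-evpn-rt-asn // rewrite the ASN part of received RTs so auto-derived ASN:VNI route targets match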
MAC routes, IP routes, and prefix routes are all carried in BGP EVPN messages. The MAC routes are applicable to the MAC VRF identified by the L2 VNI, and the IP routes are applicable to the IP VRF identified by the L3 VNI.
The command show bgp l2vpn evpn vni-id XXXX has been introduced to show these routes. The command show bgp ip unicast vrf VRF-NAME provides the IP prefix information for a given IP VRF.
Multi-pod
Initially, a standard two-tier spine-leaf topology that typically contains two spines and N leafs may be used. As the demands of the data center grow, more leafs and spines may be required. At some point, a new pod is added, with an additional set of spines serving at the same tier level as the initially deployed one. The new pod includes a new set of leafs as well.
Because some leafs connect to some spines and other leafs connect to other spines, it is important that these different pods are interconnected. A multi-pod design is simplified by introducing a super-spine layer.
Having multiple pods minimizes the impact of one pod on other pods. For example, if some maintenance needs to be performed on one of the pods while all other pods continue to carry production traffic, the maintenance at the one pod might impact the overlay connectivity toward that pod, but all other pods would continue to operate independently, without any impact.
Multi-pod designs provide the option of hierarchical network design in the underlay and potentially in the overlay control plane. As a result, a single data plane domain remains. In other words, the VXLAN tunnels always begin and terminate at leafs, regardless of whether the traffic is being forwarded between endpoints in the same pod or across pods. This means the same consistent configuration of Layer 2 and Layer 3 services is required across all pods.
Overlay control plane separation in the different pods can be achieved by using different BGP autonomous system (AS) numbers in each pod. However, the separation only involves the handling functions within the control protocol provided by the BGP EVPN feature set. As a result, the scale of MAC and IP information remains a global consideration for the entire multi-pod environment.
Interconnection at the Leaf, back-to-back
Interconnection at the Spine, back-to-back
While multi-pod deployments follow the concept of multiple control protocol domains in the underlay as well as in the overlay, a single data plane that extends end to end across all the pods exists. Therefore, a multi-pod environment is considered a single fabric.
In a multifabric deployment, complete segregation typically occurs at the control plane level, as well as at the data plane level. In other words, a single VXLAN BGP EVPN fabric consists of its own underlay, its own overlay control protocol, and the related data plane encapsulation.
Multi-fabric/Multi-site
The advantage of such a multifabric design not only involves the option of having true separation of the traffic but also brings benefits from an administrative domain perspective. In regard to the administrative domain, the benefits relate to the BGP autonomous system as well as the numbering of VNIs, multicast groups, and other components, which are completely independent between the different fabrics.
show cli history unformatted | last 20
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VXLAN EVPN Deployment
OSPF
feature ospf
router ospf 1
router-id <loopback 0 IP>
int loopback 0
ip router ospf 1 area 0
int ex/x
ip router ospf 1 area 0
PIM
feature pim
interface loopback 0
ip pim sparse-mode
int lo 1 // DC1-Spine1
ip pim sparse-mode
int ex/x
ip pim sparse-mode
ip pim rp-address 100.1.1.1
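Underlay sanity checks that can be run on any node once OSPF and PIM are configured (standard NX-OS show commands, outputs omitted here):
show ip ospf neighbors // OSPF adjacencies on the fabric links
show ip route ospf-1 // remote loopbacks learned via OSPF
show ip pim neighbor // PIM adjacencies
show ip pim rp // RP 100.1.1.1 known on every node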
MP-BGP
DC1-Spine1
nv overlay evpn //enable vxlan evpn control-plane
feature bgp
router bgp 100
neighbor 2.2.2.2
remote-as 100
update-source loopback0
address-family l2vpn evpn // MAC to VTEP mapping, instead of IP routes
send-community
send-community extended // extended communities carry the RT values used to send the MAC-VTEP and IP-VTEP tables, e.g. route type 2 includes them
route-reflector-client
neighbor 3.3.3.3
remote-as 100
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
DC1-Leaf1 and DC1-Leaf2
nv overlay evpn //enable VXLAN control-plane EVPN
feature ospf
feature bgp
feature pim
feature vn-segment-vlan-based //enable VLAN based VXLAN mapping
feature nv overlay //enable VXLAN
router bgp 100
neighbor 1.1.1.1
remote-as 100
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
DC1-Pod2-Spine
nv overlay evpn
feature bgp
router bgp 110
neighbor 22.22.22.22
remote-as 110
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
DC1-Pod2-Leaf
nv overlay evpn
feature bgp
feature vn-segment-vlan-based
feature nv overlay
router bgp 110
router-id 22.22.22.22
neighbor 21.21.21.21
remote-as 110
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
DC-Spine1# show bgp sessions
Total peers 2, established peers 2
ASN 100
VRF default, local ASN 100
peers 2, established peers 2, local router-id 1.1.1.1
State: I-Idle, A-Active, O-Open, E-Established, C-Closing, S-Shutdown
Neighbor ASN Flaps LastUpDn|LastRead|LastWrit St Port(L/R) Notif(S/R)
2.2.2.2 100 0 00:08:58|00:00:56|00:00:57 E 179/48058 0/0
3.3.3.3 100 0 00:07:23|00:00:25|00:00:22 E 179/42096 0/0
DC1-Pod2-Spine# show bgp sessions
Total peers 1, established peers 1
ASN 110
VRF default, local ASN 110
peers 1, established peers 1, local router-id 21.21.21.21
State: I-Idle, A-Active, O-Open, E-Established, C-Closing, S-Shutdown
Neighbor ASN Flaps LastUpDn|LastRead|LastWrit St Port(L/R) Notif(S/R)
22.22.22.22 110 0 00:02:03|00:00:02|00:00:02 E 179/18022 0/0
VXLAN
DC1-Leaf1, DC1-Leaf2 and DC1-Pod2-Leaf
feature interface-vlan // required for SVI interfaces
fabric forwarding anycast-gateway-mac 0000.2222.3333 // anycast gateway MAC, must be consistent across all leafs
// VLAN to VNI mapping
vlan 100
name L2VNI1
vn-segment 10100
vlan 110
name L2VNI2
vn-segment 10110
vlan 111
name L3VNI
vn-segment 11111
// To enable ARP suppression, the TCAM must be re-carved: the arp-ether region is set to 256 and the vpc-convergence region to 0 to free the space; a reload is required
hardware access-list tcam region vpc-convergence 0
hardware access-list tcam region arp-ether 256
vrf context Tenant-1
vni 11111
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
interface Vlan100
no shutdown
vrf member Tenant-1
ip address 192.168.1.254/24
fabric forwarding mode anycast-gateway
interface Vlan110
no shutdown
vrf member Tenant-1
ip address 192.168.11.254/24
fabric forwarding mode anycast-gateway
interface Vlan111
no shutdown
vrf member Tenant-1
no ip redirects
ip forward // routes between L2 VNIs; the Layer-3 VNI SVI is used only for forwarding, so no IP address is needed
interface nve1 // Interface used for encap/decap vxlan
no shutdown
host-reachability protocol bgp // use bgp evpn control plane
source-interface loopback0
member vni 10100
mcast-group 225.1.1.1
member vni 10110
mcast-group 225.1.1.11
member vni 11111 associate-vrf // L3 VNI
evpn // defines the Layer-2 VNI RD and RT values, used for bridging within the same VNI
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
interface Ethernet1/X
switchport
switchport access vlan 100
no shut
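In addition to the BGP EVPN outputs below, the VTEP itself can be checked with these standard NX-OS commands (outputs omitted here):
show nve vni // VNI state, multicast group and L2/L3 type per VNI
show nve peers // remote VTEPs discovered through the EVPN control plane
show vxlan // VLAN-to-VNI mapping
show l2route evpn mac all // MACs learned locally (Local) and via BGP (BGP)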
DC1-Leaf2# show bgp l2vpn evpn 192.168.1.111
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32867 (L2VNI 10100)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272, version 9
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path, no labeled nexthop
AS-Path: NONE, path locally originated
3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10100 11111
Extcommunity: RT:100:10100 RT:100:11111 ENCAP:8 Router MAC:5000.0009.0007
Path-id 1 advertised to peers:
1.1.1.1
DC1-Leaf2# show bgp l2 evpn 0050.7966.6810
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32867 (L2VNI 10100)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6810]:[0]:[0.0.0.0]/216, version 8
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path, no labeled nexthop
AS-Path: NONE, path locally originated
3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10100
Extcommunity: RT:100:10100 ENCAP:8
Path-id 1 advertised to peers:
1.1.1.1
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272, version 9
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path, no labeled nexthop
AS-Path: NONE, path locally originated
3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10100 11111
Extcommunity: RT:100:10100 RT:100:11111 ENCAP:8 Router MAC:5000.0009.0007
Path-id 1 advertised to peers:
1.1.1.1
DC1-Leaf1# show bgp l2vpn evpn vni-id 10100
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 15, Local Router ID is 2.2.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-i
njected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - b
est2
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 2.2.2.2:32867 (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0050.7966.680e]:[0]:[0.0.0.0]/216
2.2.2.2 100 32768 i
*>i[2]:[0]:[0]:[48]:[0050.7966.6810]:[0]:[0.0.0.0]/216
3.3.3.3 100 0 i
*>l[2]:[0]:[0]:[48]:[0050.7966.680e]:[32]:[192.168.1.100]/272
2.2.2.2 100 32768 i
*>i[2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272
3.3.3.3 100 0 i
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Configure Multi-Pod Interconnection
Before the multi-pod interconnection is configured, only the local host 192.168.1.222 is present on the Pod2 leaf:
DC1-Pod2-Leaf# show bgp l2vpn evpn
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 22.22.22.22:32867 (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0050.7966.680d]:[0]:[0.0.0.0]/216
22.22.22.22 100 32768 i
*>l[2]:[0]:[0]:[48]:[0050.7966.680d]:[32]:[192.168.1.222]/272
22.22.22.22 100 32768 i
T1-POD2> ping 192.168.1.111
84 bytes from 192.168.1.111 icmp_seq=1 ttl=64 time=82.746 ms
84 bytes from 192.168.1.111 icmp_seq=2 ttl=64 time=30.864 ms
The multi-pod spines must advertise EVPN routes to each other, so the Pod1 spine and the Pod2 spine need an eBGP L2VPN EVPN neighborship.
BGP changes the next hop to itself by default; we need to keep the next hop unchanged so that the local leaf VTEP remains the next hop - set ip next-hop unchanged
Pod2 uses RT 110:10100 and Pod1 uses RT 100:10100, which are different RTs. The received RTs must be rewritten to the local ASN in both pods - rewrite-evpn-rt-asn
DC1-Spine
route-map 121 permit 10
set ip next-hop unchanged
router bgp 100
neighbor 121.121.121.2
remote-as 110
update-source Ethernet1/4
address-family l2vpn evpn
send-community
send-community extended
route-map 121 out
rewrite-evpn-rt-asn
DC1-Pod2-Spine
route-map 121 permit 10
set ip next-hop unchanged
router bgp 110
neighbor 121.121.121.1
remote-as 100
update-source Ethernet1/2
address-family l2vpn evpn
send-community
send-community extended
route-map 121 out
rewrite-evpn-rt-asn
After that, the leafs in the different pods can see each other's updates:
DC1-Pod2-Leaf# show bgp l2vpn evpn
…
Route Distinguisher: 22.22.22.22:32867 (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0050.7966.680d]:[0]:[0.0.0.0]/216
22.22.22.22 100 32768 i
*>i[2]:[0]:[0]:[48]:[0050.7966.680e]:[0]:[0.0.0.0]/216
2.2.2.2 100 0 100 i
*>i[2]:[0]:[0]:[48]:[0050.7966.6810]:[0]:[0.0.0.0]/216
3.3.3.3 100 0 100 i
*>l[2]:[0]:[0]:[48]:[0050.7966.680d]:[32]:[192.168.1.222]/272
22.22.22.22 100 32768 i
*>i[2]:[0]:[0]:[48]:[0050.7966.680e]:[32]:[192.168.1.100]/272
2.2.2.2 100 0 100 i
*>i[2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272
3.3.3.3 100 0 100 i
DC1-Leaf1# show bgp l2vpn evpn 192.168.1.222
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 2.2.2.2:32867 (L2VNI 10100)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.680d]:[32]:[192.168.1.222]/272, version 1716
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 22.22.22.22:32867:[2]:[0]:[0]:[48]:[0050.7966.680d]:[32]:[192.168.1.222]/272
AS-Path: 110 , path sourced external to AS
22.22.22.22 (metric 121) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10100 11111
Extcommunity: RT:100:10100 RT:100:11111 ENCAP:8 Router MAC:5000.000b.0007
Path-id 1 not advertised to any peer
Route Distinguisher: 22.22.22.22:32867
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.680d]:[32]:[192.168.1.222]/272, version 1714
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop
Imported to 2 destination(s)
AS-Path: 110 , path sourced external to AS
22.22.22.22 (metric 121) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10100 11111
Extcommunity: RT:100:10100 RT:100:11111 ENCAP:8 Router MAC:5000.000b.0007
Path-id 1 not advertised to any peer
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VXLAN Multi-site Options
VXLAN Evolves as the Control Plane Evolves
Before Yesterday
Yet another encapsulation
Flood & learn (Multicast-based)
Data-Plane only
Yesterday
VXLAN for the datacenter - Intra-DC
Control-plane
Active VTEP Discovery
Multicast and Unicast
Today
VXLAN for DCI - Inter-DC
DCI ready
ARP/ND caching/suppression
Multi-homing
Failure domain isolation
Loop protection
Multi-pod Characteristics - "The Single" (physically separate pods, fewer interconnections)
Single Overlay Domain - End-to-End Encapsulation/Decapsulation
Single Overlay Control-Plane Domain - End-to-End EVPN Updates
Single Underlay Domain End-to-End (OSPF, BGP, EIGRP, etc.)
Single Replication domain for BUM (single multicast)
Single VNI Administrative Domain
Multi-pod Challenges - "the Single"
Single Overlay Domain - End-to-End Encapsulation
Scaling the VXLAN EVPN Network
Single overlay Control-Plane Domain - End-to-End EVPN Updates
Overlay Control-Plane Update Propagation
Single Underlay Domain End-to-End
Network must be extended in Underlay (VTEP to VTEP reachability)
Single Replication Domain for BUM
One BUM flooding domain throughout all connected Pods
Multi-Site
Border Gateways (Key Functional Components of VXLAN Multi-Site Architecture) - maximum of 4 per site
Normal Leaf
Border Gateway Leaf
Border Service Leaf (connecting FW, Loadbalancer)
VXLAN Multi-Site Characteristics
Multiple Overlay Domains - Interconnected & Controlled
Multiple Overlay Control-Plane Domains - Interconnected & Controlled
Multiple Underlay Domains - Isolated (DC1 OSPF, DC2 EIGRP)
Multiple Replication Domains for BUM - Interconnected & Controlled
Multiple VNI Administrative Domains (vni/rt/rd can be different)
Cloud/Super-Spine/back-to-back topology
Super-Spine (BGP EVPN)
Anycast Border Gateway
Up to 4 Border Gateways
Border Gateway
Deploying at Leaf - 7.0(3)I7(1)
Deploying at Spine - 7.0(3)I7(2)
Common Multi-Site Virtual IP (Multi-Site VIP) across BGWs
Multi-Site VIP for communication between the Border Gateways in different sites
Multi-Site VIP for communication between Border Gateways and Leaf nodes within a site
Individual Primary IP (PIP) per BGW
Used for Broadcast, Unknown Unicast and Multicast (BUM) replication
PIP for communication with Single-Homed endpoints (route only), intra- and inter site
Per-VNI Designated Forwarder (DF) election
Each BGW can serve as DF for a single or a set of Layer-2 VNIs
DF election and assignment is automatic
Using BGP EVPN Route Type 4 for DF election
Six Octet Site Identifier (System MAC: 00:00:00:00:00:01)
Originators IP (PIP): 1.1.1101
Layer-2 VNI: 30010
Single-homed end-points only connected with L3 links
Service appliances (e.g. firewall, ADC, etc.)
External routers
No SVI support on BGW nodes
Advertised and reachable through individual primary IP address (PIP)
Intra-site: leaf nodes use the PIP to reach devices connected to the Border Gateways
Inter-site: remote Border Gateways use the PIP to reach devices connected to the Border Gateways
Control Plane Deployment Considerations
Two main options for underlay and overlay control plane deployment
I-E-I (Recommended)
Intra-site: IGP (OSPF, IS-IS) as underlay CP, iBGP as overlay CP
Inter-Site: eBGP for both underlay and overlay CPs.
E-E-E*
Intra-Site and Inter-Sites: eBGP for both underlay and overlay CPs
Full mesh of MP-eBGP EVPN adjacencies across sites
Recommended to deploy a couple of route-servers with 3 or more sites
RSs in a separate AS only perform control plane functions ("eBGP route reflectors", IETF RFC 7947)
RS functions: EVPN routes reflection, next-hop-unchanged, route-target rewrite
Only MP-eBGP EVPN between sites
Next-hop behavior (VXLAN tunnel termination and re-origination) and loop protection (AS-path attribute)
Selective Advertisements
Multi-site architecture provides granular control on how layer-2 and layer-3 communication is extended across sites
Layer-2 and/or Layer-3 VNIs configured on the Border Gateways (BGW) control the Control-Plane advertisement towards DCI
Enhances the overall scalability of the solution
Scale up the total number of end-points supported across sites
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VXLAN Multi-Site Deployment Part 1
------------------------------------------- DC2 --------------------------------------------------
DC2-Spine1
feature ospf
feature pim
ip pim rp-address 200.1.1.1
router ospf 1
router-id 4.4.4.4
!
interface loopback0
ip address 4.4.4.4/32
ip router ospf 1 area 0
ip pim sparse-mode
!
interface loopback1
description RP-Address
ip address 200.1.1.1/32
ip router ospf 1 area 0
ip pim sparse-mode
!
interface Ethernet1/1
no switchport
ip address 47.1.1.1/24
no shutdown
!
interface Ethernet1/2
no switchport
ip address 40.1.1.1/24
ip router ospf 1 area 0
ip pim sparse-mode
no shutdown
!
interface Ethernet1/3
no switchport
ip address 50.1.1.1/24
ip router ospf 1 area 0
ip pim sparse-mode
no shutdown
feature bgp
nv overlay evpn
!
router bgp 200
neighbor 5.5.5.5
remote-as 200
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
neighbor 6.6.6.6
remote-as 200
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
DC2-Leaf1
feature ospf
feature pim
feature fabric forwarding
feature interface-vlan
fabric forwarding anycast-gateway-mac 0000.2222.3333
ip pim rp-address 200.1.1.1
router ospf 1
router-id 5.5.5.5
!
interface loopback0
ip address 5.5.5.5/32
ip router ospf 1 area 0
ip pim sparse-mode
!
interface Ethernet1/1
no switchport
ip address 40.1.1.2/24
ip router ospf 1 area 0
ip pim sparse-mode
no shutdown
nv overlay evpn
feature bgp
feature vn-segment-vlan-based
feature nv overlay
!
router bgp 200
neighbor 4.4.4.4
remote-as 200
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
vlan 100
name L2VNI1
vn-segment 10100
vlan 110
name L2VNI2
vn-segment 10110
vlan 111
name L3VNI
vn-segment 11111
vlan 200
name L2VNI1-2
vn-segment 10200
vlan 210
name L2VNI2-2
vn-segment 10210
vlan 222
name L3VNI-2
vn-segment 22222
!
hardware access-list tcam region vpc-convergence 0
hardware access-list tcam region arp-ether 256
vrf context Tenant-1
vni 11111
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
interface Vlan100
no shutdown
vrf member Tenant-1
ip address 192.168.1.254/24
fabric forwarding mode anycast-gateway
interface Vlan110
no shutdown
vrf member Tenant-1
ip address 192.168.11.254/24
fabric forwarding mode anycast-gateway
interface Vlan111
no shutdown
vrf member Tenant-1
no ip redirects
ip forward
vrf context Tenant-2
vni 22222
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
!
interface Vlan200
no shutdown
vrf member Tenant-2
ip address 192.168.2.254/24
fabric forwarding mode anycast-gateway
interface Vlan210
no shutdown
vrf member Tenant-2
ip address 192.168.21.254/24
fabric forwarding mode anycast-gateway
interface Vlan222
no shutdown
vrf member Tenant-2
no ip redirects
ip forward
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0
member vni 10100
suppress-arp
mcast-group 225.1.1.1
member vni 10110
mcast-group 225.1.1.11
member vni 10200
suppress-arp
mcast-group 225.1.1.2
member vni 10210
mcast-group 225.1.1.21
member vni 11111 associate-vrf
member vni 22222 associate-vrf
evpn
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
vni 10200 l2
rd auto
route-target import auto
route-target export auto
vni 10210 l2
rd auto
route-target import auto
route-target export auto
interface Ethernet1/2
switchport access vlan 100
DC2-BorderLeaf
feature ospf
feature pim
feature fabric forwarding
feature interface-vlan
fabric forwarding anycast-gateway-mac 0000.2222.3333
ip pim rp-address 200.1.1.1
router ospf 1
router-id 6.6.6.6
interface loopback0
ip address 6.6.6.6/32
ip router ospf 1 area 0
ip pim sparse-mode
!
interface Ethernet1/1
no switchport
ip address 50.1.1.2/24
ip router ospf 1 area 0
ip pim sparse-mode
no shutdown
nv overlay evpn
feature bgp
feature vn-segment-vlan-based
feature nv overlay
!
router bgp 200
neighbor 4.4.4.4
remote-as 200
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
vlan 100
name L2VNI1
vn-segment 10100
vlan 110
name L2VNI2
vn-segment 10110
vlan 111
name L3VNI
vn-segment 11111
vlan 200
name L2VNI1-2
vn-segment 10200
vlan 210
name L2VNI2-2
vn-segment 10210
vlan 222
name L3VNI-2
vn-segment 22222
hardware access-list tcam region vpc-convergence 0
hardware access-list tcam region arp-ether 256
vrf context Tenant-1
vni 11111
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
interface Vlan100
no shutdown
vrf member Tenant-1
ip address 192.168.1.254/24
fabric forwarding mode anycast-gateway
interface Vlan110
no shutdown
vrf member Tenant-1
ip address 192.168.11.254/24
fabric forwarding mode anycast-gateway
interface Vlan111
no shutdown
vrf member Tenant-1
no ip redirects
ip forward
vrf context Tenant-2
vni 22222
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
!
interface Vlan200
no shutdown
vrf member Tenant-2
ip address 192.168.2.254/24
fabric forwarding mode anycast-gateway
interface Vlan210
no shutdown
vrf member Tenant-2
ip address 192.168.21.254/24
fabric forwarding mode anycast-gateway
interface Vlan222
no shutdown
vrf member Tenant-2
no ip redirects
ip forward
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0
member vni 10100
suppress-arp
mcast-group 225.1.1.1
member vni 10110
mcast-group 225.1.1.11
member vni 10200
suppress-arp
mcast-group 225.1.1.2
member vni 10210
mcast-group 225.1.1.21
member vni 11111 associate-vrf
member vni 22222 associate-vrf
evpn
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
vni 10200 l2
rd auto
route-target import auto
route-target export auto
vni 10210 l2
rd auto
route-target import auto
route-target export auto
------------------------------------------- DC1 --------------------------------------------------
DC1-Spine1
interface Ethernet1/1
no switchport
ip address 17.1.1.1/24
no shutdown
DC1-Leaf1
vlan 200
name L2VNI1-2
vn-segment 10200
vlan 210
name L2VNI2-2
vn-segment 10210
vlan 222
name L3VNI-2
vn-segment 22222
vrf context Tenant-2
vni 22222
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
!
interface Vlan200
no shutdown
vrf member Tenant-2
ip address 192.168.2.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan210
no shutdown
vrf member Tenant-2
ip address 192.168.21.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan222
no shutdown
vrf member Tenant-2
no ip redirects
ip forward
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0
member vni 10100
suppress-arp
mcast-group 225.1.1.1
member vni 10110
mcast-group 225.1.1.11
member vni 10200
suppress-arp
mcast-group 225.1.1.2
member vni 10210
mcast-group 225.1.1.21
member vni 11111 associate-vrf
member vni 22222 associate-vrf
evpn
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
vni 10200 l2
rd auto
route-target import auto
route-target export auto
vni 10210 l2
rd auto
route-target import auto
route-target export auto
DC1-Leaf2
vlan 200
name L2VNI1-2
vn-segment 10200
vlan 210
name L2VNI2-2
vn-segment 10210
vlan 222
name L3VNI-2
vn-segment 22222
vrf context Tenant-2
vni 22222
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
!
interface Vlan200
no shutdown
vrf member Tenant-2
ip address 192.168.2.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan210
no shutdown
vrf member Tenant-2
ip address 192.168.21.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan222
no shutdown
vrf member Tenant-2
no ip redirects
ip forward
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0
member vni 10100
suppress-arp
mcast-group 225.1.1.1
member vni 10110
mcast-group 225.1.1.11
member vni 10200
suppress-arp
mcast-group 225.1.1.2
member vni 10210
mcast-group 225.1.1.21
member vni 11111 associate-vrf
member vni 22222 associate-vrf
evpn
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
vni 10200 l2
rd auto
route-target import auto
route-target export auto
vni 10210 l2
rd auto
route-target import auto
route-target export auto
interface Ethernet1/2
switchport access vlan 200
!
interface Ethernet1/3
switchport access vlan 100
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VXLAN Multi-Site Deployment Part 2
Route type 4 is used for DF election among the BGWs (each with a different PIP) within the same site ID.
VXLAN EVPN Multi-Site
From <https://nwktimes.blogspot.com/2019/08/vxlan-evpn-multi-site.html>
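The per-L2-VNI DF role negotiated with route type 4 can be inspected on a BGW with standard NX-OS commands (outputs omitted here):
show bgp l2vpn evpn route-type 4 // Ethernet segment routes exchanged between the site's BGWs
show nve ethernet-segment // site-ID based ES, DF election state and the VNIs this BGW is DF for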
----------------------------------------------DC2 --------------------------------------------------
SuperSpine-RS
interface Ethernet1/1
no switchport
ip address 17.1.1.7/24
no shutdown
!
interface Ethernet1/2
no switchport
ip address 47.1.1.7/24
no shutdown
!
interface loopback0
ip address 7.7.7.7/32 tag 1234
nv overlay evpn
feature bgp
route-map REDIST-TO-BGP permit 10
match tag 1234
route-map RETAIN-NEXT-HOP permit 10
set ip next-hop unchanged
!
router bgp 777
router-id 7.7.7.7
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-BGP
address-family l2vpn evpn
retain route-target all // the route server has no local tenants/VRFs; by default, routes whose RTs are not imported locally are dropped, so retain them all
template peer Control-Plane // overlay template
update-source loopback0
ebgp-multihop 5
address-family l2vpn evpn
send-community
send-community extended //send community attributes
route-map RETAIN-NEXT-HOP out // keep the next hop unchanged on the route server; eBGP changes the next hop by default
neighbor 11.11.11.11 // DC1 BGW/Spine loopback
inherit peer Control-Plane
remote-as 100
address-family l2vpn evpn
rewrite-evpn-rt-asn // ASN:RTs rewrite to local; 100 -> 200; normalizing the outgoing RT's AS number to match the remote AS number. Uses bgp configured neighbors remote AS
neighbor 17.1.1.1 // phy interface as update source
remote-as 100
address-family ipv4 unicast
neighbor 44.44.44.44 // DC2 BGW/Spine loopback
inherit peer Control-Plane
remote-as 200
address-family l2vpn evpn
rewrite-evpn-rt-asn // ASN:RTs rewrite to local; 200 -> 100
neighbor 47.1.1.1 // phy interface as update source
remote-as 200
address-family ipv4 unicast
DC1-Spine1 as BGW
interface loopback0 // PIP (Primary IP), NVE source interface in the local site; must be advertised toward the DCI because BUM traffic is sourced from this address
description VTEP-IP
ip address 1.1.1.1/32 tag 1234
!
interface loopback100 // EBGP EVPN update source to SuperSpine-RS
description EBGP-L2VPN-source
ip address 11.11.11.11/32 tag 1234
!
interface loopback101 // BGW VIP within the site; a site supports a maximum of 4 BGWs sharing the same VIP
description Anycast-BGW-IP
ip address 101.101.101.101/32 tag 1234
ip router ospf 1 area 0.0.0.0 // routes received from remote sites are re-originated with the VIP as next hop, so the VIP must be reachable via OSPF
route-map REDIST-TO-BGP permit 10
match tag 1234
router bgp 100
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-BGP // advertise loopback0, loopback100 and loopback101 via eBGP
neighbor 17.1.1.7
remote-as 777
address-family ipv4 unicast
DC-Spine1# show ip bgp summary
…
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
17.1.1.7 4 777 9 8 7 0 0 00:02:17 1
DC-Spine1# show ip route 7.7.7.7
…
7.7.7.7/32, ubest/mbest: 1/0
*via 17.1.1.7, [20/0], 00:02:38, bgp-100, external, tag 777
// enable VXLAN and make a BGW as Leaf for VXLAN encapsulation/decapsulation
feature nv overlay
feature vn-segment-vlan-based
feature interface-vlan
feature fabric forwarding
fabric forwarding anycast-gateway-mac 0000.2222.3333 // the BGW has no connected clients, so the anycast gateway is not strictly needed; the L3 VNI is still required
hardware access-list tcam region vpc-convergence 0
hardware access-list tcam region arp-ether 256
// only extend Tenant-1 for now
vlan 100
name L2VNI1
vn-segment 10100
vlan 110
name L2VNI2
vn-segment 10110
vlan 111
name L3VNI
vn-segment 11111
!
vrf context Tenant-1
vni 11111
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
!
interface Vlan111 // the L3 VNI SVI is required; the L3 VNI is what forwards routed VXLAN traffic
no shutdown
vrf member Tenant-1
no ip redirects
ip forward // L3 interface; for L3 routes
// no need for L2 Interfaces/anycast gateway
evpn //L2 VNI RT attributes
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
router bgp 100
neighbor 7.7.7.7 // evpn neighbour to router server
remote-as 777
update-source loopback100 // EBGP EVPN update source to SuperSpine-RS
ebgp-multihop 5
peer-type fabric-external // marks the peer as external to the fabric; the BGW changes the next hop to the Multi-Site VIP when re-originating routes
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn // rewrite RT to local AS
evpn multisite border-gateway 11 // all BGWs within the same site use the same site ID (11); the site ID scopes the per-L2-VNI DF election that decides which BGW (sharing the VIP) forwards BUM traffic
delay-restore time 30
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0 // PIP interface
multisite border-gateway interface loopback101 // assigns the interface used as the Multi-Site VIP address
member vni 10100
multisite ingress-replication // BUM traffic toward remote sites is sent over the DCI links as unicast (ingress replication) instead of multicast; multicast is still used within the site; signaled via route type 3 BGP updates
mcast-group 225.1.1.1
member vni 10110
multisite ingress-replication // BUM traffic toward remote sites is sent over the DCI links as unicast (ingress replication) instead of multicast; multicast is still used within the site; signaled via route type 3 BGP updates
mcast-group 225.1.1.11
member vni 11111 associate-vrf
interface Ethernet1/1
evpn multisite dci-tracking // tracks the DCI-facing interface; if all tracked DCI links go down, the node is no longer a valid BGW, stops multi-site forwarding and withdraws its advertisements
!
interface Ethernet1/2-3
evpn multisite fabric-tracking
// When all of the DCI links of a BGW are down, it stops advertising the VIP address to its intra-site peers, just as in the previously discussed fabric-link failure case. Naturally, it also stops advertising the routes learned via the failed DCI links. What it still does is continue acting as a regular leaf switch: if it has connected hosts or external peers, it continues to advertise the prefixes attached to or learned from those.
From <https://nwktimes.blogspot.com/2019/08/vxlan-evpn-multi-site.html>
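The tracking behavior described above can be verified on the BGW with standard NX-OS commands (outputs omitted here):
show nve multisite dci-links // state of the interfaces under evpn multisite dci-tracking
show nve multisite fabric-links // state of the interfaces under evpn multisite fabric-tracking
show nve interface nve1 detail // multi-site VIP state and the delay-restore timer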
DC2-Spine1 as BGW
interface loopback0
description vtep-ip
ip address 4.4.4.4/32 tag 1234
!
interface loopback100
description ebgp-source
ip address 44.44.44.44/32 tag 1234
!
interface loopback202
description anycast-bgw
ip address 202.202.202.202/32 tag 1234
ip router ospf 1 area 0
route-map REDIST-TO-BGP permit 10
match tag 1234
router bgp 200
router-id 4.4.4.4
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-BGP
neighbor 47.1.1.7
remote-as 777
address-family ipv4 unicast
feature nv overlay
feature vn-segment-vlan-based
feature interface-vlan
feature fabric forwarding
vlan 100
name L2VNI1
vn-segment 10100
vlan 110
name L2VNI2
vn-segment 10110
vlan 111
name L3VNI
vn-segment 11111
vrf context Tenant-1
vni 11111
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
interface Vlan111
no shutdown
vrf member Tenant-1
no ip redirects
ip forward
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback0
multisite border-gateway interface loopback202
member vni 10100
multisite ingress-replication
mcast-group 225.1.1.1
member vni 10110
multisite ingress-replication
mcast-group 225.1.1.11
member vni 11111 associate-vrf
evpn
vni 10100 l2
rd auto
route-target import auto
route-target export auto
vni 10110 l2
rd auto
route-target import auto
route-target export auto
router bgp 200
neighbor 7.7.7.7
remote-as 777
update-source loopback100
ebgp-multihop 5
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn
evpn multisite border-gateway 22
delay-restore time 30
interface Ethernet1/1
evpn multisite dci-tracking
!
interface Ethernet1/2-3
evpn multisite fabric-tracking
//SuperSpine-RS establishes 2 IPV4 bgp and 2 evpn bgp sessions
SuperSpine-RS# show bgp sessions
…
Neighbor ASN Flaps LastUpDn|LastRead|LastWrit St Port(L/R) Notif(S/R)
11.11.11.11 100 0 04:27:42|00:00:51|00:00:50 E 179/50821 0/0 // evpn E state
17.1.1.1 100 0 22:32:36|00:00:52|00:00:53 E 179/36619 0/0 // IPv4 E state
44.44.44.44 200 0 01:46:50|00:00:45|00:00:29 E 179/48265 0/0 // evpn E state
47.1.1.1 200 0 02:04:59|00:00:53|00:00:57 E 179/21160 0/0 // IPv4 E state
// SuperSpine-RS receives both DC EVPN update
SuperSpine-RS# show bgp l2vpn evpn
…
*>e[2]:[0]:[0]:[48]:[0050.7966.680e]:[32]:[192.168.1.100]/272
101.101.101.101 2000 0 100 i
…
*>e[2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272
101.101.101.101 2000 0 100 i
…
…
*>e[2]:[0]:[0]:[48]:[0050.7966.6811]:[32]:[192.168.1.200]/272
202.202.202.202 2000 0 200 i
// DC2-Spine1 received DC1 update, next hop to DC1's BGW 101.101.101.101
DC2-Spine1# show bgp l2vpn evpn
…
*>e[2]:[0]:[0]:[48]:[0050.7966.680e]:[32]:[192.168.1.100]/272
101.101.101.101 0 777 100 i
…
*>e[2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272
101.101.101.101
//DC2 Leaf1 received DC1 updates from DC2's BGW; next hop changes to 202.202.202.202
DC2-Leaf1# show bgp l2 evpn
…
*>i[2]:[0]:[0]:[48]:[0050.7966.6810]:[32]:[192.168.1.111]/272
202.202.202.202 100 0 777 100 i
…
*>i[2]:[0]:[0]:[48]:[0050.7966.680e]:[32]:[192.168.1.100]/272
202.202.202.202 100 0 777 100 i
…
*>l[2]:[0]:[0]:[48]:[0050.7966.6811]:[32]:[192.168.1.200]/272
5.5.5.5 100 32768 i
//DC1-Leaf1 also receives update from DC2
DC1-Leaf1# show bgp l2vpn evpn
…
*>i[2]:[0]:[0]:[48]:[0050.7966.6811]:[32]:[192.168.1.200]/272
101.101.101.101 100 0 777 200 I
DC1-Sev-T1> ping 192.168.1.200
84 bytes from 192.168.1.200 icmp_seq=1 ttl=64 time=158.149 ms
84 bytes from 192.168.1.200 icmp_seq=2 ttl=64 time=33.161 ms
84 bytes from 192.168.1.200 icmp_seq=3 ttl=64 time=30.136 ms
84 bytes from 192.168.1.200 icmp_seq=4 ttl=64 time=28.604 ms
84 bytes from 192.168.1.200 icmp_seq=5 ttl=64 time=91.652 ms
DC2-SER-T1> ping 192.168.1.100
84 bytes from 192.168.1.100 icmp_seq=1 ttl=64 time=55.927 ms
84 bytes from 192.168.1.100 icmp_seq=2 ttl=64 time=29.083 ms
84 bytes from 192.168.1.100 icmp_seq=3 ttl=64 time=36.205 ms
84 bytes from 192.168.1.100 icmp_seq=4 ttl=64 time=30.240 ms
84 bytes from 192.168.1.100 icmp_seq=5 ttl=64 time=32.530 ms
DC2-SER-T1> ping 192.168.1.111
84 bytes from 192.168.1.111 icmp_seq=1 ttl=64 time=153.937 ms
84 bytes from 192.168.1.111 icmp_seq=2 ttl=64 time=30.933 ms
84 bytes from 192.168.1.111 icmp_seq=3 ttl=64 time=33.956 ms
84 bytes from 192.168.1.111 icmp_seq=4 ttl=64 time=41.276 ms
84 bytes from 192.168.1.111 icmp_seq=5 ttl=64 time=27.872 ms
DC2-Spine1# show nve peers
Interface Peer-IP State LearnType Uptime Router-Mac
--------- --------------- ----- --------- -------- -----------------
nve1 1.1.1.1 Up CP 00:39:08 n/a
nve1 5.5.5.5 Up CP 00:38:57 5000.0008.0007
nve1 101.101.101.101 Up CP 00:39:08 0200.6565.6565
DC1-Leaf1# show ip route vrf tenant-1
…
192.168.1.100/32, ubest/mbest: 1/0, attached
*via 192.168.1.100, Vlan100, [190/0], 1d23h, hmm
192.168.1.111/32, ubest/mbest: 1/0
*via 3.3.3.3%default, [200/0], 1d21h, bgp-100, internal, tag 100 (evpn) segid: 11111 tunnelid: 0x3030303 encap: VXLAN
192.168.1.200/32, ubest/mbest: 1/0
*via 101.101.101.101%default, [200/0], 15:39:29, bgp-100, internal, tag 777 (evpn) segid: 11111 tunnelid: 0x65656565 encap: VXLAN
…
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VXLAN Multi-Tenant Design 1 - Firewall Insert
Concerns for inter-tenant communication through a centralized gateway (firewall) in VXLAN:
Inter-tenant traffic is routed between tenants at the border or service leaf nodes
Remote/other leaf nodes send inter-tenant traffic to the border leaf
DC1-ASAv
interface GigabitEthernet0/0
nameif T1
security-level 50
ip address 192.168.11.253 255.255.255.0
no shutdown
!
interface GigabitEthernet0/1
nameif T2
security-level 100
ip address 192.168.21.253 255.255.255.0
no shutdown
route T1 192.168.1.0 255.255.255.0 192.168.11.254 1
route T2 192.168.2.0 255.255.255.0 192.168.21.254 1
access-list permit-all extended permit ip any any
access-group permit-all in interface T1
DC1-Leaf1
interface Ethernet1/3
description T1-ASAv
switchport access vlan 110
!
interface Ethernet1/4
description T2-ASAv
switchport access vlan 210
vrf context Tenant-1
ip route 192.168.2.0/24 192.168.11.253
!
vrf context Tenant-2
ip route 192.168.1.0/24 192.168.21.253
// once redistributed into BGP, these static routes generate route type 5 EVPN updates toward the other remote VTEPs
DC1-Leaf1# ping 192.168.21.253 vrf tenant-2
PING 192.168.21.253 (192.168.21.253): 56 data bytes
64 bytes from 192.168.21.253: icmp_seq=0 ttl=254 time=33.027 ms
64 bytes from 192.168.21.253: icmp_seq=1 ttl=254 time=6.36 ms
64 bytes from 192.168.21.253: icmp_seq=2 ttl=254 time=7.484 ms
64 bytes from 192.168.21.253: icmp_seq=3 ttl=254 time=5.234 ms
64 bytes from 192.168.21.253: icmp_seq=4 ttl=254 time=5.183 ms
DC1-Leaf1# ping 192.168.11.254 vrf tenant-1
PING 192.168.11.254 (192.168.11.254): 56 data bytes
64 bytes from 192.168.11.254: icmp_seq=0 ttl=255 time=7.156 ms
64 bytes from 192.168.11.254: icmp_seq=1 ttl=255 time=2.957 ms
64 bytes from 192.168.11.254: icmp_seq=2 ttl=255 time=0.385 ms
64 bytes from 192.168.11.254: icmp_seq=3 ttl=255 time=0.968 ms
64 bytes from 192.168.11.254: icmp_seq=4 ttl=255 time=0.69 ms
ip access-list D2_T1
10 permit ip 192.168.1.0/24 any
ip access-list D2_T2
10 permit ip 192.168.2.0/24 any
!
route-map D2_T2 permit 10
match ip address D2_T2
route-map D2_T1 permit 10
match ip address D2_T1
router bgp 100
vrf Tenant-1
address-family ipv4 unicast
redistribute static route-map D2_T2
vrf Tenant-2
address-family ipv4 unicast
redistribute static route-map D2_T1
// DC1-Leaf2 has received the BGP updates in both tenants
DC1-Leaf2# show ip route 192.168.2.0 vrf tenant-1
…
192.168.2.0/24, ubest/mbest: 1/0
*via 2.2.2.2%default, [200/0], 00:01:30, bgp-100, internal, tag 100 (evpn) s
egid: 11111 tunnelid: 0x2020202 encap: VXLAN
DC1-Leaf2# show ip route 192.168.1.0 vrf tenant-2
…
192.168.1.0/24, ubest/mbest: 1/0
*via 2.2.2.2%default, [200/0], 00:41:46, bgp-100, internal, tag 100 (evpn) s
egid: 22222 tunnelid: 0x2020202 encap: VXLAN
// DC1-Leaf2 has the static routes as EVPN route type 5
DC1-Leaf2# show bgp l2 evpn route-type 5
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 2.2.2.2:3
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.2.0]/224, version 35143
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not i
n HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop
Imported to 1 destination(s)
Gateway IP: 0.0.0.0
AS-Path: NONE, path sourced internal to AS
2.2.2.2 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 11111
Extcommunity: RT:100:11111 ENCAP:8 Router MAC:5000.0007.0007
Originator: 2.2.2.2 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
Route Distinguisher: 2.2.2.2:4
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.1.0]/224, version 35014
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not i
n HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop
Imported to 1 destination(s)
Gateway IP: 0.0.0.0
AS-Path: NONE, path sourced internal to AS
2.2.2.2 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 22222
Extcommunity: RT:100:22222 ENCAP:8 Router MAC:5000.0007.0007
Originator: 2.2.2.2 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
DC1-XP-T1> ping 192.168.2.100
84 bytes from 192.168.2.100 icmp_seq=1 ttl=59 time=70.994 ms
84 bytes from 192.168.2.100 icmp_seq=2 ttl=59 time=45.666 ms
84 bytes from 192.168.2.100 icmp_seq=3 ttl=59 time=76.157 ms
84 bytes from 192.168.2.100 icmp_seq=4 ttl=59 time=69.631 ms
84 bytes from 192.168.2.100 icmp_seq=5 ttl=59 time=46.373 ms
DC1-XP-T1> trace 192.168.2.100
trace to 192.168.2.100, 8 hops max, press Ctrl+C to stop
1 192.168.1.254 6.524 ms 4.892 ms 4.671 ms
2 0.0.0.0 15.516 ms 8.592 ms 7.993 ms
3 192.168.1.254 13.693 ms 10.957 ms 12.598 ms
4 192.168.21.254 29.353 ms 29.364 ms 23.273 ms
5 192.168.2.254 34.362 ms 35.954 ms 37.087 ms
6 *192.168.2.100 33.560 ms (ICMP type:3, code:3, Destination port unreachable)
// DC2-Leaf1 has 192.168.2.0 in Tenant-1 but NOT 192.168.1.0 in Tenant-2, because only Tenant-1 is extended across the DCI
DC2-Leaf1# show ip route 192.168.2.0 vrf tenant-1
IP Route Table for VRF "Tenant-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.2.0/24, ubest/mbest: 1/0
*via 202.202.202.202%default, [200/0], 00:08:01, bgp-200, internal, tag 777
(evpn) segid: 11111 tunnelid: 0xcacacaca encap: VXLAN
DC2-Leaf1# show ip route 192.168.1.0 vrf tenant-2
IP Route Table for VRF "Tenant-2"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
Route not found
// Tenant-1 hosts in DC2 are still able to ping Tenant-2 endpoints: DC2-Leaf1 -> DC2-Spine1 -> SuperSpine -> DC1-Spine1 -> DC1-Leaf1 -> Tenant-2
// Question: can T2 hosts in DC2 ping T1 hosts? What is the issue, and how can it be fixed?
// Tenant-2 hosts in DC2 cannot ping Tenant-1 endpoints, because the DC2 leafs have no route to 192.168.1.0 in Tenant-2 (see above); Tenant-2 needs to be extended across the DCI as well
DC2-SER-T1> ping 192.168.2.100
84 bytes from 192.168.2.100 icmp_seq=1 ttl=58 time=141.304 ms
84 bytes from 192.168.2.100 icmp_seq=2 ttl=58 time=58.860 ms
84 bytes from 192.168.2.100 icmp_seq=3 ttl=58 time=92.792 ms
84 bytes from 192.168.2.100 icmp_seq=4 ttl=58 time=58.596 ms
84 bytes from 192.168.2.100 icmp_seq=5 ttl=58 time=75.839 ms
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Custom Tenant Internet Access
Centralized route leaking scenario:
Host route communication between different tenants on the same leaf
Host route communication between different tenants on different leafs
A host communicating with a host in a different tenant through an external network
Leverage import and export RT attributes to advertise routes between different tenants
Centralized route leaking uses a default route to steer traffic to the border leaf node
Every VRF on each leaf has to be configured with the route-leaking RT attributes
Host routes on the local leaf (HMM type) are not leaked on their own leaf node; redistribution of the HMM routes is needed (see the sketch after this list)
For a destination in a different VRF, the local leaf sends the traffic toward the centralized border node via the default route, and the border node then routes it according to its own routing table
Tenant 3: shared tenant for Internet access at DC2
Tenant 4: shared tenant for Internet access at DC1
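A minimal sketch of the RT-based leaking and HMM redistribution mentioned above, shown for one direction only (Tenant-1 importing the shared Tenant-3 routes in DC2, AS 200); the RT values assume the auto-derived ASN:VNI format and the route-map name HOST-ROUTES is a hypothetical placeholder, not the lab's final configuration:
vrf context Tenant-1
 address-family ipv4 unicast
  route-target import 200:33333 evpn // import the shared tenant's (Tenant-3, L3 VNI 33333) EVPN routes
  route-target both auto evpn // keep the auto-derived RT for Tenant-1's own routes
route-map HOST-ROUTES permit 10 // hypothetical route-map, permits all host routes
router bgp 200
 vrf Tenant-1
  address-family ipv4 unicast
   redistribute hmm route-map HOST-ROUTES // advertise local /32 host routes (HMM) into BGP EVPN so they can be leaked
The shared VRF (Tenant-3) would symmetrically import Tenant-1's route target (200:11111) for the return traffic.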
Requirement 1: Border leaf T3 Internet
DC2-Leaf1, DC2-BorderLeaf:
vlan 300
name L2VNI1-3
vn-segment 10300
vlan 310
name L2VNI2-3
vn-segment 10310
vlan 333
name L3VNI-3
vn-segment 33333
vrf context tenant-3
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vni 33333
interface Vlan300
no shutdown
vrf member tenant-3
ip address 192.168.3.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan310
no shutdown
vrf member tenant-3
ip address 192.168.31.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan333
no shutdown
vrf member tenant-3
no ip redirects
ip forward
interface nve1
member vni 10300
mcast-group 225.1.1.3
member vni 10310
mcast-group 225.1.1.31
member vni 33333 associate-vrf
evpn
vni 10300 l2
rd auto
route-target import auto
route-target export auto
vni 10310 l2
rd auto
route-target import auto
route-target export auto
DC2-Leaf1 only
interface Ethernet1/3
switchport access vlan 300
DC2-Spine1
vlan 333
name L3VNI-3
vn-segment 33333
vrf context tenant-3
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vni 33333
interface Vlan333
no shutdown
vrf member tenant-3
no ip redirects
ip forward
interface nve1
member vni 33333 associate-vrf
Cloud-WAN
interface GigabitEthernet0/3
ip address 128.61.0.254 255.255.255.240
no ip proxy-arp
no shutdown
!
interface GigabitEthernet0/0
ip address 63.130.0.254 255.255.255.0
ip route 128.61.0.0 255.255.255.252 128.61.0.253
DC2-WAN
interface GigabitEthernet0/2
ip address 128.61.0.253 255.255.255.240
no ip proxy-arp
no shutdown
!
interface GigabitEthernet0/3
ip address 128.61.0.2 255.255.255.252
no ip proxy-arp
ip ospf network point-to-point
no shutdown
router ospf 1
network 128.61.0.0 0.0.0.3 area 0
default-information originate
ip route 0.0.0.0 0.0.0.0 128.61.0.254
DC2-BorderLeaf
// Using per-VRF VRF-lite, DC2-BorderLeaf connects to the WAN/edge router
interface Ethernet1/2
no switchport
vrf member tenant-3
ip address 128.61.0.1/30
ip ospf network point-to-point
ip router ospf 1 area 0.0.0.0
no shutdown
route-map rmap-BGP-to-OSPF permit 10
match route-type internal
route-map rmap-OSPF-to-BGP permit 10 // Match any
router bgp 200
vrf tenant-3
address-family ipv4 unicast
network 0.0.0.0/0 // Even though the default route is learned from the WAN, redistribution does not carry it into BGP, so it must be injected with the network statement
redistribute ospf 1 route-map rmap-OSPF-to-BGP // Redistribute the OSPF-learned routes into BGP; together with the default route they are advertised to the other leafs
router ospf 1
vrf tenant-3
redistribute bgp 200 route-map rmap-BGP-to-OSPF // Redistribute the tenant-3 BGP routes into OSPF toward the WAN
summary-address 192.168.3.0/24 // Summarize the tenant-3 routes; they terminate at DC2-WAN and are not propagated to Cloud-WAN
// 1. When an IGP runs toward the WAN, OSPF must be redistributed into BGP and vice versa; with eBGP toward the WAN no redistribution is needed, because the iBGP/EVPN routes are advertised to the eBGP peer natively (a sketch of this option follows)
// 2. Route-maps can be applied in many ways; matching on route-type internal is one of them
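A hedged sketch of that eBGP alternative, assuming the WAN router peers per-VRF eBGP from a hypothetical AS 65000 over the same 128.61.0.0/30 link; this is not part of the lab configuration.
// Sketch only - eBGP to the WAN instead of OSPF; AS 65000 is an assumption
router bgp 200
  vrf tenant-3
    neighbor 128.61.0.2
      remote-as 65000                    // per-VRF eBGP session over the VRF-lite link
      address-family ipv4 unicast
// The WAN-learned default route is then already a BGP route in tenant-3, so no OSPF<->BGP
// redistribution (and no network 0.0.0.0/0 statement) is needed before it reaches the other leafs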
DC2-WAN#show ip route
…
O E2 192.168.3.0/24 [110/1] via 128.61.0.1, 00:01:48, GigabitEthernet0/3
DC2-BorderLeaf# show ip route vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 128.61.0.2, Eth1/2, [110/1], 02:02:30, ospf-1, type-2, tag 1
…
DC2-Leaf1# show ip route vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 00:37:34, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
…
DC2-Leaf1# show bgp vpnv4 unicast vrf tenant-3
…
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 5.5.5.5:5 (VRF tenant-3)
*>i0.0.0.0/0 6.6.6.6 100 0 i
// Control-plane to RIB flow for the leaked default route, which can be verified in this order:
// BGP L2VPN EVPN (type-5 route) -> BGP VPNv4 unicast (per-VRF table) -> IP route (best path installed in the VRF routing table)
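A hedged sketch of that verification sequence on DC2-Leaf1, using standard NX-OS show commands with the default route as the example prefix.
// 1. Type-5 route received in the EVPN table
DC2-Leaf1# show bgp l2vpn evpn 0.0.0.0
// 2. Route imported into the per-VRF VPNv4/IPv4 unicast table
DC2-Leaf1# show bgp vpnv4 unicast 0.0.0.0 vrf tenant-3
// 3. Best path installed in the VRF routing table
DC2-Leaf1# show ip route 0.0.0.0 vrf tenant-3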
DC2-Spine1# show ip route vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 01:26:21, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
192.168.3.100/32, ubest/mbest: 1/0
*via 5.5.5.5%default, [200/0], 00:51:38, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x5050505 encap: VXLAN
Configure NAT on DC2-WAN:
Public address 128.61.0.252 is statically translated to internal address 192.168.3.100
Public address 128.61.0.253 port 80 is translated to internal address 192.168.1.200 port 80
DC2-WAN
interface GigabitEthernet0/2
ip nat outside
!
interface GigabitEthernet0/3
ip nat inside
no ip http server
ip nat inside source static 192.168.1.100 128.61.0.251
ip nat inside source static tcp 192.168.1.200 80 128.61.0.253 80 extendable
ip nat inside source static 192.168.3.100 128.61.0.252
VPCS> ping 128.61.0.252
84 bytes from 128.61.0.252 icmp_seq=1 ttl=59 time=142.041 ms
84 bytes from 128.61.0.252 icmp_seq=2 ttl=59 time=22.491 ms
84 bytes from 128.61.0.252 icmp_seq=3 ttl=59 time=38.776 ms
84 bytes from 128.61.0.252 icmp_seq=4 ttl=59 time=23.589 ms
84 bytes from 128.61.0.252 icmp_seq=5 ttl=59 time=17.518 ms
DC2-T3-XP> ping 63.130.0.199
84 bytes from 63.130.0.199 icmp_seq=1 ttl=59 time=57.689 ms
84 bytes from 63.130.0.199 icmp_seq=2 ttl=59 time=26.722 ms
84 bytes from 63.130.0.199 icmp_seq=3 ttl=59 time=28.521 ms
84 bytes from 63.130.0.199 icmp_seq=4 ttl=59 time=20.654 ms
84 bytes from 63.130.0.199 icmp_seq=5 ttl=59 time=50.418 ms
DC2-WAN#show ip nat translations
Pro Inside global Inside local Outside local Outside global
--- 128.61.0.251 192.168.1.100 --- ---
tcp 128.61.0.253:80 192.168.1.200:80 --- ---
--- 128.61.0.252 192.168.3.100 --- ---
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Inter-Tenant Communication and Internet Access
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Requirement 1: T1 and T3 Communications and T1 access Internet via T3
A VXLAN multi-tenant design is flexible for production environments; it meets varied business requirements while remaining simple to configure and understand.
T3 is the shared internet tenant in DC2 and can also host shared services such as DNS, DHCP and NTP.
T3 provides internet access for T1, T2 and any additional tenants (Tn).
Whether leaking occurs is determined by the RT attributes configured between T1, T2 or Tn.
Route-maps are the ideal way to control which routes leak into other tenants (a minimal sketch follows this list).
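A minimal sketch of route-map-controlled leaking, mirroring the import-map pattern used later in this lab; the prefix-list/route-map names and the 192.168.11.0/24 prefix are hypothetical examples, not part of the lab.
// Sketch only - keep one hypothetical T1 subnet from leaking into T3, import everything else as usual
ip prefix-list PL_NO_LEAK seq 5 permit 192.168.11.0/24 le 32
!
route-map RM_T3_IMPORT deny 10
  match ip address prefix-list PL_NO_LEAK
route-map RM_T3_IMPORT permit 20
!
vrf context tenant-3
  address-family ipv4 unicast
    import map RM_T3_IMPORT              // applied on import, so the unwanted prefixes never enter tenant-3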
DC1-Leaf1, DC1-Leaf2 (T3 will not be used in DC1)
vlan 300
name L2VNI1-3
vn-segment 10300
vlan 310
name L2VNI2-3
vn-segment 10310
vlan 333
name L3VNI-3
vn-segment 33333
vrf context tenant-3
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vni 33333
interface Vlan300
no shutdown
vrf member tenant-3
ip address 192.168.3.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan310
no shutdown
vrf member tenant-3
ip address 192.168.31.254/24
fabric forwarding mode anycast-gateway
!
interface Vlan333
no shutdown
vrf member tenant-3
no ip redirects
ip forward
interface nve1
member vni 10300
mcast-group 225.1.1.3
member vni 10310
mcast-group 225.1.1.31
member vni 33333 associate-vrf
evpn
vni 10300 l2
rd auto
route-target import auto
route-target export auto
vni 10310 l2
rd auto
route-target import auto
route-target export auto
// Then configure the route leaking between T1 and T3:
vrf context Tenant-1
address-family ipv4 unicast
route-target both 100:200
route-target both 100:200 evpn
!
vrf context tenant-3
address-family ipv4 unicast
route-target both 100:200
route-target both 100:200 evpn
DC2-Leaf1 and DC2-BorderLeaf
vrf context Tenant-1
address-family ipv4 unicast
route-target both 200:200 // manually configure RT 200:200 in T1; export matches import for leaking
route-target both 200:200 evpn // same RT in the EVPN address family for multi-site
!
vrf context tenant-3
address-family ipv4 unicast
route-target both 200:200 // manually configure the same RT 200:200 in T3
route-target both 200:200 evpn // same for multi-site via EVPN
DC2-BorderLeaf only
router bgp 200
vrf tenant-3
address-family ipv4 unicast
network 0.0.0.0/0 // Already configured in the previous lab; advertises the default route as an EVPN route-type 5 in T3
router ospf 1
vrf tenant-3
summary-address 192.168.1.0/24 // Summarize the leaked T1 routes toward the WAN in T3
// The T3 default route is leaked into T1; T1 traffic follows it into T3
DC2-BorderLeaf# show ip route 0.0.0.0 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 128.61.0.2%tenant-3, Eth1/2, [20/0], 15:48:27, bgp-200, external, tag 200
// On DC2-Leaf1 the T3 default route is leaked into T1 and points into T3 (segid: 33333)
DC2-Leaf1# show ip route vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 16:05:30, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
// T3 host route leaked into T1
DC2-BorderLeaf# show ip route 192.168.3.100 vrf tenant-1
…
192.168.3.100/32, ubest/mbest: 1/0
*via 5.5.5.5%default, [200/0], 15:48:41, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x5050505 encap: VXLAN
// DC1 T1 routes leaked into T3
DC2-BorderLeaf# show ip route 192.168.1.100 vrf tenant-3
…
192.168.1.100/32, ubest/mbest: 1/0
*via 202.202.202.202%default, [200/0], 15:45:21, bgp-200, internal, tag 777 (evpn) segid: 11111 tunnelid: 0xcacacaca encap: VXLAN
// DC2 T1 routes leaked into T3
DC2-BorderLeaf# show ip route 192.168.1.200 vrf tenant-3
…
192.168.1.200/32, ubest/mbest: 1/0
*via 5.5.5.5%default, [200/0], 15:52:05, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x5050505 encap: VXLAN
// Find where the default route was imported from
DC2-BorderLeaf# show bgp vpnv4 unicast 0.0.0.0 vrf Tenant-1
BGP routing table information for VRF default, address family VPNv4 Unicast
Route Distinguisher: 6.6.6.6:4 (VRF Tenant-1)
BGP routing table entry for 0.0.0.0/0, version 42
Paths: (1 available, best #1)
Flags: (0x8008001a) (high32 00000000) on xmit-list, is in urib, is best urib route
vpn: version 62, (0x100002) on xmit-list
Advertised path-id 1, VPN AF advertised path-id 1
Path type: local, path is valid, is best path, no labeled nexthop, in rib
Imported from 6.6.6.6:5:0.0.0.0/0 (VRF tenant-3)
AS-Path: NONE, path locally originated
0.0.0.0 (metric 0) from 0.0.0.0 (6.6.6.6)
Origin IGP, MED not set, localpref 100, weight 32768
Extcommunity: RT:200:200 RT:200:33333
…
DC2-BorderLeaf# show vrf
VRF-Name VRF-ID State Reason
Tenant-1 4 Up --
Tenant-2 3 Up --
default 1 Up --
management 2 Up --
tenant-3 5 Up --
// T3 host pings a T1 host on the same leaf
DC2-T3-XP> ping 192.168.1.200
84 bytes from 192.168.1.200 icmp_seq=1 ttl=61 time=64.358 ms
84 bytes from 192.168.1.200 icmp_seq=2 ttl=61 time=26.183 ms
84 bytes from 192.168.1.200 icmp_seq=3 ttl=61 time=25.004 ms
84 bytes from 192.168.1.200 icmp_seq=4 ttl=61 time=24.237 ms
84 bytes from 192.168.1.200 icmp_seq=5 ttl=61 time=46.338 ms
DC2-T3-XP> trace 192.168.1.200
trace to 192.168.1.200, 8 hops max, press Ctrl+C to stop
1 192.168.3.254 16.021 ms 6.328 ms 7.718 ms
2 0.0.0.0 25.033 ms 13.286 ms 12.530 ms
3 128.61.0.1 30.801 ms 12.114 ms 11.098 ms
4 * * *
5 192.168.1.254 18.915 ms 21.472 ms 19.290 ms
6 *192.168.1.200 39.075 ms (ICMP type:3, code:3, Destination port unreachable)
// Inter-tenant traffic on the local leaf is routed via the border leaf (default route)
DC2-Leaf1# show ip route 192.168.1.200 vrf tenant-3
…
192.168.1.0/24, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 15:52:18, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Requirement 2: Suboptimal routing inter-tenant communication
DC1-Leaf1, DC1-Leaf2, DC2-Leaf1 and DC2-BorderLeaf (all leafs)
// At this point the local leaf has no specific (/32) routes to the other tenants
DC2-Leaf1# show ip route 192.168.1.200 vrf tenant-3
…
192.168.1.0/24, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 17:27:58, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
DC2-Leaf1# show ip route 192.168.3.100 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 17:58:21, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
// On all leafs, redistribute the locally learned (HMM) host routes
// The route-map denies only the default route and permits everything else, so all local host routes are redistributed
ip prefix-list PL_DENY_EXPORT seq 5 permit 0.0.0.0/0
route-map RM_DENY_IMPORT deny 10
match ip address prefix-list PL_DENY_EXPORT
route-map RM_DENY_IMPORT permit 20
!
router bgp 200 / router bgp 100
vrf Tenant-1
address-family ipv4 unicast
redistribute hmm route-map RM_DENY_IMPORT // Redistribute T1 host routes so they leak into T3
vrf tenant-3
address-family ipv4 unicast
redistribute hmm route-map RM_DENY_IMPORT // Redistribute T3 host routes so they leak into T1
// The leaked /32 routes now appear as eBGP routes (AD 20) in the other VRF
DC2-Leaf1# show ip route 192.168.1.200 vrf tenant-3
192.168.1.200/32, ubest/mbest: 1/0, attached
*via 192.168.1.200%Tenant-1, Vlan100, [20/0], 00:01:57, bgp-200, external, tag 200
DC2-Leaf1# show ip route 192.168.3.100 vrf tenant-1
…
192.168.3.100/32, ubest/mbest: 1/0, attached
*via 192.168.3.100%tenant-3, Vlan300, [20/0], 00:01:43, bgp-200, external, tag 200
// Inter-tenant traffic is now routed directly on the local leaf; TTL is 63
DC2-T3-XP> ping 192.168.1.200
84 bytes from 192.168.1.200 icmp_seq=1 ttl=63 time=17.641 ms
84 bytes from 192.168.1.200 icmp_seq=2 ttl=63 time=11.193 ms
84 bytes from 192.168.1.200 icmp_seq=3 ttl=63 time=9.082 ms
84 bytes from 192.168.1.200 icmp_seq=4 ttl=63 time=15.607 ms
84 bytes from 192.168.1.200 icmp_seq=5 ttl=63 time=17.565 ms
DC2-T3-XP> trace 192.168.1.200
trace to 192.168.1.200, 8 hops max, press Ctrl+C to stop
1 192.168.3.254 9.265 ms 6.003 ms 4.693 ms
2 *192.168.1.200 10.241 ms (ICMP type:3, code:3, Destination port unreachable)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Requirement 3: Inter-Tenants communications across DCs
DC1 has no return routes to Tenant-3
// DC2-T3 can't ping 192.168.1.100 in DC1, even though DC2-Leaf1 has the host route
DC2-T3-XP> ping 192.168.1.100
192.168.1.100 icmp_seq=1 timeout
192.168.1.100 icmp_seq=2 timeout
192.168.1.100 icmp_seq=3 timeout
// DC2-Leaf1 has the leaked /32 host route, pointing to the anycast BGW (202.202.202.202)
DC2-Leaf1# show ip route 192.168.1.100 vrf tenant-3
…
192.168.1.100/32, ubest/mbest: 1/0
*via 202.202.202.202%default, [200/0], 20:03:29, bgp-200, internal, tag 777 (evpn) segid: 11111 tunnelid: 0xcacacaca encap: VXLAN
// DC2-BorderLeaf sends back to 202.202.202.202 (DC2-Spine1 Anycast-BGW)
DC2-BorderLeaf# show ip route 192.168.1.100 vrf tenant-3
..
192.168.1.100/32, ubest/mbest: 1/0
*via 202.202.202.202%default, [200/0], 22:40:22, bgp-200, internal, tag 777 (evpn) segid: 11111 tunnelid: 0xcacacaca encap: VXLAN
DC1-Spine1# show ip route 192.168.1.100 vrf tenant-1
….
192.168.1.100/32, ubest/mbest: 1/0
*via 2.2.2.2%default, [200/0], 23:38:05, bgp-100, internal, tag 100 (evpn) segid: 11111 tunnelid: 0x2020202 encap: VXLAN
DC1-Leaf1# show ip route 192.168.1.100 vrf tenant-1
192.168.1.100/32, ubest/mbest: 1/0, attached
*via 192.168.1.100, Vlan100, [190/0], 1w2d, hmm
Why does DC1 have no routes?
// DC1-Leafs have no route to 192.168.3.100
DC1-Leaf1# show ip route 192.168.3.100 vrf tenant-1
…
Route not found
DC1-Leaf2# show ip route 192.168.3.100 vrf tenant-1
…
Route not found
// The DC1 leafs don't even have a default route
DC1-Leaf1# show ip route 0.0.0.0 vrf tenant-1
…
Route not found
DC2-T3-XP> ping 192.168.1.100
192.168.1.100 icmp_seq=1 timeout
192.168.1.100 icmp_seq=2 timeout
192.168.1.100 icmp_seq=3 timeout
DC2-T3-XP> ping 192.168.1.111
192.168.1.111 icmp_seq=1 timeout
192.168.1.111 icmp_seq=2 timeout
192.168.1.111 icmp_seq=3 timeout
// The DCI doesn't carry Tenant-3 yet; the default-route leaking must also cover the /32 host routes
// T3 leaks its default route into T1, but T1 will not re-advertise it to others unless T1 originates it itself
// DC2-Leaf1 will not propagate the default route to others
// DC2-BorderLeaf will not propagate it either; it is neither an RR nor a BGW
DC2-BorderLeaf
// Filter out the eBGP default route leaked from T3 into T1, then have T1 originate its own default route pointing into T3
ip prefix-list PL_DENY_EXPORT seq 5 permit 0.0.0.0/0
route-map RM_DENY_IMPORT deny 10
match ip address prefix-list PL_DENY_EXPORT
route-map RM_DENY_IMPORT permit 20
vrf context Tenant-1
ip route 0.0.0.0/0 Vlan333 vrf tenant-3 // add a new static default route into T3; Vlan333 is the T3 L3VNI SVI
address-family ipv4 unicast
import map RM_DENY_IMPORT // Filter out default route from T3
router bgp 200
vrf Tenant-1
address-family ipv4 unicast
network 0.0.0.0/0 // originate the default route in T1
// The default route previously carried segid 33333; now T1 originates it itself with segid 11111, next hop the border leaf
DC2-Leaf1# show ip route vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 00:16:27, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x6060606 encap: VXLAN
// DC2-Spine1 has the default route also
DC2-Spine1# show ip route vrf tenant-1
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 00:19:22, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x6060606 encap: VXLAN
// DC2 T3 host can ping DC1 T1 host now.
DC2-T3-XP> ping 192.168.1.100
84 bytes from 192.168.1.100 icmp_seq=1 ttl=58 time=65.516 ms
84 bytes from 192.168.1.100 icmp_seq=2 ttl=58 time=43.809 ms
84 bytes from 192.168.1.100 icmp_seq=3 ttl=58 time=45.251 ms
84 bytes from 192.168.1.100 icmp_seq=4 ttl=58 time=40.493 ms
84 bytes from 192.168.1.100 icmp_seq=5 ttl=58 time=42.847 ms
// DC1-Leaf1 reaches the DC2 T3 host via the default route (segid 11111)
DC1-Leaf1# show ip route 192.168.3.100 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 101.101.101.101%default, [200/0], 00:23:11, bgp-100, internal, tag 777 (evpn) segid: 11111 tunnelid: 0x65656565 encap: VXLAN
// DC2-Leaf1 sees two default routes in tenant 1
DC2-Leaf1# show bgp vpnv4 unicast 0.0.0.0 vrf tenant-1
…
Advertised path-id 1, VPN AF advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 6.6.6.6:4:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 11111
Extcommunity: RT:200:200 RT:200:11111 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
Imported from 6.6.6.6:5:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 33333
Extcommunity: RT:200:200 RT:200:33333 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
…
// DC2-Leaf1 sees two default routes in tenant 3
DC2-Leaf1# show bgp vpnv4 unicast 0.0.0.0 vrf tenant-3
…
Advertised path-id 1, VPN AF advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 6.6.6.6:4:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 11111
Extcommunity: RT:200:200 RT:200:11111 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
Imported from 6.6.6.6:5:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 33333
Extcommunity: RT:200:200 RT:200:33333 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
…
// DC2-Leaf1 uses the /32 host route toward the DC1 T1 host, which DC1-Leaf1 originates locally via HMM
DC1-Leaf1# show ip route 192.168.1.100 vrf tenant-1
…
192.168.1.100/32, ubest/mbest: 1/0, attached
*via 192.168.1.100, Vlan100, [190/0], 1w5d, hmm
// DC2-Spine1 sends traffic for the T3 host via the default route to the border leaf (traffic from DC1-Leaf1 transits here), because DC1 has no Tenant-3
DC2-Spine1# show ip route 192.168.3.100 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 01:03:30, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x6060606 encap: VXLAN
// The DC1 T1 host can also reach the Internet
DC1-XP-T1> ping 8.8.8.8
84 bytes from 8.8.8.8 icmp_seq=2 ttl=250 time=34.232 ms
84 bytes from 8.8.8.8 icmp_seq=3 ttl=250 time=38.730 ms
84 bytes from 8.8.8.8 icmp_seq=4 ttl=250 time=28.285 ms
84 bytes from 8.8.8.8 icmp_seq=5 ttl=250 time=33.120 ms
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Requirement 4: DC2-Leaf1 manipulates default route from T1 and T3
Preferring the default route learned from T3 is the ideal behavior
// Currently the default route in both T1 and T3 resolves via segid 11111
DC2-Leaf1# show ip route 0.0.0.0 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 17:04:31, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x6060606 encap: VXLAN
DC2-Leaf1# show ip route 0.0.0.0 vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 17:04:38, bgp-200, internal, tag 200 (evpn) segid: 11111 tunnelid: 0x6060606 encap: VXLAN
// The two paths can be distinguished by their RT extended communities
DC2-Leaf1# show bgp vpnv4 unicast 0.0.0.0 vrf tenant-3
…
Advertised path-id 1, VPN AF advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 6.6.6.6:4:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 11111
Extcommunity: RT:200:200 RT:200:11111 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
Imported from 6.6.6.6:5:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 33333
Extcommunity: RT:200:200 RT:200:33333 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
…
DC2-Leaf1
ip prefix-list default seq 5 permit 0.0.0.0/0 // Match default route
ip extcommunity-list standard extrt permit rt 200:33333 // Match the T3 default route by its extended community RT 200:33333 (the T1 copy carries RT 200:11111)
route-map MYDC permit 10
match ip address prefix-list default
match extcommunity extrt
set local-preference 200
route-map MYDC permit 20
vrf context Tenant-1
address-family ipv4 unicast
import map MYDC
vrf context tenant-3
address-family ipv4 unicast
import map MYDC
// DC2-Leaf1 has now chosen the default route from T3
DC2-Leaf1# show ip route 0.0.0.0 vrf tenant-1
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 00:15:21, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
DC2-Leaf1# show ip route 0.0.0.0 vrf tenant-3
…
0.0.0.0/0, ubest/mbest: 1/0
*via 6.6.6.6%default, [200/0], 00:15:18, bgp-200, internal, tag 200 (evpn) segid: 33333 tunnelid: 0x6060606 encap: VXLAN
// DC2-Leaf1 prefers the T3 default route (local preference 200)
DC2-Leaf1# show bgp vpnv4 unicast 0.0.0.0 vrf tenant-3
…
Path type: internal, path is valid, not best reason: Local Preference, no labeled nexthop
Imported from 6.6.6.6:4:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 100, weight 0
Received label 11111
Extcommunity: RT:200:200 RT:200:11111 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
Advertised path-id 1, VPN AF advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 6.6.6.6:5:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
6.6.6.6 (metric 81) from 4.4.4.4 (4.4.4.4)
Origin IGP, MED not set, localpref 200, weight 0
Received label 33333
Extcommunity: RT:200:200 RT:200:33333 ENCAP:8 Router MAC:5000.000a.0007
Originator: 6.6.6.6 Cluster list: 4.4.4.4
…