VXLAN MP-BGP EVPN Part 2

VXLAN MP-BGP EVPN Configuration

In my previous post, I outlined at a high level the VXLAN MP-BGP EVPN solution and defined some of its key features and advantages. In Part 2 of this series I will dig into the CLI and work through an example configuration and verification of the solution.

Configuration

The following configuration is based on a common Spine/Leaf architecture with 2 x 9508s as the Spine nodes and 4 x 9396s as the Leaf nodes. It is a Layer 3 fabric with single-area OSPF as the IGP. Each Leaf will peer with the Spine nodes using OSPF on its P2P links, and BGP will be sourced from the loopbacks. Since iBGP is being used within the fabric, the Spine nodes will also be Route Reflectors. Although I am using iBGP, this solution can work just as well with eBGP. For example, the Spines can be in one AS while each Leaf or pair of Leaf switches is in a different AS. Although the majority of the configuration would be the same, there are a few key changes that I may address in a later post.

The control plane will also use ARP suppression. BUM traffic will still require multicast in the underlay, so the 2 Spine switches will also function as anycast rendezvous points. For brevity I am going to skip the configuration of the underlay (IGP/Multicast) but will include the BGP configuration since it has specific EVPN commands.

The 4 leaf nodes will be set up in vPC domains 50 and 51. Each vPC domain will have a shared anycast VTEP address as well as distributed anycast gateways. Attached to each vPC domain will be a host for generating traffic.

Figure 2: Physical Topology

Before getting started on the overlay, I configured the fabric using OSPF and set up point-to-point adjacencies between the Spine and Leaf nodes. Although it is not shown in the configuration, it would be a good idea to use BFD on the routing adjacencies for the fastest convergence.
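As a rough sketch of what BFD on the OSPF adjacencies could look like, here is a fragment for one of LEAF-A's uplinks. This is not part of the lab configuration: the interface name is taken from the OSPF neighbor output further down, and default BFD timers are assumed.

```
! Assumption: illustrative only, not part of the lab configuration
feature bfd

interface Ethernet1/2
  ! BFD on NX-OS expects ICMP redirects to be disabled on the interface
  no ip redirects
  ip ospf bfd
```

The same two interface commands would be repeated on every routed uplink and on the matching Spine interfaces.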

I have also set up a Layer 3 link between the vPC peers to provide redundancy and failover when a VTEP loses all of its uplinks to the Spine nodes. This adjacency can be an SVI over the peer-link or a discrete routed link. PIM must be enabled on this link as well.

I am using loopback0 as the management address as well as the RID for OSPF and BGP. I also created loopback1 for sourcing VTEP traffic. Since VXLAN adds about 50 bytes of encapsulation overhead to each frame, I am also running jumbo frames on my P2P links.
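The 50 bytes come from the outer headers that wrap each frame: 14 bytes of outer Ethernet, 20 bytes of outer IPv4, 8 bytes of UDP and 8 bytes of VXLAN header. A 1500-byte host MTU therefore needs at least 1550 bytes on the fabric links. Below is a sketch of what one of LEAF-A's uplinks might look like. The interface and the 172.16.30.2 address are inferred from the OSPF output; the /30 mask and the 9216 MTU value are my assumptions.

```
! Assumption: /30 mask and MTU value are illustrative
interface Ethernet1/2
  description Uplink to SPINE-A
  no switchport
  mtu 9216
  ip address 172.16.30.2/30
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
```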

Something else to take into consideration is that you may have to change the switching mode from cut-through to store-and-forward. This does require a reload.

Each Spine node is peering with each Leaf node.

SPINE-A(config)# sh ip ospf neighbors
 OSPF Process ID 1 VRF default
 Total number of neighbors: 4
 Neighbor ID     Pri State           Up Time Address        Interface
 10.254.1.3       1 FULL/ -         00:26:24 172.16.30.2    Eth5/2
 10.254.1.4       1 FULL/ -         00:26:25 172.16.30.10   Eth5/3
 10.254.1.5       1 FULL/ -         00:26:24 172.16.30.18   Eth5/4
 10.254.1.6       1 FULL/ -         00:26:25 172.16.30.26   Eth5/5
SPINE-B(config)# sh ip ospf neighbors
 OSPF Process ID 1 VRF default
 Total number of neighbors: 4
 Neighbor ID     Pri State           Up Time Address        Interface
 10.254.1.3       1 FULL/ -         00:27:30 172.16.30.6    Eth5/2
 10.254.1.4       1 FULL/ -         00:27:29 172.16.30.14   Eth5/3
 10.254.1.5       1 FULL/ -         00:27:28 172.16.30.22   Eth5/4
 10.254.1.6       1 FULL/ -         00:27:28 172.16.30.30   Eth5/5

The Anycast RP address is 10.1.1.1/32. All the Leafs are statically mapped to the anycast address as well.

SPINE-A# sh ip pim rp
 PIM RP Status Information for VRF "default"
 BSR disabled
 Auto-RP disabled
 BSR RP Candidate policy: None
 BSR RP policy: None
 Auto-RP Announce policy: None
 Auto-RP Discovery policy: None

Anycast-RP 10.1.1.1 members:
  10.254.1.1* 10.254.1.2 
RP: 10.1.1.1*, (0), uptime: 00:32:55, expires: never,
  priority: 0, RP-source: (local), group ranges:
      239.0.0.0/8

 LEAF-A# sh ip pim rp
 PIM RP Status Information for VRF "default"
 BSR disabled
 Auto-RP disabled
 BSR RP Candidate policy: None
 BSR RP policy: None
 Auto-RP Announce policy: None
 Auto-RP Discovery policy: None

RP: 10.1.1.1, (0), uptime: 00:44:50, expires: never,
 priority: 0, RP-source: (local), group ranges:
     239.0.0.0/8   
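Although I skipped the underlay configuration, the anycast RP portion could be sketched roughly as follows. The RP address, the member addresses and the group range come from the output above; the loopback number used for the RP address on the Spines is an assumption.

```
! SPINE-A (SPINE-B uses the same commands)
feature pim

! Assumption: loopback number chosen for illustration
interface loopback1
  description Anycast RP
  ip address 10.1.1.1/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode

ip pim rp-address 10.1.1.1 group-list 239.0.0.0/8
ip pim anycast-rp 10.1.1.1 10.254.1.1
ip pim anycast-rp 10.1.1.1 10.254.1.2

! Leaf nodes only need the static RP mapping
feature pim
ip pim rp-address 10.1.1.1 group-list 239.0.0.0/8
```

Each fabric-facing interface also needs ip pim sparse-mode so the leafs can join the BUM groups toward the RP.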

After getting the underlay set up, the next step is to set up the vPC domains. LEAF-A and B will form vPC domain 50 while LEAF-C and D will form vPC domain 51. The normal vPC configuration applies. Since both vPC peer switches will be L3 proxying for each other, you must use the peer-gateway command under the vPC configuration.

LEAF-A

feature lacp
feature vpc

vpc domain 50
 peer-keepalive destination 10.160.12.41
 peer-gateway
 ip arp synchronize

int e1/1
 channel-group 50 mode active
 no sh

int e2/1
 channel-group 50 mode active
 no shut

interface port-channel50
 switchport mode trunk
 switchport trunk allowed vlan 99,200-201,500
 vpc peer-link

The vPC is up and consistent. Notice the VLANs across the peer-link. Here I am allowing VLANs 200-201 which will be used for data, VLAN 99 which is used for the L3 peering, and VLAN 500 which will be used with the L3 VNI.

LEAF-A# sh vpc br
Legend:
(*) - local vPC is down, forwarding via vPC peer-link
vPC domain id                     : 50
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                         : secondary
Number of vPCs configured         : 1
Peer Gateway                     : Enabled
Dual-active excluded VLANs       : -
Graceful Consistency Check       : Enabled
Auto-recovery status             : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1   Po50   up     99,200-201,500

Now that the peer-link is set up, you can see the OSPF peering between the 2 Leaf nodes that form a vPC domain. Interface vlan 99 is created on each switch to support this.

LEAF-A# sh ip ospf neighbors
 OSPF Process ID 1 VRF default
 Total number of neighbors: 3
 Neighbor ID     Pri State           Up Time Address         Interface
 10.254.1.1       1 FULL/ -         00:15:46 172.16.30.1     Eth1/2
 10.254.1.2       1 FULL/ -         00:15:42 172.16.30.5     Eth1/3
 10.254.1.4       1 FULL/ -         00:03:27         Vlan99
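For reference, the VLAN 99 SVI that carries this peering could look something like the sketch below. The addressing is hypothetical, since the post does not show it; the OSPF process, area and the PIM requirement are taken from the text above.

```
feature interface-vlan

vlan 99

interface Vlan99
  description L3 backup path to vPC peer
  no shutdown
  no ip redirects
  ! Assumption: hypothetical addressing, LEAF-B would use the other host address
  ip address 172.16.99.1/30
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
```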

Since we will be using the vPC VTEP feature, the next step will be to add a secondary address to the VTEP loopbacks. This will be used to source VXLAN traffic. This will be a shared secondary IP address between the vPC peers.

Start the VXLAN-specific configuration by turning on the necessary features for VXLAN and MP-BGP EVPN.

feature bgp
feature nv overlay
feature vn-segment-vlan-based
nv overlay evpn

For the vPC peers I am going to add a secondary address to the VTEP loopbacks. I repeat this process on Leaf C and D.

LEAF-A
 
 interface loopback1
 description For VTEP
 ip address 10.180.1.3/32
 ip address 10.180.1.34/32 secondary
 ip router ospf 1 area 0.0.0.0
 ip pim sparse-mode
 
LEAF-B
 
 interface loopback1
 description For VTEP
 ip address 10.180.1.4/32
 ip address 10.180.1.34/32 secondary
 ip router ospf 1 area 0.0.0.0
 ip pim sparse-mode
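For vPC domain 51 the same pattern applies with a different shared address. The LEAF-D addresses below match the nve interface output shown later in this post; the LEAF-C primary address is my assumption based on the numbering scheme.

```
! LEAF-C
interface loopback1
  description For VTEP
  ! Assumption: primary address inferred from the numbering scheme
  ip address 10.180.1.5/32
  ip address 10.180.1.56/32 secondary
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode

! LEAF-D
interface loopback1
  description For VTEP
  ip address 10.180.1.6/32
  ip address 10.180.1.56/32 secondary
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
```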

Next step is to map the L2 VNIs to the VLANs created earlier. I will be using VLANs 200-201 for data.

vlan 200
 vn-segment 10200
vlan 201
 vn-segment 10201

Configure EVPN and create a route distinguisher and route target for the VXLAN segments. The RD can be defined manually or can be created automatically.

evpn
 vni 10200 l2
 rd 10200:100
 route-target import 10200:100
 route-target export 10200:100
 vni 10201 l2
 rd 10201:100
 route-target import 10201:100
 route-target export 10201:100
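If you would rather let the switch derive the values, the automatic variant might look like the sketch below. I did not use this in the lab, so treat it as an alternative under the assumption that all leafs share the same AS, which keeps the auto-derived route targets consistent across the fabric.

```
! Assumption: alternative to the manual values above, not used in this lab
evpn
  vni 10200 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10201 l2
    rd auto
    route-target import auto
    route-target export auto
```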

Next step is to create the nve interface (VTEP) and map the VXLAN segments to it. As you can see each VNI is linked to a multicast control group for BUM traffic. Notice I am using the suppress ARP feature as well as specifying what control plane protocol I will use. In order for ARP suppression to work you need to allocate resources for the ACL TCAM region arp-ether. If there are not enough resources available you will need to remove TCAM slices (unit of memory allocation) from something else. Changing this does require a reload as well.

interface nve1
 no shutdown
 source-interface loopback1
 host-reachability protocol bgp
 member vni 10200
 suppress-arp
 mcast-group 239.1.1.1
 member vni 10201
 suppress-arp
 mcast-group 239.1.1.1
LEAF-A(config)# hardware access-list tcam region arp-ether 256
ERROR: Aggregate TCAM region configuration exceeded the available 
Ingress TCAM slices. Please re-configure.

If you get an error like this you will have to remove TCAM slices from another feature. This can be done by specifying a “0” value on another feature.

LEAF-A(config)# hardware access-list tcam region vacl 0
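Putting the TCAM steps together, the sequence might look like the following sketch. The vacl region and the 256 value are the ones used above; checking the current carving first and the save and reload at the end are my assumptions about a sensible order. Only free up a region you are sure is not in use.

```
! Check the current carving before changing anything
show hardware access-list tcam region

! Free slices from an unused region, then allocate arp-ether
hardware access-list tcam region vacl 0
hardware access-list tcam region arp-ether 256

! The new carving only takes effect after a reload
copy running-config startup-config
reload
```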

Here you can see the state is up and the encapsulation is VXLAN. Notice the vPC capability is VPC-VIP-Only.

LEAF-A(config-if-nve-vni)# sh nve interface nve 1
 Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [notified]
 Local Router MAC: d8b1.9076.9053
 Host Learning Mode: Control-Plane
 Source-Interface: loopback1 (primary: 10.180.1.3, secondary: 10.180.1.34)

Another command with similar output.

LEAF-A# sh nve internal platform interface nve 1
 Printing Interface ifindex 0x49000001
 |======|=========================|===============|===============|=====|=====|
 |Intf |State                   |PriIP         |SecIP         |Vnis |Peers|
 |======|=========================|===============|===============|=====|=====|
 |nve1 |UP                         |10.180.1.3     |10.180.1.34   |3   |     |
 |======|=========================|===============|===============|=====|=====|

Create the Layer 3 VNI. This VLAN is mapped to a VXLAN segment similar to the other VNIs. It is then associated with the tenant VRF our hosts will be in (each tenant or VRF is mapped to one L3 VNI) and given an RD and RT.

vlan 500
 vn-segment 10500
vrf context Tenant-1
 rd 10500:100
 address-family ipv4 unicast
 route-target import 10500:100
 route-target import 10500:100 evpn
 route-target export 10500:100
 route-target export 10500:100 evpn
 vni 10500

On each leaf there also needs to be an L3 VNI SVI. This does not get an IP address but needs to be “no shut” and associated with the tenant L3 VRF created in the last step.

feature interface-vlan

interface vlan 500
no shut
vrf member Tenant-1
LEAF-A(config-if)# sh int br
 ---------------------------------------------------------
 Interface Secondary VLAN(Type)       Status Reason
 ---------------------------------------------------------
 Vlan500   --                           up     --

Next we go back to the VTEP (the nve interface) and associate it with the L3 VNI just created.

int nve 1
 member vni 10500 associate-vrf

Here you can see the three VXLAN segments associated with the VTEP. Notice that 10500 is type L3 and in VRF Tenant-1. VNIs 10200 and 10201 are type L2 and are part of bridge domains 200 and 201 respectively.

LEAF-A# sh nve vni control-plane
 Codes: CP - Control Plane DP - Data Plane
 UC - Unconfigured SA - Suppress ARP
 Interface VNI     Multicast-group State Mode Type[BD/VRF] Flags
 --------- -------- ----------------- ----- ---- ------------------
 nve1     10200   239.1.1.1         Up   CP   L2  [200]      SA
 nve1     10201   239.1.1.1         Up   CP   L2  [201]      SA
 nve1     10500   n/a               Up   CP   L3  [Tenant-1]

Setting up the anycast gateway feature will be next. Each leaf node will have 2 SVIs which will act as distributed gateways for VXLAN routing. First, create an anycast gateway MAC address that will be shared across all leafs in the fabric. Each SVI will use this MAC address and will be placed into the tenant VRF Tenant-1.

fabric forwarding anycast-gateway-mac 0E00.A123.B456
interface Vlan200
 no shutdown
  vrf member Tenant-1
 ip address 10.200.1.1/24
 fabric forwarding mode anycast-gateway
 !
 interface Vlan201
 no shutdown
  vrf member Tenant-1
 ip address 10.201.1.1/24
  fabric forwarding mode anycast-gateway

The next step will be to set up MP-BGP between the Leaf and Spine nodes. I already enabled BGP and the nv overlay feature in an earlier step. Since I am using iBGP, I will set the Spine nodes up as Route Reflectors and create neighbor relationships with the 4 leaf nodes. The peering will be off of the management loopback address.

Here is Spine-A’s configuration. Spine-B will be similar except for the router ID. As you can see, I specify the address family l2vpn evpn.

router bgp 65001
  router-id 10.254.1.1
  address-family ipv4 unicast
  address-family l2vpn evpn
    retain route-target all
  template peer LEAF-PEERS
    remote-as 65001
    update-source loopback0
    address-family ipv4 unicast
      send-community both
      route-reflector-client
    address-family l2vpn evpn
      send-community both
      route-reflector-client
  neighbor 10.254.1.3
    inherit peer LEAF-PEERS
  neighbor 10.254.1.4
    inherit peer LEAF-PEERS
  neighbor 10.254.1.5
    inherit peer LEAF-PEERS
  neighbor 10.254.1.6
    inherit peer LEAF-PEERS

On the Leaf nodes the configuration will be slightly different. Each Leaf will peer with the 2 Spine nodes. All the Leaf nodes will have the same configuration except for the router IDs. Notice the advertisement of extended communities under the l2vpn evpn address family for each neighbor. VRF Tenant-1 is also advertising l2vpn evpn under the ipv4 unicast address family.

LEAF-A
router bgp 65001
  router-id 10.254.1.3
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 10.254.1.1 remote-as 65001
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community both
  neighbor 10.254.1.2 remote-as 65001
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community both
  vrf Tenant-1
    address-family ipv4 unicast
      advertise l2vpn evpn

Verify the BGP peering between the Leaf and Spine nodes.

LEAF-A# sh bgp l2vpn evpn summ
 BGP summary information for VRF default, address family L2VPN EVPN
 BGP router identifier 10.254.1.3, local AS number 65001
 BGP table version is 6, L2VPN EVPN config peers 2, capable peers 2
 0 network entries and 0 paths using 0 bytes of memory
 BGP attribute entries [0/0], BGP AS path entries [0/0]
 BGP community entries [0/0], BGP clusterlist entries [0/0]
 Neighbor       V   AS MsgRcvd MsgSent   TblVer InQ OutQ Up/Down State/PfxRcd
 10.254.1.1     4 65001     10     12       6   0   0 00:01:33 0
 10.254.1.2     4 65001      9     11       6   0   0 00:00:27 0

Repeat this configuration on the rest of the Leaf nodes as appropriate. Once both vPC domains are set up and the control plane is working, LEAF-A shows an overlay peering with vPC domain 51 at the shared VTEP address of 10.180.1.56.

LEAF-A# sh nve peers
 Interface Peer-IP         State LearnType Uptime   Router-Mac
 --------- --------------- ----- --------- -------- --------------
 nve1     10.180.1.56       Up   CP       00:00:16  188b.9dac.2757

There are 2 routers southbound of the vPC domains as end hosts. Each one has a vPC port-channel to the vPC domain. I have configured interface vlan 200 on both devices to test L2 connectivity between the end hosts.

IP and MAC addresses on Host-B

Host-B# sh int vlan 200
 Vlan200 is up, line protocol is up
 Hardware is EtherSVI, address is 8c60.4f93.647c
 Internet Address is 10.200.1.12/24
 MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec

Ping between Host-A at 10.200.1.11 and Host-B at 10.200.1.12. You can see from the routing table that I have not configured a default gateway on the devices.

Host-A# sh ip route
 10.200.1.0/24, ubest/mbest: 1/0, attached
 *via 10.200.1.11, Vlan200, [0/0], 00:08:49, direct
 10.200.1.11/32, ubest/mbest: 1/0, attached
 *via 10.200.1.11, Vlan200, [0/0], 00:08:49, local
Host-A# ping 10.200.1.12
 PING 10.200.1.12 (10.200.1.12): 56 data bytes
 36 bytes from 10.200.1.11: Destination Host Unreachable
 Request 0 timed out
 Request 1 timed out
 64 bytes from 10.200.1.12: icmp_seq=2 ttl=254 time=2.097 ms
 64 bytes from 10.200.1.12: icmp_seq=3 ttl=254 time=1.045 ms
 64 bytes from 10.200.1.12: icmp_seq=4 ttl=254 time=0.987 ms
 --- 10.200.1.12 ping statistics ---
 5 packets transmitted, 3 packets received, 40.00% packet loss
 round-trip min/avg/max = 0.987/1.376/2.097 ms

If we look at the MAC table, we can see our local MAC is learned via Po1 towards our end host and the remote MAC from Host-B is learned via the VXLAN tunnel, or nve1. Since we are using ARP suppression, we can also see that this entry is held in the suppression cache with the R flag for remote. When this entry ages out on the end host, the ARP request will not need to be flooded again; the VTEP will proxy for us.

LEAF-A# sh mac address-table dynamic
 Legend:
 * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
 age - seconds since last seen,+ - primary entry using vPC Peer-Link,
 (T) - True, (F) - False
 VLAN     MAC Address     Type     age     Secure NTFY Ports
 ---------+-----------------+--------+---------+------+----+-----------
 * 200     8c60.4f93.5ffc   dynamic 0         F     F   Po1
 * 200     8c60.4f93.647c   dynamic 0         F     F   nve1(10.180.1.56)
LEAF-A# sh ip arp suppression-cache vlan 200
 Flags: + - Adjacencies synced via CFSoE
 L - Local Adjacency
 R - Remote Adjacency
 L2 - Learnt over L2 interface
 Ip Address     Age     Mac Address   Vlan Physical-ifindex   Flags
 10.200.1.11     00:04:06 8c60.4f93.5ffc 200 port-channel1       L
 10.200.1.12     00:15:47 8c60.4f93.647c 200 (null)             R

If you look at the control plane on LEAF-A for VNI 10200, which is mapped to VLAN 200, you can see the switch is learning the remote MAC address through BGP.

LEAF-A# sh bgp l2vpn evpn vni-id 10200
 BGP routing table information for VRF default, address family L2VPN EVPN
 BGP table version is 15, local router ID is 10.254.1.3
 Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
 Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
 Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
 Network           Next Hop           Metric     LocPrf     Weight Path
 Route Distinguisher: 10200:100   (L2VNI 10200)
 *>l[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[0]:[0.0.0.0]/216
                   10.180.1.34                         100   32768 i
 * i[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[0]:[0.0.0.0]/216
                  10.180.1.56                       100          0 i
 *>i              10.180.1.56                       100          0 i
 *>l[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[32]:[10.200.1.11]/272
                  10.180.1.34                       100     32768 i
 * i[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[32]:[10.200.1.12]/272
                  10.180.1.56                       100          0 i
 *>i              10.180.1.56                       100          0 i

For inter-VXLAN routing I am going to create an interface VLAN 201 on Host-B and route between the 2 end hosts. A default route to the anycast gateway address will be created on each device as well.

Host-B# sh int vlan 201
 Vlan201 is up, line protocol is up
 Hardware is EtherSVI, address is 8c60.4f93.647c
 Internet Address is 10.201.1.12/24
 MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec
Host-B# sh ip route
 0.0.0.0/0, ubest/mbest: 1/0
    *via 10.201.1.1, [1/0], 00:06:33, static
 10.201.1.0/24, ubest/mbest: 1/0, attached
 *via 10.201.1.12, Vlan201, [0/0], 00:06:46, direct
 10.201.1.12/32, ubest/mbest: 1/0, attached
 *via 10.201.1.12, Vlan201, [0/0], 00:06:46, local

Host-A is still on VLAN 200 as before, but I added a default route to its anycast gateway as well.
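The static defaults on the two hosts would look roughly like this. The Host-B next hop matches the routing table output above; the Host-A next hop assumes it points at the VLAN 200 anycast gateway address configured earlier.

```
! Host-A (assumed next hop: VLAN 200 anycast gateway)
ip route 0.0.0.0/0 10.200.1.1

! Host-B (next hop as shown in the routing table above)
ip route 0.0.0.0/0 10.201.1.1
```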

If you look on LEAF-D you see the interface vlan 201 MAC and IP from the host being learned locally. It is being added to the local ARP cache as well as being advertised into the BGP control plane.

LEAF-D# sh mac address-table dynamic vlan 201
Legend: 
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC 
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False
 VLAN     MAC Address     Type     age     Secure NTFY  Ports
 ---------+-----------------+--------+---------+------+----+--------
 +  201   8c60.4f93.647c   dynamic 0         F     F     Po1
LEAF-D# sh ip arp suppression-cache vlan 201
 Flags: + - Adjacencies synced via CFSoE
 L - Local Adjacency
 R - Remote Adjacency
 L2 - Learnt over L2 interface
 Ip Address     Age     Mac Address   Vlan Physical-ifindex   Flags
 10.201.1.12     00:08:56 8c60.4f93.647c 201 port-channel1       L
LEAF-D# sh bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 10, local router ID is 10.254.1.6
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10200:100    (L2VNI 10200)
*>i[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[0]:[0.0.0.0]/216
                      10.180.1.34                       100          0 i
*>i[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[32]:[10.200.1.11]/272
                      10.180.1.34                       100          0 i
Route Distinguisher: 10201:100    (L2VNI 10201)
*>l[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[0]:[0.0.0.0]/216
                      10.180.1.56                       100      32768 i
*>l[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[32]:[10.201.1.12]/272
                      10.180.1.56                       100      32768 i
Route Distinguisher: 10500:100    (L3VNI 10500)
*>i[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[32]:[10.200.1.11]/272
                      10.180.1.34                       100          0 i

On LEAF-B you see that it has also learned the MAC/IP from the remote switches and the BGP best path is pointing to the vPC VTEP on switches C and D (vPC domain 51). The learned MAC address is put into the MAC table and installed in the arp suppression cache.

LEAF-B# sh bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 13, local router ID is 10.254.1.4
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10200:100    (L2VNI 10200)
*>l[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[0]:[0.0.0.0]/216
                      10.180.1.34                       100      32768 i
* i                   10.180.1.34                       100          0 i
*>l[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[32]:[10.200.1.11]/272
                      10.180.1.34                       100      32768 i
* i                   10.180.1.34                       100          0 i
Route Distinguisher: 10201:100    (L2VNI 10201)
*>i[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[0]:[0.0.0.0]/216
                      10.180.1.56                       100          0 i
*>i[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[32]:[10.201.1.12]/272
                      10.180.1.56                       100          0 i
Route Distinguisher: 10500:100    (L3VNI 10500)
*>i[2]:[0]:[0]:[48]:[8c60.4f93.5ffc]:[32]:[10.200.1.11]/272
                      10.180.1.34                       100          0 i
*>i[2]:[0]:[0]:[48]:[8c60.4f93.647c]:[32]:[10.201.1.12]/272
                      10.180.1.56                       100          0 i
LEAF-B# sh mac address-table dynamic vlan 201
 Legend:
 * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
 age - seconds since last seen,+ - primary entry using vPC Peer-Link,
 (T) - True, (F) - False
 VLAN     MAC Address     Type     age     Secure NTFY Ports
 ---------+-----------------+--------+---------+------+----+-------
 * 201     8c60.4f93.647c   dynamic 0         F     F   nve1(10.180.1.56)
LEAF-B# sh ip arp suppression-cache vlan 201
 Flags: + - Adjacencies synced via CFSoE
 L - Local Adjacency
 R - Remote Adjacency
 L2 - Learnt over L2 interface
 Ip Address     Age     Mac Address   Vlan Physical-ifindex   Flags
 10.201.1.12     00:14:18 8c60.4f93.647c 201 (null)             R

Host-A will ping Host-B in VLAN 201 from a source address in VLAN 200. Since LEAF-A and B already have the MAC/IP addresses cached, no pings are dropped.

Host-A# ping 10.201.1.12 source 10.200.1.11
 PING 10.201.1.12 (10.201.1.12) from 10.200.1.11: 56 data bytes
 64 bytes from 10.201.1.12: icmp_seq=0 ttl=252 time=1.092 ms
 64 bytes from 10.201.1.12: icmp_seq=1 ttl=252 time=0.816 ms
 64 bytes from 10.201.1.12: icmp_seq=2 ttl=252 time=1.17 ms
 64 bytes from 10.201.1.12: icmp_seq=3 ttl=252 time=0.799 ms
 64 bytes from 10.201.1.12: icmp_seq=4 ttl=252 time=0.768 ms

If you look at the routing tables, you can see how the L3 VNI segment 10500 (VLAN 500) comes into play for inter-VXLAN routing. The BGP table shows that the next hop is the remote VTEP address and that LEAF-D is originating the route. When you recurse to the routing table, you see that to reach 10.201.1.12 we have to go to the remote VTEP over the L3 VNI (10500).

LEAF-A# sh ip bgp vrf tenant-1 10.201.1.12
BGP routing table information for VRF Tenant-1, address family IPv4 Unicast
BGP routing table entry for 10.201.1.12/32, version 5
Paths: (1 available, best #1)
Flags: (0x08041a) on xmit-list, is in urib, is best urib route
  vpn: version 5, (0x100002) on xmit-list
  Advertised path-id 1, VPN AF advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from unknown dest
  AS-Path: NONE, path sourced internal to AS
    10.180.1.56 (metric 9) from 10.254.1.2 (10.254.1.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10201 10500
      Extcommunity:  RT:10201:100 RT:10500:100 SOO:10.180.1.56:0 
      ENCAP:8 Router MAC:188b.9dac.2757 Originator: 10.254.1.6 
      Cluster list: 10.254.1.2
LEAF-A# sh ip route vrf tenant-1 10.201.1.12
IP Route Table for VRF "Tenant-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.201.1.12/32, ubest/mbest: 1/0
    *via 10.180.1.56%default, [200/0], 01:36:56, bgp-65001, internal, 
    tag 65001 (evpn)segid: 10500 tunnelid: 0xab40138 encap: 1

LEAF-A#sh ip route 10.180.1.56
 IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.180.1.56/32, ubest/mbest: 1/0
    *via 172.16.30.5, Eth1/3, [110/9], 01:37:01, ospf-1, intr

Routed traffic is encapsulated in the L3 VNI with a destination MAC of the remote VTEP’s router/system MAC. If you look on LEAF-A you can see an entry for LEAF-D’s router MAC in the FIB.

LEAF-D# sh nve interface nve 1
Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [notified]
 Local Router MAC: 188b.9dac.2757
 Host Learning Mode: Control-Plane
 Source-Interface: loopback1 (primary: 10.180.1.6, secondary: 10.180.1.56)
LEAF-A# sh forwarding nve L3 peers
slot  1
=======
NVE cleanup transaction-id 0
tunnel_id  Peer_id  Peer_address   Interface       rmac      origin state del count
--------------------------------------------------------------------------------------
0xab40138    1        10.180.1.56    nve1     188b.9dac.2757 URIB       merge-done no     1
LEAF-A# sh l2route evpn mac all
Topology    Mac Address    Prod   Next Hop (s)
----------- -------------- ------ ---------------
200         8c60.4f93.5ffc Local  Ifindex 3690987
201         8c60.4f93.647c BGP    10.180.1.56
500         188b.9dac.2757 VXLAN  10.180.1.56

About matt pinizzotto

Matt Pinizzotto is a Network Consultant working in the Midwest and CCIE# 44694 (Data Center)
This entry was posted in Cisco, Data Center, EVPN, Nexus, VXLAN.
