Trigger: Zebra core seen when we convert l2vni to l3vni and back
BackTrace:
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7f4af96989d9]
/usr/lib/frr/zebra(zebra_vxlan_if_vni_up+0x250) [0x5561022ae030]
/usr/lib/frr/zebra(netlink_vlan_change+0x2f4) [0x5561021fd354]
/usr/lib/frr/zebra(netlink_parse_info+0xff) [0x55610220d37f]
/usr/lib/frr/zebra(+0xc264a) [0x55610220d64a]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f4af967e96d]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f4af9637588]
/usr/lib/frr/zebra(main+0x402) [0x5561021f4d32]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f4af932624a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f4af9326305]
/usr/lib/frr/zebra(_start+0x21) [0x5561021f72f1]
Root Cause:
In working case,
- We get a RTM_NEWLINK whose ctx is enqueued by zebra dplane and
dequeued by zebra main and processed i.e.
(102000 is deleted from vxlan99) before we handle RTM_NEWVLAN.
- So in handling of NEWVLAN (vxlan99) we bail out since find with
vlan id 703 does not exist.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/working/nocras.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 23:09:33.741105 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 23:09:33.744061 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 23:09:33.767240 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 23:09:33.767380 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (73), result QUEUED
2024/07/18 23:09:33.767389 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for vxlan99(73)
2024/07/18 23:09:33.767404 ZEBRA: [TQR2A-H2RFY] Vlan-Vni(1186:1186-6000002:6000002) update for VxLAN IF vxlan99(73)
2024/07/18 23:09:33.767422 ZEBRA: [TP4VP-XZ627] Del L2-VNI 102000 intf vxlan99(73)
2024/07/18 23:09:33.767858 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 23:09:33.767866 ZEBRA: [KKZGZ-8PCDW] Cannot find VNI for VID (703) IF vxlan99 for vlan state update >>>>BAIL OUT
In failure case,
- The NEWVLAN is received first even before processing RTM_NEWLINK.
- Since the vxlan id 102000 is not removed from the vxlan99,
the find with vlan id 703 returns the 102000 one and we invoke
zebra_vxlan_if_vni_up where the interfaces don't match and assert.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/noworking/crash.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 22:26:43.829370 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 22:26:43.829646 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7fe07c026d30, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 22:26:43.853930 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 22:26:43.853949 ZEBRA: [K61WJ-XQQ3X] Intf vxlan99(73) L2-VNI 102000 is UP >>> VLAN PROCESSED BEFORE INTF EVENT
2024/07/18 22:26:43.853951 ZEBRA: [SPV50-BX2RP] RAJA zevpn_vxlanif vxlan48 and ifp vxlan99
2024/07/18 22:26:43.854005 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 22:26:43.854241 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=516, seq=0, pid=0
2024/07/18 22:26:43.854251 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=544, seq=0, pid=0
ZEBRA: in thread kernel_read scheduled from zebra/kernel_netlink.c:505 kernel_read()
Fix:
Similar to #13396, where link change
handling was offloaded to dplane, same is being done for vlan events.
Note: Prior to this change, zebra main thread was interested in the
RTNLGRP_BRVLAN. So all the kernel events pertaining to vlan was
handled by zebra main.
With this change change as well the handling of vlan events is still
with Zebra main. However we offload it via Dplane thread.
Ticket :#3878175
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
a) Move the reads of link and address information
into the dplane
b) Move the startup read of data into the dplane
as well.
c) Break up startup reading of the linux kernel data
into multiple phases. As that we have implied ordering
of data that must be read first and if the dplane has
taken over some data reading then we must delay initial
read-in of other data.
Fixes: #13288
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Ticket: 2674793
Testing Done: precommit, evpn-min and evpn-smoke
The problem in this case is whenever we are triggering ifdown
followed by ifup of bridge, we see that remote mac entries
are programmed with vlan-1 in the fdb from zebra and never cleaned up.
bridge has vlan_default_pvid 1 which means any port that gets added
will initially have vlan 1 which then gets deleted by ifupdown2 and
the proper vlan gets added.
The problem lies in zebra where we are not cleaning up the remote
macs during vlan change.
Fix is to uninstall the remote macs and then install them
during vlan change.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
This patch addresses following issues,
- When the VLAN-VNI mapping is configured via a map and not using
individual VXLAN interfaces, upon removal of a VNI ensure that the
remote FDB entries are uninstalled correctly.
- When VNI configuration is performed using VLAN-VNI mapping (i.e., without
individual VXLAN interfaces) and flooded traffic is handled via multicast,
the multicast group corresponding to the VNI needs to be explicitly read
from the bridge FDB. This is relevant in the case of netlink interface to
the kernel and for the scenario where a new VNI is provisioned or comes up.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This change set introduces data structure changes required for multiple vlan aware bridge
functionality. A new structure zebra_l2_bridge_if encapsulates the vlan to access_bd
association of the bridge. A vlan_table hash_table is used to record each instance
of the vlan to access_bd of the bridge via zebra_l2_bridge_vlan structure.
vxlan iftype derivation: netlink attribute IFLA_VXLAN_COLLECT_METADATA is used
to derive the iftype of the vxlan device. If the attribute is present, then the
vxlan interface is treated as single vxlan device, otherwise it would default to
traditional vxlan device.
zebra_vxlan_check_readd_vtep, zebra_vxlan_dp_network_mac_add/del is modified to
be vni aware.
mac_fdb_read_for_bridge - is modified to be (vlan, bridge) aware
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This changeset introduces the data structure changes needed for
single vxlan device functionality. A new struct zebra_vxlan_vni_info
encodes the iftype and vni information for vxlan device.
The change addresses related access changes of the new data structure
fields from different files
zebra_vty is modified to take care of the vni dump information according
to the new vni data structure for vxlan devices.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
Move a few things into places they actually belong, and reduce the
number of places we have `#ifdev HAVE_RTADV`. Just overall code
prettification.
... I had actually done this quite a while ago while doing some other
random hacking and thought it more useful to not be sitting on it on my
disk...
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
When using bgp evpn rt5 setup, after BGP configuration has been
loaded, if the user attempts to detach and reattach the bridged
vxlan interface from the bridge, then BGP loses its BGP EVPN
contexts, and a refresh of BGP configuration is necessary to
maintain consistency between linux configuration and BGP EVPN
contexts (RIB). The following command can lead to inconsistency:
ip netns exec cust1 ip link set dev vxlan1000 nomaster
ip netns exec cust1 ip link set dev vxlan1000 master br1000
consecutive to the, BGP l2vpn evpn RIB is empty, and the way to
solve this until now is to reconfigure EVPN like this:
vrf cust1
no vni 1000
vni 1000
exit-vrf
Actually, the link information is correctly handled. In fact,
at the time of link event, the lower link status of the bridge
interface was not yet up, thus preventing from establishing
BGP EVPN contexts. In fact, when a bridge interface does not
have any slave interface, the link status of the bridge interface
is down. That change of status comes a bit after, and is not
detected by slave interfaces, as this event is not intercepted.
This commit intercepts the bridge link up event, and triggers
a check on slaved vxlan interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when running bgp evpn rt5 setup, the Rmac sent in BGP updates
stands for the MAC address of the bridge interface. After
having loaded frr configuration, the Rmac address is not refreshed.
This issue can be easily reproduced by executing some commands:
ip netns exec cust1 ip link set dev br1000 address 2e🆎45:aa:bb:cc
Actually, the BGP EVPN contexts are kept unchanged.
That commit proposes to fix this by intercepting the mac address
change, and refreshing the vxlan interfaces attached to te bridge
interface that changed its MAC address.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zebra is able to get information about gre tunnels.
zebra_gre file is created to handle hooks, but is not yet used.
also, debug zebra gre command is done to add gre traces.
A zebra_gre file is used for complementary actions that may be needed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Feature overview:
=================
A 802.3ad bond can be setup to allow lacp-bypass. This is done to enable
servers to pxe boot without a LACP license i.e. allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes i.e. zebra can
end up programming a MAC that was perviously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypss) zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes i.e.
zebra can end up programming a MAC that was perviously remote
as local pointing to an access port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Local ethernet segments are held in a protodown or error-disabled state
if access to the VxLAN overlay is not ready -
1. When FRR comes up the local-ESs/access-port are kept protodown
for the startup-delay duration. During this time the underlay and
EVPN routes via it are expected to converge.
2. When all the uplinks/core-links attached to the underlay go down
the access-ports are similarly protodowned.
The ES-bond protodown state is propagated to each ES-bond member
and programmed in the dataplane/kernel (per-bond-member).
Configuring uplinks -
vtysh -c "conf t" vtysh -c "interface swp4" vtysh -c "evpn mh uplink"
Configuring startup delay -
vtysh -c "conf t" vtysh -c "evpn mh startup-delay 100"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
EVPN protodown display -
========================
root@torm-11:mgmt:~# vtysh -c "show evpn"
L2 VNIs: 10
L3 VNIs: 3
Advertise gateway mac-ip: No
Advertise svi mac-ip: No
Duplicate address detection: Disable
Detection max-moves 5, time 180
EVPN MH:
mac-holdtime: 60s, neigh-holdtime: 60s
startup-delay: 180s, start-delay-timer: 00:01:14 <<<<<<<<<<<<
uplink-cfg-cnt: 4, uplink-active-cnt: 4
protodown: startup-delay <<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond protodown display -
===========================
root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"
Interface hostbond1 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 1 last: 2020/04/26 20:38:03.53
PTM status: disabled
vrf: default
OS Description: Local Node/s torm-11 and Ports swp5 <==> Remote Node/s hostd-11 and Ports swp1
index 58 metric 0 mtu 9152 speed 4294967295
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type bond
Master interface: bridge
EVPN-MH: ES id 1 ES sysmac 00:00:00:00:01:11
protodown: off rc: startup-delay <<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond member protodown display -
==================================
root@torm-11:mgmt:~# vtysh -c "show interface swp5"
Interface swp5 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 3 last: 2020/04/26 20:38:03.52
PTM status: disabled
vrf: default
index 7 metric 0 mtu 9152 speed 10000
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type Other
Master interface: hostbond1
protodown: on rc: startup-delay <<<<<<<<<<<<<<<<
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
when working with vrf netns backend, two bridges interfaces may have the
same bridge interface index, but not the same namespace. because in vrf
netns backend mode, a bridge slave always belong to the same network
namespace, then a check with the namespace id and the ns id of the
bridge interface permits to resolve correctly the interface pointer.
The problem could occur if a same index of two bridge interfaces can be
found on two different namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
an incoming bridge index has been found, that is linked with vxlan
interface, and the search for that bridge interface is done. In
vrf-lite, the search is done across the same default namespace, because
bridge and vxlan may not be in the same vrf. But this behaviour is wrong
when using vrf netns backend, as the bridge and the vxlan have to be in
the same vrf ( hence in the same network namespace). To comply with
that, use the netnamespace of the vxlan interface. Like that, the
appropriate nsid is passed as parameter, and consequently, the search is
correct, and the mac address passed to BGP will be ok too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
1. Local ethernet segments are configured in zebra by attaching a
local-es-id and sys-mac to a access interface -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
interface hostbond1
evpn mh es-id 1
evpn mh es-sys-mac 00:00:00:00:01:11
!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
This info is then sent to BGP and used for the generation of EAD-per-ES
routes.
2. Access VLANs associated with an (ES) access port are translated into
ES-EVI objects and sent to BGP. This is used by BGP for the
generation of EAD-EVI routes.
3. Remote ESs are imported by BGP and sent to zebra. A list of VTEPs
is maintained per-remote ES in zebra. This list is used for the creation
of the L2-NHG that is used for forwarding traffic.
4. MAC entries with a non-zero ESI destination use the L2-NHG associated
with the ESI for forwarding traffic over the VxLAN overlay.
Please see zebra_evpn_mh.h for the datastruct organization details.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
when working with vrf netns backend, two bridges interfaces may have the
same bridge interface index, but not the same namespace. because in vrf
netns backend mode, a bridge slave always belong to the same network
namespace, then a check with the namespace id and the ns id of the
bridge interface permits to resolve correctly the interface pointer.
The problem could occur if a same index of two bridge interfaces can be
found on two different namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
an incoming bridge index has been found, that is linked with vxlan
interface, and the search for that bridge interface is done. In
vrf-lite, the search is done across the same default namespace, because
bridge and vxlan may not be in the same vrf. But this behaviour is wrong
when using vrf netns backend, as the bridge and the vxlan have to be in
the same vrf ( hence in the same network namespace). To comply with
that, use the netnamespace of the vxlan interface. Like that, the
appropriate nsid is passed as parameter, and consequently, the search is
correct, and the mac address passed to BGP will be ok too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the link information of vxlan interface is populated in layer 2
information, as well as in layer 2 vxlan information. This information
will be used later to collect vnis that are in other network namespaces,
but where bgp evpn is enabled on main network namespaces, and those vnis
have the link information in that namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In if_netlink.c, when an interface structure, ifp, is first created,
its possible for the master to come up after the slave interface does.
This means, the slave interface has no way to display the master's ifname
in show outputs. To fix this, we need to allow creation by ifindex instead
of by ifname so that this issue is handled.
Signed-off-by: Dinesh G Dutt<5016467+ddutt@users.noreply.github.com>
The multicast group ip address for BUM traffic is configurable per-l2-vni.
One way to configure that is to setup a vxlan device that per-l2-vni and
specify the address against that vxlan device -
root@TORS1:~# vtysh -c "show interface vx-1000" |grep -i vxlan
Interface Type Vxlan
VxLAN Id 1000 VTEP IP: 27.0.0.15 Access VLAN Id 1000 Mcast 239.1.1.100
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep Mcast
Mcast group: 239.1.1.100
root@TORS1:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The interface type can be a bond or a bond slave, add some
code to note this and to display it as part of a show interface
command.
Signed-off-by: Dinesh Dutt <didutt@gmail.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This reverts commit c14777c6bf.
clang 5 is not widely available enough for people to indent with. This
is particularly problematic when rebasing/adjusting branches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Define interface types of interest and recognize the types. Store layer-2
information (VLAN Id, VNI etc.) for interfaces, process bridge interfaces
and map bridge members to bridge. Display all the additional information
to user (through "show interface").
Note: Only implemented for the netlink interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>