The FRRouting Protocol Suite
Find a file
Rajasekar Raja aa4786642c zebra: vlan to dplane Offload from main
Trigger: Zebra core seen when we convert l2vni to l3vni and back

BackTrace:
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7f4af96989d9]
/usr/lib/frr/zebra(zebra_vxlan_if_vni_up+0x250) [0x5561022ae030]
/usr/lib/frr/zebra(netlink_vlan_change+0x2f4) [0x5561021fd354]
/usr/lib/frr/zebra(netlink_parse_info+0xff) [0x55610220d37f]
/usr/lib/frr/zebra(+0xc264a) [0x55610220d64a]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f4af967e96d]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f4af9637588]
/usr/lib/frr/zebra(main+0x402) [0x5561021f4d32]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f4af932624a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f4af9326305]
/usr/lib/frr/zebra(_start+0x21) [0x5561021f72f1]

Root Cause:
In working case,
 - We get a RTM_NEWLINK whose ctx is enqueued by zebra dplane and
   dequeued by zebra main and processed i.e.
   (102000 is deleted from vxlan99) before we handle RTM_NEWVLAN.
 - So in handling of NEWVLAN (vxlan99) we bail out since find with
   vlan id 703 does not exist.

root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/working/nocras.log  | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 23:09:33.741105 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 23:09:33.744061 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 23:09:33.767240 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 23:09:33.767380 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (73), result QUEUED
2024/07/18 23:09:33.767389 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for vxlan99(73)
2024/07/18 23:09:33.767404 ZEBRA: [TQR2A-H2RFY] Vlan-Vni(1186:1186-6000002:6000002) update for VxLAN IF vxlan99(73)
2024/07/18 23:09:33.767422 ZEBRA: [TP4VP-XZ627] Del L2-VNI 102000 intf vxlan99(73)
2024/07/18 23:09:33.767858 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 23:09:33.767866 ZEBRA: [KKZGZ-8PCDW] Cannot find VNI for VID (703) IF vxlan99 for vlan state update >>>>BAIL OUT

In failure case,
 - The NEWVLAN is received first even before processing RTM_NEWLINK.
 - Since the vxlan id 102000 is not removed from the vxlan99,
   the find with vlan id 703 returns the 102000 one and we invoke
   zebra_vxlan_if_vni_up where the interfaces don't match and assert.

root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/noworking/crash.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 22:26:43.829370 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 22:26:43.829646 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7fe07c026d30, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 22:26:43.853930 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 22:26:43.853949 ZEBRA: [K61WJ-XQQ3X] Intf vxlan99(73) L2-VNI 102000 is UP >>> VLAN PROCESSED BEFORE INTF EVENT
2024/07/18 22:26:43.853951 ZEBRA: [SPV50-BX2RP] RAJA zevpn_vxlanif vxlan48 and ifp vxlan99
2024/07/18 22:26:43.854005 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 22:26:43.854241 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=516, seq=0, pid=0
2024/07/18 22:26:43.854251 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=544, seq=0, pid=0
ZEBRA: in thread kernel_read scheduled from zebra/kernel_netlink.c:505 kernel_read()

Fix:
Similar to #13396, where link change
handling was offloaded to dplane, same is being done for vlan events.

Note: Prior to this change, zebra main thread was interested in the
RTNLGRP_BRVLAN. So all the kernel events pertaining to vlan was
handled by zebra main.

With this change change as well the handling of vlan events is still
with Zebra main. However we offload it via Dplane thread.

Ticket :#3878175

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-09-26 20:17:35 -07:00
.github ci: do apt-get update before installing required modules 2024-06-08 17:29:19 -04:00
alpine docker: Set ABUILD_APK_INDEX_OPTS for frr build 2024-06-14 16:33:32 +03:00
babeld babeld: changes for code maintainability 2024-09-06 07:02:51 +05:30
bfdd bfdd: remove control socket obsolete code 2024-07-25 10:37:11 -03:00
bgpd Merge pull request #16913 from chiragshah6/evpn_dev4 2024-09-25 21:52:33 +03:00
debian debian: Add option to build pkg with grpc support 2024-06-20 12:14:48 +02:00
doc doc: Add user documentation for AutoRP CLI 2024-09-24 16:39:19 +00:00
docker docker: Set ABUILD_APK_INDEX_OPTS for libyang 2024-06-14 11:37:23 +03:00
eigrpd *: Modify agentx to be allowed to be called 2024-05-10 10:16:29 -04:00
fpm *: add XREF_SETUP() to libraries and utilites 2024-05-02 23:03:08 +02:00
gdb lib: add simplified native msg support 2023-12-26 08:34:56 -05:00
grpc build: throw in a few more XREF_SETUP 2024-05-09 18:02:49 +02:00
include build: include our own copy of if.h and dependencies 2024-04-26 17:11:53 +02:00
isisd Merge pull request #16855 from zhou-run/202409131731 2024-09-24 11:25:13 -04:00
ldpd ldpd: fix wrong gtsm count 2024-06-15 23:30:18 +08:00
lib Merge pull request #16909 from donaldsharp/help 2024-09-24 21:23:03 -05:00
m4 build: add recursion limit for AX_RECURSIVE_EVAL 2024-01-27 19:01:19 +01:00
mgmtd mgmtd: add ietf-yang-library support 2024-09-17 22:27:36 -04:00
mlag build: throw in a few more XREF_SETUP 2024-05-09 18:02:49 +02:00
nhrpd Merge pull request #16788 from LabNConsulting/jmuthii/nhrpd-retry-resolution 2024-09-24 11:07:21 -05:00
ospf6d Merge pull request #16908 from donaldsharp/ospf6_snmp_special 2024-09-24 13:39:57 -05:00
ospfclient tools, ospfclient: add a config option to skip installing python scripts 2024-08-22 13:46:30 -05:00
ospfd Merge pull request #16853 from Shbinging/no_ip_ospf_dead_interval_minimal 2024-09-24 10:03:29 -04:00
pathd *: Create termtable specific temp memory 2024-09-01 13:07:46 -04:00
pbrd lib: common debug status output 2024-08-27 09:53:02 -04:00
pceplib Merge pull request #15215 from donaldsharp/pceplib_fixup 2024-01-25 09:59:59 +02:00
pimd pimd: Fix coverity checked return issue 2024-09-25 13:56:30 +00:00
pkgsrc build: homologize path handling 2024-01-27 19:02:52 +01:00
python vtysh, lib: preprocess CLI graphs 2024-07-31 08:08:53 -04:00
qpb *: add XREF_SETUP() to libraries and utilites 2024-05-02 23:03:08 +02:00
redhat redhat: Add option to build pkg with grpc support 2024-06-20 12:14:56 +02:00
ripd ripd: fix show run output for distribute-list 2024-08-08 01:25:02 +03:00
ripngd ripngd: adjust header for display command 2024-07-05 09:31:39 +08:00
sharpd sharpd: Eliminate leaked list for locator-chunks 2024-08-08 14:24:59 -04:00
snapcraft bfdd: remove control socket obsolete code 2024-07-25 10:37:11 -03:00
staticd lib: common debug status output 2024-08-27 09:53:02 -04:00
tests tests: Addition of AutoRP Discovery uncovered broken PIM test 2024-09-24 16:40:57 +00:00
tools Merge pull request #16733 from gtataranni/fix/regex-string 2024-09-04 08:13:19 +03:00
vrrpd *: Create termtable specific temp memory 2024-09-01 13:07:46 -04:00
vtysh Merge pull request #16664 from mjstapp/igor_debug_simplify 2024-08-29 11:51:53 -04:00
watchfrr build: homologize path handling 2024-01-27 19:02:52 +01:00
yang pimd,yang: Implement AutoRP CLI and NB config path 2024-09-24 16:39:17 +00:00
zebra zebra: vlan to dplane Offload from main 2024-09-26 20:17:35 -07:00
.clang-format tools: Ignore ALIAS_* macros for clang-formatter 2024-07-17 08:10:13 +03:00
.dockerignore
.flake8 style: add format checker config that matches FRR style standards 2023-04-18 05:18:26 -04:00
.git-blame-ignore-revs tools: Add black formatting commit to .git-blame-ignore-revs 2024-04-28 12:50:51 +03:00
.gitignore python: add tool to expand typesafe definitions 2024-04-29 17:37:49 +02:00
.isort.cfg style: add format checker config that matches FRR style standards 2023-04-18 05:18:26 -04:00
.pylintrc style: add format checker config that matches FRR style standards 2023-04-18 05:18:26 -04:00
.travis.yml
bootstrap.sh
buildtest.sh build: update packaging & docs for dir changes 2024-01-27 19:01:19 +01:00
config.version.in
configure.ac Merge pull request #16610 from Jafaral/no-py 2024-08-27 10:38:09 -04:00
COPYING
Makefile.am build: homologize path handling 2024-01-27 19:02:52 +01:00
README.md
stamp-h.in
version.h

Icon

FRRouting

FRR is free software that implements and manages various IPv4 and IPv6 routing protocols. It runs on nearly all distributions of Linux and BSD and supports all modern CPU architectures.

FRR currently supports the following protocols:

  • BGP
  • OSPFv2
  • OSPFv3
  • RIPv1
  • RIPv2
  • RIPng
  • IS-IS
  • PIM-SM/MSDP
  • LDP
  • BFD
  • Babel
  • PBR
  • OpenFabric
  • VRRP
  • EIGRP (alpha)
  • NHRP (alpha)

Installation & Use

For source tarballs, see the releases page.

For Debian and its derivatives, use the APT repository at https://deb.frrouting.org/.

Instructions on building and installing from source for supported platforms may be found in the developer docs.

Once installed, please refer to the user guide for instructions on use.

Community

The FRRouting email list server is located here and offers the following public lists:

Topic List
Development dev@lists.frrouting.org
Users & Operators frog@lists.frrouting.org
Announcements announce@lists.frrouting.org

For chat, we currently use Slack. You can join by clicking the "Slack" link under the Participate section of our website.

Contributing

FRR maintains developer's documentation which contains the project workflow and expectations for contributors. Some technical documentation on project internals is also available.

We welcome and appreciate all contributions, no matter how small!

Security

To report security issues, please use our security mailing list:

security [at] lists.frrouting.org