Mirror/frr

mirror of https://github.com/FRRouting/frr.git synced 2025-04-30 13:37:17 +02:00

Author	SHA1	Message	Date
Donald Sharp	f849511c47	Merge pull request #17935 from mjstapp/fix_nhg_hash_equal zebra: include resolving nexthops in nhg hash	2025-01-29 10:14:37 -05:00
Donald Sharp	fb8e399e4f	lib: Remove System routes from ip protocol route map choices Do not allow system routes to be selected for ip protocol Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-29 09:31:53 -05:00
Russ White	bd82864d03	Merge pull request #17941 from opensourcerouting/fix-dst-src static: fix botched staticd YANG conversion for dst-src	2025-01-28 12:23:06 -05:00
David Lamparter	2af780650f	lib, zebra: carry source prefix in route_notify When a daemon wants to know about its routes, make it possible to have that work for dst-src routes. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2025-01-28 15:40:17 +01:00
David Lamparter	1d341d461e	zebra: install dst-src routes without NHG The Linux kernel doesn't support dst-src routes with NHGs as nexthop, for some (rather dubious) caching reasons. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2025-01-28 11:10:31 +01:00
Mark Stapp	cb7cf73992	zebra: include resolving nexthops in nhg hash Ensure that the nhg hash comparison function includes all nexthops, including recursive-resolving nexthops. Signed-off-by: Mark Stapp <mjs@cisco.com>	2025-01-27 14:17:24 -05:00
Rafael Zalamena	28a9ca3405	lib,zebra: VRF table-direct support Implement the necessary data structures and code changes to support sending table-direct routes to protocols running in different VRFs. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2025-01-23 14:37:09 -03:00
Pooja Jagadeesh Doijode	8c6489bc56	zebra: Return error if v6 prefix is passed to show ip route Return error if IPv6 address or prefix is passed as an argument to "show ip route" command. UT: r1# show ip route 2::3/128 % Cannot specify IPv6 address/prefix for IPv4 table r1# r1# show ip route 2::3 % Cannot specify IPv6 address/prefix for IPv4 table r1# Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>	2025-01-22 10:09:03 -08:00
Donatas Abraitis	76ed8f61d8	Merge pull request #17814 from donaldsharp/nhg_removal_in_some_situations	2025-01-17 17:31:19 +02:00
Donald Sharp	19af3f3d7a	zebra: Ensure that changes to dg_update_list are protected by mutex The dg_update_list access is controlled by the dg_mutex in all other locations. Let's just add a mutex usage around the initialization of the dg_update_list even if it's part of the startup, just to keep things consistent. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-17 10:16:48 -05:00
Donald Sharp	4b96752737	zebra: Add some documentation on when zserv_open should be used Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-17 10:16:48 -05:00
Igor Ryzhov	300f8dbda4	lib: introduce global -w option for VRF netns backend Current -n option is only for zebra and mgmtd. All other daemons receive the VRF backend configuration from zebra upon connection to it. This leads to a potential race condition - daemons need to know the backend before they start reading their config, but they can be not connected to zebra yet at this point. As the VRF backend cannot change during runtime, let's introduce a new global -w option for setting netns backend, to make sure that all daemons know their VRF backend immediately after start. The reason for introducing a new option instead of making -n global is that ospfd already uses -n for another purposes. Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>	2025-01-15 23:38:27 +02:00
Igor Ryzhov	6f214d97d1	lib, zebra: move ns context initialization to zebra vrf->ns_ctxt is only ever used in zebra, so move its initialization to zebra's callback. Ideally this pointer shouldn't even be part of the library's vrf struct and should move to a zebra-specific struct, but this is the first step. Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>	2025-01-15 23:38:27 +02:00
Igor Ryzhov	4877f2f685	lib: remove VRF_BACKEND_UNKNOWN The backend type cannot be unknown. It is configured to VRF_LITE by default in zebra anyway, so just init to VRF_LITE in the lib and remove the UNKNOWN type. Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>	2025-01-15 23:38:27 +02:00
Donald Sharp	953d5fd526	Merge pull request #17799 from LabNConsulting/chopps/backend-yang-model mgmtd backend yang model (depends on #17796)	2025-01-15 10:22:11 -05:00
Donatas Abraitis	93ea9748cf	Merge pull request #17859 from donaldsharp/active_routes_are_active Active routes are active	2025-01-15 15:01:59 +02:00
Donald Sharp	ec6a000b0b	zebra: On Nexthop install failure don't set Installation failed Currently, when FRR installs a nexthop group, the installation can fail. The code assumed that the current nexthop group was not already installed. This leaves a problem state where, if the users of the nexthop group are removed, the nexthop group will be removed, possibly leaving an orphaned nexthop group in the data plane. On a nexthop group installation, FRR does not actually know the status of the nexthop group in the kernel. It's possible that an earlier version of the nexthop group is left in play. It's possible that there is no nexthop group in the kernel at all. Leaving the Installed flag alone lets Zebra remove the nexthop group when it is removed from zebra. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-14 16:23:40 -05:00
Donald Sharp	b61424a717	zebra: Nexthops need to be ACTIVE in some cases Currently if you have an interface down event, Zebra sets the nexthop(s) as !ACTIVE that use it. On interface up events the singleton nexthops are not being set as ACTIVE. Due to timing events it is sometimes possible to end up with a route that is using a singleton Change singleton nexthops to set the nexthop to ACTIVE. This will allow the nexthop to be reinstalled appropriately as well. I was able to easily reproduce this using sharpd since it does not attempt to reinstall the routes when a interface goes up/down. Before: D>* 10.0.0.0/32 [150/0] via 192.168.102.34, dummy2, weight 1, 00:00:01 sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up D> 10.0.0.0/32 [150/0] (350) via 192.168.102.34, dummy2 inactive, weight 1, 00:00:10 After code change: D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:14 sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:21 Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-14 15:12:32 -05:00
Christian Hopps	5f2a927d7b	lib: northbound/mgmtd: add backend model support Signed-off-by: Christian Hopps <chopps@labn.net>	2025-01-14 18:48:59 +00:00
Donald Sharp	5f35096123	Merge pull request #17796 from LabNConsulting/chopps/datastore-notifications operational-state (datastore) change notifications	2025-01-14 13:47:28 -05:00
Donald Sharp	67da971218	Merge pull request #17581 from mjstapp/fix_fpm_netlink zebra: avoid race between FPM pthread and zebra main pthread in netlink encode/decode	2025-01-14 13:42:29 -05:00
Christian Hopps	80c6f98ea7	lib: if: track oper-state inline Signed-off-by: Christian Hopps <chopps@labn.net>	2025-01-13 23:40:52 -05:00
Christian Hopps	e64966876c	lib: vrf: track oper-state inline Signed-off-by: Christian Hopps <chopps@labn.net>	2025-01-13 23:40:52 -05:00
Rajasekar Raja	e77954e5d9	zebra: Optimize invoking nhg compare func In some cases, the old_re nhe and the newnhe is same and there is no point in comparing them both since they are the same. Skip comparing in such cases. Ex: 2025/01/09 23:49:27.489020 ZEBRA: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x555f611d30c0 (44[38/39/45]) 2025/01/09 23:49:27.489021 ZEBRA: [ZH3FQ-TE9NV] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 new id: 44 old id: 44 2025/01/09 23:49:27.489021 ZEBRA: [YB8HE-Z86GN] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 NEW 0x555f611d30c0 (44[38/39/45]) 2025/01/09 23:49:27.489023 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags 2025/01/09 23:49:27.489024 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags 2025/01/09 23:49:27.489025 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE 2025/01/09 23:49:27.489026 ZEBRA: [ZM3BX-HPETZ] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 OLD 0x555f611d30c0 (44[38/39/45]) 2025/01/09 23:49:27.489027 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags 2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags 2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>	2025-01-10 13:39:12 -08:00
Donald Sharp	4c166947a8	zebra: Uninstall NHG in some situations If you have this series of events: a) Decision to install a NHG is made in zebra, enqueue to DPLANE b) Changes to NHG are made and we remove it in the master pthread Since this NHG is not marked as installed it is not removed but the NHG data structure is deleted c) DPLANE installs the NHG In the end the NHG stays installed but ZEBRA has lost track of it. Modify the removal code to check to see if the NHG is queued. There are 2 cases: a) NHG is kept around for a bit before being deleted. In this case just see that the NHG is Queued and keep it around too. b) NHG is not kept around and we are just removing it. In this case check to see if it is queued and send another deletion event. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-10 09:35:35 -05:00
Donald Sharp	97fa24e70b	zebra: Fix leaked nhe During route processing in zebra, Zebra will create a nexthop group that matches the nexthops passed down from the routing protocol. Then Zebra will look to see if it can re-use an nhe from a previous version of the route entry (say, an interface goes down). If Zebra decided to re-use an nhe, it was just dropping the route entry created. This led to nexthop groups that had a refcount of 0, and in some cases these nexthop groups were installed into the kernel. Add a bit of code to see if the returned entry is not being used and has no reference count, and if so, properly dispose of it. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-09 12:34:50 -05:00
Russ White	7f2be9a595	Merge pull request #17474 from sougata-github-nvidia/rib_ip_protocol_cleanup zebra: Fix ip protocol route-map issue.	2025-01-07 08:45:07 -05:00
Sougata Barik	86b294698f	zebra: Fix ip protocol route-map issue. "ip/ipv6 protocol any route-map <route map>" cli is setting wrong route type ( ZEBRA_ROUTE_MAX ), It should set route type ZEBRA_ROUTE_ALL. Ticket: #4101560 Signed-off-by: Sougata Barik <sougatab@nvidia.com>	2025-01-06 17:02:21 +05:30
Jafar Al-Gharaibeh	8ca4c3d098	Merge pull request #17752 from raja-rajasekar/rajasekarr/comp_issue zebra: fix dpdk compilation error	2025-01-05 20:15:20 -06:00
Rajasekar Raja	eced678d34	zebra: fix dpdk compilation error Fixing compilation error in a switch statement case Fixes :aa4786642c9a65c282d0fd5247a35b0f14fa1c3c Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>	2025-01-03 11:13:02 -08:00
Donatas Abraitis	73fad72213	Merge pull request #17737 from chiragshah6/fdev7 zebra:check DAD freeze action before notifying bgp	2025-01-03 09:34:54 +02:00
Chirag Shah	5aa9c8652e	zebra:check DAD freeze action before notifying bgp If the Duplicate Address Detection action is freeze (permanent or for a definite time, i.e. not warn-only mode), then the delete notification for a locally detected duplicate MAC is not required; instead, ask BGP to sync the previous remote MAC entry. In the freeze case the local MAC event is not known to BGP; instead, BGP is pointing to the remote VTEP for the MAC. Ticket: #3652383 Issue: 3652383 Signed-off-by: Chirag Shah <chirag@nvidia.com>	2024-12-30 14:39:27 -08:00
Donald Sharp	54ec9f3888	zebra: Fix resetting valid flags for NHG dependents Upon if_down, we don't reset the valid flag for dependents and unset the INSTALLED flag. So when its time for the NHG to be deleted (routes dereferenced), zebra deletes it since refcnt goes to 0, but stale NHG remains in kernel. Ticket :#4200788 Signed-off-by: Donald Sharp <sharpd@nvidia.com> Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>	2024-12-30 08:40:44 -08:00
Carmine Scarpitta	13f3c7c679	zebra: Remove tests for `srv6_locator_alloc` failure `srv6_locator_alloc` can never fail. Let's remove the tests for allocation failure. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-12-22 09:01:59 +01:00
Russ White	c8ba5b09b3	Merge pull request #17544 from anlancs/zebra/fix-plug-interface zebra: fix wrong nexthop status for kernel routes	2024-12-17 11:16:32 -05:00
anlan_cs	4d2ac714f0	zebra: check kernel routes when interface becomes up Just as on `link down`, check all kernel routes when an interface comes up, since they may then be selected as the best route by zebra. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-12-17 16:14:30 +08:00
anlan_cs	298bc623e7	zebra: don't uninstall kernel routes After the nexthop check is fixed, zebra will wrongly uninstall the kernel routes with inactive nexthop. This commit would skip the uninstallation for kernel routes. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-12-17 16:14:30 +08:00
anlan_cs	b9538fe481	zebra: fix wrong nexthop check The kernel routes are wrongly selected even the nexthop interface is linkdown. Use `ip link set dev <interface> down` on the other box to set the box's nexthop interface linkdown. The kernel routes will be kept as `linkdown`, but are still with active nexthop in `zebra`. Add three changes/commits for kernel routes in this PR: 1) The active nexthop should be the operative interface. 2) Don't uninstall the kernel routes from `zebra` even no active nexthops. (It doesn't affect the kernel routes' deletion from kernel netlink messages.) 3) Update the kernel routes when the nexthop interface becomes up. Before: (during nexthop interface is linkdown) ``` K>* 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0, weight 1, 00:00:14 ``` After: (during nexthop interface is linkdown, with all three changes) ``` K 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0 inactive, weight 1, 00:00:07 ``` This commit is 1st change: Improve the judgment for "active" nexthop to be more accurate, the active nexthop should be the operative interface. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-12-17 16:14:30 +08:00
Rafael Zalamena	3bebb7be92	Merge pull request #17252 from nabahr/mcast-mode Fix PIMD RPF lookup mode and nexthop tracking	2024-12-16 09:57:31 -03:00
Donald Sharp	3a53b2dc4f	zebra: Give a bit more data about zclient connection on errors When debugging a crash I noticed that sometimes we talked about a zclient connection in relation to the fd associated with it and sometimes we did not. Let's just always give the data associated with the fd. It will make it a bit easier for me to follow the transitions. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-12-13 11:21:26 -05:00
Nathan Bahr	8983d24282	zebra: Improve multicast safi route show commands Add `mrib` flag to existing "show ip route" commands which then use the multicast safi rather than the unicast safi. Updated the vty output to include the AFI and SAFI string when printing the table. Deprecate `show ip rpf` command, aliased to `show ip route mrib`. Removed `show ip rpf A.B.C.D`. Signed-off-by: Nathan Bahr <nbahr@atcorp.com>	2024-12-12 13:50:31 +00:00
Nathan Bahr	bf8728dcf6	zebra,yang: Completely remove multicast mode from zebra Multicast mode belongs in PIM, so removing it completely from zebra. Modified `show (ip\|ipv6) rpf ADDRESS` to always lookup from SAFI_MULTICAST. This means this command is now specific to the multicast table and does not necessarily reflect the PIM RPF lookup, but that should be implemented in PIM instead. Signed-off-by: Nathan Bahr <nbahr@atcorp.com>	2024-12-12 13:50:31 +00:00
Nathan Bahr	4250eae00d	zebra,pimd,lib: Modify ZEBRA_NEXTHOP_LOOKUP_MRIB Modified ZEBRA_NEXTHOP_LOOKUP_MRIB to include the SAFI from which to do the lookup. This generalizes the API away from MRIB specifically and allows the user to decide how it should do lookups. Rename ZEBRA_NEXTHOP_LOOKUP_MRIB to ZEBRA_NEXTHOP_LOOKUP now that it is more generalized. This change is in preparation to remove multicast lookup mode completely from zebra. Signed-off-by: Nathan Bahr <nbahr@atcorp.com>	2024-12-12 13:50:31 +00:00
Donatas Abraitis	492750f8bc	Merge pull request #17638 from donaldsharp/zebra_metaq_stuff zebra: Remove tests for allocation failure	2024-12-12 11:10:31 +02:00
Donald Sharp	5d8bf74f0a	zebra: Remove tests for allocation failure This cannot happen. No need to test Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-12-11 11:43:48 -05:00
Donald Sharp	da7393b8fd	zebra: Fix another ships in the night issue with WFI Effectively When bgp would send a route update down to zebra and immediately after that a asic update from the kernel was read. Zebra would choose the asic update and drop the bgp update leaving us in a state where bgp was not used as the true source. Modify the code so that in rib_multipath_nhe we notice that we have an unprocessed route update from bgp. And if so just drop this kernel update about an older version of the route since it is no longer needed. Ticket: 2722533 Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-12-09 12:35:42 -05:00
Donald Sharp	b3facc23df	zebra: Reduce memory usage of streams for encoding packets For those packets that we are not sending 16k of data, but something far less than 256 bytes. Reduce those stream sizes we allocate to something much more reasonable. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-12-09 12:31:29 -05:00
vivek	e2b20dfb33	zebra: Reset MAC's remote sequence number appropriately When a MAC gets deleted but associated neighbors remain, the MAC is kept in the zebra MAC database as an internal ("auto") entry. When this happens, reset the MAC's remote sequence number. This ensures that when the host with the MAC later comes up behind a remote VTEP, the local switch accepts the MAC and installs it into the bridge FDB and we don't end up in a situation where remote MACs are not installed into the bridge FDB. This fix is a corollary of CM-22753 and is this time done for local MACs upon delete. Note: Commit is marked Cumulus-only because I need to evaluate more comprehensive changes before upstreaming it. Ticket: CM-29581 Reviewed By: As above Testing Done: 1. Multiple rounds of manual testing 2. Two rounds of evpn-smoke, 1 round of precommit Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Acked-by: Chirag Shah <chirag@cumulusnetworks.com> Acked-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>	2024-12-09 12:29:38 -05:00
Donatas Abraitis	17a0d92ffd	Merge pull request #17589 from anlancs/master_up zebra: use macro for one check	2024-12-07 22:35:12 +02:00
Igor Ryzhov	e51c6dd256	zebra: add deprecation notice for no-op netns command Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>	2024-12-07 17:02:58 +02:00
Mark Stapp	9af5425a28	zebra: improve an rnode debug Improve a debug when we create a new rib_dest by calling the debug after setting up the dest. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-05 09:23:53 -05:00
Mark Stapp	079ee236ec	zebra: remove thread-unsafe debug from netlink nhg encoder Remove use of vrf name, which isn't thread-safe. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-05 09:23:53 -05:00
Mark Stapp	bea4aa7b5e	zebra: remove thread-unsafe debugs from netlink route encode The netlink route message encode function accessed and used zebra vrfs to produce debug output. That's not thread-safe, and that encode code needs to run in (at least) the dplane pthread and the FPM plugin pthread. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-05 09:23:53 -05:00
Mark Stapp	d06975f44f	zebra: remove unused dplane route api Remove a route api that's no longer used after refactoring the netlink and FPM code. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-05 09:23:53 -05:00
Mark Stapp	29122bc9b8	zebra: refactor netlink route message parsing Separate core netlink route message parsing into a new api that uses a dplane ctx to hold the parsed attribute data. Use the new api in two paths: the normal netlink update message parsing path, and in the FPM plugin, which also uses netlink encoding. The FPM route-notification code runs in its own pthread, and only needs a subset of the route info that zebra ordinarily develops. This change stops that pthread from accessing zebra's internal data, such as vrfs and ifps, that are not thread-safe. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-05 09:23:53 -05:00
anlan_cs	f536ca30f5	zebra: use macro for one check Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-12-05 21:20:05 +08:00
Mark Stapp	99ecf5ead0	zebra: add more dataplane route apis Add several additional route attribute data and accessors to the dplane module. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-04 15:04:54 -05:00
Mark Stapp	506097a1b9	zebra: separate zebra ZAPI server open and accept Separate zebra's ZAPI server socket handling into two phases: an early phase that opens the socket, and a later phase that starts listening for client connections. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-12-03 09:44:46 -05:00
Mark Stapp	8ef5282c8d	Merge pull request #17519 from chiragshah6/evpn_dev4 zebra: EVPN fix code style in vlan vni map debugs	2024-11-26 16:39:47 -05:00
Jafar Al-Gharaibeh	5c1154beaf	Merge pull request #16878 from donaldsharp/increased_test_cover Add some test cases, and some ability to see what is going on in zebra	2024-11-26 13:40:39 -06:00
Chirag Shah	887a0840f6	zebra: EVPN fix code style in vlan vni map debugs Fix up couple of style issues missed in PR 17483 Signed-off-by: Chirag Shah <chirag@nvidia.com>	2024-11-26 09:06:44 -08:00
Russ White	bcf6e53314	Merge pull request #17483 from chiragshah6/evpn_dev4 zebra: fix EVPN check vxlan oper up in vlan mapping	2024-11-26 11:48:01 -05:00
Mark Stapp	277784fe34	zebra: avoid a race during FPM dplane plugin shutdown During zebra shutdown, the main pthread and the FPM pthread can deadlock if the FPM pthread is in fpm_reconnect(). Each pthread tries to use event_cancel_async() to cancel tasks that may be scheduled for the other pthread - this leads to a deadlock as neither thread can progress. This adds an atomic boolean that's managed as each pthread enters and leaves the cleanup code in question, preventing the two threads from running into the deadlock. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-11-25 15:37:39 -05:00
Donald Sharp	a92854047b	zebra: Remove some unused functions on linux build The functions: if_get_flags if_flags_update if_flags_mangle are never invoked from a linux netlink build. Put a #ifdef around those functions so that they are not included on the linux build as that they are not needed there. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-25 13:12:10 -05:00
Donald Sharp	069dff269e	zebra: Add ability to know if some config is set For interface config: shutdown mpls multicast These states were never being shown in output, let's show it. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-25 13:12:10 -05:00
Chirag Shah	866148ef1b	zebra: add debug in remote vtep install ifp not up Ticket: #4139506 Signed-off-by: Chirag Shah <chirag@nvidia.com>	2024-11-25 09:00:03 -08:00
Chirag Shah	97538158ba	zebra: EVPN add debug trace for HREP entry Ticket: #4139506 Signed-off-by: Chirag Shah <chirag@nvidia.com>	2024-11-25 09:00:03 -08:00
Chirag Shah	adae8192d1	zebra: EVPN check vxlan oper up in vlan mapping When VLAN-VNI mapping is updated, do not set the L2VNI up event if the associated VXLAN device is not up. This may result in bgp synced remote routes to skip installing in Zebra and onwards (Kernel). Ticket: #4139506 Signed-off-by: Chirag Shah <chirag@nvidia.com>	2024-11-25 09:00:03 -08:00
Donald Sharp	cb6f7b153e	lib, zebra: Do not have duplicate memory type problems In zebra_mpls.c it has a usage of MTYPE_NH_LABEL which is defined in both lib/nexthop.c and zebra/zebra_mpls.c. The usage in zebra_mpls.c is a realloc. This leads to a crash: (gdb) bt 0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44 1 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78 2 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 3 0x0000730a1b442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26 4 0x0000730a1b94fb18 in core_handler (signo=6, siginfo=0x7ffeed1e07b0, context=0x7ffeed1e0680) at lib/sigevent.c:268 5 <signal handler called> 6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:44 7 __pthread_kill_internal (signo=6, threadid=126487246404032) at ./nptl/pthread_kill.c:78 8 __GI___pthread_kill (threadid=126487246404032, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 9 0x0000730a1b442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 10 0x0000730a1b4287f3 in __GI_abort () at ./stdlib/abort.c:79 11 0x0000730a1b9984f5 in _zlog_assert_failed (xref=0x730a1ba59480 <_xref.16>, extra=0x0) at lib/zlog.c:789 12 0x0000730a1b8f8908 in mt_count_free (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80) at lib/memory.c:74 13 0x0000730a1b8f8a59 in qrealloc (mt=0x576e0edda520 <MTYPE_NH_LABEL>, ptr=0x576e36617b80, size=16) at lib/memory.c:112 14 0x0000576e0ec85e2e in nhlfe_out_label_update (nhlfe=0x576e368895f0, nh_label=0x576e3660e9b0) at zebra/zebra_mpls.c:1462 15 0x0000576e0ec833ff in lsp_install (zvrf=0x576e3655fb50, label=17, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:224 16 0x0000576e0ec87c34 in zebra_mpls_lsp_install (zvrf=0x576e3655fb50, rn=0x576e366197c0, re=0x576e3660a590) at zebra/zebra_mpls.c:2215 17 0x0000576e0ecbb427 in rib_process_update_fib (zvrf=0x576e3655fb50, rn=0x576e366197c0, old=0x576e36619660, new=0x576e3660a590) at zebra/zebra_rib.c:1084 18 0x0000576e0ecbc230 in rib_process (rn=0x576e366197c0) at zebra/zebra_rib.c:1480 19 0x0000576e0ecbee04 in process_subq_route (lnode=0x576e368e0270, qindex=8 '\b') at zebra/zebra_rib.c:2661 20 0x0000576e0ecc0711 in process_subq (subq=0x576e3653fc80, qindex=META_QUEUE_BGP) at zebra/zebra_rib.c:3226 21 0x0000576e0ecc07f9 in meta_queue_process (dummy=0x576e3653fae0, data=0x576e3653fb80) at zebra/zebra_rib.c:3265 22 0x0000730a1b97d2a9 in work_queue_run (thread=0x7ffeed1e3f30) at lib/workqueue.c:282 23 0x0000730a1b96b039 in event_call (thread=0x7ffeed1e3f30) at lib/event.c:1996 24 0x0000730a1b8e4d2d in frr_run (master=0x576e36277e10) at lib/libfrr.c:1232 25 0x0000576e0ec35ca9 in main (argc=7, argv=0x7ffeed1e4208) at zebra/main.c:536 Clearly replacing a label stack is an operation that should be owned by lib/nexthop.c. So lets move this function into there and have zebra_mpls.c just call the function to replace the label stack. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-22 11:02:15 -05:00
Donald Sharp	8a71bf9341	zebra: Put debug guards in zebra_vxlan.c Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-14 16:08:47 -05:00
Donald Sharp	d580af8394	zebra: zebra_vxlan.c assert on dev escape problem Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-14 16:03:14 -05:00
Donald Sharp	922489a8d6	zebra: Missed debug guard in zebra_evpn.c Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-14 16:03:14 -05:00
Mark Stapp	aee85f7c6a	zebra: fix unguarded debug in evpn code Guard a debug in the evpn code. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-11-13 13:49:28 -05:00
Donald Sharp	ac6314d380	Merge pull request #17297 from mjstapp/mjs_ifp_table zebra, lib: use internal rbtree for per-NS tree of ifps	2024-11-12 15:12:07 -05:00
Russ White	fe20f83286	Merge pull request #17326 from anlancs/fix/zebra-no-ifp-down zebra: fix missing kernel routes	2024-11-05 10:20:36 -05:00
Donald Sharp	e88cbd65dd	zebra: Remove large indentation level in do_show_route_helper CI is complaining about the large level of indentation. Make it a bit better. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-04 13:02:36 -05:00
Donald Sharp	f51d2a6b97	zebra: Don't display the vrf if not using namespace based vrfs Currently when doing a `show ip route table XXXX`, zebra is displaying the current default vrf as the vrf we are in. We are displaying a table not a vrf. This is only true if you are not using namespace based vrf's, so modify the output to display accordingly. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-04 12:14:47 -05:00
Mark Stapp	960462aade	Merge pull request #16960 from donaldsharp/zebra_nhg_startup_issue zebra: On startup actually allow for nhe's to be early	2024-11-04 11:49:30 -05:00
Carmine Scarpitta	afd9d3f924	zebra: Fix wrong debug macro in `release_srv6_sid_func_dynamic` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:45:03 +01:00
Carmine Scarpitta	69b1acf4e3	zebra: Fix wrong debug macro in `release_srv6_sid_func_explicit` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:44:41 +01:00
Carmine Scarpitta	d1810d5c7f	zebra: Fix wrong debug macro in `alloc_srv6_sid_func_dynamic` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:44:23 +01:00
Carmine Scarpitta	58fd136f44	zebra: Fix wrong debug macro in `alloc_srv6_sid_func_explicit` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:43:55 +01:00
Carmine Scarpitta	a56e790b07	zebra: Fix wrong debug macro in `release_srv6_sid_func_dynamic` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:43:38 +01:00
Carmine Scarpitta	8e96fcece2	zebra: Fix wrong debug macro in `release_srv6_sid_func_explicit` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:43:17 +01:00
Carmine Scarpitta	4710baa7bb	zebra: Fix wrong debug macro in `alloc_srv6_sid_func_dynamic` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:42:58 +01:00
Carmine Scarpitta	973c4750e1	zebra: Fix wrong debug macro in `alloc_srv6_sid_func_explicit` `ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not. The correct macro is `IS_ZEBRA_DEBUG_SRV6`. Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`. Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>	2024-11-03 08:42:36 +01:00
Donald Sharp	9e74dda819	zebra: Delay some processing until after startup is finished Currently zebra starts the graceful restart timer and allows connections from clients before all data is read in from the kernel. Let's move the graceful restart timer start until after this is done, and not allow client connections until then either. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-01 14:43:50 -04:00
Jafar Al-Gharaibeh	248ee22b9d	Merge pull request #17230 from donaldsharp/clang_19_some_more Clang 19 some more	2024-11-01 09:02:31 -05:00
Donald Sharp	66feece071	Merge pull request #17281 from nabahr/mrib-import Add support to import alternate URIB tables into the main MRIB	2024-10-31 13:28:57 -04:00
anlan_cs	44a82da405	zebra: fix missing kernel routes The `rib_update_handle_kernel_route_down_possibility()` didn't consider the kernel routes ( blackhole ) without interface. When some other interfaces are down, these kernel routes will be wrongly removed. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-10-31 22:45:16 +08:00
Donatas Abraitis	25ae643996	zebra: Add missing new line for help string ``` -A, --asic-offload FRR is interacting with an asic underneath the linux kernel --v6-with-v4-nexthops Underlying dataplane supports v6 routes with v4 nexthops -s, --nl-bufsize Set netlink receive buffer size ``` Fixes: `1f5611c06d` ("zebra: Allow zebra cli to accept v6 routes with v4 nexthops") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-10-31 10:47:48 +02:00
Nathan Bahr	be2a7ed6af	zebra: Add ability to import alternate tables into the MRIB Expanded the cli command to include an mrib flag for importing to the main table MRIB instead of the main table URIB. Piped through specifying the safi through the import table functions rather than hardcoding to SAFI_UNICAST. Import still only import routes from the URIB subtable, only added the ability to import into the main table MRIB. Signed-off-by: Nathan Bahr <nbahr@atcorp.com>	2024-10-29 20:17:59 +00:00
Donald Sharp	5f6200d334	zebra: Deconfuse clang-sa about possible NULL pointer Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-29 16:05:15 -04:00
Mark Stapp	b7263c0548	zebra: remove if_table from the zebra NS Finish removing the if_table from the zebra NS struct. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-29 13:49:43 -04:00
Mark Stapp	8353cf5d4c	zebra: use new per-NS ifp iterators in vxlan code Replace use of the old if_table with the new per-NS ifp iterator apis in the zebra vxlan code. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-29 13:49:43 -04:00
Mark Stapp	c1160538ea	lib,zebra: remove table node from ifp struct Finish removing the table route_node from the ifp struct. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-29 13:49:43 -04:00
Donatas Abraitis	60f77de79d	Merge pull request #17257 from pguibert6WIND/srv6_debug zebra: add 'debug zebra srv6' command	2024-10-29 10:32:20 +02:00
Donald Sharp	3bff65abc7	zebra: When installing a mroute, allow it to flow Currently the mroute code was not allowing the mroute to be sent to the dataplane. This leaves us with a situation where the routes being installed were never being set as installed and additionally nht against the mrib would not work if the route came into existence after the nexthop tracking was asked for. Turns out all the pieces were there to let this work. Modify the code to pass it to the dplane and to send it back up as having worked. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-28 15:02:39 -04:00
Mark Stapp	2b05d34a00	zebra: use new per-NS iteration in zebra_evpn Use the new per-NS interface iteration apis in the evpn module. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-28 14:54:06 -04:00
Mark Stapp	d349877aa5	zebra: removing use of per-NS if_table Remove use of the per-NS if_table from zebra/interface module. Use new add, lookup, and iteration apis. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-28 14:54:06 -04:00
Mark Stapp	2c5c679e93	zebra: use new per-NS interface iteration Replace use of the old if_table with the new per-NS ifp iteration apis. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-28 14:54:06 -04:00
Mark Stapp	5af72e95b0	zebra: add new per-NS tree of interfaces Add new per-NS interface typerb tree, using external linkage, to replace the use of the if_table table. Add apis to iterate the per-NS collection - which is not public. Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-28 14:54:06 -04:00
Donald Sharp	811168ecc3	zebra: Add safi to some debugs Trying to figure out what safi we are talking about is fun when it is not put into the debugs. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-28 14:29:31 -04:00
Philippe Guibert	a7fec9c387	zebra: add 'debug zebra srv6' command Add a specific debug command to handle srv6 troubleshooting. Move the srv6 traces that initially were under 'debug zebra packet' debug. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2024-10-28 16:40:33 +01:00
Mark Stapp	1f8d81335d	zebra: make a zif MTYPE internal/static Make an MTYPE used in zifs internal/static Signed-off-by: Mark Stapp <mjs@cisco.com>	2024-10-28 11:09:52 -04:00
Jafar Al-Gharaibeh	f11421d4ec	Merge pull request #17160 from opensourcerouting/fix/keep_zebra_on-rib-process_in_frr.conf lib, zebra: Keep `zebra on-rib-process script` in frr.conf	2024-10-27 18:23:36 -05:00
Donald Sharp	138935a5fd	bgpd: Fix wrong pthread event cancelling 0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44 1 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78 2 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 3 0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26 4 0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258 5 <signal handler called> 6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44 7 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78 8 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 9 0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79 11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789 12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428 13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470 14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355 15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610 16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152 17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639 18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111 19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, 
arg=0x76e3985ffaf0) at lib/hash.c:252 20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214 21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180 22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) f 12 12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428 1428 assert(m->owner == pthread_self()); In this decode the attempt to cancel the connection's events from the wrong thread is causing the crash. Modify the code to create an event on the bm->master to cancel the events for the connection. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-24 21:01:26 -04:00
Donatas Abraitis	91e157f3ae	Merge pull request #17162 from louis-6wind/fix-bh-nh-vrf zebra: fix showing nexthop vrf for ipv6 blackhole	2024-10-23 17:34:44 +03:00
Jafar Al-Gharaibeh	0078472e19	Merge pull request #17180 from anlancs/zebra/review-move-dplane zebra: drop NEWLINK event handling in the main thread	2024-10-22 10:29:49 -05:00
anlan_cs	96192f6aee	zebra: drop NEWLINK event handling in the main thread NEWLINK is only registered by the dplane thread, the main thread doesn't care about it. So remove the real process of `netlink_link_change()` for NEWLINK event in main thread. And move NEWLINK/DELLINK event to the block where the dplane messages are kept together. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-10-22 09:05:00 +08:00
anlan_cs	5829fea1b5	zebra: remove useless code Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-10-19 13:32:53 +08:00
Louis Scalbert	6cdc82b21b	zebra: fix showing nexthop vrf for ipv6 blackhole For some reason the Linux kernel associates the ipv6 blackhole of a non-default table with the lo interface. > root@r1# ip -6 route show table 100 > root@r1# ip -6 route add unreachable default metric 4278198272 table 100 > root@r1# ip -6 route show table 100 > unreachable default dev lo metric 4278198272 pref medium As a consequence, the VRF default that owns the lo interface is shown as the nexthop VRF: > r1# show ipv6 route table 20 > Table 20: > K>* ::/0 [255/8192] unreachable (ICMP unreachable) (vrf default), 00:18:12 Do not display the nexthop VRF of a blackhole. It does not make sense for a blackhole and it was not displayed in the past. Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>	2024-10-18 14:45:50 +02:00
Donatas Abraitis	1fe1f8d87c	lib, zebra: Keep `zebra on-rib-process script` in frr.conf After the change: ``` $ grep on-rib-process /etc/frr/frr.conf zebra on-rib-process script script4 $ systemctl restart frr $ vtysh -c 'show run' \| grep on-rib-process zebra on-rib-process script script4 ``` Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-10-18 15:36:52 +03:00
Donald Sharp	5a2a9e3b89	zebra: Fix possible null deref discovered by coverity Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-17 07:42:47 -04:00
Donald Sharp	466efab870	Merge pull request #17136 from opensourcerouting/clang-sa-19 *: fix clang-19 SA	2024-10-17 07:38:28 -04:00
Donatas Abraitis	1ce225d7e4	Merge pull request #17076 from donaldsharp/rnh_and_redistribution_nexthop_num_fix *: Fix up improper handling of nexthops for nexthop tracking	2024-10-16 16:34:08 +03:00
Donald Sharp	cc63dbb68f	Merge pull request #17020 from pguibert6WIND/asan_shutdown zebra: fix heap-use-after free on ns shutdown	2024-10-16 09:15:06 -04:00
David Lamparter	e6cb1a90f2	zebra: check `dirfd()` result `dirfd()` can theoretically return an error. Call it once and check the result. clang-SA: technically correct™. Ain't that the best kind of correct? Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:30:25 +02:00
David Lamparter	67b0a457ed	zebra: don't misappropriate `errno` `errno` has its own semantics. Sometimes it is correct to write to it. This is not one of those cases - just use a separate `nl_errno`. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:30:25 +02:00
David Lamparter	1350f8d1c1	zebra: don't try to read past EOF `FILE *` objects are theoretically in an invalid state if you try to use them past their reporting EOF. Adjust the code to make it correct. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:30:25 +02:00
David Lamparter	c071b4370d	*: clang-SA switch-enum initializer workarounds In these cases the value assigned by the switch block is used directly rather than returned. Mark the initial/default value as used so clang-SA doesn't complain about it. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:30:25 +02:00
David Lamparter	49cf311d46	*: clang-SA friendly switch-enum-return-string clang-19's SA complains about unused initializers for this kind of "switch (enum) { return string }" kind of code. Use direct string return values to avoid the issue. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:00:11 +02:00
Donatas Abraitis	c32bdc2469	Merge pull request #17116 from enkechen-panw/zfix-2 zebra: unlock node only after operation in zebra_free_rnh()	2024-10-16 08:12:28 +03:00
Jafar Al-Gharaibeh	b2eaf86fb5	Merge pull request #15586 from donaldsharp/nht_explain_doc zebra: Attempt to explain the rnh tracking code better	2024-10-15 14:25:35 -05:00
Jafar Al-Gharaibeh	b23bbb885a	Merge pull request #17088 from donaldsharp/connected_kernel_fun zebra: Prevent a kernel route from being there when a connected should	2024-10-15 14:04:51 -05:00
Enke Chen	5b6ff51b8a	zebra: unlock node only after operation in zebra_free_rnh() Move route_unlock_node() after rnh_list_del(). Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>	2024-10-15 10:25:46 -07:00
Donald Sharp	28237d73ad	zebra: Attempt to explain the rnh tracking code better I got asked today what was going on in the rnh code. I had to take time off of what I was doing and rewrap my head around this code, since it's been a long time. Since this question may come up again in the future I am trying to document this better so that someone coming behind us will be able to read this and get a better idea of what the algorithm is attempting to do. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-15 12:42:17 -04:00
Donald Sharp	645a9e4f83	*: Fix up improper handling of nexthops for nexthop tracking Currently FRR needs to send a uint16_t value for the number of nexthops, and it also needs the ability to properly decode that value. Find and handle all the places that this happens. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-15 11:57:23 -04:00
Mark Stapp	6c1bc51bbb	Merge pull request #16737 from raja-rajasekar/rajasekarr/vlan_to_dplane zebra: vlan to dplane	2024-10-15 08:06:34 -04:00
Donald Sharp	74e25198e7	zebra: Prevent a kernel route from being there when a connected should There exists a series of events where a kernel route is learned first (one that happens to be exactly what a connected route should be) and FRR ends up with both a kernel route and a connected route, leaving us in a very strange spot. This code change mirrors the existing behavior of dropping the kernel route if there is a connected route. Here we just do the reverse: if we already have a kernel route and a connected route should be created, remove the kernel route and keep the connected one. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-14 11:27:53 -04:00
Donatas Abraitis	d1433ee9a8	Merge pull request #17062 from donaldsharp/dplane_fpm_nl_problems zebra: Only notify dplane work pthread when needed	2024-10-14 08:14:34 +03:00
anlan_cs	05e2472de7	zebra: add back one field for debug The `flags` field was removed recently, so add it back for debugging. Signed-off-by: anlan_cs <anlan_cs@126.com>	2024-10-13 21:30:46 +08:00
Donald Sharp	8aa97a439f	zebra: Slow down fpm_process_queue When the fpm_process_queue has run out of space but has written to the fpm output buffer, schedule it to wake up immediately, because the write will go out pretty much immediately, since it was scheduled first. If the fpm_process_queue has not written to the output buffer then delay the processing by 10 milliseconds to allow a possibly backed up write processing to have a chance to complete its work. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-11 09:37:37 -04:00
Donald Sharp	963792e8c5	zebra: Only notify dplane work pthread when needed The fpm_nl_process function was getting the count of the total number of ctx's processed. As a result, once a single context had been processed it would always signal the dataplane that there was work to do. Change the code to only notify the dplane worker when a context was actually added to the outgoing context queue. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-11 09:37:37 -04:00
Donald Sharp	154a89bc31	zebra: Fix crash in pw code Recent PR #17009 introduced a crash in pw handling for deletion. Let's fix that problem. Fixes: #17041 Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-09 07:17:29 -04:00
Philippe Guibert	7ae70eb5ef	zebra: fix heap-use-after free on ns shutdown The following ASAN issue has been observed: > ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840 > READ of size 4 at 0x6160000acba4 thread T0 > #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315 > #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331 > #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680 > #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490 > #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717 > #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413 > #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919 > #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454 > #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822 > #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212 > #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968 > #11 0x7f26f275b8a9 in route_node_free lib/table.c:75 > #12 0x7f26f275bae4 in route_table_free lib/table.c:111 > #13 0x7f26f275b749 in route_table_finish lib/table.c:46 > #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191 > #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244 > #16 0x55910c4f40db in zebra_finalize zebra/main.c:249 > #17 0x7f26f2777108 in event_call lib/event.c:2011 > #18 0x7f26f264180e in frr_run lib/libfrr.c:1212 > #19 0x55910c4f49cb in main zebra/main.c:531 > #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392 > #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114) It happens with FRR using the kernel. During shutdown, the namespace identifier is attempted to be obtained by zebra, in an attempt to prepare zebra dataplane nexthop messages. 
Fix this by accessing the ns structure. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2024-10-08 22:25:55 +02:00
Russ White	b8c458622d	Merge pull request #17023 from donaldsharp/dplane_problems zebra: Allow dplane to pass larger number of nexthops down to dataplane	2024-10-08 11:45:27 -04:00
Donald Sharp	9f8968fc5a	*: Allow 16 bit size for nexthops Currently FRR is limiting the nexthop count to a uint8_t not a uint16_t. This leads to issues when the nexthop count is 256: the count overflows to 0, causing problems in the code. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-08 09:26:57 -04:00
Donald Sharp	a8af2b2a9d	zebra: Do not retry in 30 seconds on pw reachability failure Currently the zebra pw code has setup a retry to install the pw after 30 seconds when it is decided that reachability to the pw is gone. This causes a failure mode where the pw code just goes and re-installs the pw after 30 seconds in the non-reachability case. Instead it should just be reinstalling after reachability is restored. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-07 20:36:45 -04:00
Donald Sharp	f50b1f7c22	zebra: Move pw status setting until after we get results Currently the pw code sets the status of the pw for install and uninstall immediately when notifying the dplane. This is incorrect in that we do not actually know the status at this point in time. The status should be set when we get the result. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-07 20:36:45 -04:00
Donatas Abraitis	ded59bcc72	Merge pull request #17013 from dksharp5/removal_functions Removal functions	2024-10-07 11:47:01 +03:00
Donna Sharp	f62dfc5d53	lib,zebra: remove unused ZEBRA_VRF_UNREGISTER Signed-off-by: Donna Sharp <dksharp5@gmail.com>	2024-10-06 19:40:49 -04:00
Donna Sharp	103f24485c	zebra: remove unused function from tc_netlink.c Signed-off-by: Donna Sharp <dksharp5@gmail.com>	2024-10-06 19:30:56 -04:00
Donna Sharp	7a63799a84	zebra: remove unused function from if_netlink.c Signed-off-by: Donna Sharp <dksharp5@gmail.com>	2024-10-06 19:25:44 -04:00
Donna Sharp	b6dd4ff8bc	zebra: remove unused function from tc_netlink.c Signed-off-by: Donna Sharp <dksharp5@gmail.com>	2024-10-06 19:08:44 -04:00
Donna Sharp	8eb5f4f506	zebra: remove unused function rib_lookup_ipv4 Signed-off-by: Donna Sharp <dksharp5@gmail.com>	2024-10-06 18:53:11 -04:00
Russ White	15991e1a08	Merge pull request #16800 from donaldsharp/nhg_reuse_intf_down_up Nhg reuse intf down up	2024-10-04 10:28:58 -04:00
Igor Zhukov	a3877e4444	zebra: Fix crash during reconnect fpm_enqueue_rmac_table expects an fpm_rmac_arg* as its argument. The issue can be reproduced by dropping the TCP session using: ss -K dst 127.0.0.1 dport = 2620 I used Fedora 40 and frr 9.1.2 and I got the gdb backtrace: (gdb) bt 0 0x00007fdd7d6997ea in fpm_enqueue_rmac_table (bucket=0x2134dd0, arg=0x2132b60) at zebra/dplane_fpm_nl.c:1217 1 0x00007fdd7dd1560d in hash_iterate (hash=0x21335f0, func=0x7fdd7d6997a0 <fpm_enqueue_rmac_table>, arg=0x2132b60) at lib/hash.c:252 2 0x00007fdd7dd1560d in hash_iterate (hash=0x1e5bf10, func=func@entry=0x7fdd7d698900 <fpm_enqueue_l3vni_table>, arg=arg@entry=0x7ffed983bef0) at lib/hash.c:252 3 0x00007fdd7d698b5c in fpm_rmac_send (t=<optimized out>) at zebra/dplane_fpm_nl.c:1262 4 0x00007fdd7dd6ce22 in event_call (thread=thread@entry=0x7ffed983c010) at lib/event.c:1970 5 0x00007fdd7dd20758 in frr_run (master=0x1d27f10) at lib/libfrr.c:1213 6 0x0000000000425588 in main (argc=10, argv=0x7ffed983c2e8) at zebra/main.c:492 Signed-off-by: Igor Zhukov <fsb4000@yandex.ru>	2024-10-04 14:59:14 +07:00
Rajasekar Raja	aa4786642c	zebra: vlan to dplane Offload from main Trigger: Zebra core seen when we convert l2vni to l3vni and back BackTrace: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7f4af96989d9] /usr/lib/frr/zebra(zebra_vxlan_if_vni_up+0x250) [0x5561022ae030] /usr/lib/frr/zebra(netlink_vlan_change+0x2f4) [0x5561021fd354] /usr/lib/frr/zebra(netlink_parse_info+0xff) [0x55610220d37f] /usr/lib/frr/zebra(+0xc264a) [0x55610220d64a] /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f4af967e96d] /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f4af9637588] /usr/lib/frr/zebra(main+0x402) [0x5561021f4d32] /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f4af932624a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f4af9326305] /usr/lib/frr/zebra(_start+0x21) [0x5561021f72f1] Root Cause: In working case, - We get a RTM_NEWLINK whose ctx is enqueued by zebra dplane and dequeued by zebra main and processed i.e. (102000 is deleted from vxlan99) before we handle RTM_NEWVLAN. - So in handling of NEWVLAN (vxlan99) we bail out since find with vlan id 703 does not exist. 
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/working/nocras.log \| grep "RTM_NEWLINK\\|QUEUED\\|vxlan99\\|in thread" 2024/07/18 23:09:33.741105 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0 2024/07/18 23:09:33.744061 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (65), result QUEUED 2024/07/18 23:09:33.767240 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0 2024/07/18 23:09:33.767380 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (73), result QUEUED 2024/07/18 23:09:33.767389 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for vxlan99(73) 2024/07/18 23:09:33.767404 ZEBRA: [TQR2A-H2RFY] Vlan-Vni(1186:1186-6000002:6000002) update for VxLAN IF vxlan99(73) 2024/07/18 23:09:33.767422 ZEBRA: [TP4VP-XZ627] Del L2-VNI 102000 intf vxlan99(73) 2024/07/18 23:09:33.767858 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0 2024/07/18 23:09:33.767866 ZEBRA: [KKZGZ-8PCDW] Cannot find VNI for VID (703) IF vxlan99 for vlan state update >>>>BAIL OUT In failure case, - The NEWVLAN is received first even before processing RTM_NEWLINK. - Since the vxlan id 102000 is not removed from the vxlan99, the find with vlan id 703 returns the 102000 one and we invoke zebra_vxlan_if_vni_up where the interfaces don't match and assert. 
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/noworking/crash.log \| grep "RTM_NEWLINK\\|QUEUED\\|vxlan99\\|in thread" 2024/07/18 22:26:43.829370 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0 2024/07/18 22:26:43.829646 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7fe07c026d30, op INTF_INSTALL, ifindex (65), result QUEUED 2024/07/18 22:26:43.853930 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0 2024/07/18 22:26:43.853949 ZEBRA: [K61WJ-XQQ3X] Intf vxlan99(73) L2-VNI 102000 is UP >>> VLAN PROCESSED BEFORE INTF EVENT 2024/07/18 22:26:43.853951 ZEBRA: [SPV50-BX2RP] RAJA zevpn_vxlanif vxlan48 and ifp vxlan99 2024/07/18 22:26:43.854005 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0 2024/07/18 22:26:43.854241 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=516, seq=0, pid=0 2024/07/18 22:26:43.854251 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=544, seq=0, pid=0 ZEBRA: in thread kernel_read scheduled from zebra/kernel_netlink.c:505 kernel_read() Fix: Similar to #13396, where link change handling was offloaded to dplane, the same is being done for vlan events. Note: Prior to this change, the zebra main thread was interested in RTNLGRP_BRVLAN, so all the kernel events pertaining to vlan were handled by zebra main. With this change, the handling of vlan events is still with zebra main; however, we offload it via the dplane thread. Ticket :#3878175 Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>	2024-09-26 20:17:35 -07:00
Rajasekar Raja	1632988acf	zebra: vlan to dplane, Relocating some functions Relocating functions used by vlan in if_netlink into zebra vxlan Note: Static variable to the functions will be added back in the next commit. Ticket :#3878175 Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>	2024-09-25 11:56:06 -07:00
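
Note on commit 9f8968fc5a ("*: Allow 16 bit size for nexthops"): the overflow described there, where a count of 256 stored in a uint8_t becomes 0, can be illustrated with a minimal C sketch. The helper names below are hypothetical and are not taken from the FRR sources; they only show the integer conversion behavior the commit message refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not FRR code: store a nexthop count in an
 * 8-bit field. Conversion to uint8_t is modulo 256, so 256 becomes 0. */
static inline uint8_t nexthop_count_u8(unsigned int n)
{
	return (uint8_t)n;
}

/* Hypothetical helper, not FRR code: the same count in a 16-bit field,
 * which can represent values up to 65535. */
static inline uint16_t nexthop_count_u16(unsigned int n)
{
	return (uint16_t)n;
}

/* Small self-check showing the difference at the boundary. */
static inline void nexthop_count_demo(void)
{
	assert(nexthop_count_u8(255) == 255);  /* still fits in 8 bits */
	assert(nexthop_count_u8(256) == 0);    /* wraps around to 0 */
	assert(nexthop_count_u16(256) == 256); /* preserved in 16 bits */
}
```

Calling `nexthop_count_demo()` from any test harness exercises the boundary case; widening the field is what keeps a count of 256 from being read back as "no nexthops".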

6232 commits