Commit graph

8487 commits

Author SHA1 Message Date
Donald Sharp 5d4c7d2ece
Merge pull request #18675 from mjstapp/fix_clang_18_warnings
lib,pimd,bgpd,bfdd: Fix clang 18 warnings
2025-04-17 18:38:21 -04:00
Mark Stapp 2d42318625 bfdd, bgpd: clean up clang warnings
Clean up some clang compiler warnings.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-04-16 13:50:21 -04:00
David Lamparter d46909e50f bgpd: fix misused rfapi conditional
bgpd/bgpd.c:8975:5: error: "ENABLE_BGP_VNC" is not defined, evaluates to 0 [-Werror=undef]
 8975 | #if ENABLE_BGP_VNC

Fixes: FRRouting#18546
Fixes: 1629c05924 ("bgpd: rfapi: track outstanding rib and import timers, free mem at exit")
Cc: G. Paul Ziemba <paulz@labn.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2025-04-16 12:47:36 +02:00
Donatas Abraitis 7ef9394c33
Merge pull request #18653 from louis-6wind/fix-bgp-pbr-mem-leaks
bgpd: fix pbr memory leaks
2025-04-15 10:07:14 +03:00
Mark Stapp 81b472bd79 lib,bgpd: clean up clang warnings
Clean up a couple of clang compiler warnings (this was
clang 18)

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-04-14 15:32:08 -04:00
Louis Scalbert eb8281aeb8 bgpd: fix bgp_pbr_or_filter memory leaks
Note that bgp_pbr_policyroute_add_from_zebra() and
bgp_pbr_policyroute_remove_from_zebra() are only called from
bgp_pbr_handle_entry().

>  ==966967==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 40 byte(s) in 1 object(s) allocated from:
>     #0 0x7fd447ab4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fd44746f8dd in qcalloc lib/memory.c:105
>     #2 0x7fd44744401a in list_new lib/linklist.c:49
>     #3 0x560f8c094490 in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2818
>     #4 0x560f8c095993 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2941
>     #5 0x560f8c2164f3 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #6 0x560f8c0bd668 in bgp_process_main_one bgpd/bgp_route.c:3691
>     #7 0x560f8c0be7fe in process_subq_other_route bgpd/bgp_route.c:3856
>     #8 0x560f8c0bf280 in process_subq bgpd/bgp_route.c:3955
>     #9 0x560f8c0bf320 in meta_queue_process bgpd/bgp_route.c:3980
>     #10 0x7fd44759fdfc in work_queue_run lib/workqueue.c:282
>     #11 0x7fd4475779b2 in event_call lib/event.c:2011
>     #12 0x7fd447442ff1 in frr_run lib/libfrr.c:1216
>     #13 0x560f8bef0a15 in main bgpd/bgp_main.c:545
>     #14 0x7fd446e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> Direct leak of 40 byte(s) in 1 object(s) allocated from:
>     #0 0x7fd447ab4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fd44746f8dd in qcalloc lib/memory.c:105
>     #2 0x7fd44744401a in list_new lib/linklist.c:49
>     #3 0x560f8c09439d in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2812
>     #4 0x560f8c095993 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2941
>     #5 0x560f8c2164f3 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #6 0x560f8c0bd668 in bgp_process_main_one bgpd/bgp_route.c:3691
>     #7 0x560f8c0be7fe in process_subq_other_route bgpd/bgp_route.c:3856
>     #8 0x560f8c0bf280 in process_subq bgpd/bgp_route.c:3955
>     #9 0x560f8c0bf320 in meta_queue_process bgpd/bgp_route.c:3980
>     #10 0x7fd44759fdfc in work_queue_run lib/workqueue.c:282
>     #11 0x7fd4475779b2 in event_call lib/event.c:2011
>     #12 0x7fd447442ff1 in frr_run lib/libfrr.c:1216
>     #13 0x560f8bef0a15 in main bgpd/bgp_main.c:545
>     #14 0x7fd446e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> Direct leak of 4 byte(s) in 1 object(s) allocated from:
>     #0 0x7fd447ab4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fd44746f8dd in qcalloc lib/memory.c:105
>     #2 0x560f8c080cec in bgp_pbr_extract_enumerate_unary bgpd/bgp_pbr.c:362
>     #3 0x560f8c080f7e in bgp_pbr_extract_enumerate bgpd/bgp_pbr.c:400
>     #4 0x560f8c094530 in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2819
>     #5 0x560f8c095993 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2941
>     #6 0x560f8c2164f3 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #7 0x560f8c0bd668 in bgp_process_main_one bgpd/bgp_route.c:3691
>     #8 0x560f8c0be7fe in process_subq_other_route bgpd/bgp_route.c:3856
>     #9 0x560f8c0bf280 in process_subq bgpd/bgp_route.c:3955
>     #10 0x560f8c0bf320 in meta_queue_process bgpd/bgp_route.c:3980
>     #11 0x7fd44759fdfc in work_queue_run lib/workqueue.c:282
>     #12 0x7fd4475779b2 in event_call lib/event.c:2011
>     #13 0x7fd447442ff1 in frr_run lib/libfrr.c:1216
>     #14 0x560f8bef0a15 in main bgpd/bgp_main.c:545
>     #15 0x7fd446e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> Direct leak of 4 byte(s) in 1 object(s) allocated from:
>     #0 0x7fd447ab4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fd44746f8dd in qcalloc lib/memory.c:105
>     #2 0x560f8c080cec in bgp_pbr_extract_enumerate_unary bgpd/bgp_pbr.c:362
>     #3 0x560f8c080f7e in bgp_pbr_extract_enumerate bgpd/bgp_pbr.c:400
>     #4 0x560f8c09443d in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2813
>     #5 0x560f8c095993 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2941
>     #6 0x560f8c2164f3 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #7 0x560f8c0bd668 in bgp_process_main_one bgpd/bgp_route.c:3691
>     #8 0x560f8c0be7fe in process_subq_other_route bgpd/bgp_route.c:3856
>     #9 0x560f8c0bf280 in process_subq bgpd/bgp_route.c:3955
>     #10 0x560f8c0bf320 in meta_queue_process bgpd/bgp_route.c:3980
>     #11 0x7fd44759fdfc in work_queue_run lib/workqueue.c:282
>     #12 0x7fd4475779b2 in event_call lib/event.c:2011
>     #13 0x7fd447442ff1 in frr_run lib/libfrr.c:1216
>     #14 0x560f8bef0a15 in main bgpd/bgp_main.c:545
>     #15 0x7fd446e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-14 13:52:02 +02:00
Louis Scalbert 442d8bce36 bgpd: fix bgp_pbr_rule memory leak
Fix bgp_pbr_rule memory leak. Found by code review.

Fixes: 27e376d4e1 ("bgpd: an hash list of pbr iprule is created")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-14 13:09:36 +02:00
Louis Scalbert 8d9df5cf04 bgpd: fix bgp_pbr_match memory leak
> Direct leak of 1144 byte(s) in 13 object(s) allocated from:
>     #0 0x7f3eedeb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7f3eed86f8dd in qcalloc lib/memory.c:105
>     #2 0x55b32d236faf in bgp_pbr_match_alloc_intern bgpd/bgp_pbr.c:1074
>     #3 0x7f3eed817d79 in hash_get lib/hash.c:147
>     #4 0x55b32d242d9a in bgp_pbr_policyroute_add_to_zebra_unit bgpd/bgp_pbr.c:2486
>     #5 0x55b32d244436 in bgp_pbr_policyroute_add_to_zebra bgpd/bgp_pbr.c:2672
>     #6 0x55b32d245a05 in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2843
>     #7 0x55b32d246912 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2939
>     #8 0x55b32d3c7472 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #9 0x55b32d26e5e7 in bgp_process_main_one bgpd/bgp_route.c:3691
>     #10 0x55b32d26f77d in process_subq_other_route bgpd/bgp_route.c:3856
>     #11 0x55b32d2701ff in process_subq bgpd/bgp_route.c:3955
>     #12 0x55b32d27029f in meta_queue_process bgpd/bgp_route.c:3980
>     #13 0x7f3eed99fdd8 in work_queue_run lib/workqueue.c:282
>     #14 0x7f3eed97798e in event_call lib/event.c:2011
>     #15 0x7f3eed842ff1 in frr_run lib/libfrr.c:1216
>     #16 0x55b32d0a1a15 in main bgpd/bgp_main.c:545
>     #17 0x7f3eed229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Fixes: d114b0d739 ("bgpd: inject policy route entry from bgp into zebra pbr entries.")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-14 13:09:35 +02:00
Louis Scalbert 520518c3ad bgpd: fix bgp_pbr_match_entry memory leak
> ==238132==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 160 byte(s) in 1 object(s) allocated from:
>     #0 0x7fd79f0b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fd79ea6f8dd in qcalloc lib/memory.c:105
>     #2 0x5586b26995f9 in bgp_pbr_match_entry_alloc_intern bgpd/bgp_pbr.c:1155
>     #3 0x7fd79ea17d79 in hash_get lib/hash.c:147
>     #4 0x5586b26a551d in bgp_pbr_policyroute_add_to_zebra_unit bgpd/bgp_pbr.c:2522
>     #5 0x5586b26a6436 in bgp_pbr_policyroute_add_to_zebra bgpd/bgp_pbr.c:2672
>     #6 0x5586b26a8089 in bgp_pbr_handle_entry bgpd/bgp_pbr.c:2876
>     #7 0x5586b26a8912 in bgp_pbr_update_entry bgpd/bgp_pbr.c:2939
>     #8 0x5586b2829472 in bgp_zebra_announce bgpd/bgp_zebra.c:1618
>     #9 0x5586b282ab4b in bgp_zebra_announce_table bgpd/bgp_zebra.c:1766
>     #10 0x5586b2824b99 in bgp_zebra_tm_connect bgpd/bgp_zebra.c:1091
>     #11 0x7fd79eb7798e in event_call lib/event.c:2011
>     #12 0x7fd79ea42ff1 in frr_run lib/libfrr.c:1216
>     #13 0x5586b2503a15 in main bgpd/bgp_main.c:545
>     #14 0x7fd79e429d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Fixes: d114b0d739 ("bgpd: inject policy route entry from bgp into zebra pbr entries.")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-14 13:09:34 +02:00
Donald Sharp dcf43ae009 bgpd: Prevent crash when issuing a show rpki connections
When attempting to check rpki status and the connection
has been turned off, let's check to see if we are connected
before we ask the rpki subsystem, else we will get a crash
in the rpki library.

Signed-off-by: Donald Sharp <donaldsharp72@gmail.com>
2025-04-12 16:59:56 -04:00
Donatas Abraitis dc37bff8c0
Merge pull request #18376 from pguibert6WIND/show_bgp_neighbor_counter_tx
bgpd: fix add prefix sent in 'show bgp neighbor'
2025-04-10 19:38:59 +03:00
Donatas Abraitis cf351d6d05
Merge pull request #18611 from pguibert6WIND/bgp_usid
bgpd: add usid behavior for bgp srv6 instructions
2025-04-10 19:34:44 +03:00
Donatas Abraitis 2f0b8ff1ea
Merge pull request #18624 from louis-6wind/remove-afi2family
bgpd: remove useless calls to afi2family
2025-04-10 17:14:08 +03:00
Mark Stapp 0e8e0c5fb9
Merge pull request #18594 from soumyar-roy/soumya/netwithdraw
bgpd: Paths not deleted received from shutdown peer
2025-04-10 10:07:32 -04:00
Philippe Guibert a47a53b003 bgpd: fix add prefix sent in 'show bgp neighbor'
The 'acceptedPrefixCounter' is available in 'show bgp neighbor json', but
there is no equivalent when using the non json output. Add it.

> # show bgp neighbor
> [..]
>  Community attribute sent to this neighbor(all)
>  0 accepted prefixes, 1 sent prefixes

Fixes: 856ca177c4 ("Added json formating support to show-...-neighbors-... bgp commands.")

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-04-10 10:12:36 +02:00
Soumya Roy d2bec7a691 bgpd: Paths, received from shutdown peer, not deleted
Issue:
In a scaled setup, (where number of nets > BGP_CLEARING_BATCH_MAX_DESTS
for walk_batch_table_helper), when peer is shutdown, it is seen some
of the paths are not deleted, which are received from that peer.

Fix:
This is due to, in clear_batch_rib_helper, once walk_batch_table_helper
returns after BGP_CLEARING_BATCH_MAX_DESTS is reached, we just break
from inner loop for the afi/safi for loops. So during walk for next
afi/safi that 'ret' state is overwritten with new state. Also the
resume context is overwritten. This causes to lose the start point
for next walk, some nets are skipped forever. So they are not marked
for deletion anymore. To fix this, we immediately return from current
run. This will have resume state to be stored correctly, and next walk
will start from there.

Testing:
32 ecmp paths were received from the shutdown peer
Before fix:
show bgp ipv6 2052:52:1:167::/64
BGP routing table entry for 2052:52:1:167::/64, version 495
Paths: (246 available, best #127, table default)
  Not advertised to any peer

<snip>
  4200165500 4200165002
    2021:21:51:101::2(spine-5) from spine-5(2021:21:51:101::2) (6.0.0.17)
    (fe80::202:ff:fe00:55) (prefer-global)
      Origin incomplete, valid, external, multipath
      Last update: Fri Apr  4 17:25:05 2025
  4200165500 4200165002
    2021:21:11:116::2(spine-1) from spine-1(2021:21:11:116::2) (0.0.0.0)
    (fe80::202:ff:fe00:3d) (prefer-global)<<<<path not deleted
      Origin incomplete, valid, external
      Last update: Fri Apr  4 17:25:05 2025
  4200165500 4200165002
    2021:21:11:115::2(spine-1) from spine-1(2021:21:11:115::2) (0.0.0.0)
    (fe80::202:ff:fe00:3d) (prefer-global)<<<<path not deleted
      Origin incomplete, valid, external
      Last update: Fri Apr  4 17:25:05 2025
<snip>

 32 paths are supposed to be withdrawn:
root@leaf-1:mgmt:# vtysh -c "show bgp ipv6 2052:52:1:167::/64" | grep "prefer-global" | wc -l
256
root@leaf-1:mgmt# vtysh -c "show bgp ipv6 2052:52:1:167::/64" | grep "prefer-global" | wc -l
246<<should be 224, but showing 246, which is wrong
After fix:
 32 paths are supposed to be withdrawn:
root@leaf-1:mgmt:# vtysh -c "show bgp ipv6 2052:52:1:167::/64" | grep "prefer-global" | wc -l
256
root@leaf-1:mgmt:# vtysh -c "show bgp ipv6 2052:52:1:167::/64" | grep "prefer-global" | wc -l
224<<<shows correctly

Signed-off-by: Soumya Roy <souroy@nvidia.com>
2025-04-09 14:32:23 +00:00
Donald Sharp b2d8d9b37a bgpd: On shutdown free up table for static routes
Indirect leak of 56 byte(s) in 1 object(s) allocated from:
    0 0x7fdaf6cb83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
    1 0x7fdaf683a480 in qcalloc lib/memory.c:106
    2 0x7fdaf68dd706 in route_table_init_with_delegate lib/table.c:38
    3 0x5649b22c05b0 in bgp_table_init bgpd/bgp_table.c:139
    4 0x5649b2273da0 in bgp_static_set bgpd/bgp_route.c:7779
    5 0x5649b21eba58 in vpnv4_network bgpd/bgp_mplsvpn.c:3244
    6 0x7fdaf67b6d61 in cmd_execute_command_real lib/command.c:1003
    7 0x7fdaf67b7080 in cmd_execute_command lib/command.c:1062
    8 0x7fdaf67b75ac in cmd_execute lib/command.c:1228
    9 0x7fdaf68ffb20 in vty_command lib/vty.c:626
    10 0x7fdaf6900073 in vty_execute lib/vty.c:1389
    11 0x7fdaf6903e24 in vtysh_read lib/vty.c:2408
    12 0x7fdaf68f0222 in event_call lib/event.c:2019
    13 0x7fdaf681b3c6 in frr_run lib/libfrr.c:1247
    14 0x5649b211c903 in main bgpd/bgp_main.c:565
    15 0x7fdaf630c249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Table was being created but never deleted.  Let's delete it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-04-09 09:28:31 -04:00
Louis Scalbert a0a0749568 bgpd: remove useless calls to afi2family
Remove useless calls to afi2family(). str2prefix() always sets the
prefix family.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-09 13:07:56 +02:00
David Lamparter 8418e57791
Merge pull request #17915 from mjstapp/compile_wshadow 2025-04-09 09:59:06 +02:00
Mark Stapp 660cbf5651 bgpd: clean up variable-shadowing compiler warnings
Clean up -Wshadow warnings in bgp.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-04-08 14:41:27 -04:00
Donald Sharp b18c309015 bgpd: On shutdown free up memory leak found by topotest
This commit fixes two types of problems:

a) Avoidance of cleaning up memory when a instance is
hidden, thus causing it never to be freed on shutdown

b) In some instances bgp_create is called 2 times
for some code.  We are double allocating memory
and dropping it on the second allocation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-04-08 11:47:50 -04:00
Philippe Guibert 2ced6d233f bgpd: add usid behavior for bgp srv6 instructions
Until now, BGP srv6 usid instructions were not really used. Add the
support for this.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-04-08 16:16:06 +02:00
Donatas Abraitis 5e092d0e25
Merge pull request #18558 from spoignant-proton/master
bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557)
2025-04-07 03:27:02 +03:00
Mark Stapp 259ffe1dfe
Merge pull request #18562 from opensourcerouting/fix/bfd_down_if_established
bgpd: Treat the peer as not active due to BFD down only if established
2025-04-04 12:28:18 -04:00
Stephane Poignant 2cee5567bc
bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557)
Section 4.1 of RFC8955 defines how the length field of flowspec NLRIs is encoded.
The method use implies a maximum length of 4095 for a single flowspec NLRI.
However, in bgp_flowspec.c, we check the length attribute of the bgp_nlri structure against this maximum value, which actually is the *total* length of all NLRI included in the considered MP_REACH_NLRI path attribute.
Due to this confusion, frr would reject valid announces that contain many flowspec NLRIs, when their cummulative length exceeds 4095, and close the session.
The proposed change removes that check entirely. Indeed, there is no need to check the length field of each invidual NLRI because the method employed make it impossible to encode a length greater than 4095.

Signed-off-by: Stephane Poignant <stephane.poignant@proton.ch>
2025-04-04 13:29:02 +02:00
Mark Stapp e0a97e5b85
Merge pull request #18546 from LabNConsulting/ziemba/250330-rfapi-mem-cleanup
bgpd: rfapi: track outstanding rib and import timers, free mem at exit
2025-04-03 09:01:35 -04:00
Russ White ab67e5544e
Merge pull request #18396 from pguibert6WIND/srv6l3vpn_to_bgp_vrf_redistribute
Add BGP redistribution in SRv6 BGP
2025-04-03 08:25:32 -04:00
Rajesh Varatharaj 35129c88b4 bgpd: Skip EVPN MAC processing for non-EVPN peers
Issue:
"Processing EVPN MAC interface change on peer" log message is printed
even when the peer didnt have EVPN address family.

Fix:
Process only if the peer is in EVPN address family

Ticket: #17890
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
2025-04-02 11:48:42 -07:00
Donatas Abraitis da4a7b0356 bgpd: Treat the peer as not active due to BFD down only if established
If we have `neighbor X bfd` and BFD status is DOWN and/or ADMIN_DOWN, and BGP
session is not yet established, we never allow the session to establish.

Let's fix this regression that was in 10.2.

Fixes: 1fb48f5 ("bgpd: Do not start BGP session if BFD profile is in shutdown state")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-04-02 17:24:09 +03:00
Louis Scalbert cbf27be5d9 bgpd: optimize attrhash_cmp calls
Only call attrhash_cmp when necessary.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-04-01 16:52:58 +02:00
Russ White c312917988
Merge pull request #18450 from donaldsharp/bgp_packet_reads
Bgp packet reads conversion to a FIFO
2025-04-01 10:12:37 -04:00
G. Paul Ziemba 1629c05924 bgpd: rfapi: track outstanding rib and import timers, free mem at exit
While here, also make "VPN SAFI clear" test wait for clear result
    (tests/topotests/bgp_rfapi_basic_sanity{,_config2})

    Original RFAPI code relied on the frr timer system to remember
    various allocations that were supposed to be freed at future times
    rather than manage a parallel database. However, if bgpd is terminated
    before the times expire, those pending allocations are marked as
    memory leaks, even though they wouldn't be leaks under normal operation.

    This change adds some hash tables to track these outstanding
    allocations that are associated with pending timers, and uses
    those tables to free the allocations when bgpd exits.

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2025-03-31 08:45:33 -07:00
Donald Sharp 354aee8932 bgpd: Free memory associated with aspath_dup
Fix this:

==3890443== 92 (48 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 68 of 98
==3890443==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3890443==    by 0x49737B3: qcalloc (memory.c:106)
==3890443==    by 0x3EA63B: aspath_dup (bgp_aspath.c:703)
==3890443==    by 0x2F5438: route_set_aspath_exclude (bgp_routemap.c:2604)
==3890443==    by 0x49BC52A: route_map_apply_ext (routemap.c:2708)
==3890443==    by 0x2C1069: bgp_input_modifier (bgp_route.c:1925)
==3890443==    by 0x2C9F12: bgp_update (bgp_route.c:5205)
==3890443==    by 0x2CF281: bgp_nlri_parse_ip (bgp_route.c:7271)
==3890443==    by 0x2A28C7: bgp_nlri_parse (bgp_packet.c:338)
==3890443==    by 0x2A7F5C: bgp_update_receive (bgp_packet.c:2448)
==3890443==    by 0x2ACCA6: bgp_process_packet (bgp_packet.c:4046)
==3890443==    by 0x49EB77C: event_call (event.c:2019)
==3890443==    by 0x495FAD1: frr_run (libfrr.c:1247)
==3890443==    by 0x208D6D: main (bgp_main.c:557)

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-30 17:54:34 -04:00
Donald Sharp c9d431d4db bgpd: On shutdown, unlock table when clearing the bgp metaQ
There are some tables not being freed upon shutdown.  This
is happening because the table is being locked as dests
are being put on the metaQ.  When in shutdown it was clearing
the MetaQ it was not unlocking the table

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-30 14:02:16 -04:00
Donald Sharp 06480c0c81 bgpd: When shutting down do not clear self peers
Commit: e0ae285eb8

Modified the fsm state machine to attempt to not
clear routes on a peer that was not established.
The peer should be not a peer self.  We do not want
to ever clear the peer self.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-30 14:02:16 -04:00
Russ White 565da4d471
Merge pull request #18498 from opensourcerouting/fix/keep_stale_routes_on_clear
bgpd: Retain the routes if we do a clear with N-bit set for Graceful-Restart
2025-03-26 14:02:52 -04:00
Donald Sharp 212a5379b0
Merge pull request #18502 from opensourcerouting/fix/mpls_withdraw_label
bgpd: Set the label for MP_UNREACH_NLRI 0x800000 instead of 0x000000
2025-03-26 11:26:19 -04:00
Donatas Abraitis 94e2aadf71 bgpd: Set the label for MP_UNREACH_NLRI 0x800000 instead of 0x000000
RFC8277 says:

The procedures in [RFC3107] for withdrawing the binding of a label
or sequence of labels to a prefix are not specified clearly and correctly.

=> How to Explicitly Withdraw the Binding of a Label to a Prefix

Suppose a BGP speaker has announced, on a given BGP session, the
   binding of a given label or sequence of labels to a given prefix.
   Suppose it now wishes to withdraw that binding.  To do so, it may
   send a BGP UPDATE message with an MP_UNREACH_NLRI attribute.  The
   NLRI field of this attribute is encoded as follows:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    Length     |        Compatibility                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          Prefix                               ~
     ~                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 4: NLRI for Withdrawal

   Upon transmission, the Compatibility field SHOULD be set to 0x800000.
   Upon reception, the value of the Compatibility field MUST be ignored.

[RFC3107] also made it possible to withdraw a binding without
   specifying the label explicitly, by setting the Compatibility field
   to 0x800000.  However, some implementations set it to 0x000000.  In
   order to ensure backwards compatibility, it is RECOMMENDED by this
   document that the Compatibility field be set to 0x800000, but it is
   REQUIRED that it be ignored upon reception.

In FRR case where a single label is used per-prefix, we should send 0x800000,
and not 0x000000.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-26 10:30:52 +02:00
Donatas Abraitis 42b9d985cc bgpd: Remove unused defines from bgp_label.h
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-26 08:50:06 +02:00
Donatas Abraitis b7c657d4e0 bgpd: Retain the routes if we do a clear with N-bit set for Graceful-Restart
On receiving side we already did the job correctly, but the peer which initiates
the clear does not retain the other's routes. This commit fixes that.

Fixes: 20170775da ("bgpd: Activate Graceful-Restart when receiving CEASE/HOLDTIME notifications")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-25 17:20:56 +02:00
Donald Sharp 83a92c926e bgpd: Delay processing MetaQ in some events
If the number of peers that are being handled on
the peer connection fifo is greater than 10, that
means we have some network event going on.  Let's
allow the packet processing to continue instead
of running the metaQ.  This has advantages because
everything else in BGP is only run after the metaQ
is run.  This includes best path processing,
installation of the route into zebra as well as
telling our peers about this change.  Why does
this matter?  It matters because if we are receiving
the same route multiple times we limit best path processing
to much fewer times and consequently we also limit
the number of times we send the route update out and
we install the route much fewer times as well.

Prior to this patch, with 512 peers and 5k routes.
CPU time for bgpd was 3:10, zebra was 3:28.  After
the patch CPU time for bgpd was 0:55 and zebra was
0:25.

Here are the prior `show event cpu`:
Event statistics for bgpd:

Showing statistics for pthread default
--------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0         20.749     33144        0       395        1       396         0         0          0    T    (bgp_generate_updgrp_packets)
    0       9936.199      1818     5465     43588     5466     43589         0         0          0     E   bgp_handle_route_announcements_to_zebra
    0          0.220        84        2        20        3        20         0         0          0    T    update_subgroup_merge_check_thread_cb
    0          0.058         2       29        43       29        43         0         0          0     E   zclient_connect
    0      17297.733       466    37119     67428    37124     67429         0         0          0   W     zclient_flush_data
    1          0.134         7       19        40       20        42         0         0          0  R      vtysh_accept
    0        151.396      1067      141      1181      142      1189         0         0          0  R      vtysh_read
    0          0.297      1030        0        14        0        14         0         0          0    T    (bgp_routeadv_timer)
    0          0.001         1        1         1        2         2         0         0          0    T    bgp_sync_label_manager
    2          9.374       544       17       261       17       262         0         0          0  R      bgp_accept
    0          0.001         1        1         1        2         2         0         0          0    T    bgp_startup_timer_expire
    0          0.012         1       12        12       13        13         0         0          0     E   frr_config_read_in
    0          0.308         1      308       308      309       309         0         0          0    T    subgroup_coalesce_timer
    0          4.027       105       38        77       39        78         0         0          0    T    (bgp_start_timer)
    0     112206.442      1818    61719     84726    61727     84736         0         0          0    TE   work_queue_run
    0          0.345         1      345       345      346       346         0         0          0    T    bgp_config_finish
    0          0.710       620        1         6        1         9         0         0          0   W     bgp_connect_check
    2         39.420      8283        4       110        5       111         0         0          0  R      zclient_read
    0          0.052         1       52        52      578       578         0         0          0    T    bgp_start_label_manager
    0          0.452        87        5        90        5        90         0         0          0    T    bgp_announce_route_timer_expired
    0        185.837      3088       60       537       92     21705         0         0          0     E   bgp_event
    0      48719.671      4346    11210     78292    11215     78317         0         0          0     E   bgp_process_packet

Showing statistics for pthread BGP I/O thread
---------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0        321.915     28597       11        86       11       265         0         0          0   W     bgp_process_writes
  515        115.586     26954        4       121        4       128         0         0          0  R      bgp_process_reads

Event statistics for zebra:

Showing statistics for pthread default
--------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0          0.109         2       54        62       55        63         0         0          0    T    timer_walk_start
    1          0.550        11       50       100       50       100         0         0          0  R      vtysh_accept
    0     112848.163      4441    25410    405489    25413    410127         0         0          0     E   zserv_process_messages
    0          0.007         1        7         7        7         7         0         0          0     E   frr_config_read_in
    0          0.005         1        5         5        5         5         0         0          0    T    rib_sweep_route
    1        573.589      4789      119      1567      120      1568         0         0          0    T    wheel_timer_thread
  347         30.848        97      318      1367      318      1366         0         0          0    T    zebra_nhg_timer
    0          0.005         1        5         5        6         6         0         0          0    T    zebra_evpn_mh_startup_delay_exp_cb
    0          5.404       521       10        38       10        70         0         0          0    T    timer_walk_continue
    1          1.669         9      185       219      186       219         0         0          0  R      zserv_accept
    1          0.174        18        9        53       10        53         0         0          0  R      msg_conn_read
    0          3.028       520        5        47        6        47         0         0          0    T    if_zebra_speed_update
    0          0.324       274        1         5        1         6         0         0          0   W     msg_conn_write
    1         24.661      2124       11       359       12       359         0         0          0  R      kernel_read
    0      73683.333      2964    24859    143223    24861    143239         0         0          0    TE   work_queue_run
    1         46.649      6789        6       424        7       424         0         0          0  R      rtadv_read
    0         52.661        85      619      2087      620      2088         0         0          0  R      vtysh_read
    0         42.660        18     2370     21694     2373     21695         0         0          0     E   msg_conn_proc_msgs
    0          0.034         1       34        34       35        35         0         0          0     E   msg_client_connect_timer
    0       2786.938      2300     1211     29456     1219     29555         0         0          0     E   rib_process_dplane_results

Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0       4875.670    200371       24       770       24       776         0         0          0     E   dplane_thread_loop
    0          0.059         1       59        59       76        76         0         0          0     E   dplane_incoming_request
    1          9.640       722       13      4510       15      5343         0         0          0  R      dplane_incoming_read

Here are the post `show event cpu` results:

Event statistics for bgpd:

Showing statistics for pthread default
--------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0      21297.497      3565     5974     57912     5981     57913         0         0          0     E   bgp_process_packet
    0        149.742      1068      140      1109      140      1110         0         0          0  R      vtysh_read
    0          0.013         1       13        13       14        14         0         0          0     E   frr_config_read_in
    0          0.459        86        5       104        5       105         0         0          0    T    bgp_announce_route_timer_expired
    0          0.139        81        1        20        2        21         0         0          0    T    update_subgroup_merge_check_thread_cb
    0        405.889    291687        1       179        1       450         0         0          0    T    (bgp_generate_updgrp_packets)
    0          0.682       618        1         6        1         9         0         0          0   W     bgp_connect_check
    0          3.888       103       37        81       38        82         0         0          0    T    (bgp_start_timer)
    0          0.074         1       74        74      458       458         0         0          0    T    bgp_start_label_manager
    0          0.000         1        0         0        1         1         0         0          0    T    bgp_sync_label_manager
    0          0.121         3       40        54      100       141         0         0          0     E   bgp_process_conn_error
    0          0.060         2       30        49       30        50         0         0          0     E   zclient_connect
    0          0.354         1      354       354      355       355         0         0          0    T    bgp_config_finish
    0          0.283         1      283       283      284       284         0         0          0    T    subgroup_coalesce_timer
    0      29365.962      1805    16269     99445    16273     99454         0         0          0    TE   work_queue_run
    0        185.532      3097       59       497       94     26107         0         0          0     E   bgp_event
    1          0.290         8       36       151       37       158         0         0          0  R      vtysh_accept
    2          9.462       548       17       320       17       322         0         0          0  R      bgp_accept
    2         40.219      8283        4       128        5       128         0         0          0  R      zclient_read
    0          0.322      1031        0         4        0         5         0         0          0    T    (bgp_routeadv_timer)
    0        356.812       637      560      3007      560      3007         0         0          0     E   bgp_handle_route_announcements_to_zebra

Showing statistics for pthread BGP I/O thread
---------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
  515         62.965     14335        4       103        5       181         0         0          0  R      bgp_process_reads
    0       1986.041    219813        9       213        9       315         0         0          0   W     bgp_process_writes

Event statistics for zebra:

Showing statistics for pthread default
--------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0          0.006         1        6         6        7         7         0         0          0     E   frr_config_read_in
    0       3673.365      2044     1797    259281     1800    261342         0         0          0     E   zserv_process_messages
    1        651.846      8041       81      1090       82      1233         0         0          0    T    wheel_timer_thread
    0         38.184        18     2121     21345     2122     21346         0         0          0     E   msg_conn_proc_msgs
    1          0.651        12       54       112       55       112         0         0          0  R      vtysh_accept
    0          0.102         2       51        55       51        56         0         0          0    T    timer_walk_start
    0        202.721      1577      128     29172      141     29226         0         0          0     E   rib_process_dplane_results
    1         41.650      6645        6       140        6       140         0         0          0  R      rtadv_read
    1         22.518      1969       11       106       12       154         0         0          0  R      kernel_read
    0          4.265        48       88      1465       89      1466         0         0          0  R      vtysh_read
    0       6099.851       650     9384     28313     9390     28314         0         0          0    TE   work_queue_run
    0          5.104       521        9        30       10        31         0         0          0    T    timer_walk_continue
    0          3.078       520        5        53        6        55         0         0          0    T    if_zebra_speed_update
    0          0.005         1        5         5        5         5         0         0          0    T    rib_sweep_route
    0          0.034         1       34        34       35        35         0         0          0     E   msg_client_connect_timer
    1          1.641         9      182       214      183       215         0         0          0  R      zserv_accept
    0          0.358       274        1         6        2         6         0         0          0   W     msg_conn_write
    1          0.159        18        8        54        9        54         0         0          0  R      msg_conn_read

Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn Starv_Warn   Type  Event
    0        301.404      7280       41      1878       41      1878         0         0          0     E   dplane_thread_loop
    0          0.048         1       48        48       49        49         0         0          0     E   dplane_incoming_request
    1          9.558       727       13      4659       14      5420         0         0          0  R      dplane_incoming_read

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25 10:47:39 -04:00
Mark Stapp 39bb12299c bgpd: fix SA warnings in bgp clearing code
Fix a possible use-after-free in the recent bgp batch
clearing code, CID 1639091.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-25 09:32:14 -04:00
Donald Sharp 12bf042c68 bgpd: Modify bgp to handle packet events in a FIFO
Current behavor of BGP is to have a event per connection.  Given
that on startup of BGP with a high number of neighbors you end
up with 2 * # of peers events that are being processed.  Additionally
once BGP has selected the connection this still only comes down
to 512 events.  This number of events is swamping the event system
and in addition delaying any other work from being done in BGP at
all because the the 512 events are always going to take precedence
over everything else.  The other main events are the handling
of the metaQ(1 event), update group events( 1 per update group )
and the zebra batching event.  These are being swamped.

Modify the BGP code to have a FIFO of connections.  As new data
comes in to read, place the connection on the end of the FIFO.
Have the bgp_process_packet handle up to 100 packets spread
across the individual peers where each peer/connection is limited
to the original quanta.  During testing I noticed that withdrawal
events at very very large scale are taking up to 40 seconds to process
so I added a check for yielding to further limit the number of packets
being processed.

This change also allow for BGP to be interactive again on scale
setups on initial convergence.  Prior to this change any vtysh
command entered would be delayed by 10's of seconds in my setup
while BGP was doing other work.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25 09:10:46 -04:00
Donatas Abraitis 45af7ea217
Merge pull request #18483 from donaldsharp/holdtime_mistake
bgpd: Fix holdtime not working properly when busy
2025-03-25 09:38:09 +02:00
Donatas Abraitis 0a405f477d
Merge pull request #18484 from mjstapp/fix_evpn_rt_cli
bgpd: fix handling of configured route-targets for l2vni, l3vni
2025-03-25 09:37:08 +02:00
Mark Stapp 2496bfecfe bgpd: fix handling of configured RTs for l2vni, l3vni
Test for existing explicit config as part of validation of
route-target configuration: allow explicit config of generic/
default AS+VNI, for example, instead of rejecting it.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-24 16:53:32 -04:00
Russ White 7afe25744b
Merge pull request #18447 from donaldsharp/bgp_clear_batch
Bgp clear batch
2025-03-24 16:13:49 -04:00
Philippe Guibert 56c9f1c566 bgpd: fix dereference of null pointer in bgp_nht
Assuming attr is null, a dereference can happen in the function
make_prefix(). Add the protection over attr before accessing the
variable.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-03-24 20:59:18 +01:00
Donald Sharp 9a26a56c51 bgpd: Fix holdtime not working properly when busy
Commit:  cc9f21da22

Modified the bgp_fsm code to dissallow the extension
of the hold time when the system is under extremely
heavy load.  This was a attempt to remove the return
code but it was too aggressive and messed up this bit
of code.

Put the behavior back that was introduced in:
d0874d195d

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-24 15:55:01 -04:00
Philippe Guibert 99acebcdc9 bgpd: fix check validity of a VPN SRv6 route with modified nexthop
When exporting a VPN SRv6 route, the path may not be considered valid if
the nexthop is not valid. This is the case when the 'nexthop vpn export'
command is used. The below example illustrates that the VPN path to
2001:1::/64 is not selected, as the expected nexthop to find in vrf10 is
the one configured:

> # show running-config
> router bgp 1 vrf vrf10
>  address-family ipv6 unicast
>   nexthop vpn export 2001::1

> # show bgp ipv6 vpn
> [..]
> Route Distinguisher: 1:10
>      2001:1::/64      2001::1@4                0             0 65001 i
>     UN=2001::1 EC{99:99} label=16 sid=2001:db8:1:1:: sid_structure=[40,24,16,0] type=bgp, subtype=5

The analysis indicates that the 2001::1 nexthop is considered.

> 2025/03/20 21:47:53.751853 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 21:47:53.751855 BGP: [VWNP2-DNMFV] Found existing bnc 2001::1/128(0)(VRF vrf10) flags 0x82 ifindex 0 #paths 2 peer 0x0, resolved prefix UNK prefix
> 2025/03/20 21:47:53.751856 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is not valid (in VRF vrf10)
> 2025/03/20 21:47:53.751857 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change

Actually, to check the nexthop validity, only the source path in the VRF
has the correct nexthop. Fix this by reusing the source path information
instead of the current one.

> 2025/03/20 22:43:51.703521 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 22:43:51.703523 BGP: [VWNP2-DNMFV] Found existing bnc fe80::b812:37ff:fe13:d441/128(0)(VRF vrf10) flags 0x87 ifindex 0 #paths 2 peer 0x0, resolved prefix fe80::/64
> 2025/03/20 22:43:51.703525 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is valid (in VRF vrf10)
> 2025/03/20 22:43:51.703526 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-03-24 09:17:01 +01:00