Commit graph

1165 commits

Author SHA1 Message Date
David Lamparter 8418e57791
Merge pull request #17915 from mjstapp/compile_wshadow 2025-04-09 09:59:06 +02:00
Mark Stapp 660cbf5651 bgpd: clean up variable-shadowing compiler warnings
Clean up -Wshadow warnings in bgp.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-04-08 14:41:27 -04:00
Donald Sharp b18c309015 bgpd: On shutdown free up memory leak found by topotest
This commit fixes two types of problems:

a) Avoidance of cleaning up memory when a instance is
hidden, thus causing it never to be freed on shutdown

b) In some instances bgp_create is called 2 times
for some code.  We are double allocating memory
and dropping it on the second allocation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-04-08 11:47:50 -04:00
Mark Stapp 259ffe1dfe
Merge pull request #18562 from opensourcerouting/fix/bfd_down_if_established
bgpd: Treat the peer as not active due to BFD down only if established
2025-04-04 12:28:18 -04:00
Mark Stapp e0a97e5b85
Merge pull request #18546 from LabNConsulting/ziemba/250330-rfapi-mem-cleanup
bgpd: rfapi: track outstanding rib and import timers, free mem at exit
2025-04-03 09:01:35 -04:00
Donatas Abraitis da4a7b0356 bgpd: Treat the peer as not active due to BFD down only if established
If we have `neighbor X bfd` and BFD status is DOWN and/or ADMIN_DOWN, and BGP
session is not yet established, we never allow the session to establish.

Let's fix this regression that was in 10.2.

Fixes: 1fb48f5 ("bgpd: Do not start BGP session if BFD profile is in shutdown state")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-04-02 17:24:09 +03:00
G. Paul Ziemba 1629c05924 bgpd: rfapi: track outstanding rib and import timers, free mem at exit
While here, also make "VPN SAFI clear" test wait for clear result
    (tests/topotests/bgp_rfapi_basic_sanity{,_config2})

    Original RFAPI code relied on the frr timer system to remember
    various allocations that were supposed to be freed at future times
    rather than manage a parallel database. However, if bgpd is terminated
    before the times expire, those pending allocations are marked as
    memory leaks, even though they wouldn't be leaks under normal operation.

    This change adds some hash tables to track these outstanding
    allocations that are associated with pending timers, and uses
    those tables to free the allocations when bgpd exits.

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2025-03-31 08:45:33 -07:00
Donald Sharp 12bf042c68 bgpd: Modify bgp to handle packet events in a FIFO
Current behavor of BGP is to have a event per connection.  Given
that on startup of BGP with a high number of neighbors you end
up with 2 * # of peers events that are being processed.  Additionally
once BGP has selected the connection this still only comes down
to 512 events.  This number of events is swamping the event system
and in addition delaying any other work from being done in BGP at
all because the the 512 events are always going to take precedence
over everything else.  The other main events are the handling
of the metaQ(1 event), update group events( 1 per update group )
and the zebra batching event.  These are being swamped.

Modify the BGP code to have a FIFO of connections.  As new data
comes in to read, place the connection on the end of the FIFO.
Have the bgp_process_packet handle up to 100 packets spread
across the individual peers where each peer/connection is limited
to the original quanta.  During testing I noticed that withdrawal
events at very very large scale are taking up to 40 seconds to process
so I added a check for yielding to further limit the number of packets
being processed.

This change also allow for BGP to be interactive again on scale
setups on initial convergence.  Prior to this change any vtysh
command entered would be delayed by 10's of seconds in my setup
while BGP was doing other work.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25 09:10:46 -04:00
Russ White 7afe25744b
Merge pull request #18447 from donaldsharp/bgp_clear_batch
Bgp clear batch
2025-03-24 16:13:49 -04:00
Donald Sharp 863f4b0992 bgpd: Tie in more clear events to clear code
The `clear bgp *` and the interface down events
cause a global clearing of data from the bgp rib.
Let's tie those into the clear peer code such
that we can take advantage of the reduced load
in these cases too.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-20 09:38:51 -04:00
Mark Stapp c527882012 bgpd: Allow batch clear to do partial work and continue later
Modify the batch clear code to be able to stop after processing
some of the work and to pick back up again.  This will allow
the very expensive nature of the batch clearing to be spread out
and allow bgp to continue to be responsive.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-20 09:33:52 -04:00
Donald Sharp 7a40da3f0a
Merge pull request #18412 from lsang6WIND/fix_bgp_delete
bgpd: fix "delete in progress" flag on default instance
2025-03-19 20:44:57 -04:00
Loïc Sang 8dc9eacb83 bgpd: fix "delete in progress" flag on default instance
Since 4d0e7a4 ("bgpd: VRF-Lite fix default BGP delete"), upon deletion
of the default instance, it is marked as hidden and the "deletion
in progress" flag is set. When the instance is restored, some routes
are not installed due to the presence of this flag.

Fixes: 4d0e7a4 ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Loïc Sang <loic.sang@6wind.com>
2025-03-18 17:42:34 +01:00
Russ White ad7e625c15
Merge pull request #18410 from opensourcerouting/fix/print_the_real_reason_supressed_peer
bgpd: Print the real reason why the peer is not accepted (incoming)
2025-03-18 08:46:43 -04:00
Donatas Abraitis ace4b8fe61 bgpd: Print the real reason why the peer is not accepted (incoming)
If it's suppressed due to BFD down or unspecified connection, we never know
the real reason and just say "no AF activated" which is misleading.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-17 14:52:42 +02:00
Dmytro Shytyi 942a7c916c
bgpd: align peer_unconfigure with gracefull-restart
When configured Graceful-Restart, skipping unconfig notification,
similarly as it is done in 95098d9611
("bgpd: Do not send Deconfig/Shutdown message when restarting")

Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
2025-03-17 11:19:58 +01:00
Philippe Guibert 496caed836
bgpd: fix radv interface disabled when bgp instance removed
If a peer uses radv for an interface, and bgp instance is removed,
then the radv service is not disabled on the interface.

Fix this by doing the same at BGP unconfiguration. Like it has been
done when a peer is unconfigured, call the radv unregistration before
deleting the peer.

Fixes: b3a3290e23 ("bgpd: turn off RAs when numbered peers are deleted")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
2025-03-17 11:19:58 +01:00
Mark Stapp 6206e7e7ed zebra: move peer conn error list to connection struct
Move the peer connection error list to the peer_connection
struct; that seems to line up better with the way that struct
works.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-12 12:42:07 -04:00
Mark Stapp 58f924d287 bgpd: batch peer connection error clearing
When peer connections encounter errors, attempt to batch some
of the clearing processing that occurs. Add a new batch object,
add multiple peers to it, if possible. Do one rib walk for the
batch, rather than one walk per peer. Use a handler callback
per batch to check and remove peers' path-infos, rather than
a work-queue and callback per peer. The original clearing code
remains; it's used for single peers.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-12 12:42:06 -04:00
Mark Stapp 6a5962e1f8 bgpd: Replace per-peer connection error with per-bgp
Replace the per-peer connection error with a per-bgp event and
a list. The io pthread enqueues peers per-bgp-instance, and the
error-handing code can process multiple peers if there have been
multiple failures.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-12 12:40:07 -04:00
Donald Sharp 543fc6dc56 bgpd: Add connection direction to debug logs
Currently the incoming and outgoing connections mix up their
logs and there is absolutely no way to tell which way is being
talked about when both are operating.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-28 10:28:50 -05:00
Louis Scalbert 70e07678bf bgpd: fix leaving hidden state
Upon configuration of a VRF instance that references an absent default
VRF with "import vrf default", the default instance is created in hidden
state. However, the default instance is not properly un-hidden when
configured.

Restore the behavior prior to commit below.

Fixes: 9f7177af13 ("bgpd: fix duplicate BGP instance created with unified config")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-24 17:52:43 +01:00
Alexander Skorichenko 1515a59202 bgpd: update AS value of a hidden bgp instance
'import vrf VRF' could define a hidden bgp instance with
the default AS_UNSPECIFIED (i.e. = 1) value.
When a
	router bgp AS vrf VRF
gets configured later on, replace this AS_UNSPECIFIED setting
with a requested value.

Fixes: 9680831518 ("bgpd: fix as_pretty mem leaks when un-hiding")
Signed-off-by: Alexander Skorichenko <askorichenko@netgate.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-24 15:17:05 +01:00
Louis Scalbert 339206341f Revert "bgpd: fix bgp vrf instance creation from implicit"
This reverts commit 2ff08af78e.

The fix is obviously wrong.

Link: 2ff08af78e
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-24 15:17:05 +01:00
Louis Scalbert 71a3756f2d bgpd: fix process_queue when un-hiding
bgp_process_queue_init() is not called in bgp_create() when leaving the
BGP instance hidden state because of the following goto:

>	if (hidden) {
>		bgp = bgp_old;
>		goto peer_init;
>	}

Upon reconfiguration of the default instance, the prefixes are never set
into a meta queue by mq_add_handler(). They are never processed for
zebra RIB installation and announcements of update/withdraw.

Do not delete the BGP process_queue when hiding.

Fixes: 4d0e7a49cf ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-24 15:17:05 +01:00
Louis Scalbert d2ff7e8a21 bgpd: fix default instance name when un-hiding
When unconfiguring a default BGP instance with VPN SAFI configurations,
the default BGP structure remains but enters a hidden state. Upon
reconfiguration, the instance name incorrectly appears as "VIEW ?"
instead of "VRF default". And the name_pretty pointer

The name_pretty pointer is replaced by another one with the incorrect
name. This also leads to a memory leak as the previous pointer is not
properly freed.

Do not rewrite the instance name.

Fixes: 4d0e7a49cf ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-24 15:17:05 +01:00
Louis Scalbert d6363625c3 bgpd: release manual vpn label on instance deletion
When a BGP instance with a manually assigned VPN label is deleted, the
label is not released from the Zebra label registry. As a result,
reapplying a configuration with the same manual label leads to VPN
prefix export failures.

For example, with the following configuration:

> router bgp 65000 vrf BLUE
>  address-family ipv4 unicast
>   label vpn export <int>

Release zebra label registry on unconfiguration.

Fixes: d162d5f6f5 ("bgpd: fix hardset l3vpn label available in mpls pool")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2025-02-12 14:03:02 +01:00
Chirag Shah 2ff08af78e bgpd: fix bgp vrf instance creation from implicit
In bgp route leak, when import vrf x is executed,
it creates bgp instance as hidden with asn value as unspecified.

When router bgp x is configured ensure the correct as,
asnotation is applied otherwise running config shows asn value as 0.

This can lead to frr-reload failure when any FRR config change.

Fix:
Move asn and asnotiation, as_pretty value in common done section,
so when bgp_create gets existing instance but before returning
update asn and required fields in common section.

In bgp_create(): when returning for hidden at least update asn
and required when bgp instance created implicitly due to vrf leak.

if (hidden) {
    bgp = bgp_old;
    goto peer_init; <<<
}

Before fix:
show running:

router bgp 0 vrf purple
 bgp router-id 10.10.3.11
 !
 address-family ipv4 unicast
  redistribute static
  import vrf blue
 exit-address-family
 !
 address-family ipv6 unicast
  import vrf blue
 exit-address-family
 !
 address-family l2vpn evpn
  advertise ipv4 unicast
  advertise ipv6 unicast
 exit-address-family
exit

Testing:

1) following snippet config:
router bgp 63420 vrf blue
 import vrf purple
router bgp 63420 vrf purple
 import vrf blue
2) restart frr leads to the running config with 0 asn value.

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2025-02-10 19:08:00 -08:00
Russ White 2ef76a3350
Merge pull request #17871 from opensourcerouting/feature/bgp_link_local_capability
bgpd: Implement Link-Local Next Hop capability
2025-02-07 14:00:59 -05:00
Carmine Scarpitta 0768c620e0
Merge pull request #17913 from Sokolmish/bgp-sid-release
bgpd: Release SID on router deletion
2025-02-03 14:52:00 +01:00
Mikhail Sokolovskiy f3680ab410 bgpd: Release SID on router deletion
Signed-off-by: Mikhail Sokolovskiy <sokolmish@gmail.com>
2025-01-30 01:54:31 +03:00
Russ White cec2e9b159
Merge pull request #17881 from opensourcerouting/fix/last_reset_reason
bgpd: last reset SNAFU
2025-01-28 10:40:50 -05:00
Donatas Abraitis 4338e21aa2 Revert "bgpd: Handle Addpath capability using dynamic capabilities"
This reverts commit 05cf9d03b3.

TL;DR; Handling BGP AddPath capability is not trivial (possible) dynamically.

When the sender is AddPath-capable and sends NLRIs encoded with AddPath ID,
and at the same time the receiver sends AddPath capability "disable-addpath-rx"
(flag update) via dynamic capabilities, both peers are out of sync about the
AddPath state. The receiver thinks already he's not AddPath-capable anymore,
hence it tries to parse NLRIs as non-AddPath, while they are actually encoded
as AddPath.

AddPath capability itself does not provide (in RFC) any mechanism on backward
compatible way to handle NLRIs if they come mixed (AddPath + non-AddPath).

This explains why we have failures in our CI periodically.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-25 20:51:16 +02:00
Philippe Guibert 3a921c6a1d bgpd: fix import vrf creates multiple bgp instances
The more the vrf green is referenced in the import bgp command, the more
there are instances created. The below configuration shows that the vrf
green is referenced twice, and two BGP instances of vrf green are
created.

The below configuration:
> router bgp 99
> [..]
>  import vrf green
> exit
> router bgp 99 vrf blue
> [..]
>  import vrf green
> exit
> router bgp 99 vrf green
> [..]
> exit
>
> r4# show bgp vrfs
> Type  Id     routerId          #PeersCfg  #PeersEstb  Name
>              L3-VNI            RouterMAC              Interface
> DFLT  0      10.0.3.4          0          0           default
>              0                 00:00:00:00:00:00      unknown
>  VRF  5      10.0.40.4         0          0           blue
>              0                 00:00:00:00:00:00      unknown
>  VRF  6      0.0.0.0           0          0           green
>              0                 00:00:00:00:00:00      unknown
>  VRF  6      10.0.94.4         0          0           green
>              0                 00:00:00:00:00:00      unknown

Fix this at import command, by looking at an already present bgp
instance.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-01-21 13:48:36 +01:00
Philippe Guibert 9f7177af13 bgpd: fix duplicate BGP instance created with unified config
When running the bgp_evpn_rt5 setup with unified config, memory leak
about a non deleted BGP instance happens.

> root@ubuntu2204hwe:~/frr/tests/topotests/bgp_evpn_rt5# cat /tmp/topotests/bgp_evpn_rt5.test_bgp_evpn/r1.asan.bgpd.1164105
>
> =================================================================
> ==1164105==ERROR: LeakSanitizer: detected memory leaks
>
> Indirect leak of 12496 byte(s) in 1 object(s) allocated from:
>     #0 0x7f358eeb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7f358e877233 in qcalloc lib/memory.c:106
>     #2 0x55d06c95680a in bgp_create bgpd/bgpd.c:3405
>     #3 0x55d06c95a7b3 in bgp_get bgpd/bgpd.c:3805
>     #4 0x55d06c87a9b5 in bgp_get_vty bgpd/bgp_vty.c:603
>     #5 0x55d06c68dc71 in bgp_evpn_local_l3vni_add bgpd/bgp_evpn.c:7032
>     #6 0x55d06c92989b in bgp_zebra_process_local_l3vni bgpd/bgp_zebra.c:3204
>     #7 0x7f358e9e3feb in zclient_read lib/zclient.c:4626
>     #8 0x7f358e98082d in event_call lib/event.c:1996
>     #9 0x7f358e848931 in frr_run lib/libfrr.c:1232
>     #10 0x55d06c60eae1 in main bgpd/bgp_main.c:557
>     #11 0x7f358e229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Actually, a BGP VRF Instance is created in auto mode when creating the
global BGP instance for the L3 VNI. And again, an other BGP VRF instance
is created. Fix this by ensuring that a non existing BGP instance is not
present. If it is present, and with auto mode or in hidden mode, then
override the AS value.

Fixes: f153b9a9b6 ("bgpd: Ignore auto created VRF BGP instances")

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-01-21 13:48:36 +01:00
Donatas Abraitis 9a5be11191 bgpd: Set last reset No AFI/SAFI activated for peer after we do defaults
Move checking if the peer is active only after we apply defaults for address
families.

If the family got activated after applying the defaults we should reset last_reset
reason.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-19 11:08:00 +02:00
Donatas Abraitis db853cc97e bgpd: Implement Link-Local Next Hop capability
Related: https://datatracker.ietf.org/doc/html/draft-white-linklocal-capability

TL;DR; use 16 bytes long next-hops for point-to-point (unnumbered) links instead
of sending 32 bytes (::/LL, GUA/LL, LL/LL combinations).

For backward compatiblity we should handle even 32 bytes existing next hops.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-17 16:48:32 +02:00
Donatas Abraitis d3c46bce3b bgpd: Set the last reset reason correctly if we change capabilities per-peer
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-17 13:22:38 +02:00
Russ White 66a5d76920
Merge pull request #17810 from donaldsharp/bgp_connect_refactor
Bgp connect refactor
2025-01-15 11:11:41 -05:00
Donatas Abraitis d60320c6d2 bgpd: Handle ENHE capability via dynamic capability
FRR supports dynamic capability which is useful to exchange the capabilities
without tearing down the session. ENHE capability was missed to be included
handling via dynamic capability. Let's add it too.

This was missed and asked in Slack that it would be useful.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-14 22:46:53 +02:00
Donald Sharp 78fa9b6feb bgpd: su_remote and su_local are properties of the connection
su_local and su_remote in the peer can change based upon
if we are initiating the remote connection or receiving it.
As such we need to treat it as a property of the connection.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-10 10:07:11 -05:00
Donatas Abraitis 76fc75de9e bgpd: Fix showing default timers bgp x y
Fixes: ef4a9215b9 ("bgpd: Reuse defined constants for BGP timers")
Fixes: ab3535fbcf ("bgpd: Implement connect retry backoff")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-09 23:56:31 +02:00
Russ White 2a90c80f49
Merge pull request #17733 from pguibert6WIND/bmp_event_changes
BMP handling of BGP configuration changes
2025-01-07 09:06:43 -05:00
Enke Chen bcd1017794 bgpd: fix a bug in peer_allowas_in_set()
Fix a bug in peer_allowas_in_set() so that the config takes effect
for peer-group members.

Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
2025-01-06 21:01:14 -08:00
Donatas Abraitis f3daeda935
Merge pull request #17716 from ykholod/master-17463
bgpd: Clean address-family config on daemon restart
2025-01-01 21:16:39 +02:00
Philippe Guibert f3c17d94e0 bgpd: bmp, define hook for router-id updates
At startup, if bmp loc-rib is enabled, the peer_id of the
loc-rib per peer header message has the router-id set to 0.0.0.0.
Actually, the router-id has been updated after the peer up
message is sent, and the information is not refreshed.

Create a hook API to handle router id events: withdraw and
updates. Use that hook in BMP module to send peer down, and
peer up events when necessary.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-12-30 15:13:38 +01:00
Yaroslav Kholod 663281ca6a BGP: Clean address-family config on daemon restart
When stopping and restarting BGP daemon part of the configuration
remains. It should be cleared.
Particulary those are address-family parametes, like: distance,
ead-es-frag, disable-ead-evi-rx, disable-ead-evi-tx.

Signed-off-by: Yaroslav Kholod <y.kholod@vyos.io>
2024-12-30 14:37:54 +02:00
Donatas Abraitis 9ce3b144c9
Merge pull request #17580 from varuntumbe/dev/label_pool_release_fix
BGP Labelpool : Releasing the label in labelpool when VPN session gets removed
2024-12-23 14:48:21 +02:00
Donatas Abraitis b6dcf61877 bgpd: Fix enforce-first-as per peer-group removal
If we do `no neighbor PG enforce-first-as`, it wasn't working because the flag
was inherited incorrectly for the members of the peer-group.

Fixes: 322462920e ("bgpd: Enable enforce-first-as by default")

Closes: https://github.com/FRRouting/frr/issues/17702

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-21 17:04:30 +02:00
Jafar Al-Gharaibeh 1b213448a9
Merge pull request #17619 from donaldsharp/bgp_metaq_upstream
bgpd: add meta queue in bgp
2024-12-20 13:59:15 -06:00