matthieu/frr

Author	SHA1	Message	Date
Russ White	ab67e5544e	Merge pull request #18396 from pguibert6WIND/srv6l3vpn_to_bgp_vrf_redistribute Add BGP redistribution in SRv6 BGP	2025-04-03 08:25:32 -04:00
Russ White	c312917988	Merge pull request #18450 from donaldsharp/bgp_packet_reads Bgp packet reads conversion to a FIFO	2025-04-01 10:12:37 -04:00
Donald Sharp	06480c0c81	bgpd: When shutting down do not clear self peers Commit: `e0ae285eb8` Modified the fsm state machine to attempt to not clear routes on a peer that was not established. The peer should also not be a peer self. We do not want to ever clear the peer self. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-03-30 14:02:16 -04:00
Donald Sharp	12bf042c68	bgpd: Modify bgp to handle packet events in a FIFO Current behavior of BGP is to have an event per connection. On startup of BGP with a high number of neighbors, you end up with 2 * # of peers events that are being processed. Additionally once BGP has selected the connection this still only comes down to 512 events. This number of events is swamping the event system and in addition delaying any other work from being done in BGP at all because the 512 events are always going to take precedence over everything else. The other main events are the handling of the metaQ (1 event), update group events (1 per update group) and the zebra batching event. These are being swamped. Modify the BGP code to have a FIFO of connections. As new data comes in to read, place the connection on the end of the FIFO. Have the bgp_process_packet handle up to 100 packets spread across the individual peers where each peer/connection is limited to the original quanta. During testing I noticed that withdrawal events at very very large scale are taking up to 40 seconds to process so I added a check for yielding to further limit the number of packets being processed. This change also allows for BGP to be interactive again on scale setups on initial convergence. Prior to this change any vtysh command entered would be delayed by tens of seconds in my setup while BGP was doing other work. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-03-25 09:10:46 -04:00
Donatas Abraitis	45af7ea217	Merge pull request #18483 from donaldsharp/holdtime_mistake bgpd: Fix holdtime not working properly when busy	2025-03-25 09:38:09 +02:00
Russ White	7afe25744b	Merge pull request #18447 from donaldsharp/bgp_clear_batch Bgp clear batch	2025-03-24 16:13:49 -04:00
Donald Sharp	9a26a56c51	bgpd: Fix holdtime not working properly when busy Commit: `cc9f21da22` Modified the bgp_fsm code to disallow the extension of the hold time when the system is under extremely heavy load. This was an attempt to remove the return code but it was too aggressive and messed up this bit of code. Put the behavior back that was introduced in: `d0874d195d` Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-03-24 15:55:01 -04:00
Philippe Guibert	99acebcdc9	bgpd: fix check validity of a VPN SRv6 route with modified nexthop When exporting a VPN SRv6 route, the path may not be considered valid if the nexthop is not valid. This is the case when the 'nexthop vpn export' command is used. The below example illustrates that the VPN path to 2001:1::/64 is not selected, as the expected nexthop to find in vrf10 is the one configured: > # show running-config > router bgp 1 vrf vrf10 > address-family ipv6 unicast > nexthop vpn export 2001::1 > # show bgp ipv6 vpn > [..] > Route Distinguisher: 1:10 > 2001:1::/64 2001::1@4 0 0 65001 i > UN=2001::1 EC{99:99} label=16 sid=2001:db8:1:1:: sid_structure=[40,24,16,0] type=bgp, subtype=5 The analysis indicates that the 2001::1 nexthop is considered. > 2025/03/20 21:47:53.751853 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0 > 2025/03/20 21:47:53.751855 BGP: [VWNP2-DNMFV] Found existing bnc 2001::1/128(0)(VRF vrf10) flags 0x82 ifindex 0 #paths 2 peer 0x0, resolved prefix UNK prefix > 2025/03/20 21:47:53.751856 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is not valid (in VRF vrf10) > 2025/03/20 21:47:53.751857 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change Actually, to check the nexthop validity, only the source path in the VRF has the correct nexthop. Fix this by reusing the source path information instead of the current one. 
> 2025/03/20 22:43:51.703521 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0 > 2025/03/20 22:43:51.703523 BGP: [VWNP2-DNMFV] Found existing bnc fe80::b812:37ff:fe13:d441/128(0)(VRF vrf10) flags 0x87 ifindex 0 #paths 2 peer 0x0, resolved prefix fe80::/64 > 2025/03/20 22:43:51.703525 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is valid (in VRF vrf10) > 2025/03/20 22:43:51.703526 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2025-03-24 09:17:01 +01:00
Donatas Abraitis	ace4b8fe61	bgpd: Print the real reason why the peer is not accepted (incoming) If it's suppressed due to BFD down or unspecified connection, we never know the real reason and just say "no AF activated" which is misleading. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2025-03-17 14:52:42 +02:00
Mark Stapp	58f924d287	bgpd: batch peer connection error clearing When peer connections encounter errors, attempt to batch some of the clearing processing that occurs. Add a new batch object, add multiple peers to it, if possible. Do one rib walk for the batch, rather than one walk per peer. Use a handler callback per batch to check and remove peers' path-infos, rather than a work-queue and callback per peer. The original clearing code remains; it's used for single peers. Signed-off-by: Mark Stapp <mjs@cisco.com>	2025-03-12 12:42:06 -04:00
Donald Sharp	2cd1d00dde	bgpd: Convert bgp_keepalive_send to use a connection The peer is going to eventually have an incoming and outgoing connection. Let's send the data based upon the connection not the peer. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-02-28 10:28:50 -05:00
Donald Sharp	543fc6dc56	bgpd: Add connection direction to debug logs Currently the incoming and outgoing connections mix up their logs and there is absolutely no way to tell which way is being talked about when both are operating. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-02-28 10:28:50 -05:00
Jafar Al-Gharaibeh	92288c9069	Merge pull request #17865 from donaldsharp/coverity_2024_new_hotness Coverity 2024 new hotness	2025-02-06 10:15:55 -06:00
Donatas Abraitis	739f2b566a	bgpd: Do not start BGP session if BGP identifier is not set If we have IPv6-only network and no IPv4 addresses at all, then by default 0.0.0.0 is created which is treated as malformed according to RFC 6286. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2025-01-29 23:03:06 +02:00
Donatas Abraitis	0702ddb3c9	bgpd: Do not show "Waiting for OPEN" as last reset This is not actually a reset, and it should not be shown as the last reset under `show bgp neighbor`. Fixes: `1e91f1d119` ("bgpd: Update failed reason to distinguish some NHT scenario") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2025-01-19 11:07:59 +02:00
Donald Sharp	f94ad538cf	bgpd: Ensure ibuf count is protected by mutex Grab the count of streams in ibuf while it is protected by a mutex, since this data is written to in another pthread. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-17 10:16:48 -05:00
Donald Sharp	348c2dc3f8	bgpd: Only update peer connection information when needed Currently bgp is repeatedly grabbing peer connection information. This is a bit overkill. There are two situations: a) Opening a connection to the peer In this case, we know the remote port/address a priori and can get the local information by just asking the OS. b) Peer opening a connection to us. In this case, we know the local port/address a priori and can get the remote information by just asking the OS. Modify the code to just grab this data at the appropriate time. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-10 10:07:11 -05:00
Donald Sharp	78fa9b6feb	bgpd: su_remote and su_local are properties of the connection su_local and su_remote in the peer can change based upon if we are initiating the remote connection or receiving it. As such we need to treat it as a property of the connection. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-10 10:07:11 -05:00
Donald Sharp	0e416ff157	bgpd: bgp_getsockanme is connection oriented Let's make it so. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2025-01-10 10:06:16 -05:00
Jafar Al-Gharaibeh	f78b1786a6	Merge pull request #17599 from opensourcerouting/fix/reduce_default_connect_timer bgpd: Connect retry timer backoff	2024-12-18 16:26:37 -06:00
Donald Sharp	40c31bdf40	bgpd: When calling bgp_process, prevent infinite loop If we have this construct: for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { ... bgp_process(); } This can induce an infinite loop. This happens because bgp_process will move the unsorted items to the top of the list for handling, as such it is necessary to hold the next pointer to the side to actually look at each possible bgp_path_info. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-12-12 15:08:35 -05:00
Donatas Abraitis	ab3535fbcf	bgpd: Implement connect retry backoff Instead of starting with a fairly high value of retry, let's try with a lower and increase with a backoff to reach what was a default value (120s). Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-12-11 17:20:48 +02:00
Donald Sharp	7bf3f53e44	bgpd: peer_active is connection oriented, make it so Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:59:39 -05:00
Donald Sharp	1baeb81632	bgpd: bgp_getsockname should use connection Let's use the connection associated with the peer instead. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:59:33 -05:00
Donald Sharp	72f716ef28	bgpd: Modify bgp_connect_in_progress_update_connection to use connection Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:59:27 -05:00
Donald Sharp	2771431938	bgpd: Modify bgp_udpatesockname to pass in a connection Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:59:19 -05:00
Donald Sharp	eacf923b00	bgpd: Fix pattern of usage in bgp_notify_config_change if (BGP_IS_VALID_STATE_FOR_NOTIF(peer->connection->status)) peer_notify_config_change(peer->connection); else bgp_session_reset_safe(peer, &nnode); Let's add a bool return to peer_notify_config_change of whether or not it should call the peer session reset. This simplifies the code a bunch. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:59:18 -05:00
Donald Sharp	ba0edb9545	bgpd: Add `peer_notify_config_change()` function We have about a bajillion tests of if we can notify the peer and then we send a config change notification. Let's just make a function that does this. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-26 11:58:23 -05:00
Donatas Abraitis	0a85b1ba04	bgpd: Fix graceful-restart for peer-groups Slipped somehow that peer-groups with GR is just completely broken, but it was working before. Strikes again, that we MUST have more and more topotests. Fixes: `15403f521a` ("bgpd: Streamline GR config, act on change immediately") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-11-24 21:57:19 +02:00
Donald Sharp	2a94de8af2	bgpd: bgp_connect should return an `enum connect_result` This function when it is run by bgp_start is expected to return an `enum connect_result`. But instead the function returns a variety of values that are not really being checked for. Consolidate to a correct choice. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-11-20 16:11:22 -05:00
Donatas Abraitis	29eafd32c5	bgpd: Do not try to uninstall BFD session if the peer is not established Having something like: ``` neighbor 192.168.1.222 ebgp-multihop 32 neighbor 192.168.1.222 update-source 192.168.1.5 neighbor 192.168.1.222 bfd ``` Won't work and the result is (empty): ``` $ show bfd peers BFD Peers: ``` bgp_stop() is called in BGP FSM multiple times (even at startup) that causes intermediate session interruption when update-source/ebgp-multihop is triggered. With this fix, the ordering does not matter and the BFD session's parameters are updated correctly. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-11-11 16:49:22 +02:00
Donatas Abraitis	895d586a5f	bgpd: Set LLGR stale routes for all the paths including addpath Without this patch we set only the first path for the route (if multiple exist) as LLGR stale and stop doing that for the rest of the paths, which is wrong. Fixes: `1479ed2fb3` ("bgpd: Implement LLGR helper mode") Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-11-07 14:05:36 +02:00
Donald Sharp	138935a5fd	bgpd: Fix wrong pthread event cancelling 0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44 1 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78 2 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 3 0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26 4 0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258 5 <signal handler called> 6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44 7 __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78 8 __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 9 0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79 11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789 12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428 13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470 14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355 15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610 16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152 17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639 18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111 19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, 
arg=0x76e3985ffaf0) at lib/hash.c:252 20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214 21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180 22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) f 12 12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428 1428 assert(m->owner == pthread_self()); In this decode the attempt to cancel the connection's events from the wrong thread is causing the crash. Modify the code to create an event on the bm->master to cancel the events for the connection. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-10-24 21:01:26 -04:00
David Lamparter	49cf311d46	*: clang-SA friendly switch-enum-return-string clang-19's SA complains about unused initializers for this kind of "switch (enum) { return string }" kind of code. Use direct string return values to avoid the issue. Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2024-10-16 13:00:11 +02:00
Philippe Guibert	37702ca080	bgpd: fix 'nexthop_set failed' error message often displayed The 'nexthop_set failed, resetting connection - intf' log message is often seen when peering with BGP peers. This message has been displayed by introducing a recent fix that extracts the IP/port information of outgoing connections when peering is not yet established. Fix this by separating the update of the socket information from the call to bgp_zebra_nexthop_set(). Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2024-09-12 16:14:27 +02:00
Donald Sharp	bb78f73fa6	bgpd: Reduce # of iterations when doing llgr Code was scanning a table then identifying a prefix that needed to be modified then calling code that reran bestpath on the entire table again. If you had multiple items that needed processing you would end up scanning and setting the entire table to be scanned multiple times. No bueno. a) We do not need to reprocess items that are not being modified. b) We do not need to walk the entire table multiple times, we have the data that is needed already. Modify the code to just call bgp_process on the interesting nodes. Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-09-06 10:39:41 -04:00
Donald Sharp	4344ac1d28	bgpd: global_gr_mode does not need to be set twice Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2024-08-22 13:32:20 -04:00
Donatas Abraitis	9c710eef0c	bgpd: Use bgp_session_reset_safe() for GR update all peers It might cause this use-after-free: ``` ==6523==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300058d720 at pc 0x55f3ab62ab1f bp 0x7ffe5b95a0d0 sp 0x7ffe5b95a0c8 READ of size 8 at 0x60300058d720 thread T0 #0 0x55f3ab62ab1e in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2729 #1 0x55f3ab62ab1e in bgp_gr_update_all bgpd/bgp_fsm.c:2779 #2 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037 #3 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130 #4 0x7fc5539a9584 in cmd_execute_command_real lib/command.c:1002 #5 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061 #6 0x7fc5539a9dcf in cmd_execute lib/command.c:1227 #7 0x7fc553ae493f in vty_command lib/vty.c:616 #8 0x7fc553ae4e92 in vty_execute lib/vty.c:1379 #9 0x7fc553aedd34 in vtysh_read lib/vty.c:2374 #10 0x7fc553ad8a64 in event_call lib/event.c:1995 #11 0x7fc553a0c429 in frr_run lib/libfrr.c:1232 #12 0x55f3ab57b78d in main bgpd/bgp_main.c:555 #13 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #14 0x7fc55342d304 in __libc_start_main_impl ../csu/libc-start.c:360 #15 0x55f3ab5799a0 in _start (/usr/lib/frr/bgpd+0x2e19a0) 0x60300058d720 is located 16 bytes inside of 24-byte region [0x60300058d710,0x60300058d728) freed by thread T0 here: #0 0x7fc553eb76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52 #1 0x7fc553a2b713 in qfree lib/memory.c:130 #2 0x7fc553a0e50d in listnode_free lib/linklist.c:81 #3 0x7fc553a0e50d in list_delete_node lib/linklist.c:379 #4 0x55f3ab7ae353 in peer_delete bgpd/bgpd.c:2796 #5 0x55f3ab7ae91f in bgp_session_reset bgpd/bgpd.c:141 #6 0x55f3ab62ab17 in bgp_gr_update_mode_of_all_peers bgpd/bgp_fsm.c:2752 #7 0x55f3ab62ab17 in bgp_gr_update_all bgpd/bgp_fsm.c:2779 #8 0x55f3ab73557e in bgp_inst_gr_config_vty bgpd/bgp_vty.c:3037 #9 0x55f3ab74db69 in bgp_graceful_restart bgpd/bgp_vty.c:3130 #10 0x7fc5539a9584 in 
cmd_execute_command_real lib/command.c:1002 #11 0x7fc5539a98a3 in cmd_execute_command lib/command.c:1061 #12 0x7fc5539a9dcf in cmd_execute lib/command.c:1227 #13 0x7fc553ae493f in vty_command lib/vty.c:616 #14 0x7fc553ae4e92 in vty_execute lib/vty.c:1379 #15 0x7fc553aedd34 in vtysh_read lib/vty.c:2374 #16 0x7fc553ad8a64 in event_call lib/event.c:1995 #17 0x7fc553a0c429 in frr_run lib/libfrr.c:1232 #18 0x55f3ab57b78d in main bgpd/bgp_main.c:555 #19 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 previously allocated by thread T0 here: #0 0x7fc553eb83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x7fc553a2ae20 in qcalloc lib/memory.c:105 #2 0x7fc553a0d056 in listnode_new lib/linklist.c:71 #3 0x7fc553a0d85b in listnode_add_sort lib/linklist.c:197 #4 0x55f3ab7baec4 in peer_create bgpd/bgpd.c:1996 #5 0x55f3ab65be8b in bgp_accept bgpd/bgp_network.c:604 #6 0x7fc553ad8a64 in event_call lib/event.c:1995 #7 0x7fc553a0c429 in frr_run lib/libfrr.c:1232 #8 0x55f3ab57b78d in main bgpd/bgp_main.c:555 #9 0x7fc55342d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 ``` Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-07-31 11:43:19 +03:00
Donatas Abraitis	fa9bd07ae5	bgpd: Keep the last reset reason before we reset the peer If we send a notification, there is no point setting the last_reset, because bgp_notify_send() sets last_reset to PEER_DOWN_NOTIFY_SEND (almost everywhere). Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-07-25 13:22:27 +03:00
Donatas Abraitis	743b169384	bgpd: Set the last_reset if we change the password also ``` donatas.net(config-router)# do show ip bgp summary failed IPv4 Unicast Summary: BGP router identifier 1.1.1.1, local AS number 65001 VRF default vrf-id 0 BGP table version 0 RIB entries 0, using 0 bytes of memory Peers 1, using 24 KiB of memory Neighbor EstdCnt DropCnt ResetTime Reason 127.0.0.1 2 2 00:02:02 Password config change (GoBGP/3.26.0) Displayed neighbors 1 Total number of neighbors 1 ``` Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-07-25 13:06:46 +03:00
Donatas Abraitis	45f80de734	bgpd: Pass a connection struct directly for EVENT_OFF() Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-07-24 15:30:43 +03:00
vivek	c6ed1cc16d	bgpd: Refine restarter operation - R-bit & F-bit Introduce BGP-wide flags to denote if BGP has started gracefully and GR is in progress or not. Use this for setting of the R-bit in the GR capability, and not a timer which is set for any new instance creation. Mark graceful restart is complete when the deferred path selection has been done and route sync with zebra as well as deferred EOR advertisement has been initiated. Introduce a function to check on F-bit setting rather than just base it on configuration. Subsequent commits will extend these functionalities. Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>	2024-07-01 13:02:45 -07:00
vivek	15403f521a	bgpd: Streamline GR config, act on change immediately Streamline the BGP graceful-restart configuration at the global and peer level some more. Similar to many other neighbor capability parameters like MP and ENHE, reset the session immediately upon a change to the configuration. This will be more aligned with the transactional UI model also and will not require a separate 'clear' command to be executed. Note: Peer-group graceful-restart configuration is not yet supported. Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>	2024-06-27 11:40:57 -07:00
Loïc Sang	e0ae285eb8	bgpd: avoid clearing routes for peers that were never established Under heavy system load with many peers in passive mode and a large number of routes, bgpd can enter an infinite loop. This occurs while processing timeout BGP_OPEN messages, which prevents it from accepting new connections. The following log entries illustrate the issue: >bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0 >bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224 >bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224 ... repeating The issue occurs when bgpd handles a massive number of routes in the RIB while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it fails to process these packets promptly, leading the remote peer to close the connection and resend BGP_OPEN packets. When bgpd eventually starts processing these timeout BGP_OPEN packets, it finds the TCP connection closed by the remote peer, resulting in "bgp_stop()" being called. For each timeout peer, bgpd must iterate through the routing table, which is time-consuming and causes new incoming BGP_OPEN packets to timeout, perpetuating the infinite loop. To address this issue, the code is modified to check if the peer has been established at least once before calling "bgp_clear_route_all()". This ensures that routes are only cleared for peers that had a successful session, preventing unnecessary iterations over the routing table for peers that never established a connection. With this change, BGP_OPEN timeout messages may still occur, but in the worst case, bgpd will stabilize. Before this patch, bgpd could enter a loop where it was unable to accept any new connections. Signed-off-by: Loïc Sang <loic.sang@6wind.com>	2024-06-26 16:11:16 +02:00
Louis Scalbert	e446308d76	bgpd: fix dynamic peer graceful restart race condition bgp_llgr topotest sometimes fails at step 8: > topo: STEP 8: 'Check if we can see 172.16.1.2/32 after R4 (dynamic peer) was killed' R4 neighbor is deleted on R2 because it fails to re-connect: > 14:33:40.128048 BGP: [HKWM3-ZC5QP] 192.168.3.1 fd -1 went from Established to Clearing > 14:33:40.128154 BGP: [MJ1TJ-HEE3V] 192.168.3.1(r4) graceful restart timer expired > 14:33:40.128158 BGP: [ZTA2J-YRKGY] 192.168.3.1(r4) graceful restart stalepath timer stopped > 14:33:40.128162 BGP: [H917J-25EWN] 192.168.3.1(r4) Long-lived stale timer (IPv4 Unicast) started for 20 sec > 14:33:40.128168 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 172.16.1.2/32 > 14:33:40.128220 BGP: [H5X66-NXP9S] 192.168.3.1(r4) Long-lived set stale community (LLGR_STALE) for: 192.168.3.0/24 > [...] > 14:33:41.138869 BGP: [RGGAC-RJ6WG] 192.168.3.1 [Event] Connect failed 111(Connection refused) > 14:33:41.138906 BGP: [ZWCSR-M7FG9] 192.168.3.1 [FSM] TCP_connection_open_failed (Connect->Active), fd 23 > 14:33:41.138912 BGP: [JA9RP-HSD1K] 192.168.3.1 (dynamic neighbor) deleted (bgp_connect_fail) > 14:33:41.139126 BGP: [P98A2-2RDFE] 192.168.3.1(r4) graceful restart stalepath timer stopped `af8496af08` ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in") forgot to modify bgp_connect_fail() Do not delete the peer in bgp_connect_fail() if Non-Stop-Forwarding is in progress. Fixes: `af8496af08` ("bgpd: Do not delete BGP dynamic peers if graceful restart kicks in") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>	2024-05-16 15:19:11 +02:00
Russ White	827badc53c	Merge pull request #15883 from opensourcerouting/fix/bgpd_gr_fsm bgpd: Apply NOOP when doing negative commands for GR operations	2024-05-07 09:56:51 -04:00
Donatas Abraitis	7b5595b61d	bgpd: Print old/new states of graceful restart FSM To better debug what's going on before/after. Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-04-30 13:44:17 +03:00
Philippe Guibert	f101108e3e	bgpd: fix covery ID 1585206 The return value of bgp_getsockname() should always be checked. Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2024-04-29 15:44:24 +02:00
Philippe Guibert	78ce63952a	bgpd: fix addressing information of non established outgoing sessions When trying to connect to a BGP peer that does not respond, the 'show bgp neighbors' command does not give any indication on the local and remote addresses used: > # show bgp neighbors > BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link > Local Role: undefined > Remote Role: undefined > BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1 > BGP state = Connect > [..] > Connections established 0; dropped 0 > Last reset 00:00:04, Waiting for peer OPEN (n/a) > Internal BGP neighbor may be up to 255 hops away. > BGP Connect Retry Timer in Seconds: 120 > Next connect timer due in 117 seconds > Read thread: off Write thread: off FD used: 27 The addressing information (address and port) is only available when the TCP session is established, whereas this information is present at the system level: > root@ubuntu2204:~# netstat -pan | grep 192.0.2.1 > tcp 0 0 192.0.2.1:179 192.0.2.150:38060 SYN_RECV - > tcp 0 1 192.0.2.1:46526 192.0.2.150:179 SYN_SENT 488310/bgpd Add the display for outgoing BGP session, as the information in the getsockname() API provides information for connected streams. When getpeername() API does not give any information, use the peer configuration (destination port is encoded in peer->port). > # show bgp neighbors > BGP neighbor is 192.0.2.150, remote AS 65500, local AS 65500, internal link > Local Role: undefined > Remote Role: undefined > BGP version 4, remote router ID 0.0.0.0, local router ID 192.0.2.1 > BGP state = Connect > [..] > Connections established 0; dropped 0 > Last reset 00:00:16, Waiting for peer OPEN (n/a) > Local host: 192.0.2.1, Local port: 46084 > Foreign host: 192.0.2.150, Foreign port: 179 Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>	2024-04-15 09:16:54 +02:00
Donatas Abraitis	4967bf6d72	bgpd: Send "Send Hold Timer Expired" on such events notification This is required by the current (latest/-02 draft). IANA has registered code 8 for "Send Hold Timer Expired" in the "BGP Error (Notification) Codes" sub-registry under the "Border Gateway Protocol (BGP) Parameters" registry. https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-sendholdtimer Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>	2024-02-29 15:37:53 +02:00
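
Commit 40c31bdf40 in the list above describes a loop over `pi->next` that never terminates because `bgp_process` moves entries to the top of the list, and fixes it by holding the next pointer to the side. The following is a minimal sketch of that pattern, not the FRR code: `struct path_info`, `struct dest`, `move_to_front()` and `walk_safely()` are hypothetical stand-ins, and `move_to_front()` only imitates the reordering that the commit message attributes to `bgp_process`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bgp_path_info list node. */
struct path_info {
	int id;
	struct path_info *next;
};

/* Hypothetical stand-in for a bgp_dest holding the list head. */
struct dest {
	struct path_info *head;
};

/*
 * Imitates the side effect described in the commit message:
 * the processed entry is moved to the top of the list.
 */
static void move_to_front(struct dest *d, struct path_info *pi)
{
	struct path_info **pp = &d->head;

	if (d->head == pi)
		return;
	while (*pp && *pp != pi)
		pp = &(*pp)->next;
	if (!*pp)
		return; /* pi is not on this list */
	*pp = pi->next;
	pi->next = d->head;
	d->head = pi;
}

/*
 * Visit every entry exactly once even though the list is reordered
 * inside the loop body. Reading pi->next after move_to_front() would
 * revisit entries forever; saving it first avoids that.
 */
static int walk_safely(struct dest *d)
{
	struct path_info *pi, *next;
	int visited = 0;

	assert(d != NULL);
	for (pi = d->head; pi; pi = next) {
		next = pi->next; /* saved before the list is reordered */
		move_to_front(d, pi);
		visited++;
	}
	return visited;
}
```

With a three-entry list, `walk_safely()` visits each entry once and leaves the list reversed. Replacing `pi = next` with `pi = pi->next` in this sketch makes the walk alternate between the first two entries indefinitely.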


400 commits
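
Commit 12bf042c68 describes replacing one read event per connection with a FIFO of connections, handling up to 100 packets per run with each connection limited to its quanta. The sketch below illustrates that scheduling idea only, under stated assumptions: `struct conn`, `struct conn_fifo`, `fifo_push()`, `fifo_pop()` and `process_packets()` are hypothetical names, packets are modelled as a counter, and re-queueing a connection that still has data is an assumption that the commit message does not spell out. The yield check mentioned in the message is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_BUDGET 100 /* "up to 100 packets" per run, from the commit message */

/* Hypothetical connection: packets are modelled as plain counters. */
struct conn {
	int pending;   /* packets waiting to be read */
	int processed; /* packets handled so far */
	int queued;    /* non-zero while on the FIFO */
	struct conn *fifo_next;
};

struct conn_fifo {
	struct conn *head, *tail;
};

/* Append a connection to the end of the FIFO unless already queued. */
static void fifo_push(struct conn_fifo *f, struct conn *c)
{
	if (c->queued)
		return;
	c->queued = 1;
	c->fifo_next = NULL;
	if (f->tail)
		f->tail->fifo_next = c;
	else
		f->head = c;
	f->tail = c;
}

/* Remove and return the connection at the front, or NULL if empty. */
static struct conn *fifo_pop(struct conn_fifo *f)
{
	struct conn *c = f->head;

	if (!c)
		return NULL;
	f->head = c->fifo_next;
	if (!f->head)
		f->tail = NULL;
	c->fifo_next = NULL;
	c->queued = 0;
	return c;
}

/*
 * One run of the single packet-processing event: serve connections in
 * FIFO order, at most `quanta` packets per turn, and stop once the
 * overall budget is spent. Returns the number of packets handled.
 */
static int process_packets(struct conn_fifo *f, int quanta)
{
	struct conn *c;
	int handled = 0;

	assert(quanta > 0);
	while (handled < TOTAL_BUDGET && (c = fifo_pop(f)) != NULL) {
		int n = 0;

		while (c->pending > 0 && n < quanta && handled < TOTAL_BUDGET) {
			c->pending--;
			c->processed++;
			n++;
			handled++;
		}
		if (c->pending > 0)
			fifo_push(f, c); /* assumption: leftover work goes to the back */
	}
	return handled;
}
```

In this model a connection with a large backlog cannot starve the others: it gets one quanta per turn and then waits behind every other queued connection, and the run as a whole returns after at most `TOTAL_BUDGET` packets so other events get a chance to run.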