Commit graph

445 commits

Author SHA1 Message Date
Mark Stapp 660cbf5651 bgpd: clean up variable-shadowing compiler warnings
Clean up -Wshadow warnings in bgp.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-04-08 14:41:27 -04:00
Russ White c312917988
Merge pull request #18450 from donaldsharp/bgp_packet_reads
Bgp packet reads conversion to a FIFO
2025-04-01 10:12:37 -04:00
Donatas Abraitis b7c657d4e0 bgpd: Retain the routes if we do a clear with N-bit set for Graceful-Restart
On receiving side we already did the job correctly, but the peer which initiates
the clear does not retain the other's routes. This commit fixes that.

Fixes: 20170775da ("bgpd: Activate Graceful-Restart when receiving CEASE/HOLDTIME notifications")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-03-25 17:20:56 +02:00
Donald Sharp 12bf042c68 bgpd: Modify bgp to handle packet events in a FIFO
Current behavor of BGP is to have a event per connection.  Given
that on startup of BGP with a high number of neighbors you end
up with 2 * # of peers events that are being processed.  Additionally
once BGP has selected the connection this still only comes down
to 512 events.  This number of events is swamping the event system
and in addition delaying any other work from being done in BGP at
all because the the 512 events are always going to take precedence
over everything else.  The other main events are the handling
of the metaQ(1 event), update group events( 1 per update group )
and the zebra batching event.  These are being swamped.

Modify the BGP code to have a FIFO of connections.  As new data
comes in to read, place the connection on the end of the FIFO.
Have the bgp_process_packet handle up to 100 packets spread
across the individual peers where each peer/connection is limited
to the original quanta.  During testing I noticed that withdrawal
events at very very large scale are taking up to 40 seconds to process
so I added a check for yielding to further limit the number of packets
being processed.

This change also allow for BGP to be interactive again on scale
setups on initial convergence.  Prior to this change any vtysh
command entered would be delayed by 10's of seconds in my setup
while BGP was doing other work.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25 09:10:46 -04:00
Russ White 7afe25744b
Merge pull request #18447 from donaldsharp/bgp_clear_batch
Bgp clear batch
2025-03-24 16:13:49 -04:00
Donald Sharp c9655e2893 bgpd: Remove unnecessary stream_new/stream_copies in bgp_open_make
The call into bgp_open_capability can return that it wrote more
than BGP_OPEN_NON_EXT_OPT_LEN bytes, in that case the open
part needs to be written again with ext_opt_params set to
true to allow extended parameters to be written thus keeping
the len < 255 bytes.  The code to do this was first creating
a new stream and then copying into it the stream, trying
to call bgp_open_capability() and if it succeeded recopying
the tmp stream back onto the original.

Let's change this around such that we save the current spot
in the stream of where we are writing and if the change does
not work reset the pointer and try again with the correct
parameter.  This removes the stream and multiple copies and
eventual free of the temporary stream.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-14 14:50:59 -04:00
Mark Stapp 58f924d287 bgpd: batch peer connection error clearing
When peer connections encounter errors, attempt to batch some
of the clearing processing that occurs. Add a new batch object,
add multiple peers to it, if possible. Do one rib walk for the
batch, rather than one walk per peer. Use a handler callback
per batch to check and remove peers' path-infos, rather than
a work-queue and callback per peer. The original clearing code
remains; it's used for single peers.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-12 12:42:06 -04:00
Mark Stapp 6a5962e1f8 bgpd: Replace per-peer connection error with per-bgp
Replace the per-peer connection error with a per-bgp event and
a list. The io pthread enqueues peers per-bgp-instance, and the
error-handing code can process multiple peers if there have been
multiple failures.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2025-03-12 12:40:07 -04:00
Donald Sharp 2cd1d00dde bgpd: Convert bgp_keepalive_send to use a connection
The peer is going to eventually have a incoming and
outgoing connection.  Let's send the data based
upon the connection not the peer.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-28 10:28:50 -05:00
Donald Sharp ffff1a1760 bgpd: Fix another crash in orf
I was pointed at yet another crash in the orf code.  I think it
stems from basicaly the same problem as the last one.  Let's just
make sure that the orf_plist is handled appropriately.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-19 06:29:36 -05:00
Donald Sharp 3d43d7b789 bgpd: When removing the prefix list drop the pointer
We are very very rarely seeing this crash:

    0 0x7f36ba48e389 in prefix_list_apply_ext lib/plist.c:789
    1 0x55eff3fa4126 in subgroup_announce_check bgpd/bgp_route.c:2334
    2 0x55eff3fa858e in subgroup_process_announce_selected bgpd/bgp_route.c:3440
    3 0x55eff4016488 in subgroup_announce_table bgpd/bgp_updgrp_adv.c:808
    4 0x55eff401664e in subgroup_announce_route bgpd/bgp_updgrp_adv.c:861
    5 0x55eff40111df in peer_af_announce_route bgpd/bgp_updgrp.c:2223
    6 0x55eff3f884cb in bgp_announce_route_timer_expired bgpd/bgp_route.c:5892
    7 0x7f36ba4ec239 in event_call lib/event.c:2019
    8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
    9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
    10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    11 0x7f36b9e2d304 in __libc_start_main_impl ../csu/libc-start.c:360
    12 0x55eff3e64a30 in _start (/home/ci/cibuild.1407/frr-source/bgpd/.libs/bgpd+0x2fda30)
0x608000037038 is located 24 bytes inside of 88-byte region [0x608000037020,0x608000037078)
freed by thread T0 here:
    0 0x7f36ba8b76a8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
    1 0x7f36ba439bd7 in qfree lib/memory.c:131
    2 0x7f36ba48d3a3 in prefix_list_free lib/plist.c:156
    3 0x7f36ba48d3a3 in prefix_list_delete lib/plist.c:247
    4 0x7f36ba48fbef in prefix_bgp_orf_remove_all lib/plist.c:1516
    5 0x55eff3f679c4 in bgp_route_refresh_receive bgpd/bgp_packet.c:2841
    6 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
    7 0x7f36ba4ec239 in event_call lib/event.c:2019
    8 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
    9 0x55eff3e668b7 in main bgpd/bgp_main.c:557
    10 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
previously allocated by thread T0 here:
    0 0x7f36ba8b83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
    1 0x7f36ba4392e4 in qcalloc lib/memory.c:106
    2 0x7f36ba48d0de in prefix_list_new lib/plist.c:150
    3 0x7f36ba48d0de in prefix_list_insert lib/plist.c:186
    4 0x7f36ba48d0de in prefix_list_get lib/plist.c:204
    5 0x7f36ba48f9df in prefix_bgp_orf_set lib/plist.c:1479
    6 0x55eff3f67ba6 in bgp_route_refresh_receive bgpd/bgp_packet.c:2920
    7 0x55eff3f70bab in bgp_process_packet bgpd/bgp_packet.c:4069
    8 0x7f36ba4ec239 in event_call lib/event.c:2019
    9 0x7f36ba41a22a in frr_run lib/libfrr.c:1295
    10 0x55eff3e668b7 in main bgpd/bgp_main.c:557
    11 0x7f36b9e2d249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Let's just stop trying to save the pointer around in the peer->orf_plist
data structure.  There are other design problems but at least lets
stop the crash from possibly happening.

Fixes: #18138
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-14 07:58:01 -05:00
Donald Sharp 418d0064bc
Merge pull request #18068 from opensourcerouting/fix/coverity_link_local_capability
bgpd: Do not check for capability length for Link-Local Next Hop capability
2025-02-12 09:35:16 -05:00
Philippe Guibert 2a143041f8 bgpd: fix loc-rib open message should use router-id
When forging BMP open message, the BGP router-id of tx open message of
the BMP LOC-RIB peer up message is always set to 0.0.0.0, whatever the
configured value of 'bgp router-id'.

Actually, when forging a peer up LOC-RIB message, the BGP router-id
value should be taken from the main BGP instance, and not from the peer
bgp identifier. Fix this by refreshing the router-id whenever a peer up
loc-rib message should be sent.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2025-02-11 11:49:28 +01:00
Donatas Abraitis d3741f8437 bgpd: Do not check for capability length for Link-Local Next Hop capability
Capability's length is 0 and this is not needed to check if it's multiplied by
X or there is a minimum length for that.

Fixes: db853cc97e ("bgpd: Implement Link-Local Next Hop capability")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-02-08 13:01:53 +02:00
Russ White 2ef76a3350
Merge pull request #17871 from opensourcerouting/feature/bgp_link_local_capability
bgpd: Implement Link-Local Next Hop capability
2025-02-07 14:00:59 -05:00
Russ White f3b6651954
Merge pull request #17863 from opensourcerouting/fix/bgp_coverity_1617727
bgpd: Check if the peer really exists before sending dynamic capability
2025-01-28 10:35:57 -05:00
Donatas Abraitis 4338e21aa2 Revert "bgpd: Handle Addpath capability using dynamic capabilities"
This reverts commit 05cf9d03b3.

TL;DR; Handling BGP AddPath capability is not trivial (possible) dynamically.

When the sender is AddPath-capable and sends NLRIs encoded with AddPath ID,
and at the same time the receiver sends AddPath capability "disable-addpath-rx"
(flag update) via dynamic capabilities, both peers are out of sync about the
AddPath state. The receiver thinks already he's not AddPath-capable anymore,
hence it tries to parse NLRIs as non-AddPath, while they are actually encoded
as AddPath.

AddPath capability itself does not provide (in RFC) any mechanism on backward
compatible way to handle NLRIs if they come mixed (AddPath + non-AddPath).

This explains why we have failures in our CI periodically.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-25 20:51:16 +02:00
Donatas Abraitis db853cc97e bgpd: Implement Link-Local Next Hop capability
Related: https://datatracker.ietf.org/doc/html/draft-white-linklocal-capability

TL;DR; use 16 bytes long next-hops for point-to-point (unnumbered) links instead
of sending 32 bytes (::/LL, GUA/LL, LL/LL combinations).

For backward compatiblity we should handle even 32 bytes existing next hops.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-17 16:48:32 +02:00
Donatas Abraitis 2df722262f bgpd: Check if the peer really exists before sending dynamic capability
CID: 1617727

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-16 09:06:17 +02:00
Donatas Abraitis d60320c6d2 bgpd: Handle ENHE capability via dynamic capability
FRR supports dynamic capability which is useful to exchange the capabilities
without tearing down the session. ENHE capability was missed to be included
handling via dynamic capability. Let's add it too.

This was missed and asked in Slack that it would be useful.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2025-01-14 22:46:53 +02:00
Donatas Abraitis 28e62b46ba bgpd: Show prefix-related stats per neighbor
E.g.:

```
  Prefix statistics:
    Inbound filtered: 0
    AS-PATH loop: 0
    Originator loop: 0
    Cluster loop: 0
    Invalid next-hop: 0
    Withdrawn: 0
    Attributes discarded: 3
```

JSON:

```
    "prefixStats":{
      "inboundFiltered":0,
      "aspathLoop":0,
      "originatorLoop":0,
      "clusterLoop":0,
      "invalidNextHop":0,
      "withdrawn":0,
      "attributesDiscarded":3
    },
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-12-30 12:26:19 +02:00
Donald Sharp 1baeb81632 bgpd: bgp_getsockname should use connection
Let's use the connection associated with the peer
instead.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-26 11:59:33 -05:00
Donald Sharp 5592aecefd bgpd: Convert rcvd_attr_printed to a bool
No need for a integer to store this, use a bool

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-31 10:35:01 -04:00
Donald Sharp 138935a5fd bgpd: Fix wrong pthread event cancelling
0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3  0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4  0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258
5  <signal handler called>
6  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
7  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
8  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9  0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470
14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355
15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610
16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152
17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639
18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111
19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, arg=0x76e3985ffaf0) at lib/hash.c:252
20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214
21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180
22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
1428		assert(m->owner == pthread_self());

In this decode the attempt to cancel the connection's events from
the wrong thread is causing the crash.  Modify the code to create an
event on the bm->master to cancel the events for the connection.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-24 21:01:26 -04:00
Donald Sharp b097a3188a bgpd: Fix deadlock in bgp_keepalive and master pthreads
(gdb) bt
0  futex_wait (private=0, expected=2, futex_word=0x5c438e9a98d8) at ../sysdeps/nptl/futex-internal.h:146
1  __GI___lll_lock_wait (futex=futex@entry=0x5c438e9a98d8, private=0) at ./nptl/lowlevellock.c:49
2  0x00007af16d698002 in lll_mutex_lock_optimized (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:48
3  ___pthread_mutex_lock (mutex=0x5c438e9a98d8) at ./nptl/pthread_mutex_lock.c:93
4  0x00005c4369c17e70 in _frr_mtx_lock (mutex=0x5c438e9a98d8, func=0x5c4369dc2750 <__func__.265> "bgp_notify_send_internal") at ./lib/frr_pthread.h:258
5  0x00005c4369c1a07a in bgp_notify_send_internal (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000', data=0x0, datalen=0, use_curr=true) at bgpd/bgp_packet.c:928
6  0x00005c4369c1a707 in bgp_notify_send (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_packet.c:1069
7  0x00005c4369bea422 in bgp_stop_with_notify (connection=0x5c438e9a98c0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1597
8  0x00005c4369c18480 in bgp_packet_add (connection=0x5c438e9a98c0, peer=0x5c438e9b6010, s=0x7af15c06bf70) at bgpd/bgp_packet.c:151
9  0x00005c4369c19816 in bgp_keepalive_send (peer=0x5c438e9b6010) at bgpd/bgp_packet.c:639
10 0x00005c4369bf01fd in peer_process (hb=0x5c438ed05520, arg=0x7af16bdffaf0) at bgpd/bgp_keepalives.c:111
11 0x00007af16dacd8e6 in hash_iterate (hash=0x7af15c000be0, func=0x5c4369bf005e <peer_process>, arg=0x7af16bdffaf0) at lib/hash.c:252
12 0x00005c4369bf0679 in bgp_keepalives_start (arg=0x5c438e0db110) at bgpd/bgp_keepalives.c:214
13 0x00007af16dac9932 in frr_pthread_inner (arg=0x5c438e0db110) at lib/frr_pthread.c:180
14 0x00007af16d694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
15 0x00007af16d726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)

The bgp keepalive pthread gets deadlocked with itself and consequently
the bgp master pthread gets locked when it attempts to lock
the peerhash_mtx, since it is also locked by the keepalive_pthread

The keepalive pthread is locking the peerhash_mtx in
bgp_keepalives_start.  Next the connection->io_mtx mutex in
bgp_keepalives_send is locked and then when it notices a problem it invokes
bgp_stop_with_notify which relocks the same mutex ( and of course
the relock causes it to get stuck on itself ).  This generates a
deadlock condition.

Modify the code to only hold the connection->io_mtx as short as
possible.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-24 21:01:26 -04:00
Maxence Younsi 035304c25a bgpd: bmp loc-rib peer up/down for vrfs
added bmp bgp peer for vrfs
added peer up vrf in bmp peer up state
added vrf state in bmpbgp
added safe bmp_peer_sendall : bmp_peer_sendall_safe
changed bgp_open_send to call new bgp_open_make
bgp_open_make creates a bgp open packet, now used in bmp for peer up vrf
added hook and call to bgp instance state
vrf peer state is recomputed when interfaces (including vrf itf) go up / down
and when it gets created or removed

Link: e48ba38070
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
2024-10-11 15:14:12 +02:00
sri-mohan1 c853c8d13b bgpd: changes for code maintainability
these changes are for improving the code maintainability and readability

Signed-off-by: sri-mohan1 <sri.mohan@samsung.com>
2024-10-10 23:23:20 +05:30
Donatas Abraitis cadfa693d6 bgpd: Implement BGP dual-as feature
This is helpful for migrations, etc.

The neighbor is configured with:

```
router bgp 65000
 neighbor X local-as 65001 no-prepend replace-as dual-as
```

Neighbor X can use either 65000, or 65001 to peer with.

Closes: https://github.com/FRRouting/frr/issues/13928

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-09-13 10:51:41 +03:00
Donatas Abraitis 19a85c68bf bgpd: Do not send route-refresh if it wasn't negotiated in capabilities
Fixes: 04dfcb14ff ("bgpd: Deprecate Prestandard Route Refresh capability (128)")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-09-01 22:35:22 +03:00
Donatas Abraitis 1d181dfb98 bgpd: Clear previously allocated capabilities values before parsing a new OPEN
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-07 10:16:26 +03:00
Donatas Abraitis b1b1c922a5 bgpd: Do not increment treat-as-withdraw counters if debug is enabled
Increment only if we really treat the UPDATE as withdrawn.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-25 13:41:23 +03:00
Donatas Abraitis e8169f9385 bgpd: Show extended parameters support for the OPEN messages
We did that for the receiving side, but not for a sending side, let's fix it.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-19 05:20:04 +03:00
Donatas Abraitis 0dfe25697f bgpd: Implement neighbor X remote-as auto
In some cases (large scale) it's desired to avoid changing configurations, but
let the BGP to automatically handle ASN changes.

`auto` means the peering can be iBGP or eBGP. It will be automatically detected
and adjusted from the OPEN message.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-07-04 14:42:19 +03:00
vivek 75040a0295 bgpd: Enhance OPEN Tx debug log
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-07-01 13:02:52 -07:00
vivek c6ed1cc16d bgpd: Refine restarter operation - R-bit & F-bit
Introduce BGP-wide flags to denote if BGP has started gracefully
and GR is in progress or not. Use this for setting of the R-bit in
the GR capability, and not a timer which is set for any new
instance creation. Mark graceful restart is complete when the
deferred path selection has been done and route sync with zebra as
well as deferred EOR advertisement has been initiated.

Introduce a function to check on F-bit setting rather than just
base it on configuration.

Subsequent commits will extend these functionalities.

Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-07-01 13:02:45 -07:00
Russ White ed5628fef1
Merge pull request #16213 from opensourcerouting/fix/fqdn_capability_parsing_for_dynamic_capability
bgpd: Check if we have really enough data before doing memcpy for FQDN capability
2024-06-24 16:38:58 -04:00
Russ White f047430576
Merge pull request #16211 from opensourcerouting/fix/dynamic_software_version_sanity_check
bgpd: Check if we have really enough data before doing memcpy for software version
2024-06-24 16:38:50 -04:00
Donatas Abraitis b685ab5e1b bgpd: Check if we have really enough data before doing memcpy for FQDN capability
We advance data pointer (data++), but we do memcpy() with the length that is 1-byte
over, which is technically heap overflow.

```
==411461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50600011da1a at pc 0xc4f45a9786f0 bp 0xffffed1e2740 sp 0xffffed1e1f30
READ of size 4 at 0x50600011da1a thread T0
    0 0xc4f45a9786ec in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x3586ec) (BuildId: e794c5f796eee20c8973d7efb9bf5735e54d44cd)
    1 0xc4f45abf15f8 in bgp_dynamic_capability_fqdn /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3457:4
    2 0xc4f45abdd408 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3911:4
    3 0xc4f45abdbeb4 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
    4 0xc4f45abde2cc in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
    5 0xc4f45a9b6110 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```

Found by fuzzing.

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-06-13 08:12:10 +03:00
Donatas Abraitis 5d7af51c4f bgpd: Check if we have really enough data before doing memcpy for software version
If we receive CAPABILITY message (software-version), we SHOULD check if we really
have enough data before doing memcpy(), that could also lead to buffer overflow.

(data + len > end) is not enough, because after this check we do data++ and later
memcpy(..., data, len). That means we have one more byte.

Hit this through fuzzing by

```
    0 0xaaaaaadf872c in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x35872c) (BuildId: 9c6e455d0d9a20f5a4d2f035b443f50add9564d7)
    1 0xaaaaab06bfbc in bgp_dynamic_capability_software_version /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3713:3
    2 0xaaaaab05ccb4 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3839:4
    3 0xaaaaab05c074 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
    4 0xaaaaab05e48c in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
    5 0xaaaaaae36150 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```

Hit this again by Iggy \m/

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-06-12 22:54:45 +03:00
Donatas Abraitis ae1f3a4851 bgpd: Keep last notification's state about hard reset
When we receive a hard-reset notification, we always show it if it was a hard,
or not.

For sending side, we missed that. Let's display it too.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-06-11 15:51:21 +03:00
Donatas Abraitis 637ab53f75 bgpd: Send End-of-RIB not only if Graceful Restart capability is received
Before we checked for received Graceful Restart capability, but that was also
incorrect, because we SHOULD HAVE checked it per AFI/SAFI instead.

https://datatracker.ietf.org/doc/html/rfc4724 says:

Although the End-of-RIB marker is specified for the purpose of BGP
   graceful restart, it is noted that the generation of such a marker
   upon completion of the initial update would be useful for routing
   convergence in general, and thus the practice is recommended.

Thus, it might be reasonable to send EoR regardless of whether the Graceful Restart
capability is received or not from the peer.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-05-31 15:03:55 +03:00
Donatas Abraitis 0d079e01e5 bgpd: Check if FQDN capability length is in valid ranges
If FQDN capability comes as dynamic capability we should check if the encoding
is proper.

Before this patch we returned an error if the hostname/domainname length check
was > end. But technically, if the length is also == end, this is
a malformed capability, because we use the data incorrectly after we check the
length.

This causes heap overflow (when compiled with address-sanitizer).

Signed-off-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-05-24 10:38:49 +03:00
Donatas Abraitis 150eb73054 bgpd: Send a notification if we receive CAPABILITY message if not exepected
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-05-24 10:35:43 +03:00
Donatas Abraitis 048609103c bgpd: Add sanity check for capability lengths before processing them
This is for CAPABILITY messages, not for OPEN message capabilities.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-05-24 10:35:42 +03:00
Donald Sharp b4c64b3a39 bgpd: Remove unused addition found in clang
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-04-25 14:45:32 -04:00
Donatas Abraitis 30a332dad8 bgpd: Fix errors handling for MP/GR capabilities as dynamic capability
When receiving a MP/GR capability as dynamic capability, but malformed, do not
forget to advance the pointer to avoid hitting infinity loop.

After:
```
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [GS0AQ-HKY0X] 127.0.0.1 rcv CAPABILITY
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 5, length 0
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 0, length 0
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [HFHDS-QT71N][EC 33554494] 127.0.0.1(donatas-pc): unrecognized capability code: 0 - ignored
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 0, code: 0, length 0
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [HFHDS-QT71N][EC 33554494] 127.0.0.1(donatas-pc): unrecognized capability code: 0 - ignored
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 0, code: 0, length 0
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [HFHDS-QT71N][EC 33554494] 127.0.0.1(donatas-pc): unrecognized capability code: 0 - ignored
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 0, code: 0, length 1
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [HFHDS-QT71N][EC 33554494] 127.0.0.1(donatas-pc): unrecognized capability code: 0 - ignored
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:15:28 donatas-laptop bgpd[353550]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
```

Before:
```
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [Z1DRQ-N6Z5F] 127.0.0.1(donatas-pc): Dynamic Capability MultiProtocol Extensions afi/safi invalid (bad-value/unicast)
Mar 29 11:14:54 donatas-laptop bgpd[347675]: [JTVED-VGTQQ] 127.0.0.1(donatas-pc): CAPABILITY has action: 1, code: 1, length 10
```

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-03-30 16:15:08 +02:00
Donatas Abraitis 081f6520ff bgpd: Avoid padding for bgp_paths_limit_capability struct
When sending the packets over the network (dynamic capability) it reports 6 bytes
instead of 5 bytes, and causes some issues between little/big endian machines.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-03-14 09:50:49 +02:00
Donatas Abraitis 78757362f2 bgpd: Allow dynamically disable graceful-restart/long-lived graceful-restart
If we enter `bgp graceful-restart-disable`, make sure we disable the capabilities.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-03-10 18:25:30 +02:00
Donatas Abraitis 77102e853e bgpd: Unset advertised capabilities if capability is disabled
When using dynamic capabilities, do not forget to unset advertised capabilities.

Otherwise, it's kept as advertised.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-03-09 22:23:37 +02:00
Donatas Abraitis 4967bf6d72 bgpd: Send "Send Hold Timer Expired" on such events notification
This is required by the current (latest/-02 draft).

IANA has registered code 8 for "Send Hold Timer Expired" in the "BGP
Error (Notification) Codes" sub-registry under the "Border Gateway
Protocol (BGP) Parameters" registry.

https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-sendholdtimer

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-02-29 15:37:53 +02:00