Commit graph

42 commits

Author SHA1 Message Date
Donald Sharp aa736aa0aa Revert "bgpd: Make keepalive pthread be connection based."
This reverts commit 23bdaba147.
2025-03-06 20:50:41 -05:00
Donald Sharp 23bdaba147 bgpd: Make keepalive pthread be connection based.
Again instead of making the keepalives be peer based
use the connection to make it happen.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-28 10:28:50 -05:00
Donald Sharp 2cd1d00dde bgpd: Convert bgp_keepalive_send to use a connection
The peer is going to eventually have a incoming and
outgoing connection.  Let's send the data based
upon the connection not the peer.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-02-28 10:28:50 -05:00
Donald Sharp d1e7215da0 bgpd: make bgp_keepalives_on|off connection oriented
The bgp_keepalives_on|off functions should use a peer_connection
as a basis for it's operation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-09-10 08:31:25 -04:00
Donald Sharp e16d030c65 *: Convert THREAD_XXX macros to EVENT_XXX macros
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp de2754be3a *: Convert thread_fetch and thread_call to event_fetch and event_call
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp d8bc11a592 *: Add a hash_clean_and_free() function
Add a hash_clean_and_free() function as well as convert
the code to use it.  This function also takes a double
pointer to the hash to set it NULL.  Also it cleanly
does nothing if the pointer is NULL( as a bunch of
code tested for ).

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-21 08:54:21 -04:00
David Lamparter acddc0ed3c *: auto-convert to SPDX License IDs
Done with a combination of regex'ing and banging my head against a wall.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-02-09 14:09:11 +01:00
Russ White 6664d74505
Merge pull request #12641 from samanvithab/bgpd_crash
bgpd: Fix crash during shutdown due to race condition
2023-01-17 09:40:05 -05:00
Samanvitha B Bhargav 8c9d306c8d bgpd: Fix crash during shutdown due to race condition
[New LWP 2524]
[New LWP 2539]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/avi/bin/bgpd -f /run/frr/avi_ns3_bgpd.config -i /opt/avi/etc/avi_ns3_bgpd.'.
Program terminated with signal SIGABRT, Aborted.
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0  0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f92ac8f1740 (LWP 2524))]
0  0x00007f92acb3800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
1  0x00007f92acb17859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2  0x00007f92acb17729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
3  0x00007f92acb28fd6 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
4  0x00007f92accf2164 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
5  0x000055b46be1ef63 in bgp_keepalives_wake () at bgpd/bgp_keepalives.c:311
6  0x000055b46be1f111 in bgp_keepalives_stop (fpt=0x55b46cfacf20, result=<optimized out>) at bgpd/bgp_keepalives.c:323
7  0x00007f92acea9521 in frr_pthread_stop (fpt=0x55b46cfacf20, result=result@entry=0x0) at lib/frr_pthread.c:176
8  0x00007f92acea9586 in frr_pthread_stop_all () at lib/frr_pthread.c:188
9  0x000055b46bdde54a in bgp_pthreads_finish () at bgpd/bgpd.c:8150
10 0x000055b46bd696ca in bgp_exit (status=0) at bgpd/bgp_main.c:210
11 sigint () at bgpd/bgp_main.c:154
12 0x00007f92acecc1e9 in quagga_sigevent_process () at lib/sigevent.c:105
13 0x00007f92aced689a in thread_fetch (m=m@entry=0x55b46cf23540, fetch=fetch@entry=0x7fff95379238) at lib/thread.c:1487
14 0x00007f92aceb2681 in frr_run (master=0x55b46cf23540) at lib/libfrr.c:1010
15 0x000055b46bd676f4 in main (argc=11, argv=0x7fff953795a8) at bgpd/bgp_main.c:482

Root cause:
This is due to race condition between main thread & keepalive thread during clean-up.

This happens when the keepalive thread is processing a wake signal owning the mutex, when meanwhile the main thread tries to stop the keepalives thread.

In main thread, the keepalive thread’s running bit (fpt->running) is set to false, without taking the mutex & then it blocks on mutex.
Meanwhile, keepalive thread which owns the mutex sees that the running bit is false & executes bgp_keepalives_finish() which also frees up mutex.
Main thread that is waiting on mutex with pthread_mutex_lock() will cause core while trying to access mutex.

Fix:
Take the lock in main thread while setting the fpt->running to false.

Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>
2023-01-16 04:22:11 -08:00
Donald Sharp 19a713be1d bgpd: Make bgp_keepalives.c not use MTYPE_TMP
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-12-05 12:17:00 -05:00
vivek 3ffec403e8 bgpd: Modify keepalive debug category
Log keepalive timer expiry against 'debug bgp keepalive' instead
of 'debug bgp neighbor-events'.

Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2022-11-20 22:42:47 -05:00
Mark Stapp 85ba04f389 bgpd: release rcu lock in bgp keepalive pthread
Don't hold the rcu lock in the bgp keepalive pthread: it
blocks the rcu pthread and prevents log-file deletion.

Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2022-09-06 09:07:07 -04:00
Donald Sharp cb1991af8c *: frr_with_mutex change to follow our standard
convert:
	frr_with_mutex(..)

to:
	frr_with_mutex (..)

To make all our code agree with what clang-format is going to produce

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-07-20 15:50:32 -04:00
Donatas Abraitis 6006b807b1 *: Properly use memset() when zeroing
Wrong: memset(&a, 0, sizeof(struct ...));
    Good:  memset(&a, 0, sizeof(a));

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2022-05-11 14:08:47 +03:00
anlan_cs 8e3aae66ce *: remove the checking returned value for hash_get()
Firstly, *keep no change* for `hash_get()` with NULL
`alloc_func`.

Only focus on cases with non-NULL `alloc_func` of
`hash_get()`.

Since `hash_get()` with non-NULL `alloc_func` parameter
shall not fail, just ignore the returned value of it.
The returned value must not be NULL.
So in this case, remove the unnecessary checking NULL
or not for the returned value and add `void` in front
of it.

Importantly, also *keep no change* for the two cases with
non-NULL `alloc_func` -
1) Use `assert(<returned_data> == <searching_data>)` to
   ensure it is a created node, not a found node.
   Refer to `isis_vertex_queue_insert()` of isisd, there
   are many examples of this case in isid.
2) Use `<returned_data> != <searching_data>` to judge it
   is a found node, then free <searching_data>.
   Refer to `aspath_intern()` of bgpd, there are many
   examples of this case in bgpd.

Here, <returned_data> is the returned value from `hash_get()`,
and <searching_data> is the data, which is to be put into
hash table.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2022-05-03 00:41:48 +08:00
David Lamparter 2b64873d24 *: generously apply const
const const const your boat, merrily down the stream...

Signed-off-by: David Lamparter <equinox@diac24.net>
2019-12-02 15:01:29 +01:00
Quentin Young bfc18a0205 bgpd: do not send keepalives when KA timer is 0
RFC4271 specifies behavior when the hold timer is sent to zero - we
should not send keepalives or run a hold timer. But FRR, and other
vendors, allow the keepalive timer to be set to zero with a nonzero hold
timer. In this case we were sending keepalives constantly and maxing out
a pthread to do so. Instead behave similarly to other vendors and do not
send keepalives.

Unsure what the utility of this is, but blasting keepalives is
definitely the wrong thing to do.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2019-09-16 16:15:07 +00:00
David Lamparter 00dffa8cde lib: add frr_with_mutex() block-wrapper
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited.  This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2019-09-03 17:15:17 +02:00
Quentin Young d8b87afe7c lib: hashing functions should take const arguments
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.

BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2019-05-14 21:23:08 +00:00
Tim Bray e3b78da875 *: Rename backet to bucket
Presume typo from original author

Signed-off-by: Tim Bray <tim@kooky.org>
2019-02-25 16:22:36 +00:00
Ruben Kerkhof 4d762f2607 Treewide: use ANSI function definitions
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
2019-01-24 11:21:59 +01:00
Donald Sharp c80bedb83b lib, bgpd: Convert frr_pthread_set_name to only cause it to set os name of the thread
The current invocation of frr_pthread_set_name was causing it reset the os_name.
There is no need for this, we now always create the pthread appropriately
to have both name and os_name.  So convert this function to a simple
call through of the pthread call now.

Before(any of these changes):
sharpd@robot ~/frr1> ps -L -p 16895
  PID   LWP TTY          TIME CMD
16895 16895 ?        00:01:39 bgpd
16895 16896 ?        00:00:54
16895 16897 ?        00:00:07 bgpd_ka

After:
sharpd@donna ~/frr1> ps -L -p 1752
  PID   LWP TTY          TIME CMD
 1752  1752 ?        00:00:00 bgpd
 1752  1753 ?        00:00:00 bgpd_io
 1752  1754 ?        00:00:00 bgpd_ka

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2019-01-09 14:59:22 -05:00
Donald Sharp 74df8d6d9d *: Replace hash_cmp function return value to a bool
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing.  So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-10-19 13:14:45 -04:00
David Lamparter 1ac267a2d9 lib: remove frr_pthread->id
All I can see is an unneccessary complication.  If there's some purpose
here it needs to be documented...

Signed-off-by: David Lamparter <equinox@diac24.net>
2018-09-19 22:01:46 +02:00
Chirag Shah 57019528a0 *: pthread set name abstraction
Testing Done:

 TOR#cat /proc/2670/task/2672/comm
 bgpd_ka

TOR# ps H -C bgpd -o 'pid tid cmd comm'
  PID   TID CMD                         COMMAND
  2670  2670 /usr/lib/frr/bgpd -M snmp - bgpd
  2670  2671 /usr/lib/frr/bgpd -M snmp - bgpd
  2670  2672 /usr/lib/frr/bgpd -M snmp - bgpd_ka

Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
2018-08-29 15:41:54 -07:00
Donald Sharp 68ede9c401 bgpd: zlog_warn to assert for code that must be executed first
In bgp_keepalives.c, it was noticed that we were
ensuring that we called an intialization function first,
but this is a development escape in that once this
was fixed we never see it.  So if a developer moves
this assumption around, let's crash the program and
lead them to this spot instead of silently ignoring
the problem.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-08-16 08:24:22 -04:00
Chirag Shah a9198bc1e2 bgpd: add keepalive thread name
Testing Done:

Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
2018-07-27 21:04:47 -07:00
Quentin Young 2ccf91b108
bgpd: fix incorrect keepalive timer evaluation
Incorrect check for sentinel value effectively caused peers to sometimes
use the keepalive timer value of other peers, which sometimes led to
hold timer expiry.

Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
2018-02-21 12:15:17 -05:00
Quentin Young 096476ddb0
bgpd: check flags before attempting keepalive ops
If a peer already has keepalives turned on when asking to turn them on,
return immediately. Same thing for turning them off.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-01-24 17:47:17 -05:00
Quentin Young a715eab3ce
bgpd: update pthreads to use lib changes
Use the new threading facilities provided in lib/ to streamline the
threads used in bgpd. In particular, all of the lifecycle code has been
removed from the I/O thread and replaced with the default loop. Did not
do the same to the keepalives thread as it is much smaller (doesn't need
the event system).

Also cleaned up some comments to match the style guide.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-01-24 15:30:55 -05:00
Quentin Young 934af4587f
bgpd: turn off keepalives when sending NOTIFY
This is necessary because otherwise between the time we wipe the output
buffer and the time we push the NOTIFY onto it, the KA generation thread
could have pushed a KEEPALIVE in the middle.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:18:07 -05:00
Quentin Young 152456fe23
bgpd: rebase onto master
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:18:04 -05:00
Quentin Young 6ee8ea1cf9
bgpd: fix includes for bgp_keeaplives.c
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:18:02 -05:00
Quentin Young 5c0c651c0a
bgpd: restyle bgp_keepalives.[ch]
And update copyright header.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:18:02 -05:00
Quentin Young b72b6f4fc9
bgpd: rename peer_keepalives* --> bgp_keepalives*
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:59 -05:00
Quentin Young 424ab01d0f
bgpd: implement buffered reads
* Move and modify all network input related code to bgp_io.c
* Add a real input buffer to `struct peer`
* Move connection initialization to its own thread.c task instead of
  piggybacking off of bgp_read()
* Tons of little fixups

Primary changes are in bgp_packet.[ch], bgp_io.[ch], bgp_fsm.[ch].
Changes made elsewhere are almost exclusively refactoring peer->ibuf to
peer->curr since peer->ibuf is now the true FIFO packet input buffer
while peer->curr represents the packet currently being processed by the
main pthread.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:59 -05:00
Quentin Young 0ca8b79f38
bgpd: use new threading infra
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:59 -05:00
Quentin Young bd8b71e405
bgpd: use hash table for bgp_keepalives.c
Large numbers of peers makes insertion and removal time for a linked
list non-negligible.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:59 -05:00
Quentin Young 419dfe6a70
bgpd: dynamically allocate synchronization primitives
Changes all synchronization primitives to be dynamically allocated. This
should help catch any subtle errors in pthread lifecycles.

This change also pre-initializes synchronization primitives before
threads begin to run, eliminating a potential race condition that
probably would have caused a segfault on startup on a very fast box.

Also changes mutex and condition variable allocations to use
MTYPE_PTHREAD and updates tests to do the proper initializations.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:58 -05:00
Quentin Young 49507a6f6a
bgpd: remove unused struct thread from peer
* Remove t_write
* Remove t_keepalive

These have been replaced by pthreads and are no longer needed. Since
some code looks at these values to determine if the threads are
scheduled, also add a new bitfield to store the same information.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:58 -05:00
Quentin Young 03014d48f4
bgpd: put BGP keepalives in a pthread
This patch, in tandem with moving packet writes into a dedicated kernel
thread, fixes session flaps caused by long-running internal operations
starving the (old) userspace write thread.

BGP keepalives are now produced by a kernel thread and placed onto the
peer's output queue. These are then consumed by the write thread. Both
of these tasks are concurrent with the rest of bgpd, obviating the
session flaps described above.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2017-11-30 16:17:57 -05:00