Commit graph

493 commits

Author SHA1 Message Date
Pooja Jagadeesh Doijode 7a935bb1ad zebra: Change zserv_accept to use global zsock FD
Problem:
Zebra crashed while going down. This happened because zebra was
trying to process a new client accept request after closing ZAPI's
listener socket zsock and setting it to -1.

Fix:
Skip rescheduling zserv_accept() and accepting a new client if global
ZAPI listener socket FD, zsock is already closed and set to -1.
Also use global ZAPI listener socket FD zsock in zserv_accept() instead
of using a copy of zsock.

Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
2025-04-25 20:27:38 -07:00
Donald Sharp 937a9fb3e9 zebra: Limit reading packets when MetaQ is full
Currently Zebra is just reading packets off the zapi
wire and stacking them up for processing in zebra
in the future.  When there is significant churn
in the network the size of zebra can grow without
bounds due to the MetaQ sizing constraints.  This
ends up showing by the number of nexthops in the
system.  Reducing the number of packets serviced
to limit the metaQ size to the packets to process
allieviates this problem.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-03-25 09:10:46 -04:00
Donald Sharp 4d6f5c7e27 zebra: Rework the stale client list to a typesafe list
The stale client list was just a linked list, let's use
the typesafe list.

Signed-off-by: Donald Sharp <donaldsharp72@gmail.com>
2025-03-19 13:43:00 -04:00
Donald Sharp 24d293277f zebra: Convert the zrouter.client_list to a typesafe list
This list should just be a typesafe list.

Signed-off-by: Donald Sharp <donaldsharp72@gmail.com>
2025-03-19 13:27:36 -04:00
Donald Sharp 4b96752737 zebra: Add some documentation on when zserv_open should be used
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2025-01-17 10:16:48 -05:00
Donald Sharp 3a53b2dc4f zebra: Give a bit more data about zclient connection on errors
When debugging a crash I noticed that sometimes we talked about
a zclient connection in relation to the fd associated with it
and sometimes we did not.  Let's just always give the data
associated with the fd.  It will make it a bit easier for me
to follow the transitions.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-12-13 11:21:26 -05:00
Mark Stapp 506097a1b9 zebra: separate zebra ZAPI server open and accept
Separate zebra's ZAPI server socket handling into two phases:
an early phase that opens the socket, and a later phase that
starts listening for client connections.

Signed-off-by: Mark Stapp <mjs@cisco.com>
2024-12-03 09:44:46 -05:00
Donald Sharp e0437aba6d zebra: Add more vrf name to debugs
Trying to debug some cross vrf stuff in zebra and frankly
it's hard to grep the file for the routes you are interested
in.  Let's clean this up some and get a bit better
information for us developers

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-09-11 15:30:43 -04:00
vivek b5682ffbf0 *: Add and use option for graceful (re)start
Add a new start option "-K" to libfrr to denote a graceful start,
and use it in zebra and bgpd.

zebra will use this option to denote a planned FRR graceful restart
(supporting only bgpd currently) to wait for a route sync completion
from bgpd before cleaning up old stale routes from the FIB. An optional
timer provides an upper-bounds for this cleanup.

bgpd will use this option to denote either a planned FRR graceful
restart or a bgpd-only graceful restart, and this will drive the BGP
GR restarting router procedures.

Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-07-01 13:02:52 -07:00
Donald Sharp 8d8f12ba8e zebra: Actually display I/O buffer sizes
An operator found a situation where zebra was
backing up in a significant way towards BGP
with EVPN changes taking up some serious amounts
of memory.  The key lines that would have clued
us in on it were behind a dev build.  Let's change
this.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-04-09 10:08:03 -04:00
Donald Sharp 9ef76cff98 zebra: Cleanup leaked memory on shutdown from GR code
Recent commit: 6b2554b94a
Exposed, via Address Sanitation, that memory was being
leaked.  Unfortunately the CI system did not catch this.

Two pieces of memory were being lost: The zserv client
data structure as well as anything on the client->gr_info_queue.
Clean these up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-03-12 21:18:02 -04:00
Rajasekar Raja a8efa994da zebra: backpressure - Zebra push back on Buffer/Stream creation
Currently, the way zebra works is it creates pthread per client (BGP is
of interest in this case) and this thread loops itself in zserv_read()
to check for any incoming data. If there is one, then it reads,
validates and adds it in the ibuf_fifo signalling the main thread to
process the message. The main thread when it gets a change, processes
the message, and invokes the function pointer registered in the header
command. (Ex: zserv_handlers).

Finally, if all of this was successful, this task reschedules itself and
loops in zserv_read() again

However, if there are already items on ibuf FIFO, that means zebra is
slow in processing. And with the current mechanism if Zebra main is
busy, the ibuf FIFO keeps growing holding up the memory.

Show memory zebra:(Example: 15k streams hoarding ~160 MB of data)
--- qmem libfrr ---
Stream             :       44 variable   3432352    15042 161243800

Fix:
Client IO Thread: (zserv_read)
 - Stop doing the read events when we know there are X number of items
   on the FIFO already.(X - zebra zapi-packets <1-10000> (Default-1000)

 - Determine the number of items on the zserv->ibuf_fifo. Subtract this
   from the work items and only pull the number of items off that would
   take us to X items on the ibuf_fifo again.

 - If the number of items in the ibuf_fifo has reached to the maximum
      * Either initially when zserv_read() is called (or)
      * when processing the remainders of the incoming buffer
   the client IO thread is woken by the the zebra main.

Main thread: (zserv_process_message)
If the client ibuf always schedules a wakeup to the client IO to read
more items from the socked buffer. This way we ensure
 - Client IO thread always tries to read the socket buffer and add more
   items to the ibuf_fifo (until max limit)
 - hidden config change (zebra zapi-packets <>) is taken into account

Ticket: #3390099

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-03-07 15:16:33 -08:00
Donald Sharp 275edb5c16 *: Rename ZEBRA_NHRP_NEIGH_XXX to ZEBRA_NEIGH_XXX
This does not need to be nhrp specific.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-01-22 12:14:59 -05:00
Philippe Guibert d41b425ab3 zebra: add client counter for nhg operations
Add three counters that account for the nhg operations
that are using the zebra API with the NHG_ADD and NHG_DEL
commands.

> # show zebra client
> [..]
> Type        Add         Update      Del
> ==================================================
> IPv4        100         0           0
> IPv6        0           0           0
> Redist:v4   0           0           0
> Redist:v6   0           0           0
> NHG         1           1           1
> VRF         3           0           0
> [..]

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-11-30 14:25:00 +01:00
Donatas Abraitis 9a0bb7bcd1
Merge pull request #13333 from donaldsharp/vrf_bitmap_cleanup
*: Rearrange vrf_bitmap_X api to reduce memory footprint
2023-07-04 22:11:11 +03:00
Donatas Abraitis 97072d144e zebra: Free Zebra client resources
Memory leaks started flowing:

```
AddressSanitizer Topotests Part 0:  15 KB -> 283 KB
AddressSanitizer Topotests Part 1:  1 KB -> 495 KB
AddressSanitizer Topotests Part 2:  13 KB -> 478 KB
AddressSanitizer Topotests Part 3:  39 KB -> 213 KB
AddressSanitizer Topotests Part 4:  30 KB -> 836 KB
AddressSanitizer Topotests Part 5:  0 bytes -> 356 KB
AddressSanitizer Topotests Part 6:  86 KB -> 783 KB
AddressSanitizer Topotests Part 7:  0 bytes -> 354 KB
AddressSanitizer Topotests Part 8:  0 bytes -> 62 KB
AddressSanitizer Topotests Part 9:  408 KB -> 518 KB
```

```
Direct leak of 3584 byte(s) in 1 object(s) allocated from:
    #0 0x7f1957b02d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
    #1 0x559895c55df0 in qcalloc lib/memory.c:105
    #2 0x559895bc1cdf in zserv_client_create zebra/zserv.c:743
    #3 0x559895bc1cdf in zserv_accept zebra/zserv.c:880
    #4 0x559895cf3438 in event_call lib/event.c:1995
    #5 0x559895c3901c in frr_run lib/libfrr.c:1213
    #6 0x559895a698f1 in main zebra/main.c:472
    #7 0x7f195635ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
```

Fixes b20acd0 ("bgpd: Use synchronous way to get labels from Zebra")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-27 22:48:39 +03:00
Donald Sharp 161972c9fe *: Rearrange vrf_bitmap_X api to reduce memory footprint
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable"  | grep "VRF BIT HASH" | wc -l
3570

3570 hashes for bitmaps associated with the vrf.  This is a very
large number of hashes.  Let's do two things:

a) Reduce the created size of the actually created hashes to 2
instead of 32.

b) Delay generation of the hash *until* a set operation happens.
As that no hash directly implies a unset value if/when checked.

This reduces the number of hashes to 61 in my setup for normal
operation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-26 14:59:21 -04:00
Donatas Abraitis 52dde8747b zebra: Ignore non GR-aware zclient handling for BGP
This is for synchronous client (label/table manager) - aka session_id == 1.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Donatas Abraitis 20c2c8787a zebra: Show session id when printing an error when the client disconnects
Before:

```
2023/06/18 22:00:42 ZEBRA: [VXKFG-8SJRV][EC 4043309121] Client 'bgp' encountered an error and is shutting down.
2023/06/18 22:00:42 ZEBRA: [VXKFG-8SJRV][EC 4043309121] Client 'bgp' encountered an error and is shutting down.
```

After:

```
2023/06/18 22:06:44 ZEBRA: [N5M5Y-J5BPG][EC 4043309121] Client 'bgp' (session id 0) encountered an error and is shutting down.
2023/06/18 22:06:44 ZEBRA: [N5M5Y-J5BPG][EC 4043309121] Client 'bgp' (session id 1) encountered an error and is shutting down.
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Donald Sharp c2cf522347 zebra: No need to set msg to NULL
The msg value is always reset to something new before it is used inside
the mutex.  No need to set it to NULL.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-01 08:54:25 -04:00
Donald Sharp 644a8d3560 zebra: remove current_afi as that it is no longer used
After the restructure of the gr code to allow zebra_gr
to have individual cleanups of afi, this is no longer necessary.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 15:40:56 -04:00
Donald Sharp 0c1fd82df6 zebra: GR code could potentially stop running
When GR is running and attempting to clear up a node
if the node that is currently saved and we are coming
back to happens to be deleted during the time zebra
suspends the GR code due to hitting the node limit
then zebra GR code will just completely stop processing
and potentially leave stale nodes around forever.

Let's just remove this hole and process what we can.
Can you imagine trying to debug this after the fact?

If we remove a node then that counts toward the maximum
to process of ZEBRA_MAX_STALE_ROUTE_COUNT.  This should
prevent any non-processing with a slightly larger cost
of having to look at a few nodes repeatedly

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 07:48:42 -04:00
Donald Sharp 24a58196dd *: Convert event.h to frrevent.h
We should probably prevent any type of namespace collision
with something else.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp e16d030c65 *: Convert THREAD_XXX macros to EVENT_XXX macros
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp 4f830a0799 *: Convert thread_timer_remain_XXX to event_timer_remain_XXX
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp 332beb64b8 *: Convert thread_cancelXXX to event_cancelXXX
Modify the code base so that thread_cancel becomes event_cancel

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp 907a2395f4 *: Convert thread_add_XXX functions to event_add_XXX
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp e6685141aa *: Rename struct thread to struct event
Effectively a massive search and replace of
`struct thread` to `struct event`.  Using the
term `thread` gives people the thought that
this event system is a pthread when it is not

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp cb37cb336a *: Rename thread.[ch] to event.[ch]
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system.  There is a continual
problem where people are confusing `struct thread` with a true
pthread.  In reality, our entire thread.c is an event system.

In this commit rename the thread.[ch] files to event.[ch].

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:16 -04:00
David Lamparter acddc0ed3c *: auto-convert to SPDX License IDs
Done with a combination of regex'ing and banging my head against a wall.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-02-09 14:09:11 +01:00
David Lamparter ccac11096c zebra: do not load/store wider-than-ptr atomics
Most 32-bit architectures cannot do atomic loads and stores of data
wider than their pointer size, i.e. 32 bit.  Funnily enough they
generally *can* do a CAS2, i.e., 64-bit compare-and-swap, but while a
CAS can emulate atomic add/bitops, loads and stores aren't available.

Replace with a mutex;  since this is 99% used from the zserv thread, the
mutex should take the local-to-thread fast path anyway.  And while one
atomic might be faster than a mutex lock/unlock, we're doing several
here, and at some point a mutex wins on speed anyway.

This fixes build on armel, mipsel, m68k, powerpc, and sh4.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-01-06 16:59:02 +01:00
Mark Stapp 3e6ff764a1 zebra: fix a couple of typos
Fix a couple of typos in vty prompt and output text.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-01-03 15:22:37 -05:00
Mark Stapp c0ce4875ff zebra: use real MTYPEs for various objects
Don't use MTYPE_TMP for many things in zebra: add specific
mem types.

Signed-off-by: Mark Stapp <mjs@labn.net>
2022-12-05 10:55:35 -05:00
Donald Sharp ef03888333 zebra: Convert time to uint64_t for zclient data structures
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-08-24 08:29:50 -04:00
Donald Sharp cb1991af8c *: frr_with_mutex change to follow our standard
convert:
	frr_with_mutex(..)

to:
	frr_with_mutex (..)

To make all our code agree with what clang-format is going to produce

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-07-20 15:50:32 -04:00
Donald Sharp 17be83bf99 *: Fix spelling of Gracefull
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-04-02 07:46:19 -04:00
Donald Sharp cc9f21da22 *: Change thread->func to return void instead of int
The int return value is never used.  Modify the code
base to just return a void instead.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-02-23 19:56:04 -05:00
Donald Sharp f2ada31cba zebra: Expand v4/v6 route space
At some scale we eventually run out of room displaying v4/v6 route
totals for `show zebra client summ`:
janelle# show zebra client summ
Name      Connect Time    Last Read  Last Write  IPv4 Routes       IPv6 Routes
--------------------------------------------------------------------------------
bgp           04w0d18h     00:00:19    00:01:2411729127/4052681  2037786/903094

This total over ran the space in just a little over a week of uptime.
Expand to have a bit more room.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-11-17 07:47:28 -05:00
Donald Sharp 76ab1a9702 zebra: Display how long zebra is expected to wait for GR
When a client sends to zebra that GR mode is being turned
on.  The client also passes down the time zebra should hold
onto the routes.  Display this time with the output
of the `show zebra client` command as well.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-10-07 12:08:42 -04:00
Donald Sharp cc3d834308 zebra: GR data was being printed 2 times for show zebra client
When issuing the `show zebra client` command data about
Graceful Restart state is being printed 2 times.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-10-07 12:06:00 -04:00
Donald Sharp 06302ecb88 zebra: On client shutdown cleanup any vrf labels associated with it
When a vrf label is created by a client and the client disconnects
we should clean up any vrf labels associated with that client.

eva# show mpls table
 Inbound Label  Type   Nexthop  Outbound Label
 -----------------------------------------------
 1000           SHARP  RED      -

eva# exit
sharpd@eva ~/f/zebra (label_destruction)> ps -ef | grep frr
root     4017793       1  0 13:57 ?        00:00:00 /usr/lib/frr/watchfrr -d -F datacenter --log file:/var/log/frr/watchfrr.log --log-level debug zebra bgpd ospfd isisd pimd eigrpd sharpd staticd
frr      4017824       1  0 13:57 ?        00:00:00 /usr/lib/frr/zebra -d -F datacenter --log file:/tmp/zebra.log -r --graceful_restart 60 -A 127.0.0.1 -s 90000000
frr      4017829       1  0 13:57 ?        00:00:00 /usr/lib/frr/bgpd -d -F datacenter -M rpki -A 127.0.0.1
frr      4017836       1  0 13:57 ?        00:00:00 /usr/lib/frr/ospfd -d -F datacenter -A 127.0.0.1
frr      4017839       1  0 13:57 ?        00:00:00 /usr/lib/frr/isisd -d -F datacenter -A 127.0.0.1
frr      4017842       1  0 13:57 ?        00:00:00 /usr/lib/frr/pimd -d -F datacenter -A 127.0.0.1
frr      4017865       1  0 13:57 ?        00:00:00 /usr/lib/frr/eigrpd -d -F datacenter -A 127.0.0.1
frr      4017869       1  0 13:57 ?        00:00:00 /usr/lib/frr/sharpd -d -F datacenter -A 127.0.0.1
frr      4017888       1  0 13:57 ?        00:00:00 /usr/lib/frr/staticd -d -F datacenter -A 127.0.0.1
sharpd   4018624 3938423  0 14:02 pts/10   00:00:00 grep --color=auto frr
sharpd@eva ~/f/zebra (label_destruction)> sudo kill -9 4017869

sharpd@eva ~/f/zebra (label_destruction)> sudo vtysh -c "show mpls table"
sharpd@eva ~/f/zebra (label_destruction)>

Fixes: #1787
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-07-21 14:04:36 -04:00
Donald Sharp 9691937d8b zebra: Move individual lines to table in show zebra client command
Move some individual add/delete lines to the table format in
the `show zebra client` command

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-06-10 20:41:35 -04:00
Donald Sharp a9d8faf7ab zebra: Add message counts for show zebra client
There were counters FRR was keeping but never displaying.  Add them
in.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-06-10 20:24:44 -04:00
David Lamparter 6a0eb6885b *: drop zassert.h
It's not actually working properly...

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2021-04-23 12:06:35 +02:00
Philippe Guibert 7723e8d3fd zebra: link layer config and notification, implementation in zebra
zebra implements zebra api for configuring link layer information. that
can be an arp entry (for ipv4) or ipv6 neighbor discovery entry. This
can also be an ipv4/ipv6 entry associated to an underlay ipv4 address,
as it is used in gre point to multipoint interfaces.
this api will also be used as monitoring. an hash list is instantiated
into zebra (this is the vrf bitmap). each client interested in those entries
in a specific vrf, will listen for following messages: entries added, removed,
or who-has messages.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2021-04-09 18:29:58 +02:00
Donald Sharp 99a30b4760 zebra: Display instance id as part of show zebra client summ
When displaying `show zebra client summ` when we have instances
running, display the instance number as well.

New Output:

sharpd@eva ~/frr7 (instance_data)> vtysh -c "show zebra client summ"
Name      Connect Time    Last Read  Last Write  IPv4 Routes       IPv6 Routes
--------------------------------------------------------------------------------
ospf[1]       00:00:02     00:00:02    00:00:02       0/0              0/0
ospf[5]       00:00:02     00:00:02    00:00:02       0/0              0/0
sharp         00:00:02     00:00:02    00:00:02       0/0              0/0
static        00:00:02     00:00:02    00:00:02       0/0              0/0
Routes column shows (added+updated)/deleted

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-02-04 08:35:14 -05:00
Donald Sharp 7ed5844bef zebra: Allow show zebra client to give clues about route update status
When entering `show zebra client` allow the display of the client->notify_status
for route updates.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2020-12-10 12:59:14 -05:00
Karen Schoener 581e797e02 zebra: Adding zapi client close notification
When zebra detects a client close, send a zapi client close
notification.

Signed-off-by: Karen Schoener <karen@voltanet.io>
2020-12-07 18:22:36 -05:00
Donatas Abraitis 2dbe669bdf :* Convert prefix2str to %pFX
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2020-10-22 09:07:41 +03:00
Stephen Worley 24db1a7b9a zebra: handle proto NHG uninstall client disconnect
Add code to handle proto-based NHG uninstalling after
the owning client disconnects.

This is handled the same way as rib_score_proto() but for now
we are ignoring instance.

Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
2020-09-28 12:40:59 -04:00