Currently, when the locator block and sid block differs, staticd would
still go ahead and request zebra to allocate the SID which it does if
there is atleast one match (from any locators).
Only when staticd tries to install the route, it sees that the locator
block and sid block are different and avoids installing the route.
Fix:
Check if the locator block and sid block match before even requesting
Zebra to allocate one.
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
CI tests are showing cases where staticd is connecting to
zebra after config is read in and the nexthops are never
being registered w/ zebra:
2025/03/11 15:39:44 STATIC: [T83RR-8SM5G] staticd 10.4-dev starting: vty@2616
2025/03/11 15:39:45 STATIC: [GH3PB-C7X4Y] Static Route to 13.13.13.13/32 not installed currently because dependent config not fully available
2025/03/11 15:39:45 STATIC: [RHJK1-M5FAR] static_zebra_nht_register: Failure to send nexthop 1.1.1.2/32 for 11.11.11.11/32 to zebra
2025/03/11 15:39:45 STATIC: [M7Q4P-46WDR] vty[14]@> enable
Zebra shows connection time as:
2025/03/11 15:39:45.933343 ZEBRA: [V98V0-MTWPF] client 5 says hello and bids fair to announce only static routes vrf=0
As a result staticd never installs the route because it has no nexthop
tracking to say that the route could be installed.
Modify staticd on startup to go through it's nexthops and dump them to
zebra to allow the staticd state machine to get to work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This commit extends the `static_zebra_srv6_sid_uninstall` function to
allow staticd to remove SRv6 uA SIDs from the zebra RIB.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
This commit extends the `static_zebra_srv6_sid_install` function to
allow staticd to install SRv6 uA SIDs into the zebra RIB.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
When removing an SRv6 uA SID, staticd should ask SRv6 SID Manager to
release the SID.
Currently, `static_zebra_release_srv6_sid` does not allow to release uA
SIDs.
This commit extends `static_zebra_release_srv6_sid` to allow staticd to
release SRv6 uA SIDs.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
In order to configure an SRv6 uA SID in staticd, staticd should request
SRv6 SID Manager to allocate a SID bound to the uA behavior.
Currently, `static_zebra_request_srv6_sid` does not support requesting
SIDs bound to the uA behavior.
This commit extends the `static_zebra_request_srv6_sid` function to
enable staticd to request SIDs bound to the uA behavior.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
With recent commit:
c1adc8f1d6 staticd has started to crash
aproximately 1/10 of the tine in the static_vrf topotest
(gdb) bt
0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
1 __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
2 __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3 0x00007fb1a6442476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4 0x00007fb1a6950823 in core_handler (signo=6, siginfo=0x7ffd6d832ff0, context=0x7ffd6d832ec0) at lib/sigevent.c:268
5 <signal handler called>
6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:44
7 __pthread_kill_internal (signo=6, threadid=140400982256064) at ./nptl/pthread_kill.c:78
8 __GI___pthread_kill (threadid=140400982256064, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9 0x00007fb1a6442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x00007fb1a64287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x00007fb1a699a422 in _zlog_assert_failed (xref=0x55f7dfd3dac0 <_xref.117>,
extra=0x55f7dfd30c30 "BUG: NH %pFX registered but not in hashtable") at lib/zlog.c:789
12 0x000055f7dfd1201f in static_zebra_nht_register (nh=0x55f7fd2ecd80, reg=true) at staticd/static_zebra.c:333
13 0x000055f7dfd29c9d in static_install_nexthop (nh=0x55f7fd2ecd80) at staticd/static_routes.c:299
14 0x000055f7dfd2a126 in static_fixup_vrf (vrf=0x55f7fd2333a0, stable=0x55f7fd271030, afi=AFI_IP, safi=SAFI_UNICAST)
at staticd/static_routes.c:441
15 0x000055f7dfd2a2be in static_fixup_vrf_ids (vrf=0x55f7fd2333a0) at staticd/static_routes.c:494
16 0x000055f7dfd15b53 in static_vrf_enable (vrf=0x55f7fd2333a0) at staticd/static_vrf.c:124
17 0x00007fb1a696ffa5 in vrf_enable (vrf=0x55f7fd2333a0) at lib/vrf.c:325
18 0x00007fb1a6991c87 in zclient_vrf_add (cmd=33, zclient=0x55f7fd29f740, length=76, vrf_id=8) at lib/zclient.c:2701
19 0x00007fb1a6996cba in zclient_read (thread=0x7ffd6d834230) at lib/zclient.c:4764
20 0x00007fb1a696bd9b in event_call (thread=0x7ffd6d834230) at lib/event.c:2019
21 0x00007fb1a68e1a3a in frr_run (master=0x55f7fd102e10) at lib/libfrr.c:1246
22 0x000055f7dfd1081e in main (argc=7, argv=0x7ffd6d834478, envp=0x7ffd6d8344b8) at staticd/static_main.c:193
Tracking this down, the crash is because the nh believes that is already
registered but lookup fails, causing this assert. Looking at the code
static_fixup_vrf is changing the vrf_id. I put a zlog_debug right
before the change of the nh vrf_id and noticed that the vrf id was
UNKNOWN. So, the code is attempting to register into zebra the nexthop
with a vrf unknown( which will be ignored ).
Modify the code in the registration process to notice that the nh is
still UNKNOWN and as such nothing should be done.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Some codepoints can not be read by interoperating with CISCO.
This is because PSP/USP flavor are used by default, and the display of
the isis output has to be adapted.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The SRv6 support in staticd (PR #16894) does not set the correct SID
parameters (block length, node length, function length).
This commit fixes the issue and computes the correct parameters.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
When staticd receives a `ZAPI_SRV6_SID_RELEASED` notification from SRv6
SID Manager, it tries to unset the validity flag of `sid`. But since
the `sid` variable is NULL, we get a NULL pointer dereference.
```
=================================================================
==13815==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000060 (pc 0xc14b813d9eac bp 0xffffcb135a40 sp 0xffffcb135a40 T0)
==13815==The signal is caused by a READ memory access.
==13815==Hint: address points to the zero page.
#0 0xc14b813d9eac in static_zebra_srv6_sid_notify staticd/static_zebra.c:1172
#1 0xe44e7aa2c194 in zclient_read lib/zclient.c:4746
#2 0xe44e7a9b69d8 in event_call lib/event.c:1984
#3 0xe44e7a85ac28 in frr_run lib/libfrr.c:1246
#4 0xc14b813ccf98 in main staticd/static_main.c:193
#5 0xe44e7a4773f8 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#6 0xe44e7a4774c8 in __libc_start_main_impl ../csu/libc-start.c:392
#7 0xc14b813cc92c in _start (/usr/lib/frr/staticd+0x1c92c)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV staticd/static_zebra.c:1172 in static_zebra_srv6_sid_notify
==13815==ABORTING
```
This commit fixes the problem by doing a SID lookup first. If the SID
can't be found, we log an error and return. If the SID is found, we go
ahead and unset the validity flag.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
Currently FRR needs to send a uint16_t value for the number
of nexthops as well it needs the ability to properly decode
all of this. Find and handle all the places that this happens.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use `vtysh` with this input file:
```
ip route A nh1
ip route A nh2
ip route B nh1
ip route B nh2
```
When running "ip route B" with "nh1" and "nh2", the procedure maybe is:
1) Create the two nexthops: "nh1" and "nh2".
2) Register "nh1" with `static_zebra_nht_register()`, then the states of both
"nh1" and "nht2" are set to "STATIC_SENT_TO_ZEBRA".
3) Register "nh2" with `static_zebra_nht_register()`, then only the routes with
nexthop of "STATIC_START" will be sent to zebra.
So, send the routes with the nexthop of "STATIC_SENT_TO_ZEBRA" to zebra.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Currently, staticd configuration is tightly coupled with VRF existence.
Because of that, it has to use a hack in NB infrastructure to create a
VRF configuration when at least one static route is configured for this
VRF. This hack is incompatible with mgmtd, because mgmtd doesn't execute
configuration callbacks. Because of that, the configuration may become
out of sync between mgmtd and staticd. There are two main cases:
1. Create static route in a VRF. The VRF data node will be created
automatically in staticd by the NB hack, but not in mgmtd.
2. Delete VRF which has some static routes configured. The static route
configuration will be deleted from staticd by the NB hack, but not
from mgmtd.
To fix the problem, decouple configuration of static routes from VRF
configuration. Now it is possible to configure static routes even if the
VRF doesn't exist yet. Once the VRF is created, staticd applies all the
preconfigured routes.
This change also fixes the problem with static routes being preserved in
the system when staticd "control-plane-protocol" container is deleted
but the VRF is still configured.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Send `ZEBRA_ROUTE_NOTIFY_REQUEST` rather than relying on the options
field in zclient startup.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
...so that multiple functions can be subscribed.
The create/destroy hooks are renamed to real/unreal because that's what
they *actually* signal.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Daemons like staticd already implement nexthop zapi (un)registration.
b7ca809d1c ("lib: BFD automatic source selection") has implemented a
concurrent nexthop (un)registration. Some nexthop could be unregistred
by the bfd whereas they were still needed by the daemon.
Let the deamons deal with nexthop zapi (un)registration.
Fixes: b7ca809d1c ("lib: BFD automatic source selection")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
b7ca809d1c ("lib: BFD automatic source selection") has added a dedicated
zclient socket for nht tracking. Since the bfd lib is used by daemons
that already has a zclient socket, those daemons has now a second
zclient socket. However, zebra does not distinguish the two zclient
sessions. For example, the interfaces are asked a second via
zebra_message_send(zclient, ZEBRA_INTERFACE_ADD, VRF_DEFAULT) in
zclient_start(). As a result, callbacks functions like bgp_ifp_create()
are called a second time, which causes some processing overhead and
might cause bugs.
Re-use the existing zclient socket for nht tracking.
Note that BFD automatic source selection is only currently implemented
in staticd. Other daemons will require to add the following in their
ZEBRA_NEXTHOP_UPDATE callback function:
> if (zclient->bfd_integration)
> bfd_nht_update(&matched, &nhr);
Fixes: b7ca809d1c ("lib: BFD automatic source selection")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Implement all BFD integration northbound callbacks and integrate BFD
with `staticd` route installation procedure.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When compiling with -fsanitize=thread. I started getting this error:
staticd/static_zebra.c: In function ‘static_zebra_nht_get_prefix’:
staticd/static_zebra.c:316:1: error: control reaches end of non-void function [-Werror=return-type]
316 | }
| ^
Just to make future efforts still work, let's just make the compiler happy.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Other VRFs get VRF_ADD notifications from zebra which triggers
static_fixup_vrf_ids, but since the default VRF is implicit we need to
make that same call on connect.
This should fix problems with staticd being started before (or
concurrent with and thus racing) zebra.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
nh_update is only called in two places and both precede a matching
follow-up nht_register call. Fold the update into register, and make
register do the right thing™ for all cases (i.e. update refcounts as
needed, and retry zebra NHT registration if it failed before).
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Since this is a free()-type function, clear the caller's pointer to
NULL to aid static analysis and prevent UAF bugs.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Put static_nexthop -> prefix code into a small helper, remove extra
prefix variable, and grab AFI from prefix.
This commit should not result in any functional change.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8 for bgpd. It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.
Fixes: #10766
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Recent commit:
abc246e193
Has broken `make check` with recently new compilers:
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): warning: relocation against `zebra_ecmp_count' in read-only section `.text'
CCLD tests/bgpd/test_peer_attr
CCLD tests/bgpd/test_packet
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_capabilities':
/home/sharpd/frr5/staticd/static_zebra.c:208: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_zebra.o): in function `static_zebra_route_add':
/home/sharpd/frr5/staticd/static_zebra.c:418: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: staticd/libstatic.a(static_nb_config.o): in function `static_nexthop_create':
/home/sharpd/frr5/staticd/static_nb_config.c:174: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: /home/sharpd/frr5/staticd/static_nb_config.c:175: undefined reference to `zebra_ecmp_count'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make: *** [Makefile:8679: tests/lib/test_grpc] Error 1
make: *** Waiting for unfinished jobs....
Essentially the newly introduced variable zebra_ecmp_count is not available in the
libstatic.a compiled and make check has code that compiles against it.
The fix is to just move the variable to the library.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Zebra sends a nexthop-update message on registeration, which will cause
existing routes to be reconfigured even no changes actually happened.
Don't register the nexthop again if it's already done.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Restrict the number of nexthops for a route to the compiled-in
limit. Be careful with the zapi route struct's array of nexthops
too.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
When the VRF interface is coming up, we don't need to fixup VRF ids - it
was already done in static_vrf_enable when the interface was created.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Since f60a1188 we store a pointer to the VRF in the interface structure.
There's no need anymore to store a separate vrf_id field.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Most users of if_lookup_address_exact only cared about whether the
address is any local address. Split that off into a separate function.
For the users that actually need the ifp - which I'm about to add a few
of - change it to prefer returning interfaces that are UP.
(Function name changed due to slight change in behavior re. UP state, to
avoid possible bugs from this change.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>