Some IDEs (e.g., emacs+lsp) run pylint from the root directory and so
we need to add `tests/topotests` so that `lib` and `munet` are found
by pylint when used in imports
Signed-off-by: Christian Hopps <chopps@labn.net>
Previously if you configured a different timestamp precision then
`make check` would fail as the non-default config is generated and
fails test_cli config file comparison.
Signed-off-by: Christian Hopps <chopps@labn.net>
The call into bgp_open_capability can return that it wrote more
than BGP_OPEN_NON_EXT_OPT_LEN bytes, in that case the open
part needs to be written again with ext_opt_params set to
true to allow extended parameters to be written thus keeping
the len < 255 bytes. The code to do this was first creating
a new stream and then copying into it the stream, trying
to call bgp_open_capability() and if it succeeded recopying
the tmp stream back onto the original.
Let's change this around such that we save the current spot
in the stream of where we are writing and if the change does
not work reset the pointer and try again with the correct
parameter. This removes the stream and multiple copies and
eventual free of the temporary stream.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The fpm_listener didn't have the ability to specify the output
file location at all. Modify the code to accept this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This fix will delete an OSPFv3 area when all the interfaces and
configuration (ranges, NSSA ranges, stub area, NSSA area, filter-list,
import-list and export-list) have been removed. The changes provides
a general solution to https://github.com/FRRouting/frr/issues/18324.
Signed-off-by: Acee Lindem <acee@lindem.com>
BT:
```
3 <signal handler called>
4 0x00005616837546fc in bgp_static_update (bgp=bgp@entry=0x5616865eac50, p=0x561686639e40,
bgp_static=0x561686639f50, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST) at ../bgpd/bgp_route.c:7232
5 0x0000561683754ad0 in bgp_static_add (bgp=0x5616865eac50) at ../bgpd/bgp_table.h:413
6 0x0000561683785e2e in no_bgp_network_import_check (self=<optimized out>, vty=0x5616865e04c0,
argc=<optimized out>, argv=<optimized out>) at ../bgpd/bgp_vty.c:4609
7 0x00007fdbcc294820 in cmd_execute_command_real (vline=vline@entry=0x561686663000,
```
The program encountered a SEG FAULT when attempting to access pi->extra->vrfleak->bgp_orig because
pi->extra->vrfleak was NULL.
```
(gdb) p pi->extra->vrfleak
$1 = (struct bgp_path_info_extra_vrfleak *) 0x0
(gdb) p pi->extra->vrfleak->bgp_orig
Cannot access memory at address 0x8
```
Added NOT NULL check on pi->extra->vrfleak before accessing pi->extra->vrfleak->bgp_orig
to prevent the segmentation fault.
Signed-off-by: Manpreet Kaur <manpreetk@nvidia.com>
The command `show bgp <afi> <safi>` has this output:
r1# show bgp ipv4 uni 10.0.0.0
BGP routing table entry for 10.0.0.0/32, version 1
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
r1-eth0 r1-eth1 r1-eth2 r1-eth3
....
It specifically states `Advertised to non peer-group peers:` yet
the code is not filtering those out.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Resolve conflict between F_ISIS_UNIT_TEST and ISIS_OPT_DUMMY_AS_LOOPBACK
which were both using the same bit value (0x01). This collision caused
unit test mode to be unintentionally enabled when DUMMY_AS_LOOPBACK was set.
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
This reverts commit 58592be577.
This commit is being reverted because of several issues:
a) tcpdump -i <any interface that bgp happens to use>
This command causes bgp to dump it's entire table to all
of it's peers again. This is a huge problem in any type
of scaled environment *and* it is not unusual to have an
operator do this.
b) This commit appears to be attempting to solve the problem
with route leaking across vrf's using labels( or somesuch ).
Unfortunately we have absolutely no topotests that show the
behavior. I am also unable to get any type of how to reproduce
the problem being solved by the commit. I do know, though,
that the problem really stems from the fact that bgp has
decided to cheat and not create bnc's for route leaking.
Thus when a nexthop changes, bgp is not being notified.
This commit was being used as a hammer to solve the problem.
While I do agree backing out a bug fix for some operator
is less then ideal, I believe that since I cannot get the
operator to tell me the problem it solved and the fact
that sending large amounts of updates with just a simple
tcpdump command ( actually 2 one for tcpdump start and
one for finishing ) is more detrimental in my eyes at
this point in time. Additionally the solution used
is the wrong one for the problem.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
CI tests are showing cases where staticd is connecting to
zebra after config is read in and the nexthops are never
being registered w/ zebra:
2025/03/11 15:39:44 STATIC: [T83RR-8SM5G] staticd 10.4-dev starting: vty@2616
2025/03/11 15:39:45 STATIC: [GH3PB-C7X4Y] Static Route to 13.13.13.13/32 not installed currently because dependent config not fully available
2025/03/11 15:39:45 STATIC: [RHJK1-M5FAR] static_zebra_nht_register: Failure to send nexthop 1.1.1.2/32 for 11.11.11.11/32 to zebra
2025/03/11 15:39:45 STATIC: [M7Q4P-46WDR] vty[14]@> enable
Zebra shows connection time as:
2025/03/11 15:39:45.933343 ZEBRA: [V98V0-MTWPF] client 5 says hello and bids fair to announce only static routes vrf=0
As a result staticd never installs the route because it has no nexthop
tracking to say that the route could be installed.
Modify staticd on startup to go through it's nexthops and dump them to
zebra to allow the staticd state machine to get to work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Issue:
Not freeing the neighbor n within the same function can lead to
memory leak.
zebra_neigh_del_all() -> zebra_neigh_del() re lookup and free
Fix: not accessing n after its freed.
Directly free the neighbor entry (n) when its interface index matches
ifp->ifindex.
This fixes:
ERROR: AddressSanitizer: heap-use-after-free on address 0x6070001052e8 at pc 0x7f6bf7d09ddb bp 0x7ffd3366a000 sp 0x7ffd33669ff0
READ of size 8 at 0x6070001052e8 thread T0
#0 0x7f6bf7d09dda in _rb_next lib/openbsd-tree.c:455
#1 0x55f95a307261 in zebra_neigh_rb_head_RB_NEXT zebra/zebra_neigh.h:34
#2 0x55f95a3082e9 in zebra_neigh_del_all zebra/zebra_neigh.c:162
#3 0x55f95a121ee7 in zebra_interface_down_update zebra/redistribute.c:571
#4 0x55f95a0f819d in if_down zebra/interface.c:1017
#5 0x55f95a0fe168 in zebra_if_dplane_ifp_handling zebra/interface.c:2102
#6 0x55f95a0ff10c in zebra_if_dplane_result zebra/interface.c:2241
#7 0x55f95a27ce9c in rib_process_dplane_results zebra/zebra_rib.c:5015
#8 0x7f6bf7da3ad9 in event_call lib/event.c:1984
#9 0x7f6bf7c62141 in frr_run lib/libfrr.c:1246
#10 0x55f95a11ca7f in main zebra/main.c:543
#11 0x7f6bf7029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#12 0x7f6bf7029e3f in __libc_start_main_impl ../csu/libc-start.c:392
#13 0x55f95a0dd0b4 in _start (/usr/lib/frr/zebra+0x1a80b4)
Ticket: #18047
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
Add a test to control that when the extended community limit is set
to 0, then only 2 BGP updates are received on R3.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a test to control that when the community limit is set to 0, then
only 2 BGP updates are received on R3.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The functions alloc_srv6_sid_func_explicit/dynamic expect to return bool
but we have places where we return a -1 or NULL which the caller is
assuming as a True/Valid and ending up allocating Sid
Without Fix:
2025/03/10 21:44:04.295350 ZEBRA: [XWV20-TGK70] alloc_srv6_sid_func_explicit: trying to allocate explicit SID function 65088 from block fcbb:bbbb::/32
2025/03/10 21:44:04.295351 ZEBRA: [MM61M-TQZNP] alloc_srv6_sid_func_explicit: elib s 10000 e 20000 wlib s 1000 ewlib s 30000 e 1000 SID_FUNC 65088
2025/03/10 21:44:04.295352 ZEBRA: [QGHMB-SWNFW] alloc_srv6_sid_func_explicit: function 65088 is outside ELIB [10000/20000] and EWLIB alloc ranges [30000/1000]
2025/03/10 21:44:04.295367 ZEBRA: [H0GZA-NNSWJ] get_srv6_sid_explicit: allocated explicit SRv6 SID fcbb:bbbb:1:fe40:: for context End.X nh6 2001::2
2025/03/10 21:44:04.295368 ZEBRA: [XBBYD-T1Q7P] srv6_manager_get_sid_internal: got new SRv6 SID for ctx End.X nh6 2001::2: sid_value=fcbb:bbbb:1:fe40:: (func=65088) (proto=4, instance=0, sessionId=0), notifying all clients
With Fix:
2025/03/10 22:04:25.052235 ZEBRA: [MM61M-TQZNP] alloc_srv6_sid_func_explicit: elib s 30000 e 31000 wlib s 31000 ewlib s 30000 e 31000 SID_FUNC 65056
2025/03/10 22:04:25.052236 ZEBRA: [YHMRC-EMYNX] alloc_srv6_sid_func_explicit: function 65056 is outside ELIB [30000/31000] and EWLIB alloc ranges [30000/31000]
2025/03/10 22:04:25.052254 ZEBRA: [XSG8X-Q2XJX] get_srv6_sid_explicit: invalid SM request arguments: failed to allocate SID function 65056 from block fcbb:bbbb::/32
2025/03/10 22:04:25.052257 ZEBRA: [YC52T-427SJ] srv6_manager_get_sid_internal: not got SRv6 SID for ctx End.DT6 vrf_id 4, sid_value=fcbb:bbbb:1:fe20::, locator_name=MAIN
root@rajasekarr:/tmp/topotests/static_srv6_sids.test_static_srv6_sids/r1#
Ticket :#
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
At VRF disabling, keep the route entries that was associated to its
table ID but not to the VRF itself. Kernel flushes these entries so we
need to reinstall them.
To do so, add a flag to mean that a route entry is owned by a table ID
and not by a VRF. If the VRF associated to the table ID is deleted, the
route entry must not be deleted.
Update to tests with new flag. 2057 is in hexa 0x809, meaning that the
new flag has been to some prefix.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When a routing table (RT) already has a default route before being
assigned to a VRF, the default route vanishes in zebra after the VRF
assignment.
> root@router:~# ip route add blackhole default table 100
> root@router:~# ip route show table 100
> blackhole default
> root@router:~# vtysh -c 'show ip route table 100'
> [...]
> VRF default table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), 00:00:05
> root@router:~# ip l add red type vrf table 100
> root@router:~# vtysh -c 'show ip route table 100'
> root@router:~#
Do not override the default route if it exists.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This is the continuation of the previous commit.
When a VRF is deleted, the kernel retains only its own routing entries
in the former VRF table and removes all others.
This change ensures that routing entries created by FRR daemons are also
removed from the former zebra VRF table when the VRF is disabled.
To test:
> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip l add du0 type dummy
> ifconfig du0 192.168.0.1/24 up
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'
Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Fix a heap-after-free that causes zebra to crash even without
address-sanitizer. To reproduce:
> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100
Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).
When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
VRF mismatch is unexpected in the code and creates crashes and memory
related issues.
Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.
Instead, at VRF disabling, do not release the table but reassign it to
the default VRF. At VRF enabling, change the table owner back to the
appropriate VRF.
> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
> #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
> #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
> #2 0x7fa32474d783 in route_node_get lib/table.c:283
> #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
> #4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
> #5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
> #6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
> #7 0x7fa32476689c in event_call lib/event.c:1996
> #8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #9 0x55b0e4e6c32a in main zebra/main.c:526
> #10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
> #11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
> #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
> #1 0x7fa324668d8f in qfree lib/memory.c:130
> #2 0x7fa32474c421 in route_table_free lib/table.c:126
> #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
> #4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
> #5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
> #6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
> #7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
> #8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
> #9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
> #10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
> #11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #13 0x7fa32476689c in event_call lib/event.c:1996
> #14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #15 0x55b0e4e6c32a in main zebra/main.c:526
> #16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
> #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7fa324668c4d in qcalloc lib/memory.c:105
> #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
> #3 0x7fa32474e73c in route_table_init lib/table.c:512
> #4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
> #5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
> #6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
> #7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
> #8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
> #9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
> #10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #12 0x7fa32476689c in event_call lib/event.c:1996
> #13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #14 0x55b0e4e6c32a in main zebra/main.c:526
> #15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Ensure that the zserv.api socket is actually up and running
before moving onto other daemons after zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently the topotest infrastructure is starting up daemons
in mgmtd,zebra, staticd then everything else.
The problem that is happening, under heavy load, is that
zebra may not be fully started and when a daemon attempts
to connect to it, it will not be able to connect.
Some of the daemons do not have great retry mechanisms at all.
In addition our normal systemctl startup scripts actually
wait a small amount of time for zebra to be ready before
moving onto the other daemons.
Let's make topotests startup a tiny bit more nuanced
and have mgmtd fully up before starting up zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>