Fix a heap-use-after-free that causes zebra to crash even without
AddressSanitizer. To reproduce:
> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100
Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF. This is not strictly accurate, but the code relies on this
behavior.
When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an already
existing RT table, zebra does not update the table owner, which remains
the default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. This
VRF mismatch is unexpected by the code and leads to crashes and
memory-related issues.
Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF, so it is incorrect for zebra to delete them.
Instead, when a VRF is disabled, do not release the table but reassign
it to the default VRF. When the VRF is enabled, change the table owner
back to the appropriate VRF.
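A minimal sketch of the new flow (helper and field names here are
illustrative, not the actual patch):

    /* On VRF disable: hand the tables back to the default VRF instead
     * of releasing/freeing them, since the kernel still has the RT
     * table and route nodes may still be referenced. */
    static void table_reassign_on_disable(struct zebra_vrf *zvrf)
    {
            struct zebra_vrf *def = zebra_vrf_lookup_by_id(VRF_DEFAULT);

            /* was: zebra_router_release_table() -> route_table_free() */
            zebra_router_reassign_table(zvrf, def); /* hypothetical helper */
    }

On VRF enable, the inverse reassignment moves the tables from the
default VRF back to the owning VRF.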
> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
> #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
> #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
> #2 0x7fa32474d783 in route_node_get lib/table.c:283
> #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
> #4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
> #5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
> #6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
> #7 0x7fa32476689c in event_call lib/event.c:1996
> #8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #9 0x55b0e4e6c32a in main zebra/main.c:526
> #10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
> #11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
> #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
> #1 0x7fa324668d8f in qfree lib/memory.c:130
> #2 0x7fa32474c421 in route_table_free lib/table.c:126
> #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
> #4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
> #5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
> #6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
> #7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
> #8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
> #9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
> #10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
> #11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #13 0x7fa32476689c in event_call lib/event.c:1996
> #14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #15 0x55b0e4e6c32a in main zebra/main.c:526
> #16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
> #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> #1 0x7fa324668c4d in qcalloc lib/memory.c:105
> #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
> #3 0x7fa32474e73c in route_table_init lib/table.c:512
> #4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
> #5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
> #6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
> #7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
> #8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
> #9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
> #10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
> #11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
> #12 0x7fa32476689c in event_call lib/event.c:1996
> #13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
> #14 0x55b0e4e6c32a in main zebra/main.c:526
> #15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Currently FRR is limiting the nexthop count to a uint8_t rather than a
uint16_t. This leads to issues when the nexthop count reaches 256:
the counter overflows to 0, causing problems in the code.
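The wrap is easy to demonstrate in isolation (standalone illustration,
not FRR code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint8_t count8 = 0;   /* old counter width */
            uint16_t count16 = 0; /* new counter width */

            for (int i = 0; i < 256; i++) {
                    count8++;
                    count16++;
            }

            /* prints "count8=0 count16=256": the 8-bit counter wrapped */
            printf("count8=%u count16=%u\n", count8, count16);
            return 0;
    }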
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The previous commit modified zebra to reinstall the singleton
nexthops for a nexthop group when an interface up event occurs.
Now let's modify zebra to attempt to reuse the nexthop group
when this happens and the upper level protocol resends the
route down with it. Only match if the protocol and the instance
are the same and the nexthop groups match.
Here is the new behavior:
eva(config)# do show ip route 9.9.9.9/32
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:08 ago
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config)# do show ip route nexthop-group 9.9.9.9/32
% Unknown command: do show ip route nexthop-group 9.9.9.9/32
eva(config)# do show ip route 9.9.9.9/32 nexthop-group
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:54 ago
Nexthop Group ID: 57
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config)# exit
eva# conf
eva(config)# int dummy3
eva(config-if)# shut
eva(config-if)# no shut
eva(config-if)# do show ip route 9.9.9.9/32 nexthop-group
Routing entry for 9.9.9.9/32
Known via "static", distance 1, metric 0, best
Last update 00:00:08 ago
Nexthop Group ID: 57
* 192.168.99.33, via dummy1, weight 1
* 192.168.100.33, via dummy2, weight 1
* 192.168.101.33, via dummy3, weight 1
* 192.168.102.33, via dummy4, weight 1
eva(config-if)# exit
eva(config)# exit
eva# exit
sharpd@eva ~/frr1 (master) [255]> ip nexthop show id 57
id 57 group 37/43/50/58 proto zebra
sharpd@eva ~/frr1 (master)> ip route show 9.9.9.9/32
9.9.9.9 nhid 57 proto 196 metric 20
nexthop via 192.168.99.33 dev dummy1 weight 1
nexthop via 192.168.100.33 dev dummy2 weight 1
nexthop via 192.168.101.33 dev dummy3 weight 1
nexthop via 192.168.102.33 dev dummy4 weight 1
sharpd@eva ~/frr1 (master)>
Notice that we are no longer creating a bunch of new
nexthop groups.
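The reuse check amounts to something like this sketch (field names
assumed from the description; nexthop_group_equal() is the lib
comparator):

    /* Only hand back an existing NHG when protocol, instance and the
     * nexthops themselves all match. */
    static bool nhe_can_reuse(const struct nhg_hash_entry *nhe, int type,
                              uint16_t instance,
                              const struct nexthop_group *nhg)
    {
            if (nhe->type != type || nhe->instance != instance)
                    return false;

            return nexthop_group_equal(&nhe->nhg, nhg);
    }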
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently the FRR code will receive both kernel and
connected routes that do not actually have an underlying
nexthop group at all. Zebra turns around and creates
a `matching` nexthop hash entry and installs it.
For connected routes, this will create 2 singleton
nexthops in the dplane per interface (v4 and v6).
For kernel routes it would just create 1 singleton
nexthop that might or might not be used.
This is bad because the dplane has a limited amount
of space available for nexthop entries, and if you
happen to have a large number of interfaces, then
all of a sudden you have 2x (# of interfaces) singleton
nexthops.
Let's modify the code to delay creation of these singleton
nexthops until they have been used by something else in the
system.
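A sketch of the deferral (the trigger condition and install hook shown
here are assumptions):

    /* Don't push a zebra-created singleton into the dplane until
     * something else in the system takes a reference on it. */
    static void nhe_maybe_install(struct nhg_hash_entry *nhe)
    {
            if (nhe->refcnt <= 0)
                    return; /* no real user yet: defer */

            zebra_nhg_install_kernel(nhe);
    }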
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The current code is unsetting the flag tracking that the
NHG is installed. It is still installed; we are just
reinstalling it. Let's note this in the code
appropriately as a REINSTALL and not remove the
INSTALLED flag.
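In flag terms the change amounts to (sketch; flag names per this
commit's wording):

    if (CHECK_FLAG(nhe->flags, NEXTHOP_GROUP_INSTALLED))
            SET_FLAG(nhe->flags, NEXTHOP_GROUP_REINSTALL);
    /* and, crucially, do NOT:
     * UNSET_FLAG(nhe->flags, NEXTHOP_GROUP_INSTALLED); */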
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Intermittently zebra and the kernel get out of sync
when an interface flaps and the adds/dels are in the
same processing queue, because zebra assumes no change in the nexthop.
Hence we need to reinstall the nexthops and routes
to the kernel to sync their states.
Upon an interface flap the kernel will have deleted the NHGs
associated with the flapped interface, while zebra
retains NHGs for 3 minutes even though the upper
layer protocol removes the nexthops (associated NHG).
As part of interface address add,
re-add the singleton NHGs associated with the interface.
Ticket: #3173663
Issue: 3173663
Signed-off-by: Ashwini Reddy <ashred@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the impression that
this event system is a pthread, when it is not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix both memory leaks and use-after-frees in nhg's
on shutdown.
This approach is effectively just iterating through all the
hash items and directly freeing the memory instead
of handling ref counts or cross references.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Before deleting installed nexthop groups from the
system, start a timer and hold the nexthop
group for that time.
Suppose you have this scenario
a) create a static route with 1 x ecmp
creates a nhg with 1 x ecmp
b) create a static route with 2 x ecmp
creates a nhg with 2 x ecmp
deletes a's nhg
c) create a static route with 3 x ecmp
creates a nhg with 3 x ecmp
deletes b's nhg
d) create a different route with 1 x ecmp
creates another 1 x ecmp ( since a's ecmp was deleted )
e) create a different route with 2 x ecmp
creates another 2 x ecmp ( since b's ecmp was deleted )
If you don't delete the nhg but instead start a timer, the nhg's used
in steps a and b can be reused for steps d and e. This reduces
the overhead of zebra <-> kernel interactions and improves
the speed of the system.
So modify the code to note that an installed nexthop group should
be kept around a bit and hopefully reused.
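A sketch of the release path under this scheme (callback, field and
timer-length names are assumed):

    /* Refcount dropped to zero: park installed NHGs on a timer so a
     * quickly re-added route can pick them back up. */
    if (CHECK_FLAG(nhe->flags, NEXTHOP_GROUP_INSTALLED)) {
            event_add_timer(zrouter.master, zebra_nhg_timer, nhe,
                            zrouter.nhg_keep, &nhe->timer);
            return;
    }

    zebra_nhg_free(nhe); /* not installed: free immediately */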
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I keep getting confused about nhg_depends and nhg_dependents.
So take a second and write them down for the next person.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add `%pNG` so that a nexthop group can be displayed in debugs/logs
such that it can provide useful information.
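For example (illustrative debug line):

    zlog_debug("%s: nhe %pNG being processed", __func__, nhe);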
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add uptime for use with NHEs to keep track of how
long we have had this NHE in our rib without an update.
This is treated exactly the same as the re->uptime for
routes. When we get an update for a route, we reset the
uptime.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add a PROTO_OWNED macro for code readability when checking
ID bounds for whether a NHG is proto owned.
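A sketch of the macro (the lower-bound constant name is assumed):

    /* Proto-owned NHG IDs live in the reserved upper ID space. */
    #define PROTO_OWNED(nhe) ((nhe)->id >= ZEBRA_NHG_PROTO_LOWER)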
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Use the main zebra workqueue for daemon-owned NHGs, in addition
to processing kernel-owned NHGs. The zapi message processing
creates a temporary object that's enqueued to the workqueue,
then processed/installed as part of the workqueue processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a control and api for the use of backup nexthops in
recursive resolution. With 'no', we won't try to use installed
backup nexthops when resolving a recursive route.
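Illustrative vtysh usage (command form assumed from the description
above):

    eva(config)# no zebra nexthop resolve-via-backup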
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Start reorg of zebra nexthop-resolution so that we can use the
resolution logic for nexthop-groups as well as routes. Change
the signature of the core nexthop_active() api so that it does
not require a route-entry or route-node. Move some of the logic
around so that nexthop-specific logic is in nexthop_active(),
while route-oriented logic is in nexthop_active_check().
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Send the results of daemons' nhg updates asynchronously,
after the update has actually completed. Capture additional
info about the source daemon in order to locate the correct
zapi session. Simplify the result types considered by the
zebra_nhg module.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
It is now 4 bits of type and 28 bits of value -
1. type=0 is for L3 NHG
2. type=1 is for L2 NH
3. type=2 is for L2 NHG
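A sketch of the split (macro names are illustrative):

    #define NHG_ID_TYPE_SHIFT 28
    #define NHG_ID_TYPE(id)  ((id) >> NHG_ID_TYPE_SHIFT) /* 4-bit type */
    #define NHG_ID_VALUE(id) ((id) & 0x0fffffff)         /* 28-bit value */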
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When `-r` is specified to zebra, on shutdown we should
not remove any routes from the fib. This was a problem
with nhg's on shutdown due to their ref-count behavior.
Introduce a methodology where on shutdown we don't mess
with the nexthop groups in the kernel. That way on
next startup things will be ok.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add type to the nhg_proto_del API params for sanity checking
that the type of the route sent by the proto matches the type
found with the ID.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a flag to track the released state of a proto-based NHG.
This flag is used to know whether the upper level proto has called
the *_del API. Typically, the NHG would just get removed and uninstalled
at this point, but there is a chance it is sent to us while routes
still own it, or that it is sent multiple times. This flag
and its associated code handle that.
Ticket: CM-30369
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Fix some reference counting issues seen when replacing
a NHG and when deleting one.
For a replace, we should end with the same refcnt on the new
one.
For a delete, it's the caller's job to decrement the ref after
it's done with it.
Further, update routes in the rib with the new pointer after a replace.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code to handle proto-based NHG uninstalling after
the owning client disconnects.
This is handled the same way as rib_score_proto() but for now
we are ignoring instance.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Remove some leftover boilerplate from the old replace
code path. That code ended up in the add API, so it's no
longer needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a command/functionality to only install proto-based nexthops.
That is, nexthops owned/created by upper level protocols, not ones
implicitly created by zebra.
There are some scenarios where you would not want zebra to
arbitrarily install nexthop groups but you still want
to use the ones you have control over via lib/nexthop_group config
and an upper level protocol.
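Illustrative vtysh usage (command form assumed from the description
above):

    eva(config)# zebra nexthop proto only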
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Implement the underlying zebra functionality to Add/Del an
internal zebra and kernel NHG.
These NHGs are managed by the upper-level protocols that send them
down via zapi messaging.
They are not put into the overall zebra NHG hash table, only
into the ID table. Therefore, different protos cannot
and will not share NHGs.
The proto is also set appropriately when sent to the kernel.
Expand the separation of zebra hashed/shared/created NHGs and
proto created and managed NHGs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Implement the next hop group send on startup if you are using
them. Normally you will only have them if you are already using this
Linux kernel feature.
NOTE: to make sure all next hop groups exist, we send/enqueue all next
hop groups first and then we send routes. The RIB route walk start is
at the end of the function `fpm_nhg_send()`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Include backup nexthops in nhe processing; connect incoming
zapi route data with updated rib/nhg apis; add more debugs in
nhg processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a backup index in a nexthop directly (if it has a backup
nexthop); revise the zebra nhe/nhg code; revise zapi route
decoding to match; revise the dataplane route datastructs.
Refactor some of the rib_add_multipath code to be prepared to
be called with an nhe, carrying nexthop and (possibly) backup
info together.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Embed nexthop-group, which is just a pointer, in the zebra
nexthop-hash-entry object, rather than mallocing one.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
It does not make sense for nexthop groups as a whole to have a
vrf'ness, since a group can contain an arbitrary number of nexthops
that point to separate vrf's.
Modify the code to make this distinction, by more clearly delineating
the line between the nhg and the nexthop.
Nexthop groups having a vrf_id only makes sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a config that disables use of kernel-level nexthop ids.
Currently, zebra always uses nexthop ids if the kernel supports
them.
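Illustrative vtysh usage (command form assumed from the description
above):

    eva(config)# no zebra nexthop kernel enable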
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Replace the existing list of nexthops (via a nexthop_group
struct) in the route_entry with a direct pointer to zebra's
new shared group (from zebra_nhg.h). This allows more
direct access to that shared group and the info it carries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Clean up the relationships between zebra's rib and nexthop-group
headers as prep for adding a nexthop-group pointer to the
route_entry.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a private header file for functions that are internal/special
case, like we do for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file that were only used
statically and add some comments for those remaining to better
indicate what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-work the validity setting and checking APIs
for nhg_hash_entry's to make them clearer.
Further, they were originally only being set
on ifdown and install. Extend their use into
releasing entries and to account for setting
the validity of a recursive dependent.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Change the wording of the flag indicating we have received
a nexthop group from the kernel that has a different ID but
is fundamentally identical to one we already have.
It was colliding with a flag of similar name in the nexthop struct.
Change it from NEXTHOP_GROUP_DUPLICATE -> NEXTHOP_GROUP_UNHASHABLE
since it is in fact unhashable.
Also change the wording of functions and comments referencing the same
problem.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a comment to the header of `zebra_nhg.c` to point the reader
to where the hashtables containing the nhg entries are held.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a mechanism to requeue groups we receive from the
kernel if the IDs are in a weird order (Group ID is lower
than individual nexthop IDs for example).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add handling for delete/update nexthop object messages from the
kernel.
If someone deletes a nexthop object we are still using, send it back
down. If someone updates a nexthop we are using, replace that nexthop
with ours. Routes are referencing this nexthop object ID and we resolved
it ourselves, so we should force the other `someone` to submit to our
will.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a duplicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a duplicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>