Commit graph

823 commits

Author SHA1 Message Date
Donald Sharp e0437aba6d zebra: Add more vrf name to debugs
Trying to debug some cross vrf stuff in zebra and frankly
it's hard to grep the file for the routes you are interested
in.  Let's clean this up some and get a bit better
information for us developers

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-09-11 15:30:43 -04:00
Russ White add56c61dd
Merge pull request #15259 from dmytroshytyi-6WIND/nexthop_resolution
zebra: add LSP entry to nexthop via recursive (part 2)
2024-09-10 10:04:08 -04:00
Donald Sharp 98b11de9f6 zebra: Modify show zebra dplane providers to give more data
The show zebra dplane provider command was ommitting
the input and output queues to the dplane itself.
It would be nice to have this insight as well.

New output:
r1# show zebra dplane providers
dataplane Incoming Queue from Zebra: 100
Zebra dataplane providers:
  Kernel (1): in: 6, q: 0, q_max: 3, out: 6, q: 14, q_max: 3
  dplane_fpm_nl (2): in: 6, q: 10, q_max: 3, out: 6, q: 0, q_max: 3
dataplane Outgoing Queue to Zebra: 43
r1#

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-09-05 15:52:05 -04:00
Donald Sharp 0c72a78930 zebra: Allow for initial deny of installation of nhe's
Currently the FRR code will receive both kernel and
connected routes that do not actually have an underlying
nexthop group at all.  Zebra turns around and creates
a `matching` nexthop hash entry and installs it.
For connected routes, this will create 2 singleton
nexthops in the dplane per interface (v4 and v6).
For kernel routes it would just create 1 singleton
nexthop that might be used or not.

This is bad because the dplane has a limited amount
of space available for nexthop entries and if you
happen to have a large number of interfaces then
all of a sudden you have 2x(# of interfaces) singleton
nexthops.

Let's modify the code to delay creation of these singleton
nexthops until they have been used by something else in the
system.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-30 08:23:48 -04:00
Donald Sharp 8ad5643abe zebra: Convince SA that the ng will always be valid
There is a code path that could theoretically get you
to a point where the ng->nexthop is a NULL value.
Let's just make sure the SA system believes that
cannot happen anymore.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-29 18:10:30 -04:00
Donald Sharp 9bc0cd8241 zebra: Prevent accidental re memory leak in odd case
There exists a path in rib_add_multipath where if a decision
is made to not use the passed in re, we just drop the memory
instead of freeing it.  Let's free it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp d528c02a20 zebra: Handle kernel routes appropriately
Current code intentionally ignores kernel routes.  Modify
zebra to allow these routes to be read in on linux.  Also
modify zebra to look to see if a route should be treated
as a connected and mark it as such.

Additionally this should properly handle some of the issues
being seen with NOPREFIXROUTE.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp bdfccf69fa zebra: Expose rib_update_handle_vrf_all
This function will be used on interface down
events to allow for kernel routes to be cleaned
up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp f450e1cda4 zebra: Make p and src_p const for rib_delete
The prefix'es p and src_p are not const.  Let's make
them so.  Useful to signal that we will not change this
data.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donatas Abraitis 1e288c9b55 zebra: Do not forget to free opaque data for route entry
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-13 18:00:30 +03:00
Donald Sharp e53fa582bc zebra: Fix removal of routes on MetaQ when client goes down
It is possible that right before an upper level protocol dies
or is killed routes would be installed into zebra.  These routes
could be on the Meta-Q for early route-processing.  Leaving us with
a situation where the client is removed, and all it's routes that are
in the rib at that time, and then after that the MetaQ is run and the
routes are reprocessed leaving routes from an upper level daemon
post daemon going away from zebra's perspective.  These routes will
be abandoned.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-07-30 08:34:06 -04:00
Philippe Guibert 731f74e35f zebra, topotests: do not set nexthop's FIB flag when DUPLICATE present
The bgp_duplicate_nexthop test installs routes with nexthop's
flags set to both DUPLICATE and FIB: this should not happen.

The DUPLICATE flag of a nexthop indicates this nexthop is already
used in the same nexthop-group, and there is no need to install it
twice in the system; having the FIB flag set indicates that the
nexthop is installed in the system. This is why both flags should
not be set on the same nexthop.

This case happens at installation time, but can also happen
at update time.
- Fix this by not setting the FIB flag value when the DUPLICATE
flag is present.
- Modify the bgp_duplicate_test to check that the FIB flag is not
present on duplicated nexthops.
- Modify the bgp_peer_type_multipath_relax test.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-07-08 15:42:02 +02:00
vivek b5682ffbf0 *: Add and use option for graceful (re)start
Add a new start option "-K" to libfrr to denote a graceful start,
and use it in zebra and bgpd.

zebra will use this option to denote a planned FRR graceful restart
(supporting only bgpd currently) to wait for a route sync completion
from bgpd before cleaning up old stale routes from the FIB. An optional
timer provides an upper-bounds for this cleanup.

bgpd will use this option to denote either a planned FRR graceful
restart or a bgpd-only graceful restart, and this will drive the BGP
GR restarting router procedures.

Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
2024-07-01 13:02:52 -07:00
Philippe Guibert 9f88ed98a4 zebra: relax mpls entries installation with labeled unicast entries
Until now, when a FEC entry is added in zebra, driven by the
reception of a BGP labeled unicast update, an LSP entry is
created. That LSP entry is resolved by using the route entry
which is also installed by BGP labeled unicast.

This route entry is not available when we face with i-bgp
peering session. I-BGP labeled sessions are used to establish
IP connectivity across separate IGPs.

The below dumps illustrate a 3 IGP topology. An attempt to create
connectivity between the north and the south machines is done.

The 3 separate IGPs are configured in Segment routing:
- north-east
- east-west
- west-south

We create BGP peerings between each endpoint of each IGP:
- iBGP between (north) and (east)
- iBGP between (east) and (west)
- iBGP between (west) and (south)

Before that patch, the FEC entries could not be resolved on the
east machine:

Before:
east-vm# show mpls fec
192.0.2.1/32
  Label: 18
  Client list: bgp(fd 48)
192.0.2.5/32
  Label: 17
  Client list: bgp(fd 48)
192.0.2.7/32
  Label: 19
  Client list: bgp(fd 48)

east-vm# show mpls table
 Inbound Label  Type        Nexthop      Outbound Label
 --------------------------------------------------------
 1011           SR (OSPF)   192.168.2.2  1011
 1022           SR (OSPF)   192.168.2.2  implicit-null
 11044          SR (IS-IS)  192.168.3.4  implicit-null
 11055          SR (IS-IS)  192.168.3.4  11055
 30000          SR (OSPF)   192.168.2.2  implicit-null
 30001          SR (OSPF)   192.168.2.2  implicit-null
 36000          SR (IS-IS)  192.168.3.4  implicit-null

east-vm# show ip route
[..]
B   192.0.2.1/32 [200/0] via 192.0.2.1 inactive, label implicit-null, weight 1, 00:17:45
O>* 192.0.2.1/32 [110/20] via 192.168.2.2, r3-eth0, label 1011, weight 1, 00:17:47
O>* 192.0.2.2/32 [110/10] via 192.168.2.2, r3-eth0, label implicit-null, weight 1, 00:17:47
O   192.0.2.3/32 [110/0] is directly connected, lo, weight 1, 00:17:57
C>* 192.0.2.3/32 is directly connected, lo, 00:18:03
I>* 192.0.2.4/32 [115/20] via 192.168.3.4, r3-eth1, label implicit-null, weight 1, 00:17:59
B   192.0.2.5/32 [200/0] via 192.0.2.5 inactive, label implicit-null, weight 1, 00:17:56
I>* 192.0.2.5/32 [115/30] via 192.168.3.4, r3-eth1, label 11055, weight 1, 00:17:58
B>  192.0.2.7/32 [200/0] via 192.0.2.5 (recursive), label 19, weight 1, 00:17:45
  *                        via 192.168.3.4, r3-eth1, label 11055/19, weight 1, 00:17:45
[..]

After command "mpls fec nexthop-resolution" was applied, the FEC
entries will resolve over any non BGP route that has a labeled path
selected.

east-vm# show mpls table
 Inbound Label  Type        Nexthop      Outbound Label
 --------------------------------------------------------
 17             SR (IS-IS)  192.168.3.4  11055
 18             SR (OSPF)   192.168.2.2  1011
 19             BGP         192.168.3.4  11055/19
 1011           SR (OSPF)   192.168.2.2  1011
 1022           SR (OSPF)   192.168.2.2  implicit-null
 11044          SR (IS-IS)  192.168.3.4  implicit-null
 11055          SR (IS-IS)  192.168.3.4  11055
 30000          SR (OSPF)   192.168.2.2  implicit-null
 30001          SR (OSPF)   192.168.2.2  implicit-null
 36000          SR (IS-IS)  192.168.3.4  implicit-null

Co-developed-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-06-07 12:33:48 +02:00
Jafar Al-Gharaibeh 080054686f
Merge pull request #15216 from donaldsharp/zebra_opaque_mem_leak
zebra: Fix opaque memory leak in rare situation
2024-02-02 10:54:20 -06:00
Donald Sharp 8d5ed65e0d zebra: Cleanup dest assignment
dest was shadowing dest inside of an if statement additionally
both legs needed dest to be assigned.  Let's clean this up a
slight bit and use it appropriately

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-01-24 17:10:45 -05:00
Donald Sharp 0d638b7437 zebra: Fix opaque memory leak in rare situation
Fix this:
***********************************************************************************
Address Sanitizer Error detected in zebra_opaque.test_zebra_opaque/r3.asan.zebra.11099

=================================================================
==11099==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 66 byte(s) in 1 object(s) allocated from:
    #0 0x7f527fc06b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7f527f5e852b in qmalloc lib/memory.c:100
    #2 0x56418d20832d in zread_route_add zebra/zapi_msg.c:2125
    #3 0x56418d215d08 in zserv_handle_commands zebra/zapi_msg.c:4011
    #4 0x56418d32ab5b in zserv_process_messages zebra/zserv.c:520
    #5 0x7f527f6938d3 in event_call lib/event.c:2003
    #6 0x7f527f5cb692 in frr_run lib/libfrr.c:1218
    #7 0x56418d1c3336 in main zebra/main.c:508
    #8 0x7f527e656c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

SUMMARY: AddressSanitizer: 66 byte(s) leaked in 1 allocation(s).
***********************************************************************************

Code inspection leads to some code paths where the opaque data was not
freed up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-01-24 10:14:18 -05:00
Carmine Scarpitta 8d0b4745a1 zebra: Add code to set SRv6 encap source addr in dplane
Add a bunch of set functions and associated data structure in
zebra_dplane to allow the configuration of the source address for SRv6
encap in the data plane.

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-12-14 14:56:44 +01:00
Donald Sharp 7f9c5c7fa2 zebra: The dplane_fpm_nl return path leaks memory
The route entry created when using a ctx to pass route
entry data backup to the master pthread in zebra is
being leaked.  Prevent this from happening.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-12-11 13:44:06 -05:00
Donald Sharp fff267e4ee
Merge pull request #14968 from opensourcerouting/fix/missing_whitespace
zebra: Add missing whitespace when printing route entry status
2023-12-10 14:30:27 -05:00
Donatas Abraitis 162433cb2a zebra: Add missing whitespace when printing route entry status
Before:

```
status: Removed ReplacingInstalled
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-12-08 13:57:56 +02:00
Philippe Guibert 4d60d9e2b4 zebra: enqueue NHG_DEL in rib_nhg meta queue
The NHG_DEL operation is done directly from ZAPI call, whereas
the NHG_ADD operation is done in the rib_nhg meta queue.

This may be problematic when ADD is followed by DEL. Imagine a
scenarion with two protocol NHIDs. <NH1> depends of <NH2> and
<NH3>. The deletion of <NH3> at the protocol level will trigger
2 messages to ZEBRA: NHG_ADD(<NH1>) and NHG_DEL(<NH3>).

Those operations are properly enqueued in ZAPI, but in the end,
the NHG_DEL is executed first. This causes NHG_ADD to unlink an
already freed NHG.

Fix this by consistently enqueuing NHG_DEL and NHG_ADD operations.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-12-07 17:20:20 +01:00
Mark Stapp 2648661ed5
Merge pull request #14795 from donaldsharp/zebra_notify_admin_lost
zebra: Fix non-notification of better admin won
2023-12-07 08:30:00 -05:00
Russ White 0a79e117d6
Merge pull request #12600 from donaldsharp/local_routes
*: Introduce Local Host Routes to FRR
2023-12-05 11:00:44 -05:00
Donald Sharp 2d3005068c zebra: Fix non-notification of better admin won
If there happens to be a entry in the zebra rib
that has a lower admin distance then a newly received
re, zebra would not notify the upper level protocol
about this happening.  Imagine a case where there
is a connected route for say a /32 and bgp receives
a route from a peer that is the same route as the
connected.  Since BGP has no network statement and
perceives the route as being `good` bgp will install
the route into zebra.  Zebra will look at the new
bgp re and correctly identify that the re is not
something that it will use and do nothing.  This
change notices this and sends up a BETTER_ADMIN_WON
route notification.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-29 13:47:48 -05:00
Donald Sharp a1c1b9d9c5 zebra: On shutdown, ensure ctx's in rib_dplane_q are freed
a) Rename rib_init to zebra_rib_init() to better follow how
things are named

b) on shutdown cycle through the rib_dplane_q and free
up any contexts sitting in it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-21 12:41:18 -05:00
Mark Stapp bb3faf1b95 zebra: reduce number of switch statements with dplane opcodes
Replace several switch blocks that contain every dplane opcode
with simpler sets of if()s. In these cases the code only
uses a couple of opcodes.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-11-17 08:40:58 -05:00
Donald Sharp 6de9f7fbf5 *: Move distance related defines into their own header
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-07 06:47:51 -05:00
Donald Sharp 315aa6cde4 *: Remove netlink headers from lib/zebra.h
The headers associated with netlink code
really only belong in those that need it.
Move these headers out of lib/zebra.h and
into more appropriate places.  bgp's usage
of the RT_TABLE_XXX defines are probably not
appropriate and will be cleaned up in future
commits.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-07 06:46:19 -05:00
Donald Sharp d4aa24ba7d *: Introduce Local Host Routes to FRR
Create Local routes in FRR:

S   0.0.0.0/0 [1/0] via 192.168.119.1, enp39s0, weight 1, 00:03:46
K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:03:51
O   192.168.119.0/24 [110/100] is directly connected, enp39s0, weight 1, 00:03:46
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:03:51
L>* 192.168.119.224/32 is directly connected, enp39s0, 00:03:51
O   192.168.119.229/32 [110/100] via 0.0.0.0, enp39s0 inactive, weight 1, 00:03:46
C>* 192.168.119.229/32 is directly connected, enp39s0, 00:03:46

Create ability to redistribute local routes.

Modify tests to support this change.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-01 17:13:06 -04:00
Philippe Guibert daad19071c zebra: add redistribute table-direct support
Redistributing routes from a specific routing table to a particular routing
protocol necessitates copying route entries to the main routing table using the
"ip import-table" command. Once copied, these routes are assigned a distinct
"table" route type, which the "redistribute table" command of the routing
protocol then picks up.

For illustration, here is a configuration that showcases the use of
"import-table" and "redistribute":

> # show running-config
> [..]
> ip route 172.31.0.10/32 172.31.1.10 table 100
> router bgp 65500
>  address-family ipv4 unicast
>   redistribute table 100
>  exit-address-family
> exit
> ip import-table 100
>
> # show ip route vrf default
> [..]
> T[100]>* 172.31.0.10/32 [15/0] via 172.31.1.10, r2-eth1, weight 1, 00:00:05

However, this method has inherent constraints:

- The 'import-table' parameter only handles route table id up to 252. The
253/254/255 table ids are reserved in the linux system, and adding other table
IDs above 255 leads to a design issue, where the size of some tables is directly
related to the maximum number of table ids to support.
- Duplicated route entries might interfere with original default table routes,
leading to potential conflicts. There is no guarantee that the zebra RIB will
favor these duplicated entries during redistribution.
- There are cases where the table ID can be checked independently of the default
routing table, as seen in Linux where the "ip rule" command is able to divert
traffic to that routing table. In that case, there is no need to duplicate route
entries in the default routing table.

To overcome these issues, a new redistribution type is proposed to redistribute
route entries directly from a specified routing table, eliminating the need for
an initial import into the default table.

Add a 'ZEBRA_ROUTE_TABLE_DIRECT' type to the 'REDISTRIBUTE' ZAPI messages. It
allows sending routes from a given non default table ID from zebra to a routing
daemon. The destination routing protocol table must be the default table.
The redistributed route inherit from the default distance value of 14: this is
the distance value reserved for routes redistributed via ROUTE_TABLE_DIRECT.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-10-20 13:28:52 +02:00
Donald Sharp 02cbd97801
Merge pull request #14561 from idryzhov/implicit-fallthrough
build: add -Wimplicit-fallthrough
2023-10-13 11:51:11 -04:00
Donald Sharp 4f7dc7709f
Merge pull request #13340 from leonshaw/fix/rib-del-connected
zebra: Fix connected route deletion when multiple entry exists
2023-10-13 08:53:43 -04:00
Igor Ryzhov 7d67b9ff28 build: add -Wimplicit-fallthrough
Also:
- replace all /* fallthrough */ comments with portable fallthrough;
pseudo keyword to accomodate both gcc and clang
- add missing break; statements as required by older versions of gcc
- cleanup some code to remove unnecessary fallthrough

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2023-10-12 21:23:18 +03:00
Rajasekar Raja 27ccfd9aa6 zebra: Fix zebra crash when replacing NHE during shutdown
During replace of a NHE from upper proto in zebra_nhg_proto_add(),
 - rib_handle_nhg_replace() is invoked with old NHE where we walk all
   RNs/REs & replace the re->nhe whose address points to old NHE.
 - In this walk, if prev re->nhe refcnt is decremented to 0, we free up
   the memory which the old NHE is pointing to.
Later in zebra_nhg_proto_add(), we end up accessing this freed memory
and crash.

Logs:
1380766 2023/08/16 22:34:11.994671 ZEBRA: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x56091d890840 (70312519[2756/2762/2810]) 2 => 1
1380773 2023/08/16 22:34:11.994678 ZEBRA: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x56091d890840 (70312519[2756/2762/2810]) 1 => 0
1380777 2023/08/16 22:34:11.994844 ZEBRA: [JE46R-G2NEE] zebra_nhg_release: nhe 0x56091d890840 (70312519[2756/2762/2810])
1380778 2023/08/16 22:34:11.994849 ZEBRA: [SCDBM-4H062] zebra_nhg_free: nhe 0x56091d890840 (70312519[2756/2762/2810]), refcnt 0
1380782 2023/08/16 22:34:11.995000 ZEBRA: [SCDBM-4H062] zebra_nhg_free: nhe 0x56091d890840 (0[]), refcnt 0
1380783 2023/08/16 22:34:11.995011 ZEBRA: lib/memory.c:84: mt_count_free(): assertion (mt->n_alloc) failed

Backtrace:
0  0x00007f833f5f48eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
1  0x00007f833f5df535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2  0x00007f833f636648 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
3  0x00007f833f63cd6a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
4  0x00007f833f63cfb4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
5  0x00007f833f63fbc8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
6  0x00007f833f64172a in malloc () from /lib/x86_64-linux-gnu/libc.so.6
7  0x00007f833f6c3fd2 in backtrace_symbols () from /lib/x86_64-linux-gnu/libc.so.6
8  0x00007f833f9013fc in zlog_backtrace_sigsafe (priority=priority@entry=2, program_counter=program_counter@entry=0x7f833f5f48eb <raise+267>) at lib/log.c:222
9  0x00007f833f901593 in zlog_signal (signo=signo@entry=6, action=action@entry=0x7f833f988ee8 "aborting...", siginfo_v=siginfo_v@entry=0x7ffee1ce4a30,
    program_counter=program_counter@entry=0x7f833f5f48eb <raise+267>) at lib/log.c:154
10 0x00007f833f92dbd1 in core_handler (signo=6, siginfo=0x7ffee1ce4a30, context=<optimized out>) at lib/sigevent.c:254
11 <signal handler called>
12 0x00007f833f5f48eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
13 0x00007f833f5df535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
14 0x00007f833f958f96 in _zlog_assert_failed (xref=xref@entry=0x7f833f9e4080 <_xref.10705>, extra=extra@entry=0x0) at lib/zlog.c:680
15 0x00007f833f905400 in mt_count_free (mt=0x7f833fa02800 <MTYPE_NH_LABEL>, ptr=0x51) at lib/memory.c:84
16 mt_count_free (ptr=0x51, mt=0x7f833fa02800 <MTYPE_NH_LABEL>) at lib/memory.c:80
17 qfree (mt=0x7f833fa02800 <MTYPE_NH_LABEL>, ptr=0x51) at lib/memory.c:140
18 0x00007f833f90799c in nexthop_del_labels (nexthop=nexthop@entry=0x56091d776640) at lib/nexthop.c:563
19 0x00007f833f907b91 in nexthop_free (nexthop=0x56091d776640) at lib/nexthop.c:393
20 0x00007f833f907be8 in nexthops_free (nexthop=<optimized out>) at lib/nexthop.c:408
21 0x000056091c21aa76 in zebra_nhg_free_members (nhe=0x56091d890840) at zebra/zebra_nhg.c:1628
22 zebra_nhg_free (nhe=0x56091d890840) at zebra/zebra_nhg.c:1628
23 0x000056091c21bab2 in zebra_nhg_proto_add (id=<optimized out>, type=9, instance=<optimized out>, session=0, nhg=nhg@entry=0x56091d7da028, afi=afi@entry=AFI_UNSPEC)
    at zebra/zebra_nhg.c:3532
24 0x000056091c22bc4e in process_subq_nhg (lnode=0x56091d88c540) at zebra/zebra_rib.c:2689
25 process_subq (qindex=META_QUEUE_NHG, subq=0x56091d24cea0) at zebra/zebra_rib.c:3290
26 meta_queue_process (dummy=<optimized out>, data=0x56091d24d4c0) at zebra/zebra_rib.c:3343
27 0x00007f833f9492c8 in work_queue_run (thread=0x7ffee1ce55a0) at lib/workqueue.c:285
28 0x00007f833f93f60d in thread_call (thread=thread@entry=0x7ffee1ce55a0) at lib/thread.c:2008
29 0x00007f833f8f9888 in frr_run (master=0x56091d068660) at lib/libfrr.c:1223
30 0x000056091c1b8366 in main (argc=12, argv=0x7ffee1ce5988) at zebra/main.c:551

Issue: 3492162

Ticket# 3492162

Signed-off-by: Chirag Shah <chirag@nvidia.com>

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2023-08-31 05:40:34 +00:00
Donald Sharp 4d22b41321
Merge pull request #14256 from rodecker/rt-table-id
zebra: Make main routing table (RT_TABLE_MAIN) configurable
2023-08-25 17:33:52 -04:00
Martin Pels 4d96ce1b4d zebra: Make main routing table (RT_TABLE_MAIN) configurable
Signed-off-by: Martin Pels <mpels@ripe.net>
2023-08-22 15:29:07 +02:00
Pavel Ivashchenko 69906fdbd6 zebra: fix assert in process_subq_route
Bug is reporoduced in case of switching interfaces betwean VRFs.
 ospf6d is enabled and configured in each VRF.

 'dest' can be removed from the route node in the time when the same
 route node waiting processing in another sub-queue.

 A route node must only be in one sub-queue at a time.

 Details:

 1. Config:

    interface if0
     ipv6 address 2001:db8:cafe:2::2/64
     ipv6 nat inside
     ipv6 ospf6 area 0.0.0.51
     ipv6 ospf6 cost 10
     vrf test2
    exit
    !
    interface if1
     ipv6 address 2001:db8:cafe:4::1/64
     ipv6 nat outside
     ipv6 ospf6 area 0.0.0.0
     ipv6 ospf6 cost 10
     vrf test2
    exit
    !
    router ospf6
     ospf6 router-id 2.2.2.2
    exit
    !
    router ospf6 vrf test1
     ospf6 router-id 2.2.2.2
    exit
    !
    router ospf6 vrf test2
     ospf6 router-id 2.2.2.2
    exit

  I just quickly switched interfaces between different VRFs (default/test1/test2).

 2. Log messages:

  Aug 02 16:51:56 ubuntu zebra[386985]: [MFYWV-KH3MC] process_subq_early_route_add: (0:?):2001:db8:cafe:2::/64: Inserting route rn 0x56267593de90, re 0x56267595ae40 (connected) existing 0x0, same_count 0
  Aug 02 16:51:56 ubuntu zebra[386985]: [Q4T2G-E2SQF] process_subq_early_route_add: dumping RE entry 0x56267595ae40 for 2001:db8:cafe:2::/64 vrf default(0)
  Aug 02 16:51:56 ubuntu zebra[386985]: [GCGMT-SQR82] rib_link: (0:?):2001:db8:cafe:2::/64: rn 0x56267593de90 adding dest
  Aug 02 16:51:56 ubuntu zebra[386985]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):2001:db8:cafe:2::/64: queued rn 0x56267593de90 into sub-queue Connected Routes
  Aug 02 16:51:56 ubuntu zebra[386985]: [QE6V0-J8BG5] rib_delnode: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, re 0x56267595ae40, removing
  Aug 02 16:51:56 ubuntu zebra[386985]: [KMPGN-JBRKW] rib_meta_queue_add: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90 is already queued in sub-queue Connected Routes
  Aug 02 16:51:56 ubuntu zebra[386985]: [MFYWV-KH3MC] process_subq_early_route_add: (0:254):2001:db8:cafe:2::/64: Inserting route rn 0x56267593de90, re 0x56267595abf0 (ospf6) existing 0x0, same_count 1
  Aug 02 16:51:56 ubuntu zebra[386985]: [Q4T2G-E2SQF] process_subq_early_route_add: dumping RE entry 0x56267595abf0 for 2001:db8:cafe:2::/64 vrf default(0)
  Aug 02 16:51:56 ubuntu zebra[386985]: [KMPGN-JBRKW] rib_meta_queue_add: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90 is already queued in sub-queue Connected Routes
  Aug 02 16:51:56 ubuntu zebra[386985]: [YEYFX-TDSC2] process_subq_early_route_add: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, removing unneeded re 0x56267595ae40
  Aug 02 16:51:56 ubuntu zebra[386985]: [Y53JX-CBC5H] rib_unlink: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, re 0x56267595ae40
  Aug 02 16:51:56 ubuntu zebra[386985]: [QE6V0-J8BG5] rib_delnode: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, re 0x56267595abf0, removing
  Aug 02 16:51:56 ubuntu zebra[386985]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):2001:db8:cafe:2::/64: queued rn 0x56267593de90 into sub-queue RIP/OSPF/ISIS/EIGRP/NHRP Routes
  Aug 02 16:51:56 ubuntu zebra[386985]: [NZNZ4-7P54Y] default(0:254):2001:db8:cafe:2::/64: Processing rn 0x56267593de90
  Aug 02 16:51:56 ubuntu zebra[386985]: [ZJVZ4-XEGPF] default(0:254):2001:db8:cafe:2::/64: Examine re 0x56267595abf0 (ospf6) status: Removed Changed flags: None dist 110 metric 10
  Aug 02 16:51:56 ubuntu zebra[386985]: [NM15X-X83N9] rib_process: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, removing re 0x56267595abf0
  Aug 02 16:51:56 ubuntu zebra[386985]: [Y53JX-CBC5H] rib_unlink: (0:254):2001:db8:cafe:2::/64: rn 0x56267593de90, re 0x56267595abf0
  Aug 02 16:51:56 ubuntu zebra[386985]: [KT8QQ-45WQ0] rib_gc_dest: (0:?):2001:db8:cafe:2::/64: removing dest from table
  Aug 02 16:51:56 ubuntu zebra[386985]: [HH6N2-PDCJS] default(0:0):2001:db8:cafe:2::/64 rn 0x56267593de90 dequeued from sub-queue Connected Routes

 3. ...and then assert:

  (gdb) bt
  #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140662163115136) at ./nptl/pthread_kill.c:44
  #1  __pthread_kill_internal (signo=6, threadid=140662163115136) at ./nptl/pthread_kill.c:78
  #2  __GI___pthread_kill (threadid=140662163115136, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
  #3  0x00007fee76753476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #4  0x00007fee767397f3 in __GI_abort () at ./stdlib/abort.c:79
  #5  0x00007fee76a420fd in _zlog_assert_failed () from target:/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
  #6  0x0000562674efe0f0 in process_subq_route (qindex=7 '\a', lnode=0x562675940c60) at zebra/zebra_rib.c:2540
  #7  process_subq (qindex=META_QUEUE_NOTBGP, subq=0x562675574580) at zebra/zebra_rib.c:3055
  #8  meta_queue_process (dummy=<optimized out>, data=0x56267556d430) at zebra/zebra_rib.c:3091
  #9  0x00007fee76a386e8 in work_queue_run () from target:/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
  #10 0x00007fee76a31c91 in thread_call () from target:/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
  #11 0x00007fee769ee528 in frr_run () from target:/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
  #12 0x0000562674e97ec5 in main (argc=5, argv=0x7ffd1e275958) at zebra/main.c:478

  (gdb) print lnode->data
  $10 = (void *) 0x56267593de90
  (gdb) p/x *(struct route_node *)0x56267593de90
  $11 = {
    p = {
      family = 0xa,
      prefixlen = 0x40,
      u = {
        prefix = 0x20,
        prefix4 = {
          s_addr = 0xb80d0120
        },
        prefix6 = {
          __in6_u = {
            __u6_addr8 = {0x20, 0x1, 0xd, 0xb8, 0xca, 0xfe, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
            __u6_addr16 = {0x120, 0xb80d, 0xfeca, 0x200, 0x0, 0x0, 0x0, 0x0},
            __u6_addr32 = {0xb80d0120, 0x200feca, 0x0, 0x0}
          }
        },
   ...
    table = 0x5626755ae010,
    parent = 0x5626755ae070,
    link = {0x0, 0x0},
    lock = 0x4,
    nodehash = {
      hi = {
        next = 0x5626755ae0d0,
        hashval = 0xebe8bdbf
      }
    },
    info = 0x0

 3. What's happen:

   We removed unneeded re 0x56267595ae40 while adding re 0x56267595abf0. It was the last connected re,
   but rn 0x56267593de90 is still in the connected sub-queue.

   Then rib_delnode was called for 0x56267595abf0. (rn 0x56267593de90 is still in the connected sub-queue).
   rib_delnode have called rib_meta_queue_add which have checked, that rn is absent in sub-queue RIP/OSPF/ISIS/EIGRP/NHRP
   and have added rn in the second sub-queue.

 Fixes: d7ac4c4d88 ("zebra: Introduce early route processing on the MetaQ")

Signed-off-by: Pavel Ivashchenko <pivashchenko@nfware.com>
2023-08-09 21:25:33 +00:00
Xiao Liang cea3f7f25a lib, zebra: Fix EVPN nexthop config order
Delay EVPN route addition to synchronize with rib_delete(), which now
uses early route queue.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
2023-07-27 15:07:42 +08:00
anlan_cs 90bc24408b zebra: add several fields for debug
Two changes for debug:
1. Add a field to indicate its vrf for nexthop.  When the interface changes
vrf, we can't easily know the vrf of this nexthop according to current log.
2. Add a field to indicate operation type.  We can't know whether to add or
remove route according to current log.

Before:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6] nh type 3 flags 0x1
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
```

After:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6 vrfid 9]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6 vrfid 8] nh type 3 flags 0x1
Route 77.75.1.0/24(8) (add) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) (delete) queued for processing into sub-queue Early Route Processing
```

Signed-off-by: anlan_cs <anlan_cs@tom.com>
2023-07-25 14:23:35 +08:00
Donald Sharp af80201876 zebra: Further handle route replace semantics
When an upper level protocol is installing a route X that needs to be
route replaced and at the same time the same or another protocol installs a
different route that depends on route X for nexthop resolution can leave
us with a state where the route is not accepted because zebra is still
really early in the route replace semantics ( route X is still on the work
Queue to be processed ) then the dependent route would not be installed.
This came up in the bgp_default_originate test cases frequently.

Further extendd the ROUTE_ENTR_ROUTE_REPLACING flag to cover this case
as well.  This has come up because the early route processing queueing
that was implemented late last year.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-17 10:00:32 -04:00
Donald Sharp 605df8d44f zebra: Use zebra dplane for RTM link and addr
a) Move the reads of link and address information
into the dplane
b) Move the startup read of data into the dplane
as well.
c) Break up startup reading of the linux kernel data
into multiple phases.  As that we have implied ordering
of data that must be read first and if the dplane has
taken over some data reading then we must delay initial
read-in of other data.

Fixes: #13288
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Donald Sharp a014450441 zebra: Add code to get/set interface to pass up from dplane
1) Add a bunch of get/set functions and associated data
structure in zebra_dplane to allow the setting and retrieval
of interface netlink data up into the master pthread.

2) Add a bit of code to breakup startup into stages.  This is
because FRR currently has a mix of dplane and non dplane interactions
and the code needs to be paused before continuing on.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Mark Stapp d6caf0dbd7
Merge pull request #13875 from donaldsharp/static_dplane_issues
zebra: Static routes async notification do not need this test
2023-07-05 08:27:23 -04:00
Donatas Abraitis 64510b9467 zebra: Dump route details when deleting a route
Just more details what's going on when deleting a route.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-29 17:39:45 +03:00
Donald Sharp d0123a9012 zebra: Static routes async notification do not need this test
When using asic_offload with an asynchronous notification the
rib_route_match_ctx function is testing for distance and tag
being correct against the re.

Normal route notification for static routes is this(well really all routes):
a) zebra dplane generates a ctx to send to the dplane for route install
b) dplane installs it in the kernel
c) if the dplane_fpm_nl.c module is being used it installs it.
d) The context's success code is set to it worked and passes the context
back up to zebra for processing.
e) Zebra master receives this and checks the distance and tag are correct
for static routes and accepts the route and marks it installed.

If the operator is using a wait for install mechansim where the dplane
is asynchronously sending the result back up at a future time *and*
it is using the dplane_fpm_nl.c code where it uses the rt_netlink.c
route parsing code, then there is no way to set distance as that we
do not pass distance to the kernel.

As such static routes were never being properly handled since the re and
context would not match and the route would still be marked as queued.

Modify the code such that the asynchronous path notification for static
routes ignores the distance and tag's as that there is no way to test
for this data from that path at this point in time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-29 09:35:00 -04:00
Donald Sharp d8be139972 zebra: Reduce creation and fix memory leak of frrscripting pointers
There are two issues being addressed:

a) The ZEBRA_ON_RIB_PROCESS_HOOK_CALL script point
was creating a fs pointer per dplane ctx in
rib_process_dplane_results().

b) The fs pointer was not being deleted and directly
leaked.

For (a) Move the creation of the fs to outside
the do while loop.

For (b) At function end ensure that the pointer
is actually deleted.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-05 12:24:02 -04:00
Xiao Liang a35ba7ba60 zebra: Fix connected route deletion when multiple entry exists
When multiple interfaces have addresses in the same network, deleting
one of them may cause the wrong connected route being deleted.
For example:

    ip link add veth1 type veth peer veth2
    ip link set veth1 up
    ip link set veth2 up
    ip addr add dev veth1 192.168.0.1/24
    ip addr add dev veth2 192.168.0.2/24
    ip addr flush dev veth1

Zebra deletes the route of interface veth2 rather than veth1.

Should match nexthop against ere->re_nhe instead of ere->re->nhe.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
2023-04-20 12:13:18 +08:00
Donald Sharp 1b192d88e4 zebra: Actually free up memory associated with the mq list
Free up the link list data structures as well as properly
account for data sizes.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-04-12 10:41:42 -04:00
Jafar Al-Gharaibeh 92c4494ce5
Merge pull request #13145 from donaldsharp/do_delete
Improve and fix zebra GR
2023-04-04 21:10:54 -05:00