Section 4.1 of RFC8955 defines how the length field of flowspec NLRIs is encoded.
The method use implies a maximum length of 4095 for a single flowspec NLRI.
However, in bgp_flowspec.c, we check the length attribute of the bgp_nlri structure against this maximum value, which actually is the *total* length of all NLRI included in the considered MP_REACH_NLRI path attribute.
Due to this confusion, frr would reject valid announces that contain many flowspec NLRIs, when their cummulative length exceeds 4095, and close the session.
The proposed change removes that check entirely. Indeed, there is no need to check the length field of each invidual NLRI because the method employed make it impossible to encode a length greater than 4095.
Signed-off-by: Stephane Poignant <stephane.poignant@proton.ch>
When peer connections encounter errors, attempt to batch some
of the clearing processing that occurs. Add a new batch object,
add multiple peers to it, if possible. Do one rib walk for the
batch, rather than one walk per peer. Use a handler callback
per batch to check and remove peers' path-infos, rather than
a work-queue and callback per peer. The original clearing code
remains; it's used for single peers.
Signed-off-by: Mark Stapp <mjs@cisco.com>
EVPN imported routes AF is not AF_EVPN, in that case
the path info extra with EVPN is nver created.
In order to create evpn info under path extra, define
new api which does not specifically check for AF_EVPN
afi type.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Sometimes it's very useful to compare pointers from the gdb (and/or from the
logs) or just do some quick adhoc analysis.
```
donatas# sh ip bgp 1.1.1.0/24 internal
BGP routing table entry for 1.1.1.0/24, version 0
Paths: (1 available, no best path)
Not advertised to any peer
65002
127.0.0.1 (inaccessible, import-check enabled) from 127.0.0.1 (127.0.0.2)
Origin IGP, invalid, external
Last update: Thu Jan 16 16:49:53 2025
net: 0x63f3e6fc2ea0, path: 0x63f3e6fc2f50, pathext: 0x63f3e6faed00, attr: 0x63f3e6e8c550
flags net: 0x0, path: 0x1024, attr: 0x7
donatas#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This reverts commit ee1986f1b5.
The fix is incomplete, and is no longer needed with the fix that applies
the route-map for an aggregate and then compares the attribute.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
When stopping and restarting BGP daemon part of the configuration
remains. It should be cleared.
Particulary those are address-family parametes, like: distance,
ead-es-frag, disable-ead-evi-rx, disable-ead-evi-tx.
Signed-off-by: Yaroslav Kholod <y.kholod@vyos.io>
If we have a route-map that sets some attributes e.g. community or large-community,
and the route-map is applied for outgoing direction, everything is fine, but
we missed the point that `advertised-routes detail` was not using the applied
attributes to display and instead it uses what is received from the peer (original).
Let's fix this, and use what's already applied (advertise attributes), and
we can now see:
```
route-map r3 permit 10
match ip address prefix-list p1
set community 65001:65002
set extcommunity bandwidth 100
set large-community 65001:65002:65003
exit
!
...
address-family ipv4 unicast
neighbor 192.168.2.3 route-map r3 out
exit-address-family
...
```
The output:
```
r2# show bgp ipv4 neighbors 192.168.2.3 advertised-routes detail
BGP table version is 1, local router ID is 192.168.2.2, vrf id 0
Default local pref 100, local AS 65002
BGP routing table entry for 10.10.10.1/32, version 1
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.1.1 192.168.2.3
65001
0.0.0.0 from 192.168.1.1 (192.168.1.1)
Origin IGP, valid, external, best (First path received)
Community: 65001:65002
Extended Community: LB:65002:12500000 (100.000 Mbps)
Large Community: 65001:65002:65003
Last update: Thu Dec 19 17:00:40 2024
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This commit introduces meta queue to the BGP process_queue which is
helpful in having a priority of lists where some routes can be processed
earlier than 'other' routes. This is similar to how meta queue is
present in zebra.
After Fix:
---------
For testing, note that all 100.x routes are marked as Early routes which
got enqueued and dequeued first before Other routes in every batch of
updates. Also, the items are dequeued in FIFO order.
switch# cat /var/log/frr/bgpd.log | grep sub-queue
2024/12/06 19:19:42.788014 BGP: [V64FH-G6883] 88.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:42.856127 BGP: [V64FH-G6883] 100.90.9.186/32 queued into sub-queue Early Route
2024/12/06 19:19:42.856138 BGP: [V64FH-G6883] 100.90.9.187/32 queued into sub-queue Early Route
2024/12/06 19:19:42.886715 BGP: [V64FH-G6883] 66.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.022835 BGP: [V64FH-G6883] 33.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.058842 BGP: [V64FH-G6883] 44.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.092365 BGP: [V64FH-G6883] 55.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.540770 BGP: [ZAPXS-9754G] 100.90.9.186/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.541233 BGP: [ZAPXS-9754G] 100.90.9.187/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.541523 BGP: [ZAPXS-9754G] 88.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:43.602094 BGP: [V64FH-G6883] 88.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.649083 BGP: [V64FH-G6883] 100.90.9.186/32 queued into sub-queue Early Route
2024/12/06 19:19:43.649092 BGP: [V64FH-G6883] 100.90.9.187/32 queued into sub-queue Early Route
2024/12/06 19:19:43.649148 BGP: [V64FH-G6883] 77.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.712282 BGP: [V64FH-G6883] 100.90.9.138/32 queued into sub-queue Early Route
2024/12/06 19:19:43.712314 BGP: [V64FH-G6883] 100.90.9.139/32 queued into sub-queue Early Route
2024/12/06 19:19:43.817194 BGP: [V64FH-G6883] 100.90.8.58/32 queued into sub-queue Early Route
2024/12/06 19:19:43.817205 BGP: [V64FH-G6883] 100.90.8.59/32 queued into sub-queue Early Route
2024/12/06 19:19:43.942464 BGP: [ZAPXS-9754G] 100.90.9.186/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942530 BGP: [ZAPXS-9754G] 100.90.9.187/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942550 BGP: [ZAPXS-9754G] 100.90.9.138/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942738 BGP: [ZAPXS-9754G] 100.90.9.139/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942763 BGP: [ZAPXS-9754G] 100.90.8.58/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942788 BGP: [ZAPXS-9754G] 100.90.8.59/32 dequeued from sub-queue Early Route
2024/12/06 19:19:44.558611 BGP: [ZAPXS-9754G] 66.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:44.893541 BGP: [ZAPXS-9754G] 33.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.171794 BGP: [ZAPXS-9754G] 44.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.453137 BGP: [ZAPXS-9754G] 55.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.685269 BGP: [ZAPXS-9754G] 88.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.764752 BGP: [ZAPXS-9754G] 77.0.0.9/32 dequeued from sub-queue Other Route
With 'update-delay' feature (EOIU marker):
------------------------------------------
switch# vtysh -c "show run bgp" | grep update-delay
update-delay 40
switch# cat /var/log/frr/bgpd.log | grep sub-queue
2024/12/06 23:27:46.124461 BGP: [V64FH-G6883] 22.0.0.9/32 queued into sub-queue Other Route
2024/12/06 23:27:46.160224 BGP: [V64FH-G6883] 100.90.8.11/32 queued into sub-queue Early Route
2024/12/06 23:27:46.219663 BGP: [W9QTR-P4REP] EOIU Marker queued into sub-queue EOIU Marker
2024/12/06 23:27:46.269711 BGP: [ZAPXS-9754G] 100.90.8.11/32 dequeued from sub-queue Early Route
2024/12/06 23:27:46.270980 BGP: [ZAPXS-9754G] 22.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 23:27:46.404868 BGP: [RBX2V-K33CZ] EOIU Marker dequeued from sub-queue EOIU Markera
Ticket: #4200787
Signed-off-by: Karthikeya Venkat Muppalla <kmuppalla@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0( meaning 1 ). As that the
bestpath path_info was never included in the count
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Aproximately 4 seconds of saved cpu time for convergence and ~75 mb
smaller run time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is handy when you need to do source matching e.g. `match src-peer ...`
on outgoing direction with a route-map.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
There is a need to be able to process certain bgp
routes earlier than others. Especially when there
is major trauma going on in the network. Start
the ability for this to happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
evpn has a concept of `local` tables where the evpn routes
are actually converted into underlying routes/neighbor
table entries( or vice versa ). Then this local route
is propagated to the global evpn l2vpn table and sent
to the peers. Certain show commands in evpn look
operate on the local table but make the output look
like the data has not been sent to the peer. This
is confusing for the operator. Modify the code
such that local tables get a `Local BGP table not advertised`
in the place where the code talks about whom has received
the data or not.
Example:
torm11# show bgp l2vpn evpn route vni 1000 mac 8a:a1:cc:73:a3:ac ip 45.0.0.5
BGP routing table entry for [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5]
Paths: (2 available, best #2)
Local BGP table not advertised
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf2) from leaf2(192.168.5.1) (192.168.100.14)
Origin IGP, valid, external
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf1) from leaf1(192.168.1.1) (192.168.100.13)
Origin IGP, valid, external, bestpath-from-AS 65101, best (Router ID)
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add bgp_path_info_labels_same() to compare labels with labels from
path_info. Remove labels_same() that was used for mplsvpn only.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The usage of the `bgp bestpath med missing-as-worst` command
was being accepted and applied during bestpath, but during output
of the routes affected by this it would not give any indication
that this was happening or what med value was being used.
Fixes: #15718
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently bgp_path_info's are stored in reverse order
received. Sort them by the best path ordering.
This will allow for optimizations in the future on
how multipath is done.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This will allow a consistency of approach to adding/removing
pi's to from the workqueue for processing as well as properly
handling the dest->info pi list more appropriately.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a new flag BGP_PATH_UNSORTED to keep track
of sorted -vs- unsorted path_info's. Add some
ability to the system to understand when that
flag is set.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running `show ip bgp` command, the 'route short status' and
'network' columns do not have white-space between them.
Old show:
Network Next Hop Metric LocPrf Weight Path
*>i1.1.1.1/32 10.1.12.111 0 100 0 i
New show:
Network Next Hop Metric LocPrf Weight Path
*>i 1.1.1.1/32 10.1.12.111 0 100 0 i
Added white-space to enhance readability between them.
Signed-off-by: Cassiano Campes <cassiano.campes@venkonetworks.com>
Structure size of bgp_path_info_extra when compiled
with vnc is 184 bytes. Reduce this size to 72 bytes
when compiled w/ vnc but not necessarily turned
on vnc.
With 2 full bgp feeds this saves aproximately 100mb
when compiling with vnc and not using vnc.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Without this change when we change the route-map, we never reinstall the route
if the route-map has changed.
We checked only some attributes like aspath, communities, large-communities,
extended-communities, but ignoring the rest of attributes.
With this change, let's check if the route-map has changed.
bgp_route_map_process_update() is triggered on route-map change, and we set
`changed` to true, which treats aggregated route as not the same as it was before.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
moved loc-rib uptime field "bgp_rib_uptime" to struct bgp_path_info_extra for memory concerns
moved logic into bgp_route_update's callback bmp_route_update
written timestamp in per peer header
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
added time_t field to bgp_path_info
set value before bgp dp hook is called
value not set in the msg yet, testing and double checking is needed before
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
set peer type flag to 3 for loc rib monitoring
leave to 0 in other cases like before, even though RFC7854 tells us to set it to 0 1 or 2 depending on the case global/rd/local instance
Signed-off-by: Maxence Younsi <mx.yns@outlook.fr>
But never really does due to locking, but since it can
we need to treat it like it does and ensure that FRR
is not making a mistake, by using memory after it
has been freed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is based on @donaldsharp's work
The current code base is the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and now the memory of those nodes is allocated
as if they are a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of ‘holder’ node which does not have any bgp data
will be allocated as the route_node, not the bgp_node,
and the memory usage is reduced.
The memory usage of BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>