Assuming attr is null, a dereference can happen in the function
make_prefix(). Add the protection over attr before accessing the
variable.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When exporting a VPN SRv6 route, the path may not be considered valid if
the nexthop is not valid. This is the case when the 'nexthop vpn export'
command is used. The below example illustrates that the VPN path to
2001:1::/64 is not selected, as the expected nexthop to find in vrf10 is
the one configured:
> # show running-config
> router bgp 1 vrf vrf10
> address-family ipv6 unicast
> nexthop vpn export 2001::1
> # show bgp ipv6 vpn
> [..]
> Route Distinguisher: 1:10
> 2001:1::/64 2001::1@4 0 0 65001 i
> UN=2001::1 EC{99:99} label=16 sid=2001:db8:1:1:: sid_structure=[40,24,16,0] type=bgp, subtype=5
The analysis indicates that the 2001::1 nexthop is considered.
> 2025/03/20 21:47:53.751853 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 21:47:53.751855 BGP: [VWNP2-DNMFV] Found existing bnc 2001::1/128(0)(VRF vrf10) flags 0x82 ifindex 0 #paths 2 peer 0x0, resolved prefix UNK prefix
> 2025/03/20 21:47:53.751856 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is not valid (in VRF vrf10)
> 2025/03/20 21:47:53.751857 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change
Actually, to check the nexthop validity, only the source path in the VRF
has the correct nexthop. Fix this by reusing the source path information
instead of the current one.
> 2025/03/20 22:43:51.703521 BGP: [RD1WY-YE9EC] leak_update: entry: leak-to=VRF default, p=2001:1::/64, type=10, sub_type=0
> 2025/03/20 22:43:51.703523 BGP: [VWNP2-DNMFV] Found existing bnc fe80::b812:37ff:fe13:d441/128(0)(VRF vrf10) flags 0x87 ifindex 0 #paths 2 peer 0x0, resolved prefix fe80::/64
> 2025/03/20 22:43:51.703525 BGP: [VWC2R-4REXZ] leak_update_nexthop_valid: 2001:1::/64 nexthop is valid (in VRF vrf10)
> 2025/03/20 22:43:51.703526 BGP: [HX87B-ZXWX9] leak_update: ->VRF default: 2001:1::/64: Found route, no change
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The resulting VPN prefix of a BGP route from a L3VPN in an srv6 setup
is not advertised to remote devices.
> r1# show bgp ipv6 vpn
> BGP table version is 2, local router ID is 1.1.1.1, vrf id 0
> Default local pref 100, local AS 65500
> Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
> i internal, r RIB-failure, S Stale, R Removed
> Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
> Origin codes: i - IGP, e - EGP, ? - incomplete
> RPKI validation codes: V valid, I invalid, N Not found
>
> Network Next Hop Metric LocPrf Weight Path
> Route Distinguisher: 1:10
> 2011:1::/64 2001:1::2@6< 0 100 0 i
> UN=2001:1::2 EC{99:99} label=4096 sid=2001:db8:1:1:: sid_structure=[40,24,8,0] type=bgp, subtype=5
What happens is that the SID of this BGP update is used as nexthop.
Consequently, the prefix is not valid because of nexthop unreachable.
obviously the locator prefix is not reachable in that L3VRF, and the
real nexthop 2001:1::2 should be used.
> r1# show bgp vrf vrf10 nexthop detail
> Current BGP nexthop cache:
> 2001:db8:1:1:100:: invalid, #paths 1
> Last update: Fri Mar 14 21:18:59 2025
> Paths:
> 2/3 2011:1::/64 RD 1:10 VRF default flags 0x4000
Fix this by considering the SID of a given BGP update, only if the SID
is not ours.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The `clear bgp *` and the interface down events
cause a global clearing of data from the bgp rib.
Let's tie those into the clear peer code such
that we can take advantage of the reduced load
in these cases too.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This reverts commit 58592be577.
This commit is being reverted because of several issues:
a) tcpdump -i <any interface that bgp happens to use>
This command causes bgp to dump it's entire table to all
of it's peers again. This is a huge problem in any type
of scaled environment *and* it is not unusual to have an
operator do this.
b) This commit appears to be attempting to solve the problem
with route leaking across vrf's using labels( or somesuch ).
Unfortunately we have absolutely no topotests that show the
behavior. I am also unable to get any type of how to reproduce
the problem being solved by the commit. I do know, though,
that the problem really stems from the fact that bgp has
decided to cheat and not create bnc's for route leaking.
Thus when a nexthop changes, bgp is not being notified.
This commit was being used as a hammer to solve the problem.
While I do agree backing out a bug fix for some operator
is less then ideal, I believe that since I cannot get the
operator to tell me the problem it solved and the fact
that sending large amounts of updates with just a simple
tcpdump command ( actually 2 one for tcpdump start and
one for finishing ) is more detrimental in my eyes at
this point in time. Additionally the solution used
is the wrong one for the problem.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Even if we have unnumbered peering, let's respect `no neighbor X capability link-local`
and disable it per-neighbor on demand.
Fixes: db853cc97e ("bgpd: Implement Link-Local Next Hop capability")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently when a directly connected peer is going down
BGP gets a call back for nexthop tracking in addition
the interface down events. On the interface down
event BGP goes through and sets up a per peer Q that
holds all the bgp path info's associated with that peer
and then it goes and processes this in the future. In
the meantime zebra is also at work and sends a nexthop
removal event to BGP as well. This triggers a complete
walk of all path info's associated with the bnc( which
happens to be all the path info's already scheduled
for removal here shortly). This evaluate paths
is not an inexpensive operation in addition the work
for handling this is already being done via the
peer down queue. Let's optimize the bnc handling
of evaluate paths and check to see if the peer is
still up to actually do the work here.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Related: https://datatracker.ietf.org/doc/html/draft-white-linklocal-capability
TL;DR; use 16 bytes long next-hops for point-to-point (unnumbered) links instead
of sending 32 bytes (::/LL, GUA/LL, LL/LL combinations).
For backward compatiblity we should handle even 32 bytes existing next hops.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When a BGP listener configured with BMP receives the first BGP
IPv6 update from a connected BGP IPv6 peer, the BMP collector
receives a withdraw post-policy message.
> {"peer_type": "route distinguisher instance", "policy": "post-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111962", "bmp_log_type": "withdraw", "afi": 2,
> "safi": 1, "ip_prefix": "2001::1125/128", "seq": 22}
> {"peer_type": "route distinguisher instance", "policy": "pre-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111963", "bmp_log_type": "update", "origin":
> "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3",
> "nxhp_link-local": "fe80::7063:d8ff:fedb:9e11", "ip_prefix": "2001::1125/128", "seq": 23}
Actually, the BGP update is not valid, and BMP considers it as a
withdraw message. The BGP upate is not valid, because the nexthop
reachability is unknown at the time of reception, and no other
BMP message is sent.
Fix this by re-sending a BMP post update message when nexthop
tracking becomes successfull. Generalise the re-sending of
messages when nexthop tracking changes.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When receiving an SRv6 BGP update, the nexthop tracking is used
to find out the reachability of the BGP update.
> # show bgp ipv6 vpn fd00:200::/64
> Paths: (1 available, best #1)
> [..]
> 4:4::4:4 from 4:4::4:4 (4.4.4.4)
> Origin incomplete, metric 0, localpref 100, valid, internal, best (First path received)
> Extended Community: RT:52:100
> Remote label: 16
> Remote SID: 2001:db8:f4::
> Last update: Mon Mar 11 11:50:04 2024
The IPv6 address used is the "Remote SID". Actually, this value is
incomplete. Remote SID stands for the attribute value received in BGP,
while the label value determines a complement of SRv6 SID value. The
transposition technique authorises that in BGP, and in the above case,
the incoming BGP update has used the transposition length.
When there is a transposition in the SID attribute available, use the
real SID address. The nexthop tracking will use that forged address.
> # show bgp nexthop
> Current BGP nexthop cache:
> 4:4::4:4 valid [IGP metric 30], #paths 0, peer 4:4::4:4
> gate fe80::dced:1ff:fed6:878c, if ntfp3
> Last update: Mon Mar 11 11:50:02 2024
> 2001:db8:f4:1:: valid [IGP metric 0], #paths 2
> gate fe80::dced:1ff:fed6:878c, if ntfp3
Fixes: 26c747ed6c ("bgpd: extend make_prefix to form srv6-based prefix")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently FRR is limiting the nexthop count to a uint8_t not a
uint16_t. This leads to issues when the nexthop count is 256
which results in the count to overflow to 0 causing problems
in the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Introduce a command to stop bgpd from enabling IPv6 router advertisement
messages sending on interfaces.
Signed-off-by: Mikhail Sokolovskiy <sokolmish@gmail.com>
Without this patch:
```
r1# sh ip bgp vrf CUSTOMER-A
BGP table version is 1, local router ID is 20.20.20.0, vrf id 4
Default local pref 100, local AS 65000
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
192.168.2.0/24 192.168.179.5@0<
0 0 479 ?
Displayed 1 routes and 1 total paths
r1#
```
This is because the route is imported, next-hop is in a default VRF, and we should
evaluate an ultimate path info, not the current path info.
After:
```
r1# sh ip bgp vrf CUSTOMER-A
BGP table version is 1, local router ID is 20.20.20.0, vrf id 4
Default local pref 100, local AS 65000
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 192.168.2.0/24 192.168.179.5@0<
0 0 479 ?
Displayed 1 routes and 1 total paths
r1#
```
In both cases next-hop in cache table is valid:
```
r1# sh ip bgp nexthop
Current BGP nexthop cache:
192.168.179.5 valid [IGP metric 0], #paths 2, peer 192.168.179.5
Resolved prefix 192.168.179.0/24
if r1-eth0
Last update: Thu Sep 5 11:24:37 2024
r1#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
On a multihomed setup with colored bgp updates, when the primary
PE goes offline, only a small subset of colored bgp routes are
not switching to the secondary pe.
When a switchover happens, due to a remote IP becoming unreachable,
some nexthop tracking down notifications are sent, but those messages
are completely ignored for colored bgp updates.
The original code has been thought for mounting up the SR-TE service,
when IP reachability is ok, but not when services goes offline.
Fix this by extending the down notification mechanism for colored routes
too.
Fixes: 545aeef1d1 ("bgpd: extend the NHT code to understand SR-TE colors")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When the SR-TE service is off, colored BGP routes are not
selected if it is recursively resolved over routes that are
colored only.
Actually, a BGP nexthop context includes the color attribute;
when an update from ZEBRA is received, there is no color, and
the colored BGP nexthop contexts are parsed, only if there
is a non colored BGP nexthop context. The actual setup shows
this may not be the case every time.
Fix this by parsing all the colored BGP nexthop contexts.
Fixes: b8210849b8 ("bgpd: Make bgp ready to remove distinction between 2 nh tracking types")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a peer sends an IPv4-mapped IPv6 global and a IPv6 link-local
nexthop, prefer the link-local unless a route-map tells to use the
global.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
SRTE_COLOR is not defined at all as an attribute, it was a mistake from the
beginning.
SRTE_COLOR is extended community, can't see the reason having it as a community,
and a separate attribute.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This will allow a consistency of approach to adding/removing
pi's to from the workqueue for processing as well as properly
handling the dest->info pi list more appropriately.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The nexthop tracking never displays the prefix that
has been used in ZEBRA to resolve its nexthop. This
information will be useful if some decision has to be
taken regarding any loops, that is to say if for instance
a BGP prefix is resolved over a prefix in ZEBRA that is
exactly the same.
Store the value in bgp nexthop context, and display it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In make_prefix, the code checks to see if the pi->attr
is non-NULL. Since (A) we cannot have a path_info without
an attribute and (B) all paths leading up to the test
in make_prefix already have pi->attr deref'ed and the
code is not crashing we know this is safe to remove.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move mp_nexthop_prefer_global boolean attribute to nh_flags. It does
not currently save memory because of the packing.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In the case of SRv6-VPN we track the reachability
to the SID. We check that the SID is available
in the BGP update and then we check the nexthop
reachability.
Fixes 7f8c7d9 ("bgpd: ignore nexthop validation for srv6-vpn")
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Enable the SRv6 SID prefix generation in make_prefix()
function of bgp_nht.c.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
fixup: bgpd: extend make_prefix to form srv6-based prefix
The following configuration creates an infinite routing leaking loop
because 'rt vpn both' parameters are the same in both VRFs.
> router bgp 5227 vrf r1-cust4
> no bgp network import-check
> bgp router-id 192.168.1.1
> address-family ipv4 unicast
> network 28.0.0.0/24
> rd vpn export 10:12
> rt vpn both 52:100
> import vpn
> export vpn
> exit-address-family
> !
> router bgp 5227 vrf r1-cust5
> no bgp network import-check
> bgp router id 192.168.1.1
> address-family ipv4 unicast
> network 29.0.0.0/24
> rd vpn export 10:13
> rt vpn both 52:100
> import vpn
> export vpn
> exit-address-family
The previous commit has added a routing leak update when a nexthop
update is received from zebra. It indirectly calls
bgp_find_or_add_nexthop() in which a static route triggers a nexthop
cache entry registration that triggers a nexthop update from zebra.
Do not register again the nexthop cache entry if the BGP_STATIC_ROUTE is
already set.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
If 'bgp network import-check' is defined on the source BGP session,
prefixes that are defined with the network command cannot be leaked to
the other VRFs BGP table even if they are present in the origin VRF RIB
if the 'rt import' statement is defined after the 'network <prefix>'
ones.
When a prefix nexthop is updated, update the prefix route leaking. The
current state of nexthop validation is now stored in the attributes of
the bgp path info. Attributes are compared with the previous ones at
route leaking update so that a nexthop validation change now triggers
the update of destination VRF BGP table.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
"if not XX else" statements are confusing.
Replace two "if not XX else" statements by "if XX else" to prepare next
commits. The patch is only cosmetic.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This rework separates l3nhg functionality from the nexthop
tracking code, by introducing two bgp_nhg.[ch] files. The
calling functions are renamed from bgp_l3nhg* to bgp_nhg*.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In some cases BGP can be monitoring the same prefix
in both the nexthop and import check tables. If this
is the case, when unregistering one bnc from one table
make sure we are not still registered in the other
Example of the problem:
r1(config-router)# address-family ipv4 uni
r1(config-router-af)# no network 192.168.100.41/32
r1(config-router-af)# exit
r1# show bgp import-check-table
Current BGP import check cache:
r1# show bgp nexthop
Current BGP nexthop cache:
192.168.100.41 valid [IGP metric 0], #paths 1, peer 192.168.100.41
if r1-eth0
Last update: Wed Dec 6 11:01:40 2023
BGP now believes it is only watching 192.168.100.41 in the nexthop
cache, but zebra doesn't have anything:
r1# show ip import-check
VRF default:
Resolve via default: on
r1# show ip nht
VRF default:
Resolve via default: on
So if anything happens to the route that is being matched for
192.168.100.41 bgp is no longer going to be notified about this.
The source of this problem is that zebra has dropped the two different
tables into 1 table, while bgp has 2 tables to track this. The solution
to this problem (other than the rewrite that is being done ) is to have
BGP have a bit of smarts about looking in both tables for the bnc and
if found in both don't send the delete of the prefix tracking to zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In zebra, the import check table and the nexthop check tables
were combined. This leaves an issue where when bgp happens
to have a tracked address in both the import check table
and the nexthop track table that are the same address.
When the the item is removed from one table the call
to remove it from zebra removes tracking for the other
table.
Combine the two tables together and keep track where
they came from for processing in bgpd.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>