Trigger:
Imagine a route utilizing an NHG with six nexthops (Intf swp1-swp6).
If interfaces swp1-swp4 flaps, the NHG remains the same but now only
references two nexthops (swp5-6) instead of all six. This behavior
occurs due to how NHGs with recursive nexthops are managed within Zebra.
In the scenario below, NHG 370 has all six nexthops installed in the
kernel. However, Zebra maintains a list of recursive NHGs that NHG 370
references i.e., Depends: (371), (372), (373) which are not directly
installed in the kernel.
- When an interface comes up, its nexthop and corresponding dependents
are installed.
- These dependents (counterparts to 371-373) are non-recursive and
are installed as well.
- However, when attempting to install the recursive ones in
zebra_nhg_install_kernel(), they resolve to the already installed
counterparts, resulting in a NO-OP.
Fixing this by iterating all dependents of the recursively resolved
NHGs and reinstalling them.
Trigger: Flap swp1 to swp4
Before Fix:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 370 proto bgp metric 20
ip -d next show
id 337 via 2000:1:0:1:0:f:0:9 dev swp6 scope link proto zebra
id 339 via 2000:1:0:1:0:e:0:9 dev swp5 scope link proto zebra
id 341 via 2000:1:0:1:0:8:0:8 dev swp4 scope link proto zebra
id 343 via 2000:1:0:1:0:7:0:8 dev swp3 scope link proto zebra
id 346 via 2000:1:0:1:0:1:0:7 dev swp2 scope link proto zebra
id 348 via 2000:1:0:1::7 dev swp1 scope link proto zebra
id 370 group 346/348/341/343/337/339 scope global proto zebra
After Trigger:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 370 proto bgp metric 20
root@leaf-11:mgmt:/var/home/cumulus# ip -d next show
id 337 via 2000:1:0:1:0:f:0:9 dev swp6 scope link proto zebra
id 339 via 2000:1:0:1:0:e:0:9 dev swp5 scope link proto zebra
id 370 group 337/339 scope global proto zebra
After Fix:
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 432 proto bgp metric 20
ip -d next show
id 432 group 395/397/400/402/405/407 scope global proto zebra
After Trigger
root@leaf-11:mgmt:/var/home/cumulus# ip route show | grep 6.0.0.5
6.0.0.5 nhid 432 proto bgp metric 20
root@leaf-11:mgmt:/var/home/cumulus# ip -d next show
id 432 group 395/397/400/402/405/407 scope global proto zebra
Ticket :#
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I noticed that there was some missed code coverage in zebra.
multicast [enable|disable]
and
show interface description vrf all
Add a bit to get it covered.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add the abilty to track how much time is spent in routemaps.
Example of the new output:
eva# show route-map
ZEBRA:
route-map: FOO Invoked: 1000000 (323 milliseconds total) Optimization: enabled Processed Change: false
deny, sequence 10 Invoked 1000000 (320 milliseconds total)
Match clauses:
Set clauses:
Call clause:
Action:
Exit routemap
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Signed-off-by: Mark Stapp <mjs@cisco.com>
Zebra currently does a shortest prefix match for
resolving nexthops for a prefix. This is typically
an ok thing to do but fails in several specific scenarios.
If a nexthop matches to a route that is not usable, nexthop
resolution just gives up and refuses to use that particular
route. For example if zebra currently has a covering prefix
say a 10.0.0.0/8. And about the same time it receives a
10.1.0.0/16 ( a more specific than the /8 ) and another
route A, who's nexthop is 10.1.1.1. Imagine the 10.1.0.0/16
is processed enough to know we want to install it and the
prefix is sent to the dataplane for installation( it is queued )
and then route A is processed, nexthop resolution will fail
and the route A will be left in limbo as uninstallable.
Let's modify the nexthop resolution code in zebra such that
if a nexthop's most specific match is unusable, continue looking
up the table till we get to the 0.0.0.0/0 route( if it's even
installed ). If we find a usable route for the nexthop accept
it and use it.
The bgp_default_originate topology test is frequently failing
with this exact problem:
B>* 0.0.0.0/0 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B 1.0.1.17/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 1.0.2.17/32 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
C>* 1.0.3.17/32 is directly connected, lo, 00:02:00
B>* 1.0.5.17/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B>* 192.168.0.0/24 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B 192.168.1.0/24 [200/0] via 192.168.1.1 inactive, weight 1, 00:00:21
C>* 192.168.1.0/24 is directly connected, r2-r1-eth0, 00:02:00
C>* 192.168.2.0/24 is directly connected, r2-r3-eth1, 00:02:00
B>* 192.168.3.0/24 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B 198.51.1.1/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 198.51.1.2/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
Notice that the 1.0.1.17/32 route is inactive but the nexthop
192.168.0.1 is covered by both the 192.168.0.0/24 prefix( shortest match )
*and* the 0.0.0.0/0 route ( longest match ). When looking at the logs
the 1.0.1.17/32 route was not being installed because the matching
route was not in a usable state, which is because the 192.168.0.0/24
route was in the process of being installed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Don't hard-code a sharpd nhg id: those values aren't stable
if the daemons/protos/route-types change. Use json show output
to find the id in the 'resilient' nhg test case in
the all_protocol_startup suite.
Signed-off-by: Mark Stapp <mjs@labn.net>
The route replace test was doing this seq of events:
a) Create nhg
b) Install route w/ sharpd
c) Ensure it worked
d) Modify nhg
d) Ensure the update group replace worked
The problem is that the sharp code is doing this:
/* Only send via ID if nhgroup has been successfully installed */
if (nhgid && sharp_nhgroup_id_is_installed(nhgid)) {
SET_FLAG(api.message, ZAPI_MESSAGE_NHG);
api.nhgid = nhgid;
} else {
for (ALL_NEXTHOPS_PTR(nhg, nh)) {
api_nh = &api.nexthops[i];
zapi_nexthop_from_nexthop(api_nh, nh);
i++;
}
api.nexthop_num = i;
}
The created nhg has not been successfully installed( or at least
sharpd has not read the results yet) when it gets the command
to install the routes. As such it passes down the individual
nexthops instead. The route replace is never going to work.
Modify the code to add a bit of sleep to allow sharpd to
get notified when the system is under load. At this point
there is no way to query sharpd for whether or not it
thinks it's nhg is installed properly or not. This
test is failing all over the place for a bunch of people
let's get this fixed so people can get running
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
the test_nexthop_groups function is failing occassionally
because the test executes 4 in succession sharp install
routes commands. When I dumped the rib on a failed test
run there were only 2 of the 4 routes in the rib and
the two that were in were the last 2 installed.
The sharp daemon setups a event process where it
installs routes `automatically`. If the previous
run is not finished entering a new command to install
the routes will mess up the last one from ever happening.
It is assumed that the user doesn't do stupid stuff here.
In this case I am just adding a small sleep between each
installation to just let the test proceed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The nexthop group code is installing routes and nexthop groups
and immediately expecting zebra to have processed the results
as a result there is a situation when the CI system is under
intense load that the nexthop group might not have been processed.
Add a bit of code to allow the test to give FRR some time
to finish work before declaring it not working.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Any command that uses `peer_lookup_in_view` crashes when "vrf all" is
used, because bgp is NULL in this case.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Remove references to the deprecated "CLI()" function; clean up
a couple of string escapes; make one test-case sensitive to
previous failures.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Fix issue of topotest failures with BGP status Connect or Idle
instead of the expected Active
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Add a terse option to show bgp summary to shorten output.
Do not show the following information about the BGP
instances: the number of RIB entries, the table version and the used memory.
The "terse" option can be used in combination with the "remote-as", "neighbor",
"failed" and "established" filters, and with the "wide" option as well.
Before patch:
ubuntu# show bgp summary remote-as 123456
IPv4 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
BGP table version 0
RIB entries 3, using 552 bytes of memory
Peers 5, using 3635 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.200.200.2 4 123456 81432 4 0 56092 0 00:00:13 572106 0 N/A
Displayed neighbors 1
Total number of neighbors 4
IPv6 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
BGP table version 0
RIB entries 3, using 552 bytes of memory
Peers 5, using 3635 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
% No matching neighbor
Total number of neighbors 5
After patch:
ubuntu# show bgp summary remote-as 123456 terse
IPv4 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.200.200.2 4 123456 81432 4 0 56092 0 00:00:13 572106 0 N/A
Displayed neighbors 1
Total number of neighbors 4
IPv6 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 1
% No matching neighbor
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When filtering sessions on show bgp summary with failed, established,
neighbor and remote-as options, add a counter of displayed neighbors
in addition to the total number of neighbor :
"Displayed neighbors X"
ubuntu# show bgp summary failed remote-as external
IPv4 Unicast Summary (VRF default):
Neighbor EstdCnt DropCnt ResetTime Reason
10.200.200.2 0 0 never Waiting for NHT
172.16.29.2 0 0 never Waiting for NHT
10.22.1.2 0 0 never Waiting for NHT
Displayed neighbors 3
Total number of neighbors 5
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Display on which VRF/view the neighbor was not found. Useful when
selecting "vrf all".
Before patch:
No such neighbor in this view/vrf
After patch:
No such neighbor in VRF default
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Modify VRF/view display in show bgp summary:
- to be more concise
- to display on which VRF/view no neighbor was found
Before patch:
ubuntu# show bgp vrf all summary
Instance default:
IPv4 Unicast Summary:
BGP router identifier XX.XX.XX.XX, local AS number XXXX vrf-id 0
(...)
IPv6 Unicast Summary:
Instance private:
IPv4 Unicast Summary:
ubuntu# show bgp vrf all ipv4 multicast summary
% No BGP neighbors found
% No BGP neighbors found
After patch:
ubuntu# show bgp vrf all summary
IPv4 Unicast Summary (VRF default):
BGP router identifier XX.XX.XX.XX, local AS number XXXX vrf-id 0
(...)
IPv6 Unicast Summary (VRF default):
(...)
IPv4 Unicast Summary (VRF private):
(...)
ubuntu# show bgp vrf all ipv4 multicast summary
% No BGP neighbors found in VRF default
% No BGP neighbors found in VRF private
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add ability to filter session on show bgp summary by neighbor or
remote AS:
ubuntu# show bgp summary ?
neighbor Show only the specified neighbor session
remote-as Show only the specified remote AS session
ubuntu# show bgp summary neighbor ?
A.B.C.D Neighbor to display information about
WORD Neighbor on BGP configured interface
X:X::X:X Neighbor to display information about
ubuntu# show bgp summary remote-as ?
(1-4294967295) AS number
external External (eBGP) AS sessions
internal Internal (iBGP) AS sessions
This patch includes the documentation and the topotest.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Change every `-` to `_` in directory names. This is to avoid mixing _ and -.
Just for consistency and directory sorting properly.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-05-11 14:14:26 +03:00
Renamed from tests/topotests/all-protocol-startup/test_all_protocol_startup.py (Browse further)