When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l
3570
3570 hashes for bitmaps associated with the vrf. This is a very
large number of hashes. Let's do two things:
a) Reduce the created size of the actually created hashes to 2
instead of 32.
b) Delay generation of the hash *until* a set operation happens.
As that no hash directly implies a unset value if/when checked.
This reduces the number of hashes to 61 in my setup for normal
operation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
According to RFC 7166, the sequence number should be treated as an
unsigned 64-bit value, although it is stored as two 32-bit values.
When incrementing it, the code caused the lower-order 32-bit value
to skip from 0xFFFFFFFE to 0. As a side effect, an error was never
produced if the full 64-bit sequence number wrapped.
Fixes: #13805
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Commit 76249532fa ("ospf6d: Handle Premature Aging of LSAs") added a
duplicate call to OSPF6_INTRA_PREFIX_LSA_EXECUTE_TRANSIT(), when the
interface state changes to "Down".
Fixes: #1738
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Change timestamp parameter from int to time_t to avoid truncation.
Found by Coverity Scan (CID 1563226 and 1563222)
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This command makes unplanned GR more reliable by manipulating the
sending of Grace-LSAs and Hello packets for a certain amount of time,
increasing the chance that the neighboring routers are aware of
the ongoing graceful restart before resuming normal OSPF operation.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In practical terms, unplanned GR refers to the act of recovering
from a software crash without affecting the forwarding plane.
Unplanned GR and Planned GR work virtually the same, except for the
following difference: on planned GR, the router sends the Grace-LSAs
*before* restarting, whereas in unplanned GR the router sends the
Grace-LSAs immediately *after* restarting.
For unplanned GR to work, ospf6d was modified to send a
ZEBRA_CLIENT_GR_CAPABILITIES message to zebra as soon as GR is
enabled. This causes zebra to freeze the OSPF routes in the RIB as
soon as the ospf6d daemon dies, for as long as the configured grace
period (the defaults is 120 seconds). Similarly, ospf6d now stores in
non-volatile memory that GR is enabled as soon as GR is configured.
Those two things are no longer done during the GR preparation phase,
which only happens for planned GRs.
Unplanned GR will only take effect when the daemon is killed
abruptly (e.g. SIGSEGV, SIGKILL), otherwise all OSPF routes will be
uninstalled while ospf6d is exiting. Once ospf6d starts, it will
check whether GR is enabled and enter in the GR mode if necessary,
sending Grace-LSAs out all operational interfaces.
One disadvantage of unplanned GR is that the neighboring routers
might time out their corresponding adjacencies if ospf6d takes too
long to come back up. This is especially the case when short dead
intervals are used (or BFD). For this and other reasons, planned
GR should be preferred whenever possible.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ospf6 is started up and SPF is run depending on which route is
selected as the parent route we could miss adding a NH. If one
possible parent route has two equal cost paths and the second possible
parent route has only one depending on which one is selected first
determines if we have have one or two NHs.
In the network below when creating a route 2001:db8:3:4::/64 in R2.
When SPF is run there are two possible parent routes R3 and R4.
2001:db8:1:2 +-----+ 2001:db8:2:5
+--------------+ 2 +---------------+
| ::2 | | ::2 |
| +-----+ |
| |
::1| |
+-----+ |::5
| 1 |2001:db8:1:3+-----+2001:db8:3:5+-----+2001:db8:5:6+-----+
| +------------+ 3 +------------+ 5 +------------+ 6 |
+-----+ ::1 ::3 | |::3 ::5 | |::5 ::6| |
::1| +-----+ +-----+ +-----+
| |::3
| | 2001:db8:3:4
| |
| |::4
| 2001:db8:1:4 +-----+
+--------------+ 4 |
::4 | |
+-----+
The problem was if we first created the route to 2001:db8:3:4::/64 with R3
as the parent route all is fine. The code was merging the NH from the parent
route and R3 has 2 NH, one pointing to R1 and one to R5. But if route
2001:db8:3:4::/64 was first created with parent as R4, it has only one NH
pointing to R1, and then later a new vertex was created pointing to R3 the
code would only copy the nhs from the vertex not from the parent route. The
vertex always has just one NH. But the parent route could have more. So
when we would bringup this setup one time we would see one NH for
2001:db8:3:4::/64 and the next time we would see two NHs. The code has been
modified so that it behaves the same when the route is first created, or when
a vertex is created, it selects the NHs from the parent route.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a hash_clean_and_free() function as well as convert
the code to use it. This function also takes a double
pointer to the hash to set it NULL. Also it cleanly
does nothing if the pointer is NULL( as a bunch of
code tested for ).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The GR code should check for topology changes only upon the receipt
of Router-LSAs and Network-LSAs. Other LSAs types don't affect the
topology as far as a restarting router is concerned.
This optimization reduces unnecessary computations when the
restarting router receives thousands of inter-area LSAs or external
LSAs while coming back up.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Upon exiting the GR mode, force reorigination of intra-area-prefix-LSAs
on all attached areas. This is to ensure all configured areas will have
an associated intra-area-prefix-LSA at the end of the GR procedures,
even if the area doesn't have any full adjacency.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This commit fixes a bug where self-originated NSSA/AS-External LSAs
would age out about one hour after exiting from the GR mode. The
reason is because received self-originated LSAs aren't registered
for periodic refreshing, so that needs to be done manually. Fix
this by explicitly reoriginating all NSSA/AS-External LSAs while
exiting from the GR mode.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
An ABR that is originating inter-area-prefix-LSAs should take into
account the fact that there might be self-originated LSAs for the
same prefixes that were originated prior to a graceful restart. When
that happens, the previous LSA-IDs should be reused to avoid having
duplicate LSAs.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 5340 says that Link-LSAs should not be originated for virtual
links. Add a check to prevent that from happening while exiting
from the GR mode.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Consider the case where a DR router is performing a graceful restart,
and all neighbors attached to the DR network have their priorities
set to zero.
According to RFC 3623, the router should reclaim its DR status while
coming back up once it receives a Hello packet from a neighbor
listing the router as the DR, and the associated interface is in
Waiting state.
The problem arises when the DR election starts. Since the router
is already elected the DR, and no BDR will be elected (since all
neighbors have their priorities set to zero), the AdjOk event won't
be triggered at the end of the DR election as it would normally
happen. That causes all neighbors reachable over the broadcast
interface to get stuck in the 2-Way state.
Fix this corner case by always triggering the AdjOk event at the
end of the DR election process when a GR is in progress. Triggering
the AdjOk event when not necessary should never be a problem as
the neighbor FSM is already prepared to deal with that.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 5340, Section 4.8.3 says:
"Prefixes having the NU-bit set in their PrefixOptions field should
be ignored by the inter-area route calculation".
Fix a bug where, in addition to the NU-bit, ospf6d was also ignoring
prefixes having the LA-bit set when computing inter-area routes. In
practice, this fixes interoperability issues with vendors that set
the LA-bit in loopback prefixes (among other cases).
While here, fix a copy-and-paste error where a log message wasn't
showing accurate information about what happened.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Do not assume that all redistributed routes have a nexthop address,
otherwise blackhole nexthops can be misinterpreted as IPv6 addresses,
leading to inconsistencies.
Also, change the signature of a few functions to allow const nexthop
addresses, such that in6addr_any can be used without type casts.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Originate AS-External LSAs with forwarding addresses whenever the
corresponding redistributed routes have a global nexthop address.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Don't directly use `time()` for generating sequence numbers for two
reasons:
1. `time()` can go backwards (due to NTP or time adjustments)
2. Coverity Scan warns every time we truncate a `time_t` variable for
good reason (verify that we are Y2K38 ready).
Found by Coverity Scan (CID 1519812, 1519786, 1519783 and 1519772)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Problem Statement:
=================
Memory leak backtraces
2022-11-23 01:51:10,525 - ERROR: ==842== 1,100 (1,000 direct, 100 indirect) bytes in 5 blocks are definitely lost in loss record 29 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv (ospf6_message.c:1038)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,526 - ERROR: ==842==
2022-11-23 01:51:10,524 - ERROR: Found memory leak in module ospf6d
2022-11-23 01:51:10,525 - ERROR: ==842== 220 (200 direct, 20 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv_master (ospf6_message.c:760)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv (ospf6_message.c:1036)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,525 - ERROR: ==842==
RCA:
====
These memory leaks are beacuse of last lsa in neighbour's request_list is not
getting freed beacuse of lsa lock. The last request has an addtional lock which
is added as a part of ospf6_make_lsreq, this lock needs to be removed
in order for the lsa to get freed.
Fix:
====
Check and remove the lock on the last request in all the functions.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
When using auth keys in ospfv3, there are some memory
leaks when you change the key or remove the interface
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ospf6_route_cmp_nexthops function was returning 0 for same
and 1 for not same. Let's reverse the polarity and actually make
the returns useful long term.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Commit 8f359e1593 removed a check that prevented the same route
from being added twice. In certain topologies, that change resulted in
the following infinite loop when adding an ASBR route:
ospf6_route_add
ospf6_top_brouter_hook_add
ospf6_abr_examin_brouter
ospf6_abr_examin_summary
ospf6_route_add
(repeat until stack overflow)
Revert the offending commit and update `ospf6_route_is_identical()` to
not do comparison using `memcmp()`.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Problem:
Delay in ospfv3 route installation when area gets converted to regular
from NSSA.
RCA:
when area gets converted from NSSA to normal the type-7(NSSA_LSAs)
gets flushed from the area, as a result the external routes
learnt from these type-7s gets removed. Once the area is moved
to nomral the type 5 lsas needs to flooded through the area
so that routes are re-learnt. however there is a delay in
flooding of these routes until these routes are refreshed.
Due to this there is delay installation of these routes.
Fix:
The Fix involves refreshing of the type 5 lsas once the area
is changed from nssa to regular area.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
donatas-pc# sh ipv6 ospf6 interface enp3s0
enp3s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.10.17/24
inet6: fe80::ca5d:fd0d:cd8:1bb7/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State Waiting, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 10(8.149), Dead 40, Retransmit 5
DR: 0.0.0.0 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
donatas-pc# con
donatas-pc(config)# int enp3s0
donatas-pc(config-if)# ipv6 ospf6 passive
donatas-pc(config-if)# do sh ipv6 ospf6 interface enp3s0
enp3s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.10.17/24
inet6: fe80::ca5d:fd0d:cd8:1bb7/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State Waiting, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
No Hellos (Passive interface)
DR: 0.0.0.0 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
donatas-pc(config-if)#
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
ospfd and ospf6d define the same metric-type route-map commands. Make
them have the same help string too.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem:
Multiple memory leaks in ospf6.
260 ==6637== 32 bytes in 1 blocks are definitely lost in loss record 5 of 24
261 ==6637== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
262 ==6637== by 0x4E8A1BF: qcalloc (memory.c:111)
263 ==6637== by 0x11EE27: ospf6_summary_add_aggr_route_and_blackhole (ospf6_asbr.c:2779)
264 ==6637== by 0x11EEBA: ospf6_originate_new_aggr_lsa (ospf6_asbr.c:2811)
265 ==6637== by 0x4E7C6A7: hash_clean (hash.c:325)
266 ==6637== by 0x11FA93: ospf6_handle_external_aggr_update (ospf6_asbr.c:3164)
267 ==6637== by 0x11FA93: ospf6_asbr_summary_process (ospf6_asbr.c:3386)
268 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
269 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
270 ==6637== by 0x119535: main (ospf6_main.c:228)
356 ==6637== 240 bytes in 12 blocks are indirectly lost in loss record 13 of 24
357 ==6637== at 0x4C2FE96: malloc (vg_replace_malloc.c:309)
358 ==6637== by 0x4E8A0DA: qmalloc (memory.c:106)
359 ==6637== by 0x13545C: ospf6_lsa_alloc (ospf6_lsa.c:724)
360 ==6637== by 0x1354E3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
361 ==6637== by 0x1355F2: ospf6_lsa_copy (ospf6_lsa.c:790)
362 ==6637== by 0x13B58B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
363 ==6637== by 0x13B58B: ospf6_dbdesc_recv (ospf6_message.c:1038)
364 ==6637== by 0x13B58B: ospf6_read_helper (ospf6_message.c:1838)
365 ==6637== by 0x13B58B: ospf6_receive (ospf6_message.c:1875)
366 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
367 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
368 ==6637== by 0x119535: main (ospf6_main.c:228)
RCA:
1. when the ospf6 area is being deleted, the neighbor related information
was not being cleaned up.
2. when aggr route gets deleted from rt_aggr_tbl the corrsponding summary
route attched to the aggr route was not being deleted.
Fix:
Added the ospf6_neighbor_delete in ospf6_area_delete to free the
neighbor related information and added ospf6_route_delete while
freeing external aggr route to free the summary route.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
Description:
Active GR count field is missing in json o/p
of 'show ipv6 ospf gr helper' command.
Issue: #12100
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
There are lib debugs being set but never show up in
`show debug` commands because there was no way to show
that they were being used. Add a bit of infrastructure
to allow this and then use it for `debug route-map`
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix issue #11839.
When the user defines a range in an area other than the backbone area, the
summary route will be announced to the backbone area as an inter-area LSA.
However, if the prefix defined in the range is the same prefix as a connected
route in that area, the LSA won't be announced to the backbone area.
This is because when ospf6d is originating the summary route for the
intra-area route, it finds the range configured by the user and tries to
suppress the route by deleting the existing summary route, which happens to be
the one created by the range.
Although the range definition is not necessary in this case, it should not
fail this use case. So let's just keep the summary route there if it is
created from the user defined range.
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
After all needed interfaces ( for example: interface "a1", vrf "vrf1", and
"a1" is binded to "vrf1" ) are ready/created, then restart/start frr. zebra
at startup will call `netlink_interface()` to process all interfaces and notify
all clients, but its calling `get_iflink_speed()` maybe fails for unexpected
order of the coming interfaces: when processing "a1", "vrf1" maybe is unknown
at that time. `if_zebra_speed_update()` timer is introduced to deal with this
order problem.
Currently only ospfd and ospf6d deal with this speed change to recalculated
route cost. ospfd can deal with this change, but ospf6d will wrongly missed it.
Since both `ipv6 ospf6 cost COST` and `auto-cost reference-bandwidth COST` are
not set, cost of this ospf6 interface should be calculated with interface
speed, but it is wrongly kept to `10`, which is based on interface speed being
`0` for it missed speed change. Further, ECMP function becomes invalid after
restart frr, beacuse some ospf6 interfaces of one ECMP are wrongly with cost
`10`.
To avoid missing, recalculate cost for ospf6 interfaces based on potentially
changed speed.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
1. topo test failure seen in below mentioned step of execution with
routes not synced with ABR
ospf6_gr_topo1/test_ospf6_gr_topo1.py::test_gr_rt3 - AssertionError:
"rt1" JSON output mismatches the expected result.
2. as experimental, increasing the sleep interval(21),
cleared the above step but failed in the step
FAILED ospf6_gr_topo1/test_ospf6_gr_topo1.py::test_gr_rt5 -
AssertionError: "rt2" JSON output mismatches the expected result
fix:
tuning retry parameter in check_routers cleared the topotest.
so, changing default value of ospf6 ABR task delay to 5 seconds.
Signed-off-by: Punith Kumar S <punith.shivakumar@sophos.com>
topology: C1--R1---R2---R3--C2
client C1 connected to router node R1
client C2 connected to router node R3
router nodes R1,R2 and R3 are back to back connected
area 0 configured between R1 and R2
R1: all routes of area 0 are learnt successfully
R2: all routes of area 0 are learnt successfully
area 1 configured between R2 and R3
R2: all routes are learnt from R3
R3: routes learnt from C1 on ABR router R2 does not get forward to R3
root cause: on interface start, ABR schedule task is missing.
fix: handle ABR schedule during interface start event
Signed-off-by: Punith Kumar S <punith.shivakumar@sophos.com>
It's possible for ospf6 to decide to delete a route after it's
removed all of the route's nexthops. It's ok to delete a prefix
alone - be a little more forgiving when preparing a route delete.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
OSPFv3 packets can be fragmented and up to 64k long, regardless of
interface MTU. Trying to size these buffers to MTU is just plain wrong.
To not make this a super intrusive change during the 8.3 release freeze,
just code this into ospf6_iobuf_size().
Since the buffer is now always 64k, don't waste time zeroing the entire
thing in receive; instead just zero kind of a "sled" of 128 bytes after
the buffer as a security precaution.
Fixes: #11298
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
a) Remove setting of thread pointer to NULL after
thread invocation, this is already done.
b) Use thread_is_scheduled()
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ospf6_is_valid_summary_addr function is checking
to see if a prefix is the default and also then double
comparing it against the v6 prefix part. No need to do this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If a end users does something like this:
int enp39s0
ipv6 ospf6 hello-interval 65535
And then the timer pops and we send the hello and immediately
if the end user does this:
ipv6 ospf6 hello-interval 5
The timer is not being reset and FRR waits the full 65k seconds
before sending the hello again, which then immediately sets
the next hello to go out in 5 seconds.
When FRR receives the new timer value, look at how much time
is left on the timer in seconds. If this value is greater
than the new hello timer, stop the timer and set it too that
value.
This should fix a CI system test failure found, where the
system is testing setting timer from things like 12 seconds
to 65k seconds then back down to 12 and that the ospf6 neighbor
relationship stays up.
The code was also changed from thread_add_event to thread_add_timer
in all cases. I am not sure what would happen if a show command
comes in for a thread timer remaining with an event instead of a timer
just make it consistent.
This was chased down because the support bundle showed this:
r0# show ipv6 ospf6 vrf all interface
r0-r1-eth0 is up, type BROADCAST
Interface ID: 6
Internet Address:
inet6: fe80::a4ea:d3ff:fe35:cef1/64
inet6: fd00::1/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 10
State DR, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 12(65480.960), Dead 48, Retransmit 5
And looking at the test code is doing stuff like this:
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config)# interface r1-r0-eth0
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 65535
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# no ipv6 ospf6 hello-interval
2022/05/16 17:08:16 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 1
2022/05/16 17:08:16 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 12
If the old timer value pops, the hello interval is set to 65k and never reset again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running `show ipv6 ospf6 interface` the hello timer period
is shown, but there is no indication on how much time is left
on the timer. Add a clue:
sharpd@eva ~/frr5 (master)> vtysh -c "show ipv6 ospf6 int"
enp39s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.119.224/24
inet6: 2603:6080:602:509e:9a14:998:b154:9e9/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State DR, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 10(2.652), Dead 40, Retransmit 5
DR: 192.168.122.1 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Firstly, *keep no change* for `hash_get()` with NULL
`alloc_func`.
Only focus on cases with non-NULL `alloc_func` of
`hash_get()`.
Since `hash_get()` with non-NULL `alloc_func` parameter
shall not fail, just ignore the returned value of it.
The returned value must not be NULL.
So in this case, remove the unnecessary checking NULL
or not for the returned value and add `void` in front
of it.
Importantly, also *keep no change* for the two cases with
non-NULL `alloc_func` -
1) Use `assert(<returned_data> == <searching_data>)` to
ensure it is a created node, not a found node.
Refer to `isis_vertex_queue_insert()` of isisd, there
are many examples of this case in isid.
2) Use `<returned_data> != <searching_data>` to judge it
is a found node, then free <searching_data>.
Refer to `aspath_intern()` of bgpd, there are many
examples of this case in bgpd.
Here, <returned_data> is the returned value from `hash_get()`,
and <searching_data> is the data, which is to be put into
hash table.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
A router has some static routes and redistributes turned on.
"clear ipv6 ospf process" command is applied. Then static routes
are deleted. In 1 in 5 runs, AS-External LSAs are not getting removed
from the neighbors even though it gets removed from its own LSDB.
Because of the clear process command, MAX_AGE LSAs are advertised and
fresh LSAs are installed in the LSDB. When the MAX_LSAs are advertised
back to the same router as part of the flooding process, it gets added
to the LSUpdate list even though it comes inside the MinLSArrival time.
When the static routes get deleted, it removed the LSA from the
LSRetrans list but not from LSUpdate list. The LSAs present in the
LSUpdate list gets advertised when sending LS Updates.
When an old copy of an LSA is more recent than the new LSA, check if it
has come inside the MinLSArrival time before adding to the LSUpdate
list.
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
ospf6_routemap_rule_match_interface uses route->ospf6 field for matching
so we must fill the field in our temporary variable.
Fixes#10911.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When an area-range command is applied in an ABR, the more specific prefixes
need to be removed.
r2# sh ipv6 ospf database
AS Scoped Link State Database
Type LSId AdvRouter Age SeqNum Payload
ASE 0.0.0.1 10.254.254.2 53 80000001 ::
ASE 0.0.0.2 10.254.254.2 51 80000001 2001:db8:1::/64
ASE 0.0.0.3 10.254.254.2 51 80000001 2001:db8:3::/64
ASE 0.0.0.4 10.254.254.2 51 80000001 2001:db8:2::/64
ASE 0.0.0.5 10.254.254.2 46 80000001 2001:db8:1::/64
ASE 0.0.0.6 10.254.254.2 46 80000001 2001:db8:3::/64
ASE 0.0.0.7 10.254.254.2 46 80000001 2001:db8:2::/64
ASE 0.0.0.8 10.254.254.2 41 80000001 2001:db8:3::/64
ASE 0.0.0.9 10.254.254.2 41 80000001 2001:db8:1000::1/128 <-- **
ASE 0.0.0.10 10.254.254.2 41 80000001 2001:db8:1000::2/128 <-- **
ASE 0.0.0.12 10.254.254.2 24 80000001 2001:db8:1000::/64
ASE 0.0.0.1 10.254.254.3 52 80000001 2001:db8:2::/64
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
Problem:
ospf6d crash is observed when lsack is received from the neighbour for
AS External LSA.
RCA:
The crash is observed in ospf6_decrement_retrans_count while decrementing
retransmit counter for the LSA when lsack is recived. This is because in
ospf6_flood_interace when new LSA is being added to the neighbour's list
the incrementing is happening on the received LSA instead of the already
present LSA in scope DB which is already carrying counters.
when this new LSA replaces the old one, the already present counters are
not copied on the new LSA this creates counter mismatch which results in
a crash when lsack is recevied due to counter going to negative.
Fix:
The fix involves following changes.
1. In ospf6_flood_interace when LSA is being added to retrans list
check if there is alreday lsa in the scoped db and increment
the counter on that if present.
2. In ospf6_lsdb_add copy the retrans counter from old to new lsa
when its being replaced.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8 for bgpd. It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.
Fixes: #10766
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem Statement:
=================
The feature is not enabled, needs to be enabled by doing required
initialization.
RCA:
====
Changes to support the feature is present, but the feature macro
needs to be enabled.
Fix:
====
This commit has changes to enable the code.
Risk:
=====
Medium
Need to ensure all existing ospf6 related topotests pass. to ensure
packet processing is not impacted.
Tests Executed:
===============
Have tested the functionality with enabling openssl and also disabling
openssl.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
Problem Statement:
==================
RFC 7166 support for OSPF6 in FRR code.
RCA:
====
This feature is newly supported in FRR
Fix:
====
Core functionality implemented in previous commit is
stitched with rest of ospf6 code as part of this commit.
Risk:
=====
Low risk
Tests Executed:
===============
Have executed the combination of commands.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
Problem Statement:
==================
Implement RFC 7166 support for OSPF6 in FRR code.
RCA:
====
This feature is newly supported in FRR.
Fix:
====
Changes are done to implement ospf6 ingress and egress
packet processing.
This commit has the core functionality.
It supports below debugability commands:
---------------------------------------
debug ospf6 authentication [<tx|rx>]
It supports below clear command:
--------------------------------
clear ipv6 ospf6 auth-counters interface [IFNAME]
It supports below show commands:
--------------------------------
frr# show ipv6 ospf6 interface ens192
ens192 is up, type BROADCAST
Interface ID: 5
Number of I/F scoped LSAs is 2
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication trailer is enabled with manual key ==> new info added
Packet drop Tx 0, Packet drop Rx 0 ==> drop counters
frr# show ipv6 ospf6 neighbor 2.2.2.2 detail
Neighbor 2.2.2.2%ens192
Area 1 via interface ens192 (ifindex 3)
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication header present ==> new info added
hello DBDesc LSReq LSUpd LSAck
Higher sequence no 0x0 0x0 0x0 0x0 0x0
Lower sequence no 0x242E 0x1DC4 0x1DC3 0x23CC 0x1DDA
frr# show ipv6 ospf6
OSPFv3 Routing Process (0) with Router-ID 2.2.2.2
Number of areas in this router is 1
Authentication Sequence number info ==> new info added
Higher sequence no 3, Lower sequence no 1656
Risk:
=====
Low risk
Tests Executed:
===============
Have executed the combination of commands.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
Problem Statement:
==================
RFC 7166 support for OSPF6 in FRR code.
RCA:
====
This feature is newly supported in FRR
Fix:
====
Changes are done to add support for two new CLIs to configure
ospf6 authentication trailer feature.
One CLI is to support manual key configuration.
Other CLI is to configure key using keychain.
below CLIs are implemented as part of this commit. this configuration
is applied on interface level.
Without openssl:
ipv6 ospf6 authentication key-id (1-65535) hash-algo <md5|hmac-sha-256> key WORD
With openssl:
ipv6 ospf6 authentication key-id (1-65535) hash-algo <md5|hmac-sha-256|hmac-sha-1|hmac-sha-384|hmac-sha-512> key WORD
With keychain support:
ipv6 ospf6 authentication keychain KEYCHAIN_NAME
Running config for these command:
frr# show running-config
Building configuration...
Current configuration:
!
interface ens192
ipv6 address 2001:DB8:1::2/64
ipv6 ospf6 authentication key-id 10 hash-algo hmac-sha-256 key abhinay
!
interface ens224
ipv6 address 2001:DB8:2::2/64
ipv6 ospf6 authentication keychain abhinay
!
Risk:
=====
Low risk
Tests Executed:
===============
Have executed the combination of commands.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
Problem Statement:
==================
As of now there is no support for ospf6 authentication.
To support ospf6 authentication need to have keychain support for
managing the auth key.
RCA:
====
New support
Fix:
====
Enabling keychain for ospf6 authentication feature.
Risk:
=====
Low risk
Tests Executed:
===============
Have verified the support for ospf6 auth trailer feature.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
if r1 has a route received from a neighbor and the same route
configured as static, the administrative distance will determine
which route to use
r1(config)# ipv6 route 1:1::1/128 Null0 70
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
S>* 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:12
O 1:1::1/128 [110/20] via fe80::1833:c9ff:fe7b:3e43, r1-r2-eth0, weight 1, 00:00:49
The static route is selected. If we now change the administrative distance
in ospf6, the OSPF route should be selected
r1(config)# router ospf6
r1(config-ospf6)# distance 50
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
S>* 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:39
O 1:1::1/128 [110/20] via fe80::1833:c9ff:fe7b:3e43, r1-r2-eth0, weight 1, 00:01:16
However the distance is not applied as there are no changes in the routing table
This commit will force the update of the routing table with the new configured distance
r1# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 1:1::1/128 [50/20] via fe80::8cb7:e6ff:fef5:2344, r1-r2-eth0, weight 1, 00:00:03
S 1:1::1/128 [70/0] unreachable (blackhole), weight 1, 00:00:19
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
This PR will include if the area is NSSA in the output of "show ipv6 ospf"
r2# show ipv6 ospf
...
Area 0.0.0.0
Number of Area scoped LSAs is 8
Interface attached to this area: r2-eth1
SPF last executed 20.46717s ago
Area 0.0.0.1[Stub]
Number of Area scoped LSAs is 9
Interface attached to this area: r2-eth0
SPF last executed 20.46911s ago
Area 0.0.0.2[NSSA]
Number of Area scoped LSAs is 14
Interface attached to this area: r2-eth2
SPF last executed 20.46801s ago
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
Opaque data takes up a lot of memory when there are a lot of routes on
the box. Given that this is just a cosmetic info, I propose to disable
it by default to not shock people who start using FRR for the first time
or upgrades from an old version.
Fixes#10101.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
VRF name should not be printed in the config since 574445ec. The update
was done for NB config output but I missed it for regular vty output.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Fixes#9720. When updating an ECMP inter-area route, we compute
a new route and check whether that already exists. If so, we keep the old
route and only update its nexthops. Previously, we merged the new route's
nexthops into the old one's, but this way, it's impossible to remove
nexthops from the old route, resulting in stale nexthops.
This commit fixes this by first removing all nexthops from the old route and
then copying all nexthops from the new route into it. If the new route has
fewer nexthops, the old one will have as well afterwards.
Signed-off-by: Martin Buck <mb-tmp-tvguho.pbz@gromit.dyndns.org>
Update ospfd and ospf6d to send opaque route attributes to
zebra. Those attributes are stored in the RIB and can be viewed
using the "show ip[v6] route" commands (other than that, they are
completely ignored by zebra).
Example:
```
debian# show ip route 192.168.1.0/24
Routing entry for 192.168.1.0/24
Known via "ospf", distance 110, metric 20, best
Last update 01:57:08 ago
* 10.0.1.2, via eth-rt2, weight 1
OSPF path type : External-2
OSPF tag : 0
debian#
debian# show ip route 192.168.1.0/24 json
{
"192.168.1.0\/24":[
{
"prefix":"192.168.1.0\/24",
"prefixLen":24,
"protocol":"ospf",
"vrfId":0,
"vrfName":"default",
"selected":true,
[snip]
"ospfPathType":"External-2",
"ospfTag":"0"
}
]
}
```
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Currently the ospf6d's commands with non-exist vrfs can't give the error
informations to users.
This commit adds a macro "OSPF6_CMD_CHECK_VRF" to give error information
if with non-exist vrfs. As usual, skip the checking process in the case
of json.
So one command can call this macro to do the checking process in its
end. At that time it need know json style or not, so add "json" parameter for
several related functions.
BTW, suppress the build warning of the macro `OSPF6_FIND_VRF_ARGS`:
"Macros starting with if should be enclosed by a do - while loop to avoid
possible if/else logic defects."
Signed-off-by: anlan_cs <anlan_cs@tom.com>
Comparison of the two pointer is confusing, they have no relevance
except the time both of them are empty.
Additionly modify one variable name and correct some comment words, they
are same in both ospfd and ospf6d.
Signed-off-by: anlan_cs <anlan_cs@tom.com>
The OSPF6_LSA_UNAPPROVED flag is set in the function above
ospf6_lsa_translated_nssa_new, so there is no need to set
the flag again
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
When running topotest ospf6_topo2 we can see a log message with wrong lsa id
2021/12/20 23:14:40.330 OSPF6: [V8P0C-HB5Z2] ASBR[default:Status:3]: Update
2021/12/20 23:14:40.330 OSPF6: [Z489N-JAJ6P] ASBR[default:Status:3]: Already ASBR
2021/12/20 23:14:40.330 OSPF6: [Z9D0B-12SBJ] Redistribute 2001:db8:2::/64 (connected)
2021/12/20 23:14:40.330 OSPF6: [N66XP-ANN4G] Advertise as AS-External Id:8.70.41.177 prefix 2001:db8:2::/64 metric 2 (**)
2021/12/20 23:14:40.330 OSPF6: [K4Y9R-C22T6] Advertise new AS-External Id:0.0.0.3 prefix 2001:db8:2::/64 metric 2
2021/12/20 23:14:40.330 OSPF6: [PKX0N-KNRQR] Originate AS-External-LSA for 2001:db8:2::/64
This PR removes the log (instead of fixing it) as same information is printed
in the following entry
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
Since ospf6Enabled and attachedToArea are denoting the same thing.
It is decided to remove ospf6Enabled from json output to make
CLI and json output similar.
Fixes: #9286
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
With the current code, in a topology like this:
r1 ---- 0.0.0.0 ---- r2(ABR) ---- 1.1.1.1 -----r3(ASBR)
NSSA
where r3 is redistributing statics within the NSSA area, the ABR (r2)
is translating type-7 lsa to type-5.
Everytime the function ospf6_abr_nssa_task() is executed all translated
type-5 are aged out and refreshed for no reason. So for instance having 3
lsas already advertised:
r1# sh ipv6 os database
AS Scoped Link State Database
Type LSId AdvRouter Age SeqNum Payload
ASE 0.0.0.1 2.2.2.2 39 80000001 3:3::3/128
ASE 0.0.0.2 2.2.2.2 39 80000001 4:4::4/128
ASE 0.0.0.3 2.2.2.2 39 80000001 5:5::5/128
Adversting a new route from r3:
r3(config)# ipv6 route 6:6::6/128 Null0
r1# sh ipv6 os database
AS Scoped Link State Database
Type LSId AdvRouter Age SeqNum Payload
ASE 0.0.0.1 2.2.2.2 124 80000001 3:3::3/128
ASE 0.0.0.2 2.2.2.2 124 80000001 4:4::4/128
ASE 0.0.0.3 2.2.2.2 124 80000001 5:5::5/128
ASE 0.0.0.4 2.2.2.2 8 80000001 6:6::6/128
That seems okay, however a few seconds later we see all prefixes refreshed
r1# sh ipv6 os database
AS Scoped Link State Database
Type LSId AdvRouter Age SeqNum Payload
ASE 0.0.0.1 2.2.2.2 3600 80000001 3:3::3/128
ASE 0.0.0.2 2.2.2.2 3600 80000001 4:4::4/128
ASE 0.0.0.3 2.2.2.2 3600 80000001 5:5::5/128
ASE 0.0.0.4 2.2.2.2 3600 80000001 6:6::6/128
ASE 0.0.0.5 2.2.2.2 3 80000001 3:3::3/128
ASE 0.0.0.6 2.2.2.2 3 80000001 4:4::4/128
ASE 0.0.0.7 2.2.2.2 3 80000001 5:5::5/128
ASE 0.0.0.8 2.2.2.2 3 80000001 6:6::6/128
This PR prevents the LSA of being refreshed by unsetting the OSPF6_LSA_UNAPPROVED
flag so advertising the last prefix will not refresh all of them:
r1# sh ipv6 os database
AS Scoped Link State Database
Type LSId AdvRouter Age SeqNum Payload
ASE 0.0.0.1 2.2.2.2 90 80000001 3:3::3/128
ASE 0.0.0.2 2.2.2.2 47 80000001 4:4::4/128
ASE 0.0.0.3 2.2.2.2 35 80000001 5:5::5/128
ASE 0.0.0.4 2.2.2.2 7 80000001 6:6::6/128
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
Currently, it is possible to rename the default VRF either by passing
`-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things are getting worse when we need to handle this in northbound
layer as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>