RFC 3623 says:
"If the restarting router determines that it was the Designated
Router on a given segment prior to the restart, it elects
itself as the Designated Router again. The restarting router
knows that it was the Designated Router if, while the
associated interface is in Waiting state, a Hello packet is
received from a neighbor listing the router as the Designated
Router".
Implement that logic when processing Hello messages to ensure DR
interfaces will preserve their DR status across a graceful restart.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Issue:
===================
OSPF neighbors do not go down, even after 10 minutes, when there is
a mismatch in the hello and dead intervals.
The neighbors are formed first and a mismatch in the intervals is then
created; the neighbor is observed to stay up.
Root Cause Analysis:
====================
The HelloReceived event defined in RFC 2328 was named PacketReceived,
and it was scheduled whenever an LS Update, LS Ack, LS Request,
Database Description or Hello packet was received.
Although there is a mismatch in the Hello packet contents, the
PacketReceived event still gets triggered by received LS Updates, so
the dead timer gets reset and the neighbor never goes Down; it
remains FULL.
Fix:
==================
As per RFC 2328, HelloReceived needs to be triggered only when a
valid OSPF Hello packet is received and not when other OSPF packets
are received. Renamed the function as well.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement :
===================
LSA with InitialSequenceNumber is not originated
after MaxSequenceNumber.
ANVL Test case 25.33 states:
============================
As soon as this flooding of a LSA with LS sequence number
MaxSequenceNumber has been acknowledged by all adjacent neighbors,
a new instance can be originated with sequence number of InitialSequenceNumber.
RCA :
=====
DUT did not originate an LSA with INITIAL_SEQUENCE number even
after receiving the ACK for the max sequence LSA.
There is no code to handle this situation in the LSA ack flow.
Fix :
=====
Add code to originate LSA with initial sequence number in the
LSA ack flow in case of wrap around sequence number.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
ANVL Test case 28.11
If the database copy has LS age equal to MaxAge and LS sequence number
equal to MaxSequenceNumber, simply discard the received LSA
without acknowledging it.
ANVL Test Case 25.22
When an attempt is made to increment the sequence number past the maximum
value of N - 1 (0x7fffffff; also referred to as MaxSequenceNumber),
the current instance of the LSA must first be flushed from the routing domain.
ANVL Test Case 25.23
As soon as this flooding of a LSA with LS sequence number MaxSequenceNumber
has been acknowledged by all adjacent neighbors, a new instance can be
originated with sequence number of InitialSequenceNumber.
RCA:
When IXIA sent an LSA with LS sequence number MAX and LS age (MAX - 3),
DUT dropped the packet instead of sending an ACK.
The code in function ospf_ls_upd at line 2106 drops the LSA,
hence the test fails.
Fix:
An LS ACK must be sent when the received LSA has the max sequence number
but is not max-aged.
Considering the /* CVE-2017-3224 */ issue, the existing code has been
corrected to prevent an attacker from sending LSAs with the max sequence
number and a higher checksum and thereby blocking the flooding of the
max-sequence-numbered LSAs.
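
A simplified sketch of the discard rule quoted in test case 28.11. It
only covers the "discard without acknowledging" condition on the
database copy; the comparison of instances and the CVE-2017-3224
handling are deliberately left out. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SEQUENCE_NUMBER INT32_MAX /* 0x7fffffff */
#define MAX_AGE             3600      /* seconds, RFC 2328 MaxAge */

/*
 * True when the received LSA must be discarded without an ACK:
 * a database copy exists and it is both MaxAge and MaxSequenceNumber.
 * In every other case covered here the LSA is acknowledged.
 */
static bool discard_without_ack(bool have_db_copy, int32_t db_seq, uint16_t db_age)
{
    return have_db_copy && db_seq == MAX_SEQUENCE_NUMBER && db_age == MAX_AGE;
}
```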
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
===================
DUT selects itself as DR when RR goes for a reload.
Test Case 7.2
DUT (GR Helper) receives the Hello packet from the OSPF GR RESTARTER
(ANVL here) with DR and BDR set to 0.0.0.0 and DUT in its hello
neighbor list. DUT triggers the DR and BDR election although it is
in the Helper mode for that neighbor.
Root Cause Analysis:
====================
When a hello packet is received with the router's own ID in the neighbor
list, there is no check in the code to handle this scenario. Hence the
DR/BDR election happens and it changes the DR although the DUT is a helper.
Fix:
===================
As per RFC 3623 Section 3. Operation of Helper Neighbor, below point,
we need to maintain the DR relationship.
Also, if X was the Designated Router on network segment S when the
helping relationship began, Y maintains X as the Designated Router
until the helping relationship is terminated.
Add a check: when DUT is in helper mode for the neighbor, avoid the
ISM state change when a hello packet is received with DR/BDR set to 0.0.0.0.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
RFC 3623 specifies the Graceful Restart enhancement to the OSPF
routing protocol. This PR implements support for the restarting mode,
whereas the helper mode was implemented by #6811.
This work is based on #6782, which implemented the pre-restart part
and settled the foundations for the post-restart part (behavioral
changes, GR exit conditions, and on-exit actions).
Here's a quick summary of how the GR restarting mode works:
* GR can be enabled on a per-instance basis using the `graceful-restart
[grace-period (1-1800)]` command;
* To perform a graceful shutdown, the `graceful-restart prepare ospf`
EXEC-level command needs to be issued before restarting the ospfd
daemon (there's no specific requirement on how the daemon should
be restarted);
* `graceful-restart prepare ospf` will initiate the graceful restart
for all GR-enabled instances by taking the following actions:
o Flooding Grace-LSAs over all interfaces
o Freezing the OSPF routes in the RIB
o Saving the end of the grace period in non-volatile memory (a JSON
file stored in `$frr_statedir`)
* Once ospfd is started again, it will follow the procedures
described in RFC 3623 until it detects it's time to exit the graceful
restart (either successfully or unsuccessfully).
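
As an illustration of the grace-period bookkeeping, here is a small
sketch. It assumes the saved value is an absolute timestamp in seconds
and that the restarted daemon derives the remaining time from it; the
function names are hypothetical and the JSON storage is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Range accepted by `graceful-restart [grace-period (1-1800)]`. */
#define GRACE_PERIOD_MIN 1
#define GRACE_PERIOD_MAX 1800

/* Absolute end of the grace period, computed before the restart. */
static int64_t gr_end_time(int64_t now, uint32_t grace_period)
{
    assert(grace_period >= GRACE_PERIOD_MIN && grace_period <= GRACE_PERIOD_MAX);
    return now + (int64_t)grace_period;
}

/* Seconds left after the restart; 0 means the grace period has expired. */
static int64_t gr_remaining(int64_t saved_end, int64_t now)
{
    return saved_end > now ? saved_end - now : 0;
}
```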
Testing done:
* New topotest featuring a multi-area OSPF topology (including stub
and NSSA areas);
* Successful interop tests against IOS-XR routers acting as helpers.
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Both the GR helper code and the upcoming GR restarting code are going
to share a lot of definitions. As such, rename ospf_gr_helper.h to
ospf_gr.h, which will be the central point of all GR definitions
and prototypes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remove previous log config
debug ospf graceful-restart helper
and just use
debug ospf graceful-restart
for everything related to OSPF GR.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
Log the LSA advertising router in addition to the LSA type and
ID in the places where that information is necessary to uniquely
identify the LSA in the LSDB.
This is useful, for example, to know exactly which LSA has changed
when the router is exiting from the GR helper mode when a topology
change was detected.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ip nhrp map multicast is being used, this is usually accompanied by an
iptables rule to block the original multicast packet. This causes sendmsg to
return EPERM.
Signed-off-by: Reuben Dowle <reuben.dowle@4rf.com>
Currently if the sysctl net.ipv4.raw_l3mdev_accept is 1, packets
destined to a specific vrf also end up being delivered to the default
vrf. We will see logs like this in ospf:
2021/02/10 21:17:05.245727 OSPF: ospf_recv_packet: fd 20(default) on interface 1265(swp1s1.26)
2021/02/10 21:17:05.245740 OSPF: Hello received from [9.9.36.12] via [swp1s1.26:200.254.26.13]
2021/02/10 21:17:05.245741 OSPF: src [200.254.26.14],
2021/02/10 21:17:05.245743 OSPF: dst [224.0.0.5]
2021/02/10 21:17:05.245769 OSPF: ospf_recv_packet: fd 45(vrf1036) on interface 1265(swp1s1.26)
2021/02/10 21:17:05.245774 OSPF: Hello received from [9.9.36.12] via [swp1s1.26:200.254.26.13]
2021/02/10 21:17:05.245775 OSPF: src [200.254.26.14],
2021/02/10 21:17:05.245777 OSPF: dst [224.0.0.5]
This really really makes ospf unhappy in the vrf we are running in.
I am approaching the problem by just dropping the packet if read in the
default vrf because of:
commit 0556fc33c7
Author: Donald Sharp <sharpd@cumulusnetworks.com>
Date: Fri Feb 1 11:54:59 2019 -0500
lib: Allow bgp to always create a listen socket for the vrf
Effectively if we have `router ospf vrf BLUE` but no ospf running
in the default vrf, we will not have a listener and that would
require a fundamental change in our approach to handle the ospf->fd
at a global level. I think this is less than ideal at the moment
but it will get us moving again and allow FRR to work with
a bunch of vrf's and ospf neighbors.
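
The drop decision is roughly the following. This is a sketch with
hypothetical names, assuming the default vrf has id 0 and that the vrf
of the receiving interface is already known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRF_DEFAULT_SKETCH 0u /* assumed id of the default vrf */

/*
 * True when a packet read on the default-vrf socket arrived on an
 * interface that belongs to another vrf and should be dropped.
 */
static bool drop_duplicate_vrf_packet(uint32_t socket_vrf_id, uint32_t ifp_vrf_id)
{
    return socket_vrf_id == VRF_DEFAULT_SKETCH && ifp_vrf_id != socket_vrf_id;
}
```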
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
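
The new calling convention, sketched with a stand-in event type. The
struct and function names here are hypothetical, not the library's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct event_sketch {
    bool scheduled;
};

/*
 * Cancel through a pointer-to-pointer: NULL-check before dereferencing,
 * cancel the event, then clear the caller's pointer so it cannot be
 * used again by accident.
 */
static void event_cancel_sketch(struct event_sketch **ev)
{
    if (ev == NULL || *ev == NULL)
        return;
    (*ev)->scheduled = false;
    *ev = NULL;
}
```

Calling it a second time on the same pointer is then a harmless no-op.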
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Description:
1. Skipping inactivity timer during graceful restart to make
the RESTARTER active even after dead timer expiry.
2. Handling HELPER on unplanned outages.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
OSPFD sends ARP proactively to speed up convergence for /32 networks
on a p2p connection. It is only an optimization, so it can be disabled.
It is enabled by default.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Fix
- Modulo check on data length not inclusive enough
- Garbage heap read when bounds checking
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
ospf_opaque_self_originated_lsa_received decrements the refcount, which
can result in a free. This is followed by a call to ospf_ls_ack_send,
which accesses the freed LSA.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We actually don't validate the IHL field, although it certainly looks
like we do at a casual glance.
This patch saves us from an assert in case we actually do get an IP
packet with an incorrect header length field.
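
A sketch of what validating the IHL field can look like, assuming a raw
buffer holding the received IPv4 packet. The function name is
hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HEADER_LEN 20 /* IHL of 5 32-bit words */

/*
 * The IHL is the low nibble of the first byte, counted in 32-bit words.
 * It must describe at least a minimal header and must not claim more
 * bytes than were actually received.
 */
static bool ip_header_len_ok(const uint8_t *pkt, size_t len)
{
    size_t hlen;

    if (pkt == NULL || len < IPV4_MIN_HEADER_LEN)
        return false;
    hlen = (size_t)(pkt[0] & 0x0f) * 4;
    return hlen >= IPV4_MIN_HEADER_LEN && hlen <= len;
}
```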
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We test nbr->oi in a couple of places for null, but
in the majority of places where the nbr->oi data is
used we just access it. Touch up code to trust this
assertion and make the code more consistent in others.
Found in Coverity.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The indentation level for ospf_read was starting to be pretty
extreme. Rework into 2 functions for improved readability.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read in up to 20 (ospf write-multiplier X) packets, for handling of data.
This improves performance because we allow ospf to have a bit more data
to work on in one go for spf calculations instead of 1 packet at a time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Turning on packet debugs and seeing a header dump that is 11
lines long is useless:
2019/11/07 01:07:05.941798 OSPF: ip_v 4
2019/11/07 01:07:05.941806 OSPF: ip_hl 5
2019/11/07 01:07:05.941813 OSPF: ip_tos 192
2019/11/07 01:07:05.941821 OSPF: ip_len 68
2019/11/07 01:07:05.941831 OSPF: ip_id 48576
2019/11/07 01:07:05.941838 OSPF: ip_off 0
2019/11/07 01:07:05.941845 OSPF: ip_ttl 1
2019/11/07 01:07:05.941857 OSPF: ip_p 89
2019/11/07 01:07:05.941865 OSPF: ip_sum 0xcf33
2019/11/07 01:07:05.941873 OSPF: ip_src 200.254.30.14
2019/11/07 01:07:05.941882 OSPF: ip_dst 224.0.0.5
We already have this debugged, it's not going to change and the
end developer can stick this back in if needed by hand to debug
something that is not working properly.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit has:
The received packet path in ospf had absolutely no debugs associated with
it. This makes it extremely hard to know when we receive packets for
consumption. Add some breadcrumbs to this end.
Large chunks of commands have no ability to debug what is happening
in what vrf. With ip overlap X vrf this becomes a bit of a problem.
Add some breadcrumbs here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have a bunch of places that look for ORIGINAL_CODING. There is
nothing in our configure system to define this value and a quick
git blame shows this code as being original to the import a very
very long time ago. This is dead code, removing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Recently a lot of issues have been seen in OSPF adjacency establishment:
sessions were torn down because of DD sequence number mismatches.
Add debugs to capture the master- and slave-generated sequence numbers.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
When OSPF receives a Database description packet and is in
`Down`, `Attempt` or `2-Way` state we are creating a warning
for the end user.
rfc2328 states(10.6):
Down - The packet should be rejected
Attempt - The packet should be rejected
2-Way - The packet should be ignored
I cannot find any instructions in the rfc to state what the operational
difference is between rejected and ignored. Neither can I figure
out what FRR expects the end user to do with this information.
I can see this information being useful if we encounter a bug
down the line and we have gathered a bunch of data. As such
let's modify the code to remove the flog_warn and convert
the message to a debug level message that can be controlled by
appropriate debug statements.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This looks like a finish up of the partial cleanup that
occurred at some point in the past. When we alloc oi,
also always alloc the oi->obuf. When we delete oi,
always delete the oi->obuf right before.
This cleans up a bunch of code to be simpler and hopefully
easier to follow.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I am rarely seeing this crash:
r2: ospfd crashed. Core file found - Backtrace follows:
[New LWP 32748]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospfd'.
Program terminated with signal SIGABRT, Aborted.
2019-08-29 15:59:36,149 ERROR: assert failed at "test_ospf_sr_topo1/test_memory_leak":
Which translates to this code:
node = listhead(ospf->oi_write_q);
assert(node);
oi = listgetdata(node);
assert(oi);
So if we get into ospf_write without anything on the oi_write_q
we are stopping the program.
This is happening because in ospf_ls_upd_queue_send we are calling
ospf_write. Imagine that we have an interface already on the oi_write_q
and then ospf_write handles the packet send for all functions. We
are not clearing the t_write thread and we are popping and causing
a crash.
Additionally modify OSPF_ISM_WRITE_ON(O) to not just blindly
turn on the t_write thread. Only do so if we have data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
ospfd: Remove redundant asserts
assert(oi) can never fire: the listgetdata(node) call directly preceding
it already asserts, and besides, a node cannot be created
with a null pointer!
If list_isempty is called directly before the listhead call
it is impossible that we do not have a valid pointer here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
'no router ospf' cancels all threads, including the read/write
(receive/send packets) threads, and cleans up resources:
fd, message queue and data.
The last job of the write (packet) thread is invoked when the
referenced ospf instance is no longer running and the socket
fd is no longer valid.
The write thread callback should check that the fd is valid and
the ospf instance is running before proceeding to send a
message over the socket.
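
The guard amounts to something like the sketch below. The struct and
function names are hypothetical, and an fd below zero is assumed to
mean "socket closed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ospf_instance_sketch {
    bool running; /* cleared by 'no router ospf' */
    int fd;       /* -1 once the socket has been closed */
};

/* The write callback proceeds only for a running instance with a valid fd. */
static bool write_may_proceed(const struct ospf_instance_sketch *o)
{
    return o != NULL && o->running && o->fd >= 0;
}
```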
Ticket:CM-20095
Testing Done:
Performed 'no router ospf' multiple times with the fix
in the topology where the crash was seen.
With the fix the crash is not observed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>