Issue:
If there are any changes to the prefix list, we perform a re-lookup to map the correct RP for the group.
Even if the S,G entry is PIM_UPSTREAM_NOTJOINED and in FHR, In the case of IGMPv3, an S,G entry can be
created with no joins. this is not necessary.
https://www.rfc-editor.org/rfc/rfc4601#section-4.5.7 says no op in case of NOTJOINED
Solution:
To solve this issue, Stop RP mapping when the state is NOTJOINED
Ticket: #3496931
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
Issue:
When there is no traffic for a group, the LHR and RP take the default KAT+Join timer expiry of
a maximum of 480 seconds to clear the S,G . However, in the FHR, we update the state from JOINED
to NOT Joined, downstream state from PPto NOINFO. This restarts the ET timer, causing S,G on FHR to
take more than 10 minutes to age out.
In other words,
Consider a case where (S,G) is in Join state. When the traffic stops and the KAT (210) expires,
the Join expiry timer restarts. At this time, if we receive a prune, the expectation is to set
PPT to 0 (RFC 4601 sec 4.5.2).
When the PPT expires, we move to the noinfo state and restart the expiry timer one more time. We remove the
(S,G) entry only after ~10 minutes when there is no active traffic.
Summary:
KAT Join ET 210 + PP ET 210 + NOINFO ET 210.
Solution:
Delete the ifchannel when in noinfo state, and KAT is not running.
Ticket: #13703
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
Refactor the next hop tracking in PIM to fully support the configured RPF lookup mode.
Moved many NHT related functions to pim_nht.h/c
NHT now tracks both MRIB and URIB tables and makes nexthop decisions based on the configured lookup mode.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
It looks like the code was trying to do this with the null_register
parameter on pim_upstream_start_register_stop_timer(), but that didn't
quite work right. Restructure a bit to get it right.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Topology:
TOR11 (FHR) --- LEAF-11---SPINE1 (RP)MSDP SPINE-2(RP)MSDP --- LEAF-12 -- TOR12 (LHR)
| | | | |
| -----------------------------------------------------(ECMP) |
| | | |
-----------------------------------------------------------------------(ECMP)
Issue:
In some triggers, S,G upstream is preserved even with the PP timer expiry, resulting
in S,G with NULL OILS. This could be because we create a dummy S,G upstream and
dummy channel_oif for *,G, where RPF is UNKNOWN. As a result, PIM+VXLAN traffic is never
forwarded downstream to LHR.
Fix:
when the S,G stream is running, Determine if a reevaluation of the outgoing interface
list (OIL) is required. S,G upstream should then inherit the OIL from *,G.
Testing:
- Evpn pim tests - TestEvpnPimSingleVtepOneMdt.test_02_broadcast_traffic_spt_zero
- pim-smoke
Ticket: #
Signed-off-by: Rajesh Varatharaj <rvaratharaj@nvidia.com>
In the scenario on an intermediate router where a *,G join has
been received and a S,G stream is being sent through that router
on the *,G stream, there exists a situation when the *,G in has been pruned
but the stream is still being received on on incoming interface towards
the RP for the *,G. In this situation PIM will see the S,G stream
initially as a NOCACHE from the dataplane, PIM will then do a RPF
for the S and notice that it is supposed to be coming in on adifferent
interface. In this case PIM the original PIM code would create
a blackhole mroute towards the RPF of the *,G( the interface the
stream is being received on ). The original reason for this is that
if there is a scenario where this particular S1,G stream is sending
at basically line rate, and there also happens to be a different
S2,G stream that is sending at a very low rate. With certain
dataplanes there is no way to really rate limit the S1 -vs- S2
stream and the S1 stream completely overwhelms the S2 stream
for sending up to the control plane for proper pim handling.
The problem then becomes that FRR never properly responds
to the situation where the *,G is rereceived and the S,G
stream switches back over to the SPT for itself and FRR ends
up with a dead mroute that stops everything from working properly.
This code change, installs the blackhole mroute with the RPF
towards the RP for the G and then resets the RPF to the correct
RPF for the Stream but does not modify the mroute. When the
*,G is rereceived and we attempt to transition to the S,G stream
this now works.
As a note: Both David L and myself do not necessarily believe
we fully understand the problem yet. What this does do is fix
all the inconsistent CI issues we are seeing in the topotests
at this time. Internally I am seeing other test failures
in PIM that I don't fully understand and we suspect that
there are other problems in the state machine. We plan to
revisit this problem as we are able to debug the issue better.
In the meantime both David and Myself agree that this gets
the CI working again and Streams end up in the right state.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use oil_incoming_vif instead of oil_parent. I had
to go look this up as that I failed to remember that
the linux kernel calls this parent for some bizarre
reason.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When pim is creating an upstream for a S,G that it has received
*but* it has not received a route to the S, the oil is not
scanned to see if it should inherit anything from the *,G
that may be present when it cannot find the correct iif to
use. When the nexthop tracking actually
resolves the route, the oil is never rescanned and the
S,G stream will be missing a correct oil list leading
to absolute mayhem in the network.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Setup:
------
R1( LHR) ---------R2( RP) ----------R3( FHR)
Problem:
-------
- Send IGMP/MLD join and traffic.
LHR: (S,G) mroute is created with reference count = 2
and set the flag SRC_STREAM.
(Code flow: pim_mroute_msg_wholepkt -> pim_upstream_add,
pim_upstream_sg_running_proc -> pim_upstream_ref)
- Send IGMP/MLD prune.
LHR: removes (*,G) entry and it tries to remove childen (S,G) entries.
But (S,G) is having reference count = 2. So after prune,
(S,G) entry reference count becomes 1 and will be present
until KAT expires.
Fix:
---
Don't set SRC_STREAM flag for LHR.
In LHR, (S,G) should be maintained, until (*,G) is present.
When prune receives delete (*,G) and children (S,G).
When traffic stops, delete (S,G) after KAT expires.
Issue: #13893
Signed-off-by: Sarita Patra <saritap@vmware.com>
Problem:
When ipv6 pim configuration is removed from the IIF on FHR node,
if we wait for RST timer to expire and then add the ipv6 pim configuration on the IIF again,
it is seen that pimreg is not getting added due to which null register wont be sent,
the register flag state also remains NO_INFO forever instead of RegPrune.
The reason for this is, when RST timer expires and IIF is unknown for the (S,G) upstream,
the FHR state is not reset due to which when the RP becomes reachable,
upstream state changes from NotJoined to Join but the register suppress timer could not be started
since we see there is no change in FHR state.
Fix:
When the Register Timer expires and the reg state is set to PIM_REG_NOINFO,reset the FHR flag,
so that when the RP becomes reachable can be because of config change or RP not reachable,
it is able to resume its duty.
Signed-off-by: Sai Gomathi N <nsaigomathi@vmware.com>
The vrf name should be separated when it is displayed. And remove
unnecessary space after number.
Before:
```
pim_upstream_sg_running: (100.100.1.100,232.100.100.100)x is not installed in mroute
pim_upstream_del(pim_ifchannel_delete): Delete (100.100.1.100,232.100.100.100)[x] ref count: 1 , flags: 1048585 c_oil ref count 2 (Pre decrement)
```
After:
```
pim_upstream_sg_running: (100.100.1.100,232.100.100.100)[x] is not installed in mroute
pim_upstream_del(pim_ifchannel_delete): Delete (100.100.1.100,232.100.100.100)[x] ref count: 1, flags: 1048585 c_oil ref count 2 (Pre decrement)
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When upstream RPF address is secondary address, and
neighborship is built with primary address,
then pim_neighbor_find() fails.
Verify the upstream RPF address is present in the
neighbor primary and secondary address list.
Signed-off-by: Sarita Patra <saritap@vmware.com>
... the prefix length wasn't ignored as expected. Both S and G are
always /32. But expecting "le 32" in prefix-list config is unexpected &
counterintuitive.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
If PIM had received a register packet with the Border Router
bit set, pimd would have crashed. Since I wrote this code
in 2015 and really have pretty much no memory of this and
no-one has ever reported this crash, let's just remove this
code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When calling pim_upstream_add, the lookup for upstream
or the creation of the upstream cannot fail. As such
up is never NULL.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fixed ANVL Conformance PIM-SM 16.3 test case.
When (S,G,rpt) prune is received, we were installing
the mroute immediately with none as OIF.
This leads to dropping the (S,G) traffic during prune
pending time as well.
Also we should not install the mroute if there is no
change in the rpf update.
These 2 things lead to the failure of the test case.
Fixed it by blocking the installation in this scenario.
When prune pending timer pops, it will take care of
installing the mroute with none as OIF.
Fixes: #11535
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
For pim callbacks, we pass pim_addr as value, not pointer.
So making it consistent for pim_nht callbacks.
Signed-off-by: sarita patra <saritap@vmware.com>
If the upstream is freed in pim_upstream_del, then trying to
call pim_upstream_timers_stop will lead to accessing freed memory.
Fix:
Stop the timer only if upstream is not deleted.
Co-authored-by: Sarita Patra <saritap@vmware.com>
Co-authored-by: Mobashshera Rasool <mrasool@vmware.com>
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem:
PIM assert is not triggered even after receiving WRONGVIF notification because of Could_assert flag not set.
CouldAssert(S,G,I) =
SPTbit(S,G)==TRUE
AND (RPF_interface(S) != I)
AND (I in ( ( joins(*,G) (-) prunes(S,G,rpt) )
(+) ( pim_include(*,G) (-) pim_exclude(S,G) )
(-) lost_assert(*,G)
(+) joins(S,G) (+) pim_include(S,G) ) )
Once SPTbit is set, Could_assert has to be reevaluated
Signed-off-by: plsaranya <Saranya_Panjarathina@dell.com>
The pim_upstream data structure has a ref count that is
not properly being incremented/decremented and when an
instance is removed this is causing crashes because timers
are still popping afterwords on data structures that are
now freed.
Since getting the ref count code is extremely hard and
we know that this crash is happening, add a bit of code
to prevent the timers from popping at all when we go
through and free up other data structures that we
are pointing at.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is just hitting the pim_mroute code with a hammer until it doesn't
print warnings anymore. This is NOT quite tested or working yet, it
just compiles.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Since `pim_sgaddr` is `pim_addr` now, that causes a whole lot of fallout
anywhere S,G pairs are handled.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Need a separate constant that is IPv6 when needed. Also assign the
whole struct rather than just s_addr.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>