Before the fix:
2019/11/14 19:52:21 BGP: peer 192.168.2.5 deleted from subgroup s4peer cnt 0
Note the missing space between the subgroup id "s4" and "peer cnt 0".
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
The previous error was misleading and made it seem like Null0,
reject, or blackhole nexthops on static routes are invalid.
This commit makes it clearer why the error is seen.
Signed-off-by: Trey Aspelund <taspelund@cumulusnetworks.com>
With this code change, we can now filter evpn routes based on RD using the
match statement: "match evpn rd XX"
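For example (hypothetical values), a route-map entry such as `match evpn rd 65000:100` under `route-map RM-EVPN-RD permit 10`, applied to a peer with `neighbor <peer> route-map RM-EVPN-RD in`, would only accept EVPN routes carrying that Route Distinguisher.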
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Recently a lot of issues have been seen with OSPF adjacency establishment;
sessions were torn down because of DD sequence number mismatches.
Add debugs to capture the master and slave generated sequence numbers.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Validate that the FEC prefix length is within the allowed limit
(depending on the FEC address family) in order to prevent possible
buffer overflows.
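A minimal sketch of the kind of bound check being described, using illustrative names rather than the daemon's actual symbols:
```
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Reject a FEC prefix length that exceeds what its address family allows,
 * before the prefix bytes are copied into a fixed-size buffer. */
static bool fec_prefixlen_valid(int af, uint8_t prefixlen)
{
	switch (af) {
	case AF_INET:
		return prefixlen <= 32;   /* IPv4 FEC */
	case AF_INET6:
		return prefixlen <= 128;  /* IPv6 FEC */
	default:
		return false;             /* unknown family: reject */
	}
}
```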
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
A problem was reported with tracebacks seen when making multiple bfd timer
changes in frr.conf and applying them via frr-reload.py. It was found that when
multiple bfd timer changes are made, the same line can be added for
deletion more than once, causing the traceback when the deletion is
performed. This fix verifies that the correct line is being appended for
deletion.
Ticket: CM-27233
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
The bug:
As part of displaying advertised routes to a peer, in the outer loop, we
iterate through all prefixes in the evpn table. In the inner loop,
we iterate through adj_out of each prefix.
If a prefix that is present in the evpn table has not been advertised to the peer,
its corresponding attr is NULL. Checking for this condition is the fix.
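A stand-alone sketch of the shape of that check, using stand-in types rather than the actual bgpd structures:
```
#include <stddef.h>
#include <stdio.h>

struct adj_out {
	struct adj_out *next;
	const char *attr;   /* NULL when the prefix was not advertised to the peer */
};

/* Inner loop over a prefix's adj_out entries: skip entries with no
 * attribute instead of dereferencing them. */
static void show_advertised(const struct adj_out *adj_list)
{
	for (const struct adj_out *adj = adj_list; adj; adj = adj->next) {
		if (adj->attr == NULL)
			continue;   /* the missing check behind the bug */
		printf("advertised with attr: %s\n", adj->attr);
	}
}
```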
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
This code is called from the zebra main pthread during shutdown
but the thread event is scheduled via the zebra dplane pthread.
Hence, we should be using the `thread_cancel_async()` API to
cancel the thread event on a different pthread.
This is only ever hit in the rare case that we still have work left
to do on the update queue during shutdown.
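A minimal stand-alone model of the constraint involved, using stand-in types and functions (this is not the lib/thread.c implementation): a direct cancel must run on the pthread that owns the event loop, while an async cancel only records a request for the owning pthread to act on.
```
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct event_loop {
	pthread_t owner;          /* pthread that runs this loop */
	bool cancel_requested;    /* handled later by the owner */
};

/* Mirrors the invariant seen in the assertion in the backtrace below:
 * only the owning pthread may cancel directly. */
static void cancel_direct(struct event_loop *loop)
{
	assert(pthread_equal(loop->owner, pthread_self()));
	loop->cancel_requested = true;
}

/* Safe to call from any pthread: just flag the request and let the owner
 * act on it. (Real code would need proper synchronization for this
 * hand-off; the point is that nothing here requires running on the
 * owning pthread.) */
static void cancel_async(struct event_loop *loop)
{
	loop->cancel_requested = true;
}
```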
Found via zebra crash:
```
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79
#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()",
    file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92
#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c",
    line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101
#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274
#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202
#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599
#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024
#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483
(gdb) down
#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
1185 assert(master->owner == pthread_self());
(gdb)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When receiving igmp packets we are emitting a lot of debugs.
Clean this up so that we can understand what is going on
a bit better just by looking at the log file.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When you turn on `debug igmp trace` we see a bunch of debugs
associated with pim processing. This is because we were using
PIM_DEBUG_TRACE, which covers both `debug igmp trace` and `debug pim trace`.
When tracing igmp code it would be nice to only see igmp work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are seeing situations where PIM sends an IGMPv3 query
and immediately receives the query back from itself via the
pim kernel upcall on that interface:
from `show int brief`:
swp7 up default 192.168.202.1/24
We are also receiving these debugs:
2019-11-11T20:52:40.452307+00:00 leaf02 pimd[1592]: Send IGMPv3 query to 224.4.0.8 on swp7 for group 224.4.0.8, sources=0 msg_size=12 s_flag=0 QRV=2 QQI=125 QQIC=7d
2019-11-11T20:52:40.452430+00:00 leaf02 pimd[1592]: pim_mroute_msg(default): igmp kernel upcall on swp7(0x55eaa7dc7dc0) for 192.168.202.1 -> 224.4.11.123
2019-11-11T20:52:40.452574+00:00 leaf02 pimd[1592]: Recv IP packet from 192.168.202.1 to 224.4.11.123 on swp7: size=40 ip_header_size=24 ip_proto=2
2019-11-11T20:52:40.452699+00:00 leaf02 pimd[1592]: Recv IGMP packet from 192.168.202.1 to 224.4.11.123 on swp7: ttl=1 msg_type=17 msg_size=16
2019-11-11T20:52:40.452824+00:00 leaf02 pimd[1592]: Recv IGMP query v3 from 192.168.202.1 on swp7 for group 224.4.11.123
This query is causing us to do some weird gyrations around the IGMP state machine for handling
queries. Let's just prevent it from happening.
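One plausible shape of such a guard (illustrative only, not necessarily the exact pimd change): ignore queries whose source address is the receiving interface's own primary address.
```
#include <stdbool.h>
#include <netinet/in.h>

/* Drop IGMP queries that are just our own transmissions looped back
 * through the kernel upcall on the same interface. */
static bool igmp_query_is_from_self(struct in_addr src, struct in_addr ifaddr)
{
	return src.s_addr == ifaddr.s_addr;
}
```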
Ticket: CM-27247
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were only checking that two nhg_hash_entry's were equal
based on the NUMBER of active nexthops. This is not sufficient in
special cases where what is active with one route using the group
might not be active with the other. We can see this with
routes trying to resolve to themselves.
Ex)
1.1.1.0/24
-> 1.1.1.1 dummy1 (inactive)
-> 1.1.1.2 dummy2
1.1.2.0/24
-> 1.1.1.1 dummy1
-> 1.1.1.2 dummy1 (inactive)
Without checking each nexthop individually, they will
hash to the same group since they have the same number of
active nexthops.
Fix this by looping over every nexthop for each nhe (they should
be sorted) and checking whether the NEXTHOP_FLAG_ACTIVE flags match.
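A stand-alone sketch of that comparison, using stand-in types rather than zebra's nexthop structures: walk both (sorted) nexthop lists in lockstep and require the ACTIVE flag to match on every nexthop, not just the overall count.
```
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NH_FLAG_ACTIVE 0x01

struct nh {
	struct nh *next;
	uint8_t flags;
};

static bool nh_groups_active_match(const struct nh *a, const struct nh *b)
{
	for (; a && b; a = a->next, b = b->next) {
		if ((a->flags & NH_FLAG_ACTIVE) != (b->flags & NH_FLAG_ACTIVE))
			return false;
	}
	return a == NULL && b == NULL;   /* also require equal length */
}
```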
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We cannot clear the NEXTHOP_FLAG_FIB nexthop flag
when sending routes to the dataplane anymore since
nexthops are now shared.
We were seeing a situation where if we delete a route
using a nexthop group that is still active with another
route, the fib flag was being unset by this code
path despite them still being valid fib nexthops with the
other route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were crashing due to a missed label change code path
in mpls_ftn_uninstall() with the zebra_nhg hashing code.
Add a static handler function for label changing everywhere
in that code and use it in mpls_ftn_uninstall().
The crash was found in the ISIS-SR tests:
==23== Thread 1:
==23== Invalid read of size 4
==23== at 0x15B20E: zebra_nhg_hash_equal (zebra_nhg.c:365)
==23== by 0x489A2FD: hash_get (hash.c:143)
==23== by 0x489A4BC: hash_lookup (hash.c:183)
==23== by 0x15B5A3: zebra_nhg_find (zebra_nhg.c:494)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x1573E8: mpls_ftn_update (zebra_mpls.c:2661)
==23== by 0x1A2554: zread_mpls_labels_replace (zapi_msg.c:1890)
==23== by 0x1A41CD: zserv_handle_commands (zapi_msg.c:2613)
==23== by 0x199B17: zserv_process_messages (zserv.c:517)
==23== by 0x48EE6B7: thread_call (thread.c:1549)
==23== by 0x48A8AD5: frr_run (libfrr.c:1064)
==23== by 0x1391B7: main (main.c:468)
==23== Address 0x5839330 is 0 bytes inside a block of size 80 free'd
==23== at 0x48369AB: free (vg_replace_malloc.c:530)
==23== by 0x48AEE6C: qfree (memory.c:129)
==23== by 0x15C5F8: zebra_nhg_free (zebra_nhg.c:1095)
==23== by 0x15BC8C: zebra_nhg_handle_uninstall (zebra_nhg.c:734)
==23== by 0x15DCFA: zebra_nhg_uninstall_kernel (zebra_nhg.c:1826)
==23== by 0x15C666: zebra_nhg_decrement_ref (zebra_nhg.c:1106)
==23== by 0x15D9D7: zebra_nhg_re_update_ref (zebra_nhg.c:1711)
==23== by 0x15D8B1: nexthop_active_update (zebra_nhg.c:1660)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
==23== Block was alloc'd at
==23== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23== by 0x48AED56: qcalloc (memory.c:110)
==23== by 0x15B07B: zebra_nhg_copy (zebra_nhg.c:307)
==23== by 0x15B13E: zebra_nhg_hash_alloc (zebra_nhg.c:329)
==23== by 0x489A339: hash_get (hash.c:148)
==23== by 0x15B6CA: zebra_nhg_find (zebra_nhg.c:532)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x15D89A: nexthop_active_update (zebra_nhg.c:1658)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Use a local variable to avoid trying to take the address
of a packed struct member - an address from the ip header
in these cases.
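A self-contained illustration of the pattern, with a stand-in struct rather than the actual ip header definition: copy the member into an aligned local and pass that local's address instead.
```
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

struct __attribute__((packed)) hdr {
	uint8_t flags;
	struct in_addr src;   /* may be misaligned inside the packed struct */
};

static void print_addr(const struct in_addr *a)
{
	char buf[INET_ADDRSTRLEN];

	printf("%s\n", inet_ntop(AF_INET, a, buf, sizeof(buf)));
}

int main(void)
{
	struct hdr h;
	struct in_addr src;

	memset(&h, 0, sizeof(h));
	/* print_addr(&h.src) would take the address of a packed member */
	src = h.src;          /* aligned local copy */
	print_addr(&src);
	return 0;
}
```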
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There is a memory leak of the bgp node (route node)
in the vni table while processing remote evpn route(s).
During remote evpn route processing, a new route
is created in the per-vni route table and the refcount for
the route node is incremented twice: first during
node creation and a second time when the route is added
via bgp_info_add.
After the evpn route is created, the bgp node refcount needs
to be decremented.
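A minimal stand-alone model of the imbalance, with illustrative names rather than the bgpd API: the lookup/create step and the attach-route step each take a reference, so the creator must drop its own reference once the route is linked in.
```
#include <assert.h>

struct node { int refcnt; };

static void node_lock(struct node *n)   { n->refcnt++; }
static void node_unlock(struct node *n) { assert(n->refcnt > 0); n->refcnt--; }

int main(void)
{
	struct node vni_rn = { 0 };

	node_lock(&vni_rn);     /* reference taken when the node is created */
	node_lock(&vni_rn);     /* reference taken when the route is added */
	node_unlock(&vni_rn);   /* the fix: drop the creator's reference */

	/* Exactly one reference remains, owned by the attached route; without
	 * the unlock above the node could never be freed. */
	assert(vni_rn.refcnt == 1);
	return 0;
}
```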
Ticket:CM-26898,CM-26838,CM-27169
Reviewed By:CCR-9474
Testing Done:
In EVPN topology send 1K MAC routes then check the memory footprint
at the remote VTEP before sending 1K type-2 routes
and after flushing/withdrawal of the routes.
Before fix:
-----------
Initial memory footprint:
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
With 1K MAC (type-2 routes)
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 6008 32
BGP node : 4182 152
BGP route : 2096 112
After cleaning up 1K MAC entries from source VTEP which triggers BGP withdraw
at the remote VTEP.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 4008 32
BGP node : 2182 152 <-- Here 2K delta from initial count.
BGP route : 96 112
With fix:
---------
After the 1K MAC entries are cleaned up at the remote VTEP, the memory footprint
(BGP node and Hash Bucket counts) returns to the level seen at the start of the test.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
PR 5118 introduced additional (prepended) keywords
like 'ip' to the Route Distinguisher output, which
breaks parsing of existing evpn route show commands.
Restore the original behavior.
Testing Done:
vtysh -c 'show bgp l2vpn evpn route'
Before fix:
Route Distinguisher: ip 27.0.0.15:44
Post fix:
Route Distinguisher: 27.0.0.15:44
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The opaque lsa that we are storing is stored on various
lists depending on its type. The removal from the
list was being done *after* the pointer was freed. This
is not a good idea. Since the use after free was just
removal from a linked list, and the freeing function does
not do anything other than free data, simply switching the
order of the two calls should be sufficient.
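A generic sketch of the ordering involved, with stand-in types rather than the ospfd opaque-LSA structures: unlink the element from its list first, then free it.
```
#include <stdlib.h>

struct node {
	struct node *next;
};

/* Correct order: 1) remove from the list, 2) free the memory.
 * Freeing first would make the unlink a use-after-free, which is
 * exactly the bug being fixed. Assumes heap-allocated elements. */
static void list_remove_and_free(struct node **head, struct node *victim)
{
	for (struct node **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}
```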
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>