When building FRR with `--enable-dev-build`. Add a bit of
code to include the pointer value as part of the output.
Helps with tracking down issues and let's us see more data
when using the dev build option.
New output:
2024/03/08 19:48:56 BGP: [V0J1J-W5RHA] 11.0.20.1/32(0x5759ddf8d7c0) for 11.0.20.1/32
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP Link-State prefixes are special prefixes that contains a lot of
data.
Extend the length of the prefix string buffer in order to display
properly this type of prefixes with the next commits.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add the ability to store link-state prefixes in the BGP table.
Store a raw copy of the BGP link state NLRI TLVs as received in the
packet in 'p.u.prefix_linkstate.ptr'.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
dest will not be freed due to lock but coverity does not know
that. Give it a hint. This change includes modifying bgp_dest_unlock_node
to return the dest pointer so that we can determine if we should
continue working on dest or not with an assert. Since this
is lock based we should be ok.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is based on @donaldsharp's work
The current code base is the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and now the memory of those nodes is allocated
as if they are a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of ‘holder’ node which does not have any bgp data
will be allocated as the route_node, not the bgp_node,
and the memory usage is reduced.
The memory usage of BGP node will be reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
Allowing printfrr extensions to directly write to the output buffer has
a few advantages:
- there is no arbitrary length limit imposed (previously 64)
- the output doesn't need to be copied another time
- the extension can directly use bprintfrr() to put together pieces
The downside is that the theoretical length (regardless of available
buffer space) must be computed correctly.
Extended unit tests to test these paths a bit more thoroughly.
Signed-off-by: David Lamparter <equinox@diac24.net>
The `struct listnode *rt_node` data structure is adding
8 bytes of size to the `struct bgp_dest`. This is a large
amount of data for a flag we are already setting on each
node for this. Just set the flag and use that to figure
out who we are doing graceful restart on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Announcements that are marked as invalid were previously not revalidated.
This was fixed by replacing the range lookup with a subtree lookup.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Tell the compiler that the prefix is being used for lookups
and it will never change.
Setup for future work.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
*After a restarting router comes up and the bgp session is
successfully established with the peer. If the restarting
router doesn’t have any route to send, it send EOR to
the peer immediately before receiving updates from its peers.
*Instead the restarting router should send EOR, if the
selection deferral timer is not running OR count of eor received
and eor required are matches then send EOR.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
The function bgp_table_range_lookup attempts to walk down
the table node data structures to find a list of matching
nodes. We need to guard against the current node from
not matching and not having anything in the child nodes.
Add a bit of code to guard against this.
Traceback that lead me down this path:
Nov 24 12:22:38 frr bgpd[20257]: Received signal 11 at 1574616158 (si_addr 0x2, PC 0x46cdc3); aborting...
Nov 24 12:22:38 frr bgpd[20257]: Backtrace for 11 stack frames:
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_backtrace_sigsafe+0x67) [0x7fd1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(zlog_signal+0x113) [0x7fd1ad445db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(+0x70e65) [0x7fd1ad465e65]ad445db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libpthread.so.0(+0xf5f0) [0x7fd1abd605f0]45db3]1ad445957]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x46cdc3]445957]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib64/frr/modules/bgpd_rpki.so(+0x4f0d) [0x7fd1a934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(thread_call+0x60) [0x7fd1ad4736e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libfrr.so.0(frr_run+0x128) [0x7fd1ad443ab8]e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd(main+0x2e3) [0x41c043]1ad443ab8]e0]934ff0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd1ab9a5505]f0d]57]
Nov 24 12:22:38 frr bgpd[20257]: /usr/lib/frr/bgpd() [0x41d9bb]main+0xf5) [0x7fd1ab9a5505]f0d]57]
Nov 24 12:22:38 frr bgpd[20257]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:351#012; aborting...
Nov 24 12:22:38 frr watchfrr[6779]: [EC 268435457] bgpd state -> down : read returned EOF
Nov 24 12:22:38 frr zebra[5952]: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
Nov 24 12:22:38 frr zebra[5952]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Nov 24 12:22:38 frr zebra[5952]: client 15 disconnected. 0 bgp routes removed from the rib
I am not really 100% sure what we are really trying to do with this function, but we must
guard against child nodes not having any data.
Fixes: #5440
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In case the topmost node has a larger prefix length than the lookup
prefix it never matches even if it was still lower than maxlen
This also alters a test case to check for this bug.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
The adj_out data structure is a linked list of adjacencies
1 per update group. In a large scale env where we are
not using peer groups, this list lookup starts to become
rather costly. Convert to a better data structure for this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The bgp_info data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by Mitchell Skiba <mskiba@amazon.com>
Wrapper the get/set of the table->info pointer so that
people are not directly accessing this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.
If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.
Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
lib/zebra.h has FILTER_X #define's. These do not belong there.
Put them in lib/filter.h where they belong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit 0490729cc033a3483fc6b0ed45085ee249cac779)
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-8122
per draft-ietf-idr-ix-bgp-route-server-09:
2.3.2.2.2. BGP ADD-PATH Approach
The [I-D.ietf-idr-add-paths] Internet draft proposes a different
approach to multiple path propagation, by allowing a BGP speaker to
forward multiple paths for the same prefix on a single BGP session.
As [RFC4271] specifies that a BGP listener must implement an implicit
withdraw when it receives an UPDATE message for a prefix which
already exists in its Adj-RIB-In, this approach requires explicit
support for the feature both on the route server and on its clients.
If the ADD-PATH capability is negotiated bidirectionally between the
route server and a route server client, and the route server client
propagates multiple paths for the same prefix to the route server,
then this could potentially cause the propagation of inactive,
invalid or suboptimal paths to the route server, thereby causing loss
of reachability to other route server clients. For this reason, ADD-
PATH implementations on a route server should enforce send-only mode
with the route server clients, which would result in negotiating
receive-only mode from the client to the route server.
This allows us to delete all of the following code:
- All XXXX_rsclient() functions
- peer->rib
- BGP_TABLE_MAIN and BGP_TABLE_RSCLIENT
- RMAP_IMPORT and RMAP_EXPORT
This patch implements the 'update-groups' functionality in BGP. This is a
function that can significantly improve BGP performance for Update generation
and resultant network convergence. BGP Updates are formed for "groups" of
peers and then replicated and sent out to each peer rather than being formed
for each peer. Thus major BGP operations related to outbound policy
application, adj-out maintenance and actual Update packet formation
are optimized.
BGP update-groups dynamically groups peers together based on configuration
as well as run-time criteria. Thus, it is more flexible than update-formation
based on peer-groups, which relies on operator configuration.
[Note that peer-group based update formation has been introduced into BGP by
Cumulus but is currently intended only for specific releases.]
From 11098af65b2b8f9535484703e7f40330a71cbae4 Mon Sep 17 00:00:00 2001
Subject: [PATCH] updgrp commits
Make the BGP table code a thin wrapper around the table implementation
in libzebra.
* bgpd/bgp_table.[ch]
- Use the ROUTE_NODE_FIELDS macro to embed the fields of a
route_node in the bgp_node structure.
- Add a route_table field to the bgp_table structure.
Initialize the route_table with a delegate, such that the nodes
in the table are bgp_node structures.
- Add inline wrappers that call route_table functions underneath,
and accept/return the correct BGP types.
* bgpd/bgp_route.c
Change some code to use inline wrappers instead of accessing
fields of nodes/tables directly. The latter does not always work
because the types of some fields need to be translated now.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Reduce indirection for values that doesn't change in the loop.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
[adjusted after dropping previous patch]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
If the radix tree creates an extra interior node in bgp_node_get(),
it locks the interior node even though this node is not returned to
the caller, so it may never be unlocked. The lock prevents this node
from being deleted.
* bgpd/bgp_table.c: (bgp_node_get) Remove lock on interior node which
prevents proper node deletion
Make one version of check prefix bit, and put it inline
with proper prototype. This gets rid of some macro's and also some
assert() that can never happen on a non-broken compiler.
* bgpd/bgp_table.c
* CHECK_BIT(): sayonara
* check_bit(): sayonara
* SET_LINK(): sayonara
* set_link(): make use of prefix_bit() instead of check_bit()
* bgp_node_match(): idem
* bgp_node_lookup(): idem
* bgp_node_get(): idem
* lib/prefix.h
* prefix_bit(): new inline version of check_bit()
* lib/table.c
* CHECK_BIT(): sayonara
* check_bit(): sayonara
* SET_LINK(): sayonara
* set_link(): make use of prefix_bit() instead of check_bit()
* route_node_match(): idem
* route_node_lookup(): idem
* route_node_get(): idem
* ospf6d/ospf6_lsdb.c
* CHECK_BIT(): sayonara
* ospf6_lsdb_lookup_next(): make use of prefix_bit() instead of
CHECK_BIT()
* ospf6_lsdb_type_router_head(): idem
* ospf6_lsdb_type_head(): idem
* ospf6d/ospf6_route.c
* CHECK_BIT(): sayonara
* ospf6_route_match_head() make use of prefix_bit() instead of
* CHECK_BIT()
I've spent the last several weeks working on stability fixes to bgpd.
These patches fix all of the numerous crashes, assertion failures, memory
leaks and memory stomping I could find. Valgrind was used extensively.
Added new function bgp_exit() to help catch problems. If "debug bgp" is
configured and bgpd exits with status of 0, statistics on remaining
lib/memory.c allocations are printed to stderr. It is my hope that other
developers will use this to stay on top of memory issues.
Example questionable exit:
bgpd: memstats: Current memory utilization in module LIB:
bgpd: memstats: Link List : 6
bgpd: memstats: Link Node : 5
bgpd: memstats: Hash : 8
bgpd: memstats: Hash Bucket : 2
bgpd: memstats: Hash Index : 8
bgpd: memstats: Work queue : 3
bgpd: memstats: Work queue item : 2
bgpd: memstats: Work queue name string : 3
bgpd: memstats: Current memory utilization in module BGP:
bgpd: memstats: BGP instance : 1
bgpd: memstats: BGP peer : 1
bgpd: memstats: BGP peer hostname : 1
bgpd: memstats: BGP attribute : 1
bgpd: memstats: BGP extra attributes : 1
bgpd: memstats: BGP aspath : 1
bgpd: memstats: BGP aspath str : 1
bgpd: memstats: BGP table : 24
bgpd: memstats: BGP node : 1
bgpd: memstats: BGP route : 1
bgpd: memstats: BGP synchronise : 8
bgpd: memstats: BGP Process queue : 1
bgpd: memstats: BGP node clear queue : 1
bgpd: memstats: NOTE: If configuration exists, utilization may be expected.
Example clean exit:
bgpd: memstats: No remaining tracked memory utilization.
This patch fixes bug #397: "Invalid free in bgp_announce_check()".
This patch fixes bug #492: "SIGBUS in bgpd/bgp_route.c:
bgp_clear_route_node()".
My apologies for not separating out these changes into individual patches.
The complexity of doing so boggled what is left of my brain. I hope this
is all still useful to the community.
This code has been production tested, in non-route-server-client mode, on
a linux 32-bit box and a 64-bit box.
Release/reset functions, used by bgp_exit(), added to:
bgpd/bgp_attr.c,h
bgpd/bgp_community.c,h
bgpd/bgp_dump.c,h
bgpd/bgp_ecommunity.c,h
bgpd/bgp_filter.c,h
bgpd/bgp_nexthop.c,h
bgpd/bgp_route.c,h
lib/routemap.c,h
File by file analysis:
* bgpd/bgp_aspath.c: Prevent re-use of ashash after it is released.
* bgpd/bgp_attr.c: #if removed uncalled cluster_dup().
* bgpd/bgp_clist.c,h: Allow community_list_terminate() to be called from
bgp_exit().
* bgpd/bgp_filter.c: Fix aslist->name use without allocation check, and
also fix memory leak.
* bgpd/bgp_main.c: Created bgp_exit() exit routine. This function frees
allocations made as part of bgpd initialization and, to some extent,
configuration. If "debug bgp" is configured, memory stats are printed
as described above.
* bgpd/bgp_nexthop.c: zclient_new() already allocates stream for
ibuf/obuf, so bgp_scan_init() shouldn't do it too. Also, made it so
zlookup is global so bgp_exit() can use it.
* bgpd/bgp_packet.c: bgp_capability_msg_parse() call to bgp_clear_route()
adjusted to use new BGP_CLEAR_ROUTE_NORMAL flag.
* bgpd/bgp_route.h: Correct reference counter "lock" to be signed.
bgp_clear_route() now accepts a bgp_clear_route_type of either
BGP_CLEAR_ROUTE_NORMAL or BGP_CLEAR_ROUTE_MY_RSCLIENT.
* bgpd/bgp_route.c:
- bgp_process_rsclient(): attr was being zero'ed and then
bgp_attr_extra_free() was being called with it, even though it was
never filled with valid data.
- bgp_process_rsclient(): Make sure rsclient->group is not NULL before
use.
- bgp_processq_del(): Add call to bgp_table_unlock().
- bgp_process(): Add call to bgp_table_lock().
- bgp_update_rsclient(): memset clearing of new_attr not needed since
declarationw with "= { 0 }" does it. memset was already commented
out.
- bgp_update_rsclient(): Fix screwed up misleading indentation.
- bgp_withdraw_rsclient(): Fix screwed up misleading indentation.
- bgp_clear_route_node(): Support BGP_CLEAR_ROUTE_MY_RSCLIENT.
- bgp_clear_node_queue_del(): Add call to bgp_table_unlock() and also
free struct bgp_clear_node_queue used for work item.
- bgp_clear_node_complete(): Do peer_unlock() after BGP_EVENT_ADD() in
case peer is released by peer_unlock() call.
- bgp_clear_route_table(): Support BGP_CLEAR_ROUTE_MY_RSCLIENT. Use
struct bgp_clear_node_queue to supply data to worker. Add call to
bgp_table_lock().
- bgp_clear_route(): Add support for BGP_CLEAR_ROUTE_NORMAL or
BGP_CLEAR_ROUTE_MY_RSCLIENT.
- bgp_clear_route_all(): Use BGP_CLEAR_ROUTE_NORMAL.
Bug 397 fixes:
- bgp_default_originate()
- bgp_announce_table()
* bgpd/bgp_table.h:
- struct bgp_table: Added reference count. Changed type of owner to be
"struct peer *" rather than "void *".
- struct bgp_node: Correct reference counter "lock" to be signed.
* bgpd/bgp_table.c:
- Added bgp_table reference counting.
- bgp_table_free(): Fixed cleanup code. Call peer_unlock() on owner if
set.
- bgp_unlock_node(): Added assertion.
- bgp_node_get(): Added call to bgp_lock_node() to code path that it was
missing from.
* bgpd/bgp_vty.c:
- peer_rsclient_set_vty(): Call peer_lock() as part of peer assignment
to owner. Handle failure gracefully.
- peer_rsclient_unset_vty(): Add call to bgp_clear_route() with
BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
* bgpd/bgp_zebra.c: Made it so zclient is global so bgp_exit() can use it.
* bgpd/bgpd.c:
- peer_lock(): Allow to be called when status is "Deleted".
- peer_deactivate(): Supply BGP_CLEAR_ROUTE_NORMAL purpose to
bgp_clear_route() call.
- peer_delete(): Common variable listnode pn. Fix bug in which rsclient
was only dealt with if not part of a peer group. Call
bgp_clear_route() for rsclient, if appropriate, and do so with
BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
- peer_group_get(): Use XSTRDUP() instead of strdup() for conf->host.
- peer_group_bind(): Call bgp_clear_route() for rsclient, and do so with
BGP_CLEAR_ROUTE_MY_RSCLIENT purpose.
- bgp_create(): Use XSTRDUP() instead of strdup() for peer_self->host.
- bgp_delete(): Delete peers before groups, rather than after. And then
rather than deleting rsclients, verify that there are none at this
point.
- bgp_unlock(): Add assertion.
- bgp_free(): Call bgp_table_finish() rather than doing XFREE() itself.
* lib/command.c,h: Compiler warning fixes. Add cmd_terminate(). Fixed
massive leak in install_element() in which cmd_make_descvec() was being
called more than once for the same cmd->strvec/string/doc.
* lib/log.c: Make closezlog() check fp before calling fclose().
* lib/memory.c: Catch when alloc count goes negative by using signed
counts. Correct #endif comment. Add log_memstats_stderr().
* lib/memory.h: Add log_memstats_stderr().
* lib/thread.c: thread->funcname was being accessed in thread_call() after
it had been freed. Rearranged things so that thread_call() frees
funcname. Also made it so thread_master_free() cleans up cpu_record.
* lib/vty.c,h: Use global command_cr. Add vty_terminate().
* lib/zclient.c,h: Re-enable zclient_free().