Currently, it is possible to rename the default VRF either by passing
`-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things are getting worse when we need to handle this in northbound
layer as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
New `FRR_NO_SPLIT_CONFIG` flag for newly added daemons where we're just
rolling without split config and always expect configs to be loaded via
vtysh/integrated config.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
frrmod_load() attempts to dlopen() several possible paths
(constructed from its basename argument) until one succeeds.
Each dlopen() attempt may fail for a different reason, and
the important one might not be the last one. Example:
dlopen(a/foo): file not found
dlopen(b/foo): symbol "bar" missing
dlopen(c/foo): file not found
Previous code reported only the most recent error. Now frrmod_load()
describes each dlopen() failure.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
This replaces the external libsystemd dependency with... pretty much the
same amount of built-in code. But with one fewer dependency and build
switch needed.
Also check `JOURNAL_STREAM` for future logging integration.
Signed-off-by: David Lamparter <equinox@diac24.net>
Compile with v2.0.0 tag of `libyang2` branch of:
https://github.com/CESNET/libyang
staticd init load time of 10k routes now 6s vs ly1 time of 150s
Signed-off-by: Christian Hopps <chopps@labn.net>
Creating any threads before we fork() into the background (if `-d` is
given) is an extremely dangerous footgun; the threads are created in
the parent and terminated when that exits.
This is extra dangerous because while testing, you'd often run the
daemon in foreground without `-d`, and everything works as expected.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... for any initialization that needs to run after forking, but that
would be racy if it were just scheduled on the thread_master (since the
config load is also just a thread callback, ordering would be undefined
for another scheduled thread callback.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
very_late_init doesn't really say what this does, config_post is much
more descriptive. (A config_pre is coming in a jiffy.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... by referencing all autogenerated headers relative to the root
directory. (90% of the changes here is `version.h`.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... in case the user does something like `zebra 3>logfile`. Also useful
for some module purposes, maybe even feeding config at some point in the
future.
Signed-off-by: David Lamparter <equinox@diac24.net>
Specify default via --with-scriptdir at compile time, override default
with --scriptdir at runtime. If unspecified, it's {sysconfdir}/scripts
(usually /etc/frr/scripts)
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Add a startup-time option to limit the number of fds used
by the thread/event infrastructure. If nothing is configured,
the system ulimit is used.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
From Sysrepo's documentation:
"Note: do not use fork() after creating a connection. Sysrepo
internally stores PID of every created connection and this way a
mismatch of PID and connection is created".
Introduce a new "frr_very_late_init" hook in libfrr that is only
called after the daemon is forked (when the '-d' option is used)
and after the configuration is read. This way we can initialize
the sysrepo plugin correctly even when the daemon is daemonized,
and after the Sysrepo CLI commands are processed (only "debug
northbound client sysrepo" for now).
Fixes#7062
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In case of config rollback is enabled,
record northbound transaction based on a control flag.
The actual frr daemons would set the flag to true via
nb_init from frr_init.
This will allow test daemon to bypass recording
transacation to db.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
This adds -N and --netns options to watchfrr, allowing it to start
daemons with -N and switching network namespaces respectively.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... it contains our pid, so doing it before fork leads to littering
buffers since we try to clean up with the forked pid...
Fixes: #6541
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Instead of returning only error codes (e.g. NB_ERR_VALIDATION)
to the northbound clients, do better than that and also return
a human-readable error message. This should make FRR more
automation-friendly since operators won't need to dig into system
logs to find out what went wrong in the case of an error.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The new northbound context structure contains information about
the client performing a configuration transaction. This information
will be made available to all configuration callbacks through the
args->context parameter.
The usefulness of this structure comes from the fact that it can be
used as a communication channel (both input and output) between the
northbound callbacks and the northbound clients. This can be done
through its "client_data" field which contains client-specific data.
This should cover some very specific scenarios where a northbound
callback should perform an action only if the configuration change
is coming from a given client. An example would be sending a PCEP
response to a PCE when an SR-TE policy is created or modified
through the PCEP northbound client (for that to happen, the
northbound callbacks need to have access to the PCEP request ID,
which needs to be available).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Our two northbound tools don't have embedded YANG modules like the
other FRR binaries. As such, ly_ctx_set_module_imp_clb() shouldn't be
called when the YANG subsystem it being initialized by a northbound
tool. To make that possible, add a new "embedded_modules" parameter
to the yang_init() function to control whether libyang should look
for embedded modules or not.
With this fix, "gen_northbound_callbacks" and "gen_yang_deviations"
won't emit "YANG model X not embedded, trying external file"
warnings anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
Since we've been writing out "frr version" and "frr defaults" for about
a year and a half now, we can now actually use them to manage defaults.
Signed-off-by: David Lamparter <equinox@diac24.net>
Load the startup configuration directly into the CLI shared candidate
configuration instead of loading it into a private candidate
configuration. This way we don't need to initialize the shared
candidate separately later as a copy of the running configuration,
which is a potentially expensive operation.
Also, make the northbound process SIGHUP correctly even when --tcli
is not used.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Adding a lock to protect the global running configuration doesn't
help much since the FRR daemons are not prepared to process
configuration changes in a pthread that is not the main one (a
whole lot of new protections would be necessary to prevent race
conditions).
This means the lock added by commit 83981138 only adds more
complexity for no benefit. Remove it now to simplify the code.
All northbound clients, including the gRPC one, should either run
in the main pthread or use synchronization primitives to process
configuration transactions in the main pthread.
This reverts commit 83981138fe.
Debian packaging when run finds a bunch of spelling errors:
I: frr: spelling-error-in-binary usr/bin/vtysh occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bfdd Amount of times Number of times
I: frr: spelling-error-in-binary usr/lib/frr/bgpd occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bgpd recieved received
I: frr: spelling-error-in-binary usr/lib/frr/isisd betweeen between
I: frr: spelling-error-in-binary usr/lib/frr/ospf6d Infomation Information
I: frr: spelling-error-in-binary usr/lib/frr/ospfd missmatch mismatch
I: frr: spelling-error-in-binary usr/lib/frr/pimd bootsrap bootstrap
I: frr: spelling-error-in-binary usr/lib/frr/pimd Unknwon Unknown
I: frr: spelling-error-in-binary usr/lib/frr/zebra Requsted Requested
I: frr: spelling-error-in-binary usr/lib/frr/zebra uknown unknown
I: frr: spelling-error-in-binary usr/lib/x86_64-linux-gnu/frr/libfrr.so.0.0.0 overriden overridden
This commit fixes all of them except the bgp `recieved` issue due to
it being part of json output. That one will need to go through
a deprecation cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add vtysh commands to add/del/clear/show filters across
all daemons and independently on each one. Add automake and
clippy boilerplate for those commands as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When the user specifies -N namespace allow it to influence the
frr_vtydir(DAEMON_VTY_DIR) to have namespace in it's path
like so: $frrstate_dir/<namespace>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When using -z, allow that to override the zapi domain socket
path. If using -N add the namespace name to the path to
$frr_statedir/<namespace>/zserv.api. If you don't specify
the -N or -z option then it is $frr_statedir/zserv.api
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add 'no log commands' cli and at the same time add a
--command-log-always to the daemon startup cli.
If --command-log-always is specified then all commands are
auto-logged and the 'no log commands' form of the command
is now ignored.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The vtypath_default variable had a possibility of being overwritten
due to size constraints. This fixes this issue.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Separate out the debug_init api to have 2 functions:
1) Function to register a callback
2) Function to initiate the cli.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The upcoming gRPC-based northbound plugin will run on a separate
pthread, and it will need to have access to the running configuration
global variable. Introduce a rw-lock to control concurrent access
to the running configuration. Add the lock inside the "nb_config"
structure so that it can be used to protect candidate configurations
as well (this might be necessary depending on the threading scheme
of future northbound plugins).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move call to nb_db_init() from nb_init() to frr_init() so that only
the FRR daemons will initialize the northbound database. This should
fix a few warnings when running some unit tests.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Confirmed commits allow the user to request an automatic rollback to
the previous configuration if the commit operation is not confirmed
within a number of minutes. This is particularly useful when the user
is accessing the CLI through the network (e.g. using SSH) and any
configuration change might cause an unexpected loss of connectivity
between the user and the managed device (e.g. misconfiguration of a
routing protocol). By using a confirmed commit, the user can rest
assured the connectivity will be restored after the given timeout
expires, avoiding the need to access the router physically to fix
the problem.
When "commit confirmed TIMEOUT" is used, a new "commit" command is
expected to confirm the previous commit before the given timeout
expires. If "commit confirmed TIMEOUT" is used while there's already
a confirmed-commit in progress, the confirmed-commit timeout is
reset to the new value.
In the current implementation, if other users perform commits while
there's a confirmed-commit in progress, all commits are rolled back
when the confirmed-commit timeout expires. It's recommended to use
the "configure exclusive" configuration mode to prevent unexpected
outcomes when using confirmed commits.
When an user exits from the configuration mode while there's a
confirmed-commit in progress, the commit is automatically rolled
back and the user is notified about it. In the future we might
want to prompt the user if he or she really wants to exit from the
configuration mode when there's a pending confirmed commit.
Needless to say, confirmed commit only work for configuration
commands converted to the new northbound model. vtysh support will
be implemented at a later time.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This cleans up watchfrr to be more "normal" like the other daemons in
terms of what it does in main(), i.e. using the full frr_*() call set.
Also, this changes the startup behaviour on watchfrr to stay attached on
the daemon's parent process until startup is really complete. This
should allow removing the "watchfrr.started" hack at some point.
Signed-off-by: David Lamparter <equinox@diac24.net>
This makes libfrr.so executable to print its version info. This is
useful if you need to check your libfrr.so matches your daemons.
Signed-off-by: David Lamparter <equinox@diac24.net>
Solution :
The following procedures would be performed :
1. Verify if the pid file for each daemon is present or not. If the file is not present, that means the
daemon is getting instantiated for the first time. So let it go ahead.
If the file is present proceed to point ‘2’.
2. Try fetching the properties of the pid file.
3. If it has RW lock, that means one instance of this the daemon is already running.
So stop moving ahead and do exit() else let it go ahead. Please note all above procedure happen at
the initial state of daemon’s instantiation, much before it starts any session with other
process/allocates resources etc.. and this verification do not have any impact of any
operations done later, if the verification succeeds.
Signed-off-by: bisdhdh sadhub@vmware.com
Add code to auto-create the ferr infrastructure as well as add
some initial error handling for vrf.c
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we read in a backup file, we should save the original
host.config so that we can put it back to the correct original
location after we read in the backup config.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the ability to specify the designated log level at startup.
--log-level <emergencies|alerts|critical|errors|warnings|notifications|informational|debugging>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Remove the special case code to use syslog for Cumulus.
They can specify this via startup now instead of having
a special compile flag for this option.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are starting a daemon, allow the user to specify:
--log <stdout|syslog|file:file_name>
This can be used on early startup to put the log files
where the end user wants them to show up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The read in of cli was happening prior to thread
event handling for non-integrated configs. This
is interesting for 2 reasons:
1) Read-in of integrated configs was after thread
event loop startup, so we had a difference of behavior
2) Read-in can cause a series of events that cause
us to attempt to communicate with zebra. The zebra
zapi connection only happens after the thread event
loop has been started. This can cause data that
is being written down to zebra to be lost and
no real way to notice that this has happened and
to recover gracefully.
Modify the code to create a thread event for read
in of client config.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we fail to read in the config file and we have
specified a backup of the backup, attempt to
read that information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
With a new version of clang 6.0, the compiler is detecting more
issues where we may be possibly be truncating the output string.
Fix by increasing the size of the output string to make the compiler
happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This allows running the daemons inside of Linux network namespaces
without messing with an additional mount/fs namespace (or a ton of
options).
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
if we're using --terminal, the daemon may in some cases exit fast enough
for the parent to see this; this resulted in a confusing/bogus "failed
to start, exited 0" message.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
adds a new all-daemon "debug memstats-at-exit" command. Also saves
memstats to a file in /tmp, useful if a long-running daemon is having
weird issues (e.g. in a user install).
Fixes: #437
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Start creating a counterpart to frr_init and frr_late_init.
Unfortunately, some daemons don't do any exit handling, this doesn't
change that just yet.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
c9c8d0d ("lib: close stdin/out/err in non-terminal case") overshot its
goal and closes stdin/stdout/stderr even when a daemon is running in
foreground. That means stdout logging & exit memory reporting are both
broken.
Reported-by: Lou Berger <lberger@labn.net>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
watchfrr doesn't know if there will be -u/-g options on the individual
daemons, so it doesn't know what the appropriate ownership is.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Disable this in the code to make it hard for people to shoot themselves
in the foot. It's only left as a remnant for development use.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This adds "@tcp" as new choice on the -z option present in zebra and the
protocol daemons. The --enable-tcp-zebra option on configure is no
longer needed, both UNIX and TCP socket support is always available.
Note that @tcp should not be used by default (e.g. in an init script),
and --enable-tcp-zebra should never have been in any distro package
builds, because
**** TCP-ZEBRA IS A SECURITY PROBLEM ****
It allows arbitrary local users to mess with the routing table and
inject bogus data -- and also ZAPI is not designed to be robust against
attacks.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Oops, forgot this path... in the --terminal case, stdio is closed when
the user ends the terminal session, but without terminal it was left
open.
(This caused a ssh session hang in the CentOS6 CI because the file
descriptors were still open, so ssh would keep the session alive...)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
zlog_* doesn't work in startup before we've loaded the real logging
configuration. Add some code to log to stderr for that window of time.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
If the paths for pid or vty don't exist, try creating them. Failure is
ignored (on EEXIST) or prints a non-fatal warning (other errors).
Fixes: #507
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This splits off privs_preinit(), which does the lookups for user and
group IDs. This is so the init code can create state directories while
still running as root.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
- SIGTSTP appropriately suspends the foreground terminal
- SIGINT causes the daemon to exit, regardless of -d
- SIGQUIT causes the daemon to daemonize, regardless of -d
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Block the parent process until the child has reached the main loop, e.g.
full service is available.
This means it's no longer neccessary to add a "safety sleep" for daemon
cross-dependencies, when using the -d startup option. This doesn't help
if -d isn't used.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>