diff mbox series

[v2,3/3] libkmod-module: add support for a patient module removal option

Message ID 20210810051602.3067384-4-mcgrof@kernel.org (mailing list archive)
State New
Headers show
Series kmod: add patient module removal support | expand

Commit Message

Luis Chamberlain Aug. 10, 2021, 5:16 a.m. UTC
When doing tests with modules such as scsi_debug, on test
frameworks such as fstests [0] you may run into situations
where false positives are triggered and a test fails but
the reality is that the test did not fail, but what did
fail was the removal of the module since the refcnt is
not yet 0. There are many races possible with modules,
and each subsystem would have their own quirks for why
these races exist. Additionally, even after the refcnt is
0, userspace cannot be sure in any way that the refcnt
will remain 0, and so its easy to run into a situation
where the refcnt is bumped from 0 after modprobe just
saw it as 0.

The scsi_debug races are all documented on korg#21233 [1],
some issues are part scsi_debug alone, however other parts
of the race issue are completely due to incomplete described
semantics for modules and the inability for userspace to be
100% sure that a race cannot happen to bump the refcnt
once it observes the refcnt is 0.

I've written a script mod-refcnt-race.sh attached to
korg#214015 [2] which helps us easily reproduce the races
described and abstracts them away from fstests. This can be
used to both verify the issue and also test the new kmod
patient module remover. The same bug report also has a
simple busy-open-block-device-sleep.c which busy opens
a block device, sleep 4 seconds in a loop, closes the
file descriptor, which can easily be used to force failure
of the module removal (delete_module()) and test this
new patient removal.

Although there are patches for fstests to account for
this [3] and work around it, a much suitable solution
long term is for these hacks to use a patient module
remover from kmod and modprobe directly. This patch does
just that, it adds these new arguments to modprobe and
rmmod:

 -p | --remove-patiently   patiently removes the module
 -t | --timeout            timeout in ms to remove the module

The patient removal works by polling until the module refcnt
/sys/module/name/refcnt goes to 0. One that does happens it
tries to also patiently remove the module. This second step
is required as a new temporary refcnt bump can occur after
a module refcnt becomes 0. By default modprobe and rmmod
will wait forever, however a timeout can be specified to
in milliseconds to ensure only that amount of time is
spent trying to remove the module.

If a timeout is used it applies cumulatively to both the
refcnt wait, and for trying to remove the module once
modprobe sees the refcnt is already 0.

This new fatures is useful for cases where you know the
refcnt is going to become 0, and it is just a matter of time,
such as with many fstests / blktests tests.

It is worth noting that prior to kernel v3.13 the kernel
module delete system call used to support a wait flag,
this was removed via commit 3f2b9c9cdf389 ("module: remove
rmmod --wait option.") in favor for a 10 second sleep in
kmod. A 10 second sleep does not address the issues, and
so we must implement a patient removal properly in kmod.

[0] git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
[1] https://bugzilla.kernel.org/show_bug.cgi?id=21233
[2] https://bugzilla.kernel.org/show_bug.cgi?id=214015
[3] https://lkml.kernel.org/r/20210727201045.2540681-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 libkmod/docs/libkmod-sections.txt  |   4 +
 libkmod/libkmod-module.c           | 328 +++++++++++++++++++++++++++++
 libkmod/libkmod.c                  |  71 +++++++
 libkmod/libkmod.h                  |   7 +
 libkmod/libkmod.sym                |   4 +
 libkmod/python/kmod/_libkmod_h.pxd |   3 +
 libkmod/python/kmod/module.pyx     |   4 +
 man/modprobe.xml                   |  59 ++++++
 man/rmmod.xml                      |  60 ++++++
 tools/modprobe.c                   |  21 +-
 tools/remove.c                     |  12 +-
 tools/rmmod.c                      |  27 ++-
 12 files changed, 584 insertions(+), 16 deletions(-)

Comments

Lucas De Marchi Sept. 23, 2021, 8:51 a.m. UTC | #1
On Mon, Aug 09, 2021 at 10:16:02PM -0700, Luis Chamberlain wrote:
>When doing tests with modules such as scsi_debug, on test
>frameworks such as fstests [0] you may run into situations
>where false positives are triggered and a test fails but
>the reality is that the test did not fail, but what did
>fail was the removal of the module since the refcnt is
>not yet 0. There are many races possible with modules,
>and each subsystem would have their own quirks for why
>these races exist. Additionally, even after the refcnt is
>0, userspace cannot be sure in any way that the refcnt
>will remain 0, and so its easy to run into a situation
>where the refcnt is bumped from 0 after modprobe just
>saw it as 0.
>
>The scsi_debug races are all documented on korg#21233 [1],
>some issues are part scsi_debug alone, however other parts
>of the race issue are completely due to incomplete described
>semantics for modules and the inability for userspace to be
>100% sure that a race cannot happen to bump the refcnt
>once it observes the refcnt is 0.
>
>I've written a script mod-refcnt-race.sh attached to
>korg#214015 [2] which helps us easily reproduce the races
>described and abstracts them away from fstests. This can be
>used to both verify the issue and also test the new kmod
>patient module remover. The same bug report also has a
>simple busy-open-block-device-sleep.c which busy opens
>a block device, sleep 4 seconds in a loop, closes the
>file descriptor, which can easily be used to force failure
>of the module removal (delete_module()) and test this
>new patient removal.
>
>Although there are patches for fstests to account for
>this [3] and work around it, a much suitable solution
>long term is for these hacks to use a patient module
>remover from kmod and modprobe directly. This patch does
>just that, it adds these new arguments to modprobe and
>rmmod:
>
> -p | --remove-patiently   patiently removes the module
> -t | --timeout            timeout in ms to remove the module
>
>The patient removal works by polling until the module refcnt
>/sys/module/name/refcnt goes to 0. One that does happens it
>tries to also patiently remove the module. This second step
>is required as a new temporary refcnt bump can occur after
>a module refcnt becomes 0. By default modprobe and rmmod
>will wait forever, however a timeout can be specified to
>in milliseconds to ensure only that amount of time is
>spent trying to remove the module.
>
>If a timeout is used it applies cumulatively to both the
>refcnt wait, and for trying to remove the module once
>modprobe sees the refcnt is already 0.
>
>This new fatures is useful for cases where you know the
>refcnt is going to become 0, and it is just a matter of time,
>such as with many fstests / blktests tests.
>
>It is worth noting that prior to kernel v3.13 the kernel
>module delete system call used to support a wait flag,
>this was removed via commit 3f2b9c9cdf389 ("module: remove
>rmmod --wait option.") in favor for a 10 second sleep in
>kmod. A 10 second sleep does not address the issues, and
>so we must implement a patient removal properly in kmod.

The story was not kind like that. It wasn't removed "in favor for a 10
second sleep" in the sense that the sleep would replace the wait.

It was actually for "this wait logic in the kernel is complex and
buggy, let's try to remove it". So we decided to deprecate it and add
a sleep rmmod to see if anyone complained. 1 year later of no complains
we removed it from kernel. This was all after noticing we had never
implemented the wait logic in modprobe - it was only done in rmmod.


>
>[0] git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
>[1] https://bugzilla.kernel.org/show_bug.cgi?id=21233
>[2] https://bugzilla.kernel.org/show_bug.cgi?id=214015
>[3] https://lkml.kernel.org/r/20210727201045.2540681-1-mcgrof@kernel.org
>Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>---
> libkmod/docs/libkmod-sections.txt  |   4 +
> libkmod/libkmod-module.c           | 328 +++++++++++++++++++++++++++++
> libkmod/libkmod.c                  |  71 +++++++
> libkmod/libkmod.h                  |   7 +
> libkmod/libkmod.sym                |   4 +
> libkmod/python/kmod/_libkmod_h.pxd |   3 +
> libkmod/python/kmod/module.pyx     |   4 +
> man/modprobe.xml                   |  59 ++++++
> man/rmmod.xml                      |  60 ++++++
> tools/modprobe.c                   |  21 +-
> tools/remove.c                     |  12 +-
> tools/rmmod.c                      |  27 ++-
> 12 files changed, 584 insertions(+), 16 deletions(-)
>
>diff --git a/libkmod/docs/libkmod-sections.txt b/libkmod/docs/libkmod-sections.txt
>index e59ab7a..a5b71d2 100644
>--- a/libkmod/docs/libkmod-sections.txt
>+++ b/libkmod/docs/libkmod-sections.txt
>@@ -12,6 +12,8 @@ kmod_dump_index
>
> kmod_set_log_priority
> kmod_get_log_priority
>+kmod_get_refcnt_timeout
>+kmod_set_refcnt_timeout
> kmod_set_log_fn
> kmod_get_userdata
> kmod_set_userdata
>@@ -56,6 +58,7 @@ kmod_module_unref_list
> kmod_module_insert_module
> kmod_module_probe_insert_module
> kmod_module_remove_module
>+kmod_module_remove_module_wait
>
> kmod_module_get_module
> kmod_module_get_dependencies
>@@ -102,5 +105,6 @@ kmod_module_get_initstate
> kmod_module_initstate_str
> kmod_module_get_size
> kmod_module_get_refcnt
>+kmod_module_get_refcnt_wait
> kmod_module_get_holders
> </SECTION>
>diff --git a/libkmod/libkmod-module.c b/libkmod/libkmod-module.c
>index 04bb4d9..2d54d1a 100644
>--- a/libkmod/libkmod-module.c
>+++ b/libkmod/libkmod-module.c
>@@ -30,6 +30,9 @@
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>+#include <poll.h>
>+#include <time.h>
>+#include <math.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
>@@ -802,6 +805,143 @@ KMOD_EXPORT int kmod_module_remove_module(struct kmod_module *mod,
> 	return err;
> }
>
>+static int timespec_to_ms(struct timespec *t)
>+{
>+	return (t->tv_sec * 1000) + lround(t->tv_nsec / 1000000);
>+}
>+
>+static int time_delta_ms(struct timespec *before, struct timespec *after)
>+{
>+	if (!before || !after)
>+		return 0;
>+	return timespec_to_ms(after) - timespec_to_ms(before);
>+}

we have a similar thing in util.[ch]

>+
>+/**
>+ * kmod_module_remove_module_wait:
>+ * @mod: kmod module
>+ * @flags: flags to pass to Linux kernel when removing the module. The only valid flag is
>+ * KMOD_REMOVE_FORCE: force remove module regardless if it's still in
>+ * use by a kernel subsystem or other process;
>+ * KMOD_REMOVE_NOWAIT is always enforced, causing us to pass O_NONBLOCK to
>+ * delete_module(2). We do the waiting in userspace, if a wait was desired.
>+ *
>+ * Remove a module from Linux kernel patiently.
>+ *
>+ * Returns: 0 on success or < 0 on failure.
>+ */
>+KMOD_EXPORT int kmod_module_remove_module_wait(struct kmod_module *mod,
>+					       unsigned int flags,
>+					       bool wait)

why do you have kmod_get_refcnt_timeout/kmod_set_refcnt_timeout instead
of just doing s/bool wait/unsigned int wait_msec/)?

>+{
>+	int err, err_time, t_delta, t_delta_print, orig_timeout;
>+	int time_between_prints = 500; /* 1/2 second */
>+	int timeout_ms;
>+	struct timespec t1, t2, t_print = { .tv_sec = 0, .tv_nsec = 0};
>+	bool printed_msg = false;
>+
>+	if (mod == NULL)
>+		return -ENOENT;
>+
>+	if (wait)
>+		timeout_ms = kmod_get_refcnt_timeout(mod->ctx);
>+
>+	if (!wait || !timeout_ms)
>+		return kmod_module_remove_module(mod, flags);
>+
>+	err_time = clock_gettime(CLOCK_MONOTONIC, &t1);
>+	if (err_time != 0)
>+		return kmod_module_remove_module(mod, flags);
>+
>+	/*
>+	 * We now know an original patient removal was requested and
>+	 * an associated timeout was also provided. timeout_removal_ms
>+	 * should be set with what we are left with.
>+	 */
>+	timeout_ms = orig_timeout = kmod_get_removal_timeout(mod->ctx);
>+	DBG(mod->ctx, "removal timeout is %d for '%s'\n", timeout_ms, mod->name);
>+
>+	/*
>+	 * Filter out other flags and force ONONBLOCK
>+	 *
>+	 * This is true even for the patient removal. The wait flag was
>+	 * a now deprecated flag, see kernel commits:
>+	 *
>+	 * 3f2b9c9cdf389e ("module: remove rmmod --wait option." on v3.13
>+	 * 79465d2fd48e68 ("module: remove warning about waiting module removal.") on v3.15
>+	 *
>+	 * Patient removal is a userspace implementation.
>+	 */
>+	flags &= KMOD_REMOVE_FORCE;
>+	flags |= KMOD_REMOVE_NOWAIT;
>+
>+	while (true) {
>+		err = delete_module(mod->name, flags);
>+		if (err == 0)
>+			return 0;
>+		err = -errno;
>+
>+		if (timeout_ms == 0) {
>+			ERR(mod->ctx, "timeout (%d) trying to remove module immediately '%s': %m %s\n",
>+			    timeout_ms, mod->name, strerror(errno));
>+			return err;
>+		}
>+
>+		/* Wait forever request */
>+		if (timeout_ms < 0) {
>+			if (!printed_msg) {
>+				NOTICE(mod->ctx, "%s trying forever to remove...\n",
>+				       mod->name);
>+				printed_msg = true;
>+			}
>+			continue;
>+		}
>+
>+		/* All below is when we know we have a timeout */
>+
>+		err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
>+		if (err_time != 0) {
>+			ERR(mod->ctx, "Using CLOCK_MONOTONIC failed, cannot wait, prior module removal module failed for '%s': %m %s\n",
>+			    mod->name, strerror(errno));
>+			return err;
>+		}
>+
>+		t_delta = time_delta_ms(&t1, &t2);
>+		DBG(mod->ctx, "%s removal time delta: %d ms\n", mod->name, t_delta);
>+
>+		if (t_delta >= orig_timeout) {
>+			ERR(mod->ctx, "timeout (%d) trying to remove module '%s': %m %s\n",
>+			    timeout_ms, mod->name, strerror(errno));
>+			return err;
>+		}
>+
>+		/*
>+		 * We have more time to remove the module. The above check
>+		 * also ensures that timeout_ms won't ever be 0.
>+		 */
>+		timeout_ms = orig_timeout - t_delta;
>+
>+		if (!printed_msg) {
>+			NOTICE(mod->ctx, "%s removal failed. Re-trying for %d more ms (total timeout of %d ms)\n",
>+			       mod->name, timeout_ms, orig_timeout);
>+			t_print = t2;
>+			printed_msg = true;
>+		} else {
>+			/*
>+			 * Computes the delta between our last print
>+			 * and the time of our current current
>+			 * evaluation.
>+			 */
>+			t_delta_print = time_delta_ms(&t_print, &t2);
>+			if (t_delta_print < time_between_prints)
>+				continue;
>+			NOTICE(mod->ctx, "%s removal failed. Re-trying for %d more ms (total timeout of %d ms)\n",
>+			       mod->name, timeout_ms, orig_timeout);
>+			t_print = t2;
>+		}
>+	}
>+}
>+
> extern long init_module(const void *mem, unsigned long len, const char *args);
>
> /**
>@@ -1926,6 +2066,194 @@ KMOD_EXPORT int kmod_module_get_refcnt(const struct kmod_module *mod)
> 	return (int)refcnt;
> }
>
>+/**
>+ * kmod_module_get_refcnt_wait:
>+ * @mod: kmod module
>+ * @wait: if true will wait until the refcnt is 0 patiently forever
>+ *
>+ * Get the ref count of this @mod, as returned by Linux Kernel, by reading
>+ * /sys filesystem and wait patiently until the refcnt is 0, if the wait bool
>+ * argument is set to true, otherwise this behaves just as the call
>+ * kmod_module_get_refcnt(). Enabling wait is useful if you know for sure that
>+ * the module is quiescing soon, and so you should be able to remove it soon.
>+ * If wait is enabled, we wait a predetermined timeout setting which is
>+ * is configurable with kmod_set_refcnt_timeout(), by default we wait
>+ * forever.
>+ *
>+ * Returns: the reference count on success or < 0 on failure.
>+ */
>+KMOD_EXPORT int kmod_module_get_refcnt_wait(const struct kmod_module *mod, bool wait)
>+{
>+	struct pollfd fdset;
>+	int err, err_time, num_fds=1, t_delta, t_delta_print, orig_timeout;
>+	int time_between_prints = 500; /* 1/2 second */
>+	int timeout_ms;
>+	long refcnt;
>+	struct timespec t1, t2, t_print = { .tv_sec = 0, .tv_nsec = 0};
>+	bool printed_msg = false;
>+
>+	if (wait)
>+		timeout_ms = kmod_get_refcnt_timeout(mod->ctx);
>+
>+	if (!wait || !timeout_ms)
>+		return kmod_module_get_refcnt(mod);
>+
>+	orig_timeout = timeout_ms;
>+
>+	/*
>+	 * The refcnt may be something like 2, and poll() may return when
>+	 * the refcnt becomes 1, and so we must also check for the overall
>+	 * time ran and on each instance make the timeout available shorter
>+	 *
>+	 * If using the CLOCK_MONOTONIC failed and we have a timeout set, we
>+	 * simply cannot wait as we cannot trust time differences.
>+	 */
>+	err_time = clock_gettime(CLOCK_MONOTONIC, &t1);
>+	if (err_time != 0) {
>+		ERR(mod->ctx, "Using the CLOCK_MONOTONIC failed (%s) so processing removal for %s without a wait for the refcnt\n",
>+		    strerror(errno), mod->name);
>+		return kmod_module_get_refcnt(mod);
>+	}
>+
>+	kmod_set_removal_timeout(mod->ctx, orig_timeout);
>+
>+	while (true) {
>+		DBG(mod->ctx, "%s about to poll for refcnt using timeout %d ms\n", mod->name, timeout_ms);
>+		memset(&fdset, 0, sizeof(struct pollfd));
>+		fdset.events = POLLPRI | POLLIN;
>+		fdset.fd = kmod_module_get_refcnt_fd(mod);
>+		if (fdset.fd < 0)
>+			return fdset.fd;
>+		err = poll(&fdset, num_fds, timeout_ms);
>+		if (err == -1) {
>+			DBG(mod->ctx, "%s refcnt poll failed with err: %s\n", mod->name, strerror(errno));
>+			goto err;
>+		}
>+		if (err == 0) { /* timeout before fds were read */
>+			NOTICE(mod->ctx, "%s refcnt did not become 0 after %d ms\n",
>+			       mod->name, orig_timeout);
>+			kmod_set_removal_timeout(mod->ctx, 0);
>+			err = -ETIMEDOUT;
>+			goto err;
>+		}
>+		if (err != num_fds) { /* on success, we only asked for one fd */
>+			err = -EINVAL;
>+			goto err;
>+		}
>+		if (!fdset.revents) {
>+			close(fdset.fd);
>+			continue;
>+		}
>+		if ((!fdset.revents & POLLIN)) {
>+			err = -EINVAL;
>+			goto err;
>+		}
>+
>+		err = read_str_long(fdset.fd, &refcnt, 10);
>+		if (err < 0)
>+			goto err;
>+
>+		close(fdset.fd);
>+
>+		if ((refcnt <= 0) || (refcnt > 0 && !wait)) {
>+			NOTICE(mod->ctx, "%s refcnt is %d\n", mod->name, (int) refcnt);
>+			err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
>+			if (err_time != 0)
>+				kmod_set_removal_timeout(mod->ctx, 0);

I don't follow why kmod_module_get_refcnt_wait() is setting the removal
timeout at all. This seems to be doing it behind users back.

The idea of using the refcnt fd was actually that then
users could integrate it on their mainloops (probably using epoll). And
then the same impl could be shared by kmod_module_remove_module_wait(),
which would do a select().

This seems more like a kmod_module_refcnt_wait_zero() using poll()
+ adjusting the timeout

...

>diff --git a/tools/rmmod.c b/tools/rmmod.c
>index 3942e7b..77c813a 100644
>--- a/tools/rmmod.c
>+++ b/tools/rmmod.c
>@@ -35,10 +35,12 @@
> #define DEFAULT_VERBOSE LOG_ERR
> static int verbose = DEFAULT_VERBOSE;
> static int use_syslog;
>+static int do_remove_patient;
>
>-static const char cmdopts_s[] = "fsvVwh";
>+static const char cmdopts_s[] = "fpsvVwh";
> static const struct option cmdopts[] = {
> 	{"force", no_argument, 0, 'f'},
>+	{"remove-patiently", no_argument, 0, 'p'},
> 	{"syslog", no_argument, 0, 's'},
> 	{"verbose", no_argument, 0, 'v'},
> 	{"version", no_argument, 0, 'V'},
>@@ -51,13 +53,14 @@ static void help(void)
> 	printf("Usage:\n"
> 		"\t%s [options] modulename ...\n"
> 		"Options:\n"
>-		"\t-f, --force       forces a module unload and may crash your\n"
>-		"\t                  machine. This requires Forced Module Removal\n"
>-		"\t                  option in your kernel. DANGEROUS\n"
>-		"\t-s, --syslog      print to syslog, not stderr\n"
>-		"\t-v, --verbose     enables more messages\n"
>-		"\t-V, --version     show version\n"
>-		"\t-h, --help        show this help\n",
>+		"\t-f, --force            forces a module unload and may crash your\n"
>+		"\t                       machine. This requires Forced Module Removal\n"
>+		"\t                       option in your kernel. DANGEROUS\n"
>+		"\t-p, --remove-patiently patiently tries to remove the module\n"
>+		"\t-s, --syslog           print to syslog, not stderr\n"
>+		"\t-v, --verbose          enables more messages\n"
>+		"\t-V, --version          show version\n"
>+		"\t-h, --help             show this help\n",
> 		program_invocation_short_name);
> }
>
>@@ -93,7 +96,7 @@ static int check_module_inuse(struct kmod_module *mod) {
> 		return -EBUSY;
> 	}
>
>-	ret = kmod_module_get_refcnt(mod);
>+	ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);

for tool implementation, shouldn't we just ignore
kmod_module_get_refcnt() and proceed to
kmod_module_remove_module_wait()?

> 	if (ret > 0) {
> 		ERR("Module %s is in use\n", kmod_module_get_name(mod));
> 		return -EBUSY;
>@@ -120,6 +123,9 @@ static int do_rmmod(int argc, char *argv[])
> 		case 'f':
> 			flags |= KMOD_REMOVE_FORCE;
> 			break;
>+		case 'p':
>+			do_remove_patient = 1;
>+			break;
> 		case 's':
> 			use_syslog = 1;
> 			break;
>@@ -178,7 +184,8 @@ static int do_rmmod(int argc, char *argv[])
> 			goto next;
> 		}
>
>-		err = kmod_module_remove_module(mod, flags);
>+		err = kmod_module_remove_module_wait(mod, flags,
>+						     do_remove_patient);
> 		if (err < 0) {
> 			ERR("could not remove module %s: %s\n", arg,
> 			    strerror(-err));
>-- 
>2.30.2
>
Luis Chamberlain Sept. 28, 2021, 1:38 a.m. UTC | #2
On Thu, Sep 23, 2021 at 01:51:56AM -0700, Lucas De Marchi wrote:
> On Mon, Aug 09, 2021 at 10:16:02PM -0700, Luis Chamberlain wrote:
> The story was not kind like that. It wasn't removed "in favor for a 10
> second sleep" in the sense that the sleep would replace the wait.
> 
> It was actually for "this wait logic in the kernel is complex and
> buggy, let's try to remove it". So we decided to deprecate it and add
> a sleep rmmod to see if anyone complained. 1 year later of no complains
> we removed it from kernel. This was all after noticing we had never
> implemented the wait logic in modprobe - it was only done in rmmod.

OK fixed the commit log thanks!

> > --- a/libkmod/libkmod-module.c
> > +++ b/libkmod/libkmod-module.c
> > @@ -30,6 +30,9 @@
> > #include <stdlib.h>
> > #include <string.h>
> > #include <unistd.h>
> > +#include <poll.h>
> > +#include <time.h>
> > +#include <math.h>
> > #include <sys/mman.h>
> > #include <sys/stat.h>
> > #include <sys/syscall.h>
> > @@ -802,6 +805,143 @@ KMOD_EXPORT int kmod_module_remove_module(struct kmod_module *mod,
> > 	return err;
> > }
> > 
> > +static int timespec_to_ms(struct timespec *t)
> > +{
> > +	return (t->tv_sec * 1000) + lround(t->tv_nsec / 1000000);
> > +}
> > +
> > +static int time_delta_ms(struct timespec *before, struct timespec *after)
> > +{
> > +	if (!before || !after)
> > +		return 0;
> > +	return timespec_to_ms(after) - timespec_to_ms(before);
> > +}
> 
> we have a similar thing in util.[ch]

Alright, this OK?

diff --git a/shared/util.c b/shared/util.c
index b487b5f..b911e63 100644
--- a/shared/util.c
+++ b/shared/util.c
@@ -466,6 +466,19 @@ unsigned long long ts_usec(const struct timespec *ts)
 	       (unsigned long long) ts->tv_nsec / NSEC_PER_USEC;
 }
 
+unsigned long long ts_msec(const struct timespec *ts)
+{
+	return ts_usec(ts) * 1000;
+}
+
+unsigned long long ts_delta_ms(const struct timespec *before,
+			       const struct timespec *after)
+{
+	if (!before || !after)
+		return 0;
+	return ts_msec(after) - ts_msec(before);
+}
+
 unsigned long long stat_mstamp(const struct stat *st)
 {
 #ifdef HAVE_STRUCT_STAT_ST_MTIM
diff --git a/shared/util.h b/shared/util.h
index c6a31df..f8c28e7 100644
--- a/shared/util.h
+++ b/shared/util.h
@@ -43,6 +43,9 @@ int mkdir_p(const char *path, int len, mode_t mode);
 int mkdir_parents(const char *path, mode_t mode);
 unsigned long long stat_mstamp(const struct stat *st);
 unsigned long long ts_usec(const struct timespec *ts);
+unsigned long long ts_msec(const struct timespec *ts);
+unsigned long long ts_delta_ms(const struct timespec *before,
+			       const struct timespec *after);
 
 /* endianess and alignments                                                 */
 /* ************************************************************************ */

> > +/**
> > + * kmod_module_remove_module_wait:
> > + * @mod: kmod module
> > + * @flags: flags to pass to Linux kernel when removing the module. The only valid flag is
> > + * KMOD_REMOVE_FORCE: force remove module regardless if it's still in
> > + * use by a kernel subsystem or other process;
> > + * KMOD_REMOVE_NOWAIT is always enforced, causing us to pass O_NONBLOCK to
> > + * delete_module(2). We do the waiting in userspace, if a wait was desired.
> > + *
> > + * Remove a module from Linux kernel patiently.
> > + *
> > + * Returns: 0 on success or < 0 on failure.
> > + */
> > +KMOD_EXPORT int kmod_module_remove_module_wait(struct kmod_module *mod,
> > +					       unsigned int flags,
> > +					       bool wait)
> 
> why do you have kmod_get_refcnt_timeout/kmod_set_refcnt_timeout instead
> of just doing s/bool wait/unsigned int wait_msec/)?

Because it lets us do a smaller change on the respetive tools:

tools/modprobe.c-               flags |= KMOD_REMOVE_FORCE;
tools/modprobe.c-
tools/modprobe.c:       err = kmod_module_remove_module_wait(mod, flags, do_remove_patient);
tools/modprobe.c-       if (err == -EEXIST) {
tools/modprobe.c-               if (!first_time)
--
tools/remove.c-         goto unref;
tools/remove.c-
tools/remove.c: err = kmod_module_remove_module_wait(mod, 0, do_remove_patient);
tools/remove.c- if (err < 0)
tools/remove.c-         goto unref;
--
tools/rmmod.c-          }
tools/rmmod.c-
tools/rmmod.c:          err = kmod_module_remove_module_wait(mod, flags,
tools/rmmod.c- do_remove_patient);
tools/rmmod.c-          if (err < 0) {

That is, the timeout is contextual of the context.

> > +		if ((refcnt <= 0) || (refcnt > 0 && !wait)) {
> > +			NOTICE(mod->ctx, "%s refcnt is %d\n", mod->name, (int) refcnt);
> > +			err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
> > +			if (err_time != 0)
> > +				kmod_set_removal_timeout(mod->ctx, 0);
> 
> I don't follow why kmod_module_get_refcnt_wait() is setting the removal
> timeout at all. This seems to be doing it behind users back.

Because if clock_gettime() returns something other than 0 then
your clock is messed up and you should not be using a timeout, so
yes, we correct that then. We can scream loud, or use a default.

I figured not using one would be better in that case.

> The idea of using the refcnt fd was actually that then
> users could integrate it on their mainloops (probably using epoll). And
> then the same impl could be shared by kmod_module_remove_module_wait(),
> which would do a select().
> 
> This seems more like a kmod_module_refcnt_wait_zero() using poll()
> + adjusting the timeout

Sorry don't follow. And since I have one day before vacation, I suppose
I won't get to this until I get back. But I'd be happy if you massage
it as you see fit as you're used to the code base and I'm sure have
a better idea of what likely is best for the library.

> > +	ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);
> 
> for tool implementation, shouldn't we just ignore
> kmod_module_get_refcnt() and proceed to
> kmod_module_remove_module_wait()?

I'll let you decide. Otherwise this will have to wait until I get back
from vacation.

  Luis
Lucas De Marchi Sept. 29, 2021, 6:48 p.m. UTC | #3
On Mon, Sep 27, 2021 at 06:38:40PM -0700, Luis Chamberlain wrote:
>On Thu, Sep 23, 2021 at 01:51:56AM -0700, Lucas De Marchi wrote:
>> On Mon, Aug 09, 2021 at 10:16:02PM -0700, Luis Chamberlain wrote:
>> The story was not kind like that. It wasn't removed "in favor for a 10
>> second sleep" in the sense that the sleep would replace the wait.
>>
>> It was actually for "this wait logic in the kernel is complex and
>> buggy, let's try to remove it". So we decided to deprecate it and add
>> a sleep rmmod to see if anyone complained. 1 year later of no complains
>> we removed it from kernel. This was all after noticing we had never
>> implemented the wait logic in modprobe - it was only done in rmmod.
>
>OK fixed the commit log thanks!
>
>> > --- a/libkmod/libkmod-module.c
>> > +++ b/libkmod/libkmod-module.c
>> > @@ -30,6 +30,9 @@
>> > #include <stdlib.h>
>> > #include <string.h>
>> > #include <unistd.h>
>> > +#include <poll.h>
>> > +#include <time.h>
>> > +#include <math.h>
>> > #include <sys/mman.h>
>> > #include <sys/stat.h>
>> > #include <sys/syscall.h>
>> > @@ -802,6 +805,143 @@ KMOD_EXPORT int kmod_module_remove_module(struct kmod_module *mod,
>> > 	return err;
>> > }
>> >
>> > +static int timespec_to_ms(struct timespec *t)
>> > +{
>> > +	return (t->tv_sec * 1000) + lround(t->tv_nsec / 1000000);
>> > +}
>> > +
>> > +static int time_delta_ms(struct timespec *before, struct timespec *after)
>> > +{
>> > +	if (!before || !after)
>> > +		return 0;
>> > +	return timespec_to_ms(after) - timespec_to_ms(before);
>> > +}
>>
>> we have a similar thing in util.[ch]
>
>Alright, this OK?
>
>diff --git a/shared/util.c b/shared/util.c
>index b487b5f..b911e63 100644
>--- a/shared/util.c
>+++ b/shared/util.c
>@@ -466,6 +466,19 @@ unsigned long long ts_usec(const struct timespec *ts)
> 	       (unsigned long long) ts->tv_nsec / NSEC_PER_USEC;
> }
>
>+unsigned long long ts_msec(const struct timespec *ts)
>+{
>+	return ts_usec(ts) * 1000;
>+}
>+
>+unsigned long long ts_delta_ms(const struct timespec *before,
>+			       const struct timespec *after)
>+{
>+	if (!before || !after)
>+		return 0;
>+	return ts_msec(after) - ts_msec(before);
>+}
>+
> unsigned long long stat_mstamp(const struct stat *st)
> {
> #ifdef HAVE_STRUCT_STAT_ST_MTIM
>diff --git a/shared/util.h b/shared/util.h
>index c6a31df..f8c28e7 100644
>--- a/shared/util.h
>+++ b/shared/util.h
>@@ -43,6 +43,9 @@ int mkdir_p(const char *path, int len, mode_t mode);
> int mkdir_parents(const char *path, mode_t mode);
> unsigned long long stat_mstamp(const struct stat *st);
> unsigned long long ts_usec(const struct timespec *ts);
>+unsigned long long ts_msec(const struct timespec *ts);
>+unsigned long long ts_delta_ms(const struct timespec *before,
>+			       const struct timespec *after);
>
> /* endianess and alignments                                                 */
> /* ************************************************************************ */
>
>> > +/**
>> > + * kmod_module_remove_module_wait:
>> > + * @mod: kmod module
>> > + * @flags: flags to pass to Linux kernel when removing the module. The only valid flag is
>> > + * KMOD_REMOVE_FORCE: force remove module regardless if it's still in
>> > + * use by a kernel subsystem or other process;
>> > + * KMOD_REMOVE_NOWAIT is always enforced, causing us to pass O_NONBLOCK to
>> > + * delete_module(2). We do the waiting in userspace, if a wait was desired.
>> > + *
>> > + * Remove a module from Linux kernel patiently.
>> > + *
>> > + * Returns: 0 on success or < 0 on failure.
>> > + */
>> > +KMOD_EXPORT int kmod_module_remove_module_wait(struct kmod_module *mod,
>> > +					       unsigned int flags,
>> > +					       bool wait)
>>
>> why do you have kmod_get_refcnt_timeout/kmod_set_refcnt_timeout instead
>> of just doing s/bool wait/unsigned int wait_msec/)?
>
>Because it lets us do a smaller change on the respetive tools:
>
>tools/modprobe.c-               flags |= KMOD_REMOVE_FORCE;
>tools/modprobe.c-
>tools/modprobe.c:       err = kmod_module_remove_module_wait(mod, flags, do_remove_patient);
>tools/modprobe.c-       if (err == -EEXIST) {
>tools/modprobe.c-               if (!first_time)
>--
>tools/remove.c-         goto unref;
>tools/remove.c-
>tools/remove.c: err = kmod_module_remove_module_wait(mod, 0, do_remove_patient);
>tools/remove.c- if (err < 0)
>tools/remove.c-         goto unref;
>--
>tools/rmmod.c-          }
>tools/rmmod.c-
>tools/rmmod.c:          err = kmod_module_remove_module_wait(mod, flags,
>tools/rmmod.c- do_remove_patient);
>tools/rmmod.c-          if (err < 0) {
>
>That is, the timeout is contextual of the context.
>
>> > +		if ((refcnt <= 0) || (refcnt > 0 && !wait)) {
>> > +			NOTICE(mod->ctx, "%s refcnt is %d\n", mod->name, (int) refcnt);
>> > +			err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
>> > +			if (err_time != 0)
>> > +				kmod_set_removal_timeout(mod->ctx, 0);
>>
>> I don't follow why kmod_module_get_refcnt_wait() is setting the removal
>> timeout at all. This seems to be doing it behind users back.
>
>Because if clock_gettime() returns something other than 0 then
>your clock is messed up and you should not be using a timeout, so
>yes, we correct that then. We can scream loud, or use a default.
>
>I figured not using one would be better in that case.
>
>> The idea of using the refcnt fd was actually that then
>> users could integrate it on their mainloops (probably using epoll). And
>> then the same impl could be shared by kmod_module_remove_module_wait(),
>> which would do a select().
>>
>> This seems more like a kmod_module_refcnt_wait_zero() using poll()
>> + adjusting the timeout
>
>Sorry don't follow. And since I have one day before vacation, I suppose
>I won't get to this until I get back. But I'd be happy if you massage
>it as you see fit as you're used to the code base and I'm sure have
>a better idea of what likely is best for the library.


sure, np. I will take a look as time permits.

thanks
Lucas De Marchi

>
>> > +	ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);
>>
>> for tool implementation, shouldn't we just ignore
>> kmod_module_get_refcnt() and proceed to
>> kmod_module_remove_module_wait()?
>
>I'll let you decide. Otherwise this will have to wait until I get back
>from vacation.
>
>  Luis
diff mbox series

Patch

diff --git a/libkmod/docs/libkmod-sections.txt b/libkmod/docs/libkmod-sections.txt
index e59ab7a..a5b71d2 100644
--- a/libkmod/docs/libkmod-sections.txt
+++ b/libkmod/docs/libkmod-sections.txt
@@ -12,6 +12,8 @@  kmod_dump_index
 
 kmod_set_log_priority
 kmod_get_log_priority
+kmod_get_refcnt_timeout
+kmod_set_refcnt_timeout
 kmod_set_log_fn
 kmod_get_userdata
 kmod_set_userdata
@@ -56,6 +58,7 @@  kmod_module_unref_list
 kmod_module_insert_module
 kmod_module_probe_insert_module
 kmod_module_remove_module
+kmod_module_remove_module_wait
 
 kmod_module_get_module
 kmod_module_get_dependencies
@@ -102,5 +105,6 @@  kmod_module_get_initstate
 kmod_module_initstate_str
 kmod_module_get_size
 kmod_module_get_refcnt
+kmod_module_get_refcnt_wait
 kmod_module_get_holders
 </SECTION>
diff --git a/libkmod/libkmod-module.c b/libkmod/libkmod-module.c
index 04bb4d9..2d54d1a 100644
--- a/libkmod/libkmod-module.c
+++ b/libkmod/libkmod-module.c
@@ -30,6 +30,9 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <poll.h>
+#include <time.h>
+#include <math.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
@@ -802,6 +805,143 @@  KMOD_EXPORT int kmod_module_remove_module(struct kmod_module *mod,
 	return err;
 }
 
+static int timespec_to_ms(struct timespec *t)
+{
+	return (t->tv_sec * 1000) + lround(t->tv_nsec / 1000000);
+}
+
+static int time_delta_ms(struct timespec *before, struct timespec *after)
+{
+	if (!before || !after)
+		return 0;
+	return timespec_to_ms(after) - timespec_to_ms(before);
+}
+
+/**
+ * kmod_module_remove_module_wait:
+ * @mod: kmod module
+ * @flags: flags to pass to Linux kernel when removing the module. The only valid flag is
+ * KMOD_REMOVE_FORCE: force remove module regardless if it's still in
+ * use by a kernel subsystem or other process;
+ * KMOD_REMOVE_NOWAIT is always enforced, causing us to pass O_NONBLOCK to
+ * delete_module(2). We do the waiting in userspace, if a wait was desired.
+ *
+ * Remove a module from Linux kernel patiently.
+ *
+ * Returns: 0 on success or < 0 on failure.
+ */
+KMOD_EXPORT int kmod_module_remove_module_wait(struct kmod_module *mod,
+					       unsigned int flags,
+					       bool wait)
+{
+	int err, err_time, t_delta, t_delta_print, orig_timeout;
+	int time_between_prints = 500; /* 1/2 second */
+	int timeout_ms;
+	struct timespec t1, t2, t_print = { .tv_sec = 0, .tv_nsec = 0};
+	bool printed_msg = false;
+
+	if (mod == NULL)
+		return -ENOENT;
+
+	if (wait)
+		timeout_ms = kmod_get_refcnt_timeout(mod->ctx);
+
+	if (!wait || !timeout_ms)
+		return kmod_module_remove_module(mod, flags);
+
+	err_time = clock_gettime(CLOCK_MONOTONIC, &t1);
+	if (err_time != 0)
+		return kmod_module_remove_module(mod, flags);
+
+	/*
+	 * We now know an original patient removal was requested and
+	 * an associated timeout was also provided. timeout_removal_ms
+	 * should be set with what we are left with.
+	 */
+	timeout_ms = orig_timeout = kmod_get_removal_timeout(mod->ctx);
+	DBG(mod->ctx, "removal timeout is %d for '%s'\n", timeout_ms, mod->name);
+
+	/*
+	 * Filter out other flags and force ONONBLOCK
+	 *
+	 * This is true even for the patient removal. The wait flag was
+	 * a now deprecated flag, see kernel commits:
+	 *
+	 * 3f2b9c9cdf389e ("module: remove rmmod --wait option." on v3.13
+	 * 79465d2fd48e68 ("module: remove warning about waiting module removal.") on v3.15
+	 *
+	 * Patient removal is a userspace implementation.
+	 */
+	flags &= KMOD_REMOVE_FORCE;
+	flags |= KMOD_REMOVE_NOWAIT;
+
+	while (true) {
+		err = delete_module(mod->name, flags);
+		if (err == 0)
+			return 0;
+		err = -errno;
+
+		if (timeout_ms == 0) {
+			ERR(mod->ctx, "timeout (%d) trying to remove module immediately '%s': %m %s\n",
+			    timeout_ms, mod->name, strerror(errno));
+			return err;
+		}
+
+		/* Wait forever request */
+		if (timeout_ms < 0) {
+			if (!printed_msg) {
+				NOTICE(mod->ctx, "%s trying forever to remove...\n",
+				       mod->name);
+				printed_msg = true;
+			}
+			continue;
+		}
+
+		/* All below is when we know we have a timeout */
+
+		err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
+		if (err_time != 0) {
+			ERR(mod->ctx, "Using CLOCK_MONOTONIC failed, cannot wait, prior module removal module failed for '%s': %m %s\n",
+			    mod->name, strerror(errno));
+			return err;
+		}
+
+		t_delta = time_delta_ms(&t1, &t2);
+		DBG(mod->ctx, "%s removal time delta: %d ms\n", mod->name, t_delta);
+
+		if (t_delta >= orig_timeout) {
+			ERR(mod->ctx, "timeout (%d) trying to remove module '%s': %m %s\n",
+			    timeout_ms, mod->name, strerror(errno));
+			return err;
+		}
+
+		/*
+		 * We have more time to remove the module. The above check
+		 * also ensures that timeout_ms won't ever be 0.
+		 */
+		timeout_ms = orig_timeout - t_delta;
+
+		if (!printed_msg) {
+			NOTICE(mod->ctx, "%s removal failed. Re-trying for %d more ms (total timeout of %d ms)\n",
+			       mod->name, timeout_ms, orig_timeout);
+			t_print = t2;
+			printed_msg = true;
+		} else {
+			/*
+			 * Computes the delta between our last print
+			 * and the time of our current current
+			 * evaluation.
+			 */
+			t_delta_print = time_delta_ms(&t_print, &t2);
+			if (t_delta_print < time_between_prints)
+				continue;
+			NOTICE(mod->ctx, "%s removal failed. Re-trying for %d more ms (total timeout of %d ms)\n",
+			       mod->name, timeout_ms, orig_timeout);
+			t_print = t2;
+		}
+	}
+}
+
 extern long init_module(const void *mem, unsigned long len, const char *args);
 
 /**
@@ -1926,6 +2066,194 @@  KMOD_EXPORT int kmod_module_get_refcnt(const struct kmod_module *mod)
 	return (int)refcnt;
 }
 
+/**
+ * kmod_module_get_refcnt_wait:
+ * @mod: kmod module
+ * @wait: if true will wait until the refcnt is 0 patiently forever
+ *
+ * Get the ref count of this @mod, as returned by Linux Kernel, by reading
+ * /sys filesystem and wait patiently until the refcnt is 0, if the wait bool
+ * argument is set to true, otherwise this behaves just as the call
+ * kmod_module_get_refcnt(). Enabling wait is useful if you know for sure that
+ * the module is quiescing soon, and so you should be able to remove it soon.
+ * If wait is enabled, we wait a predetermined timeout setting which is
+ * is configurable with kmod_set_refcnt_timeout(), by default we wait
+ * forever.
+ *
+ * Returns: the reference count on success or < 0 on failure.
+ */
+KMOD_EXPORT int kmod_module_get_refcnt_wait(const struct kmod_module *mod, bool wait)
+{
+	struct pollfd fdset;
+	int err, err_time, num_fds=1, t_delta, t_delta_print, orig_timeout;
+	int time_between_prints = 500; /* 1/2 second */
+	int timeout_ms;
+	long refcnt;
+	struct timespec t1, t2, t_print = { .tv_sec = 0, .tv_nsec = 0};
+	bool printed_msg = false;
+
+	if (wait)
+		timeout_ms = kmod_get_refcnt_timeout(mod->ctx);
+
+	if (!wait || !timeout_ms)
+		return kmod_module_get_refcnt(mod);
+
+	orig_timeout = timeout_ms;
+
+	/*
+	 * The refcnt may be something like 2, and poll() may return when
+	 * the refcnt becomes 1, and so we must also check for the overall
+	 * time ran and on each instance make the timeout available shorter
+	 *
+	 * If using the CLOCK_MONOTONIC failed and we have a timeout set, we
+	 * simply cannot wait as we cannot trust time differences.
+	 */
+	err_time = clock_gettime(CLOCK_MONOTONIC, &t1);
+	if (err_time != 0) {
+		ERR(mod->ctx, "Using the CLOCK_MONOTONIC failed (%s) so processing removal for %s without a wait for the refcnt\n",
+		    strerror(errno), mod->name);
+		return kmod_module_get_refcnt(mod);
+	}
+
+	kmod_set_removal_timeout(mod->ctx, orig_timeout);
+
+	while (true) {
+		DBG(mod->ctx, "%s about to poll for refcnt using timeout %d ms\n", mod->name, timeout_ms);
+		memset(&fdset, 0, sizeof(struct pollfd));
+		fdset.events = POLLPRI | POLLIN;
+		fdset.fd = kmod_module_get_refcnt_fd(mod);
+		if (fdset.fd < 0)
+			return fdset.fd;
+		err = poll(&fdset, num_fds, timeout_ms);
+		if (err == -1) {
+			DBG(mod->ctx, "%s refcnt poll failed with err: %s\n", mod->name, strerror(errno));
+			goto err;
+		}
+		if (err == 0) { /* timeout before fds were read */
+			NOTICE(mod->ctx, "%s refcnt did not become 0 after %d ms\n",
+			       mod->name, orig_timeout);
+			kmod_set_removal_timeout(mod->ctx, 0);
+			err = -ETIMEDOUT;
+			goto err;
+		}
+		if (err != num_fds) { /* on success, we only asked for one fd */
+			err = -EINVAL;
+			goto err;
+		}
+		if (!fdset.revents) {
+			close(fdset.fd);
+			continue;
+		}
+		if ((!fdset.revents & POLLIN)) {
+			err = -EINVAL;
+			goto err;
+		}
+
+		err = read_str_long(fdset.fd, &refcnt, 10);
+		if (err < 0)
+			goto err;
+
+		close(fdset.fd);
+
+		if ((refcnt <= 0) || (refcnt > 0 && !wait)) {
+			NOTICE(mod->ctx, "%s refcnt is %d\n", mod->name, (int) refcnt);
+			err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
+			if (err_time != 0)
+				kmod_set_removal_timeout(mod->ctx, 0);
+			else {
+				t_delta = time_delta_ms(&t1, &t2);
+				/*
+				 * We only do adjustments for removal when an
+				 * original timeout was set. If we happen to
+				 * have run out of time but refcnt is 0 we'll
+				 * try removal once.
+				 */
+				if (timeout_ms <= 0)
+					kmod_set_removal_timeout(mod->ctx, timeout_ms);
+				if (t_delta >= orig_timeout)
+					kmod_set_removal_timeout(mod->ctx, 0);
+				kmod_set_removal_timeout(mod->ctx, orig_timeout - t_delta);
+			}
+			return (int) refcnt;
+		}
+
+		/*
+		 * At this point we know refcnt is greater than 0.
+		 */
+
+		err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
+		if (err_time != 0) {
+			ERR(mod->ctx, "Using the CLOCK_MONOTONIC failed a second time so processing removal for %s without a wait for the refcnt\n",
+			    mod->name);
+			kmod_set_removal_timeout(mod->ctx, 0);
+			return kmod_module_get_refcnt(mod);
+		}
+
+		t_delta = time_delta_ms(&t1, &t2);
+		DBG(mod->ctx, "%s refcnt delta: %d ms\n", mod->name, t_delta);
+
+		/*
+		 * When the timeout is negative we leave the the interpretation
+		 * of the used timeout value to poll() by design. Adjustments
+		 * only make sense if a sensible timeout was set. We dealt
+		 * with a 0 timeout at the beginning but we check here just
+		 * for posterity.
+		 */
+		if (!timeout_ms) {
+			kmod_set_removal_timeout(mod->ctx, 0);
+			return kmod_module_get_refcnt(mod);
+		}
+
+		if (timeout_ms < 0) {
+			if (!printed_msg) {
+				NOTICE(mod->ctx, "%s refcnt is %ld waiting forever for it to become 0\n",
+				       mod->name, refcnt);
+				printed_msg = true;
+				t_print = t2;
+			}
+			continue;
+		}
+
+		/* timeout_ms must be > 0 below */
+
+		if (t_delta >= orig_timeout) {
+			NOTICE(mod->ctx, "%s refcnt did not become 0 after a total of %d ms we waited %d ms\n",
+			       mod->name, orig_timeout, t_delta);
+			kmod_set_removal_timeout(mod->ctx, 0);
+			return -ETIMEDOUT;
+		}
+
+		/*
+		 * We have more time to run. The above check also ensure that
+		 * timeout_ms won't ever be 0.
+		 */
+		timeout_ms = orig_timeout - t_delta;
+
+		if (!printed_msg) {
+			NOTICE(mod->ctx, "%s refcnt is %ld waiting %d more ms (total timeout of %d ms) for it to become 0\n",
+			       mod->name, refcnt, timeout_ms, orig_timeout);
+			t_print = t2;
+			printed_msg = true;
+		} else {
+			/*
+			 * Computes the delta between our last print
+			 * and the time of our current current
+			 * evaluation.
+			 */
+			t_delta_print = time_delta_ms(&t_print, &t2);
+			if (t_delta_print < time_between_prints)
+				continue;
+			NOTICE(mod->ctx, "%s refcnt is %ld waiting %d more ms (total timeout of %d ms) for it to become 0\n",
+			       mod->name, refcnt, timeout_ms, orig_timeout);
+			t_print = t2;
+		}
+	}
+
+err:
+	close(fdset.fd);
+	return err;
+}
+
 /**
  * kmod_module_get_holders:
  * @mod: kmod module
diff --git a/libkmod/libkmod.c b/libkmod/libkmod.c
index 7c2b889..09615a6 100644
--- a/libkmod/libkmod.c
+++ b/libkmod/libkmod.c
@@ -77,6 +77,8 @@  static const char *default_config_paths[] = {
 struct kmod_ctx {
 	int refcount;
 	int log_priority;
+	int timeout_refcnt_ms;
+	int timeout_removal_ms;
 	void (*log_fn)(void *data,
 			int priority, const char *file, int line,
 			const char *fn, const char *format, va_list args);
@@ -264,6 +266,7 @@  KMOD_EXPORT struct kmod_ctx *kmod_new(const char *dirname,
 	ctx->log_fn = log_filep;
 	ctx->log_data = stderr;
 	ctx->log_priority = LOG_ERR;
+	ctx->timeout_refcnt_ms = -1; /* forever */
 
 	ctx->dirname = get_kernel_release(dirname);
 
@@ -395,6 +398,74 @@  KMOD_EXPORT void kmod_set_log_priority(struct kmod_ctx *ctx, int priority)
 	ctx->log_priority = priority;
 }
 
+/**
+ * kmod_get_refcnt_timeout
+ * @ctx: kmod library context
+ *
+ * Returns: the timeout for waiting for refcnt to become 0
+ */
+KMOD_EXPORT int kmod_get_refcnt_timeout(const struct kmod_ctx *ctx)
+{
+	if (ctx == NULL)
+		return -1;
+	return ctx->timeout_refcnt_ms;
+}
+
+/**
+ * kmod_set_refcnt_timeout_ms:
+ * @ctx: kmod library context
+ * @timeout_ms: time timeout to wait for the recnt to become 0 in ms
+ *
+ * Sets the timeout which will be used to wait for the refcnt to become
+ * 0 when trying to remove a module. This timeout is only used on library
+ * calls which explicitly document support for it.
+ *
+ * We use the timeout with poll(2) and so a negative means infinite, so we
+ * would run forever where used unless the refcnt becomes 0. A value of
+ * 0 means poll() will return immediately, even if the refcnt has not
+ * changed.
+ *
+ * If kmod_set_refcnt_timeout() is never called, the default timeout is
+ * forever, and so calls which rely on the timeout_refcnt_ms will wait
+ * forever until the refcnt becomes 0.
+ */
+KMOD_EXPORT void kmod_set_refcnt_timeout(struct kmod_ctx *ctx, int timeout_ms)
+{
+	if (ctx == NULL)
+		return;
+	ctx->timeout_refcnt_ms = timeout_ms;
+}
+
+/**
+ * kmod_get_removal_timeout
+ * @ctx: kmod library context
+ *
+ * Returns: the timeout for waiting remove the module.
+ */
+KMOD_EXPORT int kmod_get_removal_timeout(const struct kmod_ctx *ctx)
+{
+	if (ctx == NULL)
+		return -1;
+	return ctx->timeout_removal_ms;
+}
+
+/**
+ * kmod_set_removal_timeout_ms:
+ * @ctx: kmod library context
+ * @timeout_ms: time timeout to wait for trying to remove the module in ms
+ *
+ * Sets the timeout which will be used to try to keep trying to remove the
+ * module. A timeout of 0 means we just try once. A negative timeout means
+ * we try forever.
+ */
+KMOD_EXPORT void kmod_set_removal_timeout(struct kmod_ctx *ctx, int timeout_ms)
+{
+	if (ctx == NULL)
+		return;
+	ctx->timeout_removal_ms = timeout_ms;
+}
+
+
 struct kmod_module *kmod_pool_get_module(struct kmod_ctx *ctx,
 							const char *key)
 {
diff --git a/libkmod/libkmod.h b/libkmod/libkmod.h
index 3cab2e5..acf3c2a 100644
--- a/libkmod/libkmod.h
+++ b/libkmod/libkmod.h
@@ -48,6 +48,10 @@  void kmod_set_log_fn(struct kmod_ctx *ctx,
 			const void *data);
 int kmod_get_log_priority(const struct kmod_ctx *ctx);
 void kmod_set_log_priority(struct kmod_ctx *ctx, int priority);
+int kmod_get_refcnt_timeout(const struct kmod_ctx *ctx);
+void kmod_set_refcnt_timeout(struct kmod_ctx *ctx, int timeout_ms);
+int kmod_get_removal_timeout(const struct kmod_ctx *ctx);
+void kmod_set_removal_timeout(struct kmod_ctx *ctx, int timeout_ms);
 void *kmod_get_userdata(const struct kmod_ctx *ctx);
 void kmod_set_userdata(struct kmod_ctx *ctx, const void *userdata);
 
@@ -172,6 +176,8 @@  enum kmod_filter {
 };
 
 int kmod_module_remove_module(struct kmod_module *mod, unsigned int flags);
+int kmod_module_remove_module_wait(struct kmod_module *mod, unsigned int flags,
+				   bool wait);
 int kmod_module_insert_module(struct kmod_module *mod, unsigned int flags,
 							const char *options);
 int kmod_module_probe_insert_module(struct kmod_module *mod,
@@ -217,6 +223,7 @@  enum kmod_module_initstate {
 const char *kmod_module_initstate_str(enum kmod_module_initstate state);
 int kmod_module_get_initstate(const struct kmod_module *mod);
 int kmod_module_get_refcnt(const struct kmod_module *mod);
+int kmod_module_get_refcnt_wait(const struct kmod_module *mod, bool wait);
 struct kmod_list *kmod_module_get_holders(const struct kmod_module *mod);
 struct kmod_list *kmod_module_get_sections(const struct kmod_module *mod);
 const char *kmod_module_section_get_name(const struct kmod_list *entry);
diff --git a/libkmod/libkmod.sym b/libkmod/libkmod.sym
index 5f5e1fb..8802784 100644
--- a/libkmod/libkmod.sym
+++ b/libkmod/libkmod.sym
@@ -6,6 +6,8 @@  global:
 	kmod_ref;
 	kmod_set_log_fn;
 	kmod_set_log_priority;
+	kmod_get_refcnt_timeout_ms;
+	kmod_set_refcnt_timeout;
 	kmod_set_userdata;
 	kmod_unref;
 	kmod_list_next;
@@ -36,6 +38,7 @@  global:
 	kmod_module_unref_list;
 	kmod_module_get_module;
 	kmod_module_remove_module;
+	kmod_module_remove_module_wait;
 	kmod_module_insert_module;
 	kmod_module_probe_insert_module;
 
@@ -49,6 +52,7 @@  global:
 	kmod_module_initstate_str;
 	kmod_module_get_initstate;
 	kmod_module_get_refcnt;
+	kmod_module_get_refcnt_wait;
 	kmod_module_get_sections;
 	kmod_module_section_free_list;
 	kmod_module_section_get_name;
diff --git a/libkmod/python/kmod/_libkmod_h.pxd b/libkmod/python/kmod/_libkmod_h.pxd
index 7191953..29ba3e7 100644
--- a/libkmod/python/kmod/_libkmod_h.pxd
+++ b/libkmod/python/kmod/_libkmod_h.pxd
@@ -79,6 +79,8 @@  cdef extern from 'libkmod/libkmod.h':
 
     int kmod_module_remove_module(
         kmod_module *mod, unsigned int flags)
+    int kmod_module_remove_module_wait(
+        kmod_module *mod, unsigned int flags, bool wait)
     int kmod_module_insert_module(
         kmod_module *mod, unsigned int flags, const_char_ptr options)
     int kmod_module_probe_insert_module(
@@ -99,6 +101,7 @@  cdef extern from 'libkmod/libkmod.h':
     # Information regarding "live information" from module's state, as
     # returned by kernel
     int kmod_module_get_refcnt(const_kmod_module_ptr mod)
+    int kmod_module_get_refcnt_wait(const_kmod_module_ptr mod, bool install)
     long kmod_module_get_size(const_kmod_module_ptr mod)
 
     # Information retrieved from ELF headers and section
diff --git a/libkmod/python/kmod/module.pyx b/libkmod/python/kmod/module.pyx
index 42aa92e..45fe774 100644
--- a/libkmod/python/kmod/module.pyx
+++ b/libkmod/python/kmod/module.pyx
@@ -72,6 +72,10 @@  cdef class Module (object):
         return _libkmod_h.kmod_module_get_refcnt(self.module)
     refcnt = property(fget=_refcnt_get)
 
+    def _refcnt_get_wait(self, wait=False):
+        return _libkmod_h.kmod_module_get_refcnt_wait(self.module, wait)
+    refcnt = property(fget=_refcnt_get_wait)
+
     def _size_get(self):
         return _libkmod_h.kmod_module_get_size(self.module)
     size = property(fget=_size_get)
diff --git a/man/modprobe.xml b/man/modprobe.xml
index 0372b6b..592276a 100644
--- a/man/modprobe.xml
+++ b/man/modprobe.xml
@@ -388,6 +388,65 @@ 
           </para>
         </listitem>
       </varlistentry>
+      <varlistentry>
+        <term>
+          <option>-p</option>
+        </term>
+        <term>
+          <option>--remove-patiently</option>
+        </term>
+        <listitem>
+          <para>
+            This option causes <command>modprobe</command> to try to patiently
+	    remove a module by waiting until its refcnt is 0 and then trying to
+	    remove it. The patient removal polls the module's refcnt sysfs file,
+	    typically /sys/module/module/refcnt, for changes and will end
+	    polling this file if it becomes 0, an error has ocurred, or we
+	    ran out of time waiting for the refcnt.
+
+	    A programmable timeout is available to be set, which modifies how
+	    long modprobe will wait for the refcnt to become 0. By default the
+	    timeout is set to wait forever. Refer to the -t option below for
+	    more information on the setting the timeout.
+
+	    Once the refcnt is found to be 0, <command>modprobe</command>
+	    will try to use the remaining time left from our progammed timeout
+	    to remove the module in a busy loop. By default we loop forever
+	    trying to remove the module when the patient module remover is
+	    used.
+          </para>
+          <para>
+            Removing modules may be done by test infrastruture code, it can also
+            be done to remove a live kernel patch.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term>
+          <option>-t</option>
+        </term>
+        <term>
+          <option>--timeout</option>
+        </term>
+        <listitem>
+          <para>
+            This option provides the timeout in millisecond for module removal.
+            The timeout is used first to wait for the refcnt to become 0, and
+            once it is found to be 0 it will use the remaining time from the
+            provided timeout trying to remove the module. A timeout is ignored
+            unless the patient removal option is used, -p. A timeout of 0
+            means no timeout is used. A negative timeout means we wait forever.
+            Systems where clock_gettime(CLOCK_MONOTONIC, t) does not work will
+            have the timeout ignored and removal will be attempted in one shot
+            just as if <command>modprobe -r</command> was called.
+          </para>
+          <para>
+            Removing modules may be done by test infrastruture code, it can also
+            be done to remove a live kernel patch if supported
+            by your distribution.
+          </para>
+        </listitem>
+      </varlistentry>
       <varlistentry>
         <term>
           <option>-S</option>
diff --git a/man/rmmod.xml b/man/rmmod.xml
index e7c7e5f..1b64f12 100644
--- a/man/rmmod.xml
+++ b/man/rmmod.xml
@@ -39,6 +39,7 @@ 
     <cmdsynopsis>
       <command>rmmod</command>
       <arg><option>-f</option></arg>
+      <arg><option>-p</option></arg>
       <arg><option>-s</option></arg>
       <arg><option>-v</option></arg>
       <arg><replaceable>modulename</replaceable></arg>
@@ -92,6 +93,65 @@ 
           </para>
         </listitem>
       </varlistentry>
+      <varlistentry>
+        <term>
+          <option>-p</option>
+        </term>
+        <term>
+          <option>--remove-patiently</option>
+        </term>
+        <listitem>
+          <para>
+            This option causes <command>modprobe</command> to try to patiently
+	    remove a module by waiting until its refcnt is 0 and then trying to
+	    remove it. The patient removal polls the module's refcnt sysfs file,
+	    typically /sys/module/module/refcnt, for changes and will end
+	    polling this file if it becomes 0, an error has ocurred, or we
+	    ran out of time waiting for the refcnt.
+
+	    A programmable timeout is available to be set, which modifies how
+	    long modprobe will wait for the refcnt to become 0. By default the
+	    timeout is set to wait forever. Refer to the -t option below for
+	    more information on the setting the timeout.
+
+	    Once the refcnt is found to be 0, <command>modprobe</command>
+	    will try to use the remaining time left from our progammed timeout
+	    to remove the module in a busy loop. By default we loop forever
+	    trying to remove the module when the patient module remover is
+	    used.
+          </para>
+          <para>
+            Removing modules may be done by test infrastruture code, it can also
+            be done to remove a live kernel patch if supported
+            by your distribution.
+          </para>
+        </listitem>
+      </varlistentry>
+      <varlistentry>
+        <term>
+          <option>-t</option>
+        </term>
+        <term>
+          <option>--timeout</option>
+        </term>
+        <listitem>
+          <para>
+            This option provides the timeout in millisecond for module removal.
+            The timeout is used first to wait for the refcnt to become 0, and
+            once it is found to be 0 it will use the remaining time from the
+            provided timeout trying to remove the module. A timeout is ignored
+            unless the patient removal option is used, -p. A timeout of 0
+            means no timeout is used. A negative timeout means we wait forever.
+            Systems where clock_gettime(CLOCK_MONOTONIC, t) does not work will
+            have the timeout ignored and removal will be attempted in one shot
+            just as if <command>modprobe -r</command> was called.
+          </para>
+          <para>
+            Removing modules may be done by test infrastruture code, it can also
+            be done to remove a live kernel patch.
+          </para>
+        </listitem>
+      </varlistentry>
       <varlistentry>
         <term>
           <option>-s</option>
diff --git a/tools/modprobe.c b/tools/modprobe.c
index 9387537..bf11dd3 100644
--- a/tools/modprobe.c
+++ b/tools/modprobe.c
@@ -56,11 +56,14 @@  static int strip_modversion = 0;
 static int strip_vermagic = 0;
 static int remove_dependencies = 0;
 static int quiet_inuse = 0;
+static int do_remove_patient = 0;
+static int use_timeout_ms = 0;
 
-static const char cmdopts_s[] = "arRibfDcnC:d:S:sqvVh";
+static const char cmdopts_s[] = "arpRibfDcnC:d:S:st:qvVh";
 static const struct option cmdopts[] = {
 	{"all", no_argument, 0, 'a'},
 	{"remove", no_argument, 0, 'r'},
+	{"remove-patiently", no_argument, 0, 'p'},
 	{"remove-dependencies", no_argument, 0, 5},
 	{"resolve-alias", no_argument, 0, 'R'},
 	{"first-time", no_argument, 0, 3},
@@ -81,6 +84,7 @@  static const struct option cmdopts[] = {
 	{"dry-run", no_argument, 0, 'n'},
 	{"show", no_argument, 0, 'n'},
 
+	{"timeout", required_argument, 0, 't'},
 	{"config", required_argument, 0, 'C'},
 	{"dirname", required_argument, 0, 'd'},
 	{"set-version", required_argument, 0, 'S'},
@@ -107,6 +111,8 @@  static void help(void)
 		"\t                            be a module name to be inserted\n"
 		"\t                            or removed (-r)\n"
 		"\t-r, --remove                Remove modules instead of inserting\n"
+		"\t-p, --remove-patiently      Patiently wait until the refcnt is 0 to remove\n"
+		"\t-t, --timeout               Timeout in ms to use for patient removal\n"
 		"\t    --remove-dependencies   Also remove modules depending on it\n"
 		"\t-R, --resolve-alias         Only lookup and print alias and exit\n"
 		"\t    --first-time            Fail if module already inserted or removed\n"
@@ -331,7 +337,7 @@  static int rmmod_do_remove_module(struct kmod_module *mod)
 	if (force)
 		flags |= KMOD_REMOVE_FORCE;
 
-	err = kmod_module_remove_module(mod, flags);
+	err = kmod_module_remove_module_wait(mod, flags, do_remove_patient);
 	if (err == -EEXIST) {
 		if (!first_time)
 			err = 0;
@@ -424,7 +430,7 @@  static int rmmod_do_module(struct kmod_module *mod, int flags)
 	}
 
 	if (!ignore_loaded && !cmd) {
-		int usage = kmod_module_get_refcnt(mod);
+		int usage = kmod_module_get_refcnt_wait(mod, do_remove_patient);
 
 		if (usage > 0) {
 			if (!quiet_inuse)
@@ -785,6 +791,10 @@  static int do_modprobe(int argc, char **orig_argv)
 		case 'r':
 			do_remove = 1;
 			break;
+		case 'p':
+			do_remove = 1;
+			do_remove_patient = 1;
+			break;
 		case 5:
 			remove_dependencies = 1;
 			break;
@@ -843,6 +853,9 @@  static int do_modprobe(int argc, char **orig_argv)
 			env_modprobe_options_append(optarg);
 			break;
 		}
+		case 't':
+			use_timeout_ms = atoi(optarg);
+			break;
 		case 'd':
 			root = optarg;
 			break;
@@ -921,6 +934,8 @@  static int do_modprobe(int argc, char **orig_argv)
 	log_setup_kmod_log(ctx, verbose);
 
 	kmod_load_resources(ctx);
+	if (use_timeout_ms)
+		kmod_set_refcnt_timeout(ctx, use_timeout_ms);
 
 	if (do_show_config)
 		err = show_config(ctx);
diff --git a/tools/remove.c b/tools/remove.c
index 387ef0e..8436c95 100644
--- a/tools/remove.c
+++ b/tools/remove.c
@@ -27,9 +27,12 @@ 
 
 #include "kmod.h"
 
-static const char cmdopts_s[] = "h";
+static int do_remove_patient = 0;
+
+static const char cmdopts_s[] = "hp";
 static const struct option cmdopts[] = {
 	{"help", no_argument, 0, 'h'},
+	{"remove-patiently", no_argument, 0, 'p'},
 	{ }
 };
 
@@ -74,7 +77,7 @@  static int check_module_inuse(struct kmod_module *mod) {
 		return -EBUSY;
 	}
 
-	ret = kmod_module_get_refcnt(mod);
+	ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);
 	if (ret > 0) {
 		ERR("Module %s is in use\n", kmod_module_get_name(mod));
 		return -EBUSY;
@@ -101,6 +104,9 @@  static int do_remove(int argc, char *argv[])
 		case 'h':
 			help();
 			return EXIT_SUCCESS;
+		case 'p':
+			do_remove_patient = 1;
+			break;
 
 		default:
 			ERR("Unexpected getopt_long() value '%c'.\n", c);
@@ -130,7 +136,7 @@  static int do_remove(int argc, char *argv[])
 	if (err < 0)
 		goto unref;
 
-	err = kmod_module_remove_module(mod, 0);
+	err = kmod_module_remove_module_wait(mod, 0, do_remove_patient);
 	if (err < 0)
 		goto unref;
 
diff --git a/tools/rmmod.c b/tools/rmmod.c
index 3942e7b..77c813a 100644
--- a/tools/rmmod.c
+++ b/tools/rmmod.c
@@ -35,10 +35,12 @@ 
 #define DEFAULT_VERBOSE LOG_ERR
 static int verbose = DEFAULT_VERBOSE;
 static int use_syslog;
+static int do_remove_patient;
 
-static const char cmdopts_s[] = "fsvVwh";
+static const char cmdopts_s[] = "fpsvVwh";
 static const struct option cmdopts[] = {
 	{"force", no_argument, 0, 'f'},
+	{"remove-patiently", no_argument, 0, 'p'},
 	{"syslog", no_argument, 0, 's'},
 	{"verbose", no_argument, 0, 'v'},
 	{"version", no_argument, 0, 'V'},
@@ -51,13 +53,14 @@  static void help(void)
 	printf("Usage:\n"
 		"\t%s [options] modulename ...\n"
 		"Options:\n"
-		"\t-f, --force       forces a module unload and may crash your\n"
-		"\t                  machine. This requires Forced Module Removal\n"
-		"\t                  option in your kernel. DANGEROUS\n"
-		"\t-s, --syslog      print to syslog, not stderr\n"
-		"\t-v, --verbose     enables more messages\n"
-		"\t-V, --version     show version\n"
-		"\t-h, --help        show this help\n",
+		"\t-f, --force            forces a module unload and may crash your\n"
+		"\t                       machine. This requires Forced Module Removal\n"
+		"\t                       option in your kernel. DANGEROUS\n"
+		"\t-p, --remove-patiently patiently tries to remove the module\n"
+		"\t-s, --syslog           print to syslog, not stderr\n"
+		"\t-v, --verbose          enables more messages\n"
+		"\t-V, --version          show version\n"
+		"\t-h, --help             show this help\n",
 		program_invocation_short_name);
 }
 
@@ -93,7 +96,7 @@  static int check_module_inuse(struct kmod_module *mod) {
 		return -EBUSY;
 	}
 
-	ret = kmod_module_get_refcnt(mod);
+	ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);
 	if (ret > 0) {
 		ERR("Module %s is in use\n", kmod_module_get_name(mod));
 		return -EBUSY;
@@ -120,6 +123,9 @@  static int do_rmmod(int argc, char *argv[])
 		case 'f':
 			flags |= KMOD_REMOVE_FORCE;
 			break;
+		case 'p':
+			do_remove_patient = 1;
+			break;
 		case 's':
 			use_syslog = 1;
 			break;
@@ -178,7 +184,8 @@  static int do_rmmod(int argc, char *argv[])
 			goto next;
 		}
 
-		err = kmod_module_remove_module(mod, flags);
+		err = kmod_module_remove_module_wait(mod, flags,
+						     do_remove_patient);
 		if (err < 0) {
 			ERR("could not remove module %s: %s\n", arg,
 			    strerror(-err));