Message ID:   20240718184311.3950526-3-leitao@debian.org (mailing list archive)
State:        RFC
Delegated to: Netdev Maintainers
Series:       netconsole: Fix netconsole unsafe locking
On Thu, 2024-07-18 at 11:43 -0700, Breno Leitao wrote:
>
> +/* Clean up every target in the cleanup_list and move the clean targets back to the
> + * main target_list.
> + */
> +static void netconsole_process_cleanups_core(void)
> +{
> +	struct netconsole_target *nt, *tmp;
> +	unsigned long flags;
> +
> +	/* The cleanup needs RTNL locked */
> +	ASSERT_RTNL();
> +
> +	mutex_lock(&target_cleanup_list_lock);
> +	list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) {
> +		/* all entries in the cleanup_list needs to be disabled */
> +		WARN_ON_ONCE(nt->enabled);
> +		do_netpoll_cleanup(&nt->np);
> +		/* moved the cleaned target to target_list. Need to hold both locks */
> +		spin_lock_irqsave(&target_list_lock, flags);
> +		list_move(&nt->list, &target_list);
> +		spin_unlock_irqrestore(&target_list_lock, flags);
> +	}
> +	WARN_ON_ONCE(!list_empty(&target_cleanup_list));
> +	mutex_unlock(&target_cleanup_list_lock);
> +}
> +
> +/* Do the list cleanup with the rtnl lock hold */
> +static void netconsole_process_cleanups(void)
> +{
> +	rtnl_lock();
> +	netconsole_process_cleanups_core();
> +	rtnl_unlock();
> +}

I've got what may be a dumb question.

If the traversal of the target_cleanup_list happens under
the rtnl_lock, why do you need a new lock, and why is there
a wrapper function that only takes this one lock, and then
calls the other function?

Are you planning a user of netconsole_process_cleanups_core()
that already holds the rtnl_lock and should not use this
wrapper?

Also, the comment does not explain why the rtnl_lock is held.
We can see that it grabs it, but not why. It would be nice to
have that in the comment.
Hello Rik,

On Thu, Jul 18, 2024 at 03:53:54PM -0400, Rik van Riel wrote:
> On Thu, 2024-07-18 at 11:43 -0700, Breno Leitao wrote:
> >
> > +/* Clean up every target in the cleanup_list and move the clean targets back to the
> > + * main target_list.
> > + */
> > +static void netconsole_process_cleanups_core(void)
> > +{
> > +	struct netconsole_target *nt, *tmp;
> > +	unsigned long flags;
> > +
> > +	/* The cleanup needs RTNL locked */
> > +	ASSERT_RTNL();
> > +
> > +	mutex_lock(&target_cleanup_list_lock);
> > +	list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) {
> > +		/* all entries in the cleanup_list needs to be disabled */
> > +		WARN_ON_ONCE(nt->enabled);
> > +		do_netpoll_cleanup(&nt->np);
> > +		/* moved the cleaned target to target_list. Need to hold both locks */
> > +		spin_lock_irqsave(&target_list_lock, flags);
> > +		list_move(&nt->list, &target_list);
> > +		spin_unlock_irqrestore(&target_list_lock, flags);
> > +	}
> > +	WARN_ON_ONCE(!list_empty(&target_cleanup_list));
> > +	mutex_unlock(&target_cleanup_list_lock);
> > +}
> > +
> > +/* Do the list cleanup with the rtnl lock hold */
> > +static void netconsole_process_cleanups(void)
> > +{
> > +	rtnl_lock();
> > +	netconsole_process_cleanups_core();
> > +	rtnl_unlock();
> > +}

First of all, thanks for reviewing this patch.

> I've got what may be a dumb question.
>
> If the traversal of the target_cleanup_list happens under
> the rtnl_lock, why do you need a new lock.

Because the lock protects the target_cleanup_list list, and in some
cases the list is accessed outside of the region that holds the `rtnl`
lock.

For instance, enabled_store() is a function that is called from user
space (through configfs). This function needs to populate
target_cleanup_list (for targets that are being disabled). This code
path does NOT hold rtnl at all.

> and why is there
> a wrapper function that only takes this one lock, and then
> calls the other function?
I assume that the network cleanup needs to hold rtnl, since it is going
to release a network interface. Thus, __netpoll_cleanup() needs to be
called with the rtnl lock held.

That said, netconsole calls `__netpoll_cleanup()` indirectly through
two different code paths:

1) From enabled_store() -- userspace disabling the interface from
   configfs.
   * This code path does not have `rtnl` held, thus it needs to be
     acquired along the way.

2) From netconsole_netdev_event() -- a network event callback.
   * This function is called with `rtnl` held, thus there is no need
     to acquire it again.

> Are you planning a user of netconsole_process_cleanups_core()
> that already holds the rtnl_lock and should not use this
> wrapper?

In fact, this patch is already using it today. See its invocation from
netconsole_netdev_event().

> Also, the comment does not explain why the rtnl_lock is held.
> We can see that it grabs it, but not why. It would be nice to
> have that in the comment.

Agree. I will add this comment in my changes.

Thank you!
--breno
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 9c09293b5258..0515fee5a2d1 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -37,6 +37,7 @@
 #include <linux/configfs.h>
 #include <linux/etherdevice.h>
 #include <linux/utsname.h>
+#include <linux/rtnetlink.h>
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
 MODULE_DESCRIPTION("Console driver for network interfaces");
@@ -72,9 +73,16 @@ __setup("netconsole=", option_setup);
 
 /* Linked list of all configured targets */
 static LIST_HEAD(target_list);
+/* target_cleanup_list is used to track targets that need to be cleaned outside
+ * of target_list_lock. It should be cleaned in the same function it is
+ * populated.
+ */
+static LIST_HEAD(target_cleanup_list);
 
 /* This needs to be a spinlock because write_msg() cannot sleep */
 static DEFINE_SPINLOCK(target_list_lock);
+/* This needs to be a mutex because netpoll_cleanup might sleep */
+static DEFINE_MUTEX(target_cleanup_list_lock);
 
 /*
  * Console driver for extended netconsoles. Registered on the first use to
@@ -323,6 +331,39 @@ static ssize_t remote_mac_show(struct config_item *item, char *buf)
 	return sysfs_emit(buf, "%pM\n", to_target(item)->np.remote_mac);
 }
 
+/* Clean up every target in the cleanup_list and move the clean targets back to the
+ * main target_list.
+ */
+static void netconsole_process_cleanups_core(void)
+{
+	struct netconsole_target *nt, *tmp;
+	unsigned long flags;
+
+	/* The cleanup needs RTNL locked */
+	ASSERT_RTNL();
+
+	mutex_lock(&target_cleanup_list_lock);
+	list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) {
+		/* all entries in the cleanup_list needs to be disabled */
+		WARN_ON_ONCE(nt->enabled);
+		do_netpoll_cleanup(&nt->np);
+		/* moved the cleaned target to target_list. Need to hold both locks */
+		spin_lock_irqsave(&target_list_lock, flags);
+		list_move(&nt->list, &target_list);
+		spin_unlock_irqrestore(&target_list_lock, flags);
+	}
+	WARN_ON_ONCE(!list_empty(&target_cleanup_list));
+	mutex_unlock(&target_cleanup_list_lock);
+}
+
+/* Do the list cleanup with the rtnl lock hold */
+static void netconsole_process_cleanups(void)
+{
+	rtnl_lock();
+	netconsole_process_cleanups_core();
+	rtnl_unlock();
+}
+
 /*
  * This one is special -- targets created through the configfs interface
  * are not enabled (and the corresponding netpoll activated) by default.
@@ -376,12 +417,20 @@ static ssize_t enabled_store(struct config_item *item,
 		 * otherwise we might end up in write_msg() with
 		 * nt->np.dev == NULL and nt->enabled == true
 		 */
+		mutex_lock(&target_cleanup_list_lock);
 		spin_lock_irqsave(&target_list_lock, flags);
 		nt->enabled = false;
+		/* Remove the target from the list, while holding
+		 * target_list_lock
+		 */
+		list_move(&nt->list, &target_cleanup_list);
 		spin_unlock_irqrestore(&target_list_lock, flags);
-		netpoll_cleanup(&nt->np);
+		mutex_unlock(&target_cleanup_list_lock);
 	}
 
+	/* Deferred cleanup */
+	netconsole_process_cleanups();
+
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return strnlen(buf, count);
 out_unlock:
@@ -950,7 +999,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 				   unsigned long event, void *ptr)
 {
 	unsigned long flags;
-	struct netconsole_target *nt;
+	struct netconsole_target *nt, *tmp;
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	bool stopped = false;
 
@@ -958,9 +1007,9 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	      event == NETDEV_RELEASE || event == NETDEV_JOIN))
 		goto done;
 
+	mutex_lock(&target_cleanup_list_lock);
 	spin_lock_irqsave(&target_list_lock, flags);
-restart:
-	list_for_each_entry(nt, &target_list, list) {
+	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
 			switch (event) {
@@ -970,25 +1019,16 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			case NETDEV_RELEASE:
 			case NETDEV_JOIN:
 			case NETDEV_UNREGISTER:
-				/* rtnl_lock already held
-				 * we might sleep in __netpoll_cleanup()
-				 */
 				nt->enabled = false;
-				spin_unlock_irqrestore(&target_list_lock, flags);
-
-				__netpoll_cleanup(&nt->np);
-
-				spin_lock_irqsave(&target_list_lock, flags);
-				netdev_put(nt->np.dev, &nt->np.dev_tracker);
-				nt->np.dev = NULL;
+				list_move(&nt->list, &target_cleanup_list);
 				stopped = true;
-				netconsole_target_put(nt);
-				goto restart;
 			}
 		}
 		netconsole_target_put(nt);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
+	mutex_unlock(&target_cleanup_list_lock);
+
 	if (stopped) {
 		const char *msg = "had an event";
 
@@ -1007,6 +1047,11 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			    dev->name, msg);
 	}
 
+	/* Process target_cleanup_list entries. By the end, target_cleanup_list
+	 * should be empty
+	 */
+	netconsole_process_cleanups_core();
+
 done:
 	return NOTIFY_DONE;
 }
Current issue:

  - The target_list_lock spinlock is held while iterating over
    target_list entries.
  - Mid-loop, the lock is released to call __netpoll_cleanup(), then
    reacquired.
  - This practice compromises the protection provided by
    target_list_lock.

Reason for the current design:

  1. __netpoll_cleanup() may sleep, which is incompatible with holding
     a spinlock.
  2. target_list_lock must be a spinlock because write_msg() cannot
     sleep. (See commit b5427c27173e ("[NET] netconsole: Support
     multiple logging targets"))

Defer the cleanup of the netpoll structure to outside the
target_list_lock protected area. Create another list
(target_cleanup_list) to hold the entries that need to be cleaned up,
and clean them while holding a mutex (target_cleanup_list_lock).

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 77 +++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 16 deletions(-)