[net-next] net-sysfs: try not to restart the syscall if it will fail eventually

Message ID 20211007140051.297963-1-atenart@kernel.org (mailing list archive)
State Accepted
Commit 146e5e733310379f51924111068f08a3af0db830
Delegated to: Netdev Maintainers
Series [net-next] net-sysfs: try not to restart the syscall if it will fail eventually

Checks

Context                         Check    Description
netdev/cover_letter             success  Single patches do not need cover letters
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  3 maintainers not CCed: hannes@stressinduktion.org weiwan@google.com alexanderduyck@fb.com
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 1 this patch: 1
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  No Fixes tag
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 103 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1 this patch: 1
netdev/header_inline            success  No static functions without inline keyword in header files

Commit Message

Antoine Tenart Oct. 7, 2021, 2 p.m. UTC
Due to deadlocks in the networking subsystem spotted 12 years ago[1],
a workaround was put in place[2]: when the rtnl lock is not available,
do not wait for it and restart the syscall instead (back to VFS,
letting userspace spin). The following construct is found in many
places in the net sysfs and sysctl code:

  if (!rtnl_trylock())
          return restart_syscall();

This can be problematic when multiple userspace threads use such
interfaces in a short period, making them spin a lot. This happens
for example when adding and moving virtual interfaces: userspace
programs listening for events, such as systemd-udevd and NetworkManager,
trigger actions that read files in sysfs. It gets worse when a lot of
virtual interfaces are created concurrently, say when creating
containers at boot time.

When the syscall is bound to fail anyway, returning early, before
hitting the above pattern, makes things better. It is not a fix for
the issue, but it does ease things.
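
For illustration only, here is a minimal userspace sketch of the idea,
not kernel code. Everything in it is a hypothetical stand-in: demo_dev
and demo_ops mimic a net_device and its ops, a C11 atomic_flag plays the
rtnl lock, and DEMO_RESTART is an arbitrary sentinel for whatever
restart_syscall() would return. It contrasts a handler that tries the
lock first with one that checks for the missing op first:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel standing in for the restart_syscall() result. */
#define DEMO_RESTART (-1000L)

/* Hypothetical stand-ins for net_device and its ops; not kernel types. */
struct demo_ops {
	long (*get_port_id)(char *buf, size_t len);	/* may be NULL */
};

struct demo_dev {
	const struct demo_ops *ops;
};

/* One global flag plays the role of the rtnl lock in this sketch. */
static atomic_flag demo_rtnl = ATOMIC_FLAG_INIT;

bool demo_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&demo_rtnl);
}

void demo_rtnl_unlock(void)
{
	atomic_flag_clear(&demo_rtnl);
}

/* Sample op: copies a fixed id into buf and returns its length. */
long demo_get_port_id(char *buf, size_t len)
{
	static const char id[] = "p0";

	if (len < sizeof(id))
		return -EINVAL;
	memcpy(buf, id, sizeof(id));
	return (long)(sizeof(id) - 1);
}

/* Before: the lock is tried first, so a doomed call can still restart. */
long demo_show_old(const struct demo_dev *dev, char *buf, size_t len)
{
	long ret;

	if (!demo_rtnl_trylock())
		return DEMO_RESTART;

	ret = dev->ops->get_port_id ? dev->ops->get_port_id(buf, len)
				    : -EOPNOTSUPP;
	demo_rtnl_unlock();
	return ret;
}

/* After: a call that cannot succeed fails before touching the lock. */
long demo_show_new(const struct demo_dev *dev, char *buf, size_t len)
{
	if (!dev->ops->get_port_id)
		return -EOPNOTSUPP;

	return demo_show_old(dev, buf, len);
}
```

With the lock held elsewhere and no op set, demo_show_old() reports a
restart while demo_show_new() returns -EOPNOTSUPP right away. A device
that does implement the op still goes through the trylock, which is why
this only narrows the problem.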

[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
    https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
    and https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[2] Rightfully, those deadlocks are *hard* to solve.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---

Hello,

This patch comes from an RFC series[3] but wasn't strictly linked to the
other patches. As it helps reduce the scope of the issue, it is
re-posted separately (as suggested by Jakub).

Since the RFC, this patch now also tries to avoid the trylock logic for
carrier_store and proto_down_store.

FYI, I haven't changed the behaviour of duplex_show and speed_show,
which return -EINVAL when the op is not supported. I can send a
follow-up patch to change that.
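
To show where this error code surfaces, here is a small userspace
sketch. It assumes that a negative return from a sysfs show handler
reaches the caller as a failed read() with the matching errno.
read_attr() is a hypothetical helper, not an existing API, and the
interface name in any path passed to it (e.g.
/sys/class/net/eth0/speed) is only an example:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read an attribute file into buf as a
 * NUL-terminated string. Returns the number of bytes read, or a
 * negative errno value if open() or read() fails.
 */
long read_attr(const char *path, char *buf, size_t len)
{
	long n;
	int err;
	int fd;

	if (len == 0)
		return -EINVAL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, buf, len - 1);
	err = errno;	/* saved before close() can overwrite it */
	close(fd);

	if (n < 0)
		return -err;

	buf[n] = '\0';
	return n;
}
```

Under that assumption, a caller reading speed or duplex on a device
without the op sees -EINVAL from this helper, both before and after
the patch.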

Thanks,
Antoine

[3] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/

 net/core/net-sysfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

Comments

Paolo Abeni Oct. 7, 2021, 2:16 p.m. UTC | #1
On Thu, 2021-10-07 at 16:00 +0200, Antoine Tenart wrote:
> Due to deadlocks in the networking subsystem spotted 12 years ago[1],
> a workaround was put in place[2]: when the rtnl lock is not available,
> do not wait for it and restart the syscall instead (back to VFS,
> letting userspace spin). The following construct is found in many
> places in the net sysfs and sysctl code:
> 
>   if (!rtnl_trylock())
>           return restart_syscall();
> 
> This can be problematic when multiple userspace threads use such
> interfaces in a short period, making them spin a lot. This happens
> for example when adding and moving virtual interfaces: userspace
> programs listening for events, such as systemd-udevd and NetworkManager,
> trigger actions that read files in sysfs. It gets worse when a lot of
> virtual interfaces are created concurrently, say when creating
> containers at boot time.
> 
> When the syscall is bound to fail anyway, returning early, before
> hitting the above pattern, makes things better. It is not a fix for
> the issue, but it does ease things.
> 
> [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
>     https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
>     and https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
> [2] Rightfully, those deadlocks are *hard* to solve.
> 
> Signed-off-by: Antoine Tenart <atenart@kernel.org>

AFAICS, the current behaviour is preserved and the change is safe. I
think that preserving the current error-code for duplex_show and
speed_show is the correct thing to do.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>

patchwork-bot+netdevbpf@kernel.org Oct. 8, 2021, 2:30 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu,  7 Oct 2021 16:00:51 +0200 you wrote:
> Due to deadlocks in the networking subsystem spotted 12 years ago[1],
> a workaround was put in place[2]: when the rtnl lock is not available,
> do not wait for it and restart the syscall instead (back to VFS,
> letting userspace spin). The following construct is found in many
> places in the net sysfs and sysctl code:
> 
>   if (!rtnl_trylock())
>           return restart_syscall();
> 
> [...]

Here is the summary with links:
  - [net-next] net-sysfs: try not to restart the syscall if it will fail eventually
    https://git.kernel.org/netdev/net-next/c/146e5e733310

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Patch

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index f6197774048b..5f6b421d06a1 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -175,6 +175,14 @@  static int change_carrier(struct net_device *dev, unsigned long new_carrier)
 static ssize_t carrier_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
+	struct net_device *netdev = to_net_dev(dev);
+
+	/* The check is also done in change_carrier; this helps returning early
+	 * without hitting the trylock/restart in netdev_store.
+	 */
+	if (!netdev->netdev_ops->ndo_change_carrier)
+		return -EOPNOTSUPP;
+
 	return netdev_store(dev, attr, buf, len, change_carrier);
 }
 
@@ -196,6 +204,12 @@  static ssize_t speed_show(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	int ret = -EINVAL;
 
+	/* The check is also done in __ethtool_get_link_ksettings; this helps
+	 * returning early without hitting the trylock/restart below.
+	 */
+	if (!netdev->ethtool_ops->get_link_ksettings)
+		return ret;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -216,6 +230,12 @@  static ssize_t duplex_show(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	int ret = -EINVAL;
 
+	/* The check is also done in __ethtool_get_link_ksettings; this helps
+	 * returning early without hitting the trylock/restart below.
+	 */
+	if (!netdev->ethtool_ops->get_link_ksettings)
+		return ret;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -468,6 +488,14 @@  static ssize_t proto_down_store(struct device *dev,
 				struct device_attribute *attr,
 				const char *buf, size_t len)
 {
+	struct net_device *netdev = to_net_dev(dev);
+
+	/* The check is also done in change_proto_down; this helps returning
+	 * early without hitting the trylock/restart in netdev_store.
+	 */
+	if (!netdev->netdev_ops->ndo_change_proto_down)
+		return -EOPNOTSUPP;
+
 	return netdev_store(dev, attr, buf, len, change_proto_down);
 }
 NETDEVICE_SHOW_RW(proto_down, fmt_dec);
@@ -478,6 +506,12 @@  static ssize_t phys_port_id_show(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	ssize_t ret = -EINVAL;
 
+	/* The check is also done in dev_get_phys_port_id; this helps returning
+	 * early without hitting the trylock/restart below.
+	 */
+	if (!netdev->netdev_ops->ndo_get_phys_port_id)
+		return -EOPNOTSUPP;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -500,6 +534,13 @@  static ssize_t phys_port_name_show(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	ssize_t ret = -EINVAL;
 
+	/* The checks are also done in dev_get_phys_port_name; this helps
+	 * returning early without hitting the trylock/restart below.
+	 */
+	if (!netdev->netdev_ops->ndo_get_phys_port_name &&
+	    !netdev->netdev_ops->ndo_get_devlink_port)
+		return -EOPNOTSUPP;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -522,6 +563,14 @@  static ssize_t phys_switch_id_show(struct device *dev,
 	struct net_device *netdev = to_net_dev(dev);
 	ssize_t ret = -EINVAL;
 
+	/* The checks are also done in dev_get_phys_port_name; this helps
+	 * returning early without hitting the trylock/restart below. This works
+	 * because recurse is false when calling dev_get_port_parent_id.
+	 */
+	if (!netdev->netdev_ops->ndo_get_port_parent_id &&
+	    !netdev->netdev_ops->ndo_get_devlink_port)
+		return -EOPNOTSUPP;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
@@ -1226,6 +1275,12 @@  static ssize_t tx_maxrate_store(struct netdev_queue *queue,
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	/* The check is also done later; this helps returning early without
+	 * hitting the trylock/restart below.
+	 */
+	if (!dev->netdev_ops->ndo_set_tx_maxrate)
+		return -EOPNOTSUPP;
+
 	err = kstrtou32(buf, 10, &rate);
 	if (err < 0)
 		return err;