diff mbox series

[net] net: core: synchronize link-watch when carrier is queried

Message ID 20231204214706.303c62768415.I1caedccae72ee5a45c9085c5eb49c145ce1c0dd5@changeid (mailing list archive)
State Accepted
Commit facd15dfd69122042502d99ab8c9f888b48ee994
Delegated to: Netdev Maintainers
Headers show
Series [net] net: core: synchronize link-watch when carrier is queried | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5287 this patch: 5287
netdev/cc_maintainers warning 12 maintainers not CCed: pabeni@redhat.com horms@kernel.org kuba@kernel.org d-tatianin@yandex-team.ru edumazet@google.com daniel@iogearbox.net andrew@lunn.ch idosch@nvidia.com lucien.xin@gmail.com edwin.peer@broadcom.com jdamato@fastly.com razor@blackwall.org
netdev/build_clang success Errors and warnings before: 1366 this patch: 1366
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5619 this patch: 5619
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 76 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Johannes Berg Dec. 4, 2023, 8:47 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

There are multiple ways to query for the carrier state: through
rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
work before these operations so that we don't have a situation
where userspace queries the carrier state between the driver's
carrier off->on transition and linkwatch running and expects it
to work, when really (at least) TX cannot work until linkwatch
has run.

I previously posted a longer explanation of how this applies to
wireless [1] but with this wireless can simply query the state
before sending data, to ensure the kernel is ready for it.

[1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 include/linux/netdevice.h | 9 +++++++++
 net/core/dev.c            | 2 +-
 net/core/dev.h            | 1 -
 net/core/link_watch.c     | 2 +-
 net/core/net-sysfs.c      | 8 +++++++-
 net/core/rtnetlink.c      | 8 ++++++++
 net/ethtool/ioctl.c       | 3 +++
 7 files changed, 29 insertions(+), 4 deletions(-)

Comments

Jiri Pirko Dec. 5, 2023, 8:32 a.m. UTC | #1
Mon, Dec 04, 2023 at 09:47:07PM CET, johannes@sipsolutions.net wrote:
>From: Johannes Berg <johannes.berg@intel.com>
>
>There are multiple ways to query for the carrier state: through
>rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
>work before these operations so that we don't have a situation
>where userspace queries the carrier state between the driver's
>carrier off->on transition and linkwatch running and expects it
>to work, when really (at least) TX cannot work until linkwatch
>has run.
>
>I previously posted a longer explanation of how this applies to
>wireless [1] but with this wireless can simply query the state
>before sending data, to ensure the kernel is ready for it.
>
>[1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/
>
>Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>---
> include/linux/netdevice.h | 9 +++++++++
> net/core/dev.c            | 2 +-
> net/core/dev.h            | 1 -
> net/core/link_watch.c     | 2 +-
> net/core/net-sysfs.c      | 8 +++++++-
> net/core/rtnetlink.c      | 8 ++++++++
> net/ethtool/ioctl.c       | 3 +++
> 7 files changed, 29 insertions(+), 4 deletions(-)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 2564e209465e..17dbaf379c69 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -4195,6 +4195,15 @@ static inline void netdev_ref_replace(struct net_device *odev,
>  */
> void linkwatch_fire_event(struct net_device *dev);
> 
>+/**
>+ * linkwatch_sync_dev - sync linkwatch for the given device
>+ * @dev: network device to sync linkwatch for
>+ *
>+ * Sync linkwatch for the given device, removing it from the
>+ * pending work list (if queued).
>+ */
>+void linkwatch_sync_dev(struct net_device *dev);
>+
> /**
>  *	netif_carrier_ok - test if carrier present
>  *	@dev: network device
>diff --git a/net/core/dev.c b/net/core/dev.c
>index c879246be48d..188799b2c6a5 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -10511,7 +10511,7 @@ void netdev_run_todo(void)
> 		write_lock(&dev_base_lock);
> 		dev->reg_state = NETREG_UNREGISTERED;
> 		write_unlock(&dev_base_lock);
>-		linkwatch_forget_dev(dev);
>+		linkwatch_sync_dev(dev);
> 	}
> 
> 	while (!list_empty(&list)) {
>diff --git a/net/core/dev.h b/net/core/dev.h
>index 5aa45f0fd4ae..cb06fe5e38ea 100644
>--- a/net/core/dev.h
>+++ b/net/core/dev.h
>@@ -30,7 +30,6 @@ int __init dev_proc_init(void);
> #endif
> 
> void linkwatch_init_dev(struct net_device *dev);
>-void linkwatch_forget_dev(struct net_device *dev);
> void linkwatch_run_queue(void);
> 
> void dev_addr_flush(struct net_device *dev);
>diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>index ed3e5391fa79..7be5b3ab32bd 100644
>--- a/net/core/link_watch.c
>+++ b/net/core/link_watch.c
>@@ -240,7 +240,7 @@ static void __linkwatch_run_queue(int urgent_only)
> 	spin_unlock_irq(&lweventlist_lock);
> }
> 
>-void linkwatch_forget_dev(struct net_device *dev)
>+void linkwatch_sync_dev(struct net_device *dev)
> {
> 	unsigned long flags;
> 	int clean = 0;
>diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>index fccaa5bac0ed..d9b33e923b18 100644
>--- a/net/core/net-sysfs.c
>+++ b/net/core/net-sysfs.c
>@@ -194,8 +194,14 @@ static ssize_t carrier_show(struct device *dev,
> {
> 	struct net_device *netdev = to_net_dev(dev);
> 
>-	if (netif_running(netdev))
>+	if (netif_running(netdev)) {
>+		/* Synchronize carrier state with link watch,
>+		 * see also rtnl_getlink().
>+		 */
>+		linkwatch_sync_dev(netdev);
>+
> 		return sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev));
>+	}
> 
> 	return -EINVAL;
> }
>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>index e8431c6c8490..613268d7c491 100644
>--- a/net/core/rtnetlink.c
>+++ b/net/core/rtnetlink.c
>@@ -3853,6 +3853,14 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> 	if (nskb == NULL)
> 		goto out;
> 
>+	/* Synchronize the carrier state so we don't report a state
>+	 * that we're not actually going to honour immediately; if
>+	 * the driver just did a carrier off->on transition, we can
>+	 * only TX if link watch work has run, but without this we'd
>+	 * already report carrier on, even if it doesn't work yet.
>+	 */

This comment is a bit harder to understand for me, but I eventually did
get it :)

Patch looks fine to me.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>



>+	linkwatch_sync_dev(dev);
>+
> 	err = rtnl_fill_ifinfo(nskb, dev, net,
> 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
> 			       nlh->nlmsg_seq, 0, 0, ext_filter_mask,
>diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
>index 0b0ce4f81c01..a977f8903467 100644
>--- a/net/ethtool/ioctl.c
>+++ b/net/ethtool/ioctl.c
>@@ -58,6 +58,9 @@ static struct devlink *netdev_to_devlink_get(struct net_device *dev)
> 
> u32 ethtool_op_get_link(struct net_device *dev)
> {
>+	/* Synchronize carrier state with link watch, see also rtnl_getlink() */
>+	linkwatch_sync_dev(dev);
>+
> 	return netif_carrier_ok(dev) ? 1 : 0;
> }
> EXPORT_SYMBOL(ethtool_op_get_link);
>-- 
>2.43.0
>
>
Johannes Berg Dec. 5, 2023, 10:28 a.m. UTC | #2
On Tue, 2023-12-05 at 09:32 +0100, Jiri Pirko wrote:
> > 
> > +	/* Synchronize the carrier state so we don't report a state
> > +	 * that we're not actually going to honour immediately; if
> > +	 * the driver just did a carrier off->on transition, we can
> > +	 * only TX if link watch work has run, but without this we'd
> > +	 * already report carrier on, even if it doesn't work yet.
> > +	 */
> 
> This comment is a bit harder to understand for me, but I eventually did
> get it :)

Do you want to propose different wording with your understanding? :)

> Patch looks fine to me.

Thanks :)

johannes
Jiri Pirko Dec. 5, 2023, 11:21 a.m. UTC | #3
Tue, Dec 05, 2023 at 11:28:32AM CET, johannes@sipsolutions.net wrote:
>On Tue, 2023-12-05 at 09:32 +0100, Jiri Pirko wrote:
>> > 
>> > +	/* Synchronize the carrier state so we don't report a state
>> > +	 * that we're not actually going to honour immediately; if
>> > +	 * the driver just did a carrier off->on transition, we can
>> > +	 * only TX if link watch work has run, but without this we'd
>> > +	 * already report carrier on, even if it doesn't work yet.
>> > +	 */
>> 
>> This comment is a bit harder to understand for me, but I eventually did
>> get it :)
>
>Do you want to propose different wording with your understanding? :)

Would not be better I'm afraid :)

>
>> Patch looks fine to me.
>
>Thanks :)
>
>johannes
Jakub Kicinski Dec. 6, 2023, 12:11 a.m. UTC | #4
On Mon,  4 Dec 2023 21:47:07 +0100 Johannes Berg wrote:
> There are multiple ways to query for the carrier state: through
> rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
> work before these operations so that we don't have a situation
> where userspace queries the carrier state between the driver's
> carrier off->on transition and linkwatch running and expects it
> to work, when really (at least) TX cannot work until linkwatch
> has run.
> 
> I previously posted a longer explanation of how this applies to
> wireless [1] but with this wireless can simply query the state
> before sending data, to ensure the kernel is ready for it.

Are you okay with net-next?
The previous behavior is, herm... well established.
Johannes Berg Dec. 6, 2023, 12:12 a.m. UTC | #5
On Tue, 2023-12-05 at 16:11 -0800, Jakub Kicinski wrote:
> On Mon,  4 Dec 2023 21:47:07 +0100 Johannes Berg wrote:
> > There are multiple ways to query for the carrier state: through
> > rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
> > work before these operations so that we don't have a situation
> > where userspace queries the carrier state between the driver's
> > carrier off->on transition and linkwatch running and expects it
> > to work, when really (at least) TX cannot work until linkwatch
> > has run.
> > 
> > I previously posted a longer explanation of how this applies to
> > wireless [1] but with this wireless can simply query the state
> > before sending data, to ensure the kernel is ready for it.
> 
> Are you okay with net-next?
> The previous behavior is, herm... well established.

Err, yeah, sure! Not sure I can tell you now how I ended up with net
there.

johannes
patchwork-bot+netdevbpf@kernel.org Dec. 6, 2023, 5 a.m. UTC | #6
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  4 Dec 2023 21:47:07 +0100 you wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> There are multiple ways to query for the carrier state: through
> rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch
> work before these operations so that we don't have a situation
> where userspace queries the carrier state between the driver's
> carrier off->on transition and linkwatch running and expects it
> to work, when really (at least) TX cannot work until linkwatch
> has run.
> 
> [...]

Here is the summary with links:
  - [net] net: core: synchronize link-watch when carrier is queried
    https://git.kernel.org/netdev/net-next/c/facd15dfd691

You are awesome, thank you!
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2564e209465e..17dbaf379c69 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4195,6 +4195,15 @@  static inline void netdev_ref_replace(struct net_device *odev,
  */
 void linkwatch_fire_event(struct net_device *dev);
 
+/**
+ * linkwatch_sync_dev - sync linkwatch for the given device
+ * @dev: network device to sync linkwatch for
+ *
+ * Sync linkwatch for the given device, removing it from the
+ * pending work list (if queued).
+ */
+void linkwatch_sync_dev(struct net_device *dev);
+
 /**
  *	netif_carrier_ok - test if carrier present
  *	@dev: network device
diff --git a/net/core/dev.c b/net/core/dev.c
index c879246be48d..188799b2c6a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10511,7 +10511,7 @@  void netdev_run_todo(void)
 		write_lock(&dev_base_lock);
 		dev->reg_state = NETREG_UNREGISTERED;
 		write_unlock(&dev_base_lock);
-		linkwatch_forget_dev(dev);
+		linkwatch_sync_dev(dev);
 	}
 
 	while (!list_empty(&list)) {
diff --git a/net/core/dev.h b/net/core/dev.h
index 5aa45f0fd4ae..cb06fe5e38ea 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -30,7 +30,6 @@  int __init dev_proc_init(void);
 #endif
 
 void linkwatch_init_dev(struct net_device *dev);
-void linkwatch_forget_dev(struct net_device *dev);
 void linkwatch_run_queue(void);
 
 void dev_addr_flush(struct net_device *dev);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index ed3e5391fa79..7be5b3ab32bd 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,7 @@  static void __linkwatch_run_queue(int urgent_only)
 	spin_unlock_irq(&lweventlist_lock);
 }
 
-void linkwatch_forget_dev(struct net_device *dev)
+void linkwatch_sync_dev(struct net_device *dev)
 {
 	unsigned long flags;
 	int clean = 0;
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index fccaa5bac0ed..d9b33e923b18 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -194,8 +194,14 @@  static ssize_t carrier_show(struct device *dev,
 {
 	struct net_device *netdev = to_net_dev(dev);
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* Synchronize carrier state with link watch,
+		 * see also rtnl_getlink().
+		 */
+		linkwatch_sync_dev(netdev);
+
 		return sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev));
+	}
 
 	return -EINVAL;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e8431c6c8490..613268d7c491 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3853,6 +3853,14 @@  static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (nskb == NULL)
 		goto out;
 
+	/* Synchronize the carrier state so we don't report a state
+	 * that we're not actually going to honour immediately; if
+	 * the driver just did a carrier off->on transition, we can
+	 * only TX if link watch work has run, but without this we'd
+	 * already report carrier on, even if it doesn't work yet.
+	 */
+	linkwatch_sync_dev(dev);
+
 	err = rtnl_fill_ifinfo(nskb, dev, net,
 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
 			       nlh->nlmsg_seq, 0, 0, ext_filter_mask,
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 0b0ce4f81c01..a977f8903467 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -58,6 +58,9 @@  static struct devlink *netdev_to_devlink_get(struct net_device *dev)
 
 u32 ethtool_op_get_link(struct net_device *dev)
 {
+	/* Synchronize carrier state with link watch, see also rtnl_getlink() */
+	linkwatch_sync_dev(dev);
+
 	return netif_carrier_ok(dev) ? 1 : 0;
 }
 EXPORT_SYMBOL(ethtool_op_get_link);