diff mbox series

[net-next,v5,04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

Message ID 20240424133023.4150624-5-danieller@nvidia.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series Add ability to flash modules' firmware

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 235 insertions(+);
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 932 this patch: 932
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 944 this patch: 944
netdev/checkpatch               warning  WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-04-25--09-00 (tests: 995)

Commit Message

Danielle Ratson April 24, 2024, 1:30 p.m. UTC
Add the ability to send progress notifications to user space while a
module's firmware is being flashed, by implementing the interface
between user space and the kernel.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
---

Notes:
    v2:
    	* Increase err_msg length.

 net/ethtool/module.c    | 83 +++++++++++++++++++++++++++++++++++++++++
 net/ethtool/module_fw.h | 10 +++++
 2 files changed, 93 insertions(+)
 create mode 100644 net/ethtool/module_fw.h

Comments

Jakub Kicinski April 30, 2024, 3:11 a.m. UTC | #1
On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> +	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> +	if (!hdr)
> +		goto err_skb;

Do we want to blast it to all listeners or treat it as an async reply?
We can save the seq and portid of the original requester and use reply,
I think.

> +	ret = ethnl_fill_reply_header(skb, dev,
> +				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> +	if (ret < 0)
> +		goto err_skb;
> +
> +	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> +		goto err_skb;
> +
> +	if (status_msg &&
> +	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> +			   status_msg))
> +		goto err_skb;
> +
> +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> +		goto err_skb;
> +
> +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> +		goto err_skb;
> +
> +	genlmsg_end(skb, hdr);
> +	ethnl_multicast(skb, dev);
> +	return;
> +
> +err_skb:
> +	nlmsg_free(skb);
> +}
> +
> +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> +				   char *err_msg, char *sub_err_msg)
> +{
> +	char status_msg[120];
> +
> +	if (sub_err_msg)
> +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> +	else
> +		sprintf(status_msg, "%s.", err_msg);

Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116
is a bit surprising. But I guess you have a reason...

Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
nla_reserve() the right amount of space and sprintf() directly into the
skb?

> +	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> +				  status_msg, 0, 0);
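
For reference, the nla_put_uint() helper suggested above takes no pad attribute argument: it emits a 32-bit payload when the value fits in 32 bits and a 64-bit payload otherwise. The following is a minimal user-space sketch of that size selection, not kernel code; `struct uint_attr`, `encode_uint()` and `decode_uint()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an attribute payload: up to 8 bytes plus a length. */
struct uint_attr {
	uint8_t payload[8];
	size_t len;	/* 4 or 8 */
};

/* Store value in 4 bytes when it fits in 32 bits, otherwise in 8 bytes. */
static void encode_uint(struct uint_attr *attr, uint64_t value)
{
	assert(attr);
	if (value <= UINT32_MAX) {
		uint32_t v32 = (uint32_t)value;

		memcpy(attr->payload, &v32, sizeof(v32));
		attr->len = sizeof(v32);
	} else {
		memcpy(attr->payload, &value, sizeof(value));
		attr->len = sizeof(value);
	}
}

/* A reader picks the width from the payload length. */
static uint64_t decode_uint(const struct uint_attr *attr)
{
	uint32_t v32;
	uint64_t v64;

	assert(attr && (attr->len == 4 || attr->len == 8));
	if (attr->len == sizeof(v32)) {
		memcpy(&v32, attr->payload, sizeof(v32));
		return v32;
	}
	memcpy(&v64, attr->payload, sizeof(v64));
	return v64;
}
```

For the done and total counters in this patch, small firmware images would presumably end up with 4-byte payloads, and a reader has to accept either width.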

Danielle Ratson April 30, 2024, 6:11 p.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, 30 April 2024 6:12
> To: Danielle Ratson <danieller@nvidia.com>
> Cc: netdev@vger.kernel.org; davem@davemloft.net; edumazet@google.com;
> pabeni@redhat.com; corbet@lwn.net; linux@armlinux.org.uk;
> sdf@google.com; kory.maincent@bootlin.com;
> maxime.chevallier@bootlin.com; vladimir.oltean@nxp.com;
> przemyslaw.kitszel@intel.com; ahmed.zaki@intel.com;
> richardcochran@gmail.com; shayagr@amazon.com;
> paul.greenwalt@intel.com; jiri@resnulli.us; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; mlxsw <mlxsw@nvidia.com>; Petr Machata
> <petrm@nvidia.com>; Ido Schimmel <idosch@nvidia.com>
> Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver
> modules' firmware notifications ability
> 
> On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> > +	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> > +	if (!hdr)
> > +		goto err_skb;
> 
> Do we want to blast it to all listeners or treat it as an async reply?
> We can save the seq and portid of the original requester and use reply, I
> think.

I am sorry, I am not sure I understood what you meant here... it should be an async reply, but not sure I understood your suggestion.
Can you explain please?
Thanks!
 
> 
> > +	ret = ethnl_fill_reply_header(skb, dev,
> > +				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> > +	if (ret < 0)
> > +		goto err_skb;
> > +
> > +	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> > +		goto err_skb;
> > +
> > +	if (status_msg &&
> > +	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> > +			   status_msg))
> > +		goto err_skb;
> > +
> > +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> > +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
> 
> nla_put_uint()
> 
> > +		goto err_skb;
> > +
> > +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> > +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
> 
> nla_put_uint()
> 
> > +		goto err_skb;
> > +
> > +	genlmsg_end(skb, hdr);
> > +	ethnl_multicast(skb, dev);
> > +	return;
> > +
> > +err_skb:
> > +	nlmsg_free(skb);
> > +}
> > +
> > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > +				   char *err_msg, char *sub_err_msg)
> > +{
> > +	char status_msg[120];
> > +
> > +	if (sub_err_msg)
> > +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > +	else
> > +		sprintf(status_msg, "%s.", err_msg);
> 
> Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> surprising. But I guess you have a reason...
> 
> Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> nla_reserve() the right amount of space and sprintf() directly into the skb?

I can get rid of the dot actually, would it be ok like that?

> 
> > +	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> > +				  status_msg, 0, 0);

Jakub Kicinski April 30, 2024, 8:03 p.m. UTC | #3
On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > Do we want to blast it to all listeners or treat it as an async reply?
> > We can save the seq and portid of the original requester and use reply, I
> > think.  
> 
> I am sorry, I am not sure I understood what you meant here... it
> should be an async reply, but not sure I understood your suggestion.
> 
> Can you explain please?

Make sure you have read the netlink intro, it should help fill in some
gaps I won't explicitly cover:
https://docs.kernel.org/next/userspace-api/netlink/intro.html

"True" notifications will have pid = 0 and seq = 0, while replies to
commands have those fields populated based on the request.

pid identifies the socket where the message should be delivered.
ethnl_multicast() assumes that it's zero (since it's designed to work
for notifications) and sends the message to all sockets subscribed to 
a multicast / notification group (ETHNL_MCGRP_MONITOR).

So that's the background. What you're doing isn't incorrect but I think
it'd be better if we didn't use the multicast group here, and sent the
messages as a reply - just to the socket which requested the flashing.
Still asynchronously, we just need to save the right pid and seq to use.

Two reasons for this:
 1) convenience, the user space socket won't have to subscribe to 
    the multicast group
 2) the multicast group is really intended for notifying about
    _configuration changes_ done to the system. If there is a daemon
    listening on that group, there's a very high chance it won't care
    about progress of the flashing. Maybe we can send a single
    notification that flashing has been completed but not "progress
    updates"

I think it should work.
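
The distinction drawn above between "true" notifications and asynchronous replies comes down to two header fields. A minimal user-space sketch follows, assuming a stand-in header type rather than the real netlink structures; `struct msg_hdr`, `struct requester` and the three helpers are hypothetical names, and nothing here is the kernel's actual send path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two netlink header fields discussed above. */
struct msg_hdr {
	uint32_t seq;
	uint32_t pid;	/* port ID of the destination socket, 0 for notifications */
};

/* What a flash job would have to remember about the original request. */
struct requester {
	uint32_t seq;
	uint32_t pid;
};

/* "True" notification: both fields are zero. */
static struct msg_hdr make_notification(void)
{
	struct msg_hdr hdr = { .seq = 0, .pid = 0 };

	return hdr;
}

/* Asynchronous reply: reuse the seq and pid saved from the request. */
static struct msg_hdr make_async_reply(const struct requester *req)
{
	struct msg_hdr hdr;

	assert(req);
	hdr.seq = req->seq;
	hdr.pid = req->pid;
	return hdr;
}

/* A receiver can tell the two apart from the header alone. */
static bool is_notification(const struct msg_hdr *hdr)
{
	assert(hdr);
	return hdr->seq == 0 && hdr->pid == 0;
}
```

In the patch as posted, every message takes the first form; the suggestion is to save the requester's seq and pid when the flash command arrives and use the second form for progress updates.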

> > > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > > +				   char *err_msg, char *sub_err_msg)
> > > +{
> > > +	char status_msg[120];
> > > +
> > > +	if (sub_err_msg)
> > > +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > > +	else
> > > +		sprintf(status_msg, "%s.", err_msg);  
> > 
> > Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> > surprising. But I guess you have a reason...
> > 
> > Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> > nla_reserve() the right amount of space and sprintf() directly into the skb?  
> 
> I can get rid of the dot actually, would it be ok like that?

It'd still be better to splice the two strings and the comma directly
to the skb, rather than on the stack using a function which doesn't
check the bounds of the buffer :S
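
The bounds concern can be illustrated outside the kernel. The sketch below is a user-space approximation only: it joins the two strings with a comma using snprintf() and reports truncation, and it drops the trailing dot as offered in the thread. It does not show the nla_reserve() approach of writing directly into the skb, and `build_status_msg()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Join err_msg and the optional sub_err_msg with ", " into buf.
 * Returns true when the whole message fit, false when it was truncated.
 * buf is always NUL-terminated (len must be at least 1).
 */
static bool build_status_msg(char *buf, size_t len,
			     const char *err_msg, const char *sub_err_msg)
{
	int n;

	assert(buf && len > 0 && err_msg);
	if (sub_err_msg)
		n = snprintf(buf, len, "%s, %s", err_msg, sub_err_msg);
	else
		n = snprintf(buf, len, "%s", err_msg);

	/* snprintf() returns the length it wanted to write. */
	return n >= 0 && (size_t)n < len;
}
```

Unlike sprintf() into a fixed 120-byte array, an over-long pair of strings is cut off at the buffer size instead of overrunning the stack buffer, and the caller can see that it happened.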

Ido Schimmel May 1, 2024, 7:53 a.m. UTC | #4
On Tue, Apr 30, 2024 at 01:03:02PM -0700, Jakub Kicinski wrote:
> On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > > Do we want to blast it to all listeners or treat it as an async reply?
> > > We can save the seq and portid of the original requester and use reply, I
> > > think.  
> > 
> > I am sorry, I am not sure I understood what you meant here... it
> > should be an async reply, but not sure I understood your suggestion.
> > 
> > Can you explain please?
> 
> Make sure you have read the netlink intro, it should help fill in some
> gaps I won't explicitly cover:
> https://docs.kernel.org/next/userspace-api/netlink/intro.html
> 
> "True" notifications will have pid = 0 and seq = 0, while replies to
> commands have those fields populated based on the request.
> 
> pid identifies the socket where the message should be delivered.
> ethnl_multicast() assumes that it's zero (since it's designed to work
> for notifications) and sends the message to all sockets subscribed to 
> a multicast / notification group (ETHNL_MCGRP_MONITOR).
> 
> So that's the background. What you're doing isn't incorrect but I think
> it'd be better if we didn't use the multicast group here, and sent the
> messages as a reply - just to the socket which requested the flashing.
> Still asynchronously, we just need to save the right pid and seq to use.
> 
> Two reasons for this:
>  1) convenience, the user space socket won't have to subscribe to 
>     the multicast group
>  2) the multicast group is really intended for notifying about
>     _configuration changes_ done to the system. If there is a daemon
>     listening on that group, there's a very high chance it won't care
>     about progress of the flashing. Maybe we can send a single
>     notification that flashing has been completed but not "progress
>     updates"
> 
> I think it should work.

We can try to use unicast, but the current design is influenced by
devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
cable testing (see ethnl_cable_test_started() and
ethnl_cable_test_finished()), both of which use multicast notifications
although the latter does not update about progress.

Do you want us to try the unicast approach or be consistent with the
above examples?

Jakub Kicinski May 1, 2024, 2:37 p.m. UTC | #5
On Wed, 1 May 2024 10:53:48 +0300 Ido Schimmel wrote:
> We can try to use unicast, but the current design is influenced by
> devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
> cable testing (see ethnl_cable_test_started() and
> ethnl_cable_test_finished()), both of which use multicast notifications
> although the latter does not update about progress.
> 
> Do you want us to try the unicast approach or be consistent with the
> above examples?

We are charting a bit of new territory here, you're right that
the precedents point in the direction of multicast.
Unicast is harder to get done on the kernel side (we should
probably also check that the socket pid didn't get reused, and stop
sending the notifications when the original socket gets closed?).
It will require using pretty much all the pieces of advanced
netlink infra we have. I'm happy to explain more, but I'll also
understand if you prefer to stick to multicast.

Patch

diff --git a/net/ethtool/module.c b/net/ethtool/module.c
index ceb575efc290..114a2ec986fe 100644
--- a/net/ethtool/module.c
+++ b/net/ethtool/module.c
@@ -5,6 +5,7 @@ 
 #include "netlink.h"
 #include "common.h"
 #include "bitset.h"
+#include "module_fw.h"
 
 struct module_req_info {
 	struct ethnl_req_info base;
@@ -158,3 +159,85 @@  const struct ethnl_request_ops ethnl_module_request_ops = {
 	.set			= ethnl_set_module,
 	.set_ntf_cmd		= ETHTOOL_MSG_MODULE_NTF,
 };
+
+/* MODULE_FW_FLASH_NTF */
+
+static void
+ethnl_module_fw_flash_ntf(struct net_device *dev,
+			  enum ethtool_module_fw_flash_status status,
+			  const char *status_msg, u64 done, u64 total)
+{
+	struct sk_buff *skb;
+	void *hdr;
+	int ret;
+
+	skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return;
+
+	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
+	if (!hdr)
+		goto err_skb;
+
+	ret = ethnl_fill_reply_header(skb, dev,
+				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
+	if (ret < 0)
+		goto err_skb;
+
+	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
+		goto err_skb;
+
+	if (status_msg &&
+	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
+			   status_msg))
+		goto err_skb;
+
+	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
+			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
+		goto err_skb;
+
+	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
+			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
+		goto err_skb;
+
+	genlmsg_end(skb, hdr);
+	ethnl_multicast(skb, dev);
+	return;
+
+err_skb:
+	nlmsg_free(skb);
+}
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+				   char *err_msg, char *sub_err_msg)
+{
+	char status_msg[120];
+
+	if (sub_err_msg)
+		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
+	else
+		sprintf(status_msg, "%s.", err_msg);
+
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
+				  status_msg, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev)
+{
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_STARTED,
+				  NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev)
+{
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_COMPLETED,
+				  NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+					   u64 total)
+{
+	ethnl_module_fw_flash_ntf(dev,
+				  ETHTOOL_MODULE_FW_FLASH_STATUS_IN_PROGRESS,
+				  NULL, done, total);
+}
diff --git a/net/ethtool/module_fw.h b/net/ethtool/module_fw.h
new file mode 100644
index 000000000000..e40eae442741
--- /dev/null
+++ b/net/ethtool/module_fw.h
@@ -0,0 +1,10 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <uapi/linux/ethtool.h>
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+				   char *err_msg, char *sub_err_msg);
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+					   u64 total);
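
On the receiving side, a user-space consumer would presumably turn the done and total values into a percentage. This is a minimal sketch, assuming both arrive as 64-bit values and that the start, completion and error notifications carry 0 for both, as in the helpers above; `flash_progress_percent()` is a hypothetical name and not part of ethtool.

```c
#include <stdint.h>

/* Percentage 0..100 from done/total; -1 when total is 0 (no progress info). */
static int flash_progress_percent(uint64_t done, uint64_t total)
{
	if (total == 0)
		return -1;
	if (done >= total)
		return 100;
	/* Avoid overflowing done * 100 for very large values. */
	if (done > UINT64_MAX / 100)
		return (int)(done / (total / 100));
	return (int)(done * 100 / total);
}
```

The explicit check for a zero total matters because only the in-progress notification passes real counters; dividing blindly would fault on the other three message types.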