diff mbox series

[net-next,v5,04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

Message ID 20240424133023.4150624-5-danieller@nvidia.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series Add ability to flash modules' firmware

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 235 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 932 this patch: 932
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 944 this patch: 944
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-25--09-00 (tests: 995)

Commit Message

Danielle Ratson April 24, 2024, 1:30 p.m. UTC
Add the ability to send progress notifications to user space while
flashing a module's firmware, by implementing the interface between
user space and the kernel.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
---

Notes:
    v2:
    	* Increase err_msg length.

 net/ethtool/module.c    | 83 +++++++++++++++++++++++++++++++++++++++++
 net/ethtool/module_fw.h | 10 +++++
 2 files changed, 93 insertions(+)
 create mode 100644 net/ethtool/module_fw.h

Comments

Jakub Kicinski April 30, 2024, 3:11 a.m. UTC | #1
On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> +	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> +	if (!hdr)
> +		goto err_skb;

Do we want to blast it to all listeners or treat it as an async reply?
We can save the seq and portid of the original requester and use reply,
I think.

> +	ret = ethnl_fill_reply_header(skb, dev,
> +				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> +	if (ret < 0)
> +		goto err_skb;
> +
> +	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> +		goto err_skb;
> +
> +	if (status_msg &&
> +	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> +			   status_msg))
> +		goto err_skb;
> +
> +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> +		goto err_skb;
> +
> +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> +		goto err_skb;
> +
> +	genlmsg_end(skb, hdr);
> +	ethnl_multicast(skb, dev);
> +	return;
> +
> +err_skb:
> +	nlmsg_free(skb);
> +}
> +
> +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> +				   char *err_msg, char *sub_err_msg)
> +{
> +	char status_msg[120];
> +
> +	if (sub_err_msg)
> +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> +	else
> +		sprintf(status_msg, "%s.", err_msg);

Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116
is a bit surprising. But I guess you have a reason...

Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
nla_reserve() the right amount of space and sprintf() directly into the
skb?

> +	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> +				  status_msg, 0, 0);
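
On the nla_put_uint() remarks above: that helper writes a variable-width "uint" attribute, using a 4-byte payload when the value fits in 32 bits and an 8-byte payload otherwise, and it takes no padding attribute argument. The following is a minimal userspace sketch of that width-selection rule only. `uint_attr_payload_len()` and `uint_attr_example()` are hypothetical names introduced for illustration; they are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): payload size a variable-width
 * "uint" attribute would use for a given value. */
static size_t uint_attr_payload_len(uint64_t value)
{
	uint32_t low = (uint32_t)value;

	/* The value round-trips through 32 bits only if it fits in 32 bits. */
	return (value == low) ? sizeof(uint32_t) : sizeof(uint64_t);
}

/* Usage: small progress counters stay at 4 bytes, large ones grow to 8. */
static void uint_attr_example(void)
{
	assert(uint_attr_payload_len(4096) == 4);
	assert(uint_attr_payload_len((uint64_t)1 << 40) == 8);
}
```

For the 'done' and 'total' counters in this patch, that would mean typical firmware image sizes are carried in 4 bytes, assuming they stay below 4 GiB.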
Danielle Ratson April 30, 2024, 6:11 p.m. UTC | #2
> On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> > +	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> > +	if (!hdr)
> > +		goto err_skb;
> 
> Do we want to blast it to all listeners or treat it as an async reply?
> We can save the seq and portid of the original requester and use reply, I
> think.

I am sorry, I am not sure I understood what you meant here... it should be an async reply, but not sure I understood your suggestion.
Can you explain please?
Thanks!
 
> 
> > +	ret = ethnl_fill_reply_header(skb, dev,
> > +				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> > +	if (ret < 0)
> > +		goto err_skb;
> > +
> > +	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> > +		goto err_skb;
> > +
> > +	if (status_msg &&
> > +	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> > +			   status_msg))
> > +		goto err_skb;
> > +
> > +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> > +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
> 
> nla_put_uint()
> 
> > +		goto err_skb;
> > +
> > +	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> > +			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
> 
> nla_put_uint()
> 
> > +		goto err_skb;
> > +
> > +	genlmsg_end(skb, hdr);
> > +	ethnl_multicast(skb, dev);
> > +	return;
> > +
> > +err_skb:
> > +	nlmsg_free(skb);
> > +}
> > +
> > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > +				   char *err_msg, char *sub_err_msg)
> > +{
> > +	char status_msg[120];
> > +
> > +	if (sub_err_msg)
> > +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > +	else
> > +		sprintf(status_msg, "%s.", err_msg);
> 
> Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> surprising. But I guess you have a reason...
> 
> Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> nla_reserve() the right amount of space and sprintf() directly into the skb?

I can get rid of the dot actually, would it be ok like that?

> 
> > +	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> > +				  status_msg, 0, 0);
Jakub Kicinski April 30, 2024, 8:03 p.m. UTC | #3
On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > Do we want to blast it to all listeners or treat it as an async reply?
> > We can save the seq and portid of the original requester and use reply, I
> > think.  
> 
> I am sorry, I am not sure I understood what you meant here... it
> should be an async reply, but not sure I understood your suggestion.
> 
> Can you explain please?

Make sure you have read the netlink intro, it should help fill in some
gaps I won't explicitly cover:
https://docs.kernel.org/next/userspace-api/netlink/intro.html

"True" notifications will have pid = 0 and seq = 0, while replies to
commands have those fields populated based on the request.

pid identifies the socket where the message should be delivered.
ethnl_multicast() assumes that it's zero (since it's designed to work
for notifications) and sends the message to all sockets subscribed to 
a multicast / notification group (ETHNL_MCGRP_MONITOR).

So that's the background. What you're doing isn't incorrect but I think
it'd be better if we didn't use the multicast group here, and sent the
messages as a reply - just to the socket which requested the flashing.
Still asynchronously, we just need to save the right pid and seq to use.

Two reasons for this:
 1) convenience, the user space socket won't have to subscribe to 
    the multicast group
 2) the multicast group is really intended for notifying about
    _configuration changes_ done to the system. If there is a daemon
    listening on that group, there's a very high chance it won't care
    about progress of the flashing. Maybe we can send a single
    notification that flashing has been completed but not "progress
    updates"

I think it should work.

> > > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > > +				   char *err_msg, char *sub_err_msg)
> > > +{
> > > +	char status_msg[120];
> > > +
> > > +	if (sub_err_msg)
> > > +		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > > +	else
> > > +		sprintf(status_msg, "%s.", err_msg);  
> > 
> > Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> > surprising. But I guess you have a reason...
> > 
> > Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> > nla_reserve() the right amount of space and sprintf() directly into the skb?  
> 
> I can get rid of the dot actually, would it be ok like that?

It'd still be better to splice the two strings and the comma directly
to the skb, rather than on the stack using a function which doesn't
check the bounds of the buffer :S
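
As a rough userspace analogue of the suggestion above (size the destination from the two strings, then format straight into it with a bounded call), consider the sketch below. It is an illustration under stated assumptions, not the kernel implementation: `splice_status_msg()` is a hypothetical helper, and malloc() stands in for the space that would be reserved in the skb.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap buffer holding "err" or
 * "err, sub_err". The caller frees it. Returns NULL on allocation
 * failure. */
static char *splice_status_msg(const char *err, const char *sub_err)
{
	size_t len;
	char *buf;

	assert(err != NULL);

	/* Exact length: first string, plus ", " and the second if present. */
	len = strlen(err);
	if (sub_err)
		len += strlen(", ") + strlen(sub_err);

	buf = malloc(len + 1); /* +1 for the terminating NUL */
	if (!buf)
		return NULL;

	/* snprintf() never writes more than len + 1 bytes here. */
	if (sub_err)
		snprintf(buf, len + 1, "%s, %s", err, sub_err);
	else
		snprintf(buf, len + 1, "%s", err);

	return buf;
}
```

Because the buffer is sized from the inputs, there is no fixed 120-byte limit to exceed, and the bounded formatting call cannot overrun it even if the length computation were wrong.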
Ido Schimmel May 1, 2024, 7:53 a.m. UTC | #4
On Tue, Apr 30, 2024 at 01:03:02PM -0700, Jakub Kicinski wrote:
> On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > > Do we want to blast it to all listeners or treat it as an async reply?
> > > We can save the seq and portid of the original requester and use reply, I
> > > think.  
> > 
> > I am sorry, I am not sure I understood what you meant here... it
> > should be an async reply, but not sure I understood your suggestion.
> > 
> > Can you explain please?
> 
> Make sure you have read the netlink intro, it should help fill in some
> gaps I won't explicitly cover:
> https://docs.kernel.org/next/userspace-api/netlink/intro.html
> 
> "True" notifications will have pid = 0 and seq = 0, while replies to
> commands have those fields populated based on the request.
> 
> pid identifies the socket where the message should be delivered.
> ethnl_multicast() assumes that it's zero (since it's designed to work
> for notifications) and sends the message to all sockets subscribed to 
> a multicast / notification group (ETHNL_MCGRP_MONITOR).
> 
> So that's the background. What you're doing isn't incorrect but I think
> it'd be better if we didn't use the multicast group here, and sent the
> messages as a reply - just to the socket which requested the flashing.
> Still asynchronously, we just need to save the right pid and seq to use.
> 
> Two reasons for this:
>  1) convenience, the user space socket won't have to subscribe to 
>     the multicast group
>  2) the multicast group is really intended for notifying about
>     _configuration changes_ done to the system. If there is a daemon
>     listening on that group, there's a very high chance it won't care
>     about progress of the flashing. Maybe we can send a single
>     notification that flashing has been completed but not "progress
>     updates"
> 
> I think it should work.

We can try to use unicast, but the current design is influenced by
devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
cable testing (see ethnl_cable_test_started() and
ethnl_cable_test_finished()), both of which use multicast notifications
although the latter does not update about progress.

Do you want us to try the unicast approach or be consistent with the
above examples?
Jakub Kicinski May 1, 2024, 2:37 p.m. UTC | #5
On Wed, 1 May 2024 10:53:48 +0300 Ido Schimmel wrote:
> We can try to use unicast, but the current design is influenced by
> devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
> cable testing (see ethnl_cable_test_started() and
> ethnl_cable_test_finished()), both of which use multicast notifications
> although the latter does not update about progress.
> 
> Do you want us to try the unicast approach or be consistent with the
> above examples?

We are charting a bit of a new territory here, you're right that 
the precedents point in the direction of multicast.
The unicast is harder to get done on the kernel side (we should
probably also check that the socket pid didn't get reused, stop
sending the notifications when original socket gets closed?)
It will require using pretty much all the pieces of advanced
netlink infra we have, I'm happy to explain more, but I'll also
understand if you prefer to stick to multicast.
Danielle Ratson May 22, 2024, 1:08 p.m. UTC | #6
> On Wed, 1 May 2024 10:53:48 +0300 Ido Schimmel wrote:
> > We can try to use unicast, but the current design is influenced by
> > devlink firmware flash (see __devlink_flash_update_notify()) and
> > ethtool cable testing (see ethnl_cable_test_started() and
> > ethnl_cable_test_finished()), both of which use multicast
> > notifications although the latter does not update about progress.
> >
> > Do you want us to try the unicast approach or be consistent with the
> > above examples?
> 
> We are charting a bit of a new territory here, you're right that the precedents
> point in the direction of multicast.
> The unicast is harder to get done on the kernel side (we should probably also
> check that the socket pid didn't get reused, stop sending the notifications
> when original socket gets closed?) It will require using pretty much all the
> pieces of advanced netlink infra we have, I'm happy to explain more, but I'll
> also understand if you prefer to stick to multicast.

Hi Jakub,

Following our discussion, I wanted to see if you are ok with the idea below:

1. Add a new unicast function to netlink.c:
void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)

2. Use it, together with genlmsg_unicast(), in the notification function instead of the multicast call used previously.
'portid' and 'seq', taken from struct genl_info, are added to struct ethtool_module_fw_flash, which is accessible from the work item.

3. Create a global list whose nodes are of type struct ethtool_module_fw_flash, and add the list node as a field in struct ethtool_module_fw_flash.
Before scheduling a work item, a new node is added to the list.

4. Add a new netlink notifier that, when the relevant event takes place, deletes the node from the list, waits for the work item to finish with cancel_work_sync(), and frees the allocations.

Thanks,
Danielle
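
Steps 3 and 4 of the proposal above amount to a tracking list keyed by the requester's port ID. The sketch below models only that bookkeeping in userspace C. All names (`struct flash_entry`, `flash_track()`, `flash_untrack_portid()`, `flash_count()`) are hypothetical, a hand-rolled singly linked list stands in for the kernel list helpers, and locking and the wait for the work item are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-flash record; the real structure carries more state. */
struct flash_entry {
	struct flash_entry *next;
	int ifindex;     /* device being flashed */
	uint32_t portid; /* socket that requested the flash */
	uint32_t seq;    /* sequence number of the request */
};

static struct flash_entry *flash_list; /* global list of in-flight flashes */

/* Step 3: add a node before the work item is scheduled. */
static struct flash_entry *flash_track(int ifindex, uint32_t portid,
				       uint32_t seq)
{
	struct flash_entry *e;

	assert(ifindex > 0);
	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->ifindex = ifindex;
	e->portid = portid;
	e->seq = seq;
	e->next = flash_list;
	flash_list = e;
	return e;
}

/* Step 4: on a socket-release event, drop every node owned by that
 * port ID. Returns the number of nodes removed. */
static int flash_untrack_portid(uint32_t portid)
{
	struct flash_entry **pp = &flash_list;
	int removed = 0;

	while (*pp) {
		struct flash_entry *e = *pp;

		if (e->portid == portid) {
			*pp = e->next; /* unlink, then free */
			free(e);
			removed++;
		} else {
			pp = &e->next;
		}
	}
	return removed;
}

static int flash_count(void)
{
	int n = 0;

	for (struct flash_entry *e = flash_list; e; e = e->next)
		n++;
	return n;
}
```

One socket may have several flashes in flight on different devices, which is why the release handler walks the whole list rather than stopping at the first match.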
Jakub Kicinski May 22, 2024, 1:45 p.m. UTC | #7
On Wed, 22 May 2024 13:08:43 +0000 Danielle Ratson wrote:
> 1. Add a new unicast function to netlink.c:
> void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)
> 
> 2. Use it in the notification function instead of the multicast previously used along with genlmsg_unicast().
> 'portid' and 'seq' taken from genl_info(), are added to the struct ethtool_module_fw_flash, which is accessible from the work item.
> 
> 3. Create a global list that holds nodes from type struct ethtool_module_fw_flash() and add it as a field in the struct ethtool_module_fw_flash.
> Before scheduling a work, a new node is added to the list.

Makes sense.

> 4. Add a new netlink notifier that when the relevant event takes place, deletes the node from the list, wait until the end of the work item, with cancel_work_sync() and free allocations.

What's the "relevant event" in this case? Closing of the socket that
user had issued the command on?

Easiest way to "notice" the socket got closed would probably be to
add some info to genl_sk_priv_*(). ->sock_priv_destroy() will get
called. But you can also get a close notification in the family 
->unbind callback.

I'm on the fence whether we should cancel the work. We could just
mark the command as 'no socket present' and stop sending notifications.
Not sure which is better..
Danielle Ratson May 22, 2024, 1:56 p.m. UTC | #8
> > 1. Add a new unicast function to netlink.c:
> > void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)
> >
> > 2. Use it in the notification function instead of the multicast previously used along with genlmsg_unicast().
> > 'portid' and 'seq' taken from genl_info(), are added to the struct ethtool_module_fw_flash, which is accessible from the work item.
> >
> > 3. Create a global list that holds nodes from type struct ethtool_module_fw_flash() and add it as a field in the struct ethtool_module_fw_flash.
> > Before scheduling a work, a new node is added to the list.
> 
> Makes sense.
> 
> > 4. Add a new netlink notifier that when the relevant event takes place, deletes the node from the list, wait until the end of the work item, with cancel_work_sync() and free allocations.
> 
> What's the "relevant event" in this case? Closing of the socket that user had
> issued the command on?

The event should match the below:
event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC

Then iterate over the list to look for work that matches the dev and portid.
The socket doesn’t close until the work is done in that case. 

> 
> Easiest way to "notice" the socket got closed would probably be to add some
> info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> get a close notification in the family
> ->unbind callback.
> 
> I'm on the fence whether we should cancel the work. We could just mark the
> command as 'no socket present' and stop sending notifications.
> Not sure which is better..

Is there a scenario that we hit this event and won't intend to cancel the work? 

Thanks,
Danielle
Jakub Kicinski May 22, 2024, 2:22 p.m. UTC | #9
On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > 4. Add a new netlink notifier that when the relevant event takes place,
> > > deletes the node from the list, wait until the end of the work item, with
> > > cancel_work_sync() and free allocations.
> > 
> > What's the "relevant event" in this case? Closing of the socket that user had
> > issued the command on?  
> 
> The event should match the below:
> event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> 
> Then iterate over the list to look for work that matches the dev and portid.
> The socket doesn’t close until the work is done in that case. 

Okay, good, yes. I think you can use one of the callbacks I mentioned
below to achieve the same thing with less complexity than the notifier.

> > Easiest way to "notice" the socket got closed would probably be to add some
> > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > get a close notification in the family  
> > ->unbind callback.  
> > 
> > I'm on the fence whether we should cancel the work. We could just mark the
> > command as 'no socket present' and stop sending notifications.
> > Not sure which is better..  
> 
> Is there a scenario that we hit this event and won't intend to cancel the work? 

I think it's up to us. I don't see any legit reason for user space to
intentionally cancel the flashing. So the only option is that user space
is either buggy or has crashed, and the socket got closed before
flashing finished. Right?
Ido Schimmel May 27, 2024, 4:10 p.m. UTC | #10
On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > > 4. Add a new netlink notifier that when the relevant event takes place,
> > > > deletes the node from the list, wait until the end of the work item, with
> > > > cancel_work_sync() and free allocations.
> > > 
> > > What's the "relevant event" in this case? Closing of the socket that user had
> > > issued the command on?  
> > 
> > The event should match the below:
> > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> > 
> > Then iterate over the list to look for work that matches the dev and portid.
> > The socket doesn’t close until the work is done in that case. 
> 
> Okay, good, yes. I think you can use one of the callbacks I mentioned
> below to achieve the same thing with less complexity than the notifier.

Danielle already has a POC with the notifier and it's not that
complicated. I wasn't aware of the netlink notifier, but we found it
when we tried to understand how other netlink families get notified
about a socket being closed.

Which advantages do you see in the sock_priv_destroy() approach? Are you
against the notifier approach?

> > > Easiest way to "notice" the socket got closed would probably be to add some
> > > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > > get a close notification in the family  
> > > ->unbind callback.  

Isn't the unbind callback only for multicast (whereas we are using
unicast)?

> > > 
> > > I'm on the fence whether we should cancel the work. We could just mark the
> > > command as 'no socket present' and stop sending notifications.
> > > Not sure which is better..  
> > 
> > Is there a scenario that we hit this event and won't intend to cancel the work? 
> 
> I think it's up to us. I don't see any legit reason for user space to
> intentionally cancel the flashing. So the only option is that user space
> is either buggy or has crashed, and the socket got closed before
> flashing finished. Right?

We don't think that closing the socket / killing the process mid
flashing is a legitimate scenario. We looked into it in order to avoid
sending unicast notifications to a socket that did not ask for them but
gets them because it was bound to the port ID that was used by the old
socket.

I agree that we don't need to cancel the work and can simply have the
work item stop sending notifications. User space will get an error if it
tries to flash a module that is already being flashed in the background.
WDYT?
Jakub Kicinski May 27, 2024, 4:30 p.m. UTC | #11
On Mon, 27 May 2024 19:10:55 +0300 Ido Schimmel wrote:
> On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> > On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:  
> > > The event should match the below:
> > > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> > > 
> > > Then iterate over the list to look for work that matches the dev and portid.
> > > The socket doesn’t close until the work is done in that case.   
> > 
> > Okay, good, yes. I think you can use one of the callbacks I mentioned
> > below to achieve the same thing with less complexity than the notifier.  
> 
> Danielle already has a POC with the notifier and it's not that
> complicated. I wasn't aware of the netlink notifier, but we found it
> when we tried to understand how other netlink families get notified
> about a socket being closed.
> 
> Which advantages do you see in the sock_priv_destroy() approach? Are you
> against the notifier approach?

Notifier is not incorrect, but I worry it will result in more code,
and basically duplication of what genl_sk_priv* does. Perhaps you
managed to code it up very neatly - if so feel free to send the v6
and we can discuss further if needed?

> > > > Easiest way to "notice" the socket got closed would probably be to add some
> > > > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > > > get a close notification in the family    
> > > > ->unbind callback.    
> 
> Isn't the unbind callback only for multicast (whereas we are using
> unicast)?

True, should work in practice, I think. But sock_priv is much better.

> > > Is there a scenario that we hit this event and won't intend to cancel the work?   
> > 
> > I think it's up to us. I don't see any legit reason for user space to
> > intentionally cancel the flashing. So the only option is that user space
> > is either buggy or has crashed, and the socket got closed before
> > flashing finished. Right?  
> 
> We don't think that closing the socket / killing the process mid
> flashing is a legitimate scenario. We looked into it in order to avoid
> sending unicast notifications to a socket that did not ask for them but
> gets them because it was bound to the port ID that was used by the old
> socket.
> 
> I agree that we don't need to cancel the work and can simply have the
> work item stop sending notifications. User space will get an error if it
> tries to flash a module that is already being flashed in the background.
> WDYT?

SGTM!

Patch

diff --git a/net/ethtool/module.c b/net/ethtool/module.c
index ceb575efc290..114a2ec986fe 100644
--- a/net/ethtool/module.c
+++ b/net/ethtool/module.c
@@ -5,6 +5,7 @@ 
 #include "netlink.h"
 #include "common.h"
 #include "bitset.h"
+#include "module_fw.h"
 
 struct module_req_info {
 	struct ethnl_req_info base;
@@ -158,3 +159,85 @@  const struct ethnl_request_ops ethnl_module_request_ops = {
 	.set			= ethnl_set_module,
 	.set_ntf_cmd		= ETHTOOL_MSG_MODULE_NTF,
 };
+
+/* MODULE_FW_FLASH_NTF */
+
+static void
+ethnl_module_fw_flash_ntf(struct net_device *dev,
+			  enum ethtool_module_fw_flash_status status,
+			  const char *status_msg, u64 done, u64 total)
+{
+	struct sk_buff *skb;
+	void *hdr;
+	int ret;
+
+	skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return;
+
+	hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
+	if (!hdr)
+		goto err_skb;
+
+	ret = ethnl_fill_reply_header(skb, dev,
+				      ETHTOOL_A_MODULE_FW_FLASH_HEADER);
+	if (ret < 0)
+		goto err_skb;
+
+	if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
+		goto err_skb;
+
+	if (status_msg &&
+	    nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
+			   status_msg))
+		goto err_skb;
+
+	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
+			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
+		goto err_skb;
+
+	if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
+			      ETHTOOL_A_MODULE_FW_FLASH_PAD))
+		goto err_skb;
+
+	genlmsg_end(skb, hdr);
+	ethnl_multicast(skb, dev);
+	return;
+
+err_skb:
+	nlmsg_free(skb);
+}
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+				   char *err_msg, char *sub_err_msg)
+{
+	char status_msg[120];
+
+	if (sub_err_msg)
+		sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
+	else
+		sprintf(status_msg, "%s.", err_msg);
+
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
+				  status_msg, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev)
+{
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_STARTED,
+				  NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev)
+{
+	ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_COMPLETED,
+				  NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+					   u64 total)
+{
+	ethnl_module_fw_flash_ntf(dev,
+				  ETHTOOL_MODULE_FW_FLASH_STATUS_IN_PROGRESS,
+				  NULL, done, total);
+}
diff --git a/net/ethtool/module_fw.h b/net/ethtool/module_fw.h
new file mode 100644
index 000000000000..e40eae442741
--- /dev/null
+++ b/net/ethtool/module_fw.h
@@ -0,0 +1,10 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <uapi/linux/ethtool.h>
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+				   char *err_msg, char *sub_err_msg);
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+					   u64 total);