mbox series

[net-next,0/4] devlink: add dry run support for flash update

Message ID 20211008104115.1327240-1-jacob.e.keller@intel.com (mailing list archive)
Headers show
Series devlink: add dry run support for flash update | expand

Message

Jacob Keller Oct. 8, 2021, 10:41 a.m. UTC
This is an implementation of a previous idea I had discussed on the list at
https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.com/

The idea is to allow user space to query whether a given destructive devlink
command would work without actually performing any actions. This is commonly
referred to as a "dry run", and is intended to give tools and system
administrators the ability to test things like flash image validity, or
whether a given option is valid without having to risk performing the update
when not sufficiently ready.

The intention is that all "destructive" commands can be updated to support
the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
flash update.

I expect we would want to support this for commands such as reload as well
as other commands which perform some action with no interface to check state
before hand.

I tried to implement the DRY_RUN checks along with useful extended ACK
messages so that even if a driver does not support DRY_RUN, some useful
information can be retrieved. (i.e. the stack indicates that flash update is
supported and will validate the other attributes first before rejecting the
command due to inability to fully validate the run within the driver).

Jacob Keller (4):
  ice: move and rename ice_check_for_pending_update
  ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image
  devlink: add dry run attribute to flash update
  ice: support dry run of a flash update to validate firmware file

 Documentation/driver-api/pldmfw/index.rst     |  10 ++
 drivers/net/ethernet/intel/ice/ice_devlink.c  |  53 +-----
 .../net/ethernet/intel/ice/ice_fw_update.c    | 170 ++++++++++--------
 .../net/ethernet/intel/ice/ice_fw_update.h    |   7 +-
 include/linux/pldmfw.h                        |   5 +
 include/net/devlink.h                         |   2 +
 include/uapi/linux/devlink.h                  |   2 +
 lib/pldmfw/pldmfw.c                           |  12 ++
 net/core/devlink.c                            |  19 +-
 9 files changed, 145 insertions(+), 135 deletions(-)


base-commit: c514fbb6231483b05c97eb22587188d4c453b28e

Comments

Jiri Pirko Oct. 8, 2021, 12:37 p.m. UTC | #1
Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
>This is an implementation of a previous idea I had discussed on the list at
>https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.com/
>
>The idea is to allow user space to query whether a given destructive devlink
>command would work without actually performing any actions. This is commonly
>referred to as a "dry run", and is intended to give tools and system
>administrators the ability to test things like flash image validity, or
>whether a given option is valid without having to risk performing the update
>when not sufficiently ready.
>
>The intention is that all "destructive" commands can be updated to support
>the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
>flash update.
>
>I expect we would want to support this for commands such as reload as well
>as other commands which perform some action with no interface to check state
>before hand.
>
>I tried to implement the DRY_RUN checks along with useful extended ACK
>messages so that even if a driver does not support DRY_RUN, some useful
>information can be retrieved. (i.e. the stack indicates that flash update is
>supported and will validate the other attributes first before rejecting the
>command due to inability to fully validate the run within the driver).

Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
really dry run. I guess that user might be surprised in that case...


>
>Jacob Keller (4):
>  ice: move and rename ice_check_for_pending_update
>  ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image
>  devlink: add dry run attribute to flash update
>  ice: support dry run of a flash update to validate firmware file
>
> Documentation/driver-api/pldmfw/index.rst     |  10 ++
> drivers/net/ethernet/intel/ice/ice_devlink.c  |  53 +-----
> .../net/ethernet/intel/ice/ice_fw_update.c    | 170 ++++++++++--------
> .../net/ethernet/intel/ice/ice_fw_update.h    |   7 +-
> include/linux/pldmfw.h                        |   5 +
> include/net/devlink.h                         |   2 +
> include/uapi/linux/devlink.h                  |   2 +
> lib/pldmfw/pldmfw.c                           |  12 ++
> net/core/devlink.c                            |  19 +-
> 9 files changed, 145 insertions(+), 135 deletions(-)
>
>
>base-commit: c514fbb6231483b05c97eb22587188d4c453b28e
>-- 
>2.31.1.331.gb0c09ab8796f
>
Jakub Kicinski Oct. 8, 2021, 6:21 p.m. UTC | #2
On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote:
> Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
> >This is an implementation of a previous idea I had discussed on the list at
> >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.com/
> >
> >The idea is to allow user space to query whether a given destructive devlink
> >command would work without actually performing any actions. This is commonly
> >referred to as a "dry run", and is intended to give tools and system
> >administrators the ability to test things like flash image validity, or
> >whether a given option is valid without having to risk performing the update
> >when not sufficiently ready.
> >
> >The intention is that all "destructive" commands can be updated to support
> >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
> >flash update.
> >
> >I expect we would want to support this for commands such as reload as well
> >as other commands which perform some action with no interface to check state
> >before hand.
> >
> >I tried to implement the DRY_RUN checks along with useful extended ACK
> >messages so that even if a driver does not support DRY_RUN, some useful
> >information can be retrieved. (i.e. the stack indicates that flash update is
> >supported and will validate the other attributes first before rejecting the
> >command due to inability to fully validate the run within the driver).  
> 
> Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> really dry run. I guess that user might be surprised in that case...

Would it be enough to do a policy dump in user space to check attr is
recognized and add a warning that this is required next to the attr
in the uAPI header?
Jacob Keller Oct. 8, 2021, 9:42 p.m. UTC | #3
> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, October 08, 2021 5:38 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org; Jakub Kicinski <kubakici@wp.pl>
> Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> 
> Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
> >This is an implementation of a previous idea I had discussed on the list at
> >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co
> m/
> >
> >The idea is to allow user space to query whether a given destructive devlink
> >command would work without actually performing any actions. This is
> commonly
> >referred to as a "dry run", and is intended to give tools and system
> >administrators the ability to test things like flash image validity, or
> >whether a given option is valid without having to risk performing the update
> >when not sufficiently ready.
> >
> >The intention is that all "destructive" commands can be updated to support
> >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
> >flash update.
> >
> >I expect we would want to support this for commands such as reload as well
> >as other commands which perform some action with no interface to check state
> >before hand.
> >
> >I tried to implement the DRY_RUN checks along with useful extended ACK
> >messages so that even if a driver does not support DRY_RUN, some useful
> >information can be retrieved. (i.e. the stack indicates that flash update is
> >supported and will validate the other attributes first before rejecting the
> >command due to inability to fully validate the run within the driver).
> 
> Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> really dry run. I guess that user might be surprised in that case...
> 

old kernel should reject the command with an invalid attribute entirely, no?

Thanks,
Jake
Jacob Keller Oct. 8, 2021, 9:43 p.m. UTC | #4
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, October 08, 2021 11:22 AM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> 
> On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote:
> > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
> > >This is an implementation of a previous idea I had discussed on the list at
> >
> >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co
> m/
> > >
> > >The idea is to allow user space to query whether a given destructive devlink
> > >command would work without actually performing any actions. This is
> commonly
> > >referred to as a "dry run", and is intended to give tools and system
> > >administrators the ability to test things like flash image validity, or
> > >whether a given option is valid without having to risk performing the update
> > >when not sufficiently ready.
> > >
> > >The intention is that all "destructive" commands can be updated to support
> > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
> > >flash update.
> > >
> > >I expect we would want to support this for commands such as reload as well
> > >as other commands which perform some action with no interface to check
> state
> > >before hand.
> > >
> > >I tried to implement the DRY_RUN checks along with useful extended ACK
> > >messages so that even if a driver does not support DRY_RUN, some useful
> > >information can be retrieved. (i.e. the stack indicates that flash update is
> > >supported and will validate the other attributes first before rejecting the
> > >command due to inability to fully validate the run within the driver).
> >
> > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> > really dry run. I guess that user might be surprised in that case...
> 
> Would it be enough to do a policy dump in user space to check attr is
> recognized and add a warning that this is required next to the attr
> in the uAPI header?

Doesn't the policy checks prevent any unknown attributes? Or are unknown attributes silently ignored?
Jakub Kicinski Oct. 8, 2021, 10:35 p.m. UTC | #5
On Fri, 8 Oct 2021 21:43:32 +0000 Keller, Jacob E wrote:
> > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> > > really dry run. I guess that user might be surprised in that case...  
> > 
> > Would it be enough to do a policy dump in user space to check attr is
> > recognized and add a warning that this is required next to the attr
> > in the uAPI header?  
> 
> Doesn't the policy checks prevent any unknown attributes? 
> Or are unknown attributes silently ignored?

Did you test it?

DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set.
Jacob Keller Oct. 8, 2021, 11:58 p.m. UTC | #6
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, October 08, 2021 3:36 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org
> Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> 
> On Fri, 8 Oct 2021 21:43:32 +0000 Keller, Jacob E wrote:
> > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> > > > really dry run. I guess that user might be surprised in that case...
> > >
> > > Would it be enough to do a policy dump in user space to check attr is
> > > recognized and add a warning that this is required next to the attr
> > > in the uAPI header?
> >
> > Doesn't the policy checks prevent any unknown attributes?
> > Or are unknown attributes silently ignored?
> 
> Did you test it?
> 
> DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set.

Hmm. I did run into an issue while initially testing where DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it rejected the command with an unknown attribute. I had to add the attribute to the policy table to fix this.

I'm double checking on a different kernel now with the new userspace to see if I get the same behavior.

I'm not super familiar with the validation code or what GENL_DONT_VALIDATE_STRICT means...

Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the only uses I can find are ones which *set* the flag. Nothing ever checks it!!

I suspect it got removed at some point.. still digging into exact history though...
Jakub Kicinski Oct. 9, 2021, 12:17 a.m. UTC | #7
On Fri, 8 Oct 2021 23:58:45 +0000 Keller, Jacob E wrote:
> > > Doesn't the policy checks prevent any unknown attributes?
> > > Or are unknown attributes silently ignored?  
> > 
> > Did you test it?
> > 
> > DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set.  
> 
> Hmm. I did run into an issue while initially testing where
> DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it
> rejected the command with an unknown attribute. I had to add the
> attribute to the policy table to fix this.
> 
> I'm double checking on a different kernel now with the new userspace
> to see if I get the same behavior.

Weird.
 
> I'm not super familiar with the validation code or what
> GENL_DONT_VALIDATE_STRICT means...
> 
> Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the
> only uses I can find are ones which *set* the flag. Nothing ever
> checks it!!
> 
> I suspect it got removed at some point.. still digging into exact
> history though...


 It's passed by genl_family_rcv_msg_doit() to
genl_family_rcv_msg_attrs_parse() where it chooses the netlink policy.
Jacob Keller Oct. 9, 2021, 12:32 a.m. UTC | #8
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, October 08, 2021 5:18 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org
> Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> 
> On Fri, 8 Oct 2021 23:58:45 +0000 Keller, Jacob E wrote:
> > > > Doesn't the policy checks prevent any unknown attributes?
> > > > Or are unknown attributes silently ignored?
> > >
> > > Did you test it?
> > >
> > > DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set.
> >
> > Hmm. I did run into an issue while initially testing where
> > DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it
> > rejected the command with an unknown attribute. I had to add the
> > attribute to the policy table to fix this.
> >
> > I'm double checking on a different kernel now with the new userspace
> > to see if I get the same behavior.
> 
> Weird.
> 
> > I'm not super familiar with the validation code or what
> > GENL_DONT_VALIDATE_STRICT means...
> >
> > Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the
> > only uses I can find are ones which *set* the flag. Nothing ever
> > checks it!!
> >
> > I suspect it got removed at some point.. still digging into exact
> > history though...
> 
> 
>  It's passed by genl_family_rcv_msg_doit() to
> genl_family_rcv_msg_attrs_parse() where it chooses the netlink policy.

Ah.. I see how its done. It's passed as the argument so you  don't see a direct comparison which makes it look like there isn't one... Feels like there could probably be a better abstraction that was more readable here...

Anyways. I'll confirm what happens on the kernel that doesn't have the attribute defined at all.

I wonder if the thing I saw differently was because the attribute *was* known but wasn't in policy. I.e. because it was defined it was validated....

Yep, I confirm that on a kernel without the DRY_RUN flag that it would allow the run because we aren't being strict.

I am guessing that we can't convert devlink over to strict validation?

Thanks,
Jake
Jakub Kicinski Oct. 9, 2021, 1:29 a.m. UTC | #9
On Sat, 9 Oct 2021 00:32:49 +0000 Keller, Jacob E wrote:
> Ah.. I see how its done. It's passed as the argument so you  don't
> see a direct comparison which makes it look like there isn't one...
> Feels like there could probably be a better abstraction that was more
> readable here...
> 
> Anyways. I'll confirm what happens on the kernel that doesn't have
> the attribute defined at all.
> 
> I wonder if the thing I saw differently was because the attribute
> *was* known but wasn't in policy. I.e. because it was defined it was
> validated....
> 
> Yep, I confirm that on a kernel without the DRY_RUN flag that it
> would allow the run because we aren't being strict.
> 
> I am guessing that we can't convert devlink over to strict validation?

I think the current best practice is not to opt-in commands which
started out as non-strict into strict validation. That said opting 
it in for MAXTYPE validation seems reasonable to me.

Alternatively, as I said, you can just check the max attr for the
family in user space. CTRL_CMD_GETFAMILY returns it as part of family
info (CTRL_ATTR_MAXATTR). We can make user space do the rejecting.
Jacob Keller Oct. 11, 2021, 8:21 a.m. UTC | #10
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, October 08, 2021 11:22 AM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> 
> On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote:
> > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
> > >This is an implementation of a previous idea I had discussed on the list at
> >
> >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co
> m/
> > >
> > >The idea is to allow user space to query whether a given destructive devlink
> > >command would work without actually performing any actions. This is
> commonly
> > >referred to as a "dry run", and is intended to give tools and system
> > >administrators the ability to test things like flash image validity, or
> > >whether a given option is valid without having to risk performing the update
> > >when not sufficiently ready.
> > >
> > >The intention is that all "destructive" commands can be updated to support
> > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for
> > >flash update.
> > >
> > >I expect we would want to support this for commands such as reload as well
> > >as other commands which perform some action with no interface to check
> state
> > >before hand.
> > >
> > >I tried to implement the DRY_RUN checks along with useful extended ACK
> > >messages so that even if a driver does not support DRY_RUN, some useful
> > >information can be retrieved. (i.e. the stack indicates that flash update is
> > >supported and will validate the other attributes first before rejecting the
> > >command due to inability to fully validate the run within the driver).
> >
> > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> > really dry run. I guess that user might be surprised in that case...
> 
> Would it be enough to do a policy dump in user space to check attr is
> recognized and add a warning that this is required next to the attr
> in the uAPI header?

I'd be more in favor of converting either this specific op (or devlink as a whole) to strict checking. I think that most of the devlink commands and ops would function better if unknown behaviors were rejected rather than ignored.

If we prefer to avoid that due to historically not being strict, I'm ok with a userspace check. It does complicate the userspace a bit more, but I agree that especially for dry_run we don't want it accidentally updating when a dry run was expected.

There is some maintenance cost to switching to strict checking but I think it's pretty minimal because the strict checking simply prevents the unknown attributes from being ignored.

I'm happy to go either direction if we get consensus on this thread though.

Thanks,
Jake
Jacob Keller Oct. 11, 2021, 11:21 p.m. UTC | #11
> -----Original Message-----
> From: Keller, Jacob E <jacob.e.keller@intel.com>
> Sent: Monday, October 11, 2021 1:22 AM
> To: Jakub Kicinski <kuba@kernel.org>; Jiri Pirko <jiri@resnulli.us>
> Cc: netdev@vger.kernel.org
> Subject: RE: [net-next 0/4] devlink: add dry run support for flash update
> 
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, October 08, 2021 11:22 AM
> > To: Jiri Pirko <jiri@resnulli.us>
> > Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update
> >
> > On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote:
> > > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote:
> > > >This is an implementation of a previous idea I had discussed on the list at
> > >
> > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co
> > m/
> > > >
> > > >The idea is to allow user space to query whether a given destructive devlink
> > > >command would work without actually performing any actions. This is
> > commonly
> > > >referred to as a "dry run", and is intended to give tools and system
> > > >administrators the ability to test things like flash image validity, or
> > > >whether a given option is valid without having to risk performing the update
> > > >when not sufficiently ready.
> > > >
> > > >The intention is that all "destructive" commands can be updated to support
> > > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it
> for
> > > >flash update.
> > > >
> > > >I expect we would want to support this for commands such as reload as well
> > > >as other commands which perform some action with no interface to check
> > state
> > > >before hand.
> > > >
> > > >I tried to implement the DRY_RUN checks along with useful extended ACK
> > > >messages so that even if a driver does not support DRY_RUN, some useful
> > > >information can be retrieved. (i.e. the stack indicates that flash update is
> > > >supported and will validate the other attributes first before rejecting the
> > > >command due to inability to fully validate the run within the driver).
> > >
> > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be
> > > really dry run. I guess that user might be surprised in that case...
> >
> > Would it be enough to do a policy dump in user space to check attr is
> > recognized and add a warning that this is required next to the attr
> > in the uAPI header?
> 
> I'd be more in favor of converting either this specific op (or devlink as a whole) to
> strict checking. I think that most of the devlink commands and ops would
> function better if unknown behaviors were rejected rather than ignored.
> 
> If we prefer to avoid that due to historically not being strict, I'm ok with a
> userspace check. It does complicate the userspace a bit more, but I agree that
> especially for dry_run we don't want it accidentally updating when a dry run was
> expected.
> 
> There is some maintenance cost to switching to strict checking but I think it's
> pretty minimal because the strict checking simply prevents the unknown
> attributes from being ignored.
> 
> I'm happy to go either direction if we get consensus on this thread though.
> 
> Thanks,
> Jake

The ice changes in this patch conflict with similar work that I posted on IWL to cleanup some things for devlink reload support, so I'll probably wait to re-submit this until those changes make it through IWL and onto the list proper.

Thanks,
Jake
Jacob Keller April 25, 2022, 11:05 p.m. UTC | #12
On 10/8/2021 6:29 PM, Jakub Kicinski wrote:
> On Sat, 9 Oct 2021 00:32:49 +0000 Keller, Jacob E wrote:

Heh. It is frustrating how easily things slip through the cracks. I
dropped the ball on this.

> I think the current best practice is not to opt-in commands which
> started out as non-strict into strict validation. That said opting 
> it in for MAXTYPE validation seems reasonable to me.
> 

Opting in would only help future kernels, and would obviously not help
existing kernels which currently do not have strict validation. I
personally would prefer to migrate to strict matching, but understand
the concerns with breaking existing usage.

> Alternatively, as I said, you can just check the max attr for the
> family in user space. CTRL_CMD_GETFAMILY returns it as part of family
> info (CTRL_ATTR_MAXATTR). We can make user space do the rejecting.
> 

Yea I think this is the right approach because its the only way for new
userspace to properly protect against using this on an old kernel.

I'll update the iproute2 patches to do that.

Thanks,
Jake