Message ID | 20211008104115.1327240-1-jacob.e.keller@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | devlink: add dry run support for flash update | expand |
Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: >This is an implementation of a previous idea I had discussed on the list at >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.com/ > >The idea is to allow user space to query whether a given destructive devlink >command would work without actually performing any actions. This is commonly >referred to as a "dry run", and is intended to give tools and system >administrators the ability to test things like flash image validity, or >whether a given option is valid without having to risk performing the update >when not sufficiently ready. > >The intention is that all "destructive" commands can be updated to support >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for >flash update. > >I expect we would want to support this for commands such as reload as well >as other commands which perform some action with no interface to check state >before hand. > >I tried to implement the DRY_RUN checks along with useful extended ACK >messages so that even if a driver does not support DRY_RUN, some useful >information can be retrieved. (i.e. the stack indicates that flash update is >supported and will validate the other attributes first before rejecting the >command due to inability to fully validate the run within the driver). Hmm, old kernel vs. new-userspace, the requested dry-run, won't be really dry run. I guess that user might be surprised in that case... > >Jacob Keller (4): > ice: move and rename ice_check_for_pending_update > ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image > devlink: add dry run attribute to flash update > ice: support dry run of a flash update to validate firmware file > > Documentation/driver-api/pldmfw/index.rst | 10 ++ > drivers/net/ethernet/intel/ice/ice_devlink.c | 53 +----- > .../net/ethernet/intel/ice/ice_fw_update.c | 170 ++++++++++-------- > .../net/ethernet/intel/ice/ice_fw_update.h | 7 +- > include/linux/pldmfw.h | 5 + > include/net/devlink.h | 2 + > include/uapi/linux/devlink.h | 2 + > lib/pldmfw/pldmfw.c | 12 ++ > net/core/devlink.c | 19 +- > 9 files changed, 145 insertions(+), 135 deletions(-) > > >base-commit: c514fbb6231483b05c97eb22587188d4c453b28e >-- >2.31.1.331.gb0c09ab8796f >
On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote: > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: > >This is an implementation of a previous idea I had discussed on the list at > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.com/ > > > >The idea is to allow user space to query whether a given destructive devlink > >command would work without actually performing any actions. This is commonly > >referred to as a "dry run", and is intended to give tools and system > >administrators the ability to test things like flash image validity, or > >whether a given option is valid without having to risk performing the update > >when not sufficiently ready. > > > >The intention is that all "destructive" commands can be updated to support > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for > >flash update. > > > >I expect we would want to support this for commands such as reload as well > >as other commands which perform some action with no interface to check state > >before hand. > > > >I tried to implement the DRY_RUN checks along with useful extended ACK > >messages so that even if a driver does not support DRY_RUN, some useful > >information can be retrieved. (i.e. the stack indicates that flash update is > >supported and will validate the other attributes first before rejecting the > >command due to inability to fully validate the run within the driver). > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > really dry run. I guess that user might be surprised in that case... Would it be enough to do a policy dump in user space to check attr is recognized and add a warning that this is required next to the attr in the uAPI header?
> -----Original Message----- > From: Jiri Pirko <jiri@resnulli.us> > Sent: Friday, October 08, 2021 5:38 AM > To: Keller, Jacob E <jacob.e.keller@intel.com> > Cc: netdev@vger.kernel.org; Jakub Kicinski <kubakici@wp.pl> > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: > >This is an implementation of a previous idea I had discussed on the list at > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co > m/ > > > >The idea is to allow user space to query whether a given destructive devlink > >command would work without actually performing any actions. This is > commonly > >referred to as a "dry run", and is intended to give tools and system > >administrators the ability to test things like flash image validity, or > >whether a given option is valid without having to risk performing the update > >when not sufficiently ready. > > > >The intention is that all "destructive" commands can be updated to support > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for > >flash update. > > > >I expect we would want to support this for commands such as reload as well > >as other commands which perform some action with no interface to check state > >before hand. > > > >I tried to implement the DRY_RUN checks along with useful extended ACK > >messages so that even if a driver does not support DRY_RUN, some useful > >information can be retrieved. (i.e. the stack indicates that flash update is > >supported and will validate the other attributes first before rejecting the > >command due to inability to fully validate the run within the driver). > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > really dry run. I guess that user might be surprised in that case... > old kernel should reject the command with an invalid attribute entirely, no? Thanks, Jake
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, October 08, 2021 11:22 AM > To: Jiri Pirko <jiri@resnulli.us> > Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote: > > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: > > >This is an implementation of a previous idea I had discussed on the list at > > > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co > m/ > > > > > >The idea is to allow user space to query whether a given destructive devlink > > >command would work without actually performing any actions. This is > commonly > > >referred to as a "dry run", and is intended to give tools and system > > >administrators the ability to test things like flash image validity, or > > >whether a given option is valid without having to risk performing the update > > >when not sufficiently ready. > > > > > >The intention is that all "destructive" commands can be updated to support > > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for > > >flash update. > > > > > >I expect we would want to support this for commands such as reload as well > > >as other commands which perform some action with no interface to check > state > > >before hand. > > > > > >I tried to implement the DRY_RUN checks along with useful extended ACK > > >messages so that even if a driver does not support DRY_RUN, some useful > > >information can be retrieved. (i.e. the stack indicates that flash update is > > >supported and will validate the other attributes first before rejecting the > > >command due to inability to fully validate the run within the driver). > > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > > really dry run. I guess that user might be surprised in that case... > > Would it be enough to do a policy dump in user space to check attr is > recognized and add a warning that this is required next to the attr > in the uAPI header? Doesn't the policy checks prevent any unknown attributes? Or are unknown attributes silently ignored?
On Fri, 8 Oct 2021 21:43:32 +0000 Keller, Jacob E wrote: > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > > > really dry run. I guess that user might be surprised in that case... > > > > Would it be enough to do a policy dump in user space to check attr is > > recognized and add a warning that this is required next to the attr > > in the uAPI header? > > Doesn't the policy checks prevent any unknown attributes? > Or are unknown attributes silently ignored? Did you test it? DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, October 08, 2021 3:36 PM > To: Keller, Jacob E <jacob.e.keller@intel.com> > Cc: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > On Fri, 8 Oct 2021 21:43:32 +0000 Keller, Jacob E wrote: > > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > > > > really dry run. I guess that user might be surprised in that case... > > > > > > Would it be enough to do a policy dump in user space to check attr is > > > recognized and add a warning that this is required next to the attr > > > in the uAPI header? > > > > Doesn't the policy checks prevent any unknown attributes? > > Or are unknown attributes silently ignored? > > Did you test it? > > DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set. Hmm. I did run into an issue while initially testing where DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it rejected the command with an unknown attribute. I had to add the attribute to the policy table to fix this. I'm double checking on a different kernel now with the new userspace to see if I get the same behavior. I'm not super familiar with the validation code or what GENL_DONT_VALIDATE_STRICT means... Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the only uses I can find are ones which *set* the flag. Nothing ever checks it!! I suspect it got removed at some point.. still digging into exact history though...
On Fri, 8 Oct 2021 23:58:45 +0000 Keller, Jacob E wrote: > > > Doesn't the policy checks prevent any unknown attributes? > > > Or are unknown attributes silently ignored? > > > > Did you test it? > > > > DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set. > > Hmm. I did run into an issue while initially testing where > DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it > rejected the command with an unknown attribute. I had to add the > attribute to the policy table to fix this. > > I'm double checking on a different kernel now with the new userspace > to see if I get the same behavior. Weird. > I'm not super familiar with the validation code or what > GENL_DONT_VALIDATE_STRICT means... > > Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the > only uses I can find are ones which *set* the flag. Nothing ever > checks it!! > > I suspect it got removed at some point.. still digging into exact > history though... It's passed by genl_family_rcv_msg_doit() to genl_family_rcv_msg_attrs_parse() where it chooses the netlink policy.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, October 08, 2021 5:18 PM > To: Keller, Jacob E <jacob.e.keller@intel.com> > Cc: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > On Fri, 8 Oct 2021 23:58:45 +0000 Keller, Jacob E wrote: > > > > Doesn't the policy checks prevent any unknown attributes? > > > > Or are unknown attributes silently ignored? > > > > > > Did you test it? > > > > > > DEVLINK_CMD_FLASH_UPDATE has GENL_DONT_VALIDATE_STRICT set. > > > > Hmm. I did run into an issue while initially testing where > > DEVLINK_ATTR_DRY_RUN wasn't in the devlink_nla_policy table and it > > rejected the command with an unknown attribute. I had to add the > > attribute to the policy table to fix this. > > > > I'm double checking on a different kernel now with the new userspace > > to see if I get the same behavior. > > Weird. > > > I'm not super familiar with the validation code or what > > GENL_DONT_VALIDATE_STRICT means... > > > > Indeed.. I just did a search for GENL_DONT_VALIDATE_STRICT and the > > only uses I can find are ones which *set* the flag. Nothing ever > > checks it!! > > > > I suspect it got removed at some point.. still digging into exact > > history though... > > > It's passed by genl_family_rcv_msg_doit() to > genl_family_rcv_msg_attrs_parse() where it chooses the netlink policy. Ah.. I see how its done. It's passed as the argument so you don't see a direct comparison which makes it look like there isn't one... Feels like there could probably be a better abstraction that was more readable here... Anyways. I'll confirm what happens on the kernel that doesn't have the attribute defined at all. I wonder if the thing I saw differently was because the attribute *was* known but wasn't in policy. I.e. because it was defined it was validated.... Yep, I confirm that on a kernel without the DRY_RUN flag that it would allow the run because we aren't being strict. I am guessing that we can't convert devlink over to strict validation? Thanks, Jake
On Sat, 9 Oct 2021 00:32:49 +0000 Keller, Jacob E wrote: > Ah.. I see how its done. It's passed as the argument so you don't > see a direct comparison which makes it look like there isn't one... > Feels like there could probably be a better abstraction that was more > readable here... > > Anyways. I'll confirm what happens on the kernel that doesn't have > the attribute defined at all. > > I wonder if the thing I saw differently was because the attribute > *was* known but wasn't in policy. I.e. because it was defined it was > validated.... > > Yep, I confirm that on a kernel without the DRY_RUN flag that it > would allow the run because we aren't being strict. > > I am guessing that we can't convert devlink over to strict validation? I think the current best practice is not to opt-in commands which started out as non-strict into strict validation. That said opting it in for MAXTYPE validation seems reasonable to me. Alternatively, as I said, you can just check the max attr for the family in user space. CTRL_CMD_GETFAMILY returns it as part of family info (CTRL_ATTR_MAXATTR). We can make user space do the rejecting.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, October 08, 2021 11:22 AM > To: Jiri Pirko <jiri@resnulli.us> > Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote: > > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: > > >This is an implementation of a previous idea I had discussed on the list at > > > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co > m/ > > > > > >The idea is to allow user space to query whether a given destructive devlink > > >command would work without actually performing any actions. This is > commonly > > >referred to as a "dry run", and is intended to give tools and system > > >administrators the ability to test things like flash image validity, or > > >whether a given option is valid without having to risk performing the update > > >when not sufficiently ready. > > > > > >The intention is that all "destructive" commands can be updated to support > > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it for > > >flash update. > > > > > >I expect we would want to support this for commands such as reload as well > > >as other commands which perform some action with no interface to check > state > > >before hand. > > > > > >I tried to implement the DRY_RUN checks along with useful extended ACK > > >messages so that even if a driver does not support DRY_RUN, some useful > > >information can be retrieved. (i.e. the stack indicates that flash update is > > >supported and will validate the other attributes first before rejecting the > > >command due to inability to fully validate the run within the driver). > > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > > really dry run. I guess that user might be surprised in that case... > > Would it be enough to do a policy dump in user space to check attr is > recognized and add a warning that this is required next to the attr > in the uAPI header? I'd be more in favor of converting either this specific op (or devlink as a whole) to strict checking. I think that most of the devlink commands and ops would function better if unknown behaviors were rejected rather than ignored. If we prefer to avoid that due to historically not being strict, I'm ok with a userspace check. It does complicate the userspace a bit more, but I agree that especially for dry_run we don't want it accidentally updating when a dry run was expected. There is some maintenance cost to switching to strict checking but I think it's pretty minimal because the strict checking simply prevents the unknown attributes from being ignored. I'm happy to go either direction if we get consensus on this thread though. Thanks, Jake
> -----Original Message----- > From: Keller, Jacob E <jacob.e.keller@intel.com> > Sent: Monday, October 11, 2021 1:22 AM > To: Jakub Kicinski <kuba@kernel.org>; Jiri Pirko <jiri@resnulli.us> > Cc: netdev@vger.kernel.org > Subject: RE: [net-next 0/4] devlink: add dry run support for flash update > > > -----Original Message----- > > From: Jakub Kicinski <kuba@kernel.org> > > Sent: Friday, October 08, 2021 11:22 AM > > To: Jiri Pirko <jiri@resnulli.us> > > Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org > > Subject: Re: [net-next 0/4] devlink: add dry run support for flash update > > > > On Fri, 8 Oct 2021 14:37:37 +0200 Jiri Pirko wrote: > > > Fri, Oct 08, 2021 at 12:41:11PM CEST, jacob.e.keller@intel.com wrote: > > > >This is an implementation of a previous idea I had discussed on the list at > > > > > >https://lore.kernel.org/netdev/51a6e7a33c7d40889c80bf37159f210e@intel.co > > m/ > > > > > > > >The idea is to allow user space to query whether a given destructive devlink > > > >command would work without actually performing any actions. This is > > commonly > > > >referred to as a "dry run", and is intended to give tools and system > > > >administrators the ability to test things like flash image validity, or > > > >whether a given option is valid without having to risk performing the update > > > >when not sufficiently ready. > > > > > > > >The intention is that all "destructive" commands can be updated to support > > > >the new DEVLINK_ATTR_DRY_RUN, although this series only implements it > for > > > >flash update. > > > > > > > >I expect we would want to support this for commands such as reload as well > > > >as other commands which perform some action with no interface to check > > state > > > >before hand. > > > > > > > >I tried to implement the DRY_RUN checks along with useful extended ACK > > > >messages so that even if a driver does not support DRY_RUN, some useful > > > >information can be retrieved. (i.e. the stack indicates that flash update is > > > >supported and will validate the other attributes first before rejecting the > > > >command due to inability to fully validate the run within the driver). > > > > > > Hmm, old kernel vs. new-userspace, the requested dry-run, won't be > > > really dry run. I guess that user might be surprised in that case... > > > > Would it be enough to do a policy dump in user space to check attr is > > recognized and add a warning that this is required next to the attr > > in the uAPI header? > > I'd be more in favor of converting either this specific op (or devlink as a whole) to > strict checking. I think that most of the devlink commands and ops would > function better if unknown behaviors were rejected rather than ignored. > > If we prefer to avoid that due to historically not being strict, I'm ok with a > userspace check. It does complicate the userspace a bit more, but I agree that > especially for dry_run we don't want it accidentally updating when a dry run was > expected. > > There is some maintenance cost to switching to strict checking but I think it's > pretty minimal because the strict checking simply prevents the unknown > attributes from being ignored. > > I'm happy to go either direction if we get consensus on this thread though. > > Thanks, > Jake The ice changes in this patch conflict with similar work that I posted on IWL to cleanup some things for devlink reload support, so I'll probably wait to re-submit this until those changes make it through IWL and onto the list proper. Thanks, Jake
On 10/8/2021 6:29 PM, Jakub Kicinski wrote: > On Sat, 9 Oct 2021 00:32:49 +0000 Keller, Jacob E wrote: Heh. It is frustrating how easily things slip through the cracks. I dropped the ball on this. > I think the current best practice is not to opt-in commands which > started out as non-strict into strict validation. That said opting > it in for MAXTYPE validation seems reasonable to me. > Opting in would only help future kernels, and would obviously not help existing kernels which currently do not have strict validation. I personally would prefer to migrate to strict matching, but understand the concerns with breaking existing usage. > Alternatively, as I said, you can just check the max attr for the > family in user space. CTRL_CMD_GETFAMILY returns it as part of family > info (CTRL_ATTR_MAXATTR). We can make user space do the rejecting. > Yea I think this is the right approach because its the only way for new userspace to properly protect against using this on an old kernel. I'll update the iproute2 patches to do that. Thanks, Jake