Message ID | 20240710224455.188502-1-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 8b9b59e27aa88ba133fbac85def3f8be67f2d5a8 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] i40e: fix: remove needless retries of NVM update | expand |
On 7/10/2024 3:44 PM, Tony Nguyen wrote: > From: Aleksandr Loktionov <aleksandr.loktionov@intel.com> > > Remove wrong EIO to EGAIN conversion and pass all errors as is. > > After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only > replace F/W specific error codes with Linux kernel generic, all EIO errors > suddenly started to be converted into EAGAIN which leads nvmupdate to retry > until it timeouts and sometimes fails after more than 20 minutes in the > middle of NVM update, so NVM becomes corrupted. > > The bug affects users only at the time when they try to update NVM, and > only F/W versions that generate errors while nvmupdate. For example, X710DA2 > with 0x8000ECB7 F/W is affected, but there are probably more... > > Command for reproduction is just NVM update: > ./nvmupdate64 > > In the log instead of: > i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM) > appears: > i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM > i40e: eeprom check failed (-5), Tx/Rx traffic disabled > > The problematic code did silently convert EIO into EAGAIN which forced > nvmupdate to ignore EAGAIN error and retry the same operation until timeout. > That's why NVM update takes 20+ minutes to finish with the fail in the end. > > Fixes: 230f3d53a547 ("i40e: remove i40e_status") > Co-developed-by: Kelvin Kang <kelvin.kang@intel.com> > Signed-off-by: Kelvin Kang <kelvin.kang@intel.com> > Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> > Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > Tested-by: Tony Brelinski <tony.brelinski@intel.com> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> > --- > drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > index ee86d2c53079..55b5bb884d73 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h > @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc) > -EFBIG, /* I40E_AQ_RC_EFBIG */ > }; > > - /* aq_rc is invalid if AQ timed out */ > - if (aq_ret == -EIO) > - return -EAGAIN; > - Makes sense. This hunk originated from commit bf848f328cf5 ("i40e: check for AQ timeout in aq_rc decode") before the AQ return was converted to a standard Linux error value. Previously we checked for I40E_ERR_ADMIN_QUEUE_TIMEOUT, which is more specific. Now all errors that were -EIO get converted, which could include a significantly higher number of errors than just ADMIN_QUEUE_TIMEDOUT. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0])))) > return -ERANGE; >
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Wed, 10 Jul 2024 15:44:54 -0700 you wrote: > From: Aleksandr Loktionov <aleksandr.loktionov@intel.com> > > Remove wrong EIO to EGAIN conversion and pass all errors as is. > > After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only > replace F/W specific error codes with Linux kernel generic, all EIO errors > suddenly started to be converted into EAGAIN which leads nvmupdate to retry > until it timeouts and sometimes fails after more than 20 minutes in the > middle of NVM update, so NVM becomes corrupted. > > [...] Here is the summary with links: - [net] i40e: fix: remove needless retries of NVM update https://git.kernel.org/netdev/net/c/8b9b59e27aa8 You are awesome, thank you!
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h index ee86d2c53079..55b5bb884d73 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc) -EFBIG, /* I40E_AQ_RC_EFBIG */ }; - /* aq_rc is invalid if AQ timed out */ - if (aq_ret == -EIO) - return -EAGAIN; - if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0])))) return -ERANGE;