Message ID | 20240209124403.44781-1-damian.muszynski@intel.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Herbert Xu |
Headers | show |
Series | crypto: qat - resolve race condition during AER recovery | expand |
On Fri, Feb 09, 2024 at 01:43:42PM +0100, Damian Muszynski wrote: > During the PCI AER system's error recovery process, the kernel driver > may encounter a race condition with freeing the reset_data structure's > memory. If the device restart will take more than 10 seconds the function > scheduling that restart will exit due to a timeout, and the reset_data > structure will be freed. However, this data structure is used for > completion notification after the restart is completed, which leads > to a UAF bug. > > This results in a KFENCE bug notice. > > BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat] > Use-after-free read at 0x00000000bc56fddf (in kfence-#142): > adf_device_reset_worker+0x38/0xa0 [intel_qat] > process_one_work+0x173/0x340 > > To resolve this race condition, the memory associated to the container > of the work_struct is freed on the worker if the timeout expired, > otherwise on the function that schedules the worker. > The timeout detection can be done by checking if the caller is > still waiting for completion or not by using completion_done() function. > > Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") > Cc: <stable@vger.kernel.org> > Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> > Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> > --- > drivers/crypto/intel/qat/qat_common/adf_aer.c | 22 ++++++++++++++----- > 1 file changed, 16 insertions(+), 6 deletions(-) Patch applied. Thanks.
diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index 3597e7605a14..9da2278bd5b7 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -130,7 +130,8 @@ static void adf_device_reset_worker(struct work_struct *work) if (adf_dev_restart(accel_dev)) { /* The device hanged and we can't restart it so stop here */ dev_err(&GET_DEV(accel_dev), "Restart device failed\n"); - if (reset_data->mode == ADF_DEV_RESET_ASYNC) + if (reset_data->mode == ADF_DEV_RESET_ASYNC || + completion_done(&reset_data->compl)) kfree(reset_data); WARN(1, "QAT: device restart failed. Device is unusable\n"); return; @@ -146,11 +147,19 @@ static void adf_device_reset_worker(struct work_struct *work) adf_dev_restarted_notify(accel_dev); clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); - /* The dev is back alive. Notify the caller if in sync mode */ - if (reset_data->mode == ADF_DEV_RESET_SYNC) - complete(&reset_data->compl); - else + /* + * The dev is back alive. Notify the caller if in sync mode + * + * If device restart will take a more time than expected, + * the schedule_reset() function can timeout and exit. This can be + * detected by calling the completion_done() function. In this case + * the reset_data structure needs to be freed here. + */ + if (reset_data->mode == ADF_DEV_RESET_ASYNC || + completion_done(&reset_data->compl)) kfree(reset_data); + else + complete(&reset_data->compl); } static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, @@ -183,8 +192,9 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, dev_err(&GET_DEV(accel_dev), "Reset device timeout expired\n"); ret = -EFAULT; + } else { + kfree(reset_data); } - kfree(reset_data); return ret; } return 0;