diff mbox series

[v1] dmaengine: dmatest: Fix dmatest waiting less when interrupted

Message ID 20250305230007.590178-1-vinicius.gomes@intel.com (mailing list archive)
State Accepted
Commit e87ca16e99118ab4e130a41bdf12abbf6a87656c
Headers show
Series [v1] dmaengine: dmatest: Fix dmatest waiting less when interrupted | expand

Commit Message

Vinicius Costa Gomes March 5, 2025, 11 p.m. UTC
Change the "wait for operation finish" logic to take interrupts into
account.

When using dmatest with idxd DMA engine, it's possible that during
longer tests, the interrupt notifying the finish of an operation
happens during wait_event_freezable_timeout(), which causes dmatest to
cleanup all the resources, some of which might still be in use.

This fix ensures that the wait logic correctly handles interrupts,
preventing premature cleanup of resources.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171134.8c403348-lkp@intel.com
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/dma/dmatest.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Dave Jiang March 5, 2025, 11:14 p.m. UTC | #1
On 3/5/25 4:00 PM, Vinicius Costa Gomes wrote:
> Change the "wait for operation finish" logic to take interrupts into
> account.
> 
> When using dmatest with idxd DMA engine, it's possible that during
> longer tests, the interrupt notifying the finish of an operation
> happens during wait_event_freezable_timeout(), which causes dmatest to
> cleanup all the resources, some of which might still be in use.
> 
> This fix ensures that the wait logic correctly handles interrupts,
> preventing premature cleanup of resources.
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202502171134.8c403348-lkp@intel.com
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dma/dmatest.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
> index 91b2fbc0b864..d891dfca358e 100644
> --- a/drivers/dma/dmatest.c
> +++ b/drivers/dma/dmatest.c
> @@ -841,9 +841,9 @@ static int dmatest_func(void *data)
>  		} else {
>  			dma_async_issue_pending(chan);
>  
> -			wait_event_freezable_timeout(thread->done_wait,
> -					done->done,
> -					msecs_to_jiffies(params->timeout));
> +			wait_event_timeout(thread->done_wait,
> +					   done->done,
> +					   msecs_to_jiffies(params->timeout));
>  
>  			status = dma_async_is_tx_complete(chan, cookie, NULL,
>  							  NULL);
Vinod Koul March 10, 2025, 9:06 p.m. UTC | #2
On Wed, 05 Mar 2025 15:00:06 -0800, Vinicius Costa Gomes wrote:
> Change the "wait for operation finish" logic to take interrupts into
> account.
> 
> When using dmatest with idxd DMA engine, it's possible that during
> longer tests, the interrupt notifying the finish of an operation
> happens during wait_event_freezable_timeout(), which causes dmatest to
> cleanup all the resources, some of which might still be in use.
> 
> [...]

Applied, thanks!

[1/1] dmaengine: dmatest: Fix dmatest waiting less when interrupted
      commit: e87ca16e99118ab4e130a41bdf12abbf6a87656c

Best regards,
Nathan Lynch March 12, 2025, 6:58 p.m. UTC | #3
Hi,

Vinicius Costa Gomes <vinicius.gomes@intel.com> writes:
> Change the "wait for operation finish" logic to take interrupts into
> account.
>
> When using dmatest with idxd DMA engine, it's possible that during
> longer tests, the interrupt notifying the finish of an operation
> happens during wait_event_freezable_timeout(), which causes dmatest to
> cleanup all the resources, some of which might still be in use.
>
> This fix ensures that the wait logic correctly handles interrupts,
> preventing premature cleanup of resources.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202502171134.8c403348-lkp@intel.com

Given the report at the URL above I'm struggling to follow the rationale
for this change. It looks like a use-after-free in idxd while
resetting/unbinding the device, and I can't see how changing whether
dmatest threads perform freezeable waits would change this.

Note the idxd code emits a couple of warnings about inconsistent state
before the UAF:

[   81.023244][ T1644] idxd dsa0: Active wq 0 on disable wq0.0.
[   81.040447][ T1644] idxd 0000:6a:01.0: Clients has claim on wq 0: 1

Here is the bad access:

BUG: KASAN: slab-use-after-free in idxd_dma_complete_txd+0x418/0x510 [idxd]
Write of size 4 at addr ff11000134978114 by task kworker/118:1/1644
CPU: 118 UID: 0 PID: 1644 Comm: kworker/118:1 Tainted: G S                 6.13.0-rc1-00054-g98d187a98903 #1
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5C7411.86B.8118.D04.2206151341 06/15/2022
Workqueue: 0000:6a:01.0 idxd_device_flr [idxd]
Call Trace:
 <TASK>
 dump_stack_lvl+0x4f/0x70
 print_address_description+0x2c/0x3a0
 print_report+0xb9/0x280
 kasan_report+0xaa/0xe0
 idxd_dma_complete_txd+0x418/0x510 [idxd]
 idxd_flush_pending_descs+0x4a8/0x7e0 [idxd]
 idxd_wq_free_irq+0xcd/0x330 [idxd]
 idxd_drv_disable_wq+0x125/0x2d0 [idxd]
 idxd_dmaengine_drv_remove+0x1fd/0x2f0 [idxd]
 device_release_driver_internal+0x36d/0x530
 idxd_device_drv_remove+0xa0/0x240 [idxd]
 device_release_driver_internal+0x36d/0x530
 idxd_reset_done+0x600/0x770 [idxd]
 pci_reset_function+0x1c9/0x230
 idxd_device_flr+0x34/0x90 [idxd]
 process_one_work+0x676/0x1000
 worker_thread+0x710/0xf40
 kthread+0x2d4/0x3c0
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Here is the allocation site:

Allocated by task 3664:
 kasan_save_stack+0x1c/0x40
 kasan_save_track+0x10/0x30
 __kasan_kmalloc+0x8b/0x90
 idxd_dmaengine_drv_probe+0x2eb/0x860 [idxd]
 really_probe+0x1e0/0x920
 __driver_probe_device+0x18c/0x3d0
 device_driver_attach+0xae/0x1b0
 bind_store+0xc9/0x140
 kernfs_fop_write_iter+0x2e6/0x4c0

And here is where the memory was released earlier:

Freed by task 1644:
 kasan_save_stack+0x1c/0x40
 kasan_save_track+0x10/0x30
 kasan_save_free_info+0x37/0x60
 __kasan_slab_free+0x33/0x40
 kfree+0xef/0x3e0
 idxd_dmaengine_drv_remove+0x1cb/0x2f0 [idxd]
 device_release_driver_internal+0x36d/0x530
 idxd_device_drv_remove+0xa0/0x240 [idxd]
 device_release_driver_internal+0x36d/0x530
 idxd_reset_done+0x600/0x770 [idxd]
 pci_reset_function+0x1c9/0x230
 idxd_device_flr+0x34/0x90 [idxd]
 process_one_work+0x676/0x1000
 worker_thread+0x710/0xf40
 kthread+0x2d4/0x3c0
 ret_from_fork+0x2d/0x70


> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/dma/dmatest.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
> index 91b2fbc0b864..d891dfca358e 100644
> --- a/drivers/dma/dmatest.c
> +++ b/drivers/dma/dmatest.c
> @@ -841,9 +841,9 @@ static int dmatest_func(void *data)
>  		} else {
>  			dma_async_issue_pending(chan);
>  
> -			wait_event_freezable_timeout(thread->done_wait,
> -					done->done,
> -					msecs_to_jiffies(params->timeout));
> +			wait_event_timeout(thread->done_wait,
> +					   done->done,
> +					   msecs_to_jiffies(params->timeout));
>  
>  			status = dma_async_is_tx_complete(chan, cookie, NULL,
>  							  NULL);
> -- 
> 2.48.1
Vinicius Costa Gomes March 12, 2025, 10:13 p.m. UTC | #4
Hi Nathan,

Nathan Lynch <nathan.lynch@amd.com> writes:

> Hi,
>
> Vinicius Costa Gomes <vinicius.gomes@intel.com> writes:
>> Change the "wait for operation finish" logic to take interrupts into
>> account.
>>
>> When using dmatest with idxd DMA engine, it's possible that during
>> longer tests, the interrupt notifying the finish of an operation
>> happens during wait_event_freezable_timeout(), which causes dmatest to
>> cleanup all the resources, some of which might still be in use.
>>
>> This fix ensures that the wait logic correctly handles interrupts,
>> preventing premature cleanup of resources.
>>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202502171134.8c403348-lkp@intel.com
>
> Given the report at the URL above I'm struggling to follow the rationale
> for this change. It looks like a use-after-free in idxd while
> resetting/unbinding the device, and I can't see how changing whether
> dmatest threads perform freezeable waits would change this.
>

I think that the short version is that the reproducition script triggers
different problems on different platforms/configurations.

The solution I proposed fixes a problem I was seeing on a SPR system, on
a GNR system (that I was only able to get later) I see something more similar
to this particular splat (currently working on the fix).

Seeing this question, I realize that I should have added a note to the
commit detailing this.

So I am planning on proposing two (this and another) fixes for the same
report, hoping that it's not that confusing/unusual.


Cheers,
diff mbox series

Patch

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 91b2fbc0b864..d891dfca358e 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -841,9 +841,9 @@  static int dmatest_func(void *data)
 		} else {
 			dma_async_issue_pending(chan);
 
-			wait_event_freezable_timeout(thread->done_wait,
-					done->done,
-					msecs_to_jiffies(params->timeout));
+			wait_event_timeout(thread->done_wait,
+					   done->done,
+					   msecs_to_jiffies(params->timeout));
 
 			status = dma_async_is_tx_complete(chan, cookie, NULL,
 							  NULL);