diff mbox series

[2/3] drm/i915: Break up error capture compression loops with cond_resched()

Message ID 20200916090059.3189-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/i915/gt: Show engine properties in the pretty printer | expand

Commit Message

Chris Wilson Sept. 16, 2020, 9 a.m. UTC
As the error capture will compress user buffers as directed to by the
user, it can take an arbitrary amount of time and space. Break up the
compression loops with a call to cond_resched(), that will allow other
processes to schedule (avoiding the soft lockups) and also serve as a
warning should we try to make this loop atomic in the future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Sasha Levin Sept. 21, 2020, 12:54 p.m. UTC | #1
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.10, v5.4.66, v4.19.146, v4.14.198, v4.9.236, v4.4.236.

v5.8.10: Build OK!
v5.4.66: Build OK!
v4.19.146: Build OK!
v4.14.198: Build OK!
v4.9.236: Failed to apply! Possible dependencies:
    0a97015d45ee ("drm/i915: Compress GPU objects in error state")
    83bc0f5b432f ("drm/i915: Handle incomplete Z_FINISH for compressed error states")
    95374d759ac7 ("drm/i915: Always use the GTT for error capture")
    98a2f411671f ("drm/i915: Allow disabling error capture")
    9f267eb8d2ea ("drm/i915: Stop the machine whilst capturing the GPU crash dump")
    d636951ec01b ("drm/i915: Cleanup instdone collection")
    fc4c79c37e82 ("drm/i915: Consolidate error object printing")

v4.4.236: Failed to apply! Possible dependencies:
    0a97015d45ee ("drm/i915: Compress GPU objects in error state")
    0bc40be85f33 ("drm/i915: Rename intel_engine_cs function parameters")
    688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
    755412e29c77 ("drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER")
    98a2f411671f ("drm/i915: Allow disabling error capture")
    ca82580c9cea ("drm/i915: Do not call API requiring struct_mutex where it is not available")
    cbdc12a9fc9d ("drm/i915: make A0 wa's applied to A1")
    e87a005d90c3 ("drm/i915: add helpers for platform specific revision id range checks")
    ef712bb4b700 ("drm/i915: remove parens around revision ids")
    fffda3f4fb49 ("drm/i915/bxt: add revision id for A1 stepping and use it")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?
Pavel Machek Nov. 9, 2020, 10:24 a.m. UTC | #2
Hi!

> As the error capture will compress user buffers as directed to by the
> user, it can take an arbitrary amount of time and space. Break up the
> compression loops with a call to cond_resched(), that will allow other
> processes to schedule (avoiding the soft lockups) and also serve as a
> warning should we try to make this loop atomic in the future.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: stable@vger.kernel.org
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

This was queued for 4.19-stable, but is very likely wrong.

> @@ -397,6 +399,7 @@ static int compress_page(struct i915_vma_compress *c,
>  	if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE)))
>  		memcpy(ptr, src, PAGE_SIZE);
>  	dst->pages[dst->page_count++] = ptr;
> +	cond_resched();
>  
>  	return 0;
>  }

4.19 compress_page begins with

static int compress_page(struct compress *c,
...
        page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);

and likely may not sleep. That changed with commit
a42f45a2a85998453078, but that one is not present in 4.19..

I believe we don't need this in stable: dumping of error file will not
take so long to trigger softlockup detectors... and if userland access
blocked, we would be able to reschedule, anyway.

Best regards,
								Pavel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 3e6cbb0d1150..a635ec8d0b94 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -311,6 +311,8 @@  static int compress_page(struct i915_vma_compress *c,
 
 		if (zlib_deflate(zstream, Z_NO_FLUSH) != Z_OK)
 			return -EIO;
+
+		cond_resched();
 	} while (zstream->avail_in);
 
 	/* Fallback to uncompressed if we increase size? */
@@ -397,6 +399,7 @@  static int compress_page(struct i915_vma_compress *c,
 	if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE)))
 		memcpy(ptr, src, PAGE_SIZE);
 	dst->pages[dst->page_count++] = ptr;
+	cond_resched();
 
 	return 0;
 }