diff mbox series

drm/i915: Do not disable preemption for resets

Message ID 20230705093025.3689748-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Do not disable preemption for resets | expand

Commit Message

Tvrtko Ursulin July 5, 2023, 9:30 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
preempt disable section over the hardware reset callback to prepare the
driver for being able to reset from atomic contexts.

In retrospect I can see that the work item at a time was about removing
the struct mutex from the reset path. Code base also briefly entertained
the idea of doing the reset under stop_machine in order to serialize
userspace mmap and temporary glitch in the fence registers (see
eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex"),
but that never materialized and was soon removed in 2caffbf11762
("drm/i915: Revoke mmaps and prevent access to fence registers across
reset") and replaced with a SRCU based solution.

As such, as far as I can see, today we still have a requirement that
resets must not sleep (invoked from submission tasklets), but no need to
support invoking them from a truly atomic context.

Given that the preemption section is problematic on RT kernels, since the
uncore lock becomes a sleeping lock and so is invalid in such section,
lets try and remove it. Potential downside is that our short waits on GPU
to complete the reset may get extended if CPU scheduling interferes, but
in practice that probably isn't a deal breaker.

In terms of mechanics, since the preemption disabled block is being
removed we just need to replace a few of the wait_for_atomic macros into
busy looping versions which will work (and not complain) when called from
non-atomic sections.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

Comments

Sebastian Andrzej Siewior July 26, 2023, 1:25 p.m. UTC | #1
On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> preempt disable section over the hardware reset callback to prepare the
> driver for being able to reset from atomic contexts.
> 
…

This looks like what I tested previously. So

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thank you.

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian
Sebastian Andrzej Siewior Sept. 13, 2023, 8:08 a.m. UTC | #2
On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> preempt disable section over the hardware reset callback to prepare the
> driver for being able to reset from atomic contexts.
…

This missed the v6.6 merge window. Has this been dropped for some reason
or just missed by chance? Can this be still applied, please?
 
Sebastian
Valentin Schneider Sept. 13, 2023, 5:04 p.m. UTC | #3
On Wed, 13 Sept 2023 at 18:48, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> > preempt disable section over the hardware reset callback to prepare the
> > driver for being able to reset from atomic contexts.
> …
>
> This missed the v6.6 merge window. Has this been dropped for some reason
> or just missed by chance? Can this be still applied, please?
>

Just an FYI, but I happened to be looking at an internal bug report
for exactly this
error site, so +1 here :)

>
> Sebastian
>
Tvrtko Ursulin Sept. 20, 2023, 11:54 a.m. UTC | #4
On 13/09/2023 18:04, Valentin Schneider wrote:
> On Wed, 13 Sept 2023 at 18:48, Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
>>
>> On 2023-07-05 10:30:25 [+0100], Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
>>> preempt disable section over the hardware reset callback to prepare the
>>> driver for being able to reset from atomic contexts.
>> …
>>
>> This missed the v6.6 merge window. Has this been dropped for some reason
>> or just missed by chance? Can this be still applied, please?
>>
> 
> Just an FYI, but I happened to be looking at an internal bug report
> for exactly this
> error site, so +1 here :)

It looks I failed to collect an r-b before the summer break and so it 
fell off my radar. Definitely want to merge it so I will try again.

Regards,

Tvrtko
Andi Shyti Sept. 26, 2023, 9:18 a.m. UTC | #5
Hi Tvrtko,

> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
> preempt disable section over the hardware reset callback to prepare the
> driver for being able to reset from atomic contexts.
> 
> In retrospect I can see that the work item at a time was about removing
> the struct mutex from the reset path. Code base also briefly entertained
> the idea of doing the reset under stop_machine in order to serialize
> userspace mmap and temporary glitch in the fence registers (see
> eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex"),
> but that never materialized and was soon removed in 2caffbf11762
> ("drm/i915: Revoke mmaps and prevent access to fence registers across
> reset") and replaced with a SRCU based solution.
> 
> As such, as far as I can see, today we still have a requirement that
> resets must not sleep (invoked from submission tasklets), but no need to
> support invoking them from a truly atomic context.
> 
> Given that the preemption section is problematic on RT kernels, since the
> uncore lock becomes a sleeping lock and so is invalid in such section,
> lets try and remove it. Potential downside is that our short waits on GPU
> to complete the reset may get extended if CPU scheduling interferes, but
> in practice that probably isn't a deal breaker.
> 
> In terms of mechanics, since the preemption disabled block is being
> removed we just need to replace a few of the wait_for_atomic macros into
> busy looping versions which will work (and not complain) when called from
> non-atomic sections.

looks reasonable, few unrelated questions

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index e2152f75ba2e..6916eba3bd33 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -167,13 +167,13 @@ static int i915_do_reset(struct intel_gt *gt,
>  	/* Assert reset for at least 20 usec, and wait for acknowledgement. */

is this /20/50/ ?

>  	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
>  	udelay(50);
> -	err = wait_for_atomic(i915_in_reset(pdev), 50);
> +	err = _wait_for_atomic(i915_in_reset(pdev), 50, 0);

wait_for_atomic() waits in milliseconds, while _wait_for_atomic()
waits in microseconds, I think you need to update the timer.

Do you think we might need a wait_for_atomic_preempt() macro?

	err = wait_for_atomic_preempt(i915_in_reset(pdev), 50);

Thanks,
Andi
Tvrtko Ursulin Sept. 26, 2023, 10:03 a.m. UTC | #6
On 26/09/2023 10:18, Andi Shyti wrote:
> Hi Tvrtko,
> 
>> Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
>> preempt disable section over the hardware reset callback to prepare the
>> driver for being able to reset from atomic contexts.
>>
>> In retrospect I can see that the work item at a time was about removing
>> the struct mutex from the reset path. Code base also briefly entertained
>> the idea of doing the reset under stop_machine in order to serialize
>> userspace mmap and temporary glitch in the fence registers (see
>> eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex"),
>> but that never materialized and was soon removed in 2caffbf11762
>> ("drm/i915: Revoke mmaps and prevent access to fence registers across
>> reset") and replaced with a SRCU based solution.
>>
>> As such, as far as I can see, today we still have a requirement that
>> resets must not sleep (invoked from submission tasklets), but no need to
>> support invoking them from a truly atomic context.
>>
>> Given that the preemption section is problematic on RT kernels, since the
>> uncore lock becomes a sleeping lock and so is invalid in such section,
>> lets try and remove it. Potential downside is that our short waits on GPU
>> to complete the reset may get extended if CPU scheduling interferes, but
>> in practice that probably isn't a deal breaker.
>>
>> In terms of mechanics, since the preemption disabled block is being
>> removed we just need to replace a few of the wait_for_atomic macros into
>> busy looping versions which will work (and not complain) when called from
>> non-atomic sections.
> 
> looks reasonable, few unrelated questions
> 
>> ---
>>   drivers/gpu/drm/i915/gt/intel_reset.c | 12 +++++-------
>>   1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index e2152f75ba2e..6916eba3bd33 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -167,13 +167,13 @@ static int i915_do_reset(struct intel_gt *gt,
>>   	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
> 
> is this /20/50/ ?

Unrelated change but okay.

> 
>>   	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
>>   	udelay(50);
>> -	err = wait_for_atomic(i915_in_reset(pdev), 50);
>> +	err = _wait_for_atomic(i915_in_reset(pdev), 50, 0);
> 
> wait_for_atomic() waits in milliseconds, while _wait_for_atomic()
> waits in microseconds, I think you need to update the timer.

Ah.. well spotted!

> Do you think we might need a wait_for_atomic_preempt() macro?
> 
> 	err = wait_for_atomic_preempt(i915_in_reset(pdev), 50);

I don't see what it would do? _wait_for_atomic when ATOMIC == 0 already 
enables preemption. To allow passing in milliseconds? I fear one more 
macro would create more confusion.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index e2152f75ba2e..6916eba3bd33 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -167,13 +167,13 @@  static int i915_do_reset(struct intel_gt *gt,
 	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
 	udelay(50);
-	err = wait_for_atomic(i915_in_reset(pdev), 50);
+	err = _wait_for_atomic(i915_in_reset(pdev), 50, 0);
 
 	/* Clear the reset request. */
 	pci_write_config_byte(pdev, I915_GDRST, 0);
 	udelay(50);
 	if (!err)
-		err = wait_for_atomic(!i915_in_reset(pdev), 50);
+		err = _wait_for_atomic(!i915_in_reset(pdev), 50, 0);
 
 	return err;
 }
@@ -193,7 +193,7 @@  static int g33_do_reset(struct intel_gt *gt,
 	struct pci_dev *pdev = to_pci_dev(gt->i915->drm.dev);
 
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
-	return wait_for_atomic(g4x_reset_complete(pdev), 50);
+	return _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 }
 
 static int g4x_do_reset(struct intel_gt *gt,
@@ -210,7 +210,7 @@  static int g4x_do_reset(struct intel_gt *gt,
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
-	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
+	ret =  _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 	if (ret) {
 		GT_TRACE(gt, "Wait for media reset failed\n");
 		goto out;
@@ -218,7 +218,7 @@  static int g4x_do_reset(struct intel_gt *gt,
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
-	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
+	ret =  _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 	if (ret) {
 		GT_TRACE(gt, "Wait for render reset failed\n");
 		goto out;
@@ -788,9 +788,7 @@  int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 		reset_mask = wa_14015076503_start(gt, engine_mask, !retry);
 
 		GT_TRACE(gt, "engine_mask=%x\n", reset_mask);
-		preempt_disable();
 		ret = reset(gt, reset_mask, retry);
-		preempt_enable();
 
 		wa_14015076503_end(gt, reset_mask);
 	}