
[RFC] tentative fix for drm/i915/gt regression on preempt-rt

Message ID ZJTuDi1HNp9L2zxY@windriver.com (mailing list archive)
State New, archived
Series [RFC] tentative fix for drm/i915/gt regression on preempt-rt

Commit Message

Paul Gortmaker June 23, 2023, 12:57 a.m. UTC
Back in September, I reported a PM regression pinpointed by bisect:

https://lore.kernel.org/linux-rt-users/20220916181934.GA16961@windriver.com/T/#u

Around Feb, I checked and didn't see any relevant mainline changes, so I
decided to take a look at it myself, which is when I saw the eight patches
from Sebastian that included a trial of converting the uncore->lock to raw:

https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/

With a focus on just the suspend/resume regression reported through me, I
thought I had a way to perhaps at least fix that one issue.  I didn't have
the right hardware to reproduce it, but I was fairly sure it was clear what
was happening (as per the details in the commit log).

I was pretty sure my change would fix the BUG(), but since I've never touched
i915 before, I couldn't quite be 100% confident my reasoning wasn't opening
a door to some other locking issue.  So I put it in our internal soak pool in
late March, and since it fixed the BUG() and didn't trigger any new splats,
I'd largely forgotten about it.

OK, with all that context, I'll finally get to the point.  It would be nice
if others who have worked on i915/rt could take a look at it and pick it apart.

This is still the v5.15-rt version, but I just checked mainline and also
linux-rt-devel, and I'm not seeing any reason to believe it has been fixed yet.

(Oh, and I'm told the impacted board is NUC7i5DNK1E -- and others?)

Thanks,
Paul.

--

From 20200bee7252d4c9c2a748a9e90fba33f2da9734 Mon Sep 17 00:00:00 2001
From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 9 Feb 2023 21:42:25 -0500
Subject: [PATCH] drm/i915/gt: don't use preempt_disable/enable around reset
 call

It turns out that a relatively recent commit breaks PM-suspend
operations on preempt-rt, on multiple versions, due to all the
linux-stable backports, including this v5.15 one:

  commit 0ee5874dad61d2b154a9e3db196fc33e8208ce1b
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date:   Tue Jul 12 16:21:32 2022 +0100

    drm/i915/gt: Serialize GRDOM access between multiple engine resets

    [ Upstream commit b24dcf1dc507f69ed3b5c66c2b6a0209ae80d4d4 ]

Below is an example of the regression on v5.15-rt, with the backport applied:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 45092, name: kworker/u8:4
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  INFO: lockdep is turned off.
  Preemption disabled at:
  [<ffffffffc0636522>] __intel_gt_reset+0x92/0x100 [i915]
  CPU: 3 PID: 45092 Comm: kworker/u8:4 Tainted: G        W  O      5.15.59-rt48-preempt-rt #1
  Hardware name: Intel(R) Client Systems NUC7i5DNKE/NUC7i5DNB, BIOS DNKBLi5v.86A.0064.2019.0523.1933 05/23/2019
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   <TASK>
   show_stack+0x52/0x5c
   dump_stack_lvl+0x5b/0x86
   dump_stack+0x10/0x16
   __might_resched.cold+0xf7/0x12f
   ? __gen6_reset_engines.constprop.0+0x80/0x80 [i915]
   rt_spin_lock+0x4e/0xf0
   ? gen8_reset_engines+0x2e/0x1e0 [i915]
   gen8_reset_engines+0x2e/0x1e0 [i915]
   ? __gen6_reset_engines.constprop.0+0x80/0x80 [i915]
   __intel_gt_reset+0x9d/0x100 [i915]
   gt_sanitize+0x16c/0x190 [i915]
   intel_gt_suspend_late+0x3d/0xc0 [i915]
   i915_gem_suspend_late+0x57/0x130 [i915]
   i915_drm_suspend_late+0x38/0x110 [i915]
   i915_pm_suspend_late+0x1d/0x30 [i915]
   pm_generic_suspend_late+0x28/0x40
   pci_pm_suspend_late+0x37/0x50
   ? pci_pm_poweroff_late+0x50/0x50
   dpm_run_callback.cold+0x3c/0xa8
   __device_suspend_late+0xa4/0x1e0
   async_suspend_late+0x20/0xa0
   async_run_entry_fn+0x28/0xc0
   process_one_work+0x239/0x6c0
   worker_thread+0x58/0x3e0
   kthread+0x1a9/0x1d0
   ? process_one_work+0x6c0/0x6c0
   ? set_kthread_struct+0x50/0x50
   ret_from_fork+0x1f/0x30
   </TASK>
  PM: late suspend of devices complete after 26.497 msecs

This happens because the reset code now takes uncore->lock, and that interacts
badly with commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic").

Specifically, this part, called out above with "Preemption disabled at:":

+         preempt_disable();
+         ret = reset(i915, engine_mask, retry);
+         preempt_enable();
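
Condensed from the splat above, the nesting on PREEMPT_RT looks roughly like
this (an illustrative sketch, not the literal code):

  __intel_gt_reset()
      preempt_disable();                  /* added by ade8a0f59844 */
      reset()  /* -> gen8_reset_engines() */
          spin_lock(&uncore->lock);       /* taken since b24dcf1dc507; a
                                             spinlock_t is a sleeping lock on
                                             RT, hence the __might_resched()
                                             splat above */
      preempt_enable();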

A conversion to a raw lock was considered independently in:

https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/

...but was ruled out for latency reasons.  However, I believe there is a
simpler solution, at least for this recent regression.  If uncore->lock now
wraps the reset functions, as of b24dcf1dc507 (and its backports), then we
are not in any way relying on the added calls to preempt_dis/enable() above
to ensure non-conflicting resets or similar.

Hence we can "downgrade" them to migrate_dis/enable() so the reset
functions can run as-is, without any raw lock conversions or similar.
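
For clarity, the semantics being relied on here, as a rough sketch
(illustrative only, not the actual driver code):

  preempt_disable();
  spin_lock(&uncore->lock);    /* sleeping lock on RT inside a preempt-off
                                  section -> the splat above */

  migrate_disable();
  spin_lock(&uncore->lock);    /* the task stays pinned to this CPU but may
                                  sleep on the lock; the lock itself still
                                  serializes the resets */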

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Comments

Sebastian Sewior June 30, 2023, 1:09 p.m. UTC | #1
On 2023-06-22 20:57:50 [-0400], Paul Gortmaker wrote:
[ longer report about what is broken.]

Commit ade8a0f598443 ("drm/i915: Make all GPU resets atomic") introduces
a preempt_disable() section around the invocation of the reset callback.
I can't find an explanation why this is needed. There was a comment
saying
| * We want to perform per-engine reset from atomic context (e.g.
| * softirq), which imposes the constraint that we cannot sleep.

but it does not state the problem with being preempted while waiting for
the reset. The commit itself does not explain why an atomic reset is
needed; it just states that it is a requirement now. On !RT we could
pull packets from a NIC and forward them to other NICs for 2ms.

I've been looking over the reset callbacks, and gen8_reset_engines() +
gen6_reset_engines() acquire a spinlock_t. Since this becomes a sleeping
lock on PREEMPT_RT, it must not be acquired with preemption disabled.
i915_do_reset() acquires no lock but then has two udelay()s of 50us
each. Not good latency-wise in a preempt-off section.
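
For illustration, the latency concern has roughly this shape (the helper
names below are made up; this is not the literal i915_do_reset()):

  preempt_disable();            /* from __intel_gt_reset() */
  assert_reset_in_hw();         /* hypothetical register/config write */
  udelay(50);                   /* busy-waits with preemption off, so the
                                   full 50us adds to this CPU's worst-case
                                   scheduling latency */
  deassert_reset_in_hw();       /* hypothetical */
  udelay(50);
  preempt_enable();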

Could someone please explain why atomic is needed during reset, and what
problems are introduced by a possible preemption?

Sebastian
Tvrtko Ursulin July 3, 2023, 3:30 p.m. UTC | #2
Hi,

On 30/06/2023 14:09, Sebastian Andrzej Siewior wrote:
> On 2023-06-22 20:57:50 [-0400], Paul Gortmaker wrote:
> [ longer report about what is broken.]
> 
> Commit ade8a0f598443 ("drm/i915: Make all GPU resets atomic") introduces
> a preempt_disable() section around the invocation of the reset callback.
> I can't find an explanation why this is needed. There was a comment
> saying
> | * We want to perform per-engine reset from atomic context (e.g.
> | * softirq), which imposes the constraint that we cannot sleep.
> 
> but it does not state the problem with being preempted while waiting for
> the reset. The commit itself does not explain why an atomic reset is
> needed; it just states that it is a requirement now. On !RT we could
> pull packets from a NIC and forward them to other NICs for 2ms.
> 
> I've been looking over the reset callbacks, and gen8_reset_engines() +
> gen6_reset_engines() acquire a spinlock_t. Since this becomes a sleeping
> lock on PREEMPT_RT, it must not be acquired with preemption disabled.
> i915_do_reset() acquires no lock but then has two udelay()s of 50us
> each. Not good latency-wise in a preempt-off section.
> 
> Could someone please explain why atomic is needed during reset, and what
> problems are introduced by a possible preemption?

The atomic requirement from that commit text is likely referring to removing
the old big sleeping mutex we had in the reset path. So it looks
plausible that the preempt_disable() section is not strictly needed, and
perhaps the motivation was simply, given those 20-50us polls on hw registers
involved, to make them happen as fast as possible and so minimize visual
glitching during resets.

Although that reasoning would only apply on some hw generations, where 
the irqsave spinlock is not held across the whole sequence anyway.

And I suspect those same platforms would be the annoying ones, if one 
simply wanted to try without the preempt_disable section, given our 
wait_for_atomic macro will complain loudly if not used from an atomic 
context.

But I think we do have a macro for short register waits which works with
preemption enabled. I will try and cook up a patch and submit it to our CI
during the week, then see what happens.
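
As a rough sketch of what that could look like (the condition and timeout
values here are made-up placeholders):

  /* current style: busy-waits, and complains if used outside atomic context */
  err = wait_for_atomic(reset_completed(gt), 10);

  /* preemptible variant for the same short register wait */
  err = wait_for_us(reset_completed(gt), 50);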

Or even moving the preempt_disable down so it just encompasses the
register write + wait. That would then be under the spinlock, which is
presumably okay on RT? (Yes, I know it wouldn't solve one half of your
"complaint", but let's just entertain the idea for now.)

Regards,

Tvrtko
Sebastian Sewior July 3, 2023, 4:12 p.m. UTC | #3
On 2023-07-03 16:30:01 [+0100], Tvrtko Ursulin wrote:
> Hi,
Hi,

> 
> The atomic requirement from that commit text is likely referring to removing
> the old big sleeping mutex we had in the reset path. So it looks plausible
> that the preempt_disable() section is not strictly needed, and perhaps the
> motivation was simply, given those 20-50us polls on hw registers involved, to
> make them happen as fast as possible and so minimize visual glitching during
> resets.
> 
> Although that reasoning would only apply on some hw generations, where the
> irqsave spinlock is not held across the whole sequence anyway.
> 
> And I suspect those same platforms would be the annoying ones, if one simply
> wanted to try without the preempt_disable section, given our wait_for_atomic
> macro will complain loudly if not used from an atomic context.

It does not complain on RT; I did patch it out:
   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch?h=linux-6.4.y-rt-patches

Interesting, the link there refers to an older posting, but this patch is
not included. For the other one I didn't receive any feedback, at which point
I stopped pushing and put it on the list for later…

> But I think we do have a macro for short register waits which works with
> preemption enabled. I will try and cook up a patch and submit it to our CI
> during the week, then see what happens.
>
> Or even moving the preempt_disable down so it just encompasses the register
> write + wait. That would then be under the spinlock, which is presumably okay
> on RT? (Yes, I know it wouldn't solve one half of your "complaint", but let's
> just entertain the idea for now.)

You can't
	preempt_disable();
	spin_lock();

You could
	spin_lock();
	preempt_disable();
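
(The first ordering puts the sleeping spin_lock() inside a preempt-disabled
section, which is exactly the pattern that blows up on RT; in the second
ordering the lock is taken while sleeping is still allowed.)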

But if there is no need then there is no need ;)
What I worry about a bit is the udelays…

Thanks!

> Regards,
> 
> Tvrtko

Sebastian
Tvrtko Ursulin July 4, 2023, 8:02 a.m. UTC | #4
On 03/07/2023 17:12, Sebastian Andrzej Siewior wrote:
> On 2023-07-03 16:30:01 [+0100], Tvrtko Ursulin wrote:
>> Hi,
> Hi,
> 
>>
>> The atomic requirement from that commit text is likely referring to removing
>> the old big sleeping mutex we had in the reset path. So it looks plausible
>> that the preempt_disable() section is not strictly needed, and perhaps the
>> motivation was simply, given those 20-50us polls on hw registers involved, to
>> make them happen as fast as possible and so minimize visual glitching during
>> resets.
>>
>> Although that reasoning would only apply on some hw generations, where the
>> irqsave spinlock is not held across the whole sequence anyway.
>>
>> And I suspect those same platforms would be the annoying ones, if one simply
>> wanted to try without the preempt_disable section, given our wait_for_atomic
>> macro will complain loudly if not used from an atomic context.
> 
> It does not complain on RT; I did patch it out:
>     https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch?h=linux-6.4.y-rt-patches

It does not complain when you patch out the complaint, correct. ;)

The only purpose of that complaint is to let developers know they are
needlessly using the atomic wait_for - the busy-looping one intended for
very short delays. So if those wait_for_atomic uses are not really needed (in
code paths which are not under a spinlock) I'll try converting them to
non-atomic wait_for_us. Or so, will see.

> Interesting, the link there refers to an older posting, but this patch is
> not included. For the other one I didn't receive any feedback, at which point
> I stopped pushing and put it on the list for later…
> 
>> But I think we do have a macro for short register waits which works with
>> preemption enabled. I will try and cook up a patch and submit it to our CI
>> during the week, then see what happens.
>>
>> Or even moving the preempt_disable down so it just encompasses the register
>> write + wait. That would then be under the spinlock, which is presumably okay
>> on RT? (Yes, I know it wouldn't solve one half of your "complaint", but let's
>> just entertain the idea for now.)
> 
> You can't
> 	preempt_disable();
> 	spin_lock();
> 
> You could
> 	spin_lock();
> 	preempt_disable();
> 
> But if there is no need then there is no need ;)
> What I worry about a bit is the udelays…

Let's make it a two-patch series and then see. First patch to see if we
can really get away without the top level preempt_disable, and then
second patch to see if we can get away with preemptible short sleeps too.

I guess on RT the top priority is consistent scheduling latency and not 
so much potential UI latency in some edge cases? Saying that because if 
waiting on the hw reset is made preemptible, _in theory_ it can prolong 
the reset completion (as observed by i915), and so result in more UI 
glitching than it normally would. Could be a theoretical point only
because it requires both CPU over-subscription and GPU hangs. It could also
easily be that the reset path is only one path, and not such an interesting
one even, which can cause this on RT.

Regards,

Tvrtko
Sebastian Sewior July 4, 2023, 9:25 a.m. UTC | #5
On 2023-07-04 09:02:07 [+0100], Tvrtko Ursulin wrote:

> Let's make it a two-patch series and then see. First patch to see if we can
> really get away without the top level preempt_disable, and then second patch
> to see if we can get away with preemptible short sleeps too.
oki.

> I guess on RT the top priority is consistent scheduling latency and not so
> much potential UI latency in some edge cases? Saying that because if waiting
I would say yes. If you do RT and you provide some kind of GUI then
you prefer to meet your deadlines for your RT load over some UI latency.

> on the hw reset is made preemptible, _in theory_ it can prolong the reset
> completion (as observed by i915), and so result in more UI glitching than it
> normally would. Could be a theoretical point only because it requires both
> CPU over-subscription and GPU hangs. It could also easily be that the reset
> path is only one path, and not such an interesting one even, which can cause this
> on RT.

I see.

> Regards,
> 
> Tvrtko

Sebastian

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 9dc244b70ce4..341833c364fe 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -736,9 +736,9 @@  int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
 	for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) {
 		GT_TRACE(gt, "engine_mask=%x\n", engine_mask);
-		preempt_disable();
+		migrate_disable();
 		ret = reset(gt, engine_mask, retry);
-		preempt_enable();
+		migrate_enable();
 	}
 	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);