diff mbox series

drm/i915/gt: Increase MCR lock timeout

Message ID 20231003210840.1173401-1-jonathan.cavitt@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/gt: Increase MCR lock timeout | expand

Commit Message

Cavitt, Jonathan Oct. 3, 2023, 9:08 p.m. UTC
Increase the timeout MCR waits for the steering semaphore
in intel_gt_mcr_lock by a factor of 10.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tvrtko Ursulin Oct. 4, 2023, 8:25 a.m. UTC | #1
On 03/10/2023 22:08, Jonathan Cavitt wrote:
> Increase the timeout MCR waits for the steering semaphore
> in intel_gt_mcr_lock by a factor of 10.

Ideally you mention why in the commit message, with some appropriate 
level of detail depending on the situation.

+Matt for MCR stuff.

Regards,

Tvrtko

> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> index 326c2ed1d99bb..e3f7fb1248809 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> @@ -378,7 +378,7 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
>   	 */
>   	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>   		err = wait_for(intel_uncore_read_fw(gt->uncore,
> -						    MTL_STEER_SEMAPHORE) == 0x1, 100);
> +						    MTL_STEER_SEMAPHORE) == 0x1, 1000);
>   
>   	/*
>   	 * Even on platforms with a hardware lock, we'll continue to grab
Matt Roper Oct. 4, 2023, 3:30 p.m. UTC | #2
On Wed, Oct 04, 2023 at 09:25:40AM +0100, Tvrtko Ursulin wrote:
> 
> On 03/10/2023 22:08, Jonathan Cavitt wrote:
> > Increase the timeout MCR waits for the steering semaphore
> > in intel_gt_mcr_lock by a factor of 10.
> 
> Ideally you mention why in the commit message, with some appropriate level
> of detail depending on the situation.
> 
> +Matt for MCR stuff.

I agree; there should be some better explanation for why we're making
this change.  In this case, there's some evidence that some external
runtime firmware (e.g., pcode, csme, etc.) is holding this lock longer
than we expected, but is eventually releasing it.  I.e., their code's
critical section is larger than anticipated.  Extending the timeout here
prevents us from giving up too early.

This is different than the problem we were seeing recently during
suspend/resume where some firmware agent was grabbing the lock and
never releasing it (no matter how long we waited).

The expectation is that anyone using this hardware lock (i915/Xe
drivers, various pieces of firmware, etc.) keep the critical section
where the lock is held relatively small, but there's no absolute
guidance on how long a critical section can be, but waiting a whole
second should hopefully prevent us from declaring the lock stuck
prematurely.

So with some kind of updated commit message here that explains the
situation better, the code change below looks fine to me.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>


Matt

> 
> Regards,
> 
> Tvrtko
> 
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > index 326c2ed1d99bb..e3f7fb1248809 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > @@ -378,7 +378,7 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
> >   	 */
> >   	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> >   		err = wait_for(intel_uncore_read_fw(gt->uncore,
> > -						    MTL_STEER_SEMAPHORE) == 0x1, 100);
> > +						    MTL_STEER_SEMAPHORE) == 0x1, 1000);
> >   	/*
> >   	 * Even on platforms with a hardware lock, we'll continue to grab
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
index 326c2ed1d99bb..e3f7fb1248809 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
@@ -378,7 +378,7 @@  void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
 	 */
 	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
 		err = wait_for(intel_uncore_read_fw(gt->uncore,
-						    MTL_STEER_SEMAPHORE) == 0x1, 100);
+						    MTL_STEER_SEMAPHORE) == 0x1, 1000);
 
 	/*
 	 * Even on platforms with a hardware lock, we'll continue to grab