diff mbox series

[1/1] drm/i915/gsc: Fix the Driver-FLR completion

Message ID 20230119194955.2426167-1-alan.previn.teres.alexis@intel.com (mailing list archive)
State New, archived

Commit Message

Teres Alexis, Alan Previn Jan. 19, 2023, 7:49 p.m. UTC
The Driver-FLR flow may exit early, before the internal HW state has
been fully re-initialized, if we only poll GU_DEBUG Bit31 (waiting for
it to toggle from 0 -> 1). Instead we need a two-step
wait-for-completion flow that also involves GU_CNTL. See the patch
and the new code comments for details.
This is new guidance from the HW architecture folks.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
---
 drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
 1 file changed, 7 insertions(+)


base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13

Comments

Rodrigo Vivi Jan. 19, 2023, 7:57 p.m. UTC | #1
On Thu, Jan 19, 2023 at 11:49:55AM -0800, Alan Previn wrote:
> The Driver-FLR flow may inadvertently exit early before the full
> completion of the re-init of the internal HW state if we only poll
> GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
> we need a two-step completion wait-for-completion flow that also
> involves GU_CNTL. See the patch and new code comments for detail.
> This is new direction from HW architecture folks.

Do we have this documented anywhere?

but the patch looks good to me...

> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 8dee9e62a73e..959869e2ff05 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2748,6 +2748,12 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  	/* Trigger the actual Driver-FLR */
>  	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
>  
> +	/* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
> +	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> +					 DRIVERFLR_STATUS, 0,
> +					 flr_timeout_ms);
> +
> +	/* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */
>  	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
>  					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
>  					 flr_timeout_ms);
> @@ -2756,6 +2762,7 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  		return;
>  	}
>  
> +	/* Write 1 to clear GU_DEBUG's sticky completion status bit */
>  	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
>  }
>  
> 
> base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13
> -- 
> 2.39.0
>
Teres Alexis, Alan Previn Jan. 19, 2023, 9:34 p.m. UTC | #2
Forwarded offline. Let's hold off on an R-b or on merging until I verify that the HW spec update is finalized and matches this patch exactly (probably a minor delay).

On Thu, 2023-01-19 at 14:57 -0500, Vivi, Rodrigo wrote:
> On Thu, Jan 19, 2023 at 11:49:55AM -0800, Alan Previn wrote:
> > The Driver-FLR flow may inadvertently exit early before the full
> > completion of the re-init of the internal HW state if we only poll
> > GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
> > we need a two-step completion wait-for-completion flow that also
> > involves GU_CNTL. See the patch and new code comments for detail.
> > This is new direction from HW architecture folks.
> 
> Do we have this documented anywhere?
> 
> but the patch looks good to me...
> 
> > 
> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> > Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 8dee9e62a73e..959869e2ff05 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -2748,6 +2748,12 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
> >         /* Trigger the actual Driver-FLR */
> >         intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
> >  
> > +       /* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
> > +       ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> > +                                        DRIVERFLR_STATUS, 0,
> > +                                        flr_timeout_ms);
> > +
> > +       /* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */
> >         ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
> >                                          DRIVERFLR_STATUS, DRIVERFLR_STATUS,
> >                                          flr_timeout_ms);
> > @@ -2756,6 +2762,7 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
> >                 return;
> >         }
> >  
> > +       /* Write 1 to clear GU_DEBUG's sticky completion status bit */
> >         intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
> >  }
> >  
> > 
> > base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13
> > -- 
> > 2.39.0
> >
Gupta, Anshuman Jan. 20, 2023, 8:27 a.m. UTC | #3
> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Alan
> Previn
> Sent: Friday, January 20, 2023 1:20 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Vivi@freedesktop.org; dri-devel@lists.freedesktop.org; Teres Alexis,
> Alan Previn <alan.previn.teres.alexis@intel.com>; Vivi, Rodrigo
> <rodrigo.vivi@intel.com>
> Subject: [Intel-gfx] [PATCH 1/1] drm/i915/gsc: Fix the Driver-FLR completion
> 
> The Driver-FLR flow may inadvertently exit early before the full completion
> of the re-init of the internal HW state if we only poll GU_DEBUG Bit31 (polling
> for it to toggle from 0 -> 1). Instead we need a two-step completion wait-for-
> completion flow that also involves GU_CNTL. See the patch and new code
> comments for detail.
> This is new direction from HW architecture folks.
> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 8dee9e62a73e..959869e2ff05 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2748,6 +2748,12 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  	/* Trigger the actual Driver-FLR */
>  	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
> 
> +	/* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
> +	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> +					 DRIVERFLR_STATUS, 0,
> +					 flr_timeout_ms);
We need an error check here: if the wait above times out, the wait below is
essentially a NOP, and the driver may return before the FLR has completed.
Thanks,
Anshuman Gupta.
> +
> +	/* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */
>  	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
>  					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
>  					 flr_timeout_ms);
> @@ -2756,6 +2762,7 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  		return;
>  	}
> 
> +	/* Write 1 to clear GU_DEBUG's sticky completion status bit */
>  	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
>  }
> 
> 
> base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13
> --
> 2.39.0
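
The concern in the reply above, that the result of the first wait is overwritten before anyone looks at it, can be shown with a small stand-alone model. This is only a sketch: `fake_wait`, `flr_wait_unchecked` and `flr_wait_checked` are hypothetical names invented for illustration and are not part of i915. The stub stands in for a register poll that returns 0 on success and -ETIMEDOUT on timeout.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a register poll: 0 on success, -ETIMEDOUT on timeout. */
static int fake_wait(bool will_succeed)
{
	return will_succeed ? 0 : -ETIMEDOUT;
}

/* Shape of the posted patch: the second assignment overwrites the first result. */
static int flr_wait_unchecked(bool step1_ok, bool step2_ok)
{
	int ret;

	ret = fake_wait(step1_ok);	/* result is never inspected */
	ret = fake_wait(step2_ok);
	return ret;
}

/* Shape the review asks for: stop as soon as a step fails. */
static int flr_wait_checked(bool step1_ok, bool step2_ok)
{
	int ret;

	ret = fake_wait(step1_ok);
	if (ret)
		return ret;

	return fake_wait(step2_ok);
}
```

In this model, a timeout in step 1 followed by a passing step 2 is reported as success by `flr_wait_unchecked` and as -ETIMEDOUT by `flr_wait_checked`. How the real driver should report the failure is left to the follow-up revision of the patch.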
Jani Nikula Jan. 20, 2023, 9:14 a.m. UTC | #4
On Thu, 19 Jan 2023, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
> The Driver-FLR flow may inadvertently exit early before the full
> completion of the re-init of the internal HW state if we only poll
> GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
> we need a two-step completion wait-for-completion flow that also
> involves GU_CNTL. See the patch and new code comments for detail.
> This is new direction from HW architecture folks.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 8dee9e62a73e..959869e2ff05 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2748,6 +2748,12 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  	/* Trigger the actual Driver-FLR */
>  	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
>  
> +	/* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */

Please don't use comments to repeat what the code already says.

Here, you could just say, "Wait for hardware teardown to complete",
which describes what the code does at a higher level, but does not
duplicate any of it.

> +	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> +					 DRIVERFLR_STATUS, 0,
> +					 flr_timeout_ms);
> +
> +	/* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */

"Wait for hardware/firmware re-init to complete"

>  	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
>  					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
>  					 flr_timeout_ms);
> @@ -2756,6 +2762,7 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
>  		return;
>  	}
>  
> +	/* Write 1 to clear GU_DEBUG's sticky completion status bit */

"Clear sticky completion status" maybe?

>  	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
>  }
>  
>
> base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13
Teres Alexis, Alan Previn Jan. 20, 2023, 4:41 p.m. UTC | #5
On Fri, 2023-01-20 at 08:27 +0000, Gupta, Anshuman wrote:
> 
> 
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Alan
> > Previn
> > Sent: Friday, January 20, 2023 1:20 AM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: Vivi@freedesktop.org; dri-devel@lists.freedesktop.org; Teres Alexis,
> > Alan Previn <alan.previn.teres.alexis@intel.com>; Vivi, Rodrigo
> > <rodrigo.vivi@intel.com>
> > Subject: [Intel-gfx] [PATCH 1/1] drm/i915/gsc: Fix the Driver-FLR completion
> > 
> > 
alan:snip..

> > +       /* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
> > +       ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> > +                                        DRIVERFLR_STATUS, 0,
> > +                                        flr_timeout_ms);
> We need an error check here: if the wait above times out, the wait below is
> essentially a NOP, and the driver may return before the FLR has completed.
> Thanks,
> Anshuman Gupta.

alan: my bad - good catch - will fix.

alan:snip..
Teres Alexis, Alan Previn Jan. 20, 2023, 4:42 p.m. UTC | #6
Thanks for reviewing - sounds good - will fix those comments up as per your recommendation.

On Fri, 2023-01-20 at 11:14 +0200, Jani Nikula wrote:
> On Thu, 19 Jan 2023, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
> > The Driver-FLR flow may inadvertently exit early before the full
> > completion of the re-init of the internal HW state if we only poll
> > GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
> > we need a two-step completion wait-for-completion flow that also
> > involves GU_CNTL. See the patch and new code comments for detail.
> > This is new direction from HW architecture folks.
> > 
> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> > Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 8dee9e62a73e..959869e2ff05 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -2748,6 +2748,12 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
> >         /* Trigger the actual Driver-FLR */
> >         intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
> >  
> > +       /* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
> 
> Please don't use comments to repeat what the code already says.
> 
> Here, you could just say, "Wait for hardware teardown to complete",
> which describes what the code does at a higher level, but does not
> duplicate any of it.
> 
> > +       ret = intel_wait_for_register_fw(uncore, GU_CNTL,
> > +                                        DRIVERFLR_STATUS, 0,
> > +                                        flr_timeout_ms);
> > +
> > +       /* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */
> 
> "Wait for hardware/firmware re-init to complete"
> 
> >         ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
> >                                          DRIVERFLR_STATUS, DRIVERFLR_STATUS,
> >                                          flr_timeout_ms);
> > @@ -2756,6 +2762,7 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
> >                 return;
> >         }
> >  
> > +       /* Write 1 to clear GU_DEBUG's sticky completion status bit */
> 
> "Clear sticky completion status" maybe?
> 
> >         intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
> >  }
> >  
> > 
> > base-commit: 0a0ee61784df01ac098a92bd43673ee30c629f13
>
Patch

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 8dee9e62a73e..959869e2ff05 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2748,6 +2748,12 @@  static void driver_initiated_flr(struct intel_uncore *uncore)
 	/* Trigger the actual Driver-FLR */
 	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
 
+	/* Completion Step 1 - poll for 'CNTL-BIT31 = 0' wait for hw teardown to complete */
+	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
+					 DRIVERFLR_STATUS, 0,
+					 flr_timeout_ms);
+
+	/* Completion: Step 2 - poll for 'DEBUG-BIT31 = 1' for hw/fw re-init to complete */
 	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
 					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
 					 flr_timeout_ms);
@@ -2756,6 +2762,7 @@  static void driver_initiated_flr(struct intel_uncore *uncore)
 		return;
 	}
 
+	/* Write 1 to clear GU_DEBUG's sticky completion status bit */
 	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
 }
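
The two-step completion flow in the patch, combined with the error check requested in review, can be modelled outside the kernel as follows. This is a hedged sketch, not the driver code: `struct fake_hw`, `fake_hw_tick`, `fake_wait` and `fake_driver_flr` are hypothetical names, the poll counts are made up, and the hardware is simulated as "GU_CNTL bit 31 clears after teardown, then GU_DEBUG bit 31 sets after re-init", which is only how the patch comments describe it.

```c
#include <errno.h>
#include <stdint.h>

#define FLR_BIT (UINT32_C(1) << 31)	/* bit 31, as named in the patch comments */

/* Simulated device state; all fields are assumptions made for this model. */
struct fake_hw {
	uint32_t gu_cntl;
	uint32_t gu_debug;
	int teardown_polls;	/* polls until GU_CNTL bit 31 clears */
	int reinit_polls;	/* further polls until GU_DEBUG bit 31 sets */
};

/* Advance the simulated hardware by one poll interval. */
static void fake_hw_tick(struct fake_hw *hw)
{
	if (hw->gu_cntl & FLR_BIT) {
		if (--hw->teardown_polls <= 0)
			hw->gu_cntl &= ~FLR_BIT;
	} else if (!(hw->gu_debug & FLR_BIT)) {
		if (--hw->reinit_polls <= 0)
			hw->gu_debug |= FLR_BIT;
	}
}

/* Poll until (*reg & mask) == value, giving up after max_polls ticks. */
static int fake_wait(struct fake_hw *hw, const uint32_t *reg,
		     uint32_t mask, uint32_t value, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if ((*reg & mask) == value)
			return 0;
		fake_hw_tick(hw);
	}
	return (*reg & mask) == value ? 0 : -ETIMEDOUT;
}

static int fake_driver_flr(struct fake_hw *hw, int max_polls)
{
	int ret;

	hw->gu_cntl |= FLR_BIT;	/* trigger the reset */

	/* Step 1: wait for hardware teardown to complete. */
	ret = fake_wait(hw, &hw->gu_cntl, FLR_BIT, 0, max_polls);
	if (ret)
		return ret;

	/* Step 2: wait for hardware/firmware re-init to complete. */
	ret = fake_wait(hw, &hw->gu_debug, FLR_BIT, FLR_BIT, max_polls);
	if (ret)
		return ret;

	/* Clear sticky completion status (models a write-1-to-clear bit). */
	hw->gu_debug &= ~FLR_BIT;
	return 0;
}
```

With a device initialized as `{ 0, 0, 3, 2 }` and a budget of 10 polls, `fake_driver_flr` returns 0 and leaves both bits clear. With a teardown that needs 100 polls and a budget of 5, it returns -ETIMEDOUT from step 1 and never reaches step 2, which is the behaviour the review comments ask the real function to have.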