drm/etnaviv: always start/stop scheduler in timeout processing

Message ID 20200824110248.5998-1-l.stach@pengutronix.de (mailing list archive)
State New, archived

Commit Message

Lucas Stach Aug. 24, 2020, 11:02 a.m. UTC
The drm scheduler currently expects that the stop/start sequence is always
executed in the timeout handling, as the job at the head of the hardware
execution list is always removed from the ring mirror before the driver
function is called and only inserted back into the list when starting the
scheduler.

This adds some unnecessary overhead if the timeout handler determines
that the GPU is still executing jobs normally and just wishes to extend
the timeout, but a better solution requires a major rearchitecture of the
scheduler, which is not suitable as a fix.

Fixes: 135517d3565b drm/scheduler: Avoid accessing freed bad job.)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
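
To illustrate the failure mode described in the commit message, here is a toy user-space model, not kernel code. Every name in it (toy_sched, toy_core_take_head, toy_stop_start, toy_spurious_timeout) is hypothetical, and the stop/start sequence is collapsed into a single step as a simplifying assumption. It only shows the bookkeeping: the head job is unlinked before the handler runs, so a handler that returns early without the stop/start sequence leaves the list one job short.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the DRM scheduler. */
struct toy_sched {
	int tracked;    /* jobs currently on the modelled ring mirror list */
	bool head_held; /* head job was unlinked before calling the handler */
};

/* Models the scheduler core: unlink the head job, then call the driver. */
static void toy_core_take_head(struct toy_sched *s)
{
	assert(s->tracked > 0);
	s->tracked--;
	s->head_held = true;
}

/* Models the whole stop/start sequence: the held job ends up back on the list. */
static void toy_stop_start(struct toy_sched *s)
{
	if (s->head_held) {
		s->tracked++;
		s->head_held = false;
	}
}

/*
 * Models a spurious timeout. With always_stop_start == false the handler
 * bails out early (old behaviour); with true it runs the full sequence
 * (behaviour after this patch). Returns the number of tracked jobs.
 */
static int toy_spurious_timeout(struct toy_sched *s, bool always_stop_start)
{
	toy_core_take_head(s);
	if (always_stop_start)
		toy_stop_start(s);
	return s->tracked;
}
```

Starting from three tracked jobs, the early-return variant ends with two and the always-stop/start variant ends with three.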

Comments

Russell King - ARM Linux admin Aug. 24, 2020, 11:54 a.m. UTC | #1
On Mon, Aug 24, 2020 at 01:02:48PM +0200, Lucas Stach wrote:
> The drm scheduler currently expects that the stop/start sequence is always
> executed in the timeout handling, as the job at the head of the hardware
> execution list is always removed from the ring mirror before the driver
> function is called and only inserted back into the list when starting the
> scheduler.
> 
> This adds some unnecessary overhead if the timeout handler determines
> that the GPU is still executing jobs normally and just wished to extend
> the timeout, but a better solution requires a major rearchitecture of the
> scheduler, which is not applicable as a fix.
> 
> Fixes: 135517d3565b drm/scheduler: Avoid accessing freed bad job.)
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

From a brief test, this seems to fix the problem, thanks.

Tested-by: Russell King <rmk+kernel@armlinux.org.uk>

Fabio Estevam Aug. 24, 2020, 2:11 p.m. UTC | #2
Hi Lucas,

On Mon, Aug 24, 2020 at 8:02 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>
> The drm scheduler currently expects that the stop/start sequence is always
> executed in the timeout handling, as the job at the head of the hardware
> execution list is always removed from the ring mirror before the driver
> function is called and only inserted back into the list when starting the
> scheduler.
>
> This adds some unnecessary overhead if the timeout handler determines
> that the GPU is still executing jobs normally and just wished to extend
> the timeout, but a better solution requires a major rearchitecture of the
> scheduler, which is not applicable as a fix.
>
> Fixes: 135517d3565b drm/scheduler: Avoid accessing freed bad job.)

Just a nit: the correct syntax for the Fixes line is:

Fixes: 135517d3565b ("drm/scheduler: Avoid accessing freed bad job.")

Lucas Stach Aug. 25, 2020, 8:44 a.m. UTC | #3
Hi all,

On Monday, 24.08.2020 at 11:11 -0300, Fabio Estevam wrote:
> Hi Lucas,
> 
> On Mon, Aug 24, 2020 at 8:02 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > The drm scheduler currently expects that the stop/start sequence is always
> > executed in the timeout handling, as the job at the head of the hardware
> > execution list is always removed from the ring mirror before the driver
> > function is called and only inserted back into the list when starting the
> > scheduler.
> > 
> > This adds some unnecessary overhead if the timeout handler determines
> > that the GPU is still executing jobs normally and just wished to extend
> > the timeout, but a better solution requires a major rearchitecture of the
> > scheduler, which is not applicable as a fix.
> > 
> > Fixes: 135517d3565b drm/scheduler: Avoid accessing freed bad job.)
> 
> Just a nit: the correct syntax for the Fixes line is:
> 
> Fixes: 135517d3565b ("drm/scheduler: Avoid accessing freed bad job.")

I've added this patch with the above fixed and Russell's T-b to my
etnaviv/fixes branch.

Regards,
Lucas

Patch

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 4e3e95dce6d8..cd46c882269c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -89,12 +89,15 @@  static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	u32 dma_addr;
 	int change;
 
+	/* block scheduler */
+	drm_sched_stop(&gpu->sched, sched_job);
+
 	/*
 	 * If the GPU managed to complete this jobs fence, the timout is
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(submit->out_fence))
-		return;
+		goto out_no_timeout;
 
 	/*
 	 * If the GPU is still making forward progress on the front-end (which
@@ -105,12 +108,9 @@  static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	change = dma_addr - gpu->hangcheck_dma_addr;
 	if (change < 0 || change > 16) {
 		gpu->hangcheck_dma_addr = dma_addr;
-		return;
+		goto out_no_timeout;
 	}
 
-	/* block scheduler */
-	drm_sched_stop(&gpu->sched, sched_job);
-
 	if(sched_job)
 		drm_sched_increase_karma(sched_job);
 
@@ -120,6 +120,7 @@  static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 
 	drm_sched_resubmit_jobs(&gpu->sched);
 
+out_no_timeout:
 	/* restart scheduler after GPU is usable again */
 	drm_sched_start(&gpu->sched, true);
 }
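
The control flow that results from the patch above can be sketched as a small user-space model. This is an illustration under stated assumptions, not the driver code: struct toy_gpu, toy_timedout and the counters are hypothetical stand-ins, and the recovery steps are reduced to a single counter. Only the forward-progress check (change < 0 || change > 16) and the out_no_timeout label are taken from the diff. The point is that every path through the handler now performs exactly one stop and one start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only. The counters replace the real scheduler calls. */
struct toy_gpu {
	int stops;                   /* stands in for drm_sched_stop() calls */
	int starts;                  /* stands in for drm_sched_start() calls */
	int recoveries;              /* stands in for karma, recovery, resubmit */
	uint32_t hangcheck_dma_addr; /* last observed front-end address */
};

static void toy_timedout(struct toy_gpu *gpu, bool fence_signaled,
			 uint32_t dma_addr)
{
	int change;

	/* block scheduler: now done first, on every path */
	gpu->stops++;

	/* spurious timeout: the job's fence already signaled */
	if (fence_signaled)
		goto out_no_timeout;

	/* front-end still making progress: remember position, extend timeout */
	change = (int)(dma_addr - gpu->hangcheck_dma_addr);
	if (change < 0 || change > 16) {
		gpu->hangcheck_dma_addr = dma_addr;
		goto out_no_timeout;
	}

	/* real hang: recover the GPU and resubmit jobs */
	gpu->recoveries++;

out_no_timeout:
	/* restart scheduler: shared exit for all three paths */
	gpu->starts++;
	assert(gpu->stops == gpu->starts);
}
```

Calling toy_timedout() for a signaled fence, for an advanced front-end address, and for a stalled address exercises the three paths; the stop and start counters stay equal after each call, and only the stalled case increments the recovery counter.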