
[v4] drm/scheduler: Avoid accessing freed bad job.

Message ID 1574715089-14875-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State New, archived
Series [v4] drm/scheduler: Avoid accessing freed bad job.

Commit Message

Andrey Grodzovsky Nov. 25, 2019, 8:51 p.m. UTC
Problem:
Due to a race between drm_sched_cleanup_jobs in the sched thread and
drm_sched_job_timedout in the timeout work there is a possibility that
the bad job was already freed while still being accessed from the
timeout thread.

Fix:
Instead of just peeking at the bad job in the mirror list,
remove it from the list under lock and put it back later, when
we are guaranteed that no race with the main sched thread is
possible, which is after the thread is parked.

v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.

v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
drm_sched_get_cleanup_job already has a lock there.

v4: Fix comments to reflect the latest code in drm-misc.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
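
The core idea of the change, shown as a minimal self-contained sketch (illustrative userspace C with hypothetical names, not the kernel implementation): the timeout path takes the suspect job off the pending list under the list lock so the cleanup path cannot free it, and puts it back only once the scheduler worker has been parked.

/*
 * Illustrative sketch of the locking pattern introduced by the patch.
 * All names (struct sched, timeout_grab_bad_job, ...) are hypothetical;
 * this is plain userspace C, not the DRM scheduler code.
 */
#include <pthread.h>
#include <stdio.h>

struct job {
	struct job *next;
	int id;
};

struct sched {
	pthread_mutex_t lock;   /* plays the role of sched->job_list_lock    */
	struct job *pending;    /* plays the role of sched->ring_mirror_list */
	int worker_parked;      /* stands in for the parked scheduler thread */
};

/* Timeout path: remove the first ("bad") job from the list under the lock,
 * so a concurrent cleanup path can no longer free it. */
static struct job *timeout_grab_bad_job(struct sched *s)
{
	struct job *bad;

	pthread_mutex_lock(&s->lock);
	bad = s->pending;
	if (bad)
		s->pending = bad->next;          /* like list_del_init() */
	pthread_mutex_unlock(&s->lock);
	return bad;
}

/* drm_sched_stop()-like step: park the worker first, then it is safe to
 * put the bad job back at the head of the list. */
static void stop_and_reinsert(struct sched *s, struct job *bad)
{
	s->worker_parked = 1;                    /* kthread_park(sched->thread) */

	if (bad) {
		pthread_mutex_lock(&s->lock);
		bad->next = s->pending;          /* list_add() at the head */
		s->pending = bad;
		pthread_mutex_unlock(&s->lock);
	}
}

int main(void)
{
	struct sched s = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct job j = { .id = 42 };

	s.pending = &j;
	struct job *bad = timeout_grab_bad_job(&s);
	printf("grabbed job %d off the list\n", bad ? bad->id : -1);

	stop_and_reinsert(&s, bad);
	printf("job %d back on the list: %s\n", j.id,
	       s.pending == &j ? "yes" : "no");
	return 0;
}

Build with, e.g., gcc -pthread; in the actual patch the reinsert happens in drm_sched_stop() after kthread_park().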

Comments

Emily Deng Nov. 25, 2019, 9:44 p.m. UTC | #1

Hi Andrey,
    Seems you didn't submit this patch?

Best wishes
Emily Deng



>-----Original Message-----
>From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>Sent: Monday, November 25, 2019 12:51 PM
>Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian <Christian.Koenig@amd.com>; Deng, Emily
><Emily.Deng@amd.com>; steven.price@arm.com; Grodzovsky, Andrey
><Andrey.Grodzovsky@amd.com>
>Subject: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Problem:
>Due to a race between drm_sched_cleanup_jobs in sched thread and
>drm_sched_job_timedout in timeout work there is a possiblity that bad job
>was already freed while still being accessed from the timeout thread.
>
>Fix:
>Instead of just peeking at the bad job in the mirror list remove it from the list
>under lock and then put it back later when we are garanteed no race with
>main sched thread is possible which is after the thread is parked.
>
>v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>
>v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>drm_sched_get_cleanup_job already has a lock there.
>
>v4: Fix comments to relfect latest code in drm-misc.
>
>Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>Reviewed-by: Christian König <christian.koenig@amd.com>
>Tested-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/scheduler/sched_main.c | 27
>+++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>index 6774955..1bf9c40 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
> 	unsigned long flags;
>
> 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>+
>+	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>+	spin_lock_irqsave(&sched->job_list_lock, flags);
> 	job = list_first_entry_or_null(&sched->ring_mirror_list,
> 				       struct drm_sched_job, node);
>
> 	if (job) {
>+		/*
>+		 * Remove the bad job so it cannot be freed by concurrent
>+		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>+		 * is parked at which point it's safe.
>+		 */
>+		list_del_init(&job->node);
>+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>+
> 		job->sched->ops->timedout_job(job);
>
> 		/*
>@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
> 			job->sched->ops->free_job(job);
> 			sched->free_guilty = false;
> 		}
>+	} else {
>+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
> 	}
>
> 	spin_lock_irqsave(&sched->job_list_lock, flags);
>@@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> 	kthread_park(sched->thread);
>
> 	/*
>+	 * Reinsert back the bad job here - now it's safe as
>+	 * drm_sched_get_cleanup_job cannot race against us and release the
>+	 * bad job at this point - we parked (waited for) any in progress
>+	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>+	 * now until the scheduler thread is unparked.
>+	 */
>+	if (bad && bad->sched == sched)
>+		/*
>+		 * Add at the head of the queue to reflect it was the earliest
>+		 * job extracted.
>+		 */
>+		list_add(&bad->node, &sched->ring_mirror_list);
>+
>+	/*
> 	 * Iterate the job list from later to  earlier one and either deactive
> 	 * their HW callbacks or remove them from mirror list if they already
> 	 * signaled.
>--
>2.7.4
Andrey Grodzovsky Nov. 26, 2019, 12:09 a.m. UTC | #2
Christian asked to submit it to drm-misc instead of our drm-next to avoid later conflicts with Steven's patch, which he mentioned in this thread and which is not in drm-next yet.
Christian, Alex, once this is merged to drm-misc I guess we need to pull all the latest changes from there into drm-next so the issue Emily reported can be avoided.

Andrey
Alex Deucher Nov. 26, 2019, 3:36 p.m. UTC | #3
I recently updated amd-staging-drm-next.  Apply whatever makes sense for now and it'll naturally fall out in the next rebase.

Alex
Andrey Grodzovsky Nov. 26, 2019, 3:37 p.m. UTC | #4
Ping

Andrey

On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
> [...]
Emily Deng Nov. 27, 2019, 12:41 a.m. UTC | #5

Reviewed-by: Emily Deng <Emily.Deng@amd.com>

>-----Original Message-----
>From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>Sent: Tuesday, November 26, 2019 7:37 AM
>Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian <Christian.Koenig@amd.com>; Deng, Emily
><Emily.Deng@amd.com>; steven.price@arm.com
>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>[...]
Emily Deng Dec. 2, 2019, 7:24 p.m. UTC | #6

Hi Andrey,
    Seems this patch is still not in amd-staging-drm-next?

Best wishes
Emily Deng



>-----Original Message-----
>From: Deng, Emily
>Sent: Tuesday, November 26, 2019 4:41 PM
>To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian <Christian.Koenig@amd.com>; steven.price@arm.com
>Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>[...]
Andrey Grodzovsky Dec. 3, 2019, 7:10 p.m. UTC | #7
Yes - Christian just pushed it to drm-misc-next - I guess Alex/Christian
didn't pull it into amd-staging-drm-next yet.

Andrey

On 12/2/19 2:24 PM, Deng, Emily wrote:
> [...]
Alex Deucher Dec. 3, 2019, 7:44 p.m. UTC | #8

Please go ahead and apply whatever version is necessary for amd-staging-drm-next.

Alex
Emily Deng Dec. 3, 2019, 7:53 p.m. UTC | #9

Hi Alex,
    When will we cherry-pick those patches to drm-next?

>-----Original Message-----
>From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>Sent: Tuesday, December 3, 2019 11:10 AM
>To: Deng, Emily <Emily.Deng@amd.com>; Deucher, Alexander
><Alexander.Deucher@amd.com>
>Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian <Christian.Koenig@amd.com>; steven.price@arm.com
>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>[...]
Andrey Grodzovsky Dec. 3, 2019, 7:57 p.m. UTC | #10
I don't think I can apply this patch 'as is', as it has a dependency on a
patch by Steven which also wasn't applied yet - 588b982 ("drm: Don't free
jobs in wait_event_interruptible()").


Andrey


On 12/3/19 2:44 PM, Deucher, Alexander wrote:
> [...]
Alex Deucher Dec. 3, 2019, 7:59 p.m. UTC | #11

Cherry pick whatever dependencies you need or pick the older version of the patch.  Either way works.

Alex
Andrey Grodzovsky Dec. 3, 2019, 8:32 p.m. UTC | #12
Turns out Steven's patch was already in, so I just cherry-picked the
change from drm-misc-next.


Emily - it's in.


Andrey


On 12/3/19 2:59 PM, Deucher, Alexander wrote:
> [...]
Emily Deng Dec. 3, 2019, 8:58 p.m. UTC | #13

Hi Andrey,
    Thanks very much.

Best wishes
Emily Deng
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Tuesday, December 3, 2019 12:33 PM
To: Deucher, Alexander <Alexander.Deucher@amd.com>; Deng, Emily <Emily.Deng@amd.com>
Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; steven.price@arm.com
Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.


[...]
Lucas Stach Feb. 5, 2020, 6:24 p.m. UTC | #14
Hi Andrey,

This commit breaks all drivers which may bail out of the timeout
processing because they wish to extend the timeout (etnaviv, v3d).

Those drivers currently just return from the timeout handler before
calling drm_sched_stop(), which means that with this commit applied we
are removing the first job from the ring_mirror_list but never putting
it back. This leads to jobs getting lost from the ring mirror, which
then causes quite a bit of fallout like unsignaled fences.

Not sure yet what to do about it, we can either add a function to add
the job back to the ring_mirror if the driver wants to extend the
timeout, or we could look for another way to stop
drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
processing.
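
A rough self-contained sketch of the failure mode described above (the driver's early bail-out is modeled by a hypothetical flag; this is not the actual etnaviv/v3d code):

/*
 * Sketch only: the scheduler core removes the timed-out job from its list
 * before calling into the driver; a driver that bails out early (to extend
 * the timeout) never reaches the re-insert in drm_sched_stop(), so the job
 * silently disappears from the list.
 */
#include <stdio.h>

enum driver_action { DRIVER_RESETS, DRIVER_EXTENDS_TIMEOUT };

static int jobs_on_list = 1;       /* one pending job on the ring mirror */

static void timeout_handler(enum driver_action action)
{
	jobs_on_list--;            /* list_del_init(&job->node) in the core */

	if (action == DRIVER_EXTENDS_TIMEOUT)
		return;            /* driver returns early: no drm_sched_stop() */

	jobs_on_list++;            /* drm_sched_stop() re-inserts the bad job */
}

int main(void)
{
	timeout_handler(DRIVER_RESETS);
	printf("after a full reset: %d job(s) on the list\n", jobs_on_list);

	timeout_handler(DRIVER_EXTENDS_TIMEOUT);
	printf("after an early bail-out: %d job(s) on the list\n", jobs_on_list);
	return 0;
}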

Regards,
Lucas

On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> [...]
Lucas Stach Feb. 6, 2020, 11:10 a.m. UTC | #15
Hi all,

On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> Hi Andrey,
> 
> This commit breaks all drivers, which may bail out of the timeout
> processing as they wish to extend the timeout (etnaviv, v3d).
> 
> Those drivers currently just return from the timeout handler before
> calling drm_sched_stop(), which means with this commit applied we are
> removing the first job from the ring_mirror_list, but never put it
> back. This leads to jobs getting lost from the ring mirror, which then
> causes quite a bit of fallout like unsignaled fences.
> 
> Not sure yet what to do about it, we can either add a function to add
> the job back to the ring_mirror if the driver wants to extend the
> timeout, or we could look for another way to stop
> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> processing.

So after thinking about this a bit more my opinion is that we need to
revert this change for now and go back to the drawing board for the
scheduler timeout handling.

Right now this starts to feel like a big midlayer mistake with all the
very intricate intertwining between the drivers and the scheduler. The
rules on when it's safe to manipulate the ring mirror and when
completed jobs are signaled and freed are not really well specified.
The fact that we need to mutate state in order to get rid of races
instead of having a single big "timeout processing is owner of the
scheduler state for now" is a big fat warning sign IMHO.

It took me far longer than I'd like to admit to understand the failure
mode with fences not getting signaled after a GPU hang. The back and
forth between scheduler and driver code makes things really hard to
follow.

Regards,
Lucas

> [...]
Christian König Feb. 6, 2020, 11:49 a.m. UTC | #16
On 06.02.20 at 12:10, Lucas Stach wrote:
> Hi all,
>
> On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
>> Hi Andrey,
>>
>> This commit breaks all drivers, which may bail out of the timeout
>> processing as they wish to extend the timeout (etnaviv, v3d).
>>
>> Those drivers currently just return from the timeout handler before
>> calling drm_sched_stop(), which means with this commit applied we are
>> removing the first job from the ring_mirror_list, but never put it
>> back. This leads to jobs getting lost from the ring mirror, which then
>> causes quite a bit of fallout like unsignaled fences.
>>
>> Not sure yet what to do about it, we can either add a function to add
>> the job back to the ring_mirror if the driver wants to extend the
>> timeout, or we could look for another way to stop
>> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
>> processing.
> So after thinking about this a bit more my opinion is that we need to
> revert this change for now and go back to the drawing board for the
> scheduler timeout handling.
>
> Right now this starts to feel like a big midlayer mistake with all the
> very intricate intertwining between the drivers and the scheduler. The
> rules on when it's safe to manipulate the ring mirror and when
> completed jobs are signaled and freed are not really well specified.
> The fact that we need to mutate state in order to get rid of races
> instead of having a single big "timeout processing is owner of the
> scheduler state for now" is a big fat warning sign IMHO.

Yes, that strongly feels like a hack to me as well. But I haven't had
time yet to take a closer look and suggest something better.

Christian.

> [...]
Alex Deucher Feb. 6, 2020, 2:49 p.m. UTC | #17
On Thu, Feb 6, 2020 at 6:50 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 06.02.20 um 12:10 schrieb Lucas Stach:
> > Hi all,
> >
> > On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> >> Hi Andrey,
> >>
> >> This commit breaks all drivers, which may bail out of the timeout
> >> processing as they wish to extend the timeout (etnaviv, v3d).
> >>
> >> Those drivers currently just return from the timeout handler before
> >> calling drm_sched_stop(), which means with this commit applied we are
> >> removing the first job from the ring_mirror_list, but never put it
> >> back. This leads to jobs getting lost from the ring mirror, which then
> >> causes quite a bit of fallout like unsignaled fences.
> >>
> >> Not sure yet what to do about it, we can either add a function to add
> >> the job back to the ring_mirror if the driver wants to extend the
> >> timeout, or we could look for another way to stop
> >> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> >> processing.
> > So after thinking about this a bit more my opinion is that we need to
> > revert this change for now and go back to the drawing board for the
> > scheduler timeout handling.
> >
> > Right now this starts to feel like a big midlayer mistake with all the
> > very intricate intertwining between the drivers and the scheduler. The
> > rules on when it's safe to manipulate the ring mirror and when
> > completed jobs are signaled and freed are not really well specified.
> > The fact that we need to mutate state in order to get rid of races
> > instead of having a single big "timeout processing is owner of the
> > scheduler state for now" is a big fat warning sign IMHO.
>
> Yes, that strongly feels like a hack to me as well. But I didn't had
> time and still haven't to take a closer look and suggest something better.
>

In that case, can someone send me a revert?

Alex


> Christian.
>
> >
> > It took me far longer than I'd like to admit to understand the failure
> > mode with fences not getting signaled after a GPU hang. The back and
> > forth between scheduler and driver code makes things really hard to
> > follow.
> >
> > Regards,
> > Lucas
> >
> >> Regards,
> >> Lucas
> >>
> >> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> >>> Problem:
> >>> Due to a race between drm_sched_cleanup_jobs in sched thread and
> >>> drm_sched_job_timedout in timeout work there is a possiblity that
> >>> bad job was already freed while still being accessed from the
> >>> timeout thread.
> >>>
> >>> Fix:
> >>> Instead of just peeking at the bad job in the mirror list
> >>> remove it from the list under lock and then put it back later when
> >>> we are garanteed no race with main sched thread is possible which
> >>> is after the thread is parked.
> >>>
> >>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> >>>
> >>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> >>> drm_sched_get_cleanup_job already has a lock there.
> >>>
> >>> v4: Fix comments to relfect latest code in drm-misc.
> >>>
> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>> Tested-by: Emily Deng <Emily.Deng@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
> >>>   1 file changed, 27 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index 6774955..1bf9c40 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >>>     unsigned long flags;
> >>>
> >>>     sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >>> +
> >>> +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> >>> +   spin_lock_irqsave(&sched->job_list_lock, flags);
> >>>     job = list_first_entry_or_null(&sched->ring_mirror_list,
> >>>                                    struct drm_sched_job, node);
> >>>
> >>>     if (job) {
> >>> +           /*
> >>> +            * Remove the bad job so it cannot be freed by concurrent
> >>> +            * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> >>> +            * is parked at which point it's safe.
> >>> +            */
> >>> +           list_del_init(&job->node);
> >>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>> +
> >>>             job->sched->ops->timedout_job(job);
> >>>
> >>>             /*
> >>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >>>                     job->sched->ops->free_job(job);
> >>>                     sched->free_guilty = false;
> >>>             }
> >>> +   } else {
> >>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>>     }
> >>>
> >>>     spin_lock_irqsave(&sched->job_list_lock, flags);
> >>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> >>>     kthread_park(sched->thread);
> >>>
> >>>     /*
> >>> +    * Reinsert back the bad job here - now it's safe as
> >>> +    * drm_sched_get_cleanup_job cannot race against us and release the
> >>> +    * bad job at this point - we parked (waited for) any in progress
> >>> +    * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> >>> +    * now until the scheduler thread is unparked.
> >>> +    */
> >>> +   if (bad && bad->sched == sched)
> >>> +           /*
> >>> +            * Add at the head of the queue to reflect it was the earliest
> >>> +            * job extracted.
> >>> +            */
> >>> +           list_add(&bad->node, &sched->ring_mirror_list);
> >>> +
> >>> +   /*
> >>>      * Iterate the job list from later to  earlier one and either deactive
> >>>      * their HW callbacks or remove them from mirror list if they already
> >>>      * signaled.
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König Feb. 6, 2020, 2:51 p.m. UTC | #18
Am 06.02.20 um 15:49 schrieb Alex Deucher:
> On Thu, Feb 6, 2020 at 6:50 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 06.02.20 um 12:10 schrieb Lucas Stach:
>>> Hi all,
>>>
>>> On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
>>>> Hi Andrey,
>>>>
>>>> This commit breaks all drivers, which may bail out of the timeout
>>>> processing as they wish to extend the timeout (etnaviv, v3d).
>>>>
>>>> Those drivers currently just return from the timeout handler before
>>>> calling drm_sched_stop(), which means with this commit applied we are
>>>> removing the first job from the ring_mirror_list, but never put it
>>>> back. This leads to jobs getting lost from the ring mirror, which then
>>>> causes quite a bit of fallout like unsignaled fences.
>>>>
>>>> Not sure yet what to do about it, we can either add a function to add
>>>> the job back to the ring_mirror if the driver wants to extend the
>>>> timeout, or we could look for another way to stop
>>>> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
>>>> processing.
>>> So after thinking about this a bit more my opinion is that we need to
>>> revert this change for now and go back to the drawing board for the
>>> scheduler timeout handling.
>>>
>>> Right now this starts to feel like a big midlayer mistake with all the
>>> very intricate intertwining between the drivers and the scheduler. The
>>> rules on when it's safe to manipulate the ring mirror and when
>>> completed jobs are signaled and freed are not really well specified.
>>> The fact that we need to mutate state in order to get rid of races
>>> instead of having a single big "timeout processing is owner of the
>>> scheduler state for now" is a big fat warning sign IMHO.
>> Yes, that strongly feels like a hack to me as well. But I didn't had
>> time and still haven't to take a closer look and suggest something better.
>>
> In that case, can someone send me a revert?

Well a revert would break our driver.

The real solution is that somebody needs to sit down, gather ALL the 
requirements and then come up with a solution which is clean and works 
for everyone.

Christian.

>
> Alex
>
>
>> Christian.
>>
>>> It took me far longer than I'd like to admit to understand the failure
>>> mode with fences not getting signaled after a GPU hang. The back and
>>> forth between scheduler and driver code makes things really hard to
>>> follow.
>>>
>>> Regards,
>>> Lucas
>>>
>>>> Regards,
>>>> Lucas
>>>>
>>>> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
>>>>> Problem:
>>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>>>> drm_sched_job_timedout in timeout work there is a possiblity that
>>>>> bad job was already freed while still being accessed from the
>>>>> timeout thread.
>>>>>
>>>>> Fix:
>>>>> Instead of just peeking at the bad job in the mirror list
>>>>> remove it from the list under lock and then put it back later when
>>>>> we are garanteed no race with main sched thread is possible which
>>>>> is after the thread is parked.
>>>>>
>>>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>>>>
>>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>>
>>>>> v4: Fix comments to relfect latest code in drm-misc.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>> Tested-by: Emily Deng <Emily.Deng@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
>>>>>    1 file changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 6774955..1bf9c40 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>      unsigned long flags;
>>>>>
>>>>>      sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>>>> +
>>>>> +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>>>> +   spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>      job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>                                     struct drm_sched_job, node);
>>>>>
>>>>>      if (job) {
>>>>> +           /*
>>>>> +            * Remove the bad job so it cannot be freed by concurrent
>>>>> +            * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>>>>> +            * is parked at which point it's safe.
>>>>> +            */
>>>>> +           list_del_init(&job->node);
>>>>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> +
>>>>>              job->sched->ops->timedout_job(job);
>>>>>
>>>>>              /*
>>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>                      job->sched->ops->free_job(job);
>>>>>                      sched->free_guilty = false;
>>>>>              }
>>>>> +   } else {
>>>>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>      }
>>>>>
>>>>>      spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>>      kthread_park(sched->thread);
>>>>>
>>>>>      /*
>>>>> +    * Reinsert back the bad job here - now it's safe as
>>>>> +    * drm_sched_get_cleanup_job cannot race against us and release the
>>>>> +    * bad job at this point - we parked (waited for) any in progress
>>>>> +    * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>>>>> +    * now until the scheduler thread is unparked.
>>>>> +    */
>>>>> +   if (bad && bad->sched == sched)
>>>>> +           /*
>>>>> +            * Add at the head of the queue to reflect it was the earliest
>>>>> +            * job extracted.
>>>>> +            */
>>>>> +           list_add(&bad->node, &sched->ring_mirror_list);
>>>>> +
>>>>> +   /*
>>>>>       * Iterate the job list from later to  earlier one and either deactive
>>>>>       * their HW callbacks or remove them from mirror list if they already
>>>>>       * signaled.
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7Ce88b51a2443741b0b56f08d7ab13da74%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637165974076779365&amp;sdata=L7Hin%2Faw7vK9IYBaZn%2BVmWZKzjqTYBsvJ%2BIL80qB3M4%3D&amp;reserved=0
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7Ce88b51a2443741b0b56f08d7ab13da74%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637165974076779365&amp;sdata=94EyD8X91MT5IVE8TN9%2FRYed8aIX6tN1Pvl8LJBkCeU%3D&amp;reserved=0
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7Ce88b51a2443741b0b56f08d7ab13da74%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637165974076779365&amp;sdata=L7Hin%2Faw7vK9IYBaZn%2BVmWZKzjqTYBsvJ%2BIL80qB3M4%3D&amp;reserved=0
Andrey Grodzovsky Feb. 6, 2020, 3:49 p.m. UTC | #19
On 2/6/20 9:51 AM, Christian König wrote:
> Am 06.02.20 um 15:49 schrieb Alex Deucher:
>> On Thu, Feb 6, 2020 at 6:50 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 06.02.20 um 12:10 schrieb Lucas Stach:
>>>> Hi all,
>>>>
>>>> On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
>>>>> Hi Andrey,
>>>>>
>>>>> This commit breaks all drivers, which may bail out of the timeout
>>>>> processing as they wish to extend the timeout (etnaviv, v3d).
>>>>>
>>>>> Those drivers currently just return from the timeout handler before
>>>>> calling drm_sched_stop(), which means with this commit applied we are
>>>>> removing the first job from the ring_mirror_list, but never put it
>>>>> back. This leads to jobs getting lost from the ring mirror, which 
>>>>> then
>>>>> causes quite a bit of fallout like unsignaled fences.
>>>>>
>>>>> Not sure yet what to do about it, we can either add a function to add
>>>>> the job back to the ring_mirror if the driver wants to extend the
>>>>> timeout, or we could look for another way to stop
>>>>> drm_sched_cleanup_jobs from freeing jobs that are currently in 
>>>>> timeout
>>>>> processing.
>>>> So after thinking about this a bit more my opinion is that we need to
>>>> revert this change for now and go back to the drawing board for the
>>>> scheduler timeout handling.
>>>>
>>>> Right now this starts to feel like a big midlayer mistake with all the
>>>> very intricate intertwining between the drivers and the scheduler. The
>>>> rules on when it's safe to manipulate the ring mirror and when
>>>> completed jobs are signaled and freed are not really well specified.
>>>> The fact that we need to mutate state in order to get rid of races
>>>> instead of having a single big "timeout processing is owner of the
>>>> scheduler state for now" is a big fat warning sign IMHO.
>>> Yes, that strongly feels like a hack to me as well. But I didn't had
>>> time and still haven't to take a closer look and suggest something 
>>> better.
>>>
>> In that case, can someone send me a revert?
>
> Well a revert would break our driver.
>
> The real solution is that somebody needs to sit down, gather ALL the 
> requirements and then come up with a solution which is clean and works 
> for everyone.
>
> Christian.


I can take this on, as indeed our general design here becomes more and
more entangled as GPU reset scenarios grow in complexity (at least in the
AMD driver). Currently I am on a high priority internal task which should
take me around a week or 2 to finish, and after that I can get to it.

Regarding a temporary solution - I looked into the v3d and etnaviv use
cases, and we in AMD actually face the same scenario where we decide to
skip the HW reset if the guilty job did finish by the time we are
processing the timeout (see amdgpu_device_gpu_recover and the
skip_hw_reset goto) - the difference is that we always call
drm_sched_stop/start irrespective of whether we are actually going to do
a HW reset or not (same as extending the timeout). I wonder if something
like this can be done also for v3d and etnaviv?
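
Roughly what I have in mind for etnaviv - just a sketch to illustrate the
shape, not the attached patch; the FE progress check and the actual
recovery details are elided, and the etnaviv names are only assumed here:

static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
{
        struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
        struct etnaviv_gpu *gpu = submit->gpu;

        /*
         * Stop the scheduler first in every case. This parks the scheduler
         * thread, and drm_sched_stop() reinserts the bad job that
         * drm_sched_job_timedout() took off the ring_mirror_list, so the
         * job can neither be freed under us nor get lost from the mirror.
         */
        drm_sched_stop(&gpu->sched, sched_job);

        /*
         * Spurious timeout: the job completed (or the GPU is still making
         * progress) by the time we got here - skip the HW reset, but still
         * restart the scheduler so the ring mirror stays consistent.
         */
        if (dma_fence_is_signaled(submit->out_fence))
                goto out_no_reset;

        /* ... real recovery: dump state, reset the GPU, etc. ... */

        drm_sched_resubmit_jobs(&gpu->sched);

out_no_reset:
        drm_sched_start(&gpu->sched, true);
}

The point is only that drm_sched_start() runs on every path, so nothing is
lost from the mirror list even when we bail out of the reset.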

Andrey


>
>>
>> Alex
>>
>>
>>> Christian.
>>>
>>>> It took me far longer than I'd like to admit to understand the failure
>>>> mode with fences not getting signaled after a GPU hang. The back and
>>>> forth between scheduler and driver code makes things really hard to
>>>> follow.
>>>>
>>>> Regards,
>>>> Lucas
>>>>
>>>>> Regards,
>>>>> Lucas
>>>>>
>>>>> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
>>>>>> Problem:
>>>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>>>>> drm_sched_job_timedout in timeout work there is a possiblity that
>>>>>> bad job was already freed while still being accessed from the
>>>>>> timeout thread.
>>>>>>
>>>>>> Fix:
>>>>>> Instead of just peeking at the bad job in the mirror list
>>>>>> remove it from the list under lock and then put it back later when
>>>>>> we are garanteed no race with main sched thread is possible which
>>>>>> is after the thread is parked.
>>>>>>
>>>>>> v2: Lock around processing ring_mirror_list in 
>>>>>> drm_sched_cleanup_jobs.
>>>>>>
>>>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>>>
>>>>>> v4: Fix comments to relfect latest code in drm-misc.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>> Tested-by: Emily Deng <Emily.Deng@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 27 
>>>>>> +++++++++++++++++++++++++++
>>>>>>    1 file changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 6774955..1bf9c40 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct 
>>>>>> work_struct *work)
>>>>>>      unsigned long flags;
>>>>>>
>>>>>>      sched = container_of(work, struct drm_gpu_scheduler, 
>>>>>> work_tdr.work);
>>>>>> +
>>>>>> +   /* Protects against concurrent deletion in 
>>>>>> drm_sched_get_cleanup_job */
>>>>>> +   spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>      job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>                                     struct drm_sched_job, node);
>>>>>>
>>>>>>      if (job) {
>>>>>> +           /*
>>>>>> +            * Remove the bad job so it cannot be freed by 
>>>>>> concurrent
>>>>>> +            * drm_sched_cleanup_jobs. It will be reinserted back 
>>>>>> after sched->thread
>>>>>> +            * is parked at which point it's safe.
>>>>>> +            */
>>>>>> +           list_del_init(&job->node);
>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> +
>>>>>> job->sched->ops->timedout_job(job);
>>>>>>
>>>>>>              /*
>>>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct 
>>>>>> work_struct *work)
>>>>>> job->sched->ops->free_job(job);
>>>>>>                      sched->free_guilty = false;
>>>>>>              }
>>>>>> +   } else {
>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>      }
>>>>>>
>>>>>>      spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler 
>>>>>> *sched, struct drm_sched_job *bad)
>>>>>>      kthread_park(sched->thread);
>>>>>>
>>>>>>      /*
>>>>>> +    * Reinsert back the bad job here - now it's safe as
>>>>>> +    * drm_sched_get_cleanup_job cannot race against us and 
>>>>>> release the
>>>>>> +    * bad job at this point - we parked (waited for) any in 
>>>>>> progress
>>>>>> +    * (earlier) cleanups and drm_sched_get_cleanup_job will not 
>>>>>> be called
>>>>>> +    * now until the scheduler thread is unparked.
>>>>>> +    */
>>>>>> +   if (bad && bad->sched == sched)
>>>>>> +           /*
>>>>>> +            * Add at the head of the queue to reflect it was the 
>>>>>> earliest
>>>>>> +            * job extracted.
>>>>>> +            */
>>>>>> +           list_add(&bad->node, &sched->ring_mirror_list);
>>>>>> +
>>>>>> +   /*
>>>>>>       * Iterate the job list from later to  earlier one and 
>>>>>> either deactive
>>>>>>       * their HW callbacks or remove them from mirror list if 
>>>>>> they already
>>>>>>       * signaled.
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>
Daniel Vetter Feb. 7, 2020, 3:26 p.m. UTC | #20
On Thu, Feb 6, 2020 at 3:51 PM Christian König <christian.koenig@amd.com> wrote:
>
> Am 06.02.20 um 15:49 schrieb Alex Deucher:
> > On Thu, Feb 6, 2020 at 6:50 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> Am 06.02.20 um 12:10 schrieb Lucas Stach:
> >>> Hi all,
> >>>
> >>> On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
> >>>> Hi Andrey,
> >>>>
> >>>> This commit breaks all drivers, which may bail out of the timeout
> >>>> processing as they wish to extend the timeout (etnaviv, v3d).
> >>>>
> >>>> Those drivers currently just return from the timeout handler before
> >>>> calling drm_sched_stop(), which means with this commit applied we are
> >>>> removing the first job from the ring_mirror_list, but never put it
> >>>> back. This leads to jobs getting lost from the ring mirror, which then
> >>>> causes quite a bit of fallout like unsignaled fences.
> >>>>
> >>>> Not sure yet what to do about it, we can either add a function to add
> >>>> the job back to the ring_mirror if the driver wants to extend the
> >>>> timeout, or we could look for another way to stop
> >>>> drm_sched_cleanup_jobs from freeing jobs that are currently in timeout
> >>>> processing.
> >>> So after thinking about this a bit more my opinion is that we need to
> >>> revert this change for now and go back to the drawing board for the
> >>> scheduler timeout handling.
> >>>
> >>> Right now this starts to feel like a big midlayer mistake with all the
> >>> very intricate intertwining between the drivers and the scheduler. The
> >>> rules on when it's safe to manipulate the ring mirror and when
> >>> completed jobs are signaled and freed are not really well specified.
> >>> The fact that we need to mutate state in order to get rid of races
> >>> instead of having a single big "timeout processing is owner of the
> >>> scheduler state for now" is a big fat warning sign IMHO.
> >> Yes, that strongly feels like a hack to me as well. But I didn't had
> >> time and still haven't to take a closer look and suggest something better.
> >>
> > In that case, can someone send me a revert?
>
> Well a revert would break our driver.
>
> The real solution is that somebody needs to sit down, gather ALL the
> requirements and then come up with a solution which is clean and works
> for everyone.

Uh, generally the oldest regression wins. As much as it sucks, if we
don't do that then there's just too much room for arguing, and maybe it
gets fixed in the next big rework ...
-Daniel

>
> Christian.
>
> >
> > Alex
> >
> >
> >> Christian.
> >>
> >>> It took me far longer than I'd like to admit to understand the failure
> >>> mode with fences not getting signaled after a GPU hang. The back and
> >>> forth between scheduler and driver code makes things really hard to
> >>> follow.
> >>>
> >>> Regards,
> >>> Lucas
> >>>
> >>>> Regards,
> >>>> Lucas
> >>>>
> >>>> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
> >>>>> Problem:
> >>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
> >>>>> drm_sched_job_timedout in timeout work there is a possiblity that
> >>>>> bad job was already freed while still being accessed from the
> >>>>> timeout thread.
> >>>>>
> >>>>> Fix:
> >>>>> Instead of just peeking at the bad job in the mirror list
> >>>>> remove it from the list under lock and then put it back later when
> >>>>> we are garanteed no race with main sched thread is possible which
> >>>>> is after the thread is parked.
> >>>>>
> >>>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> >>>>>
> >>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> >>>>> drm_sched_get_cleanup_job already has a lock there.
> >>>>>
> >>>>> v4: Fix comments to relfect latest code in drm-misc.
> >>>>>
> >>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>>> Tested-by: Emily Deng <Emily.Deng@amd.com>
> >>>>> ---
> >>>>>    drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
> >>>>>    1 file changed, 27 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> index 6774955..1bf9c40 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >>>>>      unsigned long flags;
> >>>>>
> >>>>>      sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >>>>> +
> >>>>> +   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> >>>>> +   spin_lock_irqsave(&sched->job_list_lock, flags);
> >>>>>      job = list_first_entry_or_null(&sched->ring_mirror_list,
> >>>>>                                     struct drm_sched_job, node);
> >>>>>
> >>>>>      if (job) {
> >>>>> +           /*
> >>>>> +            * Remove the bad job so it cannot be freed by concurrent
> >>>>> +            * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> >>>>> +            * is parked at which point it's safe.
> >>>>> +            */
> >>>>> +           list_del_init(&job->node);
> >>>>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>>>> +
> >>>>>              job->sched->ops->timedout_job(job);
> >>>>>
> >>>>>              /*
> >>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >>>>>                      job->sched->ops->free_job(job);
> >>>>>                      sched->free_guilty = false;
> >>>>>              }
> >>>>> +   } else {
> >>>>> +           spin_unlock_irqrestore(&sched->job_list_lock, flags);
> >>>>>      }
> >>>>>
> >>>>>      spin_lock_irqsave(&sched->job_list_lock, flags);
> >>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> >>>>>      kthread_park(sched->thread);
> >>>>>
> >>>>>      /*
> >>>>> +    * Reinsert back the bad job here - now it's safe as
> >>>>> +    * drm_sched_get_cleanup_job cannot race against us and release the
> >>>>> +    * bad job at this point - we parked (waited for) any in progress
> >>>>> +    * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> >>>>> +    * now until the scheduler thread is unparked.
> >>>>> +    */
> >>>>> +   if (bad && bad->sched == sched)
> >>>>> +           /*
> >>>>> +            * Add at the head of the queue to reflect it was the earliest
> >>>>> +            * job extracted.
> >>>>> +            */
> >>>>> +           list_add(&bad->node, &sched->ring_mirror_list);
> >>>>> +
> >>>>> +   /*
> >>>>>       * Iterate the job list from later to  earlier one and either deactive
> >>>>>       * their HW callbacks or remove them from mirror list if they already
> >>>>>       * signaled.
> >>>> _______________________________________________
> >>>> dri-devel mailing list
> >>>> dri-devel@lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>> _______________________________________________
> >>> amd-gfx mailing list
> >>> amd-gfx@lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Andrey Grodzovsky Feb. 10, 2020, 4:55 p.m. UTC | #21
Lucas - ping on my question; also, I attached this temporary solution
for etnaviv to clarify my point. If that is acceptable for now, at
least I can do the same for v3d, where it requires a bit more code changes.

Andrey

On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>> Well a revert would break our driver.
>>
>> The real solution is that somebody needs to sit down, gather ALL the 
>> requirements and then come up with a solution which is clean and 
>> works for everyone.
>>
>> Christian.
>
>
> I can to take on this as indeed our general design on this becomes 
> more and more entangled as GPU reset scenarios grow in complexity (at 
> least in AMD driver). Currently I am on a high priority internal task 
> which should take me around a week or 2 to finish and after that I can 
> get to it.
>
> Regarding temporary solution  - I looked into v3d and etnaviv use 
> cases and we in AMD actually face the same scenario where we decide to 
> skip HW reset if the guilty job did finish by the time we are 
> processing the timeout  (see amdgpu_device_gpu_recover and 
> skip_hw_reset goto) - the difference is we always call 
> drm_sched_stop/start irrespectively of whether we are going to 
> actually HW reset or not (same as extend timeout). I wonder if 
> something like this can be done also for ve3 and etnaviv ?
>
> Andrey
Luben Tuikov Feb. 10, 2020, 9:50 p.m. UTC | #22
Hi Lucas,

Thank you for bringing awareness of this issue, publicly.

As soon as this patch showed up back in November of 2019,
I objected to it, privately.

I suggested to instead use a _list_ to store the "state" of
all jobs of the same state. Then, at any time, timeout interrupt
or whatever, we can atomically (irq spinlock) move the timeout/bad
job to the timedout/cleanup/bad job list, and wake someone up
to deal with that list asynchronously, and return from the interrupt/etc.
immediately.

Then in due time, if any more interrupts or whatnot take place,
the job will either be in the timeout list or not. If it is,
then the instigator backs off, as someone else (the list handler) is or
will be awake and handling it (obviously a state variable may be kept as well).

This draws somewhat from my days with iSCSI, SCSI and SAS, 15 years ago,
where a device can complete a job (task) at anytime regardless
of what the SCSI layer "thinks" the task's state is: timed-out, aborted,
whatever. It is a very simple and elegant solution which generalizes
well.
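
To make that a bit more concrete, a rough sketch - none of these structs,
fields or helpers exist in drm_sched today, the names are made up purely
for illustration:

/* Illustrative only - not existing drm_sched state. */
struct sched_job_lists {
        spinlock_t              lock;
        struct list_head        pending;      /* on the HW, fence not signaled */
        struct list_head        done;         /* signaled or timed out */
        struct work_struct      cleanup_work; /* frees jobs on the done list */
};

/* Callable from the completion interrupt and from the timeout work alike. */
static void sched_job_mark_done(struct sched_job_lists *lists,
                                struct drm_sched_job *job)
{
        unsigned long flags;

        spin_lock_irqsave(&lists->lock, flags);
        /*
         * Atomically reclassify the job. Every other context inspects the
         * lists under the same lock, so there is no window in which the job
         * can be freed while somebody still dereferences it.
         */
        list_move_tail(&job->node, &lists->done);
        spin_unlock_irqrestore(&lists->lock, flags);

        /* The actual freeing happens asynchronously, outside the interrupt. */
        schedule_work(&lists->cleanup_work);
}

The timeout work would then only ever walk the pending list, and back off
if the job it cares about has already been moved.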

Regards,
Luben

On 2020-02-10 11:55 a.m., Andrey Grodzovsky wrote:
> Lucas - Ping on my question and also I attached this temporary solution for etnaviv to clarify my point. If that something acceptable for now at least i can do the same for v3d where it requires a bit more code changes.
> 
> Andrey
> 
> On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>>> Well a revert would break our driver.
>>>
>>> The real solution is that somebody needs to sit down, gather ALL the requirements and then come up with a solution which is clean and works for everyone.
>>>
>>> Christian.
>>
>>
>> I can to take on this as indeed our general design on this becomes more and more entangled as GPU reset scenarios grow in complexity (at least in AMD driver). Currently I am on a high priority internal task which should take me around a week or 2 to finish and after that I can get to it.
>>
>> Regarding temporary solution  - I looked into v3d and etnaviv use cases and we in AMD actually face the same scenario where we decide to skip HW reset if the guilty job did finish by the time we are processing the timeout  (see amdgpu_device_gpu_recover and skip_hw_reset goto) - the difference is we always call drm_sched_stop/start irrespectively of whether we are going to actually HW reset or not (same as extend timeout). I wonder if something like this can be done also for ve3 and etnaviv ?
>>
>> Andrey 
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
Andrey Grodzovsky Feb. 11, 2020, 3:55 p.m. UTC | #23
On 2/10/20 4:50 PM, Luben Tuikov wrote:
> Hi Lucas,
>
> Thank you for bringing awareness of this issue, publicly.
>
> As soon as this patch showed up back in November of 2019,
> I objected to it, privately.


I didn't find this objection in my mail actually


>
> I suggested to instead use a _list_ to store the "state" of
> all jobs of the same state. Then, at any time, timeout interrupt
> or whatever, we can atomically (irq spinlock) move the timeout/bad
> job to the timedout/cleanup/bad job list, and wake someone up
> to deal with that list asynchronously, and return from the interrupt/etc.
> immediately.


Sounds like a good idea to me. I think it is enough for us to have 2
lists: a timeout list for jobs scheduled to HW and not yet completed
(completion fence not yet signaled), and a cleanup list for those that did
complete. This should give an alternative solution to the race condition
this patch was addressing, without causing the breakage Lucas reported.
If no one objects I think I can try to implement it.
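
For illustration, the consumer side of that could look roughly like this
(reusing the made-up sched_job_lists names from the sketch above; only
free_job and the locking pattern mirror the existing scheduler):

static void sched_cleanup_work(struct work_struct *w)
{
        struct sched_job_lists *lists =
                container_of(w, struct sched_job_lists, cleanup_work);
        struct drm_sched_job *job;
        unsigned long flags;

        for (;;) {
                spin_lock_irqsave(&lists->lock, flags);
                job = list_first_entry_or_null(&lists->done,
                                               struct drm_sched_job, node);
                if (job)
                        list_del_init(&job->node);
                spin_unlock_irqrestore(&lists->lock, flags);

                if (!job)
                        break;

                /* Free outside the lock; nobody else can reach the job now. */
                job->sched->ops->free_job(job);
        }
}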

Andrey


>
> Then in due time, if any more interrupts or whatnot take place,
> the job will either be in the timeout list or not. If it it,
> then the instigator backs off as someone else (the list handler) will/is
> awake and handling it (obviously a state variable may be kept as well).
>
> This draws somewhat from my days with iSCSI, SCSI and SAS, 15 years ago,
> where a device can complete a job (task) at anytime regardless
> of what the SCSI layer "thinks" the task's state is: timed-out, aborted,
> whatever. It is a very simple and elegant solution which generalizes
> well.
>
> Regards,
> Luben
>
> On 2020-02-10 11:55 a.m., Andrey Grodzovsky wrote:
>> Lucas - Ping on my question and also I attached this temporary solution for etnaviv to clarify my point. If that something acceptable for now at least i can do the same for v3d where it requires a bit more code changes.
>>
>> Andrey
>>
>> On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>>>> Well a revert would break our driver.
>>>>
>>>> The real solution is that somebody needs to sit down, gather ALL the requirements and then come up with a solution which is clean and works for everyone.
>>>>
>>>> Christian.
>>>
>>> I can to take on this as indeed our general design on this becomes more and more entangled as GPU reset scenarios grow in complexity (at least in AMD driver). Currently I am on a high priority internal task which should take me around a week or 2 to finish and after that I can get to it.
>>>
>>> Regarding temporary solution  - I looked into v3d and etnaviv use cases and we in AMD actually face the same scenario where we decide to skip HW reset if the guilty job did finish by the time we are processing the timeout  (see amdgpu_device_gpu_recover and skip_hw_reset goto) - the difference is we always call drm_sched_stop/start irrespectively of whether we are going to actually HW reset or not (same as extend timeout). I wonder if something like this can be done also for ve3 and etnaviv ?
>>>
>>> Andrey
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
Andrey Grodzovsky Feb. 11, 2020, 9:27 p.m. UTC | #24
On 2/11/20 10:55 AM, Andrey Grodzovsky wrote:
> On 2/10/20 4:50 PM, Luben Tuikov wrote:
>> Hi Lucas,
>>
>> Thank you for bringing awareness of this issue, publicly.
>>
>> As soon as this patch showed up back in November of 2019,
>> I objected to it, privately.
>
>
> I didn't find this objection in my mail actually
>
>
>>
>> I suggested to instead use a _list_ to store the "state" of
>> all jobs of the same state. Then, at any time, timeout interrupt
>> or whatever, we can atomically (irq spinlock) move the timeout/bad
>> job to the timedout/cleanup/bad job list, and wake someone up
>> to deal with that list asynchronously, and return from the 
>> interrupt/etc.
>> immediately.
>
>
> Sounds a good idea to me, i think enough for us to have 2 lists, 
> timeout list for jobs scheduled to HW and not yet completed 
> (completion fence signaled) and cleanup list for those that did 
> complete. This should give alternative solution to the race condition 
> this patch was addressing without causing the break the Lucas 
> reported. If no one objects I think i can try implement it.
>
> Andrey


Thinking about it more, I realize Luben is right about also having a bad
job list, as this is needed for normal job completion (via the fence
callback from amdgpu_fence_process), where you need to decide whether or
not to move the job from the timeout list to the cleanup list. If it's
already in the bad job list - meaning it's being processed by the GPU
recovery code - you don't touch it; otherwise you move it to the cleanup
list, where it will eventually be freed by an invocation of
drm_sched_get_cleanup_job.
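
Something along these lines, purely as a sketch - the per-job state and
the wrapper struct are hypothetical, not existing drm_sched fields:

/* Hypothetical per-job state, protected by the same lists->lock. */
enum sched_job_state {
        SCHED_JOB_PENDING,      /* on the timeout list, fence not signaled */
        SCHED_JOB_RECOVERY,     /* claimed by the GPU recovery (bad job) path */
        SCHED_JOB_DONE,         /* on the cleanup list, waiting to be freed */
};

struct tracked_job {            /* illustrative wrapper only */
        struct drm_sched_job    base;
        enum sched_job_state    state;
};

/* Called from the completion path, e.g. the fence callback. */
static void sched_job_completed(struct sched_job_lists *lists,
                                struct tracked_job *tjob)
{
        unsigned long flags;

        spin_lock_irqsave(&lists->lock, flags);
        if (tjob->state == SCHED_JOB_RECOVERY) {
                /* GPU recovery owns the job - don't touch it here. */
                spin_unlock_irqrestore(&lists->lock, flags);
                return;
        }
        tjob->state = SCHED_JOB_DONE;
        list_move_tail(&tjob->base.node, &lists->done);
        spin_unlock_irqrestore(&lists->lock, flags);

        schedule_work(&lists->cleanup_work);
}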

Andrey


>
>
>>
>> Then in due time, if any more interrupts or whatnot take place,
>> the job will either be in the timeout list or not. If it it,
>> then the instigator backs off as someone else (the list handler) will/is
>> awake and handling it (obviously a state variable may be kept as well).
>>
>> This draws somewhat from my days with iSCSI, SCSI and SAS, 15 years ago,
>> where a device can complete a job (task) at anytime regardless
>> of what the SCSI layer "thinks" the task's state is: timed-out, aborted,
>> whatever. It is a very simple and elegant solution which generalizes
>> well.
>>
>> Regards,
>> Luben
>>
>> On 2020-02-10 11:55 a.m., Andrey Grodzovsky wrote:
>>> Lucas - Ping on my question and also I attached this temporary 
>>> solution for etnaviv to clarify my point. If that something 
>>> acceptable for now at least i can do the same for v3d where it 
>>> requires a bit more code changes.
>>>
>>> Andrey
>>>
>>> On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>>>>> Well a revert would break our driver.
>>>>>
>>>>> The real solution is that somebody needs to sit down, gather ALL 
>>>>> the requirements and then come up with a solution which is clean 
>>>>> and works for everyone.
>>>>>
>>>>> Christian.
>>>>
>>>> I can to take on this as indeed our general design on this becomes 
>>>> more and more entangled as GPU reset scenarios grow in complexity 
>>>> (at least in AMD driver). Currently I am on a high priority 
>>>> internal task which should take me around a week or 2 to finish and 
>>>> after that I can get to it.
>>>>
>>>> Regarding temporary solution  - I looked into v3d and etnaviv use 
>>>> cases and we in AMD actually face the same scenario where we decide 
>>>> to skip HW reset if the guilty job did finish by the time we are 
>>>> processing the timeout  (see amdgpu_device_gpu_recover and 
>>>> skip_hw_reset goto) - the difference is we always call 
>>>> drm_sched_stop/start irrespectively of whether we are going to 
>>>> actually HW reset or not (same as extend timeout). I wonder if 
>>>> something like this can be done also for ve3 and etnaviv ?
>>>>
>>>> Andrey
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
Luben Tuikov Feb. 12, 2020, 12:53 a.m. UTC | #25
On 2020-02-11 4:27 p.m., Andrey Grodzovsky wrote:
> 
> On 2/11/20 10:55 AM, Andrey Grodzovsky wrote:
>> On 2/10/20 4:50 PM, Luben Tuikov wrote:
>>> Hi Lucas,
>>>
>>> Thank you for bringing awareness of this issue, publicly.
>>>
>>> As soon as this patch showed up back in November of 2019,
>>> I objected to it, privately.
>>
>>
>> I didn't find this objection in my mail actually

Yes, I didn't send it to you.

>>> I suggested to instead use a _list_ to store the "state" of
>>> all jobs of the same state. Then, at any time, timeout interrupt
>>> or whatever, we can atomically (irq spinlock) move the timeout/bad
>>> job to the timedout/cleanup/bad job list, and wake someone up
>>> to deal with that list asynchronously, and return from the 
>>> interrupt/etc.
>>> immediately.
>>
>>
>> Sounds a good idea to me, i think enough for us to have 2 lists, 
>> timeout list for jobs scheduled to HW and not yet completed 
>> (completion fence signaled) and cleanup list for those that did 
>> complete. This should give alternative solution to the race condition 
>> this patch was addressing without causing the break the Lucas 
>> reported. If no one objects I think i can try implement it.
>>
>> Andrey
> 
> 
> Thinking more i realize Luben is right about having also bad job list as 
> this is needed for normal job competition (by fence callback from 
> amdgpu_fence_process)  and you need to decide if you move it to cleanup 
> list from timeout list or not. If it's already in bad job list - meaning 
> that it's being processed by GPU recovery code you don't touch it, 
> otherwise you move it to cleanup list where it will be freed eventually 
> by invocation of drm_sched_get_cleanup_job.

Yep...

Perhaps fewer lists than "timeout", "bad" and "cleanup" could be had.
I'd also name the "bad" list the "recovery" list, as that is what would
be done to commands on that list.

"Timeout" is a status, "timed-out", so perhaps just set the timeout
flag and move the job to a "done" list. (Note that the command can still
complete asynchronously while on that list and while it has status
"timed-out".)

The idea is that,
1) it avoid contention and races when more than one context
   can update the job at the same time, and
2) easy to process all jobs of a certain state and/or
   move them around, etc.

Let's discuss it and come up with a plan. :-)
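
Sketching just that last point - "timedout" below is a hypothetical status
passed in by the timeout path; only the fence check and free_job are the
existing API:

/* Done-list handler fragment: a timed-out job may still have completed. */
static void sched_handle_done_job(struct drm_sched_job *job, bool timedout)
{
        if (timedout && !dma_fence_is_signaled(&job->s_fence->finished)) {
                /*
                 * Genuinely hung: hand the job over to the recovery path
                 * instead of freeing it here.
                 */
                return;
        }

        /* Completed, possibly after being marked timed out: just free it. */
        job->sched->ops->free_job(job);
}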

Regards,
Luben




> 
> Andrey
> 
> 
>>
>>
>>>
>>> Then in due time, if any more interrupts or whatnot take place,
>>> the job will either be in the timeout list or not. If it it,
>>> then the instigator backs off as someone else (the list handler) will/is
>>> awake and handling it (obviously a state variable may be kept as well).
>>>
>>> This draws somewhat from my days with iSCSI, SCSI and SAS, 15 years ago,
>>> where a device can complete a job (task) at anytime regardless
>>> of what the SCSI layer "thinks" the task's state is: timed-out, aborted,
>>> whatever. It is a very simple and elegant solution which generalizes
>>> well.
>>>
>>> Regards,
>>> Luben
>>>
>>> On 2020-02-10 11:55 a.m., Andrey Grodzovsky wrote:
>>>> Lucas - Ping on my question and also I attached this temporary 
>>>> solution for etnaviv to clarify my point. If that something 
>>>> acceptable for now at least i can do the same for v3d where it 
>>>> requires a bit more code changes.
>>>>
>>>> Andrey
>>>>
>>>> On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>>>>>> Well a revert would break our driver.
>>>>>>
>>>>>> The real solution is that somebody needs to sit down, gather ALL 
>>>>>> the requirements and then come up with a solution which is clean 
>>>>>> and works for everyone.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>> I can to take on this as indeed our general design on this becomes 
>>>>> more and more entangled as GPU reset scenarios grow in complexity 
>>>>> (at least in AMD driver). Currently I am on a high priority 
>>>>> internal task which should take me around a week or 2 to finish and 
>>>>> after that I can get to it.
>>>>>
>>>>> Regarding temporary solution  - I looked into v3d and etnaviv use 
>>>>> cases and we in AMD actually face the same scenario where we decide 
>>>>> to skip HW reset if the guilty job did finish by the time we are 
>>>>> processing the timeout  (see amdgpu_device_gpu_recover and 
>>>>> skip_hw_reset goto) - the difference is we always call 
>>>>> drm_sched_stop/start irrespectively of whether we are going to 
>>>>> actually HW reset or not (same as extend timeout). I wonder if 
>>>>> something like this can be done also for ve3 and etnaviv ?
>>>>>
>>>>> Andrey
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
Andrey Grodzovsky Feb. 12, 2020, 4:33 p.m. UTC | #26
On 2/11/20 7:53 PM, Luben Tuikov wrote:
> On 2020-02-11 4:27 p.m., Andrey Grodzovsky wrote:
>> On 2/11/20 10:55 AM, Andrey Grodzovsky wrote:
>>> On 2/10/20 4:50 PM, Luben Tuikov wrote:
>>>> Hi Lucas,
>>>>
>>>> Thank you for bringing awareness of this issue, publicly.
>>>>
>>>> As soon as this patch showed up back in November of 2019,
>>>> I objected to it, privately.
>>>
>>> I didn't find this objection in my mail actually
> Yes, I didn't send it to you.
>
>>>> I suggested to instead use a _list_ to store the "state" of
>>>> all jobs of the same state. Then, at any time, timeout interrupt
>>>> or whatever, we can atomically (irq spinlock) move the timeout/bad
>>>> job to the timedout/cleanup/bad job list, and wake someone up
>>>> to deal with that list asynchronously, and return from the
>>>> interrupt/etc.
>>>> immediately.
>>>
>>> Sounds a good idea to me, i think enough for us to have 2 lists,
>>> timeout list for jobs scheduled to HW and not yet completed
>>> (completion fence signaled) and cleanup list for those that did
>>> complete. This should give alternative solution to the race condition
>>> this patch was addressing without causing the break the Lucas
>>> reported. If no one objects I think i can try implement it.
>>>
>>> Andrey
>>
>> Thinking more i realize Luben is right about having also bad job list as
>> this is needed for normal job competition (by fence callback from
>> amdgpu_fence_process)  and you need to decide if you move it to cleanup
>> list from timeout list or not. If it's already in bad job list - meaning
>> that it's being processed by GPU recovery code you don't touch it,
>> otherwise you move it to cleanup list where it will be freed eventually
>> by invocation of drm_sched_get_cleanup_job.
> Yep...
>
> Perhaps fewer lists, than "timeout", "bad" and "cleanup" could be had.
> I'd also name the "bad" list as "recovery" list, as that is what would
> be done to commands on that list.
>
> "Timeout" is a status "timed-out", so perhaps just set the timeout
> flag and move it to a "done" list. (Note that the command can still
> complete asynchronously while on that list and while it has status
> "timed-out'.)
>
> The idea is that,
> 1) it avoid contention and races when more than one context
>     can update the job at the same time, and
> 2) easy to process all jobs of a certain state and/or
>     move them around, etc.
>
> Let's discuss it and come up with a plan. :-)
>
> Regards,
> Luben


Sure, let me maybe come up with a draft patch so we have more concrete 
stuff to discuss and review.

Andrey



>
>
>
>
>> Andrey
>>
>>
>>>
>>>> Then in due time, if any more interrupts or whatnot take place,
>>>> the job will either be in the timeout list or not. If it it,
>>>> then the instigator backs off as someone else (the list handler) will/is
>>>> awake and handling it (obviously a state variable may be kept as well).
>>>>
>>>> This draws somewhat from my days with iSCSI, SCSI and SAS, 15 years ago,
>>>> where a device can complete a job (task) at anytime regardless
>>>> of what the SCSI layer "thinks" the task's state is: timed-out, aborted,
>>>> whatever. It is a very simple and elegant solution which generalizes
>>>> well.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>> On 2020-02-10 11:55 a.m., Andrey Grodzovsky wrote:
>>>>> Lucas - Ping on my question and also I attached this temporary
>>>>> solution for etnaviv to clarify my point. If that something
>>>>> acceptable for now at least i can do the same for v3d where it
>>>>> requires a bit more code changes.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
>>>>>>> Well a revert would break our driver.
>>>>>>>
>>>>>>> The real solution is that somebody needs to sit down, gather ALL
>>>>>>> the requirements and then come up with a solution which is clean
>>>>>>> and works for everyone.
>>>>>>>
>>>>>>> Christian.
>>>>>> I can to take on this as indeed our general design on this becomes
>>>>>> more and more entangled as GPU reset scenarios grow in complexity
>>>>>> (at least in AMD driver). Currently I am on a high priority
>>>>>> internal task which should take me around a week or 2 to finish and
>>>>>> after that I can get to it.
>>>>>>
>>>>>> Regarding temporary solution  - I looked into v3d and etnaviv use
>>>>>> cases and we in AMD actually face the same scenario where we decide
>>>>>> to skip HW reset if the guilty job did finish by the time we are
>>>>>> processing the timeout  (see amdgpu_device_gpu_recover and
>>>>>> skip_hw_reset goto) - the difference is we always call
>>>>>> drm_sched_stop/start irrespectively of whether we are going to
>>>>>> actually HW reset or not (same as extend timeout). I wonder if
>>>>>> something like this can be done also for ve3 and etnaviv ?
>>>>>>
>>>>>> Andrey
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
>>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
Lucas Stach July 21, 2020, 11:03 a.m. UTC | #27
Hi Andrey,

On Wednesday, 2020-02-12 at 11:33 -0500, Andrey Grodzovsky wrote:
> On 2/11/20 7:53 PM, Luben Tuikov wrote:
> > On 2020-02-11 4:27 p.m., Andrey Grodzovsky wrote:
> > > On 2/11/20 10:55 AM, Andrey Grodzovsky wrote:
> > > > On 2/10/20 4:50 PM, Luben Tuikov wrote:
> > > > > Hi Lucas,
> > > > > 
> > > > > Thank you for bringing awareness of this issue, publicly.
> > > > > 
> > > > > As soon as this patch showed up back in November of 2019,
> > > > > I objected to it, privately.
> > > > 
> > > > I didn't find this objection in my mail actually
> > Yes, I didn't send it to you.
> > 
> > > > > I suggested instead using a _list_ to store the "state" of
> > > > > all jobs of the same state. Then, at any time, timeout interrupt
> > > > > or whatever, we can atomically (irq spinlock) move the timeout/bad
> > > > > job to the timedout/cleanup/bad job list, and wake someone up
> > > > > to deal with that list asynchronously, and return from the
> > > > > interrupt/etc.
> > > > > immediately.
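
A minimal sketch of that mechanism, for illustration only: the bad job is
moved onto a separate list under the job_list_lock and handed to a worker.
sched->timedout_list and sched->recovery_work are assumed additions and do
not exist in the current drm_gpu_scheduler.

#include <drm/gpu_scheduler.h>
#include <linux/list.h>
#include <linux/workqueue.h>

/*
 * Sketch: atomically move the bad job onto a "timed out" list and hand
 * recovery off to a worker, so the timeout/interrupt path can return
 * immediately.
 */
static void sched_move_job_to_timedout_list(struct drm_gpu_scheduler *sched,
					    struct drm_sched_job *job)
{
	unsigned long flags;

	spin_lock_irqsave(&sched->job_list_lock, flags);
	/* Off the mirror list, so a concurrent cleanup cannot free it. */
	list_move_tail(&job->node, &sched->timedout_list);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);

	/* Deal with the list asynchronously, outside atomic context. */
	schedule_work(&sched->recovery_work);
}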
> > > > 
> > > > Sounds like a good idea to me. I think it's enough for us to have
> > > > 2 lists: a timeout list for jobs scheduled to HW and not yet
> > > > completed (completion fence signaled), and a cleanup list for those
> > > > that did complete. This should give an alternative solution to the
> > > > race condition this patch was addressing, without causing the
> > > > breakage Lucas reported. If no one objects, I think I can try to
> > > > implement it.
> > > > 
> > > > Andrey
> > > 
> > > Thinking about it more, I realize Luben is right about also having a
> > > bad job list, as this is needed for normal job completion (via the
> > > fence callback from amdgpu_fence_process), where you need to decide
> > > whether to move the job from the timeout list to the cleanup list or
> > > not. If it's already in the bad job list - meaning it's being processed
> > > by the GPU recovery code - you don't touch it; otherwise you move it to
> > > the cleanup list, where it will eventually be freed by an invocation of
> > > drm_sched_get_cleanup_job.
> > Yep...
> > 
> > Perhaps fewer lists than "timeout", "bad" and "cleanup" could be had.
> > I'd also name the "bad" list the "recovery" list, as that is what would
> > be done to commands on that list.
> > 
> > "Timeout" is a status ("timed-out"), so perhaps just set the timeout
> > flag and move the job to a "done" list. (Note that the command can still
> > complete asynchronously while on that list and while it has status
> > "timed-out".)
> > 
> > The idea is that,
> > 1) it avoids contention and races when more than one context
> >     can update the job at the same time, and
> > 2) it is easy to process all jobs of a certain state and/or
> >     move them around, etc.
> > 
> > Let's discuss it and come up with a plan. :-)
> > 
> > Regards,
> > Luben
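
To make the proposal above more concrete, here is an illustrative sketch of
per-state job lists plus a completion-path decision; none of these names or
fields exist in the scheduler today, they are assumptions for discussion only.

#include <drm/gpu_scheduler.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/*
 * Hypothetical per-state job lists.  A job sits on exactly one list at a
 * time; "timed out" would be a status flag rather than a list, since a
 * job may still complete while it is already on the done list.
 */
struct sched_job_lists {
	spinlock_t		lock;
	struct list_head	pending;   /* submitted to HW, not completed yet  */
	struct list_head	recovery;  /* "bad" jobs owned by GPU reset code  */
	struct list_head	done;      /* completed or timed out, to be freed */
};

/*
 * Completion-path sketch (e.g. from the HW fence callback): a job already
 * taken over by recovery is left alone; everything else moves to the done
 * list, where cleanup code in the style of drm_sched_get_cleanup_job frees
 * it later.  In a real implementation "under_recovery" would be derived
 * from the job's current list or a status flag, not passed in.
 */
static void job_completed(struct sched_job_lists *lists,
			  struct drm_sched_job *job, bool under_recovery)
{
	unsigned long flags;

	spin_lock_irqsave(&lists->lock, flags);
	if (!under_recovery)
		list_move_tail(&job->node, &lists->done);
	spin_unlock_irqrestore(&lists->lock, flags);
}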
> 
> Sure, let me maybe come up with a draft patch so we have more concrete 
> stuff to discuss and review.

It seems we all dropped the ball on this one. I believe this is still
an open issue. Has there been any progress from your side on fixing
this?

Regards,
Lucas
Andrey Grodzovsky July 21, 2020, 1:36 p.m. UTC | #28
Lucas, Luben picked up the work on this a few months ago, as I was diverted
to a different project.

Luben, can you update Lucas, please?

Andrey

On 7/21/20 7:03 AM, Lucas Stach wrote:
> It seems we all dropped the ball on this one. I believe this is still
> an open issue. Has there been any progress from your side on fixing
> this?
>
> Regards,
> Lucas
Christian König July 21, 2020, 1:39 p.m. UTC | #29
Luben had a good idea how to tackle the whole job handling.

Andrey/Lucas can you work with Luben to get this cleaned up because 
there are a lot of requirements on this which not only come from AMD.

Thanks,
Christian.

On 21.07.20 at 15:36, Andrey Grodzovsky wrote:
> Lucas, Luben picked up the work on this a few months ago, as I was
> diverted to a different project.
>
> Luben, can you update Lucas, please?
>
> Andrey
>
> On 7/21/20 7:03 AM, Lucas Stach wrote:
>> It seems we all dropped the ball on this one. I believe this is still
>> an open issue. Has there been any progress from your side on fixing
>> this?
>>
>> Regards,
>> Lucas
Andrey Grodzovsky July 21, 2020, 1:42 p.m. UTC | #30
Christian, I would like that very much, but unfortunately I am currently on a
strict schedule for an internal project and hence will not be able to
actively participate. I will do my best to answer any questions Luben might
have about the current implementation.

Andrey

On 7/21/20 9:39 AM, Christian König wrote:
> Luben had a good idea how to tackle the whole job handling.
>
> Andrey/Lucas can you work with Luben to get this cleaned up because there are 
> a lot of requirements on this which not only come from AMD.
>
> Thanks,
> Christian.
>
On 21.07.20 at 15:36, Andrey Grodzovsky wrote:
>> Lucas, Luben picked up the work on this a few months ago, as I was diverted
>> to a different project.
>>
>> Luben, can you update Lucas, please?
>>
>> Andrey
>>
>> On 7/21/20 7:03 AM, Lucas Stach wrote:
>>> It seems we all dropped the ball on this one. I believe this is still
>>> an open issue. Has there been any progress from your side on fixing
>>> this?
>>>
>>> Regards,
>>> Lucas
>
Luben Tuikov July 21, 2020, 6:29 p.m. UTC | #31
Hi Lucas,

Thank you for following up on this. Some things have slowed down,
given the world pandemic we've been experiencing this year.

I've had the design ready and half of it implemented and committed
into a branch. Just as per what I wrote earlier this year on this thread.

I need to finish the rest, which isn't big, but it does need
some unravelling of the current code. Then I need testing,
with which I suppose a number of people can help, so long as
they can make a frame time out and kick in the timeout handler.

I'll have more details in a few weeks.

Regards,
Luben

On 2020-07-21 9:42 a.m., Andrey Grodzovsky wrote:
> Christian, I would like that very much, but unfortunately I am currently on a
> strict schedule for an internal project and hence will not be able to
> actively participate. I will do my best to answer any questions Luben might
> have about the current implementation.
> 
> Andrey
> 
> On 7/21/20 9:39 AM, Christian König wrote:
>> Luben had a good idea how to tackle the whole job handling.
>>
>> Andrey/Lucas can you work with Luben to get this cleaned up because there are 
>> a lot of requirements on this which not only come from AMD.
>>
>> Thanks,
>> Christian.
>>
On 21.07.20 at 15:36, Andrey Grodzovsky wrote:
>>> Lucas, Luben picked up the work on this a few months ago, as I was diverted
>>> to a different project.
>>>
>>> Luben, can you update Lucas, please?
>>>
>>> Andrey
>>>
>>> On 7/21/20 7:03 AM, Lucas Stach wrote:
>>>> It seems we all dropped the ball on this one. I believe this is still
>>>> an open issue. Has there been any progress from your side on fixing
>>>> this?
>>>>
>>>> Regards,
>>>> Lucas
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 6774955..1bf9c40 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -284,10 +284,21 @@  static void drm_sched_job_timedout(struct work_struct *work)
 	unsigned long flags;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+
+	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	spin_lock_irqsave(&sched->job_list_lock, flags);
 	job = list_first_entry_or_null(&sched->ring_mirror_list,
 				       struct drm_sched_job, node);
 
 	if (job) {
+		/*
+		 * Remove the bad job so it cannot be freed by concurrent
+		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
+		 * is parked at which point it's safe.
+		 */
+		list_del_init(&job->node);
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
 		job->sched->ops->timedout_job(job);
 
 		/*
@@ -298,6 +309,8 @@  static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
+	} else {
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
 	}
 
 	spin_lock_irqsave(&sched->job_list_lock, flags);
@@ -370,6 +383,20 @@  void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 	kthread_park(sched->thread);
 
 	/*
+	 * Reinsert back the bad job here - now it's safe as
+	 * drm_sched_get_cleanup_job cannot race against us and release the
+	 * bad job at this point - we parked (waited for) any in progress
+	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
+	 * now until the scheduler thread is unparked.
+	 */
+	if (bad && bad->sched == sched)
+		/*
+		 * Add at the head of the queue to reflect it was the earliest
+		 * job extracted.
+		 */
+		list_add(&bad->node, &sched->ring_mirror_list);
+
+	/*
 	 * Iterate the job list from later to  earlier one and either deactive
 	 * their HW callbacks or remove them from mirror list if they already
 	 * signaled.