
[8/8] drm/panfrost: Make sure the shrinker does not reclaim referenced BOs

Message ID: 20191129135908.2439529-9-boris.brezillon@collabora.com
State: New, archived
Series: panfrost: Fixes for 5.4

Commit Message

Boris Brezillon Nov. 29, 2019, 1:59 p.m. UTC
Userspace might tag a BO purgeable while it's still referenced by GPU
jobs. We need to make sure the shrinker does not purge such BOs until
all jobs referencing them have finished.

Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c          | 1 +
 drivers/gpu/drm/panfrost/panfrost_gem.h          | 6 ++++++
 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 2 ++
 drivers/gpu/drm/panfrost/panfrost_job.c          | 7 ++++++-
 4 files changed, 15 insertions(+), 1 deletion(-)
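
To make the failure mode concrete, here is a minimal userspace sketch of
the sequence this patch guards against (illustrative only, not from the
patch: it assumes the panfrost UAPI header shipped with libdrm, and
job-descriptor setup and error handling are omitted):

	#include <stdint.h>
	#include <xf86drm.h>
	#include <panfrost_drm.h>	/* panfrost UAPI, from libdrm */

	static void submit_then_mark_purgeable(int fd, uint32_t bo_handle,
					       uint64_t jc_gpu_va)
	{
		uint32_t bos[] = { bo_handle };
		struct drm_panfrost_submit submit = {
			.jc = jc_gpu_va,		/* job chain GPU address */
			.bo_handles = (uintptr_t)bos,
			.bo_handle_count = 1,
		};
		struct drm_panfrost_madvise madv = {
			.handle = bo_handle,
			.madv = PANFROST_MADV_DONTNEED,	/* BO may be purged */
		};

		drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);

		/* The job may still be running on the GPU here. Before this
		 * patch, memory pressure could make the shrinker purge the BO
		 * out from under it; with gpu_usecount held by the job,
		 * panfrost_gem_purge() backs off until the job completes. */
		drmIoctl(fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
	}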

Comments

Steven Price Nov. 29, 2019, 3:48 p.m. UTC | #1
On 29/11/2019 13:59, Boris Brezillon wrote:
> Userspace might tag a BO purgeable while it's still referenced by GPU
> jobs. We need to make sure the shrinker does not purge such BOs until
> all jobs referencing it are finished.
> 
> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

I'm happy with this, but I would also argue that it was previously user
space's job not to mark a BO purgeable while it's in use by the GPU.
This is in some ways an ABI change so we should be sure this is
something that we want to support "forever" more. But if Mesa has
'always' been assuming this behaviour we might as well match.

Reviewed-by: Steven Price <steven.price@arm.com>

Steve

Boris Brezillon Nov. 29, 2019, 4:07 p.m. UTC | #2
On Fri, 29 Nov 2019 15:48:15 +0000
Steven Price <steven.price@arm.com> wrote:

> On 29/11/2019 13:59, Boris Brezillon wrote:
> > Userspace might tag a BO purgeable while it's still referenced by GPU
> > jobs. We need to make sure the shrinker does not purge such BOs until
> > all jobs referencing it are finished.
> > 
> > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> I'm happy with this, but I would also argue that it was previously user
> space's job not to mark a BO purgeable while it's in use by the GPU.

I was not aware of this design choice.

> This is in some ways an ABI change so we should be sure this is
> something that we want to support "forever" more.

Well, in that case, the ABI change brings extra robustness with,
AFAICT, no downside for userspace that was already taking care of
this.

> But if Mesa has
> 'always' been assuming this behaviour we might as well match.

I think so, and VC4 seems to have the same expectations.

> 
> Reviewed-by: Steven Price <steven.price@arm.com>

Thanks for the review.
Steven Price Nov. 29, 2019, 4:12 p.m. UTC | #3
On 29/11/2019 16:07, Boris Brezillon wrote:
> On Fri, 29 Nov 2019 15:48:15 +0000
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 29/11/2019 13:59, Boris Brezillon wrote:
>>> Userspace might tag a BO purgeable while it's still referenced by GPU
>>> jobs. We need to make sure the shrinker does not purge such BOs until
>>> all jobs referencing it are finished.
>>>
>>> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
>>
>> I'm happy with this, but I would also argue that it was previously user
>> space's job not to mark a BO purgeable while it's in use by the GPU.
> 
> I was not aware of this design choice.

When I was maintaining mali_kbase I would have said no to this ;) But
thankfully I've moved on! I'm not sure whether anyone actually made a
design choice for Panfrost here.

>> This is in some ways an ABI change so we should be sure this is
>> something that we want to support "forever" more.
> 
> Well, in that case, the ABI change is bringing extra robustness, with
> AFAICT, no downside for those that were taking care of that in
> userspace.

Yes, there's no downside for user space - this is just giving user space
extra freedom.

>> But if Mesa has
>> 'always' been assuming this behaviour we might as well match.
> 
> I think so, and VC4 seems to have the same expectations.

This seems like enough justification to me.

Steve

>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
> 
> Thanks for the review.
Robin Murphy Dec. 2, 2019, 12:50 p.m. UTC | #4
On 29/11/2019 1:59 pm, Boris Brezillon wrote:
> Userspace might tag a BO purgeable while it's still referenced by GPU
> jobs. We need to make sure the shrinker does not purge such BOs until
> all jobs referencing it are finished.

Nit: for extra robustness, perhaps it's worth using the refcount_t API 
rather than bare atomic_t?

Robin.

Boris Brezillon Dec. 2, 2019, 1:32 p.m. UTC | #5
On Mon, 2 Dec 2019 12:50:20 +0000
Robin Murphy <robin.murphy@arm.com> wrote:

> On 29/11/2019 1:59 pm, Boris Brezillon wrote:
> > Userspace might tag a BO purgeable while it's still referenced by GPU
> > jobs. We need to make sure the shrinker does not purge such BOs until
> > all jobs referencing it are finished.  
> 
> Nit: for extra robustness, perhaps it's worth using the refcount_t API 
> rather than bare atomic_t?

I considered doing that. The problem is, we start counting from 0, not
1, while the refcount API assumes counters start at 1 and should never
see a 0 -> 1 transition (refcount_inc() WARNs on one). I guess we
could do

	if (refcount_inc_not_zero(&bo->gpu_usecount)) {
		...
	} else {
		refcount_set(&bo->gpu_usecount, 1);
		...
	}

so we at least get the integer overflow/underflow protection.
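
For completeness, the release side matching that snippet would need the
same care — a sketch, still assuming gpu_usecount were turned into a
refcount_t (not part of the posted patch):

	/* In panfrost_job_cleanup(): drop the job's reference. The count
	 * legitimately returns to zero when the last job finishes, so
	 * refcount_dec() -- which WARNs when the counter hits zero --
	 * cannot be used; refcount_dec_and_test() can. */
	if (refcount_dec_and_test(&job->mappings[i]->obj->gpu_usecount)) {
		/* Last job reference dropped: the BO is idle again and
		 * the shrinker may purge it if it was marked DONTNEED. */
	}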

Anyway, I'm reworking the gem_shmem code so we can refcount the sgt
users (I actually re-use the page_use_count counter and turn it into a
refcount_t so we don't need to take the lock if it's > 0). With this
change I think I won't need the gpu_usecount.
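
A rough sketch of the lock-avoidance idea described there, assuming
pages_use_count became a refcount_t in drm_gem_shmem (the function name
and details here are hypothetical):

	static int shmem_pin_pages(struct drm_gem_shmem_object *shmem)
	{
		/* Fast path: pages already populated; take a reference
		 * without touching pages_lock. */
		if (refcount_inc_not_zero(&shmem->pages_use_count))
			return 0;

		mutex_lock(&shmem->pages_lock);
		/* Re-check under the lock in case another pinner won the
		 * race while we waited. */
		if (!refcount_inc_not_zero(&shmem->pages_use_count)) {
			/* First user: populate the pages, then publish
			 * the initial reference. */
			/* ... drm_gem_get_pages() etc. ... */
			refcount_set(&shmem->pages_use_count, 1);
		}
		mutex_unlock(&shmem->pages_lock);
		return 0;
	}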

Patch

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b406b5243b40..297c0e7304d2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -166,6 +166,7 @@ panfrost_lookup_bos(struct drm_device *dev,
 			break;
 		}
 
+		atomic_inc(&bo->gpu_usecount);
 		job->mappings[i] = mapping;
 	}
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ca1bc9019600..b3517ff9630c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -30,6 +30,12 @@ struct panfrost_gem_object {
 		struct mutex lock;
 	} mappings;
 
+	/*
+	 * Count the number of jobs referencing this BO so we don't let the
+	 * shrinker reclaim this object prematurely.
+	 */
+	atomic_t gpu_usecount;
+
 	bool noexec		:1;
 	bool is_heap		:1;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index b36df326c860..288e46c40673 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -41,6 +41,8 @@ static bool panfrost_gem_purge(struct drm_gem_object *obj)
 	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
 	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
 
+	if (atomic_read(&bo->gpu_usecount))
+		return false;
 
 	if (!mutex_trylock(&shmem->pages_lock))
 		return false;
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index c85d45be3b5e..2b12aa87ff32 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -270,8 +270,13 @@ static void panfrost_job_cleanup(struct kref *ref)
 	dma_fence_put(job->render_done_fence);
 
 	if (job->mappings) {
-		for (i = 0; i < job->bo_count; i++)
+		for (i = 0; i < job->bo_count; i++) {
+			if (!job->mappings[i])
+				break;
+
+			atomic_dec(&job->mappings[i]->obj->gpu_usecount);
 			panfrost_gem_mapping_put(job->mappings[i]);
+		}
 		kvfree(job->mappings);
 	}