
[2/2] drm/ttm: optimize the pool shrinker a bit v2

Message ID 20210415115624.2904-2-christian.koenig@amd.com (mailing list archive)
State New, archived
Series [1/2] mm/vmscan: add sync_shrinkers function

Commit Message

Christian König April 15, 2021, 11:56 a.m. UTC
Switch back to using a spinlock by moving the IOMMU unmap outside
of the locked region.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 44 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 22 deletions(-)
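
For readers skimming the thread before the full diff at the bottom of the page: the core of the change is that ttm_pool_shrink() now only rotates the shrinker LRU under the new spinlock and performs the potentially expensive ttm_pool_free_page() call, which may include an IOMMU unmap, after dropping the lock. A condensed sketch of the resulting function, simplified from the patch below (illustration only, not a drop-in replacement):

static unsigned int ttm_pool_shrink(void)
{
	struct ttm_pool_type *pt;
	struct page *p;

	/* Only the list manipulation runs under the lock, so a
	 * non-sleeping spinlock is sufficient here.
	 */
	spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
	list_move_tail(&pt->shrinker_list, &shrinker_list);
	spin_unlock(&shrinker_lock);

	p = ttm_pool_type_take(pt);
	if (!p)
		return 0;

	/* The slow part (releasing the page, possibly including an IOMMU
	 * unmap for DMA pools) now runs outside the lock.
	 */
	ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
	return 1 << pt->order;
}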

Comments

Andrew Morton April 15, 2021, 8:33 p.m. UTC | #1
On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>  			for (j = 0; j < MAX_ORDER; ++j)
>  				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>  	}
> +
> +	/* We removed the pool types from the LRU, but we need to also make sure
> +	 * that no shrinker is concurrently freeing pages from the pool.
> +	 */
> +	sync_shrinkers();

It isn't immediately clear to me how this works.  ttm_pool_fini() has
already freed all the pages hasn't it?  So why would it care if some
shrinkers are still playing with the pages?

Or is it the case that ttm_pool_fini() is assuming that there will be
some further action against these pages, which requires that shrinkers
no longer be accessing the pages and which further assumes that future
shrinker invocations will not be able to look up these pages?

IOW, a bit more explanation about the dynamics here would help!
Christian König April 16, 2021, 7:08 a.m. UTC | #2
On 15.04.21 at 22:33, Andrew Morton wrote:
> On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>   			for (j = 0; j < MAX_ORDER; ++j)
>>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>>   	}
>> +
>> +	/* We removed the pool types from the LRU, but we need to also make sure
>> +	 * that no shrinker is concurrently freeing pages from the pool.
>> +	 */
>> +	sync_shrinkers();
> It isn't immediately clear to me how this works.  ttm_pool_fini() has
> already freed all the pages hasn't it?  So why would it care if some
> shrinkers are still playing with the pages?

Yes, ttm_pool_fini() has freed up all the pages which were in the pool 
when the function was called.

But the problem is that a concurrently running shrinker may have taken a 
page from the pool and still be in the process of freeing it.

If we return here without waiting, the pool structure and especially the 
device structure can be freed while that shrinker is still using them.

I could go for a design where we have one shrinker per device instead, 
but that would put a bit too much pressure on the pool in my opinion.
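
To make the ordering concrete, here is a rough sketch of the race window and of the guarantee sync_shrinkers() from patch 1/2 has to give. The shrinker_rwsem-based body below is an assumption for illustration only, since patch 1/2 is not quoted on this page:

/*
 * Illustration only, not driver code.  The window ttm_pool_fini() has
 * to close looks roughly like this:
 *
 *   CPU 1 (shrinker): ttm_pool_shrink() picks a pt belonging to this
 *                     pool from shrinker_list and drops shrinker_lock
 *   CPU 0 (driver):   ttm_pool_fini() unlinks the pool's pts, frees the
 *                     remaining pages and returns
 *   CPU 0 (driver):   the caller frees the ttm_pool / ttm_device
 *   CPU 1 (shrinker): still inside ttm_pool_free_page(pt->pool, ...),
 *                     which for DMA pools dereferences pool->dev
 *                     -> use-after-free
 *
 * Waiting until every shrinker invocation that was already running has
 * finished closes the window.  One possible implementation (again, an
 * assumption about patch 1/2) is to cycle the rwsem that shrink_slab()
 * holds for reading while it calls the registered shrinkers:
 */
void sync_shrinkers(void)
{
	down_write(&shrinker_rwsem);	/* wait for in-flight shrink_slab() readers */
	up_write(&shrinker_rwsem);
}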

> Or is it the case that ttm_pool_fini() is assuming that there will be
> some further action against these pages, which requires that shrinkers
> no longer be accessing the pages and which further assumes that future
> shrinker invocations will not be able to look up these pages?
>
> IOW, a bit more explanation about the dynamics here would help!

Sorry, I'm not a native speaker of English and sometimes still have a 
hard time explaining things.

Regards,
Christian.
Christian König April 26, 2021, 11:15 a.m. UTC | #3
Just a gentle ping?

Are you ok with this explanation, Andrew, or should I look for a different 
approach?

Thanks,
Christian.


Patch

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..955836d569cc 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@  static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@  static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	spin_lock_init(&pt->lock);
 	INIT_LIST_HEAD(&pt->pages);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@  static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_del(&pt->shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	while ((p = ttm_pool_type_take(pt)))
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,19 @@  static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
 	struct ttm_pool_type *pt;
-	unsigned int num_freed;
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+	list_move_tail(&pt->shrinker_list, &shrinker_list);
+	spin_unlock(&shrinker_lock);
 
 	p = ttm_pool_type_take(pt);
-	if (p) {
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-		num_freed = 1 << pt->order;
-	} else {
-		num_freed = 0;
-	}
-
-	list_move_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	if (!p)
+		return 0;
 
-	return num_freed;
+	ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+	return 1 << pt->order;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +525,11 @@  void ttm_pool_fini(struct ttm_pool *pool)
 			for (j = 0; j < MAX_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
+
+	/* We removed the pool types from the LRU, but we need to also make sure
+	 * that no shrinker is concurrently freeing pages from the pool.
+	 */
+	sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +604,7 @@  static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 {
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	seq_puts(m, "wc\t:");
 	ttm_pool_debugfs_orders(global_write_combined, m);
 	seq_puts(m, "uc\t:");
@@ -613,7 +613,7 @@  static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 	ttm_pool_debugfs_orders(global_dma32_write_combined, m);
 	seq_puts(m, "uc 32\t:");
 	ttm_pool_debugfs_orders(global_dma32_uncached, m);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 
@@ -640,7 +640,7 @@  int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
 		seq_puts(m, "DMA ");
 		switch (i) {
@@ -656,7 +656,7 @@  int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 		}
 		ttm_pool_debugfs_orders(pool->caching[i].orders, m);
 	}
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 	return 0;
@@ -693,7 +693,7 @@  int ttm_pool_mgr_init(unsigned long num_pages)
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
-	mutex_init(&shrinker_lock);
+	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
 	for (i = 0; i < MAX_ORDER; ++i) {