diff mbox series

[1/3] dmapool: improve scalability of dma_pool_alloc

Message ID 15ff502d-d840-1003-6c45-bc17f0d81262@cybernetics.com (mailing list archive)
State New, archived
Headers show
Series mpt3sas and dmapool scalability | expand

Commit Message

Tony Battersby July 26, 2018, 6:54 p.m. UTC
dma_pool_alloc() scales poorly when allocating a large number of pages
because it does a linear scan of all previously-allocated pages before
allocating a new one.  Improve its scalability by maintaining a separate
list of pages that have free blocks ready to (re)allocate.  In big O
notation, this improves the algorithm from O(n^2) to O(n).

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

Using list_del_init() in dma_pool_alloc() makes it safe to call
list_del() unconditionally when freeing the page.

In dma_pool_free(), the check for being already in avail_page_list could
be written several different ways.  The most obvious way is:

if (page->offset >= pool->allocation)
	list_add(&page->avail_page_link, &pool->avail_page_list);

Another way would be to check page->in_use.  But since it is already
using list_del_init(), checking the list pointers directly is safest to
prevent any possible list corruption in case the caller misuses the API
(e.g. double-dma_pool_free()) with DMAPOOL_DEBUG disabled.

Comments

Andy Shevchenko July 26, 2018, 7:37 p.m. UTC | #1
On Thu, Jul 26, 2018 at 9:54 PM, Tony Battersby <tonyb@cybernetics.com> wrote:
> dma_pool_alloc() scales poorly when allocating a large number of pages
> because it does a linear scan of all previously-allocated pages before
> allocating a new one.  Improve its scalability by maintaining a separate
> list of pages that have free blocks ready to (re)allocate.  In big O
> notation, this improves the algorithm from O(n^2) to O(n).



>         spin_lock_irqsave(&pool->lock, flags);
> -       list_for_each_entry(page, &pool->page_list, page_list) {
> -               if (page->offset < pool->allocation)
> -                       goto ready;

> +       if (!list_empty(&pool->avail_page_list)) {
> +               page = list_first_entry(&pool->avail_page_list,
> +                                       struct dma_page,
> +                                       avail_page_link);
> +               goto ready;
>         }

It looks like

page = list_first_entry_or_null();
if (page)
 goto ready;

Though I don't know which one produces better code in the result.

From reader prospective of view I would go with my variant.


> +       /* This test checks if the page is already in avail_page_list. */
> +       if (list_empty(&page->avail_page_link))
> +               list_add(&page->avail_page_link, &pool->avail_page_list);

How can you be sure that the page you are testing for is the first one?

It seems you are relying on the fact that in the list should be either
0 or 1 page. In that case what's the point to have a list?
Tony Battersby July 26, 2018, 7:56 p.m. UTC | #2
On 07/26/2018 03:37 PM, Andy Shevchenko wrote:
> On Thu, Jul 26, 2018 at 9:54 PM, Tony Battersby <tonyb@cybernetics.com> wrote:
>> dma_pool_alloc() scales poorly when allocating a large number of pages
>> because it does a linear scan of all previously-allocated pages before
>> allocating a new one.  Improve its scalability by maintaining a separate
>> list of pages that have free blocks ready to (re)allocate.  In big O
>> notation, this improves the algorithm from O(n^2) to O(n).
>
>
>>         spin_lock_irqsave(&pool->lock, flags);
>> -       list_for_each_entry(page, &pool->page_list, page_list) {
>> -               if (page->offset < pool->allocation)
>> -                       goto ready;
>> +       if (!list_empty(&pool->avail_page_list)) {
>> +               page = list_first_entry(&pool->avail_page_list,
>> +                                       struct dma_page,
>> +                                       avail_page_link);
>> +               goto ready;
>>         }
> It looks like
>
> page = list_first_entry_or_null();
> if (page)
>  goto ready;
>
> Though I don't know which one produces better code in the result.
>
> >From reader prospective of view I would go with my variant.

Thanks, I didn't know about list_first_entry_or_null().

>
>> +       /* This test checks if the page is already in avail_page_list. */
>> +       if (list_empty(&page->avail_page_link))
>> +               list_add(&page->avail_page_link, &pool->avail_page_list);
> How can you be sure that the page you are testing for is the first one?
>
> It seems you are relying on the fact that in the list should be either
> 0 or 1 page. In that case what's the point to have a list?
>
That would be true if the test were "if (list_empty(&pool->avail_page_list))".  But it is testing the list pointers in the item rather than the list pointers in the pool.  It may be a bit confusing if you have never seen that usage before, which is why I added a comment.  Basically, if you use list_del_init() instead of list_del(), then you can use list_empty() on the item itself to test if the item is present in a list or not.  For example, the comments in list.h warn not to use list_empty() on the entry after just list_del():

/**
 * list_del - deletes entry from list.
 * @entry: the element to delete from the list.
 * Note: list_empty() on entry does not return true after this, the entry is
 * in an undefined state.
 */
Tony Battersby July 27, 2018, 1:50 p.m. UTC | #3
> That would be true if the test were "if
> (list_empty(&pool->avail_page_list))".  But it is testing the list
> pointers in the item rather than the list pointers in the pool.  It may
> be a bit confusing if you have never seen that usage before, which is
> why I added a comment.  Basically, if you use list_del_init() instead of
> list_del(), then you can use list_empty() on the item itself to test if
> the item is present in a list or not.  For example, the comments in
> list.h warn not to use list_empty() on the entry after just list_del():
>
> /**
>  * list_del - deletes entry from list.
>  * @entry: the element to delete from the list.
>  * Note: list_empty() on entry does not return true after this, the entry is
>  * in an undefined state.
>  */

Sorry for the crappy line length (fixed above).  Should have just used
Preformat in Thunderbird like always rather than following
Documentation/process/email-clients.rst and changing mailnews.wraplength
from 72 to 0.

Anyway, I have been using list_del_init() for a long time in various
places, but now I can't find where any of its useful features are
documented.  I will have to submit a separate patch to add more
documentation for it.  I find it useful for two things.

1) If you use list_del_init(), you can delete an item from a list
multiple times without checking if it has already been deleted.  So the
following is safe:

list_add(entry, list);
list_del_init(entry);
list_del_init(entry);

That would not be safe if just using list_del().

2) If you use list_del_init(), you can test if an item is present in a
list using list_empty() on the entry.  So:

list_add(entry, list);
/* list_empty(entry) is false */
list_del_init(entry);
/* list_empty(entry) is true */

That would not work if using just list_del().

Since the list_empty() name is unintuitive for this purpose, it might be
useful to add a new macro for this use case.
diff mbox series

Patch

--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -20,6 +20,10 @@ 
  * least 'size' bytes.  Free blocks are tracked in an unsorted singly-linked
  * list of free blocks within the page.  Used blocks aren't tracked, but we
  * keep a count of how many are currently allocated from each page.
+ *
+ * The avail_page_list keeps track of pages that have one or more free blocks
+ * available to (re)allocate.  Pages are moved in and out of avail_page_list
+ * as their blocks are allocated and freed.
  */
 
 #include <linux/device.h>
@@ -44,6 +48,7 @@ 
 
 struct dma_pool {		/* the pool */
 	struct list_head page_list;
+	struct list_head avail_page_list;
 	spinlock_t lock;
 	size_t size;
 	struct device *dev;
@@ -55,6 +60,7 @@  struct dma_pool {		/* the pool */
 
 struct dma_page {		/* cacheable header for 'allocation' bytes */
 	struct list_head page_list;
+	struct list_head avail_page_link;
 	void *vaddr;
 	dma_addr_t dma;
 	unsigned int in_use;
@@ -164,6 +170,7 @@  struct dma_pool *dma_pool_create(const c
 	retval->dev = dev;
 
 	INIT_LIST_HEAD(&retval->page_list);
+	INIT_LIST_HEAD(&retval->avail_page_list);
 	spin_lock_init(&retval->lock);
 	retval->size = size;
 	retval->boundary = boundary;
@@ -256,6 +263,7 @@  static void pool_free_page(struct dma_po
 #endif
 	dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
 	list_del(&page->page_list);
+	list_del(&page->avail_page_link);
 	kfree(page);
 }
 
@@ -298,6 +306,7 @@  void dma_pool_destroy(struct dma_pool *p
 				       pool->name, page->vaddr);
 			/* leak the still-in-use consistent memory */
 			list_del(&page->page_list);
+			list_del(&page->avail_page_link);
 			kfree(page);
 		} else
 			pool_free_page(pool, page);
@@ -328,9 +337,11 @@  void *dma_pool_alloc(struct dma_pool *po
 	might_sleep_if(gfpflags_allow_blocking(mem_flags));
 
 	spin_lock_irqsave(&pool->lock, flags);
-	list_for_each_entry(page, &pool->page_list, page_list) {
-		if (page->offset < pool->allocation)
-			goto ready;
+	if (!list_empty(&pool->avail_page_list)) {
+		page = list_first_entry(&pool->avail_page_list,
+					struct dma_page,
+					avail_page_link);
+		goto ready;
 	}
 
 	/* pool_alloc_page() might sleep, so temporarily drop &pool->lock */
@@ -343,10 +354,13 @@  void *dma_pool_alloc(struct dma_pool *po
 	spin_lock_irqsave(&pool->lock, flags);
 
 	list_add(&page->page_list, &pool->page_list);
+	list_add(&page->avail_page_link, &pool->avail_page_list);
  ready:
 	page->in_use++;
 	offset = page->offset;
 	page->offset = *(int *)(page->vaddr + offset);
+	if (page->offset >= pool->allocation)
+		list_del_init(&page->avail_page_link);
 	retval = offset + page->vaddr;
 	*handle = offset + page->dma;
 #ifdef	DMAPOOL_DEBUG
@@ -461,6 +475,10 @@  void dma_pool_free(struct dma_pool *pool
 	memset(vaddr, POOL_POISON_FREED, pool->size);
 #endif
 
+	/* This test checks if the page is already in avail_page_list. */
+	if (list_empty(&page->avail_page_link))
+		list_add(&page->avail_page_link, &pool->avail_page_list);
+
 	page->in_use--;
 	*(int *)vaddr = page->offset;
 	page->offset = offset;