
[2/2] lru: allow batched add of large folios to lru list

Message ID 20230417075643.3287513-3-fengwei.yin@intel.com (mailing list archive)
State New
Series Reduce lock contention related with large folio

Commit Message

Yin, Fengwei April 17, 2023, 7:56 a.m. UTC
Currently, large folios are not added to the lru list in batches,
which causes high lru lock contention after enabling large folios
for anonymous mappings.

Running the page_fault1 test of will-it-scale with order-2 folios and
96 processes on an Ice Lake 48C/96T system, the lru lock contention
can be around 65%:
-   65.38%     0.17%  page_fault1_pro  [kernel.kallsyms]           [k]
    folio_lruvec_lock_irqsave
   - 65.21% folio_lruvec_lock_irqsave

With this patch, the lru lock contention drops to 45% in the same
test:
-   44.93%     0.17%  page_fault1_pro  [kernel.kallsyms]           [k]
    folio_lruvec_lock_irqsave
   + 44.75% folio_lruvec_lock_irqsave

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 include/linux/pagevec.h | 19 +++++++++++++++++--
 mm/swap.c               |  3 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

Comments

Matthew Wilcox April 17, 2023, 12:25 p.m. UTC | #1
On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
> Currently, large folios are not added to the lru list in batches,
> which causes high lru lock contention after enabling large folios
> for anonymous mappings.

Obviously, I think we should be doing a batched add, but I don't think
this is right.

> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>  {
>  	pvec->pages[pvec->nr++] = page;
> -	return pagevec_space(pvec);
> +	pvec->pages_nr += compound_nr(page);
> +
> +	if (pvec->pages_nr > PAGEVEC_SIZE)
> +		return 0;
> +	else
> +		return pagevec_space(pvec);

I assume your thinking here is that we should limit the number of pages
in the batches, but I think we should allow the number of folios to reach
PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
reduce the contention on the LRU lock even further.
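
(A minimal sketch of this alternative, assuming only the mm/swap.c change of
the patch is kept: folio_batch_add() stays as in mainline, so the batch is
drained only once it holds PAGEVEC_SIZE folios, regardless of how many pages
each folio spans.)

static void folio_batch_add_and_move(struct folio_batch *fbatch,
		struct folio *folio, move_fn_t move_fn)
{
	/* folio_batch_add() unchanged: it counts folios, not pages. */
	if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
		return;
	/* Drain onto the LRU only when all PAGEVEC_SIZE slots are used. */
	folio_batch_move_lru(fbatch, move_fn);
}
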
Yin, Fengwei April 18, 2023, 1:57 a.m. UTC | #2
On 4/17/23 20:25, Matthew Wilcox wrote:
> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>> Currently, large folios are not added to the lru list in batches,
>> which causes high lru lock contention after enabling large folios
>> for anonymous mappings.
> 
> Obviously, I think we should be doing a batched add, but I don't think
> this is right.
> 
>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>  {
>>  	pvec->pages[pvec->nr++] = page;
>> -	return pagevec_space(pvec);
>> +	pvec->pages_nr += compound_nr(page);
>> +
>> +	if (pvec->pages_nr > PAGEVEC_SIZE)
>> +		return 0;
>> +	else
>> +		return pagevec_space(pvec);
> 
> I assume your thinking here is that we should limit the number of pages
> in the batches, but I think we should allow the number of folios to reach
> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
> reduce the contention on the LRU lock even further.

Yes. My first thought was also to limit the number of folios.

But the concern is that large folios cover a wide range of sizes. In the
extreme case, if every batched large folio is 2M in size, then with
PAGEVEC_SIZE at 15 one batch could hold 30M of memory, which could be too
large for some usages.
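
(For reference, the arithmetic behind that figure, assuming 2M, i.e. order-9,
folios: 15 folios/batch * 2 MiB/folio = 30 MiB held in a single per-CPU batch
before it is drained.)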


Regards
Yin, Fengwei

>
Yin, Fengwei April 18, 2023, 2:37 a.m. UTC | #3
Add Ying, who found out that large folios are not batch-added to the lru.

On 4/18/23 09:57, Yin Fengwei wrote:
> 
> 
> On 4/17/23 20:25, Matthew Wilcox wrote:
>> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>>> Currently, large folios are not added to the lru list in batches,
>>> which causes high lru lock contention after enabling large folios
>>> for anonymous mappings.
>>
>> Obviously, I think we should be doing a batched add, but I don't think
>> this is right.
>>
>>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>>  {
>>>  	pvec->pages[pvec->nr++] = page;
>>> -	return pagevec_space(pvec);
>>> +	pvec->pages_nr += compound_nr(page);
>>> +
>>> +	if (pvec->pages_nr > PAGEVEC_SIZE)
>>> +		return 0;
>>> +	else
>>> +		return pagevec_space(pvec);
>>
>> I assume your thinking here is that we should limit the number of pages
>> in the batches, but I think we should allow the number of folios to reach
>> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
>> reduce the contention on the LRU lock even further.
> 
> Yes. My first thought was also to limit the number of folios.
> 
> But the concern is that large folios cover a wide range of sizes. In the
> extreme case, if every batched large folio is 2M in size, then with
> PAGEVEC_SIZE at 15 one batch could hold 30M of memory, which could be too
> large for some usages.
> 
> 
> Regards
> Yin, Fengwei
> 
>>
Huang, Ying April 18, 2023, 6:39 a.m. UTC | #4
Yin Fengwei <fengwei.yin@intel.com> writes:

> Add Ying, who found out that large folios are not batch-added to the lru.

Thanks!

> On 4/18/23 09:57, Yin Fengwei wrote:
>> 
>> 
>> On 4/17/23 20:25, Matthew Wilcox wrote:
>>> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>>>> Currently, large folios are not added to the lru list in batches,
>>>> which causes high lru lock contention after enabling large folios
>>>> for anonymous mappings.
>>>
>>> Obviously, I think we should be doing a batched add, but I don't think
>>> this is right.
>>>
>>>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>>>  {
>>>>  	pvec->pages[pvec->nr++] = page;
>>>> -	return pagevec_space(pvec);
>>>> +	pvec->pages_nr += compound_nr(page);
>>>> +
>>>> +	if (pvec->pages_nr > PAGEVEC_SIZE)

nr_pages appears better to me.

>>>> +		return 0;
>>>> +	else
>>>> +		return pagevec_space(pvec);
>>>
>>> I assume your thinking here is that we should limit the number of pages
>>> in the batches, but I think we should allow the number of folios to reach
>>> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
>>> reduce the contention on the LRU lock even further.
>> 
>> Yes. My first thought was also to limit the number of folios.
>> 
>> But the concern is that large folios cover a wide range of sizes. In the
>> extreme case, if every batched large folio is 2M in size, then with
>> PAGEVEC_SIZE at 15 one batch could hold 30M of memory, which could be too
>> large for some usages.

Yes, I think these are valid concerns.  One possibility for balancing
performance against lru cache size is to keep nr_pages of the per-CPU
lru cache below PAGEVEC_SIZE * N, where N is determined according to
the intended base order of large folios.  For example, N can be 4 if we
use 2 as the intended base order.
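
(A rough sketch of this idea, not part of the patch: LRU_BATCH_ORDER and
LRU_BATCH_PAGES are illustrative names, and nr_pages follows the renaming
suggested above.)

/* Assumed intended base order of large folios; N = 1 << 2 = 4. */
#define LRU_BATCH_ORDER		2
/* Page budget of one per-CPU batch: PAGEVEC_SIZE * N = 60 pages. */
#define LRU_BATCH_PAGES		(PAGEVEC_SIZE << LRU_BATCH_ORDER)

static inline unsigned folio_batch_add(struct folio_batch *fbatch,
		struct folio *folio)
{
	fbatch->folios[fbatch->nr++] = folio;
	fbatch->nr_pages += folio_nr_pages(folio);

	/* Drain when either the scaled page budget or the slots run out. */
	if (fbatch->nr_pages > LRU_BATCH_PAGES)
		return 0;
	return fbatch_space(fbatch);
}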

Just my 2 cents.

Best Regards,
Huang, Ying

Patch

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index f582f7213ea52..d719f7ad5a567 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -10,6 +10,7 @@ 
 #define _LINUX_PAGEVEC_H
 
 #include <linux/xarray.h>
+#include <linux/mm.h>
 
 /* 15 pointers + header align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE	15
@@ -22,6 +23,7 @@  struct address_space;
 struct pagevec {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short pages_nr;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -30,6 +32,7 @@  void __pagevec_release(struct pagevec *pvec);
 static inline void pagevec_init(struct pagevec *pvec)
 {
 	pvec->nr = 0;
+	pvec->pages_nr = 0;
 	pvec->percpu_pvec_drained = false;
 }
 
@@ -54,7 +57,12 @@  static inline unsigned pagevec_space(struct pagevec *pvec)
 static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
 {
 	pvec->pages[pvec->nr++] = page;
-	return pagevec_space(pvec);
+	pvec->pages_nr += compound_nr(page);
+
+	if (pvec->pages_nr > PAGEVEC_SIZE)
+		return 0;
+	else
+		return pagevec_space(pvec);
 }
 
 static inline void pagevec_release(struct pagevec *pvec)
@@ -75,6 +83,7 @@  static inline void pagevec_release(struct pagevec *pvec)
 struct folio_batch {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short pages_nr;
 	struct folio *folios[PAGEVEC_SIZE];
 };
 
@@ -92,6 +101,7 @@  static_assert(offsetof(struct pagevec, pages) ==
 static inline void folio_batch_init(struct folio_batch *fbatch)
 {
 	fbatch->nr = 0;
+	fbatch->pages_nr = 0;
 	fbatch->percpu_pvec_drained = false;
 }
 
@@ -124,7 +134,12 @@  static inline unsigned folio_batch_add(struct folio_batch *fbatch,
 		struct folio *folio)
 {
 	fbatch->folios[fbatch->nr++] = folio;
-	return fbatch_space(fbatch);
+	fbatch->pages_nr += folio_nr_pages(folio);
+
+	if (fbatch->pages_nr > PAGEVEC_SIZE)
+		return 0;
+	else
+		return fbatch_space(fbatch);
 }
 
 static inline void folio_batch_release(struct folio_batch *fbatch)
diff --git a/mm/swap.c b/mm/swap.c
index 423199ee8478c..59e3f1e3701c3 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -228,8 +228,7 @@  static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
 		struct folio *folio, move_fn_t move_fn)
 {
-	if (folio_batch_add(fbatch, folio) && !folio_test_large(folio) &&
-	    !lru_cache_disabled())
+	if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
 		return;
 	folio_batch_move_lru(fbatch, move_fn);
 }
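
(Illustrative walkthrough of the combined change, assuming order-2, i.e.
4-page, anonymous folios; not part of the patch:)

/*
 * folio_batch_add() with order-2 folios (4 pages each):
 *
 *   add #1: nr = 1, pages_nr =  4  -> keep batching
 *   add #2: nr = 2, pages_nr =  8  -> keep batching
 *   add #3: nr = 3, pages_nr = 12  -> keep batching
 *   add #4: nr = 4, pages_nr = 16  -> 16 > PAGEVEC_SIZE, return 0,
 *           so folio_batch_add_and_move() drains the batch onto the LRU.
 *
 * Large folios now go through the per-CPU batch (the folio_test_large()
 * bail-out in folio_batch_add_and_move() is gone), and a batch is drained
 * as soon as its total page count exceeds PAGEVEC_SIZE.
 */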