
[v3,1/2] THP: avoid lock when check whether THP is in deferred list

Message ID 20230429082759.1600796-2-fengwei.yin@intel.com (mailing list archive)
State New
Series Reduce lock contention related with large folio

Commit Message

Yin, Fengwei April 29, 2023, 8:27 a.m. UTC
free_transhuge_page() acquires the split queue lock and then checks
whether the THP was added to the deferred list. This causes high
deferred queue lock contention.

It's safe to check whether the THP is in the deferred list without
holding the deferred queue lock in free_transhuge_page(), because by
the time free_transhuge_page() is reached, nothing can be trying to
add the folio to _deferred_list.

Running the page_fault1 test of will-it-scale with order-2 folios for
anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
we observed 61% split_queue_lock contention:
-   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
   - 63.01% free_transhuge_page
      + 62.91% _raw_spin_lock_irqsave

With this patch applied, the split_queue_lock contention is less
than 1%.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/huge_memory.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Kirill A. Shutemov May 4, 2023, 11:48 a.m. UTC | #1
On Sat, Apr 29, 2023 at 04:27:58PM +0800, Yin Fengwei wrote:
> free_transhuge_page() acquires the split queue lock and then checks
> whether the THP was added to the deferred list. This causes high
> deferred queue lock contention.
> 
> It's safe to check whether the THP is in the deferred list without
> holding the deferred queue lock in free_transhuge_page(), because by
> the time free_transhuge_page() is reached, nothing can be trying to
> add the folio to _deferred_list.
> 
> Running the page_fault1 test of will-it-scale with order-2 folios for
> anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
> we observed 61% split_queue_lock contention:
> -   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
>    - 63.01% free_transhuge_page
>       + 62.91% _raw_spin_lock_irqsave
> 
> With this patch applied, the split_queue_lock contention is less
> than 1%.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Huang, Ying May 5, 2023, 12:52 a.m. UTC | #2
Yin Fengwei <fengwei.yin@intel.com> writes:

> free_transhuge_page() acquires the split queue lock and then checks
> whether the THP was added to the deferred list. This causes high
> deferred queue lock contention.
>
> It's safe to check whether the THP is in the deferred list without
> holding the deferred queue lock in free_transhuge_page(), because by
> the time free_transhuge_page() is reached, nothing can be trying to
> add the folio to _deferred_list.
>
> Running the page_fault1 test of will-it-scale with order-2 folios for
> anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
> we observed 61% split_queue_lock contention:
> -   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
>    - 63.01% free_transhuge_page
>       + 62.91% _raw_spin_lock_irqsave
>
> With this patch applied, the split_queue_lock contention is less
> than 1%.
>
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>

Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

> ---
>  mm/huge_memory.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..2a1df2c24c8e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,19 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> -		ds_queue->split_queue_len--;
> -		list_del(&folio->_deferred_list);
> +	/*
> +	 * At this point, there is no one trying to add the folio to
> +	 * deferred_list. If folio is not in deferred_list, it's safe
> +	 * to check without acquiring the split_queue_lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> +		if (!list_empty(&folio->_deferred_list)) {
> +			ds_queue->split_queue_len--;
> +			list_del(&folio->_deferred_list);
> +		}
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
Yin, Fengwei May 5, 2023, 1:09 a.m. UTC | #3
Hi Kirill,

On 5/4/2023 7:48 PM, kirill@shutemov.name wrote:
> On Sat, Apr 29, 2023 at 04:27:58PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires the split queue lock and then checks
>> whether the THP was added to the deferred list. This causes high
>> deferred queue lock contention.
>>
>> It's safe to check whether the THP is in the deferred list without
>> holding the deferred queue lock in free_transhuge_page(), because by
>> the time free_transhuge_page() is reached, nothing can be trying to
>> add the folio to _deferred_list.
>>
>> Running the page_fault1 test of will-it-scale with order-2 folios for
>> anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
>> we observed 61% split_queue_lock contention:
>> -   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
>>    - 63.01% free_transhuge_page
>>       + 62.91% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thanks a lot for the review.


Regards
Yin, Fengwei

>
Yin, Fengwei May 5, 2023, 1:09 a.m. UTC | #4
Hi Ying,

On 5/5/2023 8:52 AM, Huang, Ying wrote:
> Yin Fengwei <fengwei.yin@intel.com> writes:
> 
>> free_transhuge_page() acquires the split queue lock and then checks
>> whether the THP was added to the deferred list. This causes high
>> deferred queue lock contention.
>>
>> It's safe to check whether the THP is in the deferred list without
>> holding the deferred queue lock in free_transhuge_page(), because by
>> the time free_transhuge_page() is reached, nothing can be trying to
>> add the folio to _deferred_list.
>>
>> Running the page_fault1 test of will-it-scale with order-2 folios for
>> anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
>> we observed 61% split_queue_lock contention:
>> -   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
>>    - 63.01% free_transhuge_page
>>       + 62.91% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> 
> Thanks!
> 
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Thanks a lot for reviewing.

Regards
Yin, Fengwei

> 
>> ---
>>  mm/huge_memory.c | 17 ++++++++++++-----
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..2a1df2c24c8e 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,19 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> -		ds_queue->split_queue_len--;
>> -		list_del(&folio->_deferred_list);
>> +	/*
>> +	 * At this point, there is no one trying to add the folio to
>> +	 * deferred_list. If folio is not in deferred_list, it's safe
>> +	 * to check without acquiring the split_queue_lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> +		if (!list_empty(&folio->_deferred_list)) {
>> +			ds_queue->split_queue_len--;
>> +			list_del(&folio->_deferred_list);
>> +		}
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
Yin, Fengwei May 29, 2023, 2:58 a.m. UTC | #5
Hi Andrew,

On 5/4/23 19:48, kirill@shutemov.name wrote:
> On Sat, Apr 29, 2023 at 04:27:58PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires the split queue lock and then checks
>> whether the THP was added to the deferred list. This causes high
>> deferred queue lock contention.
>>
>> It's safe to check whether the THP is in the deferred list without
>> holding the deferred queue lock in free_transhuge_page(), because by
>> the time free_transhuge_page() is reached, nothing can be trying to
>> add the folio to _deferred_list.
>>
>> Running the page_fault1 test of will-it-scale with order-2 folios for
>> anonymous mappings and 96 processes on an Ice Lake 48C/96T test box,
>> we observed 61% split_queue_lock contention:
>> -   63.02%     0.01%  page_fault1_pro  [kernel.kallsyms]         [k] free_transhuge_page
>>    - 63.01% free_transhuge_page
>>       + 62.91% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
I didn't get the green light from Matthew for patch 2 (which tries to reduce
lru lock contention); it may take more time to figure out how to reduce that
contention.

I am wondering whether patch 1 (without patch 2) can be picked up, as:
  - It has nothing to do with patch 2.
  - It could reduce the deferred queue lock contention.
  - It got an Acked-by from Kirill.

Let me know if you want me to resend patch 1. Thanks.


Regards
Yin, Fengwei

>

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..2a1df2c24c8e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2799,12 +2799,19 @@  void free_transhuge_page(struct page *page)
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	unsigned long flags;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	if (!list_empty(&folio->_deferred_list)) {
-		ds_queue->split_queue_len--;
-		list_del(&folio->_deferred_list);
+	/*
+	 * At this point, there is no one trying to add the folio to
+	 * deferred_list. If folio is not in deferred_list, it's safe
+	 * to check without acquiring the split_queue_lock.
+	 */
+	if (data_race(!list_empty(&folio->_deferred_list))) {
+		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+		if (!list_empty(&folio->_deferred_list)) {
+			ds_queue->split_queue_len--;
+			list_del(&folio->_deferred_list);
+		}
+		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
 }