
[v2,3/5] mm: swap_pte_batch: add an output argument to return whether all swap entries are exclusive

Message ID 20240409082631.187483-4-21cnbao@gmail.com (mailing list archive)
State New
Series large folios swap-in: handle refault cases first

Commit Message

Barry Song April 9, 2024, 8:26 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

Add a boolean argument named any_shared. If any of the swap entries are
non-exclusive, set any_shared to true. The function do_swap_page() can
then utilize this information to determine whether the entire large
folio can be reused.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/internal.h | 9 ++++++++-
 mm/madvise.c  | 2 +-
 mm/memory.c   | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)
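
For illustration, a minimal sketch of how a caller such as do_swap_page()
might consume the flag (hypothetical; the actual consumer only arrives in a
later patch of this series):

	/* Sketch: surrounding locals (start_ptep, max_nr, ptent) assumed. */
	bool any_shared = false;
	int nr = swap_pte_batch(start_ptep, max_nr, ptent, &any_shared);

	/* Reuse the entire large folio only if no entry was shared. */
	if (nr > 1 && !any_shared)
		/* ... proceed with whole-folio reuse ... */;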

Comments

Ryan Roberts April 11, 2024, 2:54 p.m. UTC | #1
On 09/04/2024 09:26, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Add a boolean argument named any_shared. If any of the swap entries are
> non-exclusive, set any_shared to true. The function do_swap_page() can
> then utilize this information to determine whether the entire large
> folio can be reused.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/internal.h | 9 ++++++++-
>  mm/madvise.c  | 2 +-
>  mm/memory.c   | 2 +-
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 9d3250b4a08a..cae39c372bfc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -238,7 +238,8 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
>   *
>   * Return: the number of table entries in the batch.
>   */
> -static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> +static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte,
> +				bool *any_shared)

Please update the docs in the comment above this for the new param; follow
folio_pte_batch()'s docs as a template.
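Something along these lines, modelled on folio_pte_batch()'s docs, would do
(a sketch only; final wording up to you):

 * @any_shared: Optional pointer to indicate whether any entry in the
 *		batch is non-exclusive (i.e. possibly shared). With the
 *		semantics of this version, that includes the first entry.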

>  {
>  	pte_t expected_pte = pte_next_swp_offset(pte);
>  	const pte_t *end_ptep = start_ptep + max_nr;
> @@ -248,12 +249,18 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>  	VM_WARN_ON(!is_swap_pte(pte));
>  	VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
>  
> +	if (any_shared)
> +		*any_shared |= !pte_swp_exclusive(pte);

This is different from the approach in folio_pte_batch(). It inits *any_shared
to false and does NOT include the value of the first pte. I think that's odd,
personally, and I prefer your approach. I'm not sure if there was a good reason
that David chose the other approach? Regardless, I think both functions should
follow the same pattern here.

If sticking with your approach, why is this initial flag being ORed? Surely it
should just be initialized to get rid of any previous guff?
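I.e. a plain assignment (sketch):

	if (any_shared)
		*any_shared = !pte_swp_exclusive(pte);

That would also mean callers don't need to pre-initialize the flag.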

Thanks,
Ryan


> +
>  	while (ptep < end_ptep) {
>  		pte = ptep_get(ptep);
>  
>  		if (!pte_same(pte, expected_pte))
>  			break;
>  
> +		if (any_shared)
> +			*any_shared |= !pte_swp_exclusive(pte);
> +
>  		expected_pte = pte_next_swp_offset(expected_pte);
>  		ptep++;
>  	}
> diff --git a/mm/madvise.c b/mm/madvise.c
> index f59169888b8e..d34ca6983227 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -671,7 +671,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			entry = pte_to_swp_entry(ptent);
>  			if (!non_swap_entry(entry)) {
>  				max_nr = (end - addr) / PAGE_SIZE;
> -				nr = swap_pte_batch(pte, max_nr, ptent);
> +				nr = swap_pte_batch(pte, max_nr, ptent, NULL);
>  				nr_swap -= nr;
>  				free_swap_and_cache_nr(entry, nr);
>  				clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);
> diff --git a/mm/memory.c b/mm/memory.c
> index 2702d449880e..c4a52e8d740a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1638,7 +1638,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			folio_put(folio);
>  		} else if (!non_swap_entry(entry)) {
>  			max_nr = (end - addr) / PAGE_SIZE;
> -			nr = swap_pte_batch(pte, max_nr, ptent);
> +			nr = swap_pte_batch(pte, max_nr, ptent, NULL);
>  			/* Genuine swap entries, hence a private anon pages */
>  			if (!should_zap_cows(details))
>  				continue;
David Hildenbrand April 11, 2024, 3 p.m. UTC | #2
On 11.04.24 16:54, Ryan Roberts wrote:
> On 09/04/2024 09:26, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> Add a boolean argument named any_shared. If any of the swap entries are
>> non-exclusive, set any_shared to true. The function do_swap_page() can
>> then utilize this information to determine whether the entire large
>> folio can be reused.
>>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> ---
>>   mm/internal.h | 9 ++++++++-
>>   mm/madvise.c  | 2 +-
>>   mm/memory.c   | 2 +-
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 9d3250b4a08a..cae39c372bfc 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -238,7 +238,8 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
>>    *
>>    * Return: the number of table entries in the batch.
>>    */
>> -static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>> +static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte,
>> +				bool *any_shared)
> 
> Please update the docs in the comment above this for the new param; follow
> folio_pte_batch()'s docs as a template.
> 
>>   {
>>   	pte_t expected_pte = pte_next_swp_offset(pte);
>>   	const pte_t *end_ptep = start_ptep + max_nr;
>> @@ -248,12 +249,18 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>>   	VM_WARN_ON(!is_swap_pte(pte));
>>   	VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
>>   
>> +	if (any_shared)
>> +		*any_shared |= !pte_swp_exclusive(pte);
> 
> This is different from the approach in folio_pte_batch(). It inits *any_shared
> to false and does NOT include the value of the first pte. I think that's odd,
> personally, and I prefer your approach. I'm not sure if there was a good reason
> that David chose the other approach?

Because in my case calling code does

nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr, flags,
		     &any_writable);

...

if (any_writable)
	pte = pte_mkwrite(pte, src_vma);

...

and later checks in another function pte_write().

So if the common pattern is that the original PTE will be used for
checks, then it doesn't make sense to do unnecessary checks+setting for
the first PTE.
Ryan Roberts April 11, 2024, 3:36 p.m. UTC | #3
On 11/04/2024 16:00, David Hildenbrand wrote:
> On 11.04.24 16:54, Ryan Roberts wrote:
>> On 09/04/2024 09:26, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> Add a boolean argument named any_shared. If any of the swap entries are
>>> non-exclusive, set any_shared to true. The function do_swap_page() can
>>> then utilize this information to determine whether the entire large
>>> folio can be reused.
>>>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>   mm/internal.h | 9 ++++++++-
>>>   mm/madvise.c  | 2 +-
>>>   mm/memory.c   | 2 +-
>>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 9d3250b4a08a..cae39c372bfc 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -238,7 +238,8 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
>>>    *
>>>    * Return: the number of table entries in the batch.
>>>    */
>>> -static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>>> +static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte,
>>> +                bool *any_shared)
>>
>> Please update the docs in the comment above this for the new param; follow
>> folio_pte_batch()'s docs as a template.
>>
>>>   {
>>>       pte_t expected_pte = pte_next_swp_offset(pte);
>>>       const pte_t *end_ptep = start_ptep + max_nr;
>>> @@ -248,12 +249,18 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>>>       VM_WARN_ON(!is_swap_pte(pte));
>>>       VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
>>>
>>> +    if (any_shared)
>>> +        *any_shared |= !pte_swp_exclusive(pte);
>>
>> This is different from the approach in folio_pte_batch(). It inits *any_shared
>> to false and does NOT include the value of the first pte. I think that's odd,
>> personally, and I prefer your approach. I'm not sure if there was a good reason
>> that David chose the other approach?
> 
> Because in my case calling code does
> 
> nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr, flags,
>              &any_writable);
> 
> ...
> 
> if (any_writable)
>     pte = pte_mkwrite(pte, src_vma);
> 
> ...
> 
> and later checks in another function pte_write().
> 
> So if the common pattern is that the original PTE will be used for checks, then
> it doesn't make sense to do unnecessary checks+setting for the first PTE.

Yep understood. And I think adopting your semantics for any_shared actually
simplifies the code in patch 4 too; I've just commented that.
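
To spell out the two options at a call site (a sketch only, using this
patch's identifiers):

	/* folio_pte_batch()-style: the flag excludes the first PTE, so
	 * the caller has to fold it in by hand: */
	any_shared = false;
	nr = swap_pte_batch(ptep, max_nr, pte, &any_shared);
	any_shared |= !pte_swp_exclusive(pte);

	/* Your semantics: the flag already covers the whole batch,
	 * including the first PTE, and can be used directly: */
	any_shared = false;
	nr = swap_pte_batch(ptep, max_nr, pte, &any_shared);
	if (!any_shared)
		/* ... every swap entry in the batch is exclusive ... */;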

Patch

diff --git a/mm/internal.h b/mm/internal.h
index 9d3250b4a08a..cae39c372bfc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -238,7 +238,8 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
  *
  * Return: the number of table entries in the batch.
  */
-static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
+static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte,
+				bool *any_shared)
 {
 	pte_t expected_pte = pte_next_swp_offset(pte);
 	const pte_t *end_ptep = start_ptep + max_nr;
@@ -248,12 +249,18 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
 	VM_WARN_ON(!is_swap_pte(pte));
 	VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
 
+	if (any_shared)
+		*any_shared |= !pte_swp_exclusive(pte);
+
 	while (ptep < end_ptep) {
 		pte = ptep_get(ptep);
 
 		if (!pte_same(pte, expected_pte))
 			break;
 
+		if (any_shared)
+			*any_shared |= !pte_swp_exclusive(pte);
+
 		expected_pte = pte_next_swp_offset(expected_pte);
 		ptep++;
 	}
diff --git a/mm/madvise.c b/mm/madvise.c
index f59169888b8e..d34ca6983227 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -671,7 +671,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			entry = pte_to_swp_entry(ptent);
 			if (!non_swap_entry(entry)) {
 				max_nr = (end - addr) / PAGE_SIZE;
-				nr = swap_pte_batch(pte, max_nr, ptent);
+				nr = swap_pte_batch(pte, max_nr, ptent, NULL);
 				nr_swap -= nr;
 				free_swap_and_cache_nr(entry, nr);
 				clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);
diff --git a/mm/memory.c b/mm/memory.c
index 2702d449880e..c4a52e8d740a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1638,7 +1638,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			folio_put(folio);
 		} else if (!non_swap_entry(entry)) {
 			max_nr = (end - addr) / PAGE_SIZE;
-			nr = swap_pte_batch(pte, max_nr, ptent);
+			nr = swap_pte_batch(pte, max_nr, ptent, NULL);
 			/* Genuine swap entries, hence a private anon pages */
 			if (!should_zap_cows(details))
 				continue;