
[02/10] mm/ksm: skip subpages of compound pages

Message ID 20240604042454.2012091-3-alexs@kernel.org (mailing list archive)
State New
Series: use folio in ksm

Commit Message

alexs@kernel.org June 4, 2024, 4:24 a.m. UTC
From: "Alex Shi (tencent)" <alexs@kernel.org>

When a folio isn't fit for KSM, its subpages are unlikely to be good
candidates either, so let's skip checking the remaining pages of the
folio to save some work.

Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
---
 mm/ksm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

David Hildenbrand June 4, 2024, 8:12 a.m. UTC | #1
On 04.06.24 06:24, alexs@kernel.org wrote:
> From: "Alex Shi (tencent)" <alexs@kernel.org>
> 
> When a folio isn't fit for KSM, the subpages are unlikely to be good,
> So let's skip the rest page checking to save some actions.
> 
> Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
> ---
>   mm/ksm.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 97e5b41f8c4b..e2fdb9dd98e2 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2644,6 +2644,8 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>   		goto no_vmas;
>   
>   	for_each_vma(vmi, vma) {
> +		int nr = 1;
> +
>   		if (!(vma->vm_flags & VM_MERGEABLE))
>   			continue;
>   		if (ksm_scan.address < vma->vm_start)
> @@ -2660,6 +2662,9 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>   				cond_resched();
>   				continue;
>   			}
> +
> +			VM_WARN_ON(PageTail(*page));
> +			nr = compound_nr(*page);
>   			if (is_zone_device_page(*page))
>   				goto next_page;
>   			if (PageAnon(*page)) {
> @@ -2672,7 +2677,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>   					if (should_skip_rmap_item(*page, rmap_item))
>   						goto next_page;
>   
> -					ksm_scan.address += PAGE_SIZE;
> +					ksm_scan.address += nr * PAGE_SIZE;
>   				} else
>   					put_page(*page);
>   				mmap_read_unlock(mm);
> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>   			}
>   next_page:
>   			put_page(*page);
> -			ksm_scan.address += PAGE_SIZE;
> +			ksm_scan.address += nr * PAGE_SIZE;
>   			cond_resched();
>   		}
>   	}

You might be jumping over pages that don't belong to that folio. What 
you would actually want to do is somehow use folio_pte_batch() to really 
know the PTEs point at the same folio, so you can skip them. But that's 
not that easy when using follow_page() ...

So I suggest dropping this change for now.
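To make that concrete, the batching would have to look something like
the rough sketch below. This is not the posted patch and it deliberately
does not guess at the real folio_pte_batch() interface; the helper name,
and the elided pmd lookup, page-table boundary handling and locking, are
assumptions for illustration only (assume mm/ksm.c context for headers):

static unsigned int ksm_nr_batched_subpages(struct vm_area_struct *vma,
					    unsigned long addr, pte_t *ptep,
					    struct folio *folio,
					    struct page *page)
{
	unsigned long idx = folio_page_idx(folio, page);
	unsigned int nr = 1;

	/* Only PTEs up to the end of this folio can possibly match. */
	while (idx + nr < folio_nr_pages(folio)) {
		pte_t pte = ptep_get(ptep + nr);

		/* Stop at the first PTE that maps anything else. */
		if (!pte_present(pte) ||
		    vm_normal_page(vma, addr + nr * PAGE_SIZE, pte) !=
				folio_page(folio, idx + nr))
			break;
		nr++;
	}
	return nr;
}

Only with something like that would "ksm_scan.address += nr * PAGE_SIZE"
be safe, and getting hold of the PTEs is exactly what is awkward when
going through follow_page().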
Alex Shi June 4, 2024, 10:31 a.m. UTC | #2
On 6/4/24 4:12 PM, David Hildenbrand wrote:
> On 04.06.24 06:24, alexs@kernel.org wrote:
>> From: "Alex Shi (tencent)" <alexs@kernel.org>
>>
>> When a folio isn't fit for KSM, the subpages are unlikely to be good,
>> So let's skip the rest page checking to save some actions.
>>
>> Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
>> ---
>>   mm/ksm.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 97e5b41f8c4b..e2fdb9dd98e2 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2644,6 +2644,8 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>           goto no_vmas;
>>         for_each_vma(vmi, vma) {
>> +        int nr = 1;
>> +
>>           if (!(vma->vm_flags & VM_MERGEABLE))
>>               continue;
>>           if (ksm_scan.address < vma->vm_start)
>> @@ -2660,6 +2662,9 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>                   cond_resched();
>>                   continue;
>>               }
>> +
>> +            VM_WARN_ON(PageTail(*page));
>> +            nr = compound_nr(*page);
>>               if (is_zone_device_page(*page))
>>                   goto next_page;
>>               if (PageAnon(*page)) {
>> @@ -2672,7 +2677,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>                       if (should_skip_rmap_item(*page, rmap_item))
>>                           goto next_page;
>>   -                    ksm_scan.address += PAGE_SIZE;
>> +                    ksm_scan.address += nr * PAGE_SIZE;
>>                   } else
>>                       put_page(*page);
>>                   mmap_read_unlock(mm);
>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>               }
>>   next_page:
>>               put_page(*page);
>> -            ksm_scan.address += PAGE_SIZE;
>> +            ksm_scan.address += nr * PAGE_SIZE;
>>               cond_resched();
>>           }
>>       }
> 
> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
> 
> So I suggest dropping this change for now.
> 

Hi David,

Forgive my stupidity, but where would I jump over a normal page that doesn't belong to the folio?

Thanks
Alex
David Hildenbrand June 4, 2024, 10:43 a.m. UTC | #3
On 04.06.24 12:31, Alex Shi wrote:
> 
> 
> On 6/4/24 4:12 PM, David Hildenbrand wrote:
>> On 04.06.24 06:24, alexs@kernel.org wrote:
>>> From: "Alex Shi (tencent)" <alexs@kernel.org>
>>>
>>> When a folio isn't fit for KSM, the subpages are unlikely to be good,
>>> So let's skip the rest page checking to save some actions.
>>>
>>> Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
>>> ---
>>>    mm/ksm.c | 9 +++++++--
>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 97e5b41f8c4b..e2fdb9dd98e2 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -2644,6 +2644,8 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>            goto no_vmas;
>>>          for_each_vma(vmi, vma) {
>>> +        int nr = 1;
>>> +
>>>            if (!(vma->vm_flags & VM_MERGEABLE))
>>>                continue;
>>>            if (ksm_scan.address < vma->vm_start)
>>> @@ -2660,6 +2662,9 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>                    cond_resched();
>>>                    continue;
>>>                }
>>> +
>>> +            VM_WARN_ON(PageTail(*page));
>>> +            nr = compound_nr(*page);
>>>                if (is_zone_device_page(*page))
>>>                    goto next_page;
>>>                if (PageAnon(*page)) {
>>> @@ -2672,7 +2677,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>                        if (should_skip_rmap_item(*page, rmap_item))
>>>                            goto next_page;
>>>    -                    ksm_scan.address += PAGE_SIZE;
>>> +                    ksm_scan.address += nr * PAGE_SIZE;
>>>                    } else
>>>                        put_page(*page);
>>>                    mmap_read_unlock(mm);
>>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>                }
>>>    next_page:
>>>                put_page(*page);
>>> -            ksm_scan.address += PAGE_SIZE;
>>> +            ksm_scan.address += nr * PAGE_SIZE;
>>>                cond_resched();
>>>            }
>>>        }
>>
>> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
>>
>> So I suggest dropping this change for now.
>>
> 
> Hi David,
> 
> Forgive my stupidity, where I jump over normal page that not to belong to the folio?

IIUC, you assume that the folio is fully mapped by all PTEs that could 
span it, and that follow_page() would give you the head page, correct?

As a simple example, assume only a single page of a large folio is still 
mapped, which could be any tail page. You couldn't jump over any PTEs.

Or am I missing something?
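As a toy userspace illustration of how a single mapped tail page can
come about (assumptions only: that the region really gets backed by one
large folio, and the chosen offsets are arbitrary), a partial munmap()
is enough:

#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	size_t thp = 2UL << 20;			/* one PMD-sized region */
	char *buf = mmap(NULL, thp, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return EXIT_FAILURE;

	madvise(buf, thp, MADV_HUGEPAGE);	/* hope for a large folio */
	buf[0] = 1;				/* fault the region in */

	/*
	 * Unmap everything except one 4 KiB page in the middle: the one
	 * remaining PTE maps a tail page of the now partially mapped
	 * large folio, and its neighbours map nothing at all.
	 */
	munmap(buf, 1UL << 20);
	munmap(buf + (1UL << 20) + 4096, thp - (1UL << 20) - 4096);

	return EXIT_SUCCESS;
}

The other PTEs that could span the folio simply are not there anymore,
so nothing about them can be inferred from the one page follow_page()
returns.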
Alex Shi June 4, 2024, 1:10 p.m. UTC | #4
On 6/4/24 6:43 PM, David Hildenbrand wrote:
>>>>
>>>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>>                }
>>>>    next_page:
>>>>                put_page(*page);
>>>> -            ksm_scan.address += PAGE_SIZE;
>>>> +            ksm_scan.address += nr * PAGE_SIZE;
>>>>                cond_resched();
>>>>            }
>>>>        }
>>>
>>> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
>>>
>>> So I suggest dropping this change for now.
>>>
>>
>> Hi David,
>>
>> Forgive my stupidity, where I jump over normal page that not to belong to the folio?
> 
> IIUC, you assume that the folio is fully mapped by all PTEs that could span it, and that follow_page() would give you the head page, correct?
> 
> As a simple example, assume only a single page of a large folio is still mapped, which could be any tail page. You couldn't jump over any PTEs.
> 
> Or am I missing something?

Uh, thanks for the explanation. As far as that concern goes, the following code should take care of the NULL or ERR pages, and it still keeps the single-page step for them.
                        page = follow_page(vma, ksm_scan.address, FOLL_GET);
                        if (IS_ERR_OR_NULL(page)) { 
                                ksm_scan.address += PAGE_SIZE;
                                cond_resched();
                                continue;
                        }
And after the above code, stepping the address by folio_nr_pages() should be safe, shouldn't it?

Thanks a lot
Alex
David Hildenbrand June 4, 2024, 1:14 p.m. UTC | #5
On 04.06.24 15:10, Alex Shi wrote:
> 
> 
> On 6/4/24 6:43 PM, David Hildenbrand wrote:
>>>>>
>>>>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>>>                 }
>>>>>     next_page:
>>>>>                 put_page(*page);
>>>>> -            ksm_scan.address += PAGE_SIZE;
>>>>> +            ksm_scan.address += nr * PAGE_SIZE;
>>>>>                 cond_resched();
>>>>>             }
>>>>>         }
>>>>
>>>> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
>>>>
>>>> So I suggest dropping this change for now.
>>>>
>>>
>>> Hi David,
>>>
>>> Forgive my stupidity, where I jump over normal page that not to belong to the folio?
>>
>> IIUC, you assume that the folio is fully mapped by all PTEs that could span it, and that follow_page() would give you the head page, correct?
>>
>> As a simple example, assume only a single page of a large folio is still mapped, which could be any tail page. You couldn't jump over any PTEs.
>>
>> Or am I missing something?
> 
> Uh, thanks for explanations. for what's we concerned, the following code could take care of the FULL or ERR pages. And it still keep the step of single page.
>                          page = follow_page(vma, ksm_scan.address, FOLL_GET);
>                          if (IS_ERR_OR_NULL(page)) {
>                                  ksm_scan.address += PAGE_SIZE;
>                                  cond_resched();
>                                  continue;
>                          }
> And after the above code, step folio_nr_pages on address should be safe, isn't it?

Not sure if I follow. Let me try explaining once again:

Assume a PTE maps some tail page of the large anonymous folio. The other 
PTEs around it map some other anonymous folios, not pages of that large 
anonymous folio.

Without looking at the other PTEs you don't know how much you can skip.
Matthew Wilcox June 5, 2024, 3:52 a.m. UTC | #6
On Tue, Jun 04, 2024 at 12:24:44PM +0800, alexs@kernel.org wrote:
> From: "Alex Shi (tencent)" <alexs@kernel.org>
> 
> When a folio isn't fit for KSM, the subpages are unlikely to be good,
> So let's skip the rest page checking to save some actions.

Why would you say that is true?  We have plenty of evidence that
userspace allocators can allocate large folios, then use only the first
few bytes, leaving many tail pages full of zeroes.
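A minimal userspace sketch of that allocator pattern (purely
illustrative; whether the arena really ends up backed by a large folio
depends on alignment and THP settings):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;
	char *arena = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (arena == MAP_FAILED)
		return 1;

	madvise(arena, len, MADV_HUGEPAGE);	/* allocator asks for THP */
	madvise(arena, len, MADV_MERGEABLE);	/* and opts into KSM */

	memset(arena, 0xaa, 64);		/* only the first bytes are used */

	/*
	 * Every page past the first stays zero-filled; those are exactly
	 * the tail pages KSM would still want to look at.
	 */
	printf("byte at 1 MiB: %d\n", arena[1UL << 20]);

	munmap(arena, len);
	return 0;
}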

> @@ -2660,6 +2662,9 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>  				cond_resched();
>  				continue;
>  			}
> +
> +			VM_WARN_ON(PageTail(*page));
> +			nr = compound_nr(*page);

And this is simply wrong.  *page can obviously be a tail page.
Alex Shi June 5, 2024, 3:58 a.m. UTC | #7
On 6/4/24 9:14 PM, David Hildenbrand wrote:
> On 04.06.24 15:10, Alex Shi wrote:
>>
>>
>> On 6/4/24 6:43 PM, David Hildenbrand wrote:
>>>>>>
>>>>>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>>>>                 }
>>>>>>     next_page:
>>>>>>                 put_page(*page);
>>>>>> -            ksm_scan.address += PAGE_SIZE;
>>>>>> +            ksm_scan.address += nr * PAGE_SIZE;
>>>>>>                 cond_resched();
>>>>>>             }
>>>>>>         }
>>>>>
>>>>> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
>>>>>
>>>>> So I suggest dropping this change for now.
>>>>>
>>>>
>>>> Hi David,
>>>>
>>>> Forgive my stupidity, where I jump over normal page that not to belong to the folio?
>>>
>>> IIUC, you assume that the folio is fully mapped by all PTEs that could span it, and that follow_page() would give you the head page, correct?
>>>
>>> As a simple example, assume only a single page of a large folio is still mapped, which could be any tail page. You couldn't jump over any PTEs.
>>>
>>> Or am I missing something?
>>
>> Uh, thanks for explanations. for what's we concerned, the following code could take care of the FULL or ERR pages. And it still keep the step of single page.
>>                          page = follow_page(vma, ksm_scan.address, FOLL_GET);
>>                          if (IS_ERR_OR_NULL(page)) {
>>                                  ksm_scan.address += PAGE_SIZE;
>>                                  cond_resched();
>>                                  continue;
>>                          }
>> And after the above code, step folio_nr_pages on address should be safe, isn't it?
> 
> Not sure if I follow. Let me try explaining once again:
> 
> Assume a PTE maps some tail page of the large anonymous folio. The other PTEs around it map some other anonymous folios, not pages of that large anonymous folio.


Sorry, David,

Do you mean that pages of two different folios, in the same VMA, could have addresses that overlap within a folio-sized range at 'ksm_scan.address'?
If so, that is beyond what I expected, and I wasn't aware of it. Could you give me more hints about this problem, or about how the current kernel handles it?

Thanks a lot!
Alex
 
> 
> Without looking at the other PTEs you don't know how much you can skip.
 
>
Alex Shi June 5, 2024, 6:14 a.m. UTC | #8
On 6/5/24 11:52 AM, Matthew Wilcox wrote:
> On Tue, Jun 04, 2024 at 12:24:44PM +0800, alexs@kernel.org wrote:
>> From: "Alex Shi (tencent)" <alexs@kernel.org>
>>
>> When a folio isn't fit for KSM, the subpages are unlikely to be good,
>> So let's skip the rest page checking to save some actions.
> 
> Why would you say that is true?  We have plenty of evidence that
> userspace allocators can allocate large folios, then use only the first
> few bytes, leaving many tail pages full of zeroes.

Um, so we do need the tail pages...
Is there some way to make more use of folios in KSM?

> 
>> @@ -2660,6 +2662,9 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>  				cond_resched();
>>  				continue;
>>  			}
>> +
>> +			VM_WARN_ON(PageTail(*page));
>> +			nr = compound_nr(*page);
> 
> And this is simply wrong.  *page can obviously be a tail page.
> 

Got it. Thanks a lot!
Alex
David Hildenbrand June 5, 2024, 7:40 a.m. UTC | #9
On 05.06.24 05:58, Alex Shi wrote:
> 
> 
> On 6/4/24 9:14 PM, David Hildenbrand wrote:
>> On 04.06.24 15:10, Alex Shi wrote:
>>>
>>>
>>> On 6/4/24 6:43 PM, David Hildenbrand wrote:
>>>>>>>
>>>>>>> @@ -2680,7 +2685,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>>>>>>                  }
>>>>>>>      next_page:
>>>>>>>                  put_page(*page);
>>>>>>> -            ksm_scan.address += PAGE_SIZE;
>>>>>>> +            ksm_scan.address += nr * PAGE_SIZE;
>>>>>>>                  cond_resched();
>>>>>>>              }
>>>>>>>          }
>>>>>>
>>>>>> You might be jumping over pages that don't belong to that folio. What you would actually want to do is somehow use folio_pte_batch() to really know the PTEs point at the same folio, so you can skip them. But that's not that easy when using follow_page() ...
>>>>>>
>>>>>> So I suggest dropping this change for now.
>>>>>>
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Forgive my stupidity, where I jump over normal page that not to belong to the folio?
>>>>
>>>> IIUC, you assume that the folio is fully mapped by all PTEs that could span it, and that follow_page() would give you the head page, correct?
>>>>
>>>> As a simple example, assume only a single page of a large folio is still mapped, which could be any tail page. You couldn't jump over any PTEs.
>>>>
>>>> Or am I missing something?
>>>
>>> Uh, thanks for explanations. for what's we concerned, the following code could take care of the FULL or ERR pages. And it still keep the step of single page.
>>>                           page = follow_page(vma, ksm_scan.address, FOLL_GET);
>>>                           if (IS_ERR_OR_NULL(page)) {
>>>                                   ksm_scan.address += PAGE_SIZE;
>>>                                   cond_resched();
>>>                                   continue;
>>>                           }
>>> And after the above code, step folio_nr_pages on address should be safe, isn't it?
>>
>> Not sure if I follow. Let me try explaining once again:
>>
>> Assume a PTE maps some tail page of the large anonymous folio. The other PTEs around it map some other anonymous folios, not pages of that large anonymous folio.
> 
> 
> Sorry, David,
> 
> Do you meaning there are 2 folio pages, in a same vma, in their address, 'ksm_scan.address', would be overlapped in a folio size space?
> If so, that do out of my expect. I do have no idea of this thing. Could you give me more hints of this problem or how things work on it in current kernel?

We do fully support partial mapping of THPs/large folios. That means 
you could map a single page of a large pagecache folio and the other 
PTEs could map anonymous folios (due to COW).

Simply because follow_page() returned a page of a large folio doesn't 
generally mean that the PTEs around it map the same large folio.
David Hildenbrand June 5, 2024, 7:47 a.m. UTC | #10
On 05.06.24 08:14, Alex Shi wrote:
> 
> 
> On 6/5/24 11:52 AM, Matthew Wilcox wrote:
>> On Tue, Jun 04, 2024 at 12:24:44PM +0800, alexs@kernel.org wrote:
>>> From: "Alex Shi (tencent)" <alexs@kernel.org>
>>>
>>> When a folio isn't fit for KSM, the subpages are unlikely to be good,
>>> So let's skip the rest page checking to save some actions.
>>
>> Why would you say that is true?  We have plenty of evidence that
>> userspace allocators can allocate large folios, then use only the first
>> few bytes, leaving many tail pages full of zeroes.
> 
> Um, that do need tail pages...
> Is there some way to use more folio in ksm?

My take, and Willy can correct me if I am wrong:

"struct page" is not going to away any time soon, but it might shrink at 
some point.

That is, we can use the "struct page" pointer to point at a page frame, 
and use "struct folio" to lookup/manage the metadata.

That is, use "struct page" when accessing the actual memory content 
(checksum, testing for identical content), but use the folio part when 
looking up metadata (folio_test_anon() etc). In the future we might want 
to replace the "struct page" pointer by an index into the folio, but 
that doesn't have to happen right now.

For KSM, that would mean that you have a folio+page (later folio+index) 
pair when possibly dealing with large folios, but you can use a folio 
without a page when dealing with KSM folios, which are always small.
Matthew Wilcox June 5, 2024, 9:12 p.m. UTC | #11
On Wed, Jun 05, 2024 at 09:47:10AM +0200, David Hildenbrand wrote:
> On 05.06.24 08:14, Alex Shi wrote:
> > 
> > 
> > On 6/5/24 11:52 AM, Matthew Wilcox wrote:
> > > On Tue, Jun 04, 2024 at 12:24:44PM +0800, alexs@kernel.org wrote:
> > > > From: "Alex Shi (tencent)" <alexs@kernel.org>
> > > > 
> > > > When a folio isn't fit for KSM, the subpages are unlikely to be good,
> > > > So let's skip the rest page checking to save some actions.
> > > 
> > > Why would you say that is true?  We have plenty of evidence that
> > > userspace allocators can allocate large folios, then use only the first
> > > few bytes, leaving many tail pages full of zeroes.
> > 
> > Um, that do need tail pages...
> > Is there some way to use more folio in ksm?
> 
> My take, and Willy can correct me if I am wrong:
> 
> "struct page" is not going to away any time soon, but it might shrink at
> some point.
> 
> That is, we can use the "struct page" pointer to point at a page frame, and
> use "struct folio" to lookup/manage the metadata.

Right.

> That is, use "struct page" when accessing the actual memory content
> (checksum, testing for identical content), but use the folio part when
> looking up metadata (folio_test_anon() etc). In the future we might want to
> replace the "struct page" pointer by an index into the folio, but that
> doesn't have to happen right now.

My current thinking is that folio->pfn is how we know where the memory
described by the folio is.  Using an index would be memmap[folio->pfn +
index] which isn't terribly expensive, but we may as well pass around the
(folio, page) pair and save the reference to memmap.

> For KSM, that would mean that you have a folio+page (late folio+index) pair
> when possibly dealing with large folios, but you can use a folio without a
> page when dealing with KSM folios, that are always small.

Yes, agreed.
David Hildenbrand June 6, 2024, 7:11 a.m. UTC | #12
On 05.06.24 23:12, Matthew Wilcox wrote:
> On Wed, Jun 05, 2024 at 09:47:10AM +0200, David Hildenbrand wrote:
>> On 05.06.24 08:14, Alex Shi wrote:
>>>
>>>
>>> On 6/5/24 11:52 AM, Matthew Wilcox wrote:
>>>> On Tue, Jun 04, 2024 at 12:24:44PM +0800, alexs@kernel.org wrote:
>>>>> From: "Alex Shi (tencent)" <alexs@kernel.org>
>>>>>
>>>>> When a folio isn't fit for KSM, the subpages are unlikely to be good,
>>>>> So let's skip the rest page checking to save some actions.
>>>>
>>>> Why would you say that is true?  We have plenty of evidence that
>>>> userspace allocators can allocate large folios, then use only the first
>>>> few bytes, leaving many tail pages full of zeroes.
>>>
>>> Um, that do need tail pages...
>>> Is there some way to use more folio in ksm?
>>
>> My take, and Willy can correct me if I am wrong:
>>
>> "struct page" is not going to away any time soon, but it might shrink at
>> some point.
>>
>> That is, we can use the "struct page" pointer to point at a page frame, and
>> use "struct folio" to lookup/manage the metadata.
> 
> Right.
> 
>> That is, use "struct page" when accessing the actual memory content
>> (checksum, testing for identical content), but use the folio part when
>> looking up metadata (folio_test_anon() etc). In the future we might want to
>> replace the "struct page" pointer by an index into the folio, but that
>> doesn't have to happen right now.
> 
> My current thinking is that folio->pfn is how we know where the memory
> described by the folio is.  Using an index would be memmap[folio->pfn +
> index] which isn't terribly expensive, but we may as well pass around the
> (folio, page) pair and save the reference to memmap.

Right, as soon as the folio does not overlay the head page it's going to 
be a bit different.

A (folio,page) pair, like we use in the RMAP code, is likely the best 
option for now and gives us sufficient flexibility for the future design.

Patch

diff --git a/mm/ksm.c b/mm/ksm.c
index 97e5b41f8c4b..e2fdb9dd98e2 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2644,6 +2644,8 @@  static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 		goto no_vmas;
 
 	for_each_vma(vmi, vma) {
+		int nr = 1;
+
 		if (!(vma->vm_flags & VM_MERGEABLE))
 			continue;
 		if (ksm_scan.address < vma->vm_start)
@@ -2660,6 +2662,9 @@  static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 				cond_resched();
 				continue;
 			}
+
+			VM_WARN_ON(PageTail(*page));
+			nr = compound_nr(*page);
 			if (is_zone_device_page(*page))
 				goto next_page;
 			if (PageAnon(*page)) {
@@ -2672,7 +2677,7 @@  static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 					if (should_skip_rmap_item(*page, rmap_item))
 						goto next_page;
 
-					ksm_scan.address += PAGE_SIZE;
+					ksm_scan.address += nr * PAGE_SIZE;
 				} else
 					put_page(*page);
 				mmap_read_unlock(mm);
@@ -2680,7 +2685,7 @@  static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 			}
 next_page:
 			put_page(*page);
-			ksm_scan.address += PAGE_SIZE;
+			ksm_scan.address += nr * PAGE_SIZE;
 			cond_resched();
 		}
 	}