diff mbox series

[01/10] mm/ksm: reduce the flush action for ksm merging page

Message ID 20240604042454.2012091-2-alexs@kernel.org (mailing list archive)
State New
Series use folio in ksm

Commit Message

alexs@kernel.org June 4, 2024, 4:24 a.m. UTC
From: "Alex Shi (tencent)" <alexs@kernel.org>

We can put off the flush action until a merge is really about to happen. That
could avoid some flushing of pages that are never merged.
BTW, the flushing is only done on arm, mips and a few other archs.

Signed-off-by: Alex Shi (tencent) <alexs@kernel.org>
---
 mm/ksm.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

David Hildenbrand June 4, 2024, 8:07 a.m. UTC | #1
On 04.06.24 06:24, alexs@kernel.org wrote:
> From: "Alex Shi (tencent)" <alexs@kernel.org>
> 
> We can put off the flush action until a merge is really about to happen. That
> could avoid some flushing of pages that are never merged.
> BTW, the flushing is only done on arm, mips and a few other archs.
> 

I'm no expert on that flushing, but I thought we would have to do the 
flushing before accessing page content -- before calculating the 
checksum etc.

Now you would only do it before the pages_identical() check, but not 
when calculating the checksum.
Alex Shi June 4, 2024, 10:26 a.m. UTC | #2
On 6/4/24 4:07 PM, David Hildenbrand wrote:
> I'm no expert on that flushing, but I thought we would have to do the flushing before accessing page content -- before calculating the checksum etc.
> 
> Now you would only do it before the pages_identical() check, but not when calculating the checksum.
> 

Hi David,

Thanks a lot for comments!

If calc_checksum() gives a wrong result before pages_identical() runs, pages_identical() can recheck and make things right: it runs just after the page has been write-protected, which is the real guarantee that the page content is stable.

And as to 2 flush functions here, I didn't see the guarantee for other writer from any other place. So maybe we should remove these flush action?

Thanks
Alex
David Hildenbrand June 4, 2024, 10:45 a.m. UTC | #3
On 04.06.24 12:26, Alex Shi wrote:
> If calc_checksum() gives a wrong result before pages_identical() runs, pages_identical() can recheck and make things right: it runs just after the page has been write-protected, which is the real guarantee that the page content is stable.
> 

Yes, but you would get more wrong checksums, resulting in more 
unnecessary pages_identical() checks.

That is missing from the description, and why we want to change that 
behavior.

What's the net win?

> And as to 2 flush functions here, I didn't see the guarantee for other writer from any other place. So maybe we should remove these flush action?

"I didn't see the guarantee for other writer from any other place" can 
you rephrase your comment?

If you mean "the process could modify that page concurrently", then you 
are right. But that's different than "the process modified the page in 
the past and we are reading stale content because we missed a flush".
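
To make the distinction above concrete, here is a toy user-space model in C.
It is only a sketch under stated assumptions: every toy_ name is hypothetical,
the split into a "cache" buffer and a "memory" buffer is a deliberate
simplification, and none of this is the code in mm/ksm.c. It shows that a
checksum taken without a flush can be computed over stale content, so a cheap
checksum filter can report a match that a later full comparison has to reject.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/*
 * Toy page (hypothetical): "cache" is what the writing process sees through
 * its own mapping, "memory" is what a reader using another alias sees until
 * the dirty data has been written back.
 */
struct toy_page {
	uint8_t cache[TOY_PAGE_SIZE];
	uint8_t memory[TOY_PAGE_SIZE];
};

/* The process writes through its mapping; only the cache view changes. */
void toy_process_write(struct toy_page *p, size_t off, uint8_t val)
{
	p->cache[off] = val;
}

/* Stand-in for the flush: make the written data visible to the reader. */
void toy_flush(struct toy_page *p)
{
	memcpy(p->memory, p->cache, TOY_PAGE_SIZE);
}

/* Cheap filter: a simple rolling checksum over the reader's view. */
uint32_t toy_checksum(const struct toy_page *p)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		sum = sum * 31 + p->memory[i];
	return sum;
}

/* Full comparison over the reader's view. */
int toy_pages_identical(const struct toy_page *a, const struct toy_page *b)
{
	return memcmp(a->memory, b->memory, TOY_PAGE_SIZE) == 0;
}

/* Page modified in the past, no flush: the checksum is computed on stale data. */
int toy_stale_checksum_matches(void)
{
	struct toy_page a = {0}, b = {0};

	toy_process_write(&a, 0, 0xAA);
	return toy_checksum(&a) == toy_checksum(&b);
}

/* Same modification, but flushed before the checksum is calculated. */
int toy_fresh_checksum_matches(void)
{
	struct toy_page a = {0}, b = {0};

	toy_process_write(&a, 0, 0xAA);
	toy_flush(&a);
	return toy_checksum(&a) == toy_checksum(&b);
}

/* Flushing only before the full comparison still yields the right answer. */
int toy_identical_after_flush(void)
{
	struct toy_page a = {0}, b = {0};

	toy_process_write(&a, 0, 0xAA);
	toy_flush(&a);
	return toy_pages_identical(&a, &b);
}
```

In this model toy_stale_checksum_matches() returns 1, a false match that costs
a full compare, while toy_fresh_checksum_matches() and
toy_identical_after_flush() both return 0. A concurrent writer is a separate
problem that the model does not cover.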
Alex Shi June 4, 2024, 1:02 p.m. UTC | #4
On 6/4/24 6:45 PM, David Hildenbrand wrote:
> Yes, but you would get more wrong checksums, resulting in more unnecessary pages_identical() checks.
> 
> That is missing from the description, and why we want to change that behavior.
> 
> What's the net win?
> 
>> And as to 2 flush functions here, I didn't see the guarantee for other writer from any other place. So maybe we should remove these flush action?
> 
> "I didn't see the guarantee for other writer from any other place" can you rephrase your comment?
> 
> If you mean "the process could modify that page concurrently", then you are right. But that's different than "the process modified the page in the past and we are reading stale content because we missed a flush".


Maybe moving the flush to before the checksum could relieve some worries. :)
But still, no one knows what the flush really helps with: if page content were only synced to memory by the flush, the kernel or the process couldn't work with the current code.

thanks
Alex
David Hildenbrand June 5, 2024, 7:26 a.m. UTC | #5
On 04.06.24 15:02, Alex Shi wrote:
> Maybe moving the flush to before the checksum could relieve some worries. :)
> But still, no one knows what the flush really helps with: if page content were only synced to memory by the flush, the kernel or the process couldn't work with the current code.

Please explain to me why we care about moving the flushes at all :)

If they are NOP on most architectures either way, why not simply leave 
them there and call it a day?
Alex Shi June 5, 2024, 9:10 a.m. UTC | #6
On 6/5/24 3:26 PM, David Hildenbrand wrote:
> Please explain to me why we care about moving the flushes at all :)
> 
> If they are NOP on most architectures either way, why not simply leave them there and call it a day?
Uh, 2 reasons:
1. It uses a page and can't be converted to folio now.
2. As you pointed out, a flush action without reading the page seems to just waste time.

Thanks
Alex
David Hildenbrand June 5, 2024, 9:14 a.m. UTC | #7
On 05.06.24 11:10, Alex Shi wrote:
> Uh, 2 reasons:
> 1. It uses a page and can't be converted to folio now.
> 2. As you pointed out, a flush action without reading the page seems to just waste time.

Alex, I don't think the approach you take for coming up with the current 
set of patches is a good idea.

Please reconsider what you can actually convert to folios and what must 
stay pages for now due to support for large folios in that code.

Then, please explain properly why changes are required and why they are 
safe.

For example, in scan_get_next_rmap_item() we really *need* the page 
and not just the folio. So just leave the flushing there and be done 
with it.
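
One plausible way to picture the folio+page point: a large folio spans several
pages, while a per-address operation names exactly one of them. The toy
user-space sketch below assumes a folio mapped contiguously from a base
address; struct toy_folio and toy_folio_page_index() are hypothetical names,
not the kernel's folio API.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4 KiB pages for illustration */

/* Toy folio (hypothetical): a run of nr_pages pages mapped from base_addr. */
struct toy_folio {
	unsigned long base_addr;	/* address the first page is mapped at */
	unsigned int nr_pages;		/* 1 for a small folio, >1 for a large one */
};

/*
 * Return the index of the page inside the folio that backs addr,
 * or -1 if addr lies outside the folio.
 */
long toy_folio_page_index(const struct toy_folio *folio, unsigned long addr)
{
	unsigned long idx;

	if (addr < folio->base_addr)
		return -1;
	idx = (addr - folio->base_addr) >> TOY_PAGE_SHIFT;
	return idx < folio->nr_pages ? (long)idx : -1;
}
```

For a toy folio of 4 pages based at 0x10000, address 0x12fff resolves to page
index 2. The folio alone cannot tell the caller that; the address (or the
page) is still needed.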
Alex Shi June 5, 2024, 9:49 a.m. UTC | #8
On 6/5/24 5:14 PM, David Hildenbrand wrote:
> Alex, I don't think the approach you take for coming up with the current set of patches is a good idea.
> 
> Please reconsider what you can actually convert to folios and what must stay pages for now due to support for large folios in that code.
> 
> Then, please explain properly why changes are required and why they are safe.
> 
> For example, in scan_get_next_rmap_item() we really *need* the page and not just the folio. So just leave the flushing there and be done with it.
> 

Hi David,

Thanks a lot for your review.
Though all patches pass the kernel selftests, if we care more about the saving than about quick processing, the main purpose of this patchset is gone. I'll drop this series.

Thanks
Alex
David Hildenbrand June 5, 2024, 10 a.m. UTC | #9
On 05.06.24 11:49, Alex Shi wrote:
> Hi David,
> 
> Thanks a lot for your review.
> Though all patches pass the kernel selftests, if we care more about the saving than about quick processing, the main purpose of this patchset is gone. I'll drop this series.

I think there is value in more folio conversion, but we really have to 
be careful when we still want to/have to work on pages (folio+page pair).

Patch

diff --git a/mm/ksm.c b/mm/ksm.c
index f5138f43f0d2..97e5b41f8c4b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -783,10 +783,7 @@  static struct page *get_mergeable_page(struct ksm_rmap_item *rmap_item)
 		goto out;
 	if (is_zone_device_page(page))
 		goto out_putpage;
-	if (PageAnon(page)) {
-		flush_anon_page(vma, page, addr);
-		flush_dcache_page(page);
-	} else {
+	if (!PageAnon(page)) {
 out_putpage:
 		put_page(page);
 out:
@@ -1473,8 +1470,8 @@  static int replace_page(struct vm_area_struct *vma, struct page *page,
  *
  * This function returns 0 if the pages were merged, -EFAULT otherwise.
  */
-static int try_to_merge_one_page(struct vm_area_struct *vma,
-				 struct page *page, struct page *kpage)
+static int try_to_merge_one_page(struct vm_area_struct *vma, struct page *page,
+				 struct ksm_rmap_item *rmap_item, struct page *kpage)
 {
 	pte_t orig_pte = __pte(0);
 	int err = -EFAULT;
@@ -1500,6 +1497,9 @@  static int try_to_merge_one_page(struct vm_area_struct *vma,
 			goto out_unlock;
 	}
 
+	flush_anon_page(vma, page, rmap_item->address);
+	flush_dcache_page(page);
+
 	/*
 	 * If this anonymous page is mapped only here, its pte may need
 	 * to be write-protected.  If it's mapped elsewhere, all of its
@@ -1550,7 +1550,7 @@  static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
 	if (!vma)
 		goto out;
 
-	err = try_to_merge_one_page(vma, page, kpage);
+	err = try_to_merge_one_page(vma, page, rmap_item, kpage);
 	if (err)
 		goto out;
 
@@ -2385,7 +2385,7 @@  static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
 		mmap_read_lock(mm);
 		vma = find_mergeable_vma(mm, rmap_item->address);
 		if (vma) {
-			err = try_to_merge_one_page(vma, page,
+			err = try_to_merge_one_page(vma, page, rmap_item,
 					ZERO_PAGE(rmap_item->address));
 			trace_ksm_merge_one_page(
 				page_to_pfn(ZERO_PAGE(rmap_item->address)),
@@ -2663,8 +2663,6 @@  static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 			if (is_zone_device_page(*page))
 				goto next_page;
 			if (PageAnon(*page)) {
-				flush_anon_page(vma, *page, ksm_scan.address);
-				flush_dcache_page(*page);
 				rmap_item = get_next_rmap_item(mm_slot,
 					ksm_scan.rmap_list, ksm_scan.address);
 				if (rmap_item) {