
[v1,2/6] fs/proc/task_mmu: don't indicate PM_MMAP_EXCLUSIVE without PM_PRESENT

Message ID 20240607122357.115423-3-david@redhat.com (mailing list archive)
State New
Series fs/proc: move page_mapcount() to fs/proc/internal.h

Commit Message

David Hildenbrand June 7, 2024, 12:23 p.m. UTC
Relying on the mapcount for non-present PTEs that reference pages
doesn't make any sense: they are not accounted in the mapcount, so
page_mapcount() == 1 won't return the result we actually want to know.

While we don't check the mapcount for migration entries already, we
could end up checking it for swap, hwpoison, device exclusive, ...
entries, which we really shouldn't.

There is one exception: device private entries, which we consider
fake-present (e.g., incremented the mapcount). But we won't care about
that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them
although they are fake-present already sounds suspiciously wrong.

Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/task_mmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Oscar Salvador June 10, 2024, 4:38 a.m. UTC | #1
On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote:
> Relying on the mapcount for non-present PTEs that reference pages
> doesn't make any sense: they are not accounted in the mapcount, so
> page_mapcount() == 1 won't return the result we actually want to know.
> 
> While we don't check the mapcount for migration entries already, we
> could end up checking it for swap, hwpoison, device exclusive, ...
> entries, which we really shouldn't.
> 
> There is one exception: device private entries, which we consider
> fake-present (e.g., incremented the mapcount). But we won't care about
> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them
> although they are fake-present already sounds suspiciously wrong.
> 
> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT.

Alternatively we could use is_pfn_swap_entry?
But the PM_PRESENT approach seems more correct.

> Signed-off-by: David Hildenbrand <david@redhat.com>

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Oscar Salvador June 10, 2024, 4:49 a.m. UTC | #2
On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote:
> Relying on the mapcount for non-present PTEs that reference pages
> doesn't make any sense: they are not accounted in the mapcount, so
> page_mapcount() == 1 won't return the result we actually want to know.
> 
> While we don't check the mapcount for migration entries already, we
> could end up checking it for swap, hwpoison, device exclusive, ...
> entries, which we really shouldn't.
> 
> There is one exception: device private entries, which we consider
> fake-present (e.g., incremented the mapcount). But we won't care about
> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them
> although they are fake-present already sounds suspiciously wrong.
> 
> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Forgot to comment on something:

> @@ -1517,14 +1514,13 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  			if (pmd_swp_uffd_wp(pmd))
>  				flags |= PM_UFFD_WP;
>  			VM_BUG_ON(!is_pmd_migration_entry(pmd));
> -			migration = is_migration_entry(entry);
>  			page = pfn_swap_entry_to_page(entry);

We do not really need to get the page anymore here, since this is the
non-present branch.

Then we could get away without checking the flags as only page != NULL
would mean a present pmd.

Not that we gain much, as this is far from a hot path, but just
saying..
Oscar Salvador June 11, 2024, 7:13 a.m. UTC | #3
On Mon, Jun 10, 2024 at 06:38:33AM +0200, Oscar Salvador wrote:
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Uh, I spaced out here, sorry.

Reviewed-by: Oscar Salvador <osalvador@suse.de>
David Hildenbrand June 11, 2024, 10:45 a.m. UTC | #4
On 10.06.24 06:38, Oscar Salvador wrote:
> On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote:
>> Relying on the mapcount for non-present PTEs that reference pages
>> doesn't make any sense: they are not accounted in the mapcount, so
>> page_mapcount() == 1 won't return the result we actually want to know.
>>
>> While we don't check the mapcount for migration entries already, we
>> could end up checking it for swap, hwpoison, device exclusive, ...
>> entries, which we really shouldn't.
>>
>> There is one exception: device private entries, which we consider
>> fake-present (e.g., incremented the mapcount). But we won't care about
>> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them
>> although they are fake-present already sounds suspiciously wrong.
>>
>> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT.
> 
> Alternatively we could use is_pfn_swap_entry?

It's all weird, because only device private fake swp entries are 
fake-present. For these, we might want to use PM_PRESENT, but I don't 
care enough about device private entries to handle that here in a better 
way :)

Indicating PM_SWAP for something that is not swap (migration/poison/...) 
is also a bit weird. But likely nobody cared about that for now: it's 
either present (PM_PRESENT), something else (PM_SWAP), or nothing is 
there (no bit set).

Thanks!
David Hildenbrand June 11, 2024, 10:50 a.m. UTC | #5
On 10.06.24 06:49, Oscar Salvador wrote:
> On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote:
>> Relying on the mapcount for non-present PTEs that reference pages
>> doesn't make any sense: they are not accounted in the mapcount, so
>> page_mapcount() == 1 won't return the result we actually want to know.
>>
>> While we don't check the mapcount for migration entries already, we
>> could end up checking it for swap, hwpoison, device exclusive, ...
>> entries, which we really shouldn't.
>>
>> There is one exception: device private entries, which we consider
>> fake-present (e.g., incremented the mapcount). But we won't care about
>> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them
>> although they are fake-present already sounds suspiciously wrong.
>>
>> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Forgot to comment on something:
> 
>> @@ -1517,14 +1514,13 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>>   			if (pmd_swp_uffd_wp(pmd))
>>   				flags |= PM_UFFD_WP;
>>   			VM_BUG_ON(!is_pmd_migration_entry(pmd));
>> -			migration = is_migration_entry(entry);
>>   			page = pfn_swap_entry_to_page(entry);
> 
> We do not really need to get the page anymore here, since this is the
> non-present branch.
> 
> Then we could get away without checking the flags as only page != NULL
> would mean a present pmd.
> 
> Not that we gain much, as this is far from a hot path, but just
> saying..

I *think* we still want that for indicating PM_FILE after patch #1.
David Hildenbrand June 11, 2024, 10:51 a.m. UTC | #6
On 11.06.24 09:13, Oscar Salvador wrote:
> On Mon, Jun 10, 2024 at 06:38:33AM +0200, Oscar Salvador wrote:
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> 
> Uh, I spaced out here, sorry.
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

:)

Thanks!
Oscar Salvador June 11, 2024, 11:15 a.m. UTC | #7
On Tue, Jun 11, 2024 at 12:50:46PM +0200, David Hildenbrand wrote:
> I *think* we still want that for indicating PM_FILE after patch #1.

Yes, we do, disregard that comment please.

Patch

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 08465b904ced5..40e38bc33a9d2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1415,7 +1415,6 @@  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 {
 	u64 frame = 0, flags = 0;
 	struct page *page = NULL;
-	bool migration = false;
 
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
@@ -1447,7 +1446,6 @@  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			    (offset << MAX_SWAPFILES_SHIFT);
 		}
 		flags |= PM_SWAP;
-		migration = is_migration_entry(entry);
 		if (is_pfn_swap_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
 		if (pte_marker_entry_uffd_wp(entry))
@@ -1456,7 +1454,7 @@  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
-	if (page && !migration && page_mapcount(page) == 1)
+	if (page && (flags & PM_PRESENT) && page_mapcount(page) == 1)
 		flags |= PM_MMAP_EXCLUSIVE;
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
@@ -1473,7 +1471,6 @@  static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 	pte_t *pte, *orig_pte;
 	int err = 0;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	bool migration = false;
 
 	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
@@ -1517,14 +1514,13 @@  static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 			if (pmd_swp_uffd_wp(pmd))
 				flags |= PM_UFFD_WP;
 			VM_BUG_ON(!is_pmd_migration_entry(pmd));
-			migration = is_migration_entry(entry);
 			page = pfn_swap_entry_to_page(entry);
 		}
 #endif
 
 		if (page && !PageAnon(page))
 			flags |= PM_FILE;
-		if (page && !migration && page_mapcount(page) == 1)
+		if (page && (flags & PM_PRESENT) && page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
 
 		for (; addr != end; addr += PAGE_SIZE) {