Message ID | 20240607122357.115423-3-david@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fs/proc: move page_mapcount() to fs/proc/internal.h | expand |
On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote: > Relying on the mapcount for non-present PTEs that reference pages > doesn't make any sense: they are not accounted in the mapcount, so > page_mapcount() == 1 won't return the result we actually want to know. > > While we don't check the mapcount for migration entries already, we > could end up checking it for swap, hwpoison, device exclusive, ... > entries, which we really shouldn't. > > There is one exception: device private entries, which we consider > fake-present (e.g., incremented the mapcount). But we won't care about > that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them > although they are fake-present already sounds suspiciously wrong. > > Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. Alternatively we could use is_pfn_swap_entry? But the PM_PRESENT approach seems more correct. > Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Oscar Salvador <osalvador@suse.de>
On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote: > Relying on the mapcount for non-present PTEs that reference pages > doesn't make any sense: they are not accounted in the mapcount, so > page_mapcount() == 1 won't return the result we actually want to know. > > While we don't check the mapcount for migration entries already, we > could end up checking it for swap, hwpoison, device exclusive, ... > entries, which we really shouldn't. > > There is one exception: device private entries, which we consider > fake-present (e.g., incremented the mapcount). But we won't care about > that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them > although they are fake-present already sounds suspiciously wrong. > > Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. > > Signed-off-by: David Hildenbrand <david@redhat.com> Forgot to comment on something: > @@ -1517,14 +1514,13 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, > if (pmd_swp_uffd_wp(pmd)) > flags |= PM_UFFD_WP; > VM_BUG_ON(!is_pmd_migration_entry(pmd)); > - migration = is_migration_entry(entry); > page = pfn_swap_entry_to_page(entry); We do not really need to get the page anymore here as that is the non-present part. Then we could get away without checking the flags as only page != NULL would mean a present pmd. Not that we gain much as this is far from being a hot-path, but just saying..
On Mon, Jun 10, 2024 at 06:38:33AM +0200, Oscar Salvador wrote: > Signed-off-by: Oscar Salvador <osalvador@suse.de> Uh, I spaced out here, sorry. Reviewed-by: Oscar Salvador <osalvador@suse.de>
On 10.06.24 06:38, Oscar Salvador wrote: > On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote: >> Relying on the mapcount for non-present PTEs that reference pages >> doesn't make any sense: they are not accounted in the mapcount, so >> page_mapcount() == 1 won't return the result we actually want to know. >> >> While we don't check the mapcount for migration entries already, we >> could end up checking it for swap, hwpoison, device exclusive, ... >> entries, which we really shouldn't. >> >> There is one exception: device private entries, which we consider >> fake-present (e.g., incremented the mapcount). But we won't care about >> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them >> although they are fake-present already sounds suspiciously wrong. >> >> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. > > Alternatively we could use is_pfn_swap_entry? It's all weird, because only device private fake swp entries are fake-present. For these, we might want to use PM_PRESENT, but I don't care enough about device private entries to handle that here in a better way :) Indicating PM_SWAP for something that is not swap (migration/poison/...) is also a bit weird. But likely nobody cared about that for now: it's either present (PM_PRESENT), something else (PM_SWAP), or nothing is there (no bit set). Thanks!
On 10.06.24 06:49, Oscar Salvador wrote: > On Fri, Jun 07, 2024 at 02:23:53PM +0200, David Hildenbrand wrote: >> Relying on the mapcount for non-present PTEs that reference pages >> doesn't make any sense: they are not accounted in the mapcount, so >> page_mapcount() == 1 won't return the result we actually want to know. >> >> While we don't check the mapcount for migration entries already, we >> could end up checking it for swap, hwpoison, device exclusive, ... >> entries, which we really shouldn't. >> >> There is one exception: device private entries, which we consider >> fake-present (e.g., incremented the mapcount). But we won't care about >> that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them >> although they are fake-present already sounds suspiciously wrong. >> >> Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> > > Forgot to comment on something: > >> @@ -1517,14 +1514,13 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, >> if (pmd_swp_uffd_wp(pmd)) >> flags |= PM_UFFD_WP; >> VM_BUG_ON(!is_pmd_migration_entry(pmd)); >> - migration = is_migration_entry(entry); >> page = pfn_swap_entry_to_page(entry); > > We do not really need to get the page anymore here as that is the non-present > part. > > Then we could get away without checking the flags as only page != NULL > would mean a present pmd. > > Not that we gain much as this is far from being a hot-path, but just > saying.. I *think* we still want that for indicating PM_FILE after patch #1.
On 11.06.24 09:13, Oscar Salvador wrote: > On Mon, Jun 10, 2024 at 06:38:33AM +0200, Oscar Salvador wrote: >> Signed-off-by: Oscar Salvador <osalvador@suse.de> > > Uh, I spaced out here, sorry. > > Reviewed-by: Oscar Salvador <osalvador@suse.de> :) Thanks!
On Tue, Jun 11, 2024 at 12:50:46PM +0200, David Hildenbrand wrote:
> I *think* we still want that for indicating PM_FILE after patch #1.
Yes, we do, disregard that comment please.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 08465b904ced5..40e38bc33a9d2 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1415,7 +1415,6 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, { u64 frame = 0, flags = 0; struct page *page = NULL; - bool migration = false; if (pte_present(pte)) { if (pm->show_pfn) @@ -1447,7 +1446,6 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, (offset << MAX_SWAPFILES_SHIFT); } flags |= PM_SWAP; - migration = is_migration_entry(entry); if (is_pfn_swap_entry(entry)) page = pfn_swap_entry_to_page(entry); if (pte_marker_entry_uffd_wp(entry)) @@ -1456,7 +1454,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, if (page && !PageAnon(page)) flags |= PM_FILE; - if (page && !migration && page_mapcount(page) == 1) + if (page && (flags & PM_PRESENT) && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; if (vma->vm_flags & VM_SOFTDIRTY) flags |= PM_SOFT_DIRTY; @@ -1473,7 +1471,6 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, pte_t *pte, *orig_pte; int err = 0; #ifdef CONFIG_TRANSPARENT_HUGEPAGE - bool migration = false; ptl = pmd_trans_huge_lock(pmdp, vma); if (ptl) { @@ -1517,14 +1514,13 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pmd_swp_uffd_wp(pmd)) flags |= PM_UFFD_WP; VM_BUG_ON(!is_pmd_migration_entry(pmd)); - migration = is_migration_entry(entry); page = pfn_swap_entry_to_page(entry); } #endif if (page && !PageAnon(page)) flags |= PM_FILE; - if (page && !migration && page_mapcount(page) == 1) + if (page && (flags & PM_PRESENT) && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; for (; addr != end; addr += PAGE_SIZE) {
Relying on the mapcount for non-present PTEs that reference pages doesn't make any sense: they are not accounted in the mapcount, so page_mapcount() == 1 won't return the result we actually want to know. While we don't check the mapcount for migration entries already, we could end up checking it for swap, hwpoison, device exclusive, ... entries, which we really shouldn't. There is one exception: device private entries, which we consider fake-present (e.g., incremented the mapcount). But we won't care about that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them although they are fake-present already sounds suspiciously wrong. Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. Signed-off-by: David Hildenbrand <david@redhat.com> --- fs/proc/task_mmu.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)