
[v2] mm: Add PM_THP to /proc/pid/pagemap

Message ID 20211104214636.450782-1-almasrymina@google.com (mailing list archive)
State New

Commit Message

Mina Almasry Nov. 4, 2021, 9:46 p.m. UTC
Add PM_THP to allow userspace to detect whether a given virt address is
currently mapped by a hugepage or not.

An example use case is a process that requests hugepages from the kernel
(for example via a huge tmpfs mount) for a performance-critical region of
memory.  Userspace may want to query whether the kernel is actually
backing this memory with hugepages or not.
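
For illustration, a userspace query along these lines could decode a
pagemap entry roughly as sketched below. This is a minimal sketch and not
part of the patch: the helper names (pme_offset, pme_present, pme_thp,
pme_pfn, pme_read) are hypothetical, and bit 58 is only the PM_THP position
proposed here, not an established ABI. Bit 63 (present) and bits 0-54 (PFN)
follow the existing pagemap layout.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 63 and the PFN mask match the existing pagemap entry layout. */
#define PME_PRESENT	(1ULL << 63)
#define PME_PFN_MASK	((1ULL << 55) - 1)
/* Assumption: bit 58 is the PM_THP bit proposed by this patch. */
#define PME_THP		(1ULL << 58)

/* Byte offset of the 8-byte pagemap entry that describes vaddr. */
static inline uint64_t pme_offset(uintptr_t vaddr, long page_size)
{
	assert(page_size > 0);
	return (uint64_t)(vaddr / (uintptr_t)page_size) * sizeof(uint64_t);
}

static inline int pme_present(uint64_t ent)
{
	return (ent & PME_PRESENT) != 0;
}

static inline int pme_thp(uint64_t ent)
{
	return (ent & PME_THP) != 0;
}

static inline uint64_t pme_pfn(uint64_t ent)
{
	return ent & PME_PFN_MASK;
}

/*
 * Read the entry for vaddr from an already open /proc/<pid>/pagemap fd.
 * Returns 0 on success and -1 on a short read or error.
 */
static inline int pme_read(int fd, uintptr_t vaddr, uint64_t *ent)
{
	off_t off = (off_t)pme_offset(vaddr, sysconf(_SC_PAGESIZE));

	return pread(fd, ent, sizeof(*ent), off) == (ssize_t)sizeof(*ent) ? 0 : -1;
}
```

A caller would open /proc/self/pagemap, call pme_read() for the address of
interest, and then test pme_present() and pme_thp(). On recent kernels an
unprivileged reader sees the PFN field as zero, so only the flag bits are
meaningful in that case.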

Tested manually by adding logging into transhuge-stress.

Signed-off-by: Mina Almasry <almasrymina@google.com>

Cc: David Rientjes <rientjes@google.com>
Cc: Paul E. McKenney <paulmckrcu@fb.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org

---
 fs/proc/task_mmu.c                            |  5 +++++
 tools/testing/selftests/vm/transhuge-stress.c | 21 +++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

--
2.34.0.rc0.344.g81b53c2807-goog

Comments

Matthew Wilcox (Oracle) Nov. 4, 2021, 10:05 p.m. UTC | #1
On Thu, Nov 04, 2021 at 02:46:35PM -0700, Mina Almasry wrote:
> Add PM_THP to allow userspace to detect whether a given virt address is
> currently mapped by a hugepage or not.

Well, no, that's not what that means.

> @@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>  		flags |= PM_FILE;
>  	if (page && page_mapcount(page) == 1)
>  		flags |= PM_MMAP_EXCLUSIVE;
> +	if (page && PageTransCompound(page))
> +		flags |= PM_THP;

All that PageTransCompound() does is call PageCompound().  It doesn't
tell you if the underlying allocation is PMD sized, nor properly aligned.

And you didn't answer my question about whether you want information about
whether a large page is being used that's not quite as large as a PMD.
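
The distinction drawn above, between "is a compound page" and "is PMD sized
and aligned", can be illustrated with a toy userspace model. This is only a
sketch under stated assumptions: struct toy_page and both helpers are
hypothetical stand-ins and not the kernel's struct page or its API, and the
shift values assume 4 KiB base pages with 2 MiB PMD mappings, as in the
selftest's constants.

```c
#include <stdbool.h>

#define PAGE_SHIFT	12
#define HPAGE_SHIFT	21
/* Order of a PMD-sized allocation under the assumptions above (9). */
#define HPAGE_PMD_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)

/* Toy stand-in for a compound allocation; NOT the kernel's struct page. */
struct toy_page {
	unsigned long head_pfn;	/* PFN of the first base page */
	unsigned int order;	/* allocation spans 2^order base pages */
};

/* True for any multi-page allocation, whatever its size. */
static inline bool toy_is_compound(const struct toy_page *p)
{
	return p->order > 0;
}

/* True only if the allocation is exactly PMD sized and PMD aligned. */
static inline bool toy_is_pmd_sized_aligned(const struct toy_page *p)
{
	unsigned long mask = (1UL << HPAGE_PMD_ORDER) - 1;

	return p->order == HPAGE_PMD_ORDER && (p->head_pfn & mask) == 0;
}
```

In this model an order-2 allocation passes toy_is_compound() but fails
toy_is_pmd_sized_aligned(), which is the case a check based only on
compound-ness would misreport.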

Mina Almasry Nov. 4, 2021, 10:45 p.m. UTC | #2
On Thu, Nov 4, 2021 at 3:08 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Nov 04, 2021 at 02:46:35PM -0700, Mina Almasry wrote:
> > Add PM_THP to allow userspace to detect whether a given virt address is
> > currently mapped by a hugepage or not.
>
> Well, no, that's not what that means.
>

Sorry, that was the intention, but I didn't implement the intention correctly.

> > @@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> >               flags |= PM_FILE;
> >       if (page && page_mapcount(page) == 1)
> >               flags |= PM_MMAP_EXCLUSIVE;
> > +     if (page && PageTransCompound(page))
> > +             flags |= PM_THP;
>
> All that PageTransCompound() does is call PageCompound().  It doesn't
> tell you if the underlying allocation is PMD sized, nor properly aligned.
>
> And you didn't answer my question about whether you want information about
> whether a large page is being used that's not quite as large as a PMD.
>

Sorry, I thought the implementation would make it clear but I didn't
do that correctly. Right now and for the foreseeable future what I
want to know is whether the page is mapped by a PMD. All the below
work for me:

1. Flag is set if the page is a PMD size THP page.
2. Flag is set if the page is either a PMD size THP page or PMD size
hugetlbfs page.
3. Flag is set if the page is either a PMD size THP page or PMD size
hugetlbfs page or contig PTE size hugetlbfs page.

I prefer #2. I think it is also the most extensible for future use
cases: one flag tells whether the page is a PMD hugepage, and another
flag could tell whether it is a large cont PTE page.

Mina Almasry Nov. 7, 2021, 10:56 p.m. UTC | #3
On Thu, Nov 4, 2021 at 3:45 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Thu, Nov 4, 2021 at 3:08 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Nov 04, 2021 at 02:46:35PM -0700, Mina Almasry wrote:
> > > Add PM_THP to allow userspace to detect whether a given virt address is
> > > currently mapped by a hugepage or not.
> >
> > Well, no, that's not what that means.
> >
>
> Sorry, that was the intention, but I didn't implement the intention correctly.
>
> > > @@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> > >               flags |= PM_FILE;
> > >       if (page && page_mapcount(page) == 1)
> > >               flags |= PM_MMAP_EXCLUSIVE;
> > > +     if (page && PageTransCompound(page))
> > > +             flags |= PM_THP;
> >
> > All that PageTransCompound() does is call PageCompound().  It doesn't
> > tell you if the underlying allocation is PMD sized, nor properly aligned.
> >

Sorry Matthew again for getting this check wrong. After taking a
deeper look, you're completely correct. My check was returning true on
all compound pages without regard to whether they are actually THP, or
whether they're mapped at the PMD level.

I've renamed the flag from PM_THP to PM_HUGE_THP_MAPPING to be more
accurate, and it looks to me like the correct check is if we're in
pagemap_pmd_range() and the underlying page is_transparent_hugepage(),
then we set the flag.

I'm about to upload v3 with this new check; please take another look.
Thank you for catching this.

> > And you didn't answer my question about whether you want information about
> > whether a large page is being used that's not quite as large as a PMD.
> >
>
> Sorry, I thought the implementation would make it clear but I didn't
> do that correctly. Right now and for the foreseeable future what I
> want to know is whether the page is mapped by a PMD. All the below
> work for me:
>
> 1. Flag is set if the page is a PMD size THP page.
> 2. Flag is set if the page is either a PMD size THP page or PMD size
> hugetlbfs page.
> 3. Flag is set if the page is either a PMD size THP page or PMD size
> hugetlbfs page or contig PTE size hugetlbfs page.
>
> I prefer #2. I think it is also the most extensible for future use
> cases: one flag tells whether the page is a PMD hugepage, and another
> flag could tell whether it is a large cont PTE page.

Patch

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad667dbc96f5c..9847514937fc7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1302,6 +1302,7 @@  struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_THP			BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1396,6 +1397,8 @@  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		flags |= PM_FILE;
 	if (page && page_mapcount(page) == 1)
 		flags |= PM_MMAP_EXCLUSIVE;
+	if (page && PageTransCompound(page))
+		flags |= PM_THP;
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;

@@ -1456,6 +1459,8 @@  static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,

 		if (page && page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
+		if (page && PageTransCompound(page))
+			flags |= PM_THP;

 		for (; addr != end; addr += PAGE_SIZE) {
 			pagemap_entry_t pme = make_pme(frame, flags);
diff --git a/tools/testing/selftests/vm/transhuge-stress.c b/tools/testing/selftests/vm/transhuge-stress.c
index fd7f1b4a96f94..7dce18981fff5 100644
--- a/tools/testing/selftests/vm/transhuge-stress.c
+++ b/tools/testing/selftests/vm/transhuge-stress.c
@@ -16,6 +16,12 @@ 
 #include <string.h>
 #include <sys/mman.h>

+/*
+ * We can use /proc/pid/pagemap to detect whether the kernel was able to find
+ * hugepages or not. This can be very noisy, so is disabled by default.
+ */
+#define NO_DETECT_HUGEPAGES
+
 #define PAGE_SHIFT 12
 #define HPAGE_SHIFT 21

@@ -23,6 +29,7 @@ 
 #define HPAGE_SIZE (1 << HPAGE_SHIFT)

 #define PAGEMAP_PRESENT(ent)	(((ent) & (1ull << 63)) != 0)
+#define PAGEMAP_THP(ent)	(((ent) & (1ull << 58)) != 0)
 #define PAGEMAP_PFN(ent)	((ent) & ((1ull << 55) - 1))

 int pagemap_fd;
@@ -47,10 +54,16 @@  int64_t allocate_transhuge(void *ptr)
 			(uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent))
 		err(2, "read pagemap");

-	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1]) &&
-	    PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
-	    !(PAGEMAP_PFN(ent[0]) & ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
-		return PAGEMAP_PFN(ent[0]);
+	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1])) {
+#ifndef NO_DETECT_HUGEPAGES
+		if (!PAGEMAP_THP(ent[0]))
+			fprintf(stderr, "WARNING: detected non THP page\n");
+#endif
+		if (PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
+		    !(PAGEMAP_PFN(ent[0]) &
+		      ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
+			return PAGEMAP_PFN(ent[0]);
+	}

 	return -1;
 }
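
The hugepage heuristic that the selftest keeps (two present entries, with
consecutive PFNs, the first one aligned to the hugepage size) can be read in
isolation as the sketch below. The function name looks_like_huge_start() and
the PME_* macro names are hypothetical; the bit positions and shifts are
copied from the selftest's own macros, which assume 4 KiB pages and 2 MiB
hugepages.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define HPAGE_SHIFT	21

#define PME_PRESENT	(1ULL << 63)
#define PME_PFN_MASK	((1ULL << 55) - 1)

/*
 * ent0 and ent1 are the pagemap entries of two consecutive base pages.
 * Returns true if both are present, their PFNs are consecutive, and the
 * first PFN is aligned to the hugepage size.
 */
static inline bool looks_like_huge_start(uint64_t ent0, uint64_t ent1)
{
	uint64_t pfn0 = ent0 & PME_PFN_MASK;
	uint64_t pfn1 = ent1 & PME_PFN_MASK;
	uint64_t align_mask = (1ULL << (HPAGE_SHIFT - PAGE_SHIFT)) - 1;

	if (!(ent0 & PME_PRESENT) || !(ent1 & PME_PRESENT))
		return false;

	return pfn0 + 1 == pfn1 && (pfn0 & align_mask) == 0;
}
```

This is only a heuristic on physical layout: it does not say how the range
is mapped, which is the information a dedicated pagemap flag is meant to
provide.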