[1/2] fs/proc/task_mmu: add guard region bit to pagemap

Message ID 521d99c08b975fb06a1e7201e971cc24d68196d1.1740139449.git.lorenzo.stoakes@oracle.com
State New
Series fs/proc/task_mmu: add guard region bit to pagemap

Commit Message

Lorenzo Stoakes Feb. 21, 2025, 12:05 p.m. UTC
Currently there is no means by which users can determine whether a given
page in memory is in fact a guard region, that is having had the
MADV_GUARD_INSTALL madvise() flag applied to it.

This is intentional, as to provide this information in VMA metadata would
contradict the intent of the feature (providing a means to change fault
behaviour at a page table level rather than a VMA level), and would require
VMA metadata operations to scan page tables, which is unacceptable.

In many cases, users have no need to reflect and determine what regions
have been designated guard regions, as it is the user who has established
them in the first place.

But in some instances, such as monitoring software, or software that relies
upon being able to ascertain the nature of mappings within a remote process
for instance, it becomes useful to be able to determine which pages have
the guard region marker applied.

This patch makes use of an unused pagemap bit (58) to provide this
information.

This patch updates the documentation at the same time as making the change
such that the implementation of the feature and the documentation of it are
tied together.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 Documentation/admin-guide/mm/pagemap.rst | 3 ++-
 fs/proc/task_mmu.c                       | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Kalesh Singh Feb. 21, 2025, 5:10 p.m. UTC | #1
On Fri, Feb 21, 2025 at 4:05 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> Currently there is no means by which users can determine whether a given
> page in memory is in fact a guard region, that is having had the
> MADV_GUARD_INSTALL madvise() flag applied to it.
>
> This is intentional, as to provide this information in VMA metadata would
> contradict the intent of the feature (providing a means to change fault
> behaviour at a page table level rather than a VMA level), and would require
> VMA metadata operations to scan page tables, which is unacceptable.
>
> In many cases, users have no need to reflect and determine what regions
> have been designated guard regions, as it is the user who has established
> them in the first place.
>
> But in some instances, such as monitoring software, or software that relies
> upon being able to ascertain the nature of mappings within a remote process
> for instance, it becomes useful to be able to determine which pages have
> the guard region marker applied.
>
> This patch makes use of an unused pagemap bit (58) to provide this
> information.
>
> This patch updates the documentation at the same time as making the change
> such that the implementation of the feature and the documentation of it are
> tied together.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
>  fs/proc/task_mmu.c                       | 6 +++++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index caba0f52dd36..a297e824f990 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -21,7 +21,8 @@ There are four components to pagemap:
>      * Bit  56    page exclusively mapped (since 4.2)
>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>        Documentation/admin-guide/mm/userfaultfd.rst)
> -    * Bits 58-60 zero
> +    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)

Should this be 6.14 ?

Other than that: Reviewed-by: Kalesh Singh <kaleshsingh@google.com>

Thanks,
Kalesh

> +    * Bits 59-60 zero
>      * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f02cd362309a..c17615e21a5d 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1632,6 +1632,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY          BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
>  #define PM_UFFD_WP             BIT_ULL(57)
> +#define PM_GUARD_REGION                BIT_ULL(58)
>  #define PM_FILE                        BIT_ULL(61)
>  #define PM_SWAP                        BIT_ULL(62)
>  #define PM_PRESENT             BIT_ULL(63)
> @@ -1732,6 +1733,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>                         page = pfn_swap_entry_to_page(entry);
>                 if (pte_marker_entry_uffd_wp(entry))
>                         flags |= PM_UFFD_WP;
> +               if (is_guard_swp_entry(entry))
> +                       flags |=  PM_GUARD_REGION;
>         }
>
>         if (page) {
> @@ -1931,7 +1934,8 @@ static const struct mm_walk_ops pagemap_ops = {
>   * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
>   * Bit  56    page exclusively mapped
>   * Bit  57    pte is uffd-wp write-protected
> - * Bits 58-60 zero
> + * Bit  58    pte is a guard region
> + * Bits 59-60 zero
>   * Bit  61    page is file-page or shared-anon
>   * Bit  62    page swapped
>   * Bit  63    page present
> --
> 2.48.1
>
Lorenzo Stoakes Feb. 21, 2025, 5:45 p.m. UTC | #2
On Fri, Feb 21, 2025 at 09:10:42AM -0800, Kalesh Singh wrote:
> On Fri, Feb 21, 2025 at 4:05 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > Currently there is no means by which users can determine whether a given
> > page in memory is in fact a guard region, that is having had the
> > MADV_GUARD_INSTALL madvise() flag applied to it.
> >
> > This is intentional, as to provide this information in VMA metadata would
> > contradict the intent of the feature (providing a means to change fault
> > behaviour at a page table level rather than a VMA level), and would require
> > VMA metadata operations to scan page tables, which is unacceptable.
> >
> > In many cases, users have no need to reflect and determine what regions
> > have been designated guard regions, as it is the user who has established
> > them in the first place.
> >
> > But in some instances, such as monitoring software, or software that relies
> > upon being able to ascertain the nature of mappings within a remote process
> > for instance, it becomes useful to be able to determine which pages have
> > the guard region marker applied.
> >
> > This patch makes use of an unused pagemap bit (58) to provide this
> > information.
> >
> > This patch updates the documentation at the same time as making the change
> > such that the implementation of the feature and the documentation of it are
> > tied together.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
> >  fs/proc/task_mmu.c                       | 6 +++++-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> > index caba0f52dd36..a297e824f990 100644
> > --- a/Documentation/admin-guide/mm/pagemap.rst
> > +++ b/Documentation/admin-guide/mm/pagemap.rst
> > @@ -21,7 +21,8 @@ There are four components to pagemap:
> >      * Bit  56    page exclusively mapped (since 4.2)
> >      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
> >        Documentation/admin-guide/mm/userfaultfd.rst)
> > -    * Bits 58-60 zero
> > +    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)
>
> Should this be 6.14 ?

We're aiming for the 6.15 merge window so this is correct :>) I don't think
this could be considered a hotfix haha!

>
> Other than that: Reviewed-by: Kalesh Singh <kaleshsingh@google.com>

Thanks! And thanks for review on the other patch also! Appreciated.

>
> Thanks,
> Kalesh
>
> > +    * Bits 59-60 zero
> >      * Bit  61    page is file-page or shared-anon (since 3.5)
> >      * Bit  62    page swapped
> >      * Bit  63    page present
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index f02cd362309a..c17615e21a5d 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -1632,6 +1632,7 @@ struct pagemapread {
> >  #define PM_SOFT_DIRTY          BIT_ULL(55)
> >  #define PM_MMAP_EXCLUSIVE      BIT_ULL(56)
> >  #define PM_UFFD_WP             BIT_ULL(57)
> > +#define PM_GUARD_REGION                BIT_ULL(58)
> >  #define PM_FILE                        BIT_ULL(61)
> >  #define PM_SWAP                        BIT_ULL(62)
> >  #define PM_PRESENT             BIT_ULL(63)
> > @@ -1732,6 +1733,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> >                         page = pfn_swap_entry_to_page(entry);
> >                 if (pte_marker_entry_uffd_wp(entry))
> >                         flags |= PM_UFFD_WP;
> > +               if (is_guard_swp_entry(entry))
> > +                       flags |=  PM_GUARD_REGION;
> >         }
> >
> >         if (page) {
> > @@ -1931,7 +1934,8 @@ static const struct mm_walk_ops pagemap_ops = {
> >   * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
> >   * Bit  56    page exclusively mapped
> >   * Bit  57    pte is uffd-wp write-protected
> > - * Bits 58-60 zero
> > + * Bit  58    pte is a guard region
> > + * Bits 59-60 zero
> >   * Bit  61    page is file-page or shared-anon
> >   * Bit  62    page swapped
> >   * Bit  63    page present
> > --
> > 2.48.1
> >
Patch

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index caba0f52dd36..a297e824f990 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -21,7 +21,8 @@  There are four components to pagemap:
     * Bit  56    page exclusively mapped (since 4.2)
     * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
       Documentation/admin-guide/mm/userfaultfd.rst)
-    * Bits 58-60 zero
+    * Bit  58    pte is a guard region (since 6.15) (see madvise (2) man page)
+    * Bits 59-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f02cd362309a..c17615e21a5d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1632,6 +1632,7 @@  struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_GUARD_REGION		BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1732,6 +1733,8 @@  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			page = pfn_swap_entry_to_page(entry);
 		if (pte_marker_entry_uffd_wp(entry))
 			flags |= PM_UFFD_WP;
+		if (is_guard_swp_entry(entry))
+			flags |=  PM_GUARD_REGION;
 	}
 
 	if (page) {
@@ -1931,7 +1934,8 @@  static const struct mm_walk_ops pagemap_ops = {
  * Bit  55    pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
  * Bit  56    page exclusively mapped
  * Bit  57    pte is uffd-wp write-protected
- * Bits 58-60 zero
+ * Bit  58    pte is a guard region
+ * Bits 59-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
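
For readers who want to consume the new bit from userspace, the following is a minimal sketch of how a monitoring tool might test bit 58 of a pagemap entry. It is not part of the patch. The macro PM_GUARD_REGION_BIT and the helpers pagemap_offset(), pagemap_entry_is_guard() and page_is_guard() are hypothetical names introduced here; the kernel's PM_GUARD_REGION definition lives in fs/proc/task_mmu.c and is not visible to userspace in this patch. The sketch assumes a 64-bit off_t and the documented pagemap layout of one 64-bit entry per virtual page.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#endif

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Local mirror of the kernel's bit 58; this name is an assumption of the sketch. */
#define PM_GUARD_REGION_BIT	(UINT64_C(1) << 58)

/* pagemap holds one 64-bit entry per virtual page, indexed by virtual page number. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	assert(page_size > 0);
	return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* True if the entry has the guard region bit set. */
static bool pagemap_entry_is_guard(uint64_t entry)
{
	return (entry & PM_GUARD_REGION_BIT) != 0;
}

/*
 * Read the pagemap entry covering addr from an already-open pagemap fd.
 * Returns 1 if the page is marked as a guard region, 0 if it is not,
 * and -1 if the entry could not be read.
 */
static int page_is_guard(int pagemap_fd, const void *addr)
{
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return -1;

	if (pread(pagemap_fd, &entry, sizeof(entry),
		  pagemap_offset((uintptr_t)addr, page_size)) != (ssize_t)sizeof(entry))
		return -1;

	return pagemap_entry_is_guard(entry) ? 1 : 0;
}
```

A caller would open /proc/self/pagemap (or /proc/<pid>/pagemap for a remote process, subject to the usual permission checks) with O_RDONLY and pass the descriptor to page_is_guard(). On kernels without this patch, bit 58 is documented as zero, so the helper would simply report 0 there.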