
[man-pages,4/4] madvise.2: add documentation for MADV_COLLAPSE

Message ID 20221017175523.2048887-5-zokeefe@google.com (mailing list archive)
State New
Series Add MADV_COLLAPSE documentation

Commit Message

Zach O'Keefe Oct. 17, 2022, 5:55 p.m. UTC
From: Zach O'Keefe <zokeefe@google.com>

Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE").  Update the man-pages for madvise(2) and
process_madvise(2).

Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2         | 91 +++++++++++++++++++++++++++++++++++++++++-
 man2/process_madvise.2 | 10 +++++
 2 files changed, 99 insertions(+), 2 deletions(-)
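To make the documented behavior concrete, here is a minimal C sketch of calling MADV_COLLAPSE on a private anonymous mapping. It assumes a Linux 6.1+ kernel built with CONFIG_TRANSPARENT_HUGEPAGE and 2 MiB PMD-sized hugepages (the x86-64 default). The fallback value 25 for MADV_COLLAPSE matches the kernel UAPI header and is used only if the libc headers lack the constant. The helper names collapse_one_hugepage() and collapse_errmsg() are hypothetical. The EBUSY and ENOMEM messages paraphrase the error entries added by this patch, and reading EINVAL as "ineligible range" is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* kernel UAPI value; only used if libc lacks it */
#endif

/* Assumption: 2 MiB PMD-sized hugepages, as on x86-64. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: map an errno from MADV_COLLAPSE to a short message. */
const char *collapse_errmsg(int err)
{
    switch (err) {
    case 0:      return "collapsed or already PMD-mapped";
    case EINVAL: return "advice not recognized (e.g. pre-6.1 kernel) or range ineligible";
    case EBUSY:  return "could not charge hugepage to cgroup";
    case ENOMEM: return "could not allocate hugepage";
    default:     return strerror(err);
    }
}

/*
 * Hypothetical helper: collapse one hugepage-aligned 2 MiB window.
 * Returns 0 on success, the madvise() errno on failure, or -1 if mmap failed.
 */
int collapse_one_hugepage(void)
{
    /* Map twice the hugepage size so a fully aligned window must exist. */
    size_t map_len = 2 * HPAGE_SIZE;
    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    char *aligned = (char *)(((uintptr_t)map + HPAGE_SIZE - 1)
                             & ~(uintptr_t)(HPAGE_SIZE - 1));
    assert((uintptr_t)aligned % HPAGE_SIZE == 0);

    /* Each region to collapse needs at least one resident page. */
    aligned[0] = 1;

    int err = 0;
    if (madvise(aligned, HPAGE_SIZE, MADV_COLLAPSE) != 0)
        err = errno; /* save before munmap() can overwrite errno */

    munmap(map, map_len);
    return err;
}
```

A caller would typically pass the result of collapse_one_hugepage() to collapse_errmsg(). On kernels older than 6.1, madvise() rejects the unknown advice value with EINVAL.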

Comments

Alejandro Colomar Oct. 18, 2022, 10:47 a.m. UTC | #1
Hi Zach,

On 10/17/22 19:55, Zach O'Keefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> process_madvise(2).
> 
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Please see some comments below.
There are a few more cases where I'd break the lines at different points,
but there are few, so I'll apply them with an amend.

Thanks!

Alex

> ---
>   man2/madvise.2         | 91 +++++++++++++++++++++++++++++++++++++++++-
>   man2/process_madvise.2 | 10 +++++
>   2 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index adfe24c24..7da44fac4 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -384,9 +384,10 @@ set (see
>   .BR prctl (2) ).
>   .IP
>   The
> -.B MADV_HUGEPAGE
> -and
> +.BR MADV_HUGEPAGE ,
>   .B MADV_NOHUGEPAGE

Please add a comma before 'and' (Oxford comma).

> +and
> +.B MADV_COLLAPSE
>   operations are available only if the kernel was configured with
>   .B CONFIG_TRANSPARENT_HUGEPAGE
>   and file/shmem memory is only supported if the kernel was configured with
> @@ -399,6 +400,82 @@ and
>   .I length
>   will not be backed by transparent hugepages.
>   .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the
> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no
> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem-backed pages
> +(including tmpfs (see
> +.BR tmpfs (5)),

s/))/)))/

probably, but maybe you want to reword using commas or em dashes. 
Please check.

> +and file-backed pages. See

No continuation after '.'.  :)

> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the
> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be
> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen
> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new
> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE
> +is independent of any sysfs
> +(see
> +.BR sysfs (5) )

s/) )/))/

> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE
> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE
> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native
> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were
> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of
> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> +region fails.
> +.TP
>   .BR MADV_DONTDUMP " (since Linux 3.4)"
>   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -618,6 +695,11 @@ A kernel resource was temporarily unavailable.
>   .B EBADF
>   The map exists, but the area maps something that isn't a file.
>   .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>   .B EFAULT
>   .I advice
>   is
> @@ -715,6 +797,11 @@ maximum resident set size.
>   Not enough memory: paging in failed.
>   .TP
>   .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>   Addresses in the specified range are not currently
>   mapped, or are outside the address space of the process.
>   .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 7bee1a098..900210106 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>   See
>   .BR madvise (2).
>   .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>   .B MADV_PAGEOUT
>   See
>   .BR madvise (2).
> @@ -170,6 +174,12 @@ The caller does not have permission to access the address space of the process
>   .TP
>   .B ESRCH
>   The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>   .SH VERSIONS
>   This system call first appeared in Linux 5.10.
>   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
Zach O'Keefe Oct. 18, 2022, 9:54 p.m. UTC | #2
Hey Alex,

On Tue, Oct 18, 2022 at 3:47 AM Alex Colomar <alx.manpages@gmail.com> wrote:
>
> Hi Zach,
>
> On 10/17/22 19:55, Zach O'Keefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Please see some comments below.
> There are a few more cases were I'd break the lines at different points,
> but there are few, so I'll apply them with an amend.
>
> Thanks!
>
> Alex

Thank you :) Greatly appreciated. I'll take a look at the patch
post-amend to see what I could have done. All the mentioned fixes
(thanks for pointing them out) will be included in v2.

Best,
Zach

Patch

diff --git a/man2/madvise.2 b/man2/madvise.2
index adfe24c24..7da44fac4 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -384,9 +384,10 @@  set (see
 .BR prctl (2) ).
 .IP
 The
-.B MADV_HUGEPAGE
-and
+.BR MADV_HUGEPAGE ,
 .B MADV_NOHUGEPAGE
+and
+.B MADV_COLLAPSE
 operations are available only if the kernel was configured with
 .B CONFIG_TRANSPARENT_HUGEPAGE
 and file/shmem memory is only supported if the kernel was configured with
@@ -399,6 +400,82 @@  and
 .I length
 will not be backed by transparent hugepages.
 .TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of the native pages mapped by the
+memory range into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and makes no
+persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem-backed pages
+(including tmpfs (see
+.BR tmpfs (5)),
+and file-backed pages. See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing the remainder of the
+specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range will first be
+swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and collapse may happen
+in-place.
+Unmapped pages will have their data directly initialized to 0 in the new
+hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.BR MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5) )
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.BR MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.BR VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from the node providing the most native
+pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn’t guarantee anything about other possible mappings of
+the memory.
+Also note that many failures might have occurred since the operation may
+continue to collapse in the event collapse of a single hugepage-sized/aligned
+region fails.
+.TP
 .BR MADV_DONTDUMP " (since Linux 3.4)"
 .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
 .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
@@ -618,6 +695,11 @@  A kernel resource was temporarily unavailable.
 .B EBADF
 The map exists, but the area maps something that isn't a file.
 .TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
 .B EFAULT
 .I advice
 is
@@ -715,6 +797,11 @@  maximum resident set size.
 Not enough memory: paging in failed.
 .TP
 .B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
 Addresses in the specified range are not currently
 mapped, or are outside the address space of the process.
 .TP
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 7bee1a098..900210106 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -73,6 +73,10 @@  argument is one of the following values:
 See
 .BR madvise (2).
 .TP
+.B MADV_COLLAPSE
+See
+.BR madvise (2).
+.TP
 .B MADV_PAGEOUT
 See
 .BR madvise (2).
@@ -170,6 +174,12 @@  The caller does not have permission to access the address space of the process
 .TP
 .B ESRCH
 The target process does not exist (i.e., it has terminated and been waited on).
+.PP
+See
+.BR madvise (2)
+for
+.IR advice -specific
+errors.
 .SH VERSIONS
 This system call first appeared in Linux 5.10.
 .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
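The patch says MADV_COLLAPSE "will automatically clamp the provided range to be hugepage-aligned", but it does not say in which direction. This C sketch assumes the start is rounded up and the end rounded down, so only fully covered hugepage-sized regions are attempted. That is our reading of madvise_collapse() in the kernel's mm/khugepaged.c, so check it against your kernel source. The names struct hrange, clamp_to_hugepages() and regions_to_collapse() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: half-open range [start, end) of hugepage-aligned addresses. */
struct hrange {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Assumed clamping: round start up and end down to hugepage boundaries.
 * hpage_size must be a power of two (e.g. 0x200000 for 2 MiB).
 */
struct hrange clamp_to_hugepages(uintptr_t start, size_t len, uintptr_t hpage_size)
{
    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);
    uintptr_t mask = ~(hpage_size - 1);
    struct hrange r;
    r.start = (start + hpage_size - 1) & mask;
    r.end = (start + len) & mask;
    if (r.end < r.start)   /* range does not cover one full hugepage */
        r.end = r.start;
    return r;
}

/* Number of hugepage-sized regions a collapse would attempt under this assumption. */
size_t regions_to_collapse(uintptr_t start, size_t len, uintptr_t hpage_size)
{
    struct hrange r = clamp_to_hugepages(start, len, hpage_size);
    return (size_t)((r.end - r.start) / hpage_size);
}
```

For example, with 2 MiB hugepages a 4 MiB range that starts 4 KiB past a hugepage boundary covers only one full hugepage-sized region. Under this reading, a call that succeeds says nothing about the partial pages at either end.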