Message ID | 20220609185201.19932-1-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New |
Series | [v4] madvise.2: Clarify addr/length and update hugetlb support |
Hi, Mike and Peter!

On 6/9/22 20:52, Mike Kravetz wrote:
> Clarify that madvise only works on full pages, and remove references
> to 'bytes'.
>
> Update MADV_DONTNEED and MADV_REMOVE sections to remove notes that
> HugeTLB mappings are not supported. Indicate the releases when they
> were first supported as well as alignment restrictions.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Peter Xu <peterx@redhat.com>

Thanks for the patch. Applied.
And thanks, Peter, for reviewing it!

Cheers,

Alex

> ---
> v3 -> v4 Formatting updates (Alex)
> v2 -> v3 Rebased on man-pages-5.19-rc1. Minor change to wording for
>          subsequent access of data after MADV_REMOVE.
> v1 -> v2 Added releases when Huge TLB support was added and moved
>          alignment requirements to corresponding section. (Peter)
>  man2/madvise.2 | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index 2a8f1cd0a..7fc184e20 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -44,9 +44,14 @@ system call is used to give advice or directions to the kernel
>  about the address range beginning at address
>  .I addr
>  and with size
> +.IR length .
> +.BR madvise ()
> +only operates on whole pages, therefore
> +.I addr
> +must be page-aligned.
> +The value of
>  .I length
> -bytes.
> -In most cases,
> +is rounded up to a multiple of page size. In most cases,

s/. /.\n/

But I fixed & amended.

>  the goal of such advice is to improve system or application performance.
>  .PP
>  Initially, the system call supported a set of "conventional"
> @@ -126,7 +131,7 @@ The resident set size (RSS) of the calling process will be immediately
>  reduced however.
>  .IP
>  .B MADV_DONTNEED
> -cannot be applied to locked pages, Huge TLB pages, or
> +cannot be applied to locked pages, or
>  .B VM_PFNMAP
>  pages.
>  (Pages marked with the kernel-internal
> @@ -136,6 +141,12 @@ flag are special memory areas that are not managed
>  by the virtual memory subsystem.
>  Such pages are typically created by device drivers that
>  map the pages into user space.)
> +.IP
> +Support for Huge TLB pages was added in Linux v5.18.
> +Addresses within a mapping backed by Huge TLB pages must be aligned
> +to the underlying Huge TLB page size,
> +and the range length is rounded up
> +to a multiple of the underlying Huge TLB page size.
>  .\"
>  .\" ======================================================================
>  .\"
> @@ -153,24 +164,24 @@ Note that some of these operations change the semantics of memory accesses.
>  .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
>  Free up a given range of pages
>  and its associated backing store.
> -This is equivalent to punching a hole in the corresponding byte
> +This is equivalent to punching a hole in the corresponding
>  range of the backing store (see
>  .BR fallocate (2)).
>  Subsequent accesses in the specified address range will see
> -bytes containing zero.
> +data with a value of zero.
>  .\" Databases want to use this feature to drop a section of their
>  .\" bufferpool (shared memory segments) - without writing back to
>  .\" disk/swap space. This feature is also useful for supporting
>  .\" hot-plug memory on UML.
>  .IP
>  The specified address range must be mapped shared and writable.
> -This flag cannot be applied to locked pages, Huge TLB pages, or
> +This flag cannot be applied to locked pages, or
>  .B VM_PFNMAP
>  pages.
>  .IP
>  In the initial implementation, only
>  .BR tmpfs (5)
> -was supported
> +supported
>  .BR MADV_REMOVE ;
>  but since Linux 3.5,
>  .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
> @@ -179,10 +190,12 @@ any filesystem which supports the
>  .B FALLOC_FL_PUNCH_HOLE
>  mode also supports
>  .BR MADV_REMOVE .
> -Hugetlbfs fails with the error
> -.B EINVAL
> -and other filesystems fail with the error
> +Filesystems which do not support
> +.B MADV_REMOVE
> +fail with the error
>  .BR EOPNOTSUPP .
> +.IP
> +Support for the Huge TLB filesystem was added in Linux v4.3.
>  .TP
>  .BR MADV_DONTFORK " (since Linux 2.6.16)"
>  .\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4
diff --git a/man2/madvise.2 b/man2/madvise.2
index 2a8f1cd0a..7fc184e20 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -44,9 +44,14 @@ system call is used to give advice or directions to the kernel
 about the address range beginning at address
 .I addr
 and with size
+.IR length .
+.BR madvise ()
+only operates on whole pages, therefore
+.I addr
+must be page-aligned.
+The value of
 .I length
-bytes.
-In most cases,
+is rounded up to a multiple of page size. In most cases,
 the goal of such advice is to improve system or application performance.
 .PP
 Initially, the system call supported a set of "conventional"
@@ -126,7 +131,7 @@ The resident set size (RSS) of the calling process will be immediately
 reduced however.
 .IP
 .B MADV_DONTNEED
-cannot be applied to locked pages, Huge TLB pages, or
+cannot be applied to locked pages, or
 .B VM_PFNMAP
 pages.
 (Pages marked with the kernel-internal
@@ -136,6 +141,12 @@ flag are special memory areas that are not managed
 by the virtual memory subsystem.
 Such pages are typically created by device drivers that
 map the pages into user space.)
+.IP
+Support for Huge TLB pages was added in Linux v5.18.
+Addresses within a mapping backed by Huge TLB pages must be aligned
+to the underlying Huge TLB page size,
+and the range length is rounded up
+to a multiple of the underlying Huge TLB page size.
 .\"
 .\" ======================================================================
 .\"
@@ -153,24 +164,24 @@ Note that some of these operations change the semantics of memory accesses.
 .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
 Free up a given range of pages
 and its associated backing store.
-This is equivalent to punching a hole in the corresponding byte
+This is equivalent to punching a hole in the corresponding
 range of the backing store (see
 .BR fallocate (2)).
 Subsequent accesses in the specified address range will see
-bytes containing zero.
+data with a value of zero.
 .\" Databases want to use this feature to drop a section of their
 .\" bufferpool (shared memory segments) - without writing back to
 .\" disk/swap space. This feature is also useful for supporting
 .\" hot-plug memory on UML.
 .IP
 The specified address range must be mapped shared and writable.
-This flag cannot be applied to locked pages, Huge TLB pages, or
+This flag cannot be applied to locked pages, or
 .B VM_PFNMAP
 pages.
 .IP
 In the initial implementation, only
 .BR tmpfs (5)
-was supported
+supported
 .BR MADV_REMOVE ;
 but since Linux 3.5,
 .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
@@ -179,10 +190,12 @@ any filesystem which supports the
 .B FALLOC_FL_PUNCH_HOLE
 mode also supports
 .BR MADV_REMOVE .
-Hugetlbfs fails with the error
-.B EINVAL
-and other filesystems fail with the error
+Filesystems which do not support
+.B MADV_REMOVE
+fail with the error
 .BR EOPNOTSUPP .
+.IP
+Support for the Huge TLB filesystem was added in Linux v4.3.
 .TP
 .BR MADV_DONTFORK " (since Linux 2.6.16)"
 .\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4
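
For readers who want to see the addr/length rule from the patch in numbers, here is a rough sketch of the arithmetic. It is only a model of the wording added to the man page (addr must be aligned, length is rounded up to a multiple of the page size, and for Huge TLB mappings the underlying huge page size takes the place of the base page size). It is not kernel code and does not call madvise(); the names round_up and advised_range are made up for this illustration, and the page size is assumed to be a power of two.

```python
def round_up(length: int, page_size: int) -> int:
    """Round length up to the next multiple of page_size.

    Assumes page_size is a power of two, as page sizes are in practice.
    """
    return (length + page_size - 1) & ~(page_size - 1)


def advised_range(addr: int, length: int, page_size: int) -> tuple:
    """Model the range that an madvise(addr, length, ...) call covers.

    Hypothetical helper: raises ValueError where the man page text says
    addr must be aligned (the real call would fail instead), and returns
    (start, end) with the length rounded up to whole pages.
    For a Huge TLB mapping, pass the huge page size as page_size.
    """
    if addr % page_size != 0:
        raise ValueError("addr is not aligned to the page size")
    return addr, addr + round_up(length, page_size)


if __name__ == "__main__":
    base = 4096                # assumed base page size
    huge = 2 * 1024 * 1024     # assumed Huge TLB page size (2 MiB)

    # One byte of length still covers one whole base page.
    print(advised_range(0x10000, 1, base))

    # In a Huge TLB mapping the same length covers one whole huge page.
    print(advised_range(0x200000, 1, huge))
```

With a Huge TLB mapping, an address that is only aligned to the base page size (for example 0x201000 with 2 MiB huge pages) is rejected by this model, which mirrors the alignment restriction described in the new MADV_DONTNEED paragraph.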