
[v2,5/6] shmem: update documentation

Message ID 20230309230545.2930737-6-mcgrof@kernel.org (mailing list archive)
State New
Series tmpfs: add the option to disable swap

Commit Message

Luis Chamberlain March 9, 2023, 11:05 p.m. UTC
Update the docs to reflect a bit better why some folks prefer tmpfs
over ramfs and clarify a bit more about the difference between brd
ramdisks.

While at it, add THP docs for tmpfs, both the mount options and the
sysfs file.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 Documentation/filesystems/tmpfs.rst | 57 +++++++++++++++++++++++++----
 1 file changed, 49 insertions(+), 8 deletions(-)

Comments

Hugh Dickins April 18, 2023, 5:29 a.m. UTC | #1
On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> Update the docs to reflect a bit better why some folks prefer tmpfs
> over ramfs and clarify a bit more about the difference between brd
> ramdisks.
> 
> While at it, add THP docs for tmpfs, both the mount options and the
> sysfs file.

Okay: the original canonical reference for THP options on tmpfs has
been Documentation/admin-guide/mm/transhuge.rst.  You're right that
they would be helpful here too: IIRC (but I might well be confusing
with our Google tree) we used to have them documented in both places,
but grew tired of keeping the two in synch.  You're volunteering to
do so! so please check now that they tell the same story.

But nowadays, "man 5 tmpfs" is much more important (and that might
give you a hint for what needs to be done after this series goes into
6.4-rc - and I wonder if there are tmpfs manpage updates needed from
Christian for idmapped too? or already taken care of?).

There's a little detail we do need you to remove, indicated below.

> 
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  Documentation/filesystems/tmpfs.rst | 57 +++++++++++++++++++++++++----
>  1 file changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..1ec9a9f8196b 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,14 +13,25 @@ everything stored therein is lost.
>  
>  tmpfs puts everything into the kernel internal caches and grows and
>  shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space. It has maximum size limits which can
> -be adjusted on the fly via 'mount -o remount ...'
> -
> -If you compare it to ramfs (which was the template to create tmpfs)
> -you gain swapping and limit checking. Another similar thing is the RAM
> -disk (/dev/ram*), which simulates a fixed size hard disk in physical
> -RAM, where you have to create an ordinary filesystem on top. Ramdisks
> -cannot swap and you do not have the possibility to resize them.
> +unneeded pages out to swap space, and supports THP.
> +
> +tmpfs extends ramfs with a few userspace configurable options listed and
> +explained further below, some of which can be reconfigured dynamically on the
> +fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
> +filesystem can be resized but it cannot be resized to a size below its current
> +usage. tmpfs also supports POSIX ACLs, and extended attributes for the
> +trusted.* and security.* namespaces. ramfs does not use swap and you cannot
> +modify any parameter for a ramfs filesystem. The size limit of a ramfs
> +filesystem is how much memory you have available, and so care must be taken if
> +used so to not run out of memory.
> +
> +An alternative to tmpfs and ramfs is to use brd to create RAM disks
> +(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
> +To write data you would just then need to create an regular filesystem on top
> +this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
> +configured in size at initialization and you cannot dynamically resize them.
> +Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
> +block layer at all.
>  
>  Since tmpfs lives completely in the page cache and on swap, all tmpfs
>  pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> @@ -85,6 +96,36 @@ mount with such options, since it allows any user with write access to
>  use up all the memory on the machine; but enhances the scalability of
>  that instance in a system with many CPUs making intensive use of it.
>  
> +tmpfs also supports Transparent Huge Pages which requires a kernel
> +configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
> +your system (has_transparent_hugepage(), which is architecture specific).
> +The mount options for this are:
> +
> +======  ============================================================
> +huge=0  never: disables huge pages for the mount
> +huge=1  always: enables huge pages for the mount
> +huge=2  within_size: only allocate huge pages if the page will be
> +        fully within i_size, also respect fadvise()/madvise() hints.
> +huge=3  advise: only allocate huge pages if requested with
> +        fadvise()/madvise()

You're taking the source too literally there.  Minor point is that there
is no fadvise() for this, to date anyway.  Major point is: have you tried
mounting tmpfs with huge=0 etc?  I did propose "huge=0" and "huge=1" years
ago, but those "never" went in, it's "always" been the named options.
Please remove those misleading numbers, it's "huge=never" etc.

(Old Google internal trees excepted: and trying to wean people off
"huge=1" internally makes me a bit touchy when seeing those numbers above!)
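
For illustration only (not part of the patch): a minimal Python sketch of the
point above, that "huge=" takes the named values and not numbers. The helper
names and the /mnt/test mount point are hypothetical; the sketch only builds
the option string and an argument list for mount(8), it does not mount
anything.

```python
# Named values accepted by the tmpfs "huge=" mount option, as stated in the
# review above. Numeric spellings such as "huge=0" are not accepted.
VALID_HUGE_VALUES = ("never", "always", "within_size", "advise")


def huge_mount_option(value: str) -> str:
    """Return a 'huge=<name>' option string, rejecting anything else."""
    if value not in VALID_HUGE_VALUES:
        raise ValueError(
            f"invalid huge= value {value!r}; expected one of {VALID_HUGE_VALUES}"
        )
    return f"huge={value}"


def mount_command(mountpoint: str, huge: str, size: str = "1G") -> list[str]:
    """Build (but do not run) a mount(8) argument list for a tmpfs instance.

    The mountpoint and the default size are illustrative assumptions.
    """
    options = f"size={size},{huge_mount_option(huge)}"
    return ["mount", "-t", "tmpfs", "-o", options, "tmpfs", mountpoint]
```

Calling mount_command("/mnt/test", "within_size") yields an argument list
that could be handed to subprocess.run() by a privileged caller, whereas
huge_mount_option("0") raises ValueError.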

> +======  ============================================================
> +
> +There is a sysfs file which you can also use to control system wide THP
> +configuration for all tmpfs mounts, the file is:
> +
> +/sys/kernel/mm/transparent_hugepage/shmem_enabled
> +
> +This sysfs file is placed on top of THP sysfs directory and so is registered
> +by THP code. It is however only used to control all tmpfs mounts with one
> +single knob. Since it controls all tmpfs mounts it should only be used either
> +for emergency or testing purposes. The values you can set for shmem_enabled are:
> +
> +==  ============================================================
> +-1  deny: disables huge on shm_mnt and all mounts, for
> +    emergency use
> +-2  force: enables huge on shm_mnt and all mounts, w/o needing
> +    option, for testing

Likewise here, please delete the invalid "-1" and "-2" notations,
-1 and -2 are just #defines for use in the kernel source.

And the description above is not quite accurate: it is very hard to
describe shmem_enabled, partly because it combines two different things.
It's partly the "huge=" mount option for any "internal mount", those
things like SysV SHM and memfd and i915 and shared-anonymous: the shmem
which has no user-visible mount to hold the option.  But also these
"deny" and "force" overrides affecting *all* internal and visible mounts.
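
For illustration only (not part of the patch): a minimal Python sketch that
reads shmem_enabled and separates the two roles described above. It assumes
the usual sysfs convention of listing every choice on one line with the
active one in square brackets; the function names are hypothetical.

```python
from pathlib import Path

# Path taken from the patch under review.
SHMEM_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")

# Values that override *all* mounts, internal and user-visible.
GLOBAL_OVERRIDES = ("deny", "force")


def parse_shmem_enabled(text: str) -> tuple[str, list[str]]:
    """Split the file contents into (active value, all offered values)."""
    active = None
    choices = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        choices.append(token)
    if active is None:
        raise ValueError("no bracketed (active) value found")
    return active, choices


def describe(active: str) -> str:
    """Say which of the two roles the active value is playing."""
    if active in GLOBAL_OVERRIDES:
        return f"'{active}' overrides huge= on all internal and visible mounts"
    return f"'{active}' acts as the huge= setting for the internal mount only"


def read_shmem_enabled() -> tuple[str, list[str]]:
    """Read and parse the live sysfs file (needs a THP-enabled kernel)."""
    return parse_shmem_enabled(SHMEM_ENABLED.read_text())
```

On a kernel built with CONFIG_TRANSPARENT_HUGEPAGE, describe() applied to the
first element returned by read_shmem_enabled() reports whether a global
override is in effect.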

Hugh

> +==  ============================================================
>  
>  tmpfs has a mount option to set the NUMA memory allocation policy for
>  all files in that instance (if CONFIG_NUMA is enabled) - which can be
> -- 
> 2.39.1
Luis Chamberlain April 18, 2023, 9:20 p.m. UTC | #2
On Mon, Apr 17, 2023 at 10:29:59PM -0700, Hugh Dickins wrote:
> On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> 
> > Update the docs to reflect a bit better why some folks prefer tmpfs
> > over ramfs and clarify a bit more about the difference between brd
> > ramdisks.
> > 
> > While at it, add THP docs for tmpfs, both the mount options and the
> > sysfs file.
> 
> Okay: the original canonical reference for THP options on tmpfs has
> been Documentation/admin-guide/mm/transhuge.rst.  You're right that
> they would be helpful here too: IIRC (but I might well be confusing
> with our Google tree) we used to have them documented in both places,
> but grew tired of keeping the two in synch.  You're volunteering to
> do so! so please check now that they tell the same story.

Hehe. Sure, we should just make one point to the other. Which one should
be the authoritative source?

> But nowadays, "man 5 tmpfs" is much more important (and that might
> give you a hint for what needs to be done after this series goes into
> 6.4-rc - and I wonder if there are tmpfs manpage updates needed from
> Christian for idmapped too? or already taken care of?).

Sure, what's the man page git tree to use? I can do that once these
documents are settled as well. I'll send fixes.

> There's a little detail we do need you to remove, indicated below.
> 
> > +======  ============================================================
> > +huge=0  never: disables huge pages for the mount
> > +huge=1  always: enables huge pages for the mount
> > +huge=2  within_size: only allocate huge pages if the page will be
> > +        fully within i_size, also respect fadvise()/madvise() hints.
> > +huge=3  advise: only allocate huge pages if requested with
> > +        fadvise()/madvise()
> 
> You're taking the source too literally there.  Minor point is that there
> is no fadvise() for this, to date anyway.  Major point is: have you tried
> mounting tmpfs with huge=0 etc?  I did propose "huge=0" and "huge=1" years
> ago, but those "never" went in, it's "always" been the named options.
> Please remove those misleading numbers, it's "huge=never" etc.

Will do.

> > +==  ============================================================
> > +-1  deny: disables huge on shm_mnt and all mounts, for
> > +    emergency use
> > +-2  force: enables huge on shm_mnt and all mounts, w/o needing
> > +    option, for testing
> 
> Likewise here, please delete the invalid "-1" and "-2" notations,
> -1 and -2 are just #defines for use in the kernel source.

ok!

> And the description above is not quite accurate: it is very hard to
> describe shmem_enabled, partly because it combines two different things.
> It's partly the "huge=" mount option for any "internal mount", those
> things like SysV SHM and memfd and i915 and shared-anonymous: the shmem
> which has no user-visible mount to hold the option.  But also these
> "deny" and "force" overrides affecting *all* internal and visible mounts.

I see, thanks.

  Luis
Hugh Dickins April 18, 2023, 9:41 p.m. UTC | #3
On Tue, 18 Apr 2023, Luis Chamberlain wrote:
> On Mon, Apr 17, 2023 at 10:29:59PM -0700, Hugh Dickins wrote:
> > On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> > 
> > > Update the docs to reflect a bit better why some folks prefer tmpfs
> > > over ramfs and clarify a bit more about the difference between brd
> > > ramdisks.
> > > 
> > > While at it, add THP docs for tmpfs, both the mount options and the
> > > sysfs file.
> > 
> > Okay: the original canonical reference for THP options on tmpfs has
> > been Documentation/admin-guide/mm/transhuge.rst.  You're right that
> > they would be helpful here too: IIRC (but I might well be confusing
> > with our Google tree) we used to have them documented in both places,
> > but grew tired of keeping the two in synch.  You're volunteering to
> > do so! so please check now that they tell the same story.
> 
> Hehe. Sure, we should just make one point to the other. Which one should
> be the authoritative source?

Documentation/admin-guide/mm/transhuge.rst has been the authoritative
source up until this patch, so I suggest it remain so; but good if you
point to it from this Doc - unless in reading it you find that actually
its account is wrong.  (Haha, it refers to fadvise too, never mind that.)

But the man page is more important than either, so it would be good to
point to that too.  Mention the "huge=" option in this document, but
point to elsewhere for the detail of its values.

> 
> > But nowadays, "man 5 tmpfs" is much more important (and that might
> > give you a hint for what needs to be done after this series goes into
> > 6.4-rc - and I wonder if there are tmpfs manpage updates needed from
> > Christian for idmapped too? or already taken care of?).
> 
> Sure, what's the man page git tree to use? I can do that once these
> documents are settled as well. I'll send fixes.

Thanks. I'll look up a mail to lkml from Alejandro and forward that
to you, it has the details.

Hugh
Luis Chamberlain April 18, 2023, 9:49 p.m. UTC | #4
On Tue, Apr 18, 2023 at 02:41:07PM -0700, Hugh Dickins wrote:
> On Tue, 18 Apr 2023, Luis Chamberlain wrote:
> > On Mon, Apr 17, 2023 at 10:29:59PM -0700, Hugh Dickins wrote:
> > > On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> > > 
> > > > Update the docs to reflect a bit better why some folks prefer tmpfs
> > > > over ramfs and clarify a bit more about the difference between brd
> > > > ramdisks.
> > > > 
> > > > While at it, add THP docs for tmpfs, both the mount options and the
> > > > sysfs file.
> > > 
> > > Okay: the original canonical reference for THP options on tmpfs has
> > > been Documentation/admin-guide/mm/transhuge.rst.  You're right that
> > > they would be helpful here too: IIRC (but I might well be confusing
> > > with our Google tree) we used to have them documented in both places,
> > > but grew tired of keeping the two in synch.  You're volunteering to
> > > do so! so please check now that they tell the same story.
> > 
> > Hehe. Sure, we should just make one point to the other. Which one should
> > be the authoritative source?
> 
> Documentation/admin-guide/mm/transhuge.rst has been the authoritative
> source up until this patch, so I suggest it remain so; but good if you
> point to it from this Doc - unless in reading it you find that actually
> its account is wrong.  (Haha, it refers to fadvise too, never mind that.)

Yeah I'll make the tmpfs kdoc point to the transhuge.rst page. I think
that's possible.

> But the man page is more important than either, so it would be good to
> point to that too. 

Sure I'll have the tmpfs kdoc also point to the tmpfs man page.

> Mention the "huge=" option in this document, but
> point to elsewhere for the detail of its values.

Sounds good.

  Luis

Patch

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 0408c245785e..1ec9a9f8196b 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -13,14 +13,25 @@  everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
-
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them.
+unneeded pages out to swap space, and supports THP.
+
+tmpfs extends ramfs with a few userspace configurable options listed and
+explained further below, some of which can be reconfigured dynamically on the
+fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
+filesystem can be resized but it cannot be resized to a size below its current
+usage. tmpfs also supports POSIX ACLs, and extended attributes for the
+trusted.* and security.* namespaces. ramfs does not use swap and you cannot
+modify any parameter for a ramfs filesystem. The size limit of a ramfs
+filesystem is how much memory you have available, and so care must be taken if
+used so to not run out of memory.
+
+An alternative to tmpfs and ramfs is to use brd to create RAM disks
+(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
+To write data you would just then need to create an regular filesystem on top
+this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
+configured in size at initialization and you cannot dynamically resize them.
+Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
+block layer at all.
 
 Since tmpfs lives completely in the page cache and on swap, all tmpfs
 pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
@@ -85,6 +96,36 @@  mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+tmpfs also supports Transparent Huge Pages which requires a kernel
+configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
+your system (has_transparent_hugepage(), which is architecture specific).
+The mount options for this are:
+
+======  ============================================================
+huge=0  never: disables huge pages for the mount
+huge=1  always: enables huge pages for the mount
+huge=2  within_size: only allocate huge pages if the page will be
+        fully within i_size, also respect fadvise()/madvise() hints.
+huge=3  advise: only allocate huge pages if requested with
+        fadvise()/madvise()
+======  ============================================================
+
+There is a sysfs file which you can also use to control system wide THP
+configuration for all tmpfs mounts, the file is:
+
+/sys/kernel/mm/transparent_hugepage/shmem_enabled
+
+This sysfs file is placed on top of THP sysfs directory and so is registered
+by THP code. It is however only used to control all tmpfs mounts with one
+single knob. Since it controls all tmpfs mounts it should only be used either
+for emergency or testing purposes. The values you can set for shmem_enabled are:
+
+==  ============================================================
+-1  deny: disables huge on shm_mnt and all mounts, for
+    emergency use
+-2  force: enables huge on shm_mnt and all mounts, w/o needing
+    option, for testing
+==  ============================================================
 
 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be