
[v2,0/6] tmpfs: add the option to disable swap

Message ID: 20230309230545.2930737-1-mcgrof@kernel.org

Message

Luis Chamberlain March 9, 2023, 11:05 p.m. UTC
Changes in this v2 PATCH series:

  o Added all respective Reviewed-by and Acked-by tags.
  o David Hildenbrand suggested on the update-docs patch to mention THP.
    It turns out tmpfs.rst makes absolutely no mention of THP at all,
    so I added all the relevant options to the docs, including the
    system-wide sysfs file. All that should hopefully demystify things
    and make them clearer.
  o Yosry Ahmed spell checked my patch "shmem: add support to ignore swap"

Changes from RFCv2 to this first real PATCH series:

  o Added Christian Brauner's Acked-by for the noswap patch (the only
    change in that patch is just the new shmem_show_options() change I
    describe below).
  o Embraced Yosry Ahmed's recommendation to use mapping_set_unevictable()
    to ensure the folios at least appear on the unevictable LRU.
    Since that is the goal, this accomplishes what we want and the VM
    takes care of things for us. The shmem writepage() still uses a
    stop-gap to ensure we don't get called for swap when shmem uses
    mapping_set_unevictable(). (A rough sketch of the inode-creation
    side follows this list.)
  o I had evaluated using shmem_lock() instead of calling mapping_set_unevictable(),
    but upon review this doesn't make much sense, as shmem_lock() was
    designed to make use of RLIMIT_MEMLOCK, and that was designed for
    files / IPC / unprivileged perf limits. If we were to use
    shmem_lock() we'd bump the count on each new inode. Using
    shmem_lock() would also complicate inode allocation on shmem as
    we'd have to unwind on failure from user_shm_lock(). It would also
    beg the question of when to capture a ucount for an inode: should we
    just share one for the superblock at shmem_fill_super(), or do we
    really need to capture it at every single inode creation? In theory
    we could end up with different limits. The simple solution is to
    just use mapping_set_unevictable() upon inode creation and be done
    with it, as it cannot fail.
  o Updated the documentation for tmpfs before / after my patch to
    reflect the use cases a bit more clearly among ramfs, tmpfs and brd
    ramdisks.
  o Updated shmem_show_options() to also reveal the noswap option
    when it is used.
  o Addressed a checkpatch style complaint about spaces before tabs in
    shmem_fs.h.
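
For illustration only, a rough sketch of how the noswap case can be
handled at inode creation with this approach; the helper name below is
hypothetical and the sbinfo->noswap flag simply stands in for however
the actual patch records the mount option:

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>

/*
 * Hypothetical helper, for illustration only: when the superblock was
 * mounted with "noswap" (recorded here in a sbinfo->noswap flag), mark
 * the new inode's mapping unevictable at creation time. Its folios then
 * sit on the unevictable LRU and reclaim leaves them alone; unlike
 * shmem_lock() this cannot fail and needs no ucount accounting.
 */
static void shmem_mark_noswap_inode(struct inode *inode,
				    struct shmem_sb_info *sbinfo)
{
	if (sbinfo->noswap)
		mapping_set_unevictable(inode->i_mapping);
}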

Changes since the first RFC:

  o Matthew suggested BUG_ON(!folio_test_locked(folio)) is not needed
    in the writepage() callback for shmem, so just remove it.
  o Based on Matthew's feedback the inode is set up early as it is not
    reset in case we split the folio. So now we move all the variables
    we can set up really early.
  o shmem writepage() should only be issued on reclaim, so just move
    the WARN_ON_ONCE(!wbc->for_reclaim) early so that the code and
    expectations are easier to read. This also avoids splitting the
    folio in that odd case.
  o There are a few cases where shmem writepage() could still be hit,
    but if total_swap_pages is zero we just bail out; we shouldn't be
    splitting the folio then. Likewise for the VM_LOCKED case, but a
    writepage() on a VM_LOCKED mapping is not expected, so we want to
    learn about it and add a WARN_ON_ONCE() on that condition.
  o Based on Yosry Ahmed's feedback, the patch which allows tmpfs to
    disable swap now just uses mapping_set_unevictable() on inode
    creation. In that case writepage() should not be called, so we
    augment the WARN_ON_ONCE() in writepage() for that case to ensure
    it never happens. (See the sketch after this list for how these
    early checks line up.)
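
To make the intended ordering concrete, here is a rough sketch of the
early checks in shmem_writepage() described above. This is illustration
only, not the exact patch: the unevictable-mapping check is one way to
express the noswap stop-gap, and the swap-out path itself is elided.

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <linux/swap.h>
#include <linux/writeback.h>

static int shmem_writepage(struct page *page, struct writeback_control *wbc)
{
	struct folio *folio = page_folio(page);
	struct address_space *mapping = folio->mapping;
	struct inode *inode = mapping->host;
	struct shmem_inode_info *info = SHMEM_I(inode);

	/* shmem only expects writepage() to be called from reclaim */
	if (WARN_ON_ONCE(!wbc->for_reclaim))
		goto redirty;

	/*
	 * VM_LOCKED shmem, and "noswap" mounts whose mappings were marked
	 * unevictable at inode creation, should never get here; warn so we
	 * learn about it rather than silently splitting the folio.
	 */
	if (WARN_ON_ONCE(info->flags & VM_LOCKED) ||
	    WARN_ON_ONCE(mapping_unevictable(mapping)))
		goto redirty;

	/* No swap configured at all: bail before any folio split */
	if (!total_swap_pages)
		goto redirty;

	/*
	 * ... the actual swap-out path, including any folio split and the
	 * handoff to swap, is elided in this sketch ...
	 */
	goto redirty;

redirty:
	folio_redirty_for_writepage(wbc, folio);
	if (wbc->for_reclaim)
		return AOP_WRITEPAGE_ACTIVATE;	/* return with the folio still locked */
	folio_unlock(folio);
	return 0;
}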

To test I've used a kdevops [0] 8 vCPU, 4 GiB libvirt guest on linux-next.

I'm doing this work as part of future experimentation with tmpfs and the
page cache, but given that a common complaint about tmpfs is the
inability to work without the page cache, I figured this might be useful
to others. It turns out it is -- at least Christian Brauner indicates
systemd uses ramfs for a few use-cases because they don't want to use
swap, and so having this option would let them move over to using tmpfs
for those small use cases, see systemd-creds(1).

To see if you hit swap:

mkswap /dev/nvme2n1
swapon /dev/nvme2n1
free -h

With swap - what we see today
=============================
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.2Gi       2.2Gi       1.2Gi
Swap:           99Gi       2.8Gi        97Gi


Without swap
=============

free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       387Mi       3.4Gi       2.1Mi        57Mi       3.3Gi
Swap:           99Gi          0B        99Gi
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
dd if=/dev/urandom of=/data-tmpfs/5g-rand2 bs=1G count=5
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       2.6Gi       1.2Gi       2.3Gi       2.3Gi       1.1Gi
Swap:           99Gi        21Mi        99Gi

The mix and match remount testing
=================================

# Cannot disable swap after it was first enabled:
mount -t tmpfs            -o size=5G           tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot disable swap on remount

# Remount with the same noswap option is OK:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G -o noswap tmpfs /data-tmpfs/
dmesg -c

# Trying to enable swap with a remount after it was first disabled:
mount -t tmpfs            -o size=5G -o noswap tmpfs /data-tmpfs/
mount -t tmpfs -o remount -o size=5G           tmpfs /data-tmpfs/
mount: /data-tmpfs: mount point not mounted or bad option.
       dmesg(1) may have more information after failed mount system call.
dmesg -c
tmpfs: Cannot enable swap on remount if it was disabled on first mount
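
For reference, a rough sketch of the remount rule the messages above
reflect. The helper name and the pr_warn() error reporting are
simplifications for illustration (the actual series reports errors
through the mount API), and sbinfo->noswap again stands in for the flag
recorded at first mount:

#include <linux/errno.h>
#include <linux/printk.h>
#include <linux/shmem_fs.h>

/*
 * Hypothetical helper, for illustration only: "noswap" is a first-mount
 * decision. A remount can neither turn it on later nor turn it off.
 */
static int shmem_remount_noswap_ok(struct shmem_sb_info *sbinfo,
				   bool remount_wants_noswap)
{
	if (remount_wants_noswap && !sbinfo->noswap) {
		pr_warn("tmpfs: Cannot disable swap on remount\n");
		return -EINVAL;
	}
	if (!remount_wants_noswap && sbinfo->noswap) {
		pr_warn("tmpfs: Cannot enable swap on remount if it was disabled on first mount\n");
		return -EINVAL;
	}
	return 0;
}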

[0] https://github.com/linux-kdevops/kdevops

Luis Chamberlain (6):
  shmem: remove check for folio lock on writepage()
  shmem: set shmem_writepage() variables early
  shmem: move reclaim check early on writepages()
  shmem: skip page split if we're not reclaiming
  shmem: update documentation
  shmem: add support to ignore swap

 Documentation/filesystems/tmpfs.rst  | 66 ++++++++++++++++++++++-----
 Documentation/mm/unevictable-lru.rst |  2 +
 include/linux/shmem_fs.h             |  1 +
 mm/shmem.c                           | 68 ++++++++++++++++++----------
 4 files changed, 103 insertions(+), 34 deletions(-)

Comments

Davidlohr Bueso March 14, 2023, 1:21 a.m. UTC | #1
On Thu, 09 Mar 2023, Luis Chamberlain wrote:

> [ full quote of the cover letter trimmed ]

Nice! For the whole series:

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
haoxin March 14, 2023, 2:46 a.m. UTC | #2
This whole series looks good to me, and I did some tests on my virtual
machine; it works well.

So please add Tested-by: Xin Hao <xhao@linux.alibaba.com>.

Just one question: if the tmpfs page cache occupies a large amount of
memory, how can we ensure successful memory reclamation in case of memory shortage?

On 2023/3/10 7:05 AM, Luis Chamberlain wrote:
> [ full quote of the cover letter trimmed ]
Luis Chamberlain March 19, 2023, 8:32 p.m. UTC | #3
On Tue, Mar 14, 2023 at 10:46:28AM +0800, haoxin wrote:
> All these series looks good to me and i do some test on my virtual machine
> it works well.
> 
> so please add Tested-by: Xin Hao <xhao@linux.alibaba.com> .
> 
> just one question, if tmpfs pagecache occupies a large amount of memory, how
> can we ensure successful memory reclamation in case of memory shortage?

If you're disabling swap then you know the only thing you can do to help
the VM is unmount; otherwise the pressure on the VM is just greater.

  Luis
haoxin March 20, 2023, 11:14 a.m. UTC | #4
On 2023/3/20 4:32 AM, Luis Chamberlain wrote:
> On Tue, Mar 14, 2023 at 10:46:28AM +0800, haoxin wrote:
>> All these series looks good to me and i do some test on my virtual machine
>> it works well.
>>
>> so please add Tested-by: Xin Hao<xhao@linux.alibaba.com>  .
>>
>> just one question, if tmpfs pagecache occupies a large amount of memory, how
>> can we ensure successful memory reclamation in case of memory shortage?
> If you're disabling swap then you know the only thing you can do is
> unmount if you want to help the VM, otherwise the pressure is just
> greater for the VM.

Um, what I mean is: can we add a priority so that this type of page
cache is reclaimed last?

Instead of just setting the noswap parameter to make it unreclaimable,
because if such page cache occupies a big part of memory and cannot be
reclaimed, it will cause an OOM.


>
>    Luis
Luis Chamberlain March 20, 2023, 9:36 p.m. UTC | #5
On Mon, Mar 20, 2023 at 07:14:22PM +0800, haoxin wrote:
> 
> On 2023/3/20 4:32 AM, Luis Chamberlain wrote:
> > On Tue, Mar 14, 2023 at 10:46:28AM +0800, haoxin wrote:
> > > All these series looks good to me and i do some test on my virtual machine
> > > it works well.
> > > 
> > > so please add Tested-by: Xin Hao<xhao@linux.alibaba.com>  .
> > > 
> > > just one question, if tmpfs pagecache occupies a large amount of memory, how
> > > can we ensure successful memory reclamation in case of memory shortage?
> > If you're disabling swap then you know the only thing you can do is
> > unmount if you want to help the VM, otherwise the pressure is just
> > greater for the VM.
> 
> Un, what i mean is can we add a priority so that this type of pagecache is
> reclaimed last ?

That seems to be a classifier request for something much less aggressive
than mapping_set_unevictable(). My patches *prior* to using
mapping_set_unevictable() are, I think, closer to what it seems you want,
but as noted before by folks, that also puts unnecessary stress on the VM
because they just fail reclaim in our writepage().

> Instead of just setting the parameter noswap to make it unreclaimed, because
> if such pagecache which occupy big part of memory which can not be
> reclaimed, it will cause OOM.

You can't simultaneously retain possession of a cake and eat it too;
once you eat it, it's gone, and noswap eats the cake because of the
suggestion / decision to follow through with mapping_set_unevictable().

It sounds like you want to make mapping_set_unevictable() optional and
deal with the possible stress incurred by writepage() failing? Not quite
sure what else to recommend here.

  Luis
haoxin March 21, 2023, 11:37 a.m. UTC | #6
On 2023/3/21 5:36 AM, Luis Chamberlain wrote:
> On Mon, Mar 20, 2023 at 07:14:22PM +0800, haoxin wrote:
>> On 2023/3/20 4:32 AM, Luis Chamberlain wrote:
>>> On Tue, Mar 14, 2023 at 10:46:28AM +0800, haoxin wrote:
>>>> All these series looks good to me and i do some test on my virtual machine
>>>> it works well.
>>>>
>>>> so please add Tested-by: Xin Hao<xhao@linux.alibaba.com>  .
>>>>
>>>> just one question, if tmpfs pagecache occupies a large amount of memory, how
>>>> can we ensure successful memory reclamation in case of memory shortage?
>>> If you're disabling swap then you know the only thing you can do is
>>> unmount if you want to help the VM, otherwise the pressure is just
>>> greater for the VM.
>> Un, what i mean is can we add a priority so that this type of pagecache is
>> reclaimed last ?
> That seems to be a classifier request for something much less aggressive
> than mapping_set_unevictable(). My patches *prior* to using mapping_set_unevictable()
> are I think closer to what it seems you want, but as noted before by
> folks, that also puts unecessary stress on the VM because just fail
> reclaim on our writepage().
>
>> Instead of just setting the parameter noswap to make it unreclaimed, because
>> if such pagecache which occupy big part of memory which can not be
>> reclaimed, it will cause OOM.
> You can't simultaneously retain possession of a cake and eat it, too,
> once you eat it, its gone and noswap eats the cake because of the
> suggestion / decision to follow through with mapping_set_unevictable().
>
> It sounds like you want to make mapping_set_unevictable() optional and
> deal with the possible stress incurred writepage() failing?
Yes. Just a personal idea; in any case, the current patch is an excellent
implementation, thank you very much.
>   Not quite
> sure what else to recommend here.
>
>    Luis
Hugh Dickins April 18, 2023, 4:31 a.m. UTC | #7
On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> I'm doing this work as part of future experimentation with tmpfs and the
> page cache, but given a common complaint found about tmpfs is the
> innability to work without the page cache I figured this might be useful
> to others. It turns out it is -- at least Christian Brauner indicates
> systemd uses ramfs for a few use-cases because they don't want to use
> swap and so having this option would let them move over to using tmpfs
> for those small use cases, see systemd-creds(1).

Thanks for your thorough work on tmpfs "noswap": seems well-received
by quite a few others, that's good.

I've just a few comments on later patches (I don't understand why you
went into those little rearrangements at the start of shmem_writepage(),
but they seem harmless so I don't object), but wanted to ask here:

You say "a common complaint about tmpfs is the inability to work without
the page cache".  Ehh?  I don't understand that at all, and have never
heard such a complaint.  It doesn't affect the series itself (oh, Andrew
has copied that text into the first patch), but please illuminate!

Thanks,
Hugh
Luis Chamberlain April 18, 2023, 8:55 p.m. UTC | #8
On Mon, Apr 17, 2023 at 09:31:20PM -0700, Hugh Dickins wrote:
> On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> 
> > I'm doing this work as part of future experimentation with tmpfs and the
> > page cache, but given a common complaint found about tmpfs is the
> > innability to work without the page cache I figured this might be useful
> > to others. It turns out it is -- at least Christian Brauner indicates
> > systemd uses ramfs for a few use-cases because they don't want to use
> > swap and so having this option would let them move over to using tmpfs
> > for those small use cases, see systemd-creds(1).
> 
> Thanks for your thorough work on tmpfs "noswap": seems well-received
> by quite a few others, that's good.
> 
> I've just a few comments on later patches (I don't understand why you
> went into those little rearrangements at the start of shmem_writepage(),
> but they seem harmless so I don't object),

Because the devil is in the details as you noted too!

> but wanted to ask here:
> 
> You say "a common complaint about tmpfs is the inability to work without
> the page cache".  Ehh?

That was a mistake! s/page cache/swap.

  Luis