[RFC,2/2] shmem: add support to ignore swap

Message ID 20230207025259.2522793-3-mcgrof@kernel.org (mailing list archive)
State New
Series tmpfs: add the option to disable swap

Commit Message

Luis Chamberlain Feb. 7, 2023, 2:52 a.m. UTC
In doing experimentation with shmem, having the option to avoid swap
becomes a useful mechanism. One of the *raves* about brd over shmem is
that you can avoid swap, but that's not really a good reason to use brd
if we can instead use shmem. Using brd has its own good reasons to
exist, but just because "tmpfs" doesn't let you avoid swap is not a
great reason to avoid tmpfs if we can easily add support for it.

I don't add support for reconfiguring incompatible options, but if we
really wanted to, we could add that later.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/shmem_fs.h |  1 +
 mm/shmem.c               | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Comments

Matthew Wilcox Feb. 7, 2023, 4:01 a.m. UTC | #1
On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	struct shmem_inode_info *info;
>  	struct address_space *mapping = folio->mapping;
>  	struct inode *inode = mapping->host;
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	swp_entry_t swap;
>  	pgoff_t index;
>  
>  	BUG_ON(!folio_test_locked(folio));
>  
> +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> +		return AOP_WRITEPAGE_ACTIVATE;

Not sure this is the best way to handle this.  We'll still incur the
overhead of tracking shmem pages on the LRU, only to fail to write them
out when the VM thinks we should get rid of them.  We'd be better off
not putting them on the LRU in the first place.
Luis Chamberlain Feb. 8, 2023, 4:01 p.m. UTC | #2
On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	struct shmem_inode_info *info;
> >  	struct address_space *mapping = folio->mapping;
> >  	struct inode *inode = mapping->host;
> > +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> >  	swp_entry_t swap;
> >  	pgoff_t index;
> >  
> >  	BUG_ON(!folio_test_locked(folio));
> >  
> > +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > +		return AOP_WRITEPAGE_ACTIVATE;
> 
> Not sure this is the best way to handle this.  We'll still incur the
> overhead of tracking shmem pages on the LRU, only to fail to write them
> out when the VM thinks we should get rid of them.  We'd be better off
> not putting them on the LRU in the first place.

Ah, makes sense. In effect then, if we do that, on reclaim we should
even be able to WARN_ON(sbinfo->noswap), assuming we did everything
right.

Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
too late; how about d_mark_dontcache() on shmem_get_inode() instead?

  Luis
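
A rough sketch of the d_mark_dontcache() idea above, assuming it were
hooked into shmem_get_inode() (hypothetical; not part of the posted
patch):

	/*
	 * Hypothetical sketch: have the dentries and the inode dropped
	 * from the caches as soon as the last reference is put, rather
	 * than waiting for reclaim to find the pages.
	 */
	if (sbinfo->noswap)
		d_mark_dontcache(inode);
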
Matthew Wilcox Feb. 8, 2023, 5:45 p.m. UTC | #3
On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > >  	struct shmem_inode_info *info;
> > >  	struct address_space *mapping = folio->mapping;
> > >  	struct inode *inode = mapping->host;
> > > +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > >  	swp_entry_t swap;
> > >  	pgoff_t index;
> > >  
> > >  	BUG_ON(!folio_test_locked(folio));
> > >  
> > > +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > +		return AOP_WRITEPAGE_ACTIVATE;
> > 
> > Not sure this is the best way to handle this.  We'll still incur the
> > overhead of tracking shmem pages on the LRU, only to fail to write them
> > out when the VM thinks we should get rid of them.  We'd be better off
> > not putting them on the LRU in the first place.
> 
> Ah, makes sense. In effect then, if we do that, on reclaim we should
> even be able to WARN_ON(sbinfo->noswap), assuming we did everything
> right.
> 
> Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
> too late; how about d_mark_dontcache() on shmem_get_inode() instead?

I was thinking that the two calls to folio_add_lru() in mm/shmem.c
should be conditional on sbinfo->noswap.
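
A minimal sketch of that suggestion at each of the two call sites
(hypothetical; the surrounding call-site context and locking are
assumed, not shown):

	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);

	/*
	 * Hypothetical sketch: keep noswap folios off the LRU entirely,
	 * so reclaim never considers them for writeout.
	 */
	if (!sbinfo->noswap)
		folio_add_lru(folio);
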
Yosry Ahmed Feb. 8, 2023, 8:33 p.m. UTC | #4
On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > >   struct shmem_inode_info *info;
> > > >   struct address_space *mapping = folio->mapping;
> > > >   struct inode *inode = mapping->host;
> > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > >   swp_entry_t swap;
> > > >   pgoff_t index;
> > > >
> > > >   BUG_ON(!folio_test_locked(folio));
> > > >
> > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > >
> > > Not sure this is the best way to handle this.  We'll still incur the
> > > overhead of tracking shmem pages on the LRU, only to fail to write them
> > > out when the VM thinks we should get rid of them.  We'd be better off
> > > not putting them on the LRU in the first place.
> >
> > Ah, makes sense. In effect then, if we do that, on reclaim we should
> > even be able to WARN_ON(sbinfo->noswap), assuming we did everything
> > right.
> >
> > Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
> > too late; how about d_mark_dontcache() on shmem_get_inode() instead?
>
> I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> should be conditional on sbinfo->noswap.
>

Wouldn't this cause the folio to not show up on any LRU lists, even
the unevictable one, which may be a strange discrepancy?

Perhaps we can do something like shmem_lock(), which calls
mapping_set_unevictable(), which will make folio_evictable() return
false, and the LRU code will take care of the rest?
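
A sketch of that alternative, assuming it were done once at inode
creation time in shmem_get_inode() (hypothetical):

	/*
	 * Hypothetical sketch: mark the whole mapping unevictable up
	 * front. folio_evictable() then returns false for these folios,
	 * and vmscan moves them to the unevictable LRU instead of trying
	 * to write them out.
	 */
	if (sbinfo->noswap)
		mapping_set_unevictable(inode->i_mapping);
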
Luis Chamberlain Feb. 23, 2023, 12:53 a.m. UTC | #5
On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > >   struct shmem_inode_info *info;
> > > > >   struct address_space *mapping = folio->mapping;
> > > > >   struct inode *inode = mapping->host;
> > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > >   swp_entry_t swap;
> > > > >   pgoff_t index;
> > > > >
> > > > >   BUG_ON(!folio_test_locked(folio));
> > > > >
> > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > >
> > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > overhead of tracking shmem pages on the LRU, only to fail to write them
> > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > not putting them on the LRU in the first place.
> > >
> > > Ah, makes sense. In effect then, if we do that, on reclaim we should
> > > even be able to WARN_ON(sbinfo->noswap), assuming we did everything
> > > right.
> > >
> > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
> > > too late; how about d_mark_dontcache() on shmem_get_inode() instead?
> >
> > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > should be conditional on sbinfo->noswap.
> >
> 
> Wouldn't this cause the folio to not show up on any LRU lists, even
> the unevictable one, which may be a strange discrepancy?
> 
> Perhaps we can do something like shmem_lock(), which calls
> mapping_set_unevictable(), which will make folio_evictable() return
> false, and the LRU code will take care of the rest?

If shmem_lock() should take care of that, is that because writepages()
should not happen, or because we have that info->flags & VM_LOCKED
stopgap in writepages()? If the former, shouldn't we WARN_ON_ONCE()
if writepages() is called with info->flags & VM_LOCKED?

While I see the value in mapping_set_unevictable(), I am not sure I see
the point in using shmem_lock(). I don't see why we should constrain
the noswap tmpfs option to RLIMIT_MEMLOCK.

Please correct me if I'm wrong, but that limit seems to be designed for
files / IPC / unprivileged perf limits. On the contrary, we'd bump the
count for each new inode. Using shmem_lock() would also complicate
inode allocation on shmem, as we'd have to unwind on failure from
user_shm_lock(). It would also beg the question of when to capture a
ucount for an inode: should we just share one for the superblock at
shmem_fill_super(), or do we really need to capture it at every single
inode creation? In theory we could end up with different limits.

So why not just use mapping_set_unevictable() alone for this use case?

  Luis
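
A sketch of what those assertions might look like in shmem_writepage(),
reusing its existing redirty path (hypothetical):

	/*
	 * Hypothetical sketch: if locked/noswap folios are kept off the
	 * LRU, reclaim should never hand them to writepage; warn and
	 * redirty if it somehow does.
	 */
	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
		goto redirty;
	if (WARN_ON_ONCE(sbinfo->noswap))
		goto redirty;
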
Yosry Ahmed Feb. 23, 2023, 1:04 a.m. UTC | #6
On Wed, Feb 22, 2023 at 4:53 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> > On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > > >   struct shmem_inode_info *info;
> > > > > >   struct address_space *mapping = folio->mapping;
> > > > > >   struct inode *inode = mapping->host;
> > > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > > >   swp_entry_t swap;
> > > > > >   pgoff_t index;
> > > > > >
> > > > > >   BUG_ON(!folio_test_locked(folio));
> > > > > >
> > > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > > >
> > > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > > overhead of tracking shmem pages on the LRU, only to fail to write them
> > > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > > not putting them on the LRU in the first place.
> > > >
> > > > Ah, makes sense. In effect then, if we do that, on reclaim we should
> > > > even be able to WARN_ON(sbinfo->noswap), assuming we did everything
> > > > right.
> > > >
> > > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
> > > > too late; how about d_mark_dontcache() on shmem_get_inode() instead?
> > >
> > > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > > should be conditional on sbinfo->noswap.
> > >
> >
> > Wouldn't this cause the folio to not show up on any LRU lists, even
> > the unevictable one, which may be a strange discrepancy?
> >
> > Perhaps we can do something like shmem_lock(), which calls
> > mapping_set_unevictable(), which will make folio_evictable() return
> > false, and the LRU code will take care of the rest?
>
> If shmem_lock() should take care of that, is that because writepages()
> should not happen, or because we have that info->flags & VM_LOCKED
> stopgap in writepages()? If the former, shouldn't we WARN_ON_ONCE()
> if writepages() is called with info->flags & VM_LOCKED?
>
> While I see the value in mapping_set_unevictable(), I am not sure I see
> the point in using shmem_lock(). I don't see why we should constrain
> the noswap tmpfs option to RLIMIT_MEMLOCK.
>
> Please correct me if I'm wrong, but that limit seems to be designed for
> files / IPC / unprivileged perf limits. On the contrary, we'd bump the
> count for each new inode. Using shmem_lock() would also complicate
> inode allocation on shmem, as we'd have to unwind on failure from
> user_shm_lock(). It would also beg the question of when to capture a
> ucount for an inode: should we just share one for the superblock at
> shmem_fill_super(), or do we really need to capture it at every single
> inode creation? In theory we could end up with different limits.
>
> So why not just use mapping_set_unevictable() alone for this use case?

Sorry if I wasn't clear: I did NOT mean that we should use
shmem_lock(), I meant that we should do something similar to what
shmem_lock() does and use mapping_set_unevictable() or similar.

I think we just need to make sure that using
mapping_set_unevictable() does not imply that shmem_lock() was used
(i.e. no code assumes that if the shmem mapping is unevictable then
shmem_lock() was used).

Anyway, I am not very knowledgeable here, so take anything I say with
a grain of salt.
Thanks.

>
>   Luis
Luis Chamberlain Feb. 23, 2023, 1:35 a.m. UTC | #7
On Wed, Feb 22, 2023 at 05:04:32PM -0800, Yosry Ahmed wrote:
> On Wed, Feb 22, 2023 at 4:53 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> > > On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > > > >   struct shmem_inode_info *info;
> > > > > > >   struct address_space *mapping = folio->mapping;
> > > > > > >   struct inode *inode = mapping->host;
> > > > > > > + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > > > >   swp_entry_t swap;
> > > > > > >   pgoff_t index;
> > > > > > >
> > > > > > >   BUG_ON(!folio_test_locked(folio));
> > > > > > >
> > > > > > > + if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > > > +         return AOP_WRITEPAGE_ACTIVATE;
> > > > > >
> > > > > > Not sure this is the best way to handle this.  We'll still incur the
> > > > > > overhead of tracking shmem pages on the LRU, only to fail to write them
> > > > > > out when the VM thinks we should get rid of them.  We'd be better off
> > > > > > not putting them on the LRU in the first place.
> > > > >
> > > > > Ah, makes sense. In effect then, if we do that, on reclaim we should
> > > > > even be able to WARN_ON(sbinfo->noswap), assuming we did everything
> > > > > right.
> > > > >
> > > > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1), but that seems a bit
> > > > > too late; how about d_mark_dontcache() on shmem_get_inode() instead?
> > > >
> > > > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > > > should be conditional on sbinfo->noswap.
> > > >
> > >
> > > Wouldn't this cause the folio to not show up on any LRU lists, even
> > > the unevictable one, which may be a strange discrepancy?
> > >
> > > Perhaps we can do something like shmem_lock(), which calls
> > > mapping_set_unevictable(), which will make folio_evictable() return
> > > false, and the LRU code will take care of the rest?
> >
> > If shmem_lock() should take care of that, is that because writepages()
> > should not happen, or because we have that info->flags & VM_LOCKED
> > stopgap in writepages()? If the former, shouldn't we WARN_ON_ONCE()
> > if writepages() is called with info->flags & VM_LOCKED?
> >
> > While I see the value in mapping_set_unevictable(), I am not sure I see
> > the point in using shmem_lock(). I don't see why we should constrain
> > the noswap tmpfs option to RLIMIT_MEMLOCK.
> >
> > Please correct me if I'm wrong, but that limit seems to be designed for
> > files / IPC / unprivileged perf limits. On the contrary, we'd bump the
> > count for each new inode. Using shmem_lock() would also complicate
> > inode allocation on shmem, as we'd have to unwind on failure from
> > user_shm_lock(). It would also beg the question of when to capture a
> > ucount for an inode: should we just share one for the superblock at
> > shmem_fill_super(), or do we really need to capture it at every single
> > inode creation? In theory we could end up with different limits.
> >
> > So why not just use mapping_set_unevictable() alone for this use case?
> 
> Sorry if I wasn't clear: I did NOT mean that we should use
> shmem_lock(), I meant that we should do something similar to what
> shmem_lock() does and use mapping_set_unevictable() or similar.

Ah OK! Sure, yeah, I reviewed shmem_lock() usage and I don't think it
and its rlimit baggage make sense here, so the only thing to do is
just mapping_set_unevictable().

> I think we just need to make sure that using
> mapping_set_unevictable() does not imply that shmem_lock() was used
> (i.e. no code assumes that if the shmem mapping is unevictable then
> shmem_lock() was used).

The *other* stuff that shmem_lock() does is rlimit accounting related
to RLIMIT_MEMLOCK. I can't think offhand why we'd confuse the two use
cases at the moment, but I'll give it another good look with this in
mind.

I'll test what I have and post a v2 with the feedback received.

Thanks,

  Luis

Patch

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d09d54be4ffd..98a7d53f6cc5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -45,6 +45,7 @@  struct shmem_sb_info {
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
 	bool full_inums;	    /* If i_ino should be uint or ino_t */
+	bool noswap;	    	    /* ignores VM reclaim / swap requests */
 	ino_t next_ino;		    /* The next per-sb inode number to use */
 	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
 	struct mempolicy *mpol;     /* default memory policy for mappings */
diff --git a/mm/shmem.c b/mm/shmem.c
index a2c6aa11aab8..92aa927cf569 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -116,10 +116,12 @@  struct shmem_options {
 	bool full_inums;
 	int huge;
 	int seen;
+	bool noswap;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
 #define SHMEM_SEEN_INUMS 8
+#define SHMEM_SEEN_NOSWAP 16
 };
 
 #ifdef CONFIG_TMPFS
@@ -1334,11 +1336,15 @@  static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct shmem_inode_info *info;
 	struct address_space *mapping = folio->mapping;
 	struct inode *inode = mapping->host;
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	swp_entry_t swap;
 	pgoff_t index;
 
 	BUG_ON(!folio_test_locked(folio));
 
+	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
+		return AOP_WRITEPAGE_ACTIVATE;
+
 	/*
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
@@ -3465,6 +3471,7 @@  enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_noswap,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3486,6 +3493,7 @@  const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_flag  ("noswap",	Opt_noswap),
 	{}
 };
 
@@ -3569,6 +3577,10 @@  static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_noswap:
+		ctx->noswap = true;
+		ctx->seen |= SHMEM_SEEN_NOSWAP;
+		break;
 	}
 	return 0;
 
@@ -3667,6 +3679,14 @@  static int shmem_reconfigure(struct fs_context *fc)
 		err = "Current inum too high to switch to 32-bit inums";
 		goto out;
 	}
+	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
+		err = "Cannot disable swap on remount";
+		goto out;
+	}
+	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
+		err = "Cannot enable swap on remount if it was disabled on first mount";
+		goto out;
+	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
 		sbinfo->huge = ctx->huge;
@@ -3687,6 +3707,10 @@  static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
 		ctx->mpol = NULL;
 	}
+
+	if (ctx->noswap)
+		sbinfo->noswap = true;
+
 	raw_spin_unlock(&sbinfo->stat_lock);
 	mpol_put(mpol);
 	return 0;
@@ -3784,6 +3808,7 @@  static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 			ctx->inodes = shmem_default_max_inodes();
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
+		sbinfo->noswap = ctx->noswap;
 	} else {
 		sb->s_flags |= SB_NOUSER;
 	}
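
For reference, a usage sketch of the option this patch adds (mount
point is hypothetical; the remount behavior follows the
shmem_reconfigure() checks above):

    # create a tmpfs instance whose pages are never swapped out
    mount -t tmpfs -o noswap tmpfs /mnt/tmp

    # swap cannot be re-enabled later; a remount must keep the noswap
    # flag or shmem_reconfigure() rejects it
    mount -o remount,noswap /mnt/tmp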