[09/28] mm: directed shrinker work deferral

Message ID	20191031234618.15403-10-david@fromorbit.com (mailing list archive)
State	New, archived
Headers	Return-Path: <SRS0=3TFX=YY=vger.kernel.org=linux-fsdevel-owner@kernel.org>
	From: Dave Chinner <david@fromorbit.com>
	To: linux-xfs@vger.kernel.org
	Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
	Subject: [PATCH 09/28] mm: directed shrinker work deferral
	Date: Fri, 1 Nov 2019 10:45:59 +1100
	Message-Id: <20191031234618.15403-10-david@fromorbit.com>
	In-Reply-To: <20191031234618.15403-1-david@fromorbit.com>
	References: <20191031234618.15403-1-david@fromorbit.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Sender: linux-fsdevel-owner@vger.kernel.org
	Precedence: bulk
Series	mm, xfs: non-blocking inode reclaim

Dave Chinner Oct. 31, 2019, 11:45 p.m. UTC

From: Dave Chinner <dchinner@redhat.com>

Introduce a mechanism for ->count_objects() to indicate to the
shrinker infrastructure that the reclaim context will not allow
scanning work to be done and so the work it decides is necessary
needs to be deferred.

This simplifies the code by separating out the accounting of
deferred work from the actual doing of the work, and allows better
decisions to be made by the shrinker control logic on what action it
can take.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/shrinker.h | 7 +++++++
 mm/vmscan.c              | 8 ++++++++
 2 files changed, 15 insertions(+)
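
A minimal sketch of the mechanism the commit message describes, written as
standalone C rather than kernel code. Only the defer_work field comes from
the patch; the sketch_ types, the SKETCH_GFP_FS value and the cache structure
are assumptions made for illustration. The count callback always reports how
much work exists and separately flags that this context cannot do it.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __GFP_FS bit; the value is illustrative only. */
#define SKETCH_GFP_FS 0x80u

/* Cut-down model of struct shrink_control with the new field. */
struct sketch_shrink_control {
	unsigned int gfp_mask;
	bool defer_work;	/* set by the count callback */
};

/* Hypothetical cache state for one shrinker instance. */
struct sketch_cache {
	unsigned long nr_freeable;
};

/*
 * Model of a ->count_objects() callback: report the freeable count in
 * every context, and flag the work for deferral when the context may
 * not recurse into the filesystem.
 */
static unsigned long sketch_count_objects(const struct sketch_cache *cache,
					  struct sketch_shrink_control *sc)
{
	if (!(sc->gfp_mask & SKETCH_GFP_FS))
		sc->defer_work = true;
	return cache->nr_freeable;
}
```

Keeping the count honest even when the work is deferred is what lets the
infrastructure account for the deferred work instead of losing it.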

Brian Foster Nov. 4, 2019, 3:25 p.m. UTC | #1

On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Introduce a mechanism for ->count_objects() to indicate to the
> shrinker infrastructure that the reclaim context will not allow
> scanning work to be done and so the work it decides is necessary
> needs to be deferred.
> 
> This simplifies the code by separating out the accounting of
> deferred work from the actual doing of the work, and allows better
> decisions to be made by the shrinker control logic on what action it
> can take.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

My understanding from the previous discussion(s) is that this is not
tied directly to the gfp mask because that is not the only intended use.
While it is currently a boolean tied to the entire shrinker call,
the longer term objective is per-object granularity.

I find the argument reasonable enough, but if the above is true, why do
we move these checks from ->scan_objects() to ->count_objects() (in the
next patch) when per-object decisions will ultimately need to be made by
the former? That seems like unnecessary churn and inconsistent with the
argument against just temporarily doing something like what Christoph
suggested in the previous version, particularly since IIRC the only use
in this series was for gfp mask purposes.

>  include/linux/shrinker.h | 7 +++++++
>  mm/vmscan.c              | 8 ++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 0f80123650e2..3405c39ab92c 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -31,6 +31,13 @@ struct shrink_control {
>  
>  	/* current memcg being shrunk (for memcg aware shrinkers) */
>  	struct mem_cgroup *memcg;
> +
> +	/*
> +	 * set by ->count_objects if reclaim context prevents reclaim from
> +	 * occurring. This allows the shrinker to immediately defer all the
> +	 * work and not even attempt to scan the cache.
> +	 */
> +	bool defer_work;
>  };
>  
>  #define SHRINK_STOP (~0UL)
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ee4eecc7e1c2..a215d71d9d4b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -536,6 +536,13 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
>  				   freeable, delta, total_scan, priority);
>  
> +	/*
> +	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
> +	 * defer the work to a context that can scan the cache.
> +	 */
> +	if (shrinkctl->defer_work)
> +		goto done;
> +

I still find the fact that this per-shrinker invocation field is never
reset unnecessarily fragile, and I don't see any good reason not to
reset it prior to the shrinker callback that potentially sets it.

Brian

>  	/*
>  	 * Normally, we should not scan less than batch_size objects in one
>  	 * pass to avoid too frequent shrinker calls, but if the slab has less
> @@ -570,6 +577,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		cond_resched();
>  	}
>  
> +done:
>  	if (next_deferred >= scanned)
>  		next_deferred -= scanned;
>  	else
> -- 
> 2.24.0.rc0
>
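
The reset Brian asks for could look roughly like the following standalone
sketch. It is not the kernel code: the sketch_ names and the two example
callbacks are hypothetical, and it only illustrates why clearing the flag
before the callback removes the dependence on how the caller initialised or
reused the control structure.

```c
#include <stdbool.h>

/* Cut-down model of struct shrink_control. */
struct sketch_shrink_control {
	unsigned int gfp_mask;
	bool defer_work;
};

typedef unsigned long (*sketch_count_fn)(struct sketch_shrink_control *sc);

/*
 * Clear the flag before each count call, so a value left behind by an
 * earlier invocation that used the same control structure cannot leak
 * into this one.
 */
static unsigned long sketch_count_with_reset(sketch_count_fn count,
					     struct sketch_shrink_control *sc)
{
	sc->defer_work = false;
	return count(sc);
}

/* Hypothetical shrinker that always defers its work. */
static unsigned long sketch_count_defers(struct sketch_shrink_control *sc)
{
	sc->defer_work = true;
	return 64;
}

/* Hypothetical shrinker that never touches the flag. */
static unsigned long sketch_count_plain(struct sketch_shrink_control *sc)
{
	(void)sc;
	return 32;
}
```

Calling sketch_count_with_reset() with the deferring callback and then with
the plain one on the same structure leaves defer_work false after the second
call, which is the property the review comment is after.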

Dave Chinner Nov. 14, 2019, 8:49 p.m. UTC | #2

On Mon, Nov 04, 2019 at 10:25:25AM -0500, Brian Foster wrote:
> On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Introduce a mechanism for ->count_objects() to indicate to the
> > shrinker infrastructure that the reclaim context will not allow
> > scanning work to be done and so the work it decides is necessary
> > needs to be deferred.
> > 
> > This simplifies the code by separating out the accounting of
> > deferred work from the actual doing of the work, and allows better
> > decisions to be made by the shrinker control logic on what action it
> > can take.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> My understanding from the previous discussion(s) is that this is not
> tied directly to the gfp mask because that is not the only intended use.
> While it is currently a boolean tied to the entire shrinker call,
> the longer term objective is per-object granularity.

Longer term, yes, but right now such things are not possible as the
shrinker needs more context to be able to make sane per-object
decisions. shrinker policy decisions that affect the entire run
scope should be handled by the ->count operation - it's the one that
says whether the scan loop should run or not, and right now GFP_NOFS
for all filesystem shrinkers is a pure boolean policy
implementation.

The next future step is to provide a superblock context with
GFP_NOFS to indicate which filesystem we cannot recurse into. That
is also a shrinker instance wide check, so again it's something that
->count should be deciding.

i.e. ->count determines what is to be done, ->scan iterates the work
that has to be done until we are done.
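
The split described above can be modelled in a few lines of standalone C.
This is a simplified sketch, not do_shrink_slab() itself: the sketch_ names
are hypothetical, the scan callback is replaced by plain arithmetic, and only
the defer_work test and the next_deferred accounting after the done: label
mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one modelled shrinker pass. */
struct sketch_result {
	unsigned long scanned;		/* objects the scan loop processed */
	unsigned long next_deferred;	/* work carried to a later pass */
};

/*
 * total_scan is the work the count phase decided on. If defer_work is
 * set, the scan loop is skipped and all of that work is carried forward.
 */
static struct sketch_result sketch_do_shrink(unsigned long total_scan,
					     unsigned long batch_size,
					     bool defer_work)
{
	struct sketch_result r = { .scanned = 0, .next_deferred = total_scan };

	assert(batch_size > 0);

	if (defer_work)
		goto done;

	while (total_scan >= batch_size) {
		/* A real shrinker would call ->scan_objects() here. */
		r.scanned += batch_size;
		total_scan -= batch_size;
	}

done:
	if (r.next_deferred >= r.scanned)
		r.next_deferred -= r.scanned;
	else
		r.next_deferred = 0;
	return r;
}
```

With total_scan = 1000 and batch_size = 128, the deferred case scans nothing
and carries 1000 forward; the normal case runs 7 full batches (896 objects)
and carries the remaining 104.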

> I find the argument reasonable enough, but if the above is true, why do
> we move these checks from ->scan_objects() to ->count_objects() (in the
> next patch) when per-object decisions will ultimately need to be made by
> the former?

Because run/no-run policy belongs in one place, and things like
GFP_NOFS do not change across calls to the ->scan loop. i.e. after
the first ->scan call in a loop that calls it hundreds to thousands
of times, the GFP_NOFS run/no-run check is completely redundant.

Once we introduce a new policy that allows the fs shrinker to do
careful reclaim in GFP_NOFS conditions, we need to do a substantial
rework of the shrinker scan loop and how it accounts the work that is
done - we now have at least 3 or 4 different return counters
(skipped because locked, skipped because referenced,
reclaimed, deferred reclaim because couldn't lock/recursion) and
the accounting and decisions to be made are a lot more complex.

In that case, the ->count function will drop the GFP_NOFS check, but
still do all the other things it needs to do. The GFP_NOFS check
will go deep in the guts of the shrinker scan implementation where
the per-object recursion problem exists. But for most shrinkers,
it's still going to be a global boolean check...
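
To make the "3 or 4 different return counters" concrete, here is a purely
hypothetical sketch of what such a per-pass result could look like. Nothing
like this structure appears in the series; the names are invented from the
four outcomes listed above.

```c
/* Hypothetical per-pass outcome counters, one per case named above. */
struct sketch_scan_counts {
	unsigned long skipped_locked;		/* could not take the lock */
	unsigned long skipped_referenced;	/* object recently referenced */
	unsigned long reclaimed;		/* object actually freed */
	unsigned long deferred;			/* left for a later context */
};

/* Every object visited in a pass ends up in exactly one counter. */
static unsigned long sketch_objects_visited(const struct sketch_scan_counts *c)
{
	return c->skipped_locked + c->skipped_referenced +
	       c->reclaimed + c->deferred;
}
```

The point of the sketch is only that a single "objects freed" return value
cannot distinguish these cases, which is why the accounting gets more
complex.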

> That seems like unnecessary churn and inconsistent with the
> argument against just temporarily doing something like what Christoph
> suggested in the previous version, particularly since IIRC the only use
> in this series was for gfp mask purposes.

If people want to call avoiding repeated, unnecessary evaluation of
the same condition hundreds of times instead of once "unnecessary
churn", then I'll drop it.

> >  include/linux/shrinker.h | 7 +++++++
> >  mm/vmscan.c              | 8 ++++++++
> >  2 files changed, 15 insertions(+)
> > 
> > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> > index 0f80123650e2..3405c39ab92c 100644
> > --- a/include/linux/shrinker.h
> > +++ b/include/linux/shrinker.h
> > @@ -31,6 +31,13 @@ struct shrink_control {
> >  
> >  	/* current memcg being shrunk (for memcg aware shrinkers) */
> >  	struct mem_cgroup *memcg;
> > +
> > +	/*
> > +	 * set by ->count_objects if reclaim context prevents reclaim from
> > +	 * occurring. This allows the shrinker to immediately defer all the
> > +	 * work and not even attempt to scan the cache.
> > +	 */
> > +	bool defer_work;
> >  };
> >  
> >  #define SHRINK_STOP (~0UL)
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ee4eecc7e1c2..a215d71d9d4b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -536,6 +536,13 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
> >  				   freeable, delta, total_scan, priority);
> >  
> > +	/*
> > +	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
> > +	 * defer the work to a context that can scan the cache.
> > +	 */
> > +	if (shrinkctl->defer_work)
> > +		goto done;
> > +
> 
> I still find the fact that this per-shrinker invocation field is never
> reset unnecessarily fragile, and I don't see any good reason not to
> reset it prior to the shrinker callback that potentially sets it.

I missed that when updating. I'll reset it in the next version.

-Dave.

Brian Foster Nov. 15, 2019, 5:21 p.m. UTC | #3

On Fri, Nov 15, 2019 at 07:49:26AM +1100, Dave Chinner wrote:
> On Mon, Nov 04, 2019 at 10:25:25AM -0500, Brian Foster wrote:
> > On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Introduce a mechanism for ->count_objects() to indicate to the
> > > shrinker infrastructure that the reclaim context will not allow
> > > scanning work to be done and so the work it decides is necessary
> > > needs to be deferred.
> > > 
> > > This simplifies the code by separating out the accounting of
> > > deferred work from the actual doing of the work, and allows better
> > > decisions to be made by the shrinker control logic on what action it
> > > can take.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > My understanding from the previous discussion(s) is that this is not
> > tied directly to the gfp mask because that is not the only intended use.
> > While it is currently a boolean tied to the entire shrinker call,
> > the longer term objective is per-object granularity.
> 
> Longer term, yes, but right now such things are not possible as the
> shrinker needs more context to be able to make sane per-object
> decisions. shrinker policy decisions that affect the entire run
> scope should be handled by the ->count operation - it's the one that
> says whether the scan loop should run or not, and right now GFP_NOFS
> for all filesystem shrinkers is a pure boolean policy
> implementation.
> 
> The next future step is to provide a superblock context with
> GFP_NOFS to indicate which filesystem we cannot recurse into. That
> is also a shrinker instance wide check, so again it's something that
> ->count should be deciding.
> 
> i.e. ->count determines what is to be done, ->scan iterates the work
> that has to be done until we are done.
> 

Sure, makes sense in general.

> > I find the argument reasonable enough, but if the above is true, why do
> > we move these checks from ->scan_objects() to ->count_objects() (in the
> > next patch) when per-object decisions will ultimately need to be made by
> > the former?
> 
> Because run/no-run policy belongs in one place, and things like
> GFP_NOFS do not change across calls to the ->scan loop. i.e. after
> the first ->scan call in a loop that calls it hundreds to thousands
> of times, the GFP_NOFS run/no-run check is completely redundant.
> 

What loop is currently called hundreds to thousands of times that this
change prevents? AFAICT the current nofs checks in the ->scan calls
explicitly terminate the scan loop. So we're effectively saving a
function call by doing this earlier in the ->count call. (Nothing wrong
with that, I'm just not following the numbers used in this reasoning..).

> Once we introduce a new policy that allows the fs shrinker to do
> careful reclaim in GFP_NOFS conditions, we need to do a substantial
> rework of the shrinker scan loop and how it accounts the work that is
> done - we now have at least 3 or 4 different return counters
> (skipped because locked, skipped because referenced,
> reclaimed, deferred reclaim because couldn't lock/recursion) and
> the accounting and decisions to be made are a lot more complex.
> 

Yeah, that's generally what I expected from your previous description.

> In that case, the ->count function will drop the GFP_NOFS check, but
> still do all the other things it needs to do. The GFP_NOFS check
> will go deep in the guts of the shrinker scan implementation where
> the per-object recursion problem exists. But for most shrinkers,
> it's still going to be a global boolean check...
> 

So once the nofs checks are lifted out of the ->count callback and into
the core shrinker, is there still a use case to defer an entire ->count
instance from the callback?

> > That seems like unnecessary churn and inconsistent with the
> > argument against just temporarily doing something like what Christoph
> > suggested in the previous version, particularly since IIRC the only use
> > in this series was for gfp mask purposes.
> 
> If people want to call avoiding repeated, unnecessary evaluation of
> the same condition hundreds of times instead of once "unnecessary
> churn", then I'll drop it.
> 

I'm not referring to the functional change as churn. What I was
referring to is that we're shuffling around the boilerplate gfp checking
code between the different shrinker callbacks, knowing that it's
eventually going to be lifted out, when we could potentially just lift
that code up a level now.

Brian

> > >  include/linux/shrinker.h | 7 +++++++
> > >  mm/vmscan.c              | 8 ++++++++
> > >  2 files changed, 15 insertions(+)
> > > 
> > > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> > > index 0f80123650e2..3405c39ab92c 100644
> > > --- a/include/linux/shrinker.h
> > > +++ b/include/linux/shrinker.h
> > > @@ -31,6 +31,13 @@ struct shrink_control {
> > >  
> > >  	/* current memcg being shrunk (for memcg aware shrinkers) */
> > >  	struct mem_cgroup *memcg;
> > > +
> > > +	/*
> > > +	 * set by ->count_objects if reclaim context prevents reclaim from
> > > +	 * occurring. This allows the shrinker to immediately defer all the
> > > +	 * work and not even attempt to scan the cache.
> > > +	 */
> > > +	bool defer_work;
> > >  };
> > >  
> > >  #define SHRINK_STOP (~0UL)
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index ee4eecc7e1c2..a215d71d9d4b 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -536,6 +536,13 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> > >  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
> > >  				   freeable, delta, total_scan, priority);
> > >  
> > > +	/*
> > > +	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
> > > +	 * defer the work to a context that can scan the cache.
> > > +	 */
> > > +	if (shrinkctl->defer_work)
> > > +		goto done;
> > > +
> > 
> > I still find the fact that this per-shrinker invocation field is never
> > reset unnecessarily fragile, and I don't see any good reason not to
> > reset it prior to the shrinker callback that potentially sets it.
> 
> I missed that when updating. I'll reset it in the next version.
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Dave Chinner Nov. 18, 2019, 12:49 a.m. UTC | #4

On Fri, Nov 15, 2019 at 12:21:40PM -0500, Brian Foster wrote:
> On Fri, Nov 15, 2019 at 07:49:26AM +1100, Dave Chinner wrote:
> > On Mon, Nov 04, 2019 at 10:25:25AM -0500, Brian Foster wrote:
> > > On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Introduce a mechanism for ->count_objects() to indicate to the
> > > > shrinker infrastructure that the reclaim context will not allow
> > > > scanning work to be done and so the work it decides is necessary
> > > > needs to be deferred.
> > > > 
> > > > This simplifies the code by separating out the accounting of
> > > > deferred work from the actual doing of the work, and allows better
> > > > decisions to be made by the shrinker control logic on what action it
> > > > can take.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > 
> > > My understanding from the previous discussion(s) is that this is not
> > > tied directly to the gfp mask because that is not the only intended use.
> > > While it is currently a boolean tied to the entire shrinker call,
> > > the longer term objective is per-object granularity.
> > 
> > Longer term, yes, but right now such things are not possible as the
> > shrinker needs more context to be able to make sane per-object
> > decisions. shrinker policy decisions that affect the entire run
> > scope should be handled by the ->count operation - it's the one that
> > says whether the scan loop should run or not, and right now GFP_NOFS
> > for all filesystem shrinkers is a pure boolean policy
> > implementation.
> > 
> > The next future step is to provide a superblock context with
> > GFP_NOFS to indicate which filesystem we cannot recurse into. That
> > is also a shrinker instance wide check, so again it's something that
> > ->count should be deciding.
> > 
> > i.e. ->count determines what is to be done, ->scan iterates the work
> > that has to be done until we are done.
> > 
> 
> Sure, makes sense in general.
> 
> > > I find the argument reasonable enough, but if the above is true, why do
> > > we move these checks from ->scan_objects() to ->count_objects() (in the
> > > next patch) when per-object decisions will ultimately need to be made by
> > > the former?
> > 
> > Because run/no-run policy belongs in one place, and things like
> > GFP_NOFS do not change across calls to the ->scan loop. i.e. after
> > the first ->scan call in a loop that calls it hundreds to thousands
> > of times, the GFP_NOFS run/no-run check is completely redundant.
> > 
> 
> What loop is currently called hundreds to thousands of times that this
> change prevents? AFAICT the current nofs checks in the ->scan calls
> explicitly terminate the scan loop.

Right, but when we are in GFP_KERNEL context, every call to ->scan()
checks it and says "ok". If we are scanning tens of thousands of
objects in a scan, and we are using a default batch size of 128
objects per scan, then we have hundreds of calls in a single scan
loop that check the GFP context and say "ok"....
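
The arithmetic behind "hundreds of calls" can be checked with a small helper.
This is a sketch under stated assumptions: the function name is hypothetical,
it counts only full batches, and the figure of 50,000 objects below is just
one example of "tens of thousands".

```c
/*
 * Number of full-batch ->scan calls needed to cover nr_to_scan objects.
 * Each of those calls would repeat the same GFP context check.
 */
static unsigned long sketch_scan_calls(unsigned long nr_to_scan,
				       unsigned long batch_size)
{
	if (batch_size == 0)
		return 0;
	return nr_to_scan / batch_size;
}
```

For 50,000 objects at a batch size of 128, sketch_scan_calls() returns 390,
so the check runs 390 times in ->scan versus once in ->count.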

> So we're effectively saving a
> function call by doing this earlier in the ->count call. (Nothing wrong
> with that, I'm just not following the numbers used in this reasoning..).

It's the don't terminate case. :)

> > Once we introduce a new policy that allows the fs shrinker to do
> > careful reclaim in GFP_NOFS conditions, we need to do a substantial
> > rework of the shrinker scan loop and how it accounts the work that is
> > done - we now have at least 3 or 4 different return counters
> > (skipped because locked, skipped because referenced,
> > reclaimed, deferred reclaim because couldn't lock/recursion) and
> > the accounting and decisions to be made are a lot more complex.
> > 
> 
> Yeah, that's generally what I expected from your previous description.
> 
> > In that case, the ->count function will drop the GFP_NOFS check, but
> > still do all the other things it needs to do. The GFP_NOFS check
> > will go deep in the guts of the shrinker scan implementation where
> > the per-object recursion problem exists. But for most shrinkers,
> > it's still going to be a global boolean check...
> > 
> 
> So once the nofs checks are lifted out of the ->count callback and into
> the core shrinker, is there still a use case to defer an entire ->count
> instance from the callback?

Not right now. There may be in future, but I don't want to make
things more complex than they need to be by trying to support
functionality that isn't used.

> > If people want to call avoiding repeated, unnecessary evaluation of
> > the same condition hundreds of times instead of once "unnecessary
> > churn", then I'll drop it.
> > 
> 
> I'm not referring to the functional change as churn. What I was
> referring to is that we're shuffling around the boilerplate gfp checking
> code between the different shrinker callbacks, knowing that it's
> eventually going to be lifted out, when we could potentially just lift
> that code up a level now.

I don't think that lifting it up will save much code at all, once we
add all the gfp mask initialisation to all the shrinkers, etc. It
just means we can't look at the shrinker implementation and know
that it can't run in GFP_NOFS context - we have to go look up
where it is instantiated instead to see if there are gfp context
constraints.

I think it's better where it is, documenting the constraints the
shrinker implementation runs under in the implementation itself...

Cheers,

Dave.

Brian Foster Nov. 19, 2019, 3:12 p.m. UTC | #5

On Mon, Nov 18, 2019 at 11:49:56AM +1100, Dave Chinner wrote:
> On Fri, Nov 15, 2019 at 12:21:40PM -0500, Brian Foster wrote:
> > On Fri, Nov 15, 2019 at 07:49:26AM +1100, Dave Chinner wrote:
> > > On Mon, Nov 04, 2019 at 10:25:25AM -0500, Brian Foster wrote:
> > > > On Fri, Nov 01, 2019 at 10:45:59AM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > Introduce a mechanism for ->count_objects() to indicate to the
> > > > > shrinker infrastructure that the reclaim context will not allow
> > > > > scanning work to be done and so the work it decides is necessary
> > > > > needs to be deferred.
> > > > > 
> > > > > This simplifies the code by separating out the accounting of
> > > > > deferred work from the actual doing of the work, and allows better
> > > > > decisions to be made by the shrinker control logic on what action it
> > > > > can take.
> > > > > 
> > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > > ---
> > > > 
> > > > My understanding from the previous discussion(s) is that this is not
> > > > tied directly to the gfp mask because that is not the only intended use.
> > > > While it is currently a boolean tied to the entire shrinker call,
> > > > the longer term objective is per-object granularity.
> > > 
> > > Longer term, yes, but right now such things are not possible as the
> > > shrinker needs more context to be able to make sane per-object
> > > decisions. shrinker policy decisions that affect the entire run
> > > scope should be handled by the ->count operation - it's the one that
> > > says whether the scan loop should run or not, and right now GFP_NOFS
> > > for all filesystem shrinkers is a pure boolean policy
> > > implementation.
> > > 
> > > The next future step is to provide a superblock context with
> > > GFP_NOFS to indicate which filesystem we cannot recurse into. That
> > > is also a shrinker instance wide check, so again it's something that
> > > ->count should be deciding.
> > > 
> > > i.e. ->count determines what is to be done, ->scan iterates the work
> > > that has to be done until we are done.
> > > 
> > 
> > Sure, makes sense in general.
> > 
> > > > I find the argument reasonable enough, but if the above is true, why do
> > > > we move these checks from ->scan_objects() to ->count_objects() (in the
> > > > next patch) when per-object decisions will ultimately need to be made by
> > > > the former?
> > > 
> > > Because run/no-run policy belongs in one place, and things like
> > > GFP_NOFS do not change across calls to the ->scan loop. i.e. after
> > > the first ->scan call in a loop that calls it hundreds to thousands
> > > of times, the GFP_NOFS run/no-run check is completely redundant.
> > > 
> > 
> > What loop is currently called hundreds to thousands of times that this
> > change prevents? AFAICT the current nofs checks in the ->scan calls
> > explicitly terminate the scan loop.
> 
> Right, but when we are in GFP_KERNEL context, every call to ->scan()
> checks it and says "ok". If we are scanning tens of thousands of
> objects in a scan, and we are using a default batch size of 128
> objects per scan, then we have hundreds of calls in a single scan
> loop that check the GFP context and say "ok"....
> 
> > So we're effectively saving a
> > function call by doing this earlier in the ->count call. (Nothing wrong
> > with that, I'm just not following the numbers used in this reasoning..).
> 
> It's the don't terminate case. :)
> 

Oh, I see. You're talking about the number of executions of the gfp
check itself. That makes sense, though my understanding is that we'll
ultimately have a similar check anyways if we want per-object
granularity based on the allocation constraints of the current context.
OTOH, the check would still occur only once with an alloc flags field in
the shrinker structure too, FWIW.
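
The alternative mentioned here, an alloc flags field in the shrinker
structure checked once by the core, might look something like this
standalone sketch. The exact proposal is not spelled out in this thread, so
the field name required_gfp, the sketch_ types and the flag value are all
assumptions.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __GFP_FS bit; the value is illustrative only. */
#define SKETCH_GFP_FS 0x80u

/* Hypothetical shrinker carrying the context flags it needs to run. */
struct sketch_shrinker {
	unsigned int required_gfp;
};

/*
 * Core-side check, made once per invocation: defer unless the current
 * context provides every flag the shrinker declared it needs.
 */
static bool sketch_core_should_defer(const struct sketch_shrinker *shrinker,
				     unsigned int context_gfp)
{
	return (context_gfp & shrinker->required_gfp) !=
	       shrinker->required_gfp;
}
```

Under this shape the constraint is declared where the shrinker is set up
rather than inside its callbacks, which is the readability trade-off debated
in the rest of the thread.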

> > > Once we introduce a new policy that allows the fs shrinker to do
> > > careful reclaim in GFP_NOFS conditions, we need to do a substantial
> > > rework of the shrinker scan loop and how it accounts the work that is
> > > done - we now have at least 3 or 4 different return counters
> > > (skipped because locked, skipped because referenced,
> > > reclaimed, deferred reclaim because couldn't lock/recursion) and
> > > the accounting and decisions to be made are a lot more complex.
> > > 
> > 
> > Yeah, that's generally what I expected from your previous description.
> > 
> > > In that case, the ->count function will drop the GFP_NOFS check, but
> > > still do all the other things it needs to do. The GFP_NOFS check
> > > will go deep in the guts of the shrinker scan implementation where
> > > the per-object recursion problem exists. But for most shrinkers,
> > > it's still going to be a global boolean check...
> > > 
> > 
> > So once the nofs checks are lifted out of the ->count callback and into
> > the core shrinker, is there still a use case to defer an entire ->count
> > instance from the callback?
> 
> Not right now. There may be in future, but I don't want to make
> things more complex than they need to be by trying to support
> functionality that isn't used.
> 

Ok, but do note that the reason I ask is to touch on simply whether it's
worth putting this in the ->scan callback at all. It's not like _not_
doing that is some big complexity adjustment. ;)

> > > If people want to call avoiding repeated, unnecessary evaluation of
> > > the same condition hundreds of times instead of once "unnecessary
> > > churn", then I'll drop it.
> > > 
> > 
> > I'm not referring to the functional change as churn. What I was
> > referring to is that we're shuffling around the boilerplate gfp checking
> > code between the different shrinker callbacks, knowing that it's
> > eventually going to be lifted out, when we could potentially just lift
> > that code up a level now.
> 
> I don't think that lifting it up will save much code at all, once we
> add all the gfp mask initialisation to all the shrinkers, etc. It
> just means we can't look at the shrinker implementation and know
> that it can't run in GFP_NOFS context - we have to go look up
> where it is instantiated instead to see if there are gfp context
> constraints.
> 
> I think it's better where it is, documenting the constraints the
> shrinker implementation runs under in the implementation itself...
> 

Fair enough.. I don't necessarily agree that this is the best approach,
but the implementation is reasonable enough that I certainly don't
object to it (provided the fragility nits are addressed) and I don't
feel particularly tied to the suggested alternative. At the end of the
day this isn't a lot of code and it's not difficult to change (which it
probably will). I just wanted to make sure the alternative was fairly
considered and to test the reasoning for the approach a bit. I'll
move along from this topic on review of the next version...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
