Message ID | 20231211052850.3513230-1-debug.penguin32@gmail.com (mailing list archive)
---|---
State | New
Series | mm/zswap: Improve with alloc_workqueue() call
On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> Use alloc_workqueue() to create and set finer
> work item attributes instead of create_workqueue()
> which is to be deprecated.
>
> Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> ---
>  mm/zswap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..64dbe3e944a2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
>                 zswap_enabled = false;
>         }
>
> -       shrink_wq = create_workqueue("zswap-shrink");
> +       shrink_wq = alloc_workqueue("zswap-shrink",
> +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);

Hmmm this changes the current behavior a bit right? create_workqueue()
is currently defined as:

alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

I think this should be noted in the changelog, at the very least, even
if it is fine. We should be as explicit as possible about behavior
changes.

>         if (!shrink_wq)
>                 goto fallback_fail;
>
> --
> 2.34.1
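[Editorial note: the behavioral difference under discussion can be sketched in plain userspace C. The flag values and the create_workqueue() wrapper below mirror include/linux/workqueue.h around the time of this thread and should be treated as illustrative; alloc_workqueue() here is a recording stub, not the kernel implementation.]

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of the workqueue flags involved; values mirror
 * include/linux/workqueue.h at the time of this thread (illustrative). */
enum {
    WQ_UNBOUND     = 1 << 1,
    WQ_MEM_RECLAIM = 1 << 3,
    __WQ_LEGACY    = 1 << 18,
};

struct workqueue_struct {
    char name[32];
    unsigned int flags;
    int max_active;
};

static struct workqueue_struct wq_slot;

/* Stand-in for the kernel's alloc_workqueue(): it only records what the
 * caller asked for, so the two call styles can be compared. */
static struct workqueue_struct *alloc_workqueue(const char *fmt,
                                                unsigned int flags,
                                                int max_active, ...)
{
    va_list ap;

    va_start(ap, max_active);
    vsnprintf(wq_slot.name, sizeof(wq_slot.name), fmt, ap);
    va_end(ap);
    wq_slot.flags = flags;
    wq_slot.max_active = max_active;
    return &wq_slot;
}

/* The legacy wrapper, as defined in workqueue.h: bound per-cpu queue,
 * max_active fixed at 1. */
#define create_workqueue(name) \
    alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
```

Comparing `create_workqueue("zswap-shrink")` with the patch's `alloc_workqueue("zswap-shrink", WQ_UNBOUND | WQ_MEM_RECLAIM, 0)` makes both behavior changes visible: the flags gain WQ_UNBOUND (and lose __WQ_LEGACY), and max_active drops from the hardcoded 1 to 0 (the default).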
Hi Nhat,
Thanks for checking.

On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> <debug.penguin32@gmail.com> wrote:
> >
> > Use alloc_workqueue() to create and set finer
> > work item attributes instead of create_workqueue()
> > which is to be deprecated.
> >
> > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 74411dfdad92..64dbe3e944a2 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> >                 zswap_enabled = false;
> >         }
> >
> > -       shrink_wq = create_workqueue("zswap-shrink");
> > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
>
> Hmmm this changes the current behavior a bit right? create_workqueue()
> is currently defined as:
>
> alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

create_workqueue() is deprecated, and most subsystems have already
changed over to alloc_workqueue(); only a small minority of remnant
instances in the kernel and some drivers still use create_workqueue().
As defined, create_workqueue() hardcodes the workqueue to be bound and
per-cpu, without the flexibility of setting other workqueue
flags/attributes (WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_FREEZABLE,
WQ_UNBOUND). Hence most subsystems and drivers use the
alloc_workqueue() api.

#define create_workqueue(name) \
        alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

> I think this should be noted in the changelog, at the very least, even
> if it is fine. We should be as explicit as possible about behavior
> changes.

imo, it's sort of known and has been consistently changed for quite
some time already:
https://lists.openwall.net/linux-kernel/2016/06/07/1086
https://lists.openwall.net/linux-kernel/2011/01/03/124
https://lwn.net/Articles/403891/ => quoted: "The long-term plan, it
seems, is to convert all create_workqueue() users over to an
appropriate alloc_workqueue() call; eventually create_workqueue() will
be removed"

glad to take some suggestions, thoughts?

BR,
ronald
On Wed, Dec 13, 2023 at 5:20 AM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> Hi Nhat,
> Thanks for checking.
>
> On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> > <debug.penguin32@gmail.com> wrote:
> > >
> > > Use alloc_workqueue() to create and set finer
> > > work item attributes instead of create_workqueue()
> > > which is to be deprecated.
> > >
> > > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > > ---
> > >  mm/zswap.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 74411dfdad92..64dbe3e944a2 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > >                 zswap_enabled = false;
> > >         }
> > >
> > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
> >
> > Hmmm this changes the current behavior a bit right? create_workqueue()
> > is currently defined as:
> >
> > alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> create_workqueue() is deprecated, and most subsystems have already
> changed over to alloc_workqueue(); only a small minority of remnant
> instances in the kernel and some drivers still use create_workqueue().
> As defined, create_workqueue() hardcodes the workqueue to be bound and
> per-cpu, without the flexibility of setting other workqueue
> flags/attributes (WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_FREEZABLE,
> WQ_UNBOUND). Hence most subsystems and drivers use the
> alloc_workqueue() api.
>
> #define create_workqueue(name) \
>         alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> > I think this should be noted in the changelog, at the very least, even
> > if it is fine. We should be as explicit as possible about behavior
> > changes.
>
> imo, it's sort of known and has been consistently changed for quite
> some time already:
> https://lists.openwall.net/linux-kernel/2016/06/07/1086
> https://lists.openwall.net/linux-kernel/2011/01/03/124
> https://lwn.net/Articles/403891/ => quoted: "The long-term plan, it
> seems, is to convert all create_workqueue() users over to an
> appropriate alloc_workqueue() call; eventually create_workqueue() will
> be removed"
>
> glad to take some suggestions, thoughts?
>
> BR,
> ronald

I should have been clearer. I'm not against the change per-se - I
agree that we should replace create_workqueue() with
alloc_workqueue(). What I meant was, IIUC, there are two behavioral
changes with this new workqueue creation:

a) We're replacing a bounded workqueue (which as you noted, is fixed
by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
fine to me - I doubt locality buys us much here.

b) create_workqueue() limits the number of concurrent per-cpu
execution contexts at 1 (i.e only one single global reclaimer),
whereas after this patch this is set to the default value. This seems
fine to me too - I don't remember us taking advantage of the previous
concurrency limitation. Also, in practice, the task_struct is
one-to-one with the zswap_pool's anyway, and most of the time, there
is just a single pool being used. (But it begs the question - what's
the point of using 0 instead of 1 here?)

Both seem fine (to me anyway - other reviewers feel free to take a
look and fact-check everything). I just feel like this should be
explicitly noted in the changelog, IMHO, in case we are mistaken and
need to revisit this :) Either way, not a NACK from me.
On Wed, Dec 13, 2023 at 4:28 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> concurrency limitation. Also, in practice, the task_struct is

s/task/work. I was looking at some articles about tasks recently, so
my brain just auto-completed. I was referring to pool->shrink_work
here.
On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
.. < snipped >
> I should have been clearer. I'm not against the change per-se - I
> agree that we should replace create_workqueue() with
> alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> changes with this new workqueue creation:
>
> a) We're replacing a bounded workqueue (which as you noted, is fixed
> by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> fine to me - I doubt locality buys us much here.

Yes, the workqueue attributes change per se, but the existing
functionality remains seamless and the change is more a forward
looking one. imo under a memory pressure scenario an unbound workqueue
might handle the scenario better, as the number of backing pools is
dynamic. And with the WQ_UNBOUND attribute the scheduler is more open
to exercise some improvisations in demanding scenarios for offloading
cpu time slicing for workers, ie if any other worker of the same
primary cpu had to be served due to workers with WQ_HIGHPRI and
WQ_CPU_INTENSIVE. Although normal and highpri worker-pools don't
interact with each other, the target cpu at least need not be the same
if our worker for zswap is WQ_UNBOUND.

Also noting that the existing wq of the zswap worker has the
WQ_MEM_RECLAIM attribute, so is there a rescuer worker for zswap
during a memory pressure scenario?
Quoting: "All work items which might be used on code paths that handle
memory reclaim are required to be queued on wq's that have a
rescue-worker reserved for execution under memory pressure. Else it is
possible that the worker-pool deadlocks waiting for execution contexts
to free up"

An additional thought: would adding the WQ_FREEZABLE attribute while
creating the zswap worker make sense for handling freeze and unfreeze
of specific cgroups, or filesystem-wide freeze and unfreeze scenarios?
Does the zswap worker participate in freeze/unfreeze code paths?

> b) create_workqueue() limits the number of concurrent per-cpu
> execution contexts at 1 (i.e only one single global reclaimer),
> whereas after this patch this is set to the default value. This seems
> fine to me too - I don't remember us taking advantage of the previous
> concurrency limitation. Also, in practice, the task_struct is
> one-to-one with the zswap_pool's anyway, and most of the time, there
> is just a single pool being used. (But it begs the question - what's
> the point of using 0 instead of 1 here?)

Nothing in particular, but I left it at the default 0, which can go up
to 256 (@max_active per cpu).
But if the zswap worker is always intended to have only 1 active
worker per cpu, then 1 is fine; otherwise the default setting might be
more flexible for scaling.
just a thought, does having a higher value help for larger memory
systems?

> Both seem fine (to me anyway - other reviewers feel free to take a
> look and fact-check everything). I just feel like this should be
> explicitly noted in the changelog, IMHO, in case we are mistaken and
> need to revisit this :) Either way, not a NACK from me.

Thanks Nhat, for checking. Above are my thoughts; I could be missing
some info or be incorrect on certain fronts, so I would seek
clarifications.
Also thanks in advance to other experts/maintainers, please share
feedback and suggestions.

BR,
ronald
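[Editorial note on the max_active question above: in the workqueue implementation, passing 0 means "use the default", which resolves to WQ_DFL_ACTIVE, i.e. WQ_MAX_ACTIVE / 2 = 256 at the time of this thread. A minimal userspace sketch of that resolution, with the constants copied from workqueue.h as assumptions:]

```c
#include <assert.h>

/* Constants mirroring include/linux/workqueue.h at the time of this
 * thread (treat as illustrative, not authoritative). */
#define WQ_MAX_ACTIVE 512
#define WQ_DFL_ACTIVE (WQ_MAX_ACTIVE / 2)

/* alloc_workqueue() treats max_active == 0 as a request for the
 * default concurrency limit. */
static int resolve_max_active(int requested)
{
    return requested ? requested : WQ_DFL_ACTIVE;
}
```

So the patch's `0` asks for up to 256 concurrent execution contexts rather than the single context create_workqueue() imposed, which is exactly the behavior change point b) raises.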
On Fri, Dec 15, 2023 at 1:04 AM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> .. < snipped >
> > I should have been clearer. I'm not against the change per-se - I
> > agree that we should replace create_workqueue() with
> > alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> > changes with this new workqueue creation:
> >
> > a) We're replacing a bounded workqueue (which as you noted, is fixed
> > by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> > fine to me - I doubt locality buys us much here.
>
> Yes, the workqueue attributes change per se, but the existing
> functionality remains seamless and the change is more a forward
> looking one. imo under a memory pressure scenario an unbound
> workqueue might handle the scenario better, as the number of backing
> pools is dynamic. And with the WQ_UNBOUND attribute the scheduler is
> more open to exercise some improvisations in demanding scenarios for
> offloading cpu time slicing for workers, ie if any other worker of
> the same primary cpu had to be served due to workers with WQ_HIGHPRI
> and WQ_CPU_INTENSIVE. Although normal and highpri worker-pools don't
> interact with each other, the target cpu at least need not be the
> same if our worker for zswap is WQ_UNBOUND.

I don't object to the change per-se. I just meant that these
changes/discussion should be spelled out in the patch's changelog :)
IMHO, we should document behavior changes if there are any.

For instance, when we switched to kmap_local_page() for zswap, there
is a discussion in the changelog regarding how it differs from the old
(and deprecated) kmap_atomic():

https://lore.kernel.org/linux-mm/20231127160058.586446-1-fabio.maria.de.francesco@linux.intel.com/

and how that difference doesn't matter in the case of zswap.

> Also noting that the existing wq of the zswap worker has the
> WQ_MEM_RECLAIM attribute, so is there a rescuer worker for zswap
> during a memory pressure scenario?
> Quoting: "All work items which might be used on code paths that handle
> memory reclaim are required to be queued on wq's that have a
> rescue-worker reserved for execution under memory pressure. Else it is
> possible that the worker-pool deadlocks waiting for execution contexts
> to free up"

Seems like it, but this behavior is not changed by your patch :) So
i'm not too concerned by it one way or another.

> An additional thought: would adding the WQ_FREEZABLE attribute while
> creating the zswap worker make sense for handling freeze and unfreeze
> of specific cgroups, or filesystem-wide freeze and unfreeze scenarios?
> Does the zswap worker participate in freeze/unfreeze code paths?

I don't think so, no? This zswap worker is scheduled upon the zswap
pool limit hit (which happens on the zswap store/swapping/memory
reclaim path).

> > b) create_workqueue() limits the number of concurrent per-cpu
> > execution contexts at 1 (i.e only one single global reclaimer),
> > whereas after this patch this is set to the default value. This seems
> > fine to me too - I don't remember us taking advantage of the previous
> > concurrency limitation. Also, in practice, the task_struct is
> > one-to-one with the zswap_pool's anyway, and most of the time, there
> > is just a single pool being used. (But it begs the question - what's
> > the point of using 0 instead of 1 here?)
>
> Nothing in particular, but I left it at the default 0, which can go up
> to 256 (@max_active per cpu).
> But if the zswap worker is always intended to have only 1 active
> worker per cpu, then 1 is fine; otherwise the default setting might be
> more flexible for scaling.
> just a thought, does having a higher value help for larger memory
> systems?

I don't think having a higher value helps here tbh. We only have one
work_struct per pool, so it shouldn't make a difference either way :)

> > Both seem fine (to me anyway - other reviewers feel free to take a
> > look and fact-check everything). I just feel like this should be
> > explicitly noted in the changelog, IMHO, in case we are mistaken and
> > need to revisit this :) Either way, not a NACK from me.
>
> Thanks Nhat, for checking. Above are my thoughts; I could be missing
> some info or be incorrect on certain fronts, so I would seek
> clarifications.
> Also thanks in advance to other experts/maintainers, please share
> feedback and suggestions.
>
> BR,
> ronald

Also +Chris Li, who is also working on improving zswap :)
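[Editorial note: Nhat's point that a higher max_active would not matter follows from queue_work() semantics: a work item that is already pending is not queued again, so a single shrink_work per pool can occupy at most one execution context regardless of the workqueue's limit. A simplified userspace model of that rule (not kernel code; the real implementation uses an atomic PENDING bit cleared just before the callback runs):]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of a work item. */
struct work_struct {
    bool pending;
};

/* queue_work()'s "already pending" rule: queueing a pending item is a
 * no-op that returns false, so one work_struct never consumes two
 * execution contexts at once. */
static bool queue_work_model(struct work_struct *work)
{
    if (work->pending)
        return false;   /* already queued: not queued again */
    work->pending = true;
    return true;
}

/* The worker clears the pending state when it picks the item up,
 * allowing it to be requeued. */
static void work_starts_executing(struct work_struct *work)
{
    work->pending = false;
}
```

With only one shrink_work per zswap_pool (and typically one pool), at most one reclaim context can ever be in flight, whether max_active is 1, 0, or 256.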
diff --git a/mm/zswap.c b/mm/zswap.c
index 74411dfdad92..64dbe3e944a2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1620,7 +1620,8 @@ static int zswap_setup(void)
 		zswap_enabled = false;
 	}
 
-	shrink_wq = create_workqueue("zswap-shrink");
+	shrink_wq = alloc_workqueue("zswap-shrink",
+			WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
 	if (!shrink_wq)
 		goto fallback_fail;
Use alloc_workqueue() to create and set finer
work item attributes instead of create_workqueue()
which is to be deprecated.

Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
---
 mm/zswap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)