Message ID | 20231211052850.3513230-1-debug.penguin32@gmail.com (mailing list archive)
---|---
State | New
Series | mm/zswap: Improve with alloc_workqueue() call
On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> Use alloc_workqueue() to create and set finer
> work item attributes instead of create_workqueue()
> which is to be deprecated.
>
> Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> ---
>  mm/zswap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..64dbe3e944a2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
>                 zswap_enabled = false;
>         }
>
> -       shrink_wq = create_workqueue("zswap-shrink");
> +       shrink_wq = alloc_workqueue("zswap-shrink",
> +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);

Hmmm this changes the current behavior a bit right? create_workqueue()
is currently defined as:

alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

I think this should be noted in the changelog, at the very least, even
if it is fine. We should be as explicit as possible about behavior
changes.

>         if (!shrink_wq)
>                 goto fallback_fail;
>
> --
> 2.34.1
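[Editorial note: the behavioral difference under discussion can be sketched in plain userspace C. The flag values and the create_workqueue() wrapper below mirror include/linux/workqueue.h around the time of this thread and should be treated as illustrative; alloc_workqueue() here is a recording stub, not the kernel implementation.]

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of the workqueue flags involved; values mirror
 * include/linux/workqueue.h at the time of this thread (illustrative). */
enum {
    WQ_UNBOUND     = 1 << 1,
    WQ_MEM_RECLAIM = 1 << 3,
    __WQ_LEGACY    = 1 << 18,
};

struct workqueue_struct {
    char name[32];
    unsigned int flags;
    int max_active;
};

static struct workqueue_struct wq_slot;

/* Stand-in for the kernel's alloc_workqueue(): it only records what the
 * caller asked for, so the two call styles can be compared. */
static struct workqueue_struct *alloc_workqueue(const char *fmt,
                                                unsigned int flags,
                                                int max_active, ...)
{
    va_list ap;

    va_start(ap, max_active);
    vsnprintf(wq_slot.name, sizeof(wq_slot.name), fmt, ap);
    va_end(ap);
    wq_slot.flags = flags;
    wq_slot.max_active = max_active;
    return &wq_slot;
}

/* The legacy wrapper, as defined in workqueue.h: bound per-cpu queue,
 * max_active fixed at 1. */
#define create_workqueue(name) \
    alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
```

Comparing `create_workqueue("zswap-shrink")` with the patch's `alloc_workqueue("zswap-shrink", WQ_UNBOUND | WQ_MEM_RECLAIM, 0)` makes both behavior changes visible: the flags gain WQ_UNBOUND (and lose __WQ_LEGACY), and max_active drops from the hardcoded 1 to 0 (the default).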
Hi Nhat,
Thanks for checking.

On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> <debug.penguin32@gmail.com> wrote:
> >
> > Use alloc_workqueue() to create and set finer
> > work item attributes instead of create_workqueue()
> > which is to be deprecated.
> >
> > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 74411dfdad92..64dbe3e944a2 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> >                 zswap_enabled = false;
> >         }
> >
> > -       shrink_wq = create_workqueue("zswap-shrink");
> > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
>
> Hmmm this changes the current behavior a bit right? create_workqueue()
> is currently defined as:
>
> alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

create_workqueue() is deprecated, and most subsystems have already
changed over to alloc_workqueue(); only a small minority of remnant
instances in the kernel and some drivers still use create_workqueue().
As defined, create_workqueue() hardcodes the workqueue to be bound and
per-cpu, without the flexibility of setting other workqueue
flags/attributes (WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_FREEZABLE,
WQ_UNBOUND). Hence most subsystems and drivers use the
alloc_workqueue() api.

#define create_workqueue(name) \
        alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

> I think this should be noted in the changelog, at the very least, even
> if it is fine. We should be as explicit as possible about behavior
> changes.

imo, it's sort of known and has been consistently changed for quite
some time already:
https://lists.openwall.net/linux-kernel/2016/06/07/1086
https://lists.openwall.net/linux-kernel/2011/01/03/124
https://lwn.net/Articles/403891/ => quoted: "The long-term plan, it
seems, is to convert all create_workqueue() users over to an
appropriate alloc_workqueue() call; eventually create_workqueue() will
be removed"

glad to take some suggestions, thoughts?

BR,
ronald
On Wed, Dec 13, 2023 at 5:20 AM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> Hi Nhat,
> Thanks for checking.
>
> On Tue, Dec 12, 2023 at 12:16 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Sun, Dec 10, 2023 at 9:31 PM Ronald Monthero
> > <debug.penguin32@gmail.com> wrote:
> > >
> > > Use alloc_workqueue() to create and set finer
> > > work item attributes instead of create_workqueue()
> > > which is to be deprecated.
> > >
> > > Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
> > > ---
> > >  mm/zswap.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 74411dfdad92..64dbe3e944a2 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1620,7 +1620,8 @@ static int zswap_setup(void)
> > >                 zswap_enabled = false;
> > >         }
> > >
> > > -       shrink_wq = create_workqueue("zswap-shrink");
> > > +       shrink_wq = alloc_workqueue("zswap-shrink",
> > > +                       WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
> >
> > Hmmm this changes the current behavior a bit right? create_workqueue()
> > is currently defined as:
> >
> > alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> create_workqueue() is deprecated, and most subsystems have already
> changed over to alloc_workqueue(); only a small minority of remnant
> instances in the kernel and some drivers still use create_workqueue().
> As defined, create_workqueue() hardcodes the workqueue to be bound and
> per-cpu, without the flexibility of setting other workqueue
> flags/attributes (WQ_CPU_INTENSIVE, WQ_HIGHPRI, WQ_FREEZABLE,
> WQ_UNBOUND). Hence most subsystems and drivers use the
> alloc_workqueue() api.
>
> #define create_workqueue(name) \
>         alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
>
> > I think this should be noted in the changelog, at the very least, even
> > if it is fine. We should be as explicit as possible about behavior
> > changes.
>
> imo, it's sort of known and has been consistently changed for quite
> some time already:
> https://lists.openwall.net/linux-kernel/2016/06/07/1086
> https://lists.openwall.net/linux-kernel/2011/01/03/124
> https://lwn.net/Articles/403891/ => quoted: "The long-term plan, it
> seems, is to convert all create_workqueue() users over to an
> appropriate alloc_workqueue() call; eventually create_workqueue() will
> be removed"
>
> glad to take some suggestions, thoughts?
>
> BR,
> ronald

I should have been clearer. I'm not against the change per-se - I
agree that we should replace create_workqueue() with
alloc_workqueue(). What I meant was, IIUC, there are two behavioral
changes with this new workqueue creation:

a) We're replacing a bounded workqueue (which as you noted, is fixed
by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
fine to me - I doubt locality buys us much here.

b) create_workqueue() limits the number of concurrent per-cpu
execution contexts at 1 (i.e only one single global reclaimer),
whereas after this patch this is set to the default value. This seems
fine to me too - I don't remember us taking advantage of the previous
concurrency limitation. Also, in practice, the task_struct is
one-to-one with the zswap_pool's anyway, and most of the time, there
is just a single pool being used. (But it begs the question - what's
the point of using 0 instead of 1 here?)

Both seem fine (to me anyway - other reviewers feel free to take a
look and fact-check everything). I just feel like this should be
explicitly noted in the changelog, IMHO, in case we are mistaken and
need to revisit this :) Either way, not a NACK from me.
On Wed, Dec 13, 2023 at 4:28 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> concurrency limitation. Also, in practice, the task_struct is

s/task/work. I was looking at some articles about tasks recently, so
my brain just auto-completed. I was referring to pool->shrink_work
here.
On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
.. < snipped >
> I should have been clearer. I'm not against the change per-se - I
> agree that we should replace create_workqueue() with
> alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> changes with this new workqueue creation:
>
> a) We're replacing a bounded workqueue (which as you noted, is fixed
> by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> fine to me - I doubt locality buys us much here.

Yes, the workqueue attributes change per se, but the existing
functionality remains seamless and the change is more a forward
looking one. imo under a memory pressure scenario an unbound workqueue
might handle the scenario better, as the number of backing pools is
dynamic. And with the WQ_UNBOUND attribute the scheduler is more open
to exercise some improvisations in demanding scenarios for offloading
cpu time slicing for workers, ie if any other worker of the same
primary cpu had to be served due to workers with WQ_HIGHPRI and
WQ_CPU_INTENSIVE. Although normal and highpri worker-pools don't
interact with each other, the target cpu at least need not be the same
if our worker for zswap is WQ_UNBOUND.

Also noting that the existing wq of the zswap worker has the
WQ_MEM_RECLAIM attribute, so is there a rescuer worker for zswap
during a memory pressure scenario?
Quoting: "All work items which might be used on code paths that handle
memory reclaim are required to be queued on wq's that have a
rescue-worker reserved for execution under memory pressure. Else it is
possible that the worker-pool deadlocks waiting for execution contexts
to free up"

An additional thought: would adding the WQ_FREEZABLE attribute while
creating the zswap worker make sense for handling freeze and unfreeze
of specific cgroups, or filesystem-wide freeze and unfreeze scenarios?
Does the zswap worker participate in freeze/unfreeze code paths?

> b) create_workqueue() limits the number of concurrent per-cpu
> execution contexts at 1 (i.e only one single global reclaimer),
> whereas after this patch this is set to the default value. This seems
> fine to me too - I don't remember us taking advantage of the previous
> concurrency limitation. Also, in practice, the task_struct is
> one-to-one with the zswap_pool's anyway, and most of the time, there
> is just a single pool being used. (But it begs the question - what's
> the point of using 0 instead of 1 here?)

Nothing in particular, but I left it at the default 0, which can go up
to 256 (@max_active per cpu).
But if the zswap worker is always intended to have only 1 active
worker per cpu, then 1 is fine; otherwise the default setting might be
more flexible for scaling.
just a thought, does having a higher value help for larger memory
systems?

> Both seem fine (to me anyway - other reviewers feel free to take a
> look and fact-check everything). I just feel like this should be
> explicitly noted in the changelog, IMHO, in case we are mistaken and
> need to revisit this :) Either way, not a NACK from me.

Thanks Nhat, for checking. Above are my thoughts; I could be missing
some info or be incorrect on certain fronts, so I would seek
clarifications.
Also thanks in advance to other experts/maintainers, please share
feedback and suggestions.

BR,
ronald
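[Editorial note on the max_active question above: in the workqueue implementation, passing 0 means "use the default", which resolves to WQ_DFL_ACTIVE, i.e. WQ_MAX_ACTIVE / 2 = 256 at the time of this thread. A minimal userspace sketch of that resolution, with the constants copied from workqueue.h as assumptions:]

```c
#include <assert.h>

/* Constants mirroring include/linux/workqueue.h at the time of this
 * thread (treat as illustrative, not authoritative). */
#define WQ_MAX_ACTIVE 512
#define WQ_DFL_ACTIVE (WQ_MAX_ACTIVE / 2)

/* alloc_workqueue() treats max_active == 0 as a request for the
 * default concurrency limit. */
static int resolve_max_active(int requested)
{
    return requested ? requested : WQ_DFL_ACTIVE;
}
```

So the patch's `0` asks for up to 256 concurrent execution contexts rather than the single context create_workqueue() imposed, which is exactly the behavior change point b) raises.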
On Fri, Dec 15, 2023 at 1:04 AM Ronald Monthero <debug.penguin32@gmail.com> wrote:
>
> On Thu, Dec 14, 2023 at 10:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> .. < snipped >
> > I should have been clearer. I'm not against the change per-se - I
> > agree that we should replace create_workqueue() with
> > alloc_workqueue(). What I meant was, IIUC, there are two behavioral
> > changes with this new workqueue creation:
> >
> > a) We're replacing a bounded workqueue (which as you noted, is fixed
> > by create_workqueue()) with an unbounded one (WQ_UNBOUND). This seems
> > fine to me - I doubt locality buys us much here.
>
> Yes, the workqueue attributes change per se, but the existing
> functionality remains seamless and the change is more a forward
> looking one. imo under a memory pressure scenario an unbound
> workqueue might handle the scenario better, as the number of backing
> pools is dynamic. And with the WQ_UNBOUND attribute the scheduler is
> more open to exercise some improvisations in demanding scenarios for
> offloading cpu time slicing for workers, ie if any other worker of
> the same primary cpu had to be served due to workers with WQ_HIGHPRI
> and WQ_CPU_INTENSIVE. Although normal and highpri worker-pools don't
> interact with each other, the target cpu at least need not be the
> same if our worker for zswap is WQ_UNBOUND.

I don't object to the change per-se. I just meant that these
changes/discussion should be spelled out in the patch's changelog :)
IMHO, we should document behavior changes if there are any.

For instance, when we switched to kmap_local_page() for zswap, there
is a discussion in the changelog regarding how it differs from the old
(and deprecated) kmap_atomic():

https://lore.kernel.org/linux-mm/20231127160058.586446-1-fabio.maria.de.francesco@linux.intel.com/

and how that difference doesn't matter in the case of zswap.

> Also noting that the existing wq of the zswap worker has the
> WQ_MEM_RECLAIM attribute, so is there a rescuer worker for zswap
> during a memory pressure scenario?
> Quoting: "All work items which might be used on code paths that handle
> memory reclaim are required to be queued on wq's that have a
> rescue-worker reserved for execution under memory pressure. Else it is
> possible that the worker-pool deadlocks waiting for execution contexts
> to free up"

Seems like it, but this behavior is not changed by your patch :) So
i'm not too concerned by it one way or another.

> An additional thought: would adding the WQ_FREEZABLE attribute while
> creating the zswap worker make sense for handling freeze and unfreeze
> of specific cgroups, or filesystem-wide freeze and unfreeze scenarios?
> Does the zswap worker participate in freeze/unfreeze code paths?

I don't think so, no? This zswap worker is scheduled upon the zswap
pool limit hit (which happens on the zswap store/swapping/memory
reclaim path).

> > b) create_workqueue() limits the number of concurrent per-cpu
> > execution contexts at 1 (i.e only one single global reclaimer),
> > whereas after this patch this is set to the default value. This seems
> > fine to me too - I don't remember us taking advantage of the previous
> > concurrency limitation. Also, in practice, the task_struct is
> > one-to-one with the zswap_pool's anyway, and most of the time, there
> > is just a single pool being used. (But it begs the question - what's
> > the point of using 0 instead of 1 here?)
>
> Nothing in particular, but I left it at the default 0, which can go up
> to 256 (@max_active per cpu).
> But if the zswap worker is always intended to have only 1 active
> worker per cpu, then 1 is fine; otherwise the default setting might be
> more flexible for scaling.
> just a thought, does having a higher value help for larger memory
> systems?

I don't think having a higher value helps here tbh. We only have one
work_struct per pool, so it shouldn't make a difference either way :)

> > Both seem fine (to me anyway - other reviewers feel free to take a
> > look and fact-check everything). I just feel like this should be
> > explicitly noted in the changelog, IMHO, in case we are mistaken and
> > need to revisit this :) Either way, not a NACK from me.
>
> Thanks Nhat, for checking. Above are my thoughts; I could be missing
> some info or be incorrect on certain fronts, so I would seek
> clarifications.
> Also thanks in advance to other experts/maintainers, please share
> feedback and suggestions.
>
> BR,
> ronald

Also +Chris Li, who is also working on improving zswap :)
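[Editorial note: Nhat's point that a higher max_active would not matter follows from queue_work() semantics: a work item that is already pending is not queued again, so a single shrink_work per pool can occupy at most one execution context regardless of the workqueue's limit. A simplified userspace model of that rule (not kernel code; the real implementation uses an atomic PENDING bit cleared just before the callback runs):]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of a work item. */
struct work_struct {
    bool pending;
};

/* queue_work()'s "already pending" rule: queueing a pending item is a
 * no-op that returns false, so one work_struct never consumes two
 * execution contexts at once. */
static bool queue_work_model(struct work_struct *work)
{
    if (work->pending)
        return false;   /* already queued: not queued again */
    work->pending = true;
    return true;
}

/* The worker clears the pending state when it picks the item up,
 * allowing it to be requeued. */
static void work_starts_executing(struct work_struct *work)
{
    work->pending = false;
}
```

With only one shrink_work per zswap_pool (and typically one pool), at most one reclaim context can ever be in flight, whether max_active is 1, 0, or 256.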
diff --git a/mm/zswap.c b/mm/zswap.c
index 74411dfdad92..64dbe3e944a2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1620,7 +1620,8 @@ static int zswap_setup(void)
 		zswap_enabled = false;
 	}
 
-	shrink_wq = create_workqueue("zswap-shrink");
+	shrink_wq = alloc_workqueue("zswap-shrink",
+			WQ_UNBOUND|WQ_MEM_RECLAIM, 0);
 	if (!shrink_wq)
 		goto fallback_fail;
Use alloc_workqueue() to create and set finer
work item attributes instead of create_workqueue()
which is to be deprecated.

Signed-off-by: Ronald Monthero <debug.penguin32@gmail.com>
---
 mm/zswap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)