Message ID | 20250206122633.167896-1-mhocko@kernel.org (mailing list archive)
---|---
State | New
Series | mm, percpu: do not consider sleepable allocations atomic
On 2/6/25 13:26, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> 28307d938fb2 ("percpu: make pcpu_alloc() aware of current gfp context")
> has fixed a reclaim recursion for scoped GFP_NOFS context. It has done
> that by avoiding taking pcpu_alloc_mutex. This is a correct solution as
> the worker context with full GFP_KERNEL allocation/reclaim power and
> which is using the same lock cannot block the NOFS pcpu_alloc caller.
>
> On the other hand this is a very conservative approach that could lead
> to failures because the pcpu_alloc lockless implementation is quite
> limited.
>
> We have a bug report about premature failures when a scsi array of 193
> devices is scanned. Sometimes (not consistently) the scanning aborts
> because the iscsid daemon fails to create the queue for a random scsi
> device during the scan. iscsid itself is running with PR_SET_IO_FLUSHER
> set so all allocations from this process context are GFP_NOIO. This in
> turn makes any pcpu_alloc lockless (without pcpu_alloc_mutex) which
> leads to premature failures.
>
> It has turned out that iscsid has worked around this by dropping
> PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382)
> when scanning the host. But we can do better in this case on the kernel
> side and use pcpu_alloc_mutex for NOIO resp. NOFS constrained
> allocation scopes too. We just need the WQ worker to never trigger
> IO/FS reclaim. Achieve that by enforcing scoped GFP_NOIO for the whole
> execution of pcpu_balance_workfn (this will imply the NOFS constraint
> as well). This will remove the dependency chain and preserve the full
> allocation power of the pcpu_alloc call.
>
> While at it, make is_atomic really test for blockable allocations.
> Fixes: 28307d938fb2 ("percpu: make pcpu_alloc() aware of current gfp context")
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/percpu.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index d8dd31a2e407..192c2a8e901d 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1758,7 +1758,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>  	gfp = current_gfp_context(gfp);
>  	/* whitelisted flags that can be passed to the backing allocators */
>  	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> -	is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
> +	is_atomic = !gfpflags_allow_blocking(gfp);
>  	do_warn = !(gfp & __GFP_NOWARN);
>  
>  	/*
> @@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
>  	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
>  	 * to move fully free chunks to the active list to be freed if
>  	 * appropriate.
> +	 *
> +	 * Enforce GFP_NOIO allocations because we have pcpu_alloc users
> +	 * constrained to GFP_NOIO/NOFS contexts and they could form lock
> +	 * dependency through pcpu_alloc_mutex
>  	 */
> +	unsigned int flags = memalloc_noio_save();
>  	mutex_lock(&pcpu_alloc_mutex);
>  	spin_lock_irq(&pcpu_lock);
>  
> @@ -2215,6 +2220,7 @@ static void pcpu_balance_workfn(struct work_struct *work)
>  
>  	spin_unlock_irq(&pcpu_lock);
>  	mutex_unlock(&pcpu_alloc_mutex);
> +	memalloc_noio_restore(flags);
>  }
>  
>  /**
Hello, Michal.

On Thu, Feb 06, 2025 at 01:26:33PM +0100, Michal Hocko wrote:
...
> It has turned out that iscsid has worked around this by dropping
> PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382)
> when scanning the host. But we can do better in this case on the kernel
> side

FWIW, requiring GFP_KERNEL context for probing doesn't sound too crazy to
me.

> @@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
>  	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
>  	 * to move fully free chunks to the active list to be freed if
>  	 * appropriate.
> +	 *
> +	 * Enforce GFP_NOIO allocations because we have pcpu_alloc users
> +	 * constrained to GFP_NOIO/NOFS contexts and they could form lock
> +	 * dependency through pcpu_alloc_mutex
>  	 */
> +	unsigned int flags = memalloc_noio_save();

Just for context, the reason why the allocation mask support was limited
to GFP_KERNEL-or-not rather than supporting the full range of GFP flags is
that percpu memory area expansion can involve page table allocations in
the vmalloc area, which always use GFP_KERNEL. memalloc_noio_save() masks
the IO part out of that, right? It might be worthwhile to explain why we
aren't passing down GFP flags throughout and instead depending on masking.

Also, doesn't the above always prevent percpu allocations from doing fs/io
reclaims? ie. Shouldn't the masking only be used if the passed-in gfp
doesn't allow fs/io?

Thanks.
On Tue 11-02-25 10:55:20, Tejun Heo wrote:
> Hello, Michal.
>
> On Thu, Feb 06, 2025 at 01:26:33PM +0100, Michal Hocko wrote:
> ...
> > It has turned out that iscsid has worked around this by dropping
> > PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382)
> > when scanning the host. But we can do better in this case on the
> > kernel side
>
> FWIW, requiring GFP_KERNEL context for probing doesn't sound too crazy
> to me.
>
> > @@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
> >  	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
> >  	 * to move fully free chunks to the active list to be freed if
> >  	 * appropriate.
> > +	 *
> > +	 * Enforce GFP_NOIO allocations because we have pcpu_alloc users
> > +	 * constrained to GFP_NOIO/NOFS contexts and they could form lock
> > +	 * dependency through pcpu_alloc_mutex
> >  	 */
> > +	unsigned int flags = memalloc_noio_save();
>
> Just for context, the reason why the allocation mask support was limited
> to GFP_KERNEL-or-not rather than supporting the full range of GFP flags
> is that percpu memory area expansion can involve page table allocations
> in the vmalloc area, which always use GFP_KERNEL. memalloc_noio_save()
> masks the IO part out of that, right? It might be worthwhile to explain
> why we aren't passing down GFP flags throughout and instead depending on
> masking.

I have gone with masking because that seemed the easier to review and more
robust solution. vmalloc does support NOFS/NOIO contexts these days (it
just uses scoped masking in those cases). Propagating the gfp throughout
the worker code path is likely possible, but I haven't really explored
that in detail to be sure. Would that be preferable even if the fix would
be more involved?

> Also, doesn't the above always prevent percpu allocations from doing
> fs/io reclaims?

Yes it does. Probably worth mentioning in the changelog. These
allocations should be rare so having a constrained reclaim didn't really
seem problematic to me. There should be kswapd running in the background
with the full reclaim power.

> ie. Shouldn't the masking only be used if the passed-in gfp
> doesn't allow fs/io?

This is a good question. I have to admit that my understanding might be
incorrect, but wouldn't it be possible that we could get the lock
dependency chain if GFP_KERNEL and scoped NOFS pcpu_alloc calls are
competing?

	fs/io lock
	pcpu_alloc_noprof(NOFS/NOIO)
					pcpu_alloc_noprof(GFP_KERNEL)
					pcpu_schedule_balance_work
					pcpu_alloc_mutex
	pcpu_alloc_mutex
					allocation deadlock through fs/io lock

This is currently not possible because constrained allocations only do
trylock. Makes sense?
Hello,

On Wed, Feb 12, 2025 at 05:57:04PM +0100, Michal Hocko wrote:
...
> I have gone with masking because that seemed the easier to review and
> more robust solution. vmalloc does support NOFS/NOIO contexts these days
> (it just uses scoped masking in those cases). Propagating the gfp

I see. Nice.

> throughout the worker code path is likely possible, but I haven't really
> explored that in detail to be sure. Would that be preferable even if the
> fix would be more involved?

Longer term, yeah, I think so.

> > Also, doesn't the above always prevent percpu allocations from doing
> > fs/io reclaims?
>
> Yes it does. Probably worth mentioning in the changelog. These
> allocations should be rare so having a constrained reclaim didn't really
> seem problematic to me. There should be kswapd running in the background
> with the full reclaim power.

Hmm... you'd be a better judge on whether that'd be okay or not, but it
does bother me that we might be increasing the chance of allocation
failures for GFP_KERNEL users, at least under memory pressure.

> > ie. Shouldn't the masking only be used if the passed-in gfp
> > doesn't allow fs/io?
>
> This is a good question. I have to admit that my understanding might be
> incorrect, but wouldn't it be possible that we could get the lock
> dependency chain if GFP_KERNEL and scoped NOFS pcpu_alloc calls are
> competing?
>
> 	fs/io lock
> 	pcpu_alloc_noprof(NOFS/NOIO)
> 					pcpu_alloc_noprof(GFP_KERNEL)
> 					pcpu_schedule_balance_work
> 					pcpu_alloc_mutex
> 	pcpu_alloc_mutex
> 					allocation deadlock through fs/io lock
>
> This is currently not possible because constrained allocations only do
> trylock.

Right, the current locking in the expansion path is really simple because
it was assuming everyone would be doing GFP_KERNEL allocations. We'd have
to break up the locking so that allocations are done outside locking,
which hopefully shouldn't be too complicated.

Thanks.
On Wed 12-02-25 08:14:35, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 12, 2025 at 05:57:04PM +0100, Michal Hocko wrote:
> ...
> > I have gone with masking because that seemed the easier to review and
> > more robust solution. vmalloc does support NOFS/NOIO contexts these
> > days (it just uses scoped masking in those cases). Propagating the gfp
>
> I see. Nice.
>
> > throughout the worker code path is likely possible, but I haven't
> > really explored that in detail to be sure. Would that be preferable
> > even if the fix would be more involved?
>
> Longer term, yeah, I think so.

I can invest more time into that direction if this is really the preferred
way. Not my call, but I would argue that the scope interface is actually a
good fit in the current implementation because it clearly defines the
scope of all allocation contexts at a single place. Ideally with a good
explanation of why that is (I guess I owe one in that regard).

> > > Also, doesn't the above always prevent percpu allocations from doing
> > > fs/io reclaims?
> >
> > Yes it does. Probably worth mentioning in the changelog. These
> > allocations should be rare so having a constrained reclaim didn't
> > really seem problematic to me. There should be kswapd running in the
> > background with the full reclaim power.
>
> Hmm... you'd be a better judge on whether that'd be okay or not, but it
> does bother me that we might be increasing the chance of allocation
> failures for GFP_KERNEL users, at least under memory pressure.

Nope, this will not change the allocation failure mode. Reclaim
constraints do not change the failure mode; they just change how much the
allocation might struggle to reclaim in order to succeed.

My undocumented assumption (another debt on my end) is that pcp
allocations are not hot paths. So the worst case is that a GFP_KERNEL
pcp allocation could have been satisfied _easier_ (i.e. faster) because
it could have reclaimed fs/io caches, and now it needs to rely on kswapd
to do that in memory tight situations. On the other hand we currently
have a situation where NOIO/NOFS allocations fail prematurely, so there
are certainly some pros and cons.

As I've said, I am no pcp allocator expert so I cannot really make proper
judgment calls. I can improve the changelog or move from the scope to
specific gfp flags, but I do not feel like I am positioned to make deeper
changes to the subsystem.
Hello,

On Wed, Feb 12, 2025 at 09:53:20PM +0100, Michal Hocko wrote:
...
> > Hmm... you'd be a better judge on whether that'd be okay or not, but
> > it does bother me that we might be increasing the chance of allocation
> > failures for GFP_KERNEL users, at least under memory pressure.
>
> Nope, this will not change the allocation failure mode. Reclaim
> constraints do not change the failure mode; they just change how much
> the allocation might struggle to reclaim in order to succeed.
>
> My undocumented assumption (another debt on my end) is that pcp
> allocations are not hot paths. So the worst case is that a GFP_KERNEL
> pcp allocation could have been satisfied _easier_ (i.e. faster) because
> it could have reclaimed fs/io caches, and now it needs to rely on kswapd
> to do that in memory tight situations. On the other hand we currently
> have a situation where NOIO/NOFS allocations fail prematurely, so there
> are certainly some pros and cons.

I'm having a hard time following. Are you saying that it won't increase
the likelihood of allocation failures even under memory pressure, but that
it might just make allocations take longer to succeed?

NOFS/NOIO prevents an allocation attempt from entering fs/io reclaim
paths, right? It would still trigger kswapd for reclaim, but can the
allocation attempt wait for that to finish? If so, wouldn't that
constitute a dependency cycle all the same?

All in all, percpu allocations taking longer under memory pressure is
fine. Becoming more prone to allocation failures, especially for
GFP_KERNEL callers, probably isn't great.

> As I've said, I am no pcp allocator expert so I cannot really make
> proper judgment calls. I can improve the changelog or move from the
> scope to specific gfp flags, but I do not feel like I am positioned to
> make deeper changes to the subsystem.

I don't think deciding whether always using NOIO/NOFS is a good idea
requires knowing the percpu allocator that well. It's just depending on
the underlying page allocator for that part.

Thanks.
Hello,

On Wed, Feb 12, 2025 at 11:30:08AM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 12, 2025 at 09:53:20PM +0100, Michal Hocko wrote:
> ...
> > Nope, this will not change the allocation failure mode. Reclaim
> > constraints do not change the failure mode; they just change how much
> > the allocation might struggle to reclaim in order to succeed.
>
> I'm having a hard time following. Are you saying that it won't increase
> the likelihood of allocation failures even under memory pressure, but
> that it might just make allocations take longer to succeed?
>
> NOFS/NOIO prevents an allocation attempt from entering fs/io reclaim
> paths, right? It would still trigger kswapd for reclaim, but can the
> allocation attempt wait for that to finish? If so, wouldn't that
> constitute a dependency cycle all the same?
>
> All in all, percpu allocations taking longer under memory pressure is
> fine. Becoming more prone to allocation failures, especially for
> GFP_KERNEL callers, probably isn't great.

Wait, I think I'm interpreting this change differently. This is
preventing the worker from allocating backing pages via GFP_KERNEL. It
isn't preventing an allocation via alloc_percpu() from being GFP_KERNEL
and providing those flags down to the backing page code. alloc_percpu()
for GFP_KERNEL allocations will populate the pages before returning.

I'm reading this as potentially making atomic percpu allocations fail as
we might be low on backing pages. This change makes the worker now need
to wait for kswapd to give it pages. Consequently, if there are a lot of
allocations coming in when it's low, we might burn a bit of cpu from the
worker now.

We could take the time to split out pcpu_alloc_mutex and pcpu_lock more
to provide finer grained / concurrent allocations. But I don't currently
have a justification for it.

> > As I've said, I am no pcp allocator expert so I cannot really make
> > proper judgment calls. I can improve the changelog or move from the
> > scope to specific gfp flags, but I do not feel like I am positioned to
> > make deeper changes to the subsystem.
>
> I don't think deciding whether always using NOIO/NOFS is a good idea
> requires knowing the percpu allocator that well. It's just depending on
> the underlying page allocator for that part.
>
> Thanks.
>
> --
> tejun

Thanks,
Dennis
On Wed 12-02-25 11:30:08, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 12, 2025 at 09:53:20PM +0100, Michal Hocko wrote:
> ...
> > Nope, this will not change the allocation failure mode. Reclaim
> > constraints do not change the failure mode; they just change how much
> > the allocation might struggle to reclaim in order to succeed.
> >
> > My undocumented assumption (another debt on my end) is that pcp
> > allocations are not hot paths. So the worst case is that a GFP_KERNEL
> > pcp allocation could have been satisfied _easier_ (i.e. faster)
> > because it could have reclaimed fs/io caches, and now it needs to rely
> > on kswapd to do that in memory tight situations. On the other hand we
> > currently have a situation where NOIO/NOFS allocations fail
> > prematurely, so there are certainly some pros and cons.
>
> I'm having a hard time following. Are you saying that it won't increase
> the likelihood of allocation failures even under memory pressure, but
> that it might just make allocations take longer to succeed?

Yes. This is like any other non-costly (<= PAGE_ALLOC_COSTLY_ORDER)
NOFS/NOIO allocation, which effectively never fails.
On Wed 12-02-25 13:39:31, Dennis Zhou wrote:
> Hello,
>
> On Wed, Feb 12, 2025 at 11:30:08AM -1000, Tejun Heo wrote:
> ...
> > NOFS/NOIO prevents an allocation attempt from entering fs/io reclaim
> > paths, right? It would still trigger kswapd for reclaim, but can the
> > allocation attempt wait for that to finish? If so, wouldn't that
> > constitute a dependency cycle all the same?
> >
> > All in all, percpu allocations taking longer under memory pressure is
> > fine. Becoming more prone to allocation failures, especially for
> > GFP_KERNEL callers, probably isn't great.
>
> Wait, I think I'm interpreting this change differently. This is
> preventing the worker from allocating backing pages via GFP_KERNEL. It
> isn't preventing an allocation via alloc_percpu() from being GFP_KERNEL
> and providing those flags down to the backing page code. alloc_percpu()
> for GFP_KERNEL allocations will populate the pages before returning.

Correct.

> I'm reading this as potentially making atomic percpu allocations fail as
> we might be low on backing pages. This change makes the worker now need
> to wait for kswapd to give it pages. Consequently, if there are a lot of
> allocations coming in when it's low, we might burn a bit of cpu from the
> worker now.

Yes, this is a potential side effect. On the other hand, NOFS/NOIO
requests wouldn't be considered atomic anymore and they wouldn't fail
that easily. Maybe that is an odd case not worth the additional worker
overhead. As I've said, I am not familiar with the pcp internals to know
how often the worker is really required.
On Fri, Feb 14, 2025 at 04:52:42PM +0100, Michal Hocko wrote:
> On Wed 12-02-25 13:39:31, Dennis Zhou wrote:
> ...
> > Wait, I think I'm interpreting this change differently. This is
> > preventing the worker from allocating backing pages via GFP_KERNEL. It
> > isn't preventing an allocation via alloc_percpu() from being
> > GFP_KERNEL and providing those flags down to the backing page code.
> > alloc_percpu() for GFP_KERNEL allocations will populate the pages
> > before returning.
>
> Correct.
>
> > I'm reading this as potentially making atomic percpu allocations fail
> > as we might be low on backing pages. This change makes the worker now
> > need to wait for kswapd to give it pages. Consequently, if there are a
> > lot of allocations coming in when it's low, we might burn a bit of cpu
> > from the worker now.
>
> Yes, this is a potential side effect. On the other hand, NOFS/NOIO
> requests wouldn't be considered atomic anymore and they wouldn't fail
> that easily. Maybe that is an odd case not worth the additional worker
> overhead. As I've said, I am not familiar with the pcp internals to know
> how often the worker is really required.

I've thought about this in the back of my head for the past few weeks. I
think I have 2 questions about this change.

1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
   allocations should be okay because that more or less is control plane
   time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
   workaround?

2. This change breaks the feedback loop as we discussed above.
   Historically we've targeted 2-4 free pages worth of percpu memory.
   This is done by kicking off the percpu work. That does GFP_KERNEL
   allocations, and if that requires reclaim then it goes and does it.
   However, now we're saying kswapd is going to work in parallel while
   we try to get pages in the worker thread.

   Given you're more versed in the reclaim side: I presume it must be
   pretty bad if we're failing to get order-0 pages even if we have
   NOFS/NOIO set?

My feeling is that we should add back some knowledge of the dependency so
if the worker fails to get pages, it doesn't reschedule immediately.
Maybe it's as simple as adding a sleep in the worker or playing with
delayed work...

Thanks,
Dennis
On 2/21/25 03:36, Dennis Zhou wrote:
> I've thought about this in the back of my head for the past few weeks. I
> think I have 2 questions about this change.
>
> 1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
>    allocations should be okay because that more or less is control plane
>    time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
>    workaround?

This solves the iscsid case but not other cases, where GFP_KERNEL
allocations are fundamentally impossible.

> 2. This change breaks the feedback loop as we discussed above.
>    Historically we've targeted 2-4 free pages worth of percpu memory.
>    This is done by kicking off the percpu work. That does GFP_KERNEL
>    allocations, and if that requires reclaim then it goes and does it.
>    However, now we're saying kswapd is going to work in parallel while
>    we try to get pages in the worker thread.
>
>    Given you're more versed in the reclaim side: I presume it must be
>    pretty bad if we're failing to get order-0 pages even if we have
>    NOFS/NOIO set?

IMHO yes, so I don't think we need to pre-emptively fear that situation
that much. OTOH in the current state, depleting pcpu's atomic reserves
and failing pcpu_alloc due to not being allowed to take the mutex can
happen easily, even if there's plenty of free memory.

> My feeling is that we should add back some knowledge of the dependency
> so if the worker fails to get pages, it doesn't reschedule immediately.
> Maybe it's as simple as adding a sleep in the worker or playing with
> delayed work...

I think if we wanted things to be more robust (and perhaps there's no
need to, see above), the best way would be to make the worker preallocate
with GFP_KERNEL outside of pcpu_alloc_mutex. I assume it's probably not
easy to implement as page table allocations are involved in the process
and we don't have a way to supply preallocated memory for those.

> Thanks,
> Dennis
diff --git a/mm/percpu.c b/mm/percpu.c
index d8dd31a2e407..192c2a8e901d 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1758,7 +1758,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 	gfp = current_gfp_context(gfp);
 	/* whitelisted flags that can be passed to the backing allocators */
 	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-	is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
+	is_atomic = !gfpflags_allow_blocking(gfp);
 	do_warn = !(gfp & __GFP_NOWARN);
 
 	/*
@@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
 	 * to move fully free chunks to the active list to be freed if
 	 * appropriate.
+	 *
+	 * Enforce GFP_NOIO allocations because we have pcpu_alloc users
+	 * constrained to GFP_NOIO/NOFS contexts and they could form lock
+	 * dependency through pcpu_alloc_mutex
 	 */
+	unsigned int flags = memalloc_noio_save();
 	mutex_lock(&pcpu_alloc_mutex);
 	spin_lock_irq(&pcpu_lock);
 
@@ -2215,6 +2220,7 @@ static void pcpu_balance_workfn(struct work_struct *work)
 
 	spin_unlock_irq(&pcpu_lock);
 	mutex_unlock(&pcpu_alloc_mutex);
+	memalloc_noio_restore(flags);
 }
 
 /**