| Message ID | 20220613125622.18628-8-mgorman@techsingularity.net (mailing list archive) |
|---|---|
| State | New |
| Series | Drain remote per-cpu directly |
On Mon, Jun 13, 2022 at 8:54 AM Mel Gorman <mgorman@techsingularity.net> wrote:

...

> +#define pcpu_spin_trylock_irqsave(type, member, ptr, flags)	\
> +({								\
> +	type *_ret;						\
> +	pcpu_task_pin();					\
> +	_ret = this_cpu_ptr(ptr);				\
> +	if (!spin_trylock_irqsave(&_ret->member, flags))	\
> +		_ret = NULL;					\

I'm getting "BUG: sleeping function called from invalid context" with
mm-everything-2022-06-14-19-05. Perhaps missing a pcpu_task_unpin() here?

> +	_ret;							\
> +})
Hi Mel,

On 13.06.2022 14:56, Mel Gorman wrote:
> struct per_cpu_pages is no longer strictly local as PCP lists can be
> drained remotely using a lock for protection. While the use of local_lock
> works, it goes against the intent of local_lock which is for "pure
> CPU local concurrency control mechanisms and not suited for inter-CPU
> concurrency control" (Documentation/locking/locktypes.rst)
>
> local_lock protects against migration between when the percpu pointer is
> accessed and the pcp->lock acquired. The lock acquisition is a preemption
> point so in the worst case, a task could migrate to another NUMA node
> and accidentally allocate remote memory. The main requirement is to pin
> the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
>
> Replace local_lock with helpers that pin a task to a CPU, lookup the
> per-cpu structure and acquire the embedded lock. It's similar to local_lock
> without breaking the intent behind the API. It is not a complete API
> as only the parts needed for PCP-alloc are implemented but in theory,
> the generic helpers could be promoted to a general API if there was
> demand for an embedded lock within a per-cpu struct with a guarantee
> that the per-cpu structure locked matches the running CPU and cannot use
> get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> accidentally allocating remote memory.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

This patch landed in linux next-20220614 as commit 54bcdc6744e3
("mm/page_alloc: replace local_lock with normal spinlock"). Unfortunately
it causes some serious issues when certain kernel debugging options
(CONFIG_PROVE_LOCKING and CONFIG_DEBUG_ATOMIC_SLEEP) are enabled. I've
observed this on various ARM 64-bit and 32-bit boards.

In the logs I see lots of errors like:

BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:274

BUG: scheduling while atomic: systemd-udevd/288/0x00000002

BUG: sleeping function called from invalid context at mm/filemap.c:2647

however there are also fatal ones like:

Unable to handle kernel paging request at virtual address 00000000017a87b4

The issues seem to be a bit random. It looks like memory trashing.
Reverting $subject on top of current linux-next fixes all those issues.
Let me know how I can help debug this.

Best regards
On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> In the logs I see lots of errors like:
>
> BUG: sleeping function called from invalid context at
> ./include/linux/sched/mm.h:274
>
> BUG: scheduling while atomic: systemd-udevd/288/0x00000002
>
> BUG: sleeping function called from invalid context at mm/filemap.c:2647
>
> however there are also fatal ones like:
>
> Unable to handle kernel paging request at virtual address 00000000017a87b4
>
> The issues seem to be a bit random. It looks like memory trashing.
> Reverting $subject on top of current linux-next fixes all those issues.

This?

--- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
+++ a/mm/page_alloc.c
@@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
 	type *_ret;						\
 	pcpu_task_pin();					\
 	_ret = this_cpu_ptr(ptr);				\
-	if (!spin_trylock_irqsave(&_ret->member, flags))	\
+	if (!spin_trylock_irqsave(&_ret->member, flags)) {	\
+		pcpu_task_unpin();				\
 		_ret = NULL;					\
+	}							\
 	_ret;							\
 })

I'll drop Mel's patch for next -next.
On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> ...
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
>  	type *_ret;						\
>  	pcpu_task_pin();					\
>  	_ret = this_cpu_ptr(ptr);				\
> -	if (!spin_trylock_irqsave(&_ret->member, flags))	\
> +	if (!spin_trylock_irqsave(&_ret->member, flags)) {	\
> +		pcpu_task_unpin();				\
> 		_ret = NULL;					\
> +	}							\
>  	_ret;							\
>  })
>
> I'll drop Mel's patch for next -next.

While we are at it, please consider this cleanup:

 mm/page_alloc.c | 48 +++++++++---------------------------------------
 1 file changed, 9 insertions(+), 39 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e538dde2c1c0..a1b76d5fdf75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -160,61 +160,31 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
  * Generic helper to lookup and a per-cpu variable with an embedded spinlock.
  * Return value should be used with equivalent unlock helper.
  */
-#define pcpu_spin_lock(type, member, ptr)			\
-({								\
-	type *_ret;						\
-	pcpu_task_pin();					\
-	_ret = this_cpu_ptr(ptr);				\
-	spin_lock(&_ret->member);				\
-	_ret;							\
-})
-
-#define pcpu_spin_lock_irqsave(type, member, ptr, flags)	\
-({								\
-	type *_ret;						\
-	pcpu_task_pin();					\
-	_ret = this_cpu_ptr(ptr);				\
-	spin_lock_irqsave(&_ret->member, flags);		\
-	_ret;							\
-})
-
-#define pcpu_spin_trylock_irqsave(type, member, ptr, flags)	\
-({								\
-	type *_ret;						\
-	pcpu_task_pin();					\
-	_ret = this_cpu_ptr(ptr);				\
-	if (!spin_trylock_irqsave(&_ret->member, flags))	\
-		_ret = NULL;					\
-	_ret;							\
-})
-
-#define pcpu_spin_unlock(member, ptr)				\
-({								\
-	spin_unlock(&ptr->member);				\
-	pcpu_task_unpin();					\
-})
-
-#define pcpu_spin_unlock_irqrestore(member, ptr, flags)		\
-({								\
-	spin_unlock_irqrestore(&ptr->member, flags);		\
-	pcpu_task_unpin();					\
-})
-
-/* struct per_cpu_pages specific helpers. */
-#define pcp_spin_lock(ptr)					\
-	pcpu_spin_lock(struct per_cpu_pages, lock, ptr)
-
 #define pcp_spin_lock_irqsave(ptr, flags)			\
-	pcpu_spin_lock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+({								\
+	struct per_cpu_pages *_ret;				\
+	pcpu_task_pin();					\
+	_ret = this_cpu_ptr(ptr);				\
+	spin_lock_irqsave(&_ret->lock, flags);			\
+	_ret;							\
+})
 
 #define pcp_spin_trylock_irqsave(ptr, flags)			\
-	pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
-
-#define pcp_spin_unlock(ptr)					\
-	pcpu_spin_unlock(lock, ptr)
+({								\
+	struct per_cpu_pages *_ret;				\
+	pcpu_task_pin();					\
+	_ret = this_cpu_ptr(ptr);				\
+	if (!spin_trylock_irqsave(&_ret->lock, flags))		\
+		_ret = NULL;					\
+	_ret;							\
+})
 
 #define pcp_spin_unlock_irqrestore(ptr, flags)			\
-	pcpu_spin_unlock_irqrestore(lock, ptr, flags)
+({								\
+	spin_unlock_irqrestore(&ptr->lock, flags);		\
+	pcpu_task_unpin();					\
+})
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -3488,7 +3458,7 @@ void free_unref_page(struct page *page, unsigned int order)
 
 	zone = page_zone(page);
 	pcp_trylock_prepare(UP_flags);
-	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
 	if (pcp) {
 		free_unref_page_commit(pcp, zone, page, migratetype, order);
 		pcp_spin_unlock_irqrestore(pcp, flags);
On 6/13/22 14:56, Mel Gorman wrote:
> struct per_cpu_pages is no longer strictly local as PCP lists can be
> drained remotely using a lock for protection. While the use of local_lock
> works, it goes against the intent of local_lock which is for "pure
> CPU local concurrency control mechanisms and not suited for inter-CPU
> concurrency control" (Documentation/locking/locktypes.rst)
>
> local_lock protects against migration between when the percpu pointer is
> accessed and the pcp->lock acquired. The lock acquisition is a preemption
> point so in the worst case, a task could migrate to another NUMA node
> and accidentally allocate remote memory. The main requirement is to pin
> the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
>
> Replace local_lock with helpers that pin a task to a CPU, lookup the
> per-cpu structure and acquire the embedded lock. It's similar to local_lock
> without breaking the intent behind the API. It is not a complete API
> as only the parts needed for PCP-alloc are implemented but in theory,
> the generic helpers could be promoted to a general API if there was
> demand for an embedded lock within a per-cpu struct with a guarantee
> that the per-cpu structure locked matches the running CPU and cannot use
> get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> accidentally allocating remote memory.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

...

> @@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
>  	return min(READ_ONCE(pcp->batch) << 2, high);
>  }
>  
> -/* Returns true if the page was committed to the per-cpu list. */
> -static bool free_unref_page_commit(struct page *page, int migratetype,
> -			unsigned int order, bool locked)
> +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
> +			struct page *page, int migratetype,
> +			unsigned int order)

Hmm, given this drops the "bool locked" and bool return value again, my
suggestion for patch 5/7 would result in less churn as those wouldn't
need to be introduced?

...

> @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
>  	struct list_head *list;
>  	struct page *page;
>  	unsigned long flags;
> +	unsigned long __maybe_unused UP_flags;
>  
> -	local_lock_irqsave(&pagesets.lock, flags);
> +	/*
> +	 * spin_trylock_irqsave is not necessary right now as it'll only be
> +	 * true when contending with a remote drain. It's in place as a
> +	 * preparation step before converting pcp locking to spin_trylock
> +	 * to protect against IRQ reentry.
> +	 */
> +	pcp_trylock_prepare(UP_flags);
> +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> +	if (!pcp)

Besides the missing unpin Andrew fixed, I think this is also missing
pcp_trylock_finish(UP_flags)?

> +		return NULL;
>  
>  	/*
>  	 * On allocation, reduce the number of pages that are batch freed.
>  	 * See nr_pcp_free() where free_factor is increased for subsequent
>  	 * frees.
>  	 */
> -	pcp = this_cpu_ptr(zone->per_cpu_pageset);
>  	pcp->free_factor >>= 1;
>  	list = &pcp->lists[order_to_pindex(migratetype, order)];
> -	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false);
> -	local_unlock_irqrestore(&pagesets.lock, flags);
> +	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
> +	pcp_spin_unlock_irqrestore(pcp, flags);
> +	pcp_trylock_finish(UP_flags);
>  	if (page) {
>  		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
>  		zone_statistics(preferred_zone, zone, 1);
> @@ -5410,10 +5431,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  		goto failed;
>  
>  	/* Attempt the batch allocation */
> -	local_lock_irqsave(&pagesets.lock, flags);
> -	pcp = this_cpu_ptr(zone->per_cpu_pageset);
> +	pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags);
>  	pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
> -	spin_lock(&pcp->lock);
>  
>  	while (nr_populated < nr_pages) {
>  
> @@ -5424,13 +5443,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  		}
>  
>  		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
> -					 pcp, pcp_list, true);
> +					 pcp, pcp_list);
>  		if (unlikely(!page)) {
>  			/* Try and allocate at least one page */
> -			if (!nr_account) {
> -				spin_unlock(&pcp->lock);
> +			if (!nr_account)
>  				goto failed_irq;
> -			}
>  			break;
>  		}
>  		nr_account++;
> @@ -5443,8 +5460,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  		nr_populated++;
>  	}
>  
> -	spin_unlock(&pcp->lock);
> -	local_unlock_irqrestore(&pagesets.lock, flags);
> +	pcp_spin_unlock_irqrestore(pcp, flags);
>  
>  	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>  	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> @@ -5453,7 +5469,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	return nr_populated;
>  
>  failed_irq:
> -	local_unlock_irqrestore(&pagesets.lock, flags);
> +	pcp_spin_unlock_irqrestore(pcp, flags);
>  
>  failed:
>  	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 6/13/22 14:56, Mel Gorman wrote:
>
> ...
>
> > @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> >  	struct list_head *list;
> >  	struct page *page;
> >  	unsigned long flags;
> > +	unsigned long __maybe_unused UP_flags;
> >  
> > -	local_lock_irqsave(&pagesets.lock, flags);
> > +	/*
> > +	 * spin_trylock_irqsave is not necessary right now as it'll only be
> > +	 * true when contending with a remote drain. It's in place as a
> > +	 * preparation step before converting pcp locking to spin_trylock
> > +	 * to protect against IRQ reentry.
> > +	 */
> > +	pcp_trylock_prepare(UP_flags);
> > +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> > +	if (!pcp)
>
> Besides the missing unpin Andrew fixed, I think this is also missing
> pcp_trylock_finish(UP_flags)?

spin_trylock only fails when trylock_finish is a NOP.
Hi Andrew,

On 16.06.2022 01:04, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> ...
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
>  	type *_ret;						\
>  	pcpu_task_pin();					\
>  	_ret = this_cpu_ptr(ptr);				\
> -	if (!spin_trylock_irqsave(&_ret->member, flags))	\
> +	if (!spin_trylock_irqsave(&_ret->member, flags)) {	\
> +		pcpu_task_unpin();				\
> 		_ret = NULL;					\
> +	}							\
>  	_ret;							\
>  })
>
> I'll drop Mel's patch for next -next.

Yes, this fixes the issues I've observed. Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
On 6/16/22 05:05, Yu Zhao wrote:
> On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
>
> While we are at it, please consider this cleanup:

I suspect Mel had further plans for the API beyond this series.

...

> #define pcp_spin_trylock_irqsave(ptr, flags)			\
> -	pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
> -
> -#define pcp_spin_unlock(ptr)					\
> -	pcpu_spin_unlock(lock, ptr)
> +({								\
> +	struct per_cpu_pages *_ret;				\
> +	pcpu_task_pin();					\
> +	_ret = this_cpu_ptr(ptr);				\
> +	if (!spin_trylock_irqsave(&_ret->lock, flags))		\

Also missing the unpin?

> +		_ret = NULL;					\
> +	_ret;							\
> +})
> 
> #define pcp_spin_unlock_irqrestore(ptr, flags)			\
> -	pcpu_spin_unlock_irqrestore(lock, ptr, flags)
> +({								\
> +	spin_unlock_irqrestore(&ptr->lock, flags);		\
> +	pcpu_task_unpin();					\
> +})
> +
> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> DEFINE_PER_CPU(int, numa_node);
> EXPORT_PER_CPU_SYMBOL(numa_node);
> @@ -3488,7 +3458,7 @@ void free_unref_page(struct page *page, unsigned int order)
> 
>  	zone = page_zone(page);
>  	pcp_trylock_prepare(UP_flags);
> -	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>  	if (pcp) {
>  		free_unref_page_commit(pcp, zone, page, migratetype, order);
>  		pcp_spin_unlock_irqrestore(pcp, flags);
On 6/16/22 23:07, Yu Zhao wrote:
> On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> ...
>>
>> > +	pcp_trylock_prepare(UP_flags);
>> > +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>> > +	if (!pcp)
>>
>> Besides the missing unpin Andrew fixed, I think this is also missing
>> pcp_trylock_finish(UP_flags)?
>
> spin_trylock only fails when trylock_finish is a NOP.

True, so it's not an active bug, but I would still add it, so the code
isn't confusing and doesn't depend on non-obvious details that might later
change and break it.
Hi Mel,

On Mon, 2022-06-13 at 13:56 +0100, Mel Gorman wrote:
> @@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
>  		migratetype = MIGRATE_MOVABLE;
>  	}
>  
> -	local_lock_irqsave(&pagesets.lock, flags);
> -	freed_pcp = free_unref_page_commit(page, migratetype, order, false);
> -	local_unlock_irqrestore(&pagesets.lock, flags);
> -
> -	if (unlikely(!freed_pcp))
> +	zone = page_zone(page);
> +	pcp_trylock_prepare(UP_flags);

Now that you're calling the *_irqsave() family of functions you can drop
pcp_trylock_prepare/finish().

For the record, in UP:

#define spin_trylock_irqsave(lock, flags) \
({ \
	local_irq_save(flags); \
	1; \
})

> +	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> +	if (pcp) {
> +		free_unref_page_commit(pcp, zone, page, migratetype, order);
> +		pcp_spin_unlock_irqrestore(pcp, flags);
> +	} else {
> 		free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> +	}
> +	pcp_trylock_finish(UP_flags);
>  }
>  
>  /*

As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
bypass patch #5, especially as its intent isn't true anymore:

"As preparation for dealing with both of those problems, protect the lists
with a spinlock. The IRQ-unsafe version of the lock is used because IRQs
are already disabled by local_lock_irqsave. spin_trylock is used in
preparation for a time when local_lock could be used instead of
lock_lock_irqsave."
On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> ...
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
>  	type *_ret;						\
>  	pcpu_task_pin();					\
>  	_ret = this_cpu_ptr(ptr);				\
> -	if (!spin_trylock_irqsave(&_ret->member, flags))	\
> +	if (!spin_trylock_irqsave(&_ret->member, flags)) {	\
> +		pcpu_task_unpin();				\
> 		_ret = NULL;					\
> +	}							\
>  	_ret;							\
>  })

This is the correct fix. I *had* a fix for this, but in a patch that was
not posted because it drops irqsave :(
On Thu, Jun 16, 2022 at 07:01:53PM +0200, Vlastimil Babka wrote:
> On 6/13/22 14:56, Mel Gorman wrote:
>
> ...
>
> > -/* Returns true if the page was committed to the per-cpu list. */
> > -static bool free_unref_page_commit(struct page *page, int migratetype,
> > -			unsigned int order, bool locked)
> > +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
> > +			struct page *page, int migratetype,
> > +			unsigned int order)
>
> Hmm, given this drops the "bool locked" and bool return value again, my
> suggestion for patch 5/7 would result in less churn as those wouldn't
> need to be introduced?

It would. I considered doing exactly that, but the change was significant
enough that the reviewed-bys and tested-bys would have to be dropped, and
I didn't want to do that. As multiple fixes are needed now, I'll do it.

> ...
>
> > +	pcp_trylock_prepare(UP_flags);
> > +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> > +	if (!pcp)
>
> Besides the missing unpin Andrew fixed, I think this is also missing
> pcp_trylock_finish(UP_flags)?

Yes.
On Fri, Jun 17, 2022 at 09:57:06AM +0200, Vlastimil Babka wrote:
> On 6/16/22 23:07, Yu Zhao wrote:
> > On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> ...
> >>
> >> Besides the missing unpin Andrew fixed, I think this is also missing
> >> pcp_trylock_finish(UP_flags)?
> >
> > spin_trylock only fails when trylock_finish is a NOP.
>
> True, so it's not an active bug, but I would still add it, so the code
> isn't confusing and doesn't depend on non-obvious details that might later
> change and break it.

Yes. Even though it may work, it's still wrong.
On Fri, Jun 17, 2022 at 11:39:03AM +0200, Nicolas Saenz Julienne wrote:
> Hi Mel,
>
> On Mon, 2022-06-13 at 13:56 +0100, Mel Gorman wrote:
>
> ...
>
> Now that you're calling the *_irqsave() family of functions you can drop
> pcp_trylock_prepare/finish().
>
> For the record, in UP:
>
> #define spin_trylock_irqsave(lock, flags) \
> ({ \
> 	local_irq_save(flags); \
> 	1; \
> })

The missing patch that is deferred for a later release uses spin_trylock,
so unless that is never merged because there is an unfixable flaw in it,
I'd prefer to leave the preparation in place.

> ...
>
> As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
> bypass patch #5, especially as its intent isn't true anymore:
>
> "As preparation for dealing with both of those problems, protect the lists
> with a spinlock. The IRQ-unsafe version of the lock is used because IRQs
> are already disabled by local_lock_irqsave. spin_trylock is used in
> preparation for a time when local_lock could be used instead of
> lock_lock_irqsave."

It's still true; the patch just isn't included because I wanted them to be
separated by time, so that a bisection that points to it is "obvious"
instead of pointing at the whole series as being a potential problem.
On Tue, 2022-06-21 at 10:29 +0100, Mel Gorman wrote:
> On Fri, Jun 17, 2022 at 11:39:03AM +0200, Nicolas Saenz Julienne wrote:
>
> ...
>
> > As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
> > bypass patch #5, especially as its intent isn't true anymore:
>
> It's still true; the patch just isn't included because I wanted them to be
> separated by time, so that a bisection that points to it is "obvious"
> instead of pointing at the whole series as being a potential problem.

Understood, I jumped straight into the code and missed your comment in the
cover letter. Thanks!
Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net

in testcase: kernel-selftests
version: kernel-selftests-x86_64-a10a197d-1_20220626
with following parameters:

	sc_nr_hugepages: 2
	group: vm
	ucode: 0x500320a

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt

on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>

[  202.339609][T27281] BUG: sleeping function called from invalid context at mm/gup.c:1170
[  202.339615][T27281] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27281, name: compaction_test
[  202.339617][T27281] preempt_count: 1, expected: 0
[  202.339619][T27281] 1 lock held by compaction_test/27281:
[  202.339622][T27281]  #0: ffff88911e087828 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:118 mm/gup.c:1611)
[  202.339637][T27281] CPU: 78 PID: 27281 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[  202.339641][T27281] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[  202.339643][T27281] Call Trace:
[  202.339645][T27281] <TASK>
[  202.339650][T27281] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[  202.339657][T27281] __might_resched.cold (kernel/sched/core.c:9792)
[  202.339668][T27281] __get_user_pages (include/linux/sched.h:2059 mm/gup.c:1170)
[  202.339682][T27281] ? get_gate_page (mm/gup.c:1099)
[  202.339697][T27281] ? rwsem_down_read_slowpath (kernel/locking/rwsem.c:1487)
[  202.339709][T27281] populate_vma_page_range (mm/gup.c:1518)
[  202.339715][T27281] __mm_populate (mm/gup.c:1639)
[  202.339720][T27281] ? faultin_vma_page_range (mm/gup.c:1595)
[  202.339726][T27281] ? __up_write (arch/x86/include/asm/atomic64_64.h:172 (discriminator 23) include/linux/atomic/atomic-long.h:95 (discriminator 23) include/linux/atomic/atomic-instrumented.h:1348 (discriminator 23) kernel/locking/rwsem.c:1346 (discriminator 23))
[  202.339736][T27281] vm_mmap_pgoff (include/linux/mm.h:2706 mm/util.c:557)
[  202.339745][T27281] ? randomize_page (mm/util.c:542)
[  202.339753][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[  202.339757][T27281] ? syscall_enter_from_user_mode (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 kernel/entry/common.c:109)
[  202.339768][T27281] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[  202.339779][T27281] ? __local_bh_enable (kernel/softirq.c:357)
[  202.339785][T27281] ? __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
[  202.339795][T27281] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309)
[  202.339802][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[  202.339806][T27281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
[  202.339810][T27281] RIP: 0033:0x7fdb25ea1b62
[  202.339814][T27281] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64

All code
========
   0:	e4 e8                	in     $0xe8,%al
   2:	b2 4b                	mov    $0x4b,%dl
   4:	01 00                	add    %eax,(%rax)
   6:	66 90                	xchg   %ax,%ax
   8:	41 f7 c1 ff 0f 00 00 	test   $0xfff,%r9d
   f:	75 27                	jne    0x38
  11:	55                   	push   %rbp
  12:	48 89 fd             	mov    %rdi,%rbp
  15:	53                   	push   %rbx
  16:	89 cb                	mov    %ecx,%ebx
  18:	48 85 ff             	test   %rdi,%rdi
  1b:	74 3b                	je     0x58
  1d:	41 89 da             	mov    %ebx,%r10d
  20:	48 89 ef             	mov    %rbp,%rdi
  23:	b8 09 00 00 00       	mov    $0x9,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax	<-- trapping instruction
  30:	77 66                	ja     0x98
  32:	5b                   	pop    %rbx
  33:	5d                   	pop    %rbp
  34:	c3                   	retq
  35:	0f 1f 00             	nopl   (%rax)
  38:	48 8b 05 f9 52 0c 00 	mov    0xc52f9(%rip),%rax	# 0xc5338
  3f:	64                   	fs

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 66                	ja     0x6e
   8:	5b                   	pop    %rbx
   9:	5d                   	pop    %rbp
   a:	c3                   	retq
   b:	0f 1f 00             	nopl   (%rax)
   e:	48 8b 05 f9 52 0c 00 	mov    0xc52f9(%rip),%rax	# 0xc530e
  15:	64                   	fs

[  202.339817][T27281] RSP: 002b:00007ffc53280778 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[  202.339820][T27281] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007fdb25ea1b62
[  202.339822][T27281] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
[  202.339823][T27281] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[  202.339825][T27281] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
[  202.339826][T27281] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  202.339842][T27281] </TASK>
[  202.571229][T27281] BUG: scheduling while atomic: compaction_test/27281/0x00000003
202.571235][T27281] no locks held by compaction_test/27281. [ 202.571236][T27281] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 202.571302][T27281] CPU: 78 PID: 27281 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 202.571305][T27281] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 202.571307][T27281] Call Trace: [ 202.571309][T27281] <TASK> [202.571313][T27281] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [202.571321][T27281] __schedule_bug.cold (kernel/sched/core.c:5661) [202.571328][T27281] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688) [202.571338][T27281] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324) [202.571348][T27281] ? io_schedule_timeout (kernel/sched/core.c:6310) [202.571352][T27281] ? 
vm_mmap_pgoff (include/linux/mm.h:2706 mm/util.c:557) [202.571363][T27281] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [202.571368][T27281] exit_to_user_mode_loop (kernel/entry/common.c:159) [202.571374][T27281] exit_to_user_mode_prepare (kernel/entry/common.c:201) [202.571377][T27281] syscall_exit_to_user_mode (kernel/entry/common.c:128 kernel/entry/common.c:296) [202.571383][T27281] do_syscall_64 (arch/x86/entry/common.c:87) [202.571387][T27281] ? __local_bh_enable (kernel/softirq.c:357) [202.571392][T27281] ? __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600) [202.571400][T27281] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309) [202.571407][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [202.571412][T27281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) [ 202.571416][T27281] RIP: 0033:0x7fdb25ea1b62 [ 202.571421][T27281] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64 All code ======== 0: e4 e8 in $0xe8,%al 2: b2 4b mov $0x4b,%dl 4: 01 00 add %eax,(%rax) 6: 66 90 xchg %ax,%ax 8: 41 f7 c1 ff 0f 00 00 test $0xfff,%r9d f: 75 27 jne 0x38 11: 55 push %rbp 12: 48 89 fd mov %rdi,%rbp 15: 53 push %rbx 16: 89 cb mov %ecx,%ebx 18: 48 85 ff test %rdi,%rdi 1b: 74 3b je 0x58 1d: 41 89 da mov %ebx,%r10d 20: 48 89 ef mov %rbp,%rdi 23: b8 09 00 00 00 mov $0x9,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 66 ja 0x98 32: 5b pop %rbx 33: 5d pop %rbp 34: c3 retq 35: 0f 1f 00 nopl (%rax) 38: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 
0xc5338 3f: 64 fs Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 66 ja 0x6e 8: 5b pop %rbx 9: 5d pop %rbp a: c3 retq b: 0f 1f 00 nopl (%rax) e: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 0xc530e 15: 64 fs [ 202.571423][T27281] RSP: 002b:00007ffc53280778 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 [ 202.571426][T27281] RAX: 00007fcc735a6000 RBX: 0000000000002022 RCX: 00007fdb25ea1b62 [ 202.571428][T27281] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000 [ 202.571429][T27281] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000 [ 202.571431][T27281] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170 [ 202.571432][T27281] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 202.571446][T27281] </TASK> [ 215.004337][ T1122] [ 228.735493][ T1122] [ 242.528575][ T1122] [ 256.379123][ T1122] [ 269.551898][ T569] BUG: sleeping function called from invalid context at mm/migrate.c:1380 [ 269.551906][ T569] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 569, name: kcompactd1 [ 269.551909][ T569] preempt_count: 1, expected: 0 [ 269.551912][ T569] no locks held by kcompactd1/569. [ 269.551916][ T569] CPU: 72 PID: 569 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 269.551921][ T569] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 269.551924][ T569] Call Trace: [ 269.551926][ T569] <TASK> [ 269.551934][ T569] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [ 269.551945][ T569] __might_resched.cold (kernel/sched/core.c:9792) [ 269.551958][ T569] migrate_pages (include/linux/sched.h:2059 mm/migrate.c:1380) [ 269.551971][ T569] ? isolate_freepages (mm/compaction.c:1687) [ 269.551978][ T569] ? split_map_pages (mm/compaction.c:1711) [ 269.551994][ T569] ? buffer_migrate_page_norefs (mm/migrate.c:1345) [ 269.552002][ T569] ? 
isolate_migratepages (mm/compaction.c:1959) [ 269.552023][ T569] compact_zone (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/compaction.h:68 mm/compaction.c:2419) [ 269.552054][ T569] ? compaction_suitable (mm/compaction.c:2292) [ 269.552063][ T569] ? lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5667 kernel/locking/lockdep.c:5630) [ 269.552069][ T569] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407) [ 269.552082][ T569] proactive_compact_node (mm/compaction.c:2660 (discriminator 2)) [ 269.552089][ T569] ? compact_store (mm/compaction.c:2648) [ 269.552115][ T569] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [ 269.552121][ T569] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) [ 269.552134][ T569] kcompactd (mm/compaction.c:2011 mm/compaction.c:2031 mm/compaction.c:2978) [ 269.552152][ T569] ? kcompactd_do_work (mm/compaction.c:2924) [ 269.552161][ T569] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) [ 269.552174][ T569] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) [ 269.552178][ T569] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [ 269.552183][ T569] ? kcompactd_do_work (mm/compaction.c:2924) [ 269.552193][ T569] kthread (kernel/kthread.c:376) [ 269.552196][ T569] ? 
kthread_complete_and_exit (kernel/kthread.c:331) [ 269.552206][ T569] ret_from_fork (arch/x86/entry/entry_64.S:302) [ 269.552235][ T569] </TASK> [ 269.961505][ T568] BUG: scheduling while atomic: kcompactd0/568/0x00000028 [ 269.961512][ T568] no locks held by kcompactd0/568. [ 269.961514][ T568] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 269.961581][ T568] CPU: 13 PID: 568 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 269.961585][ T568] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 269.961587][ T568] Call Trace: [ 269.961589][ T568] <TASK> [ 269.961596][ T568] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [ 269.961606][ T568] __schedule_bug.cold (kernel/sched/core.c:5661) [ 269.961615][ T568] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688) [ 269.961625][ T568] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324) [ 269.961637][ T568] ? io_schedule_timeout (kernel/sched/core.c:6310) [ 269.961641][ T568] ? find_held_lock (kernel/locking/lockdep.c:5156) [ 269.961647][ T568] ? 
prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15)) [ 269.961657][ T568] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [ 269.961662][ T568] schedule_timeout (kernel/time/timer.c:1936) [ 269.961668][ T568] ? usleep_range_state (kernel/time/timer.c:1897) [ 269.961673][ T568] ? timer_migration_handler (kernel/time/timer.c:1859) [ 269.961682][ T568] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) [ 269.961687][ T568] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15)) [ 269.961695][ T568] kcompactd (include/linux/freezer.h:121 include/linux/freezer.h:193 mm/compaction.c:2950) [ 269.961707][ T568] ? kcompactd_do_work (mm/compaction.c:2924) [ 269.961713][ T568] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) [ 269.961720][ T568] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) [ 269.961724][ T568] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [ 269.961727][ T568] ? kcompactd_do_work (mm/compaction.c:2924) [ 269.961732][ T568] kthread (kernel/kthread.c:376) [ 269.961735][ T568] ? 
kthread_complete_and_exit (kernel/kthread.c:331) [ 269.961741][ T568] ret_from_fork (arch/x86/entry/entry_64.S:302) [ 269.961758][ T568] </TASK> [ 270.347843][ T569] BUG: scheduling while atomic: kcompactd1/569/0x00000017 [ 270.347849][ T569] no locks held by kcompactd1/569. [ 270.347851][ T569] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 270.347911][ T569] CPU: 72 PID: 569 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 270.347915][ T569] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 270.347917][ T569] Call Trace: [ 270.347920][ T569] <TASK> [ 270.347926][ T569] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [ 270.347935][ T569] __schedule_bug.cold (kernel/sched/core.c:5661) [ 270.347944][ T569] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688) [ 270.347955][ T569] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324) [ 270.347967][ T569] ? io_schedule_timeout (kernel/sched/core.c:6310) [ 270.347970][ T569] ? find_held_lock (kernel/locking/lockdep.c:5156) [ 270.347977][ T569] ? 
prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15)) [ 270.347987][ T569] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [ 270.347993][ T569] schedule_timeout (kernel/time/timer.c:1936) [ 270.347999][ T569] ? usleep_range_state (kernel/time/timer.c:1897) [ 270.348004][ T569] ? timer_migration_handler (kernel/time/timer.c:1859) [ 270.348013][ T569] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) [ 270.348018][ T569] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15)) [ 270.348025][ T569] kcompactd (include/linux/freezer.h:121 include/linux/freezer.h:193 mm/compaction.c:2950) [ 270.348040][ T569] ? kcompactd_do_work (mm/compaction.c:2924) [ 270.348045][ T569] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) [ 270.348053][ T569] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) [ 270.348057][ T569] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1)) [ 270.348059][ T569] ? kcompactd_do_work (mm/compaction.c:2924) [ 270.348065][ T569] kthread (kernel/kthread.c:376) [ 270.348068][ T569] ? 
kthread_complete_and_exit (kernel/kthread.c:331) [ 270.348073][ T569] ret_from_fork (arch/x86/entry/entry_64.S:302) [ 270.348092][ T569] </TASK> [ 270.616627][ T1122] [ 270.768074][T27574] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 [ 270.768078][T27574] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27574, name: date [ 270.768080][T27574] preempt_count: 1, expected: 0 [ 270.768082][T27574] 1 lock held by date/27574: [270.768084][T27574] #0: ffff88820bd53228 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:137 arch/x86/mm/fault.c:1338) [ 270.768098][T27574] CPU: 4 PID: 27574 Comm: date Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 270.768101][T27574] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 270.768103][T27574] Call Trace: [ 270.768104][T27574] <TASK> [270.768108][T27574] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [270.768113][T27574] __might_resched.cold (kernel/sched/core.c:9792) [270.768120][T27574] ? __pmd_alloc (mm/memory.c:5763 include/linux/mm.h:2304 include/linux/mm.h:2390 include/linux/mm.h:2426 include/asm-generic/pgalloc.h:129 mm/memory.c:5214) [270.768125][T27574] kmem_cache_alloc (include/linux/sched/mm.h:274 mm/slab.h:723 mm/slub.c:3128 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239) [270.768137][T27574] __pmd_alloc (mm/memory.c:5763 include/linux/mm.h:2304 include/linux/mm.h:2390 include/linux/mm.h:2426 include/asm-generic/pgalloc.h:129 mm/memory.c:5214) [270.768144][T27574] __handle_mm_fault (include/linux/mm.h:2254 mm/memory.c:5003) [270.768155][T27574] ? copy_page_range (mm/memory.c:4955) [270.768159][T27574] ? __lock_release (kernel/locking/lockdep.c:5341) [270.768172][T27574] ? lock_is_held_type (kernel/locking/lockdep.c:5406 kernel/locking/lockdep.c:5708) [270.768181][T27574] ? 
handle_mm_fault (include/linux/rcupdate.h:274 include/linux/rcupdate.h:728 include/linux/memcontrol.h:1087 include/linux/memcontrol.h:1075 mm/memory.c:5120) [270.768188][T27574] handle_mm_fault (mm/memory.c:5140) [270.768195][T27574] do_user_addr_fault (arch/x86/mm/fault.c:1397) [270.768206][T27574] exc_page_fault (arch/x86/include/asm/irqflags.h:29 arch/x86/include/asm/irqflags.h:70 arch/x86/include/asm/irqflags.h:130 arch/x86/mm/fault.c:1492 arch/x86/mm/fault.c:1540) [270.768211][T27574] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570) [270.768215][T27574] RIP: 0010:__clear_user (arch/x86/lib/usercopy_64.c:24) [ 270.768220][T27574] Code: 00 00 00 e8 a2 28 56 ff 0f 01 cb 48 89 d8 48 c1 eb 03 48 89 ef 83 e0 07 48 89 d9 48 85 c9 74 19 66 2e 0f 1f 84 00 00 00 00 00 <48> c7 07 00 00 00 00 48 83 c7 08 ff c9 75 f1 48 89 c1 85 c9 74 0a All code ======== 0: 00 00 add %al,(%rax) 2: 00 e8 add %ch,%al 4: a2 28 56 ff 0f 01 cb movabs %al,0x8948cb010fff5628 b: 48 89 d: d8 48 c1 fmuls -0x3f(%rax) 10: eb 03 jmp 0x15 12: 48 89 ef mov %rbp,%rdi 15: 83 e0 07 and $0x7,%eax 18: 48 89 d9 mov %rbx,%rcx 1b: 48 85 c9 test %rcx,%rcx 1e: 74 19 je 0x39 20: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 27: 00 00 00 2a:* 48 c7 07 00 00 00 00 movq $0x0,(%rdi) <-- trapping instruction 31: 48 83 c7 08 add $0x8,%rdi 35: ff c9 dec %ecx 37: 75 f1 jne 0x2a 39: 48 89 c1 mov %rax,%rcx 3c: 85 c9 test %ecx,%ecx 3e: 74 0a je 0x4a Code starting with the faulting instruction =========================================== 0: 48 c7 07 00 00 00 00 movq $0x0,(%rdi) 7: 48 83 c7 08 add $0x8,%rdi b: ff c9 dec %ecx d: 75 f1 jne 0x0 f: 48 89 c1 mov %rax,%rcx 12: 85 c9 test %ecx,%ecx 14: 74 0a je 0x20 [ 270.768223][T27574] RSP: 0018:ffffc900350dfb28 EFLAGS: 00050202 [ 270.768226][T27574] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4 [ 270.768227][T27574] RDX: 0000000000000000 RSI: ffff88820bd53228 RDI: 00005649441d92e0 [ 270.768229][T27574] RBP: 00005649441d92e0 R08: ffff88a0589ec810 R09: 
ffffffff85f06fa7 [ 270.768231][T27574] R10: fffffbfff0be0df4 R11: 0000000000000001 R12: 0000000000000000 [ 270.768232][T27574] R13: 000000000001c498 R14: 00005649441d92e0 R15: 000000000001c2e0 [270.768249][T27574] ? __clear_user (arch/x86/include/asm/smap.h:39 arch/x86/lib/usercopy_64.c:23) [270.768252][T27574] load_elf_binary (fs/binfmt_elf.c:143 fs/binfmt_elf.c:1244) [270.768279][T27574] ? load_elf_interp+0xa80/0xa80 [270.768285][T27574] ? search_binary_handler (fs/exec.c:1728) [270.768297][T27574] search_binary_handler (fs/exec.c:1728) [270.768302][T27574] ? bprm_change_interp (fs/exec.c:1707) [270.768310][T27574] ? exec_binprm (include/linux/rcupdate.h:274 include/linux/rcupdate.h:728 fs/exec.c:1761) [270.768317][T27574] exec_binprm (fs/exec.c:1770) [270.768325][T27574] bprm_execve (fs/exec.c:1920) [270.768330][T27574] ? bprm_execve (fs/exec.c:1474 fs/exec.c:1806) [270.768336][T27574] do_execveat_common+0x4c7/0x680 [270.768344][T27574] ? getname_flags (fs/namei.c:205) [270.768350][T27574] __x64_sys_execve (fs/exec.c:2088) [270.768356][T27574] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [270.768361][T27574] ? do_user_addr_fault (arch/x86/mm/fault.c:1422) [270.768367][T27574] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309) [270.768374][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [270.768379][T27574] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) [ 270.768384][T27574] RIP: 0033:0x7f1a7a7936c7 [ 270.768390][T27574] Code: Unable to access opcode bytes at RIP 0x7f1a7a79369d. 
Code starting with the faulting instruction =========================================== [ 270.768392][T27574] RSP: 002b:00007ffe741919f8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b [ 270.768394][T27574] RAX: ffffffffffffffda RBX: 00005643b084c428 RCX: 00007f1a7a7936c7 [ 270.768396][T27574] RDX: 00005643b084ff48 RSI: 00005643b084c428 RDI: 00005643b0850208 [ 270.768397][T27574] RBP: 00005643b079246e R08: 00005643b0792470 R09: 00005643b079247b [ 270.768398][T27574] R10: 000000000000006e R11: 0000000000000246 R12: 00005643b084ff48 [ 270.768400][T27574] R13: 0000000000000002 R14: 00005643b084ff48 R15: 00005643b0850208 [ 270.768415][T27574] </TASK> [ 270.768815][T27574] BUG: scheduling while atomic: date/27574/0x00000002 [ 270.768818][T27574] no locks held by date/27574. [ 270.768819][T27574] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 270.768871][T27574] CPU: 4 PID: 27574 Comm: date Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1 [ 270.768874][T27574] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 270.768876][T27574] Call Trace: [ 270.768878][T27574] <TASK> [270.768881][T27574] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [270.768886][T27574] __schedule_bug.cold (kernel/sched/core.c:5661) [270.768892][T27574] schedule_debug 
(arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688) [270.768900][T27574] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324) [270.768907][T27574] ? rwlock_bug+0xc0/0xc0 [270.768913][T27574] ? io_schedule_timeout (kernel/sched/core.c:6310) [270.768919][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [270.768923][T27574] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) [270.768931][T27574] do_task_dead (kernel/sched/core.c:6447 (discriminator 4)) [270.768938][T27574] do_exit (include/trace/events/sched.h:333 kernel/exit.c:786) [270.768948][T27574] do_group_exit (kernel/exit.c:906) [270.768955][T27574] get_signal (kernel/signal.c:2857) [270.768965][T27574] ? search_binary_handler (fs/exec.c:1707) [270.768976][T27574] ? ptrace_signal (kernel/signal.c:2627) [270.768980][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [270.768984][T27574] ? kasan_quarantine_put (arch/x86/include/asm/irqflags.h:45 (discriminator 1) arch/x86/include/asm/irqflags.h:80 (discriminator 1) arch/x86/include/asm/irqflags.h:138 (discriminator 1) mm/kasan/quarantine.c:242 (discriminator 1)) [270.768988][T27574] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22)) [270.768998][T27574] arch_do_signal_or_restart (arch/x86/kernel/signal.c:869) [270.769004][T27574] ? get_sigframe_size (arch/x86/kernel/signal.c:866) [270.769009][T27574] ? do_execveat_common+0x1c0/0x680 [270.769022][T27574] ? 
lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [270.769029][T27574] exit_to_user_mode_loop (kernel/entry/common.c:168) [270.769035][T27574] exit_to_user_mode_prepare (kernel/entry/common.c:201) [270.769039][T27574] syscall_exit_to_user_mode (kernel/entry/common.c:128 kernel/entry/common.c:296) [270.769044][T27574] do_syscall_64 (arch/x86/entry/common.c:87) [270.769050][T27574] ? do_user_addr_fault (arch/x86/mm/fault.c:1422) [270.769057][T27574] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309) [270.769064][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) [270.769069][T27574] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) [ 270.769073][T27574] RIP: 0033:0x7f1a7a7936c7 [ 270.769076][T27574] Code: Unable to access opcode bytes at RIP 0x7f1a7a79369d. Code starting with the faulting instruction =========================================== [ 270.769077][T27574] RSP: 002b:00007ffe741919f8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b [ 270.769080][T27574] RAX: fffffffffffffff2 RBX: 00005643b084c428 RCX: 00007f1a7a7936c7 [ 270.769082][T27574] RDX: 00005643b084ff48 RSI: 00005643b084c428 RDI: 00005643b0850208 [ 270.769083][T27574] RBP: 00005643b079246e R08: 00005643b0792470 R09: 00005643b079247b [ 270.769084][T27574] R10: 000000000000006e R11: 0000000000000246 R12: 00005643b084ff48 [ 270.769086][T27574] R13: 0000000000000002 R14: 00005643b084ff48 R15: 00005643b0850208 [ 270.769100][T27574] </TASK>
[ 271.701080][ T1124] Segmentation fault
[ 271.701094][ T1124]
[ 284.402869][ T1122]

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml             # job file is attached in this email
        bin/lkp split-job --compatible job.yaml   # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:

> FYI, we noticed the following commit (built with gcc-11):
>
> commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net

Did this test include the followup patch
mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?


From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: replace local_lock with normal spinlock -fix
Date: Mon, 27 Jun 2022 09:46:45 +0100

As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of
pcpu_spin_trylock_irqsave.  This is a fix to the mm-unstable patch
mm-page_alloc-replace-local_lock-with-normal-spinlock.patch

Link: https://lkml.kernel.org/r/20220627084645.GA27531@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
+++ a/mm/page_alloc.c
@@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,
 	zone = page_zone(page);
 
 	pcp_trylock_prepare(UP_flags);
-	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
 	if (pcp) {
 		free_unref_page_commit(zone, pcp, page, migratetype, order);
 		pcp_spin_unlock_irqrestore(pcp, flags);
Hi Andrew Morton,

On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
>
> > FYI, we noticed the following commit (built with gcc-11):
> >
> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
>
> Did this test include the followup patch
> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?

No, we just fetched the original patch set and tested on top of it. We have
now applied the patch you pointed us to on top of 2bd8eec68f and found the
issue still exists (dmesg attached FYI).

[ 204.416449][T27283] BUG: sleeping function called from invalid context at mm/gup.c:1170
[ 204.416455][T27283] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27283, name: compaction_test
[ 204.416457][T27283] preempt_count: 1, expected: 0
[ 204.416460][T27283] 1 lock held by compaction_test/27283:
[ 204.416462][T27283] #0: ffff88918df83928 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate+0x1d0/0x300
[ 204.416477][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 204.416481][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 204.416483][T27283] Call Trace:
[ 204.416485][T27283] <TASK>
[ 204.416489][T27283] dump_stack_lvl+0x45/0x59
[ 204.416497][T27283] __might_resched.cold+0x15e/0x190
[ 204.416508][T27283] __get_user_pages+0x274/0x6c0
[ 204.416522][T27283] ? get_gate_page+0x640/0x640
[ 204.416538][T27283] ?
rwsem_down_read_slowpath+0xb80/0xb80 [ 204.416548][T27283] populate_vma_page_range+0xd7/0x140 [ 204.416554][T27283] __mm_populate+0x178/0x300 [ 204.416560][T27283] ? faultin_vma_page_range+0x100/0x100 [ 204.416566][T27283] ? __up_write+0x13a/0x480 [ 204.416575][T27283] vm_mmap_pgoff+0x1a7/0x240 [ 204.416584][T27283] ? randomize_page+0x80/0x80 [ 204.416586][T27283] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 204.416595][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 204.416600][T27283] ? syscall_enter_from_user_mode+0x21/0x80 [ 204.416609][T27283] do_syscall_64+0x59/0x80 [ 204.416617][T27283] ? irqentry_exit_to_user_mode+0xa/0x40 [ 204.416624][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 204.416629][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 204.416633][T27283] RIP: 0033:0x7f10e01e2b62 [ 204.416637][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64 [ 204.416639][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 [ 204.416642][T27283] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007f10e01e2b62 [ 204.416645][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000 [ 204.416646][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000 [ 204.416648][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170 [ 204.416649][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 204.416666][T27283] </TASK> [ 204.690617][T27283] BUG: scheduling while atomic: compaction_test/27283/0x00000004 [ 204.690624][T27283] no locks held by compaction_test/27283. 
[ 204.690625][T27283] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 204.690688][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 204.690691][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 204.690694][T27283] Call Trace: [ 204.690695][T27283] <TASK> [ 204.690700][T27283] dump_stack_lvl+0x45/0x59 [ 204.690707][T27283] __schedule_bug.cold+0xcf/0xe0 [ 204.690714][T27283] schedule_debug+0x274/0x300 [ 204.690724][T27283] __schedule+0xf5/0x1740 [ 204.690733][T27283] ? io_schedule_timeout+0x180/0x180 [ 204.690737][T27283] ? vm_mmap_pgoff+0x1a7/0x240 [ 204.690748][T27283] schedule+0xea/0x240 [ 204.690753][T27283] exit_to_user_mode_loop+0x79/0x140 [ 204.690759][T27283] exit_to_user_mode_prepare+0xfc/0x180 [ 204.690762][T27283] syscall_exit_to_user_mode+0x19/0x80 [ 204.690768][T27283] do_syscall_64+0x69/0x80 [ 204.690773][T27283] ? __local_bh_enable+0x7a/0xc0 [ 204.690777][T27283] ? __do_softirq+0x52c/0x865 [ 204.690786][T27283] ? irqentry_exit_to_user_mode+0xa/0x40 [ 204.690792][T27283] ? 
lockdep_hardirqs_on_prepare+0x19a/0x380 [ 204.690798][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 204.690802][T27283] RIP: 0033:0x7f10e01e2b62 [ 204.690806][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64 [ 204.690808][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 [ 204.690811][T27283] RAX: 00007f022d8e7000 RBX: 0000000000002022 RCX: 00007f10e01e2b62 [ 204.690813][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000 [ 204.690814][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000 [ 204.690815][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170 [ 204.690817][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 204.690830][T27283] </TASK> [ 216.734914][ T1147] [ 230.207563][ T1147] [ 244.124530][ T1147] [ 257.808775][ T1147] [ 271.803313][ T1147] [ 272.181098][ T563] BUG: sleeping function called from invalid context at mm/migrate.c:1380 [ 272.181104][ T563] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 563, name: kcompactd0 [ 272.181107][ T563] preempt_count: 1, expected: 0 [ 272.181109][ T563] no locks held by kcompactd0/563. [ 272.181112][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 272.181115][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 272.181117][ T563] Call Trace: [ 272.181119][ T563] <TASK> [ 272.181124][ T563] dump_stack_lvl+0x45/0x59 [ 272.181133][ T563] __might_resched.cold+0x15e/0x190 [ 272.181143][ T563] migrate_pages+0x2b1/0x1200 [ 272.181152][ T563] ? isolate_freepages+0x880/0x880 [ 272.181158][ T563] ? split_map_pages+0x4c0/0x4c0 [ 272.181167][ T563] ? buffer_migrate_page_norefs+0x40/0x40 [ 272.181172][ T563] ? 
isolate_migratepages+0x300/0x6c0 [ 272.181183][ T563] compact_zone+0xa3f/0x1640 [ 272.181200][ T563] ? compaction_suitable+0x200/0x200 [ 272.181205][ T563] ? lock_acquire+0x194/0x500 [ 272.181211][ T563] ? finish_wait+0xc5/0x280 [ 272.181220][ T563] proactive_compact_node+0xeb/0x180 [ 272.181224][ T563] ? compact_store+0xc0/0xc0 [ 272.181239][ T563] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 272.181242][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.181252][ T563] kcompactd+0x500/0xc80 [ 272.181262][ T563] ? kcompactd_do_work+0x540/0x540 [ 272.181268][ T563] ? prepare_to_swait_exclusive+0x240/0x240 [ 272.181275][ T563] ? __kthread_parkme+0xd9/0x200 [ 272.181278][ T563] ? schedule+0xfe/0x240 [ 272.181282][ T563] ? kcompactd_do_work+0x540/0x540 [ 272.181288][ T563] kthread+0x28f/0x340 [ 272.181290][ T563] ? kthread_complete_and_exit+0x40/0x40 [ 272.181295][ T563] ret_from_fork+0x1f/0x30 [ 272.181313][ T563] </TASK> [ 272.295259][ T2111] meminfo[2111]: segfault at 7ffc6e0e55e8 ip 00007fbdf6db8580 sp 00007ffc6e0e55f0 error 7 in libc-2.31.so[7fbdf6d12000+14b000] [ 272.295314][ T2111] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8 [ 272.296053][ T2111] BUG: scheduling while atomic: meminfo/2111/0x00000002 [ 272.296056][ T2111] no locks held by meminfo/2111. 
[ 272.296058][ T2111] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 272.296121][ T2111] CPU: 20 PID: 2111 Comm: meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 272.296125][ T2111] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 272.296127][ T2111] Call Trace: [ 272.296128][ T2111] <TASK> [ 272.296132][ T2111] dump_stack_lvl+0x45/0x59 [ 272.296141][ T2111] __schedule_bug.cold+0xcf/0xe0 [ 272.296150][ T2111] schedule_debug+0x274/0x300 [ 272.296160][ T2111] __schedule+0xf5/0x1740 [ 272.296169][ T2111] ? rwlock_bug+0xc0/0xc0 [ 272.296176][ T2111] ? io_schedule_timeout+0x180/0x180 [ 272.296181][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 272.296185][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.296194][ T2111] do_task_dead+0xda/0x140 [ 272.296200][ T2111] do_exit+0x6a7/0xac0 [ 272.296210][ T2111] do_group_exit+0xb7/0x2c0 [ 272.296216][ T2111] get_signal+0x1b13/0x1cc0 [ 272.296226][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.296230][ T2111] ? force_sig_info_to_task+0x30d/0x500 [ 272.296234][ T2111] ? ptrace_signal+0x700/0x700 [ 272.296245][ T2111] arch_do_signal_or_restart+0x77/0x300 [ 272.296252][ T2111] ? get_sigframe_size+0x40/0x40 [ 272.296257][ T2111] ? show_opcodes.cold+0x1c/0x21 [ 272.296270][ T2111] ? 
lockdep_hardirqs_on_prepare+0x19a/0x380 [ 272.296277][ T2111] exit_to_user_mode_loop+0xac/0x140 [ 272.296282][ T2111] exit_to_user_mode_prepare+0xfc/0x180 [ 272.296286][ T2111] irqentry_exit_to_user_mode+0x5/0x40 [ 272.296291][ T2111] asm_exc_page_fault+0x27/0x30 [ 272.296293][ T2111] RIP: 0033:0x7fbdf6db8580 [ 272.296297][ T2111] Code: Unable to access opcode bytes at RIP 0x7fbdf6db8556. [ 272.296299][ T2111] RSP: 002b:00007ffc6e0e55f0 EFLAGS: 00010246 [ 272.296301][ T2111] RAX: 0000000000006bb3 RBX: 00007ffc6e0e56d0 RCX: 00007fbdf6db84bb [ 272.296303][ T2111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [ 272.296305][ T2111] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fbdf6cea740 [ 272.296306][ T2111] R10: 00007fbdf6ceaa10 R11: 0000000000000246 R12: 0000000000000000 [ 272.296308][ T2111] R13: 0000000000006bb3 R14: 00005563332b3908 R15: 00007ffc6e0e56b0 [ 272.296323][ T2111] </TASK> [ 272.296514][ T2150] gzip-meminfo[2150]: segfault at 7fd637199670 ip 00007fd637199670 sp 00007fffd9088698 error 14 in libc-2.31.so[7fd6370f3000+14b000 ] [ 272.296560][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646. [ 272.297682][ T2150] BUG: scheduling while atomic: gzip-meminfo/2150/0x00000002 [ 272.297686][ T2150] no locks held by gzip-meminfo/2150. 
[ 272.297687][ T2150] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 272.297746][ T2150] CPU: 45 PID: 2150 Comm: gzip-meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 272.297749][ T2150] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 272.297751][ T2150] Call Trace: [ 272.297752][ T2150] <TASK> [ 272.297756][ T2150] dump_stack_lvl+0x45/0x59 [ 272.297762][ T2150] __schedule_bug.cold+0xcf/0xe0 [ 272.297768][ T2150] schedule_debug+0x274/0x300 [ 272.297775][ T2150] __schedule+0xf5/0x1740 [ 272.297783][ T2150] ? rwlock_bug+0xc0/0xc0 [ 272.297788][ T2150] ? io_schedule_timeout+0x180/0x180 [ 272.297794][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 272.297797][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.297806][ T2150] do_task_dead+0xda/0x140 [ 272.297811][ T2150] do_exit+0x6a7/0xac0 [ 272.297819][ T2150] do_group_exit+0xb7/0x2c0 [ 272.297825][ T2150] get_signal+0x1b13/0x1cc0 [ 272.297833][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.297838][ T2150] ? force_sig_info_to_task+0x30d/0x500 [ 272.297842][ T2150] ? ptrace_signal+0x700/0x700 [ 272.297854][ T2150] arch_do_signal_or_restart+0x77/0x300 [ 272.297859][ T2150] ? get_sigframe_size+0x40/0x40 [ 272.297864][ T2150] ? show_opcodes+0x97/0xc0 [ 272.297876][ T2150] ? 
lockdep_hardirqs_on_prepare+0x19a/0x380 [ 272.297883][ T2150] exit_to_user_mode_loop+0xac/0x140 [ 272.297887][ T2150] exit_to_user_mode_prepare+0xfc/0x180 [ 272.297890][ T2150] irqentry_exit_to_user_mode+0x5/0x40 [ 272.297894][ T2150] asm_exc_page_fault+0x27/0x30 [ 272.297897][ T2150] RIP: 0033:0x7fd637199670 [ 272.297900][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646. [ 272.297901][ T2150] RSP: 002b:00007fffd9088698 EFLAGS: 00010246 [ 272.297904][ T2150] RAX: 0000000000000000 RBX: 00007fd63728e610 RCX: 0000000000000000 [ 272.297905][ T2150] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 272.297906][ T2150] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000001 [ 272.297908][ T2150] R10: fffffffffffff287 R11: 00007fd63710c660 R12: 00007fd63728e610 [ 272.297909][ T2150] R13: 0000000000000001 R14: 00007fd63728eae8 R15: 0000000000000000 [ 272.297923][ T2150] </TASK> [ 272.340352][ T563] BUG: scheduling while atomic: kcompactd0/563/0x0000004d [ 272.340356][ T563] no locks held by kcompactd0/563. 
[ 272.340357][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 272.340433][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 272.340437][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 272.340438][ T563] Call Trace: [ 272.340440][ T563] <TASK> [ 272.340444][ T563] dump_stack_lvl+0x45/0x59 [ 272.340451][ T563] __schedule_bug.cold+0xcf/0xe0 [ 272.340459][ T563] schedule_debug+0x274/0x300 [ 272.340467][ T563] __schedule+0xf5/0x1740 [ 272.340477][ T563] ? io_schedule_timeout+0x180/0x180 [ 272.340481][ T563] ? find_held_lock+0x2c/0x140 [ 272.340486][ T563] ? prepare_to_wait_event+0xcd/0x6c0 [ 272.340496][ T563] schedule+0xea/0x240 [ 272.340501][ T563] schedule_timeout+0x11b/0x240 [ 272.340507][ T563] ? usleep_range_state+0x180/0x180 [ 272.340512][ T563] ? timer_migration_handler+0xc0/0xc0 [ 272.340520][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.340525][ T563] ? prepare_to_wait_event+0xcd/0x6c0 [ 272.340540][ T563] kcompactd+0x870/0xc80 [ 272.340554][ T563] ? kcompactd_do_work+0x540/0x540 [ 272.340560][ T563] ? prepare_to_swait_exclusive+0x240/0x240 [ 272.340567][ T563] ? __kthread_parkme+0xd9/0x200 [ 272.340571][ T563] ? schedule+0xfe/0x240 [ 272.340574][ T563] ? 
kcompactd_do_work+0x540/0x540 [ 272.340579][ T563] kthread+0x28f/0x340 [ 272.340582][ T563] ? kthread_complete_and_exit+0x40/0x40 [ 272.340588][ T563] ret_from_fork+0x1f/0x30 [ 272.340605][ T563] </TASK> [ 272.799216][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000027 [ 272.799222][ T564] no locks held by kcompactd1/564. [ 272.799224][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 272.799283][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 272.799287][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 272.799289][ T564] Call Trace: [ 272.799292][ T564] <TASK> [ 272.799299][ T564] dump_stack_lvl+0x45/0x59 [ 272.799309][ T564] __schedule_bug.cold+0xcf/0xe0 [ 272.799318][ T564] schedule_debug+0x274/0x300 [ 272.799329][ T564] __schedule+0xf5/0x1740 [ 272.799341][ T564] ? io_schedule_timeout+0x180/0x180 [ 272.799345][ T564] ? find_held_lock+0x2c/0x140 [ 272.799352][ T564] ? prepare_to_wait_event+0xcd/0x6c0 [ 272.799362][ T564] schedule+0xea/0x240 [ 272.799368][ T564] schedule_timeout+0x11b/0x240 [ 272.799374][ T564] ? usleep_range_state+0x180/0x180 [ 272.799379][ T564] ? timer_migration_handler+0xc0/0xc0 [ 272.799389][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 272.799394][ T564] ? 
prepare_to_wait_event+0xcd/0x6c0 [ 272.799402][ T564] kcompactd+0x870/0xc80 [ 272.799416][ T564] ? kcompactd_do_work+0x540/0x540 [ 272.799422][ T564] ? prepare_to_swait_exclusive+0x240/0x240 [ 272.799429][ T564] ? __kthread_parkme+0xd9/0x200 [ 272.799433][ T564] ? schedule+0xfe/0x240 [ 272.799436][ T564] ? kcompactd_do_work+0x540/0x540 [ 272.799442][ T564] kthread+0x28f/0x340 [ 272.799445][ T564] ? kthread_complete_and_exit+0x40/0x40 [ 272.799451][ T564] ret_from_fork+0x1f/0x30 [ 272.799469][ T564] </TASK> [ 273.033327][ T563] BUG: scheduling while atomic: kcompactd0/563/0x00000003 [ 273.033331][ T563] no locks held by kcompactd0/563. [ 273.033333][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 273.033428][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 273.033432][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 273.033434][ T563] Call Trace: [ 273.033436][ T563] <TASK> [ 273.033440][ T563] dump_stack_lvl+0x45/0x59 [ 273.033449][ T563] __schedule_bug.cold+0xcf/0xe0 [ 273.033457][ T563] schedule_debug+0x274/0x300 [ 273.033467][ T563] __schedule+0xf5/0x1740 [ 273.033477][ T563] ? io_schedule_timeout+0x180/0x180 [ 273.033481][ T563] ? find_held_lock+0x2c/0x140 [ 273.033487][ T563] ? 
prepare_to_wait_event+0xcd/0x6c0 [ 273.033498][ T563] schedule+0xea/0x240 [ 273.033503][ T563] schedule_timeout+0x11b/0x240 [ 273.033509][ T563] ? usleep_range_state+0x180/0x180 [ 273.033521][ T563] ? timer_migration_handler+0xc0/0xc0 [ 273.033530][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 273.033535][ T563] ? prepare_to_wait_event+0xcd/0x6c0 [ 273.033543][ T563] kcompactd+0x870/0xc80 [ 273.033557][ T563] ? kcompactd_do_work+0x540/0x540 [ 273.033563][ T563] ? prepare_to_swait_exclusive+0x240/0x240 [ 273.033570][ T563] ? __kthread_parkme+0xd9/0x200 [ 273.033574][ T563] ? schedule+0xfe/0x240 [ 273.033577][ T563] ? kcompactd_do_work+0x540/0x540 [ 273.033582][ T563] kthread+0x28f/0x340 [ 273.033585][ T563] ? kthread_complete_and_exit+0x40/0x40 [ 273.033590][ T563] ret_from_fork+0x1f/0x30 [ 273.033608][ T563] </TASK> [ 273.319687][ T564] BUG: sleeping function called from invalid context at mm/migrate.c:1380 [ 273.319692][ T564] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 564, name: kcompactd1 [ 273.319694][ T564] preempt_count: 1, expected: 0 [ 273.319696][ T564] no locks held by kcompactd1/564. [ 273.319699][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 273.319702][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 273.319704][ T564] Call Trace: [ 273.319707][ T564] <TASK> [ 273.319713][ T564] dump_stack_lvl+0x45/0x59 [ 273.319723][ T564] __might_resched.cold+0x15e/0x190 [ 273.319734][ T564] migrate_pages+0x2b1/0x1200 [ 273.319744][ T564] ? isolate_freepages+0x880/0x880 [ 273.319752][ T564] ? split_map_pages+0x4c0/0x4c0 [ 273.319762][ T564] ? buffer_migrate_page_norefs+0x40/0x40 [ 273.319767][ T564] ? isolate_migratepages+0x300/0x6c0 [ 273.319778][ T564] compact_zone+0xa3f/0x1640 [ 273.319795][ T564] ? compaction_suitable+0x200/0x200 [ 273.319800][ T564] ? lock_acquire+0x194/0x500 [ 273.319807][ T564] ? 
finish_wait+0xc5/0x280 [ 273.319816][ T564] proactive_compact_node+0xeb/0x180 [ 273.319820][ T564] ? compact_store+0xc0/0xc0 [ 273.319835][ T564] ? lockdep_hardirqs_on_prepare+0x19a/0x380 [ 273.319839][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 273.319850][ T564] kcompactd+0x500/0xc80 [ 273.319860][ T564] ? kcompactd_do_work+0x540/0x540 [ 273.319866][ T564] ? prepare_to_swait_exclusive+0x240/0x240 [ 273.319873][ T564] ? __kthread_parkme+0xd9/0x200 [ 273.319877][ T564] ? schedule+0xfe/0x240 [ 273.319882][ T564] ? kcompactd_do_work+0x540/0x540 [ 273.319888][ T564] kthread+0x28f/0x340 [ 273.319891][ T564] ? kthread_complete_and_exit+0x40/0x40 [ 273.319896][ T564] ret_from_fork+0x1f/0x30 [ 273.319914][ T564] </TASK> [ 273.637490][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000041 [ 273.637496][ T564] no locks held by kcompactd1/564. [ 273.637498][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables [ 273.637556][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 [ 273.637560][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 [ 273.637562][ T564] Call Trace: [ 273.637565][ T564] <TASK> [ 273.637571][ T564] dump_stack_lvl+0x45/0x59 [ 273.637580][ T564] __schedule_bug.cold+0xcf/0xe0 [ 273.637589][ T564] 
schedule_debug+0x274/0x300 [ 273.637600][ T564] __schedule+0xf5/0x1740 [ 273.637612][ T564] ? io_schedule_timeout+0x180/0x180 [ 273.637616][ T564] ? find_held_lock+0x2c/0x140 [ 273.637622][ T564] ? prepare_to_wait_event+0xcd/0x6c0 [ 273.637633][ T564] schedule+0xea/0x240 [ 273.637638][ T564] schedule_timeout+0x11b/0x240 [ 273.637645][ T564] ? usleep_range_state+0x180/0x180 [ 273.637650][ T564] ? timer_migration_handler+0xc0/0xc0 [ 273.637659][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 273.637664][ T564] ? prepare_to_wait_event+0xcd/0x6c0 [ 273.637671][ T564] kcompactd+0x870/0xc80 [ 273.637687][ T564] ? kcompactd_do_work+0x540/0x540 [ 273.637692][ T564] ? prepare_to_swait_exclusive+0x240/0x240 [ 273.637700][ T564] ? __kthread_parkme+0xd9/0x200 [ 273.637704][ T564] ? schedule+0xfe/0x240 [ 273.637707][ T564] ? kcompactd_do_work+0x540/0x540 [ 273.637713][ T564] kthread+0x28f/0x340 [ 273.637716][ T564] ? kthread_complete_and_exit+0x40/0x40 [ 273.637722][ T564] ret_from_fork+0x1f/0x30 [ 273.637740][ T564] </TASK> [ 285.377624][ T1147] > > > From: Mel Gorman <mgorman@techsingularity.net> > Subject: mm/page_alloc: replace local_lock with normal spinlock -fix > Date: Mon, 27 Jun 2022 09:46:45 +0100 > > As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of > pcpu_spin_trylock_irqsave. 
This is a fix to the mm-unstable patch
> mm-page_alloc-replace-local_lock-with-normal-spinlock.patch
>
> Link: https://lkml.kernel.org/r/20220627084645.GA27531@techsingularity.net
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Reported-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/page_alloc.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,
>
>  	zone = page_zone(page);
>  	pcp_trylock_prepare(UP_flags);
> -	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>  	if (pcp) {
>  		free_unref_page_commit(zone, pcp, page, migratetype, order);
>  		pcp_spin_unlock_irqrestore(pcp, flags);
> _
>
On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> Hi Andrew Morton,
>
> On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> >
> > > FYI, we noticed the following commit (built with gcc-11):
> > >
> > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> >
> > Did this test include the followup patch
> > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
>
> no, we just fetched original patch set and test upon it.
>
> now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> still exist.
> (attached dmesg FYI)

Thanks Oliver.

The trace is odd in that it hits in GUP when the page allocator is no
longer active and the context is a syscall. First, is this definitely
the first patch where the problem occurs?

Second, it's possible for IRQs to be enabled and an IRQ delivered before
preemption is enabled. It's not clear why that would be a problem other
than lacking symmetry or how it could result in the reported BUG but
might as well rule it out.
This is build tested only

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 934d1b5a5449..d0141e51e613 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -192,14 +192,14 @@ static DEFINE_MUTEX(pcp_batch_high_lock);

 #define pcpu_spin_unlock(member, ptr)				\
 ({								\
-	spin_unlock(&ptr->member);				\
 	pcpu_task_unpin();					\
+	spin_unlock(&ptr->member);				\
 })

 #define pcpu_spin_unlock_irqrestore(member, ptr, flags)		\
 ({								\
-	spin_unlock_irqrestore(&ptr->member, flags);		\
 	pcpu_task_unpin();					\
+	spin_unlock_irqrestore(&ptr->member, flags);		\
 })

 /* struct per_cpu_pages specific helpers. */
On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > > > FYI, we noticed the following commit (built with gcc-11):
> > > >
> > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > >
> > > Did this test include the followup patch
> > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
> >
> > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > still exist.
> > (attached dmesg FYI)
>
> Thanks Oliver.
>
> The trace is odd in that it hits in GUP when the page allocator is no
> longer active and the context is a syscall. First, is this definitely
> the first patch where the problem occurs?
>

I tried reproducing this on a 2-socket machine with Xeon Gold 5218R
CPUs. It was necessary to set timeouts in both vm/settings and
kselftest/runner.sh to avoid timeouts. Testing with a standard config on
my original 5.19-rc3 baseline and the baseline
b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
config with i915 disabled (would not build) and necessary storage drivers
and network drivers enabled (for boot and access).
The kernel log shows
a bunch of warnings related to UBSAN during boot and during some of the
tests but otherwise compaction_test completed successfully as well as
the other VM tests.

Is this always reproducible?
hi, Mel Gorman,

On Wed, Jul 06, 2022 at 12:53:29PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > > >
> > > > Did this test include the followup patch
> > > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> > >
> > > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > > still exist.
> > > (attached dmesg FYI)
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no
> > longer active and the context is a syscall. First, is this definitely
> > the first patch where the problem occurs?
> >
>
> I tried reproducing this on a 2-socket machine with Xeon Gold 5218R
> CPUs. It was necessary to set timeouts in both vm/settings and
> kselftest/runner.sh to avoid timeouts. Testing with a standard config on
> my original 5.19-rc3 baseline and the baseline
> b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> config with i915 disabled (would not build) and necessary storage drivers
> and network drivers enabled (for boot and access).
> The kernel log shows
> a bunch of warnings related to UBSAN during boot and during some of the
> tests but otherwise compaction_test completed successfully as well as
> the other VM tests.
>
> Is this always reproducible?

not always but high rate.

we actually also observed other dmesg stats for both 2bd8eec68f74 and its
parent, but those dmesg.BUG:sleeping_function_called_from_invalid_context_at*
seem only happen on 2bd8eec68f74 as well as the '-fix' commit.

=========================================================================================
compiler/group/kconfig/rootfs/sc_nr_hugepages/tbox_group/testcase/ucode:
  gcc-11/vm/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/2/lkp-csl-2sp9/kernel-selftests/0x500320a

commit:
  eec0ff5df294 ("mm/page_alloc: Remotely drain per-cpu lists")
  2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock")
  292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")

eec0ff5df2945d19 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction  fail:runs  %reproduction  fail:runs
           |            |            |            |            |
          :20          75%       15:20          70%       14:21  dmesg.BUG:scheduling_while_atomic
          :20           5%        1:20           0%         :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
          :20           5%        1:20          10%        2:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
          :20           5%        1:20           5%        1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
          :20          10%        2:20          25%        5:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
          :20           5%        1:20           0%         :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
          :20          40%        8:20          40%        8:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
          :20          10%        2:20           0%         :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
          :20          10%        2:20          10%        2:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
          :20          55%       11:20          65%       13:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
          :20          15%        3:20           5%        1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
          :20          60%       12:20          55%       11:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
          :20           5%        1:20           5%        1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
          :20           0%         :20           5%        1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
          :20          15%        3:20           0%         :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
          :20          45%        9:20          45%        9:21  dmesg.BUG:workqueue_leaked_lock_or_atomic
          :20          25%        5:20          15%        3:21  dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
          :20           5%        1:20           0%         :21  dmesg.RIP:__clear_user
        20:20           0%       20:20           5%       21:21  dmesg.RIP:rcu_eqs_exit
        20:20           0%       20:20           5%       21:21  dmesg.RIP:sched_clock_tick
          :20           5%        1:20           0%         :21  dmesg.RIP:smp_call_function_many_cond
        20:20           0%       20:20           5%       21:21  dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
        20:20           0%       20:20           5%       21:21  dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
          :20           5%        1:20           0%         :21  dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
        20:20           0%       20:20           5%       21:21  dmesg.WARNING:suspicious_RCU_usage
        20:20           0%       20:20           5%       21:21  dmesg.boot_failures
         9:20         -15%        6:20          -5%        8:21  dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
         9:20         -15%        6:20          -5%        8:21  dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
        20:20           0%       20:20           5%       21:21  dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
        20:20           0%       20:20           5%       21:21  dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage

>
> --
> Mel Gorman
> SUSE Labs
hi, Mel Gorman,

On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > > > FYI, we noticed the following commit (built with gcc-11):
> > > >
> > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > > >
> > >
> > > Did this test include the followup patch mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
> >
> > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue still exist.
> > (attached dmesg FYI)
> >
>
> Thanks Oliver.
>
> The trace is odd in that it hits in GUP when the page allocator is no longer active and the context is a syscall. First, is this definitely the first patch the problem occurs?
>
> Second, it's possible for IRQs to be enabled and an IRQ delivered before preemption is enabled. It's not clear why that would be a problem other than lacking symmetry or how it could result in the reported BUG but might as well rule it out. This is build tested only

do you want us test below patch?
if so, should we apply it upon the patch
"mm/page_alloc: Replace local_lock with normal spinlock"
or
"mm/page_alloc: replace local_lock with normal spinlock -fix"?

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 934d1b5a5449..d0141e51e613 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -192,14 +192,14 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
>
>  #define pcpu_spin_unlock(member, ptr)				\
>  ({								\
> -	spin_unlock(&ptr->member);				\
>  	pcpu_task_unpin();					\
> +	spin_unlock(&ptr->member);				\
>  })
>
>  #define pcpu_spin_unlock_irqrestore(member, ptr, flags)		\
>  ({								\
> -	spin_unlock_irqrestore(&ptr->member, flags);		\
>  	pcpu_task_unpin();					\
> +	spin_unlock_irqrestore(&ptr->member, flags);		\
>  })
>
>  /* struct per_cpu_pages specific helpers. */
>
>
On Wed, Jul 06, 2022 at 10:21:36PM +0800, Oliver Sang wrote:
> > I tried reproducing this on a 2-socket machine with Xeon Gold 5218R CPUs. It was necessary to set timeouts in both vm/settings and kselftest/runner.sh to avoid timeouts. Testing with a standard config on my original 5.19-rc3 baseline and the baseline b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel config with i915 disabled (would not build) and necessary storage drivers and network drivers enabled (for boot and access). The kernel log shows a bunch of warnings related to UBSAN during boot and during some of the tests but otherwise compaction_test completed successfully as well as the other VM tests.
> >
> > Is this always reproducible?
>
> not always but high rate.
> we actually also observed other dmesg stats for both 2bd8eec68f74 and its parent

Ok, it's unclear what the "other dmesg stats" are but given that it happens for the parent. Does 5.19-rc2 (your baseline) have the same messages as 2bd8eec68f74^? Does the kselftests vm suite always pass but sometimes fails with 2bd8eec68f74?

> but those dmesg.BUG:sleeping_function_called_from_invalid_context_at* seem only happen on 2bd8eec68f74 as well as the '-fix' commit.
>

And roughly how often does it happen? I'm running it in a loop now to see if I can trigger it locally.
On Wed, Jul 06, 2022 at 10:25:30PM +0800, Oliver Sang wrote:
> hi, Mel Gorman,
>
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > > > >
> > > >
> > > > Did this test include the followup patch mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> > >
> > > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue still exist.
> > > (attached dmesg FYI)
> > >
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no longer active and the context is a syscall. First, is this definitely the first patch the problem occurs?
> >
> > Second, it's possible for IRQs to be enabled and an IRQ delivered before preemption is enabled. It's not clear why that would be a problem other than lacking symmetry or how it could result in the reported BUG but might as well rule it out. This is build tested only
>
> do you want us test below patch?
> if so, should we apply it upon the patch
> "mm/page_alloc: Replace local_lock with normal spinlock"
> or
> "mm/page_alloc: replace local_lock with normal spinlock -fix"?
>

On top of "mm/page_alloc: replace local_lock with normal spinlock -fix" please. The -fix patch is cosmetic but it'd still be better to test on top.

Thanks!
Hi Mel Gorman,

On Wed, Jul 06, 2022 at 03:52:41PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:21:36PM +0800, Oliver Sang wrote:
> > > I tried reproducing this on a 2-socket machine with Xeon Gold 5218R CPUs. It was necessary to set timeouts in both vm/settings and kselftest/runner.sh to avoid timeouts. Testing with a standard config on my original 5.19-rc3 baseline and the baseline b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel config with i915 disabled (would not build) and necessary storage drivers and network drivers enabled (for boot and access). The kernel log shows a bunch of warnings related to UBSAN during boot and during some of the tests but otherwise compaction_test completed successfully as well as the other VM tests.
> > >
> > > Is this always reproducible?
> >
> > not always but high rate.
> > we actually also observed other dmesg stats for both 2bd8eec68f74 and its parent
>
> Ok, it's unclear what the "other dmesg stats" are but given that it happens for the parent. Does 5.19-rc2 (your baseline) have the same messages as 2bd8eec68f74^?

yeah, 5.19-rc2 has similar results as 2bd8eec68f74^ by multi-runs, while 2bd8eec68f74 looks quite similar to the '-fix' commit, which we applied as 292baeb4c714.

take the 'BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c' we reported as example:

       v5.19-rc2  eec0ff5df2945d19039d16841b9  2bd8eec68f740608db5ea58ecff  292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs  %reproduction  fail:runs  %reproduction  fail:runs  %reproduction  fail:runs
    :31    0%    :20   55%  11:20   65%  13:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

the 'fail:runs' means we observed the issue 'fail' times while running 'runs' times.
for v5.19-rc2, " :31" means we ran the same job upon v5.19-rc2 31 times but never saw this "dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c"; eec0ff5df2 (2bd8eec68f74^) is also clean on 20 runs.

but for both
  2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock")
  292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")
it reproduces in roughly half of the runs (11 out of 20 runs and 13 out of 21 runs respectively).

the full comparison of these 4 commits is as [1].

generally, those dmesg.BUG:sleeping_function_called_from_invalid*** are quite clean on v5.19-rc2 and 2bd8eec68f74^, but have a similar rate on 2bd8eec68f74 & 292baeb4c714.

but we also observed other issues, such as "dmesg.RIP:rcu_eqs_exit", which almost always happen on all 4 commits (this is what I said 'other dmesg stats', sorry for the confusion, and I will avoid using this kind of 'internal' wording in the future)

> Does the kselftests vm suite always pass but sometimes fails with 2bd8eec68f74?

below are the results of the kselftests vm suite, so it's really like what you said: it sometimes fails with 2bd8eec68f74 (also 292baeb4c714).

one example is kernel-selftests.vm.run_vmtests.sh../userfaultfd_anon_20_16: it always passes on v5.19-rc2 and 2bd8eec68f74^ but fails 6 times out of 20 runs on 2bd8eec68f74, and 5 times out of 21 runs on 292baeb4c714.

but since this rate does not seem to match the issues above, we are not sure whether they are related?
       v5.19-rc2  eec0ff5df2945d19039d16841b9  2bd8eec68f740608db5ea58ecff  292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs  %reproduction  fail:runs  %reproduction  fail:runs  %reproduction  fail:runs
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.madv_populate.fail
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../gup_test_a.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../gup_test_ct_F_0x1_0_19_0x1000.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../gup_test_u.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../hugepage_mmap.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../hugepage_mremap_./huge/huge_mremap.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../hugepage_shm.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../hugepage_vmemmap.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../hugetlb_madvise_./huge/madvise_test.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../map_fixed_noreplace.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.run_vmtests.sh../map_hugetlb.pass
  31:31  -35%  20:20  -30%  14:20  -20%  16:21   kernel-selftests.vm.run_vmtests.sh../userfaultfd_anon_20_16.pass
  31:31  -35%  20:20  -30%  14:20  -20%  16:21   kernel-selftests.vm.run_vmtests.sh../userfaultfd_hugetlb_256_32.pass
  29:31  -32%  19:20  -29%  13:20  -34%  12:21   kernel-selftests.vm.run_vmtests.sh../userfaultfd_shmem_20_16.pass
  31:31  -35%  20:20  -35%  13:20  -25%  15:21   kernel-selftests.vm.run_vmtests.sh.fail
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.soft-dirty.pass
  31:31  -35%  20:20    0%  20:20    5%  21:21   kernel-selftests.vm.split_huge_page_test.pass

> > but those dmesg.BUG:sleeping_function_called_from_invalid_context_at* seem only happen on 2bd8eec68f74 as well as the '-fix' commit.
> >
>
> And roughly how often does it happen? I'm running it in a loop now to see if I can trigger it locally.

just as above, the 'BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c' has around a 50% rate to happen upon 2bd8eec68f74 & 292baeb4c714

BTW, will test that patch you mentioned in another mail later and update you with the results.

>
> --
> Mel Gorman
> SUSE Labs

[1]
       v5.19-rc2  eec0ff5df2945d19039d16841b9  2bd8eec68f740608db5ea58ecff  292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs  %reproduction  fail:runs  %reproduction  fail:runs  %reproduction  fail:runs
    :31    0%    :20   75%  15:20   70%  14:21   dmesg.BUG:scheduling_while_atomic
    :31    0%    :20    5%   1:20    0%    :21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
    :31    0%    :20    5%   1:20   10%   2:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
    :31    0%    :20    5%   1:20    5%   1:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
    :31    0%    :20   10%   2:20   25%   5:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
    :31    0%    :20    5%   1:20    0%    :21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
    :31    0%    :20   40%   8:20   40%   8:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
    :31    0%    :20   10%   2:20    0%    :21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
    :31    0%    :20   10%   2:20   10%   2:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
    :31    0%    :20   55%  11:20   65%  13:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
    :31    0%    :20   15%   3:20    5%   1:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
    :31    0%    :20   60%  12:20   55%  11:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
    :31    0%    :20    5%   1:20    5%   1:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
    :31    0%    :20    0%    :20    5%   1:21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
    :31    0%    :20   15%   3:20    0%    :21   dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
    :31    0%    :20   45%   9:20   45%   9:21   dmesg.BUG:workqueue_leaked_lock_or_atomic
    :31    0%    :20   25%   5:20   15%   3:21   dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
    :31    0%    :20    5%   1:20    0%    :21   dmesg.RIP:__clear_user
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.RIP:rcu_eqs_exit
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.RIP:sched_clock_tick
    :31    0%    :20    5%   1:20    0%    :21   dmesg.RIP:smp_call_function_many_cond
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
    :31    0%    :20    5%   1:20    0%    :21   dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.WARNING:suspicious_RCU_usage
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.boot_failures
  11:31   -6%   9:20   -5%   6:20    5%   8:21   dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
  11:31   -6%   9:20   -5%   6:20    5%   8:21   dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
  29:31  -29%  20:20    6%  20:20   11%  21:21   dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage
On 7/5/22 15:51, Oliver Sang wrote:
> Hi Andrew Morton,
>
> On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
>> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
>>
>> > FYI, we noticed the following commit (built with gcc-11):
>> >
>> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
>> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
>> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
>> > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
>> >
>>
>> Did this test include the followup patch mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
>
> no, we just fetched original patch set and test upon it.

It appears you fetched v4, not v5. I noticed it from the threading of your report that was threaded in the v4 thread, and also the github url: above. In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's missing in the github branch you were testing:
https://github.com/intel-lab-lkp/linux/commit/2bd8eec68f740608db5ea58ecff06965228764cb#diff-cef95765dfd76e5f9c9f0faebfa683edf904d0c3de71547ae8c3ea14418c1e38R187

v5 should be fine:
https://lore.kernel.org/lkml/20220624125423.6126-1-mgorman@techsingularity.net/

> now we applied the patch you pointed to us upon 2bd8eec68f and found the issue still exist.
> (attached dmesg FYI) > > [ 204.416449][T27283] BUG: sleeping function called from invalid context at mm/gup.c:1170 > [ 204.416455][T27283] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27283, name: compaction_test > [ 204.416457][T27283] preempt_count: 1, expected: 0 > [ 204.416460][T27283] 1 lock held by compaction_test/27283: > [ 204.416462][T27283] #0: ffff88918df83928 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate+0x1d0/0x300 > [ 204.416477][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 > [ 204.416481][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 > [ 204.416483][T27283] Call Trace: > [ 204.416485][T27283] <TASK> > [ 204.416489][T27283] dump_stack_lvl+0x45/0x59 > [ 204.416497][T27283] __might_resched.cold+0x15e/0x190 > [ 204.416508][T27283] __get_user_pages+0x274/0x6c0 > [ 204.416522][T27283] ? get_gate_page+0x640/0x640 > [ 204.416538][T27283] ? rwsem_down_read_slowpath+0xb80/0xb80 > [ 204.416548][T27283] populate_vma_page_range+0xd7/0x140 > [ 204.416554][T27283] __mm_populate+0x178/0x300 > [ 204.416560][T27283] ? faultin_vma_page_range+0x100/0x100 > [ 204.416566][T27283] ? __up_write+0x13a/0x480 > [ 204.416575][T27283] vm_mmap_pgoff+0x1a7/0x240 > [ 204.416584][T27283] ? randomize_page+0x80/0x80 > [ 204.416586][T27283] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 204.416595][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 204.416600][T27283] ? syscall_enter_from_user_mode+0x21/0x80 > [ 204.416609][T27283] do_syscall_64+0x59/0x80 > [ 204.416617][T27283] ? irqentry_exit_to_user_mode+0xa/0x40 > [ 204.416624][T27283] ? 
lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 204.416629][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 204.416633][T27283] RIP: 0033:0x7f10e01e2b62 > [ 204.416637][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f > 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64 > [ 204.416639][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 > [ 204.416642][T27283] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007f10e01e2b62 > [ 204.416645][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000 > [ 204.416646][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000 > [ 204.416648][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170 > [ 204.416649][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 204.416666][T27283] </TASK> > [ 204.690617][T27283] BUG: scheduling while atomic: compaction_test/27283/0x00000004 > [ 204.690624][T27283] no locks held by compaction_test/27283. 
> [ 204.690625][T27283] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk > x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g > eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper > ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb > _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables > [ 204.690688][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 > [ 204.690691][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 > [ 204.690694][T27283] Call Trace: > [ 204.690695][T27283] <TASK> > [ 204.690700][T27283] dump_stack_lvl+0x45/0x59 > [ 204.690707][T27283] __schedule_bug.cold+0xcf/0xe0 > [ 204.690714][T27283] schedule_debug+0x274/0x300 > [ 204.690724][T27283] __schedule+0xf5/0x1740 > [ 204.690733][T27283] ? io_schedule_timeout+0x180/0x180 > [ 204.690737][T27283] ? vm_mmap_pgoff+0x1a7/0x240 > [ 204.690748][T27283] schedule+0xea/0x240 > [ 204.690753][T27283] exit_to_user_mode_loop+0x79/0x140 > [ 204.690759][T27283] exit_to_user_mode_prepare+0xfc/0x180 > [ 204.690762][T27283] syscall_exit_to_user_mode+0x19/0x80 > [ 204.690768][T27283] do_syscall_64+0x69/0x80 > [ 204.690773][T27283] ? __local_bh_enable+0x7a/0xc0 > [ 204.690777][T27283] ? __do_softirq+0x52c/0x865 > [ 204.690786][T27283] ? irqentry_exit_to_user_mode+0xa/0x40 > [ 204.690792][T27283] ? 
lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 204.690798][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 204.690802][T27283] RIP: 0033:0x7f10e01e2b62 > [ 204.690806][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f > 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64 > [ 204.690808][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 > [ 204.690811][T27283] RAX: 00007f022d8e7000 RBX: 0000000000002022 RCX: 00007f10e01e2b62 > [ 204.690813][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000 > [ 204.690814][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000 > [ 204.690815][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170 > [ 204.690817][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 204.690830][T27283] </TASK> > [ 216.734914][ T1147] > [ 230.207563][ T1147] > [ 244.124530][ T1147] > [ 257.808775][ T1147] > [ 271.803313][ T1147] > [ 272.181098][ T563] BUG: sleeping function called from invalid context at mm/migrate.c:1380 > [ 272.181104][ T563] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 563, name: kcompactd0 > [ 272.181107][ T563] preempt_count: 1, expected: 0 > [ 272.181109][ T563] no locks held by kcompactd0/563. > [ 272.181112][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 > [ 272.181115][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 > [ 272.181117][ T563] Call Trace: > [ 272.181119][ T563] <TASK> > [ 272.181124][ T563] dump_stack_lvl+0x45/0x59 > [ 272.181133][ T563] __might_resched.cold+0x15e/0x190 > [ 272.181143][ T563] migrate_pages+0x2b1/0x1200 > [ 272.181152][ T563] ? isolate_freepages+0x880/0x880 > [ 272.181158][ T563] ? split_map_pages+0x4c0/0x4c0 > [ 272.181167][ T563] ? 
buffer_migrate_page_norefs+0x40/0x40 > [ 272.181172][ T563] ? isolate_migratepages+0x300/0x6c0 > [ 272.181183][ T563] compact_zone+0xa3f/0x1640 > [ 272.181200][ T563] ? compaction_suitable+0x200/0x200 > [ 272.181205][ T563] ? lock_acquire+0x194/0x500 > [ 272.181211][ T563] ? finish_wait+0xc5/0x280 > [ 272.181220][ T563] proactive_compact_node+0xeb/0x180 > [ 272.181224][ T563] ? compact_store+0xc0/0xc0 > [ 272.181239][ T563] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 272.181242][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 272.181252][ T563] kcompactd+0x500/0xc80 > [ 272.181262][ T563] ? kcompactd_do_work+0x540/0x540 > [ 272.181268][ T563] ? prepare_to_swait_exclusive+0x240/0x240 > [ 272.181275][ T563] ? __kthread_parkme+0xd9/0x200 > [ 272.181278][ T563] ? schedule+0xfe/0x240 > [ 272.181282][ T563] ? kcompactd_do_work+0x540/0x540 > [ 272.181288][ T563] kthread+0x28f/0x340 > [ 272.181290][ T563] ? kthread_complete_and_exit+0x40/0x40 > [ 272.181295][ T563] ret_from_fork+0x1f/0x30 > [ 272.181313][ T563] </TASK> > [ 272.295259][ T2111] meminfo[2111]: segfault at 7ffc6e0e55e8 ip 00007fbdf6db8580 sp 00007ffc6e0e55f0 error 7 in libc-2.31.so[7fbdf6d12000+14b000] > [ 272.295314][ T2111] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 > 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8 > [ 272.296053][ T2111] BUG: scheduling while atomic: meminfo/2111/0x00000002 > [ 272.296056][ T2111] no locks held by meminfo/2111. 
> [ 272.296058][ T2111] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk > x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g > eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper > ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb > _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables > [ 272.296121][ T2111] CPU: 20 PID: 2111 Comm: meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 > [ 272.296125][ T2111] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 > [ 272.296127][ T2111] Call Trace: > [ 272.296128][ T2111] <TASK> > [ 272.296132][ T2111] dump_stack_lvl+0x45/0x59 > [ 272.296141][ T2111] __schedule_bug.cold+0xcf/0xe0 > [ 272.296150][ T2111] schedule_debug+0x274/0x300 > [ 272.296160][ T2111] __schedule+0xf5/0x1740 > [ 272.296169][ T2111] ? rwlock_bug+0xc0/0xc0 > [ 272.296176][ T2111] ? io_schedule_timeout+0x180/0x180 > [ 272.296181][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 272.296185][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 272.296194][ T2111] do_task_dead+0xda/0x140 > [ 272.296200][ T2111] do_exit+0x6a7/0xac0 > [ 272.296210][ T2111] do_group_exit+0xb7/0x2c0 > [ 272.296216][ T2111] get_signal+0x1b13/0x1cc0 > [ 272.296226][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 272.296230][ T2111] ? force_sig_info_to_task+0x30d/0x500 > [ 272.296234][ T2111] ? ptrace_signal+0x700/0x700 > [ 272.296245][ T2111] arch_do_signal_or_restart+0x77/0x300 > [ 272.296252][ T2111] ? get_sigframe_size+0x40/0x40 > [ 272.296257][ T2111] ? 
show_opcodes.cold+0x1c/0x21 > [ 272.296270][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 272.296277][ T2111] exit_to_user_mode_loop+0xac/0x140 > [ 272.296282][ T2111] exit_to_user_mode_prepare+0xfc/0x180 > [ 272.296286][ T2111] irqentry_exit_to_user_mode+0x5/0x40 > [ 272.296291][ T2111] asm_exc_page_fault+0x27/0x30 > [ 272.296293][ T2111] RIP: 0033:0x7fbdf6db8580 > [ 272.296297][ T2111] Code: Unable to access opcode bytes at RIP 0x7fbdf6db8556. > [ 272.296299][ T2111] RSP: 002b:00007ffc6e0e55f0 EFLAGS: 00010246 > [ 272.296301][ T2111] RAX: 0000000000006bb3 RBX: 00007ffc6e0e56d0 RCX: 00007fbdf6db84bb > [ 272.296303][ T2111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 > [ 272.296305][ T2111] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fbdf6cea740 > [ 272.296306][ T2111] R10: 00007fbdf6ceaa10 R11: 0000000000000246 R12: 0000000000000000 > [ 272.296308][ T2111] R13: 0000000000006bb3 R14: 00005563332b3908 R15: 00007ffc6e0e56b0 > [ 272.296323][ T2111] </TASK> > [ 272.296514][ T2150] gzip-meminfo[2150]: segfault at 7fd637199670 ip 00007fd637199670 sp 00007fffd9088698 error 14 in libc-2.31.so[7fd6370f3000+14b000 > ] > [ 272.296560][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646. > [ 272.297682][ T2150] BUG: scheduling while atomic: gzip-meminfo/2150/0x00000002 > [ 272.297686][ T2150] no locks held by gzip-meminfo/2150. 
> [ 272.297687][ T2150] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk > x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g > eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper > ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb > _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables > [ 272.297746][ T2150] CPU: 45 PID: 2150 Comm: gzip-meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1 > [ 272.297749][ T2150] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020 > [ 272.297751][ T2150] Call Trace: > [ 272.297752][ T2150] <TASK> > [ 272.297756][ T2150] dump_stack_lvl+0x45/0x59 > [ 272.297762][ T2150] __schedule_bug.cold+0xcf/0xe0 > [ 272.297768][ T2150] schedule_debug+0x274/0x300 > [ 272.297775][ T2150] __schedule+0xf5/0x1740 > [ 272.297783][ T2150] ? rwlock_bug+0xc0/0xc0 > [ 272.297788][ T2150] ? io_schedule_timeout+0x180/0x180 > [ 272.297794][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 272.297797][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 272.297806][ T2150] do_task_dead+0xda/0x140 > [ 272.297811][ T2150] do_exit+0x6a7/0xac0 > [ 272.297819][ T2150] do_group_exit+0xb7/0x2c0 > [ 272.297825][ T2150] get_signal+0x1b13/0x1cc0 > [ 272.297833][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40 > [ 272.297838][ T2150] ? force_sig_info_to_task+0x30d/0x500 > [ 272.297842][ T2150] ? ptrace_signal+0x700/0x700 > [ 272.297854][ T2150] arch_do_signal_or_restart+0x77/0x300 > [ 272.297859][ T2150] ? get_sigframe_size+0x40/0x40 > [ 272.297864][ T2150] ? 
show_opcodes+0x97/0xc0 > [ 272.297876][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380 > [ 272.297883][ T2150] exit_to_user_mode_loop+0xac/0x140 > [ 272.297887][ T2150] exit_to_user_mode_prepare+0xfc/0x180 > [ 272.297890][ T2150] irqentry_exit_to_user_mode+0x5/0x40 > [ 272.297894][ T2150] asm_exc_page_fault+0x27/0x30 > [ 272.297897][ T2150] RIP: 0033:0x7fd637199670 > [ 272.297900][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646. > [ 272.297901][ T2150] RSP: 002b:00007fffd9088698 EFLAGS: 00010246 > [ 272.297904][ T2150] RAX: 0000000000000000 RBX: 00007fd63728e610 RCX: 0000000000000000 > [ 272.297905][ T2150] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [ 272.297906][ T2150] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000001 > [ 272.297908][ T2150] R10: fffffffffffff287 R11: 00007fd63710c660 R12: 00007fd63728e610 > [ 272.297909][ T2150] R13: 0000000000000001 R14: 00007fd63728eae8 R15: 0000000000000000 > [ 272.297923][ T2150] </TASK> > [ 272.340352][ T563] BUG: scheduling while atomic: kcompactd0/563/0x0000004d > [ 272.340356][ T563] no locks held by kcompactd0/563. 
> [ 272.340357][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.340433][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.340437][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.340438][ T563] Call Trace:
> [ 272.340440][ T563]  <TASK>
> [ 272.340444][ T563]  dump_stack_lvl+0x45/0x59
> [ 272.340451][ T563]  __schedule_bug.cold+0xcf/0xe0
> [ 272.340459][ T563]  schedule_debug+0x274/0x300
> [ 272.340467][ T563]  __schedule+0xf5/0x1740
> [ 272.340477][ T563]  ? io_schedule_timeout+0x180/0x180
> [ 272.340481][ T563]  ? find_held_lock+0x2c/0x140
> [ 272.340486][ T563]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.340496][ T563]  schedule+0xea/0x240
> [ 272.340501][ T563]  schedule_timeout+0x11b/0x240
> [ 272.340507][ T563]  ? usleep_range_state+0x180/0x180
> [ 272.340512][ T563]  ? timer_migration_handler+0xc0/0xc0
> [ 272.340520][ T563]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.340525][ T563]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.340540][ T563]  kcompactd+0x870/0xc80
> [ 272.340554][ T563]  ? kcompactd_do_work+0x540/0x540
> [ 272.340560][ T563]  ? prepare_to_swait_exclusive+0x240/0x240
> [ 272.340567][ T563]  ? __kthread_parkme+0xd9/0x200
> [ 272.340571][ T563]  ? schedule+0xfe/0x240
> [ 272.340574][ T563]  ? kcompactd_do_work+0x540/0x540
> [ 272.340579][ T563]  kthread+0x28f/0x340
> [ 272.340582][ T563]  ? kthread_complete_and_exit+0x40/0x40
> [ 272.340588][ T563]  ret_from_fork+0x1f/0x30
> [ 272.340605][ T563]  </TASK>
> [ 272.799216][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000027
> [ 272.799222][ T564] no locks held by kcompactd1/564.
> [ 272.799224][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.799283][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.799287][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.799289][ T564] Call Trace:
> [ 272.799292][ T564]  <TASK>
> [ 272.799299][ T564]  dump_stack_lvl+0x45/0x59
> [ 272.799309][ T564]  __schedule_bug.cold+0xcf/0xe0
> [ 272.799318][ T564]  schedule_debug+0x274/0x300
> [ 272.799329][ T564]  __schedule+0xf5/0x1740
> [ 272.799341][ T564]  ? io_schedule_timeout+0x180/0x180
> [ 272.799345][ T564]  ? find_held_lock+0x2c/0x140
> [ 272.799352][ T564]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.799362][ T564]  schedule+0xea/0x240
> [ 272.799368][ T564]  schedule_timeout+0x11b/0x240
> [ 272.799374][ T564]  ? usleep_range_state+0x180/0x180
> [ 272.799379][ T564]  ? timer_migration_handler+0xc0/0xc0
> [ 272.799389][ T564]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.799394][ T564]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.799402][ T564]  kcompactd+0x870/0xc80
> [ 272.799416][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 272.799422][ T564]  ? prepare_to_swait_exclusive+0x240/0x240
> [ 272.799429][ T564]  ? __kthread_parkme+0xd9/0x200
> [ 272.799433][ T564]  ? schedule+0xfe/0x240
> [ 272.799436][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 272.799442][ T564]  kthread+0x28f/0x340
> [ 272.799445][ T564]  ? kthread_complete_and_exit+0x40/0x40
> [ 272.799451][ T564]  ret_from_fork+0x1f/0x30
> [ 272.799469][ T564]  </TASK>
> [ 273.033327][ T563] BUG: scheduling while atomic: kcompactd0/563/0x00000003
> [ 273.033331][ T563] no locks held by kcompactd0/563.
> [ 273.033333][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 273.033428][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.033432][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.033434][ T563] Call Trace:
> [ 273.033436][ T563]  <TASK>
> [ 273.033440][ T563]  dump_stack_lvl+0x45/0x59
> [ 273.033449][ T563]  __schedule_bug.cold+0xcf/0xe0
> [ 273.033457][ T563]  schedule_debug+0x274/0x300
> [ 273.033467][ T563]  __schedule+0xf5/0x1740
> [ 273.033477][ T563]  ? io_schedule_timeout+0x180/0x180
> [ 273.033481][ T563]  ? find_held_lock+0x2c/0x140
> [ 273.033487][ T563]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.033498][ T563]  schedule+0xea/0x240
> [ 273.033503][ T563]  schedule_timeout+0x11b/0x240
> [ 273.033509][ T563]  ? usleep_range_state+0x180/0x180
> [ 273.033521][ T563]  ? timer_migration_handler+0xc0/0xc0
> [ 273.033530][ T563]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.033535][ T563]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.033543][ T563]  kcompactd+0x870/0xc80
> [ 273.033557][ T563]  ? kcompactd_do_work+0x540/0x540
> [ 273.033563][ T563]  ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.033570][ T563]  ? __kthread_parkme+0xd9/0x200
> [ 273.033574][ T563]  ? schedule+0xfe/0x240
> [ 273.033577][ T563]  ? kcompactd_do_work+0x540/0x540
> [ 273.033582][ T563]  kthread+0x28f/0x340
> [ 273.033585][ T563]  ? kthread_complete_and_exit+0x40/0x40
> [ 273.033590][ T563]  ret_from_fork+0x1f/0x30
> [ 273.033608][ T563]  </TASK>
> [ 273.319687][ T564] BUG: sleeping function called from invalid context at mm/migrate.c:1380
> [ 273.319692][ T564] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 564, name: kcompactd1
> [ 273.319694][ T564] preempt_count: 1, expected: 0
> [ 273.319696][ T564] no locks held by kcompactd1/564.
> [ 273.319699][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.319702][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.319704][ T564] Call Trace:
> [ 273.319707][ T564]  <TASK>
> [ 273.319713][ T564]  dump_stack_lvl+0x45/0x59
> [ 273.319723][ T564]  __might_resched.cold+0x15e/0x190
> [ 273.319734][ T564]  migrate_pages+0x2b1/0x1200
> [ 273.319744][ T564]  ? isolate_freepages+0x880/0x880
> [ 273.319752][ T564]  ? split_map_pages+0x4c0/0x4c0
> [ 273.319762][ T564]  ? buffer_migrate_page_norefs+0x40/0x40
> [ 273.319767][ T564]  ? isolate_migratepages+0x300/0x6c0
> [ 273.319778][ T564]  compact_zone+0xa3f/0x1640
> [ 273.319795][ T564]  ? compaction_suitable+0x200/0x200
> [ 273.319800][ T564]  ? lock_acquire+0x194/0x500
> [ 273.319807][ T564]  ? finish_wait+0xc5/0x280
> [ 273.319816][ T564]  proactive_compact_node+0xeb/0x180
> [ 273.319820][ T564]  ? compact_store+0xc0/0xc0
> [ 273.319835][ T564]  ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 273.319839][ T564]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.319850][ T564]  kcompactd+0x500/0xc80
> [ 273.319860][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 273.319866][ T564]  ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.319873][ T564]  ? __kthread_parkme+0xd9/0x200
> [ 273.319877][ T564]  ? schedule+0xfe/0x240
> [ 273.319882][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 273.319888][ T564]  kthread+0x28f/0x340
> [ 273.319891][ T564]  ? kthread_complete_and_exit+0x40/0x40
> [ 273.319896][ T564]  ret_from_fork+0x1f/0x30
> [ 273.319914][ T564]  </TASK>
> [ 273.637490][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000041
> [ 273.637496][ T564] no locks held by kcompactd1/564.
> [ 273.637498][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 273.637556][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.637560][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.637562][ T564] Call Trace:
> [ 273.637565][ T564]  <TASK>
> [ 273.637571][ T564]  dump_stack_lvl+0x45/0x59
> [ 273.637580][ T564]  __schedule_bug.cold+0xcf/0xe0
> [ 273.637589][ T564]  schedule_debug+0x274/0x300
> [ 273.637600][ T564]  __schedule+0xf5/0x1740
> [ 273.637612][ T564]  ? io_schedule_timeout+0x180/0x180
> [ 273.637616][ T564]  ? find_held_lock+0x2c/0x140
> [ 273.637622][ T564]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.637633][ T564]  schedule+0xea/0x240
> [ 273.637638][ T564]  schedule_timeout+0x11b/0x240
> [ 273.637645][ T564]  ? usleep_range_state+0x180/0x180
> [ 273.637650][ T564]  ? timer_migration_handler+0xc0/0xc0
> [ 273.637659][ T564]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.637664][ T564]  ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.637671][ T564]  kcompactd+0x870/0xc80
> [ 273.637687][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 273.637692][ T564]  ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.637700][ T564]  ? __kthread_parkme+0xd9/0x200
> [ 273.637704][ T564]  ? schedule+0xfe/0x240
> [ 273.637707][ T564]  ? kcompactd_do_work+0x540/0x540
> [ 273.637713][ T564]  kthread+0x28f/0x340
> [ 273.637716][ T564]  ? kthread_complete_and_exit+0x40/0x40
> [ 273.637722][ T564]  ret_from_fork+0x1f/0x30
> [ 273.637740][ T564]  </TASK>
> [ 285.377624][ T1147]
>
>
>
>>
>>
>> From: Mel Gorman <mgorman@techsingularity.net>
>> Subject: mm/page_alloc: replace local_lock with normal spinlock -fix
>> Date: Mon, 27 Jun 2022 09:46:45 +0100
>>
>> As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of
>> pcpu_spin_trylock_irqsave. This is a fix to the mm-unstable patch
>> mm-page_alloc-replace-local_lock-with-normal-spinlock.patch
>>
>> Link: https://lkml.kernel.org/r/20220627084645.GA27531@techsingularity.net
>> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> Reported-by: Yu Zhao <yuzhao@google.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  mm/page_alloc.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
>> +++ a/mm/page_alloc.c
>> @@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,
>>
>>  	zone = page_zone(page);
>>  	pcp_trylock_prepare(UP_flags);
>> -	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
>> +	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>>  	if (pcp) {
>>  		free_unref_page_commit(zone, pcp, page, migratetype, order);
>>  		pcp_spin_unlock_irqrestore(pcp, flags);
>> _
>>
On Thu, Jul 07, 2022 at 11:55:35PM +0200, Vlastimil Babka wrote:
> On 7/5/22 15:51, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> >> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> >>
> >> > FYI, we noticed the following commit (built with gcc-11):
> >> >
> >> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> >> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> >> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> >> > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> >> >
> >>
> >> Did this test include the followup patch
> >> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
>
> It appears you fetched v4, not v5. I noticed it from the threading of your
> report that was threaded in the v4 thread, and also the github url: above.
> In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's
> missing in the github branch you were testing:
>

Thanks Vlastimil! This is my fault: I failed to verify that the code in
my tree, Andrew's tree and what Oliver tested were the same, so it is no
wonder I could not find where the missing unpin was. I've gone through
mm-unstable commits be42c869b8e..4143c9b5266 and can confirm that they are
now identical to my own tree, which includes Andrew's fix for the smatch
warning that Dan reported.

 # git diff HEAD^..mm-pcpspinnoirq-v6r1-mmunstable | wc -l
 0

The only difference between my tree and Andrew's is that there is a head
commit for "mm/page_alloc: Do not disable IRQs for per-cpu allocations",
which has been put on hold for now.
Hi Mel Gorman, hi Vlastimil Babka,

thanks a lot for the information! Normally we would discard a report when
we find there is a newer version of a patch set; however, for this one, we
failed to fetch v5 from the mailing list. Sorry for any inconvenience.

On Fri, Jul 08, 2022 at 11:56:03AM +0100, Mel Gorman wrote:
> On Thu, Jul 07, 2022 at 11:55:35PM +0200, Vlastimil Babka wrote:
> > On 7/5/22 15:51, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > >> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > >>
> > >> > FYI, we noticed the following commit (built with gcc-11):
> > >> >
> > >> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > >> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > >> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > >> > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > >> >
> > >>
> > >> Did this test include the followup patch
> > >> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> >
> > It appears you fetched v4, not v5. I noticed it from the threading of your
> > report that was threaded in the v4 thread, and also the github url: above.
> > In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's
> > missing in the github branch you were testing:
> >
>
> Thanks Vlastimil! This is my fault, I failed to verify that the code in
> my tree, Andrew's tree and what Oliver tested were the same so no wonder I
> could not find where the missing unpin was. I've gone through mm-unstable
> commits be42c869b8e..4143c9b5266 and can confirm that they are now identical
> to my own tree which includes Andrew's fix for the smatch warning that
> Dan reported.
>
> # git diff HEAD^..mm-pcpspinnoirq-v6r1-mmunstable | wc -l
> 0
>
> The only difference between my tree and Andrew's is that there is a head
> commit for "mm/page_alloc: Do not disable IRQs for per-cpu allocations"
> which has been put on hold for now.
>
> --
> Mel Gorman
> SUSE Labs
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03882ce7765f..f10782ab7cc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -126,13 +126,6 @@ typedef int __bitwise fpi_t;
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
 
-struct pagesets {
-	local_lock_t lock;
-};
-static DEFINE_PER_CPU(struct pagesets, pagesets) = {
-	.lock = INIT_LOCAL_LOCK(lock),
-};
-
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
 /*
  * On SMP, spin_trylock is sufficient protection.
@@ -147,6 +140,81 @@ static DEFINE_PER_CPU(struct pagesets, pagesets) = {
 #define pcp_trylock_finish(flags)	local_irq_restore(flags)
 #endif
 
+/*
+ * Locking a pcp requires a PCP lookup followed by a spinlock. To avoid
+ * a migration causing the wrong PCP to be locked and remote memory being
+ * potentially allocated, pin the task to the CPU for the lookup+lock.
+ * preempt_disable is used on !RT because it is faster than migrate_disable.
+ * migrate_disable is used on RT because otherwise RT spinlock usage is
+ * interfered with and a high priority task cannot preempt the allocator.
+ */
+#ifndef CONFIG_PREEMPT_RT
+#define pcpu_task_pin()		preempt_disable()
+#define pcpu_task_unpin()	preempt_enable()
+#else
+#define pcpu_task_pin()		migrate_disable()
+#define pcpu_task_unpin()	migrate_enable()
+#endif
+
+/*
+ * Generic helper to look up and lock a per-cpu variable with an embedded
+ * spinlock. Return value should be used with equivalent unlock helper.
+ */
+#define pcpu_spin_lock(type, member, ptr)				\
+({									\
+	type *_ret;							\
+	pcpu_task_pin();						\
+	_ret = this_cpu_ptr(ptr);					\
+	spin_lock(&_ret->member);					\
+	_ret;								\
+})
+
+#define pcpu_spin_lock_irqsave(type, member, ptr, flags)		\
+({									\
+	type *_ret;							\
+	pcpu_task_pin();						\
+	_ret = this_cpu_ptr(ptr);					\
+	spin_lock_irqsave(&_ret->member, flags);			\
+	_ret;								\
+})
+
+#define pcpu_spin_trylock_irqsave(type, member, ptr, flags)		\
+({									\
+	type *_ret;							\
+	pcpu_task_pin();						\
+	_ret = this_cpu_ptr(ptr);					\
+	if (!spin_trylock_irqsave(&_ret->member, flags))		\
+		_ret = NULL;						\
+	_ret;								\
+})
+
+#define pcpu_spin_unlock(member, ptr)					\
+({									\
+	spin_unlock(&ptr->member);					\
+	pcpu_task_unpin();						\
+})
+
+#define pcpu_spin_unlock_irqrestore(member, ptr, flags)			\
+({									\
+	spin_unlock_irqrestore(&ptr->member, flags);			\
+	pcpu_task_unpin();						\
+})
+
+/* struct per_cpu_pages specific helpers. */
+#define pcp_spin_lock(ptr)						\
+	pcpu_spin_lock(struct per_cpu_pages, lock, ptr)
+
+#define pcp_spin_lock_irqsave(ptr, flags)				\
+	pcpu_spin_lock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+
+#define pcp_spin_trylock_irqsave(ptr, flags)				\
+	pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+
+#define pcp_spin_unlock(ptr)						\
+	pcpu_spin_unlock(lock, ptr)
+
+#define pcp_spin_unlock_irqrestore(ptr, flags)				\
+	pcpu_spin_unlock_irqrestore(lock, ptr, flags)
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -1481,10 +1549,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	/* Ensure requested pindex is drained first. */
 	pindex = pindex - 1;
 
-	/*
-	 * local_lock_irq held so equivalent to spin_lock_irqsave for
-	 * both PREEMPT_RT and non-PREEMPT_RT configurations.
-	 */
+	/* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */
 	spin_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);
 
@@ -3052,10 +3117,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 {
 	int i, allocated = 0;
 
-	/*
-	 * local_lock_irq held so equivalent to spin_lock_irqsave for
-	 * both PREEMPT_RT and non-PREEMPT_RT configurations.
-	 */
+	/* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
@@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 	return min(READ_ONCE(pcp->batch) << 2, high);
 }
 
-/* Returns true if the page was committed to the per-cpu list. */
-static bool free_unref_page_commit(struct page *page, int migratetype,
-				   unsigned int order, bool locked)
+static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
+				   struct page *page, int migratetype,
+				   unsigned int order)
 {
-	struct zone *zone = page_zone(page);
-	struct per_cpu_pages *pcp;
 	int high;
 	int pindex;
 	bool free_high;
-	unsigned long __maybe_unused UP_flags;
 
 	__count_vm_event(PGFREE);
-	pcp = this_cpu_ptr(zone->per_cpu_pageset);
 	pindex = order_to_pindex(migratetype, order);
-
-	if (!locked) {
-		/* Protect against a parallel drain. */
-		pcp_trylock_prepare(UP_flags);
-		if (!spin_trylock(&pcp->lock)) {
-			pcp_trylock_finish(UP_flags);
-			return false;
-		}
-	}
-
 	list_add(&page->pcp_list, &pcp->lists[pindex]);
 	pcp->count += 1 << order;
 
@@ -3408,13 +3457,6 @@ static bool free_unref_page_commit(struct page *page, int migratetype,
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high),
 				   pcp, pindex);
 	}
-
-	if (!locked) {
-		spin_unlock(&pcp->lock);
-		pcp_trylock_finish(UP_flags);
-	}
-
-	return true;
 }
 
 /*
@@ -3422,10 +3464,12 @@ static bool free_unref_page_commit(struct page *page, int migratetype,
  */
 void free_unref_page(struct page *page, unsigned int order)
 {
-	unsigned long flags;
+	struct per_cpu_pages *pcp;
+	struct zone *zone;
 	unsigned long pfn = page_to_pfn(page);
 	int migratetype;
-	bool freed_pcp = false;
+	unsigned long flags;
+	unsigned long __maybe_unused UP_flags;
 
 	if (!free_unref_page_prepare(page, pfn, order))
 		return;
@@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
 		migratetype = MIGRATE_MOVABLE;
 	}
 
-	local_lock_irqsave(&pagesets.lock, flags);
-	freed_pcp = free_unref_page_commit(page, migratetype, order, false);
-	local_unlock_irqrestore(&pagesets.lock, flags);
-
-	if (unlikely(!freed_pcp))
+	zone = page_zone(page);
+	pcp_trylock_prepare(UP_flags);
+	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+	if (pcp) {
+		free_unref_page_commit(pcp, zone, page, migratetype, order);
+		pcp_spin_unlock_irqrestore(pcp, flags);
+	} else {
 		free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
+	}
+	pcp_trylock_finish(UP_flags);
 }
 
 /*
@@ -3500,20 +3548,20 @@ void free_unref_page_list(struct list_head *list)
 	if (list_empty(list))
 		return;
 
-	local_lock_irqsave(&pagesets.lock, flags);
-
 	page = lru_to_page(list);
 	locked_zone = page_zone(page);
-	pcp = this_cpu_ptr(locked_zone->per_cpu_pageset);
-	spin_lock(&pcp->lock);
+	pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
 
 	list_for_each_entry_safe(page, next, list, lru) {
 		struct zone *zone = page_zone(page);
 
 		/* Different zone, different pcp lock. */
 		if (zone != locked_zone) {
+			/* Leave IRQs enabled as a new lock is acquired. */
 			spin_unlock(&pcp->lock);
 			locked_zone = zone;
+
+			/* Preemption disabled by pcp_spin_lock_irqsave. */
 			pcp = this_cpu_ptr(zone->per_cpu_pageset);
 			spin_lock(&pcp->lock);
 		}
@@ -3528,33 +3576,19 @@ void free_unref_page_list(struct list_head *list)
 
 		trace_mm_page_free_batched(page);
 
-		/*
-		 * If there is a parallel drain in progress, free to the buddy
-		 * allocator directly. This is expensive as the zone lock will
-		 * be acquired multiple times but if a drain is in progress
-		 * then an expensive operation is already taking place.
-		 *
-		 * TODO: Always false at the moment due to local_lock_irqsave
-		 * and is preparation for converting to local_lock.
-		 */
-		if (unlikely(!free_unref_page_commit(page, migratetype, 0, true)))
-			free_one_page(page_zone(page), page, page_to_pfn(page), 0, migratetype, FPI_NONE);
+		free_unref_page_commit(pcp, zone, page, migratetype, 0);
 
 		/*
		 * Guard against excessive IRQ disabled times when we get
		 * a large list of pages to free.
		 */
 		if (++batch_count == SWAP_CLUSTER_MAX) {
-			spin_unlock(&pcp->lock);
-			local_unlock_irqrestore(&pagesets.lock, flags);
+			pcp_spin_unlock_irqrestore(pcp, flags);
 			batch_count = 0;
-			local_lock_irqsave(&pagesets.lock, flags);
-			pcp = this_cpu_ptr(locked_zone->per_cpu_pageset);
-			spin_lock(&pcp->lock);
+			pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
 		}
 	}
 
-	spin_unlock(&pcp->lock);
-	local_unlock_irqrestore(&pagesets.lock, flags);
+	pcp_spin_unlock_irqrestore(pcp, flags);
 }
 
 /*
@@ -3722,28 +3756,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			int migratetype,
 			unsigned int alloc_flags,
 			struct per_cpu_pages *pcp,
-			struct list_head *list,
-			bool locked)
+			struct list_head *list)
 {
 	struct page *page;
-	unsigned long __maybe_unused UP_flags;
-
-	/*
-	 * spin_trylock is not necessary right now due to due to
-	 * local_lock_irqsave and is a preparation step for
-	 * a conversion to local_lock using the trylock to prevent
-	 * IRQ re-entrancy. If pcp->lock cannot be acquired, the caller
-	 * uses rmqueue_buddy.
-	 *
-	 * TODO: Convert local_lock_irqsave to local_lock.
-	 */
-	if (unlikely(!locked)) {
-		pcp_trylock_prepare(UP_flags);
-		if (!spin_trylock(&pcp->lock)) {
-			pcp_trylock_finish(UP_flags);
-			return NULL;
-		}
-	}
 
 	do {
 		if (list_empty(list)) {
@@ -3776,10 +3791,6 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 	} while (check_new_pcp(page, order));
 
 out:
-	if (!locked) {
-		spin_unlock(&pcp->lock);
-		pcp_trylock_finish(UP_flags);
-	}
 	return page;
 }
 
@@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	struct list_head *list;
 	struct page *page;
 	unsigned long flags;
+	unsigned long __maybe_unused UP_flags;
 
-	local_lock_irqsave(&pagesets.lock, flags);
+	/*
+	 * spin_trylock_irqsave is not necessary right now as it'll only be
+	 * true when contending with a remote drain. It's in place as a
+	 * preparation step before converting pcp locking to spin_trylock
	 * to protect against IRQ reentry.
	 */
+	pcp_trylock_prepare(UP_flags);
+	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
+	if (!pcp)
+		return NULL;
 
 	/*
	 * On allocation, reduce the number of pages that are batch freed.
	 * See nr_pcp_free() where free_factor is increased for subsequent
	 * frees.
	 */
-	pcp = this_cpu_ptr(zone->per_cpu_pageset);
 	pcp->free_factor >>= 1;
 	list = &pcp->lists[order_to_pindex(migratetype, order)];
-	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false);
-	local_unlock_irqrestore(&pagesets.lock, flags);
+	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+	pcp_spin_unlock_irqrestore(pcp, flags);
+	pcp_trylock_finish(UP_flags);
 	if (page) {
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
 		zone_statistics(preferred_zone, zone, 1);
@@ -5410,10 +5431,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		goto failed;
 
 	/* Attempt the batch allocation */
-	local_lock_irqsave(&pagesets.lock, flags);
-	pcp = this_cpu_ptr(zone->per_cpu_pageset);
+	pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags);
 	pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
-	spin_lock(&pcp->lock);
 
 	while (nr_populated < nr_pages) {
 
@@ -5424,13 +5443,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		}
 
 		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
-								pcp, pcp_list, true);
+								pcp, pcp_list);
 		if (unlikely(!page)) {
 			/* Try and allocate at least one page */
-			if (!nr_account) {
-				spin_unlock(&pcp->lock);
+			if (!nr_account)
 				goto failed_irq;
-			}
 			break;
 		}
 		nr_account++;
@@ -5443,8 +5460,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_populated++;
 	}
 
-	spin_unlock(&pcp->lock);
-	local_unlock_irqrestore(&pagesets.lock, flags);
+	pcp_spin_unlock_irqrestore(pcp, flags);
 
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
@@ -5453,7 +5469,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	return nr_populated;
 
 failed_irq:
-	local_unlock_irqrestore(&pagesets.lock, flags);
+	pcp_spin_unlock_irqrestore(pcp, flags);
 
 failed:
 	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
struct per_cpu_pages is no longer strictly local as PCP lists can be
drained remotely using a lock for protection. While the use of local_lock
works, it goes against the intent of local_lock, which is for "pure
CPU local concurrency control mechanisms and not suited for inter-CPU
concurrency control" (Documentation/locking/locktypes.rst).

local_lock protects against migration between when the percpu pointer is
accessed and the pcp->lock acquired. The lock acquisition is a preemption
point, so in the worst case a task could migrate to another NUMA node
and accidentally allocate remote memory. The main requirement is to pin
the task to a CPU in a manner suitable for both PREEMPT_RT and !PREEMPT_RT.

Replace local_lock with helpers that pin a task to a CPU, look up the
per-cpu structure and acquire the embedded lock. It's similar to local_lock
without breaking the intent behind the API. It is not a complete API,
as only the parts needed for PCP-alloc are implemented, but in theory
the generic helpers could be promoted to a general API if there was
demand for an embedded lock within a per-cpu struct with a guarantee
that the per-cpu structure locked matches the running CPU and cannot use
get_cpu_var due to RT concerns. PCP requires these semantics to avoid
accidentally allocating remote memory.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 226 ++++++++++++++++++++++++++----------------------
 1 file changed, 121 insertions(+), 105 deletions(-)