[for,6.1,regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock"

Message ID 20230113173345.9692-1-vbabka@suse.cz (mailing list archive)
State New
Series [for,6.1,regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock"

Commit Message

Vlastimil Babka Jan. 13, 2023, 5:33 p.m. UTC
This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.

We have received openSUSE reports (Link 1) of khugepaged stalling a CPU
for long periods of time on the 6.1 kernel. Investigation of tracepoint
data shows that compaction is stuck repeating
fast_find_migrateblock()-based migrate page isolation, and then fails to
migrate all isolated pages. Commit 7efc3b726103 ("mm/compaction: fix set
skip in fast_find_migrateblock") was suspected because it was merged in
6.1 and, in theory, can indeed remove a termination condition for
fast_find_migrateblock() under certain conditions: it removes a place
that always marks a scanned pageblock to prevent it from being
re-scanned. There are other such places, but those can be skipped under
certain conditions, which seems to match the tracepoint data.

Testing the revert also appears to have resolved the issue, so revert
the commit until a more robust solution for the original problem is
developed.

It's also likely this will fix the qemu stalls with the 6.1 kernel
reported in Link 2, but that is not yet confirmed.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Cc: <stable@vger.kernel.org>
---
 mm/compaction.c | 1 +
 1 file changed, 1 insertion(+)
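
As a rough illustration of the termination argument in the commit
message, the toy model below (not kernel code) shows why a finder that
never marks the block it returns can keep returning the same block when
migration fails. All names here (toy_block, toy_fast_find, toy_compact,
NR_BLOCKS, MAX_ROUNDS) are hypothetical, and the model assumes that every
migration attempt fails and leaves the state unchanged. The real
fast_find_migrateblock() is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS  4
#define MAX_ROUNDS 1000	/* safety cap so the toy itself cannot spin forever */

/* Hypothetical stand-in for a pageblock; only the skip bit is modelled. */
struct toy_block {
	bool skip;
};

/*
 * Return the index of the first block not marked skip, or -1 if none.
 * With mark_skip set, the chosen block is marked so it is never
 * returned again (the behaviour the revert restores, in spirit).
 */
static int toy_fast_find(struct toy_block *blocks, size_t n, bool mark_skip)
{
	assert(blocks != NULL);

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].skip)
			continue;
		if (mark_skip)
			blocks[i].skip = true;
		return (int)i;
	}
	return -1;
}

/*
 * Run the find/migrate loop until the finder runs out of blocks or the
 * safety cap is hit. Migration is assumed to always fail, so nothing
 * other than the finder ever changes the skip bits.
 */
static int toy_compact(bool mark_skip)
{
	struct toy_block blocks[NR_BLOCKS] = { { false } };
	int rounds = 0;

	while (rounds < MAX_ROUNDS) {
		int idx = toy_fast_find(blocks, NR_BLOCKS, mark_skip);

		if (idx < 0)
			break;	/* every block has been tried once */
		rounds++;
		/* Failed migration: no state change in this toy model. */
	}
	return rounds;
}
```

In this model, toy_compact(true) stops after NR_BLOCKS rounds because
each block is handed out at most once, while toy_compact(false) only
stops at the MAX_ROUNDS cap because block 0 is returned every time.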

Comments

Vlastimil Babka Jan. 14, 2023, 6:49 a.m. UTC | #1
On 1/13/23 18:33, Vlastimil Babka wrote:
> This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.
> 
> We have got openSUSE reports (Link 1) for 6.1 kernel with khugepaged
> stalling CPU for long periods of time. Investigation of tracepoint data
> shows that compaction is stuck in repeating fast_find_migrateblock()
> based migrate page isolation, and then fails to migrate all isolated
> pages. Commit 7efc3b726103 ("mm/compaction: fix set skip in
> fast_find_migrateblock") was suspected as it was merged in 6.1 and in
> theory can indeed remove a termination condition for
> fast_find_migrateblock() under certain conditions, as it removes a place
> that always marks a scanned pageblock from being re-scanned. There are
> other such places, but those can be skipped under certain conditions,
> which seems to match the tracepoint data.
> 
> Testing of revert also appears to have resolved the issue, thus revert
> the commit until a more robust solution for the original problem is
> developed.
> 
> It's also likely this will fix qemu stalls with 6.1 kernel reported in
> Link 2, but that is not yet confirmed.
> 
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
> Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> Cc: <stable@vger.kernel.org>

Oops, forgot:

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca1603524bbe..8238e83385a7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1839,6 +1839,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
>  					pfn = cc->zone->zone_start_pfn;
>  				cc->fast_search_fail = 0;
>  				found_block = true;
> +				set_pageblock_skip(freepage);
>  				break;
>  			}
>  		}
Pedro Falcato Jan. 14, 2023, 8:08 a.m. UTC | #2
On Sat, Jan 14, 2023 at 6:51 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/13/23 18:33, Vlastimil Babka wrote:
> > This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.
> >
> > We have got openSUSE reports (Link 1) for 6.1 kernel with khugepaged
> > stalling CPU for long periods of time. Investigation of tracepoint data
> > shows that compaction is stuck in repeating fast_find_migrateblock()
> > based migrate page isolation, and then fails to migrate all isolated
> > pages. Commit 7efc3b726103 ("mm/compaction: fix set skip in
> > fast_find_migrateblock") was suspected as it was merged in 6.1 and in
> > theory can indeed remove a termination condition for
> > fast_find_migrateblock() under certain conditions, as it removes a place
> > that always marks a scanned pageblock from being re-scanned. There are
> > other such places, but those can be skipped under certain conditions,
> > which seems to match the tracepoint data.
> >
> > Testing of revert also appears to have resolved the issue, thus revert
> > the commit until a more robust solution for the original problem is
> > developed.
> >
> > It's also likely this will fix qemu stalls with 6.1 kernel reported in
> > Link 2, but that is not yet confirmed.
> >
> > Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
> > Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
> > Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> > Cc: <stable@vger.kernel.org>
>
> Oops, forgot:
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> > ---
> >  mm/compaction.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ca1603524bbe..8238e83385a7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1839,6 +1839,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
> >                                       pfn = cc->zone->zone_start_pfn;
> >                               cc->fast_search_fail = 0;
> >                               found_block = true;
> > +                             set_pageblock_skip(freepage);
> >                               break;
> >                       }
> >               }
>

Vlastimil,

Thank you so much for looking into this. I've been daily driving it
for the past half day and it seems to have fixed my QEMU issues.
Of course, I don't have exactly a test suite for this but I've tried
everything and I can't get any of the original problems to show up.

That being said,
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>

I'll report back if QEMU freezes the system again.
Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..8238e83385a7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1839,6 +1839,7 @@  static unsigned long fast_find_migrateblock(struct compact_control *cc)
 					pfn = cc->zone->zone_start_pfn;
 				cc->fast_search_fail = 0;
 				found_block = true;
+				set_pageblock_skip(freepage);
 				break;
 			}
 		}