diff mbox series

[v1] mm: migrate: don't rely on PageMovable() of newpage after unlocking it

Message ID 20190128160403.16657-1-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v1] mm: migrate: don't rely on PageMovable() of newpage after unlocking it | expand

Commit Message

David Hildenbrand Jan. 28, 2019, 4:04 p.m. UTC
While debugging some crashes related to virtio-balloon deflation that
happened under the old balloon migration code, I stumbled over a race
that still exists today.

What we experienced:

drivers/virtio/virtio_balloon.c:release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Turns out after having added the page to a local list when dequeuing,
the page would suddenly be moved to an LRU list before we would free it
via the local list, corrupting both lists. So a page we own and that is
!LRU was moved to an LRU list.

In __unmap_and_move(), we lock the old and newpage and perform the
migration. In case of vitio-balloon, the new page will become
movable, the old page will no longer be movable.

However, after unlocking newpage, there is nothing stopping the newpage
from getting dequeued and freed by virtio-balloon. This
will result in the newpage
1. No longer having PageMovable()
2. Getting moved to the local list before finally freeing it (using
   page->lru)

Back in the migration thread in __unmap_and_move(), we would after
unlocking the newpage suddenly no longer have PageMovable(newpage) and
will therefore call putback_lru_page(newpage), modifying page->lru
although that list is still in use by virtio-balloon.

To summarize, we have a race between migrating the newpage and checking
for PageMovable(newpage). Instead of checking PageMovable(newpage), we
can simply rely on is_lru of the original page.

Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management"), which was backported up to 3.12.
Old compaction code used PageBalloon() via -_is_movable_balloon_page()
instead of PageMovable(), however with the same semantics.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: stable@vger.kernel.org # 3.12+
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/migrate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Jan. 28, 2019, 8:02 p.m. UTC | #1
On 28.01.19 17:04, David Hildenbrand wrote:
> While debugging some crashes related to virtio-balloon deflation that
> happened under the old balloon migration code, I stumbled over a race
> that still exists today.
> 
> What we experienced:
> 
> drivers/virtio/virtio_balloon.c:release_pages_balloon():
> - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
> - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
> 
> Turns out after having added the page to a local list when dequeuing,
> the page would suddenly be moved to an LRU list before we would free it
> via the local list, corrupting both lists. So a page we own and that is
> !LRU was moved to an LRU list.
> 
> In __unmap_and_move(), we lock the old and newpage and perform the
> migration. In case of vitio-balloon, the new page will become
> movable, the old page will no longer be movable.
> 
> However, after unlocking newpage, there is nothing stopping the newpage
> from getting dequeued and freed by virtio-balloon. This
> will result in the newpage
> 1. No longer having PageMovable()
> 2. Getting moved to the local list before finally freeing it (using
>    page->lru)
> 
> Back in the migration thread in __unmap_and_move(), we would after
> unlocking the newpage suddenly no longer have PageMovable(newpage) and
> will therefore call putback_lru_page(newpage), modifying page->lru
> although that list is still in use by virtio-balloon.
> 
> To summarize, we have a race between migrating the newpage and checking
> for PageMovable(newpage). Instead of checking PageMovable(newpage), we
> can simply rely on is_lru of the original page.
> 
> Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
> redesign ballooned pages management"), which was backported up to 3.12.
> Old compaction code used PageBalloon() via -_is_movable_balloon_page()
> instead of PageMovable(), however with the same semantics.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Vratislav Bendel <vbendel@redhat.com>
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: stable@vger.kernel.org # 3.12+
> Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Rafael Aquini <aquini@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/migrate.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4512afab46ac..31e002270b05 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1135,10 +1135,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	 * If migration is successful, decrease refcount of the newpage
>  	 * which will not free the page because new page owner increased
>  	 * refcounter. As well, if it is LRU page, add the page to LRU
> -	 * list in here.
> +	 * list in here. Don't rely on PageMovable(newpage), as that could
> +	 * already have changed after unlocking newpage (e.g.
> +	 * virtio-balloon deflation).
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
> -		if (unlikely(__PageMovable(newpage)))
> +		if (unlikely(!is_lru))
>  			put_page(newpage);
>  		else
>  			putback_lru_page(newpage);
> 

Vratislav just pointed out that this issue should not happen on upstream
as __PageMovable(newpage) will still return true even after
__ClearPageMovable(newpage). Only PageMovable(newpage) would actually
return false.

(not sure if I am happy about this, this is horribly confusing and
complicated)

I am not 100% sure yet, but I guess Vratislav is right. So it was
effectively fixed by

b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"),
which checks for __PageMovable(newpage) instead of
__is_movable_balloon_page(newpage).

Anybody wanting to fix stable kernels either has to backport something
proposed in this patch or b1123ea6d3b3.
Michal Hocko Jan. 28, 2019, 8:19 p.m. UTC | #2
On Mon 28-01-19 21:02:52, David Hildenbrand wrote:
> On 28.01.19 17:04, David Hildenbrand wrote:
> > While debugging some crashes related to virtio-balloon deflation that
> > happened under the old balloon migration code, I stumbled over a race
> > that still exists today.
> > 
> > What we experienced:
> > 
> > drivers/virtio/virtio_balloon.c:release_pages_balloon():
> > - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
> > - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
> > 
> > Turns out after having added the page to a local list when dequeuing,
> > the page would suddenly be moved to an LRU list before we would free it
> > via the local list, corrupting both lists. So a page we own and that is
> > !LRU was moved to an LRU list.
> > 
> > In __unmap_and_move(), we lock the old and newpage and perform the
> > migration. In case of vitio-balloon, the new page will become
> > movable, the old page will no longer be movable.
> > 
> > However, after unlocking newpage, there is nothing stopping the newpage
> > from getting dequeued and freed by virtio-balloon. This
> > will result in the newpage
> > 1. No longer having PageMovable()
> > 2. Getting moved to the local list before finally freeing it (using
> >    page->lru)
> > 
> > Back in the migration thread in __unmap_and_move(), we would after
> > unlocking the newpage suddenly no longer have PageMovable(newpage) and
> > will therefore call putback_lru_page(newpage), modifying page->lru
> > although that list is still in use by virtio-balloon.
> > 
> > To summarize, we have a race between migrating the newpage and checking
> > for PageMovable(newpage). Instead of checking PageMovable(newpage), we
> > can simply rely on is_lru of the original page.
> > 
> > Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
> > redesign ballooned pages management"), which was backported up to 3.12.
> > Old compaction code used PageBalloon() via -_is_movable_balloon_page()
> > instead of PageMovable(), however with the same semantics.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Vratislav Bendel <vbendel@redhat.com>
> > Cc: Rafael Aquini <aquini@redhat.com>
> > Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: stable@vger.kernel.org # 3.12+
> > Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
> > Reported-by: Vratislav Bendel <vbendel@redhat.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Acked-by: Rafael Aquini <aquini@redhat.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  mm/migrate.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 4512afab46ac..31e002270b05 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1135,10 +1135,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >  	 * If migration is successful, decrease refcount of the newpage
> >  	 * which will not free the page because new page owner increased
> >  	 * refcounter. As well, if it is LRU page, add the page to LRU
> > -	 * list in here.
> > +	 * list in here. Don't rely on PageMovable(newpage), as that could
> > +	 * already have changed after unlocking newpage (e.g.
> > +	 * virtio-balloon deflation).
> >  	 */
> >  	if (rc == MIGRATEPAGE_SUCCESS) {
> > -		if (unlikely(__PageMovable(newpage)))
> > +		if (unlikely(!is_lru))
> >  			put_page(newpage);
> >  		else
> >  			putback_lru_page(newpage);
> > 
> 
> Vratislav just pointed out that this issue should not happen on upstream
> as __PageMovable(newpage) will still return true even after
> __ClearPageMovable(newpage). Only PageMovable(newpage) would actually
> return false.
> 
> (not sure if I am happy about this, this is horribly confusing and
> complicated)

It is confusing as hell! __ClearPageMovable is a misnomer and I have to
admit I have misread the implementation to actually ~PAGE_MAPPING_MOVABLE.

> I am not 100% sure yet, but I guess Vratislav is right. So it was
> effectively fixed by
> 
> b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"),
> which checks for __PageMovable(newpage) instead of
> __is_movable_balloon_page(newpage).

So this is not just a clean up. Sigh!

> Anybody wanting to fix stable kernels either has to backport something
> proposed in this patch or b1123ea6d3b3.

I think we should go with a simple patch for stable so this patch sounds
like a good thing. *PageMovable thingy needs a much better documentation
and ideally a cleaner implementation as well. The current state is just
incomprehensible. 

David, could you reformulate the changelog accordingly please? My ack
still holds.
David Hildenbrand Jan. 28, 2019, 9:09 p.m. UTC | #3
On 28.01.19 21:19, Michal Hocko wrote:
> On Mon 28-01-19 21:02:52, David Hildenbrand wrote:
>> On 28.01.19 17:04, David Hildenbrand wrote:
>>> While debugging some crashes related to virtio-balloon deflation that
>>> happened under the old balloon migration code, I stumbled over a race
>>> that still exists today.
>>>
>>> What we experienced:
>>>
>>> drivers/virtio/virtio_balloon.c:release_pages_balloon():
>>> - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
>>> - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
>>>
>>> Turns out after having added the page to a local list when dequeuing,
>>> the page would suddenly be moved to an LRU list before we would free it
>>> via the local list, corrupting both lists. So a page we own and that is
>>> !LRU was moved to an LRU list.
>>>
>>> In __unmap_and_move(), we lock the old and newpage and perform the
>>> migration. In case of vitio-balloon, the new page will become
>>> movable, the old page will no longer be movable.
>>>
>>> However, after unlocking newpage, there is nothing stopping the newpage
>>> from getting dequeued and freed by virtio-balloon. This
>>> will result in the newpage
>>> 1. No longer having PageMovable()
>>> 2. Getting moved to the local list before finally freeing it (using
>>>    page->lru)
>>>
>>> Back in the migration thread in __unmap_and_move(), we would after
>>> unlocking the newpage suddenly no longer have PageMovable(newpage) and
>>> will therefore call putback_lru_page(newpage), modifying page->lru
>>> although that list is still in use by virtio-balloon.
>>>
>>> To summarize, we have a race between migrating the newpage and checking
>>> for PageMovable(newpage). Instead of checking PageMovable(newpage), we
>>> can simply rely on is_lru of the original page.
>>>
>>> Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
>>> redesign ballooned pages management"), which was backported up to 3.12.
>>> Old compaction code used PageBalloon() via -_is_movable_balloon_page()
>>> instead of PageMovable(), however with the same semantics.
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Mel Gorman <mgorman@techsingularity.net>
>>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> Cc: Jan Kara <jack@suse.cz>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Vratislav Bendel <vbendel@redhat.com>
>>> Cc: Rafael Aquini <aquini@redhat.com>
>>> Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: stable@vger.kernel.org # 3.12+
>>> Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
>>> Reported-by: Vratislav Bendel <vbendel@redhat.com>
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>> Acked-by: Rafael Aquini <aquini@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  mm/migrate.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 4512afab46ac..31e002270b05 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1135,10 +1135,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>>  	 * If migration is successful, decrease refcount of the newpage
>>>  	 * which will not free the page because new page owner increased
>>>  	 * refcounter. As well, if it is LRU page, add the page to LRU
>>> -	 * list in here.
>>> +	 * list in here. Don't rely on PageMovable(newpage), as that could
>>> +	 * already have changed after unlocking newpage (e.g.
>>> +	 * virtio-balloon deflation).
>>>  	 */
>>>  	if (rc == MIGRATEPAGE_SUCCESS) {
>>> -		if (unlikely(__PageMovable(newpage)))
>>> +		if (unlikely(!is_lru))
>>>  			put_page(newpage);
>>>  		else
>>>  			putback_lru_page(newpage);
>>>
>>
>> Vratislav just pointed out that this issue should not happen on upstream
>> as __PageMovable(newpage) will still return true even after
>> __ClearPageMovable(newpage). Only PageMovable(newpage) would actually
>> return false.
>>
>> (not sure if I am happy about this, this is horribly confusing and
>> complicated)
> 
> It is confusing as hell! __ClearPageMovable is a misnomer and I have to
> admit I have misread the implementation to actually ~PAGE_MAPPING_MOVABLE.
> 
>> I am not 100% sure yet, but I guess Vratislav is right. So it was
>> effectively fixed by
>>
>> b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"),
>> which checks for __PageMovable(newpage) instead of
>> __is_movable_balloon_page(newpage).
> 
> So this is not just a clean up. Sigh!
> 
>> Anybody wanting to fix stable kernels either has to backport something
>> proposed in this patch or b1123ea6d3b3.
> 
> I think we should go with a simple patch for stable so this patch sounds
> like a good thing. *PageMovable thingy needs a much better documentation
> and ideally a cleaner implementation as well. The current state is just
> incomprehensible. 

Especially as __ClearPageMovable will not make __PageMovable fail, fun
with code :)

> 
> David, could you reformulate the changelog accordingly please? My ack
> still holds.

You mean reformulating + resending for stable kernels only?

b1123ea6d3b3 was merged with v4.8.
d6d86c0a7f8d was backported to v3.12+.

So v3.12 - v4.7 are affected and will free pages to the LRU list.

Without 195a8c43e93d ("virtio-balloon: deflate via a page list") -
merged with v4.13 - this BUG is not immediately visible I guess. Pages
are still added to the wrong list (LRU although they shouldn't) but the
virtio-balloon local list does not exist/corrupt. I assume adding these
pages to the LRU list is bad itself, right?

If a distro backported 195a8c43e93d without b1123ea6d3b3, the BUG
becomes directly visible (this is how we hit it).

Thanks!
Michal Hocko Jan. 29, 2019, 7:18 a.m. UTC | #4
On Mon 28-01-19 22:09:14, David Hildenbrand wrote:
> On 28.01.19 21:19, Michal Hocko wrote:
[...]
> > David, could you reformulate the changelog accordingly please? My ack
> > still holds.
> 
> You mean reformulating + resending for stable kernels only?

I would merge your patch even if it doesn't fix any real problem _now_.
If for not other reasons it makes the code less subtle because we no
longer depend on this crazy __PageMovable is special. If the movable
flag is supposed to be synchronized with the page lock then do not do
tricks and make code more robust because the next time somebody would
like to fix up the current semantic he might reintroduce the bug easily.
David Hildenbrand Jan. 29, 2019, 10:01 a.m. UTC | #5
On 29.01.19 08:18, Michal Hocko wrote:
> On Mon 28-01-19 22:09:14, David Hildenbrand wrote:
>> On 28.01.19 21:19, Michal Hocko wrote:
> [...]
>>> David, could you reformulate the changelog accordingly please? My ack
>>> still holds.
>>
>> You mean reformulating + resending for stable kernels only?
> 
> I would merge your patch even if it doesn't fix any real problem _now_.
> If for not other reasons it makes the code less subtle because we no
> longer depend on this crazy __PageMovable is special. If the movable
> flag is supposed to be synchronized with the page lock then do not do
> tricks and make code more robust because the next time somebody would
> like to fix up the current semantic he might reintroduce the bug easily.
> 

Sounds good to me, I'll rephrase and resend. I'll also have a look if I
can easily refactor the whole PageMovable logic or if this is more involved.

Thanks!
Sasha Levin Jan. 29, 2019, 11:16 p.m. UTC | #6
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: d6d86c0a7f8d mm/balloon_compaction: redesign ballooned pages management.

The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.

v4.20.5: Build OK!
v4.19.18: Build OK!
v4.14.96: Build OK!
v4.9.153: Build OK!
v4.4.172: Failed to apply! Possible dependencies:
    1031bc589228 ("lib/vsprintf: add %*pg format specifier")
    14e0a214d62d ("tools, perf: make gfp_compact_table up to date")
    1f7866b4aebd ("mm, tracing: make show_gfp_flags() up to date")
    420adbe9fc1a ("mm, tracing: unify mm flags handling in tracepoints and printk")
    53f9263baba6 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
    7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
    7d2eba0557c1 ("mm: add tracepoint for scanning pages")
    c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
    d435edca9288 ("mm, page_owner: copy page owner info during migration")
    d8c1bdeb5d6b ("page-flags: trivial cleanup for PageTrans* helpers")
    eca56ff906bd ("mm, shmem: add internal shmem resident memory accounting")
    edf14cdbf9a0 ("mm, printk: introduce new format string for flags")

v3.18.133: Failed to apply! Possible dependencies:
    2847cf95c68f ("mm/debug-pagealloc: cleanup page guard code")
    48c96a368579 ("mm/page_owner: keep track of page owners")
    7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
    94f759d62b2c ("mm/page_owner.c: remove unnecessary stack_trace field")
    c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
    d435edca9288 ("mm, page_owner: copy page owner info during migration")
    e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner")
    e30825f1869a ("mm/debug-pagealloc: prepare boottime configurable on/off")
    eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")


How should we proceed with this patch?

--
Thanks,
Sasha
David Hildenbrand Jan. 29, 2019, 11:33 p.m. UTC | #7
On 30.01.19 00:16, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: d6d86c0a7f8d mm/balloon_compaction: redesign ballooned pages management.
> 
> The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.
> 
> v4.20.5: Build OK!
> v4.19.18: Build OK!
> v4.14.96: Build OK!
> v4.9.153: Build OK!
> v4.4.172: Failed to apply! Possible dependencies:
>     1031bc589228 ("lib/vsprintf: add %*pg format specifier")
>     14e0a214d62d ("tools, perf: make gfp_compact_table up to date")
>     1f7866b4aebd ("mm, tracing: make show_gfp_flags() up to date")
>     420adbe9fc1a ("mm, tracing: unify mm flags handling in tracepoints and printk")
>     53f9263baba6 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
>     7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
>     7d2eba0557c1 ("mm: add tracepoint for scanning pages")
>     c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
>     d435edca9288 ("mm, page_owner: copy page owner info during migration")
>     d8c1bdeb5d6b ("page-flags: trivial cleanup for PageTrans* helpers")
>     eca56ff906bd ("mm, shmem: add internal shmem resident memory accounting")
>     edf14cdbf9a0 ("mm, printk: introduce new format string for flags")
> 
> v3.18.133: Failed to apply! Possible dependencies:
>     2847cf95c68f ("mm/debug-pagealloc: cleanup page guard code")
>     48c96a368579 ("mm/page_owner: keep track of page owners")
>     7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
>     94f759d62b2c ("mm/page_owner.c: remove unnecessary stack_trace field")
>     c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
>     d435edca9288 ("mm, page_owner: copy page owner info during migration")
>     e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner")
>     e30825f1869a ("mm/debug-pagealloc: prepare boottime configurable on/off")
>     eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")
> 
> 
> How should we proceed with this patch?

I just sent a v2 and will send separate backports for 4.4 and 3.18 once
we agreed on the new patch. Thanks!

> 
> --
> Thanks,
> Sasha
>
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 4512afab46ac..31e002270b05 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1135,10 +1135,12 @@  static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * If migration is successful, decrease refcount of the newpage
 	 * which will not free the page because new page owner increased
 	 * refcounter. As well, if it is LRU page, add the page to LRU
-	 * list in here.
+	 * list in here. Don't rely on PageMovable(newpage), as that could
+	 * already have changed after unlocking newpage (e.g.
+	 * virtio-balloon deflation).
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__PageMovable(newpage)))
+		if (unlikely(!is_lru))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);