diff mbox series

[RFC] mm: migrate: don't rely on PageMovable() of newpage after unlocking it

Message ID 20190128121609.9528-1-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm: migrate: don't rely on PageMovable() of newpage after unlocking it | expand

Commit Message

David Hildenbrand Jan. 28, 2019, 12:16 p.m. UTC
While debugging some crashes related to virtio-balloon deflation that
happened under the old balloon migration code, I stumbled over a possible
race I think.

What we experienced:

drivers/virtio/virtio_balloon.c:release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Turns out after having added the page to a local list when dequeuing,
the page would suddenly be moved to an LRU list before we would free it
via the local list, corrupting both lists. So a page we own and that is
!LRU was moved to an LRU list.

For us, this was triggered by backporting 195a8c43e93d8 ("virtio-balloon:
deflate via a page list") onto old balloon compaction code, but I think
this only made the BUG become visible and the race still exists in
new !LRU migraton code.

My theory:

In __unmap_and_move(), we lock the old and newpage and perform the
migration. In case of vitio-balloon, the new page will become
movable, the old page will no longer be movable.

However, after unlocking newpage, I think there is nothing stopping
the newpage from getting dequeued and freed by virtio-balloon. This
will result in the newpage
1. No longer having PageMovable()
2. Getting moved to the local list before finally freeing it (using
   page->lru)

Back in the migration thread in __unmap_and_move(), we would after
unlocking the newpage suddenly no longer have PageMovable(newpage) and
will therefore call putback_lru_page(newpage), modifying page->lru
although that list is still in use by virtio-balloon.

To summarize, we have a race between migrating the newpage and checking
for PageMovable(newpage). Instead of checking __PageMovable(newpage), we
can simply rely on is_lru. Because if the old page was not an LRU page, the
new page also shouldn't be. If there is a problem with that, we could check
for __PageMovable(newpage) before unlocking the new page.

As I am not yet sure if this race actually exists, especially also
upstream, I am sending this as RFC.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/migrate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Michal Hocko Jan. 28, 2019, 1:07 p.m. UTC | #1
On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
[...]
> My theory:
> 
> In __unmap_and_move(), we lock the old and newpage and perform the
> migration. In case of vitio-balloon, the new page will become
> movable, the old page will no longer be movable.
> 
> However, after unlocking newpage, I think there is nothing stopping
> the newpage from getting dequeued and freed by virtio-balloon. This
> will result in the newpage
> 1. No longer having PageMovable()
> 2. Getting moved to the local list before finally freeing it (using
>    page->lru)

Does that mean that the virtio-balloon can change the Movable state
while there are other users of the page? Can you point to the code that
does it? How come this can be safe at all? Or is the PageMovable stable
only under the page lock?
David Hildenbrand Jan. 28, 2019, 1:14 p.m. UTC | #2
On 28.01.19 14:07, Michal Hocko wrote:
> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> [...]
>> My theory:
>>
>> In __unmap_and_move(), we lock the old and newpage and perform the
>> migration. In case of vitio-balloon, the new page will become
>> movable, the old page will no longer be movable.
>>
>> However, after unlocking newpage, I think there is nothing stopping
>> the newpage from getting dequeued and freed by virtio-balloon. This
>> will result in the newpage
>> 1. No longer having PageMovable()
>> 2. Getting moved to the local list before finally freeing it (using
>>    page->lru)
> 
> Does that mean that the virtio-balloon can change the Movable state
> while there are other users of the page? Can you point to the code that
> does it? How come this can be safe at all? Or is the PageMovable stable
> only under the page lock?
> 

PageMovable is stable under the lock. The relevant instructions are in

mm/balloon_compaction.c and include/linux/balloon_compaction.h

balloon_page_insert() and balloon_page_delete() modify PageMovable and
are only called with both, the page locked and b_dev_info->pages_lock
locked.

Especially:
1. balloon_page_dequeue() which tries to find an unlocked page that is
not isolated. and deletes it via balloon_page_delete().

2. balloon_page_migrate() will effectively call balloon_page_delete() on
old page and balloon_page_insert() newpage. Both pages are locked before
calling balloon_page_migrate() and the b_dev_info->pages_lock is taken
internally.


So what's left is a very small race window when migrating to a new page,
after dropping the lock of the newpage.
Michal Hocko Jan. 28, 2019, 1:21 p.m. UTC | #3
On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
> On 28.01.19 14:07, Michal Hocko wrote:
> > On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> > [...]
> >> My theory:
> >>
> >> In __unmap_and_move(), we lock the old and newpage and perform the
> >> migration. In case of vitio-balloon, the new page will become
> >> movable, the old page will no longer be movable.
> >>
> >> However, after unlocking newpage, I think there is nothing stopping
> >> the newpage from getting dequeued and freed by virtio-balloon. This
> >> will result in the newpage
> >> 1. No longer having PageMovable()
> >> 2. Getting moved to the local list before finally freeing it (using
> >>    page->lru)
> > 
> > Does that mean that the virtio-balloon can change the Movable state
> > while there are other users of the page? Can you point to the code that
> > does it? How come this can be safe at all? Or is the PageMovable stable
> > only under the page lock?
> > 
> 
> PageMovable is stable under the lock. The relevant instructions are in
> 
> mm/balloon_compaction.c and include/linux/balloon_compaction.h

OK, I have just checked __ClearPageMovable and it indeed requires
PageLock. Then we also have to move is_lru = __PageMovable(page) after
the page lock.
David Hildenbrand Jan. 28, 2019, 1:22 p.m. UTC | #4
On 28.01.19 14:21, Michal Hocko wrote:
> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
>> On 28.01.19 14:07, Michal Hocko wrote:
>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
>>> [...]
>>>> My theory:
>>>>
>>>> In __unmap_and_move(), we lock the old and newpage and perform the
>>>> migration. In case of vitio-balloon, the new page will become
>>>> movable, the old page will no longer be movable.
>>>>
>>>> However, after unlocking newpage, I think there is nothing stopping
>>>> the newpage from getting dequeued and freed by virtio-balloon. This
>>>> will result in the newpage
>>>> 1. No longer having PageMovable()
>>>> 2. Getting moved to the local list before finally freeing it (using
>>>>    page->lru)
>>>
>>> Does that mean that the virtio-balloon can change the Movable state
>>> while there are other users of the page? Can you point to the code that
>>> does it? How come this can be safe at all? Or is the PageMovable stable
>>> only under the page lock?
>>>
>>
>> PageMovable is stable under the lock. The relevant instructions are in
>>
>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
> 
> OK, I have just checked __ClearPageMovable and it indeed requires
> PageLock. Then we also have to move is_lru = __PageMovable(page) after
> the page lock.
> 

I assume that is fine as is as the page is isolated? (yes, it will be
modified later when moving but we are interested in the original state)
Michal Hocko Jan. 28, 2019, 1:35 p.m. UTC | #5
On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
> On 28.01.19 14:21, Michal Hocko wrote:
> > On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
> >> On 28.01.19 14:07, Michal Hocko wrote:
> >>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> >>> [...]
> >>>> My theory:
> >>>>
> >>>> In __unmap_and_move(), we lock the old and newpage and perform the
> >>>> migration. In case of vitio-balloon, the new page will become
> >>>> movable, the old page will no longer be movable.
> >>>>
> >>>> However, after unlocking newpage, I think there is nothing stopping
> >>>> the newpage from getting dequeued and freed by virtio-balloon. This
> >>>> will result in the newpage
> >>>> 1. No longer having PageMovable()
> >>>> 2. Getting moved to the local list before finally freeing it (using
> >>>>    page->lru)
> >>>
> >>> Does that mean that the virtio-balloon can change the Movable state
> >>> while there are other users of the page? Can you point to the code that
> >>> does it? How come this can be safe at all? Or is the PageMovable stable
> >>> only under the page lock?
> >>>
> >>
> >> PageMovable is stable under the lock. The relevant instructions are in
> >>
> >> mm/balloon_compaction.c and include/linux/balloon_compaction.h
> > 
> > OK, I have just checked __ClearPageMovable and it indeed requires
> > PageLock. Then we also have to move is_lru = __PageMovable(page) after
> > the page lock.
> > 
> 
> I assume that is fine as is as the page is isolated? (yes, it will be
> modified later when moving but we are interested in the original state)

OK, I've missed that the page is indeed isolated. Then the patch makes
sense to me.
David Hildenbrand Jan. 28, 2019, 2:38 p.m. UTC | #6
On 28.01.19 14:35, Michal Hocko wrote:
> On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
>> On 28.01.19 14:21, Michal Hocko wrote:
>>> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
>>>> On 28.01.19 14:07, Michal Hocko wrote:
>>>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
>>>>> [...]
>>>>>> My theory:
>>>>>>
>>>>>> In __unmap_and_move(), we lock the old and newpage and perform the
>>>>>> migration. In case of vitio-balloon, the new page will become
>>>>>> movable, the old page will no longer be movable.
>>>>>>
>>>>>> However, after unlocking newpage, I think there is nothing stopping
>>>>>> the newpage from getting dequeued and freed by virtio-balloon. This
>>>>>> will result in the newpage
>>>>>> 1. No longer having PageMovable()
>>>>>> 2. Getting moved to the local list before finally freeing it (using
>>>>>>    page->lru)
>>>>>
>>>>> Does that mean that the virtio-balloon can change the Movable state
>>>>> while there are other users of the page? Can you point to the code that
>>>>> does it? How come this can be safe at all? Or is the PageMovable stable
>>>>> only under the page lock?
>>>>>
>>>>
>>>> PageMovable is stable under the lock. The relevant instructions are in
>>>>
>>>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
>>>
>>> OK, I have just checked __ClearPageMovable and it indeed requires
>>> PageLock. Then we also have to move is_lru = __PageMovable(page) after
>>> the page lock.
>>>
>>
>> I assume that is fine as is as the page is isolated? (yes, it will be
>> modified later when moving but we are interested in the original state)
> 
> OK, I've missed that the page is indeed isolated. Then the patch makes
> sense to me.
> 

Thanks Michal. I assume this has broken ever since balloon compaction
was introduced. I'll wait a little more and then resend as !RFC with a
cc-stable tag.
Michal Hocko Jan. 28, 2019, 2:53 p.m. UTC | #7
On Mon 28-01-19 15:38:38, David Hildenbrand wrote:
> On 28.01.19 14:35, Michal Hocko wrote:
> > On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
> >> On 28.01.19 14:21, Michal Hocko wrote:
> >>> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
> >>>> On 28.01.19 14:07, Michal Hocko wrote:
> >>>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> >>>>> [...]
> >>>>>> My theory:
> >>>>>>
> >>>>>> In __unmap_and_move(), we lock the old and newpage and perform the
> >>>>>> migration. In case of vitio-balloon, the new page will become
> >>>>>> movable, the old page will no longer be movable.
> >>>>>>
> >>>>>> However, after unlocking newpage, I think there is nothing stopping
> >>>>>> the newpage from getting dequeued and freed by virtio-balloon. This
> >>>>>> will result in the newpage
> >>>>>> 1. No longer having PageMovable()
> >>>>>> 2. Getting moved to the local list before finally freeing it (using
> >>>>>>    page->lru)
> >>>>>
> >>>>> Does that mean that the virtio-balloon can change the Movable state
> >>>>> while there are other users of the page? Can you point to the code that
> >>>>> does it? How come this can be safe at all? Or is the PageMovable stable
> >>>>> only under the page lock?
> >>>>>
> >>>>
> >>>> PageMovable is stable under the lock. The relevant instructions are in
> >>>>
> >>>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
> >>>
> >>> OK, I have just checked __ClearPageMovable and it indeed requires
> >>> PageLock. Then we also have to move is_lru = __PageMovable(page) after
> >>> the page lock.
> >>>
> >>
> >> I assume that is fine as is as the page is isolated? (yes, it will be
> >> modified later when moving but we are interested in the original state)
> > 
> > OK, I've missed that the page is indeed isolated. Then the patch makes
> > sense to me.
> > 
> 
> Thanks Michal. I assume this has broken ever since balloon compaction
> was introduced. I'll wait a little more and then resend as !RFC with a
> cc-stable tag.

Please make sure to CC Minchan when reposting.

Btw.
Acked-by: Michal Hocko <mhocko@suse.com>
Rafael Aquini Jan. 28, 2019, 3:01 p.m. UTC | #8
On Mon, Jan 28, 2019 at 03:38:38PM +0100, David Hildenbrand wrote:
> On 28.01.19 14:35, Michal Hocko wrote:
> > On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
> >> On 28.01.19 14:21, Michal Hocko wrote:
> >>> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
> >>>> On 28.01.19 14:07, Michal Hocko wrote:
> >>>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> >>>>> [...]
> >>>>>> My theory:
> >>>>>>
> >>>>>> In __unmap_and_move(), we lock the old and newpage and perform the
> >>>>>> migration. In case of vitio-balloon, the new page will become
> >>>>>> movable, the old page will no longer be movable.
> >>>>>>
> >>>>>> However, after unlocking newpage, I think there is nothing stopping
> >>>>>> the newpage from getting dequeued and freed by virtio-balloon. This
> >>>>>> will result in the newpage
> >>>>>> 1. No longer having PageMovable()
> >>>>>> 2. Getting moved to the local list before finally freeing it (using
> >>>>>>    page->lru)
> >>>>>
> >>>>> Does that mean that the virtio-balloon can change the Movable state
> >>>>> while there are other users of the page? Can you point to the code that
> >>>>> does it? How come this can be safe at all? Or is the PageMovable stable
> >>>>> only under the page lock?
> >>>>>
> >>>>
> >>>> PageMovable is stable under the lock. The relevant instructions are in
> >>>>
> >>>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
> >>>
> >>> OK, I have just checked __ClearPageMovable and it indeed requires
> >>> PageLock. Then we also have to move is_lru = __PageMovable(page) after
> >>> the page lock.
> >>>
> >>
> >> I assume that is fine as is as the page is isolated? (yes, it will be
> >> modified later when moving but we are interested in the original state)
> > 
> > OK, I've missed that the page is indeed isolated. Then the patch makes
> > sense to me.
> > 
> 
> Thanks Michal. I assume this has broken ever since balloon compaction
> was introduced. I'll wait a little more and then resend as !RFC with a
> cc-stable tag.
>

Yes, balloon deflation could always race against migration
This race was a problem, initially, and was dealt with, via:

commit 117aad1e9e4d97448d1df3f84b08bd65811e6d6a
Author: Rafael Aquini <aquini@redhat.com>
Date:   Mon Sep 30 13:45:16 2013 -0700

    mm: avoid reinserting isolated balloon pages into LRU lists

 

I think this upstream patch has re-introduced it, in a more subtle way,
as we're stumbling on it now, again:

commit d6d86c0a7f8ddc5b38cf089222cb1d9540762dc2
Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Date:   Thu Oct 9 15:29:27 2014 -0700

    mm/balloon_compaction: redesign ballooned pages management
    


On this particular race against migration case, virtio ballon deflation would 
not see it before

commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4
Author: Minchan Kim <minchan@kernel.org>
Date:   Tue Jul 26 15:23:09 2016 -0700

    mm: balloon: use general non-lru movable page feature

as the recently released balloon page would be post-processed 
without the page->lru list handling, which for migration stability
purposes must be done under the protection of page_lock.


get rid of balloon reference count.


-- Rafael
Rafael Aquini Jan. 28, 2019, 3:04 p.m. UTC | #9
On Mon, Jan 28, 2019 at 10:01:56AM -0500, Rafael Aquini wrote:
> On Mon, Jan 28, 2019 at 03:38:38PM +0100, David Hildenbrand wrote:
> > On 28.01.19 14:35, Michal Hocko wrote:
> > > On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
> > >> On 28.01.19 14:21, Michal Hocko wrote:
> > >>> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
> > >>>> On 28.01.19 14:07, Michal Hocko wrote:
> > >>>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
> > >>>>> [...]
> > >>>>>> My theory:
> > >>>>>>
> > >>>>>> In __unmap_and_move(), we lock the old and newpage and perform the
> > >>>>>> migration. In case of vitio-balloon, the new page will become
> > >>>>>> movable, the old page will no longer be movable.
> > >>>>>>
> > >>>>>> However, after unlocking newpage, I think there is nothing stopping
> > >>>>>> the newpage from getting dequeued and freed by virtio-balloon. This
> > >>>>>> will result in the newpage
> > >>>>>> 1. No longer having PageMovable()
> > >>>>>> 2. Getting moved to the local list before finally freeing it (using
> > >>>>>>    page->lru)
> > >>>>>
> > >>>>> Does that mean that the virtio-balloon can change the Movable state
> > >>>>> while there are other users of the page? Can you point to the code that
> > >>>>> does it? How come this can be safe at all? Or is the PageMovable stable
> > >>>>> only under the page lock?
> > >>>>>
> > >>>>
> > >>>> PageMovable is stable under the lock. The relevant instructions are in
> > >>>>
> > >>>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
> > >>>
> > >>> OK, I have just checked __ClearPageMovable and it indeed requires
> > >>> PageLock. Then we also have to move is_lru = __PageMovable(page) after
> > >>> the page lock.
> > >>>
> > >>
> > >> I assume that is fine as is as the page is isolated? (yes, it will be
> > >> modified later when moving but we are interested in the original state)
> > > 
> > > OK, I've missed that the page is indeed isolated. Then the patch makes
> > > sense to me.
> > > 
> > 
> > Thanks Michal. I assume this has broken ever since balloon compaction
> > was introduced. I'll wait a little more and then resend as !RFC with a
> > cc-stable tag.
> >
> 
> Yes, balloon deflation could always race against migration
> This race was a problem, initially, and was dealt with, via:
> 
> commit 117aad1e9e4d97448d1df3f84b08bd65811e6d6a
> Author: Rafael Aquini <aquini@redhat.com>
> Date:   Mon Sep 30 13:45:16 2013 -0700
> 
>     mm: avoid reinserting isolated balloon pages into LRU lists
> 
>  
> 
> I think this upstream patch has re-introduced it, in a more subtle way,
> as we're stumbling on it now, again:
> 
> commit d6d86c0a7f8ddc5b38cf089222cb1d9540762dc2
> Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> Date:   Thu Oct 9 15:29:27 2014 -0700
> 
>     mm/balloon_compaction: redesign ballooned pages management
>     
> 
> 
> On this particular race against migration case, virtio ballon deflation would 
> not see it before
> 
> commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Tue Jul 26 15:23:09 2016 -0700
> 
>     mm: balloon: use general non-lru movable page feature
> 
> as the recently released balloon page would be post-processed 
> without the page->lru list handling, which for migration stability
> purposes must be done under the protection of page_lock.
> 
> 

missing part here:

I think your patch adresses this new case.


Acked-by: Rafael Aquini <aquini@redhat.com>


> get rid of balloon reference count.

^^ this was a left over (sorry about my fat-fingers)
> 
> 
> -- Rafael
David Hildenbrand Jan. 28, 2019, 3:13 p.m. UTC | #10
On 28.01.19 16:04, Rafael Aquini wrote:
> On Mon, Jan 28, 2019 at 10:01:56AM -0500, Rafael Aquini wrote:
>> On Mon, Jan 28, 2019 at 03:38:38PM +0100, David Hildenbrand wrote:
>>> On 28.01.19 14:35, Michal Hocko wrote:
>>>> On Mon 28-01-19 14:22:52, David Hildenbrand wrote:
>>>>> On 28.01.19 14:21, Michal Hocko wrote:
>>>>>> On Mon 28-01-19 14:14:28, David Hildenbrand wrote:
>>>>>>> On 28.01.19 14:07, Michal Hocko wrote:
>>>>>>>> On Mon 28-01-19 13:16:09, David Hildenbrand wrote:
>>>>>>>> [...]
>>>>>>>>> My theory:
>>>>>>>>>
>>>>>>>>> In __unmap_and_move(), we lock the old and newpage and perform the
>>>>>>>>> migration. In case of vitio-balloon, the new page will become
>>>>>>>>> movable, the old page will no longer be movable.
>>>>>>>>>
>>>>>>>>> However, after unlocking newpage, I think there is nothing stopping
>>>>>>>>> the newpage from getting dequeued and freed by virtio-balloon. This
>>>>>>>>> will result in the newpage
>>>>>>>>> 1. No longer having PageMovable()
>>>>>>>>> 2. Getting moved to the local list before finally freeing it (using
>>>>>>>>>    page->lru)
>>>>>>>>
>>>>>>>> Does that mean that the virtio-balloon can change the Movable state
>>>>>>>> while there are other users of the page? Can you point to the code that
>>>>>>>> does it? How come this can be safe at all? Or is the PageMovable stable
>>>>>>>> only under the page lock?
>>>>>>>>
>>>>>>>
>>>>>>> PageMovable is stable under the lock. The relevant instructions are in
>>>>>>>
>>>>>>> mm/balloon_compaction.c and include/linux/balloon_compaction.h
>>>>>>
>>>>>> OK, I have just checked __ClearPageMovable and it indeed requires
>>>>>> PageLock. Then we also have to move is_lru = __PageMovable(page) after
>>>>>> the page lock.
>>>>>>
>>>>>
>>>>> I assume that is fine as is as the page is isolated? (yes, it will be
>>>>> modified later when moving but we are interested in the original state)
>>>>
>>>> OK, I've missed that the page is indeed isolated. Then the patch makes
>>>> sense to me.
>>>>
>>>
>>> Thanks Michal. I assume this has broken ever since balloon compaction
>>> was introduced. I'll wait a little more and then resend as !RFC with a
>>> cc-stable tag.
>>>
>>
>> Yes, balloon deflation could always race against migration
>> This race was a problem, initially, and was dealt with, via:
>>
>> commit 117aad1e9e4d97448d1df3f84b08bd65811e6d6a
>> Author: Rafael Aquini <aquini@redhat.com>
>> Date:   Mon Sep 30 13:45:16 2013 -0700
>>
>>     mm: avoid reinserting isolated balloon pages into LRU lists
>>
>>  
>>
>> I think this upstream patch has re-introduced it, in a more subtle way,
>> as we're stumbling on it now, again:
>>
>> commit d6d86c0a7f8ddc5b38cf089222cb1d9540762dc2
>> Author: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
>> Date:   Thu Oct 9 15:29:27 2014 -0700
>>
>>     mm/balloon_compaction: redesign ballooned pages management
>>     
>>
>>
>> On this particular race against migration case, virtio ballon deflation would 
>> not see it before
>>
>> commit b1123ea6d3b3da25af5c8a9d843bd07ab63213f4
>> Author: Minchan Kim <minchan@kernel.org>
>> Date:   Tue Jul 26 15:23:09 2016 -0700
>>
>>     mm: balloon: use general non-lru movable page feature
>>
>> as the recently released balloon page would be post-processed 
>> without the page->lru list handling, which for migration stability
>> purposes must be done under the protection of page_lock.
>>
>>
> 
> missing part here:
> 
> I think your patch adresses this new case.
> 
> 
> Acked-by: Rafael Aquini <aquini@redhat.com>
> 
> 
>> get rid of balloon reference count.
> 
> ^^ this was a left over (sorry about my fat-fingers)

:D

Thanks! I'll resend with

Cc: stable@vger.kernel.org # 3.12+
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages
management")
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 4512afab46ac..31e002270b05 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1135,10 +1135,12 @@  static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * If migration is successful, decrease refcount of the newpage
 	 * which will not free the page because new page owner increased
 	 * refcounter. As well, if it is LRU page, add the page to LRU
-	 * list in here.
+	 * list in here. Don't rely on PageMovable(newpage), as that could
+	 * already have changed after unlocking newpage (e.g.
+	 * virtio-balloon deflation).
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__PageMovable(newpage)))
+		if (unlikely(!is_lru))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);