postcopy migration hangs while loading virtio state

Message ID 55708cff-cdda-9b1d-2106-1c6d2774f890@de.ibm.com (mailing list archive)
State New, archived

Commit Message

Christian Borntraeger July 4, 2017, 7:48 a.m. UTC
On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>>>
>>>>>> Hmm, I have a theory: if the flags field has bit 1 set, i.e. RAM_SAVE_FLAG_COMPRESS,
>>>>>> then try changing ram_handle_compressed to always do the memset.
>>>>>
>>>>> FWIW, changing ram_handle_compressed to always memset makes the problem go away.
>>>>
>>>> It is still running fine now with the "always memset change"
>>>
>>> Did we ever nail down a fix for this; as I remember Andrea said
>>> we shouldn't need to do that memset, but we came to the conclusion
>>> it was something specific to how s390 protection keys worked.
>>>
>>> Dave
>>
>> No we didn't. Let's merge that for now, with a comment that
>> we don't really understand what's going on?
> 
> Hmm no, I don't really want to change the !s390 behaviour, especially
> since it causes allocation that we otherwise avoid and Andrea's
> reply to the original post pointed out it's not necessary.


Since storage keys are per physical page, we must not have shared pages.
Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
in other words we already allocate pages on the keyless->keyed switch.
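For reference, the existing keyless->keyed switch looks roughly like this
(a sketch, abridged from the arch/s390/mm/gmap.c of that time; locking and
error paths trimmed):

int s390_enable_skey(void)
{
	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	int rc = 0;

	down_write(&mm->mmap_sem);
	if (mm_use_skey(mm))
		goto out_up;

	mm->context.use_skey = 1;
	/* Break KSM sharing: mark every VMA unmergeable, which replaces
	 * merged pages with private copies (the break_ksm mentioned above). */
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
				MADV_UNMERGEABLE, &vma->vm_flags)) {
			rc = -ENOMEM;
			goto out_up;
		}
	}

	/* Then walk the page tables: __s390_enable_skey drops the
	 * zero-page mappings and clears the storage keys. */
	walk.mm = mm;
	walk_page_range(0, TASK_SIZE, &walk);

out_up:
	up_write(&mm->mmap_sem);
	return rc;
}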

So why not do the same for zero pages - instead of invalidating the page
table entry, let's just do a proper COW.

Something like this maybe (Andrea, Martin, any better way to do that?)

Comments

Martin Schwidefsky July 4, 2017, 8:16 a.m. UTC | #1
On Tue, 4 Jul 2017 09:48:11 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> [...]
> 
> 
> Since storage keys are per physical page, we must not have shared pages.
> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
> in other words we already allocate pages on the keyless->keyed switch.
> 
> So why not do the same for zero pages - instead of invalidating the page
> table entry, let's just do a proper COW.
> 
> Something like this maybe (Andrea, Martin, any better way to do that?)
> 
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 4fb3d3c..11475c7 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
>  static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>                               unsigned long next, struct mm_walk *walk)
>  {
> +       struct page *page;
>         /*
> -        * Remove all zero page mappings,
> +        * Remove all zero page mappings with a COW
>          * after establishing a policy to forbid zero page mappings
>          * following faults for that page will get fresh anonymous pages
>          */
> -       if (is_zero_pfn(pte_pfn(*pte)))
> -               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
> +       if (is_zero_pfn(pte_pfn(*pte))) {
> +               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
> +                       put_page(page);
> +               else
> +                       return -ENOMEM;
> +       }
>         /* Clear storage key */
>         ptep_zap_key(walk->mm, addr, pte);
>         return 0;

I do not quite get the problem here. The zero-page mappings are always
marked as _PAGE_SPECIAL. These should be safe to replace with an empty
pte. We do mark all VMAs as unmergeable prior to the page table walk
that replaces all zero-page mappings. What is the get_user_pages() call
supposed to do?
Christian Borntraeger July 4, 2017, 8:27 a.m. UTC | #2
On 07/04/2017 10:16 AM, Martin Schwidefsky wrote:
> On Tue, 4 Jul 2017 09:48:11 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> [...]
> 
> I do not quite get the problem here. The zero-page mappings are always
> marked as _PAGE_SPECIAL. These should be safe to replace with an empty
> pte. We do mark all VMAs as unmergeable prior to the page table walk
> that replaces all zero-page mappings. What is the get_user_pages() call
> supposed to do?

The problem is that postcopy migration will do a read on the target system
for each zero page (to fault it in as a zero page).
The postcopy logic then relies on this page never delivering a
userfault again.

If we now enable keys on the target host, we break the zero pages with
an empty PTE (lazy faulting).
With an empty PTE we will get a userfault if this page is accessed
again on the target. Postcopy "knows" that this page was already transferred
and basically ignores this fault. Since postcopy will never resolve this
userfault, the guest hangs.
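
To illustrate the userfault side, a minimal stand-alone sketch (not QEMU
code; assumes a kernel with userfaultfd and permission to use it):

/*
 * Resolve a "missing" page the way postcopy resolves a zero page, via
 * UFFDIO_ZEROPAGE. Once the pte is populated, no further userfault is
 * delivered for that address. If the kernel later zaps the pte (as the
 * old __s390_enable_skey did), the next access faults again - and a
 * handler that considers the page done will never resolve it, so the
 * faulting thread hangs.
 */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0) {
		perror("userfaultfd");
		return 1;
	}

	struct uffdio_api api = { .api = UFFD_API };
	if (ioctl(uffd, UFFDIO_API, &api))
		return 1;

	char *area = mmap(NULL, psize, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Deliver a userfault for any access to a not-yet-populated page. */
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)area, .len = psize },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
		return 1;

	/* What postcopy does when the source says "this page is zero". */
	struct uffdio_zeropage zp = {
		.range = { .start = (unsigned long)area, .len = psize },
	};
	if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp))
		return 1;

	/* No userfault here: the zero-page pte is in place. It must stay
	 * in place, because nobody will resolve a second fault. */
	printf("read: %d\n", area[0]);
	return 0;
}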

Patch

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 4fb3d3c..11475c7 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2149,13 +2149,18 @@  EXPORT_SYMBOL_GPL(s390_enable_sie);
 static int __s390_enable_skey(pte_t *pte, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
 {
+       struct page *page;
        /*
-        * Remove all zero page mappings,
+        * Remove all zero page mappings with a COW
         * after establishing a policy to forbid zero page mappings
         * following faults for that page will get fresh anonymous pages
         */
-       if (is_zero_pfn(pte_pfn(*pte)))
-               ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
+       if (is_zero_pfn(pte_pfn(*pte))) {
+               if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
+                       put_page(page);
+               else
+                       return -ENOMEM;
+       }
        /* Clear storage key */
        ptep_zap_key(walk->mm, addr, pte);
        return 0;