diff mbox series

x86/PV: make post-migration page state consistent

Message ID f7ed53c1-768c-cc71-a432-553b56f7f0a7@suse.com (mailing list archive)
State Superseded
Headers show
Series x86/PV: make post-migration page state consistent | expand

Commit Message

Jan Beulich Sept. 11, 2020, 10:34 a.m. UTC
When a page table page gets de-validated, its type reference count drops
to zero (and PGT_validated gets cleared), but its type remains intact.
XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
such pages. An intermediate write to such a page via e.g.
MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
return. In libxc the decision which pages to normalize / localize
depends solely on the type returned from the domctl. As a result without
further precautions the guest won't be able to tell whether such a page
has had its (apparent) PTE entries transitioned to the new MFNs.

Add a check of PGT_validated, thus consistently avoiding normalization /
localization in the tool stack.

Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the
change at hand, also change the variable's initializer to use this
constant, too. Take the opportunity and also adjust its type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Comments

Andrew Cooper Sept. 11, 2020, 11:55 a.m. UTC | #1
On 11/09/2020 11:34, Jan Beulich wrote:
> When a page table page gets de-validated, its type reference count drops
> to zero (and PGT_validated gets cleared), but its type remains intact.
> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
> such pages. An intermediate write to such a page via e.g.
> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
> return. In libxc the decision which pages to normalize / localize
> depends solely on the type returned from the domctl. As a result without
> further precautions the guest won't be able to tell whether such a page
> has had its (apparent) PTE entries transitioned to the new MFNs.

I'm afraid I don't follow what the problem is.

Yes - unvalidated pages probably ought to be consistently NOTAB, so this
is probably a good change, but I don't see how it impacts the migration
logic.

We already have to cope with a page really changing types in parallel
with the normalise/localise logic (that was a "fun" one to debug), which
is why errors in that logic are specifically not fatal while the guest
is live - the frame gets re-marked as dirty, and deferred until the next
round.

Errors encountered after the VM has been paused are fatal.

However, at no point, even with an unvalidated pagetable type, can the
contents of the page be anything other than legal PTEs.  (I think)

~Andrew
Jan Beulich Sept. 11, 2020, 12:37 p.m. UTC | #2
On 11.09.2020 13:55, Andrew Cooper wrote:
> On 11/09/2020 11:34, Jan Beulich wrote:
>> When a page table page gets de-validated, its type reference count drops
>> to zero (and PGT_validated gets cleared), but its type remains intact.
>> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
>> such pages. An intermediate write to such a page via e.g.
>> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
>> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
>> return. In libxc the decision which pages to normalize / localize
>> depends solely on the type returned from the domctl. As a result without
>> further precautions the guest won't be able to tell whether such a page
>> has had its (apparent) PTE entries transitioned to the new MFNs.
> 
> I'm afraid I don't follow what the problem is.
> 
> Yes - unvalidated pages probably ought to be consistently NOTAB, so this
> is probably a good change, but I don't see how it impacts the migration
> logic.

It's not the migration logic itself that's impacted, but the state
of guest pages after migration. I'm afraid I can only try to expand
on the original description.

Case 1: Once an Ln page has been unvalidated, due to the described
behavior the migration code in libxc will normalize and then localize
it. Therefore the guest could go and directly try to use it as a
page table again. This should work as long as all of the entries in
the page can still be successfully validated (i.e. unless the guest
itself has made changes to the state of other pages).

Case 2: Once an Ln page has been unvalidated, the guest for whatever
reason still writes to it through e.g. MMU_NORMAL_PT_UPDATE. Prior
to migration, and provided the new entry can be validated (and no
other reference page has changed state), the page can still be
converted to a proper page table one again. If, however, migration
occurs inbetween, the page now won't get normalized and then
localized. The MFNs in it are unlikely to make sense anymore, and
hence an attempt to make the page a page table again is likely to
fail (or if it doesn't fail the result is unlikely to be what's
intended).

Since there's no way to make case 2 "work", the only choice is to
make case 1 behave like case 2, in order for the behavior to be
predictable / consistent.

> We already have to cope with a page really changing types in parallel
> with the normalise/localise logic (that was a "fun" one to debug), which
> is why errors in that logic are specifically not fatal while the guest
> is live - the frame gets re-marked as dirty, and deferred until the next
> round.
> 
> Errors encountered after the VM has been paused are fatal.
> 
> However, at no point, even with an unvalidated pagetable type, can the
> contents of the page be anything other than legal PTEs.  (I think)

Correct, because in order to write to the page one has to either
make it a page table one again (and then write through hypercall
or for L1 through PTWR) or the mmu-normal-pt-update would first
convert the page to a writable one.

Jan
Jan Beulich Sept. 28, 2020, 12:16 p.m. UTC | #3
On 11.09.2020 14:37, Jan Beulich wrote:
> On 11.09.2020 13:55, Andrew Cooper wrote:
>> On 11/09/2020 11:34, Jan Beulich wrote:
>>> When a page table page gets de-validated, its type reference count drops
>>> to zero (and PGT_validated gets cleared), but its type remains intact.
>>> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
>>> such pages. An intermediate write to such a page via e.g.
>>> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
>>> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
>>> return. In libxc the decision which pages to normalize / localize
>>> depends solely on the type returned from the domctl. As a result without
>>> further precautions the guest won't be able to tell whether such a page
>>> has had its (apparent) PTE entries transitioned to the new MFNs.
>>
>> I'm afraid I don't follow what the problem is.
>>
>> Yes - unvalidated pages probably ought to be consistently NOTAB, so this
>> is probably a good change, but I don't see how it impacts the migration
>> logic.
> 
> It's not the migration logic itself that's impacted, but the state
> of guest pages after migration. I'm afraid I can only try to expand
> on the original description.
> 
> Case 1: Once an Ln page has been unvalidated, due to the described
> behavior the migration code in libxc will normalize and then localize
> it. Therefore the guest could go and directly try to use it as a
> page table again. This should work as long as all of the entries in
> the page can still be successfully validated (i.e. unless the guest
> itself has made changes to the state of other pages).
> 
> Case 2: Once an Ln page has been unvalidated, the guest for whatever
> reason still writes to it through e.g. MMU_NORMAL_PT_UPDATE. Prior
> to migration, and provided the new entry can be validated (and no
> other reference page has changed state), the page can still be
> converted to a proper page table one again. If, however, migration
> occurs inbetween, the page now won't get normalized and then
> localized. The MFNs in it are unlikely to make sense anymore, and
> hence an attempt to make the page a page table again is likely to
> fail (or if it doesn't fail the result is unlikely to be what's
> intended).
> 
> Since there's no way to make case 2 "work", the only choice is to
> make case 1 behave like case 2, in order for the behavior to be
> predictable / consistent.
> 
>> We already have to cope with a page really changing types in parallel
>> with the normalise/localise logic (that was a "fun" one to debug), which
>> is why errors in that logic are specifically not fatal while the guest
>> is live - the frame gets re-marked as dirty, and deferred until the next
>> round.
>>
>> Errors encountered after the VM has been paused are fatal.
>>
>> However, at no point, even with an unvalidated pagetable type, can the
>> contents of the page be anything other than legal PTEs.  (I think)
> 
> Correct, because in order to write to the page one has to either
> make it a page table one again (and then write through hypercall
> or for L1 through PTWR) or the mmu-normal-pt-update would first
> convert the page to a writable one.

Besides wanting to ping this change / discussion, I'd also like
to correct myself on this last part of the reply: The above
applies to pages after having got de-validated. However, prior
to validation pages have their type changed early (type ref count
remains zero), but at this point it can't be told yet whether the
page consists of all legal PTEs.

Jan
Jan Beulich Oct. 16, 2020, 10:21 a.m. UTC | #4
On 11.09.2020 12:34, Jan Beulich wrote:
> When a page table page gets de-validated, its type reference count drops
> to zero (and PGT_validated gets cleared), but its type remains intact.
> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
> such pages. An intermediate write to such a page via e.g.
> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
> return. In libxc the decision which pages to normalize / localize
> depends solely on the type returned from the domctl. As a result without
> further precautions the guest won't be able to tell whether such a page
> has had its (apparent) PTE entries transitioned to the new MFNs.
> 
> Add a check of PGT_validated, thus consistently avoiding normalization /
> localization in the tool stack.
> 
> Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the
> change at hand, also change the variable's initializer to use this
> constant, too. Take the opportunity and also adjust its type.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I think I did address all questions here.

Jan

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -215,7 +215,8 @@ long arch_do_domctl(
>  
>          for ( i = 0; i < num; ++i )
>          {
> -            unsigned long gfn = 0, type = 0;
> +            unsigned long gfn = 0;
> +            unsigned int type = XEN_DOMCTL_PFINFO_NOTAB;
>              struct page_info *page;
>              p2m_type_t t;
>  
> @@ -255,6 +256,8 @@ long arch_do_domctl(
>  
>                  if ( page->u.inuse.type_info & PGT_pinned )
>                      type |= XEN_DOMCTL_PFINFO_LPINTAB;
> +                else if ( !(page->u.inuse.type_info & PGT_validated) )
> +                    type = XEN_DOMCTL_PFINFO_NOTAB;
>  
>                  if ( page->count_info & PGC_broken )
>                      type = XEN_DOMCTL_PFINFO_BROKEN;
>
Jan Beulich Oct. 29, 2020, 1:53 p.m. UTC | #5
On 11.09.2020 12:34, Jan Beulich wrote:
> When a page table page gets de-validated, its type reference count drops
> to zero (and PGT_validated gets cleared), but its type remains intact.
> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
> such pages. An intermediate write to such a page via e.g.
> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
> return. In libxc the decision which pages to normalize / localize
> depends solely on the type returned from the domctl. As a result without
> further precautions the guest won't be able to tell whether such a page
> has had its (apparent) PTE entries transitioned to the new MFNs.
> 
> Add a check of PGT_validated, thus consistently avoiding normalization /
> localization in the tool stack.
> 
> Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the
> change at hand, also change the variable's initializer to use this
> constant, too. Take the opportunity and also adjust its type.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I think I did address all questions here.

Jan

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -215,7 +215,8 @@ long arch_do_domctl(
>  
>          for ( i = 0; i < num; ++i )
>          {
> -            unsigned long gfn = 0, type = 0;
> +            unsigned long gfn = 0;
> +            unsigned int type = XEN_DOMCTL_PFINFO_NOTAB;
>              struct page_info *page;
>              p2m_type_t t;
>  
> @@ -255,6 +256,8 @@ long arch_do_domctl(
>  
>                  if ( page->u.inuse.type_info & PGT_pinned )
>                      type |= XEN_DOMCTL_PFINFO_LPINTAB;
> +                else if ( !(page->u.inuse.type_info & PGT_validated) )
> +                    type = XEN_DOMCTL_PFINFO_NOTAB;
>  
>                  if ( page->count_info & PGC_broken )
>                      type = XEN_DOMCTL_PFINFO_BROKEN;
>
diff mbox series

Patch

--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -215,7 +215,8 @@  long arch_do_domctl(
 
         for ( i = 0; i < num; ++i )
         {
-            unsigned long gfn = 0, type = 0;
+            unsigned long gfn = 0;
+            unsigned int type = XEN_DOMCTL_PFINFO_NOTAB;
             struct page_info *page;
             p2m_type_t t;
 
@@ -255,6 +256,8 @@  long arch_do_domctl(
 
                 if ( page->u.inuse.type_info & PGT_pinned )
                     type |= XEN_DOMCTL_PFINFO_LPINTAB;
+                else if ( !(page->u.inuse.type_info & PGT_validated) )
+                    type = XEN_DOMCTL_PFINFO_NOTAB;
 
                 if ( page->count_info & PGC_broken )
                     type = XEN_DOMCTL_PFINFO_BROKEN;