
[v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

Message ID 20191127120046.1246-1-pdurrant@amazon.com (mailing list archive)
State Superseded
Series [v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

Commit Message

Paul Durrant Nov. 27, 2019, noon UTC
From: Julien Grall <jgrall@amazon.com>

A guest will setup a shared page with the hypervisor for each vCPU via
XENPMU_init. The page will then get mapped in the hypervisor and only
released when XENPMU_finish is called.
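
For reference, the mapping side of this lifecycle looks roughly like the
sketch below. This is an illustrative, simplified rendering of the
XENPMU_init path (hence the "_sketch" suffix); locking and some parameter
validation are omitted.

static int pvpmu_init_sketch(struct domain *d, xen_pmu_params_t *params)
{
    struct vcpu *v = d->vcpu[params->vcpu];
    struct vpmu_struct *vpmu = vcpu_vpmu(v);
    struct page_info *page;

    /* Take a writable type reference on the guest-provided frame. */
    page = get_page_from_gfn(d, params->val >> PAGE_SHIFT, NULL, P2M_ALLOC);
    if ( !page )
        return -EINVAL;

    if ( !get_page_type(page, PGT_writable_page) )
    {
        put_page(page);
        return -EINVAL;
    }

    /* Map the page globally in Xen. The mapping and the reference are only
     * dropped again by XENPMU_finish (or, with this patch, by domain
     * destruction). */
    vpmu->xenpmu_data = __map_domain_page_global(page);
    if ( !vpmu->xenpmu_data )
    {
        put_page_and_type(page);
        return -ENOMEM;
    }

    return 0;
}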

This means that if the guest fails to invoke XENPMU_finish, e.g. if it is
destroyed rather than cleanly shut down, the page will stay mapped in the
hypervisor. One of the consequences is the domain can never be fully
destroyed as a page reference is still held.

As Xen should never rely on the guest to correctly clean-up any
allocation in the hypervisor, we should also unmap such pages during the
domain destruction if there are any left.

We can re-use the same logic as in pvpmu_finish(). To avoid
duplication, move the logic into a new function that can also be called
from vpmu_destroy().

NOTE: The call to vpmu_destroy() must also be moved from
      arch_vcpu_destroy() into domain_relinquish_resources() such that the
      reference on the mapped page does not prevent domain_destroy() (which
      calls arch_vcpu_destroy()) from being called.
      Also, whilst it appears that vpmu_arch_destroy() is idempotent it is
      by no means obvious. Hence move manipulation of the
      VPMU_CONTEXT_ALLOCATED flag out of implementation specific code and
      make sure it is cleared at the end of vpmu_arch_destroy().
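
To illustrate the idempotency point, a minimal sketch of the shape
vpmu_arch_destroy() ends up with (not the literal function body; it assumes
the existing early return when VPMU_CONTEXT_ALLOCATED is not set):

static void vpmu_arch_destroy(struct vcpu *v)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(v);

    /* Nothing to do if the context was never allocated, or if an earlier
     * call already tore it down. */
    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
        return;

    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);

    /* Clearing the flag at the end makes a repeat call a no-op, e.g. if
     * both pvpmu_finish() and domain destruction clean up the same vCPU. */
    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
}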

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2:
 - Re-word commit comment slightly
 - Re-enforce idempotency of vpmu_arch_destroy()
 - Move invocation of vpmu_destroy() earlier in
   domain_relinquish_resources()
---
 xen/arch/x86/cpu/vpmu.c       | 49 +++++++++++++++++++++--------------
 xen/arch/x86/cpu/vpmu_amd.c   |  1 -
 xen/arch/x86/cpu/vpmu_intel.c |  2 --
 xen/arch/x86/domain.c         | 10 ++++---
 4 files changed, 35 insertions(+), 27 deletions(-)

Comments

Jan Beulich Nov. 27, 2019, 3:44 p.m. UTC | #1
On 27.11.2019 13:00, Paul Durrant wrote:
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -479,6 +479,8 @@ static int vpmu_arch_initialise(struct vcpu *v)
>  
>      if ( ret )
>          printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
> +    else
> +        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
>  
>      return ret;
>  }
> @@ -576,11 +578,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
>  
>           vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>      }
> +
> +    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
>  }

Boris,

I'd like to ask that you comment on this part of the change at
least, as I seem to vaguely recall that things were intentionally
not done this way originally.

Paul,

everything else looks good to me now.

Jan
Boris Ostrovsky Nov. 27, 2019, 4:32 p.m. UTC | #2
On 11/27/19 10:44 AM, Jan Beulich wrote:
> On 27.11.2019 13:00, Paul Durrant wrote:
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -479,6 +479,8 @@ static int vpmu_arch_initialise(struct vcpu *v)
>>  
>>      if ( ret )
>>          printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
>> +    else
>> +        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);

That won't work I think.

On Intel the context is allocated lazily for HVM/PVH guests during the
first MSR access. For example:

core2_vpmu_do_wrmsr() ->
    core2_vpmu_msr_common_check():
        if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
             !core2_vpmu_alloc_resource(current) )
                return 0;

For PV guests the context *is* allocated from vmx_vpmu_initialise().

I don't remember why only PV does eager allocation but I think doing it
for all guests would make code much simpler and then this patch will be
correct.
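
For illustration, eager allocation for all guest types might look roughly
like the hypothetical sketch below (not a tested change; it assumes
core2_vpmu_alloc_resource() is safe to call for HVM/PVH vCPUs at
initialisation time, which is exactly what would need checking):

static int vmx_vpmu_initialise_sketch(struct vcpu *v)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(v);

    /* Allocate eagerly for PV, PVH and HVM alike, instead of deferring
     * HVM/PVH allocation to the first MSR access. */
    if ( !core2_vpmu_alloc_resource(v) )
        return -EIO;

    /* With allocation no longer lazy, VPMU_CONTEXT_ALLOCATED could then
     * be set unconditionally by the caller, as this patch does. */
    vpmu->arch_vpmu_ops = &core2_vpmu_ops;

    return 0;
}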

-boris


>>  
>>      return ret;
>>  }
>> @@ -576,11 +578,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
>>  
>>           vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>>      }
>> +
>> +    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
>>  }
> Boris,
>
> I'd like to ask that you comment on this part of the change at
> least, as I seem to vaguely recall that things were intentionally
> not done this way originally.
>
> Paul,
>
> everything else looks good to me now.
>
> Jan
Paul Durrant Nov. 27, 2019, 4:46 p.m. UTC | #3
> -----Original Message-----
> From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Sent: 27 November 2019 16:32
> To: Jan Beulich <jbeulich@suse.com>; Durrant, Paul <pdurrant@amazon.com>
> Cc: Grall, Julien <jgrall@amazon.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Jun
> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Wei
> Liu <wl@xen.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 11/27/19 10:44 AM, Jan Beulich wrote:
> > On 27.11.2019 13:00, Paul Durrant wrote:
> >> --- a/xen/arch/x86/cpu/vpmu.c
> >> +++ b/xen/arch/x86/cpu/vpmu.c
> >> @@ -479,6 +479,8 @@ static int vpmu_arch_initialise(struct vcpu *v)
> >>
> >>      if ( ret )
> >>          printk(XENLOG_G_WARNING "VPMU: Initialization failed for
> %pv\n", v);
> >> +    else
> >> +        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> 
> That won't work I think.
> 
> On Intel the context is allocated lazily for HVM/PVH guests during the
> first MSR access. For example:
> 
> core2_vpmu_do_wrmsr() ->
>     core2_vpmu_msr_common_check():
>         if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
>              !core2_vpmu_alloc_resource(current) )
>                 return 0;
> 
> For PV guests the context *is* allocated from vmx_vpmu_initialise().
> 
> I don't remember why only PV does eager allocation but I think doing it
> for all guests would make code much simpler and then this patch will be
> correct.
> 

Ok. Simpler if I leave setting the flag in the implementation code. I think clearing it in vpmu_arch_destroy() would still be correct in all cases.

  Paul

> -boris
> 
> 
> >>
> >>      return ret;
> >>  }
> >> @@ -576,11 +578,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >>
> >>           vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> >>      }
> >> +
> >> +    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> >>  }
> > Boris,
> >
> > I'd like to ask that you comment on this part of the change at
> > least, as I seem to vaguely recall that things were intentionally
> > not done this way originally.
> >
> > Paul,
> >
> > everything else looks good to me now.
> >
> > Jan
Julien Grall Nov. 27, 2019, 7:41 p.m. UTC | #4
Hi Paul,

On 27/11/2019 12:00, Paul Durrant wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> A guest will setup a shared page with the hypervisor for each vCPU via
> XENPMU_init. The page will then get mapped in the hypervisor and only
> released when XENPMU_finish is called.
> 
> This means that if the guest fails to invoke XENPMU_finish, e.g. if it is
> destroyed rather than cleanly shut down, the page will stay mapped in the
> hypervisor. One of the consequences is the domain can never be fully
> destroyed as a page reference is still held.
> 
> As Xen should never rely on the guest to correctly clean-up any
> allocation in the hypervisor, we should also unmap such pages during the
> domain destruction if there are any left.
> 
> We can re-use the same logic as in pvpmu_finish(). To avoid
> duplication, move the logic into a new function that can also be called
> from vpmu_destroy().
> 
> NOTE: The call to vpmu_destroy() must also be moved from
>        arch_vcpu_destroy() into domain_relinquish_resources() such that the
>        reference on the mapped page does not prevent domain_destroy() (which
>        calls arch_vcpu_destroy()) from being called.
>        Also, whilst it appears that vpmu_arch_destroy() is idempotent it is
>        by no means obvious. Hence move manipulation of the
>        VPMU_CONTEXT_ALLOCATED flag out of implementation specific code and
>        make sure it is cleared at the end of vpmu_arch_destroy().

If you resend the patch, it might be worth adding a line about the lack 
of XSA. Something like:

There is no associated XSA because vPMU is not security supported (see 
XSA-163).

Cheers,
Paul Durrant Nov. 28, 2019, 9:13 a.m. UTC | #5
> -----Original Message-----
> From: Julien Grall <jgrall@amazon.com>
> Sent: 27 November 2019 19:42
> To: Durrant, Paul <pdurrant@amazon.com>; xen-devel@lists.xenproject.org
> Cc: Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: Re: [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> Hi Paul,
> 
> On 27/11/2019 12:00, Paul Durrant wrote:
> > From: Julien Grall <jgrall@amazon.com>
> >
> > A guest will setup a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest fails to invoke XENPMU_finish, e.g. if it is
> > destroyed rather than cleanly shut down, the page will stay mapped in
> the
> > hypervisor. One of the consequences is the domain can never be fully
> > destroyed as a page reference is still held.
> >
> > As Xen should never rely on the guest to correctly clean-up any
> > allocation in the hypervisor, we should also unmap such pages during the
> > domain destruction if there are any left.
> >
> > We can re-use the same logic as in pvpmu_finish(). To avoid
> > duplication, move the logic into a new function that can also be called
> > from vpmu_destroy().
> >
> > NOTE: The call to vpmu_destroy() must also be moved from
> >        arch_vcpu_destroy() into domain_relinquish_resources() such that
> the
> >        reference on the mapped page does not prevent domain_destroy()
> (which
> >        calls arch_vcpu_destroy()) from being called.
> >        Also, whilst it appears that vpmu_arch_destroy() is idempotent it
> is
> >        by no means obvious. Hence move manipulation of the
> >        VPMU_CONTEXT_ALLOCATED flag out of implementation specific code
> and
> >        make sure it is cleared at the end of vpmu_arch_destroy().
> 
> If you resend the patch, it might be worth adding a line about the lack
> of XSA. Something like:
> 
> There is no associated XSA because vPMU is not security supported (see
> XSA-163).

Sure, I'll add another note.

  Paul

> 
> Cheers,

Patch

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index f397183ec3..08742a5e22 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -479,6 +479,8 @@  static int vpmu_arch_initialise(struct vcpu *v)
 
     if ( ret )
         printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
+    else
+        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
     return ret;
 }
@@ -576,11 +578,36 @@  static void vpmu_arch_destroy(struct vcpu *v)
 
          vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
     }
+
+    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
 }
 
-void vpmu_destroy(struct vcpu *v)
+static void vpmu_cleanup(struct vcpu *v)
 {
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    mfn_t mfn;
+    void *xenpmu_data;
+
+    spin_lock(&vpmu->vpmu_lock);
+
     vpmu_arch_destroy(v);
+    xenpmu_data = vpmu->xenpmu_data;
+    vpmu->xenpmu_data = NULL;
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    if ( xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(xenpmu_data);
+        ASSERT(mfn_valid(mfn));
+        unmap_domain_page_global(xenpmu_data);
+        put_page_and_type(mfn_to_page(mfn));
+    }
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+    vpmu_cleanup(v);
 
     put_vpmu(v);
 }
@@ -639,9 +666,6 @@  static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
-    struct vpmu_struct *vpmu;
-    mfn_t mfn;
-    void *xenpmu_data;
 
     if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) )
         return;
@@ -650,22 +674,7 @@  static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
     if ( v != current )
         vcpu_pause(v);
 
-    vpmu = vcpu_vpmu(v);
-    spin_lock(&vpmu->vpmu_lock);
-
-    vpmu_arch_destroy(v);
-    xenpmu_data = vpmu->xenpmu_data;
-    vpmu->xenpmu_data = NULL;
-
-    spin_unlock(&vpmu->vpmu_lock);
-
-    if ( xenpmu_data )
-    {
-        mfn = domain_page_map_to_mfn(xenpmu_data);
-        ASSERT(mfn_valid(mfn));
-        unmap_domain_page_global(xenpmu_data);
-        put_page_and_type(mfn_to_page(mfn));
-    }
+    vpmu_cleanup(v);
 
     if ( v != current )
         vcpu_unpause(v);
diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
index 3c6799b42c..8ca26f1e3a 100644
--- a/xen/arch/x86/cpu/vpmu_amd.c
+++ b/xen/arch/x86/cpu/vpmu_amd.c
@@ -534,7 +534,6 @@  int svm_vpmu_initialise(struct vcpu *v)
 
     vpmu->arch_vpmu_ops = &amd_vpmu_ops;
 
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
 }
 
diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index 6e27f6ec8e..a92d882597 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -483,8 +483,6 @@  static int core2_vpmu_alloc_resource(struct vcpu *v)
         memcpy(&vpmu->xenpmu_data->pmu.c.intel, core2_vpmu_cxt, regs_off);
     }
 
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-
     return 1;
 
 out_err:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f1dd86e12e..f5c0c378ef 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -454,9 +454,6 @@  void arch_vcpu_destroy(struct vcpu *v)
     xfree(v->arch.msrs);
     v->arch.msrs = NULL;
 
-    if ( !is_idle_domain(v->domain) )
-        vpmu_destroy(v);
-
     if ( is_hvm_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
@@ -2136,12 +2133,17 @@  int domain_relinquish_resources(struct domain *d)
 
     PROGRESS(vcpu_pagetables):
 
-        /* Drop the in-use references to page-table bases. */
+        /*
+         * Drop the in-use references to page-table bases and clean
+         * up vPMU instances.
+         */
         for_each_vcpu ( d, v )
         {
             ret = vcpu_destroy_pagetables(v);
             if ( ret )
                 return ret;
+
+            vpmu_destroy(v);
         }
 
         if ( altp2m_active(d) )