
xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

Message ID 20191126171715.10881-1-pdurrant@amazon.com (mailing list archive)
State Superseded
Series xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

Commit Message

Paul Durrant Nov. 26, 2019, 5:17 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

A guest will set up a shared page with the hypervisor for each vCPU via
XENPMU_init. The page will then get mapped in the hypervisor and only
released when XENPMU_finish is called.

This means that if the guest is not shut down gracefully (such as via xl
destroy), the page will stay mapped in the hypervisor. One consequence is
that the domain can never be fully destroyed, as some of its memory is
still mapped.
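
For illustration, the guest side of this lifecycle is roughly as follows.
This is a sketch only: the struct layout follows Xen's public pmu.h, but
the hypercall wrapper (xenpmu_op() below) and the page allocation are
guest-OS specific and merely assumed here.

/*
 * Guest-side sketch of the XENPMU_init/XENPMU_finish lifecycle described
 * above.  xenpmu_op() is a placeholder for whatever wrapper the guest OS
 * uses to issue the xenpmu_op hypercall.
 */
#include <public/pmu.h>       /* struct xen_pmu_params, XENPMU_* op numbers */

static int pmu_attach(unsigned int vcpu, unsigned long frame)
{
    struct xen_pmu_params xp = {
        .version.maj = XENPMU_VER_MAJ,
        .version.min = XENPMU_VER_MIN,
        .val = frame,             /* frame of the per-vCPU page to share */
        .vcpu = vcpu,
    };

    /* Xen maps the page globally; it stays mapped until XENPMU_finish
     * (or, with this patch, until the domain is destroyed). */
    return xenpmu_op(XENPMU_init, &xp);
}

static void pmu_detach(unsigned int vcpu)
{
    struct xen_pmu_params xp = { .vcpu = vcpu };

    /* Xen unmaps the page and drops the reference it took. */
    xenpmu_op(XENPMU_finish, &xp);
}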

As Xen should never rely on the guest to correctly clean up any
allocation in the hypervisor, we should also unmap any remaining pages
during domain destruction.

We can reuse the same logic as in pvpmu_finish(). To avoid duplication,
move that logic into a new function that can also be called from
vpmu_destroy().

NOTE: The call to vpmu_destroy() must also be moved from
      arch_vcpu_destroy() into domain_relinquish_resources() such that the
      mapped page does not prevent domain_destroy() (which calls
      arch_vcpu_destroy()) from being called.
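
In outline, the resulting cleanup path is as below (condensed from the
patch at the end of this mail; the ASSERT and the vCPU pausing done by
pvpmu_finish() are omitted):

/* Condensed from the diff below: the unmap logic moves into a common
 * helper, reachable both from the XENPMU_finish path (pvpmu_finish())
 * and from domain destruction. */
static void vpmu_cleanup(struct vcpu *v)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(v);
    void *xenpmu_data;

    spin_lock(&vpmu->vpmu_lock);
    vpmu_arch_destroy(v);
    xenpmu_data = vpmu->xenpmu_data;     /* detach the shared page ...      */
    vpmu->xenpmu_data = NULL;            /* ... so later calls find nothing */
    spin_unlock(&vpmu->vpmu_lock);

    if ( xenpmu_data )
    {
        mfn_t mfn = domain_page_map_to_mfn(xenpmu_data);

        unmap_domain_page_global(xenpmu_data);
        put_page_and_type(mfn_to_page(mfn));
    }
}

void vpmu_destroy(struct vcpu *v)        /* now called per vCPU from      */
{                                        /* domain_relinquish_resources() */
    vpmu_cleanup(v);
    put_vpmu(v);
}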

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/vpmu.c | 45 +++++++++++++++++++++++------------------
 xen/arch/x86/domain.c   |  6 +++---
 2 files changed, 28 insertions(+), 23 deletions(-)

Comments

Jan Beulich Nov. 27, 2019, 9:44 a.m. UTC | #1
On 26.11.2019 18:17, Paul Durrant wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> A guest will set up a shared page with the hypervisor for each vCPU via
> XENPMU_init. The page will then get mapped in the hypervisor and only
> released when XENPMU_finish is called.
> 
> This means that if the guest is not shut down gracefully (such as via xl
> destroy), the page will stay mapped in the hypervisor.

Isn't this still too weak a description? It's not the tool stack
invoking XENPMU_finish, but the guest itself afaics. I.e. a
misbehaving guest could prevent proper cleanup even with graceful
shutdown.

> @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
>      if ( is_hvm_domain(d) )
>          hvm_domain_relinquish_resources(d);
>  
> +    for_each_vcpu ( d, v )
> +        vpmu_destroy(v);
> +
>      return 0;
>  }

I think simple things which may allow shrinking the page lists
should be done early in the function. As vpmu_destroy() looks
to be idempotent, how about leveraging the very first
for_each_vcpu() loop in the function (there are too many of them
there anyway, at least for my taste)?

Jan
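
What Jan suggests amounts to something like the following sketch. The body
of the existing first loop is shown only approximately, and the function's
staged structure is elided:

/* Sketch of the suggestion: piggy-back on the first for_each_vcpu() loop
 * already present in domain_relinquish_resources() rather than adding a
 * new loop at the end.  Existing stages and loop body are approximate. */
int domain_relinquish_resources(struct domain *d)
{
    struct vcpu *v;
    int ret;

    /* ... earlier relinquish stages ... */

    for_each_vcpu ( d, v )
    {
        ret = vcpu_destroy_pagetables(v);   /* existing work in this loop */
        if ( ret )
            return ret;

        vpmu_destroy(v);                    /* proposed addition */
    }

    /* ... remaining stages unchanged ... */

    return 0;
}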
Paul Durrant Nov. 27, 2019, 10:57 a.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 27 November 2019 09:44
> To: Durrant, Paul <pdurrant@amazon.com>; Grall, Julien <jgrall@amazon.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>
> Subject: Re: [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 26.11.2019 18:17, Paul Durrant wrote:
> > From: Julien Grall <jgrall@amazon.com>
> >
> > A guest will set up a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest is not shut down gracefully (such as via xl
> > destroy), the page will stay mapped in the hypervisor.
> 
> Isn't this still too weak a description? It's not the tool stack
> invoking XENPMU_finish, but the guest itself afaics. I.e. a
> misbehaving guest could prevent proper cleanup even with graceful
> shutdown.
> 

Ok, how about 'if the guest fails to invoke XENPMU_finish, e.g. if it is destroyed, rather than cleanly shut down'?

> > @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
> >      if ( is_hvm_domain(d) )
> >          hvm_domain_relinquish_resources(d);
> >
> > +    for_each_vcpu ( d, v )
> > +        vpmu_destroy(v);
> > +
> >      return 0;
> >  }
> 
> I think simple things which may allow shrinking the page lists
> should be done early in the function. As vpmu_destroy() looks
> to be idempotent, how about leveraging the very first
> for_each_vcpu() loop in the function (there are too many of them
> there anyway, at least for my taste)?
> 

Ok. I did wonder where in the sequence was best... Leaving it to the end obviously puts it closer to where it was previously called, but I can't see any harm in moving it earlier.

  Paul

> Jan
Jan Beulich Nov. 27, 2019, 11:01 a.m. UTC | #3
On 27.11.2019 11:57, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 27 November 2019 09:44
>> To: Durrant, Paul <pdurrant@amazon.com>; Grall, Julien <jgrall@amazon.com>
>> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
>> Liu <wl@xen.org>
>> Subject: Re: [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the
>> domain is destroyed
>>
>> On 26.11.2019 18:17, Paul Durrant wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> A guest will set up a shared page with the hypervisor for each vCPU via
>>> XENPMU_init. The page will then get mapped in the hypervisor and only
>>> released when XENPMU_finish is called.
>>>
>>> This means that if the guest is not shut down gracefully (such as via xl
>>> destroy), the page will stay mapped in the hypervisor.
>>
>> Isn't this still too weak a description? It's not the tool stack
>> invoking XENPMU_finish, but the guest itself afaics. I.e. a
>> misbehaving guest could prevent proper cleanup even with graceful
>> shutdown.
>>
> 
> Ok, how about 'if the guest fails to invoke XENPMU_finish, e.g. if
> it is destroyed, rather than cleanly shut down'?

Sounds good.

Jan
Julien Grall Nov. 27, 2019, 11:14 a.m. UTC | #4
Hi,

On 27/11/2019 09:44, Jan Beulich wrote:
> On 26.11.2019 18:17, Paul Durrant wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> A guest will set up a shared page with the hypervisor for each vCPU via
>> XENPMU_init. The page will then get mapped in the hypervisor and only
>> released when XENPMU_finish is called.
>>
>> This means that if the guest is not shut down gracefully (such as via xl
>> destroy), the page will stay mapped in the hypervisor.
> 
> Isn't this still too weak a description? It's not the tool stack
> invoking XENPMU_finish, but the guest itself afaics. I.e. a
> misbehaving guest could prevent proper cleanup even with graceful
> shutdown.
> 
>> @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
>>       if ( is_hvm_domain(d) )
>>           hvm_domain_relinquish_resources(d);
>>   
>> +    for_each_vcpu ( d, v )
>> +        vpmu_destroy(v);
>> +
>>       return 0;
>>   }
> 
> I think simple things which may allow shrinking the page lists
> should be done early in the function. As vpmu_destroy() looks
> to be idempotent, how about leveraging the very first
> for_each_vcpu() loop in the function (there are too many of them
> there anyway, at least for my taste)?

It is not entirely obvious that vpmu_destroy() is idempotent.

For instance, I can't find out who clears VPMU_CONTEXT_ALLOCATED,
so I think vpmu_arch_destroy() would be executed over and over.

I don't know whether this is an issue, but I can't figure out that it is
not one. Did I miss anything?

Cheers,
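
For context, the guard in question sits at the top of vpmu_arch_destroy();
a simplified, approximate sketch (see xen/arch/x86/cpu/vpmu.c for the real
code):

/* Approximate shape of vpmu_arch_destroy(); the early return is the guard
 * Julien refers to.  If nothing clears VPMU_CONTEXT_ALLOCATED, a repeat
 * call runs the body again. */
static void vpmu_arch_destroy(struct vcpu *v)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(v);

    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
        return;    /* no PMU context was ever allocated for this vCPU */

    /*
     * Body (elided): flush this vCPU's state off the pCPU it last ran on
     * (vpmu_clear_last, via IPI) and call the vendor hook
     * (amd_vpmu_destroy() / core2_vpmu_destroy()) to free the context.
     */
}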
Paul Durrant Nov. 27, 2019, 11:26 a.m. UTC | #5
> -----Original Message-----
> From: Julien Grall <jgrall@amazon.com>
> Sent: 27 November 2019 11:15
> To: Jan Beulich <jbeulich@suse.com>; Durrant, Paul <pdurrant@amazon.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>
> Subject: Re: [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> Hi,
> 
> On 27/11/2019 09:44, Jan Beulich wrote:
> > On 26.11.2019 18:17, Paul Durrant wrote:
> >> From: Julien Grall <jgrall@amazon.com>
> >>
> >> A guest will set up a shared page with the hypervisor for each vCPU via
> >> XENPMU_init. The page will then get mapped in the hypervisor and only
> >> released when XENPMU_finish is called.
> >>
> >> This means that if the guest is not shut down gracefully (such as via xl
> >> destroy), the page will stay mapped in the hypervisor.
> >
> > Isn't this still too weak a description? It's not the tool stack
> > invoking XENPMU_finish, but the guest itself afaics. I.e. a
> > misbehaving guest could prevent proper cleanup even with graceful
> > shutdown.
> >
> >> @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
> >>       if ( is_hvm_domain(d) )
> >>           hvm_domain_relinquish_resources(d);
> >>
> >> +    for_each_vcpu ( d, v )
> >> +        vpmu_destroy(v);
> >> +
> >>       return 0;
> >>   }
> >
> > I think simple things which may allow shrinking the page lists
> > should be done early in the function. As vpmu_destroy() looks
> > to be idempotent, how about leveraging the very first
> > for_each_vcpu() loop in the function (there are too many of them
> > there anyway, at least for my taste)?
> 
> It is not entirely obvious that vpmu_destroy() is idempotent.
> 
> For instance, I can't find out who clears VPMU_CONTEXT_ALLOCATED,
> so I think vpmu_arch_destroy() would be executed over and over.
> 
> I don't know whether this is an issue, but I can't figure out that it is
> not one. Did I miss anything?

It's sufficiently non-obvious that it is a legitimate concern whether a guest invoking XENPMU_finish multiple times can cause harm. I'll see if I can clean that up.

  Paul

> 
> Cheers,
Jan Beulich Nov. 27, 2019, 11:26 a.m. UTC | #6
On 27.11.2019 12:14, Julien Grall wrote:
> Hi,
> 
> On 27/11/2019 09:44, Jan Beulich wrote:
>> On 26.11.2019 18:17, Paul Durrant wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> A guest will set up a shared page with the hypervisor for each vCPU via
>>> XENPMU_init. The page will then get mapped in the hypervisor and only
>>> released when XENPMU_finish is called.
>>>
>>> This means that if the guest is not shut down gracefully (such as via xl
>>> destroy), the page will stay mapped in the hypervisor.
>>
>> Isn't this still too weak a description? It's not the tool stack
>> invoking XENPMU_finish, but the guest itself afaics. I.e. a
>> misbehaving guest could prevent proper cleanup even with graceful
>> shutdown.
>>
>>> @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
>>>       if ( is_hvm_domain(d) )
>>>           hvm_domain_relinquish_resources(d);
>>>   
>>> +    for_each_vcpu ( d, v )
>>> +        vpmu_destroy(v);
>>> +
>>>       return 0;
>>>   }
>>
>> I think simple things which may allow shrinking the page lists
>> should be done early in the function. As vpmu_destroy() looks
>> to be idempotent, how about leveraging the very first
>> for_each_vcpu() loop in the function (there are too many of them
>> there anyway, at least for my taste)?
> 
> It is not entirely obvious that vpmu_destroy() is idempotent.
> 
> For instance, I can't find out who clears VPMU_CONTEXT_ALLOCATED,
> so I think vpmu_arch_destroy() would be executed over and over.
> 
> I don't know whether this is an issue, but I can't figure out that it is
> not one. Did I miss anything?

If the function wasn't idempotent, then calling it unconditionally
from domain_relinquish_resources() would be wrong too. After all,
the guest may already have invoked XENPMU_finish.

As to VPMU_CONTEXT_ALLOCATED - I don't think this ever gets cleared
anywhere. But this by itself doesn't make the function non-idempotent.
The vpmu_clear_last invocation, for example, will happen just once.
And {amd,core2}_vpmu_destroy() look okay at first glance, too.

Jan
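
To make the idempotency argument concrete, here is a small stand-alone
illustration (deliberately not Xen code) of why the read-and-NULL-under-lock
pattern used by vpmu_cleanup() releases the page exactly once, however many
times it is called:

/* Stand-alone demo: take the pointer and NULL it under a lock, release it
 * outside the lock.  Only the first caller sees a non-NULL pointer, so the
 * resource is released exactly once. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void *shared_page;        /* plays the role of vpmu->xenpmu_data */
static int release_count;

static void cleanup(void)
{
    void *page;

    pthread_mutex_lock(&lock);
    page = shared_page;          /* claim ownership ...                 */
    shared_page = NULL;          /* ... and make later callers see NULL */
    pthread_mutex_unlock(&lock);

    if ( page )                  /* only the first caller gets here */
    {
        free(page);              /* stands in for unmap + put_page_and_type */
        release_count++;
    }
}

int main(void)
{
    shared_page = malloc(4096);
    cleanup();
    cleanup();                   /* second call is a harmless no-op */
    assert(release_count == 1);
    printf("page released exactly once\n");
    return 0;
}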

Patch

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index f397183ec3..9ae4ed48c8 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -578,9 +578,32 @@  static void vpmu_arch_destroy(struct vcpu *v)
     }
 }
 
-void vpmu_destroy(struct vcpu *v)
+static void vpmu_cleanup(struct vcpu *v)
 {
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    mfn_t mfn;
+    void *xenpmu_data;
+
+    spin_lock(&vpmu->vpmu_lock);
+
     vpmu_arch_destroy(v);
+    xenpmu_data = vpmu->xenpmu_data;
+    vpmu->xenpmu_data = NULL;
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    if ( xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(xenpmu_data);
+        ASSERT(mfn_valid(mfn));
+        unmap_domain_page_global(xenpmu_data);
+        put_page_and_type(mfn_to_page(mfn));
+    }
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+    vpmu_cleanup(v);
 
     put_vpmu(v);
 }
@@ -639,9 +662,6 @@  static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
-    struct vpmu_struct *vpmu;
-    mfn_t mfn;
-    void *xenpmu_data;
 
     if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) )
         return;
@@ -650,22 +670,7 @@  static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
     if ( v != current )
         vcpu_pause(v);
 
-    vpmu = vcpu_vpmu(v);
-    spin_lock(&vpmu->vpmu_lock);
-
-    vpmu_arch_destroy(v);
-    xenpmu_data = vpmu->xenpmu_data;
-    vpmu->xenpmu_data = NULL;
-
-    spin_unlock(&vpmu->vpmu_lock);
-
-    if ( xenpmu_data )
-    {
-        mfn = domain_page_map_to_mfn(xenpmu_data);
-        ASSERT(mfn_valid(mfn));
-        unmap_domain_page_global(xenpmu_data);
-        put_page_and_type(mfn_to_page(mfn));
-    }
+    vpmu_cleanup(v);
 
     if ( v != current )
         vcpu_unpause(v);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f1dd86e12e..1d75b2e6c3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -454,9 +454,6 @@  void arch_vcpu_destroy(struct vcpu *v)
     xfree(v->arch.msrs);
     v->arch.msrs = NULL;
 
-    if ( !is_idle_domain(v->domain) )
-        vpmu_destroy(v);
-
     if ( is_hvm_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
@@ -2224,6 +2221,9 @@  int domain_relinquish_resources(struct domain *d)
     if ( is_hvm_domain(d) )
         hvm_domain_relinquish_resources(d);
 
+    for_each_vcpu ( d, v )
+        vpmu_destroy(v);
+
     return 0;
 }