[RFC,03/10] domain: GADDR based shared guest area registration alternative - teardown

Message ID 214c9ec9-b948-1ca6-24d6-4e7f8852ac45@suse.com (mailing list archive)
State New, archived
Series runstate/time area registration by (guest) physical address

Commit Message

Jan Beulich Oct. 19, 2022, 7:40 a.m. UTC
In preparation for the introduction of new vCPU operations that allow
registering the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary domain cleanup hooks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
     necessary: Aiui unmap_vcpu_info() is called only because the vCPU
     info area cannot be re-registered. Beyond that I guess the
     assumption is that the areas would only be re-registered as they
     were before. If that's not the case I wonder whether the guest
     handles for both areas shouldn't also be zapped.
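
     For illustration, zapping the guest handles would presumably mirror
     what the soft-reset paths in this very patch already do; a
     hypothetical sketch, not part of the submitted patch:

         set_xen_guest_handle(runstate_guest(v), NULL);
         set_xen_guest_handle(v->arch.time_info_guest, NULL);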

Comments

Julien Grall Dec. 13, 2022, 9:44 p.m. UTC | #1
Hi Jan,

On 19/10/2022 08:40, Jan Beulich wrote:
> In preparation for the introduction of new vCPU operations that allow
> registering the respective areas (one of the two is x86-specific) by
> guest-physical address, add the necessary domain cleanup hooks.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
>       necessary: Aiui unmap_vcpu_info() is called only because the vCPU
>       info area cannot be re-registered. Beyond that I guess the
>       assumption is that the areas would only be re-registered as they
>       were before. If that's not the case I wonder whether the guest
>       handles for both areas shouldn't also be zapped.

I don't know the code well enough to be able to answer it.

The code itself looks good to me. With one remark below:

Reviewed-by: Julien Grall <jgrall@amazon.com>

[...]

> @@ -1555,6 +1559,15 @@ void unmap_vcpu_info(struct vcpu *v)
>       put_page_and_type(mfn_to_page(mfn));
>   }
>   
> +/*
> + * This is only intended to be used for domain cleanup (or more generally only
> + * with at least the respective vCPU, if it's not the current one, reliably
> + * paused).
> + */
> +void unmap_guest_area(struct vcpu *v, struct guest_area *area)
> +{

IIUC, you will add the ASSERT() we discussed in patch #7 in this patch. 
I would be fine if you keep my reviewed-by.

Cheers,
Jan Beulich Dec. 14, 2022, 9:12 a.m. UTC | #2
On 13.12.2022 22:44, Julien Grall wrote:
> On 19/10/2022 08:40, Jan Beulich wrote:
>> In preparation for the introduction of new vCPU operations that allow
>> registering the respective areas (one of the two is x86-specific) by
>> guest-physical address, add the necessary domain cleanup hooks.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
>>       necessary: Aiui unmap_vcpu_info() is called only because the vCPU
>>       info area cannot be re-registered. Beyond that I guess the
>>       assumption is that the areas would only be re-registered as they
>>       were before. If that's not the case I wonder whether the guest
>>       handles for both areas shouldn't also be zapped.
> 
> I don't know the code well enough to be able to answer it.

Right; I hope the original shim authors will be able to shed some light
on this.

> The code itself looks good to me. With one remark below:
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks.

>> @@ -1555,6 +1559,15 @@ void unmap_vcpu_info(struct vcpu *v)
>>       put_page_and_type(mfn_to_page(mfn));
>>   }
>>   
>> +/*
>> + * This is only intended to be used for domain cleanup (or more generally only
>> + * with at least the respective vCPU, if it's not the current one, reliably
>> + * paused).
>> + */
>> +void unmap_guest_area(struct vcpu *v, struct guest_area *area)
>> +{
> 
> IIUC, you will add the ASSERT() we discussed in patch #7 in this patch. 
> I would be fine if you keep my reviewed-by.

And thanks again. Indeed this is what I have pending for v2:

/*
 * This is only intended to be used for domain cleanup (or more generally only
 * with at least the respective vCPU, if it's not the current one, reliably
 * paused).
 */
void unmap_guest_area(struct vcpu *v, struct guest_area *area)
{
    struct domain *d = v->domain;

    if ( v != current )
        ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
}
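
(The OR of the two pause counts looks deliberate: the cleanup paths
touched by this patch pause the domain as a whole rather than each
individual vCPU, and either form of pausing should make the teardown
safe.)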

Jan
Andrew Cooper Jan. 17, 2023, 9:17 p.m. UTC | #3
On 19/10/2022 8:40 am, Jan Beulich wrote:
> In preparation for the introduction of new vCPU operations that allow
> registering the respective areas (one of the two is x86-specific) by
> guest-physical address, add the necessary domain cleanup hooks.

What are the two areas you're discussing here?

I assume you mean VCPUOP_register_vcpu_time_memory_area to be the
x86-specific op, but ARM permits both VCPUOP_register_vcpu_info and
VCPUOP_register_runstate_memory_area.

So isn't it more 2+1 rather than 1+1?

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
>      necessary: Aiui unmap_vcpu_info() is called only because the vCPU
>      info area cannot be re-registered. Beyond that I guess the
>      assumption is that the areas would only be re-registered as they
>      were before. If that's not the case I wonder whether the guest
>      handles for both areas shouldn't also be zapped.

At this point in pv_shim_shutdown(), we have already come out of suspend
(successfully) and are preparing to poke the PV guest out of suspend too.

The PV guest needs to not have its subsequent VCPUOP_register_vcpu_info
call fail, but beyond that I can entirely believe that it was coded to
"Linux doesn't crash" rather than "what's everything we ought to reset
here" - recall that we weren't exactly flush with time when trying to
push Shim out of the door.

Whatever does get reregistered will be reregistered at the same
position.  No guest at all is going to have details like that dynamic
across migrate.


As a tangential observation, I see the periodic timer gets rearmed.
This is still one of the more insane default properties of a PV guest;
Linux intentionally clobbers it on boot, but I can equivalent logic to
re-clobber after resume.

~Andrew
Jan Beulich Jan. 18, 2023, 8:59 a.m. UTC | #4
On 17.01.2023 22:17, Andrew Cooper wrote:
> On 19/10/2022 8:40 am, Jan Beulich wrote:
>> In preparation for the introduction of new vCPU operations that allow
>> registering the respective areas (one of the two is x86-specific) by
>> guest-physical address, add the necessary domain cleanup hooks.
> 
> What are the two areas you're discussing here?
> 
> I assume you mean VCPUOP_register_vcpu_time_memory_area to be the
> x86-specific op, but ARM permits both VCPUOP_register_vcpu_info and
> VCPUOP_register_runstate_memory_area.
> 
> So isn't it more 2+1 rather than 1+1?

Not in my view: The vcpu_info registration is already physical address
based, and there's also no new vCPU operation introduced there.
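
For reference, the pre-existing interface already takes a frame number
rather than a virtual address (quoted from xen/include/public/vcpu.h for
context):

struct vcpu_register_vcpu_info {
    uint64_t mfn;    /* mfn of page to place vcpu_info */
    uint32_t offset; /* offset within page */
    uint32_t rsvd;   /* unused */
};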

>> ---
>> RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
>>      necessary: Aiui unmap_vcpu_info() is called only because the vCPU
>>      info area cannot be re-registered. Beyond that I guess the
>>      assumption is that the areas would only be re-registered as they
>>      were before. If that's not the case I wonder whether the guest
>>      handles for both areas shouldn't also be zapped.
> 
> At this point in pv_shim_shutdown(), we have already come out of suspend
> (successfully) and are preparing to poke the PV guest out of suspend too.
> 
> The PV guest needs to not have its subsequent VCPUOP_register_vcpu_info
> call fail, but beyond that I can entirely believe that it was coded to
> "Linux doesn't crash" rather than "what's everything we ought to reset
> here" - recall that we weren't exactly flush with time when trying to
> push Shim out of the door.
> 
> Whatever does get reregistered will be reregistered at the same
> position.  No guest at all is going to have details like that dynamic
> across migrate.

I read this as "keep code as is, drop RFC remark", but that's not
necessarily the only way to interpret your reply.

> As a tangential observation, I see the periodic timer gets rearmed.
> This is still one of the more insane default properties of a PV guest;
> Linux intentionally clobbers it on boot, but I can equivalent logic to
> re-clobber after resume.

I guess you meant s/can/can't spot/, in which case let's Cc Linux
folks for awareness.

Jan

Patch

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1035,7 +1035,10 @@  int arch_domain_soft_reset(struct domain
     }
 
     for_each_vcpu ( d, v )
+    {
         set_xen_guest_handle(v->arch.time_info_guest, NULL);
+        unmap_guest_area(v, &v->arch.time_guest_area);
+    }
 
  exit_put_gfn:
     put_gfn(d, gfn_x(gfn));
@@ -2350,6 +2353,8 @@  int domain_relinquish_resources(struct d
             if ( ret )
                 return ret;
 
+            unmap_guest_area(v, &v->arch.time_guest_area);
+
             vpmu_destroy(v);
         }
 
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -661,6 +661,7 @@  struct arch_vcpu
 
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+    struct guest_area time_guest_area;
 
     struct arch_vm_event *vm_event;
 
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -394,8 +394,10 @@  int pv_shim_shutdown(uint8_t reason)
 
     for_each_vcpu ( d, v )
     {
-        /* Unmap guest vcpu_info pages. */
+        /* Unmap guest vcpu_info page and runstate/time areas. */
         unmap_vcpu_info(v);
+        unmap_guest_area(v, &v->runstate_guest_area);
+        unmap_guest_area(v, &v->arch.time_guest_area);
 
         /* Reset the periodic timer to the default value. */
         vcpu_set_periodic_timer(v, MILLISECS(10));
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -950,7 +950,10 @@  int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
             unmap_vcpu_info(v);
+            unmap_guest_area(v, &v->runstate_guest_area);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
@@ -1404,6 +1407,7 @@  int domain_soft_reset(struct domain *d,
     {
         set_xen_guest_handle(runstate_guest(v), NULL);
         unmap_vcpu_info(v);
+        unmap_guest_area(v, &v->runstate_guest_area);
     }
 
     rc = arch_domain_soft_reset(d);
@@ -1555,6 +1559,15 @@  void unmap_vcpu_info(struct vcpu *v)
     put_page_and_type(mfn_to_page(mfn));
 }
 
+/*
+ * This is only intended to be used for domain cleanup (or more generally only
+ * with at least the respective vCPU, if it's not the current one, reliably
+ * paused).
+ */
+void unmap_guest_area(struct vcpu *v, struct guest_area *area)
+{
+}
+
 int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct vcpu_guest_context *ctxt;
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -5,6 +5,12 @@ 
 #include <xen/types.h>
 
 #include <public/xen.h>
+
+struct guest_area {
+    struct page_info *pg;
+    void *map;
+};
+
 #include <asm/domain.h>
 #include <asm/numa.h>
 
@@ -76,6 +82,11 @@  void arch_vcpu_destroy(struct vcpu *v);
 int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned int offset);
 void unmap_vcpu_info(struct vcpu *v);
 
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+                   struct guest_area *area,
+                   void (*populate)(void *dst, struct vcpu *v));
+void unmap_guest_area(struct vcpu *v, struct guest_area *area);
+
 int arch_domain_create(struct domain *d,
                        struct xen_domctl_createdomain *config,
                        unsigned int flags);
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -202,6 +202,7 @@  struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    struct guest_area runstate_guest_area;
     unsigned int     new_state;
 
     /* Has the FPU been initialised? */
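
For context, a caller of the new map_guest_area() would presumably look
something like the sketch below (hypothetical; the actual registration
hypercalls only arrive later in this series, and the populate hook shown
here is made up):

    /* Hypothetical populate hook: seed the area before it becomes visible. */
    static void runstate_area_populate(void *dst, struct vcpu *v)
    {
        memcpy(dst, &v->runstate, sizeof(v->runstate));
    }

    /* Registration then boils down to: */
    rc = map_guest_area(v, gaddr, sizeof(struct vcpu_runstate_info),
                        &v->runstate_guest_area, runstate_area_populate);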