diff mbox series

[v3,7/8] x86: introduce GADDR based secondary time area registration alternative

Message ID 77218fb0-5e96-4ecb-c2b0-4fe8c3ba683f@suse.com (mailing list archive)
State New, archived
Headers show
Series runstate/time area registration by (guest) physical address | expand

Commit Message

Jan Beulich May 3, 2023, 3:58 p.m. UTC
The registration by virtual/linear address has downsides: The access is
expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
is inaccessible (and hence cannot be updated by Xen) when in guest-user
mode.

Introduce a new vCPU operation allowing to register the secondary time
area by guest-physical address.

An at least theoretical downside to using physically registered areas is
that PV then won't see dirty (and perhaps also accessed) bits set in its
respective page table entries.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: Forge version in force_update_secondary_system_time().

Comments

Roger Pau Monné Sept. 27, 2023, 3:50 p.m. UTC | #1
On Wed, May 03, 2023 at 05:58:01PM +0200, Jan Beulich wrote:
> The registration by virtual/linear address has downsides: The access is
> expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
> is inaccessible (and hence cannot be updated by Xen) when in guest-user
> mode.
> 
> Introduce a new vCPU operation allowing to register the secondary time
> area by guest-physical address.
> 
> An at least theoretical downside to using physically registered areas is
> that PV then won't see dirty (and perhaps also accessed) bits set in its
> respective page table entries.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v3: Re-base.
> v2: Forge version in force_update_secondary_system_time().
> 
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1504,6 +1504,15 @@ int arch_vcpu_reset(struct vcpu *v)
>      return 0;
>  }
>  
> +static void cf_check
> +time_area_populate(void *map, struct vcpu *v)
> +{
> +    if ( is_pv_vcpu(v) )
> +        v->arch.pv.pending_system_time.version = 0;
> +
> +    force_update_secondary_system_time(v, map);
> +}
> +
>  long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      long rc = 0;
> @@ -1541,6 +1550,25 @@ long do_vcpu_op(int cmd, unsigned int vc
>  
>          break;
>      }
> +
> +    case VCPUOP_register_vcpu_time_phys_area:
> +    {
> +        struct vcpu_register_time_memory_area area;
> +
> +        rc = -EFAULT;
> +        if ( copy_from_guest(&area.addr.p, arg, 1) )
> +            break;
> +
> +        rc = map_guest_area(v, area.addr.p,
> +                            sizeof(vcpu_time_info_t),
> +                            &v->arch.time_guest_area,
> +                            time_area_populate);
> +        if ( rc == -ERESTART )
> +            rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
> +                                               cmd, vcpuid, arg);
> +
> +        break;
> +    }
>  
>      case VCPUOP_get_physid:
>      {
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -692,6 +692,8 @@ void domain_cpu_policy_changed(struct do
>  
>  bool update_secondary_system_time(struct vcpu *,
>                                    struct vcpu_time_info *);
> +void force_update_secondary_system_time(struct vcpu *,
> +                                        struct vcpu_time_info *);
>  
>  void vcpu_show_registers(const struct vcpu *);
>  
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1633,6 +1633,16 @@ void force_update_vcpu_system_time(struc
>      __update_vcpu_system_time(v, 1);
>  }
>  
> +void force_update_secondary_system_time(struct vcpu *v,
> +                                        struct vcpu_time_info *map)
> +{
> +    struct vcpu_time_info u;
> +
> +    collect_time_info(v, &u);
> +    u.version = -1; /* Compensate for version_update_end(). */

Hm, we do not seem to compensate in
VCPUOP_register_vcpu_time_memory_area, what's more, in that case the
initial version is picked from the contents of the guest address.
Hopefully the guest will have zeroed the memory.

FWIW, I would be fine with leaving this at 0, so the first version
guest sees is 1.

Thanks, Roger.
Jan Beulich Sept. 27, 2023, 4:12 p.m. UTC | #2
On 27.09.2023 17:50, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 05:58:01PM +0200, Jan Beulich wrote:
>> The registration by virtual/linear address has downsides: The access is
>> expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
>> is inaccessible (and hence cannot be updated by Xen) when in guest-user
>> mode.
>>
>> Introduce a new vCPU operation allowing to register the secondary time
>> area by guest-physical address.
>>
>> An at least theoretical downside to using physically registered areas is
>> that PV then won't see dirty (and perhaps also accessed) bits set in its
>> respective page table entries.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> ---
>> v3: Re-base.
>> v2: Forge version in force_update_secondary_system_time().

For your question below, note this revision log entry. I didn't have the
compensation originally, and my made-up XTF test thus failed.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -1633,6 +1633,16 @@ void force_update_vcpu_system_time(struc
>>      __update_vcpu_system_time(v, 1);
>>  }
>>  
>> +void force_update_secondary_system_time(struct vcpu *v,
>> +                                        struct vcpu_time_info *map)
>> +{
>> +    struct vcpu_time_info u;
>> +
>> +    collect_time_info(v, &u);
>> +    u.version = -1; /* Compensate for version_update_end(). */
> 
> Hm, we do not seem to compensate in
> VCPUOP_register_vcpu_time_memory_area, what's more, in that case the
> initial version is picked from the contents of the guest address.
> Hopefully the guest will have zeroed the memory.
> 
> FWIW, I would be fine with leaving this at 0, so the first version
> guest sees is 1.

No, we can't. Odd numbers mean "update in progress". In
__update_vcpu_system_time() we properly use version_update_begin(),
so there's no need for any compensation. If the guest puts a non-zero
value there, that's also fine from the perspective of the protocol.
Just that if the initial value is odd, the guest will mislead itself
into thinking "oh, the hypervisor is updating this right now" until
the first real update has happened. Yet even that is hypothetical,
since upon registration the area is first populated, so upon the
registration hypercall returning the field will already properly have
an even value.

Jan
diff mbox series

Patch

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1504,6 +1504,15 @@  int arch_vcpu_reset(struct vcpu *v)
     return 0;
 }
 
+static void cf_check
+time_area_populate(void *map, struct vcpu *v)
+{
+    if ( is_pv_vcpu(v) )
+        v->arch.pv.pending_system_time.version = 0;
+
+    force_update_secondary_system_time(v, map);
+}
+
 long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc = 0;
@@ -1541,6 +1550,25 @@  long do_vcpu_op(int cmd, unsigned int vc
 
         break;
     }
+
+    case VCPUOP_register_vcpu_time_phys_area:
+    {
+        struct vcpu_register_time_memory_area area;
+
+        rc = -EFAULT;
+        if ( copy_from_guest(&area.addr.p, arg, 1) )
+            break;
+
+        rc = map_guest_area(v, area.addr.p,
+                            sizeof(vcpu_time_info_t),
+                            &v->arch.time_guest_area,
+                            time_area_populate);
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+                                               cmd, vcpuid, arg);
+
+        break;
+    }
 
     case VCPUOP_get_physid:
     {
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -692,6 +692,8 @@  void domain_cpu_policy_changed(struct do
 
 bool update_secondary_system_time(struct vcpu *,
                                   struct vcpu_time_info *);
+void force_update_secondary_system_time(struct vcpu *,
+                                        struct vcpu_time_info *);
 
 void vcpu_show_registers(const struct vcpu *);
 
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1633,6 +1633,16 @@  void force_update_vcpu_system_time(struc
     __update_vcpu_system_time(v, 1);
 }
 
+void force_update_secondary_system_time(struct vcpu *v,
+                                        struct vcpu_time_info *map)
+{
+    struct vcpu_time_info u;
+
+    collect_time_info(v, &u);
+    u.version = -1; /* Compensate for version_update_end(). */
+    write_time_guest_area(map, &u);
+}
+
 static void update_domain_rtc(void)
 {
     struct domain *d;
--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -115,6 +115,7 @@  compat_vcpu_op(int cmd, unsigned int vcp
 
     case VCPUOP_send_nmi:
     case VCPUOP_get_physid:
+    case VCPUOP_register_vcpu_time_phys_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
 
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -233,6 +233,7 @@  DEFINE_XEN_GUEST_HANDLE(vcpu_register_ti
  * VMASST_TYPE_runstate_update_flag engaged by the domain.
  */
 #define VCPUOP_register_runstate_phys_area      14
+#define VCPUOP_register_vcpu_time_phys_area     15
 
 #endif /* __XEN_PUBLIC_VCPU_H__ */