Message ID | 77218fb0-5e96-4ecb-c2b0-4fe8c3ba683f@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | runstate/time area registration by (guest) physical address | expand |
On Wed, May 03, 2023 at 05:58:01PM +0200, Jan Beulich wrote: > The registration by virtual/linear address has downsides: The access is > expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area > is inaccessible (and hence cannot be updated by Xen) when in guest-user > mode. > > Introduce a new vCPU operation allowing to register the secondary time > area by guest-physical address. > > An at least theoretical downside to using physically registered areas is > that PV then won't see dirty (and perhaps also accessed) bits set in its > respective page table entries. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> > --- > v3: Re-base. > v2: Forge version in force_update_secondary_system_time(). > > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1504,6 +1504,15 @@ int arch_vcpu_reset(struct vcpu *v) > return 0; > } > > +static void cf_check > +time_area_populate(void *map, struct vcpu *v) > +{ > + if ( is_pv_vcpu(v) ) > + v->arch.pv.pending_system_time.version = 0; > + > + force_update_secondary_system_time(v, map); > +} > + > long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) > { > long rc = 0; > @@ -1541,6 +1550,25 @@ long do_vcpu_op(int cmd, unsigned int vc > > break; > } > + > + case VCPUOP_register_vcpu_time_phys_area: > + { > + struct vcpu_register_time_memory_area area; > + > + rc = -EFAULT; > + if ( copy_from_guest(&area.addr.p, arg, 1) ) > + break; > + > + rc = map_guest_area(v, area.addr.p, > + sizeof(vcpu_time_info_t), > + &v->arch.time_guest_area, > + time_area_populate); > + if ( rc == -ERESTART ) > + rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih", > + cmd, vcpuid, arg); > + > + break; > + } > > case VCPUOP_get_physid: > { > --- a/xen/arch/x86/include/asm/domain.h > +++ b/xen/arch/x86/include/asm/domain.h > @@ -692,6 +692,8 @@ void domain_cpu_policy_changed(struct do > > bool update_secondary_system_time(struct vcpu *, > struct vcpu_time_info *); > +void force_update_secondary_system_time(struct vcpu *, > + struct vcpu_time_info *); > > void vcpu_show_registers(const struct vcpu *); > > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -1633,6 +1633,16 @@ void force_update_vcpu_system_time(struc > __update_vcpu_system_time(v, 1); > } > > +void force_update_secondary_system_time(struct vcpu *v, > + struct vcpu_time_info *map) > +{ > + struct vcpu_time_info u; > + > + collect_time_info(v, &u); > + u.version = -1; /* Compensate for version_update_end(). */ Hm, we do not seem to compensate in VCPUOP_register_vcpu_time_memory_area, what's more, in that case the initial version is picked from the contents of the guest address. Hopefully the guest will have zeroed the memory. FWIW, I would be fine with leaving this at 0, so the first version guest sees is 1. Thanks, Roger.
On 27.09.2023 17:50, Roger Pau Monné wrote: > On Wed, May 03, 2023 at 05:58:01PM +0200, Jan Beulich wrote: >> The registration by virtual/linear address has downsides: The access is >> expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area >> is inaccessible (and hence cannot be updated by Xen) when in guest-user >> mode. >> >> Introduce a new vCPU operation allowing to register the secondary time >> area by guest-physical address. >> >> An at least theoretical downside to using physically registered areas is >> that PV then won't see dirty (and perhaps also accessed) bits set in its >> respective page table entries. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Thanks. >> --- >> v3: Re-base. >> v2: Forge version in force_update_secondary_system_time(). For your question below, note this revision log entry. I didn't have the compensation originally, and my made-up XTF test thus failed. >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -1633,6 +1633,16 @@ void force_update_vcpu_system_time(struc >> __update_vcpu_system_time(v, 1); >> } >> >> +void force_update_secondary_system_time(struct vcpu *v, >> + struct vcpu_time_info *map) >> +{ >> + struct vcpu_time_info u; >> + >> + collect_time_info(v, &u); >> + u.version = -1; /* Compensate for version_update_end(). */ > > Hm, we do not seem to compensate in > VCPUOP_register_vcpu_time_memory_area, what's more, in that case the > initial version is picked from the contents of the guest address. > Hopefully the guest will have zeroed the memory. > > FWIW, I would be fine with leaving this at 0, so the first version > guest sees is 1. No, we can't. Odd numbers mean "update in progress". In __update_vcpu_system_time() we properly use version_update_begin(), so there's no need for any compensation. If the guest puts a non-zero value there, that's also fine from the perspective of the protocol. Just that if the initial value is odd, the guest will mislead itself into thinking "oh, the hypervisor is updating this right now" until the first real update has happened. Yet even that is hypothetical, since upon registration the area is first populated, so upon the registration hypercall returning the field will already properly have an even value. Jan
--- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1504,6 +1504,15 @@ int arch_vcpu_reset(struct vcpu *v) return 0; } +static void cf_check +time_area_populate(void *map, struct vcpu *v) +{ + if ( is_pv_vcpu(v) ) + v->arch.pv.pending_system_time.version = 0; + + force_update_secondary_system_time(v, map); +} + long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) { long rc = 0; @@ -1541,6 +1550,25 @@ long do_vcpu_op(int cmd, unsigned int vc break; } + + case VCPUOP_register_vcpu_time_phys_area: + { + struct vcpu_register_time_memory_area area; + + rc = -EFAULT; + if ( copy_from_guest(&area.addr.p, arg, 1) ) + break; + + rc = map_guest_area(v, area.addr.p, + sizeof(vcpu_time_info_t), + &v->arch.time_guest_area, + time_area_populate); + if ( rc == -ERESTART ) + rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih", + cmd, vcpuid, arg); + + break; + } case VCPUOP_get_physid: { --- a/xen/arch/x86/include/asm/domain.h +++ b/xen/arch/x86/include/asm/domain.h @@ -692,6 +692,8 @@ void domain_cpu_policy_changed(struct do bool update_secondary_system_time(struct vcpu *, struct vcpu_time_info *); +void force_update_secondary_system_time(struct vcpu *, + struct vcpu_time_info *); void vcpu_show_registers(const struct vcpu *); --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -1633,6 +1633,16 @@ void force_update_vcpu_system_time(struc __update_vcpu_system_time(v, 1); } +void force_update_secondary_system_time(struct vcpu *v, + struct vcpu_time_info *map) +{ + struct vcpu_time_info u; + + collect_time_info(v, &u); + u.version = -1; /* Compensate for version_update_end(). */ + write_time_guest_area(map, &u); +} + static void update_domain_rtc(void) { struct domain *d; --- a/xen/arch/x86/x86_64/domain.c +++ b/xen/arch/x86/x86_64/domain.c @@ -115,6 +115,7 @@ compat_vcpu_op(int cmd, unsigned int vcp case VCPUOP_send_nmi: case VCPUOP_get_physid: + case VCPUOP_register_vcpu_time_phys_area: rc = do_vcpu_op(cmd, vcpuid, arg); break; --- a/xen/include/public/vcpu.h +++ b/xen/include/public/vcpu.h @@ -233,6 +233,7 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_register_ti * VMASST_TYPE_runstate_update_flag engaged by the domain. */ #define VCPUOP_register_runstate_phys_area 14 +#define VCPUOP_register_vcpu_time_phys_area 15 #endif /* __XEN_PUBLIC_VCPU_H__ */
The registration by virtual/linear address has downsides: The access is expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area is inaccessible (and hence cannot be updated by Xen) when in guest-user mode. Introduce a new vCPU operation allowing to register the secondary time area by guest-physical address. An at least theoretical downside to using physically registered areas is that PV then won't see dirty (and perhaps also accessed) bits set in its respective page table entries. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v3: Re-base. v2: Forge version in force_update_secondary_system_time().