Message ID | 1558721577-13958-1-git-send-email-andrii.anisov@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC,2,DO,NOT,APPLY] introduce VCPUOP_register_runstate_phys_memory_area hypercall | expand |
From: Andrii Anisov <andrii_anisov@epam.com>
Following discussion [1], this series introduces and implements a runstate
registration interface which uses a guest physical address instead of a virtual one.
The new hypercall employs the same data structures as its predecessor, but
expects the vcpu_runstate_info structure to not cross a page boundary.
The interface is implemented so that the vcpu_runstate_info structure is mapped
into the hypervisor while the hypercall is processed, and is accessed directly
during runstate updates. This runstate area mapping follows the vcpu_info
structure registration.
A permanent mapping of the runstate area would consume vmap area on arm32,
which is limited to 1G. That might still be acceptable, because it would be
possible to enlarge the arm32 virtual address area by reworking the address space.
The series is tested on ARM64 and build-tested for x86. I'd appreciate it if
someone could check it on x86.
The Linux kernel patch is here [2], though it is for 4.14. I am not yet
convinced of the absolute correctness of that patch; it should be better
aligned with the Linux community.
Changes in:
v3: This version again implements the mapping-on-init approach.
    Patches are squashed and refactored so that the virtual- and physical-address
    interfaces cannot function simultaneously, but replace one another on
    registration.
    In order to measure the performance impact of a permanent mapping vs.
    mapping on access, two RFC patches were written which follow the
    mapping-on-access approach, with a small difference:
    - RFC 1 - uses copy_to_guest_phys_flush_dcache() for each access to the
      runstate area.
    - RFC 2 - maps the runstate area before all update manipulations and unmaps
      it afterwards.
    The RFC patches were implemented for ARM only, because the performance
    measurements were done on an ARM64 machine.
    Performance measurements of both approaches (runstate mapped on access vs.
    mapped on registration) were made. The test setup is as follows:
    a thin Dom0 (Linux with initramfs) with DomD running a rich Yocto Linux.
    The 3D benchmark numbers in DomD are compared. The benchmark is GLMark2,
    run at different resolutions in order to produce different IRQ loads:
    320x240 produces a high IRQ load, while 1920x1080 produces a low one.
    Separately, the DomD benchmark run was tested with a primitive Dom0 CPU
    burner (dd) in the background, in order to stimulate VCPU(dX)->VCPU(dY)
    switches rather than VCPU(dX)->idle->VCPU(dX).
    The results are as follows:
                           mapped      mapped       mapped
                           on init     on access    on update
                                       (RFC 1)      (RFC 2)
    GLMark2 320x240        2906        2856 (-2%)   2903 (0%)
     +Dom0 CPUBurn         2166        2106 (-3%)   2134 (-1%)
    GLMark2 800x600        2396        2367 (-1%)   2393 (0%)
     +Dom0 CPUBurn         1958        1911 (-2%)   1942 (-1%)
    GLMark2 1920x1080       939         936 (0%)     935 (0%)
     +Dom0 CPUBurn          909         901 (-1%)    907 (0%)
    IRQ latency differences were also checked using TBM, in a setup similar to
    [5]. Please note that the IRQ rate is one per 30 seconds, and only the
    VCPU->idle->VCPU use-case is considered. The results are as follows (in ns,
    with a timer granularity of 120ns):
        mapped on init:
            max=10080  warm_max=8760  min=6600  avg=6699
        mapped on update (RFC 1):
            max=10440  warm_max=7560  min=7320  avg=7419
        mapped on access (RFC 2):
            max=11520  warm_max=7920  min=7200  avg=7299
v2: The implementation of the new runstate interface was reconsidered. The new
    interface is made independent of the old one: it does not share the
    runstate_area field, and consequently avoids excessive concurrency with the
    old runstate interface.
    Locks were introduced in order to resolve possible concurrency between
    runstate area registration and usage.
    Comments from Jan Beulich [3][4] about coding style nits were addressed.
    Some of them became obsolete with the refactoring, and a few are picked
    into this thread for further discussion.
    Performance measurements of both approaches (runstate mapped on access vs.
    mapped on registration) were made. The test setup is as follows:
    a thin Dom0 (Linux with initramfs) with DomD running a rich Yocto Linux.
    The 3D benchmark numbers in DomD are compared. The benchmark is GLMark2,
    run at different resolutions in order to produce different IRQ loads:
    320x240 produces a high IRQ load, while 1920x1080 produces a low one.
    Separately, the DomD benchmark run was tested with a primitive Dom0 CPU
    burner (dd) in the background, in order to stimulate VCPU(dX)->VCPU(dY)
    switches rather than VCPU(dX)->idle->VCPU(dX).
    The results are as follows:
                           mapped      mapped
                           on access   on init
    GLMark2 320x240        2852        2877 (+0.8%)
     +Dom0 CPUBurn         2088        2094 (+0.2%)
    GLMark2 800x600        2368        2375 (+0.3%)
     +Dom0 CPUBurn         1868        1921 (+2.8%)
    GLMark2 1920x1080       931         931 (0%)
     +Dom0 CPUBurn          892         894 (+0.2%)
    Please note that "mapped on access" here means using the old runstate
    registration interface. The runstate update in this case still often fails
    to map the runstate area, as in [5], despite the fact that our Linux kernel
    does not have KPTI enabled. So the runstate area update in this case is
    really shortened.
    IRQ latency differences were also checked using TBM, in a setup similar to
    [5]. Please note that the IRQ rate is one per 30 seconds, and only the
    VCPU->idle->VCPU use-case is considered. The results are as follows (in ns,
    with a timer granularity of 120ns):
        mapped on access:
            max=9960  warm_max=8640  min=7200  avg=7626
        mapped on init:
            max=9480  warm_max=8400  min=7080  avg=7341
    Unfortunately, there are no consistent results yet from profiling with
    Lauterbach PowerTrace. We are still in communication with the tracer vendor
    in order to set up the proper configuration.
[1] https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00416.html
[2] https://github.com/aanisov/linux/commit/ba34d2780f57ea43f81810cd695aace7b55c0f29
[3] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00936.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00934.html
[5] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg02369.html
[6] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02297.html
Andrii Anisov (1):
xen: introduce VCPUOP_register_runstate_phys_memory_area hypercall
xen/arch/arm/domain.c | 58 ++++++++++++++++++---
xen/arch/x86/domain.c | 99 ++++++++++++++++++++++++++++++++---
xen/arch/x86/x86_64/domain.c | 16 +++++-
xen/common/domain.c | 121 +++++++++++++++++++++++++++++++++++++++----
xen/include/public/vcpu.h | 15 ++++++
xen/include/xen/sched.h | 28 +++++++---
6 files changed, 306 insertions(+), 31 deletions(-)
Hi Andrii,

I am not answering on the content yet, I will do that separately.

The threading for this series looks quite confusing. The head of the thread is
this patch (i.e RFC 2) but then you have a cover letter within the thread
tagged "V3". From what I understand, the 2 e-mails tagged "V3" are the original
version, whereas RFC 2 and RFC 3 are variants. Am I correct?

If so, for next time, I would recommend to have the cover letter first and then
all the patches sent "In-Reply-To" the cover letter. This makes it easier to
track the series.

Cheers,
Hello Julien,

On 28.05.19 11:59, Julien Grall wrote:
> I am not answering on the content yet, I will do that separately. The
> threading for this series looks quite confusing. The head of the thread is
> this patch (i.e RFC 2) but then you have a cover letter within the thread
> tagged "V3".

Actually I've noticed the mangled threading, but I am not sure why it happened.
I just passed a folder with those patches to git-send-email.

> From what I understand, the 2 e-mails tagged "V3" is the original version
> where as RFC 2 and RFC 3 are variants. Am I correct?

Yes, sure.

> If so, for next time, I would recommend to have the cover letter first and
> then all the patches send "In-Reply-To" the cover letter. This makes easier
> to track series.

It might help. But before, it worked well without additional tricks.
On 28/05/2019 10:17, Andrii Anisov wrote:
> Hello Julien,
>
> On 28.05.19 11:59, Julien Grall wrote:
>> I am not answering on the content yet, I will do that separately. The
>> threading for this series looks quite confusing. The head of the thread is
>> this patch (i.e RFC 2) but then you have a cover letter within the thread
>> tagged "V3".
>
> Actually I've noticed the mangled threading. But not sure why it is so. I just
> passed a folder with those patches to git-send-mail.

IIRC, git-send-email will use the first e-mail in the alphabetical order as the
"top e-mail" and all the others will be sent "In-Reply-To" it.

>> From what I understand, the 2 e-mails tagged "V3" is the original version
>> where as RFC 2 and RFC 3 are variants. Am I correct?
>
> Yes, sure.
>
>> If so, for next time, I would recommend to have the cover letter first and
>> then all the patches send "In-Reply-To" the cover letter. This makes easier to
>> track series.
>
> It might help. But before it worked well without additional tricks.

In general, all the patches sent start with a number followed by the title. The
number ensures the correct ordering. If you don't have a number, then you fall
back to alphabetical order.

Cheers,
On 28.05.19 12:23, Julien Grall wrote:
> In general, all the patch sent are starting with a number followed by the title. The number ensure the correct ordering. If you don't have number then, you will fallback to alphabetical order.
Ah, yes, sure. I did rename the RFC 1 patch file manually, but somehow missed
renaming the RFC 2 patch file.
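For reference, the ordering Julien describes comes for free when git-format-patch
numbers the files itself; the file names below are hypothetical, and the sort
call just demonstrates the alphabetical fallback he mentions:

```shell
# 'git format-patch -v3 --cover-letter <range>' produces files such as
#   v3-0000-cover-letter.patch
#   v3-0001-xen-introduce-hypercall.patch
# and 'git send-email *.patch' then sends them in this numeric order,
# threading the patches as replies to the cover letter.
# Without numbers, the cover letter may not sort first:
printf '%s\n' \
  'v3-0001-xen-introduce-hypercall.patch' \
  'v3-0000-cover-letter.patch' | sort
```

With the `v3-0000-` prefix the cover letter always sorts first, so it becomes
the "top e-mail" of the thread.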
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index a9f7ff5..04c4cff 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -274,17 +274,15 @@ static void ctxt_switch_to(struct vcpu *n)
     virt_timer_restore(n);
 }
 
-/* Update per-VCPU guest runstate shared memory area (if registered). */
-static void update_runstate_area(struct vcpu *v)
+static void update_runstate_by_gvaddr(struct vcpu *v)
 {
     void __user *guest_handle = NULL;
 
-    if ( guest_handle_is_null(runstate_guest(v)) )
-        return;
+    ASSERT(!guest_handle_is_null(runstate_guest_virt(v)));
 
     if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
-        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
+        guest_handle = &v->runstate_guest.virt.p->state_entry_time + 1;
         guest_handle--;
         v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
         __raw_copy_to_guest(guest_handle,
@@ -292,7 +290,7 @@ static void update_runstate_area(struct vcpu *v)
         smp_wmb();
     }
 
-    __copy_to_guest(runstate_guest(v), &v->runstate, 1);
+    __copy_to_guest(runstate_guest_virt(v), &v->runstate, 1);
 
     if ( guest_handle )
     {
@@ -303,6 +301,58 @@ static void update_runstate_area(struct vcpu *v)
     }
 }
 
+extern int map_runstate_area(struct vcpu *v, struct vcpu_runstate_info **area);
+extern void unmap_runstate_area(struct vcpu_runstate_info *area);
+
+static void update_runstate_by_gpaddr(struct vcpu *v)
+{
+    struct vcpu_runstate_info *runstate;
+
+    if ( map_runstate_area(v, &runstate) )
+        return;
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+    }
+
+    memcpy(runstate, &v->runstate, sizeof(v->runstate));
+
+    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    {
+        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        smp_wmb();
+        v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+    }
+
+    unmap_runstate_area(runstate);
+}
+
+/* Update per-VCPU guest runstate shared memory area (if registered). */
+static void update_runstate_area(struct vcpu *v)
+{
+    if ( xchg(&v->runstate_in_use, 1) )
+        return;
+
+    switch ( v->runstate_guest_type )
+    {
+    case RUNSTATE_NONE:
+        break;
+
+    case RUNSTATE_VADDR:
+        update_runstate_by_gvaddr(v);
+        break;
+
+    case RUNSTATE_PADDR:
+        update_runstate_by_gpaddr(v);
+        break;
+    }
+
+    xchg(&v->runstate_in_use, 0);
+}
+
 static void schedule_tail(struct vcpu *prev)
 {
     ctxt_switch_from(prev);
@@ -998,6 +1048,7 @@ long do_arm_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) a
     {
     case VCPUOP_register_vcpu_info:
     case VCPUOP_register_runstate_memory_area:
+    case VCPUOP_register_runstate_phys_memory_area:
         return do_vcpu_op(cmd, vcpuid, arg);
     default:
         return -EINVAL;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 32bca8d..f167a68 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -700,6 +700,68 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
     return 0;
 }
 
+void unmap_runstate_area(struct vcpu_runstate_info *area)
+{
+    mfn_t mfn;
+
+    ASSERT(area != NULL);
+
+    mfn = _mfn(domain_page_map_to_mfn(area));
+
+    unmap_domain_page_global((void *)
+                             ((unsigned long)area &
+                              PAGE_MASK));
+
+    put_page_and_type(mfn_to_page(mfn));
+}
+
+int map_runstate_area(struct vcpu *v, struct vcpu_runstate_info **area)
+{
+    unsigned long offset = v->runstate_guest.phys & ~PAGE_MASK;
+    gfn_t gfn = gaddr_to_gfn(v->runstate_guest.phys);
+    struct domain *d = v->domain;
+    void *mapping;
+    struct page_info *page;
+    size_t size = sizeof(struct vcpu_runstate_info);
+
+    if ( offset > (PAGE_SIZE - size) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn_x(gfn), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    mapping = __map_domain_page_global(page);
+
+    if ( mapping == NULL )
+    {
+        put_page_and_type(page);
+        return -ENOMEM;
+    }
+
+    *area = mapping + offset;
+
+    return 0;
+}
+
+static void discard_runstate_area(struct vcpu *v)
+{
+    v->runstate_guest_type = RUNSTATE_NONE;
+}
+
+static void discard_runstate_area_locked(struct vcpu *v)
+{
+    while ( xchg(&v->runstate_in_use, 1) );
+    discard_runstate_area(v);
+    xchg(&v->runstate_in_use, 0);
+}
+
 int domain_kill(struct domain *d)
 {
     int rc = 0;
@@ -738,7 +800,10 @@ int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
+            discard_runstate_area_locked(v);
             unmap_vcpu_info(v);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings
          * have to be put before we call put_domain. */
@@ -1192,7 +1257,7 @@ int domain_soft_reset(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        set_xen_guest_handle(runstate_guest(v), NULL);
+        discard_runstate_area_locked(v);
         unmap_vcpu_info(v);
     }
 
@@ -1520,18 +1585,46 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
             break;
 
         rc = 0;
-        runstate_guest(v) = area.addr.h;
+
+        while( xchg(&v->runstate_in_use, 1) == 0);
+
+        discard_runstate_area(v);
+
+        runstate_guest_virt(v) = area.addr.h;
+        v->runstate_guest_type = RUNSTATE_VADDR;
 
         if ( v == current )
         {
-            __copy_to_guest(runstate_guest(v), &v->runstate, 1);
+            __copy_to_guest(runstate_guest_virt(v), &v->runstate, 1);
         }
         else
         {
             vcpu_runstate_get(v, &runstate);
-            __copy_to_guest(runstate_guest(v), &runstate, 1);
+            __copy_to_guest(runstate_guest_virt(v), &runstate, 1);
         }
 
+        xchg(&v->runstate_in_use, 0);
+
+        break;
+    }
+
+    case VCPUOP_register_runstate_phys_memory_area:
+    {
+        struct vcpu_register_runstate_memory_area area;
+
+        rc = -EFAULT;
+        if ( copy_from_guest(&area, arg, 1) )
+            break;
+
+        while( xchg(&v->runstate_in_use, 1) == 0);
+
+        discard_runstate_area(v);
+        v->runstate_guest.phys = area.addr.p;
+        v->runstate_guest_type = RUNSTATE_PADDR;
+
+        xchg(&v->runstate_in_use, 0);
+        rc = 0;
+
         break;
     }
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 3623af9..d7da4a3 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -235,6 +235,21 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Register a shared memory area from which the guest may obtain its own
+ * runstate information without needing to execute a hypercall.
+ * Notes:
+ *  1. The registered address must be guest's physical address.
+ *  2. The registered runstate area should not cross page boundary.
+ *  3. Only one shared area may be registered per VCPU. The shared area is
+ *     updated by the hypervisor each time the VCPU is scheduled. Thus
+ *     runstate.state will always be RUNSTATE_running and
+ *     runstate.state_entry_time will indicate the system time at which the
+ *     VCPU was last scheduled to run.
+ * @extra_arg == pointer to vcpu_register_runstate_memory_area structure.
+ */
+#define VCPUOP_register_runstate_phys_memory_area 14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index edee52d..8ac597b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -163,17 +163,31 @@ struct vcpu
     void            *sched_priv;    /* scheduler-specific data */
 
     struct vcpu_runstate_info runstate;
+
+    enum {
+        RUNSTATE_NONE = 0,
+        RUNSTATE_PADDR = 1,
+        RUNSTATE_VADDR = 2,
+    } runstate_guest_type;
+
+    unsigned long runstate_in_use;
+
+    union
+    {
 #ifndef CONFIG_COMPAT
-# define runstate_guest(v) ((v)->runstate_guest)
-    XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
+# define runstate_guest_virt(v) ((v)->runstate_guest.virt)
+        XEN_GUEST_HANDLE(vcpu_runstate_info_t) virt; /* guest address */
 #else
-# define runstate_guest(v) ((v)->runstate_guest.native)
-    union {
-        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
-        XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
-    } runstate_guest; /* guest address */
+# define runstate_guest_virt(v) ((v)->runstate_guest.virt.native)
+        union {
+            XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
+            XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
+        } virt; /* guest address */
 #endif
+        paddr_t phys;
+    } runstate_guest;
+
     /* last time when vCPU is scheduled out */
     uint64_t last_run_time;