| Message ID | cover.1540441925.git.isaku.yamahata@gmail.com (mailing list archive) |
|---|---|
| Series | x86/kernel/hyper-v: xmm fast hypercall |
On Thu, 25 Oct 2018 at 05:50, Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
>
> This patch series implements the xmm fast hypercall for Hyper-V as guest
> and KVM support as VMM.
> With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
>
> benchmark result:
> At the moment, my test machine has only pcpu=4, so the ipi benchmark doesn't

Did you evaluate the xmm fast hypercall for PV IPIs? In addition, testing
on a large server would provide more compelling evidence.

Regards,
Wanpeng Li

> make any behaviour change. So for now I measured the time of
> hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
> The average of 5 runs is as follows.
> (When a large machine with pcpu > 64 is available, the ipi_benchmark result
> will be interesting. But not yet.)
>
> hyperv_flush_tlb_others() time by hardinfo -r -f text:
>
> with patch:    9931 ns
> without patch: 11111 ns
>
>
> With the patch 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> when possible, even in the case of vcpu=4. So I used a kernel before that
> patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> The following is the average of 100 runs.
>
> ipi_benchmark: average of 100 runs without 4bd06060762b
>
> with patch:
> Dry-run                 0        495181
> Self-IPI         11352737      21549999
> Normal IPI      499400218     575433727
> Broadcast IPI           0    1700692010
> Broadcast lock          0    1663001374
>
> without patch:
> Dry-run                 0        607657
> Self-IPI         10915950      21217644
> Normal IPI      621712609     735015570
> Broadcast IPI           0    2173803373
> Broadcast lock          0    2150451543
>
> Isaku Yamahata (6):
>   x86/kernel/hyper-v: xmm fast hypercall as guest
>   x86/hyperv: use hv_do_hypercall for __send_ipi_mask_ex()
>   x86/hyperv: use hv_do_hypercall for flush_virtual_address_space_ex
>   hyperv: use hv_do_hypercall instead of hv_do_fast_hypercall
>   x86/kvm/hyperv: implement xmm fast hypercall
>   local: hyperv: test ex hypercall
>
>  arch/x86/hyperv/hv_apic.c           |  16 +++-
>  arch/x86/hyperv/mmu.c               |  24 +++--
>  arch/x86/hyperv/nested.c            |   2 +-
>  arch/x86/include/asm/hyperv-tlfs.h  |   3 +
>  arch/x86/include/asm/mshyperv.h     | 180 ++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/hyperv.c               | 101 ++++++++++++++++----
>  drivers/hv/connection.c             |   3 +-
>  drivers/hv/hv.c                     |   3 +-
>  drivers/pci/controller/pci-hyperv.c |   7 +-
>  9 files changed, 291 insertions(+), 48 deletions(-)
>
> --
> 2.14.1
>
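For readers new to the mechanism being discussed: the sketch below illustrates the guest-side decision implied by the cover letter, namely to use the XMM calling convention only when the hypervisor advertises it and the input block fits in registers. It is a minimal sketch, assuming the TLFS convention of passing fast-hypercall input in RDX, R8 and XMM0-XMM5 (16 + 6 * 16 = 112 bytes); the macro and helper names are hypothetical, not code from this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed register budget for XMM fast hypercall input (RDX, R8, XMM0-XMM5). */
#define HV_XMM_FAST_INPUT_MAX	112

/*
 * Decide whether a hypercall input block can be passed in registers
 * ("XMM fast") or must go through the memory-based parameter block.
 */
static bool hv_can_use_xmm_fast(size_t input_size, bool xmm_fast_supported)
{
	return xmm_fast_supported && input_size <= HV_XMM_FAST_INPUT_MAX;
}
```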
On Thu, Oct 25, 2018 at 06:02:58AM +0100, Wanpeng Li <kernellwp@gmail.com> wrote:
> On Thu, 25 Oct 2018 at 05:50, Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
> >
> > This patch series implements the xmm fast hypercall for Hyper-V as guest
> > and KVM support as VMM.
> > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
> >
> > benchmark result:
> > At the moment, my test machine has only pcpu=4, so the ipi benchmark doesn't
>
> Did you evaluate the xmm fast hypercall for PV IPIs? In addition, testing
> on a large server would provide more compelling evidence.

Please see the results below, which were taken without the changeset
4bd06060762b so that the xmm fast hypercall is used even with vcpu=4.

Right now I'm looking for a machine with pcpu > 64, but it would take a
while. I wanted to send out the patch early so that someone else can
test/benchmark it.

Thanks,

>
> Regards,
> Wanpeng Li
>
> > make any behaviour change. So for now I measured the time of
> > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
> > The average of 5 runs is as follows.
> > (When a large machine with pcpu > 64 is available, the ipi_benchmark result
> > will be interesting. But not yet.)
> >
> > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> >
> > with patch:    9931 ns
> > without patch: 11111 ns
> >
> >
> > With the patch 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> > when possible, even in the case of vcpu=4. So I used a kernel before that
> > patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> > The following is the average of 100 runs.
> >
> > ipi_benchmark: average of 100 runs without 4bd06060762b
> >
> > with patch:
> > Dry-run                 0        495181
> > Self-IPI         11352737      21549999
> > Normal IPI      499400218     575433727
> > Broadcast IPI           0    1700692010
> > Broadcast lock          0    1663001374
> >
> > without patch:
> > Dry-run                 0        607657
> > Self-IPI         10915950      21217644
> > Normal IPI      621712609     735015570
> > Broadcast IPI           0    2173803373
> > Broadcast lock          0    2150451543
> >
> > Isaku Yamahata (6):
> >   x86/kernel/hyper-v: xmm fast hypercall as guest
> >   x86/hyperv: use hv_do_hypercall for __send_ipi_mask_ex()
> >   x86/hyperv: use hv_do_hypercall for flush_virtual_address_space_ex
> >   hyperv: use hv_do_hypercall instead of hv_do_fast_hypercall
> >   x86/kvm/hyperv: implement xmm fast hypercall
> >   local: hyperv: test ex hypercall
> >
> >  arch/x86/hyperv/hv_apic.c           |  16 +++-
> >  arch/x86/hyperv/mmu.c               |  24 +++--
> >  arch/x86/hyperv/nested.c            |   2 +-
> >  arch/x86/include/asm/hyperv-tlfs.h  |   3 +
> >  arch/x86/include/asm/mshyperv.h     | 180 ++++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/hyperv.c               | 101 ++++++++++++++++----
> >  drivers/hv/connection.c             |   3 +-
> >  drivers/hv/hv.c                     |   3 +-
> >  drivers/pci/controller/pci-hyperv.c |   7 +-
> >  9 files changed, 291 insertions(+), 48 deletions(-)
> >
> > --
> > 2.14.1
> >
We will test this as well.

K. Y

-----Original Message-----
From: Isaku Yamahata <isaku.yamahata@gmail.com>
Sent: Thursday, October 25, 2018 10:38 AM
To: Wanpeng Li <kernellwp@gmail.com>
Cc: isaku.yamahata@gmail.com; kvm <kvm@vger.kernel.org>; Paolo Bonzini <pbonzini@redhat.com>; Radim Krcmar <rkrcmar@redhat.com>; vkuznets <vkuznets@redhat.com>; KY Srinivasan <kys@microsoft.com>; Tianyu Lan <Tianyu.Lan@microsoft.com>; yi.y.sun@intel.com; chao.gao@intel.com; isaku.yamahata@intel.com
Subject: Re: [PATCH v2 0/6] x86/kernel/hyper-v: xmm fast hypercall

On Thu, Oct 25, 2018 at 06:02:58AM +0100, Wanpeng Li <kernellwp@gmail.com> wrote:
> On Thu, 25 Oct 2018 at 05:50, Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
> >
> > This patch series implements the xmm fast hypercall for Hyper-V as guest
> > and KVM support as VMM.
> > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
> >
> > benchmark result:
> > At the moment, my test machine has only pcpu=4, so the ipi benchmark
> > doesn't
>
> Did you evaluate the xmm fast hypercall for PV IPIs? In addition, testing
> on a large server would provide more compelling evidence.

Please see the results below, which were taken without the changeset
4bd06060762b so that the xmm fast hypercall is used even with vcpu=4.

Right now I'm looking for a machine with pcpu > 64, but it would take a
while. I wanted to send out the patch early so that someone else can
test/benchmark it.

Thanks,

>
> Regards,
> Wanpeng Li
>
> > make any behaviour change. So for now I measured the time of
> > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
> > The average of 5 runs is as follows.
> > (When a large machine with pcpu > 64 is available, the ipi_benchmark
> > result will be interesting. But not yet.)
> >
> > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> >
> > with patch:    9931 ns
> > without patch: 11111 ns
> >
> >
> > With the patch 4bd06060762b, __send_ipi_mask() now uses the fast
> > hypercall when possible, even in the case of vcpu=4. So I used a kernel
> > before that patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> > The following is the average of 100 runs.
> >
> > ipi_benchmark: average of 100 runs without 4bd06060762b
> >
> > with patch:
> > Dry-run                 0        495181
> > Self-IPI         11352737      21549999
> > Normal IPI      499400218     575433727
> > Broadcast IPI           0    1700692010
> > Broadcast lock          0    1663001374
> >
> > without patch:
> > Dry-run                 0        607657
> > Self-IPI         10915950      21217644
> > Normal IPI      621712609     735015570
> > Broadcast IPI           0    2173803373
> > Broadcast lock          0    2150451543
> >
> > Isaku Yamahata (6):
> >   x86/kernel/hyper-v: xmm fast hypercall as guest
> >   x86/hyperv: use hv_do_hypercall for __send_ipi_mask_ex()
> >   x86/hyperv: use hv_do_hypercall for flush_virtual_address_space_ex
> >   hyperv: use hv_do_hypercall instead of hv_do_fast_hypercall
> >   x86/kvm/hyperv: implement xmm fast hypercall
> >   local: hyperv: test ex hypercall
> >
> >  arch/x86/hyperv/hv_apic.c           |  16 +++-
> >  arch/x86/hyperv/mmu.c               |  24 +++--
> >  arch/x86/hyperv/nested.c            |   2 +-
> >  arch/x86/include/asm/hyperv-tlfs.h  |   3 +
> >  arch/x86/include/asm/mshyperv.h     | 180 ++++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/hyperv.c               | 101 ++++++++++++++++----
> >  drivers/hv/connection.c             |   3 +-
> >  drivers/hv/hv.c                     |   3 +-
> >  drivers/pci/controller/pci-hyperv.c |   7 +-
> >  9 files changed, 291 insertions(+), 48 deletions(-)
> >
> > --
> > 2.14.1
> >

--
Isaku Yamahata <isaku.yamahata@gmail.com>
On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> This patch series implements the xmm fast hypercall for Hyper-V as guest
> and KVM support as VMM.

I think it may be a good idea to do it in separate patchsets. They're
probably targeted at different maintainer trees (x86/hyperv vs kvm) and
the only thing they have in common is a couple of new defines in
hyperv-tlfs.h.

> With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
>
> benchmark result:
> At the moment, my test machine has only pcpu=4, so the ipi benchmark doesn't
> make any behaviour change. So for now I measured the time of
> hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.

This suggests that the guest OS was Linux with your patches 1-4. What
was the hypervisor? KVM with your patch 5 or Hyper-V proper?

> The average of 5 runs is as follows.
> (When a large machine with pcpu > 64 is available, the ipi_benchmark result
> will be interesting. But not yet.)

Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
Has it landed anywhere in the tree? I seem unable to find it...

> hyperv_flush_tlb_others() time by hardinfo -r -f text:
>
> with patch:    9931 ns
> without patch: 11111 ns
>
>
> With the patch 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> when possible, even in the case of vcpu=4. So I used a kernel before that
> patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> The following is the average of 100 runs.
>
> ipi_benchmark: average of 100 runs without 4bd06060762b
>
> with patch:
> Dry-run                 0        495181
> Self-IPI         11352737      21549999
> Normal IPI      499400218     575433727
> Broadcast IPI           0    1700692010
> Broadcast lock          0    1663001374
>
> without patch:
> Dry-run                 0        607657
> Self-IPI         10915950      21217644
> Normal IPI      621712609     735015570

This is about 122 ms difference in IPI sending time, and 160 ms in
total time, i.e. an extra 38 ms for the acknowledge. AFAICS the
acknowledge path should be exactly the same. Any idea where these
additional 38 ms come from?

> Broadcast IPI           0    2173803373

This one is strange, too: the difference should only be on the sending
side, and there it should be basically constant with the number of cpus.
So I would expect the patched vs unpatched delta to be about the same as
for "Normal IPI". Am I missing something?

> Broadcast lock          0    2150451543

Thanks,
Roman.
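For reference, Roman's 122 ms / 160 ms / 38 ms figures follow directly from the "Normal IPI" rows quoted above (values in nanoseconds; the acknowledge delta works out to roughly 37 ms, which Roman rounds to 38 ms):

```latex
\begin{align*}
\Delta_{\mathrm{send}}  &= 621712609 - 499400218 = 122312391\ \mathrm{ns} \approx 122\ \mathrm{ms},\\
\Delta_{\mathrm{total}} &= 735015570 - 575433727 = 159581843\ \mathrm{ns} \approx 160\ \mathrm{ms},\\
\Delta_{\mathrm{ack}}   &= \Delta_{\mathrm{total}} - \Delta_{\mathrm{send}} = 37269452\ \mathrm{ns} \approx 37\ \mathrm{ms}.
\end{align*}
```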
On Mon, Oct 29, 2018 at 06:22:14PM +0000, Roman Kagan <rkagan@virtuozzo.com> wrote:
> On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> > This patch series implements the xmm fast hypercall for Hyper-V as guest
> > and KVM support as VMM.
>
> I think it may be a good idea to do it in separate patchsets. They're
> probably targeted at different maintainer trees (x86/hyperv vs kvm) and
> the only thing they have in common is a couple of new defines in
> hyperv-tlfs.h.
>
> > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
> >
> > benchmark result:
> > At the moment, my test machine has only pcpu=4, so the ipi benchmark doesn't
> > make any behaviour change. So for now I measured the time of
> > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
>
> This suggests that the guest OS was Linux with your patches 1-4. What
> was the hypervisor? KVM with your patch 5 or Hyper-V proper?

For patches 1-4, it's Hyper-V.
For patch 5, it's KVM with Hyper-V hypercall support.
I'll split this patch series to avoid confusion.

>
> > The average of 5 runs is as follows.
> > (When a large machine with pcpu > 64 is available, the ipi_benchmark result
> > will be interesting. But not yet.)
>
> Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
> Has it landed anywhere in the tree? I seem unable to find it...

Yes, that patch. It's not merged yet.

>
> > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> >
> > with patch:    9931 ns
> > without patch: 11111 ns
> >
> >
> > With the patch 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> > when possible, even in the case of vcpu=4. So I used a kernel before that
> > patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> > The following is the average of 100 runs.
> >
> > ipi_benchmark: average of 100 runs without 4bd06060762b
> >
> > with patch:
> > Dry-run                 0        495181
> > Self-IPI         11352737      21549999
> > Normal IPI      499400218     575433727
> > Broadcast IPI           0    1700692010
> > Broadcast lock          0    1663001374
> >
> > without patch:
> > Dry-run                 0        607657
> > Self-IPI         10915950      21217644
> > Normal IPI      621712609     735015570
>
> This is about 122 ms difference in IPI sending time, and 160 ms in
> total time, i.e. an extra 38 ms for the acknowledge. AFAICS the
> acknowledge path should be exactly the same. Any idea where these
> additional 38 ms come from?
>
> > Broadcast IPI           0    2173803373
>
> This one is strange, too: the difference should only be on the sending
> side, and there it should be basically constant with the number of cpus.
> So I would expect the patched vs unpatched delta to be about the same as
> for "Normal IPI". Am I missing something?

The result seems very sensitive to host activity and so is unstable
(pcpu=vcpu=4 in the benchmark).
Since the benchmark should be run on a large machine (vcpu > 64) anyway,
I didn't dig further.

Thanks,

>
> > Broadcast lock          0    2150451543
>
> Thanks,
> Roman.
On Mon, Oct 29, 2018 at 07:43:19PM -0700, Isaku Yamahata wrote:
> On Mon, Oct 29, 2018 at 06:22:14PM +0000,
> Roman Kagan <rkagan@virtuozzo.com> wrote:
> > On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> > > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without a
> > > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > > HVCALL_SEND_IPI_EX (vcpu > 64) can use the xmm fast hypercall.
> > >
> > > benchmark result:
> > > At the moment, my test machine has only pcpu=4, so the ipi benchmark doesn't
> > > make any behaviour change. So for now I measured the time of
> > > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
> >
> > This suggests that the guest OS was Linux with your patches 1-4. What
> > was the hypervisor? KVM with your patch 5 or Hyper-V proper?
>
> For patches 1-4, it's Hyper-V.
> For patch 5, it's KVM with Hyper-V hypercall support.

So you have two result sets? Which one was in your post?

It would also be interesting to run some IPI- or TLB-flush-sensitive
benchmark with a Windows guest.

> > > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> > >
> > > with patch:    9931 ns
> > > without patch: 11111 ns
> > >
> > >
> > > With the patch 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> > > when possible, even in the case of vcpu=4. So I used a kernel before that
> > > patch to measure the effect of the xmm fast hypercall with ipi_benchmark.
> > > The following is the average of 100 runs.
> > >
> > > ipi_benchmark: average of 100 runs without 4bd06060762b
> > >
> > > with patch:
> > > Dry-run                 0        495181
> > > Self-IPI         11352737      21549999
> > > Normal IPI      499400218     575433727
> > > Broadcast IPI           0    1700692010
> > > Broadcast lock          0    1663001374
> > >
> > > without patch:
> > > Dry-run                 0        607657
> > > Self-IPI         10915950      21217644
> > > Normal IPI      621712609     735015570
> >
> > This is about 122 ms difference in IPI sending time, and 160 ms in
> > total time, i.e. an extra 38 ms for the acknowledge. AFAICS the
> > acknowledge path should be exactly the same. Any idea where these
> > additional 38 ms come from?
> >
> > > Broadcast IPI           0    2173803373
> >
> > This one is strange, too: the difference should only be on the sending
> > side, and there it should be basically constant with the number of cpus.
> > So I would expect the patched vs unpatched delta to be about the same as
> > for "Normal IPI". Am I missing something?
>
> The result seems very sensitive to host activity and so is unstable
> (pcpu=vcpu=4 in the benchmark).
> Since the benchmark should be run on a large machine (vcpu > 64) anyway,

IMO the bigger the vcpu set you want to pass in the hypercall, the less
competitive the xmm fast version is.

I think realistically every implementation of xmm fast, both in the guest
and in the hypervisor, will actually use a parameter block in memory (and
so does yours), so the difference between xmm fast and regular hypercalls
is the cost of loading/storing the parameter block to/from xmm registers
(plus preserving the task FPU state) in the guest vs mapping the parameter
block in the hypervisor. The latter is constant (per the spec the parameter
block can't cross page boundaries, so it's always exactly one page); the
former grows with the size of the parameter block.

So I think that if there's no conclusive win on a small machine, there's
no reason to expect one on a big one.

Thanks,
Roman.
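Roman's argument can be made concrete with a toy cost model. This is only an illustrative sketch with made-up constants, not measured numbers and not code from the series: the guest-side XMM path pays a fixed FPU save/restore plus a per-16-byte register load that grows with the input block, while the hypervisor-side memory path pays a roughly constant single-page mapping, since the parameter block may not cross a page boundary.

```c
/* Toy cost model for the xmm-fast vs memory-based trade-off.
 * All constants are arbitrary placeholders in unspecified units. */

/* Guest-side cost of the XMM path: grows with the input block size. */
static unsigned int xmm_fast_guest_cost(unsigned int input_bytes)
{
	const unsigned int fpu_save_restore = 100;	/* assumed fixed overhead */
	const unsigned int per_xmm_load = 5;		/* assumed cost per 16-byte register */

	return fpu_save_restore + per_xmm_load * ((input_bytes + 15) / 16);
}

/* Hypervisor-side cost of the memory path: roughly constant, because the
 * parameter block never crosses a page boundary and so is one page to map. */
static unsigned int memory_hypercall_host_cost(unsigned int input_bytes)
{
	(void)input_bytes;
	return 150;					/* assumed single-page mapping cost */
}
```

Under such a model the XMM path only wins while the input block is small, which is the intuition behind expecting no additional benefit on a larger machine.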