| Message ID | 1307130668-5652-1-git-send-email-levinsasha928@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
* Sasha Levin <levinsasha928@gmail.com> wrote:

> Coalescing MMIO allows us to avoid an exit every time we have a
> MMIO write, instead - MMIO writes are coalesced in a ring which
> can be flushed once an exit for a different reason is needed.
> A MMIO exit is also trigged once the ring is full.
>
> Coalesce all MMIO regions registered in the MMIO mapper.
> Add a coalescing handler under kvm_cpu.

Does this have any effect on latency? I.e. does the guest side
guarantee that the pending queue will be flushed after a group of
updates have been done?

Thanks,

	Ingo
On Sat, 2011-06-04 at 11:38 +0200, Ingo Molnar wrote:
> * Sasha Levin <levinsasha928@gmail.com> wrote:
>
> > Coalescing MMIO allows us to avoid an exit every time we have a
> > MMIO write, instead - MMIO writes are coalesced in a ring which
> > can be flushed once an exit for a different reason is needed.
> > A MMIO exit is also trigged once the ring is full.
> >
> > Coalesce all MMIO regions registered in the MMIO mapper.
> > Add a coalescing handler under kvm_cpu.
>
> Does this have any effect on latency? I.e. does the guest side
> guarantee that the pending queue will be flushed after a group of
> updates have been done?

There's nothing that detects groups of MMIO writes, but the ring size is
a bit less than PAGE_SIZE (half of it is overhead - rest is data) and
we'll exit once the ring is full.
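For reference, the ring in question is the kvm_coalesced_mmio_ring page that KVM shares with userspace. A rough sketch of the ABI as declared in <linux/kvm.h> follows (layout reproduced from memory, so double-check the header; __u* types come from <linux/types.h> and PAGE_SIZE must be defined by the user, as the patch below does):

/* Each guest MMIO write to a registered zone becomes one ring entry. */
struct kvm_coalesced_mmio {
	__u64 phys_addr;	/* guest-physical address of the write */
	__u32 len;		/* number of valid bytes in data[] */
	__u32 pad;
	__u8  data[8];		/* the bytes that were written */
};

struct kvm_coalesced_mmio_ring {
	__u32 first, last;	/* consumer / producer indices */
	struct kvm_coalesced_mmio coalesced_mmio[0];
};

/* Entries that fit in the one shared page after the ring header -
 * this is why the usable ring size is "a bit less than PAGE_SIZE". */
#define KVM_COALESCED_MMIO_MAX \
	((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \
	 sizeof(struct kvm_coalesced_mmio))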
* Sasha Levin <levinsasha928@gmail.com> wrote:

> On Sat, 2011-06-04 at 11:38 +0200, Ingo Molnar wrote:
> > * Sasha Levin <levinsasha928@gmail.com> wrote:
> >
> > > Coalescing MMIO allows us to avoid an exit every time we have a
> > > MMIO write, instead - MMIO writes are coalesced in a ring which
> > > can be flushed once an exit for a different reason is needed.
> > > A MMIO exit is also trigged once the ring is full.
> > >
> > > Coalesce all MMIO regions registered in the MMIO mapper.
> > > Add a coalescing handler under kvm_cpu.
> >
> > Does this have any effect on latency? I.e. does the guest side
> > guarantee that the pending queue will be flushed after a group of
> > updates have been done?
>
> Theres nothing that detects groups of MMIO writes, but the ring size is
> a bit less than PAGE_SIZE (half of it is overhead - rest is data) and
> we'll exit once the ring is full.

But if the page is only filled partially and if mmio is not submitted
by the guest indefinitely (say it runs a lot of user-space code) then
the mmio remains pending in the partial-page buffer?

If that's how it works then i *really* don't like this, this looks
like a seriously mis-designed batching feature which might have
improved a few server benchmarks but which will introduce random,
hard to debug delays all around the place!

Thanks,

	Ingo
* Alexander Graf <agraf@suse.de> wrote:

> So the simple rule is: don't register a coalesced MMIO region for a
> region where latency matters. [...]

So my first suspicion is confirmed.

A quick look at Qemu sources shows that lots of drivers are using
coalesced_mmio without being aware of the latency effects and only
one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers
like hw/e1000.c sure look latency critical to me.

So i maintain my initial opinion: this is a pretty dangerous
'optimization' that should be used with extreme care: i can tell
it you with pretty good authority that latency problems are much
more easy to introduce than to find and remove ...

Thanks,

	Ingo
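To make the rule above concrete: a VMM would typically register the coalesced zone only for write-heavy windows where delayed delivery is harmless (framebuffers, NVRAM) and leave doorbell or control registers alone. A minimal sketch against the raw KVM ioctls; the vm_fd parameter and the address/size constants are made-up placeholders, not taken from any real device:

#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/kvm.h>

/* Hypothetical device with two MMIO windows: a large framebuffer where
 * write latency is irrelevant, and a small control window whose writes
 * must be handled immediately. Only the framebuffer is coalesced. */
static int register_device_mmio(int vm_fd)
{
	struct kvm_coalesced_mmio_zone fb_zone = {
		.addr = 0xd0000000,	/* placeholder framebuffer base */
		.size = 0x00800000,	/* placeholder framebuffer size */
	};

	/* Framebuffer writes may sit in the ring until the next exit. */
	if (ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &fb_zone) < 0)
		return -1;

	/* The control window (say, right above the framebuffer) is
	 * deliberately NOT registered, so every write to it still causes
	 * an immediate KVM_EXIT_MMIO and is emulated synchronously. */
	return 0;
}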
* Alexander Graf <agraf@suse.de> wrote:

> On 04.06.2011, at 16:46, Ingo Molnar wrote:
>
> > * Alexander Graf <agraf@suse.de> wrote:
> >
> >> So the simple rule is: don't register a coalesced MMIO region for a
> >> region where latency matters. [...]
> >
> > So my first suspicion is confirmed.
> >
> > A quick look at Qemu sources shows that lots of drivers are using
> > coalesced_mmio without being aware of the latency effects and only
> > one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers
> > like hw/e1000.c sure look latency critical to me.
>
> e1000 maps its NVRAM on coalesced mmio - which is completely ok.

Ok!

> > So i maintain my initial opinion: this is a pretty dangerous
> > 'optimization' that should be used with extreme care: i can tell
> > it you with pretty good authority that latency problems are much
> > more easy to introduce than to find and remove ...
>
> Yup, which is why it's very sparsely used in qemu :). Basically,
> it's only e1000 and vga, both of which are heavily used and tested
> drivers.

Ok, so this change in:

 commit 73389b5ea017288a949ae27536c8cfd298d3e317
 Author: Sasha Levin <levinsasha928@gmail.com>
 Date:   Fri Jun 3 22:51:08 2011 +0300

     kvm tools: Add MMIO coalescing support

@@ -67,6 +70,16 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
 		.kvm_mmio_callback_fn	= kvm_mmio_callback_fn,
 	};

+	zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= phys_addr,
+		.size	= phys_addr_len,
+	};
+	ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
+	if (ret < 0) {
+		free(mmio);
+		return false;
+	}

Seems completely wrong, because it indiscriminately registers *all*
mmio regions as coalesced ones.

Thanks,

	Ingo
On Sat, 2011-06-04 at 18:34 +0200, Ingo Molnar wrote:
> * Alexander Graf <agraf@suse.de> wrote:
>
> > On 04.06.2011, at 16:46, Ingo Molnar wrote:
> >
> > > * Alexander Graf <agraf@suse.de> wrote:
> > >
> > >> So the simple rule is: don't register a coalesced MMIO region for a
> > >> region where latency matters. [...]
> > >
> > > So my first suspicion is confirmed.
> > >
> > > A quick look at Qemu sources shows that lots of drivers are using
> > > coalesced_mmio without being aware of the latency effects and only
> > > one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers
> > > like hw/e1000.c sure look latency critical to me.
> >
> > e1000 maps its NVRAM on coalesced mmio - which is completely ok.
>
> Ok!
>
> > > So i maintain my initial opinion: this is a pretty dangerous
> > > 'optimization' that should be used with extreme care: i can tell
> > > it you with pretty good authority that latency problems are much
> > > more easy to introduce than to find and remove ...
> >
> > Yup, which is why it's very sparsely used in qemu :). Basically,
> > it's only e1000 and vga, both of which are heavily used and tested
> > drivers.
>
> Ok, so this change in:
>
>  commit 73389b5ea017288a949ae27536c8cfd298d3e317
>  Author: Sasha Levin <levinsasha928@gmail.com>
>  Date:   Fri Jun 3 22:51:08 2011 +0300
>
>      kvm tools: Add MMIO coalescing support
>
> @@ -67,6 +70,16 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
>  		.kvm_mmio_callback_fn	= kvm_mmio_callback_fn,
>  	};
>
> +	zone = (struct kvm_coalesced_mmio_zone) {
> +		.addr	= phys_addr,
> +		.size	= phys_addr_len,
> +	};
> +	ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
> +	if (ret < 0) {
> +		free(mmio);
> +		return false;
> +	}
>
> Seems completely wrong, because it indiscriminately registers *all*
> mmio regions as coalesced ones.

Yes. I'll add a flag instead of making all of them coalesced.
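A sketch of what such an opt-in flag could look like; the helper name, the bool parameter and the surrounding logic are hypothetical illustrations, not the follow-up patch itself:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/kvm.h>

/* Hypothetical helper: only issue KVM_REGISTER_COALESCED_MMIO when the
 * caller explicitly opted in, so latency-sensitive regions keep the
 * default exit-per-write behaviour. 'vm_fd' stands in for kvm->vm_fd. */
static bool maybe_coalesce_region(int vm_fd, __u64 phys_addr, __u64 len,
				  bool coalesce)
{
	struct kvm_coalesced_mmio_zone zone = {
		.addr = phys_addr,
		.size = len,
	};

	if (!coalesce)
		return true;	/* ordinary MMIO: exit on every write */

	return ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone) == 0;
}

kvm__register_mmio() would then grow a matching flag, with the VESA framebuffer passing true and everything else staying non-coalesced by default.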
diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c
index b99f2de..a12c601 100644
--- a/tools/kvm/hw/vesa.c
+++ b/tools/kvm/hw/vesa.c
@@ -77,7 +77,7 @@ void vesa__init(struct kvm *kvm)
 	vesa_pci_device.bar[0]	= vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO;
 	pci__register(&vesa_pci_device, dev);
 
-	kvm__register_mmio(VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback);
+	kvm__register_mmio(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback);
 
 	pthread_create(&thread, NULL, vesa__dovnc, kvm);
 }
diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 4d99246..1eb4a52 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -24,6 +24,8 @@ struct kvm_cpu {
 
 	u8			is_running;
 	u8			paused;
+
+	struct kvm_coalesced_mmio_ring	*ring;
 };
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index d22a849..55551de 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -49,8 +49,8 @@ void kvm__stop_timer(struct kvm *kvm);
 void kvm__irq_line(struct kvm *kvm, int irq, int level);
 bool kvm__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count);
 bool kvm__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write);
-bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write));
-bool kvm__deregister_mmio(u64 phys_addr);
+bool kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write));
+bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__pause(void);
 void kvm__continue(void);
 void kvm__notify_paused(void);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index be0528b..1fb1c74 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -14,6 +14,8 @@
 #include <errno.h>
 #include <stdio.h>
 
+#define PAGE_SIZE (sysconf(_SC_PAGE_SIZE))
+
 extern __thread struct kvm_cpu *current_kvm_cpu;
 
 static inline bool is_in_protected_mode(struct kvm_cpu *vcpu)
@@ -70,6 +72,7 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 {
 	struct kvm_cpu *vcpu;
 	int mmap_size;
+	int coalesced_offset;
 
 	vcpu = kvm_cpu__new(kvm);
 	if (!vcpu)
@@ -89,6 +92,10 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 	if (vcpu->kvm_run == MAP_FAILED)
 		die("unable to mmap vcpu fd");
 
+	coalesced_offset = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO);
+	if (coalesced_offset)
+		vcpu->ring = (void *)vcpu->kvm_run + (coalesced_offset * PAGE_SIZE);
+
 	vcpu->is_running = true;
 
 	return vcpu;
@@ -395,6 +402,22 @@ static void kvm_cpu_signal_handler(int signum)
 	}
 }
 
+static void kvm_cpu__handle_coalesced_mmio(struct kvm_cpu *cpu)
+{
+	if (cpu->ring) {
+		while (cpu->ring->first != cpu->ring->last) {
+			struct kvm_coalesced_mmio *m;
+			m = &cpu->ring->coalesced_mmio[cpu->ring->first];
+			kvm__emulate_mmio(cpu->kvm,
+					  m->phys_addr,
+					  m->data,
+					  m->len,
+					  1);
+			cpu->ring->first = (cpu->ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+		}
+	}
+}
+
 int kvm_cpu__start(struct kvm_cpu *cpu)
 {
 	sigset_t sigset;
@@ -462,6 +485,7 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 		default:
 			goto panic_kvm;
 		}
+		kvm_cpu__handle_coalesced_mmio(cpu);
 	}
 
 exit_kvm:
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index acd091e..64bef37 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -5,6 +5,8 @@
 #include <stdio.h>
 #include <stdlib.h>
 
+#include <sys/ioctl.h>
+#include <linux/kvm.h>
 #include <linux/types.h>
 #include <linux/rbtree.h>
 
@@ -53,9 +55,10 @@ static const char *to_direction(u8 is_write)
 	return "read";
 }
 
-bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write))
+bool kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write))
 {
 	struct mmio_mapping *mmio;
+	struct kvm_coalesced_mmio_zone zone;
 	int ret;
 
 	mmio = malloc(sizeof(*mmio));
@@ -67,6 +70,16 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
 		.kvm_mmio_callback_fn	= kvm_mmio_callback_fn,
 	};
 
+	zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= phys_addr,
+		.size	= phys_addr_len,
+	};
+	ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
+	if (ret < 0) {
+		free(mmio);
+		return false;
+	}
+
 	br_write_lock();
 	ret = mmio_insert(&mmio_tree, mmio);
 	br_write_unlock();
@@ -74,9 +87,10 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
 	return ret;
 }
 
-bool kvm__deregister_mmio(u64 phys_addr)
+bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 {
 	struct mmio_mapping *mmio;
+	struct kvm_coalesced_mmio_zone zone;
 
 	br_write_lock();
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
@@ -85,6 +99,12 @@ bool kvm__deregister_mmio(u64 phys_addr)
 		return false;
 	}
 
+	zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= phys_addr,
+		.size	= 1,
+	};
+	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+
 	rb_int_erase(&mmio_tree, &mmio->node);
 	br_write_unlock();
Coalescing MMIO allows us to avoid an exit every time we have an
MMIO write; instead, MMIO writes are coalesced in a ring which can
be flushed once an exit for a different reason is needed. An MMIO
exit is also triggered once the ring is full.

Coalesce all MMIO regions registered in the MMIO mapper.
Add a coalescing handler under kvm_cpu.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 tools/kvm/hw/vesa.c             |    2 +-
 tools/kvm/include/kvm/kvm-cpu.h |    2 ++
 tools/kvm/include/kvm/kvm.h     |    4 ++--
 tools/kvm/kvm-cpu.c             |   24 ++++++++++++++++++++++++
 tools/kvm/mmio.c                |   24 ++++++++++++++++++++++--
 5 files changed, 51 insertions(+), 5 deletions(-)
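For completeness, the ring-full fallback the commit message relies on works roughly like this on the producer side; this is an illustrative re-statement, not the actual KVM kernel source (the real code also publishes 'last' behind a write barrier):

#include <string.h>
#include <unistd.h>
#include <linux/types.h>
#include <linux/kvm.h>

#define PAGE_SIZE (sysconf(_SC_PAGE_SIZE))	/* same trick as the patch above */

/* Append a guest write to the shared ring if there is room; when the
 * ring is full (or the access is too large to coalesce) the write falls
 * back to a normal KVM_EXIT_MMIO, which is also when userspace drains
 * whatever has accumulated. */
static int coalesced_write(struct kvm_coalesced_mmio_ring *ring,
			   __u64 addr, const __u8 *data, __u32 len)
{
	__u32 next = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
	struct kvm_coalesced_mmio *e = &ring->coalesced_mmio[ring->last];

	if (next == ring->first || len > sizeof(e->data))
		return -1;	/* take a regular MMIO exit instead */

	e->phys_addr = addr;
	e->len = len;
	memcpy(e->data, data, len);
	ring->last = next;	/* entry becomes visible to the consumer */
	return 0;
}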