
kvm tools: Add MMIO coalescing support

Message ID 1307130668-5652-1-git-send-email-levinsasha928@gmail.com (mailing list archive)
State New, archived

Commit Message

Sasha Levin June 3, 2011, 7:51 p.m. UTC
Coalescing MMIO allows us to avoid an exit on every MMIO write;
instead, MMIO writes are coalesced into a ring which can be flushed
once an exit is needed for a different reason. An MMIO exit is also
triggered once the ring is full.

Coalesce all MMIO regions registered in the MMIO mapper.
Add a coalescing handler under kvm_cpu.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 tools/kvm/hw/vesa.c             |    2 +-
 tools/kvm/include/kvm/kvm-cpu.h |    2 ++
 tools/kvm/include/kvm/kvm.h     |    4 ++--
 tools/kvm/kvm-cpu.c             |   24 ++++++++++++++++++++++++
 tools/kvm/mmio.c                |   24 ++++++++++++++++++++++--
 5 files changed, 51 insertions(+), 5 deletions(-)

Comments

Ingo Molnar June 4, 2011, 9:38 a.m. UTC | #1
* Sasha Levin <levinsasha928@gmail.com> wrote:

> Coalescing MMIO allows us to avoid an exit on every MMIO write;
> instead, MMIO writes are coalesced into a ring which can be flushed
> once an exit is needed for a different reason. An MMIO exit is also
> triggered once the ring is full.
> 
> Coalesce all MMIO regions registered in the MMIO mapper.
> Add a coalescing handler under kvm_cpu.

Does this have any effect on latency? I.e. does the guest side 
guarantee that the pending queue will be flushed after a group of 
updates have been done?

Thanks,

	Ingo
Sasha Levin June 4, 2011, 10:14 a.m. UTC | #2
On Sat, 2011-06-04 at 11:38 +0200, Ingo Molnar wrote:
> * Sasha Levin <levinsasha928@gmail.com> wrote:
> 
> > Coalescing MMIO allows us to avoid an exit on every MMIO write;
> > instead, MMIO writes are coalesced into a ring which can be flushed
> > once an exit is needed for a different reason. An MMIO exit is also
> > triggered once the ring is full.
> > 
> > Coalesce all MMIO regions registered in the MMIO mapper.
> > Add a coalescing handler under kvm_cpu.
> 
> Does this have any effect on latency? I.e. does the guest side 
> guarantee that the pending queue will be flushed after a group of 
> updates have been done?

There's nothing that detects groups of MMIO writes, but the ring size
is a bit less than PAGE_SIZE (half of it is overhead, the rest is
data) and we'll exit once the ring is full.
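
To put numbers on it, here's a rough sketch: the entry layout below
mirrors what I believe struct kvm_coalesced_mmio in linux/kvm.h looks
like, and the real bound is KVM_COALESCED_MMIO_MAX:

#include <stdio.h>
#include <stdint.h>

/* Assumed mirror of struct kvm_coalesced_mmio: 24 bytes per buffered
 * write, only 8 of which are the written data themselves. */
struct entry {
	uint64_t phys_addr;	/* guest-physical target of the write */
	uint32_t len;		/* number of valid bytes in data[] */
	uint32_t pad;
	uint8_t  data[8];	/* the written bytes */
};

int main(void)
{
	size_t page   = 4096;			/* PAGE_SIZE on x86 */
	size_t header = 2 * sizeof(uint32_t);	/* the first/last ring indices */

	/* (4096 - 8) / 24 = 170 writes fit before the ring is full and
	 * we are forced to exit. */
	printf("%zu entries per ring page\n", (page - header) / sizeof(struct entry));
	return 0;
}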
Ingo Molnar June 4, 2011, 10:17 a.m. UTC | #3
* Sasha Levin <levinsasha928@gmail.com> wrote:

> On Sat, 2011-06-04 at 11:38 +0200, Ingo Molnar wrote:
> > * Sasha Levin <levinsasha928@gmail.com> wrote:
> > 
> > > Coalescing MMIO allows us to avoid an exit on every MMIO write;
> > > instead, MMIO writes are coalesced into a ring which can be flushed
> > > once an exit is needed for a different reason. An MMIO exit is also
> > > triggered once the ring is full.
> > > 
> > > Coalesce all MMIO regions registered in the MMIO mapper.
> > > Add a coalescing handler under kvm_cpu.
> > 
> > Does this have any effect on latency? I.e. does the guest side 
> > guarantee that the pending queue will be flushed after a group of 
> > updates have been done?
> 
> There's nothing that detects groups of MMIO writes, but the ring size
> is a bit less than PAGE_SIZE (half of it is overhead, the rest is
> data) and we'll exit once the ring is full.

But if the page is only partially filled and the guest submits no 
further MMIO for an indefinite time (say it runs a lot of user-space 
code), then the MMIO remains pending in the partially filled buffer?

If that's how it works then i *really* don't like this: it looks 
like a seriously mis-designed batching feature which might have 
improved a few server benchmarks but which will introduce random, 
hard-to-debug delays all over the place!

Thanks,

	Ingo
Ingo Molnar June 4, 2011, 2:46 p.m. UTC | #4
* Alexander Graf <agraf@suse.de> wrote:

> So the simple rule is: don't register a coalesced MMIO region for a 
> region where latency matters. [...]

So my first suspicion is confirmed.

A quick look at Qemu sources shows that lots of drivers are using 
coalesced_mmio without being aware of the latency effects, and only 
one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers 
like hw/e1000.c sure look latency-critical to me.
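
The one careful pattern, for reference, is to drain the buffer before
any access that must observe earlier writes. A minimal sketch - the
device and register names here are made up, only
qemu_flush_coalesced_mmio_buffer() is the real Qemu call:

static uint32_t mydev_mmio_readl(void *opaque, target_phys_addr_t addr)
{
	MyDevState *s = opaque;		/* hypothetical device state */

	/* Order this read after any writes still sitting in the
	 * coalesced ring. */
	qemu_flush_coalesced_mmio_buffer();

	return s->regs[addr >> 2];	/* hypothetical register file */
}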

So i maintain my initial opinion: this is a pretty dangerous 
'optimization' that should be used with extreme care; i can tell you 
with pretty good authority that latency problems are much easier to 
introduce than to find and remove ...

Thanks,

	Ingo
Ingo Molnar June 4, 2011, 4:34 p.m. UTC | #5
* Alexander Graf <agraf@suse.de> wrote:

> 
> On 04.06.2011, at 16:46, Ingo Molnar wrote:
> 
> > 
> > * Alexander Graf <agraf@suse.de> wrote:
> > 
> >> So the simple rule is: don't register a coalesced MMIO region for a 
> >> region where latency matters. [...]
> > 
> > So my first suspicion is confirmed.
> > 
> > A quick look at Qemu sources shows that lots of drivers are using 
> > coalesced_mmio without being aware of the latency effects, and only 
> > one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers 
> > like hw/e1000.c sure look latency-critical to me.
> 
> e1000 maps its NVRAM on coalesced mmio - which is completely ok.

Ok!

> > So i maintain my initial opinion: this is a pretty dangerous 
> > 'optimization' that should be used with extreme care; i can tell 
> > you with pretty good authority that latency problems are much 
> > easier to introduce than to find and remove ...
> 
> Yup, which is why it's very sparsely used in qemu :). Basically, 
> it's only e1000 and vga, both of which are heavily used and tested 
> drivers.

Ok, so this change in:

 commit 73389b5ea017288a949ae27536c8cfd298d3e317
 Author: Sasha Levin <levinsasha928@gmail.com>
 Date:   Fri Jun 3 22:51:08 2011 +0300

    kvm tools: Add MMIO coalescing support

@@ -67,6 +70,16 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
                .kvm_mmio_callback_fn = kvm_mmio_callback_fn,
        };
 
+       zone = (struct kvm_coalesced_mmio_zone) {
+               .addr   = phys_addr,
+               .size   = phys_addr_len,
+       };
+       ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
+       if (ret < 0) {
+               free(mmio);
+               return false;
+       }

Seems completely wrong, because it indiscriminately registers *all* 
mmio regions as coalesced ones.

Thanks,

	Ingo
Sasha Levin June 4, 2011, 4:50 p.m. UTC | #6
On Sat, 2011-06-04 at 18:34 +0200, Ingo Molnar wrote:
> * Alexander Graf <agraf@suse.de> wrote:
> 
> > 
> > On 04.06.2011, at 16:46, Ingo Molnar wrote:
> > 
> > > 
> > > * Alexander Graf <agraf@suse.de> wrote:
> > > 
> > >> So the simple rule is: don't register a coalesced MMIO region for a 
> > >> region where latency matters. [...]
> > > 
> > > So my first suspicion is confirmed.
> > > 
> > > A quick look at Qemu sources shows that lots of drivers are using 
> > > coalesced_mmio without being aware of the latency effects, and only 
> > > one seems to make use of qemu_flush_coalesced_mmio_buffer(). Drivers 
> > > like hw/e1000.c sure look latency-critical to me.
> > 
> > e1000 maps its NVRAM on coalesced mmio - which is completely ok.
> 
> Ok!
> 
> > > So i maintain my initial opinion: this is a pretty dangerous 
> > > 'optimization' that should be used with extreme care; i can tell 
> > > you with pretty good authority that latency problems are much 
> > > easier to introduce than to find and remove ...
> > 
> > Yup, which is why it's very sparsely used in qemu :). Basically, 
> > it's only e1000 and vga, both of which are heavily used and tested 
> > drivers.
> 
> Ok, so this change in:
> 
>  commit 73389b5ea017288a949ae27536c8cfd298d3e317
>  Author: Sasha Levin <levinsasha928@gmail.com>
>  Date:   Fri Jun 3 22:51:08 2011 +0300
> 
>     kvm tools: Add MMIO coalescing support
> 
> @@ -67,6 +70,16 @@ bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
>                 .kvm_mmio_callback_fn = kvm_mmio_callback_fn,
>         };
>  
> +       zone = (struct kvm_coalesced_mmio_zone) {
> +               .addr   = phys_addr,
> +               .size   = phys_addr_len,
> +       };
> +       ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
> +       if (ret < 0) {
> +               free(mmio);
> +               return false;
> +       }
> 
> Seems completely wrong, because it indiscriminately registers *all* 
> mmio regions as coalesced ones.

Yes. I'll add a flag instead of making all of them coalesced.
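
Roughly along these lines - an untested sketch against the patch
below, where the new 'coalesce' parameter is hypothetical until the
follow-up is posted:

@@ bool kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write))
-	ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
-	if (ret < 0) {
-		free(mmio);
-		return false;
-	}
+	/* Only opt explicitly marked, latency-insensitive regions into
+	 * coalescing; everything else keeps the exit-per-write path. */
+	if (coalesce) {
+		ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
+		if (ret < 0) {
+			free(mmio);
+			return false;
+		}
+	}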

Patch

diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c
index b99f2de..a12c601 100644
--- a/tools/kvm/hw/vesa.c
+++ b/tools/kvm/hw/vesa.c
@@ -77,7 +77,7 @@  void vesa__init(struct kvm *kvm)
 	vesa_pci_device.bar[0]		= vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO;
 	pci__register(&vesa_pci_device, dev);
 
-	kvm__register_mmio(VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback);
+	kvm__register_mmio(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, &vesa_mmio_callback);
 
 	pthread_create(&thread, NULL, vesa__dovnc, kvm);
 }
diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 4d99246..1eb4a52 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -24,6 +24,8 @@  struct kvm_cpu {
 
 	u8			is_running;
 	u8			paused;
+
+	struct kvm_coalesced_mmio_ring	*ring;
 };
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index d22a849..55551de 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -49,8 +49,8 @@  void kvm__stop_timer(struct kvm *kvm);
 void kvm__irq_line(struct kvm *kvm, int irq, int level);
 bool kvm__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count);
 bool kvm__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write);
-bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write));
-bool kvm__deregister_mmio(u64 phys_addr);
+bool kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write));
+bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__pause(void);
 void kvm__continue(void);
 void kvm__notify_paused(void);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index be0528b..1fb1c74 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -14,6 +14,8 @@ 
 #include <errno.h>
 #include <stdio.h>
 
+#define PAGE_SIZE (sysconf(_SC_PAGE_SIZE))
+
 extern __thread struct kvm_cpu *current_kvm_cpu;
 
 static inline bool is_in_protected_mode(struct kvm_cpu *vcpu)
@@ -70,6 +72,7 @@  struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 {
 	struct kvm_cpu *vcpu;
 	int mmap_size;
+	int coalesced_offset;
 
 	vcpu		= kvm_cpu__new(kvm);
 	if (!vcpu)
@@ -89,6 +92,10 @@  struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id)
 	if (vcpu->kvm_run == MAP_FAILED)
 		die("unable to mmap vcpu fd");
 
+	coalesced_offset = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO);
+	if (coalesced_offset)
+		vcpu->ring = (void *)vcpu->kvm_run + (coalesced_offset * PAGE_SIZE);
+
 	vcpu->is_running = true;
 
 	return vcpu;
@@ -395,6 +402,22 @@  static void kvm_cpu_signal_handler(int signum)
 	}
 }
 
+static void kvm_cpu__handle_coalesced_mmio(struct kvm_cpu *cpu)
+{
+	if (cpu->ring) {
+		while (cpu->ring->first != cpu->ring->last) {
+			struct kvm_coalesced_mmio *m;
+			m = &cpu->ring->coalesced_mmio[cpu->ring->first];
+			kvm__emulate_mmio(cpu->kvm,
+					m->phys_addr,
+					m->data,
+					m->len,
+					1);
+			cpu->ring->first = (cpu->ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+		}
+	}
+}
+
 int kvm_cpu__start(struct kvm_cpu *cpu)
 {
 	sigset_t sigset;
@@ -462,6 +485,7 @@  int kvm_cpu__start(struct kvm_cpu *cpu)
 		default:
 			goto panic_kvm;
 		}
+		kvm_cpu__handle_coalesced_mmio(cpu);
 	}
 
 exit_kvm:
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index acd091e..64bef37 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -5,6 +5,8 @@ 
 #include <stdio.h>
 #include <stdlib.h>
 
+#include <sys/ioctl.h>
+#include <linux/kvm.h>
 #include <linux/types.h>
 #include <linux/rbtree.h>
 
@@ -53,9 +55,10 @@  static const char *to_direction(u8 is_write)
 	return "read";
 }
 
-bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write))
+bool kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callback_fn)(u64 addr, u8 *data, u32 len, u8 is_write))
 {
 	struct mmio_mapping *mmio;
+	struct kvm_coalesced_mmio_zone zone;
 	int ret;
 
 	mmio = malloc(sizeof(*mmio));
@@ -67,6 +70,16 @@  bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
 		.kvm_mmio_callback_fn = kvm_mmio_callback_fn,
 	};
 
+	zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= phys_addr,
+		.size	= phys_addr_len,
+	};
+	ret = ioctl(kvm->vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
+	if (ret < 0) {
+		free(mmio);
+		return false;
+	}
+
 	br_write_lock();
 	ret = mmio_insert(&mmio_tree, mmio);
 	br_write_unlock();
@@ -74,9 +87,10 @@  bool kvm__register_mmio(u64 phys_addr, u64 phys_addr_len, void (*kvm_mmio_callba
 	return ret;
 }
 
-bool kvm__deregister_mmio(u64 phys_addr)
+bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 {
 	struct mmio_mapping *mmio;
+	struct kvm_coalesced_mmio_zone zone;
 
 	br_write_lock();
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
@@ -85,6 +99,12 @@  bool kvm__deregister_mmio(u64 phys_addr)
 		return false;
 	}
 
+	zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= phys_addr,
+		.size	= 1,
+	};
+	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+
 	rb_int_erase(&mmio_tree, &mmio->node);
 	br_write_unlock();