mbox series

[v2,0/6] KVM: Improve MMIO Coalescing API

Message ID 20240718193543.624039-1-ilstam@amazon.com (mailing list archive)
Headers show
Series KVM: Improve MMIO Coalescing API | expand

Message

Ilias Stamatis July 18, 2024, 7:35 p.m. UTC
The current MMIO coalescing design has a few drawbacks which limit its
usefulness. Currently all coalesced MMIO zones use the same ring buffer.
That means that upon a userspace exit we have to handle potentially
unrelated MMIO writes synchronously. And a VM-wide lock needs to be
taken in the kernel when an MMIO exit occurs.

Additionally, there is no direct way for userspace to be notified about
coalesced MMIO writes. If the next MMIO exit to userspace is when the
ring buffer has filled then a substantial (and unbounded) amount of time
may have passed since the first coalesced MMIO.

This series adds new ioctls to KVM that allow for greater control by
making it possible to associate different MMIO zones with different ring
buffers. It also allows userspace to use poll() to check for coalesced
writes in order to avoid userspace exits in vCPU threads (see patch 3
for why this can be useful).

The idea of improving the API in this way originally came from Paul
Durrant (pdurrant@amazon.co.uk) but the implementation is mine.

The first patch in the series is a bug in the existing code that I
discovered while writing a selftest and can be merged independently.

Changelog:

v2: 
  - Fixed a bug in poll() in patch 3 where a NULL check was omitted.
  - Added an explicit include to allow the build to succeed on powerpc.
  - Rebased on top kvm/queue resolving a few conflicts.
  - Tiny changes on the last 2 patches.

v1: https://lore.kernel.org/kvm/20240710085259.2125131-1-ilstam@amazon.com/T/

Ilias Stamatis (6):
  KVM: Fix coalesced_mmio_has_room()
  KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl
  KVM: Support poll() on coalesced mmio buffer fds
  KVM: Add KVM_(UN)REGISTER_COALESCED_MMIO2 ioctls
  KVM: Documentation: Document v2 of coalesced MMIO API
  KVM: selftests: Add coalesced_mmio_test

 Documentation/virt/kvm/api.rst                |  91 +++++
 include/linux/kvm_host.h                      |   1 +
 include/uapi/linux/kvm.h                      |  18 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/coalesced_mmio_test.c       | 313 ++++++++++++++++++
 virt/kvm/coalesced_mmio.c                     | 205 +++++++++++-
 virt/kvm/coalesced_mmio.h                     |  17 +-
 virt/kvm/kvm_main.c                           |  40 ++-
 8 files changed, 665 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/coalesced_mmio_test.c

Comments

Sean Christopherson Aug. 17, 2024, 12:40 a.m. UTC | #1
On Thu, Jul 18, 2024, Ilias Stamatis wrote:
> The current MMIO coalescing design has a few drawbacks which limit its
> usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> That means that upon a userspace exit we have to handle potentially
> unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> taken in the kernel when an MMIO exit occurs.
> 
> Additionally, there is no direct way for userspace to be notified about
> coalesced MMIO writes. If the next MMIO exit to userspace is when the
> ring buffer has filled then a substantial (and unbounded) amount of time
> may have passed since the first coalesced MMIO.
> 
> This series adds new ioctls to KVM that allow for greater control by
> making it possible to associate different MMIO zones with different ring
> buffers. It also allows userspace to use poll() to check for coalesced
> writes in order to avoid userspace exits in vCPU threads (see patch 3
> for why this can be useful).
> 
> The idea of improving the API in this way originally came from Paul
> Durrant (pdurrant@amazon.co.uk) but the implementation is mine.
> 
> The first patch in the series is a bug in the existing code that I
> discovered while writing a selftest and can be merged independently.

Ya, I'll grab it, maybe for 6.11?  Doesn't seem urgent though.

Anyways, I gave this a *very* cursory review.  I'd go ahead and send v3 though,
you're going to need Paolo's eyeballs on this, and he's offline until September.
Sean Christopherson Aug. 23, 2024, 11:47 p.m. UTC | #2
On Thu, 18 Jul 2024 20:35:37 +0100, Ilias Stamatis wrote:
> The current MMIO coalescing design has a few drawbacks which limit its
> usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> That means that upon a userspace exit we have to handle potentially
> unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> taken in the kernel when an MMIO exit occurs.
> 
> Additionally, there is no direct way for userspace to be notified about
> coalesced MMIO writes. If the next MMIO exit to userspace is when the
> ring buffer has filled then a substantial (and unbounded) amount of time
> may have passed since the first coalesced MMIO.
> 
> [...]

Applied patch 1 to kvm-x86 generic.  I deliberately didn't put this in fixes or
Cc: it for stable, as the bug has been around for sooo long without anyone
noticing that there's basically zero chance that the bug is actively causing
issues.

I also reworked and expanded the changelog significantly to make it more clear
why things break, what the fallout is (KVM can _sometimes_ use the full ring),
and to call out that the lockless scheme that the buggy commit was preparing
for never seems to have landed.

Please take a gander at the changelog and holler if I messed anything up.

[1/6] KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit
      https://github.com/kvm-x86/linux/commit/92f6d4130497

--
https://github.com/kvm-x86/linux/tree/next
Stamatis, Ilias Aug. 27, 2024, 10:35 a.m. UTC | #3
On Fri, 2024-08-23 at 16:47 -0700, Sean Christopherson wrote:
> On Thu, 18 Jul 2024 20:35:37 +0100, Ilias Stamatis wrote:
> > The current MMIO coalescing design has a few drawbacks which limit its
> > usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> > That means that upon a userspace exit we have to handle potentially
> > unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> > taken in the kernel when an MMIO exit occurs.
> > 
> > Additionally, there is no direct way for userspace to be notified about
> > coalesced MMIO writes. If the next MMIO exit to userspace is when the
> > ring buffer has filled then a substantial (and unbounded) amount of time
> > may have passed since the first coalesced MMIO.
> > 
> > [...]
> 
> Applied patch 1 to kvm-x86 generic.  I deliberately didn't put this in fixes or
> Cc: it for stable, as the bug has been around for sooo long without anyone
> noticing that there's basically zero chance that the bug is actively causing
> issues.
> 
> I also reworked and expanded the changelog significantly to make it more clear
> why things break, what the fallout is (KVM can _sometimes_ use the full ring),
> and to call out that the lockless scheme that the buggy commit was preparing
> for never seems to have landed.
> 
> Please take a gander at the changelog and holler if I messed anything up.
> 
> [1/6] KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit
>       https://github.com/kvm-x86/linux/commit/92f6d4130497
> 
> --
> https://github.com/kvm-x86/linux/tree/next

It looks good, thanks.

> Doh, I applied v2 instead of v3.  Though unless mine eyes deceive me,
> they're the same.

They are the same indeed.

I'm not entirely sure what's the recommended thing to do when updating
only some of the patches of a series, whether to send new versions for
the revised patches only or resend everything. I opted to do the latter
and then I called out the changes between v2 and v3 inside the patches
that were actually modified.

Thanks,
Ilias
Sean Christopherson Aug. 27, 2024, 1:45 p.m. UTC | #4
On Tue, Aug 27, 2024, Ilias Stamatis wrote:
> On Fri, 2024-08-23 at 16:47 -0700, Sean Christopherson wrote:
> > Doh, I applied v2 instead of v3.  Though unless mine eyes deceive me,
> > they're the same.
> 
> They are the same indeed.
> 
> I'm not entirely sure what's the recommended thing to do when updating
> only some of the patches of a series, whether to send new versions for
> the revised patches only or resend everything. I opted to do the latter

Resend everything, as you did.  I simply messed up and grabbed the wrong version.

> and then I called out the changes between v2 and v3 inside the patches
> that were actually modified.

FWIW, I strongly prefer the delta be captured in the cover letter, though others
may obviously have different preferences.

https://lore.kernel.org/all/ZQnAN9TC6b8mSJ%2Ft@google.com