
[v2,1/2] KVM: MMIO: Lock coalesced device when checking for available entry

Message ID 1310729869-1451-1-git-send-email-levinsasha928@gmail.com (mailing list archive)
State New, archived

Commit Message

Sasha Levin July 15, 2011, 11:37 a.m. UTC
Move the check for available ring entries to within the spinlock.
This allows working with a larger number of VCPUs and reduces premature
exits when a large number of VCPUs is in use.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 virt/kvm/coalesced_mmio.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Comments

Avi Kivity July 18, 2011, 8:11 a.m. UTC | #1
On 07/15/2011 02:37 PM, Sasha Levin wrote:
> Move the check whether there are available entries to within the spinlock.
> This allows working with larger amount of VCPUs and reduces premature
> exits when using a large number of VCPUs.
>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
>   virt/kvm/coalesced_mmio.c |    9 ++++++---
>   1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index fc84875..34188db 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -37,7 +37,7 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
>   	 */
>   	ring = dev->kvm->coalesced_mmio_ring;
>   	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
> -	if (avail < KVM_MAX_VCPUS) {
> +	if (avail == 0) {
>   		/* full */
>   		return 0;
>   	}
> @@ -63,11 +63,14 @@ static int coalesced_mmio_write(struct kvm_io_device *this,
>   {
>   	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
>   	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
> -	if (!coalesced_mmio_in_range(dev, addr, len))
> -		return -EOPNOTSUPP;
>
>   	spin_lock(&dev->lock);
>
> +	if (!coalesced_mmio_in_range(dev, addr, len)) {
> +		spin_unlock(&dev->lock);
> +		return -EOPNOTSUPP;
> +	}
> +
>   	/* copy data in first free entry of the ring */

Hmm.  This means we take the lock for every I/O, whether it hits 
coalesced mmio or not.

We need to do the range check before taking the lock and the space check 
after taking the lock.
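
For illustration, the ordering being asked for might look roughly like this,
assuming the zone test and the ring-space test are split into two helpers
(coalesced_mmio_has_room() below is a hypothetical name for the latter; this
is a sketch of the suggestion, not the actual follow-up patch):

static int coalesced_mmio_write(struct kvm_io_device *this,
                                gpa_t addr, int len, const void *val)
{
        struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
        struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;

        /* zone check outside the lock: a miss just falls through to
         * the ordinary (slow) MMIO exit path */
        if (!coalesced_mmio_in_range(dev, addr, len))
                return -EOPNOTSUPP;

        spin_lock(&dev->lock);

        /* space check under the lock, so two writers cannot both pass
         * the "one free entry" test */
        if (!coalesced_mmio_has_room(dev)) {
                spin_unlock(&dev->lock);
                return -EOPNOTSUPP;
        }

        /* copy data in first free entry of the ring */
        ring->coalesced_mmio[ring->last].phys_addr = addr;
        ring->coalesced_mmio[ring->last].len = len;
        memcpy(ring->coalesced_mmio[ring->last].data, val, len);
        smp_wmb();
        ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
        spin_unlock(&dev->lock);

        return 0;
}

A VCPU that misses every zone never touches dev->lock, and the ring-space
decision is made while no other writer can advance ring->last.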
Sasha Levin July 18, 2011, 9:29 a.m. UTC | #2
On Mon, 2011-07-18 at 11:11 +0300, Avi Kivity wrote:
> On 07/15/2011 02:37 PM, Sasha Levin wrote:
> > Move the check whether there are available entries to within the spinlock.
> > This allows working with larger amount of VCPUs and reduces premature
> > exits when using a large number of VCPUs.
> >
> > Cc: Avi Kivity <avi@redhat.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > ---
> >   virt/kvm/coalesced_mmio.c |    9 ++++++---
> >   1 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> > index fc84875..34188db 100644
> > --- a/virt/kvm/coalesced_mmio.c
> > +++ b/virt/kvm/coalesced_mmio.c
> > @@ -37,7 +37,7 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
> >   	 */
> >   	ring = dev->kvm->coalesced_mmio_ring;
> >   	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
> > -	if (avail < KVM_MAX_VCPUS) {
> > +	if (avail == 0) {
> >   		/* full */
> >   		return 0;
> >   	}
> > @@ -63,11 +63,14 @@ static int coalesced_mmio_write(struct kvm_io_device *this,
> >   {
> >   	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
> >   	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
> > -	if (!coalesced_mmio_in_range(dev, addr, len))
> > -		return -EOPNOTSUPP;
> >
> >   	spin_lock(&dev->lock);
> >
> > +	if (!coalesced_mmio_in_range(dev, addr, len)) {
> > +		spin_unlock(&dev->lock);
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> >   	/* copy data in first free entry of the ring */
> 
> Hmm.  This means we take the lock for every I/O, whether it hits 
> coalesced mmio or not.
> 
> We need to do the range check before taking the lock and the space check 
> after taking the lock.
> 

I'll fix that.

Shouldn't the range check also be locked somehow? Currently it is
possible that a coalesced region is removed while we are checking the
ranges, and we won't issue an MMIO exit as the host expects.
Avi Kivity July 18, 2011, 9:50 a.m. UTC | #3
On 07/18/2011 12:29 PM, Sasha Levin wrote:
> >  Hmm.  This means we take the lock for every I/O, whether it hits
> >  coalesced mmio or not.
> >
> >  We need to do the range check before taking the lock and the space check
> >  after taking the lock.
> >
>
> I'll fix that.
>
> Shouldn't the range check be also locked somehow? Currently it is
> possible that a coalesced region was removed while we are checking the
> ranges, and we won't issue a mmio exit as the host expects

It's "locked" using rcu.
Sasha Levin July 18, 2011, 10:15 a.m. UTC | #4
On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> On 07/18/2011 12:29 PM, Sasha Levin wrote:
> > >  Hmm.  This means we take the lock for every I/O, whether it hits
> > >  coalesced mmio or not.
> > >
> > >  We need to do the range check before taking the lock and the space check
> > >  after taking the lock.
> > >
> >
> > I'll fix that.
> >
> > Shouldn't the range check be also locked somehow? Currently it is
> > possible that a coalesced region was removed while we are checking the
> > ranges, and we won't issue a mmio exit as the host expects
> 
> It's "locked" using rcu.
> 

Where is that happening?

All the coalesced zones are stored under the coalesced "device" in a
simple array. When adding and removing zones, kvm->slots_lock is taken -
I don't see anything which prevents a range check during zone removal
unless slots_lock prevents IO.
Avi Kivity July 18, 2011, 11:43 a.m. UTC | #5
On 07/18/2011 01:15 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> >  On 07/18/2011 12:29 PM, Sasha Levin wrote:
> >  >  >   Hmm.  This means we take the lock for every I/O, whether it hits
> >  >  >   coalesced mmio or not.
> >  >  >
> >  >  >   We need to do the range check before taking the lock and the space check
> >  >  >   after taking the lock.
> >  >  >
> >  >
> >  >  I'll fix that.
> >  >
> >  >  Shouldn't the range check be also locked somehow? Currently it is
> >  >  possible that a coalesced region was removed while we are checking the
> >  >  ranges, and we won't issue a mmio exit as the host expects
> >
> >  It's "locked" using rcu.
> >
>
> Where is that happening?
>
> All the coalesced zones are stored under the coalesced "device" in a
> simple array. When adding and removing zones, kvm->slots_lock is taken -
> I don't see anything which prevents a range check during zone removal
> unless slots_lock prevents IO.

Range check during slot removal is legal.  While you are removing a 
slot, a concurrent write may hit or miss the slot; it doesn't matter.

Userspace should flush the coalesced mmio buffer after removal to ensure 
there are no pending writes.
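
Roughly what that userspace flush looks like, modeled on the usual VMM
pattern (handle_mmio_write() stands in for whatever MMIO dispatch the VMM
uses, smp_wmb() for its write barrier; 'ring' is the page mapped at
KVM_COALESCED_MMIO_PAGE_OFFSET):

void flush_coalesced_mmio_buffer(struct kvm_coalesced_mmio_ring *ring)
{
        /* only 'first' belongs to userspace; the kernel advances 'last' */
        while (ring->first != ring->last) {
                struct kvm_coalesced_mmio *ent =
                        &ring->coalesced_mmio[ring->first];

                /* replay the deferred write through the normal MMIO
                 * dispatch path */
                handle_mmio_write(ent->phys_addr, ent->data, ent->len);

                /* make the entry visibly free before publishing the
                 * new 'first' */
                smp_wmb();
                ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }
}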
Sasha Levin July 18, 2011, 12:03 p.m. UTC | #6
On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> On 07/18/2011 01:15 PM, Sasha Levin wrote:
> > On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> > >  On 07/18/2011 12:29 PM, Sasha Levin wrote:
> > >  >  >   Hmm.  This means we take the lock for every I/O, whether it hits
> > >  >  >   coalesced mmio or not.
> > >  >  >
> > >  >  >   We need to do the range check before taking the lock and the space check
> > >  >  >   after taking the lock.
> > >  >  >
> > >  >
> > >  >  I'll fix that.
> > >  >
> > >  >  Shouldn't the range check be also locked somehow? Currently it is
> > >  >  possible that a coalesced region was removed while we are checking the
> > >  >  ranges, and we won't issue a mmio exit as the host expects
> > >
> > >  It's "locked" using rcu.
> > >
> >
> > Where is that happening?
> >
> > All the coalesced zones are stored under the coalesced "device" in a
> > simple array. When adding and removing zones, kvm->slots_lock is taken -
> > I don't see anything which prevents a range check during zone removal
> > unless slots_lock prevents IO.
> 
> Range check during slot removal is legal.  While you are removing a 
> slot, a concurrent write may hit or miss the slot; it doesn't matter.
> 
> Userspace should flush the coalesced mmio buffer after removal to ensure 
> there are no pending writes.
> 

But the write may hit a non-existent slot.

Something like this:

Thread 1		Thread 2
----------------------------------
Check range	|
Found slot	|
		| Remove slot
		| Flush buffer
Get spinlock	|
Write to buffer	|
Avi Kivity July 18, 2011, 12:29 p.m. UTC | #7
On 07/18/2011 03:03 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> >  On 07/18/2011 01:15 PM, Sasha Levin wrote:
> >  >  On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> >  >  >   On 07/18/2011 12:29 PM, Sasha Levin wrote:
> >  >  >   >   >    Hmm.  This means we take the lock for every I/O, whether it hits
> >  >  >   >   >    coalesced mmio or not.
> >  >  >   >   >
> >  >  >   >   >    We need to do the range check before taking the lock and the space check
> >  >  >   >   >    after taking the lock.
> >  >  >   >   >
> >  >  >   >
> >  >  >   >   I'll fix that.
> >  >  >   >
> >  >  >   >   Shouldn't the range check be also locked somehow? Currently it is
> >  >  >   >   possible that a coalesced region was removed while we are checking the
> >  >  >   >   ranges, and we won't issue a mmio exit as the host expects
> >  >  >
> >  >  >   It's "locked" using rcu.
> >  >  >
> >  >
> >  >  Where is that happening?
> >  >
> >  >  All the coalesced zones are stored under the coalesced "device" in a
> >  >  simple array. When adding and removing zones, kvm->slots_lock is taken -
> >  >  I don't see anything which prevents a range check during zone removal
> >  >  unless slots_lock prevents IO.
> >
> >  Range check during slot removal is legal.  While you are removing a
> >  slot, a concurrent write may hit or miss the slot; it doesn't matter.
> >
> >  Userspace should flush the coalesced mmio buffer after removal to ensure
> >  there are no pending writes.
> >
>
> But the write may hit a non-existent slot.
>
> Something like this:
>
> Thread 1		Thread 2
> ----------------------------------
> Check range	|
> Found slot	|
> 		| Remove slot
> 		| Flush buffer
> Get spinlock	|
> Write to buffer	|
>

Cannot happen, due to rcu.  The "remove slot" step waits until all rcu 
readers are gone.

In other words: it's magic.
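
The generic shape of that pattern, for reference - this is not code from
coalesced_mmio.c, and KVM's io bus actually uses the SRCU flavour
(srcu_read_lock()/synchronize_srcu()) rather than plain RCU; dev->zones,
zone_covers() and copy_without() are made-up names:

        /* reader side: lockless, inside a read-side critical section */
        rcu_read_lock();
        zones = rcu_dereference(dev->zones);
        hit = zone_covers(zones, addr, len);
        rcu_read_unlock();

        /* updater side: publish a copy without the victim zone, then
         * wait out every reader that might still see the old array */
        old = dev->zones;
        rcu_assign_pointer(dev->zones, copy_without(old, victim));
        synchronize_rcu();      /* no reader can be using 'old' after this */
        kfree(old);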
Sasha Levin July 18, 2011, 12:58 p.m. UTC | #8
On Mon, 2011-07-18 at 15:29 +0300, Avi Kivity wrote:
> On 07/18/2011 03:03 PM, Sasha Levin wrote:
> > On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> > >  On 07/18/2011 01:15 PM, Sasha Levin wrote:
> > >  >  On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> > >  >  >   On 07/18/2011 12:29 PM, Sasha Levin wrote:
> > >  >  >   >   >    Hmm.  This means we take the lock for every I/O, whether it hits
> > >  >  >   >   >    coalesced mmio or not.
> > >  >  >   >   >
> > >  >  >   >   >    We need to do the range check before taking the lock and the space check
> > >  >  >   >   >    after taking the lock.
> > >  >  >   >   >
> > >  >  >   >
> > >  >  >   >   I'll fix that.
> > >  >  >   >
> > >  >  >   >   Shouldn't the range check be also locked somehow? Currently it is
> > >  >  >   >   possible that a coalesced region was removed while we are checking the
> > >  >  >   >   ranges, and we won't issue a mmio exit as the host expects
> > >  >  >
> > >  >  >   It's "locked" using rcu.
> > >  >  >
> > >  >
> > >  >  Where is that happening?
> > >  >
> > >  >  All the coalesced zones are stored under the coalesced "device" in a
> > >  >  simple array. When adding and removing zones, kvm->slots_lock is taken -
> > >  >  I don't see anything which prevents a range check during zone removal
> > >  >  unless slots_lock prevents IO.
> > >
> > >  Range check during slot removal is legal.  While you are removing a
> > >  slot, a concurrent write may hit or miss the slot; it doesn't matter.
> > >
> > >  Userspace should flush the coalesced mmio buffer after removal to ensure
> > >  there are no pending writes.
> > >
> >
> > But the write may hit a non-existent slot.
> >
> > Something like this:
> >
> > Thread 1		Thread 2
> > ----------------------------------
> > Check range	|
> > Found slot	|
> > 		| Remove slot
> > 		| Flush buffer
> > Get spinlock	|
> > Write to buffer	|
> >
> 
> Cannot happen, due to rcu.  The "remove slot" step waits until all rcu 
> readers are gone.
> 
> In other words: it's magic.
> 

I might be missing something, but I don't see anything RCU-related
anywhere in /virt/kvm/coalesced_mmio.c, or in
kvm_vm_ioctl_unregister_coalesced_mmio() specifically.

Where is rcu invoked on the zones array?

All I see is a simple array and counter declared as such:

	int nb_zones;
	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];

And in the register/unregister functions it's a simple array manipulation.
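
For reference, the unregister path being described has roughly this shape
(a paraphrase to show the point being made, not the verbatim upstream
function):

int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
                                           struct kvm_coalesced_mmio_zone *zone)
{
        struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
        int i;

        mutex_lock(&kvm->slots_lock);

        /* drop every zone covered by the request by copying the last
         * array entry over it - no grace period, no reader exclusion */
        for (i = dev->nb_zones - 1; i >= 0; i--) {
                struct kvm_coalesced_mmio_zone *z = &dev->zone[i];

                if (zone->addr <= z->addr &&
                    z->addr + z->size <= zone->addr + zone->size)
                        *z = dev->zone[--dev->nb_zones];
        }

        mutex_unlock(&kvm->slots_lock);

        return 0;
}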
Avi Kivity July 18, 2011, 1:11 p.m. UTC | #9
On 07/18/2011 03:58 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 15:29 +0300, Avi Kivity wrote:
> >  On 07/18/2011 03:03 PM, Sasha Levin wrote:
> >  >  On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> >  >  >   On 07/18/2011 01:15 PM, Sasha Levin wrote:
> >  >  >   >   On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> >  >  >   >   >    On 07/18/2011 12:29 PM, Sasha Levin wrote:
> >  >  >   >   >    >    >     Hmm.  This means we take the lock for every I/O, whether it hits
> >  >  >   >   >    >    >     coalesced mmio or not.
> >  >  >   >   >    >    >
> >  >  >   >   >    >    >     We need to do the range check before taking the lock and the space check
> >  >  >   >   >    >    >     after taking the lock.
> >  >  >   >   >    >    >
> >  >  >   >   >    >
> >  >  >   >   >    >    I'll fix that.
> >  >  >   >   >    >
> >  >  >   >   >    >    Shouldn't the range check be also locked somehow? Currently it is
> >  >  >   >   >    >    possible that a coalesced region was removed while we are checking the
> >  >  >   >   >    >    ranges, and we won't issue a mmio exit as the host expects
> >  >  >   >   >
> >  >  >   >   >    It's "locked" using rcu.
> >  >  >   >   >
> >  >  >   >
> >  >  >   >   Where is that happening?
> >  >  >   >
> >  >  >   >   All the coalesced zones are stored under the coalesced "device" in a
> >  >  >   >   simple array. When adding and removing zones, kvm->slots_lock is taken -
> >  >  >   >   I don't see anything which prevents a range check during zone removal
> >  >  >   >   unless slots_lock prevents IO.
> >  >  >
> >  >  >   Range check during slot removal is legal.  While you are removing a
> >  >  >   slot, a concurrent write may hit or miss the slot; it doesn't matter.
> >  >  >
> >  >  >   Userspace should flush the coalesced mmio buffer after removal to ensure
> >  >  >   there are no pending writes.
> >  >  >
> >  >
> >  >  But the write may hit a non-existent slot.
> >  >
> >  >  Something like this:
> >  >
> >  >  Thread 1		Thread 2
> >  >  ----------------------------------
> >  >  Check range	|
> >  >  Found slot	|
> >  >  		| Remove slot
> >  >  		| Flush buffer
> >  >  Get spinlock	|
> >  >  Write to buffer	|
> >  >
> >
> >  Cannot happen, due to rcu.  The "remove slot" step waits until all rcu
> >  readers are gone.
> >
> >  In other words: it's magic.
> >
>
> I might be missing something, but I don't see anything rcu related in
> anything within /virt/kvm/coalesced_mmio.c or in
> kvm_vm_ioctl_unregister_coalesced_mmio() specifically.
>
> Where is rcu invoked on the zones array?
>
> All I see is a simple array and counter declared as such:
>
> 	int nb_zones;
> 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
>
> And in the register/unregister functions it's a simple array manipulation.

Er, kvm_io_bus_register() does the rcu stuff.  But 
kvm_register_coalesced_mmio() doesn't call it! Instead it's just one 
device on the bus that decodes all those offsets.

In other words: it's broken.

Luckily, it's not exploitable, since the memory is static wrt the 
lifetime of the guest.

We should probably make it separate kvm_io_devices so we can reuse the 
existing locking.
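
One way that could look: give each zone its own kvm_io_device and register
it on the MMIO bus, so zone removal goes through kvm_io_bus_unregister_dev()
and picks up the bus's existing SRCU protection for free. A sketch of the
registration side only, with coalesced_zones used as an illustrative list
name:

struct kvm_coalesced_mmio_dev {
        struct list_head list;                  /* on kvm->coalesced_zones */
        struct kvm_io_device dev;
        struct kvm *kvm;
        struct kvm_coalesced_mmio_zone zone;    /* exactly one zone per device */
};

int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
                                         struct kvm_coalesced_mmio_zone *zone)
{
        struct kvm_coalesced_mmio_dev *dev;
        int ret;

        dev = kzalloc(sizeof(*dev), GFP_KERNEL);
        if (!dev)
                return -ENOMEM;

        kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
        dev->kvm = kvm;
        dev->zone = *zone;

        mutex_lock(&kvm->slots_lock);
        ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, &dev->dev);
        if (ret < 0) {
                kfree(dev);
        } else {
                /* removal later becomes kvm_io_bus_unregister_dev() on
                 * the matching entry, which already synchronizes with
                 * in-flight readers via kvm->srcu */
                list_add_tail(&dev->list, &kvm->coalesced_zones);
        }
        mutex_unlock(&kvm->slots_lock);

        return ret;
}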

Patch

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index fc84875..34188db 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -37,7 +37,7 @@  static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
 	 */
 	ring = dev->kvm->coalesced_mmio_ring;
 	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
-	if (avail < KVM_MAX_VCPUS) {
+	if (avail == 0) {
 		/* full */
 		return 0;
 	}
@@ -63,11 +63,14 @@  static int coalesced_mmio_write(struct kvm_io_device *this,
 {
 	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
-	if (!coalesced_mmio_in_range(dev, addr, len))
-		return -EOPNOTSUPP;
 
 	spin_lock(&dev->lock);
 
+	if (!coalesced_mmio_in_range(dev, addr, len)) {
+		spin_unlock(&dev->lock);
+		return -EOPNOTSUPP;
+	}
+
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;