Message ID | 1310729869-1451-1-git-send-email-levinsasha928@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On 07/15/2011 02:37 PM, Sasha Levin wrote:
> Move the check whether there are available entries to within the spinlock.
> This allows working with larger amount of VCPUs and reduces premature
> exits when using a large number of VCPUs.
>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
>  virt/kvm/coalesced_mmio.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index fc84875..34188db 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -37,7 +37,7 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
>  	 */
>  	ring = dev->kvm->coalesced_mmio_ring;
>  	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
> -	if (avail < KVM_MAX_VCPUS) {
> +	if (avail == 0) {
>  		/* full */
>  		return 0;
>  	}
> @@ -63,11 +63,14 @@ static int coalesced_mmio_write(struct kvm_io_device *this,
>  {
>  	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
>  	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
> -	if (!coalesced_mmio_in_range(dev, addr, len))
> -		return -EOPNOTSUPP;
>
>  	spin_lock(&dev->lock);
>
> +	if (!coalesced_mmio_in_range(dev, addr, len)) {
> +		spin_unlock(&dev->lock);
> +		return -EOPNOTSUPP;
> +	}
> +
>  	/* copy data in first free entry of the ring */

Hmm.  This means we take the lock for every I/O, whether it hits
coalesced mmio or not.

We need to do the range check before taking the lock and the space
check after taking the lock.
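To illustrate the ordering Avi is asking for, here is a minimal sketch of coalesced_mmio_write() with the range check done lockless and the space check done under dev->lock. The coalesced_mmio_has_room() helper is hypothetical (in the patch above the space check still lives inside coalesced_mmio_in_range()); this is a sketch of the suggestion, not the patch that was eventually applied.

```c
static int coalesced_mmio_write(struct kvm_io_device *this,
				gpa_t addr, int len, const void *val)
{
	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;

	/* Lockless range check: I/O that misses the zone never touches dev->lock. */
	if (!coalesced_mmio_in_range(dev, addr, len))
		return -EOPNOTSUPP;

	spin_lock(&dev->lock);

	/* Space check under the lock, so concurrent writers cannot overfill the ring. */
	if (!coalesced_mmio_has_room(dev)) {	/* hypothetical helper */
		spin_unlock(&dev->lock);
		return -EOPNOTSUPP;
	}

	/* copy data in first free entry of the ring */
	ring->coalesced_mmio[ring->last].phys_addr = addr;
	ring->coalesced_mmio[ring->last].len = len;
	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
	smp_wmb();
	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;

	spin_unlock(&dev->lock);
	return 0;
}
```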
On Mon, 2011-07-18 at 11:11 +0300, Avi Kivity wrote:
> On 07/15/2011 02:37 PM, Sasha Levin wrote:
> > Move the check whether there are available entries to within the spinlock.
> > This allows working with larger amount of VCPUs and reduces premature
> > exits when using a large number of VCPUs.
> >
> > [...]
>
> Hmm.  This means we take the lock for every I/O, whether it hits
> coalesced mmio or not.
>
> We need to do the range check before taking the lock and the space
> check after taking the lock.

I'll fix that.

Shouldn't the range check be also locked somehow? Currently it is
possible that a coalesced region was removed while we are checking the
ranges, and we won't issue a mmio exit as the host expects.
On 07/18/2011 12:29 PM, Sasha Levin wrote:
> > Hmm.  This means we take the lock for every I/O, whether it hits
> > coalesced mmio or not.
> >
> > We need to do the range check before taking the lock and the space
> > check after taking the lock.
>
> I'll fix that.
>
> Shouldn't the range check be also locked somehow? Currently it is
> possible that a coalesced region was removed while we are checking the
> ranges, and we won't issue a mmio exit as the host expects

It's "locked" using rcu.
On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> On 07/18/2011 12:29 PM, Sasha Levin wrote:
> > > Hmm.  This means we take the lock for every I/O, whether it hits
> > > coalesced mmio or not.
> > >
> > > We need to do the range check before taking the lock and the space
> > > check after taking the lock.
> >
> > I'll fix that.
> >
> > Shouldn't the range check be also locked somehow? Currently it is
> > possible that a coalesced region was removed while we are checking the
> > ranges, and we won't issue a mmio exit as the host expects
>
> It's "locked" using rcu.

Where is that happening?

All the coalesced zones are stored under the coalesced "device" in a
simple array. When adding and removing zones, kvm->slots_lock is taken -
I don't see anything which prevents a range check during zone removal
unless slots_lock prevents IO.
On 07/18/2011 01:15 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 12:50 +0300, Avi Kivity wrote:
> > [...]
> >
> > It's "locked" using rcu.
>
> Where is that happening?
>
> All the coalesced zones are stored under the coalesced "device" in a
> simple array. When adding and removing zones, kvm->slots_lock is taken -
> I don't see anything which prevents a range check during zone removal
> unless slots_lock prevents IO.

Range check during slot removal is legal.  While you are removing a
slot, a concurrent write may hit or miss the slot; it doesn't matter.

Userspace should flush the coalesced mmio buffer after removal to
ensure there are no pending writes.
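For reference, the "flush" Avi mentions is done on the userspace side by draining the ring mapped at KVM_COALESCED_MMIO_PAGE_OFFSET pages into the vcpu mmap area. A minimal sketch, with handle_mmio_write() standing in for whatever the VMM does with a buffered write (that name and the max_entries parameter are illustrative, not QEMU's or the kvm tool's actual code):

```c
#include <stdint.h>
#include <linux/kvm.h>

/* Hypothetical VMM hook that replays one buffered MMIO write. */
void handle_mmio_write(uint64_t phys_addr, const void *data, uint32_t len);

/*
 * Drain the coalesced MMIO ring: ring->last is advanced by the kernel,
 * ring->first by userspace.  max_entries is how many entries fit in the
 * ring page, i.e. (page_size - sizeof(*ring)) / sizeof(struct kvm_coalesced_mmio).
 */
static void coalesced_mmio_flush(struct kvm_coalesced_mmio_ring *ring,
				 uint32_t max_entries)
{
	while (ring->first != ring->last) {
		struct kvm_coalesced_mmio *e = &ring->coalesced_mmio[ring->first];

		handle_mmio_write(e->phys_addr, e->data, e->len);
		__sync_synchronize();	/* consume the entry before releasing the slot */
		ring->first = (ring->first + 1) % max_entries;
	}
}
```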
On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> On 07/18/2011 01:15 PM, Sasha Levin wrote:
> > [...]
> >
> > All the coalesced zones are stored under the coalesced "device" in a
> > simple array. When adding and removing zones, kvm->slots_lock is taken -
> > I don't see anything which prevents a range check during zone removal
> > unless slots_lock prevents IO.
>
> Range check during slot removal is legal.  While you are removing a
> slot, a concurrent write may hit or miss the slot; it doesn't matter.
>
> Userspace should flush the coalesced mmio buffer after removal to
> ensure there are no pending writes.

But the write may hit a non-existent slot.

Something like this:

  Thread 1        | Thread 2
  ----------------+----------------
  Check range     |
  Found slot      |
                  | Remove slot
                  | Flush buffer
  Get spinlock    |
  Write to buffer |
On 07/18/2011 03:03 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 14:43 +0300, Avi Kivity wrote:
> > [...]
> >
> > Userspace should flush the coalesced mmio buffer after removal to
> > ensure there are no pending writes.
>
> But the write may hit a non-existent slot.
>
> Something like this:
>
>   Thread 1        | Thread 2
>   ----------------+----------------
>   Check range     |
>   Found slot      |
>                   | Remove slot
>                   | Flush buffer
>   Get spinlock    |
>   Write to buffer |

Cannot happen, due to rcu.  The "remove slot" step waits until all rcu
readers are gone.

In other words: it's magic.
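For readers unfamiliar with the pattern: this is the usual RCU publish/retire scheme that kvm_io_bus relies on. The sketch below is a generic illustration (the types and names are invented for the example, not the actual kvm_io_bus code) of why "remove slot" cannot complete while a reader is still inside the range check:

```c
#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/types.h>

struct io_device {
	u64 start, len;			/* range this device claims */
};

struct io_bus {
	int dev_count;
	struct io_device *devs[64];
};

static struct io_bus __rcu *mmio_bus;

/* Reader: the whole lookup-and-dispatch runs inside one RCU read section. */
static int bus_write(u64 addr, int len)
{
	struct io_bus *bus;
	int i, ret = -EOPNOTSUPP;

	rcu_read_lock();
	bus = rcu_dereference(mmio_bus);
	for (i = 0; i < bus->dev_count; i++) {
		struct io_device *dev = bus->devs[i];

		if (addr >= dev->start && addr + len <= dev->start + dev->len) {
			ret = 0;	/* dispatch to dev here */
			break;
		}
	}
	rcu_read_unlock();
	return ret;
}

/* Writer: publish a copy without the device, then wait out old readers. */
static void bus_publish(struct io_bus *new_bus, struct io_bus *old_bus)
{
	rcu_assign_pointer(mmio_bus, new_bus);
	synchronize_rcu();	/* no reader can still hold a pointer into old_bus */
	kfree(old_bus);
}
```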
On Mon, 2011-07-18 at 15:29 +0300, Avi Kivity wrote:
> On 07/18/2011 03:03 PM, Sasha Levin wrote:
> > [...]
> >
> > But the write may hit a non-existent slot.
> >
> > Something like this:
> >
> >   Thread 1        | Thread 2
> >   ----------------+----------------
> >   Check range     |
> >   Found slot      |
> >                   | Remove slot
> >                   | Flush buffer
> >   Get spinlock    |
> >   Write to buffer |
>
> Cannot happen, due to rcu.  The "remove slot" step waits until all rcu
> readers are gone.
>
> In other words: it's magic.

I might be missing something, but I don't see anything rcu related in
anything within /virt/kvm/coalesced_mmio.c or in
kvm_vm_ioctl_unregister_coalesced_mmio() specifically.

Where is rcu invoked on the zones array?

All I see is a simple array and counter declared as such:

	int nb_zones;
	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];

And in the register/unregister functions it's a simple array
manipulation.
On 07/18/2011 03:58 PM, Sasha Levin wrote:
> On Mon, 2011-07-18 at 15:29 +0300, Avi Kivity wrote:
> > [...]
> >
> > In other words: it's magic.
>
> I might be missing something, but I don't see anything rcu related in
> anything within /virt/kvm/coalesced_mmio.c or in
> kvm_vm_ioctl_unregister_coalesced_mmio() specifically.
>
> Where is rcu invoked on the zones array?
>
> All I see is a simple array and counter declared as such:
>
> 	int nb_zones;
> 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
>
> And in the register/unregister functions it's a simple array
> manipulation.

Er, kvm_io_bus_register() does the rcu stuff.  But
kvm_register_coalesced_mmio() doesn't call it!  Instead it's just one
device on the bus that decodes all those offsets.

In other words: it's broken.  Luckily, it's not exploitable, since the
memory is static wrt the lifetime of the guest.

We should probably make it separate kvm_io_devices so we can reuse the
existing locking.
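For context, a rough sketch of the direction Avi is proposing: register each coalesced zone as its own kvm_io_device on the MMIO bus, so that zone add/remove goes through the bus's RCU-protected registration instead of a private array. The per-zone struct layout and the error handling below are assumptions about how such a conversion could look, not the patch that was actually merged; it also omits the per-VM list needed to find the device again on unregister.

```c
/* One kvm_io_device per registered coalesced zone. */
struct kvm_coalesced_mmio_dev {
	struct kvm_io_device dev;
	struct kvm *kvm;
	struct kvm_coalesced_mmio_zone zone;
};

int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
					 struct kvm_coalesced_mmio_zone *zone)
{
	int ret;
	struct kvm_coalesced_mmio_dev *dev;

	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
	if (!dev)
		return -ENOMEM;

	kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
	dev->kvm = kvm;
	dev->zone = *zone;

	mutex_lock(&kvm->slots_lock);
	/* The bus registration is what provides the rcu protection discussed above. */
	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, &dev->dev);
	mutex_unlock(&kvm->slots_lock);
	if (ret < 0)
		kfree(dev);

	return ret;
}
```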
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index fc84875..34188db 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -37,7 +37,7 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
 	 */
 	ring = dev->kvm->coalesced_mmio_ring;
 	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
-	if (avail < KVM_MAX_VCPUS) {
+	if (avail == 0) {
 		/* full */
 		return 0;
 	}
@@ -63,11 +63,14 @@ static int coalesced_mmio_write(struct kvm_io_device *this,
 {
 	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
-	if (!coalesced_mmio_in_range(dev, addr, len))
-		return -EOPNOTSUPP;
 
 	spin_lock(&dev->lock);
 
+	if (!coalesced_mmio_in_range(dev, addr, len)) {
+		spin_unlock(&dev->lock);
+		return -EOPNOTSUPP;
+	}
+
 	/* copy data in first free entry of the ring */
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
Move the check whether there are available entries to within the
spinlock.  This allows working with larger amount of VCPUs and reduces
premature exits when using a large number of VCPUs.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 virt/kvm/coalesced_mmio.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)