Message ID | 20090518170855.346048603@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Marcelo Tosatti wrote:
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm.orig/virt/kvm/coalesced_mmio.c
> +++ kvm/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,10 @@ static int coalesced_mmio_in_range(struc
>          if (!is_write)
>                  return 0;
>
> -        /* kvm->lock is taken by the caller and must be not released before
> -         * dev.read/write
> -         */
> +        /*
> +         * dev->lock must not be released before coalesced_mmio_write.
> +         */
> +        mutex_lock(&dev->lock);
>
>          /* Are we able to batch it ? */
>
> @@ -41,6 +42,7 @@ static int coalesced_mmio_in_range(struc
>                                          KVM_COALESCED_MMIO_MAX;
>          if (next == dev->kvm->coalesced_mmio_ring->first) {
>                  /* full */
> +                mutex_unlock(&dev->lock);
>                  return 0;
>          }
>
> @@ -57,6 +59,7 @@ static int coalesced_mmio_in_range(struc
>              addr + len <= zone->addr + zone->size)
>                  return 1;
>          }
> +        mutex_unlock(&dev->lock);
>          return 0;
>  }

So we have a function that takes a lock and conditionally releases it?
On Wed, May 20, 2009 at 03:06:26PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,10 @@ static int coalesced_mmio_in_range(struc
>>          if (!is_write)
>>                  return 0;
>> -        /* kvm->lock is taken by the caller and must be not released before
>> -         * dev.read/write
>> -         */
>> +        /*
>> +         * dev->lock must not be released before coalesced_mmio_write.
>> +         */
>> +        mutex_lock(&dev->lock);
>>          /* Are we able to batch it ? */
>> @@ -41,6 +42,7 @@ static int coalesced_mmio_in_range(struc
>>                                          KVM_COALESCED_MMIO_MAX;
>>          if (next == dev->kvm->coalesced_mmio_ring->first) {
>>                  /* full */
>> +                mutex_unlock(&dev->lock);
>>                  return 0;
>>          }
>> @@ -57,6 +59,7 @@ static int coalesced_mmio_in_range(struc
>>              addr + len <= zone->addr + zone->size)
>>                  return 1;
>>          }
>> +        mutex_unlock(&dev->lock);
>>          return 0;
>>  }
>
> So we have a function that takes a lock and conditionally releases it?

Yes, but it is correct: it will only return with the lock held in case
it returns 1, in which case it's guaranteed ->write will be called (which
will unlock it).

It should check the range first and/or use some smarter synchronization,
but one thing at a time.
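To make the convention being defended here concrete, a minimal C sketch of the "return 1 with the lock held" pattern (illustrative only: my_dev, can_batch() and append_to_ring() are made-up names, not the KVM code):

#include <linux/mutex.h>
#include <linux/types.h>

struct my_dev {
        struct mutex lock;
        /* ... ring and zone state ... */
};

/* Placeholder helpers, assumed to be defined elsewhere. */
static bool can_batch(struct my_dev *dev, unsigned long addr, int len);
static void append_to_ring(struct my_dev *dev, unsigned long addr, int len,
                           const void *val);

/*
 * Returns 1 with dev->lock held; the caller then guarantees my_write()
 * runs next, which releases the lock.  Returns 0 with the lock dropped.
 */
static int my_in_range(struct my_dev *dev, unsigned long addr, int len,
                       int is_write)
{
        if (!is_write)
                return 0;

        mutex_lock(&dev->lock);
        if (!can_batch(dev, addr, len)) {       /* ring full or no matching zone */
                mutex_unlock(&dev->lock);
                return 0;
        }
        return 1;                               /* lock intentionally left held */
}

static void my_write(struct my_dev *dev, unsigned long addr, int len,
                     const void *val)
{
        append_to_ring(dev, addr, len, val);    /* still under dev->lock */
        mutex_unlock(&dev->lock);               /* taken in my_in_range() */
}

Whether the lock is held on return depends on the return value, which is the "unorthodox" part picked up in the next reply.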
Marcelo Tosatti wrote:
>> So we have a function that takes a lock and conditionally releases it?
>
> Yes, but it is correct: it will only return with the lock held in case
> it returns 1, in which case it's guaranteed ->write will be called (which
> will unlock it).
>
> It should check the range first and/or use some smarter synchronization,
> but one thing at a time.

Yes it's correct but we'll get an endless stream of patches to 'fix' it
because it is so unorthodox.
On Wed, May 20, 2009 at 05:29:23PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> So we have a function that takes a lock and conditionally releases it?
>>
>> Yes, but it is correct: it will only return with the lock held in case
>> it returns 1, in which case it's guaranteed ->write will be called (which
>> will unlock it).
>>
>> It should check the range first and/or use some smarter synchronization,
>> but one thing at a time.
>
> Yes it's correct but we'll get an endless stream of patches to 'fix' it
> because it is so unorthodox.

Does it have to guarantee any kind of ordering in case of parallel
writes by distinct vcpus? This is what it does now (so if a vcpu
arrives first, the second will wait until the first is finished
processing).

I suppose that is the responsibility of the guest (if it does MMIO
writes to a device in parallel it'll screw up in real HW too).

Because in such case, you can drop the mutex and protect only the kvm
data.
On Wed, May 20, 2009 at 12:13:03PM -0300, Marcelo Tosatti wrote:
> On Wed, May 20, 2009 at 05:29:23PM +0300, Avi Kivity wrote:
> > Marcelo Tosatti wrote:
> > >>> So we have a function that takes a lock and conditionally releases it?
> > >>
> > >> Yes, but it is correct: it will only return with the lock held in case
> > >> it returns 1, in which case it's guaranteed ->write will be called (which
> > >> will unlock it).
> > >>
> > >> It should check the range first and/or use some smarter synchronization,
> > >> but one thing at a time.
> > >
> > > Yes it's correct but we'll get an endless stream of patches to 'fix' it
> > > because it is so unorthodox.
>
> Does it have to guarantee any kind of ordering in case of parallel
> writes by distinct vcpus? This is what it does now (so if a vcpu
> arrives first, the second will wait until the first is finished
> processing).
>
> I suppose that is the responsibility of the guest (if it does MMIO
> writes to a device in parallel it'll screw up in real HW too).
>
> Because in such case, you can drop the mutex and protect only the kvm
> data.

If you want it to provide ordering (as in: process the MMIO writes, by
multiple vcpus, in the order they happen), you need a mutex or spinlock.
And in that case, I don't see any other way around this given the way
->in_range / ->read / ->write work.

Even if you change the order in which the full mmio buffer check and the
coalesced-allowed-in-this-range check happen, you still need a
spinlock/mutex in this unorthodox way.
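A hedged sketch of the window the cross-callback lock closes, assuming a hypothetical variant in which the "ring full" test stays in ->in_range but dev->lock is only taken inside ->write (this reuses the patch's types; in_some_zone(), ring_full() and insert_into_ring() are made-up helpers):

/* Hypothetical racy variant, for illustration only. */
static int racy_in_range(struct kvm_coalesced_mmio_dev *dev,
                         gpa_t addr, int len, int is_write)
{
        /* zone and "ring full" checks done without holding dev->lock */
        return is_write && in_some_zone(dev, addr, len) && !ring_full(dev);
}

static void racy_write(struct kvm_coalesced_mmio_dev *dev,
                       gpa_t addr, int len, const void *val)
{
        mutex_lock(&dev->lock);
        /*
         * Two vcpus can both pass the unlocked "one slot free" test in
         * racy_in_range() and then serialize here; the second insertion
         * wraps ring->last onto ring->first and clobbers an entry
         * userspace has not consumed yet.  Holding dev->lock from the
         * in_range() check through write(), as the patch does, closes
         * that window.
         */
        insert_into_ring(dev, addr, len, val);
        mutex_unlock(&dev->lock);
}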
Marcelo Tosatti wrote:
>> Yes it's correct but we'll get an endless stream of patches to 'fix' it
>> because it is so unorthodox.
>
> Does it have to guarantee any kind of ordering in case of parallel
> writes by distinct vcpus? This is what it does now (so if a vcpu
> arrives first, the second will wait until the first is finished
> processing).

No.

> I suppose that is the responsibility of the guest (if it does MMIO
> writes to a device in parallel it'll screw up in real HW too).

Yes.

> Because in such case, you can drop the mutex and protect only the kvm
> data.

One has to be before the other. The order doesn't matter, but both have
to be there.
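One way to read "protect only the kvm data" (an assumption, since the thread does not spell it out): keep both the ring-full test and the ring insertion inside one critical section in the write path, so the lock never outlives a single callback. A hedged sketch of such a variant follows; note it presumes ->write can refuse the access when the ring is full, which the ->in_range/->write split at the time did not allow, and that is precisely why the posted patch holds the lock across both callbacks instead.

/*
 * Illustrative only, not the posted patch: the "full" check and the
 * ring insertion share one critical section inside the write path.
 */
static int coalesced_write_locked(struct kvm_coalesced_mmio_dev *dev,
                                  gpa_t addr, int len, const void *val)
{
        struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
        int next;

        mutex_lock(&dev->lock);
        next = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
        if (next == ring->first) {
                /* Full: refuse, so the access falls back to a normal exit. */
                mutex_unlock(&dev->lock);
                return 0;
        }
        ring->coalesced_mmio[ring->last].phys_addr = addr;
        ring->coalesced_mmio[ring->last].len = len;
        memcpy(ring->coalesced_mmio[ring->last].data, val, len);
        smp_wmb();      /* publish the entry before advancing ring->last */
        ring->last = next;
        mutex_unlock(&dev->lock);
        return 1;
}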
Addressing the comment, and covering irqfd as well (I did not remove the deadlock avoidance there, since it might be useful later).
Index: kvm/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm.orig/virt/kvm/coalesced_mmio.c
+++ kvm/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,10 @@ static int coalesced_mmio_in_range(struc
         if (!is_write)
                 return 0;

-        /* kvm->lock is taken by the caller and must be not released before
-         * dev.read/write
-         */
+        /*
+         * dev->lock must not be released before coalesced_mmio_write.
+         */
+        mutex_lock(&dev->lock);

         /* Are we able to batch it ? */

@@ -41,6 +42,7 @@ static int coalesced_mmio_in_range(struc
                                         KVM_COALESCED_MMIO_MAX;
         if (next == dev->kvm->coalesced_mmio_ring->first) {
                 /* full */
+                mutex_unlock(&dev->lock);
                 return 0;
         }

@@ -57,6 +59,7 @@ static int coalesced_mmio_in_range(struc
             addr + len <= zone->addr + zone->size)
                         return 1;
         }
+        mutex_unlock(&dev->lock);
         return 0;
 }

@@ -67,8 +70,6 @@ static void coalesced_mmio_write(struct
                 (struct kvm_coalesced_mmio_dev*)this->private;
         struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;

-        /* kvm->lock must be taken by caller before call to in_range()*/
-
         /* copy data in first free entry of the ring */

         ring->coalesced_mmio[ring->last].phys_addr = addr;
@@ -76,6 +77,7 @@ static void coalesced_mmio_write(struct
         memcpy(ring->coalesced_mmio[ring->last].data, val, len);
         smp_wmb();
         ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+        mutex_unlock(&dev->lock);
 }

 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +92,8 @@ int kvm_coalesced_mmio_init(struct kvm *
         dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
         if (!dev)
                 return -ENOMEM;
+        mutex_init(&dev->lock);
+
         dev->dev.write = coalesced_mmio_write;
         dev->dev.in_range = coalesced_mmio_in_range;
         dev->dev.destructor = coalesced_mmio_destructor;
@@ -109,16 +113,16 @@ int kvm_vm_ioctl_register_coalesced_mmio
         if (dev == NULL)
                 return -EINVAL;

-        mutex_lock(&kvm->lock);
+        mutex_lock(&dev->lock);
         if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
-                mutex_unlock(&kvm->lock);
+                mutex_unlock(&dev->lock);
                 return -ENOBUFS;
         }

         dev->zone[dev->nb_zones] = *zone;
         dev->nb_zones++;

-        mutex_unlock(&kvm->lock);
+        mutex_unlock(&dev->lock);
         return 0;
 }

@@ -132,7 +136,7 @@ int kvm_vm_ioctl_unregister_coalesced_mm
         if (dev == NULL)
                 return -EINVAL;

-        mutex_lock(&kvm->lock);
+        mutex_lock(&dev->lock);

         i = dev->nb_zones;
         while(i) {
@@ -150,7 +154,7 @@ int kvm_vm_ioctl_unregister_coalesced_mm
                 i--;
         }

-        mutex_unlock(&kvm->lock);
+        mutex_unlock(&dev->lock);
         return 0;
 }
Index: kvm/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm.orig/virt/kvm/coalesced_mmio.h
+++ kvm/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ struct kvm_coalesced_mmio_dev {
         struct kvm_io_device dev;
         struct kvm *kvm;
+        struct mutex lock;
         int nb_zones;
         struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
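For context on the other end of the ring this lock protects, a hedged sketch of how a userspace consumer registers a zone and drains the batched entries on the next exit. The ioctl, ring layout and page offset follow the KVM coalesced-MMIO ABI in <linux/kvm.h>; vm_fd, run (the mmap'ed vcpu area), page_size and handle_mmio_write() are assumed to exist in the surrounding VMM code.

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Provided by the surrounding VMM code (hypothetical). */
void handle_mmio_write(__u64 addr, const void *data, __u32 len);

static int register_zone(int vm_fd, __u64 addr, __u32 size)
{
        struct kvm_coalesced_mmio_zone zone = { .addr = addr, .size = size };

        return ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
}

/* Drain entries the kernel batched since the last exit.  Only the
 * consumer moves ring->first; only the kernel moves ring->last. */
static void drain_coalesced_mmio(struct kvm_run *run, long page_size)
{
        struct kvm_coalesced_mmio_ring *ring = (void *)
                ((char *)run + KVM_COALESCED_MMIO_PAGE_OFFSET * page_size);
        long max = (page_size - sizeof(*ring)) /
                   sizeof(struct kvm_coalesced_mmio);

        while (ring->first != ring->last) {
                struct kvm_coalesced_mmio *ent =
                        &ring->coalesced_mmio[ring->first];

                handle_mmio_write(ent->phys_addr, ent->data, ent->len);
                __sync_synchronize();   /* consume data before freeing the slot */
                ring->first = (ring->first + 1) % max;
        }
}

If the producer side wrapped ring->last onto ring->first while this loop had not yet run, an unconsumed entry would be overwritten, which is exactly the case the in-kernel locking discussed above has to prevent.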