Message ID | 20090520185134.623968425@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Marcelo Tosatti wrote:
> Get rid of kvm->lock dependency on coalesced_mmio methods. Use an
> atomic variable instead to guarantee only one vcpu is batching
> data into the ring at a given time.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
> 	if (!is_write)
> 		return 0;
>
> -	/* kvm->lock is taken by the caller and must be not released before
> -	 * dev.read/write
> -	 */
> +	/*
> +	 * Some other vcpu might be batching data into the ring,
> +	 * fallback to userspace. Ordering not our problem.
> +	 */
> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
> +		return 0;
>
>

Ordering with simultaneous writes is indeed not our problem, but the
ring may contain ordered writes (even by the same vcpu!)

Suggest our own lock here. in_use is basically a homemade lock, better
to use the factory made ones which come with a warranty.
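For reference, the in_use scheme maps one-to-one onto a trylock: atomic_add_unless(&in_use, 1, 1) is a try-acquire that fails if the flag is already set, and atomic_set(&in_use, 0) is the release. A minimal sketch of that correspondence follows; the helper names are illustrative (not from the patch) and the includes use modern header names. Note that unlike spin_unlock(), a plain atomic_set() implies no memory ordering, which is part of the case for using a stock lock.

```c
#include <linux/atomic.h>
#include <linux/spinlock.h>

/* Try-acquire as the patch does it: add 1 unless the value is already 1.
 * Returns non-zero on success, i.e. when we were the one to set the flag. */
static inline int coalesced_try_enter(atomic_t *in_use)
{
	return atomic_add_unless(in_use, 1, 1);
}

/* Release as the patch does it.  No implied barrier here, which is why
 * the patch has to place its own smp_wmb() before this call. */
static inline void coalesced_exit(atomic_t *in_use)
{
	atomic_set(in_use, 0);
}

/* The stock equivalent being asked for (comparison only). */
static inline int coalesced_try_enter_spin(spinlock_t *lock)
{
	return spin_trylock(lock);
}

static inline void coalesced_exit_spin(spinlock_t *lock)
{
	spin_unlock(lock);
}
```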
On Sun, May 24, 2009 at 05:04:52PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Get rid of kvm->lock dependency on coalesced_mmio methods. Use an
>> atomic variable instead to guarantee only one vcpu is batching
>> data into the ring at a given time.
>>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>> 	if (!is_write)
>> 		return 0;
>> -	/* kvm->lock is taken by the caller and must be not released before
>> -	 * dev.read/write
>> -	 */
>> +	/*
>> +	 * Some other vcpu might be batching data into the ring,
>> +	 * fallback to userspace. Ordering not our problem.
>> +	 */
>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>> +		return 0;
>>
>
> Ordering with simultaneous writes is indeed not our problem, but the
> ring may contain ordered writes (even by the same vcpu!)

You mean writes by a vcpu should maintain ordering even when that vcpu
migrates to a different pcpu?

The smp_wmb in coalesced_write guarantees that.

> Suggest our own lock here. in_use is basically a homemade lock, better
> to use the factory made ones which come with a warranty.

The first submission had a mutex but was considered unorthodox. Although
I agree with your argument made then, I don't see a way around that.

Some pseudocode please?

Agree with the suggestion on patch 3, will fix the commentary on
kvm_host.h.
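The guarantee being leaned on here is the standard publish/consume pairing: the writer makes the ring entry visible before publishing the index, and the consumer reads the index before the entry it covers. A simplified sketch under that assumption follows; the consumer stands in for the userspace side that drains the ring, and the function names and exact barrier placement are illustrative, not the actual KVM/QEMU code.

```c
#include <linux/kvm_host.h>
#include <linux/string.h>
#include "coalesced_mmio.h"	/* for KVM_COALESCED_MMIO_MAX */

/* Writer side: fill the entry, smp_wmb(), then publish the index.  Any
 * CPU (including the same vcpu after migrating to another pcpu) that
 * sees the new 'last' also sees the entry's contents. */
static void ring_publish(struct kvm_coalesced_mmio_ring *ring,
			 __u64 addr, const void *val, __u32 len)
{
	__u32 slot = ring->last;

	ring->coalesced_mmio[slot].phys_addr = addr;
	ring->coalesced_mmio[slot].len = len;
	memcpy(ring->coalesced_mmio[slot].data, val, len);

	smp_wmb();				/* data before index */
	ring->last = (slot + 1) % KVM_COALESCED_MMIO_MAX;
}

/* Consumer side (conceptually the userspace drainer): read the index,
 * smp_rmb(), then read the entries it covers and advance 'first'. */
static void ring_drain(struct kvm_coalesced_mmio_ring *ring)
{
	while (ring->first != ring->last) {
		smp_rmb();			/* index before data */
		/* ... replay ring->coalesced_mmio[ring->first] here ... */
		ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
	}
}
```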
Marcelo Tosatti wrote:
>>> ===================================================================
>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>>> 	if (!is_write)
>>> 		return 0;
>>> -	/* kvm->lock is taken by the caller and must be not released before
>>> -	 * dev.read/write
>>> -	 */
>>> +	/*
>>> +	 * Some other vcpu might be batching data into the ring,
>>> +	 * fallback to userspace. Ordering not our problem.
>>> +	 */
>>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>>> +		return 0;
>>>
>>
>> Ordering with simultaneous writes is indeed not our problem, but the
>> ring may contain ordered writes (even by the same vcpu!)
>
> You mean writes by a vcpu should maintain ordering even when that vcpu
> migrates to a different pcpu?

No. Writes by a single vcpu need to preserve their ordering relative to
each other. If a write by vcpu A completes before a write by vcpu B
starts, then they need to be ordered, since the vcpus may have
synchronized in between.

Hm, I don't think you broke these rules.

> The smp_wmb in coalesced_write guarantees that.
>
>> Suggest our own lock here. in_use is basically a homemade lock, better
>> to use the factory made ones which come with a warranty.
>
> The first submission had a mutex but was considered unorthodox. Although
> I agree with your argument made then, I don't see a way around that.
>
> Some pseudocode please?

Why not use slots_lock to protect the entire iodevice list (rcu one
day), and an internal spinlock for coalesced mmio?
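No concrete code was posted for this suggestion, but one way an internal spinlock could look is sketched below. It assumes the atomic_t in_use field is replaced by a hypothetical spinlock_t lock in struct kvm_coalesced_mmio_dev; because the ring-full check and the copy both happen under that lock, entries from different vcpus are serialized and keep their completion order, which satisfies the ordering rule above. The full case is simplified to a silent drop; in the real flow it has to fall back to userspace, which is exactly the part under discussion.

```c
#include <linux/kvm_host.h>
#include "iodev.h"
#include "coalesced_mmio.h"

static void coalesced_mmio_write_spinlocked(struct kvm_io_device *this,
					    gpa_t addr, int len,
					    const void *val)
{
	struct kvm_coalesced_mmio_dev *dev =
		(struct kvm_coalesced_mmio_dev *)this->private;
	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
	__u32 next;

	spin_lock(&dev->lock);		/* hypothetical field, see above */

	next = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
	if (next == ring->first) {
		/* Simplified: a full ring is dropped here; the real flow
		 * must fall back to userspace instead. */
		spin_unlock(&dev->lock);
		return;
	}

	ring->coalesced_mmio[ring->last].phys_addr = addr;
	ring->coalesced_mmio[ring->last].len = len;
	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
	smp_wmb();			/* data visible before index */
	ring->last = next;

	spin_unlock(&dev->lock);
}
```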
On Tue, May 26, 2009 at 02:24:33PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>>> ===================================================================
>>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>>>> 	if (!is_write)
>>>> 		return 0;
>>>> -	/* kvm->lock is taken by the caller and must be not released before
>>>> -	 * dev.read/write
>>>> -	 */
>>>> +	/*
>>>> +	 * Some other vcpu might be batching data into the ring,
>>>> +	 * fallback to userspace. Ordering not our problem.
>>>> +	 */
>>>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>>>> +		return 0;
>>>>
>>> Ordering with simultaneous writes is indeed not our problem, but the
>>> ring may contain ordered writes (even by the same vcpu!)
>>
>> You mean writes by a vcpu should maintain ordering even when that vcpu
>> migrates to a different pcpu?
>
> No. Writes by a single vcpu need to preserve their ordering relative to
> each other. If a write by vcpu A completes before a write by vcpu B
> starts, then they need to be ordered, since the vcpus may have
> synchronized in between.
>
> Hm, I don't think you broke these rules.
>
>> The smp_wmb in coalesced_write guarantees that.
>>
>>> Suggest our own lock here. in_use is basically a homemade lock,
>>> better to use the factory made ones which come with a warranty.
>>
>> The first submission had a mutex but was considered unorthodox. Although
>> I agree with your argument made then, I don't see a way around that.
>>
>> Some pseudocode please?
>
> Why not use slots_lock to protect the entire iodevice list (rcu one
> day), and an internal spinlock for coalesced mmio?

Don't like using slots_lock to protect the entire iodevice list, it's
reverse progress in my opinion. The PIO/MMIO device lists are data
structures used in hot paths; they are not directly related to the
memslots, and therefore deserve a separate lock. Whenever slots_lock
becomes an issue, you'll have to untangle the mess, so better start
doing it now.

Sure, I can switch to an internal spinlock for coalesced_mmio, but breaking
out if spin_trylock fails. You're OK with that?
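A sketch of the trylock variant being proposed, under the same hypothetical dev->lock as above: if another vcpu holds the lock, in_range() reports the address as unhandled and the exit goes to userspace; on success the lock would be released in write(), mirroring how in_use is released in the patch. This deliberately inherits the unorthodox cross-function locking the thread is uneasy about; it is only meant to show the shape of the fallback.

```c
#include <linux/kvm_host.h>
#include "iodev.h"
#include "coalesced_mmio.h"

static int coalesced_mmio_in_range_trylock(struct kvm_io_device *this,
					   gpa_t addr, int len, int is_write)
{
	struct kvm_coalesced_mmio_dev *dev =
		(struct kvm_coalesced_mmio_dev *)this->private;

	if (!is_write)
		return 0;

	if (!spin_trylock(&dev->lock))	/* hypothetical per-device lock */
		return 0;		/* busy: let userspace handle it */

	/*
	 * ... ring-full and zone checks exactly as in the patch, taking
	 * care to spin_unlock(&dev->lock) on every path that returns 0;
	 * on the return-1 path the lock stays held until write() drops it.
	 */
	return 1;
}
```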
Marcelo Tosatti wrote:
>>
>> Why not use slots_lock to protect the entire iodevice list (rcu one
>> day), and an internal spinlock for coalesced mmio?
>
> Don't like using slots_lock to protect the entire iodevice list, it's
> reverse progress in my opinion. The PIO/MMIO device lists are data
> structures used in hot paths; they are not directly related to the
> memslots, and therefore deserve a separate lock. Whenever slots_lock
> becomes an issue, you'll have to untangle the mess, so better start
> doing it now.

It's a reader/writer lock, so you'll never have contention. I'm okay
with a new lock though.

> Sure, I can switch to an internal spinlock for coalesced_mmio, but breaking
> out if spin_trylock fails. You're OK with that?

Why not do a normal spin_lock()? It's not like you'll wait years for
the store to complete.
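The division of labour being described might look roughly like the sketch below: the device lookup runs under slots_lock taken for reading (so readers never contend with each other), while the coalesced device serializes its ring with its own lock inside write(). The caller name and the lookup helper are stand-ins and the slots_lock/mmio_bus details are assumptions about the kernel of this era, not code from the thread.

```c
#include <linux/kvm_host.h>
#include "iodev.h"

/* Hypothetical lookup helper, standing in for the real bus scan. */
static struct kvm_io_device *lookup_mmio_dev(struct kvm_io_bus *bus,
					     gpa_t addr, int len);

static void deliver_mmio_write(struct kvm *kvm, gpa_t addr,
			       int len, const void *val)
{
	struct kvm_io_device *dev;

	down_read(&kvm->slots_lock);	/* shared: readers do not contend */
	dev = lookup_mmio_dev(&kvm->mmio_bus, addr, len);
	if (dev)
		dev->write(dev, addr, len, val);	/* takes its own lock */
	up_read(&kvm->slots_lock);
}
```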
Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
+++ kvm-irqlock/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-	 * dev.read/write
-	 */
+	/*
+	 * Some other vcpu might be batching data into the ring,
+	 * fallback to userspace. Ordering not our problem.
+	 */
+	if (!atomic_add_unless(&dev->in_use, 1, 1))
+		return 0;
 
 	/* Are we able to batch it ? */
 
@@ -41,7 +44,7 @@ static int coalesced_mmio_in_range(struc
 					KVM_COALESCED_MMIO_MAX;
 	if (next == dev->kvm->coalesced_mmio_ring->first) {
 		/* full */
-		return 0;
+		goto out_denied;
 	}
 
 	/* is it in a batchable area ? */
@@ -57,6 +60,8 @@ static int coalesced_mmio_in_range(struc
 		    addr + len <= zone->addr + zone->size)
 			return 1;
 	}
+out_denied:
+	atomic_set(&dev->in_use, 0);
 	return 0;
 }
 
@@ -67,15 +72,14 @@ static void coalesced_mmio_write(struct
 		(struct kvm_coalesced_mmio_dev*)this->private;
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-	/* kvm->lock must be taken by caller before call to in_range()*/
-
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
 	ring->coalesced_mmio[ring->last].len = len;
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
-	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	smp_wmb();
+	atomic_set(&dev->in_use, 0);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +94,8 @@ int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	atomic_set(&dev->in_use, 0);
+
 	dev->dev.write  = coalesced_mmio_write;
 	dev->dev.in_range  = coalesced_mmio_in_range;
 	dev->dev.destructor  = coalesced_mmio_destructor;
Index: kvm-irqlock/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.h
+++ kvm-irqlock/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	atomic_t in_use;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };
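What makes this scheme a "homemade lock" is its lifetime: in_use is acquired inside coalesced_mmio_in_range() when it returns 1, and released either on the denied path there (out_denied) or at the end of coalesced_mmio_write(). The sketch below shows how a caller sees that lifecycle; the caller is an illustrative stand-in for the arch MMIO-exit path, not actual KVM code.

```c
#include <linux/kvm_host.h>
#include "iodev.h"

static int try_coalesce_write(struct kvm_io_device *dev, gpa_t addr,
			      int len, const void *val)
{
	/* in_range(): atomic_add_unless() marks the ring busy on success */
	if (!dev->in_range(dev, addr, len, 1))
		return 0;	/* busy, full or outside a zone: go to userspace */

	/* write(): copies into the ring, then atomic_set() marks it idle */
	dev->write(dev, addr, len, val);
	return 1;		/* handled in the kernel, no exit needed */
}
```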
Get rid of kvm->lock dependency on coalesced_mmio methods. Use an
atomic variable instead to guarantee only one vcpu is batching
data into the ring at a given time.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>