
[2/4] KVM: move coalesced_mmio locking to its own device

Message ID 20090528044808.205238362@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti May 28, 2009, 4:45 a.m. UTC
Move coalesced_mmio locking to its own device, instead of relying on
kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Comments

Avi Kivity May 31, 2009, 12:14 p.m. UTC | #1
Marcelo Tosatti wrote:
> Move coalesced_mmio locking to its own device, instead of relying on
> kvm->lock.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>  	if (!is_write)
>  		return 0;
>  
> -	/* kvm->lock is taken by the caller and must be not released before
> -         * dev.read/write
> -         */
> +	spin_lock(&dev->lock);
>   

This unbalanced locking is still very displeasing.  At a minimum you 
need a sparse annotation to indicate it.

But I think it really indicates a problem with the io_device API.

Potential solutions:
- fold in_range() into ->write and ->read.  Make those functions 
responsible for both determining whether they can handle the range and 
performing the I/O.
- have a separate rwlock for the device list.
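For the first option, something like this (sketch only; the int return
convention and the helpers are illustrative, not the current
kvm_io_device API):

static int coalesced_mmio_write(struct kvm_io_device *this,
				gpa_t addr, int len, const void *val)
{
	struct kvm_coalesced_mmio_dev *dev =
			(struct kvm_coalesced_mmio_dev *)this->private;
	int ret = 0;

	spin_lock(&dev->lock);
	/* coalesced_mmio_in_zone()/_ring_full()/_queue() are hypothetical
	 * helpers factored out of today's in_range() and write() bodies */
	if (!coalesced_mmio_in_zone(dev, addr, len) ||
	    coalesced_mmio_ring_full(dev))
		ret = -EOPNOTSUPP;	/* caller tries the next device */
	else
		coalesced_mmio_queue(dev, addr, len, val);
	spin_unlock(&dev->lock);
	return ret;
}

The lock is then taken and released within a single callback, so the
unbalanced handoff between in_range() and write() disappears.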
Marcelo Tosatti June 1, 2009, 9:23 p.m. UTC | #2
On Sun, May 31, 2009 at 03:14:36PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Move coalesced_mmio locking to its own device, instead of relying on
>> kvm->lock.
>>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>>  	if (!is_write)
>>  		return 0;
>>  -	/* kvm->lock is taken by the caller and must be not released before
>> -         * dev.read/write
>> -         */
>> +	spin_lock(&dev->lock);
>>   
>
> This unbalanced locking is still very displeasing.  At a minimum you  
> need a sparse annotation to indicate it.
>
> But I think it really indicates a problem with the io_device API.
>
> Potential solutions:
> - fold in_range() into ->write and ->read.  Make those functions  
> responsible for both determining whether they can handle the range and  
> performing the I/O.
> - have a separate rwlock for the device list.

IMO the problem is the coalesced_mmio device. The unbalanced locking is
a result of the abuse of the in_range() and read/write() methods.

Normally you'd expect parallel accesses to in_range() to be allowed,
since it's just checking whether (aha) the access is in range, returning
a pointer to the device if positive. read/write() are the ones that
need serialization, since they touch the device's internal state.

coalesced_mmio abuses in_range() to do more things than it should.

Ideally we should fix coalesced_mmio, but I'm not going to do that now
(sorry, not confident in changing it without seeing it go through
intense torture testing).

That said, is a sparse annotation enough to convince you?
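
Something like the following is what I had in mind (untested sketch;
since the lock is taken only when in_range() succeeds, callers would
probably also need __cond_lock() to keep sparse's context tracking
happy):

/* in_range() takes dev->lock and, when it returns nonzero, leaves it
 * held for the subsequent ->write; annotate both halves of the
 * unbalanced pair so sparse (-Wcontext) can see the handoff.
 */
static int coalesced_mmio_in_range(struct kvm_io_device *this,
				   gpa_t addr, int len, int is_write)
	__acquires(&dev->lock);

static void coalesced_mmio_write(struct kvm_io_device *this,
				 gpa_t addr, int len, const void *val)
	__releases(&dev->lock);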

Avi Kivity June 1, 2009, 9:43 p.m. UTC | #3
Marcelo Tosatti wrote:
> On Sun, May 31, 2009 at 03:14:36PM +0300, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Move coalesced_mmio locking to its own device, instead of relying on
>>> kvm->lock.
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>
>>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> ===================================================================
>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> @@ -26,9 +26,7 @@ static int coalesced_mmio_in_range(struc
>>>  	if (!is_write)
>>>  		return 0;
>>>  -	/* kvm->lock is taken by the caller and must be not released before
>>> -         * dev.read/write
>>> -         */
>>> +	spin_lock(&dev->lock);
>>>   
>>>       
>> This unbalanced locking is still very displeasing.  At a minimum you  
>> need a sparse annotation to indicate it.
>>
>> But I think it really indicates a problem with the io_device API.
>>
>> Potential solutions:
>> - fold in_range() into ->write and ->read.  Make those functions  
>> responsible for both determining whether they can handle the range and  
>> performing the I/O.
>> - have a separate rwlock for the device list.
>>     
>
> IMO the problem is the coalesced_mmio device. The unbalanced locking is
> a result of the abuse of the in_range() and read/write() methods.
>
>   

Okay, the penny has dropped.  I understand now.

> Normally you'd expect parallel accesses to in_range() to be allowed,
> since it's just checking whether (aha) the access is in range, returning
> a pointer to the device if positive. read/write() are the ones that
> need serialization, since they touch the device's internal state.
>
> coalesced_mmio abuses in_range() to do more things than it should.
>
> Ideally we should fix coalesced_mmio, but I'm not going to do that now
> (sorry, not confident in changing it without seeing it go through
> intense torture testing).
>   

It's not trivial since it's userspace that clears the ring, and we can't 
wait on userspace.

> That said, is a sparse annotation enough to convince you?
>   

Let me have a look at fixing coalesced_mmio first.  We might allow 
->write to fail, causing a fallback to userspace.  Or we could fail if 
n_avail < MAX_VCPUS, so even the worst-case race leaves us one entry.
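
Roughly (untested, field names as in the patch below):

	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
	u32 avail;

	/* free slots in the ring; refuse to batch unless even the
	 * worst-case race (every vcpu passing the check at once)
	 * still leaves room */
	avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;
	if (avail < KVM_MAX_VCPUS)
		goto out_denied;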
Marcelo Tosatti June 4, 2009, 6:08 p.m. UTC | #4
This is to fix a deadlock reported by Alex Williamson, while at the
same time making it easier to allow PIO/MMIO regions to be
registered/unregistered while a guest is alive.
Avi Kivity June 8, 2009, 9:18 a.m. UTC | #5
Marcelo Tosatti wrote:
> This is to fix a deadlock reported by Alex Williamson, while at the
> same time making it easier to allow PIO/MMIO regions to be
> registered/unregistered while a guest is alive.
>   

Applied all, thanks.  I also changed the coalesced_mmio overflow check 
to account for KVM_MAX_VCPUS.

Patch

Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
+++ kvm-irqlock/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,7 @@  static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-         * dev.read/write
-         */
+	spin_lock(&dev->lock);
 
 	/* Are we able to batch it ? */
 
@@ -41,7 +39,7 @@  static int coalesced_mmio_in_range(struc
 							KVM_COALESCED_MMIO_MAX;
 	if (next == dev->kvm->coalesced_mmio_ring->first) {
 		/* full */
-		return 0;
+		goto out_denied;
 	}
 
 	/* is it in a batchable area ? */
@@ -57,6 +55,8 @@  static int coalesced_mmio_in_range(struc
 		    addr + len <= zone->addr + zone->size)
 			return 1;
 	}
+out_denied:
+	spin_unlock(&dev->lock);
 	return 0;
 }
 
@@ -67,8 +67,6 @@  static void coalesced_mmio_write(struct 
 				(struct kvm_coalesced_mmio_dev*)this->private;
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-	/* kvm->lock must be taken by caller before call to in_range()*/
-
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
@@ -76,6 +74,7 @@  static void coalesced_mmio_write(struct 
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
 	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	spin_unlock(&dev->lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +89,8 @@  int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	spin_lock_init(&dev->lock);
+
 	dev->dev.write  = coalesced_mmio_write;
 	dev->dev.in_range  = coalesced_mmio_in_range;
 	dev->dev.destructor  = coalesced_mmio_destructor;
Index: kvm-irqlock/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.h
+++ kvm-irqlock/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ 
 struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	spinlock_t lock;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };