
[2/4] KVM: move coalesced_mmio locking to its own device

Message ID 20090518170855.346048603@localhost.localdomain
State New, archived

Commit Message

Marcelo Tosatti May 18, 2009, 4:56 p.m. UTC
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Comments

Avi Kivity May 20, 2009, 12:06 p.m. UTC | #1
Marcelo Tosatti wrote:
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm.orig/virt/kvm/coalesced_mmio.c
> +++ kvm/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,10 @@ static int coalesced_mmio_in_range(struc
>  	if (!is_write)
>  		return 0;
>  
> -	/* kvm->lock is taken by the caller and must be not released before
> -         * dev.read/write
> -         */
> +	/*
> + 	 * dev->lock must not be released before coalesced_mmio_write.
> +	 */
> +	mutex_lock(&dev->lock);
>  
>  	/* Are we able to batch it ? */
>  
> @@ -41,6 +42,7 @@ static int coalesced_mmio_in_range(struc
>  							KVM_COALESCED_MMIO_MAX;
>  	if (next == dev->kvm->coalesced_mmio_ring->first) {
>  		/* full */
> +		mutex_unlock(&dev->lock);
>  		return 0;
>  	}
>  
> @@ -57,6 +59,7 @@ static int coalesced_mmio_in_range(struc
>  		    addr + len <= zone->addr + zone->size)
>  			return 1;
>  	}
> +	mutex_unlock(&dev->lock);
>  	return 0;
>  }
>   

So we have a function that takes a lock and conditionally releases it?
Marcelo Tosatti May 20, 2009, 2:09 p.m. UTC | #2
On Wed, May 20, 2009 at 03:06:26PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,10 @@ static int coalesced_mmio_in_range(struc
>>  	if (!is_write)
>>  		return 0;
>>  
>> -	/* kvm->lock is taken by the caller and must be not released before
>> -         * dev.read/write
>> -         */
>> +	/*
>> + 	 * dev->lock must not be released before coalesced_mmio_write.
>> +	 */
>> +	mutex_lock(&dev->lock);
>>  
>>  	/* Are we able to batch it ? */
>>  
>> @@ -41,6 +42,7 @@ static int coalesced_mmio_in_range(struc
>>  							KVM_COALESCED_MMIO_MAX;
>>  	if (next == dev->kvm->coalesced_mmio_ring->first) {
>>  		/* full */
>> +		mutex_unlock(&dev->lock);
>>  		return 0;
>>  	}
>>  
>> @@ -57,6 +59,7 @@ static int coalesced_mmio_in_range(struc
>>  		    addr + len <= zone->addr + zone->size)
>>  			return 1;
>>  	}
>> +	mutex_unlock(&dev->lock);
>>  	return 0;
>>  }
>>   
>
> So we have a function that takes a lock and conditionally releases it?

Yes, but it is correct: it will only return with the lock held in case
it returns 1, in which case it's guaranteed that ->write will be called
(which will unlock it).

It should check the range first and/or use some smarter synchronization,
but one thing at a time.
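
To make the invariant concrete: below is a minimal, self-contained
userspace sketch of the contract (pthreads stand in for the kernel
mutex; all names are illustrative, not the actual KVM symbols).

/* Userspace analogue of the ->in_range/->write locking contract.
 * Illustrative names only; not the actual KVM code. */
#include <pthread.h>
#include <stdio.h>

struct dev {
        pthread_mutex_t lock;
        int ring_used, ring_size;
};

/* Returns 1 with d->lock HELD (the caller is then guaranteed to call
 * dev_write(), which unlocks it), or 0 with the lock already released
 * (the caller must not call dev_write()). */
static int dev_in_range(struct dev *d)
{
        pthread_mutex_lock(&d->lock);
        if (d->ring_used == d->ring_size) {     /* the "full" case */
                pthread_mutex_unlock(&d->lock);
                return 0;
        }
        return 1;                               /* lock intentionally held */
}

static void dev_write(struct dev *d)
{
        d->ring_used++;                         /* ring insert, under lock */
        pthread_mutex_unlock(&d->lock);         /* pairs with dev_in_range() */
}

int main(void)
{
        struct dev d = { PTHREAD_MUTEX_INITIALIZER, 0, 2 };

        while (dev_in_range(&d))
                dev_write(&d);
        printf("ring filled with %d entries\n", d.ring_used);
        return 0;
}

Every dev_in_range() that returns 1 is balanced by exactly one
dev_write(), so the lock is never leaked; a 0 return means the lock
has already been dropped.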

Avi Kivity May 20, 2009, 2:29 p.m. UTC | #3
Marcelo Tosatti wrote:
>> So we have a function that takes a lock and conditionally releases it?
>>     
>
> Yes, but it is correct: it will only return with the lock held in case
> it returns 1, in which case it's guaranteed that ->write will be called
> (which will unlock it).
>
> It should check the range first and/or use some smarter synchronization,
> but one thing at a time.
>   

Yes, it's correct, but we'll get an endless stream of patches to 'fix' it
because it is so unorthodox.
Marcelo Tosatti May 20, 2009, 3:13 p.m. UTC | #4
On Wed, May 20, 2009 at 05:29:23PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> So we have a function that takes a lock and conditionally releases it?
>>>     
>>
>> Yes, but it is correct: it will only return with the lock held in case
>> it returns 1, in which case it's guaranteed that ->write will be called
>> (which will unlock it).
>>
>> It should check the range first and/or use some smarter synchronization,
>> but one thing at a time.
>>   
>
> Yes, it's correct, but we'll get an endless stream of patches to 'fix' it
> because it is so unorthodox.

Does it have to guarantee any kind of ordering in case of parallel
writes by distinct vcpus? This is what it does now (so if a vcpu
arrives first, the second will wait until the first is finished
processing).

I suppose that is the responsibility of the guest (if it does MMIO
writes to a device in parallel it'll screw up on real HW too).

Because in that case, you can drop the mutex and protect only the kvm
data.
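
As a sketch of what that would look like (reusing the illustrative
struct dev from the sketch above; this is not the actual patch), only
the ring update itself stays under the lock, and the write side has to
re-check for a full ring:

/* Alternative: no lock held across the in_range/write pair; only the
 * ring data itself is protected.  Illustrative sketch. */
static int dev_in_range_unordered(struct dev *d)
{
        int ok;

        pthread_mutex_lock(&d->lock);
        ok = (d->ring_used < d->ring_size);     /* advisory check only */
        pthread_mutex_unlock(&d->lock);         /* dropped before write */
        return ok;
}

static void dev_write_unordered(struct dev *d)
{
        pthread_mutex_lock(&d->lock);
        if (d->ring_used < d->ring_size)        /* re-check: the earlier
                                                 * answer may be stale */
                d->ring_used++;
        pthread_mutex_unlock(&d->lock);
}

Two vcpus may then interleave between the check and the insert, so
their writes can land in the ring in either order, which is acceptable
only if, as said above, ordering is the guest's problem.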


Marcelo Tosatti May 20, 2009, 3:22 p.m. UTC | #5
On Wed, May 20, 2009 at 12:13:03PM -0300, Marcelo Tosatti wrote:
> On Wed, May 20, 2009 at 05:29:23PM +0300, Avi Kivity wrote:
> > Marcelo Tosatti wrote:
> >>> So we have a function that takes a lock and conditionally releases it?
> >>>     
> >>
> >> Yes, but it is correct: it will only return with the lock held in case
> >> it returns 1, in which case it's guaranteed that ->write will be called
> >> (which will unlock it).
> >>
> >> It should check the range first and/or use some smarter synchronization,
> >> but one thing at a time.
> >>   
> >
> > Yes, it's correct, but we'll get an endless stream of patches to 'fix' it
> > because it is so unorthodox.
> 
> Does it have to guarantee any kind of ordering in case of parallel
> writes by distinct vcpus? This is what it does now (so if a vcpu
> arrives first, the second will wait until the first is finished
> processing).
> 
> I suppose that is the responsibility of the guest (if it does MMIO
> writes to a device in parallel it'll screw up on real HW too).
> 
> Because in that case, you can drop the mutex and protect only the kvm
> data.

If you want it to provide ordering (that is, to process the MMIO
writes by multiple vcpus in the order they happen), you need a mutex
or spinlock.

And in that case, I don't see any other way around this, given how
->in_range / ->read / ->write work.

Even if you change the order in which the full mmio-buffer check and
the coalesced-allowed-in-this-range check happen, you still need a
spinlock/mutex in this unorthodox way.
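
Concretely, with one free slot left in the ring and the lock not held
across the pair (as in the unordered sketch above), this interleaving
is possible:

        vcpu0: in_range() -> sees "one slot left", drops lock, returns 1
        vcpu1: in_range() -> also sees "one slot left", drops lock, returns 1
        vcpu1: write()    -> takes the last slot
        vcpu0: write()    -> finds the ring full; its entry can only be
                             dropped (or, without a re-check, overflow)

Holding dev->lock from ->in_range through ->write, as the patch does,
makes the check and the insert atomic and closes that window.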
Avi Kivity May 20, 2009, 3:22 p.m. UTC | #6
Marcelo Tosatti wrote:
>> Yes, it's correct, but we'll get an endless stream of patches to 'fix' it
>> because it is so unorthodox.
>>     
>
> Does it have to guarantee any kind of ordering in case of parallel
> writes by distinct vcpus? This is what it does now (so if a vcpu
> arrives first, the second will wait until the first is finished
> processing).
>   

No.

> I suppose that is the responsibility of the guest (if it does MMIO
> writes to a device in parallel it'll screw up on real HW too).
>   

Yes.

> Because in that case, you can drop the mutex and protect only the kvm
> data.
>   

One has to be before the other.  The order doesn't matter, but both have 
to be there.
Marcelo Tosatti May 20, 2009, 6:48 p.m. UTC | #7
Addressing the comment, and covering irqfd as well (I did not remove the
deadlock avoidance there since it might be useful later).

Patch

Index: kvm/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm.orig/virt/kvm/coalesced_mmio.c
+++ kvm/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,10 @@  static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-         * dev.read/write
-         */
+	/*
+ 	 * dev->lock must not be released before coalesced_mmio_write.
+	 */
+	mutex_lock(&dev->lock);
 
 	/* Are we able to batch it ? */
 
@@ -41,6 +42,7 @@  static int coalesced_mmio_in_range(struc
 							KVM_COALESCED_MMIO_MAX;
 	if (next == dev->kvm->coalesced_mmio_ring->first) {
 		/* full */
+		mutex_unlock(&dev->lock);
 		return 0;
 	}
 
@@ -57,6 +59,7 @@  static int coalesced_mmio_in_range(struc
 		    addr + len <= zone->addr + zone->size)
 			return 1;
 	}
+	mutex_unlock(&dev->lock);
 	return 0;
 }
 
@@ -67,8 +70,6 @@  static void coalesced_mmio_write(struct 
 				(struct kvm_coalesced_mmio_dev*)this->private;
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-	/* kvm->lock must be taken by caller before call to in_range()*/
-
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
@@ -76,6 +77,7 @@  static void coalesced_mmio_write(struct 
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
 	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	mutex_unlock(&dev->lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +92,8 @@  int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	mutex_init(&dev->lock);
+
 	dev->dev.write  = coalesced_mmio_write;
 	dev->dev.in_range  = coalesced_mmio_in_range;
 	dev->dev.destructor  = coalesced_mmio_destructor;
@@ -109,16 +113,16 @@  int kvm_vm_ioctl_register_coalesced_mmio
 	if (dev == NULL)
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&dev->lock);
 	if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&dev->lock);
 		return -ENOBUFS;
 	}
 
 	dev->zone[dev->nb_zones] = *zone;
 	dev->nb_zones++;
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&dev->lock);
 	return 0;
 }
 
@@ -132,7 +136,7 @@  int kvm_vm_ioctl_unregister_coalesced_mm
 	if (dev == NULL)
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&dev->lock);
 
 	i = dev->nb_zones;
 	while(i) {
@@ -150,7 +154,7 @@  int kvm_vm_ioctl_unregister_coalesced_mm
 		i--;
 	}
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&dev->lock);
 
 	return 0;
 }
Index: kvm/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm.orig/virt/kvm/coalesced_mmio.h
+++ kvm/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ 
 struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	struct mutex lock;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };