
[2/4] KVM: move coalesced_mmio locking to its own device

Message ID 20090520185134.623968425@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti May 20, 2009, 6:48 p.m. UTC
Get rid of kvm->lock dependency on coalesced_mmio methods. Use an 
atomic variable instead to guarantee only one vcpu is batching
data into the ring at a given time.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
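
For context, the atomic_add_unless() call in the patch is being used as a
trylock. A minimal sketch of the idiom (illustrative only, not part of the
patch; the names are stand-ins):

	/*
	 * Illustrative only -- "guard" stands in for dev->in_use.
	 * atomic_add_unless(&v, 1, 1) increments v only if v was not
	 * already 1 and returns zero otherwise, so it acts as a trylock.
	 */
	static atomic_t guard = ATOMIC_INIT(0);

	static int try_enter(void)
	{
		return atomic_add_unless(&guard, 1, 1);	/* non-zero on success */
	}

	static void leave(void)
	{
		smp_wmb();		/* publish prior stores first */
		atomic_set(&guard, 0);
	}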

Comments

Avi Kivity May 24, 2009, 2:04 p.m. UTC | #1
Marcelo Tosatti wrote:
> Get rid of kvm->lock dependency on coalesced_mmio methods. Use an 
> atomic variable instead to guarantee only one vcpu is batching
> data into the ring at a given time.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
> ===================================================================
> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>  	if (!is_write)
>  		return 0;
>  
> -	/* kvm->lock is taken by the caller and must be not released before
> -         * dev.read/write
> -         */
> +	/*
> + 	 * Some other vcpu might be batching data into the ring,
> + 	 * fallback to userspace. Ordering not our problem.
> + 	 */
> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
> +		return 0;
>  
>   

Ordering with simultaneous writes is indeed not our problem, but the 
ring may contain ordered writes (even by the same vcpu!)

Suggest our own lock here.  in_use is basically a homemade lock, better 
to use the factory made ones which come with a warranty.
Marcelo Tosatti May 25, 2009, 11:04 a.m. UTC | #2
On Sun, May 24, 2009 at 05:04:52PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Get rid of kvm->lock dependency on coalesced_mmio methods. Use an  
>> atomic variable instead to guarantee only one vcpu is batching
>> data into the ring at a given time.
>>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
>> ===================================================================
>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>>  	if (!is_write)
>>  		return 0;
>>  -	/* kvm->lock is taken by the caller and must be not released before
>> -         * dev.read/write
>> -         */
>> +	/*
>> + 	 * Some other vcpu might be batching data into the ring,
>> + 	 * fallback to userspace. Ordering not our problem.
>> + 	 */
>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>> +		return 0;
>>    
>
> Ordering with simultaneous writes is indeed not our problem, but the  
> ring may contain ordered writes (even by the same vcpu!)

You mean writes by a vcpu should maintain ordering even when that vcpu
migrates to a different pcpu? 

The smp_wmb in coalesced_write guarantees that.
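
Concretely, as arranged in the patch below, the barrier orders the vcpu's ring
stores before the in_use slot is released (simplified excerpt of
coalesced_mmio_write(), not new code):

	ring->coalesced_mmio[ring->last].phys_addr = addr;
	ring->coalesced_mmio[ring->last].len = len;
	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
	smp_wmb();			/* ring stores are visible ...      */
	atomic_set(&dev->in_use, 0);	/* ... before the slot is released  */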

> Suggest our own lock here.  in_use is basically a homemade lock, better  
> to use the factory made ones which come with a warranty.

The first submission had a mutex, but that was considered unorthodox.
Although I agree with the argument you made then, I don't see a way around it.

Some pseudocode please? 

I agree with the suggestion on patch 3 and will fix the comment in
kvm_host.h.

Avi Kivity May 26, 2009, 11:24 a.m. UTC | #3
Marcelo Tosatti wrote:
>>> ===================================================================
>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>>>  	if (!is_write)
>>>  		return 0;
>>>  -	/* kvm->lock is taken by the caller and must be not released before
>>> -         * dev.read/write
>>> -         */
>>> +	/*
>>> + 	 * Some other vcpu might be batching data into the ring,
>>> + 	 * fallback to userspace. Ordering not our problem.
>>> + 	 */
>>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>>> +		return 0;
>>>    
>>>       
>> Ordering with simultaneous writes is indeed not our problem, but the  
>> ring may contain ordered writes (even by the same vcpu!)
>>     
>
> You mean writes by a vcpu should maintain ordering even when that vcpu
> migrates to a different pcpu? 
>   

No.  Writes by a single vcpu need to preserve their ordering relative to 
each other.  If a write by vcpu A completes before a write by vcpu B 
starts, then they need to be ordered, since the vcpus may have 
synchronized in between.

Hm, I don't think you broke these rules.

> The smp_wmb in coalesced_write guarantees that.
>
>   
>> Suggest our own lock here.  in_use is basically a homemade lock, better  
>> to use the factory made ones which come with a warranty.
>>     
>
> The first submission had a mutex, but that was considered unorthodox.
> Although I agree with the argument you made then, I don't see a way around it.
>
> Some pseudocode please? 
>   

Why not use slots_lock to protect the entire iodevice list (rcu one 
day), and an internal spinlock for coalesced mmio?
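
A rough sketch of that split (hypothetical code, not from the thread; the
helper names and lock placement are assumptions for illustration): the caller
walks the io-device list under kvm->slots_lock (an rwsem), while
coalesced_mmio protects its ring with a private spinlock.

	struct kvm_coalesced_mmio_dev {
		struct kvm_io_device dev;
		struct kvm *kvm;
		spinlock_t ring_lock;		/* guards the coalesced ring */
		int nb_zones;
		struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
	};

	/* Caller side (sketch): device lookup happens under slots_lock. */
	static void emulate_mmio_write(struct kvm *kvm, gpa_t addr, int len,
				       const void *val)
	{
		struct kvm_io_device *dev;

		down_read(&kvm->slots_lock);
		dev = kvm_io_bus_find_dev(&kvm->mmio_bus, addr, len, 1);
		if (dev)
			dev->write(dev, addr, len, val);  /* takes ring_lock inside */
		up_read(&kvm->slots_lock);
	}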
Marcelo Tosatti May 26, 2009, 1:15 p.m. UTC | #4
On Tue, May 26, 2009 at 02:24:33PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>>> ===================================================================
>>>> --- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
>>>> +++ kvm-irqlock/virt/kvm/coalesced_mmio.c
>>>> @@ -26,9 +26,12 @@ static int coalesced_mmio_in_range(struc
>>>>  	if (!is_write)
>>>>  		return 0;
>>>>  -	/* kvm->lock is taken by the caller and must be not released before
>>>> -         * dev.read/write
>>>> -         */
>>>> +	/*
>>>> + 	 * Some other vcpu might be batching data into the ring,
>>>> + 	 * fallback to userspace. Ordering not our problem.
>>>> + 	 */
>>>> +	if (!atomic_add_unless(&dev->in_use, 1, 1))
>>>> +		return 0;
>>>>          
>>> Ordering with simultaneous writes is indeed not our problem, but the  
>>> ring may contain ordered writes (even by the same vcpu!)
>>>     
>>
>> You mean writes by a vcpu should maintain ordering even when that vcpu
>> migrates to a different pcpu?   
>
> No.  Writes by a single vcpu need to preserve their ordering relative to  
> each other.  If a write by vcpu A completes before a write by vcpu B  
> starts, then they need to be ordered, since the vcpus may have  
> synchronized in between.
>
> Hm, I don't think you broke these rules.
>
>> The smp_wmb in coalesced_write guarantees that.
>>
>>   
>>> Suggest our own lock here.  in_use is basically a homemade lock, 
>>> better  to use the factory made ones which come with a warranty.
>>>     
>>
>> The first submission had a mutex, but that was considered unorthodox.
>> Although I agree with the argument you made then, I don't see a way around it.
>>
>> Some pseudocode please?   
>
> Why not use slots_lock to protect the entire iodevice list (rcu one  
> day), and an internal spinlock for coalesced mmio?

I don't like using slots_lock to protect the entire iodevice list; that's
reverse progress in my opinion. The PIO/MMIO device lists are data
structures used in the hot path; they are not directly related to the
memslots and therefore deserve a separate lock. Whenever slots_lock
becomes an issue, you'll have to untangle the mess, so better to start
doing it now.

Sure, I can switch to an internal spinlock for coalesced_mmio, but
breaking out if spin_trylock fails. Are you OK with that?



Avi Kivity May 26, 2009, 1:27 p.m. UTC | #5
Marcelo Tosatti wrote:
>>
>> Why not use slots_lock to protect the entire iodevice list (rcu one  
>> day), and an internal spinlock for coalesced mmio?
>>     
>
> I don't like using slots_lock to protect the entire iodevice list; that's
> reverse progress in my opinion. The PIO/MMIO device lists are data
> structures used in the hot path; they are not directly related to the
> memslots and therefore deserve a separate lock. Whenever slots_lock
> becomes an issue, you'll have to untangle the mess, so better to start
> doing it now.
>   

It's a reader/writer lock, so you'll never have contention.

I'm okay with a new lock though.

> Sure, I can switch to an internal spinlock for coalesced_mmio, but
> breaking out if spin_trylock fails. Are you OK with that?
>   

Why not do a normal spin_lock()?  It's not like you'll wait years for 
the store to complete.

Patch

Index: kvm-irqlock/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.c
+++ kvm-irqlock/virt/kvm/coalesced_mmio.c
@@ -26,9 +26,12 @@  static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-         * dev.read/write
-         */
+	/*
+ 	 * Some other vcpu might be batching data into the ring,
+ 	 * fallback to userspace. Ordering not our problem.
+ 	 */
+	if (!atomic_add_unless(&dev->in_use, 1, 1))
+		return 0;
 
 	/* Are we able to batch it ? */
 
@@ -41,7 +44,7 @@  static int coalesced_mmio_in_range(struc
 							KVM_COALESCED_MMIO_MAX;
 	if (next == dev->kvm->coalesced_mmio_ring->first) {
 		/* full */
-		return 0;
+		goto out_denied;
 	}
 
 	/* is it in a batchable area ? */
@@ -57,6 +60,8 @@  static int coalesced_mmio_in_range(struc
 		    addr + len <= zone->addr + zone->size)
 			return 1;
 	}
+out_denied:
+	atomic_set(&dev->in_use, 0);
 	return 0;
 }
 
@@ -67,15 +72,14 @@  static void coalesced_mmio_write(struct 
 				(struct kvm_coalesced_mmio_dev*)this->private;
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-	/* kvm->lock must be taken by caller before call to in_range()*/
-
 	/* copy data in first free entry of the ring */
 
 	ring->coalesced_mmio[ring->last].phys_addr = addr;
 	ring->coalesced_mmio[ring->last].len = len;
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
-	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	smp_wmb();
+	atomic_set(&dev->in_use, 0);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -90,6 +94,8 @@  int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	atomic_set(&dev->in_use, 0);
+
 	dev->dev.write  = coalesced_mmio_write;
 	dev->dev.in_range  = coalesced_mmio_in_range;
 	dev->dev.destructor  = coalesced_mmio_destructor;
Index: kvm-irqlock/virt/kvm/coalesced_mmio.h
===================================================================
--- kvm-irqlock.orig/virt/kvm/coalesced_mmio.h
+++ kvm-irqlock/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@ 
 struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	atomic_t in_use;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };