
[1/6] kvm-s390: Fix memory slot versus run

Message ID 1241534358-32172-2-git-send-email-ehrhardt@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

ehrhardt@linux.vnet.ibm.com May 5, 2009, 2:39 p.m. UTC
From: Carsten Otte <cotte@de.ibm.com>

This patch fixes an incorrectness in the kvm backend for s390.
In case virtual cpus are being created before the corresponding
memory slot is being registered, we need to update the sie
control blocks for the virtual cpus. In order to do that, we
use the vcpu->mutex to lock out kvm_run and friends. This way
we can ensure a consistent update of the memory for the entire
smp configuration.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)


Comments

Avi Kivity May 6, 2009, 12:01 p.m. UTC | #1
ehrhardt@linux.vnet.ibm.com wrote:
> From: Carsten Otte <cotte@de.ibm.com>
>
> This patch fixes an incorrectness in the kvm backend for s390.
> In case virtual cpus are being created before the corresponding
> memory slot is being registered, we need to update the sie
> control blocks for the virtual cpus. In order to do that, we
> use the vcpu->mutex to lock out kvm_run and friends. This way
> we can ensure a consistent update of the memory for the entire
> smp configuration.
> @@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
>  				struct kvm_memory_slot old,
>  				int user_alloc)
>  {
> +	int i;
> +
>  	/* A few sanity checks. We can have exactly one memory slot which has
>  	   to start at guest virtual zero and which has to be located at a
>  	   page boundary in userland and which has to end at a page boundary.
> @@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
>  	if (mem->memory_size & (PAGE_SIZE - 1))
>  		return -EINVAL;
>  
> +	/* lock all vcpus */
> +	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> +		if (kvm->vcpus[i])
> +			mutex_lock(&kvm->vcpus[i]->mutex);
> +	}
> +
>  	

Can't that livelock?  Nothing requires a vcpu to ever exit, and if the 
cpu it's running on has no other load and no interrupts, it could 
remain in guest mode indefinitely, and then the ioctl will hang, 
waiting for something to happen.

On x86, we use slots_lock to protect memory slots.  When we change the 
global memory configuration, we set a bit in vcpu->requests, and send an 
IPI to all cpus that are currently in guest mode for our guest.  This 
forces the cpu back to host mode.  On the next entry, vcpu_run notices 
vcpu->requests has the bit set and reloads the mmu configuration.  Of 
course, all this may be overkill for s390.
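
A minimal sketch of the update side of that mechanism, assuming the
request-bit API of that era (vcpu->requests is a bitmask and
kvm_vcpu_kick() sends the IPI); KVM_REQ_RELOAD_MEM is a stand-in name,
the real path being kvm_reload_remote_mmus() with KVM_REQ_MMU_RELOAD:

#include <linux/kvm_host.h>

/* Sketch only: mark every vcpu and force it out of guest mode so the
 * new memory configuration is picked up on the next entry. */
static void force_memconfig_reload(struct kvm *kvm)
{
	int i;

	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu)
			continue;
		set_bit(KVM_REQ_RELOAD_MEM, &vcpu->requests);
		kvm_vcpu_kick(vcpu);	/* IPI if the vcpu is in guest mode */
	}
}
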
ehrhardt@linux.vnet.ibm.com May 11, 2009, 1 p.m. UTC | #2
Avi Kivity wrote:
> ehrhardt@linux.vnet.ibm.com wrote:
>> From: Carsten Otte <cotte@de.ibm.com>
>>
>> This patch fixes an incorrectness in the kvm backend for s390.
>> In case virtual cpus are being created before the corresponding
>> memory slot is being registered, we need to update the sie
>> control blocks for the virtual cpus. In order to do that, we
>> use the vcpu->mutex to lock out kvm_run and friends. This way
>> we can ensure a consistent update of the memory for the entire
>> smp configuration.
>> @@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
>>                  struct kvm_memory_slot old,
>>                  int user_alloc)
>>  {
>> +    int i;
>> +
>>      /* A few sanity checks. We can have exactly one memory slot 
>> which has
>>         to start at guest virtual zero and which has to be located at a
>>         page boundary in userland and which has to end at a page 
>> boundary.
>> @@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
>>      if (mem->memory_size & (PAGE_SIZE - 1))
>>          return -EINVAL;
>>  
>> +    /* lock all vcpus */
>> +    for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> +        if (kvm->vcpus[i])
>> +            mutex_lock(&kvm->vcpus[i]->mutex);
>> +    }
>> +
>>     
>
> Can't that livelock?  Nothing requires a vcpu to ever exit, and if the 
> cpu on which it's running on has no other load and no interrupts, it 
> could remain in guest mode indefinitely, and then the ioctl will hang, 
> waiting for something to happen.
>
Yes it could wait indefinitely - good spot.

> On x86, we use slots_lock to protect memory slots.  When we change the 
> global memory configuration, we set a bit in vcpu->requests, and send 
> an IPI to all cpus that are currently in guest mode for our guest.  
> This forces the cpu back to host mode.  On the next entry, vcpu_run 
> notices vcpu->requests has the bit set and reloads the mmu 
> configuration.  Of course, all this may be overkill for s390.
>
I thought about implementing it with slots_lock, vcpu->request, etc but 
it really looks like overkill for s390.
At least today we can assume that we only have one memslot. Therefore a 
set_memslot with already created vcpus will still not interfere with 
running vcpus (they can't run without a memslot, and since we have only 
one they won't run).
Anyway, the code is prepared to "meet" running vcpus, because that might 
be different in the future. To prevent the livelock issue I changed the 
code to use mutex_trylock, and in case I can't get the lock I explicitly 
let the vcpu exit from the guest.
Avi Kivity May 11, 2009, 1:15 p.m. UTC | #3
Christian Ehrhardt wrote:
>
>> On x86, we use slots_lock to protect memory slots.  When we change 
>> the global memory configuration, we set a bit in vcpu->requests, and 
>> send an IPI to all cpus that are currently in guest mode for our 
>> guest.  This forces the cpu back to host mode.  On the next entry, 
>> vcpu_run notices vcpu->requests has the bit set and reloads the mmu 
>> configuration.  Of course, all this may be overkill for s390.
>>
> I thought about implementing it with slots_lock, vcpu->request, etc 
> but it really looks like overkill for s390.

We could make (some of) it common code, so it won't look so bad.  
There's value in having all kvm ports do things similarly; though of 
course we shouldn't force the solution when it isn't really needed.

vcpu->requests is useful whenever we modify global VM state that needs 
to be seen by all vcpus in host mode; see  kvm_reload_remote_mmus().

> At least today we can assume that we only have one memslot. Therefore 
> a set_memslot with already created vcpu's will still not interfere 
> with running vcpus (they can't run without memslot and since we have 
> only one they won't run).
> Anyway, the code is prepared to "meet" running vcpus, because it 
> might be different in future. To prevent the livelock issue I changed 
> the code using mutex_trylock and in case I can't get the lock I 
> explicitly let the vcpu exit from guest.

Why not do it unconditionally?
ehrhardt@linux.vnet.ibm.com May 11, 2009, 1:46 p.m. UTC | #4
Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>
>>> On x86, we use slots_lock to protect memory slots.  When we change 
>>> the global memory configuration, we set a bit in vcpu->requests, and 
>>> send an IPI to all cpus that are currently in guest mode for our 
>>> guest.  This forces the cpu back to host mode.  On the next entry, 
>>> vcpu_run notices vcpu->requests has the bit set and reloads the mmu 
>>> configuration.  Of course, all this may be overkill for s390.
>>>
>> I thought about implementing it with slots_lock, vcpu->request, etc 
>> but it really looks like overkill for s390.
>
> We could make (some of) it common code, so it won't look so bad.  
> There's value in having all kvm ports do things similarly; though of 
> course we shouldn't force the solution when it isn't really needed.
>
> vcpu->requests is useful whenever we modify global VM state that needs 
> to be seen by all vcpus in host mode; see  kvm_reload_remote_mmus().
Yeah, I read that code after your first hint in this thread, and I agree 
that merging some of this into common code might be good.
But in my opinion not now for this bugfix patch (the intention is just 
to prevent a user from being able to crash the host via vcpu create, 
set mem and vcpu run in that order).
It might be a good point to further streamline this once we use the same 
userspace code, but I don't think it makes sense yet.
>
>> At least today we can assume that we only have one memslot. Therefore 
>> a set_memslot with already created vcpu's will still not interfere 
>> with running vcpus (they can't run without memslot and since we have 
>> only one they won't run).
>> Anyway, the code is prepared to "meet" running vcpus, because it 
>> might be different in future. To prevent the livelock issue I changed 
>> the code using mutex_trylock and in case I can't get the lock I 
>> explicitly let the vcpu exit from guest.
>
> Why not do it unconditionally?
>
Hmm, I might have written that misleadingly - eventually it's a loop 
until it gets the lock:
  while !trylock
    kick vcpu out of guest
    schedule

There is no reason to kick out guests where I got the lock cleanly, as 
far as I can see.
Especially as I expect the vcpus not to be running in the common case, 
as I explained above (they can't run without a memslot + we only have 
one => no vcpu will run).
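
Rendered as C, the loop described above would look roughly like the
sketch below; kick_vcpu_out_of_guest() is a placeholder for whatever
forces the vcpu to leave SIE, so this is only an illustration of the
idea, not the posted code:

#include <linux/kvm_host.h>
#include <linux/sched.h>

static void lock_all_vcpus(struct kvm *kvm)
{
	int i;

	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
		struct kvm_vcpu *vcpu = kvm->vcpus[i];

		if (!vcpu)
			continue;
		/* spin until the vcpu thread has left guest mode and
		 * released its mutex */
		while (!mutex_trylock(&vcpu->mutex)) {
			kick_vcpu_out_of_guest(vcpu);	/* placeholder */
			schedule();
		}
	}
}
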
Avi Kivity May 11, 2009, 2:02 p.m. UTC | #5
Christian Ehrhardt wrote:
>>> I thought about implementing it with slots_lock, vcpu->request, etc 
>>> but it really looks like overkill for s390.
>>
>> We could make (some of) it common code, so it won't look so bad.  
>> There's value in having all kvm ports do things similarly; though of 
>> course we shouldn't force the solution when it isn't really needed.
>>
>> vcpu->requests is useful whenever we modify global VM state that 
>> needs to be seen by all vcpus in host mode; see  
>> kvm_reload_remote_mmus().
> yeah I read that code after your first hint in that thread, and I 
> agree that merging some of this into common code might be good.
> But in my opinion not now for this bugfix patch (the intention is just 
> to prevent a user being able to crash the host via vcpu create,set 
> mem& and vcpu run in that order).
> It might be a good point to further streamline this once we use the 
> same userspace code, but I think it doesn't make sense yet.

Sure, don't mix bugfixes with infrastructure changes, when possible.

>>> At least today we can assume that we only have one memslot. 
>>> Therefore a set_memslot with already created vcpu's will still not 
>>> interfere with running vcpus (they can't run without memslot and 
>>> since we have only one they won't run).
>>> Anyway, the code is prepared to "meet" running vcpus, because it 
>>> might be different in future. To prevent the livelock issue I 
>>> changed the code using mutex_trylock and in case I can't get the 
>>> lock I explicitly let the vcpu exit from guest.
>>
>> Why not do it unconditionally?
>>
> hmm I might have written that misleading - eventually it's a loop 
> until it got the lock
>  while !trylock
>    kick vcpu out of guest
>    schedule
>
> There is no reason to kick out guests where I got the lock cleanly as 
> far as I see.
> Especially as I expect the vcpus not running in the common case as i 
> explained above (can't run without memslot + we only have one => no 
> vcpu will run).

Still livelockable, unless you stop the vcpu from entering the guest 
immediately.

That's why vcpu->requests is so powerful.  Not only do you kick the vcpu 
out of guest mode, you also force it to synchronize when it tries to 
enter again.
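
The entry-side half of that pattern, again only as a sketch
(KVM_REQ_RELOAD_MEM and reload_memconfig() are placeholders): the
vcpu_run loop re-checks the request bit before every guest entry, so a
kicked vcpu cannot re-enter with stale state.

#include <linux/kvm_host.h>

/* Sketch: called in the arch vcpu_run loop before each guest entry. */
static void sync_before_entry(struct kvm_vcpu *vcpu)
{
	if (test_and_clear_bit(KVM_REQ_RELOAD_MEM, &vcpu->requests))
		reload_memconfig(vcpu);	/* placeholder reload hook */
}
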
ehrhardt@linux.vnet.ibm.com May 11, 2009, 2:42 p.m. UTC | #6
Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>>> I thought about implementing it with slots_lock, vcpu->request, etc 
>>>> but it really looks like overkill for s390.
>>>
>>> We could make (some of) it common code, so it won't look so bad.  
>>> There's value in having all kvm ports do things similarly; though of 
>>> course we shouldn't force the solution when it isn't really needed.
>>>
>>> vcpu->requests is useful whenever we modify global VM state that 
>>> needs to be seen by all vcpus in host mode; see  
>>> kvm_reload_remote_mmus().
>> yeah I read that code after your first hint in that thread, and I 
>> agree that merging some of this into common code might be good.
>> But in my opinion not now for this bugfix patch (the intention is 
>> just to prevent a user being able to crash the host via vcpu 
>> create,set mem& and vcpu run in that order).
>> It might be a good point to further streamline this once we use the 
>> same userspace code, but I think it doesn't make sense yet.
>
> Sure, don't mix bugfixes with infrastructure changes, when possible.
>
>>>> At least today we can assume that we only have one memslot. 
>>>> Therefore a set_memslot with already created vcpu's will still not 
>>>> interfere with running vcpus (they can't run without memslot and 
>>>> since we have only one they won't run).
>>>> Anyway, the code is prepared to "meet" running vcpus, because it 
>>>> might be different in future. To prevent the livelock issue I 
>>>> changed the code using mutex_trylock and in case I can't get the 
>>>> lock I explicitly let the vcpu exit from guest.
>>>
>>> Why not do it unconditionally?
>>>
>> hmm I might have written that misleading - eventually it's a loop 
>> until it got the lock
>>  while !trylock
>>    kick vcpu out of guest
>>    schedule
>>
>> There is no reason to kick out guests where I got the lock cleanly as 
>> far as I see.
>> Especially as I expect the vcpus not running in the common case as i 
>> explained above (can't run without memslot + we only have one => no 
>> vcpu will run).
>
> Still livelockable, unless you stop the vcpu from entering the guest 
> immediately.
>
> That's why vcpu->requests is so powerful.  Not only you kick the vcpu 
> out of guest mode, you force it to synchronize when it tries to enter 
> again.
>

The bad thing about vcpu->requests in this case is that I don't want the 
async behaviour of vcpu->requests here; I want the memory slot updated 
in all vcpus when the ioctl returns.
Looking at a vcpu->requests based solution I don't find the 
synchronization I need. The changes to 
vcpu->arch.guest_origin/guest_memsize and the changes to 
vcpu->arch.sie_block->gmsor/gmslm need to happen without the vcpu 
running.
Therefore I want the vcpu lock _before_ I update both structs, otherwise 
it could be racy (at least on s390).

On the other hand, while it is very++ unlikely to happen, you are still 
right that it could theoretically livelock there.
I might use vcpu->requests to not enter vcpu_run again after such a 
"kick" out of guest state.
It would be checked on vcpu_run entry and could then drop the lock, call 
schedule, relock and check the flag again until it is cleared.
I'm not yet happy with this solution as I expect it to end up in 
something like a reference count, which would not fit into the existing 
vcpu->requests flags :-/

As I mentioned above, the changes to vcpu->arch and vcpu->arch->sie_block 
have to be exclusive with the vcpu not running.
If I could find some "transport" for the information I have in 
set_memory_slot (origin/size) until the next vcpu_run entry, I could do 
both changes there synchronously.
In that case I could really use your suggested solution with 
vcpu->requests: kick out unconditionally and set the values on the next 
(re-)entry.

Hmmm .. Maybe I can find all I need on reentry in vcpu->kvm->memslots[].
If I can change it that way it will definitely require some testing.
... to be continued :-)
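
If that works out, the reentry path could look roughly like this sketch;
the gmsor/gmslm/VIRTIODESCSPACE arithmetic is taken from the patch at
the bottom of this page, while the function itself and the single-slot
access are merely illustrative:

#include <linux/kvm_host.h>

static void resync_sie_memory(struct kvm_vcpu *vcpu)
{
	struct kvm_memory_slot *mem = &vcpu->kvm->memslots[0];
	unsigned long origin = mem->userspace_addr;
	unsigned long size = mem->npages << PAGE_SHIFT;

	vcpu->arch.sie_block->gmsor = origin;
	vcpu->arch.sie_block->gmslm = origin + size + VIRTIODESCSPACE - 1ul;
}
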
Avi Kivity May 11, 2009, 3:01 p.m. UTC | #7
Christian Ehrhardt wrote:
>
> The bad thing on vcpu->request in that case is that I don't want the 
> async behaviour of vcpu->requests in that case, I want the memory slot 
> updated in all vcpu's when the ioctl is returning.

You mean, the hardware can access the vcpu control block even when the 
vcpu is not running?

> Looking at vcpu->request based solution I don't find the 
> synchronization I need. The changes to  
> vcpu->arch.guest_origin/guest_memsize and the changes to 
> vcpu->arch.sie_block->gmsor/gmslm need to happen without the vcpu 
> running.
> Therefore I want the vcpu lock _before_ I update both structs, 
> otherwise it could be racy (at least on s390).
>
> On the other hand while it is very++ unlikely to happen you are still 
> right that it could theoretically livelock there.
> I might use vcpu->requests to not enter vcpu_run again after such a 
> "kick" out of guest state.
> It would be checked on vcpu_run enter and could then drop the lock, 
> call schedule, relock and check the flag again until it is cleared.
> I'm not yet happy with this solution as I expect it to end up in 
> something like a reference count which then would not fit into the 
> existing vcpu->request flags :-/
>
> As I mentioned above the changes to vcpu->arch and 
> vcpu->arch->sie_block have to be exclusive with the vcpu not running.
> If I would find something as "transport" for the information I have on 
> set_memory_slot (origin/size) until the next vcpu_run entry I could do 
> both changes there synchronously.

The information can be found in kvm->memslots.

> In that case I could really use your suggested solution with 
> vcpu->request, kick out unconditionally and set values on next 
> (re-)entry.
>
> Hmmm .. Maybe I can find all I need on reentry in vcpu->kvm->memslots[].

Err...

> If I can change it that way it will definitely require some testing.
> ... to be continued :-)

I definitely recommend it -- would bring s390 more in line with the 
other ports (I know it's a backward step for you :)

Note our plan is to change slots_lock to RCU, so it's even better if you 
use memslots.
ehrhardt@linux.vnet.ibm.com May 12, 2009, 9:15 a.m. UTC | #8
Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>
>> The bad thing on vcpu->request in that case is that I don't want the 
>> async behaviour of vcpu->requests in that case, I want the memory 
>> slot updated in all vcpu's when the ioctl is returning.
>
> You mean, the hardware can access the vcpu control block even when the 
> vcpu is not running? 
No, hardware only uses it with a running vcpu, but I realised my own 
fault while changing the code to the vcpu->requests style.
For s390 I need to update the KVM->arch and *all* 
vcpu->arch->sie_block... data synchronously.
That makes the "per vcpu resync on next entry" approach not feasible.

On the other hand I realized at the same moment that the livelock should 
be no issue for us, because as I mentioned:
a) we have only one memslot
b) a vcpu can't run without a memslot
So I don't even need to kick out vcpus, they just should not be running.
Until we ever support multiple slots, or updates of the existing single 
slot, this should be ok, and so should the bugfix patch.
To avoid a theoretical deadlock in case other code changes (badly), it 
should be fair to acquire the locks with mutex_trylock and return 
-EINVAL if we did not get all of them.
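
A minimal sketch of that "all or nothing" locking, not the code that was
actually posted:

#include <linux/kvm_host.h>

static int lock_all_vcpus_or_fail(struct kvm *kvm)
{
	int i, j;

	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
		if (!kvm->vcpus[i])
			continue;
		if (!mutex_trylock(&kvm->vcpus[i]->mutex)) {
			/* unwind the locks already taken and give up */
			for (j = 0; j < i; ++j)
				if (kvm->vcpus[j])
					mutex_unlock(&kvm->vcpus[j]->mutex);
			return -EINVAL;
		}
	}
	return 0;
}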

[...]
>> If I can change it that way it will definitely require some testing.
>> ... to be continued :-)
>
> I definitely recommend it -- would bring s390 more in line with the 
> other ports (I know it's a backward step for you :)
>
> Note our plan is to change slots_lock to RCU, so it's even better if 
> you use memslots.
As long as we have the special conditions mentioned above I think it's ok 
to implement it the way I do it now.
I agree that if we ever support multiple memslots we should strive for a 
common solution.

P.S. the second patch in the series ensures that a vcpu really never 
runs without a memslot being set, as this was another bug we had.
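
That second patch is not shown on this page; purely as a guess at the
kind of guard it adds (guest_memsize is the field used in the patch
below, the check itself is hypothetical), something along these lines in
the KVM_RUN path:

#include <linux/kvm_host.h>

static int check_memslot_registered(struct kvm_vcpu *vcpu)
{
	/* refuse to run a vcpu before a memory slot has been set */
	if (!vcpu->kvm->arch.guest_memsize)
		return -EINVAL;
	return 0;
}
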
Avi Kivity May 12, 2009, 11:35 a.m. UTC | #9
Christian Ehrhardt wrote:
> Avi Kivity wrote:
>> Christian Ehrhardt wrote:
>>>
>>> The bad thing on vcpu->request in that case is that I don't want the 
>>> async behaviour of vcpu->requests in that case, I want the memory 
>>> slot updated in all vcpu's when the ioctl is returning.
>>
>> You mean, the hardware can access the vcpu control block even when 
>> the vcpu is not running? 
> No, hardware only uses it with a running vcpu, but I realised my own 
> fault while changing the code to vcpu->request style.
> For s390 I need to update the KVM->arch and *all* 
> vcpu->arch->sie_block... data synchronously.

Out of interest, can you explain why?

> That makes the "per vcpu resync on next entry" approach not feasible.
>
> On the other hand I realized at the same moment that the livelock 
> should be no issue for us, because as I mentioned:
> a) only one memslot
> b) a vcpu can't run without memslot
> So I don't even need to kick out vcpu's, they just should not be running.
> Until we ever support multiple slots, or updates of the existing 
> single slot, this should be ok, and so should the bugfix patch.
> To avoid a theoretical deadlock in case other code changes (badly), 
> it should be fair to acquire the locks with mutex_trylock and return 
> -EINVAL if we did not get all of them.

OK.
ehrhardt@linux.vnet.ibm.com May 12, 2009, 1:33 p.m. UTC | #10
Avi Kivity wrote:
> Christian Ehrhardt wrote:
>> Avi Kivity wrote:
>>> Christian Ehrhardt wrote:
>>>>
>>>> The bad thing on vcpu->request in that case is that I don't want 
>>>> the async behaviour of vcpu->requests in that case, I want the 
>>>> memory slot updated in all vcpu's when the ioctl is returning.
>>>
>>> You mean, the hardware can access the vcpu control block even when 
>>> the vcpu is not running? 
>> No, hardware only uses it with a running vcpu, but I realised my own 
>> fault while changing the code to vcpu->request style.
>> For s390 I need to update the KVM->arch and *all* 
>> vcpu->arch->sie_block... data synchronously.
>
> Out of interest, can you explain why?
Sure, I'll try to give an example.

a) The whole guest has "one" memory slot representing all its memory. 
Therefore some important values like guest_origin and guest_memsize (one 
slot, so it's just addr+size) are kept at VM level in kvm->arch.
b) We fortunately have cool hardware support for "nearly everything"(tm) 
:-) In this case, for example, we set in vcpu->arch.sie_block the values 
for origin and size, translated into a "limit", to get memory management 
virtualization support.
c) We have other code, e.g. all our copy_from/to_guest stuff, that uses 
the kvm->arch values.

If we allowed e.g. updates of a memslot (or, as the patch intends, wanted 
to harden the set_memory_region code against inconsiderate code changes 
in other sections), it might happen that we set the kvm->arch information 
but not the vcpu->arch->sie_block data until the next reentry. Now, 
concurrently, the running vcpu could cause some kind of fault that 
involves a copy_from/to_guest. That way we could end up with potentially 
invalid handling of that fault (fault handling and the running guest 
would use different userspace addresses until they are synced on the 
next vcpu reentry) - it's theoretical, I know, but it might cause some 
issues that would be hard to find.
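
To make the race concrete, here is a purely illustrative sketch of the
kind of translation the copy_from/to_guest helpers perform (the helper
name and the exact bounds check are not the real gaccess code): it reads
the kvm->arch values, so it must never observe an origin that the SIE
block has not picked up yet.

#include <linux/kvm_host.h>

static void __user *guest_to_user(struct kvm_vcpu *vcpu, unsigned long gpa)
{
	struct kvm *kvm = vcpu->kvm;

	if (gpa >= kvm->arch.guest_memsize)
		return NULL;	/* outside the single memslot */
	return (void __user *)(kvm->arch.guest_origin + gpa);
}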

On the other hand, for the long term, I wanted to note that all our 
copy_from/to_guest functions are per vcpu, so when we some day implement 
updateable memslots, multiple memslots, or even just find "free time"(tm) 
to streamline our code, we could redesign that origin/size storage. This 
could be done in multiple ways, either just storing it per vcpu or with a 
lock for the kvm->arch level variables - both ways, and maybe more, could 
then use the vcpu->requests based approach, but unfortunately that is 
neither part of this patch nor of the current effort.

The really good thing is that, because of our discussion about this, I 
now have a really detailed idea of how I can improve that code aside from 
this bugfix patch (let's hope not too far in the future).

>> That makes the "per vcpu resync on next entry" approach not feasible.
>>
>> On the other hand I realized at the same moment that the livelock 
>> should be no issue for us, because as I mentioned:
>> a) only one memslot
>> b) a vcpu can't run without memslot
>> So I don't even need to kick out vcpu's, they just should not be 
>> running.
>> Until we ever support multiple slots, or updates of the existing 
>> single slot this should be ok, so is the bugfix patch this should be.
>> To avoid a theoretical deadlock in case other code is changing 
>> (badly) it should be fair to aquire the lock with mutex_trylock and 
>> return -EINVAL if we did not get all locks.
>
> OK.
>
>
Avi Kivity May 17, 2009, 10:31 p.m. UTC | #11
Christian Ehrhardt wrote:
>>>>> The bad thing on vcpu->request in that case is that I don't want 
>>>>> the async behaviour of vcpu->requests in that case, I want the 
>>>>> memory slot updated in all vcpu's when the ioctl is returning.
>>>>
>>>> You mean, the hardware can access the vcpu control block even when 
>>>> the vcpu is not running? 
>>> No, hardware only uses it with a running vcpu, but I realised my own 
>>> fault while changing the code to vcpu->request style.
>>> For s390 I need to update the KVM->arch and *all* 
>>> vcpu->arch->sie_block... data synchronously.
>>
>> Out of interest, can you explain why?
> Sure I'll try to give an example.
>
> a) The whole guest has "one" memory slot representing all it's memory. 
> Therefore some important values like guest_origin and guest_memsize 
> (one slot so it's just addr+size) are kept at VM level in kvm->arch.

It should really be kept in kvm->memslots[0]->{userspace_addr, npages}.  
This is common to all architectures.

> b) We fortunately have cool hardware support for "nearly 
> everything"(tm) :-) In this case for example we set in 
> vcpu->arch.sie_block the values for origin and size translated into a 
> "limit" to get memory management virtualization support.

x86 has something analogous; shadow or nested page tables are also 
per-vcpu and accessed by the hardware while the guest is running.

> c) we have other code e.g. all our copy_from/to_guest stuff that uses 
> the kvm->arch values

You want to drop these and use kvm_read_guest() / kvm_write_guest().
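
For reference, a small sketch of how those generic accessors are used
(the signatures are from the generic kvm code, the wrappers themselves
are made up): they resolve the memslot internally, so the arch code no
longer needs its own origin/size copy.

#include <linux/kvm_host.h>

static int read_guest_u64(struct kvm *kvm, gpa_t gpa, u64 *val)
{
	return kvm_read_guest(kvm, gpa, val, sizeof(*val));
}

static int write_guest_u64(struct kvm *kvm, gpa_t gpa, u64 val)
{
	return kvm_write_guest(kvm, gpa, &val, sizeof(val));
}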

> If we would allow e.g. updates of a memslot (or as the patch supposes 
> to harden the set_memory_region code against inconsiderate code 
> changes in other sections) it might happen that we set the kvm->arch 
> information but the vcpu->arch->sie_block stuff not until next 
> reentry. Now concurrently the running vcpu could cause some kind of 
> fault that involves a copy_from/to_guest. That way we could end up 
> with potentially invalid handling of that fault (fault handling and 
> running guest would use different userspace adresses until it is 
> synced on next vcpu reentry) - it's theoretical I know, but it might 
> cause some issues that would be hard to find.

I agree it should be protected.  Here's how we do it in arch-independent 
code:

- code that looks up memory slots takes slots_lock for read (future: rcu)
- code that changes memory slots takes slots_lock for write, and 
requests an mmu reload (includes an IPI to force the vcpu out of guest mode)

Now, once we begin changing a slot no one can touch memory or reenter 
the guest until we are done.
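
As a sketch of that arch-independent pattern (assuming the rw_semaphore
slots_lock of that era; the call sites are illustrative, not actual kvm
functions):

#include <linux/kvm_host.h>

/* Reader side: anything that looks up memory slots. */
static unsigned long slot_npages(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_memory_slot *slot;
	unsigned long npages = 0;

	down_read(&kvm->slots_lock);
	slot = gfn_to_memslot(kvm, gfn);
	if (slot)
		npages = slot->npages;
	up_read(&kvm->slots_lock);
	return npages;
}

/* Writer side: changing the memslot layout. */
static void update_slots(struct kvm *kvm)
{
	down_write(&kvm->slots_lock);
	/* ... modify kvm->memslots[] ... */
	kvm_reload_remote_mmus(kvm);	/* request bit + IPI for vcpus in guest mode */
	up_write(&kvm->slots_lock);
}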

> On the other hand for the long term I wanted to note that all our 
> copy_from/to_guest functions are per vcpu, so when we some day 
> implement updateable memslots, multiple memslots or even just fill 
> "free time"(tm) and streamline our code we could redesign that 
> origin/size storage. This could be done multiple ways, either just 
> store it per vcpu or with a lock for the kvm->arch level variables - 
> both ways and maybe more could then use the vcpu->request based 
> approach, but unfortunately it's neither part of that patch nor of the 
> current effort to do that.

I think we should keep that in generic code.  All of that applies to x86 
(and ia64 and ppc), if I understand you correctly, and if I understand 
the other archs correctly (don't place a large bet).
ehrhardt@linux.vnet.ibm.com May 20, 2009, 12:05 p.m. UTC | #12
Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>>>>> The bad thing on vcpu->request in that case is that I don't want 
>>>>>> the async behaviour of vcpu->requests in that case, I want the 
>>>>>> memory slot updated in all vcpu's when the ioctl is returning.
>>>>>
>>>>> You mean, the hardware can access the vcpu control block even when 
>>>>> the vcpu is not running? 
>>>> No, hardware only uses it with a running vcpu, but I realised my 
>>>> own fault while changing the code to vcpu->request style.
>>>> For s390 I need to update the KVM->arch and *all* 
>>>> vcpu->arch->sie_block... data synchronously.
>>>
>>> Out of interest, can you explain why?
>> Sure I'll try to give an example.
>>
>> a) The whole guest has "one" memory slot representing all it's 
>> memory. Therefore some important values like guest_origin and 
>> guest_memsize (one slot so it's just addr+size) are kept at VM level 
>> in kvm->arch.
>
> It should really be kept in kvm->memslots[0]->{userspace_addr, 
> npages}.  This is common to all architectures.
As I said, I wanted to do that, but due to the need to relocate my work 
environment to a new laptop I was a bit stalled the last few days.
A patch series implementing it in a streamlined way (storing in memslots 
only, slots_lock, vcpu->requests, ...) will soon appear on the list.
> [...]
>> c) we have other code e.g. all our copy_from/to_guest stuff that uses 
>> the kvm->arch values
>
> You want to drop these and use kvm_read_guest() / kvm_write_guest().
I put it on my "low-prio-but-very-useful todo list" to take a look at 
that too as one of the next opportunities to streamline our code.

[...]

Patch

Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -657,6 +657,8 @@  int kvm_arch_set_memory_region(struct kv
 				struct kvm_memory_slot old,
 				int user_alloc)
 {
+	int i;
+
 	/* A few sanity checks. We can have exactly one memory slot which has
 	   to start at guest virtual zero and which has to be located at a
 	   page boundary in userland and which has to end at a page boundary.
@@ -676,13 +678,27 @@  int kvm_arch_set_memory_region(struct kv
 	if (mem->memory_size & (PAGE_SIZE - 1))
 		return -EINVAL;
 
+	/* lock all vcpus */
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i])
+			mutex_lock(&kvm->vcpus[i]->mutex);
+	}
+
 	kvm->arch.guest_origin = mem->userspace_addr;
 	kvm->arch.guest_memsize = mem->memory_size;
 
-	/* FIXME: we do want to interrupt running CPUs and update their memory
-	   configuration now to avoid race conditions. But hey, changing the
-	   memory layout while virtual CPUs are running is usually bad
-	   programming practice. */
+	/* update sie control blocks, and unlock all vcpus */
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm->vcpus[i]->arch.sie_block->gmsor =
+				kvm->arch.guest_origin;
+			kvm->vcpus[i]->arch.sie_block->gmslm =
+				kvm->arch.guest_memsize +
+				kvm->arch.guest_origin +
+				VIRTIODESCSPACE - 1ul;
+			mutex_unlock(&kvm->vcpus[i]->mutex);
+		}
+	}
 
 	return 0;
 }