| Message ID | 5A01D776020000780018CEBD@prv-mh.provo.novell.com (mailing list archive) |
|---|---|
| State | New, archived |
On 07/11/17 14:55, Jan Beulich wrote:
>>>> On 07.11.17 at 15:24, <igor.druzhinin@citrix.com> wrote:
>> On 07/11/17 08:07, Jan Beulich wrote:
>>> --- unstable.orig/xen/arch/x86/domain.c
>>> +++ unstable/xen/arch/x86/domain.c
>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>
>>>  void vcpu_destroy(struct vcpu *v)
>>>  {
>>> +    /*
>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>> +     */
>>> +    sync_vcpu_execstate(v);
>>> +
>>>      xfree(v->arch.vm_event);
>>>      v->arch.vm_event = NULL;
>>
>> I don't think this is going to fix the problem, since the vCPU we are
>> currently destroying has nothing to do with the vCPU that actually
>> caused the problem by its migration. We are still going to call
>> vmx_vcpu_disable_pml(), which loads and cleans the VMCS on the current
>> pCPU.
>
> Oh, right, wrong vCPU. This should be better:
>
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>
>  void vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * Flush all state for the vCPU previously having run on the current CPU.
> +     * This is in particular relevant for HVM ones on VMX, so that this
> +     * flushing of state won't happen from the TLB flush IPI handler behind
> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> +     */
> +    sync_local_execstate();
> +
>      xfree(v->arch.vm_event);
>      v->arch.vm_event = NULL;
>
> In that case the question then is whether (rather than generalizing
> it, as mentioned for the earlier version) this wouldn't better go into
> vmx_vcpu_destroy(), assuming anything called earlier from
> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
> play with VMCSes).

Ah, ok. Does this also apply to the previous issue? May I revert that
change to test it?

There is one thing that I'm worrying about with this approach:

At this place we just sync the idle context because we know that we are
going to deal with the VMCS later. But what about other potential cases
(perhaps some softirqs) in which we are accessing a vCPU data structure
that is currently shared between different pCPUs? Maybe we'd better sync
the context as soon as possible after we have switched to idle from a
migrated vCPU.

Igor

>
> Jan
>>> On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
> On 07/11/17 14:55, Jan Beulich wrote:
>>>>> On 07.11.17 at 15:24, <igor.druzhinin@citrix.com> wrote:
>>> On 07/11/17 08:07, Jan Beulich wrote:
>>>> --- unstable.orig/xen/arch/x86/domain.c
>>>> +++ unstable/xen/arch/x86/domain.c
>>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>>
>>>>  void vcpu_destroy(struct vcpu *v)
>>>>  {
>>>> +    /*
>>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>>> +     */
>>>> +    sync_vcpu_execstate(v);
>>>> +
>>>>      xfree(v->arch.vm_event);
>>>>      v->arch.vm_event = NULL;
>>>
>>> I don't think this is going to fix the problem, since the vCPU we are
>>> currently destroying has nothing to do with the vCPU that actually
>>> caused the problem by its migration. We are still going to call
>>> vmx_vcpu_disable_pml(), which loads and cleans the VMCS on the current
>>> pCPU.
>>
>> Oh, right, wrong vCPU. This should be better:
>>
>> --- unstable.orig/xen/arch/x86/domain.c
>> +++ unstable/xen/arch/x86/domain.c
>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>
>>  void vcpu_destroy(struct vcpu *v)
>>  {
>> +    /*
>> +     * Flush all state for the vCPU previously having run on the current CPU.
>> +     * This is in particular relevant for HVM ones on VMX, so that this
>> +     * flushing of state won't happen from the TLB flush IPI handler behind
>> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>> +     */
>> +    sync_local_execstate();
>> +
>>      xfree(v->arch.vm_event);
>>      v->arch.vm_event = NULL;
>>
>> In that case the question then is whether (rather than generalizing
>> it, as mentioned for the earlier version) this wouldn't better go into
>> vmx_vcpu_destroy(), assuming anything called earlier from
>> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
>> play with VMCSes).
>
> Ah, ok. Does this also apply to the previous issue? May I revert that
> change to test it?

Feel free to try it, but I had checked that previous patch earlier
today, and right now I don't think the two issues are related.

> There is one thing that I'm worrying about with this approach:
>
> At this place we just sync the idle context because we know that we are
> going to deal with the VMCS later. But what about other potential cases
> (perhaps some softirqs) in which we are accessing a vCPU data structure
> that is currently shared between different pCPUs? Maybe we'd better sync
> the context as soon as possible after we switched to idle from a
> migrated vCPU.

Well, yes, I had pointed out in the earlier reply that this is just to
deal with the specific case here. Whether to sync earlier after a
migration I'm not really sure about - the way it's written right now is
meant to deal with migration across CPUs. If so, this would perhaps
belong into scheduler code (and hence cover ARM as well), and till now
I wasn't able to figure out a good place to put it.

George, Dario, do you have any thoughts both on the general idea as
well as where to put the necessary code?

Jan
>>> On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
> There is one thing that I'm worrying about with this approach:
>
> At this place we just sync the idle context because we know that we are
> going to deal with the VMCS later. But what about other potential cases
> (perhaps some softirqs) in which we are accessing a vCPU data structure
> that is currently shared between different pCPUs? Maybe we'd better sync
> the context as soon as possible after we switched to idle from a
> migrated vCPU.

Short of feedback from the scheduler maintainers, I've looked into this
some more. Calling sync_vcpu_execstate() out of vcpu_move_locked()
would seem feasible, but there are a number of other places where
->processor of a vCPU is being updated, and where the vCPU was not
(obviously) put to sleep already:

- credit1's csched_runq_steal()
- credit2's migrate()
- csched2_schedule()
- null's vcpu_assign() when called out of null_schedule()
- rt_schedule()

I don't think it is reasonable to call sync_vcpu_execstate() in all of
those places, and hence VMX's special VMCS management may indeed better
be taken care of by VMX-specific code (i.e., as previously indicated,
the sync_local_execstate() invocation moved from vcpu_destroy() - where
my most recent patch draft had put it - to vmx_vcpu_destroy()). And
we'd have to live with other VMX code paths having similar asynchronous
behavior needing to similarly take care of the requirement.

Jan
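[Editor's note: the ordering proposed above - flushing lazily retained
state before any VMCS manipulation in vmx_vcpu_destroy() - can be
sketched in a self-contained way. The types and flags below are
illustrative stand-ins, not Xen's actual definitions; only the
function names mirror the thread.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins only; not Xen's real types or globals. */
struct vcpu { int id; };

/* Pretend some previously running vCPU's state is lazily retained. */
static bool stale_state_loaded = true;
static bool pml_disabled_safely;

/* Stand-in for sync_local_execstate(): flush whatever was left behind. */
static void sync_local_execstate(void)
{
    stale_state_loaded = false;
}

/*
 * Touching the VMCS is only safe once no stale state can be flushed
 * behind our back from the TLB flush IPI handler.
 */
static void vmx_vcpu_disable_pml(struct vcpu *v)
{
    (void)v;
    pml_disabled_safely = !stale_state_loaded;
}

/* Sketch of the ordering proposed for vmx_vcpu_destroy(). */
static void vmx_vcpu_destroy(struct vcpu *v)
{
    sync_local_execstate();   /* flush first ...                        */
    vmx_vcpu_disable_pml(v);  /* ... then it's safe to load/clear VMCS  */
    /* ... vmx_destroy_vmcs(v), etc. ... */
}
```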
On Thu, 2017-11-09 at 03:05 -0700, Jan Beulich wrote:
>>>> On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
>>
>> There is one thing that I'm worrying about with this approach:
>>
>> At this place we just sync the idle context because we know that we
>> are going to deal with the VMCS later. But what about other potential
>> cases (perhaps some softirqs) in which we are accessing a vCPU data
>> structure that is currently shared between different pCPUs? Maybe
>> we'd better sync the context as soon as possible after we switched to
>> idle from a migrated vCPU.
>
> Short of feedback from the scheduler maintainers I've looked
> into this some more.
>
Sorry, as you may have seen by the other email, I was looking into this
today.

> Calling sync_vcpu_execstate() out of
> vcpu_move_locked() would seem feasible, but there are a number
> of other places where ->processor of a vCPU is being updated,
> and where the vCPU was not (obviously) put to sleep already:
>
> - credit1's csched_runq_steal()
> - credit2's migrate()
> - csched2_schedule()
> - null's vcpu_assign() when called out of null_schedule()
> - rt_schedule()
>
> I don't think it is reasonable to call sync_vcpu_execstate() in all
> of those places,
>
Yes, I agree.

> and hence VMX's special VMCS management may
> indeed better be taken care of by VMX-specific code (i.e. as
> previously indicated the sync_local_execstate() invocation moved
> from vcpu_destroy() - where my most recent patch draft had put
> it - to vmx_vcpu_destroy()).
>
I was still trying to form an opinion about the issue, and was leaning
toward suggesting exactly the same.

In fact, the point of lazy context switch is exactly that: trying to
save syncing the state. Of course, that requires that we identify all
the places and occasions where having the state out of sync may be a
problem, and sync it!

What seems to me to be happening here is as follows:

a. a pCPU becomes idle
b. we do the lazy switch, i.e., the context of the previously running
   vCPU stays on the pCPU
c. *something* happens which wants to either play with or alter the
   context currently loaded on the pCPU (in this case it's the VMX bits
   of the context, but it could be other parts of it too)

Well, I'm afraid I only see two solutions:
1) we get rid of lazy context switch;
2) whatever it is that is happening at point c above, it needs to be
   aware that we use lazy context switch, and make sure to sync the
   context before playing with or altering it.

> And we'd have to live with other
> VMX code paths having similar asynchronous behavior needing to
> similarly take care of the requirement.
>
Exactly. And in fact, in this thread, migration of vCPUs between pCPUs
was mentioned, and it was being considered to treat that in a special
way. But it looks to me that something very similar may, at least in
theory, happen any time a lazy context switch occurs, no matter whether
the pCPU has become idle because the previously running vCPU wants to
move, or because it blocked for whatever other reason.

Regards,
Dario
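[Editor's note: the a./b./c. sequence above, and solution 2), can be
mimicked in a self-contained sketch. All names and data structures here
are illustrative stand-ins (a toy "scheduled vs. loaded" pair of
pointers), not Xen's actual implementation; it only demonstrates why
step c must sync the lazily retained context before touching it.]

```c
#include <assert.h>

/* Illustrative stand-ins; not Xen's actual types or globals. */
struct vcpu { int id; int dirty; };

static struct vcpu idle = { .id = -1, .dirty = 0 };
static struct vcpu *scheduled = &idle; /* what the scheduler picked      */
static struct vcpu *loaded = &idle;    /* whose context this pCPU holds  */

/* a. + b.: go idle, but lazily leave the previous context loaded. */
static void lazy_switch_to_idle(void)
{
    scheduled = &idle;      /* "loaded" is deliberately left untouched */
}

/* Solution 2): sync the lazily retained context before touching it. */
static void sync_local_execstate(void)
{
    if ( loaded != scheduled )
        loaded = scheduled; /* flush the stale vCPU's state */
}

/* c.: something (softirq, teardown, ...) alters the loaded context. */
static void alter_loaded_context(void)
{
    sync_local_execstate(); /* without this we'd scribble on prev's state */
    loaded->dirty = 1;
}
```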
>>> On 09.11.17 at 11:36, <raistlin@linux.it> wrote:
> Well, I'm afraid I only see two solutions:
> 1) we get rid of lazy context switch;
> 2) whatever it is that is happening at point c above, it needs to be
>    aware that we use lazy context switch, and make sure to sync the
>    context before playing with or altering it;

3) Better centralize the updating of v->processor, so that it becomes
reasonable to sync state there.

Igor's idea of flushing state once it is known (or at least pretty
certain) that the vCPU won't run on the prior pCPU next time it gets
scheduled is certainly a reasonable one. It just doesn't fit well with
how the individual schedulers currently behave.

Jan
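[Editor's note: option 3) could look roughly like the following
self-contained sketch. The vcpu_set_processor() helper is hypothetical
(no such centralized function exists in the code under discussion - the
schedulers update ->processor in the several places listed earlier in
the thread), and the types are illustrative stand-ins.]

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Illustrative stand-ins; not Xen's actual definitions. */
struct vcpu { int processor; };

/* Which vCPU's state each pCPU lazily retains (NULL = none). */
static struct vcpu *lazy_owner[NR_CPUS];

/* Stand-in for sync_vcpu_execstate(): flush v's state off its old pCPU. */
static void sync_vcpu_execstate(struct vcpu *v)
{
    if ( lazy_owner[v->processor] == v )
        lazy_owner[v->processor] = NULL;
}

/*
 * Hypothetical central helper (option 3): if every scheduler funneled
 * its ->processor updates through one place, that place could flush
 * stale state exactly once per migration.
 */
static void vcpu_set_processor(struct vcpu *v, int new_cpu)
{
    if ( v->processor != new_cpu )
        sync_vcpu_execstate(v);
    v->processor = new_cpu;
}
```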
--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * Flush all state for the vCPU previously having run on the current CPU.
+     * This is in particular relevant for HVM ones on VMX, so that this
+     * flushing of state won't happen from the TLB flush IPI handler behind
+     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
+     */
+    sync_local_execstate();
+
     xfree(v->arch.vm_event);
     v->arch.vm_event = NULL;