[PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

Message ID	1407249854-2953-1-git-send-email-borntraeger@de.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <kvm@vger.kernel.org> from <borntraeger@de.ibm.com>; Tue, 5 Aug 2014 15:43:58 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 5 Aug 2014 15:43:56 +0100 From: Christian Borntraeger <borntraeger@de.ibm.com> To: Paolo Bonzini <pbonzini@redhat.com> Cc: KVM <kvm@vger.kernel.org>, Gleb Natapov <gleb@kernel.org>, Christian Borntraeger <borntraeger@de.ibm.com>, Rik van Riel <riel@redhat.com>, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>, Michael Mueller <mimu@linux.vnet.ibm.com> Subject: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl Date: Tue, 5 Aug 2014 16:44:14 +0200 Message-Id: <1407249854-2953-1-git-send-email-borntraeger@de.ibm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk

Christian Borntraeger Aug. 5, 2014, 2:44 p.m. UTC

We currently track the pid of the task that runs the VCPU in
vcpu_load. Since we call vcpu_load for all kind of ioctls on a
CPU, this causes hickups due to synchronize_rcu if one CPU is
modified by another CPU or the main thread (e.g. initialization,
reset). We track the pid only for the purpose of yielding, so
let's update the pid only in the KVM_RUN ioctl.

In addition, don't do a synchronize_rcu on startup (pid == 0).

This speeds up guest boot time on s390 noticably for some configs, e.g.
HZ=100, no full state tracking, 64 guest cpus 32 host cpus.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
CC: Michael Mueller <mimu@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Raghavendra K T Aug. 7, 2014, 8:21 a.m. UTC | #1

On 08/05/2014 08:14 PM, Christian Borntraeger wrote:
> We currently track the pid of the task that runs the VCPU in
> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
> CPU, this causes hickups due to synchronize_rcu if one CPU is
> modified by another CPU or the main thread (e.g. initialization,
> reset). We track the pid only for the purpose of yielding, so
> let's update the pid only in the KVM_RUN ioctl.
>
> In addition, don't do a synchronize_rcu on startup (pid == 0).
>
> This speeds up guest boot time on s390 noticably for some configs, e.g.
> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
> ---

Please feel free to add
Reviewed-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

I could see very small improvement while testing 32 vcpu guest booting
on x86 (16 pcpu host +ht).

I was just wondering whether somebody implementing vcpu hot plug would
have to bother about this change, but could not see any. What do you
think?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 7, 2014, 9:59 a.m. UTC | #2

On 07/08/14 10:21, Raghavendra K T wrote:
> On 08/05/2014 08:14 PM, Christian Borntraeger wrote:
>> We currently track the pid of the task that runs the VCPU in
>> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>> CPU, this causes hickups due to synchronize_rcu if one CPU is
>> modified by another CPU or the main thread (e.g. initialization,
>> reset). We track the pid only for the purpose of yielding, so
>> let's update the pid only in the KVM_RUN ioctl.
>>
>> In addition, don't do a synchronize_rcu on startup (pid == 0).
>>
>> This speeds up guest boot time on s390 noticably for some configs, e.g.
>> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> CC: Rik van Riel <riel@redhat.com>
>> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
>> ---
> 
> Please feel free to add
> Reviewed-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> 
> I could see very small improvement while testing 32 vcpu guest booting
> on x86 (16 pcpu host +ht).
> 
> I was just wondering whether somebody implementing vcpu hot plug would
> have to bother about this change, but could not see any. What do you
> think?

The yield code can handle pid == 0, so the new CPU wont be a yield candidate until run for the first time. So I guess this is ok.

Paolo,

are you willing to apply to kvm/queue?


Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 7, 2014, 1:39 p.m. UTC | #3

Il 05/08/2014 16:44, Christian Borntraeger ha scritto:
> We currently track the pid of the task that runs the VCPU in
> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
> CPU, this causes hickups due to synchronize_rcu if one CPU is
> modified by another CPU or the main thread (e.g. initialization,
> reset). We track the pid only for the purpose of yielding, so
> let's update the pid only in the KVM_RUN ioctl.
> 
> In addition, don't do a synchronize_rcu on startup (pid == 0).

Speaking of QEMU, most ioctls should run from the VCPU anyway.  Which
ioctls do you see called from elsewhere?  What speedup can you see if
you just do the "no synchronize_rcu on pid == 0" part?

The patch may be okay, but I'm worried that it might be hiding a bug in
QEMU.

Paolo

> This speeds up guest boot time on s390 noticably for some configs, e.g.
> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 7, 2014, 1:40 p.m. UTC | #4

Il 07/08/2014 11:59, Christian Borntraeger ha scritto:
> Paolo,
> 
> are you willing to apply to kvm/queue?

I asked a question, but anyway... not until the end of the merge window
and my small vacation. :)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li Aug. 18, 2014, 5:02 a.m. UTC | #5

Hi Christian,
On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote:
>We currently track the pid of the task that runs the VCPU in
>vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>CPU, this causes hickups due to synchronize_rcu if one CPU is
>modified by another CPU or the main thread (e.g. initialization,
>reset). We track the pid only for the purpose of yielding, so
>let's update the pid only in the KVM_RUN ioctl.
>
>In addition, don't do a synchronize_rcu on startup (pid == 0).
>
>This speeds up guest boot time on s390 noticably for some configs, e.g.
>HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>
>Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>CC: Rik van Riel <riel@redhat.com>
>CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>CC: Michael Mueller <mimu@linux.vnet.ibm.com>
>---
> virt/kvm/kvm_main.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 9ae9135..ebc8f54 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
> 
> 	if (mutex_lock_killable(&vcpu->mutex))
> 		return -EINTR;

One question: 

>-	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {

When vcpu->pid and current->pids[PIDTYPE_PID].pid will be different?

Regards,
Wanpeng Li 

>-		/* The thread running this VCPU changed. */
>-		struct pid *oldpid = vcpu->pid;
>-		struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
>-		rcu_assign_pointer(vcpu->pid, newpid);
>-		synchronize_rcu();
>-		put_pid(oldpid);
>-	}
> 	cpu = get_cpu();
> 	preempt_notifier_register(&vcpu->preempt_notifier);
> 	kvm_arch_vcpu_load(vcpu, cpu);
>@@ -1991,6 +1983,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
> 		r = -EINVAL;
> 		if (arg)
> 			goto out;
>+		if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
>+			/* The thread running this VCPU changed. */
>+			struct pid *oldpid = vcpu->pid;
>+			struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
>+			rcu_assign_pointer(vcpu->pid, newpid);
>+			if (oldpid)
>+				synchronize_rcu();
>+			put_pid(oldpid);
>+		}
> 		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
> 		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> 		break;
>-- 
>1.8.4.2
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 8:38 a.m. UTC | #6

On 07/08/14 15:39, Paolo Bonzini wrote:
> Il 05/08/2014 16:44, Christian Borntraeger ha scritto:
>> We currently track the pid of the task that runs the VCPU in
>> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>> CPU, this causes hickups due to synchronize_rcu if one CPU is
>> modified by another CPU or the main thread (e.g. initialization,
>> reset). We track the pid only for the purpose of yielding, so
>> let's update the pid only in the KVM_RUN ioctl.
>>
>> In addition, don't do a synchronize_rcu on startup (pid == 0).
> 
> Speaking of QEMU, most ioctls should run from the VCPU anyway.  Which
> ioctls do you see called from elsewhere?  What speedup can you see if
> you just do the "no synchronize_rcu on pid == 0" part?

I think on x86 "no synchronize_rcu on pid == 0" is the only thing that is necessary.

> 
> The patch may be okay, but I'm worried that it might be hiding a bug in
> QEMU.

On s390 we call "KVM_S390_INITIAL_RESET" from several reset functions, e.g. during 
CPU creation. This is the first hickup and the pid now points to the main thread.

The 2nd hickup comes when the guest activates additional CPUs via SIGP (ipi). Here
the first ioctl in the vpcu thread will get the pid back to the vcpu thread.



> 
> Paolo
> 
>> This speeds up guest boot time on s390 noticably for some configs, e.g.
>> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 8:38 a.m. UTC | #7

On 07/08/14 15:40, Paolo Bonzini wrote:
> Il 07/08/2014 11:59, Christian Borntraeger ha scritto:
>> Paolo,
>>
>> are you willing to apply to kvm/queue?
> 
> I asked a question, but anyway... not until the end of the merge window
> and my small vacation. :)
> 
> Paolo
> 
Absolutely, was on vacation myself :-) See my answers to the other mail.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 9:27 a.m. UTC | #8

Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
>> > The patch may be okay, but I'm worried that it might be hiding a bug in
>> > QEMU.
> On s390 we call "KVM_S390_INITIAL_RESET" from several reset functions, e.g. during 
> CPU creation. This is the first hickup and the pid now points to the main thread.

Any reason to have a special ioctl instead of SET_REGS/SET_ONE_REG/...
(via kvm_cpu_synchronize_state, which does the ioctls in the VCPU thread)?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 9:47 a.m. UTC | #9

On 19/08/14 11:27, Paolo Bonzini wrote:
> Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
>>>> The patch may be okay, but I'm worried that it might be hiding a bug in
>>>> QEMU.
>> On s390 we call "KVM_S390_INITIAL_RESET" from several reset functions, e.g. during 
>> CPU creation. This is the first hickup and the pid now points to the main thread.
> 
> Any reason to have a special ioctl instead of SET_REGS/SET_ONE_REG/...
> (via kvm_cpu_synchronize_state, which does the ioctls in the VCPU thread)?

Historical reasons mostly. Older kernel miss several interfaces to bring the CPU in a defined state (pending interrupts, cpu state, some registers...)

Good news is that we are working on getting rid of it: cpu states are now available as far as I can see, only local interrupt flushing is missing.This needs some more work on our side.  So in some month we probably will have a QEMU version that does not need to call this any more. For todays QEMU this patch help though.

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 9:53 a.m. UTC | #10

Il 19/08/2014 11:47, Christian Borntraeger ha scritto:
> On 19/08/14 11:27, Paolo Bonzini wrote:
>> Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
>>>>> The patch may be okay, but I'm worried that it might be
>>>>> hiding a bug in QEMU.
>>> On s390 we call "KVM_S390_INITIAL_RESET" from several reset
>>> functions, e.g. during CPU creation. This is the first hickup and
>>> the pid now points to the main thread.
>> 
>> Any reason to have a special ioctl instead of
>> SET_REGS/SET_ONE_REG/... (via kvm_cpu_synchronize_state, which does
>> the ioctls in the VCPU thread)?
> 
> Historical reasons mostly. Older kernel miss several interfaces to
> bring the CPU in a defined state (pending interrupts, cpu state, some
> registers...)
> 
> Good news is that we are working on getting rid of it: cpu states are
> now available as far as I can see, only local interrupt flushing is
> missing.This needs some more work on our side.  So in some month we
> probably will have a QEMU version that does not need to call this any
> more. For todays QEMU this patch help though.

Just by the sound of it, interrupt flushing seems dangerous to do in a
way that could be concurrent with KVM_RUN...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 9:59 a.m. UTC | #11

On 19/08/14 11:53, Paolo Bonzini wrote:
> Il 19/08/2014 11:47, Christian Borntraeger ha scritto:
>> On 19/08/14 11:27, Paolo Bonzini wrote:
>>> Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
>>>>>> The patch may be okay, but I'm worried that it might be
>>>>>> hiding a bug in QEMU.
>>>> On s390 we call "KVM_S390_INITIAL_RESET" from several reset
>>>> functions, e.g. during CPU creation. This is the first hickup and
>>>> the pid now points to the main thread.
>>>
>>> Any reason to have a special ioctl instead of
>>> SET_REGS/SET_ONE_REG/... (via kvm_cpu_synchronize_state, which does
>>> the ioctls in the VCPU thread)?
>>
>> Historical reasons mostly. Older kernel miss several interfaces to
>> bring the CPU in a defined state (pending interrupts, cpu state, some
>> registers...)
>>
>> Good news is that we are working on getting rid of it: cpu states are
>> now available as far as I can see, only local interrupt flushing is
>> missing.This needs some more work on our side.  So in some month we
>> probably will have a QEMU version that does not need to call this any
>> more. For todays QEMU this patch help though.
> 
> Just by the sound of it, interrupt flushing seems dangerous to do in a
> way that could be concurrent with KVM_RUN...

Its only for the interrupts that are cpu local (like pending IPIs). In addition, we would do that only for the reset case (with an interface that can be used for migration).
Right now KVM_S390_INITIAL_RESET takes the vcpu_mutex, so this protects against KVM_RUN.

Christian


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 10:03 a.m. UTC | #12

Il 19/08/2014 11:59, Christian Borntraeger ha scritto:
> Its only for the interrupts that are cpu local (like pending IPIs).
> In addition, we would do that only for the reset case (with an
> interface that can be used for migration). Right now
> KVM_S390_INITIAL_RESET takes the vcpu_mutex, so this protects against
> KVM_RUN.

I'm not sure, this does seem like a workaround for another limitation
after all...  Gleb?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 10:09 a.m. UTC | #13

On 19/08/14 12:03, Paolo Bonzini wrote:
> Il 19/08/2014 11:59, Christian Borntraeger ha scritto:
>> Its only for the interrupts that are cpu local (like pending IPIs).
>> In addition, we would do that only for the reset case (with an
>> interface that can be used for migration). Right now
>> KVM_S390_INITIAL_RESET takes the vcpu_mutex, so this protects against
>> KVM_RUN.
> 
> I'm not sure, this does seem like a workaround for another limitation
> after all...  Gleb?

Yes. We want to get rid of KVM_S390_INITIAL_RESET in QEMU. This comes from a time, when we had another userspace prototype for KVM on s390 (kuli). Its really a wart that has to go.
Its just that we are not there yet to remove the call to KVM_S390_INITIAL_RESET. Doing so can result in hard to debug errors after reboot, if an interrupt was made pending just before reboot that gets delivered in the new instance.

The new way for local interrupt read/write will probably be some onereg or syncreg interface with a bitmask register and payload registers. We have to solve some concurrency and implemenation issues here.

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 10:31 a.m. UTC | #14

Il 19/08/2014 12:09, Christian Borntraeger ha scritto:
>> I'm not sure, this does seem like a workaround for another
>> limitation after all...  Gleb?
>
> Yes. We want to get rid of KVM_S390_INITIAL_RESET in QEMU. This comes
> from a time, when we had another userspace prototype for KVM on s390
> (kuli). Its really a wart that has to go. Its just that we are not
> there yet to remove the call to KVM_S390_INITIAL_RESET. Doing so can
> result in hard to debug errors after reboot, if an interrupt was made
> pending just before reboot that gets delivered in the new instance.
> 
> The new way for local interrupt read/write will probably be some
> onereg or syncreg interface with a bitmask register and payload
> registers. We have to solve some concurrency and implemenation issues
> here.

Yes, I understand; the plan is fine and it's good that it was already on
your todo list.

But since you acknowledge that KVM_S390_INITIAL_RESET will go, I'm not
sure we want to apply this patch (except for the pid == 0 part, of
course---that one is good).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 10:48 a.m. UTC | #15

On 19/08/14 12:31, Paolo Bonzini wrote:
> Il 19/08/2014 12:09, Christian Borntraeger ha scritto:
>>> I'm not sure, this does seem like a workaround for another
>>> limitation after all...  Gleb?
>>
>> Yes. We want to get rid of KVM_S390_INITIAL_RESET in QEMU. This comes
>> from a time, when we had another userspace prototype for KVM on s390
>> (kuli). Its really a wart that has to go. Its just that we are not
>> there yet to remove the call to KVM_S390_INITIAL_RESET. Doing so can
>> result in hard to debug errors after reboot, if an interrupt was made
>> pending just before reboot that gets delivered in the new instance.
>>
>> The new way for local interrupt read/write will probably be some
>> onereg or syncreg interface with a bitmask register and payload
>> registers. We have to solve some concurrency and implemenation issues
>> here.
> 
> Yes, I understand; the plan is fine and it's good that it was already on
> your todo list.
> 
> But since you acknowledge that KVM_S390_INITIAL_RESET will go, I'm not
> sure we want to apply this patch (except for the pid == 0 part, of
> course---that one is good).

Well, it makes todays QEMU (a lot) faster on s390 bootup with many CPUs. (According to strace on my system the first GET_FPU ioctl takes up to 0.079 sec. With 64 CPUs this sums up to several seconds.
But I understand your concern of touching generic KVM code only if really necessary. Let me know if I should send a minimal pid==0 version. (I would prefer the full version, of course).

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 10:50 a.m. UTC | #16

Il 19/08/2014 12:48, Christian Borntraeger ha scritto:
> But I understand your concern of touching generic KVM code only if
> really necessary. Let me know if I should send a minimal pid==0
> version. (I would prefer the full version, of course).

Yes, please do.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand Aug. 19, 2014, 11:28 a.m. UTC | #17

> Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
> >> > The patch may be okay, but I'm worried that it might be hiding a bug in
> >> > QEMU.
> > On s390 we call "KVM_S390_INITIAL_RESET" from several reset functions, e.g. during 
> > CPU creation. This is the first hickup and the pid now points to the main thread.
> 
> Any reason to have a special ioctl instead of SET_REGS/SET_ONE_REG/...
> (via kvm_cpu_synchronize_state, which does the ioctls in the VCPU thread)?
> 
> Paolo

Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
cpu_synchronize_all_post_reset())?

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Aug. 19, 2014, 12:06 p.m. UTC | #18

Il 19/08/2014 13:28, David Hildenbrand ha scritto:
> Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
> the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
> why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
> cpu_synchronize_all_post_reset())?

No reason, feel free to post a patch for QEMU kvm-all.c.
Documentation/virtual/kvm/api.txt clearly says:

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.

Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand Aug. 19, 2014, 12:14 p.m. UTC | #19

> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
> > Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
> > the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
> > why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
> > cpu_synchronize_all_post_reset())?
> 
> No reason, feel free to post a patch for QEMU kvm-all.c.
> Documentation/virtual/kvm/api.txt clearly says:
> 
>    Only run vcpu ioctls from the same thread that was used to create the
>    vcpu.
> 
> Paolo
> 

Thanks! A little more tweaking in the other parts of s390x resets
and we should be able to reduce the number of "wrong" ioctls (I think I found
most cases that are responsible for the performance degradation).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 2:04 p.m. UTC | #20

On 18/08/14 07:02, Wanpeng Li wrote:
> Hi Christian,
> On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote:
>> We currently track the pid of the task that runs the VCPU in
>> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>> CPU, this causes hickups due to synchronize_rcu if one CPU is
>> modified by another CPU or the main thread (e.g. initialization,
>> reset). We track the pid only for the purpose of yielding, so
>> let's update the pid only in the KVM_RUN ioctl.
>>
>> In addition, don't do a synchronize_rcu on startup (pid == 0).
>>
>> This speeds up guest boot time on s390 noticably for some configs, e.g.
>> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> CC: Rik van Riel <riel@redhat.com>
>> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
>> ---
>> virt/kvm/kvm_main.c | 17 +++++++++--------
>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 9ae9135..ebc8f54 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
>>
>> 	if (mutex_lock_killable(&vcpu->mutex))
>> 		return -EINTR;
> 
> One question: 
> 
>> -	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
> 
> When vcpu->pid and current->pids[PIDTYPE_PID].pid will be different?

If two different thread call an ioctl on a vcpu fd. (It must be an ioctl that has done vcpu_load - almost all except for some interrupt injections)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 2:10 p.m. UTC | #21

On 19/08/14 14:14, David Hildenbrand wrote:
>> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
>>> Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
>>> the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
>>> why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
>>> cpu_synchronize_all_post_reset())?
>>
>> No reason, feel free to post a patch for QEMU kvm-all.c.
>> Documentation/virtual/kvm/api.txt clearly says:
>>
>>    Only run vcpu ioctls from the same thread that was used to create the
>>    vcpu.
>>
>> Paolo
>>
> 
> Thanks! A little more tweaking in the other parts of s390x resets
> and we should be able to reduce the number of "wrong" ioctls (I think I found
> most cases that are responsible for the performance degradation).

Hmm. We want to not only reduce, we want them be zero.
In addition to a reworked MP_STATE patch set, we might be able to change the code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on CPU creation, because we know that all kernel version will do an implicit cpu reset on cpu creation anyway. Can you have a try on this as well when reworking that code? We could then fix this rcu performance penalty independent from getting rid of that ioctl.

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand Aug. 19, 2014, 2:23 p.m. UTC | #22

> On 19/08/14 14:14, David Hildenbrand wrote:
> >> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
> >>> Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
> >>> the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
> >>> why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
> >>> cpu_synchronize_all_post_reset())?
> >>
> >> No reason, feel free to post a patch for QEMU kvm-all.c.
> >> Documentation/virtual/kvm/api.txt clearly says:
> >>
> >>    Only run vcpu ioctls from the same thread that was used to create the
> >>    vcpu.
> >>
> >> Paolo
> >>
> > 
> > Thanks! A little more tweaking in the other parts of s390x resets
> > and we should be able to reduce the number of "wrong" ioctls (I think I found
> > most cases that are responsible for the performance degradation).
> 
> Hmm. We want to not only reduce, we want them be zero.
> In addition to a reworked MP_STATE patch set, we might be able to change the code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
> If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on CPU creation, because we know that all kernel version will do an implicit cpu reset on cpu creation anyway. Can you have a try on this as well when reworking that code? We could then fix this rcu performance penalty independent from getting rid of that ioctl.
> 
> Christian
> 

Already working on it, only one ioctl left on vcpu creation that is called
from wrong context, trying to hide from me. Restarts and resets are already
blasting fast.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 19, 2014, 2:46 p.m. UTC | #23

On 19/08/14 16:23, David Hildenbrand wrote:
>> On 19/08/14 14:14, David Hildenbrand wrote:
>>>> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
>>>>> Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
>>>>> the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
>>>>> why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
>>>>> cpu_synchronize_all_post_reset())?
>>>>
>>>> No reason, feel free to post a patch for QEMU kvm-all.c.
>>>> Documentation/virtual/kvm/api.txt clearly says:
>>>>
>>>>    Only run vcpu ioctls from the same thread that was used to create the
>>>>    vcpu.
>>>>
>>>> Paolo
>>>>
>>>
>>> Thanks! A little more tweaking in the other parts of s390x resets
>>> and we should be able to reduce the number of "wrong" ioctls (I think I found
>>> most cases that are responsible for the performance degradation).
>>
>> Hmm. We want to not only reduce, we want them be zero.
>> In addition to a reworked MP_STATE patch set, we might be able to change the code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
>> If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on CPU creation, because we know that all kernel version will do an implicit cpu reset on cpu creation anyway. Can you have a try on this as well when reworking that code? We could then fix this rcu performance penalty independent from getting rid of that ioctl.
>>
>> Christian
>>
> 
> Already working on it, only one ioctl left on vcpu creation that is called
> from wrong context, trying to hide from me. Restarts and resets are already

Maybe its the synchronize when the oldpid is 0? Can you check the patch that I just sent?

> blasting fast.
> 
> David
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Hildenbrand Aug. 19, 2014, 2:52 p.m. UTC | #24

> >> Hmm. We want to not only reduce, we want them be zero.
> >> In addition to a reworked MP_STATE patch set, we might be able to change the code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
> >> If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on CPU creation, because we know that all kernel version will do an implicit cpu reset on cpu creation anyway. Can you have a try on this as well when reworking that code? We could then fix this rcu performance penalty independent from getting rid of that ioctl.
> >>
> >> Christian
> >>
> > 
> > Already working on it, only one ioctl left on vcpu creation that is called
> > from wrong context, trying to hide from me. Restarts and resets are already
> 
> Maybe its the synchronize when the oldpid is 0? Can you check the patch that I just sent?

Already got that in my code. Seems to be an architecture specific one called
from wrong context. (actually it is the third one being called
after SET_MP_STATE and SET_SIGNAL_MASK).

A few more minutes and I should have it :)

David

> 
> > blasting fast.
> > 
> > David
> > 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li Aug. 19, 2014, 11:22 p.m. UTC | #25

On Tue, Aug 19, 2014 at 04:04:03PM +0200, Christian Borntraeger wrote:
>On 18/08/14 07:02, Wanpeng Li wrote:
>> Hi Christian,
>> On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote:
>>> We currently track the pid of the task that runs the VCPU in
>>> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>>> CPU, this causes hickups due to synchronize_rcu if one CPU is
>>> modified by another CPU or the main thread (e.g. initialization,
>>> reset). We track the pid only for the purpose of yielding, so
>>> let's update the pid only in the KVM_RUN ioctl.
>>>
>>> In addition, don't do a synchronize_rcu on startup (pid == 0).
>>>
>>> This speeds up guest boot time on s390 noticably for some configs, e.g.
>>> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>>>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> CC: Rik van Riel <riel@redhat.com>
>>> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
>>> ---
>>> virt/kvm/kvm_main.c | 17 +++++++++--------
>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 9ae9135..ebc8f54 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
>>>
>>> 	if (mutex_lock_killable(&vcpu->mutex))
>>> 		return -EINTR;
>> 
>> One question: 
>> 
>>> -	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
>> 
>> When vcpu->pid and current->pids[PIDTYPE_PID].pid will be different?
>
>If two different thread call an ioctl on a vcpu fd. (It must be an ioctl that has done vcpu_load - almost all except for some interrupt injections)

Thanks for your explanation. When can this happen?

Regards,
Wanpeng Li 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian Borntraeger Aug. 20, 2014, 7:01 a.m. UTC | #26

On 20/08/14 01:22, Wanpeng Li wrote:
> On Tue, Aug 19, 2014 at 04:04:03PM +0200, Christian Borntraeger wrote:
>> On 18/08/14 07:02, Wanpeng Li wrote:
>>> Hi Christian,
>>> On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote:
>>>> We currently track the pid of the task that runs the VCPU in
>>>> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
>>>> CPU, this causes hickups due to synchronize_rcu if one CPU is
>>>> modified by another CPU or the main thread (e.g. initialization,
>>>> reset). We track the pid only for the purpose of yielding, so
>>>> let's update the pid only in the KVM_RUN ioctl.
>>>>
>>>> In addition, don't do a synchronize_rcu on startup (pid == 0).
>>>>
>>>> This speeds up guest boot time on s390 noticably for some configs, e.g.
>>>> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
>>>>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> CC: Rik van Riel <riel@redhat.com>
>>>> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>>> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
>>>> ---
>>>> virt/kvm/kvm_main.c | 17 +++++++++--------
>>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 9ae9135..ebc8f54 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
>>>>
>>>> 	if (mutex_lock_killable(&vcpu->mutex))
>>>> 		return -EINTR;
>>>
>>> One question: 
>>>
>>>> -	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
>>>
>>> When vcpu->pid and current->pids[PIDTYPE_PID].pid will be different?
>>
>> If two different thread call an ioctl on a vcpu fd. (It must be an ioctl that has done vcpu_load - almost all except for some interrupt injections)
> 
> Thanks for your explanation. When can this happen?

In general, by using clone and do an ioctl in the new thread on a pre-existing fd.
In qemu, e.g. by using an kvm_ioctl on a vcpu from main thread or another cpu.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Dec. 3, 2014, 1:20 p.m. UTC | #27

On 05/08/2014 16:44, Christian Borntraeger wrote:
> We currently track the pid of the task that runs the VCPU in
> vcpu_load. Since we call vcpu_load for all kind of ioctls on a
> CPU, this causes hickups due to synchronize_rcu if one CPU is
> modified by another CPU or the main thread (e.g. initialization,
> reset). We track the pid only for the purpose of yielding, so
> let's update the pid only in the KVM_RUN ioctl.
> 
> In addition, don't do a synchronize_rcu on startup (pid == 0).
> 
> This speeds up guest boot time on s390 noticably for some configs, e.g.
> HZ=100, no full state tracking, 64 guest cpus 32 host cpus.
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> CC: Michael Mueller <mimu@linux.vnet.ibm.com>
> ---
>  virt/kvm/kvm_main.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9ae9135..ebc8f54 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
>  
>  	if (mutex_lock_killable(&vcpu->mutex))
>  		return -EINTR;
> -	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
> -		/* The thread running this VCPU changed. */
> -		struct pid *oldpid = vcpu->pid;
> -		struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
> -		rcu_assign_pointer(vcpu->pid, newpid);
> -		synchronize_rcu();
> -		put_pid(oldpid);
> -	}
>  	cpu = get_cpu();
>  	preempt_notifier_register(&vcpu->preempt_notifier);
>  	kvm_arch_vcpu_load(vcpu, cpu);
> @@ -1991,6 +1983,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  		r = -EINVAL;
>  		if (arg)
>  			goto out;
> +		if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
> +			/* The thread running this VCPU changed. */
> +			struct pid *oldpid = vcpu->pid;
> +			struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
> +			rcu_assign_pointer(vcpu->pid, newpid);
> +			if (oldpid)
> +				synchronize_rcu();
> +			put_pid(oldpid);
> +		}
>  		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
>  		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
>  		break;
> 

Applied with rewritten commit message:

    KVM: track pid for VCPU only on KVM_RUN ioctl
    
    We currently track the pid of the task that runs the VCPU in vcpu_load.
    If a yield to that VCPU is triggered while the PID of the wrong thread
    is active, the wrong thread might receive a yield, but this will most
    likely not help the executing thread at all.  Instead, if we only track
    the pid on the KVM_RUN ioctl, there are two possibilities:
    
    1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
    the VCPU thread is waiting for.  In this case, the VCPU thread is not
    runnable, but we also do not do a wrong yield.
    
    2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
    something that does not block the VCPU thread.  In this case, the
    VCPU thread can receive the directed yield correctly.
    
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    CC: Rik van Riel <riel@redhat.com>
    CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
    CC: Michael Mueller <mimu@linux.vnet.ibm.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

Commit Message

Comments

Patch