[RFC,v3,1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

Message ID 1510567565-5118-2-git-send-email-quan.xu0@gmail.com (mailing list archive)
State New, archived

Commit Message

Quan Xu Nov. 13, 2017, 10:06 a.m. UTC
From: Quan Xu <quan.xu0@gmail.com>

So far, pv_idle_ops.poll is the only op in pv_idle_ops. .poll is called
in the idle path and polls for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which hurt
performance, especially for latency-intensive workloads such as
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we receive a schedule event during polling.

Polling may waste CPU cycles, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
---
 arch/x86/include/asm/paravirt.h       |    5 +++++
 arch/x86/include/asm/paravirt_types.h |    6 ++++++
 arch/x86/kernel/paravirt.c            |    6 ++++++
 3 files changed, 17 insertions(+), 0 deletions(-)
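The .poll hook introduced by this patch defaults to a nop (see the paravirt.c hunk below); a guest is expected to install a real implementation (the KVM side is expected later in this series). As a rough, illustrative sketch only, with the function name, the fixed poll window and the use of need_resched() being assumptions rather than anything taken from this patch, such a poll function could look like:

/* Illustrative sketch only, not part of this patch.
 * A real file would need <linux/ktime.h>, <linux/sched.h> and
 * <asm/processor.h> for ktime_get_ns(), need_resched() and cpu_relax().
 */
static unsigned long poll_window_ns = 200000;	/* hypothetical tunable */

static void example_idle_poll(void)
{
	u64 end = ktime_get_ns() + poll_window_ns;

	/*
	 * Spin briefly before the real idle path; if a schedule event
	 * arrives meanwhile, return so the idle loop can reschedule
	 * without paying for timer reprogramming and the vmexit.
	 */
	while (!need_resched() && ktime_get_ns() < end)
		cpu_relax();
}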

Comments

Jürgen Groß Nov. 13, 2017, 10:53 a.m. UTC | #1
On 13/11/17 11:06, Quan Xu wrote:
> From: Quan Xu <quan.xu0@gmail.com>
> 
> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
> in idle path which will poll for a while before we enter the real idle
> state.
> 
> In virtualization, idle path includes several heavy operations
> includes timer access(LAPIC timer or TSC deadline timer) which will
> hurt performance especially for latency intensive workload like message
> passing task. The cost is mainly from the vmexit which is a hardware
> context switch between virtual machine and hypervisor. Our solution is
> to poll for a while and do not enter real idle path if we can get the
> schedule event during polling.
> 
> Poll may cause the CPU waste so we adopt a smart polling mechanism to
> reduce the useless poll.
> 
> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Alok Kataria <akataria@vmware.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary? Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


Juergen
Wanpeng Li Nov. 13, 2017, 11:09 a.m. UTC | #2
2017-11-13 18:53 GMT+08:00 Juergen Gross <jgross@suse.com>:
> On 13/11/17 11:06, Quan Xu wrote:
>> From: Quan Xu <quan.xu0@gmail.com>
>>
>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>> in idle path which will poll for a while before we enter the real idle
>> state.
>>
>> In virtualization, idle path includes several heavy operations
>> includes timer access(LAPIC timer or TSC deadline timer) which will
>> hurt performance especially for latency intensive workload like message
>> passing task. The cost is mainly from the vmexit which is a hardware
>> context switch between virtual machine and hypervisor. Our solution is
>> to poll for a while and do not enter real idle path if we can get the
>> schedule event during polling.
>>
>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>> reduce the useless poll.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Alok Kataria <akataria@vmware.com>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: xen-devel@lists.xenproject.org
>
> Hmm, is the idle entry path really so critical to performance that a new
> pvops function is necessary? Wouldn't a function pointer, maybe guarded
> by a static key, be enough? A further advantage would be that this would
> work on other architectures, too.

There is a "Adaptive halt-polling" which are merged to upstream more
than two years ago avoids to thread the critical path and has already
been ported to other architectures. https://lkml.org/lkml/2015/9/3/615

Regards,
Wanpeng Li
Quan Xu Nov. 14, 2017, 7:02 a.m. UTC | #3
On 2017/11/13 18:53, Juergen Gross wrote:
> On 13/11/17 11:06, Quan Xu wrote:
>> From: Quan Xu <quan.xu0@gmail.com>
>>
>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>> in idle path which will poll for a while before we enter the real idle
>> state.
>>
>> In virtualization, idle path includes several heavy operations
>> includes timer access(LAPIC timer or TSC deadline timer) which will
>> hurt performance especially for latency intensive workload like message
>> passing task. The cost is mainly from the vmexit which is a hardware
>> context switch between virtual machine and hypervisor. Our solution is
>> to poll for a while and do not enter real idle path if we can get the
>> schedule event during polling.
>>
>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>> reduce the useless poll.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Alok Kataria <akataria@vmware.com>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: xen-devel@lists.xenproject.org
> Hmm, is the idle entry path really so critical to performance that a new
> pvops function is necessary?
Juergen, here is the data we get when running the netperf benchmark:
  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
     29031.6 bit/s -- 76.1 %CPU

  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
     35787.7 bit/s -- 129.4 %CPU

  3. w/ kvm dynamic poll:
     35735.6 bit/s -- 200.0 %CPU

  4. w/patch and w/ kvm dynamic poll:
     42225.3 bit/s -- 198.7 %CPU

  5. idle=poll
     37081.7 bit/s -- 998.1 %CPU



  w/ this patch, we improve performance by 23%. We could even improve
  performance by 45.4% if we use the patch together with kvm dynamic poll.
  Also, the CPU cost is much lower than in the 'idle=poll' case.

> Wouldn't a function pointer, maybe guarded
> by a static key, be enough? A further advantage would be that this would
> work on other architectures, too.

I assume this feature will be ported to other archs. A new pvops makes the
code clean and easy to maintain. I also tried to add it to an existing
pvops, but it doesn't fit.



Quan
Alibaba Cloud
>
> Juergen
>
Wanpeng Li Nov. 14, 2017, 7:12 a.m. UTC | #4
2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>
>
> On 2017/11/13 18:53, Juergen Gross wrote:
>>
>> On 13/11/17 11:06, Quan Xu wrote:
>>>
>>> From: Quan Xu <quan.xu0@gmail.com>
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>> Cc: Juergen Gross <jgross@suse.com>
>>> Cc: Alok Kataria <akataria@vmware.com>
>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Cc: virtualization@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-devel@lists.xenproject.org
>>
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
>
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>     29031.6 bit/s -- 76.1 %CPU
>
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>     35787.7 bit/s -- 129.4 %CPU
>
>  3. w/ kvm dynamic poll:
>     35735.6 bit/s -- 200.0 %CPU

Actually, we can reduce the CPU utilization by sleeping for a period of
time, as is already done in the poll logic of the I/O subsystem; then we
can improve the algorithm in kvm instead of introducing another,
duplicate one in the kvm guest.

Regards,
Wanpeng Li

>
>  4. w/patch and w/ kvm dynamic poll:
>     42225.3 bit/s -- 198.7 %CPU
>
>  5. idle=poll
>     37081.7 bit/s -- 998.1 %CPU
>
>
>
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..
>
>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
>
>
> I assume this feature will be ported to other archs.. a new pvops makes code
> clean and easy to maintain. also I tried to add it into existed pvops, but
> it
> doesn't match.
>
>
>
> Quan
> Alibaba Cloud
>>
>>
>> Juergen
>>
>
Jürgen Groß Nov. 14, 2017, 7:30 a.m. UTC | #5
On 14/11/17 08:02, Quan Xu wrote:
> 
> 
> On 2017/11/13 18:53, Juergen Gross wrote:
>> On 13/11/17 11:06, Quan Xu wrote:
>>> From: Quan Xu <quan.xu0@gmail.com>
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>> Cc: Juergen Gross <jgross@suse.com>
>>> Cc: Alok Kataria <akataria@vmware.com>
>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Cc: virtualization@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-devel@lists.xenproject.org
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>     29031.6 bit/s -- 76.1 %CPU
> 
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>     35787.7 bit/s -- 129.4 %CPU
> 
>  3. w/ kvm dynamic poll:
>     35735.6 bit/s -- 200.0 %CPU
> 
>  4. w/patch and w/ kvm dynamic poll:
>     42225.3 bit/s -- 198.7 %CPU
> 
>  5. idle=poll
>     37081.7 bit/s -- 998.1 %CPU
> 
> 
> 
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..

I don't question the general idea. I just think pvops isn't the best way
to implement it.

>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
> 
> I assume this feature will be ported to other archs.. a new pvops makes
> code
> clean and easy to maintain. also I tried to add it into existed pvops,
> but it
> doesn't match.

You are aware that pvops is x86 only?

I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
	if (static_key_false(&guest_idle_poll_key))
		guest_idle_poll_func();
}

And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.
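For instance, the KVM guest side could be as small as this rough sketch
(kvm_guest_idle_poll and guest_idle_setup are made-up names; the real
polling logic would go in the poll function):

static void kvm_guest_idle_poll(void)
{
	/* bounded poll, bailing out early when need_resched() is set */
}

static int __init guest_idle_setup(void)
{
	guest_idle_poll_func = kvm_guest_idle_poll;
	static_key_slow_inc(&guest_idle_poll_key);
	return 0;
}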


Juergen
Quan Xu Nov. 14, 2017, 8:15 a.m. UTC | #6
On 2017/11/14 15:12, Wanpeng Li wrote:
> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>>
>> On 2017/11/13 18:53, Juergen Gross wrote:
>>> On 13/11/17 11:06, Quan Xu wrote:
>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>
>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>> in idle path which will poll for a while before we enter the real idle
>>>> state.
>>>>
>>>> In virtualization, idle path includes several heavy operations
>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>> hurt performance especially for latency intensive workload like message
>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>> context switch between virtual machine and hypervisor. Our solution is
>>>> to poll for a while and do not enter real idle path if we can get the
>>>> schedule event during polling.
>>>>
>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>> reduce the useless poll.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>> Cc: Juergen Gross <jgross@suse.com>
>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>> Cc: x86@kernel.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Cc: xen-devel@lists.xenproject.org
>>> Hmm, is the idle entry path really so critical to performance that a new
>>> pvops function is necessary?
>> Juergen, Here is the data we get when running benchmark netperf:
>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      29031.6 bit/s -- 76.1 %CPU
>>
>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      35787.7 bit/s -- 129.4 %CPU
>>
>>   3. w/ kvm dynamic poll:
>>      35735.6 bit/s -- 200.0 %CPU
> Actually we can reduce the CPU utilization by sleeping a period of
> time as what has already been done in the poll logic of IO subsystem,
> then we can improve the algorithm in kvm instead of introduing another
> duplicate one in the kvm guest.
We really appreciate upstream's kvm dynamic poll mechanism, which is
really helpful for a lot of scenarios.

However, as the description says, in virtualization the idle path includes
several heavy operations, including timer access (LAPIC timer or TSC
deadline timer), which hurt performance, especially for latency-intensive
workloads like message-passing tasks. The cost comes mainly from the
vmexit, which is a hardware context switch between the virtual machine
and the hypervisor.

For upstream's kvm dynamic poll mechanism, even if you could provide a
better algorithm, how could you bypass the timer access (LAPIC timer or
TSC deadline timer) or the hardware context switch between the virtual
machine and the hypervisor? I know this is a tradeoff.

Furthermore, here is the data we get when running the contextswitch
benchmark to measure latency (lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll:
   1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
   2280.6 ns/ctxsw -- 199.5 %CPU

So, these two solutions are quite similar, but they are not duplicates.

That's also why we add a generic idle poll before entering the real idle
path: when a reschedule event is pending, we can bypass the real idle path.


Quan
Alibaba Cloud




> Regards,
> Wanpeng Li
>
>>   4. w/patch and w/ kvm dynamic poll:
>>      42225.3 bit/s -- 198.7 %CPU
>>
>>   5. idle=poll
>>      37081.7 bit/s -- 998.1 %CPU
>>
>>
>>
>>   w/ this patch, we will improve performance by 23%.. even we could improve
>>   performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>>   cost of CPU is much lower than 'idle=poll' case..
>>
>>> Wouldn't a function pointer, maybe guarded
>>> by a static key, be enough? A further advantage would be that this would
>>> work on other architectures, too.
>>
>> I assume this feature will be ported to other archs.. a new pvops makes code
>> clean and easy to maintain. also I tried to add it into existed pvops, but
>> it
>> doesn't match.
>>
>>
>>
>> Quan
>> Alibaba Cloud
>>>
>>> Juergen
>>>
Wanpeng Li Nov. 14, 2017, 8:22 a.m. UTC | #7
2017-11-14 16:15 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>
>
> On 2017/11/14 15:12, Wanpeng Li wrote:
>>
>> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>>>
>>>
>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>
>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>
>>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>>
>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>> in idle path which will poll for a while before we enter the real idle
>>>>> state.
>>>>>
>>>>> In virtualization, idle path includes several heavy operations
>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>> hurt performance especially for latency intensive workload like message
>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>> schedule event during polling.
>>>>>
>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>> reduce the useless poll.
>>>>>
>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: x86@kernel.org
>>>>> Cc: virtualization@lists.linux-foundation.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Cc: xen-devel@lists.xenproject.org
>>>>
>>>> Hmm, is the idle entry path really so critical to performance that a new
>>>> pvops function is necessary?
>>>
>>> Juergen, Here is the data we get when running benchmark netperf:
>>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      29031.6 bit/s -- 76.1 %CPU
>>>
>>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      35787.7 bit/s -- 129.4 %CPU
>>>
>>>   3. w/ kvm dynamic poll:
>>>      35735.6 bit/s -- 200.0 %CPU
>>
>> Actually we can reduce the CPU utilization by sleeping a period of
>> time as what has already been done in the poll logic of IO subsystem,
>> then we can improve the algorithm in kvm instead of introduing another
>> duplicate one in the kvm guest.
>
> We really appreciate upstream's kvm dynamic poll mechanism, which is
> really helpful for a lot of scenario..
>
> However, as description said, in virtualization, idle path includes
> several heavy operations includes timer access (LAPIC timer or TSC
> deadline timer) which will hurt performance especially for latency
> intensive workload like message passing task. The cost is mainly from
> the vmexit which is a hardware context switch between virtual machine
> and hypervisor.
>
> for upstream's kvm dynamic poll mechanism, even you could provide a
> better algorism, how could you bypass timer access (LAPIC timer or TSC
> deadline timer), or a hardware context switch between virtual machine
> and hypervisor. I know these is a tradeoff.
>
> Furthermore, here is the data we get when running benchmark contextswitch
> to measure the latency(lower is better):
>
> 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>   3402.9 ns/ctxsw -- 199.8 %CPU
>
> 2. w/ patch and disable kvm dynamic poll:
>   1163.5 ns/ctxsw -- 205.5 %CPU
>
> 3. w/ kvm dynamic poll:
>   2280.6 ns/ctxsw -- 199.5 %CPU
>
> so, these tow solution are quite similar, but not duplicate..
>
> that's also why to add a generic idle poll before enter real idle path.
> When a reschedule event is pending, we can bypass the real idle path.
>

There is similar logic in the idle governor/driver, so how does this
patchset influence the decision in the idle governor/driver when
running on bare metal (power management is not exposed to the guest, so
we will not enter the idle driver in the guest)?

Regards,
Wanpeng Li
Quan Xu Nov. 14, 2017, 9:38 a.m. UTC | #8
On 2017/11/14 15:30, Juergen Gross wrote:
> On 14/11/17 08:02, Quan Xu wrote:
>>
>> On 2017/11/13 18:53, Juergen Gross wrote:
>>> On 13/11/17 11:06, Quan Xu wrote:
>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>
>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>> in idle path which will poll for a while before we enter the real idle
>>>> state.
>>>>
>>>> In virtualization, idle path includes several heavy operations
>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>> hurt performance especially for latency intensive workload like message
>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>> context switch between virtual machine and hypervisor. Our solution is
>>>> to poll for a while and do not enter real idle path if we can get the
>>>> schedule event during polling.
>>>>
>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>> reduce the useless poll.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>> Cc: Juergen Gross <jgross@suse.com>
>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>> Cc: x86@kernel.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Cc: xen-devel@lists.xenproject.org
>>> Hmm, is the idle entry path really so critical to performance that a new
>>> pvops function is necessary?
>> Juergen, Here is the data we get when running benchmark netperf:
>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      29031.6 bit/s -- 76.1 %CPU
>>
>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>      35787.7 bit/s -- 129.4 %CPU
>>
>>   3. w/ kvm dynamic poll:
>>      35735.6 bit/s -- 200.0 %CPU
>>
>>   4. w/patch and w/ kvm dynamic poll:
>>      42225.3 bit/s -- 198.7 %CPU
>>
>>   5. idle=poll
>>      37081.7 bit/s -- 998.1 %CPU
>>
>>
>>
>>   w/ this patch, we will improve performance by 23%.. even we could improve
>>   performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>>   cost of CPU is much lower than 'idle=poll' case..
> I don't question the general idea. I just think pvops isn't the best way
> to implement it.
>
>>> Wouldn't a function pointer, maybe guarded
>>> by a static key, be enough? A further advantage would be that this would
>>> work on other architectures, too.
>> I assume this feature will be ported to other archs.. a new pvops makes

       sorry, a typo: s/other archs/other hypervisors/
       it refers to hypervisors like Xen, Hyper-V and VMware.

>> code
>> clean and easy to maintain. also I tried to add it into existed pvops,
>> but it
>> doesn't match.
> You are aware that pvops is x86 only?

yes, I'm aware..

> I really don't see the big difference in maintainability compared to the
> static key / function pointer variant:
>
> void (*guest_idle_poll_func)(void);
> struct static_key guest_idle_poll_key __read_mostly;
>
> static inline void guest_idle_poll(void)
> {
> 	if (static_key_false(&guest_idle_poll_key))
> 		guest_idle_poll_func();
> }



Thank you for your sample code :)
I agree there is no big difference. I think we are discussing two things:
  1) an x86 VM on different hypervisors
  2) VMs of different archs on the kvm hypervisor

What I want to do is x86 VMs on different hypervisors, such as kvm / xen
/ hyperv.

> And KVM would just need to set guest_idle_poll_func and enable the
> static key. Works on non-x86 architectures, too.
>

Referring to 'pv_mmu_ops', Hyper-V and Xen can implement their own
functions for 'pv_mmu_ops'. I think it is the same for pv_idle_ops.

With the above explanation, do you still think I need to define the static
key / function pointer variant?

BTW, any interest in porting it to a Xen HVM guest? :)

Quan
Alibaba Cloud
Quan Xu Nov. 14, 2017, 10:23 a.m. UTC | #9
On 2017/11/14 16:22, Wanpeng Li wrote:
> 2017-11-14 16:15 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>>
>> On 2017/11/14 15:12, Wanpeng Li wrote:
>>> 2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0@gmail.com>:
>>>>
>>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>>>
>>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>>> in idle path which will poll for a while before we enter the real idle
>>>>>> state.
>>>>>>
>>>>>> In virtualization, idle path includes several heavy operations
>>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>>> hurt performance especially for latency intensive workload like message
>>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>>> schedule event during polling.
>>>>>>
>>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>>> reduce the useless poll.
>>>>>>
>>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>> Cc: x86@kernel.org
>>>>>> Cc: virtualization@lists.linux-foundation.org
>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>> Cc: xen-devel@lists.xenproject.org
>>>>> Hmm, is the idle entry path really so critical to performance that a new
>>>>> pvops function is necessary?
>>>> Juergen, Here is the data we get when running benchmark netperf:
>>>>    1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       29031.6 bit/s -- 76.1 %CPU
>>>>
>>>>    2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       35787.7 bit/s -- 129.4 %CPU
>>>>
>>>>    3. w/ kvm dynamic poll:
>>>>       35735.6 bit/s -- 200.0 %CPU
>>> Actually we can reduce the CPU utilization by sleeping a period of
>>> time as what has already been done in the poll logic of IO subsystem,
>>> then we can improve the algorithm in kvm instead of introduing another
>>> duplicate one in the kvm guest.
>> We really appreciate upstream's kvm dynamic poll mechanism, which is
>> really helpful for a lot of scenario..
>>
>> However, as description said, in virtualization, idle path includes
>> several heavy operations includes timer access (LAPIC timer or TSC
>> deadline timer) which will hurt performance especially for latency
>> intensive workload like message passing task. The cost is mainly from
>> the vmexit which is a hardware context switch between virtual machine
>> and hypervisor.
>>
>> for upstream's kvm dynamic poll mechanism, even you could provide a
>> better algorism, how could you bypass timer access (LAPIC timer or TSC
>> deadline timer), or a hardware context switch between virtual machine
>> and hypervisor. I know these is a tradeoff.
>>
>> Furthermore, here is the data we get when running benchmark contextswitch
>> to measure the latency(lower is better):
>>
>> 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>    3402.9 ns/ctxsw -- 199.8 %CPU
>>
>> 2. w/ patch and disable kvm dynamic poll:
>>    1163.5 ns/ctxsw -- 205.5 %CPU
>>
>> 3. w/ kvm dynamic poll:
>>    2280.6 ns/ctxsw -- 199.5 %CPU
>>
>> so, these tow solution are quite similar, but not duplicate..
>>
>> that's also why to add a generic idle poll before enter real idle path.
>> When a reschedule event is pending, we can bypass the real idle path.
>>
> There is a similar logic in the idle governor/driver, so how this
> patchset influence the decision in the idle governor/driver when
> running on bare-metal(power managment is not exposed to the guest so
> we will not enter into idle driver in the guest)?
>

This is expected to take effect only when running as a virtual machine with
the proper CONFIG_* options enabled. It cannot work on bare metal, even with
the proper CONFIG_* options enabled.

Quan
Alibaba Cloud
Jürgen Groß Nov. 14, 2017, 10:27 a.m. UTC | #10
On 14/11/17 10:38, Quan Xu wrote:
> 
> 
> On 2017/11/14 15:30, Juergen Gross wrote:
>> On 14/11/17 08:02, Quan Xu wrote:
>>>
>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>>
>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>> in idle path which will poll for a while before we enter the real idle
>>>>> state.
>>>>>
>>>>> In virtualization, idle path includes several heavy operations
>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>> hurt performance especially for latency intensive workload like
>>>>> message
>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>> schedule event during polling.
>>>>>
>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>> reduce the useless poll.
>>>>>
>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: x86@kernel.org
>>>>> Cc: virtualization@lists.linux-foundation.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Cc: xen-devel@lists.xenproject.org
>>>> Hmm, is the idle entry path really so critical to performance that a
>>>> new
>>>> pvops function is necessary?
>>> Juergen, Here is the data we get when running benchmark netperf:
>>>   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      29031.6 bit/s -- 76.1 %CPU
>>>
>>>   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>      35787.7 bit/s -- 129.4 %CPU
>>>
>>>   3. w/ kvm dynamic poll:
>>>      35735.6 bit/s -- 200.0 %CPU
>>>
>>>   4. w/patch and w/ kvm dynamic poll:
>>>      42225.3 bit/s -- 198.7 %CPU
>>>
>>>   5. idle=poll
>>>      37081.7 bit/s -- 998.1 %CPU
>>>
>>>
>>>
>>>   w/ this patch, we will improve performance by 23%.. even we could
>>> improve
>>>   performance by 45.4%, if we use w/patch and w/ kvm dynamic poll.
>>> also the
>>>   cost of CPU is much lower than 'idle=poll' case..
>> I don't question the general idea. I just think pvops isn't the best way
>> to implement it.
>>
>>>> Wouldn't a function pointer, maybe guarded
>>>> by a static key, be enough? A further advantage would be that this
>>>> would
>>>> work on other architectures, too.
>>> I assume this feature will be ported to other archs.. a new pvops makes
> 
>       sorry, a typo.. /other archs/other hypervisors/
>       it refers hypervisor like Xen, HyperV and VMware)..
> 
>>> code
>>> clean and easy to maintain. also I tried to add it into existed pvops,
>>> but it
>>> doesn't match.
>> You are aware that pvops is x86 only?
> 
> yes, I'm aware..
> 
>> I really don't see the big difference in maintainability compared to the
>> static key / function pointer variant:
>>
>> void (*guest_idle_poll_func)(void);
>> struct static_key guest_idle_poll_key __read_mostly;
>>
>> static inline void guest_idle_poll(void)
>> {
>>     if (static_key_false(&guest_idle_poll_key))
>>         guest_idle_poll_func();
>> }
> 
> 
> 
> thank you for your sample code :)
> I agree there is no big difference.. I think we are discussion for two
> things:
>  1) x86 VM on different hypervisors
>  2) different archs VM on kvm hypervisor
> 
> What I want to do is x86 VM on different hypervisors, such as kvm / xen
> / hyperv ..

Why limit the solution to x86 if the more general solution isn't
harder?

As you didn't give any reason why the pvops approach is better, other than
that you don't care about non-x86 platforms, you won't get an "Ack" from
me for this patch.

> 
>> And KVM would just need to set guest_idle_poll_func and enable the
>> static key. Works on non-x86 architectures, too.
>>
> 
> .. referred to 'pv_mmu_ops', HyperV and Xen can implement their own
> functions for 'pv_mmu_ops'.
> I think it is the same to pv_idle_ops.
> 
> with above explaination, do you still think I need to define the static
> key/function pointer variant?
> 
> btw, any interest to port it to Xen HVM guest? :)

Maybe. But this should work for Xen on ARM, too.


Juergen
Quan Xu Nov. 14, 2017, 11:43 a.m. UTC | #11
On 2017/11/14 18:27, Juergen Gross wrote:
> On 14/11/17 10:38, Quan Xu wrote:
>>
>> On 2017/11/14 15:30, Juergen Gross wrote:
>>> On 14/11/17 08:02, Quan Xu wrote:
>>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>>>
>>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>>>>> in idle path which will poll for a while before we enter the real idle
>>>>>> state.
>>>>>>
>>>>>> In virtualization, idle path includes several heavy operations
>>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>>> hurt performance especially for latency intensive workload like
>>>>>> message
>>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>>> context switch between virtual machine and hypervisor. Our solution is
>>>>>> to poll for a while and do not enter real idle path if we can get the
>>>>>> schedule event during polling.
>>>>>>
>>>>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>>>>> reduce the useless poll.
>>>>>>
>>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>> Cc: x86@kernel.org
>>>>>> Cc: virtualization@lists.linux-foundation.org
>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>> Cc: xen-devel@lists.xenproject.org
>>>>> Hmm, is the idle entry path really so critical to performance that a
>>>>> new
>>>>> pvops function is necessary?
>>>> Juergen, Here is the data we get when running benchmark netperf:
>>>>    1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       29031.6 bit/s -- 76.1 %CPU
>>>>
>>>>    2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>       35787.7 bit/s -- 129.4 %CPU
>>>>
>>>>    3. w/ kvm dynamic poll:
>>>>       35735.6 bit/s -- 200.0 %CPU
>>>>
>>>>    4. w/patch and w/ kvm dynamic poll:
>>>>       42225.3 bit/s -- 198.7 %CPU
>>>>
>>>>    5. idle=poll
>>>>       37081.7 bit/s -- 998.1 %CPU
>>>>
>>>>
>>>>
>>>>    w/ this patch, we will improve performance by 23%.. even we could
>>>> improve
>>>>    performance by 45.4%, if we use w/patch and w/ kvm dynamic poll.
>>>> also the
>>>>    cost of CPU is much lower than 'idle=poll' case..
>>> I don't question the general idea. I just think pvops isn't the best way
>>> to implement it.
>>>
>>>>> Wouldn't a function pointer, maybe guarded
>>>>> by a static key, be enough? A further advantage would be that this
>>>>> would
>>>>> work on other architectures, too.
>>>> I assume this feature will be ported to other archs.. a new pvops makes
>>        sorry, a typo.. /other archs/other hypervisors/
>>        it refers hypervisor like Xen, HyperV and VMware)..
>>
>>>> code
>>>> clean and easy to maintain. also I tried to add it into existed pvops,
>>>> but it
>>>> doesn't match.
>>> You are aware that pvops is x86 only?
>> yes, I'm aware..
>>
>>> I really don't see the big difference in maintainability compared to the
>>> static key / function pointer variant:
>>>
>>> void (*guest_idle_poll_func)(void);
>>> struct static_key guest_idle_poll_key __read_mostly;
>>>
>>> static inline void guest_idle_poll(void)
>>> {
>>>      if (static_key_false(&guest_idle_poll_key))
>>>          guest_idle_poll_func();
>>> }
>>
>>
>> thank you for your sample code :)
>> I agree there is no big difference.. I think we are discussion for two
>> things:
>>   1) x86 VM on different hypervisors
>>   2) different archs VM on kvm hypervisor
>>
>> What I want to do is x86 VM on different hypervisors, such as kvm / xen
>> / hyperv ..
> Why limit the solution to x86 if the more general solution isn't
> harder?
>
> As you didn't give any reason why the pvops approach is better other
> than you don't care for non-x86 platforms you won't get an "Ack" from
> me for this patch.


It just looks a little odd to me. I understand you care about non-x86 archs.

Are you aware of 'pv_time_ops' for the arm64/arm/x86 archs, defined in
    - arch/arm64/include/asm/paravirt.h
    - arch/x86/include/asm/paravirt_types.h
    - arch/arm/include/asm/paravirt.h

I am unfamiliar with arm code. IIUC, if you'd implement pv_idle_ops
for the arm/arm64 archs, you'd define the same structure in
    - arch/arm64/include/asm/paravirt.h     or
    - arch/arm/include/asm/paravirt.h

.. instead of a static key / function.

then implement a real function in
    - arch/arm/kernel/paravirt.c.

Also, I wonder HOW/WHERE to define the static key/function so that it
benefits both x86 and non-x86 archs?

Quan
Alibaba Cloud

>>> And KVM would just need to set guest_idle_poll_func and enable the
>>> static key. Works on non-x86 architectures, too.
>>>
>> .. referred to 'pv_mmu_ops', HyperV and Xen can implement their own
>> functions for 'pv_mmu_ops'.
>> I think it is the same to pv_idle_ops.
>>
>> with above explaination, do you still think I need to define the static
>> key/function pointer variant?
>>
>> btw, any interest to port it to Xen HVM guest? :)
> Maybe. But this should work for Xen on ARM, too.
>
>
> Juergen
>
Jürgen Groß Nov. 14, 2017, 11:58 a.m. UTC | #12
On 14/11/17 12:43, Quan Xu wrote:
> 
> 
> On 2017/11/14 18:27, Juergen Gross wrote:
>> On 14/11/17 10:38, Quan Xu wrote:
>>>
>>> On 2017/11/14 15:30, Juergen Gross wrote:
>>>> On 14/11/17 08:02, Quan Xu wrote:
>>>>> On 2017/11/13 18:53, Juergen Gross wrote:
>>>>>> On 13/11/17 11:06, Quan Xu wrote:
>>>>>>> From: Quan Xu <quan.xu0@gmail.com>
>>>>>>>
>>>>>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is
>>>>>>> called
>>>>>>> in idle path which will poll for a while before we enter the real
>>>>>>> idle
>>>>>>> state.
>>>>>>>
>>>>>>> In virtualization, idle path includes several heavy operations
>>>>>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>>>>>> hurt performance especially for latency intensive workload like
>>>>>>> message
>>>>>>> passing task. The cost is mainly from the vmexit which is a hardware
>>>>>>> context switch between virtual machine and hypervisor. Our
>>>>>>> solution is
>>>>>>> to poll for a while and do not enter real idle path if we can get
>>>>>>> the
>>>>>>> schedule event during polling.
>>>>>>>
>>>>>>> Poll may cause the CPU waste so we adopt a smart polling
>>>>>>> mechanism to
>>>>>>> reduce the useless poll.
>>>>>>>
>>>>>>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>>>>>>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>>>>>>> Cc: Juergen Gross <jgross@suse.com>
>>>>>>> Cc: Alok Kataria <akataria@vmware.com>
>>>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>> Cc: x86@kernel.org
>>>>>>> Cc: virtualization@lists.linux-foundation.org
>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>> Cc: xen-devel@lists.xenproject.org
>>>>>> Hmm, is the idle entry path really so critical to performance that a
>>>>>> new
>>>>>> pvops function is necessary?
>>>>> Juergen, Here is the data we get when running benchmark netperf:
>>>>>    1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>>       29031.6 bit/s -- 76.1 %CPU
>>>>>
>>>>>    2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>>>>>       35787.7 bit/s -- 129.4 %CPU
>>>>>
>>>>>    3. w/ kvm dynamic poll:
>>>>>       35735.6 bit/s -- 200.0 %CPU
>>>>>
>>>>>    4. w/patch and w/ kvm dynamic poll:
>>>>>       42225.3 bit/s -- 198.7 %CPU
>>>>>
>>>>>    5. idle=poll
>>>>>       37081.7 bit/s -- 998.1 %CPU
>>>>>
>>>>>
>>>>>
>>>>>    w/ this patch, we will improve performance by 23%.. even we could
>>>>> improve
>>>>>    performance by 45.4%, if we use w/patch and w/ kvm dynamic poll.
>>>>> also the
>>>>>    cost of CPU is much lower than 'idle=poll' case..
>>>> I don't question the general idea. I just think pvops isn't the best
>>>> way
>>>> to implement it.
>>>>
>>>>>> Wouldn't a function pointer, maybe guarded
>>>>>> by a static key, be enough? A further advantage would be that this
>>>>>> would
>>>>>> work on other architectures, too.
>>>>> I assume this feature will be ported to other archs.. a new pvops
>>>>> makes
>>>        sorry, a typo.. /other archs/other hypervisors/
>>>        it refers hypervisor like Xen, HyperV and VMware)..
>>>
>>>>> code
>>>>> clean and easy to maintain. also I tried to add it into existed pvops,
>>>>> but it
>>>>> doesn't match.
>>>> You are aware that pvops is x86 only?
>>> yes, I'm aware..
>>>
>>>> I really don't see the big difference in maintainability compared to
>>>> the
>>>> static key / function pointer variant:
>>>>
>>>> void (*guest_idle_poll_func)(void);
>>>> struct static_key guest_idle_poll_key __read_mostly;
>>>>
>>>> static inline void guest_idle_poll(void)
>>>> {
>>>>      if (static_key_false(&guest_idle_poll_key))
>>>>          guest_idle_poll_func();
>>>> }
>>>
>>>
>>> thank you for your sample code :)
>>> I agree there is no big difference.. I think we are discussion for two
>>> things:
>>>   1) x86 VM on different hypervisors
>>>   2) different archs VM on kvm hypervisor
>>>
>>> What I want to do is x86 VM on different hypervisors, such as kvm / xen
>>> / hyperv ..
>> Why limit the solution to x86 if the more general solution isn't
>> harder?
>>
>> As you didn't give any reason why the pvops approach is better other
>> than you don't care for non-x86 platforms you won't get an "Ack" from
>> me for this patch.
> 
> 
> It just looks a little odder to me. I understand you care about no-x86
> arch.
> 
> Are you aware 'pv_time_ops' for arm64/arm/x86 archs, defined in
>    - arch/arm64/include/asm/paravirt.h
>    - arch/x86/include/asm/paravirt_types.h
>    - arch/arm/include/asm/paravirt.h

Yes, I know. This is just a hack to make it compile. Other than sharing
the names, this has nothing to do with pvops; it is just a function
vector.

> I am unfamilar with arm code. IIUC, if you'd implement pv_idle_ops
> for arm/arm64 arch, you'd define a same structure in
>    - arch/arm64/include/asm/paravirt.h     or
>    - arch/arm/include/asm/paravirt.h
> 
> .. instead of static key / fuction.
> 
> then implement a real function in
>    - arch/arm/kernel/paravirt.c.

So just to use pvops you want to implement it in each arch instead
of using a mechanism available everywhere?

> Also I wonder HOW/WHERE to define a static key/function, then to benifit
> x86/no-x86 archs?

What? There are plenty of examples in the kernel.

Please stop wasting my time. Either write a patch which is acceptable
or let it be. I won't take your pvops approach without a really good
reason to do so. And so far you haven't given any reason other than
you are too lazy to write a proper patch, sorry.


Juergen

Patch

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index fd81228..3c83727 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -198,6 +198,11 @@  static inline unsigned long long paravirt_read_pmc(int counter)
 
 #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
 
+static inline void paravirt_idle_poll(void)
+{
+	PVOP_VCALL0(pv_idle_ops.poll);
+}
+
 static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 {
 	PVOP_VCALL2(pv_cpu_ops.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 10cc3b9..95c0e3e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -313,6 +313,10 @@  struct pv_lock_ops {
 	struct paravirt_callee_save vcpu_is_preempted;
 } __no_randomize_layout;
 
+struct pv_idle_ops {
+	void (*poll)(void);
+} __no_randomize_layout;
+
 /* This contains all the paravirt structures: we get a convenient
  * number for each function using the offset which we use to indicate
  * what to patch. */
@@ -323,6 +327,7 @@  struct paravirt_patch_template {
 	struct pv_irq_ops pv_irq_ops;
 	struct pv_mmu_ops pv_mmu_ops;
 	struct pv_lock_ops pv_lock_ops;
+	struct pv_idle_ops pv_idle_ops;
 } __no_randomize_layout;
 
 extern struct pv_info pv_info;
@@ -332,6 +337,7 @@  struct paravirt_patch_template {
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
 extern struct pv_lock_ops pv_lock_ops;
+extern struct pv_idle_ops pv_idle_ops;
 
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 19a3e8f..67cab22 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -128,6 +128,7 @@  unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 		.pv_lock_ops = pv_lock_ops,
 #endif
+		.pv_idle_ops = pv_idle_ops,
 	};
 	return *((void **)&tmpl + type);
 }
@@ -312,6 +313,10 @@  struct pv_time_ops pv_time_ops = {
 	.steal_clock = native_steal_clock,
 };
 
+struct pv_idle_ops pv_idle_ops = {
+	.poll = paravirt_nop,
+};
+
 __visible struct pv_irq_ops pv_irq_ops = {
 	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
 	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
@@ -463,3 +468,4 @@  struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
 EXPORT_SYMBOL    (pv_mmu_ops);
 EXPORT_SYMBOL_GPL(pv_info);
 EXPORT_SYMBOL    (pv_irq_ops);
+EXPORT_SYMBOL    (pv_idle_ops);
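For reference, a guest would then override the nop default from its platform setup code, roughly as in the sketch below; kvm_idle_poll and kvm_guest_idle_init are illustrative names only, and the actual KVM wiring belongs to a later patch in this series:

/* Illustrative guest-side hookup, not part of this patch.
 * Needs <asm/kvm_para.h> for kvm_para_available().
 */
static void kvm_idle_poll(void)
{
	/* poll briefly; return early when a schedule event is pending */
}

void __init kvm_guest_idle_init(void)
{
	if (kvm_para_available())	/* only when running as a KVM guest */
		pv_idle_ops.poll = kvm_idle_poll;
}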