[0/5] cpuidle haltpoll driver and governor (v6)

Message ID 20190703235124.783034907@amt.cnet (mailing list archive)

Marcelo Tosatti July 3, 2019, 11:51 p.m. UTC
(rebased against queue branch of kvm.git tree)

The cpuidle-haltpoll driver with the haltpoll governor allows the guest
vcpus to poll for a specified amount of time before halting.
This provides the following benefits over host side polling:

         1) The POLL flag is set while polling is performed, which allows
            a remote vCPU to avoid sending an IPI (and the associated
            cost of handling the IPI) when performing a wakeup.

         2) The VM-exit cost can be avoided.

The downside of guest side polling is that polling is performed
even when the host has other runnable tasks.
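
The poll-then-halt behaviour described above can be illustrated with a
small model. This is only a sketch under stated assumptions, not the
driver code: guest_idle(), IdleResult and wakeup_after_ns are
hypothetical names introduced for illustration, and the model ignores
everything except the polling window.

```python
from dataclasses import dataclass


@dataclass
class IdleResult:
    halted: bool    # True if the vCPU gave up polling and halted
    polled_ns: int  # time spent polling inside the guest


def guest_idle(wakeup_after_ns: int, poll_limit_ns: int) -> IdleResult:
    """Model one idle period of a guest vCPU (hypothetical helper).

    wakeup_after_ns: assumed time until the next wakeup event arrives.
    poll_limit_ns:   how long the guest polls before halting.
    """
    if wakeup_after_ns <= poll_limit_ns:
        # The wakeup arrived while the guest was still polling, so the
        # vCPU never halted: no VM-exit for the halt, and the waker can
        # see the POLL flag and skip the IPI.
        return IdleResult(halted=False, polled_ns=wakeup_after_ns)
    # The polling window expired first: the vCPU halts after polling
    # for the full window.
    return IdleResult(halted=True, polled_ns=poll_limit_ns)


fast = guest_idle(wakeup_after_ns=50_000, poll_limit_ns=300_000)
slow = guest_idle(wakeup_after_ns=1_000_000, poll_limit_ns=300_000)
print(fast)
print(slow)
```

In the second call the guest polls for the whole window and still has
to halt, which is the case where the polling time is pure overhead for
other runnable tasks in the host.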

Results comparing halt_poll_ns (host side polling) with guest side
polling for a server/client application where a small packet is
ping-ponged:

host                                        --> 31.33
halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
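
The percentages in parentheses appear to be the host figure divided by
each guest figure; the message does not state this or the unit, so
treat that reading as an assumption. A quick check of the arithmetic:

```python
# Figures copied from the table above; the unit is not stated in the message.
results = {
    "host": 31.33,
    "halt_poll_ns=300000 / no guest busy spin": 33.40,
    "halt_poll_ns=0 / guest_halt_poll_ns=300000": 32.73,
}

baseline = results["host"]

# Assumed formula: host figure as a percentage of each configuration's figure.
relative = {
    name: round(100 * baseline / value, 1) for name, value in results.items()
}

for name, pct in relative.items():
    print(f"{name:45s} {pct:5.1f}%")
```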

For the SAP HANA benchmarks (where idle_spin is a parameter
of the previous version of the patch, results should be the
same):

hpns == halt_poll_ns

                           idle_spin=0/   idle_spin=800/    idle_spin=0/
                           hpns=200000    hpns=0            hpns=800000
DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)      1.29   (-3.7%)
UpdateC00T03 (1 thread)   4.72           4.18 (-12%)       4.53   (-5%)

V2:

- Move from x86 to generic code (Paolo/Christian)
- Add auto-tuning logic (Paolo)
- Add MSR to disable host side polling (Paolo)

V3:

- Do not be specific about HLT VM-exit in the documentation (Ankur Arora)
- Mark tuning parameters static and __read_mostly (Andrea Arcangeli)
- Add WARN_ON if host does not support poll control (Joao Martins)
- Use sched_clock and cleanup haltpoll_enter_idle (Peter Zijlstra)
- Mark certain functions in kvm.c as static (kernel test robot)
- Remove tracepoints as they use RCU from extended quiescent state (kernel
test robot)

V4:
- Use a haltpoll governor, use poll_state.c poll code (Rafael J. Wysocki)

V5:
- Take latency requirement into consideration (Rafael J. Wysocki)
- Set target_residency/exit_latency to 1 (Rafael J. Wysocki)
- Do not load cpuidle driver if not virtualized (Rafael J. Wysocki)

V6:
- Switch from callback to poll_limit_ns variable in cpuidle device structure
(Rafael J. Wysocki)
- Move last_used_idx to cpuidle device structure (Rafael J. Wysocki)
- Drop per-cpu device structure in haltpoll governor (Rafael J. Wysocki)
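
The changelog mentions auto-tuning and a poll_limit_ns variable, but
this cover letter does not spell out the tuning rule. The following is
a sketch of one plausible rule, not the governor's actual code:
adjust_poll_limit(), block_ns, grow, shrink and grow_start are
hypothetical names, and their default values are made up for
illustration. Only poll_limit_ns and guest_halt_poll_ns come from the
text above.

```python
def adjust_poll_limit(
    poll_limit_ns: int,
    block_ns: int,
    guest_halt_poll_ns: int = 300_000,  # upper bound on the polling window
    grow: int = 2,                      # hypothetical growth factor
    shrink: int = 2,                    # hypothetical shrink divisor
    grow_start: int = 10_000,           # hypothetical first non-zero window
) -> int:
    """Return the next polling window given how long the vCPU was idle.

    block_ns is the assumed length of the idle period that just ended.
    """
    if poll_limit_ns < block_ns <= guest_halt_poll_ns:
        # A longer window (still within the cap) would have caught this
        # wakeup while polling, so grow the window.
        grown = max(poll_limit_ns * grow, grow_start)
        return min(grown, guest_halt_poll_ns)
    if block_ns > guest_halt_poll_ns:
        # The wakeup came later than the maximum window allows, so
        # polling would not have helped: shrink the window.
        return poll_limit_ns // shrink if shrink else 0
    # The wakeup arrived within the current window: keep it as is.
    return poll_limit_ns


limit = 0
for block_ns in (5_000, 50_000, 250_000, 1_000_000):
    limit = adjust_poll_limit(limit, block_ns)
    print(block_ns, "->", limit)
```

Keeping the window in a single per-device value matches the V6 note
about using a poll_limit_ns variable instead of a callback; the rest of
the rule is an assumption.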

Comments

Rafael J. Wysocki July 4, 2019, 8:32 a.m. UTC | #1
It looks good to me now, but I have some cpuidle changes in the works
that will clash with some changes in this series if it is not rebased
on top of them, so IMO it would make sense for me to get patches [1-4/5] at
least into my queue.  I can expose an immutable branch with them for
the KVM tree to consume.  I can take the last patch in the series as
well if I get an ACK for it.

Would that work for everybody?

Paolo Bonzini July 22, 2019, 3:25 p.m. UTC | #2
On 04/07/19 10:32, Rafael J. Wysocki wrote:
> It looks good to me now, but I have some cpuidle changes in the works
> that will clash with some changes in this series if it is not rebased
> on top of them, so IMO it would make sense for me to get patches [1-4/5] at
> least into my queue.  I can expose an immutable branch with them for
> the KVM tree to consume.  I can take the last patch in the series as
> well if I get an ACK for it.
> 
> Would that work for everybody?

Rafael, please take the whole series in your tree.  Thanks!

Paolo