[RFC,0/2] High downtime with 95+ throttle pct

Message ID 20190710092338.23559-1-yury-kotov@yandex-team.ru (mailing list archive)

Message

Yury Kotov July 10, 2019, 9:23 a.m. UTC
Hi,

I wrote a test for migration auto-converge and found something strange:
1. Enable auto-converge
2. Set max-bandwidth to 1Gb/s
3. Set downtime-limit to 1ms
4. Run the standard test (it just writes a byte per page)
5. Wait for convergence
6. It converges at a 99% throttle percentage
7. The resulting downtime was about 300-600ms   <<<<

That is much higher than the expected 1ms. I found that the
cpu_throttle_thread() function sleeps for 100ms+ in the VCPU thread at high
throttle percentages (>=95%), and it keeps sleeping even after a cpu kick.
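
To put numbers on the sleep length, here is a minimal sketch of the arithmetic.
It assumes a fixed 10ms run timeslice and a sleep-to-run ratio of
pct / (100 - pct); both the constant and the helper name throttle_sleep_ns are
illustrative assumptions, not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: time a vCPU runs between two throttle sleeps (10 ms). */
#define THROTTLE_TIMESLICE_NS 10000000LL

/* Sleep per timeslice, in ns, for a throttle percentage in [1, 99]. */
static int64_t throttle_sleep_ns(int pct)
{
    assert(pct >= 1 && pct <= 99);
    /* pct / (100 - pct) is the sleep-to-run ratio, kept in integer math. */
    return THROTTLE_TIMESLICE_NS * pct / (100 - pct);
}
```

Under those assumptions, 95% gives 190ms of sleep per timeslice and 99% gives
990ms, so an uninterruptible sleep of that length can dwarf a 1ms
downtime-limit.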

I tried to fix it by using a timedwait for the millisecond part of the sleep,
e.g. timedwait(halt_cond, 1ms) + usleep(500).
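
The general idea of an interruptible sleep can be sketched with plain POSIX
threads. This is not the patch itself: lock, halt_cond, kicked, kick() and
throttle_sleep() are hypothetical stand-ins, and because
pthread_cond_timedwait() takes an absolute timespec the sketch waits on a
single deadline instead of splitting the sleep into ms and us parts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the global mutex, halt_cond and a kick flag. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
static bool kicked;

/* Wake a sleeper early, roughly what a cpu kick is meant to achieve. */
static void kick(void)
{
    pthread_mutex_lock(&lock);
    kicked = true;
    pthread_cond_broadcast(&halt_cond);
    pthread_mutex_unlock(&lock);
}

/*
 * Sleep for up to sleep_ns, returning early once kicked is set.
 * The caller must hold lock. Returns true if no kick was seen.
 */
static bool throttle_sleep(int64_t sleep_ns)
{
    struct timespec deadline;
    int rc = 0;

    assert(sleep_ns >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += sleep_ns / 1000000000LL;
    deadline.tv_nsec += sleep_ns % 1000000000LL;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The loop absorbs spurious wakeups; the wait drops lock while blocked. */
    while (!kicked && rc != ETIMEDOUT) {
        rc = pthread_cond_timedwait(&halt_cond, &lock, &deadline);
    }
    return !kicked;
}
```

The point of the sketch is that the wait releases the mutex while blocked and
returns as soon as another thread calls kick(), whereas a plain usleep()
always runs to completion.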

But I'm not sure about using a timedwait function here with qemu_global_mutex.
The original function uses qemu_mutex_unlock_iothread + qemu_mutex_lock_iothread.
That differs from the lock/unlock of qemu_global_mutex done inside timedwait,
because it goes through qemu_bql_mutex_lock_func, which could be anything.
This is why the series is an RFC.

What do you think?
Thanks!

Yury Kotov (2):
  qemu-thread: Add qemu_cond_timedwait
  cpus: Fix throttling during vm_stop

 cpus.c                   | 27 +++++++++++++++++++--------
 include/qemu/thread.h    | 12 ++++++++++++
 util/qemu-thread-posix.c | 40 ++++++++++++++++++++++++++++------------
 util/qemu-thread-win32.c | 16 ++++++++++++++++
 4 files changed, 75 insertions(+), 20 deletions(-)

Comments

Dr. David Alan Gilbert July 10, 2019, 9:56 a.m. UTC | #1
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Hi,
> 
> I wrote a test for migration auto converge and found out a strange thing:
> 1. Enable auto converge
> 2. Set max-bandwidth 1Gb/s
> 3. Set downtime-limit 1ms
> 4. Run standard test (just writes a byte per page)
> 5. Wait for converge
> 6. It's converged with 99% throttle percentage
> 7. The result downtime was about 300-600ms   <<<<
> 
> It's much higher than expected 1ms. I figured out that cpu_throttle_thread()
> function sleeps for 100ms+ for high throttle percentage (>=95%) in VCPU thread.
> And it sleeps even after a cpu kick.
> 
> I tried to fix it by using timedwait for ms part of sleep.
> E.g timedwait(halt_cond, 1ms) + usleep(500).
> 
> But I'm not sure about using timedwait function here with qemu_global_mutex.
> The original function uses qemu_mutex_unlock_iothread + qemu_mutex_lock_iothread
> It differs from locking/unlocking (inside timedwait) qemu_global_mutex
> because of using qemu_bql_mutex_lock_func function which could be anything.
> This is why the series is RFC.
> 
> What do you think?

Would qemu_sem_timedwait work for your use?  I use it in
migration_thread for the bandwidth limiting and allowing that to be
woken up.
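
For illustration only, the same pattern with a plain POSIX semaphore looks
roughly like this. wake_sem, wake_init() and sleep_or_wake() are hypothetical
names, and sem_timedwait() stands in for the QEMU wrapper, whose exact
signature is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical wake-up semaphore; sem_post() on it ends the sleep early. */
static sem_t wake_sem;

static void wake_init(void)
{
    int rc = sem_init(&wake_sem, 0, 0);
    assert(rc == 0);
    (void)rc;
}

/* Block for up to ms milliseconds; returns true if woken by a post. */
static bool sleep_or_wake(int ms)
{
    struct timespec deadline;
    int rc;

    assert(ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Retry if a signal interrupts the wait before the deadline. */
    do {
        rc = sem_timedwait(&wake_sem, &deadline);
    } while (rc == -1 && errno == EINTR);

    return rc == 0;
}
```

A sleeper calls sleep_or_wake(ms) and any other thread can cut the sleep short
with sem_post(&wake_sem); no mutex has to be held across the wait.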

Dave

> Thanks!
> 
> Yury Kotov (2):
>   qemu-thread: Add qemu_cond_timedwait
>   cpus: Fix throttling during vm_stop
> 
>  cpus.c                   | 27 +++++++++++++++++++--------
>  include/qemu/thread.h    | 12 ++++++++++++
>  util/qemu-thread-posix.c | 40 ++++++++++++++++++++++++++++------------
>  util/qemu-thread-win32.c | 16 ++++++++++++++++
>  4 files changed, 75 insertions(+), 20 deletions(-)
> 
> -- 
> 2.22.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Yury Kotov July 10, 2019, 10:24 a.m. UTC | #2
10.07.2019, 12:57, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  Hi,
>>
>>  I wrote a test for migration auto converge and found out a strange thing:
>>  1. Enable auto converge
>>  2. Set max-bandwidth 1Gb/s
>>  3. Set downtime-limit 1ms
>>  4. Run standard test (just writes a byte per page)
>>  5. Wait for converge
>>  6. It's converged with 99% throttle percentage
>>  7. The result downtime was about 300-600ms <<<<
>>
>>  It's much higher than expected 1ms. I figured out that cpu_throttle_thread()
>>  function sleeps for 100ms+ for high throttle percentage (>=95%) in VCPU thread.
>>  And it sleeps even after a cpu kick.
>>
>>  I tried to fix it by using timedwait for ms part of sleep.
>>  E.g timedwait(halt_cond, 1ms) + usleep(500).
>>
>>  But I'm not sure about using timedwait function here with qemu_global_mutex.
>>  The original function uses qemu_mutex_unlock_iothread + qemu_mutex_lock_iothread
>>  It differs from locking/unlocking (inside timedwait) qemu_global_mutex
>>  because of using qemu_bql_mutex_lock_func function which could be anything.
>>  This is why the series is RFC.
>>
>>  What do you think?
>
> Would qemu_sem_timedwait work for your use? I use it in
> migration_thread for the bandwidth limiting and allowing that to be
> woken up.

It's a good idea and it should work, but it's more complicated than reusing
halt_cond. I see that it's OK to use qemu_cond_wait in cpus.c, so I hope it's
OK to use qemu_cond_timedwait too. If it isn't, then qemu_sem_timedwait is a
good fallback, I think.

Regards,
Yury

>
> Dave
>
>>  Thanks!
>>
>>  Yury Kotov (2):
>>    qemu-thread: Add qemu_cond_timedwait
>>    cpus: Fix throttling during vm_stop
>>
>>   cpus.c | 27 +++++++++++++++++++--------
>>   include/qemu/thread.h | 12 ++++++++++++
>>   util/qemu-thread-posix.c | 40 ++++++++++++++++++++++++++++------------
>>   util/qemu-thread-win32.c | 16 ++++++++++++++++
>>   4 files changed, 75 insertions(+), 20 deletions(-)
>>
>>  --
>>  2.22.0
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK