diff mbox

sched/cputime: add steal clock warps handling during cpu hotplug

Message ID b17566fa-1943-3bda-28f3-f6e21a424d9c@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini June 6, 2016, 1:40 p.m. UTC
On 02/06/2016 15:59, Rik van Riel wrote:
> If a guest is saved to disk and later restored (eg. after
> a host reboot), or live migrated to another host, I would
> expect to get totally disjoint steal time statistics from
> the "new run" of the guest (which is the same run of the
> guest OS).

Why?  The preexisting guest steal time is always added to by
KVM, so the time won't restart from zero.

Continuing the previous count on CPU hot-unplug followed by hot-plug
is less obvious, but I think it's overall the right thing to do.

In fact, I was going to test a patch this week as simple as this:



Thanks,

Paolo

> In fact, this code may also need to deal with the case
> where steal time suddenly increases by a ludicrous amount,
> and ignore those events, too.
> 
> A safe threshold might be to only apply steal times that
> are positive and smaller than one second (as long as nohz_full
> has the one second timer tick left), ignoring intervals that
> are negative or longer than a second, and using those to sync
> up the guest with the host.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Wanpeng Li June 6, 2016, 10:42 p.m. UTC | #1
2016-06-06 21:40 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 02/06/2016 15:59, Rik van Riel wrote:
>> If a guest is saved to disk and later restored (eg. after
>> a host reboot), or live migrated to another host, I would
>> expect to get totally disjoint steal time statistics from
>> the "new run" of the guest (which is the same run of the
>> guest OS).
>
> Why?  The preexisting guest steal time is always added to by
> KVM, so the time won't restart from zero.
>
> Continuing the previous count on CPU hot-unplug followed by hot-plug
> is less obvious, but I think it's overall the right thing to do.
>
> In fact, I was going to test a patch this week as simple as this:
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index eea2a6f72b31..1ef5e48b3a36 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>         if (!has_steal_clock)
>                 return;
>
> -       memset(st, 0, sizeof(*st));
> -
>         wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
>         pr_info("kvm-stealtime: cpu %d, msr %llx\n",
>                 cpu, (unsigned long long) slow_virt_to_phys(st));

Thanks for the suggestion, I will try it today.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rik van Riel June 7, 2016, 1:24 a.m. UTC | #2
On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
> 
> On 02/06/2016 15:59, Rik van Riel wrote:
> > 
> > If a guest is saved to disk and later restored (eg. after
> > a host reboot), or live migrated to another host, I would
> > expect to get totally disjoint steal time statistics from
> > the "new run" of the guest (which is the same run of the
> > guest OS).
> Why?  The preexisting guest steal time is always added to by
> KVM, so the time won't restart from zero.
> 
> Continuing the previous count on CPU hot-unplug followed by hot-plug
> is less obvious, but I think it's overall the right thing to do.
> 
> In fact, I was going to test a patch this week as simple as this:
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index eea2a6f72b31..1ef5e48b3a36 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>  	if (!has_steal_clock)
>  		return;
>  
> -	memset(st, 0, sizeof(*st));
> -
>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
> KVM_MSR_ENABLED));

By removing the memset from initial bootup allocation,
can't that cause the steal time to "increase by a ludicrous
amount" the very first time it is compared with the arch
independent value in the scheduler code?

In other words, would removal of the memset result in still
requiring Wanpeng's patch?

What am I overlooking?

Is there something preventing a non-zero value right at
the beginning?

Also, is there a chance of ending up with a non-zero bit
in the seqcount if the memset is removed?
Paolo Bonzini June 7, 2016, 7:31 a.m. UTC | #3
On 07/06/2016 03:24, Rik van Riel wrote:
> On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
>>
>> On 02/06/2016 15:59, Rik van Riel wrote:
>>>
>>> If a guest is saved to disk and later restored (eg. after
>>> a host reboot), or live migrated to another host, I would
>>> expect to get totally disjoint steal time statistics from
>>> the "new run" of the guest (which is the same run of the
>>> guest OS).
>> Why?  The preexisting guest steal time is always added to by
>> KVM, so the time won't restart from zero.
>>
>> Continuing the previous count on CPU hot-unplug followed by hot-plug
>> is less obvious, but I think it's overall the right thing to do.
>>
>> In fact, I was going to test a patch this week as simple as this:
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index eea2a6f72b31..1ef5e48b3a36 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>>  	if (!has_steal_clock)
>>  		return;
>>  
>> -	memset(st, 0, sizeof(*st));
>> -
>>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
>> KVM_MSR_ENABLED));
> 
> By removing the memset from initial bootup allocation,
> can't that cause the steal time to "increase by a ludicrous
> amount" the very first time it is compared with the arch
> independent value in the scheduler code?
> 
> In other words, would removal of the memset result in still
> requiring Wanpeng's patch?

The percpu area is initialized to zero, isn't it?

Paolo

> What am I overlooking?
> 
> Is there something preventing a non-zero value right at
> the beginning?
> 
> Also, is there a chance of ending up with a non-zero bit
> in the seqcount if the memset is removed?
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wanpeng Li June 7, 2016, 7:35 a.m. UTC | #4
2016-06-07 15:31 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/06/2016 03:24, Rik van Riel wrote:
>> On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
>>>
>>> On 02/06/2016 15:59, Rik van Riel wrote:
>>>>
>>>> If a guest is saved to disk and later restored (eg. after
>>>> a host reboot), or live migrated to another host, I would
>>>> expect to get totally disjoint steal time statistics from
>>>> the "new run" of the guest (which is the same run of the
>>>> guest OS).
>>> Why?  The preexisting guest steal time is always added to by
>>> KVM, so the time won't restart from zero.
>>>
>>> Continuing the previous count on CPU hot-unplug followed by hot-plug
>>> is less obvious, but I think it's overall the right thing to do.
>>>
>>> In fact, I was going to test a patch this week as simple as this:
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index eea2a6f72b31..1ef5e48b3a36 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>>>      if (!has_steal_clock)
>>>              return;
>>>
>>> -    memset(st, 0, sizeof(*st));
>>> -
>>>      wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
>>> KVM_MSR_ENABLED));
>>
>> By removing the memset from initial bootup allocation,
>> can't that cause the steal time to "increase by a ludicrous
>> amount" the very first time it is compared with the arch
>> independent value in the scheduler code?
>>
>> In other words, would removal of the memset result in still
>> requiring Wanpeng's patch?
>
> The percpu area is initialized to zero, isn't it?

Your proposal can fix the steal clock warp during guest cpu hotplug, I
will send out a new version later.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f72b31..1ef5e48b3a36 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@  static void kvm_register_steal_time(void)
 	if (!has_steal_clock)
 		return;
 
-	memset(st, 0, sizeof(*st));
-
 	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
 	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
 		cpu, (unsigned long long) slow_virt_to_phys(st));