x86/vpt: update last_guest_time with cmpxchg and drop pl_time_lock

Message ID 1576877960-12767-1-git-send-email-igor.druzhinin@citrix.com
State New, archived
Series
  • x86/vpt: update last_guest_time with cmpxchg and drop pl_time_lock

Commit Message

Igor Druzhinin Dec. 20, 2019, 9:39 p.m. UTC
Similarly to PV vTSC emulation, optimize HVM side for consistency
and scalability by dropping a spinlock protecting a single variable.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/hvm/vpt.c        | 19 ++++++++-----------
 xen/include/asm-x86/hvm/vpt.h |  5 ++---
 2 files changed, 10 insertions(+), 14 deletions(-)

Comments

Jan Beulich Feb. 18, 2020, 5 p.m. UTC | #1
On 20.12.2019 22:39, Igor Druzhinin wrote:
> Similarly to PV vTSC emulation, optimize HVM side for consistency
> and scalability by dropping a spinlock protecting a single variable.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Seeing that you didn't reply to my comment sent on Dec 23rd,
I'm going to drop this patch now from my to-be-dealt-with
folder. You can always re-submit.

Jan
Igor Druzhinin Feb. 18, 2020, 5:06 p.m. UTC | #2
On 18/02/2020 17:00, Jan Beulich wrote:
> On 20.12.2019 22:39, Igor Druzhinin wrote:
>> Similarly to PV vTSC emulation, optimize HVM side for consistency
>> and scalability by dropping a spinlock protecting a single variable.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> 
> Seeing that you didn't reply to my comment sent on Dec 23rd,
> I'm going to drop this patch now from my to-be-dealt-with
> folder. You can always re-submit.

I didn't receive anything. This is literally the first reply on the thread.
This patch wasn't terribly important so I didn't chase.
Could you resend your comment?

Igor
Jan Beulich Feb. 19, 2020, 7:48 a.m. UTC | #3
(Resend; no idea where the original, sent on Dec 23rd, ended up - I
can't find it in the list archives in any event)

On 20.12.2019 22:39, Igor Druzhinin wrote:
> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>  {
>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
> -    u64 now;
> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>  
>      /* Called from device models shared with PV guests. Be careful. */
>      ASSERT(is_hvm_vcpu(v));
>  
> -    spin_lock(&pl->pl_time_lock);
> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
> -
>      if ( !at_tsc )
>      {
> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
> -            pl->last_guest_time = now;
> -        else
> -            now = ++pl->last_guest_time;
> +        do {
> +            old = pl->last_guest_time;
> +            new = now > pl->last_guest_time ? now : old + 1;
> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );

I wonder whether you wouldn't better re-invoke get_s_time() in
case you need to retry here. See how the function previously
was called only after the lock was already acquired.

Jan
Igor Druzhinin Feb. 19, 2020, 6:52 p.m. UTC | #4
On 19/02/2020 07:48, Jan Beulich wrote:
> On 20.12.2019 22:39, Igor Druzhinin wrote:
>> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>>  {
>>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
>> -    u64 now;
>> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>  
>>      /* Called from device models shared with PV guests. Be careful. */
>>      ASSERT(is_hvm_vcpu(v));
>>  
>> -    spin_lock(&pl->pl_time_lock);
>> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>> -
>>      if ( !at_tsc )
>>      {
>> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
>> -            pl->last_guest_time = now;
>> -        else
>> -            now = ++pl->last_guest_time;
>> +        do {
>> +            old = pl->last_guest_time;
>> +            new = now > pl->last_guest_time ? now : old + 1;
>> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
> 
> I wonder whether you wouldn't better re-invoke get_s_time() in
> case you need to retry here. See how the function previously
> was called only after the lock was already acquired.

If there is a concurrent writer, wouldn't it just update pl->last_guest_time
with the new get_s_time() and then we subsequently would just use the new
time on retry? We use the same logic in pv_soft_rdtsc() and so far it
proved to be safe.

Igor
Jan Beulich Feb. 20, 2020, 8:27 a.m. UTC | #5
On 19.02.2020 19:52, Igor Druzhinin wrote:
> On 19/02/2020 07:48, Jan Beulich wrote:
>> On 20.12.2019 22:39, Igor Druzhinin wrote:
>>> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>>>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>>>  {
>>>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
>>> -    u64 now;
>>> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>  
>>>      /* Called from device models shared with PV guests. Be careful. */
>>>      ASSERT(is_hvm_vcpu(v));
>>>  
>>> -    spin_lock(&pl->pl_time_lock);
>>> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>> -
>>>      if ( !at_tsc )
>>>      {
>>> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
>>> -            pl->last_guest_time = now;
>>> -        else
>>> -            now = ++pl->last_guest_time;
>>> +        do {
>>> +            old = pl->last_guest_time;
>>> +            new = now > pl->last_guest_time ? now : old + 1;
>>> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>
>> I wonder whether you wouldn't better re-invoke get_s_time() in
>> case you need to retry here. See how the function previously
>> was called only after the lock was already acquired.
> 
> If there is a concurrent writer, wouldn't it just update pl->last_guest_time
> with the new get_s_time() and then we subsequently would just use the new
> time on retry?

Yes, it would, but the latency until the retry actually occurs
is unknown (in particular if Xen itself runs virtualized). I.e.
in the at_tsc == 0 case I think the value would better be
re-calculated on every iteration.

Another thing I notice only now are the multiple reads of
pl->last_guest_time. Wouldn't you better do

        do {
            old = ACCESS_ONCE(pl->last_guest_time);
            new = now > old ? now : old + 1;
        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );

?

Jan
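
Editorial sketch of the single-read pattern suggested above, written as standalone C11 rather than with Xen's cmpxchg() and ACCESS_ONCE(). The names guest_clock_t and monotonic_publish are hypothetical and do not appear in the patch. The point it illustrates is that the comparison and the compare-and-swap both work from one snapshot of the shared value, so "new" is never derived from a different read than "old".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pl->last_guest_time (not the Xen type). */
typedef _Atomic(int64_t) guest_clock_t;

/*
 * Publish a strictly increasing timestamp.
 * Returns `now` if it is ahead of the last published value,
 * otherwise the last published value plus one.
 */
static int64_t monotonic_publish(guest_clock_t *last, int64_t now)
{
    int64_t old, new;

    assert(last != NULL);

    /* One snapshot: the comparison and the CAS both use `old`. */
    old = atomic_load(last);
    do {
        new = now > old ? now : old + 1;
        /* On failure the CAS stores the current value into `old`. */
    } while ( !atomic_compare_exchange_weak(last, &old, new) );

    return new;
}
```

Note that C11's compare-exchange reports success as a boolean and refreshes "old" on failure, whereas the cmpxchg() in the patch returns the previous value, so the loop shape differs slightly from the quoted code.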
Igor Druzhinin Feb. 20, 2020, 3:37 p.m. UTC | #6
On 20/02/2020 08:27, Jan Beulich wrote:
> On 19.02.2020 19:52, Igor Druzhinin wrote:
>> On 19/02/2020 07:48, Jan Beulich wrote:
>>> On 20.12.2019 22:39, Igor Druzhinin wrote:
>>>> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>>>>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>>>>  {
>>>>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
>>>> -    u64 now;
>>>> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>>  
>>>>      /* Called from device models shared with PV guests. Be careful. */
>>>>      ASSERT(is_hvm_vcpu(v));
>>>>  
>>>> -    spin_lock(&pl->pl_time_lock);
>>>> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>> -
>>>>      if ( !at_tsc )
>>>>      {
>>>> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
>>>> -            pl->last_guest_time = now;
>>>> -        else
>>>> -            now = ++pl->last_guest_time;
>>>> +        do {
>>>> +            old = pl->last_guest_time;
>>>> +            new = now > pl->last_guest_time ? now : old + 1;
>>>> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>>
>>> I wonder whether you wouldn't better re-invoke get_s_time() in
>>> case you need to retry here. See how the function previously
>>> was called only after the lock was already acquired.
>>
>> If there is a concurrent writer, wouldn't it just update pl->last_guest_time
>> with the new get_s_time() and then we subsequently would just use the new
>> time on retry?
> 
> Yes, it would, but the latency until the retry actually occurs
> is unknown (in particular if Xen itself runs virtualized). I.e.
> in the at_tsc == 0 case I think the value would better be
> re-calculated on every iteration.

Why does it need to be recalculated if a concurrent writer did this
for us already anyway and (get_s_time_fixed(at_tsc) + pl->stime_offset)
value is common for all of vCPUs? Yes, it might reduce jitter slightly
but overall latency could come from any point (especially in case of
running virtualized) and it's important just to preserve the invariant that
the value is monotonic across vCPUs.

> Another thing I notice only now are the multiple reads of
> pl->last_guest_time. Wouldn't you better do
> 
>         do {
>             old = ACCESS_ONCE(pl->last_guest_time);
>             new = now > old ? now : old + 1;
>         } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
> 
> ?

Fair enough, although even reading it multiple times wouldn't cause
any harm as any inconsistency would be resolved by cmpxchg op. I'd
prefer to make it in a separate commit to unify it with pv_soft_rdtsc().

Igor
Jan Beulich Feb. 20, 2020, 3:47 p.m. UTC | #7
On 20.02.2020 16:37, Igor Druzhinin wrote:
> On 20/02/2020 08:27, Jan Beulich wrote:
>> On 19.02.2020 19:52, Igor Druzhinin wrote:
>>> On 19/02/2020 07:48, Jan Beulich wrote:
>>>> On 20.12.2019 22:39, Igor Druzhinin wrote:
>>>>> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>>>>>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>>>>>  {
>>>>>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
>>>>> -    u64 now;
>>>>> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>>>  
>>>>>      /* Called from device models shared with PV guests. Be careful. */
>>>>>      ASSERT(is_hvm_vcpu(v));
>>>>>  
>>>>> -    spin_lock(&pl->pl_time_lock);
>>>>> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>>> -
>>>>>      if ( !at_tsc )
>>>>>      {
>>>>> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
>>>>> -            pl->last_guest_time = now;
>>>>> -        else
>>>>> -            now = ++pl->last_guest_time;
>>>>> +        do {
>>>>> +            old = pl->last_guest_time;
>>>>> +            new = now > pl->last_guest_time ? now : old + 1;
>>>>> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>>>
>>>> I wonder whether you wouldn't better re-invoke get_s_time() in
>>>> case you need to retry here. See how the function previously
>>>> was called only after the lock was already acquired.
>>>
>>> If there is a concurrent writer, wouldn't it just update pl->last_guest_time
>>> with the new get_s_time() and then we subsequently would just use the new
>>> time on retry?
>>
>> Yes, it would, but the latency until the retry actually occurs
>> is unknown (in particular if Xen itself runs virtualized). I.e.
>> in the at_tsc == 0 case I think the value would better be
>> re-calculated on every iteration.
> 
> Why does it need to be recalculated if a concurrent writer did this
> for us already anyway and (get_s_time_fixed(at_tsc) + pl->stime_offset)
> value is common for all of vCPUs? Yes, it might reduce jitter slightly
> but overall latency could come from any point (especially in case of
> running virtualized) and it's important just to preserve the invariant that
> the value is monotonic across vCPUs.

I'm afraid I don't follow: If we rely on remote CPUs updating
pl->last_guest_time, then what we'd return is whatever was put
there plus one. Whereas the correct value might be dozens of
clocks further ahead.

>> Another thing I notice only now are the multiple reads of
>> pl->last_guest_time. Wouldn't you better do
>>
>>         do {
>>             old = ACCESS_ONCE(pl->last_guest_time);
>>             new = now > old ? now : old + 1;
>>         } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>
>> ?
> 
> Fair enough, although even reading it multiple times wouldn't cause
> any harm as any inconsistency would be resolved by cmpxchg op.

Afaics "new", if calculated from a value latched _earlier_
than "old", could cause time to actually move backwards. Reads
can be re-ordered, after all.

> I'd
> prefer to make it in a separate commit to unify it with pv_soft_rdtsc().

I'd be fine if you changed pv_soft_rdtsc() first, and then
made the code here match. But I don't think the code should be
introduced in other than its (for the time being) final shape.

Jan
Igor Druzhinin Feb. 20, 2020, 4:08 p.m. UTC | #8
On 20/02/2020 15:47, Jan Beulich wrote:
> On 20.02.2020 16:37, Igor Druzhinin wrote:
>> On 20/02/2020 08:27, Jan Beulich wrote:
>>> On 19.02.2020 19:52, Igor Druzhinin wrote:
>>>> On 19/02/2020 07:48, Jan Beulich wrote:
>>>>> On 20.12.2019 22:39, Igor Druzhinin wrote:
>>>>>> @@ -38,24 +37,22 @@ void hvm_init_guest_time(struct domain *d)
>>>>>>  uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
>>>>>>  {
>>>>>>      struct pl_time *pl = v->domain->arch.hvm.pl_time;
>>>>>> -    u64 now;
>>>>>> +    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>>>>  
>>>>>>      /* Called from device models shared with PV guests. Be careful. */
>>>>>>      ASSERT(is_hvm_vcpu(v));
>>>>>>  
>>>>>> -    spin_lock(&pl->pl_time_lock);
>>>>>> -    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
>>>>>> -
>>>>>>      if ( !at_tsc )
>>>>>>      {
>>>>>> -        if ( (int64_t)(now - pl->last_guest_time) > 0 )
>>>>>> -            pl->last_guest_time = now;
>>>>>> -        else
>>>>>> -            now = ++pl->last_guest_time;
>>>>>> +        do {
>>>>>> +            old = pl->last_guest_time;
>>>>>> +            new = now > pl->last_guest_time ? now : old + 1;
>>>>>> +        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>>>>
>>>>> I wonder whether you wouldn't better re-invoke get_s_time() in
>>>>> case you need to retry here. See how the function previously
>>>>> was called only after the lock was already acquired.
>>>>
>>>> If there is a concurrent writer, wouldn't it just update pl->last_guest_time
>>>> with the new get_s_time() and then we subsequently would just use the new
>>>> time on retry?
>>>
>>> Yes, it would, but the latency until the retry actually occurs
>>> is unknown (in particular if Xen itself runs virtualized). I.e.
>>> in the at_tsc == 0 case I think the value would better be
>>> re-calculated on every iteration.
>>
>> Why does it need to be recalculated if a concurrent writer did this
>> for us already anyway and (get_s_time_fixed(at_tsc) + pl->stime_offset)
>> value is common for all of vCPUs? Yes, it might reduce jitter slightly
>> but overall latency could come from any point (especially in case of
>> running virtualized) and it's important just to preserve the invariant that
>> the value is monotonic across vCPUs.
> 
> I'm afraid I don't follow: If we rely on remote CPUs updating
> pl->last_guest_time, then what we'd return is whatever was put
> there plus one. Whereas the correct value might be dozens of
> clocks further ahead.

I'm merely stating that there might be other places contributing to
jitter and getting rid of one of them wouldn't solve the issue completely
(if there is one). But again, I'd like the code to be unified with
pv_soft_rdtsc() so will have to introduce re-calculation there as well.

>>> Another thing I notice only now are the multiple reads of
>>> pl->last_guest_time. Wouldn't you better do
>>>
>>>         do {
>>>             old = ACCESS_ONCE(pl->last_guest_time);
>>>             new = now > old ? now : old + 1;
>>>         } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
>>>
>>> ?
>>
>> Fair enough, although even reading it multiple times wouldn't cause
>> any harm as any inconsistency would be resolved by cmpxchg op.
> 
> Afaics "new", if calculated from a value latched _earlier_
> than "old", could cause time to actually move backwards. Reads
> can be re-ordered, after all.

I don't think it's possible due to x86 memory model and the fact
pl->last_guest_time only goes forward. But I will change it to
make it explicit and improve readability.

>> I'd
>> prefer to make it in a separate commit to unify it with pv_soft_rdtsc().
> 
> I'd be fine if you changed pv_soft_rdtsc() first, and then
> made the code here match. But I don't think the code should be
> introduced in other than its (for the time being) final shape.

Ok, I'll put pv_soft_rdtsc() commit first.

Igor
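
Editorial sketch of the shape the thread converges on: the clock is re-read on every retry, and the shared value is read once per attempt. This is standalone C11 under stated assumptions, not the follow-up patch. The names clock_fn, fake_clock and guest_time_resampled are hypothetical, and the callback stands in for get_s_time() + pl->stime_offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock callback standing in for get_s_time() + stime_offset. */
typedef int64_t (*clock_fn)(void);

/* Deterministic fake clock for demonstration: advances 10 units per call. */
static int64_t fake_clock(void)
{
    static int64_t t;

    return t += 10;
}

/*
 * Return a strictly increasing guest time, re-sampling the clock on
 * every compare-and-swap attempt.
 */
static int64_t guest_time_resampled(_Atomic(int64_t) *last, clock_fn read_clock)
{
    int64_t old, new, now;

    assert(last != NULL && read_clock != NULL);

    old = atomic_load(last);
    do {
        /* A delayed retry must not reuse a stale `now`. */
        now = read_clock();
        new = now > old ? now : old + 1;
        /* On failure the CAS stores the current value into `old`. */
    } while ( !atomic_compare_exchange_strong(last, &old, new) );

    return new;
}
```

With a fresh sample per attempt, a caller that loses the race returns the current time when the clock has moved on, and falls back to "old + 1" only when the clock really is not ahead of the published value.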

Patch

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index ecd25d7..bf4c432 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -30,7 +30,6 @@  void hvm_init_guest_time(struct domain *d)
 {
     struct pl_time *pl = d->arch.hvm.pl_time;
 
-    spin_lock_init(&pl->pl_time_lock);
     pl->stime_offset = -(u64)get_s_time();
     pl->last_guest_time = 0;
 }
@@ -38,24 +37,22 @@  void hvm_init_guest_time(struct domain *d)
 uint64_t hvm_get_guest_time_fixed(const struct vcpu *v, uint64_t at_tsc)
 {
     struct pl_time *pl = v->domain->arch.hvm.pl_time;
-    u64 now;
+    s_time_t old, new, now = get_s_time_fixed(at_tsc) + pl->stime_offset;
 
     /* Called from device models shared with PV guests. Be careful. */
     ASSERT(is_hvm_vcpu(v));
 
-    spin_lock(&pl->pl_time_lock);
-    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
-
     if ( !at_tsc )
     {
-        if ( (int64_t)(now - pl->last_guest_time) > 0 )
-            pl->last_guest_time = now;
-        else
-            now = ++pl->last_guest_time;
+        do {
+            old = pl->last_guest_time;
+            new = now > pl->last_guest_time ? now : old + 1;
+        } while ( cmpxchg(&pl->last_guest_time, old, new) != old );
     }
-    spin_unlock(&pl->pl_time_lock);
+    else
+        new = now;
 
-    return now + v->arch.hvm.stime_offset;
+    return new + v->arch.hvm.stime_offset;
 }
 
 void hvm_set_guest_time(struct vcpu *v, u64 guest_time)
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 99169dd..f5ccb49 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -135,10 +135,9 @@  struct pl_time {    /* platform time */
     struct HPETState vhpet;
     struct PMTState  vpmt;
     /* guest_time = Xen sys time + stime_offset */
-    int64_t stime_offset;
+    s_time_t stime_offset;
     /* Ensures monotonicity in appropriate timer modes. */
-    uint64_t last_guest_time;
-    spinlock_t pl_time_lock;
+    s_time_t last_guest_time;
     struct domain *domain;
 };