diff mbox series

schedule: mask out XEN_RUNSTATE_UPDATE from state entry time

Message ID 1563443655-15802-1-git-send-email-andrii.anisov@gmail.com (mailing list archive)
State New, archived
Headers show
Series schedule: mask out XEN_RUNSTATE_UPDATE from state entry time | expand

Commit Message

Andrii Anisov July 18, 2019, 9:54 a.m. UTC
From: Andrii Anisov <andrii_anisov@epam.com>

Using XEN_RUNSTATE_UPDATE mask during the process of copying runstate
values to guest causes runstate entry time to be eventually considered
negative on a calculation of the time delta. So the XEN_RUNSTATE_UPDATE
should be masked out during the calculation of the time spent in the
particular runstate.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/common/schedule.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Andrii Anisov July 18, 2019, 10:18 a.m. UTC | #1
Hello George,

The patch was also CC'ed to you, but I got:

"554 esa1.hc3370-68.iphmx.com You are being rejected because your senderbase score is below our accepted policy."

from your server.

ps. I hope I'm not bothering you too much.

On 18.07.19 12:54, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> Using XEN_RUNSTATE_UPDATE mask during the process of copying runstate
> values to guest causes runstate entry time to be eventually considered
> negative on a calculation of the time delta. So the XEN_RUNSTATE_UPDATE
> should be masked out during the calculation of the time spent in the
> particular runstate.
> 
> Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
> ---
>   xen/common/schedule.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 25f6ab3..f4f1a81 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -182,7 +182,9 @@ static inline void vcpu_runstate_change(
>   
>       trace_runstate_change(v, new_state);
>   
> -    delta = new_entry_time - v->runstate.state_entry_time;
> +    delta = new_entry_time -
> +            (v->runstate.state_entry_time & ~XEN_RUNSTATE_UPDATE);
> +
>       if ( delta > 0 )
>       {
>           v->runstate.time[v->runstate.state] += delta;
>
Jan Beulich July 18, 2019, 11:10 a.m. UTC | #2
On 18.07.2019 11:54, Andrii Anisov wrote:
> Using XEN_RUNSTATE_UPDATE mask during the process of copying runstate
> values to guest causes runstate entry time to be eventually considered
> negative on a calculation of the time delta. So the XEN_RUNSTATE_UPDATE
> should be masked out during the calculation of the time spent in the
> particular runstate.

A concrete scenario where update_runstate_area() and vcpu_runstate_change()
can actually race would be very worthwhile to point out here. In particular
on Arm I'm not (yet?) seeing how this could happen.

Considering the value of XEN_RUNSTATE_UPDATE it must be a rather rare race
anyway, I would guess.

Jan
Andrii Anisov July 18, 2019, 2:12 p.m. UTC | #3
Hello Jan,

On 18.07.19 14:10, Jan Beulich wrote:
> A concrete scenario where update_runstate_area() and vcpu_runstate_change()
> can actually race would be very worthwhile to point out here. In particular
> on Arm I'm not (yet?) seeing how this could happen.

The scenario is quite trivial: some vcpu uptdating its runstate values (e.g. context switching) while those values are being read by a guest using other vcpu.

> Considering the value of XEN_RUNSTATE_UPDATE it must be a rather rare race
> anyway, I would guess.

I'm not sure how do you link the value of XEN_RUNSTATE_UPDATE and the issue occurrence rate. Could you please provide more details about the idea?
Andrii Anisov July 18, 2019, 2:18 p.m. UTC | #4
Hello Jan,

On 18.07.19 14:10, Jan Beulich wrote:
> Considering the value of XEN_RUNSTATE_UPDATE it must be a rather rare race
> anyway, I would guess.

I'm not sure about the exact rate of the race, but with following prints:

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 25f6ab3..6ba82b8 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -188,6 +188,13 @@ static inline void vcpu_runstate_change(
          v->runstate.time[v->runstate.state] += delta;
          v->runstate.state_entry_time = new_entry_time;
      }
+    else
+    {
+        printk("delta %"PRIx64", v->runstate.state_entry_time is %"PRIx64", new_entry_time %"PRIx64"\n",
+            (long unsigned int)delta,
+            (long unsigned int)v->runstate.state_entry_time,
+            (long unsigned int)new_entry_time);
+    }
  
      v->runstate.state = new_state;
  }

I've got my console completely flooded with something like following:

(XEN) delta 80000000000008e8, v->runstate.state_entry_time is 80000018d58e3cb3, new_entry_time 18d58e459b
(XEN) delta 8000000000001c98, v->runstate.state_entry_time is 80000018d8224fa5, new_entry_time 18d8226c3d
(XEN) delta 80000000000012c0, v->runstate.state_entry_time is 80000018d94fea14, new_entry_time 18d94ffcd4
(XEN) delta 8000000000000ca8, v->runstate.state_entry_time is 80000018da874e8c, new_entry_time 18da875b34
(XEN) delta 8000000000001338, v->runstate.state_entry_time is 80000018db2602bc, new_entry_time 18db2615f4
(XEN) delta 8000000000000780, v->runstate.state_entry_time is 80000018ddd9ed1a, new_entry_time 18ddd9f49a
(XEN) delta 80000000000016f8, v->runstate.state_entry_time is 80000018e19def39, new_entry_time 18e19e0631
(XEN) delta 800000000001669c, v->runstate.state_entry_time is 80000018e22b1553, new_entry_time 18e22c7bef
(XEN) delta 80000000000010e0, v->runstate.state_entry_time is 80000018e2d28e72, new_entry_time 18e2d29f52
Jan Beulich July 18, 2019, 3:04 p.m. UTC | #5
On 18.07.2019 16:12, Andrii Anisov wrote:
> On 18.07.19 14:10, Jan Beulich wrote:
>> A concrete scenario where update_runstate_area() and vcpu_runstate_change()
>> can actually race would be very worthwhile to point out here. In particular
>> on Arm I'm not (yet?) seeing how this could happen.
> 
> The scenario is quite trivial: some vcpu uptdating its runstate values
> (e.g. context switching) while those values are being read by a guest using
> other vcpu.

Well, that's (afaia) not an intended usage scenario. That's specifically
what the XEN_RUNSTATE_UPDATE flag was introduced for: This way guests
can notice that they shouldn't use the values, as they're likely
inconsistent. They'd pause for a brief moment and make another attempt;
see Linux'es xen_get_runstate_snapshot_cpu_delta().

But neither from the code change nor from the description I would have
implied that it's a guest side problem you're trying to address. So
far I was assuming you had found a race in Xen itself.

>> Considering the value of XEN_RUNSTATE_UPDATE it must be a rather rare race
>> anyway, I would guess.
> 
> I'm not sure how do you link the value of XEN_RUNSTATE_UPDATE and the issue
> occurrence rate. Could you please provide more details about the idea?

The value is huge, hence it being wrongly part of a calculation ought to
be easily noticeable _if_ it occurred often enough.

Jan
Andrii Anisov July 18, 2019, 6:11 p.m. UTC | #6
Hello Jan,

On 18.07.19 18:04, Jan Beulich wrote:
> On 18.07.2019 16:12, Andrii Anisov wrote:
>> The scenario is quite trivial: some vcpu uptdating its runstate values
>> (e.g. context switching) while those values are being read by a guest using
>> other vcpu.
> 
> Well, that's (afaia) not an intended usage scenario. That's specifically
> what the XEN_RUNSTATE_UPDATE flag was introduced for: This way guests
> can notice that they shouldn't use the values, as they're likely
> inconsistent. They'd pause for a brief moment and make another attempt;
> see Linux'es xen_get_runstate_snapshot_cpu_delta().

OK, I did this patch a while ago and wrongly recalled the scenario.
The race occurs when some vcpu was just scheduled out in RUNSTATE_blocked. Going down by schedule path it enters `update_runstate_area(prev)`, and, at this moment, it is kicked by `vcpu_wake()` from other PCPU.

> But neither from the code change nor from the description I would have
> implied that it's a guest side problem you're trying to address. So
> far I was assuming you had found a race in Xen itself.

As I describe above the race is in XEN itself. Yet it has no practical impact on the system at this moment.
This patch does fix the race in XEN, but breaks what was fixed by XEN_RUNSTATE_UPDATE. So I have to recall it.

>>> Considering the value of XEN_RUNSTATE_UPDATE it must be a rather rare race
>>> anyway, I would guess.
>>
>> I'm not sure how do you link the value of XEN_RUNSTATE_UPDATE and the issue
>> occurrence rate. Could you please provide more details about the idea?
> 
> The value is huge, hence it being wrongly part of a calculation ought to
> be easily noticeable _if_ it occurred often enough.

Well, the `delta` value become negative, so it is not accumulated into the current runstate time and the new runstate entry time is not updated.
While we currently seeing this effect for blocked-to-runnable transition only, it can be ignored.
diff mbox series

Patch

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 25f6ab3..f4f1a81 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -182,7 +182,9 @@  static inline void vcpu_runstate_change(
 
     trace_runstate_change(v, new_state);
 
-    delta = new_entry_time - v->runstate.state_entry_time;
+    delta = new_entry_time -
+            (v->runstate.state_entry_time & ~XEN_RUNSTATE_UPDATE);
+
     if ( delta > 0 )
     {
         v->runstate.time[v->runstate.state] += delta;