
drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6

Message ID: 20180410092328.20645-1-tvrtko.ursulin@linux.intel.com
State: New, archived

Commit Message

Tvrtko Ursulin April 10, 2018, 9:23 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

While thinking about sporadic failures of the perf_pmu/rc6-runtime-pm* tests
on some CI machines I have concluded that: a) the PMU readout of RC6 can
race against runtime PM transitions, and b) reasons other than being
runtime suspended can cause intel_runtime_pm_get_if_in_use to fail.

Therefore, when estimating RC6, the code needs to check that we are indeed
in the suspended state and, if not, the best we can do is return the last
known RC6 value.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
---
I was able to trigger state != RPM_SUSPENDED on the shards, but not yet
the actual estimation over-accounting. As such this fix is partially based
on speculation that it will fix the sporadic perf_pmu/rc6* failures.
Nevertheless I think it is correct to add this check.
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Chris Wilson April 10, 2018, 9:57 a.m. UTC | #1
Quoting Tvrtko Ursulin (2018-04-10 10:23:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> While thinking about sporadic failures of perf_pmu/rc6-runtime-pm* tests
> on some CI machines I have concluded that: a) the PMU readout of RC6 can
> race against runtime PM transitions, and b) there are other reasons than
> being runtime suspended which can cause intel_runtime_pm_get_if_in_use to
> fail.
> 
> Therefore when estimating RC6 the code needs to assert we are indeed in
> suspended state and if not the best we can do is return the last known RC6
> value.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> ---
> I was able to trigger state != RPM_SUSPENDED on the shards, but not yet
> the actual estimation overaccounting. As such this fix is based partially
> on speculation that it will fix the sporadic perf_pmu/rc6* failures.
> Nevertheless I think it is correct to add this check regardless.
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index bd7e695fc663..e92a9571db77 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -473,6 +473,30 @@ static u64 get_rc6(struct drm_i915_private *i915)
>                 spin_lock_irqsave(&i915->pmu.lock, flags);
>                 spin_lock(&kdev->power.lock);
>  
> +               /*
> +                * After the above branch intel_runtime_pm_get_if_in_use failed
> +                * to get the runtime PM reference we cannot assume we are in
> +                * runtime suspend since we can either: a) race with coming out
> +                * of it before we took the power.lock, or b) there are other
> +                * states than suspended which can bring us here.
> +                *
> +                * We need to double-check that we are indeed currently runtime
> +                * suspended and if not we cannot do better than report the last
> +                * known RC6 value.
> +                */
> +               if (kdev->power.runtime_status != RPM_SUSPENDED) {
> +                       spin_unlock(&kdev->power.lock);
> +
> +                       if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
> +                       else
> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;

If rpm awake, but having lost the race to read the regs, report the last
known value.

This is because we don't know if another thread is in the other branch,
and so we will have one thread updating the estimate while it is being
compared against.

But I'm not understanding the failure -- why is the estimate bad? At the
very least we still ensure that it is monotonic? Is it just the jitter
you are worrying about? (If the estimate is bad here, isn't it always
bad?)

> +
> +                       spin_unlock_irqrestore(&i915->pmu.lock, flags);
> +
> +                       return val;
> +               }

I'd prefer moving the RPM_SUSPENDED code into an else branch to avoid
another unlock/early return here. (It just fits into 80cols, so no
excuses ;)
-Chris
Tvrtko Ursulin April 10, 2018, 10:22 a.m. UTC | #2
On 10/04/2018 10:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-10 10:23:28)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> While thinking about sporadic failures of perf_pmu/rc6-runtime-pm* tests
>> on some CI machines I have concluded that: a) the PMU readout of RC6 can
>> race against runtime PM transitions, and b) there are other reasons than
>> being runtime suspended which can cause intel_runtime_pm_get_if_in_use to
>> fail.
>>
>> Therefore when estimating RC6 the code needs to assert we are indeed in
>> suspended state and if not the best we can do is return the last known RC6
>> value.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Imre Deak <imre.deak@intel.com>
>> ---
>> I was able to trigger state != RPM_SUSPENDED on the shards, but not yet
>> the actual estimation overaccounting. As such this fix is based partially
>> on speculation that it will fix the sporadic perf_pmu/rc6* failures.
>> Nevertheless I think it is correct to add this check regardless.
>> ---
>>   drivers/gpu/drm/i915/i915_pmu.c | 24 ++++++++++++++++++++++++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index bd7e695fc663..e92a9571db77 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -473,6 +473,30 @@ static u64 get_rc6(struct drm_i915_private *i915)
>>                  spin_lock_irqsave(&i915->pmu.lock, flags);
>>                  spin_lock(&kdev->power.lock);
>>   
>> +               /*
>> +                * After the above branch intel_runtime_pm_get_if_in_use failed
>> +                * to get the runtime PM reference we cannot assume we are in
>> +                * runtime suspend since we can either: a) race with coming out
>> +                * of it before we took the power.lock, or b) there are other
>> +                * states than suspended which can bring us here.
>> +                *
>> +                * We need to double-check that we are indeed currently runtime
>> +                * suspended and if not we cannot do better than report the last
>> +                * known RC6 value.
>> +                */
>> +               if (kdev->power.runtime_status != RPM_SUSPENDED) {
>> +                       spin_unlock(&kdev->power.lock);
>> +
>> +                       if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
>> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
>> +                       else
>> +                               val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;
> 
> If rpm awake, but having lost the race to read the regs, report the last
> known value.
> 
> This is because we don't know if another thread is in the other branch,
> and so we will have one updating the estimate while it being compared
> against.

No, the race is intel_runtime_pm_get_if_in_use telling us the device is 
not active, but a) that doesn't mean it is suspended, and b) it doesn't 
mean it is still suspended after the check.

Also PMU internal state is serialized by the spinlock so there is no 
inconsistency there.
> But I'm not understanding the failure -- why is the estimate bad? At the
> very least we still ensure that it is monotonic? Is it just the jitter
> you are worrying about? (If the estimate is bad here, isn't it always
> bad?)

As far as I have seen, the failures from CI all show the estimate being
too large (no jitter and no going backwards).

What I suspect is going wrong, in either case, is that we must not add the
delta from the current jiffies to the internal runtime PM counters if the
state is not suspended. If we do that we are accounting an unknown period
of time as suspended time, which would explain the over-estimation.

In other words, we are only allowed to estimate if the current state is
definitely suspended. If it is anything else we need to report either the
last estimated value or the last real value, depending on which is more
recent.
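
To illustrate, the estimation path below the new check does roughly the
following (a simplified sketch of the arithmetic, not the exact code):

        /*
         * Simplified sketch: suspended_jiffies and the accounting
         * timestamp only describe suspended time while the device
         * really is in RPM_SUSPENDED.
         */
        val  = kdev->power.suspended_jiffies -
               i915->pmu.suspended_jiffies_last;
        val += jiffies - kdev->power.accounting_timestamp;
        val  = jiffies_to_nsecs(val);
        val += i915->pmu.sample[__I915_SAMPLE_RC6].cur;
        i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val;

If this runs while the device is not actually suspended, the jiffies delta
covers time during which the GPU may have been awake, that time is counted
as RC6, and the estimate runs ahead of the real residency.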

I've done a CI run which definitely shows we can end up in this path 
when state is not suspended.

>> +
>> +                       spin_unlock_irqrestore(&i915->pmu.lock, flags);
>> +
>> +                       return val;
>> +               }
> 
> I'd prefer moving the RPM_SUSPENDED code into an else branch to avoid
> another unlock/early return here. (It just fits into 80cols, so no
> excuses ;)

Okay makes sense.
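
Something along these lines, perhaps (an untested sketch, just to show the
shape of the restructure):

        if (kdev->power.runtime_status == RPM_SUSPENDED) {
                /* ...existing suspended_jiffies based estimation... */
        } else if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) {
                val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
        } else {
                val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;
        }

        spin_unlock(&kdev->power.lock);
        spin_unlock_irqrestore(&i915->pmu.lock, flags);

That would keep a single unlock path with no early return.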

Regards,

Tvrtko
Chris Wilson April 10, 2018, 10:34 a.m. UTC | #3
Quoting Tvrtko Ursulin (2018-04-10 11:22:55)
> 
> On 10/04/2018 10:57, Chris Wilson wrote:
> > But I'm not understanding the failure -- why is the estimate bad? At the
> > very least we still ensure that it is monotonic? Is it just the jitter
> > you are worrying about? (If the estimate is bad here, isn't it always
> > bad?)
> 
> As far as I have seen failures from CI are all estimate being too large. 
> (no jitter and no going backwards)
> 
> What I suspect is going bad in either case, is that we must not add the 
> delta from current jiffies to internal runtime pm counters if state is 
> not suspended. If we do that we are accounting an unknown period of time 
> as suspended time and that would explain the over-estimation.
> 
> In other words we are only allowed to estimate if the current state is 
> definitely suspended. If it is anything else we need to report either 
> the last estimated value, or the last real value, depending what is more 
> recent.

i.e. we must not use kdev->power.suspended_jiffies before we know it is
set.

Ok, that is starting to make sense. Thanks, can you update the commit
message with this (pretty much verbatim, as it is a good explanation).
-Chris
Tvrtko Ursulin April 10, 2018, 10:54 a.m. UTC | #4
On 10/04/2018 11:34, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-10 11:22:55)
>>
>> On 10/04/2018 10:57, Chris Wilson wrote:
>>> But I'm not understanding the failure -- why is the estimate bad? At the
>>> very least we still ensure that it is monotonic? Is it just the jitter
>>> you are worrying about? (If the estimate is bad here, isn't it always
>>> bad?)
>>
>> As far as I have seen failures from CI are all estimate being too large.
>> (no jitter and no going backwards)
>>
>> What I suspect is going bad in either case, is that we must not add the
>> delta from current jiffies to internal runtime pm counters if state is
>> not suspended. If we do that we are accounting an unknown period of time
>> as suspended time and that would explain the over-estimation.
>>
>> In other words we are only allowed to estimate if the current state is
>> definitely suspended. If it is anything else we need to report either
>> the last estimated value, or the last real value, depending what is more
>> recent.
> 
> i.e. we must not use kdev->power.suspended_jiffies before we know it is
> set.
> 
> Ok, that is stating to make sense. Thanks, can you update the commitmsg
> with this (pretty much verbatim as it is a good explanation).

Can do.

The patch makes sense - but I still cannot explain the failures, since
the test is supposed to be running in a controlled environment:

1. enter runtime suspend
2. sample rc6 (this sets the estimation state)
3. sleep for 2s
4. sample rc6

And the diff between the two RC6 samples can show 10%-50% more RC6 time
elapsed than sleep time, even up to 300% more in some reports. If i915 is
runtime suspended the whole time, and dmesg says it is, I don't know how
this is possible.
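
For reference, the failing check is roughly of this shape (a sketch with
approximate helper names, not the exact IGT source):

        /* Sketch only; read_rc6_residency() and tolerance are placeholders. */
        prev  = read_rc6_residency(fd);            /* PMU RC6 counter, in ns */
        usleep(2 * USEC_PER_SEC);                  /* step 3: sleep for 2s */
        delta = read_rc6_residency(fd) - prev;     /* step 4: RC6 over the sleep */

        /* The failures show delta exceeding the ~2s sleep by 10% to 300%. */
        assert(delta <= 2ULL * NSEC_PER_SEC * (100 + tolerance) / 100);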

There must be another flaw somewhere which I am not seeing currently.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index bd7e695fc663..e92a9571db77 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -473,6 +473,30 @@  static u64 get_rc6(struct drm_i915_private *i915)
 		spin_lock_irqsave(&i915->pmu.lock, flags);
 		spin_lock(&kdev->power.lock);
 
+		/*
+		 * After the above branch intel_runtime_pm_get_if_in_use failed
+		 * to get the runtime PM reference we cannot assume we are in
+		 * runtime suspend since we can either: a) race with coming out
+		 * of it before we took the power.lock, or b) there are other
+		 * states than suspended which can bring us here.
+		 *
+		 * We need to double-check that we are indeed currently runtime
+		 * suspended and if not we cannot do better than report the last
+		 * known RC6 value.
+		 */
+		if (kdev->power.runtime_status != RPM_SUSPENDED) {
+			spin_unlock(&kdev->power.lock);
+
+			if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
+				val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
+			else
+				val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;
+
+			spin_unlock_irqrestore(&i915->pmu.lock, flags);
+
+			return val;
+		}
+
 		if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
 			i915->pmu.suspended_jiffies_last =
 						kdev->power.suspended_jiffies;