drm/i915/pmu: Check actual RC6 status

Message ID 20210331101850.2582027-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915/pmu: Check actual RC6 status

Commit Message

Tvrtko Ursulin March 31, 2021, 10:18 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

RC6 support cannot be established simply by looking at the static device
HAS_RC6() flag. There are cases which disable RC6 at driver load time, so
use the status of those checks when deciding whether to enumerate the rc6
counter.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Eero T Tamminen <eero.t.tamminen@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Rodrigo Vivi April 1, 2021, 9:19 a.m. UTC | #1
On Wed, Mar 31, 2021 at 11:18:50AM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> RC6 support cannot be simply established by looking at the static device
> HAS_RC6() flag. There are cases which disable RC6 at driver load time so
> use the status of those check when deciding whether to enumerate the rc6
> counter.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reported-by: Eero T Tamminen <eero.t.tamminen@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 41651ac255fa..a75cd1db320b 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -476,6 +476,8 @@ engine_event_status(struct intel_engine_cs *engine,
>  static int
>  config_status(struct drm_i915_private *i915, u64 config)
>  {
> +	struct intel_gt *gt = &i915->gt;
> +
>  	switch (config) {
>  	case I915_PMU_ACTUAL_FREQUENCY:
>  		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> @@ -489,7 +491,7 @@ config_status(struct drm_i915_private *i915, u64 config)
>  	case I915_PMU_INTERRUPTS:
>  		break;
>  	case I915_PMU_RC6_RESIDENCY:
> -		if (!HAS_RC6(i915))
> +		if (!gt->rc6.supported)

Is this really going to remove any confusion?
Right now the event is there but reports a residency of 0; after this change
the event is not there at all, so I wonder if we are just swapping one kind
of confusion for another for users.

>  			return -ENODEV;

Would a different return value help somehow?

>  		break;
>  	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> -- 
> 2.27.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tvrtko Ursulin April 1, 2021, 9:38 a.m. UTC | #2
On 01/04/2021 10:19, Rodrigo Vivi wrote:
> On Wed, Mar 31, 2021 at 11:18:50AM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> RC6 support cannot be simply established by looking at the static device
>> HAS_RC6() flag. There are cases which disable RC6 at driver load time so
>> use the status of those check when deciding whether to enumerate the rc6
>> counter.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Reported-by: Eero T Tamminen <eero.t.tamminen@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_pmu.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>> index 41651ac255fa..a75cd1db320b 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -476,6 +476,8 @@ engine_event_status(struct intel_engine_cs *engine,
>>   static int
>>   config_status(struct drm_i915_private *i915, u64 config)
>>   {
>> +	struct intel_gt *gt = &i915->gt;
>> +
>>   	switch (config) {
>>   	case I915_PMU_ACTUAL_FREQUENCY:
>>   		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
>> @@ -489,7 +491,7 @@ config_status(struct drm_i915_private *i915, u64 config)
>>   	case I915_PMU_INTERRUPTS:
>>   		break;
>>   	case I915_PMU_RC6_RESIDENCY:
>> -		if (!HAS_RC6(i915))
>> +		if (!gt->rc6.supported)
> 
> Is this really going to remove any confusion?
> Right now it is there but with residency 0, but after this change the event is
> not there anymore so I wonder if we are not just changing to a different kind
> of confusion on users.

I think it is possible to argue both ways.

1)
HAS_RC6 means hardware has RC6 so if we view PMU as very low level we 
can say always export it.

If i915 had to turn it off (rc6->supported == false) due to firmware or
GVT-g, then we could say reporting zero RC6 is accurate in that sense.
Only the reason "why it is zero" is missing for PMU users.

2)
Or if we go with this patch we could say that presence of the PMU metric
means RC6 is active and enabled, while absence means it is either not
supported due to the platform (or firmware) or due to how the platform is
being used (GVT-g).

So I think the patch is a bit better. I don't see it adding more confusion.

> 
>>   			return -ENODEV;
> 
> would a different return help somehow?

Like distinguishing between "theoretically impossible to support on this
GPU" and "not active"? Perhaps... suggest an errno? :)

Regards,

Tvrtko

> 
>>   		break;
>>   	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
>> -- 
>> 2.27.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Rodrigo Vivi April 1, 2021, 9:54 a.m. UTC | #3
On Thu, Apr 01, 2021 at 10:38:11AM +0100, Tvrtko Ursulin wrote:
> 
> On 01/04/2021 10:19, Rodrigo Vivi wrote:
> > On Wed, Mar 31, 2021 at 11:18:50AM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > RC6 support cannot be simply established by looking at the static device
> > > HAS_RC6() flag. There are cases which disable RC6 at driver load time so
> > > use the status of those check when deciding whether to enumerate the rc6
> > > counter.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Reported-by: Eero T Tamminen <eero.t.tamminen@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/i915_pmu.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > > index 41651ac255fa..a75cd1db320b 100644
> > > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > > @@ -476,6 +476,8 @@ engine_event_status(struct intel_engine_cs *engine,
> > >   static int
> > >   config_status(struct drm_i915_private *i915, u64 config)
> > >   {
> > > +	struct intel_gt *gt = &i915->gt;
> > > +
> > >   	switch (config) {
> > >   	case I915_PMU_ACTUAL_FREQUENCY:
> > >   		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
> > > @@ -489,7 +491,7 @@ config_status(struct drm_i915_private *i915, u64 config)
> > >   	case I915_PMU_INTERRUPTS:
> > >   		break;
> > >   	case I915_PMU_RC6_RESIDENCY:
> > > -		if (!HAS_RC6(i915))
> > > +		if (!gt->rc6.supported)
> > 
> > Is this really going to remove any confusion?
> > Right now it is there but with residency 0, but after this change the event is
> > not there anymore so I wonder if we are not just changing to a different kind
> > of confusion on users.
> 
> I think it is possible to argue both ways.
> 
> 1)
> HAS_RC6 means hardware has RC6 so if we view PMU as very low level we can
> say always export it.
> 
> If i915 had to turn it off (rc6->supported == false) due firmware or GVT-g,
> then we could say reporting zero RC6 is accurate in that sense. Only the
> reason "why it is zero" is missing for PMU users.
> 
> 2)
> Or if we go with this patch we could say that presence of the PMU metric
> means RC6 is active and enabled, while absence means it is either not
> supported due platform (or firmware) or how the platform is getting used
> (GVT-g).
>

Yep, these two cases describe my mental conflict well...

> So I think patch is a bit better. I don't see it is adding more confusion.

As I said on the other patch I have no strong position on which is better,
but if you and Eero feel that this works better for the current case,
let's do it...

> 
> > 
> > >   			return -ENODEV;
> > 
> > would a different return help somehow?
> 
> Like distinguishing between not theoretically possible to support on this
> GPU, versus not active? Perhaps.. suggest an errno? :)

ENODATA? or EIDRM?

But only if it helps somehow... otherwise don't bother and move ahead with
this as is:

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
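
For illustration only, the "different errno" idea discussed above could be
sketched roughly as follows. This is not the patch under review: the struct
names are hypothetical stand-ins for the i915 types, has_rc6 models the static
HAS_RC6() flag, rc6.supported models gt->rc6.supported, and -ENODATA is just
one of the candidates floated in this thread, not a settled choice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the i915 structures; not the real kernel types. */
struct mock_rc6 {
	bool supported;		/* models gt->rc6.supported, decided at driver load */
};

struct mock_device {
	bool has_rc6;		/* models the static HAS_RC6() device flag */
	struct mock_rc6 rc6;
};

/*
 * Returns 0 when the RC6 residency event should be enumerated,
 * or a negative errno explaining why it is not.
 */
static int rc6_event_status(const struct mock_device *dev)
{
	if (!dev->has_rc6)
		return -ENODEV;		/* the hardware has no RC6 at all */

	if (!dev->rc6.supported)
		return -ENODATA;	/* assumed errno: RC6 exists but was disabled */

	return 0;
}
```

With this split, a caller could tell "this GPU can never do RC6" apart from
"RC6 was turned off at load time", which is the distinction asked about above.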

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > >   		break;
> > >   	case I915_PMU_SOFTWARE_GT_AWAKE_TIME:
> > > -- 
> > > 2.27.0
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tamminen, Eero T April 1, 2021, 10:24 a.m. UTC | #4
Hi,

On Thu, 2021-04-01 at 05:54 -0400, Rodrigo Vivi wrote:
> On Thu, Apr 01, 2021 at 10:38:11AM +0100, Tvrtko Ursulin wrote:
...
> > I think it is possible to argue both ways.
> > 
> > 1)
> > HAS_RC6 means hardware has RC6 so if we view PMU as very low level
> > we can
> > say always export it.
> > 
> > If i915 had to turn it off (rc6->supported == false) due firmware or
> > GVT-g,
> > then we could say reporting zero RC6 is accurate in that sense. Only
> > the
> > reason "why it is zero" is missing for PMU users.
> > 
> > 2)
> > Or if we go with this patch we could say that presence of the PMU
> > metric
> > means RC6 is active and enabled, while absence means it is either
> > not
> > supported due platform (or firmware) or how the platform is getting
> > used
> > (GVT-g).
> > 
> 
> yeap, these 2 cases described well my mental conflict...
> 
> > So I think patch is a bit better. I don't see it is adding more
> > confusion.
> 
> As I said on the other patch I have no strong position on which is
> better,
> but if you and Eero feel that this works better for the current case,
> let's do it...

IMHO seeing case 1), i.e. zero RC6, could be slightly better from the user's
point of view than not seeing RC6 at all, because:

A) user then knows that GPU is not entering RC6, and

B) then the question is why it's not going to RC6 => one can see from
sysfs that it has been disabled


Whereas in case 2), the question is why there's no RC6 info, and user
doesn't know whether GPU is suspended or not (i.e. why GPU power
consumption is higher than expected).  It would help if i-g-t could show
e.g. "RC6 OFF" in that case.


	- Eero
Tvrtko Ursulin April 1, 2021, 11:38 a.m. UTC | #5
On 01/04/2021 11:24, Tamminen, Eero T wrote:
> Hi,
> 
> On Thu, 2021-04-01 at 05:54 -0400, Rodrigo Vivi wrote:
>> On Thu, Apr 01, 2021 at 10:38:11AM +0100, Tvrtko Ursulin wrote:
> ...
>>> I think it is possible to argue both ways.
>>>
>>> 1)
>>> HAS_RC6 means hardware has RC6 so if we view PMU as very low level
>>> we can
>>> say always export it.
>>>
>>> If i915 had to turn it off (rc6->supported == false) due firmware or
>>> GVT-g,
>>> then we could say reporting zero RC6 is accurate in that sense. Only
>>> the
>>> reason "why it is zero" is missing for PMU users.
>>>
>>> 2)
>>> Or if we go with this patch we could say that presence of the PMU
>>> metric
>>> means RC6 is active and enabled, while absence means it is either
>>> not
>>> supported due platform (or firmware) or how the platform is getting
>>> used
>>> (GVT-g).
>>>
>>
>> yeap, these 2 cases described well my mental conflict...
>>
>>> So I think patch is a bit better. I don't see it is adding more
>>> confusion.
>>
>> As I said on the other patch I have no strong position on which is
>> better,
>> but if you and Eero feel that this works better for the current case,
>> let's do it...
> 
> IMHO seeing case 1) i.e. zero RC6 could be slightly better from user
> point of view than not seeing RC6 at all, because:
> 
> A) user then knows that GPU is not entering RC6, and
> 
> B) then the question is why it's not going to RC6 => one can see from
> sysfs that it has been disabled
> 
> 
> Whereas in case 2), the question is why there's no RC6 info, and user
> doesn't know whether GPU is suspended or not (i.e. why GPU power
> consumption is higher than expected).  It would help if i-g-t could show
> e.g. "RC6 OFF" in that case.

So many options.. :)

It can be handled at the "presentation" layer (intel_gpu_top). If we go
with this patch but with different errnos, it could indeed distinguish the
cases and either not show RC6 or say "RC6 OFF".
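
A rough sketch of what that presentation-layer decision might look like,
assuming the event-open failure reaches the tool as a plain errno value. The
function name is hypothetical, and ENODATA is only a placeholder for whatever
errno would actually be chosen for "RC6 present but disabled":

```c
#include <errno.h>
#include <stddef.h>

/*
 * Maps the result of trying to open the RC6 event to a display label.
 * err is 0 on success, otherwise a positive errno value.
 * A NULL return means "do not show an RC6 field at all".
 */
static const char *rc6_display_label(int err)
{
	switch (err) {
	case 0:
		return "RC6";		/* event available: show the residency */
	case ENODATA:
		return "RC6 OFF";	/* assumed errno for "disabled by the driver" */
	default:
		return NULL;		/* e.g. ENODEV: no RC6 on this device */
	}
}
```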

If we go with the other patch 
(https://patchwork.freedesktop.org/patch/426589/?series=88580&rev=1) 
then intel_gpu_top could really still do the same by looking at 
/sys/class/drm/card0/power/rc6_enable.
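
A minimal sketch of such a sysfs check, assuming the file holds a single
number (parsed here as hexadecimal) where non-zero means some RC6 state is
enabled. The helper name is hypothetical and the card index in the path will
differ between systems:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Returns 1 if the sysfs file reports RC6 as enabled, 0 if it reports it
 * as disabled, and -1 if the file cannot be read or parsed.
 */
static int rc6_enabled_from_sysfs(const char *path)
{
	char buf[32];
	char *end;
	unsigned long mask;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Assumption: the value is a small bitmask; any set bit means enabled. */
	mask = strtoul(buf, &end, 16);
	if (end == buf)
		return -1;

	return mask != 0;
}
```

A tool could call rc6_enabled_from_sysfs("/sys/class/drm/card0/power/rc6_enable")
and print "RC6 OFF" when the result is 0.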

So strictly speaking, no i915 patch is even needed to provide clarity in
intel_gpu_top.

But one of those two i915 patches is still required to improve how the
low-level Perf/PMU RC6 counter gets exposed (or not exposed). I don't
have a strong preference for which one to take either. :)

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 41651ac255fa..a75cd1db320b 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -476,6 +476,8 @@ engine_event_status(struct intel_engine_cs *engine,
 static int
 config_status(struct drm_i915_private *i915, u64 config)
 {
+	struct intel_gt *gt = &i915->gt;
+
 	switch (config) {
 	case I915_PMU_ACTUAL_FREQUENCY:
 		if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
@@ -489,7 +491,7 @@ config_status(struct drm_i915_private *i915, u64 config)
 	case I915_PMU_INTERRUPTS:
 		break;
 	case I915_PMU_RC6_RESIDENCY:
-		if (!HAS_RC6(i915))
+		if (!gt->rc6.supported)
 			return -ENODEV;
 		break;
 	case I915_PMU_SOFTWARE_GT_AWAKE_TIME: