[4/4,v2] pm_rps: Require that cur reaches min at idle

Message ID	1390514090-17875-1-git-send-email-jeff.mcgee@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: jeff.mcgee@intel.com To: intel-gfx@lists.freedesktop.org Date: Thu, 23 Jan 2014 15:54:50 -0600 Message-Id: <1390514090-17875-1-git-send-email-jeff.mcgee@intel.com> In-Reply-To: <20140123202455.GA29495@jeffdesk> References: <20140123202455.GA29495@jeffdesk> Subject: [Intel-gfx] [PATCH 4/4 v2] pm_rps: Require that cur reaches min at idle Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: intel-gfx-bounces@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org

jeff.mcgee@intel.com Jan. 23, 2014, 9:54 p.m. UTC

From: Jeff McGee <jeff.mcgee@intel.com>

The current frequency should reach the minimum frequency within a
reasonable time during idle.

v2: Not using forcewake for this particular subtest per Daniel's
    suggestion.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
---
 tests/pm_rps.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Daniel Vetter Jan. 25, 2014, 7:46 p.m. UTC | #1

On Thu, Jan 23, 2014 at 03:54:50PM -0600, jeff.mcgee@intel.com wrote:
> From: Jeff McGee <jeff.mcgee@intel.com>
> 
> The current frequency should reach the minimum frequency within a
> reasonable time during idle.
> 
> v2: Not using forcewake for this particular subtest per Daniel's
>     suggestion.
> 
> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>

Hm, I guess I wasn't clear enough: I've thought about adding a new subtest
which just does the check that the actual frequency reaches the lowest
level on idle a new subtest, not extend the existing one.

Same somewhat holds for your first patch, atm the testcase is a very basic
test of the kernel error checking.

But even ignoring that I'm not really sure what you're aiming at. Imo the
current coverage is good enough since it makes sure that we have at least
a bit of error checking in place. Any extensions to this test should imo
only be done when we add new features or to exercise bugs (or classes of
bugs) that actually happened. Iirc (and I didn't check olds mails, so this
might be wrong) we have some corner-cases across suspend/resume and some
with the in-kernel code to adjust the requested frequency under load. So
imo that's what we should test for.

Reading through my test requirements write-up I've just noticed that I
didn't emphasis that there's also too much testing possible imho. I'm not
a big proponent of test driven developement and similar validate
everything approaches. I guess I need to write up my thoughts about too
much testing, too ;-)

Anyway I'm a bit confused about what's the overall goal here, so please
elaborate.

Cheers, Daniel

> --- tests/pm_rps.c | 22 ++++++++++++++++++---- 1 file changed, 18
> insertions(+), 4 deletions(-)
> 
> diff --git a/tests/pm_rps.c b/tests/pm_rps.c index 7ae0438..24a1ad6
> 100644 --- a/tests/pm_rps.c +++ b/tests/pm_rps.c @@ -206,13 +206,27 @@
> static void min_max_config(void (*check)(void)) check(); }
>  
> +#define IDLE_WAIT_TIMESTEP_MSEC 100 +#define IDLE_WAIT_TIMEOUT_MSEC
> 3000 static void idle_check(void) { int freqs[NUMFREQ]; - -
> read_freqs(freqs); -	dump(freqs); -	checkit(freqs); +	int wait =
> 0; + +	/* Monitor frequencies until cur settles down to min,
> which should +	 * happen within the allotted time */ +	do { +
> read_freqs(freqs); +		dump(freqs); +		checkit(freqs); +
> if (freqs[CUR] == freqs[MIN]) +			break; +
> usleep(1000 * IDLE_WAIT_TIMESTEP_MSEC); +		wait +=
> IDLE_WAIT_TIMESTEP_MSEC; +	} while (wait < IDLE_WAIT_TIMEOUT_MSEC); +
> +	igt_assert(freqs[CUR] == freqs[MIN]); +	log("Required %d msec to
> reach cur=min\n", wait); }
>  
>  static void pm_rps_exit_handler(int sig) -- 1.8.5.2
> 
> _______________________________________________ Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

jeff.mcgee@intel.com Jan. 27, 2014, 4:24 p.m. UTC | #2

On Sat, Jan 25, 2014 at 08:46:45PM +0100, Daniel Vetter wrote:
> On Thu, Jan 23, 2014 at 03:54:50PM -0600, jeff.mcgee@intel.com wrote:
> > From: Jeff McGee <jeff.mcgee@intel.com>
> > 
> > The current frequency should reach the minimum frequency within a
> > reasonable time during idle.
> > 
> > v2: Not using forcewake for this particular subtest per Daniel's
> >     suggestion.
> > 
> > Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> 
> Hm, I guess I wasn't clear enough: I've thought about adding a new subtest
> which just does the check that the actual frequency reaches the lowest
> level on idle a new subtest, not extend the existing one.
> 
> Same somewhat holds for your first patch, atm the testcase is a very basic
> test of the kernel error checking.
> 
> But even ignoring that I'm not really sure what you're aiming at. Imo the
> current coverage is good enough since it makes sure that we have at least
> a bit of error checking in place. Any extensions to this test should imo
> only be done when we add new features or to exercise bugs (or classes of
> bugs) that actually happened. Iirc (and I didn't check olds mails, so this
> might be wrong) we have some corner-cases across suspend/resume and some
> with the in-kernel code to adjust the requested frequency under load. So
> imo that's what we should test for.
> 
> Reading through my test requirements write-up I've just noticed that I
> didn't emphasis that there's also too much testing possible imho. I'm not
> a big proponent of test driven developement and similar validate
> everything approaches. I guess I need to write up my thoughts about too
> much testing, too ;-)
> 
> Anyway I'm a bit confused about what's the overall goal here, so please
> elaborate.
> 
> Cheers, Daniel
> 
Hi Daniel. I guess it would have helped if I described my overall goals
with this test development in the beginning. I have two rps related
driver patches on hold from a few months ago because you felt the igt
testing wasn't adequate to accept these. They are:

"Update rps interrupt limits"
"Restore rps/rc6 on reset"

It took me some time to get around to this, but now I am intent on
expanding pm_rps with the proper subtests to expose the issues. An outline
of the subtests I have in mind:

min-max-config-at-idle
- Manipulate min and max through sysfs and make sure driver does the right
  thing. Right thing includes accepting valid values, rejecting invalid
  values, ensuring that cur remains between min and max, and (for idle)
  ensuring that cur reaches min after such changes. This last requirement
  is not being met in part because interrupt limits are not being updated
  properly when min and max are changed. That will be justification for
  patch "Update rps interrupt limits".
- I have been forming this subtest as an expansion of Ben's original test.
  His cases for min/max checking are a subset of this.

min-max-config-at-load
- Manipulate min and max as above, but do so during load. Check for the
  same basic requirements. When heavily loaded, cur should reach max. There
  is also potentially some problem with this scenario due to interrupt
  limits not being updated reliably. If cur is at max, and max is then
  increased, cur may stay at old max.
- This will be a new subtest that re-uses the min/max manipulation function
  but just do so under load.

restore-on-reset
- Trigger gpu reset, then test that user-set (non-default) min/max settings
  were retained and that turbo functions correctly when load is applied and
  removed. This scenario will fail today and is the point of "Restore rps/rc6
  on reset".
- This will be another new subtest not yet submitted,

So the subtests are focused on known issues. I'm open to any suggestions on
their design or implementation. Thanks.

Jeff

Daniel Vetter Jan. 27, 2014, 4:50 p.m. UTC | #3

On Mon, Jan 27, 2014 at 10:24:53AM -0600, Jeff McGee wrote:
> On Sat, Jan 25, 2014 at 08:46:45PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 23, 2014 at 03:54:50PM -0600, jeff.mcgee@intel.com wrote:
> > > From: Jeff McGee <jeff.mcgee@intel.com>
> > > 
> > > The current frequency should reach the minimum frequency within a
> > > reasonable time during idle.
> > > 
> > > v2: Not using forcewake for this particular subtest per Daniel's
> > >     suggestion.
> > > 
> > > Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> > 
> > Hm, I guess I wasn't clear enough: I've thought about adding a new subtest
> > which just does the check that the actual frequency reaches the lowest
> > level on idle a new subtest, not extend the existing one.
> > 
> > Same somewhat holds for your first patch, atm the testcase is a very basic
> > test of the kernel error checking.
> > 
> > But even ignoring that I'm not really sure what you're aiming at. Imo the
> > current coverage is good enough since it makes sure that we have at least
> > a bit of error checking in place. Any extensions to this test should imo
> > only be done when we add new features or to exercise bugs (or classes of
> > bugs) that actually happened. Iirc (and I didn't check olds mails, so this
> > might be wrong) we have some corner-cases across suspend/resume and some
> > with the in-kernel code to adjust the requested frequency under load. So
> > imo that's what we should test for.
> > 
> > Reading through my test requirements write-up I've just noticed that I
> > didn't emphasis that there's also too much testing possible imho. I'm not
> > a big proponent of test driven developement and similar validate
> > everything approaches. I guess I need to write up my thoughts about too
> > much testing, too ;-)
> > 
> > Anyway I'm a bit confused about what's the overall goal here, so please
> > elaborate.
> > 
> > Cheers, Daniel
> > 
> Hi Daniel. I guess it would have helped if I described my overall goals
> with this test development in the beginning. I have two rps related
> driver patches on hold from a few months ago because you felt the igt
> testing wasn't adequate to accept these. They are:
> 
> "Update rps interrupt limits"
> "Restore rps/rc6 on reset"
> 
> It took me some time to get around to this, but now I am intent on
> expanding pm_rps with the proper subtests to expose the issues. An outline
> of the subtests I have in mind:
> 
> min-max-config-at-idle
> - Manipulate min and max through sysfs and make sure driver does the right
>   thing. Right thing includes accepting valid values, rejecting invalid
>   values, ensuring that cur remains between min and max, and (for idle)
>   ensuring that cur reaches min after such changes. This last requirement
>   is not being met in part because interrupt limits are not being updated
>   properly when min and max are changed. That will be justification for
>   patch "Update rps interrupt limits".
> - I have been forming this subtest as an expansion of Ben's original test.
>   His cases for min/max checking are a subset of this.

Oh, I didn't realize that the idle limits are currently broken and that
your first patches fixes that. I think I now also understand why do you
the idle-check at each step, so that this actually gets properly
exercised.

Problem with your approach here is that this will cause a spurious
regression report from our QA since the current min-max-config-at-idle
will suddenly starts failing (since it will test more and currently broken
things with your patches applied). And that might shadow other basic
issues (yeah, unlikely but still).

So I think the right approach here is to keep the current subtest as-is
(maybe rename it to "basic-api" since that's pretty much the only thing it
does), and then add a completely new set of subtests like you've laid out
here. So just leave the existing code as-is and copypaste the code for all
your new tests (where you make changes at least).

It's probably simplest if you do this shuffling to avoid a rebase mess
with your existing patches.

> min-max-config-at-load
> - Manipulate min and max as above, but do so during load. Check for the
>   same basic requirements. When heavily loaded, cur should reach max. There
>   is also potentially some problem with this scenario due to interrupt
>   limits not being updated reliably. If cur is at max, and max is then
>   increased, cur may stay at old max.
> - This will be a new subtest that re-uses the min/max manipulation function
>   but just do so under load.
> 
> restore-on-reset
> - Trigger gpu reset, then test that user-set (non-default) min/max settings
>   were retained and that turbo functions correctly when load is applied and
>   removed. This scenario will fail today and is the point of "Restore rps/rc6
>   on reset".
> - This will be another new subtest not yet submitted,
> 
> So the subtests are focused on known issues. I'm open to any suggestions on
> their design or implementation. Thanks.

Excellent test plan imo and thanks a lot for unconfusing me.

Cheers, Daniel

jeff.mcgee@intel.com Jan. 27, 2014, 10:23 p.m. UTC | #4

On Mon, Jan 27, 2014 at 05:50:04PM +0100, Daniel Vetter wrote:
> On Mon, Jan 27, 2014 at 10:24:53AM -0600, Jeff McGee wrote:
> > On Sat, Jan 25, 2014 at 08:46:45PM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 23, 2014 at 03:54:50PM -0600, jeff.mcgee@intel.com wrote:
> > > > From: Jeff McGee <jeff.mcgee@intel.com>
> > > > 
> > > > The current frequency should reach the minimum frequency within a
> > > > reasonable time during idle.
> > > > 
> > > > v2: Not using forcewake for this particular subtest per Daniel's
> > > >     suggestion.
> > > > 
> > > > Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> > > 
> > > Hm, I guess I wasn't clear enough: I've thought about adding a new subtest
> > > which just does the check that the actual frequency reaches the lowest
> > > level on idle a new subtest, not extend the existing one.
> > > 
> > > Same somewhat holds for your first patch, atm the testcase is a very basic
> > > test of the kernel error checking.
> > > 
> > > But even ignoring that I'm not really sure what you're aiming at. Imo the
> > > current coverage is good enough since it makes sure that we have at least
> > > a bit of error checking in place. Any extensions to this test should imo
> > > only be done when we add new features or to exercise bugs (or classes of
> > > bugs) that actually happened. Iirc (and I didn't check olds mails, so this
> > > might be wrong) we have some corner-cases across suspend/resume and some
> > > with the in-kernel code to adjust the requested frequency under load. So
> > > imo that's what we should test for.
> > > 
> > > Reading through my test requirements write-up I've just noticed that I
> > > didn't emphasis that there's also too much testing possible imho. I'm not
> > > a big proponent of test driven developement and similar validate
> > > everything approaches. I guess I need to write up my thoughts about too
> > > much testing, too ;-)
> > > 
> > > Anyway I'm a bit confused about what's the overall goal here, so please
> > > elaborate.
> > > 
> > > Cheers, Daniel
> > > 
> > Hi Daniel. I guess it would have helped if I described my overall goals
> > with this test development in the beginning. I have two rps related
> > driver patches on hold from a few months ago because you felt the igt
> > testing wasn't adequate to accept these. They are:
> > 
> > "Update rps interrupt limits"
> > "Restore rps/rc6 on reset"
> > 
> > It took me some time to get around to this, but now I am intent on
> > expanding pm_rps with the proper subtests to expose the issues. An outline
> > of the subtests I have in mind:
> > 
> > min-max-config-at-idle
> > - Manipulate min and max through sysfs and make sure driver does the right
> >   thing. Right thing includes accepting valid values, rejecting invalid
> >   values, ensuring that cur remains between min and max, and (for idle)
> >   ensuring that cur reaches min after such changes. This last requirement
> >   is not being met in part because interrupt limits are not being updated
> >   properly when min and max are changed. That will be justification for
> >   patch "Update rps interrupt limits".
> > - I have been forming this subtest as an expansion of Ben's original test.
> >   His cases for min/max checking are a subset of this.
> 
> Oh, I didn't realize that the idle limits are currently broken and that
> your first patches fixes that. I think I now also understand why do you
> the idle-check at each step, so that this actually gets properly
> exercised.
> 
> Problem with your approach here is that this will cause a spurious
> regression report from our QA since the current min-max-config-at-idle
> will suddenly starts failing (since it will test more and currently broken
> things with your patches applied). And that might shadow other basic
> issues (yeah, unlikely but still).
> 
> So I think the right approach here is to keep the current subtest as-is
> (maybe rename it to "basic-api" since that's pretty much the only thing it
> does), and then add a completely new set of subtests like you've laid out
> here. So just leave the existing code as-is and copypaste the code for all
> your new tests (where you make changes at least).
> 
Ah yes, expanding the subtest to the point that it uncovers an old problem
will appear as a regression from recent driver updates, which it is not. I
didn't consider this. I agree with your approach to have the current subtest
cover only basic min/max parameter validation. But I would recommend taking
patches 1-3 of this set - they work within that scope and will not cause
failures. Can we just drop patch 4? I will re-introduce it as part of adding
the new subtests.
-Jeff

> It's probably simplest if you do this shuffling to avoid a rebase mess
> with your existing patches.
> 
> > min-max-config-at-load
> > - Manipulate min and max as above, but do so during load. Check for the
> >   same basic requirements. When heavily loaded, cur should reach max. There
> >   is also potentially some problem with this scenario due to interrupt
> >   limits not being updated reliably. If cur is at max, and max is then
> >   increased, cur may stay at old max.
> > - This will be a new subtest that re-uses the min/max manipulation function
> >   but just do so under load.
> > 
> > restore-on-reset
> > - Trigger gpu reset, then test that user-set (non-default) min/max settings
> >   were retained and that turbo functions correctly when load is applied and
> >   removed. This scenario will fail today and is the point of "Restore rps/rc6
> >   on reset".
> > - This will be another new subtest not yet submitted,
> > 
> > So the subtests are focused on known issues. I'm open to any suggestions on
> > their design or implementation. Thanks.
> 
> Excellent test plan imo and thanks a lot for unconfusing me.
> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Daniel Vetter Jan. 27, 2014, 10:39 p.m. UTC | #5

On Mon, Jan 27, 2014 at 04:23:26PM -0600, Jeff McGee wrote:
> On Mon, Jan 27, 2014 at 05:50:04PM +0100, Daniel Vetter wrote:
> > On Mon, Jan 27, 2014 at 10:24:53AM -0600, Jeff McGee wrote:
> > > On Sat, Jan 25, 2014 at 08:46:45PM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 23, 2014 at 03:54:50PM -0600, jeff.mcgee@intel.com wrote:
> > > > > From: Jeff McGee <jeff.mcgee@intel.com>
> > > > > 
> > > > > The current frequency should reach the minimum frequency within a
> > > > > reasonable time during idle.
> > > > > 
> > > > > v2: Not using forcewake for this particular subtest per Daniel's
> > > > >     suggestion.
> > > > > 
> > > > > Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> > > > 
> > > > Hm, I guess I wasn't clear enough: I've thought about adding a new subtest
> > > > which just does the check that the actual frequency reaches the lowest
> > > > level on idle a new subtest, not extend the existing one.
> > > > 
> > > > Same somewhat holds for your first patch, atm the testcase is a very basic
> > > > test of the kernel error checking.
> > > > 
> > > > But even ignoring that I'm not really sure what you're aiming at. Imo the
> > > > current coverage is good enough since it makes sure that we have at least
> > > > a bit of error checking in place. Any extensions to this test should imo
> > > > only be done when we add new features or to exercise bugs (or classes of
> > > > bugs) that actually happened. Iirc (and I didn't check olds mails, so this
> > > > might be wrong) we have some corner-cases across suspend/resume and some
> > > > with the in-kernel code to adjust the requested frequency under load. So
> > > > imo that's what we should test for.
> > > > 
> > > > Reading through my test requirements write-up I've just noticed that I
> > > > didn't emphasis that there's also too much testing possible imho. I'm not
> > > > a big proponent of test driven developement and similar validate
> > > > everything approaches. I guess I need to write up my thoughts about too
> > > > much testing, too ;-)
> > > > 
> > > > Anyway I'm a bit confused about what's the overall goal here, so please
> > > > elaborate.
> > > > 
> > > > Cheers, Daniel
> > > > 
> > > Hi Daniel. I guess it would have helped if I described my overall goals
> > > with this test development in the beginning. I have two rps related
> > > driver patches on hold from a few months ago because you felt the igt
> > > testing wasn't adequate to accept these. They are:
> > > 
> > > "Update rps interrupt limits"
> > > "Restore rps/rc6 on reset"
> > > 
> > > It took me some time to get around to this, but now I am intent on
> > > expanding pm_rps with the proper subtests to expose the issues. An outline
> > > of the subtests I have in mind:
> > > 
> > > min-max-config-at-idle
> > > - Manipulate min and max through sysfs and make sure driver does the right
> > >   thing. Right thing includes accepting valid values, rejecting invalid
> > >   values, ensuring that cur remains between min and max, and (for idle)
> > >   ensuring that cur reaches min after such changes. This last requirement
> > >   is not being met in part because interrupt limits are not being updated
> > >   properly when min and max are changed. That will be justification for
> > >   patch "Update rps interrupt limits".
> > > - I have been forming this subtest as an expansion of Ben's original test.
> > >   His cases for min/max checking are a subset of this.
> > 
> > Oh, I didn't realize that the idle limits are currently broken and that
> > your first patches fixes that. I think I now also understand why do you
> > the idle-check at each step, so that this actually gets properly
> > exercised.
> > 
> > Problem with your approach here is that this will cause a spurious
> > regression report from our QA since the current min-max-config-at-idle
> > will suddenly starts failing (since it will test more and currently broken
> > things with your patches applied). And that might shadow other basic
> > issues (yeah, unlikely but still).
> > 
> > So I think the right approach here is to keep the current subtest as-is
> > (maybe rename it to "basic-api" since that's pretty much the only thing it
> > does), and then add a completely new set of subtests like you've laid out
> > here. So just leave the existing code as-is and copypaste the code for all
> > your new tests (where you make changes at least).
> > 
> Ah yes, expanding the subtest to the point that it uncovers an old problem
> will appear as a regression from recent driver updates, which it is not. I
> didn't consider this. I agree with your approach to have the current subtest
> cover only basic min/max parameter validation. But I would recommend taking
> patches 1-3 of this set - they work within that scope and will not cause
> failures. Can we just drop patch 4? I will re-introduce it as part of adding
> the new subtests.

Works for me, too. I've also mashed the rename to "basic-api" on top while
at it. Patches merged to i-g-t, thanks.
-Daniel

[4/4,v2] pm_rps: Require that cur reaches min at idle

Commit Message

Comments

Patch