| Message ID | 1350830642-10923-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
|---|---|
| State | New, archived |
On Sun, 21 Oct 2012 15:44:02 +0100 Chris Wilson <chris@chris-wilson.co.uk> wrote:

> Even though we do not use the EI mode for determining when to change GPU frequencies for RPS, changing this value causes no up interrupts to be generated whilst an OpenGL client runs.
>
> Fixes regression from
>
> commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Aug 15 10:41:45 2012 +0200
>
>     drm/i915: use hsw rps tuning values everywhere on gen6+
>
> Reported-by: Eric Anholt <eric@anholt.net>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Eric Anholt <eric@anholt.net>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 81e88c2..15b585e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
>
>      I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
>      I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> -    I915_WRITE(GEN6_RP_UP_EI, 66000);
> +    I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
>      I915_WRITE(GEN6_RP_DOWN_EI, 350000);
>
>      I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);

I've not verified this with interrupts, but simply by reading back the current frequency using the sysfs interface.

What I've seen running xonotic on Ivybridge is that we do bump the frequency initially, and then it stops. Nearing the end of the demo, we again raise the frequency. Note that on my system, both before and after this patch, I am able to get to the max GPU frequency with the xonotic demo.

Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650, I see the following (the demo is roughly 2 minutes):

without patch:
Within a few seconds we cycle up to 750
Nothing for about 30 seconds
very slowly cycle up to 1100 (*just* before the demo ends)
demo ends; throttle down to 350 quickly

with patch:
Within a few seconds we cycle up to 1000
Nothing for about 30 seconds
cycle up to 1100
demo ends; throttle down to 350 slowly

I think if this fixes someone's critical issue, it's great, but unfortunately I do not see the problem the patch claims to fix. Furthermore, none of us can really make sense of why this change has the effect that it does, but I believe a lot of that is because the workloads we run (in this case xonotic) are very much black boxes.

Personally, on this IVB, I think the behavior before the patch is more desirable because it stays near RP1 for a longer period of time, and drops to RPn quickly (but that's definitely a matter of opinion).
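For anyone wanting to reproduce Ben's measurement, here is a minimal sketch of the sysfs polling he describes. The `/sys/class/drm/card0/gt_cur_freq_mhz` path is an assumption: the card number (and on older or newer kernels the file names) may differ per system.

```c
/*
 * Minimal sketch: sample the current GPU frequency once per second
 * via the i915 RPS sysfs interface.  The path below is an assumption
 * (card number and file names vary by system and kernel version).
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[32];

	for (;;) {
		FILE *f = fopen("/sys/class/drm/card0/gt_cur_freq_mhz", "r");

		if (!f) {
			perror("gt_cur_freq_mhz");
			return 1;
		}
		if (fgets(buf, sizeof(buf), f))
			printf("cur freq: %s", buf);	/* value is in MHz */
		fclose(f);
		sleep(1);	/* re-open each iteration to read a fresh value */
	}
}
```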
On Wed, 24 Oct 2012 12:05:35 -0700 Ben Widawsky <ben@bwidawsk.net> wrote:

> On Sun, 21 Oct 2012 15:44:02 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> > Even though we do not use the EI mode for determining when to change GPU frequencies for RPS, changing this value causes no up interrupts to be generated whilst an OpenGL client runs.
> >
> > Fixes regression from
> >
> > commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Date:   Wed Aug 15 10:41:45 2012 +0200
> >
> >     drm/i915: use hsw rps tuning values everywhere on gen6+
> >
> > Reported-by: Eric Anholt <eric@anholt.net>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Eric Anholt <eric@anholt.net>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 81e88c2..15b585e 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> >
> >      I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> >      I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > -    I915_WRITE(GEN6_RP_UP_EI, 66000);
> > +    I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> >      I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> >
> >      I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
>
> I've not verified this with interrupts, but simply by reading back the current frequency using the sysfs interface.
>
> What I've seen running xonotic on Ivybridge is that we do bump the frequency initially, and then it stops. Nearing the end of the demo, we again raise the frequency. Note that on my system, both before and after this patch, I am able to get to the max GPU frequency with the xonotic demo.
>
> Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650, I see the following (the demo is roughly 2 minutes):
>
> without patch:
> Within a few seconds we cycle up to 750
> Nothing for about 30 seconds
> very slowly cycle up to 1100 (*just* before the demo ends)
> demo ends; throttle down to 350 quickly
>
> with patch:
> Within a few seconds we cycle up to 1000
> Nothing for about 30 seconds
> cycle up to 1100
> demo ends; throttle down to 350 slowly
>
> I think if this fixes someone's critical issue, it's great, but unfortunately I do not see the problem the patch claims to fix. Furthermore, none of us can really make sense of why this change has the effect that it does, but I believe a lot of that is because the workloads we run (in this case xonotic) are very much black boxes.
>
> Personally, on this IVB, I think the behavior before the patch is more desirable because it stays near RP1 for a longer period of time, and drops to RPn quickly (but that's definitely a matter of opinion).

It's a balance between power and performance. Running at the higher freqs is definitely less power efficient, but without this patch we definitely have a performance regression (running at 750 instead of 1000 MHz for most of the demo, plus the cases Chris saw).

But we also don't want to prevent RC6 entry on vulnerable systems, so maybe we need two sets of values, or a broader set of changes that works better for a swath of workloads.

Also, the initial patch to use the HSW values implies that the values are applicable across generations. They're not. They're very specific to a given generation, and potentially even to different versions of a given generation, so sharing the values is probably a bad idea in general.
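Jesse's point about per-generation values suggests a table-driven alternative to the one-off `IS_HASWELL()` ternary. The sketch below is purely illustrative and not from any posted patch: it reuses the driver's existing `I915_WRITE()`/`IS_HASWELL()` helpers, the struct and function names are made up, and the numbers are the ones visible in the patch (66000 for HSW, 100000 as the restored pre-regression gen6 value).

```c
/*
 * Illustrative sketch only (hypothetical, not from the patch): keep
 * RPS tuning values in per-generation tables instead of sharing one
 * set.  Relies on the driver's existing I915_WRITE()/IS_HASWELL()
 * helpers; struct and function names here are invented.
 */
struct rps_tuning {
	u32 up_threshold;
	u32 down_threshold;
	u32 up_ei;
	u32 down_ei;
};

static const struct rps_tuning hsw_rps_tuning = {
	.up_threshold   = 59400,
	.down_threshold = 245000,
	.up_ei          = 66000,
	.down_ei        = 350000,
};

static const struct rps_tuning gen6_rps_tuning = {
	.up_threshold   = 59400,
	.down_threshold = 245000,
	.up_ei          = 100000,	/* pre-1ee9ae32 value restored by this patch */
	.down_ei        = 350000,
};

static void gen6_write_rps_tuning(struct drm_device *dev)
{
	const struct rps_tuning *t =
		IS_HASWELL(dev) ? &hsw_rps_tuning : &gen6_rps_tuning;

	I915_WRITE(GEN6_RP_UP_THRESHOLD, t->up_threshold);
	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, t->down_threshold);
	I915_WRITE(GEN6_RP_UP_EI, t->up_ei);
	I915_WRITE(GEN6_RP_DOWN_EI, t->down_ei);
}
```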
On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> On Wed, 24 Oct 2012 12:05:35 -0700
> Ben Widawsky <ben@bwidawsk.net> wrote:
>
> > On Sun, 21 Oct 2012 15:44:02 +0100
> > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > > Even though we do not use the EI mode for determining when to change GPU frequencies for RPS, changing this value causes no up interrupts to be generated whilst an OpenGL client runs.
> > >
> > > Fixes regression from
> > >
> > > commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Date:   Wed Aug 15 10:41:45 2012 +0200
> > >
> > >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > >
> > > Reported-by: Eric Anholt <eric@anholt.net>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Eric Anholt <eric@anholt.net>
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 81e88c2..15b585e 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > >
> > >      I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > >      I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > -    I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > +    I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > >      I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > >
> > >      I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> >
> > I've not verified this with interrupts, but simply by reading back the current frequency using the sysfs interface.
> >
> > What I've seen running xonotic on Ivybridge is that we do bump the frequency initially, and then it stops. Nearing the end of the demo, we again raise the frequency. Note that on my system, both before and after this patch, I am able to get to the max GPU frequency with the xonotic demo.
> >
> > Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650, I see the following (the demo is roughly 2 minutes):
> >
> > without patch:
> > Within a few seconds we cycle up to 750
> > Nothing for about 30 seconds
> > very slowly cycle up to 1100 (*just* before the demo ends)
> > demo ends; throttle down to 350 quickly
> >
> > with patch:
> > Within a few seconds we cycle up to 1000
> > Nothing for about 30 seconds
> > cycle up to 1100
> > demo ends; throttle down to 350 slowly
> >
> > I think if this fixes someone's critical issue, it's great, but unfortunately I do not see the problem the patch claims to fix. Furthermore, none of us can really make sense of why this change has the effect that it does, but I believe a lot of that is because the workloads we run (in this case xonotic) are very much black boxes.
> >
> > Personally, on this IVB, I think the behavior before the patch is more desirable because it stays near RP1 for a longer period of time, and drops to RPn quickly (but that's definitely a matter of opinion).
>
> It's a balance between power and performance. Running at the higher freqs is definitely less power efficient, but without this patch we definitely have a performance regression (running at 750 instead of 1000 MHz for most of the demo, plus the cases Chris saw).
>
> But we also don't want to prevent RC6 entry on vulnerable systems, so maybe we need two sets of values, or a broader set of changes that works better for a swath of workloads.

To me this sounds like an idea which is better on paper than in practice. Unless we can generalize the various sets of values into knobs for users to pick from, it will really be overwhelming (I feel), and will probably provide more bad data than good. Fine-grained knobs are already given via registers, and really persistent users can toy with those if they want. Perhaps it's on us to add better comments to i915_reg.h explaining how we think it works?

> Also, the initial patch to use the HSW values implies that the values are applicable across generations. They're not. They're very specific to a given generation, and potentially even to different versions of a given generation, so sharing the values is probably a bad idea in general.
>
> --
> Jesse Barnes, Intel Open Source Technology Center

To set the record straight, I am pretty sure we can agree on a few things:
1. We don't really know wtf.
2. There will be no globally good set of values.
2a. There may even be no globally decent set of values.
3. One workload is insufficient to determine anything.

If a workload is obviously too GPU-render-heavy, I think a decent solution is to just use the sysfs parameters to force a specific min. We don't really need to be skittish about setting this, since really *most* things are too GPU-render-heavy for us at the moment. For the general case I'll assert the performance 'profile' is the right thing unless someone proves otherwise, and here we just want the set of values which throttle down faster than they throttle up.

I think there is still some value in the proof-of-concept stuff you were hoping to do around a system-wide GPU power governor. Maybe we can just set the RPS autothrottle values to be as dumb as possible, and then use that governor to set the min/max frequencies as it sees fit. Another similar option is to try to use SW RC6. If we had an intern, I'd even suggest picking some decent ranges for all the thresholds and having that intern run benchmarks with all the different values.

Okay, done wasting your time now...
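The sysfs workaround Ben suggests (forcing a minimum frequency for render-heavy workloads) would look something like the sketch below. The path and the 650 MHz value (RP1 on the IVB discussed above) are examples, not recommendations, and writing the file requires root.

```c
/*
 * Sketch of the sysfs workaround: pin the minimum GPU frequency for
 * a render-heavy workload.  Path and value are examples; 650 is RP1
 * on the IVB discussed in this thread.  Requires root.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/class/drm/card0/gt_min_freq_mhz", "w");

	if (!f) {
		perror("gt_min_freq_mhz");
		return 1;
	}
	fprintf(f, "650\n");	/* floor the GPU at RP1 */
	fclose(f);
	return 0;
}
```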
On Thu, 20 Dec 2012 17:57:04 -0800 Ben Widawsky <ben@bwidawsk.net> wrote:

> On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> > On Wed, 24 Oct 2012 12:05:35 -0700
> > Ben Widawsky <ben@bwidawsk.net> wrote:
> >
> > > On Sun, 21 Oct 2012 15:44:02 +0100
> > > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >
> > > > Even though we do not use the EI mode for determining when to change GPU frequencies for RPS, changing this value causes no up interrupts to be generated whilst an OpenGL client runs.
> > > >
> > > > Fixes regression from
> > > >
> > > > commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Date:   Wed Aug 15 10:41:45 2012 +0200
> > > >
> > > >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > > >
> > > > Reported-by: Eric Anholt <eric@anholt.net>
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Eric Anholt <eric@anholt.net>
> > > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: stable@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 81e88c2..15b585e 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > > >
> > > >      I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > > >      I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > > -    I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > > +    I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > > >      I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > > >
> > > >      I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> > >
> > > I've not verified this with interrupts, but simply by reading back the current frequency using the sysfs interface.
> > >
> > > What I've seen running xonotic on Ivybridge is that we do bump the frequency initially, and then it stops. Nearing the end of the demo, we again raise the frequency. Note that on my system, both before and after this patch, I am able to get to the max GPU frequency with the xonotic demo.
> > >
> > > Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650, I see the following (the demo is roughly 2 minutes):
> > >
> > > without patch:
> > > Within a few seconds we cycle up to 750
> > > Nothing for about 30 seconds
> > > very slowly cycle up to 1100 (*just* before the demo ends)
> > > demo ends; throttle down to 350 quickly
> > >
> > > with patch:
> > > Within a few seconds we cycle up to 1000
> > > Nothing for about 30 seconds
> > > cycle up to 1100
> > > demo ends; throttle down to 350 slowly
> > >
> > > I think if this fixes someone's critical issue, it's great, but unfortunately I do not see the problem the patch claims to fix. Furthermore, none of us can really make sense of why this change has the effect that it does, but I believe a lot of that is because the workloads we run (in this case xonotic) are very much black boxes.
> > >
> > > Personally, on this IVB, I think the behavior before the patch is more desirable because it stays near RP1 for a longer period of time, and drops to RPn quickly (but that's definitely a matter of opinion).
> >
> > It's a balance between power and performance. Running at the higher freqs is definitely less power efficient, but without this patch we definitely have a performance regression (running at 750 instead of 1000 MHz for most of the demo, plus the cases Chris saw).
> >
> > But we also don't want to prevent RC6 entry on vulnerable systems, so maybe we need two sets of values, or a broader set of changes that works better for a swath of workloads.
>
> To me this sounds like an idea which is better on paper than in practice. Unless we can generalize the various sets of values into knobs for users to pick from, it will really be overwhelming (I feel), and will probably provide more bad data than good. Fine-grained knobs are already given via registers, and really persistent users can toy with those if they want. Perhaps it's on us to add better comments to i915_reg.h explaining how we think it works?
>
> > Also, the initial patch to use the HSW values implies that the values are applicable across generations. They're not. They're very specific to a given generation, and potentially even to different versions of a given generation, so sharing the values is probably a bad idea in general.
> >
> > --
> > Jesse Barnes, Intel Open Source Technology Center
>
> To set the record straight, I am pretty sure we can agree on a few things:
> 1. We don't really know wtf.
> 2. There will be no globally good set of values.
> 2a. There may even be no globally decent set of values.
> 3. One workload is insufficient to determine anything.
>
> If a workload is obviously too GPU-render-heavy, I think a decent solution is to just use the sysfs parameters to force a specific min. We don't really need to be skittish about setting this, since really *most* things are too GPU-render-heavy for us at the moment. For the general case I'll assert the performance 'profile' is the right thing unless someone proves otherwise, and here we just want the set of values which throttle down faster than they throttle up.

In the email in my head, that said "the power 'profile'".

> I think there is still some value in the proof-of-concept stuff you were hoping to do around a system-wide GPU power governor. Maybe we can just set the RPS autothrottle values to be as dumb as possible, and then use that governor to set the min/max frequencies as it sees fit. Another similar option is to try to use SW RC6. If we had an intern, I'd even suggest picking some decent ranges for all the thresholds and having that intern run benchmarks with all the different values.
>
> Okay, done wasting your time now...
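As a thought experiment, the governor idea from the thread could be prototyped entirely in userspace on top of the same sysfs files: leave the hardware autothrottle dumb and have a daemon steer the frequency floor. Everything below is an assumption for illustration — the paths, the 1100/650/350 breakpoints (max/RP1/RPn on the IVB discussed above), and the one-second poll are placeholders, not a real implementation.

```c
/*
 * Toy sketch of a userspace "GPU power governor": poll the current
 * frequency and adjust the minimum via sysfs.  Paths, thresholds and
 * poll interval are assumptions for illustration only.
 */
#include <stdio.h>
#include <unistd.h>

static int read_mhz(const char *path)
{
	FILE *f = fopen(path, "r");
	int mhz = -1;

	if (f) {
		if (fscanf(f, "%d", &mhz) != 1)
			mhz = -1;
		fclose(f);
	}
	return mhz;
}

static void write_mhz(const char *path, int mhz)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fprintf(f, "%d\n", mhz);
		fclose(f);
	}
}

int main(void)
{
	const char *cur_path = "/sys/class/drm/card0/gt_cur_freq_mhz";
	const char *min_path = "/sys/class/drm/card0/gt_min_freq_mhz";
	const int rp0 = 1100, rp1 = 650, rpn = 350;	/* example IVB range */
	int floor = rpn;

	for (;;) {
		int mhz = read_mhz(cur_path);

		if (mhz >= rp0 && floor < rp1)
			floor = rp1;	/* pegged at max: raise the floor to RP1 */
		else if (mhz >= 0 && mhz < rp1 && floor > rpn)
			floor = rpn;	/* mostly idle: drop back to RPn */
		write_mhz(min_path, floor);
		sleep(1);
	}
}
```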
```diff
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 81e88c2..15b585e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
 	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
+	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
 	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
 
 	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
```
```
Even though we do not use the EI mode for determining when to change
GPU frequencies for RPS, changing this value causes no up interrupts
to be generated whilst an OpenGL client runs.

Fixes regression from

commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Aug 15 10:41:45 2012 +0200

    drm/i915: use hsw rps tuning values everywhere on gen6+

Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
```
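For a sense of scale, the EI (evaluation interval) registers on these parts are often described as counting in 1.28 us units — that tick size is an assumption here, so check the documentation for your generation — which would put the two GEN6_RP_UP_EI values under discussion at roughly 84 ms and 128 ms:

```c
/*
 * Back-of-the-envelope conversion of the contested GEN6_RP_UP_EI
 * values, assuming a 1.28 us counter tick (an assumption; verify
 * against the docs for your generation).
 */
#include <stdio.h>

int main(void)
{
	const double tick_us = 1.28;		/* assumed tick size */
	const unsigned int hsw_up_ei = 66000;	/* value kept for HSW */
	const unsigned int snb_up_ei = 100000;	/* value restored for gen6/gen7 */

	printf("HSW UP_EI: %.2f ms\n", hsw_up_ei * tick_us / 1000.0);	/* ~84.48 ms */
	printf("SNB UP_EI: %.2f ms\n", snb_up_ei * tick_us / 1000.0);	/* 128.00 ms */
	return 0;
}
```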