diff mbox

Revert "drm/i915: fix infinite loop at gen6_update_ring_freq"

Message ID 1397113487-25717-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter April 10, 2014, 7:04 a.m. UTC
This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.

This patch duct-tapes over some issue in the current bdw rps patches
which must wait with enabling rc6/rps until the very first batch has
been submitted by userspace.

But those patches aren't merged yet, and for upstream we need to have
an in-kernel emission of the very first batch. I shouldn't have
merged this patch so let's revert it again.

Also Imre noticed that even when rps is set up normally there's a
small window (due to the 1s delay of the async rps init work) where we
could runtime suspend already and blow up all over the place. Imre has
a proper fix to block runtime pm until the rps init work has
successfully completed.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_pm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Ben Widawsky April 10, 2014, 5:50 p.m. UTC | #1
On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
> This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
> 
> This patch duct-tapes over some issue in the current bdw rps patches
> which must wait with enabling rc6/rps until the very first batch has
> been submitted by userspace.
> 
> But those patches aren't merged yet, and for upstream we need to have
> an in-kernel emission of the very first batch. I shouldn't have
> merged this patch so let's revert it again.

I said this on the mailing last before you merged the patch.

> 
> Also Imre noticed that even when rps is set up normally there's a
> small window (due to the 1s delay of the async rps init work) where we
> could runtime suspend already and blow up all over the place. Imre has
> a proper fix to block runtime pm until the rps init work has
> successfully completed.
> 
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 8531cf6e2774..dc7adadbb945 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3522,8 +3522,7 @@ void gen6_update_ring_freq(struct drm_device *dev)
>  	 * to use for memory access.  We do this by specifying the IA frequency
>  	 * the PCU should use as a reference to determine the ring frequency.
>  	 */
> -	for (gpu_freq = dev_priv->rps.max_freq_softlimit;
> -	     gpu_freq >= dev_priv->rps.min_freq_softlimit && gpu_freq != 0;
> +	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
>  	     gpu_freq--) {
>  		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
>  		unsigned int ia_freq = 0, ring_freq = 0;
> -- 
> 1.8.5.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ben Widawsky April 10, 2014, 5:52 p.m. UTC | #2
On Thu, Apr 10, 2014 at 10:50:43AM -0700, Ben Widawsky wrote:
> On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
> > This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
> > 
> > This patch duct-tapes over some issue in the current bdw rps patches
> > which must wait with enabling rc6/rps until the very first batch has
> > been submitted by userspace.
> > 
> > But those patches aren't merged yet, and for upstream we need to have
> > an in-kernel emission of the very first batch. I shouldn't have
> > merged this patch so let's revert it again.
> 
> I said this on the mailing last before you merged the patch.

20140402050338.GB13824@bwidawsk.net

> 
> > 
> > Also Imre noticed that even when rps is set up normally there's a
> > small window (due to the 1s delay of the async rps init work) where we
> > could runtime suspend already and blow up all over the place. Imre has
> > a proper fix to block runtime pm until the rps init work has
> > successfully completed.
> > 
> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > Cc: Imre Deak <imre.deak@intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 8531cf6e2774..dc7adadbb945 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -3522,8 +3522,7 @@ void gen6_update_ring_freq(struct drm_device *dev)
> >  	 * to use for memory access.  We do this by specifying the IA frequency
> >  	 * the PCU should use as a reference to determine the ring frequency.
> >  	 */
> > -	for (gpu_freq = dev_priv->rps.max_freq_softlimit;
> > -	     gpu_freq >= dev_priv->rps.min_freq_softlimit && gpu_freq != 0;
> > +	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
> >  	     gpu_freq--) {
> >  		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
> >  		unsigned int ia_freq = 0, ring_freq = 0;
> > -- 
> > 1.8.5.2
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter April 11, 2014, 9:02 a.m. UTC | #3
On Thu, Apr 10, 2014 at 10:52:26AM -0700, Ben Widawsky wrote:
> On Thu, Apr 10, 2014 at 10:50:43AM -0700, Ben Widawsky wrote:
> > On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
> > > This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
> > > 
> > > This patch duct-tapes over some issue in the current bdw rps patches
> > > which must wait with enabling rc6/rps until the very first batch has
> > > been submitted by userspace.
> > > 
> > > But those patches aren't merged yet, and for upstream we need to have
> > > an in-kernel emission of the very first batch. I shouldn't have
> > > merged this patch so let's revert it again.
> > 
> > I said this on the mailing last before you merged the patch.
> 
> 20140402050338.GB13824@bwidawsk.net

20140402145813.GV7225@phenom.ffwll.local will explain things.

-Daniel
Paulo Zanoni April 22, 2014, 9:25 p.m. UTC | #4
2014-04-11 6:02 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
> On Thu, Apr 10, 2014 at 10:52:26AM -0700, Ben Widawsky wrote:
>> On Thu, Apr 10, 2014 at 10:50:43AM -0700, Ben Widawsky wrote:
>> > On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
>> > > This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
>> > >
>> > > This patch duct-tapes over some issue in the current bdw rps patches
>> > > which must wait with enabling rc6/rps until the very first batch has
>> > > been submitted by userspace.
>> > >
>> > > But those patches aren't merged yet, and for upstream we need to have
>> > > an in-kernel emission of the very first batch. I shouldn't have
>> > > merged this patch so let's revert it again.
>> >
>> > I said this on the mailing last before you merged the patch.
>>
>> 20140402050338.GB13824@bwidawsk.net
>
> 20140402145813.GV7225@phenom.ffwll.local will explain things.

There's now a regression report pointing to the revert:
https://bugs.freedesktop.org/show_bug.cgi?id=77565 .

What is the proposed solution now? Just WARN and still avoid the
infinite loop? Or keep the infinite loop and leave the bug open?
Disable BDW runtime PM?

Thanks,
Paulo

>
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter April 23, 2014, 7:05 a.m. UTC | #5
On Tue, Apr 22, 2014 at 06:25:12PM -0300, Paulo Zanoni wrote:
> 2014-04-11 6:02 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
> > On Thu, Apr 10, 2014 at 10:52:26AM -0700, Ben Widawsky wrote:
> >> On Thu, Apr 10, 2014 at 10:50:43AM -0700, Ben Widawsky wrote:
> >> > On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
> >> > > This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
> >> > >
> >> > > This patch duct-tapes over some issue in the current bdw rps patches
> >> > > which must wait with enabling rc6/rps until the very first batch has
> >> > > been submitted by userspace.
> >> > >
> >> > > But those patches aren't merged yet, and for upstream we need to have
> >> > > an in-kernel emission of the very first batch. I shouldn't have
> >> > > merged this patch so let's revert it again.
> >> >
> >> > I said this on the mailing last before you merged the patch.
> >>
> >> 20140402050338.GB13824@bwidawsk.net
> >
> > 20140402145813.GV7225@phenom.ffwll.local will explain things.
> 
> There's now a regression report pointing to the revert:
> https://bugs.freedesktop.org/show_bug.cgi?id=77565 .
> 
> What is the proposed solution now? Just WARN and still avoid the
> infinite loop? Or keep the infinite loop and leave the bug open?
> Disable BDW runtime PM?

I've thought that we can only hit this with the as-yet unmerged rc6
patches on bdw, so I'm really confused why this blows up now?

In any case I've thought Imre has stumbled over a similar issue on byt and
he has a fix to prevent runtime pm until the delayed rps init has run.
I've assigned the bug to him.

Still confused why this suddenly blew up ...
-Daniel
Paulo Zanoni April 28, 2014, 6:14 p.m. UTC | #6
2014-04-23 4:05 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
> On Tue, Apr 22, 2014 at 06:25:12PM -0300, Paulo Zanoni wrote:
>> 2014-04-11 6:02 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
>> > On Thu, Apr 10, 2014 at 10:52:26AM -0700, Ben Widawsky wrote:
>> >> On Thu, Apr 10, 2014 at 10:50:43AM -0700, Ben Widawsky wrote:
>> >> > On Thu, Apr 10, 2014 at 09:04:47AM +0200, Daniel Vetter wrote:
>> >> > > This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.
>> >> > >
>> >> > > This patch duct-tapes over some issue in the current bdw rps patches
>> >> > > which must wait with enabling rc6/rps until the very first batch has
>> >> > > been submitted by userspace.
>> >> > >
>> >> > > But those patches aren't merged yet, and for upstream we need to have
>> >> > > an in-kernel emission of the very first batch. I shouldn't have
>> >> > > merged this patch so let's revert it again.
>> >> >
>> >> > I said this on the mailing last before you merged the patch.
>> >>
>> >> 20140402050338.GB13824@bwidawsk.net
>> >
>> > 20140402145813.GV7225@phenom.ffwll.local will explain things.
>>
>> There's now a regression report pointing to the revert:
>> https://bugs.freedesktop.org/show_bug.cgi?id=77565 .
>>
>> What is the proposed solution now? Just WARN and still avoid the
>> infinite loop? Or keep the infinite loop and leave the bug open?
>> Disable BDW runtime PM?
>
> I've thought that we can only hit this with the as-yet unmerged rc6
> patches on bdw, so I'm really confused why this blows up now?
>
> In any case I've thought Imre has stumbled over a similar issue on byt and
> he has a fix to prevent runtime pm until the delayed rps init has run.
> I've assigned the bug to him.
>
> Still confused why this suddenly blew up ...

Sorry for the delayed response.

The bug is very simple: since we did not enable RC6, by the time we
run gen6_update_ring_freq(), the RPS limits will all be zero. The loop
decrements a variable until it reaches a point where it is smaller
than the other. But since the other variable is zero, the loop won't
end since we can't be smaller than zero on the unsigned world, no
matter how much we decrement it.

This can probably be reproduced on non-BDW machines too, with RC6 disabled.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter April 28, 2014, 7:23 p.m. UTC | #7
On Mon, Apr 28, 2014 at 8:14 PM, Paulo Zanoni <przanoni@gmail.com> wrote:
> This can probably be reproduced on non-BDW machines too, with RC6 disabled.

If I understand Imre's patch correctly the bug is that we didn't have
rc6 on bdw, but the sanitize function didn't make this clear leading
to bugs. If my understanding is wrong the I need to drop Imre's patch
again.
-Daniel
Imre Deak April 28, 2014, 7:54 p.m. UTC | #8
On Mon, 2014-04-28 at 21:23 +0200, Daniel Vetter wrote:
> On Mon, Apr 28, 2014 at 8:14 PM, Paulo Zanoni <przanoni@gmail.com> wrote:
> > This can probably be reproduced on non-BDW machines too, with RC6 disabled.
> 
> If I understand Imre's patch correctly the bug is that we didn't have
> rc6 on bdw, but the sanitize function didn't make this clear leading
> to bugs. 

Yes, that's correct. For runtime PM we require RC6 to be enabled, and we
use intel_enable_rc6() to check for this. Before patch [1]
intel_enable_rc6() reported incorrectly on BDW that RC6 is enabled.

--Imre

[1]
http://lists.freedesktop.org/archives/intel-gfx/2014-April/044354.html

>If my understanding is wrong the I need to drop Imre's patch again.

> -Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8531cf6e2774..dc7adadbb945 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3522,8 +3522,7 @@  void gen6_update_ring_freq(struct drm_device *dev)
 	 * to use for memory access.  We do this by specifying the IA frequency
 	 * the PCU should use as a reference to determine the ring frequency.
 	 */
-	for (gpu_freq = dev_priv->rps.max_freq_softlimit;
-	     gpu_freq >= dev_priv->rps.min_freq_softlimit && gpu_freq != 0;
+	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
 	     gpu_freq--) {
 		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
 		unsigned int ia_freq = 0, ring_freq = 0;