Linux 3.10-rc7

Message ID	20130625130816.55e6cb80@jbarnes-desktop
State	New, archived
Headers
Date: Tue, 25 Jun 2013 13:08:16 -0700
From: Jesse Barnes <jbarnes@virtuousgeek.org>
To: Shuah Khan <shuah.kh@samsung.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, intel-gfx <intel-gfx@lists.freedesktop.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, "shuahkhan@gmail.com" <shuahkhan@gmail.com>, Dave Airlie <airlied@redhat.com>, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Intel-gfx] Linux 3.10-rc7
In-Reply-To: <B8EFE96D1287C24090BAD9D858E15E617305B4@sisaex02sj>

Jesse Barnes June 25, 2013, 8:08 p.m. UTC

On Tue, 25 Jun 2013 19:59:28 +0000
Shuah Khan <shuah.kh@samsung.com> wrote:

> On 06/25/2013 01:52 PM, Jesse Barnes wrote:
> > On Tue, 25 Jun 2013 21:37:37 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> 
> >>
> >> Adding more lists to cc + Jesse since he's the guilty one for the
> >> vt-switchless state restore stuff.
> >
> > Yeah, looks like we don't fetch the PLL state on resume from hibernate,
> > leading to this warning.  The refcount is nonzero, indicating the pll
> > is in use, but the active field is clear, which means we're missing an
> > update somewhere.
> >
> > Shuah, just to confirm, does your resume actually work ok aside from
> > the warning?  I *think* it's harmless in this case, but does indicate a
> > real bug in our state tracking... trying to come up with a patch now.
> >
> > Thanks,
> >
> 
> Resume works just fine. I see it take longer for it to suspend compared 
> to 3.9.7 and then resumes just fine. Suspend taking longer very well 
> could be because of this warn_on. Other than this warn_on I haven't 
> noticed any other problems.

Here's the patch I'm testing now, can you give it a try?
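Jesse's diagnosis can be made concrete with a deliberately simplified model. This is not his patch and not the i915 code: `struct shared_pll`, `pll_state_consistent()`, `pll_disable()` and `pll_readout()` are hypothetical names. The model assumes a PLL that tracks `refcount` (users assigned to it), `active` (users that currently have it enabled) and `on` (whether the hardware is running). If resume restores `refcount` but leaves `active` at zero, the disable path trips a warning; re-reading the state from the hardware on resume, which is roughly what a "fetch the PLL state" fix would do, makes the counts agree again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified PLL bookkeeping; not the real i915 structures. */
struct shared_pll {
	int refcount;   /* users that have this PLL assigned */
	int active;     /* users that currently have it enabled */
	bool on;        /* whether the hardware PLL is running */
};

/* Software state agrees with itself and with the hardware flag. */
static bool pll_state_consistent(const struct shared_pll *pll)
{
	assert(pll != NULL);
	if (pll->active > pll->refcount)
		return false;
	/* Running hardware should mean at least one active user, and vice versa. */
	return pll->on == (pll->active > 0);
}

/* Disable path: disabling a PLL that nobody marked active means the
 * tracked state was lost (the model prints instead of using WARN_ON). */
static bool pll_disable(struct shared_pll *pll)
{
	if (pll->active == 0) {
		fprintf(stderr, "WARN: pll in use (refcount=%d) but active=0\n",
			pll->refcount);
		return false;
	}
	if (--pll->active == 0)
		pll->on = false;
	return true;
}

/* Hypothetical resume-time readout: rebuild 'on' and 'active' from what
 * the hardware reports instead of trusting stale software state. */
static void pll_readout(struct shared_pll *pll, bool hw_enabled, int enabled_users)
{
	pll->on = hw_enabled;
	pll->active = hw_enabled ? enabled_users : 0;
}
```

In the model, the choice is between silencing the check in `pll_disable()` and calling something like `pll_readout()` after resume; later in the thread Jesse leaves that decision to Daniel and Linus.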

Shuah Khan June 25, 2013, 8:51 p.m. UTC | #1

On 06/25/2013 02:06 PM, Jesse Barnes wrote:
> On Tue, 25 Jun 2013 19:59:28 +0000
> Shuah Khan <shuah.kh@samsung.com> wrote:
>
>> On 06/25/2013 01:52 PM, Jesse Barnes wrote:
>>> On Tue, 25 Jun 2013 21:37:37 +0200
>>> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>
>>
>>>>
>>>> Adding more lists to cc + Jesse since he's the guilty one for the
>>>> vt-switchless state restore stuff.
>>>
>>> Yeah, looks like we don't fetch the PLL state on resume from hibernate,
>>> leading to this warning.  The refcount is nonzero, indicating the pll
>>> is in use, but the active field is clear, which means we're missing an
>>> update somewhere.
>>>
>>> Shuah, just to confirm, does your resume actually work ok aside from
>>> the warning?  I *think* it's harmless in this case, but does indicate a
>>> real bug in our state tracking... trying to come up with a patch now.
>>>
>>> Thanks,
>>>
>>
>> Resume works just fine. I see it take longer for it to suspend compared
>> to 3.9.7 and then resumes just fine. Suspend taking longer very well
>> could be because of this warn_on. Other than this warn_on I haven't
>> noticed any other problems.
>
> Here's the patch I'm testing now, can you give it a try?
>

Jesse,

With this patch the warn_on went away and resume worked, but I started seeing:

[   78.733062] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
[   78.733079] mei_me 0000:00:16.0: reset: wrong host start response
[   78.733082] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING

over and over again after resuming from suspend to disk in reboot mode.
dmesg filled up with these messages.

I then did a suspend to disk in shutdown mode right away. Resume worked,
with no warn_ons and no mei_me messages.

-- Shuah

Shuah Khan, Linux Kernel Developer - Open Source Group Samsung Research 
America (Silicon Valley) shuah.kh@samsung.com | (970) 672-0658

Winkler, Tomas June 25, 2013, 8:54 p.m. UTC | #2

On Tue, Jun 25, 2013 at 11:51 PM, Shuah Khan <shuah.kh@samsung.com> wrote:

> On 06/25/2013 02:06 PM, Jesse Barnes wrote:
> > On Tue, 25 Jun 2013 19:59:28 +0000
> > Shuah Khan <shuah.kh@samsung.com> wrote:
> >
> >> On 06/25/2013 01:52 PM, Jesse Barnes wrote:
> >>> On Tue, 25 Jun 2013 21:37:37 +0200
> >>> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>
> >>
> >>>>
> >>>> Adding more lists to cc + Jesse since he's the guilty one for the
> >>>> vt-switchless state restore stuff.
> >>>
> >>> Yeah, looks like we don't fetch the PLL state on resume from hibernate,
> >>> leading to this warning.  The refcount is nonzero, indicating the pll
> >>> is in use, but the active field is clear, which means we're missing an
> >>> update somewhere.
> >>>
> >>> Shuah, just to confirm, does your resume actually work ok aside from
> >>> the warning?  I *think* it's harmless in this case, but does indicate a
> >>> real bug in our state tracking... trying to come up with a patch now.
> >>>
> >>> Thanks,
> >>>
> >>
> >> Resume works just fine. I see it take longer for it to suspend compared
> >> to 3.9.7 and then resumes just fine. Suspend taking longer very well
> >> could be because of this warn_on. Other than this warn_on I haven't
> >> noticed any other problems.
> >
> > Here's the patch I'm testing now, can you give it a try?
> >
>
> Jesse,
>
> With this patch warn_on went away. Resume worked. I started seeing:
>
> [   78.733062] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> [   78.733079] mei_me 0000:00:16.0: reset: wrong host start response
> [   78.733082] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>
> over and over again after resume from reboot mode suspend. dmesg filled
> up with these messages.
>

Can you please send me the part of the log where this starts?

Thanks
Tomas

Winkler, Tomas June 25, 2013, 8:57 p.m. UTC | #3

On Tue, Jun 25, 2013 at 11:51 PM, Shuah Khan <shuah.kh@samsung.com> wrote:
>
> On 06/25/2013 02:06 PM, Jesse Barnes wrote:
> > On Tue, 25 Jun 2013 19:59:28 +0000
> > Shuah Khan <shuah.kh@samsung.com> wrote:
> >
> >> On 06/25/2013 01:52 PM, Jesse Barnes wrote:
> >>> On Tue, 25 Jun 2013 21:37:37 +0200
> >>> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>
> >>
> >>>>
> >>>> Adding more lists to cc + Jesse since he's the guilty one for the
> >>>> vt-switchless state restore stuff.
> >>>
> >>> Yeah, looks like we don't fetch the PLL state on resume from hibernate,
> >>> leading to this warning.  The refcount is nonzero, indicating the pll
> >>> is in use, but the active field is clear, which means we're missing an
> >>> update somewhere.
> >>>
> >>> Shuah, just to confirm, does your resume actually work ok aside from
> >>> the warning?  I *think* it's harmless in this case, but does indicate a
> >>> real bug in our state tracking... trying to come up with a patch now.
> >>>
> >>> Thanks,
> >>>
> >>
> >> Resume works just fine. I see it take longer for it to suspend compared
> >> to 3.9.7 and then resumes just fine. Suspend taking longer very well
> >> could be because of this warn_on. Other than this warn_on I haven't
> >> noticed any other problems.
> >
> > Here's the patch I'm testing now, can you give it a try?
> >
>
> Jesse,
>
> With this patch warn_on went away. Resume worked. I started seeing:
>
> [   78.733062] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> [   78.733079] mei_me 0000:00:16.0: reset: wrong host start response
> [   78.733082] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>
> over and over again after resume from reboot mode suspend. dmesg filled
> up with these messages.

Can you please send me the part of the log where this starts?

Thanks

> I did suspend to disk shutdown mode right away. Resume worked, no
> warn_ons, and no mei_me messages.
>
> -- Shuah

Jesse Barnes June 25, 2013, 9:09 p.m. UTC | #4

On Tue, 25 Jun 2013 20:51:27 +0000
Shuah Khan <shuah.kh@samsung.com> wrote:

> On 06/25/2013 02:06 PM, Jesse Barnes wrote:
> > On Tue, 25 Jun 2013 19:59:28 +0000
> > Shuah Khan <shuah.kh@samsung.com> wrote:
> >
> >> On 06/25/2013 01:52 PM, Jesse Barnes wrote:
> >>> On Tue, 25 Jun 2013 21:37:37 +0200
> >>> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>
> >>
> >>>>
> >>>> Adding more lists to cc + Jesse since he's the guilty one for the
> >>>> vt-switchless state restore stuff.
> >>>
> >>> Yeah, looks like we don't fetch the PLL state on resume from hibernate,
> >>> leading to this warning.  The refcount is nonzero, indicating the pll
> >>> is in use, but the active field is clear, which means we're missing an
> >>> update somewhere.
> >>>
> >>> Shuah, just to confirm, does your resume actually work ok aside from
> >>> the warning?  I *think* it's harmless in this case, but does indicate a
> >>> real bug in our state tracking... trying to come up with a patch now.
> >>>
> >>> Thanks,
> >>>
> >>
> >> Resume works just fine. I see it take longer for it to suspend compared
> >> to 3.9.7 and then resumes just fine. Suspend taking longer very well
> >> could be because of this warn_on. Other than this warn_on I haven't
> >> noticed any other problems.
> >
> > Here's the patch I'm testing now, can you give it a try?
> >
> 
> Jesse,
> 
> With this patch warn_on went away. Resume worked. I started seeing:
> 
> [   78.733062] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> [   78.733079] mei_me 0000:00:16.0: reset: wrong host start response
> [   78.733082] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> 
> over and over again after resume from reboot mode suspend. dmesg filled 
> up with these messages.
> 
> I did suspend to disk shutdown mode right away. Resume worked, no 
> warn_ons, and no mei_me messages.

Ok good, so that means the new stuff we have queued will likely work
too.

I'll leave it up to Daniel and Linus whether we just kill this warning
for now or apply the state-fetching patch.

Shuah Khan June 25, 2013, 9:11 p.m. UTC | #5

On 06/25/2013 02:57 PM, Tomas Winkler wrote:
> On Tue, Jun 25, 2013 at 11:51 PM, Shuah Khan <shuah.kh@samsung.com> wrote:
>>

>> With this patch warn_on went away. Resume worked. I started seeing:
>>
>> [   78.733062] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>> [   78.733079] mei_me 0000:00:16.0: reset: wrong host start response
>> [   78.733082] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>>
>> over and over again after resume from reboot mode suspend. dmesg filled
>> up with these messages.
>
> Can you please send me the log part when this starts?
>
> Thanks
>

It rolled over and I don't have prior messages. I tried reproducing 
twice and didn't see it again. I will try a few more times and see if I 
can get it to happen again.

This is what I could save before dmesg rolled over:

[   78.709014] mei_me 0000:00:16.0: reset: wrong host start response
[   78.709016] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
[   78.709029] mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
[   78.709031] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
[   78.709069] mei_me 0000:00:16.0: reset: wrong host start response
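The start of the flood was lost because the kernel log is a fixed-size ring buffer: once it is full, every new message overwrites the oldest one. The sketch below is a simplified, line-based model of that behaviour, not the kernel's printk implementation (the real buffer is sized in bytes, and its size can be raised with the `log_buf_len=` boot parameter). `struct log_ring`, `ring_push()` and `ring_get()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RING_LINES 4      /* tiny capacity, for illustration only */
#define RING_LINE_LEN 80

/* Line-based model of a log ring that overwrites its oldest entry. */
struct log_ring {
	char line[RING_LINES][RING_LINE_LEN];
	size_t next;            /* slot the next message goes into */
	size_t count;           /* valid lines, at most RING_LINES */
	unsigned long dropped;  /* lines overwritten before being read */
};

static void ring_push(struct log_ring *r, const char *msg)
{
	if (r->count == RING_LINES)
		r->dropped++;   /* buffer full: the oldest line is about to be lost */
	else
		r->count++;
	snprintf(r->line[r->next], RING_LINE_LEN, "%s", msg);
	r->next = (r->next + 1) % RING_LINES;
}

/* Return the i-th oldest surviving line (0 = oldest). */
static const char *ring_get(const struct log_ring *r, size_t i)
{
	assert(i < r->count);
	size_t start = (r->next + RING_LINES - r->count) % RING_LINES;
	return r->line[(start + i) % RING_LINES];
}
```

With four slots and six messages, the two oldest are gone and the oldest survivor is the third message, which is the same situation as a driver that starts resetting in a loop and pushes its first messages out of dmesg.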

-- Shuah

Winkler, Tomas June 26, 2013, 10:11 p.m. UTC | #6

> > Can you please send me the log part when this starts?
> >
> > Thanks
> >
> 
> It rolled over and I don't have prior messages. I tried reproducing twice and
> didn't see it again. I will try a few more times and see if I can get it to happen
> again.
> 
> This is what I could save before dmesg rolled over:
> 
> [   78.709014] mei_me 0000:00:16.0: reset: wrong host start response
> [   78.709016] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> [   78.709029] mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
> [   78.709031] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
> [   78.709069] mei_me 0000:00:16.0: reset: wrong host start response
> 
So far I have been able to reproduce it reliably with 3.10-rc5, but not with 3.10-rc7 and later.

Three patches went in to fix this issue:

42f132f mei: me: clear interrupts on the resume path
2753ff5 mei: nfc: fix nfc device freeing
5e85b36 mei: init: Flush scheduled work before resetting the device

Are you sure you have these 3 in?

Thanks
Tomas

Shuah Khan June 26, 2013, 10:24 p.m. UTC | #7

On 06/26/2013 04:12 PM, Winkler, Tomas wrote:
>
>
>>> Can you please send me the log part when this starts?
>>>
>>> Thanks
>>>
>>
>> It rolled over and I don't have prior messages. I tried reproducing twice and
>> didn't see it again. I will try a few more times and see if I can get it to happen
>> again.
>>
>> This is what I could save before dmesg rolled over:
>>
>> [   78.709014] mei_me 0000:00:16.0: reset: wrong host start response
>> [   78.709016] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>> [   78.709029] mei_me 0000:00:16.0: reset: unexpected enumeration response hbm.
>> [   78.709031] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING
>> [   78.709069] mei_me 0000:00:16.0: reset: wrong host start response
>>
> So far I was able to positively reproduce it with  3.10-rc5 but not with 3.10-rc7 and above.
>
> There are 3 patches that went in to fix this issue
>
> 42f132f mei: me: clear interrupts on the resume path
> 2753ff5 mei: nfc: fix nfc device freeing
> 5e85b36 mei: init: Flush scheduled work before resetting the device
>
> Are you sure you have these 3 in?
>

I checked the git log, and yes, I have all three commits. This problem
appears to be intermittent and hard to reproduce, at least on 3.10-rc7. I
tried several times yesterday to capture the log and couldn't reproduce it.

-- Shuah

Shuah Khan July 1, 2013, 2:54 p.m. UTC | #8

On 06/26/2013 04:24 PM, Shuah Khan wrote:
> On 06/26/2013 04:12 PM, Winkler, Tomas wrote:
>>
>>

>> 42f132f mei: me: clear interrupts on the resume path
>> 2753ff5 mei: nfc: fix nfc device freeing
>> 5e85b36 mei: init: Flush scheduled work before resetting the device
>>
>> Are you sure you have these 3 in?
>>
>
> Checked the git log and yes I have all three commits. It appears this
> problem is intermittent and hard to reproduce at least on 3.10-rc7. I
> tried several times yesterday to capture the log and couldn't reproduce.
>
> -- Shuah

Tomas,

I saw the mei_me problem again, but couldn't save the logs. I am getting
into the habit of saving dmesg as soon as the system resumes, to catch the
buffer before mei gets into this state. There is another difference in the
suspend sequence between 3.9.8 and 3.10-rc6/-rc7.
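Saving the log right after resume can also be done from a small program. A minimal sketch, assuming a kernel that provides the `/dev/kmsg` interface (3.5 and later) and permission to read it: each `read()` returns one record, `EAGAIN` marks the end of the buffer in non-blocking mode, and `EPIPE` reports that records were overwritten before they could be read. The names `kmsg_parse()`, `kmsg_format()` and `kmsg_dump()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One /dev/kmsg record: "<prio>,<seq>,<usec>,<flags>[,...];<message>\n",
 * possibly followed by " KEY=value" continuation lines (ignored here). */
struct kmsg_rec {
	unsigned prio;            /* syslog facility/level value */
	unsigned long long seq;   /* sequence number; gaps mean lost records */
	unsigned long long usec;  /* timestamp, microseconds since boot */
	const char *msg;          /* points into the caller's buffer */
	size_t msg_len;           /* message length, excluding the newline */
};

/* Parse one record in place; returns 0 on success, -1 if malformed. */
static int kmsg_parse(const char *buf, struct kmsg_rec *out)
{
	int used = 0;
	const char *semi;

	assert(buf != NULL && out != NULL);
	if (sscanf(buf, "%u,%llu,%llu,%n", &out->prio, &out->seq,
		   &out->usec, &used) != 3 || used == 0)
		return -1;
	semi = strchr(buf + used, ';');
	if (semi == NULL)
		return -1;
	out->msg = semi + 1;
	out->msg_len = strcspn(out->msg, "\n");
	return 0;
}

/* Format dmesg-style, e.g. "[   78.733062] message\n". */
static int kmsg_format(char *dst, size_t n, const struct kmsg_rec *r)
{
	return snprintf(dst, n, "[%5llu.%06llu] %.*s\n",
			r->usec / 1000000ULL, r->usec % 1000000ULL,
			(int)r->msg_len, r->msg);
}

/* Copy every record currently in the kernel log to 'out' without
 * blocking. Needs read access to /dev/kmsg (typically root). */
static int kmsg_dump(FILE *out)
{
	char buf[8192], line[8192];
	struct kmsg_rec r;
	int fd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);

	if (fd < 0)
		return -1;
	for (;;) {
		ssize_t n = read(fd, buf, sizeof buf - 1);

		if (n == 0)
			break;
		if (n < 0) {
			if (errno == EAGAIN)
				break;      /* caught up with the newest record */
			if (errno == EPIPE)
				continue;   /* records were overwritten; skip ahead */
			close(fd);
			return -1;
		}
		buf[n] = '\0';
		if (kmsg_parse(buf, &r) == 0 && kmsg_format(line, sizeof line, &r) >= 0)
			fputs(line, out);
	}
	close(fd);
	return 0;
}
```

Calling `kmsg_dump()` with an output file immediately after resume would capture the buffer before a flood like the mei_me one rolls it over; the `EPIPE` branch is where such a loss would be detected.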

When I do echo disk > state,

The screen clears, but instead of going into console mode as it does on
3.9.8, it switches back to graphics mode and shows the screen exactly as
it was right after the echo disk > state command was issued. It stays in
that state for a good 60 seconds or more, and then I see the suspend
complete.

I can start a bisect of this problem scoped to intel-display if you would
like me to. Please let me know if the bisect scope should be larger.

-- Shuah

Winkler, Tomas July 4, 2013, 7:26 p.m. UTC | #9

On Mon, Jul 1, 2013 at 5:54 PM, Shuah Khan <shuah.kh@samsung.com> wrote:
> On 06/26/2013 04:24 PM, Shuah Khan wrote:
>> On 06/26/2013 04:12 PM, Winkler, Tomas wrote:
>>>
>>>
>
>>> 42f132f mei: me: clear interrupts on the resume path
>>> 2753ff5 mei: nfc: fix nfc device freeing
>>> 5e85b36 mei: init: Flush scheduled work before resetting the device
>>>
>>> Are you sure you have these 3 in?
>>>
>>
>> Checked the git log and yes I have all three commits. It appears this
>> problem is intermittent and hard to reproduce at least on 3.10-rc7. I
>> tried several times yesterday to capture the log and couldn't reproduce.
>>
>> -- Shuah
>
> Tomas,
>
> I saw the mei_me problem again, however couldn't save the logs. I am
> getting into the habit of saving dmesg as soon as system gets resumed to
> catch the dmesg buffer prior to mei getting into this state. There is
> another difference in suspend sequence between 3.9.8 and 3.10-rc6 and rc-7.
>
> When I do echo disk > state,
>
> Screen clears and instead of going into console mode like it does on
> 3.9.8, it will get back into graphics mode and show the screen exactly
> the way it was right after echo disk > state command was issued. It
> stays in that state for good 60 seconds or more and then I see the
> suspend complete.
>
> I can start bi-sect of this problem on intel-display scope if you would
> like me to. Please let me know if the bisect scope should be larger.
>
> -- Shuah

I finally got an older system where this reproduces consistently, and I'm
trying to root-cause it now.
As soon as I have something to test, I will send it out.

Thanks
Tomas
