
mysterious crashes on OMAP5 uevm

Message ID CANOLnOODjTaBcL1QzAm7o4YOB=_P-s7JYovu6fhNSqJSV2Bq+Q@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Grazvydas Ignotas Sept. 8, 2015, 8:41 p.m. UTC
On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>> Hi,
>>
>> this is a longstanding problem I'm seeing since the very beginning,
>> which was around 3.12 or so (when I've first got the hardware) and it
>> seems 4.2 is affected by it still. Basically what happens is Xorg
>> randomly segfaults at some "impossible" location. I don't have the
>> details at the moment (could get them if needed), but from what I
>> examined with gdb some time ago the situation did not make any sense.
>>
>> There are 2 workarounds that I know which make the problem go away
>> (one is enough):
>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
>> - disable ARCH_MULTI_V6 in the kernel config
>>
>> Because of the above workarounds I have forgotten about it several
>> times, but it regularly comes back and bites again. It would look like
>> some missing erratum workaround, but I have all of them enabled in the
>> kernel.
>>
>> Does anyone know about this? Perhaps some missing erratum workaround
>> in the bootloader? u-boot isn't too old here (2015.07).
>
> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
> places ignoring uncompress and davinci code.

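ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
disabled, it is enough to just do this:

--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
                 /*
                  * The LSB of the handler determines if we're going to
                  * be using THUMB or ARM mode for this signal handler.
                  */
                 thumb = handler & 1;
 
-#if __LINUX_ARM_ARCH__ >= 7
+#if 0 //__LINUX_ARM_ARCH__ >= 7
                 /*
                  * Clear the If-Then Thumb-2 execution state
                  * ARM spec requires this to be all 000s in ARM mode
                  * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
                  * signal transition without this.
                  */

... and the problem appears, so I guess this needs some real
multiplatform handling.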

> Do you have some easy way to reproduce this issue?

Just moving a browser window around with mouse usually triggers it
within a minute.

>
> Regards,
>
> Tony

Gražvydas

Comments

Tony Lindgren Sept. 8, 2015, 9:07 p.m. UTC | #1
* Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> > * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
> >> Hi,
> >>
> >> this is a longstanding problem I'm seeing since the very beginning,
> >> which was around 3.12 or so (when I've first got the hardware) and it
> >> seems 4.2 is affected by it still. Basically what happens is Xorg
> >> randomly segfaults at some "impossible" location. I don't have the
> >> details at the moment (could get them if needed), but from what I
> >> examined with gdb some time ago the situation did not make any sense.
> >>
> >> There are 2 workarounds that I know which make the problem go away
> >> (one is enough):
> >> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
> >> - disable ARCH_MULTI_V6 in the kernel config
> >>
> >> Because of the above workarounds I have forgotten about it several
> >> times, but it regularly comes back and bites again. It would look like
> >> some missing erratum workaround, but I have all of them enabled in the
> >> kernel.
> >>
> >> Does anyone know about this? Perhaps some missing erratum workaround
> >> in the bootloader? u-boot isn't too old here (2015.07).
> >
> > Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
> > Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
> > __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
> > places ignoring uncompress and davinci code.
> 
> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
> disabled, it is enough to just do this:
> 
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>                 /*
>                  * The LSB of the handler determines if we're going to
>                  * be using THUMB or ARM mode for this signal handler.
>                  */
>                 thumb = handler & 1;
> 
> -#if __LINUX_ARM_ARCH__ >= 7
> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>                 /*
>                  * Clear the If-Then Thumb-2 execution state
>                  * ARM spec requires this to be all 000s in ARM mode
>                  * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>                  * signal transition without this.
>                  */
> 
> ... and the problem appears, so I guess this needs some real
> multiplatform handling.

OK nice to hear you found it. Yeah looks like some runtime
capability check is needed.
 
> > Do you have some easy way to reproduce this issue?
> 
> Just moving a browser window around with mouse usually triggers it
> within a minute.

OK good to know.

Regards,

Tony
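
For illustration, the compile-time test in the diff above could be replaced by a runtime decision along the following lines. This is only a user-space sketch under stated assumptions, not the kernel's code: handler_cpsr() and its cpu_has_thumb2 parameter are hypothetical names standing in for whatever capability check the kernel would actually use. The two bit masks match the kernel's ptrace.h definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPSR bit definitions, as in arch/arm/include/uapi/asm/ptrace.h */
#define PSR_T_BIT   0x00000020u  /* Thumb state */
#define PSR_IT_MASK 0x0600fc00u  /* If-Then execution state */

/*
 * Hypothetical helper: compute the CPSR a signal handler starts with.
 * cpu_has_thumb2 is an assumption of this sketch; it models a check made
 * at run time instead of "#if __LINUX_ARM_ARCH__ >= 7".
 */
static uint32_t handler_cpsr(uint32_t cpsr, uintptr_t handler,
                             bool cpu_has_thumb2)
{
    /* The LSB of the handler address selects Thumb or ARM mode. */
    bool thumb = (handler & 1) != 0;

    /* Drop any IT state left over from the interrupted code. */
    if (cpu_has_thumb2)
        cpsr &= ~PSR_IT_MASK;

    if (thumb)
        cpsr |= PSR_T_BIT;
    else
        cpsr &= ~PSR_T_BIT;

    return cpsr;
}
```

With an interrupted CPSR of 0x0600fc30 (all IT bits set, Thumb state), the sketch returns 0x00000030 for a Thumb handler when the check is true and leaves the stale IT bits in place when it is false, which is the situation the "#if 0" experiment creates.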

H. Nikolaus Schaller Sept. 10, 2015, 6:42 a.m. UTC | #2
Am 08.09.2015 um 23:07 schrieb Tony Lindgren <tony@atomide.com>:

> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
>> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>>>> Hi,
>>>> 
>>>> this is a longstanding problem I'm seeing since the very beginning,
>>>> which was around 3.12 or so (when I've first got the hardware) and it
>>>> seems 4.2 is affected by it still. Basically what happens is Xorg
>>>> randomly segfaults at some "impossible" location. I don't have the
>>>> details at the moment (could get them if needed), but from what I
>>>> examined with gdb some time ago the situation did not make any sense.
>>>> 
>>>> There are 2 workarounds that I know which make the problem go away
>>>> (one is enough):
>>>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
>>>> - disable ARCH_MULTI_V6 in the kernel config
>>>> 
>>>> Because of the above workarounds I have forgotten about it several
>>>> times, but it regularly comes back and bites again. It would look like
>>>> some missing erratum workaround, but I have all of them enabled in the
>>>> kernel.
>>>> 
>>>> Does anyone know about this? Perhaps some missing erratum workaround
>>>> in the bootloader? u-boot isn't too old here (2015.07).
>>> 
>>> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
>>> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
>>> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
>>> places ignoring uncompress and davinci code.
>> 
>> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
>> disabled, it is enough to just do this:
>> 
>> --- a/arch/arm/kernel/signal.c
>> +++ b/arch/arm/kernel/signal.c
>> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>>                /*
>>                 * The LSB of the handler determines if we're going to
>>                 * be using THUMB or ARM mode for this signal handler.
>>                 */
>>                thumb = handler & 1;
>> 
>> -#if __LINUX_ARM_ARCH__ >= 7
>> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>>                /*
>>                 * Clear the If-Then Thumb-2 execution state
>>                 * ARM spec requires this to be all 000s in ARM mode
>>                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>>                 * signal transition without this.
>>                 */
>> 
>> ... and the problem appears, so I guess this needs some real
>> multiplatform handling.
> 
> OK nice to hear you found it. Yeah looks like some runtime
> capability check is needed.
> 
>>> Do you have some easy way to reproduce this issue?
>> 
>> Just moving a browser window around with mouse usually triggers it
>> within a minute.
> 
> OK good to know.

It looks as if this is the solution for the same symptom on our OMAP3 board (gta04).
There, it suffices to draw on the touch screen for ~10 seconds to make the xserver segfault.

[we are using the binary xserver from debian wheezy
ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]

We have known about this bug for a while, but so far we thought that some touch screen
event bit had changed and that we had to fix our touch screen driver.

Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>> #if 0 //__LINUX_ARM_ARCH__ >= 7
makes it re-appear.

A while ago I tried to debug running the x-server under strace and could find that it also has
something to do with SIGALRM.

And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c

BR,
Nikolaus

Russell King - ARM Linux Sept. 10, 2015, 8:30 a.m. UTC | #3
On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
> 
> Am 08.09.2015 um 23:07 schrieb Tony Lindgren <tony@atomide.com>:
> 
> > * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> >> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> >>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
> >>>> Hi,
> >>>> 
> >>>> this is a longstanding problem I'm seeing since the very beginning,
> >>>> which was around 3.12 or so (when I've first got the hardware) and it
> >>>> seems 4.2 is affected by it still. Basically what happens is Xorg
> >>>> randomly segfaults at some "impossible" location. I don't have the
> >>>> details at the moment (could get them if needed), but from what I
> >>>> examined with gdb some time ago the situation did not make any sense.
> >>>> 
> >>>> There are 2 workarounds that I know which make the problem go away
> >>>> (one is enough):
> >>>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
> >>>> - disable ARCH_MULTI_V6 in the kernel config
> >>>> 
> >>>> Because of the above workarounds I have forgotten about it several
> >>>> times, but it regularly comes back and bites again. It would look like
> >>>> some missing erratum workaround, but I have all of them enabled in the
> >>>> kernel.
> >>>> 
> >>>> Does anyone know about this? Perhaps some missing erratum workaround
> >>>> in the bootloader? u-boot isn't too old here (2015.07).
> >>> 
> >>> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
> >>> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
> >>> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
> >>> places ignoring uncompress and davinci code.
> >> 
> >> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
> >> disabled, it is enough to just do this:
> >> 
> >> --- a/arch/arm/kernel/signal.c
> >> +++ b/arch/arm/kernel/signal.c
> >> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
> >>                /*
> >>                 * The LSB of the handler determines if we're going to
> >>                 * be using THUMB or ARM mode for this signal handler.
> >>                 */
> >>                thumb = handler & 1;
> >> 
> >> -#if __LINUX_ARM_ARCH__ >= 7
> >> +#if 0 //__LINUX_ARM_ARCH__ >= 7
> >>                /*
> >>                 * Clear the If-Then Thumb-2 execution state
> >>                 * ARM spec requires this to be all 000s in ARM mode
> >>                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
> >>                 * signal transition without this.
> >>                 */
> >> 
> >> ... and the problem appears, so I guess this needs some real
> >> multiplatform handling.
> > 
> > OK nice to hear you found it. Yeah looks like some runtime
> > capability check is needed.
> > 
> >>> Do you have some easy way to reproduce this issue?
> >> 
> >> Just moving a browser window around with mouse usually triggers it
> >> within a minute.
> > 
> > OK good to know.
> 
> It looks as if this is the solution for the same symptom on our OMAP3 board (gta04).
> There, it suffices to draw on the touch screen for ~10 seconds to make the xserver segfault.
> 
> [we are using the binary xserver from debian wheezy
> ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]
> 
> We have known about this bug for a while, but so far we thought that some touch screen
> event bit had changed and that we had to fix our touch screen driver.
> 
> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
> makes it re-appear.
> 
> A while ago I tried to debug running the x-server under strace and could find that it also has
> something to do with SIGALRM.
> 
> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c

It would be really nice if someone could diagnose what's going on here.
What exception is causing the X server to be killed (someone said a
segfault)?  What is the register state at the point that happens?  What
does the code look like?  Is it happening inside the SIGALRM handler, or
when the SIGALRM handler has returned?

I'd suggest attaching gdb to the X server, but remember to set gdb to
ignore SIGPIPEs.

H. Nikolaus Schaller Sept. 10, 2015, 8:57 a.m. UTC | #4
Am 10.09.2015 um 10:30 schrieb Russell King - ARM Linux <linux@arm.linux.org.uk>:

> On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
>> 
>> Am 08.09.2015 um 23:07 schrieb Tony Lindgren <tony@atomide.com>:
>> 
>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
>>>> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>>>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>>>>>> Hi,
>>>>>> 
>>>>>> this is a longstanding problem I'm seeing since the very beginning,
>>>>>> which was around 3.12 or so (when I've first got the hardware) and it
>>>>>> seems 4.2 is affected by it still. Basically what happens is Xorg
>>>>>> randomly segfaults at some "impossible" location. I don't have the
>>>>>> details at the moment (could get them if needed), but from what I
>>>>>> examined with gdb some time ago the situation did not make any sense.
>>>>>> 
>>>>>> There are 2 workarounds that I know which make the problem go away
>>>>>> (one is enough):
>>>>>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
>>>>>> - disable ARCH_MULTI_V6 in the kernel config
>>>>>> 
>>>>>> Because of the above workarounds I have forgotten about it several
>>>>>> times, but it regularly comes back and bites again. It would look like
>>>>>> some missing erratum workaround, but I have all of them enabled in the
>>>>>> kernel.
>>>>>> 
>>>>>> Does anyone know about this? Perhaps some missing erratum workaround
>>>>>> in the bootloader? u-boot isn't too old here (2015.07).
>>>>> 
>>>>> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
>>>>> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
>>>>> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
>>>>> places ignoring uncompress and davinci code.
>>>> 
>>>> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
>>>> disabled, it is enough to just do this:
>>>> 
>>>> --- a/arch/arm/kernel/signal.c
>>>> +++ b/arch/arm/kernel/signal.c
>>>> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>>>>               /*
>>>>                * The LSB of the handler determines if we're going to
>>>>                * be using THUMB or ARM mode for this signal handler.
>>>>                */
>>>>               thumb = handler & 1;
>>>> 
>>>> -#if __LINUX_ARM_ARCH__ >= 7
>>>> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>>>>               /*
>>>>                * Clear the If-Then Thumb-2 execution state
>>>>                * ARM spec requires this to be all 000s in ARM mode
>>>>                * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>>>>                * signal transition without this.
>>>>                */
>>>> 
>>>> ... and the problem appears, so I guess this needs some real
>>>> multiplatform handling.
>>> 
>>> OK nice to hear you found it. Yeah looks like some runtime
>>> capability check is needed.
>>> 
>>>>> Do you have some easy way to reproduce this issue?
>>>> 
>>>> Just moving a browser window around with mouse usually triggers it
>>>> within a minute.
>>> 
>>> OK good to know.
>> 
>> It looks as if this is the solution for the same symptom on our OMAP3 board (gta04).
>> There, it suffices to draw on the touch screen for ~10 seconds to make the xserver segfault.
>> 
>> [we are using the binary xserver from debian wheezy
>> ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]
>> 
>> We have known about this bug for a while, but so far we thought that some touch screen
>> event bit had changed and that we had to fix our touch screen driver.
>> 
>> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>>>> #if 0 //__LINUX_ARM_ARCH__ >= 7
>> makes it re-appear.
>> 
>> A while ago I tried to debug running the x-server under strace and could find that it also has
>> something to do with SIGALRM.
>> 
>> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> 
> It would be really nice if someone could diagnose what's going on here.
> What exception is causing the X server to be killed (someone said a
> segfault)?  What is the register state at the point that happens?  What
> does the code look like?  Is it happening inside the SIGALRM handler, or
> when the SIGALRM handler has returned?
> 
> I'd suggest attaching gdb to the X server, but remember to set gdb to
> ignore SIGPIPEs.

I don’t have a setup to run gdb (with source) on the device and really zero
experience with Xserver sources. But maybe Grazvydas can do that better
than me.

Attached is some strace I had recorded during my earlier experiments.
X-Server appears to make heavy use not only of SIGALRM but also of SIGIO.

And it looks as if a SEGFAULT occurs inside the SIGIO handler after
having done 3 syscalls (select, read, clock_gettime) but before the
sigreturn. At least in this example.

Xserver then does a graceful shutdown after SEGFAULT. I.e. it prints the
segfault message by itself.

Hope this is a useful piece to solve the puzzle and helps a little.

BR,
Nikolaus

…
--- SIGALRM (Alarm clock) @ 0 (0) ---
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T^\351\n\0\3\0\0\0:\4\0\0;\230\353T^\351\n\0\3\0\1\0=\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 494831541}) = 0
sigreturn()                             = ? (mask now [ILL ABRT KILL USR1 SEGV PIPE TERM STKFLT CHLD STOP TSTP TTIN XFSZ VTALRM PROF IO PWR RTMIN])
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 499042967}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 500050047}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 501911619}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tw\20\v\0\3\0\0\0h\4\0\0;\230\353Tw\20\v\0\3\0\1\0\256\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 504536131}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
clock_gettime(CLOCK_MONOTONIC, {7330, 506275633}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 506855467}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 507587889}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508442381}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508961180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509418943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509998777}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 511860350}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TT7\v\0\3\0\0\0\242\4\0\0;\230\353TT7\v\0\3\0\1\0\367\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 514484861}) = 0
sigreturn()                             = ? (mask now [])
clock_gettime(CLOCK_MONOTONIC, {7330, 516224363}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 516743162}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517200926}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517719725}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 518452147}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519367674}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519947508}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tn^\v\0\3\0\0\0\370\4\0\0;\230\353Tn^\v\0\3\0\1\0y\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 525074461}) = 0
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 528400877}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 529377440}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 530018309}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 531910399}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\246\205\v\0\3\0\0\0V\5\0\0;\230\353T\246\205\v\0\3\0\1\0\336\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 534534910}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
writev(20, [{"\6\0T\3\256\332o\0\345\0\0\0\3\0\0\1\0\0\0\0h\0\377\0h\0\377\0\0\1\1\0"..., 224}], 1) = 224
clock_gettime(CLOCK_MONOTONIC, {7330, 542164305}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TX\255\v\0\3\0\0\0\317\5\0\0;\230\353TX\255\v\0\3\0\1\0T\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 546253660}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
read(20, "5\20\4\0\236\0\0\1\3\0\0\1\33\1\257\0\224\4\6\0\237\0\0\1\236\0\0\1)\0\0\0"..., 4096) = 1088
clock_gettime(CLOCK_MONOTONIC, {7330, 548756102}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 549366453}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [HUP QUIT ILL])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\273\323\v\0\3\0\0\0K\6\0\0;\230\353T\273\323\v\0\3\0\1\0\314\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 554707029}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 558155516}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 559132078}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 560749510}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\325\372\v\0\3\0\0\0\326\6\0\0;\230\353T\325\372\v\0\3\0\1\0:\n\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 564564207}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 565968016}) = 0
write(0, "[  7330.565] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "Backtrace:\n", 11Backtrace:
)            = 11
clock_gettime(CLOCK_MONOTONIC, {7330, 568195799}) = 0
write(0, "[  7330.568] ", 13)           = 13
write(0, "Backtrace:\n", 11)            = 11
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 571125486}) = 0
write(0, "[  7330.571] ", 13)           = 13
write(0, "\n", 1)                       = 1
futex(0xb6c587d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "Segmentation fault at address (n"..., 36Segmentation fault at address (nil)
) = 36
clock_gettime(CLOCK_MONOTONIC, {7330, 575092772}) = 0
write(0, "[  7330.575] ", 13)           = 13
write(0, "Segmentation fault at address (n"..., 36) = 36
write(2, "\nFatal server error:\n", 21
Fatal server error:
) = 21
clock_gettime(CLOCK_MONOTONIC, {7330, 577412108}) = 0
write(0, "[  7330.577] ", 13)           = 13
write(0, "\nFatal server error:\n", 21) = 21
write(2, "Caught signal 11 (Segmentation f"..., 55Caught signal 11 (Segmentation fault). Server aborting
) = 55
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [ABRT BUS FPE USR1 SEGV USR2 ALRM STKFLT CHLD CONT TTIN TTOU URG XCPU VTALRM PROF WINCH IO PWR RTMIN])
clock_gettime(CLOCK_MONOTONIC, {7330, 582752684}) = 0
write(0, "[  7330.582] ", 13)           = 13
write(0, "Caught signal 11 (Segmentation f"..., 55) = 55
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 585041502}) = 0
write(0, "[  7330.585] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "\nPlease consult the The X.Org Fo"..., 85
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
for help. 
) = 85
clock_gettime(CLOCK_MONOTONIC, {7330, 587208250}) = 0
write(0, "[  7330.587] ", 13)           = 13
write(0, "\nPlease consult the The X.Org Fo"..., 85) = 85
write(2, "Please also check the log file a"..., 84Please also check the log file at "/var/log/Xorg.0.log" for additional information.
) = 84
clock_gettime(CLOCK_MONOTONIC, {7330, 589466551}) = 0
write(0, "[  7330.589] ", 13)           = 13
write(0, "Please also check the log file a"..., 84) = 84
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 593525389}) = 0
write(0, "[  7330.593] ", 13)           = 13
write(0, "\n", 1)                       = 1
close(1)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
unlink("/tmp/.X11-unix/X0")             = 0
unlink("/tmp/.X0-lock")                 = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 599567869}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 601948240}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 603168943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 604145506}) = 0
fcntl64(9, F_GETFL)                     = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
fcntl64(9, F_GETFD)                     = 0
close(9)                                = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 606983641}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 608509520}) = 0
write(0, "[  7330.608] ", 13)           = 13
write(0, "(II) evdev: Touchscreen: Close\n", 31) = 31
clock_gettime(CLOCK_MONOTONIC, {7330, 610798338}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 611408690}) = 0
write(0, "[  7330.611] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
clock_gettime(CLOCK_MONOTONIC, {7330, 613361815}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 614368895}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615009764}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615986326}) = 0
fcntl64(10, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(10, F_GETFD)                    = 0
close(10)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 618336180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 619007567}) = 0
write(0, "[  7330.619] ", 13)           = 13
write(0, "(II) evdev: Power Button: Close\n", 32) = 32
clock_gettime(CLOCK_MONOTONIC, {7330, 621601561}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 622181395}) = 0
write(0, "[  7330.622] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
fcntl64(11, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(11, F_GETFD)                    = 0
rt_sigaction(SIGIO, {SIG_IGN, [IO], 0x4000000 /* SA_??? */}, {0xb6f0d63d, [IO], 0x4000000 /* SA_??? */}, 8) = 0
close(11)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 626606443}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 627308348}) = 0
write(0, "[  7330.627] ", 13)           = 13
write(0, "(II) evdev: AUX Button: Close\n", 30) = 30
clock_gettime(CLOCK_MONOTONIC, {7330, 629261473}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 629810789}) = 0
write(0, "[  7330.629] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
rt_sigprocmask(SIG_SETMASK, [SEGV IO], NULL, 8) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 634663084}) = 0
write(0, "[  7330.634] ", 13)           = 13
write(0, "(NI) OMAPFBLeaveVT\n", 19)    = 19
ioctl(7, KDSETMODE, 0)                  = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
ioctl(7, KDSKBMODE, 0x3)                = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
ioctl(7, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, VIDIOC_RESERVED or VT_GETMODE, 0xbef3b348) = 0
ioctl(7, VIDIOC_ENUM_FMT or VT_SETMODE, 0xbef3b348) = 0
ioctl(7, VT_ACTIVATE, 0x1)              = 0
ioctl(7, VT_WAITACTIVE, 0x1)            = 0
close(7)                                = 0
write(2, "Server terminated with error (1)"..., 52Server terminated with error (1). Closing log file.
) = 52
clock_gettime(CLOCK_MONOTONIC, {7330, 655903318}) = 0
write(0, "[  7330.655] ", 13)           = 13
write(0, "Server terminated with error (1)"..., 52) = 52
close(0)                                = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(4586, 4586, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
root@gta04:~#
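
The signal pattern in this trace (a 20 ms ITIMER_REAL whose SIGALRM handler interrupts ordinary user-space work) can be mimicked with a small test program. What follows is a minimal sketch, assuming a POSIX system; count_alarms() and on_alarm() are hypothetical names, and the busy loop is only a stand-in for the rendering code the X server was running. It only illustrates the timer and handler setup. Whether it hits the crash would presumably depend on it being built as Thumb-2 code and on the signal arriving inside an IT block.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t alarms_seen;

static void on_alarm(int signo)
{
    (void)signo;
    alarms_seen++;
}

/*
 * Arm a 20 ms ITIMER_REAL (the interval seen in the trace), spin until
 * `wanted` SIGALRMs have been delivered, then disarm the timer.
 * Returns the number of signals counted, or -1 if setup fails.
 */
static int count_alarms(int wanted)
{
    struct sigaction sa;
    struct itimerval tick = {
        .it_interval = { .tv_sec = 0, .tv_usec = 20000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 20000 },
    };
    struct itimerval off;
    volatile unsigned long spin = 0;

    memset(&off, 0, sizeof(off));
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    alarms_seen = 0;
    if (setitimer(ITIMER_REAL, &tick, NULL) != 0)
        return -1;

    while (alarms_seen < wanted)
        spin++;     /* user-space work for the signal to interrupt */

    setitimer(ITIMER_REAL, &off, NULL);
    return (int)alarms_seen;
}
```

Calling count_alarms(3) from main() should return a value of at least 3 after roughly 60 ms.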

Woodruff, Richard Sept. 10, 2015, 11:33 p.m. UTC | #5
> From: linux-arm-kernel [mailto:linux-arm-kernel-
> bounces@lists.infradead.org] On Behalf Of Russell King - ARM Linux
 
> > >>>> There are 2 workarounds that I know which make the problem go
> > >>>> away (one is enough):
> > >>>> - recompile Xorg with -marm (I'm using Debian armhf so it's
> > >>>> thumb2 by default)
> > >>>> - disable ARCH_MULTI_V6 in the kernel config

This reminds me of a customer crash I saw quite a while ago relating to thumb2.  I thought it was fixed but maybe not.

In a couple spots the PSR_IT_MASK was not conditionally handled well in ARCH_MULTI_V6 flow.  Some stack sanity check failed and a BUG() was triggered.

Compiling the app for v6 or pulling MULTI from the kernel build solved the issue.

Additionally it was not handled correctly in GDB.   The old build of GDB didn't do MULTI and needed a hack to be useable on thumb2 code.

Regards,
Richard W.

Grazvydas Ignotas Sept. 11, 2015, 1:27 p.m. UTC | #6
On Thu, Sep 10, 2015 at 10:30 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
>> ...
>>
>> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
>> makes it re-appear.
>>
>> A while ago I tried to debug running the x-server under strace and could find that it also has
>> something to do with SIGALRM.
>>
>> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
>
> It would be really nice if someone could diagnose what's going on here.
> What exception is causing the X server to be killed (someone said a
> segfault)?  What is the register state at the point that happens?  What
> does the code look like?  Is it happening inside the SIGALRM handler, or
> when the SIGALRM handler has returned?
>
> I'd suggest attaching gdb to the X server, but remember to set gdb to
> ignore SIGPIPEs.

It's actually pretty random, see some debug sessions in [1].
The first one is the most useful one, but I haven't thought of checking
what pixman_rasterize_edges() was doing when the signal arrived, and
most often the "less useful" segfaults occur. However from the
disassembly (see debug1_libpixman.gz) it can be seen that the signal
arrived right after IT.

[1] http://notaz.gp2x.de/tmp/thumb_segfault/

Gražvydas
Russell King - ARM Linux Sept. 11, 2015, 2:03 p.m. UTC | #7
On Fri, Sep 11, 2015 at 03:27:13PM +0200, Grazvydas Ignotas wrote:
> On Thu, Sep 10, 2015 at 10:30 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
> >> ...
> >>
> >> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
> >> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
> >> makes it re-appear.
> >>
> >> A while ago I tried to debug running the x-server under strace and could find that it also has
> >> something to do with SIGALRM.
> >>
> >> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> >
> > It would be really nice if someone could diagnose what's going on here.
> > What exception is causing the X server to be killed (someone said a
> > segfault)?  What is the register state at the point that happens?  What
> > does the code look like?  Is it happening inside the SIGALRM handler, or
> > when the SIGALRM handler has returned?
> >
> > I'd suggest attaching gdb to the X server, but remember to set gdb to
> > ignore SIGPIPEs.
> 
> It's actually pretty random, see some debug sessions in [1].
> The first one is the most useful one, but I haven't thought of checking
> what pixman_rasterize_edges() was doing when the signal arrived, and
> most often the "less useful" segfaults occur. However from the
> disassembly (see debug1_libpixman.gz) it can be seen that the signal
> arrived right after IT.
> 
> [1] http://notaz.gp2x.de/tmp/thumb_segfault/

We're not going from ARM -> Thumb or Thumb -> ARM here, but Thumb code
in libpixman is being interrupted calling a Thumb signal handler.

Working through the code:

   0x7f717ec8 <SmartScheduleTimer>:     ldr     r2, [pc, #20]   ; = 0x0004112e
   0x7f717eca <SmartScheduleTimer+2>:   ldr     r1, [pc, #24]   ; = 0x00000c48
   0x7f717ecc <SmartScheduleTimer+4>:   ldr     r3, [pc, #24]   ; = 0x00000e6c
   0x7f717ece <SmartScheduleTimer+6>:   add     r2, pc
   0x7f717ed0 <SmartScheduleTimer+8>:   ldr     r1, [r2, r1]
   0x7f717ed2 <SmartScheduleTimer+10>:  ldr     r3, [r2, r3]
=> 0x7f717ed4 <SmartScheduleTimer+12>:  ldr     r2, [r1, #0]

The instruction at 0x7f717ed4 was trying to access 0xd1242963 which
is in kernel space, and this is the faulting instruction.

At this point, r2 should contain 0x0004112e plus the PC value.  r2 in
the register dump was 0x7f717fa0.  Let's calculate the value that PC
should be here.  0x7f717fa0 - 0x0004112e = 0x7f6d6e72, which is
clearly wrong.
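
The arithmetic above can be checked with a small C sketch.  The constants
are taken from the disassembly and register dump; the function names are
hypothetical, and the rule that a Thumb-state read of PC yields the
instruction address plus 4 is the assumption used for the expected value.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the disassembly and register dump above. */
#define LITERAL      UINT32_C(0x0004112e)  /* loaded by "ldr r2, [pc, #20]" */
#define ADD_INSN     UINT32_C(0x7f717ece)  /* address of "add r2, pc" */
#define OBSERVED_R2  UINT32_C(0x7f717fa0)  /* r2 in the register dump */

/* Assumption: in Thumb state, reading PC gives the instruction address + 4. */
uint32_t thumb_pc_read(uint32_t insn_addr)
{
	return insn_addr + 4;
}

/* PC that "add r2, pc" would have had to see to produce the observed r2. */
uint32_t implied_pc(void)
{
	return OBSERVED_R2 - LITERAL;
}

/* r2 expected if both the first ldr and the add had really executed. */
uint32_t expected_r2(void)
{
	return LITERAL + thumb_pc_read(ADD_INSN);
}
```

Under that assumption implied_pc() is nowhere near SmartScheduleTimer, and
expected_r2() differs from the r2 in the dump, which is consistent with the
first ldr not having been executed.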

So, I don't think the first instruction here was executed by the CPU.

gdb indicates that the parent context to the signal frame, pc was at
0xb6dd87f8, which works out at 0x297f8 into the libpixman-1 library:

   297f0:       449c            add     ip, r3
   297f2:       f1bc 0fff       cmp.w   ip, #255        ; 0xff
   297f6:       bfd4            ite     le
   297f8:       fa5f fc8c       uxtble.w        ip, ip
   297fc:       f04f 0cff       movgt.w ip, #255        ; 0xff
   29800:       f88a c000       strb.w  ip, [sl]

and as you say, is just after an IT instruction, which would have
set the IT execution state to appropriately skip either the first or
the second instruction.

Unfortunately, the IT instruction's condition is being carried forward
to the signal handler, causing either the first or second instruction
there to be skipped.
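
As a rough illustration of how a stale IT state behaves, here is a
user-space C sketch.  The PSR_* values are, to my knowledge, the ones in
the kernel's ARM ptrace header, and the bit layout (ITSTATE[7:2] in
CPSR[15:10], ITSTATE[1:0] in CPSR[26:25]) is as I read the ARM ARM; the
function names are hypothetical.  The "ite le" encoding bfd4 above would
leave ITSTATE at 0xd4 if the signal arrives right after it.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_T_BIT    UINT32_C(0x00000020)
#define PSR_IT_MASK  UINT32_C(0x0600fc00)

/* Collect the 8-bit ITSTATE from its two CPSR fields. */
uint8_t cpsr_to_itstate(uint32_t cpsr)
{
	return (uint8_t)((((cpsr >> 10) & 0x3f) << 2) | ((cpsr >> 25) & 0x3));
}

/* Scatter an 8-bit ITSTATE back into CPSR bit positions. */
uint32_t itstate_to_cpsr_bits(uint8_t it)
{
	return (((uint32_t)it >> 2) << 10) | (((uint32_t)it & 0x3) << 25);
}

/* Condition code applied to the next instruction while in an IT block. */
uint8_t itstate_cond(uint8_t it)
{
	return it >> 4;
}

/* The low four bits are nonzero while instructions remain in the block. */
int in_it_block(uint8_t it)
{
	return (it & 0x0f) != 0;
}

/* Advance past one instruction: shift the mask, or leave the block. */
uint8_t itstate_advance(uint8_t it)
{
	if ((it & 0x07) == 0)
		return 0;
	return (uint8_t)((it & 0xe0) | ((it << 1) & 0x1f));
}
```

Starting from 0xd4, the first instruction executed runs under LE (0xd) and
the second under GT (0xc), whichever code they belong to.  If those bits
survive into the signal handler, one of its first two instructions is
skipped depending on the flags left by the interrupted cmp.w.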

Looking back at the history, the original commit introducing the
clearing of the PSR_IT_MASK bits is just wrong:

-               if (thumb)
+               if (thumb) {
                        cpsr |= PSR_T_BIT;
-               else
+#if __LINUX_ARM_ARCH__ >= 7
+                       /* clear the If-Then Thumb-2 execution state */
+                       cpsr &= ~PSR_IT_MASK;
+#endif
+               } else
                        cpsr &= ~PSR_T_BIT;

This shouldn't be a compile-time decision at all, and it certainly should
not be dependent on __LINUX_ARM_ARCH__, which marks the _lowest_ supported
architecture.

However, even the idea that it's ARMv7 or later is wrong.  According to
the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).

Looking at the ARM ARM, these bits are "reserved" in previous non-T2
architectures, have an undefined value at reset, and are probably zero
anyway.

Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
and I doubt there's any ARMv6 non-T2 systems out there that would be
affected by clearing the IT state bits.
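
To make the effect of the threshold concrete, here is a user-space model
of the hunk quoted above, not the kernel code itself.  The min_arch
parameter stands in for __LINUX_ARM_ARCH__ (a build-time constant in the
kernel), the function name is hypothetical, and the PSR_* values are the
ones I believe the kernel's ARM ptrace header uses.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_T_BIT    UINT32_C(0x00000020)
#define PSR_IT_MASK  UINT32_C(0x0600fc00)

/*
 * Compute the CPSR a signal handler starts with.  "threshold" is the
 * value __LINUX_ARM_ARCH__ is compared against: 7 in the original
 * commit, 6 in the suggested change.
 */
uint32_t handler_cpsr(uint32_t cpsr, int thumb, int min_arch, int threshold)
{
	if (thumb) {
		cpsr |= PSR_T_BIT;
		/* clear the If-Then Thumb-2 execution state */
		if (min_arch >= threshold)
			cpsr &= ~PSR_IT_MASK;
	} else {
		cpsr &= ~PSR_T_BIT;
	}
	return cpsr;
}
```

With an interrupted CPSR of 0x0000d430 (user mode, Thumb, IT state pending)
and min_arch = 6, as in an ARCH_MULTI_V6 build, the threshold of 7 leaves
the IT bits set for the handler, while a threshold of 6 clears them.  A
v7-only build (min_arch = 7) clears them either way, which matches the
observation that disabling ARCH_MULTI_V6 hides the problem.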
Woodruff, Richard Sept. 11, 2015, 4:12 p.m. UTC | #8
> From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> owner@vger.kernel.org] On Behalf Of Russell King - ARM Linux
> Sent: Friday, September 11, 2015 9:03 AM
> To: Grazvydas Ignotas

> However, even the idea that it's ARMv7 or later is wrong.  According to
> the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
> means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).

I recall seeing ARMv6T2 first implemented in the ARM1156 which is a v6 CPU with T2 option added.

Cortex-R class was the ARMv7 successor to the 1156 CPU, which also uses T2.

> Looking at the ARM ARM, these bits are "reserved" in previous non-T2
> architectures, have an undefined value at reset, and are probably zero
> anyway.
> 
> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the
> problem,
> and I doubt there's any ARMv6 non-T2 systems out there that would be
> affected by clearing the IT state bits.

Probably you already looked, but cpsr.it usage is not restricted to this one spot.

Looking back at old notes, I think both the debug and signal handler code keyed on the bit usage.  I see from LXR that the kernel KVM code also uses it in some capacity.

The 1156/Cortex-R are typically MMU-less.   They may (or may not) raise something else to consider when fixing this.

Regards,
Richard W.
Russell King - ARM Linux Sept. 11, 2015, 5:48 p.m. UTC | #9
On Fri, Sep 11, 2015 at 04:12:21PM +0000, Woodruff, Richard wrote:
> > From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> > owner@vger.kernel.org] On Behalf Of Russell King - ARM Linux
> > Sent: Friday, September 11, 2015 9:03 AM
> > To: Grazvydas Ignotas
> 
> > However, even the idea that it's ARMv7 or later is wrong.  According to
> > the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
> > means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).
> 
> I recall seeing ARMv6T2 first implemented in the ARM1156 which is a
> v6 CPU with T2 option added.

Exactly, which is why we need to be dealing with the IT bits in signal
handling for >= ARMv6, not >= ARMv7.

> > Looking at the ARM ARM, these bits are "reserved" in previous non-T2
> > architectures, have an undefined value at reset, and are probably zero
> > anyway.
> > 
> > Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the
> > problem,
> > and I doubt there's any ARMv6 non-T2 systems out there that would be
> > affected by clearing the IT state bits.
> 
> Probably you already looked, but cpsr.it usage is not restricted to this
> one spot.

Other places:

arch/arm/mm/extable.c-#ifdef CONFIG_THUMB2_KERNEL
arch/arm/mm/extable.c-          /* Clear the IT state to avoid nasty surprises in the fixup */
arch/arm/mm/extable.c:          regs->ARM_cpsr &= ~PSR_IT_MASK;
arch/arm/mm/extable.c-#endif

which is irrelevant here.  This code only deals with kernel mode, and
the only time that this makes sense is when the kernel is built using
Thumb2 instructions.  CONFIG_THUMB2_KERNEL covers the case properly.

arch/arm/probes/kprobes/test-core.c-    regs->ARM_lr = val ^ (14 << 8);
arch/arm/probes/kprobes/test-core.c:    regs->ARM_cpsr &= ~(APSR_MASK | PSR_IT_MASK);
arch/arm/probes/kprobes/test-core.c-    regs->ARM_cpsr |= test_context_cpsr(scenario);

From what I can see, this happens unconditionally.

KVM and Xen code... that requires virtualisation support, which is ARMv7.

arch/arm/probes/kprobes/actions-thumb.c... emulating an IT instruction.
arch/arm/probes/decode.h::it_advance... emulating Thumb2.

So really there's no other places that need fixing.

> Looking back at old notes I think both debug and signal handler code
> keyed on bit usage.  I see from LXR kernel KVM code also uses in some
> capacity.

Frankly, Richard, you're getting on my nerves in this thread - you
seem to know all about this problem, yet you never reported the problem
upstream, so people are effectively having to waste time re-doing the
work that you've already done.

Nothing annoys me more than having people say "oh yes, I found that
problem and worked on it" and nothing coming of it (no report, no
patch, no nothing.)

As you have "old notes" you've already investigated this issue, and
presumably you came up with a patch.  Where is it?
Woodruff, Richard Sept. 11, 2015, 6:34 p.m. UTC | #10
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Friday, September 11, 2015 12:49 PM

> Frankly, Richard, you're getting on my nerves in this thread - you seem to
> know all about this problem, yet you never reported the problem upstream,
> so people are effectively having to waste time re-doing the work that you've
> already done.
>
> Nothing annoys me more than having people say "oh yes, I found that
> problem and worked on it" and nothing coming of it (no report, no patch, no
> nothing.)

Yes, when I put out the hint (to help speed resolution) I expected there might be some negative interpretation.

When I originally hit the issue, I did pass along information to folks who work in the area with expectation they would follow through.  Probably it got lost.

When I noticed this thread, it appeared like the CPSR.IT information didn't make it out, so I directly posted what I recalled.

> As you have "old notes" you've already investigated this issue, and
> presumably you came up with a patch.  Where is it?

I didn't generate a comprehensive one. I did a couple of hack versions but was unsure in some of the areas your analysis has cleared... for that issue I ended up advising a reversion of MULTI_V6 for that older kernel.

Regards,
Richard W.
Tony Lindgren Sept. 18, 2015, 5:48 p.m. UTC | #11
Hi Grazvydas,

* Tony Lindgren <tony@atomide.com> [150908 14:11]:
> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> > On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> OK nice to hear you found it. Yeah looks like some runtime
> capability check is needed.
>  
> > > Do you have some easy way to reproduce this issue?
> > 
> > Just moving a browser window around with mouse usually triggers it
> > within a minute.
> 
> OK good to know.

Just FYI, I was now able to reproduce it here too by moving iceweasel
around for about a minute. And I can confirm that Russell's patch
fixes the problem.

I'm using i3 tiling window manager here, and don't usually
ever have any floating windows which probably explains why I
did not run into this issue earlier with my lapdock experiments :)

Regards,

Tony

Patch

--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -340,13 +340,13 @@  setup_return(struct pt_regs *regs, struct ksignal *ksig,
                /*
                 * The LSB of the handler determines if we're going to
                 * be using THUMB or ARM mode for this signal handler.
                 */
                thumb = handler & 1;

-#if __LINUX_ARM_ARCH__ >= 7
+#if 0 //__LINUX_ARM_ARCH__ >= 7
                /*
                 * Clear the If-Then Thumb-2 execution state
                 * ARM spec requires this to be all 000s in ARM mode
                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
                 * signal transition without this.
                 */