diff mbox

[V2,3/3] mmc: mmci: Reverse IRQ handling for the arm_variant

Message ID 1402906147-26596-1-git-send-email-ulf.hansson@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Ulf Hansson June 16, 2014, 8:09 a.m. UTC
Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
when using the ARM model of the PL181 and running QEMU.

The bug was reported for the following QEMU version:
$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard

To resolve the problem, let's restore the old behavior were the DATA
irq is handled prior the CMD irq, but only for the arm_variant, which
the problem was reported for.

Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/host/mmci.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

John Stultz June 16, 2014, 6:58 p.m. UTC | #1
On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
> when using the ARM model of the PL181 and running QEMU.
>
> The bug was reported for the following QEMU version:
> $ qemu-system-arm -version
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> To resolve the problem, let's restore the old behavior were the DATA
> irq is handled prior the CMD irq, but only for the arm_variant, which
> the problem was reported for.
>
> Reported-by: John Stultz <john.stultz@linaro.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Hey Ulf,
  Thanks for sending these out (though I wasn't cc'ed on this 3'rd
patch). The first two apply for me, but this one doesn't. What should
I be basing this off of?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 16, 2014, 9:20 p.m. UTC | #2
On 16 June 2014 20:58, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
>> when using the ARM model of the PL181 and running QEMU.
>>
>> The bug was reported for the following QEMU version:
>> $ qemu-system-arm -version
>> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
>> (c) 2003-2008 Fabrice Bellard
>>
>> To resolve the problem, let's restore the old behavior were the DATA
>> irq is handled prior the CMD irq, but only for the arm_variant, which
>> the problem was reported for.
>>
>> Reported-by: John Stultz <john.stultz@linaro.org>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Hey Ulf,
>   Thanks for sending these out (though I wasn't cc'ed on this 3'rd
> patch). The first two apply for me, but this one doesn't. What should
> I be basing this off of?

Hi John,

Thanks for helping out testing!

This patch based upon my latest mmc tree and the next branch. I tried
to apply it for 3.15, and I think you will be able resolve the
conflict - I should be quite trivial.

Patch 1 and 2 applies on 3.15 as well. So I guess testing this on 3.15
will provide us with answers if it solves our problem.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 16, 2014, 10:41 p.m. UTC | #3
On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> This patch based upon my latest mmc tree and the next branch. I tried
> to apply it for 3.15, and I think you will be able resolve the
> conflict - I should be quite trivial.

No worries. I just didn't want to waste time resolving it if it was
logically dependent on some other change.

I'll give it a shot and get back to you.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 16, 2014, 11:29 p.m. UTC | #4
On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> This patch based upon my latest mmc tree and the next branch. I tried
>> to apply it for 3.15, and I think you will be able resolve the
>> conflict - I should be quite trivial.
>
> No worries. I just didn't want to waste time resolving it if it was
> logically dependent on some other change.
>
> I'll give it a shot and get back to you.

So unfortunately I'm still seeing trouble..

[   94.202843] EXT4-fs error (device mmcblk0p5):
ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
gd; block bitmap corrupt.
[   94.203873] Aborting journal on device mmcblk0p5-8.
[   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
panic forced after error
[   94.206553]
[   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
3.15.0-00002-g044f37a-dirty #589
[   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
(show_stack+0x11/0x14)
[   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
(dump_stack+0x59/0x7c)
[   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
[   94.209724] [<c042a57f>] (panic) from [<c0135055>]
(ext4_handle_error+0x69/0x74)
[   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
(__ext4_grp_locked_error+0x6b/0x160)
[   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
[<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
[   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
(ext4_mb_init_cache+0x219/0x4e0)
[   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
(ext4_mb_init_group+0xbb/0x13c)
[   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
(ext4_mb_good_group+0xf3/0xfc)
[   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
(ext4_mb_regular_allocator+0x153/0x2c4)
[   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
[<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
[   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
(ext4_ext_map_blocks+0x965/0x10f0)
[   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
(ext4_map_blocks+0xff/0x374)
[   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
[<c0127049>] (ext4_writepages+0x2b9/0x4e8)
[   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
(do_writepages+0x19/0x28)
[   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
(__filemap_fdatawrite_range+0x3d/0x44)
[   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
[<c008cc3f>] (filemap_flush+0x23/0x28)
[   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
(ext4_rename+0x2f9/0x3e4)
[   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
(vfs_rename+0x183/0x45c)
[   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
(SyS_renameat2+0x22b/0x26c)
[   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
(SyS_rename+0x1f/0x24)
[   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
(ret_fast_syscall+0x1/0x5c)


That said, this mirrors the behavior when I was reverting your change
by hand on-top of 3.15. While git bisect pointed to your patch and
reverting it from the commit seems to resolve the issue at that point,
there seems to be some other commit in the 3.14->3.15-rc1 interval
that is causing problems as well.

Are there any sort of debugging options for mmc that I can use to try
to better narrow down whats going wrong?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 17, 2014, 7:33 a.m. UTC | #5
On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> This patch based upon my latest mmc tree and the next branch. I tried
>>> to apply it for 3.15, and I think you will be able resolve the
>>> conflict - I should be quite trivial.
>>
>> No worries. I just didn't want to waste time resolving it if it was
>> logically dependent on some other change.
>>
>> I'll give it a shot and get back to you.
>
> So unfortunately I'm still seeing trouble..
>
> [   94.202843] EXT4-fs error (device mmcblk0p5):
> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
> gd; block bitmap corrupt.
> [   94.203873] Aborting journal on device mmcblk0p5-8.
> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
> panic forced after error
> [   94.206553]
> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
> 3.15.0-00002-g044f37a-dirty #589
> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
> (show_stack+0x11/0x14)
> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
> (dump_stack+0x59/0x7c)
> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
> (ext4_handle_error+0x69/0x74)
> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
> (__ext4_grp_locked_error+0x6b/0x160)
> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
> (ext4_mb_init_cache+0x219/0x4e0)
> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
> (ext4_mb_init_group+0xbb/0x13c)
> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
> (ext4_mb_good_group+0xf3/0xfc)
> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
> (ext4_mb_regular_allocator+0x153/0x2c4)
> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
> (ext4_ext_map_blocks+0x965/0x10f0)
> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
> (ext4_map_blocks+0xff/0x374)
> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
> (do_writepages+0x19/0x28)
> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
> (__filemap_fdatawrite_range+0x3d/0x44)
> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
> [<c008cc3f>] (filemap_flush+0x23/0x28)
> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
> (ext4_rename+0x2f9/0x3e4)
> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
> (vfs_rename+0x183/0x45c)
> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
> (SyS_renameat2+0x22b/0x26c)
> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
> (SyS_rename+0x1f/0x24)
> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
> (ret_fast_syscall+0x1/0x5c)
>
>
> That said, this mirrors the behavior when I was reverting your change
> by hand on-top of 3.15. While git bisect pointed to your patch and
> reverting it from the commit seems to resolve the issue at that point,
> there seems to be some other commit in the 3.14->3.15-rc1 interval
> that is causing problems as well.
>
> Are there any sort of debugging options for mmc that I can use to try
> to better narrow down whats going wrong?

It seems like you want to debug the mmci host driver and unfortunate
the debug utilities available are only dev_dbg prints. I wouldn't be
surprised if the problem goes away when you enable them. :-)

I have some other locally stored debug patches for mmci, but those are
not re-based and I am not sure you want to deal with them as is.

I guess I need to set up the QEMU environment and run the tests
myself, unless we go for the revert path.
How do you perform the tests, is just a simple mounting/un-mounting
that triggers the problem?
Any specific things that I need to think of when running QEMU?

Kind regards
Uffe

>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 17, 2014, 1:57 p.m. UTC | #6
On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> It seems like you want to debug the mmci host driver and unfortunate
> the debug utilities available are only dev_dbg prints. I wouldn't be
> surprised if the problem goes away when you enable them. :-)
>
> I have some other locally stored debug patches for mmci, but those are
> not re-based and I am not sure you want to deal with them as is.
>
> I guess I need to set up the QEMU environment and run the tests
> myself, unless we go for the revert path.

Again, a simple revert *doesn't* seem to work. There's something else
in the 3.14->3.15-rc1 series that is also causing problems. Reverting
the patch only seems to work very close to the point that the commit
was made. I'm trying to redo the bisection reverting the first
problematic change each step to see if I can narrow the second problem
(but I've had some interruptions so I've not made much progress there
yet).

> How do you perform the tests, is just a simple mounting/un-mounting
> that triggers the problem?
> Any specific things that I need to think of when running QEMU?

Unfortunately its not quite so simple as mounting/unmounting.

My environment is the Linaro Android image:
http://releases.linaro.org/14.04/android/images/armv7-lsk/vexpress.img.bz2
(using the initrd in
http://releases.linaro.org/14.04/android/images/armv7-lsk/boot.tar.bz2).

My qemu line is:
QEMU_AUDIO_DRV=none qemu-system-arm -kernel zImage -initrd initrd -M
vexpress-a9 -cpu cortex-a9 -nographic -vnc :0 -m 1G -append
'root=/dev/mmcblk0p1 rw mem=1024M raid=noautodetect
console=ttyAMA0,38400n8 rootwait vmalloc=256MB devtmpfs.mount=0' -sd
vexpress-android.img -redir tcp:5555::5555

The first bootup is usually *very* slow (since it has to dex
everything), and I usually use the zImage kernel in the boot.tar.bz2
to let that finish so I can get a sane backup image. Connecting via
vnc to localhost:0 will let you know when its booted all the way and
finished dexing. I'd expect to wait ~5-10 minutes.

I then kill qemu, and make a backup copy of the disk image so future
boots are a bit faster.

And then booting again with my kernel zImage (config attached).

It most frequently triggers within ~90 seconds of boot, but not always
(I'll go ahead and kill qemu again if I don't see it). Usually I can
reproduce it within 4 boots. One potential clue here is that it
doesn't seem to trigger on the first boot w/ the affected kernel.

Every time I see corruption, I revert back to the backup dex'ed disk image.

thanks
-john
Kees Cook June 27, 2014, 8:37 p.m. UTC | #7
On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>> to apply it for 3.15, and I think you will be able resolve the
>>>> conflict - I should be quite trivial.
>>>
>>> No worries. I just didn't want to waste time resolving it if it was
>>> logically dependent on some other change.
>>>
>>> I'll give it a shot and get back to you.
>>
>> So unfortunately I'm still seeing trouble..
>>
>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>> gd; block bitmap corrupt.
>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>> panic forced after error
>> [   94.206553]
>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>> 3.15.0-00002-g044f37a-dirty #589
>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>> (show_stack+0x11/0x14)
>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>> (dump_stack+0x59/0x7c)
>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>> (ext4_handle_error+0x69/0x74)
>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>> (__ext4_grp_locked_error+0x6b/0x160)
>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>> (ext4_mb_init_cache+0x219/0x4e0)
>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>> (ext4_mb_init_group+0xbb/0x13c)
>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>> (ext4_mb_good_group+0xf3/0xfc)
>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>> (ext4_mb_regular_allocator+0x153/0x2c4)
>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>> (ext4_ext_map_blocks+0x965/0x10f0)
>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>> (ext4_map_blocks+0xff/0x374)
>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>> (do_writepages+0x19/0x28)
>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>> (__filemap_fdatawrite_range+0x3d/0x44)
>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>> (ext4_rename+0x2f9/0x3e4)
>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>> (vfs_rename+0x183/0x45c)
>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>> (SyS_renameat2+0x22b/0x26c)
>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>> (SyS_rename+0x1f/0x24)
>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>> (ret_fast_syscall+0x1/0x5c)
>>
>>
>> That said, this mirrors the behavior when I was reverting your change
>> by hand on-top of 3.15. While git bisect pointed to your patch and
>> reverting it from the commit seems to resolve the issue at that point,
>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>> that is causing problems as well.
>>
>> Are there any sort of debugging options for mmc that I can use to try
>> to better narrow down whats going wrong?
>
> It seems like you want to debug the mmci host driver and unfortunate
> the debug utilities available are only dev_dbg prints. I wouldn't be
> surprised if the problem goes away when you enable them. :-)
>
> I have some other locally stored debug patches for mmci, but those are
> not re-based and I am not sure you want to deal with them as is.
>
> I guess I need to set up the QEMU environment and run the tests
> myself, unless we go for the revert path.
> How do you perform the tests, is just a simple mounting/un-mounting
> that triggers the problem?
> Any specific things that I need to think of when running QEMU?

FWIW, I'm hitting this problem as well. For me, it is every time I try
to boot. Only reverting to 3.14 makes it go away, and this series
doesn't fix it for me either. :(

My only difference is that I don't run with an initrd:

qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
-drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
"root=/dev/mmcblk0p1 console=ttyAMA0"

-Kees
John Stultz June 27, 2014, 10:53 p.m. UTC | #8
On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>>> to apply it for 3.15, and I think you will be able resolve the
>>>>> conflict - I should be quite trivial.
>>>>
>>>> No worries. I just didn't want to waste time resolving it if it was
>>>> logically dependent on some other change.
>>>>
>>>> I'll give it a shot and get back to you.
>>>
>>> So unfortunately I'm still seeing trouble..
>>>
>>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>>> gd; block bitmap corrupt.
>>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>>> panic forced after error
>>> [   94.206553]
>>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>>> 3.15.0-00002-g044f37a-dirty #589
>>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>>> (show_stack+0x11/0x14)
>>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>>> (dump_stack+0x59/0x7c)
>>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>>> (ext4_handle_error+0x69/0x74)
>>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>>> (__ext4_grp_locked_error+0x6b/0x160)
>>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>>> (ext4_mb_init_cache+0x219/0x4e0)
>>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>>> (ext4_mb_init_group+0xbb/0x13c)
>>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>>> (ext4_mb_good_group+0xf3/0xfc)
>>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>>> (ext4_mb_regular_allocator+0x153/0x2c4)
>>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>>> (ext4_ext_map_blocks+0x965/0x10f0)
>>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>>> (ext4_map_blocks+0xff/0x374)
>>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>>> (do_writepages+0x19/0x28)
>>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>>> (__filemap_fdatawrite_range+0x3d/0x44)
>>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>>> (ext4_rename+0x2f9/0x3e4)
>>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>>> (vfs_rename+0x183/0x45c)
>>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>>> (SyS_renameat2+0x22b/0x26c)
>>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>>> (SyS_rename+0x1f/0x24)
>>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>>> (ret_fast_syscall+0x1/0x5c)
>>>
>>>
>>> That said, this mirrors the behavior when I was reverting your change
>>> by hand on-top of 3.15. While git bisect pointed to your patch and
>>> reverting it from the commit seems to resolve the issue at that point,
>>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>>> that is causing problems as well.
>>>
>>> Are there any sort of debugging options for mmc that I can use to try
>>> to better narrow down whats going wrong?
>>
>> It seems like you want to debug the mmci host driver and unfortunate
>> the debug utilities available are only dev_dbg prints. I wouldn't be
>> surprised if the problem goes away when you enable them. :-)
>>
>> I have some other locally stored debug patches for mmci, but those are
>> not re-based and I am not sure you want to deal with them as is.
>>
>> I guess I need to set up the QEMU environment and run the tests
>> myself, unless we go for the revert path.
>> How do you perform the tests, is just a simple mounting/un-mounting
>> that triggers the problem?
>> Any specific things that I need to think of when running QEMU?
>
> FWIW, I'm hitting this problem as well. For me, it is every time I try
> to boot. Only reverting to 3.14 makes it go away, and this series
> doesn't fix it for me either. :(
>
> My only difference is that I don't run with an initrd:
>
> qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
> rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
> -drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
> "root=/dev/mmcblk0p1 console=ttyAMA0"

I've been continuing to try to bisect this down with
8d94b54d99ea968a9d188ca0e68793ebed601220 and
e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
bisection results are really inconsistent.  I suspect it actually
shows up earlier, its just its harder to trip the problem with the
patches reverted, so I'm marking good commits that are actually bad.

If you are seeing this on every bootup, it might be worth trying to do
the bisection with the two commits above reverted to see if you can
narrow it down any better?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kees Cook July 1, 2014, 5:45 p.m. UTC | #9
On Fri, Jun 27, 2014 at 3:53 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
>>>> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>>>> This patch based upon my latest mmc tree and the next branch. I tried
>>>>>> to apply it for 3.15, and I think you will be able resolve the
>>>>>> conflict - I should be quite trivial.
>>>>>
>>>>> No worries. I just didn't want to waste time resolving it if it was
>>>>> logically dependent on some other change.
>>>>>
>>>>> I'll give it a shot and get back to you.
>>>>
>>>> So unfortunately I'm still seeing trouble..
>>>>
>>>> [   94.202843] EXT4-fs error (device mmcblk0p5):
>>>> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
>>>> gd; block bitmap corrupt.
>>>> [   94.203873] Aborting journal on device mmcblk0p5-8.
>>>> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
>>>> panic forced after error
>>>> [   94.206553]
>>>> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
>>>> 3.15.0-00002-g044f37a-dirty #589
>>>> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
>>>> (show_stack+0x11/0x14)
>>>> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
>>>> (dump_stack+0x59/0x7c)
>>>> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
>>>> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
>>>> (ext4_handle_error+0x69/0x74)
>>>> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
>>>> (__ext4_grp_locked_error+0x6b/0x160)
>>>> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
>>>> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
>>>> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
>>>> (ext4_mb_init_cache+0x219/0x4e0)
>>>> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
>>>> (ext4_mb_init_group+0xbb/0x13c)
>>>> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
>>>> (ext4_mb_good_group+0xf3/0xfc)
>>>> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
>>>> (ext4_mb_regular_allocator+0x153/0x2c4)
>>>> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
>>>> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
>>>> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
>>>> (ext4_ext_map_blocks+0x965/0x10f0)
>>>> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
>>>> (ext4_map_blocks+0xff/0x374)
>>>> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
>>>> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
>>>> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
>>>> (do_writepages+0x19/0x28)
>>>> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
>>>> (__filemap_fdatawrite_range+0x3d/0x44)
>>>> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
>>>> [<c008cc3f>] (filemap_flush+0x23/0x28)
>>>> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
>>>> (ext4_rename+0x2f9/0x3e4)
>>>> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
>>>> (vfs_rename+0x183/0x45c)
>>>> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
>>>> (SyS_renameat2+0x22b/0x26c)
>>>> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
>>>> (SyS_rename+0x1f/0x24)
>>>> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
>>>> (ret_fast_syscall+0x1/0x5c)
>>>>
>>>>
>>>> That said, this mirrors the behavior when I was reverting your change
>>>> by hand on-top of 3.15. While git bisect pointed to your patch and
>>>> reverting it from the commit seems to resolve the issue at that point,
>>>> there seems to be some other commit in the 3.14->3.15-rc1 interval
>>>> that is causing problems as well.
>>>>
>>>> Are there any sort of debugging options for mmc that I can use to try
>>>> to better narrow down whats going wrong?
>>>
>>> It seems like you want to debug the mmci host driver and unfortunate
>>> the debug utilities available are only dev_dbg prints. I wouldn't be
>>> surprised if the problem goes away when you enable them. :-)
>>>
>>> I have some other locally stored debug patches for mmci, but those are
>>> not re-based and I am not sure you want to deal with them as is.
>>>
>>> I guess I need to set up the QEMU environment and run the tests
>>> myself, unless we go for the revert path.
>>> How do you perform the tests, is just a simple mounting/un-mounting
>>> that triggers the problem?
>>> Any specific things that I need to think of when running QEMU?
>>
>> FWIW, I'm hitting this problem as well. For me, it is every time I try
>> to boot. Only reverting to 3.14 makes it go away, and this series
>> doesn't fix it for me either. :(
>>
>> My only difference is that I don't run with an initrd:
>>
>> qemu-system-arm -nographic -m 1024 -M vexpress-a15 -dtb
>> rtsm_ve-cortex_a15x4.dtb -kernel ~/src/linux/arch/arm/boot/zImage
>> -drive file=$HOME/image/arm/vda.qcow2,if=sd,format=qcow2 -append
>> "root=/dev/mmcblk0p1 console=ttyAMA0"
>
> I've been continuing to try to bisect this down with
> 8d94b54d99ea968a9d188ca0e68793ebed601220 and
> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
> like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
> bisection results are really inconsistent.  I suspect it actually
> shows up earlier, its just its harder to trip the problem with the
> patches reverted, so I'm marking good commits that are actually bad.
>
> If you are seeing this on every bootup, it might be worth trying to do
> the bisection with the two commits above reverted to see if you can
> narrow it down any better?

And now I can't reproduce it! I think I was being tricked by
filesystem corruption that spanned some of my test boots. I'm going to
start this over and try again.

-Kees
John Stultz July 1, 2014, 6:19 p.m. UTC | #10
On Tue, Jul 1, 2014 at 10:45 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Jun 27, 2014 at 3:53 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Fri, Jun 27, 2014 at 1:37 PM, Kees Cook <keescook@chromium.org> wrote:
>> I've been continuing to try to bisect this down with
>> 8d94b54d99ea968a9d188ca0e68793ebed601220 and
>> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 reverted each step. It seems
>> like it pops up somewhere between 3.15-rc6 and 3.15-rc7, but the
>> bisection results are really inconsistent.  I suspect it actually
>> shows up earlier, its just its harder to trip the problem with the
>> patches reverted, so I'm marking good commits that are actually bad.
>>
>> If you are seeing this on every bootup, it might be worth trying to do
>> the bisection with the two commits above reverted to see if you can
>> narrow it down any better?
>
> And now I can't reproduce it! I think I was being tricked by
> filesystem corruption that spanned some of my test boots. I'm going to
> start this over and try again.

Yea. :( I've been starting over and over on this quite a bit as well.
With those two patches reverted, it gets harder to reproduce reliably,
but I'm still seeing trobule.

Bisecting is still not pointing to anything obvious, my current
earliest bad commit right now is:
2aafe1a4d451866e3e7b476e2fa0813b69b313c1. I'm still working on it, but
its slow going.


thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 5d20bfba..e4d4707 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -74,6 +74,7 @@  static unsigned int fmax = 515633;
  * @pwrreg_nopower: bits in MMCIPOWER don't controls ext. power supply
  * @explicit_mclk_control: enable explicit mclk control in driver.
  * @qcom_fifo: enables qcom specific fifo pio read logic.
+ * @reversed_irq_handling: handle data irq before cmd irq.
  */
 struct variant_data {
 	unsigned int		clkreg;
@@ -97,6 +98,7 @@  struct variant_data {
 	bool			pwrreg_nopower;
 	bool			explicit_mclk_control;
 	bool			qcom_fifo;
+	bool			reversed_irq_handling;
 };
 
 static struct variant_data variant_arm = {
@@ -105,6 +107,7 @@  static struct variant_data variant_arm = {
 	.datalength_bits	= 16,
 	.pwrreg_powerup		= MCI_PWR_UP,
 	.f_max			= 100000000,
+	.reversed_irq_handling	= true,
 };
 
 static struct variant_data variant_arm_extended_fifo = {
@@ -1236,8 +1239,13 @@  static irqreturn_t mmci_irq(int irq, void *dev_id)
 
 		dev_dbg(mmc_dev(host->mmc), "irq0 (data+cmd) %08x\n", status);
 
-		mmci_cmd_irq(host, host->cmd, status);
-		mmci_data_irq(host, host->data, status);
+		if (host->variant->reversed_irq_handling) {
+			mmci_data_irq(host, host->data, status);
+			mmci_cmd_irq(host, host->cmd, status);
+		} else {
+			mmci_cmd_irq(host, host->cmd, status);
+			mmci_data_irq(host, host->data, status);
+		}
 
 		/* Don't poll for busy completion in irq context. */
 		if (host->busy_status)