Message ID | 20170209153403.9730-12-linus.walleij@linaro.org (mailing list archive)
---|---
State | New, archived
On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote:
> Instead of doing retries at the same time as trying to submit new
> requests, do the retries when the request is reported as completed
> by the driver, in the finalization worker.
>
> This is achieved by letting the core worker call back into the block
> layer using mmc_blk_rw_done(), that will read the status and repeatedly
> try to hammer the request using single request etc by calling back to
> the core layer using mmc_restart_areq()
>
> The beauty of it is that the completion will not complete until the
> block layer has had the opportunity to hammer a bit at the card using
> a bunch of different approaches in the while() loop in
> mmc_blk_rw_done()
>
> The algorithm for recapture, retry and handle errors is essentially
> identical to the one we used to have in mmc_blk_issue_rw_rq(),
> only augmented to get called in another path.
>
> We have to add and initialize a pointer back to the struct mmc_queue
> from the struct mmc_queue_req to find the queue from the asynchronous
> request.
>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

It seems that after this change we can end up queuing more work for
the kthread from the kthread worker itself and then waiting inside it
for this nested work to complete. I hope that you've tested it by
simulating errors and that it all works. Under this assumption:

Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

Also some very minor nit:

	+	case MMC_BLK_DATA_ERR: {
	+		int err;
	+		err = mmc_blk_reset(md, host, type);

During the code movement CodingStyle suffered.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
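The CodingStyle nit above is presumably checkpatch's "Missing a blank line
after declarations" warning: the moved code declares "int err;" and assigns
it on the very next line. A minimal sketch of that case label with the blank
line restored (context trimmed from the hunk in the posted patch, not a new
change):

	case MMC_BLK_DATA_ERR: {
		int err;

		err = mmc_blk_reset(md, host, type);
		if (!err)
			break;
		if (err == -ENODEV) {
			mmc_blk_rw_cmd_abort(card, old_req);
			mmc_blk_rw_try_restart(mq);
			return;
		}
		/* Fall through */
	}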
Hi,

On Tuesday, February 28, 2017 06:45:20 PM Bartlomiej Zolnierkiewicz wrote:
> On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote:
> > Instead of doing retries at the same time as trying to submit new
> > requests, do the retries when the request is reported as completed
> > by the driver, in the finalization worker.
[...]
> > Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
>
> It seems that after this change we can end up queuing more work for
> the kthread from the kthread worker itself and then waiting inside it
> for this nested work to complete. [...]

On the second look it seems that there is no waiting for the retried
areq to complete, so I cannot see what protects us from racing and
trying to run two areq-s in parallel:

1st areq being retried (in the completion kthread):

  mmc_blk_rw_done()->mmc_restart_areq()->__mmc_start_data_req()

2nd areq coming from the second request in the queue (in the queuing
kthread):

  mmc_blk_issue_rw_rq()->mmc_start_areq()->__mmc_start_data_req()

(after mmc_blk_rw_done() finishes in mmc_finalize_areq(), the 1st areq
is marked as completed by the completion kthread, the wait on host->areq
in mmc_start_areq() of the queuing kthread finishes, and the 2nd areq is
started while the 1st one is still being retried)

?

Also, retrying of areqs for the MMC_BLK_RETRY status case got broken:
before the change the do {} while () loop increased the retry variable;
now the loop is gone, retry is never incremented correctly, and we can
loop forever.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
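Regarding the MMC_BLK_RETRY regression above: in the new mmc_blk_rw_done()
the retry counter is a local variable, so it starts from 0 on every call
from the finalization worker and the "retry++ < 5" cap can never trip. A
minimal sketch of one way to keep the bound, assuming a new, purely
hypothetical "retries" field in struct mmc_queue_req (this is not part of
the posted patch and is untested):

	/* queue.h: carry the retry count with the request */
	struct mmc_queue_req {
		...
		struct mmc_async_req	areq;
		struct mmc_queue	*mq;
		int			retries;	/* hypothetical field */
	};

	/* block.c, mmc_blk_rw_done(): use the per-request counter instead
	 * of the function-local "retry" that restarts from 0 each time */
	case MMC_BLK_RETRY:
		retune_retry_done = brq->retune_retry_done;
		if (mq_rq->retries++ < 5)
			break;
		/* Fall through */
	case MMC_BLK_ABORT:
		if (!mmc_blk_reset(md, host, type))
			break;
		mmc_blk_rw_cmd_abort(card, old_req);
		mmc_blk_rw_try_restart(mq);
		return;

	/* block.c, mmc_blk_issue_rw_rq(): reset the counter when a request
	 * first enters the pipeline */
	mq->mqrq_cur->retries = 0;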
On Wednesday, March 01, 2017 12:45:57 PM Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Tuesday, February 28, 2017 06:45:20 PM Bartlomiej Zolnierkiewicz wrote:
> > On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote:
> > > Instead of doing retries at the same time as trying to submit new
> > > requests, do the retries when the request is reported as completed
> > > by the driver, in the finalization worker.
[...]
> Also, retrying of areqs for the MMC_BLK_RETRY status case got broken:
> before the change the do {} while () loop increased the retry variable;
> now the loop is gone, retry is never incremented correctly, and we can
> loop forever.

There is another problem with this patch. During boot there is a ~30 sec
delay, and later I get a deadlock when trying to run the sync command
(the first thing I do after boot):

...
[ 5.960623] asoc-simple-card sound: HiFi <-> 3830000.i2s mapping ok
done.
[....] Waiting for /dev to be fully populated...[ 17.745887] random: crng init done
done.
[....] Activating swap...done.
[ 39.767982] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
...

root@target:~# sync

[ 248.801708] INFO: task udevd:287 blocked for more than 120 seconds.
[ 248.806552] Tainted: G W 4.10.0-rc3-00118-g4515dc6 #2736
[ 248.813590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 248.821275] udevd D 0 287 249 0x00000005 [ 248.826815] [<c06df404>] (__schedule) from [<c06df90c>] (schedule+0x40/0xac) [ 248.833889] [<c06df90c>] (schedule) from [<c06e526c>] (schedule_timeout+0x148/0x220) [ 248.841598] [<c06e526c>] (schedule_timeout) from [<c06df24c>] (io_schedule_timeout+0x74/0xb0) [ 248.849993] [<c06df24c>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118) [ 248.858235] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c) [ 248.867053] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20) [ 248.876525] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8) [ 248.884828] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20) [ 248.892375] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0) [ 248.899383] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc) [ 248.906593] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc) [ 248.913938] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc) [ 248.921046] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c) [ 248.928776] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4) [ 248.935970] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4) [ 248.943506] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20) [ 248.951637] INFO: task sync:1398 blocked for more than 120 seconds. [ 248.957756] Tainted: G W 4.10.0-rc3-00118-g4515dc6 #2736 [ 248.965052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.972681] sync D 0 1398 1390 0x00000000 [ 248.978117] [<c06df404>] (__schedule) from [<c06df90c>] (schedule+0x40/0xac) [ 248.985174] [<c06df90c>] (schedule) from [<c06dfb3c>] (schedule_preempt_disabled+0x14/0x20) [ 248.993609] [<c06dfb3c>] (schedule_preempt_disabled) from [<c06e3b18>] (__mutex_lock_slowpath+0x480/0x6ec) [ 249.003153] [<c06e3b18>] (__mutex_lock_slowpath) from [<c0215964>] (iterate_bdevs+0xb8/0x108) [ 249.011729] [<c0215964>] (iterate_bdevs) from [<c020c0ac>] (sys_sync+0x54/0x98) [ 249.018802] [<c020c0ac>] (sys_sync) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c) To be exact the same issue also sometimes happens with previous commit 784da04 ("mmc: queue: simplify queue logic") and I also got deadlock on boot once with commit 9a4c8a3 ("mmc: core: kill off the context info"): ... [ 5.958868] asoc-simple-card sound: HiFi <-> 3830000.i2s mapping ok done. [....] Waiting for /dev to be fully populated...[ 16.361597] random: crng init done done. [ 248.801776] INFO: task mmcqd/0:127 blocked for more than 120 seconds. [ 248.806795] Tainted: G W 4.10.0-rc3-00116-g9a4c8a3 #2735 [ 248.813882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 248.821909] mmcqd/0 D 0 127 2 0x00000000 [ 248.827031] [<c06df4b4>] (__schedule) from [<c06df9bc>] (schedule+0x40/0xac) [ 248.834098] [<c06df9bc>] (schedule) from [<c06e531c>] (schedule_timeout+0x148/0x220) [ 248.841788] [<c06e531c>] (schedule_timeout) from [<c06e02a8>] (wait_for_common+0xb8/0x144) [ 248.849969] [<c06e02a8>] (wait_for_common) from [<c05280f8>] (mmc_start_areq+0x40/0x1ac) [ 248.858092] [<c05280f8>] (mmc_start_areq) from [<c0537680>] (mmc_blk_issue_rw_rq+0x78/0x314) [ 248.866485] [<c0537680>] (mmc_blk_issue_rw_rq) from [<c0538318>] (mmc_blk_issue_rq+0x9c/0x458) [ 248.875060] [<c0538318>] (mmc_blk_issue_rq) from [<c0538820>] (mmc_queue_thread+0x90/0x16c) [ 248.883383] [<c0538820>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 248.890867] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 248.898124] INFO: task udevd:273 blocked for more than 120 seconds. [ 248.904331] Tainted: G W 4.10.0-rc3-00116-g9a4c8a3 #2735 [ 248.911191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.919057] udevd D 0 273 250 0x00000005 [ 248.924543] [<c06df4b4>] (__schedule) from [<c06df9bc>] (schedule+0x40/0xac) [ 248.931557] [<c06df9bc>] (schedule) from [<c06e531c>] (schedule_timeout+0x148/0x220) [ 248.939206] [<c06e531c>] (schedule_timeout) from [<c06df2fc>] (io_schedule_timeout+0x74/0xb0) [ 248.947770] [<c06df2fc>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118) [ 248.955916] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c) [ 248.964751] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20) [ 248.974401] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8) [ 248.982593] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20) [ 248.990088] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0) [ 248.997229] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc) [ 249.004380] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc) [ 249.011570] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc) [ 249.018732] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c) [ 249.026392] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4) [ 249.033577] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4) [ 249.041086] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20) I assume that the problem got introduced even earlier, commit 4515dc6 ("mmc: block: shuffle retry and error handling") just makes it happen every time. The hardware I use for testing is Odroid XU3-Lite. Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, March 01, 2017 04:52:38 PM Bartlomiej Zolnierkiewicz wrote: > I assume that the problem got introduced even earlier, > commit 4515dc6 ("mmc: block: shuffle retry and error > handling") just makes it happen every time. Patch #16 makes it worse as now I get deadlock on boot: [ 248.801750] INFO: task kworker/2:2:113 blocked for more than 120 seconds. [ 248.807119] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 248.814162] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.821943] kworker/2:2 D 0 113 2 0x00000000 [ 248.827357] Workqueue: events_freezable mmc_rescan [ 248.832227] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 248.839123] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0) [ 248.846851] [<c0527708>] (__mmc_claim_host) from [<c052dc54>] (mmc_attach_mmc+0xb8/0x14c) [ 248.854989] [<c052dc54>] (mmc_attach_mmc) from [<c052a124>] (mmc_rescan+0x274/0x34c) [ 248.862725] [<c052a124>] (mmc_rescan) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 248.870498] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac) [ 248.878653] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 248.885934] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 248.893098] INFO: task jbd2/mmcblk0p2-:132 blocked for more than 120 seconds. [ 248.900092] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 248.907108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.914904] jbd2/mmcblk0p2- D 0 132 2 0x00000000 [ 248.920319] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 248.927433] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220) [ 248.935139] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0) [ 248.943634] [<c06def74>] (io_schedule_timeout) from [<c06df91c>] (bit_wait_io+0x10/0x58) [ 248.951684] [<c06df91c>] (bit_wait_io) from [<c06dfd3c>] (__wait_on_bit+0x84/0xbc) [ 248.959134] [<c06dfd3c>] (__wait_on_bit) from [<c06dfe60>] (out_of_line_wait_on_bit+0x68/0x70) [ 248.968142] [<c06dfe60>] (out_of_line_wait_on_bit) from [<c0295f4c>] (jbd2_journal_commit_transaction+0x1468/0x15c4) [ 248.978397] [<c0295f4c>] (jbd2_journal_commit_transaction) from [<c0298af0>] (kjournald2+0xbc/0x264) [ 248.987514] [<c0298af0>] (kjournald2) from [<c0135604>] (kthread+0xfc/0x134) [ 248.994494] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 249.001714] INFO: task kworker/1:2H:134 blocked for more than 120 seconds. [ 249.008412] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.015479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 249.023094] kworker/1:2H D 0 134 2 0x00000000 [ 249.028510] Workqueue: kblockd blk_mq_run_work_fn [ 249.033330] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.040199] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0) [ 249.047856] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8) [ 249.055736] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0) [ 249.064316] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198) [ 249.073845] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110) [ 249.083120] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 249.092076] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac) [ 249.099990] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 249.107322] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 249.114485] INFO: task kworker/5:2H:136 blocked for more than 120 seconds. [ 249.121326] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.128232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.136074] kworker/5:2H D 0 136 2 0x00000000 [ 249.141544] Workqueue: kblockd blk_mq_run_work_fn [ 249.146187] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.153419] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0) [ 249.160825] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8) [ 249.168755] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0) [ 249.177318] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198) [ 249.186858] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110) [ 249.196124] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 249.204969] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac) [ 249.213161] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 249.220270] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 249.227505] INFO: task kworker/0:1H:145 blocked for more than 120 seconds. [ 249.234328] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.241229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 249.249066] kworker/0:1H D 0 145 2 0x00000000 [ 249.254521] Workqueue: kblockd blk_mq_run_work_fn [ 249.259176] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.266233] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0) [ 249.274001] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8) [ 249.281747] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0) [ 249.290284] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198) [ 249.299843] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110) [ 249.309122] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 249.317951] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac) [ 249.326017] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 249.333408] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 249.340459] INFO: task udevd:280 blocked for more than 120 seconds. [ 249.346725] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.353644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.361452] udevd D 0 280 258 0x00000005 [ 249.366885] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.373964] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220) [ 249.381651] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0) [ 249.390110] [<c06def74>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118) [ 249.398399] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c) [ 249.407129] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20) [ 249.416571] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8) [ 249.424892] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20) [ 249.432422] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0) [ 249.439501] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc) [ 249.446677] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc) [ 249.454152] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc) [ 249.461165] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c) [ 249.468833] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4) [ 249.476015] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4) [ 249.483557] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20) [ 249.491689] INFO: task udevd:281 blocked for more than 120 seconds. [ 249.497900] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.504892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 249.512771] udevd D 0 281 258 0x00000005 [ 249.518097] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.525153] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220) [ 249.532853] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0) [ 249.541354] [<c06def74>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118) [ 249.549463] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c) [ 249.558331] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20) [ 249.567785] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8) [ 249.576207] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20) [ 249.583669] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0) [ 249.590710] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc) [ 249.597843] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc) [ 249.605217] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc) [ 249.612399] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c) [ 249.620000] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4) [ 249.627228] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4) [ 249.634874] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20) [ 249.642922] INFO: task kworker/u16:2:1268 blocked for more than 120 seconds. [ 249.649891] Tainted: G W 4.10.0-rc3-00123-g1bec9a6 #2726 [ 249.656847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.664654] kworker/u16:2 D 0 1268 2 0x00000000 [ 249.670094] Workqueue: writeback wb_workfn (flush-179:0) [ 249.675398] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac) [ 249.682425] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220) [ 249.690103] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0) [ 249.698738] [<c06def74>] (io_schedule_timeout) from [<c03154e4>] (bt_get+0x140/0x228) [ 249.706432] [<c03154e4>] (bt_get) from [<c03156d0>] (blk_mq_get_tag+0x24/0xa8) [ 249.713613] [<c03156d0>] (blk_mq_get_tag) from [<c03119c0>] (__blk_mq_alloc_request+0x10/0x15c) [ 249.722287] [<c03119c0>] (__blk_mq_alloc_request) from [<c0311bbc>] (blk_mq_map_request+0xb0/0xfc) [ 249.731178] [<c0311bbc>] (blk_mq_map_request) from [<c03136f0>] (blk_sq_make_request+0x8c/0x298) [ 249.739962] [<c03136f0>] (blk_sq_make_request) from [<c0308e00>] (generic_make_request+0xd8/0x180) [ 249.748891] [<c0308e00>] (generic_make_request) from [<c0308f30>] (submit_bio+0x88/0x148) [ 249.757175] [<c0308f30>] (submit_bio) from [<c0256ccc>] (ext4_io_submit+0x34/0x40) [ 249.764581] [<c0256ccc>] (ext4_io_submit) from [<c0255674>] (ext4_writepages+0x484/0x670) [ 249.772722] [<c0255674>] (ext4_writepages) from [<c01a5348>] (do_writepages+0x24/0x38) [ 249.780573] [<c01a5348>] (do_writepages) from [<c0208038>] (__writeback_single_inode+0x28/0x18c) [ 249.789359] [<c0208038>] (__writeback_single_inode) from [<c02085f0>] (writeback_sb_inodes+0x1e0/0x394) [ 249.798717] [<c02085f0>] (writeback_sb_inodes) from [<c0208814>] (__writeback_inodes_wb+0x70/0xac) [ 249.807643] [<c0208814>] (__writeback_inodes_wb) from [<c02089dc>] (wb_writeback+0x18c/0x1b4) [ 249.816241] [<c02089dc>] (wb_writeback) from [<c0208d68>] (wb_workfn+0x1c8/0x388) [ 249.823590] [<c0208d68>] (wb_workfn) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 
249.831375] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[ 249.839408] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[ 249.846726] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
On Wednesday, March 01, 2017 04:52:38 PM Bartlomiej Zolnierkiewicz wrote: > I assume that the problem got introduced even earlier, > commit 4515dc6 ("mmc: block: shuffle retry and error > handling") just makes it happen every time. It seems to be introduced by patch #6. Patch #5 survived 30 consecutive boot+sync iterations (with later patches the issue shows up during the first 12 iterations). root@target:~# sync [ 248.801846] INFO: task mmcqd/0:128 blocked for more than 120 seconds. [ 248.806866] Tainted: G W 4.10.0-rc3-00113-g5750765 #2739 [ 248.814051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.821696] mmcqd/0 D 0 128 2 0x00000000 [ 248.827123] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac) [ 248.834210] [<c06dfa24>] (schedule) from [<c06e5384>] (schedule_timeout+0x148/0x220) [ 248.841912] [<c06e5384>] (schedule_timeout) from [<c06e0310>] (wait_for_common+0xb8/0x144) [ 248.850058] [<c06e0310>] (wait_for_common) from [<c0528100>] (mmc_start_areq+0x40/0x1ac) [ 248.858209] [<c0528100>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314) [ 248.866599] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458) [ 248.875293] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180) [ 248.883789] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 248.891058] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 248.898364] INFO: task jbd2/mmcblk0p2-:136 blocked for more than 120 seconds. [ 248.905400] Tainted: G W 4.10.0-rc3-00113-g5750765 #2739 [ 248.912353] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.919923] jbd2/mmcblk0p2- D 0 136 2 0x00000000 [ 248.925693] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac) [ 248.932470] [<c06dfa24>] (schedule) from [<c0294ccc>] (jbd2_journal_commit_transaction+0x1e8/0x15c4) [ 248.941552] [<c0294ccc>] (jbd2_journal_commit_transaction) from [<c0298af0>] (kjournald2+0xbc/0x264) [ 248.950608] [<c0298af0>] (kjournald2) from [<c0135604>] (kthread+0xfc/0x134) [ 248.957660] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 248.964860] INFO: task kworker/u16:2:730 blocked for more than 120 seconds. [ 248.971780] Tainted: G W 4.10.0-rc3-00113-g5750765 #2739 [ 248.978673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 248.986686] kworker/u16:2 D 0 730 2 0x00000000 [ 248.991993] Workqueue: writeback wb_workfn (flush-179:0) [ 248.997230] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac) [ 249.004287] [<c06dfa24>] (schedule) from [<c06e5384>] (schedule_timeout+0x148/0x220) [ 249.011997] [<c06e5384>] (schedule_timeout) from [<c06df364>] (io_schedule_timeout+0x74/0xb0) [ 249.020451] [<c06df364>] (io_schedule_timeout) from [<c06dfd0c>] (bit_wait_io+0x10/0x58) [ 249.028545] [<c06dfd0c>] (bit_wait_io) from [<c06dff1c>] (__wait_on_bit_lock+0x74/0xd0) [ 249.036513] [<c06dff1c>] (__wait_on_bit_lock) from [<c06dffe0>] (out_of_line_wait_on_bit_lock+0x68/0x70) [ 249.046231] [<c06dffe0>] (out_of_line_wait_on_bit_lock) from [<c0293dfc>] (do_get_write_access+0x3d0/0x4c4) [ 249.055729] [<c0293dfc>] (do_get_write_access) from [<c029410c>] (jbd2_journal_get_write_access+0x38/0x64) [ 249.065336] [<c029410c>] (jbd2_journal_get_write_access) from [<c0272680>] (__ext4_journal_get_write_access+0x2c/0x68) [ 249.076016] [<c0272680>] (__ext4_journal_get_write_access) from [<c0278eb8>] (ext4_mb_mark_diskspace_used+0x64/0x474) [ 249.086515] [<c0278eb8>] (ext4_mb_mark_diskspace_used) from [<c027a334>] (ext4_mb_new_blocks+0x258/0xa1c) [ 249.096040] [<c027a334>] (ext4_mb_new_blocks) from [<c026fc80>] (ext4_ext_map_blocks+0x8b4/0xf28) [ 249.104883] [<c026fc80>] (ext4_ext_map_blocks) from [<c024f318>] (ext4_map_blocks+0x144/0x5f8) [ 249.113468] [<c024f318>] (ext4_map_blocks) from [<c0254b0c>] (mpage_map_and_submit_extent+0xa4/0x788) [ 249.122641] [<c0254b0c>] (mpage_map_and_submit_extent) from [<c02556d0>] (ext4_writepages+0x4e0/0x670) [ 249.131925] [<c02556d0>] (ext4_writepages) from [<c01a5348>] (do_writepages+0x24/0x38) [ 249.139774] [<c01a5348>] (do_writepages) from [<c0208038>] (__writeback_single_inode+0x28/0x18c) [ 249.148555] [<c0208038>] (__writeback_single_inode) from [<c02085f0>] (writeback_sb_inodes+0x1e0/0x394) [ 249.157909] [<c02085f0>] (writeback_sb_inodes) from [<c0208814>] (__writeback_inodes_wb+0x70/0xac) [ 249.166833] [<c0208814>] (__writeback_inodes_wb) from [<c02089dc>] (wb_writeback+0x18c/0x1b4) [ 249.175324] [<c02089dc>] (wb_writeback) from [<c0208c74>] (wb_workfn+0xd4/0x388) [ 249.182704] [<c0208c74>] (wb_workfn) from [<c012fdf8>] (process_one_work+0x120/0x318) [ 249.190464] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac) [ 249.198551] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 249.205904] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 249.213094] INFO: task sync:1403 blocked for more than 120 seconds. [ 249.219261] Tainted: G W 4.10.0-rc3-00113-g5750765 #2739 [ 249.226220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 249.234019] sync D 0 1403 1396 0x00000000 [ 249.239424] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac) [ 249.246624] [<c06dfa24>] (schedule) from [<c02078c0>] (wb_wait_for_completion+0x50/0x7c) [ 249.254538] [<c02078c0>] (wb_wait_for_completion) from [<c0207c14>] (sync_inodes_sb+0x94/0x20c) [ 249.263200] [<c0207c14>] (sync_inodes_sb) from [<c01e4dc8>] (iterate_supers+0xac/0xd4) [ 249.271056] [<c01e4dc8>] (iterate_supers) from [<c020c088>] (sys_sync+0x30/0x98) [ 249.278446] [<c020c088>] (sys_sync) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c) I also once hit another problem with patch #6 that doesn't happen with patch #5: [ 12.121767] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 12.129747] pgd = c0004000 [ 12.132425] [00000008] *pgd=00000000 [ 12.135996] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 12.141262] Modules linked in: [ 12.144304] CPU: 0 PID: 126 Comm: mmcqd/0 Tainted: G W 4.10.0-rc3-00113-g5750765 #2739 [ 12.153296] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 12.159367] task: edd19900 task.stack: edd66000 [ 12.163900] PC is at kthread_queue_work+0x18/0x64 [ 12.168574] LR is at _raw_spin_lock_irqsave+0x20/0x28 [ 12.173583] pc : [<c0135b24>] lr : [<c06e6138>] psr: 60000193 [ 12.173583] sp : edd67d10 ip : 00000000 fp : edcc9b04 [ 12.185014] r10: 00000000 r9 : edd6808c r8 : edcc9b08 [ 12.190215] r7 : 00000000 r6 : edc97320 r5 : edc97324 r4 : 00000008 [ 12.196714] r3 : edc97000 r2 : 00000000 r1 : 0b750b74 r0 : a0000113 [ 12.203216] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none [ 12.210406] Control: 10c5387d Table: 6d0e006a DAC: 00000051 [ 12.216125] Process mmcqd/0 (pid: 126, stack limit = 0xedd66210) [ 12.222102] Stack: (0xedd67d10 to 0xedd68000) [ 12.226444] 7d00: edc97000 edd68004 edd6808c edc97000 [ 12.234595] 7d20: 00000000 c0527ab8 edcc9a10 edc97000 edd68004 edd68004 edcc9b08 c0542834 [ 12.242740] 7d40: edd680c0 edcc9a10 00000001 edd680f4 edd68004 c0542b5c edd67da4 edcc9a80 [ 12.250886] 7d60: c0b108aa edcc9af0 edcc9af4 00000000 c0a62244 00000000 c0b02080 00000006 [ 12.259031] 7d80: 00000101 c011f6f0 00000000 c0b02098 00000006 c0a622c8 c0b02080 c011edac [ 12.267176] 7da0: eea15160 00000001 00000000 00000009 ffff8f8d 00208840 eea15100 00000000 [ 12.275322] 7dc0: 0000004b c0a65c20 00000000 00000001 ee818000 edd67e28 00000000 c011f1a8 [ 12.283468] 7de0: 0000008c c016068c f0802000 c0b05724 f080200c 000003eb c0b17c30 f0803000 [ 12.291614] 7e00: edd67e28 c0101470 c03448b8 20000013 ffffffff edd67e5c 00000000 edd66000 [ 12.299759] 7e20: edd68004 c010b00c c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068 [ 12.307904] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78 [ 12.316050] 7e60: c011f068 c03448b8 20000013 ffffffff 00000051 00000000 edd68004 00000001 [ 12.324195] 7e80: 00000000 00000201 edc97000 edd68004 00000001 c011f068 edc97000 c0527b8c [ 12.332340] 7ea0: 00000000 edd68004 edc97000 edd6813c 00000001 c0527d04 edd68044 edc97000 [ 12.340487] 7ec0: 00000000 c0528208 edd68000 edd31800 edd48858 edd48858 ede6fe60 edd48840 [ 12.348631] 7ee0: edd48840 00000001 00000000 c05376c0 00000000 00000001 00000000 00000000 [ 12.356777] 7f00: 00000000 c013c5ec 00000100 ede6fe60 00000000 edd48858 edd48840 edd48840 [ 12.364922] 7f20: edd31800 00000001 00000000 c0538358 edc18b50 edc97000 edd48860 00000001 [ 12.373068] 7f40: edc18b50 edd48858 00000000 ede6fe60 edc18b50 edc97000 edd48860 00000001 [ 12.381214] 7f60: 00000000 c0538868 edd19900 eeae0500 00000000 edd4e000 eeae0528 edd48858 [ 
12.389358] 7f80: edc87d14 c05387d0 00000000 c0135604 edd4e000 c0135508 00000000 00000000 [ 12.397502] 7fa0: 00000000 00000000 00000000 c0107978 00000000 00000000 00000000 00000000 [ 12.405647] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 12.413795] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff [ 12.421985] [<c0135b24>] (kthread_queue_work) from [<c0527ab8>] (mmc_request_done+0xd8/0x158) [ 12.430458] [<c0527ab8>] (mmc_request_done) from [<c0542834>] (dw_mci_request_end+0xa0/0xd8) [ 12.438848] [<c0542834>] (dw_mci_request_end) from [<c0542b5c>] (dw_mci_tasklet_func+0x2f0/0x394) [ 12.447693] [<c0542b5c>] (dw_mci_tasklet_func) from [<c011f6f0>] (tasklet_action+0x84/0x12c) [ 12.456089] [<c011f6f0>] (tasklet_action) from [<c011edac>] (__do_softirq+0xec/0x244) [ 12.463885] [<c011edac>] (__do_softirq) from [<c011f1a8>] (irq_exit+0xc0/0x104) [ 12.471166] [<c011f1a8>] (irq_exit) from [<c016068c>] (__handle_domain_irq+0x70/0xe4) [ 12.478966] [<c016068c>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c) [ 12.487280] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 12.494716] Exception stack(0xedd67e28 to 0xedd67e70) [ 12.499753] 7e20: c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068 [ 12.507902] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78 [ 12.516039] 7e60: c011f068 c03448b8 20000013 ffffffff [ 12.521085] [<c010b00c>] (__irq_svc) from [<c03448b8>] (check_preemption_disabled+0x30/0x128) [ 12.529573] [<c03448b8>] (check_preemption_disabled) from [<c011f068>] (__local_bh_enable_ip+0xc8/0xec) [ 12.538931] [<c011f068>] (__local_bh_enable_ip) from [<c0527b8c>] (__mmc_start_request+0x54/0xdc) [ 12.547770] [<c0527b8c>] (__mmc_start_request) from [<c0527d04>] (mmc_start_request+0xf0/0x11c) [ 12.556437] [<c0527d04>] (mmc_start_request) from [<c0528208>] (mmc_start_areq+0x148/0x1ac) [ 12.564753] [<c0528208>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314) [ 12.573155] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458) [ 12.581733] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180) [ 12.590053] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 12.597603] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 12.604782] Code: e1a06000 e1a04001 e1a00005 eb16c17c (e5943000) [ 12.610842] ---[ end trace 86f45842e4b0b193 ]--- [ 12.615426] Kernel panic - not syncing: Fatal exception in interrupt [ 12.621786] CPU1: stopping [ 12.624455] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 12.633452] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 12.639567] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 12.647261] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 12.654445] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 12.661810] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 12.669344] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 12.676783] Exception stack(0xee8b3f78 to 0xee8b3fc0) [ 12.681813] 3f60: 00000001 00000000 [ 12.689970] 3f80: ee8b3fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 12.698113] 3fa0: 00000000 00000000 00000001 ee8b3fc8 c01083c0 c01083c4 60000013 ffffffff [ 12.706265] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 12.713653] [<c01083c4>] 
(arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 12.721001] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 12.728538] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 12.735463] CPU5: stopping [ 12.738165] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 12.747156] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 12.753291] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 12.760971] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 12.768153] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 12.775515] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 12.783049] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 12.790485] Exception stack(0xee8bbf78 to 0xee8bbfc0) [ 12.795517] bf60: 00000001 00000000 [ 12.803673] bf80: ee8bbfd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 12.811817] bfa0: 00000000 00000000 00000001 ee8bbfc8 c01083c0 c01083c4 60000013 ffffffff [ 12.819968] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 12.827350] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 12.834710] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 12.842239] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 12.849159] CPU4: stopping [ 12.851846] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 12.860840] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 12.866957] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 12.874653] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 12.881835] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 12.889197] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 12.896729] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 12.904168] Exception stack(0xee8b9f78 to 0xee8b9fc0) [ 12.909204] 9f60: 00000001 00000000 [ 12.917356] 9f80: ee8b9fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 12.925499] 9fa0: 00000000 00000000 00000001 ee8b9fc8 c01083c0 c01083c4 60000013 ffffffff [ 12.933655] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 12.941028] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 12.948393] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 12.955923] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 12.962842] CPU2: stopping [ 12.965520] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 12.974517] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 12.980621] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 12.988321] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 12.995508] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 13.002875] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 13.010409] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.017849] Exception stack(0xee8b5f78 to 0xee8b5fc0) [ 13.022878] 5f60: 00000001 00000000 [ 13.031036] 5f80: ee8b5fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 13.039182] 5fa0: 00000000 00000000 00000001 ee8b5fc8 c01083c0 c01083c4 60000013 ffffffff [ 13.047329] [<c010b00c>] (__irq_svc) from [<c01083c4>] 
(arch_cpu_idle+0x30/0x3c) [ 13.054703] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 13.062066] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 13.069600] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 13.076519] CPU3: stopping [ 13.079209] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 13.088207] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 13.094309] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 13.102010] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 13.109197] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 13.116563] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 13.124099] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.131537] Exception stack(0xee8b7f78 to 0xee8b7fc0) [ 13.136566] 7f60: 00000001 00000000 [ 13.144723] 7f80: ee8b7fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 13.152869] 7fa0: 00000000 00000000 00000001 ee8b7fc8 c01083c0 c01083c4 60000013 ffffffff [ 13.161019] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 13.168390] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 13.175754] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 13.183286] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 13.190213] CPU6: stopping [ 13.192912] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 13.201905] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 13.208022] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 13.215716] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 13.222899] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 13.230263] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 13.237796] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.245233] Exception stack(0xee8bdf78 to 0xee8bdfc0) [ 13.250265] df60: 00000001 00000000 [ 13.258422] df80: ee8bdfd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 13.266567] dfa0: 00000000 00000000 00000001 ee8bdfc8 c01083c0 c01083c4 60000013 ffffffff [ 13.274720] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 13.282096] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 13.289459] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 13.296989] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 13.303908] CPU7: stopping [ 13.306603] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 13.315594] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 13.321711] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 13.329407] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 13.336587] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8) [ 13.343950] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c) [ 13.351484] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.358923] Exception stack(0xee8bff78 to 0xee8bffc0) [ 13.363955] ff60: 00000001 00000000 [ 13.372113] ff80: ee8bffd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000 [ 13.380256] ffa0: 00000000 00000000 00000001 ee8bffc8 c01083c0 c01083c4 60000013 ffffffff [ 13.388410] 
[<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c) [ 13.395786] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200) [ 13.403148] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c) [ 13.410678] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c) [ 13.417621] ---[ end Kernel panic - not syncing: Fatal exception in interrupt [ 13.424840] ------------[ cut here ]------------ [ 13.429318] WARNING: CPU: 0 PID: 126 at kernel/workqueue.c:857 wq_worker_waking_up+0x70/0x80 [ 13.437681] Modules linked in: [ 13.440727] CPU: 0 PID: 126 Comm: mmcqd/0 Tainted: G D W 4.10.0-rc3-00113-g5750765 #2739 [ 13.449728] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 13.455823] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14) [ 13.463530] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94) [ 13.470717] [<c032956c>] (dump_stack) from [<c011ad10>] (__warn+0xd4/0x100) [ 13.477650] [<c011ad10>] (__warn) from [<c011ad5c>] (warn_slowpath_null+0x20/0x28) [ 13.485194] [<c011ad5c>] (warn_slowpath_null) from [<c0130e70>] (wq_worker_waking_up+0x70/0x80) [ 13.493873] [<c0130e70>] (wq_worker_waking_up) from [<c013ba30>] (ttwu_do_activate+0x58/0x6c) [ 13.502355] [<c013ba30>] (ttwu_do_activate) from [<c013c4ec>] (try_to_wake_up+0x190/0x290) [ 13.510586] [<c013c4ec>] (try_to_wake_up) from [<c01521dc>] (__wake_up_common+0x4c/0x80) [ 13.518645] [<c01521dc>] (__wake_up_common) from [<c0152224>] (__wake_up_locked+0x14/0x1c) [ 13.526876] [<c0152224>] (__wake_up_locked) from [<c0152c24>] (complete+0x34/0x44) [ 13.534433] [<c0152c24>] (complete) from [<c04fcd34>] (exynos5_i2c_irq+0x220/0x26c) [ 13.542042] [<c04fcd34>] (exynos5_i2c_irq) from [<c0160dac>] (__handle_irq_event_percpu+0x58/0x140) [ 13.551048] [<c0160dac>] (__handle_irq_event_percpu) from [<c0160eb0>] (handle_irq_event_percpu+0x1c/0x58) [ 13.560664] [<c0160eb0>] (handle_irq_event_percpu) from [<c0160f24>] (handle_irq_event+0x38/0x5c) [ 13.569511] [<c0160f24>] (handle_irq_event) from [<c016422c>] (handle_fasteoi_irq+0xc4/0x19c) [ 13.578016] [<c016422c>] (handle_fasteoi_irq) from [<c0160574>] (generic_handle_irq+0x18/0x28) [ 13.586579] [<c0160574>] (generic_handle_irq) from [<c0160688>] (__handle_domain_irq+0x6c/0xe4) [ 13.595239] [<c0160688>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c) [ 13.603556] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.610994] Exception stack(0xedd67b30 to 0xedd67b78) [ 13.616028] 7b20: 00000041 edd19900 00000102 edd66000 [ 13.624180] 7b40: c0b49ae8 00000000 c0881434 00000000 00000000 edd19900 60000193 edcc9b04 [ 13.632321] 7b60: 00000001 edd67b80 c0196974 c0196978 20000113 ffffffff [ 13.638933] [<c010b00c>] (__irq_svc) from [<c0196978>] (panic+0x1e8/0x26c) [ 13.645769] [<c0196978>] (panic) from [<c010a7f8>] (die+0x2b0/0x2e0) [ 13.652099] [<c010a7f8>] (die) from [<c011514c>] (__do_kernel_fault.part.0+0x54/0x1e4) [ 13.659982] [<c011514c>] (__do_kernel_fault.part.0) from [<c0110bec>] (do_page_fault+0x26c/0x294) [ 13.668812] [<c0110bec>] (do_page_fault) from [<c0101308>] (do_DataAbort+0x34/0xb4) [ 13.676432] [<c0101308>] (do_DataAbort) from [<c010af78>] (__dabt_svc+0x58/0x80) [ 13.683783] Exception stack(0xedd67cc0 to 0xedd67d08) [ 13.688825] 7cc0: a0000113 0b750b74 00000000 edc97000 00000008 edc97324 edc97320 00000000 [ 13.696970] 7ce0: edcc9b08 edd6808c 00000000 edcc9b04 00000000 edd67d10 c06e6138 c0135b24 [ 13.705102] 7d00: 60000193 ffffffff [ 13.708586] [<c010af78>] (__dabt_svc) from 
[<c0135b24>] (kthread_queue_work+0x18/0x64) [ 13.716478] [<c0135b24>] (kthread_queue_work) from [<c0527ab8>] (mmc_request_done+0xd8/0x158) [ 13.724970] [<c0527ab8>] (mmc_request_done) from [<c0542834>] (dw_mci_request_end+0xa0/0xd8) [ 13.733373] [<c0542834>] (dw_mci_request_end) from [<c0542b5c>] (dw_mci_tasklet_func+0x2f0/0x394) [ 13.742211] [<c0542b5c>] (dw_mci_tasklet_func) from [<c011f6f0>] (tasklet_action+0x84/0x12c) [ 13.750614] [<c011f6f0>] (tasklet_action) from [<c011edac>] (__do_softirq+0xec/0x244) [ 13.758411] [<c011edac>] (__do_softirq) from [<c011f1a8>] (irq_exit+0xc0/0x104) [ 13.765689] [<c011f1a8>] (irq_exit) from [<c016068c>] (__handle_domain_irq+0x70/0xe4) [ 13.773486] [<c016068c>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c) [ 13.781804] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8) [ 13.789241] Exception stack(0xedd67e28 to 0xedd67e70) [ 13.794279] 7e20: c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068 [ 13.802427] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78 [ 13.810565] 7e60: c011f068 c03448b8 20000013 ffffffff [ 13.815603] [<c010b00c>] (__irq_svc) from [<c03448b8>] (check_preemption_disabled+0x30/0x128) [ 13.824098] [<c03448b8>] (check_preemption_disabled) from [<c011f068>] (__local_bh_enable_ip+0xc8/0xec) [ 13.833457] [<c011f068>] (__local_bh_enable_ip) from [<c0527b8c>] (__mmc_start_request+0x54/0xdc) [ 13.842297] [<c0527b8c>] (__mmc_start_request) from [<c0527d04>] (mmc_start_request+0xf0/0x11c) [ 13.850963] [<c0527d04>] (mmc_start_request) from [<c0528208>] (mmc_start_areq+0x148/0x1ac) [ 13.859278] [<c0528208>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314) [ 13.867680] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458) [ 13.876258] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180) [ 13.884579] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134) [ 13.892121] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c) [ 13.899292] ---[ end trace 86f45842e4b0b194 ]--- Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index c459d80c66bf..0bd9070f5f2e 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1614,182 +1614,181 @@ static void mmc_blk_rw_cmd_abort(struct mmc_card *card, struct request *req) * @mq: the queue with the card and host to restart * @req: a new request that want to be started after the current one */ -static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req) +static void mmc_blk_rw_try_restart(struct mmc_queue *mq) { - if (!req) - return; - - /* - * If the card was removed, just cancel everything and return. - */ - if (mmc_card_removed(mq->card)) { - req->rq_flags |= RQF_QUIET; - blk_end_request_all(req, -EIO); - return; - } - /* Else proceed and try to restart the current async request */ + /* Proceed and try to restart the current async request */ mmc_blk_rw_rq_prep(mq->mqrq_cur, mq->card, 0, mq); - mmc_start_areq(mq->card->host, &mq->mqrq_cur->areq, NULL); + mmc_restart_areq(mq->card->host, &mq->mqrq_cur->areq); } -static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) +void mmc_blk_rw_done(struct mmc_async_req *areq, + enum mmc_blk_status status) { - struct mmc_blk_data *md = mq->blkdata; - struct mmc_card *card = md->queue.card; - struct mmc_blk_request *brq; - int disable_multi = 0, retry = 0, type, retune_retry_done = 0; - enum mmc_blk_status status; + struct mmc_queue *mq; struct mmc_queue_req *mq_rq; + struct mmc_blk_request *brq; + struct mmc_blk_data *md; struct request *old_req; - struct mmc_async_req *new_areq; - struct mmc_async_req *old_areq; + struct mmc_card *card; + struct mmc_host *host; + int disable_multi = 0, retry = 0, type, retune_retry_done = 0; bool req_pending = true; - if (!new_req && !mq->mqrq_prev->req) - return; - - do { - if (new_req) { - /* - * When 4KB native sector is enabled, only 8 blocks - * multiple read or write is allowed - */ - if (mmc_large_sector(card) && - !IS_ALIGNED(blk_rq_sectors(new_req), 8)) { - pr_err("%s: Transfer size is not 4KB sector size aligned\n", - new_req->rq_disk->disk_name); - mmc_blk_rw_cmd_abort(card, new_req); - return; - } - - mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq); - new_areq = &mq->mqrq_cur->areq; - } else - new_areq = NULL; - - old_areq = mmc_start_areq(card->host, new_areq, &status); - if (!old_areq) { - /* - * We have just put the first request into the pipeline - * and there is nothing more to do until it is - * complete. - */ - return; - } - + /* + * An asynchronous request has been completed and we proceed + * to handle the result of it. + */ + mq_rq = container_of(areq, struct mmc_queue_req, areq); + mq = mq_rq->mq; + md = mq->blkdata; + card = mq->card; + host = card->host; + brq = &mq_rq->brq; + old_req = mq_rq->req; + type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; + + mmc_queue_bounce_post(mq_rq); + + switch (status) { + case MMC_BLK_SUCCESS: + case MMC_BLK_PARTIAL: /* - * An asynchronous request has been completed and we proceed - * to handle the result of it. + * A block was successfully transferred. */ - mq_rq = container_of(old_areq, struct mmc_queue_req, areq); - brq = &mq_rq->brq; - old_req = mq_rq->req; - type = rq_data_dir(old_req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; - mmc_queue_bounce_post(mq_rq); - - switch (status) { - case MMC_BLK_SUCCESS: - case MMC_BLK_PARTIAL: - /* - * A block was successfully transferred. 
- */ - mmc_blk_reset_success(md, type); + mmc_blk_reset_success(md, type); - req_pending = blk_end_request(old_req, 0, - brq->data.bytes_xfered); - /* - * If the blk_end_request function returns non-zero even - * though all data has been transferred and no errors - * were returned by the host controller, it's a bug. - */ - if (status == MMC_BLK_SUCCESS && req_pending) { - pr_err("%s BUG rq_tot %d d_xfer %d\n", - __func__, blk_rq_bytes(old_req), - brq->data.bytes_xfered); - mmc_blk_rw_cmd_abort(card, old_req); - return; - } - break; - case MMC_BLK_CMD_ERR: - req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, req_pending); - if (mmc_blk_reset(md, card->host, type)) { - mmc_blk_rw_cmd_abort(card, old_req); - mmc_blk_rw_try_restart(mq, new_req); - return; - } - if (!req_pending) { - mmc_blk_rw_try_restart(mq, new_req); - return; - } - break; - case MMC_BLK_RETRY: - retune_retry_done = brq->retune_retry_done; - if (retry++ < 5) - break; - /* Fall through */ - case MMC_BLK_ABORT: - if (!mmc_blk_reset(md, card->host, type)) - break; + req_pending = blk_end_request(old_req, 0, + brq->data.bytes_xfered); + /* + * If the blk_end_request function returns non-zero even + * though all data has been transferred and no errors + * were returned by the host controller, it's a bug. + */ + if (status == MMC_BLK_SUCCESS && req_pending) { + pr_err("%s BUG rq_tot %d d_xfer %d\n", + __func__, blk_rq_bytes(old_req), + brq->data.bytes_xfered); mmc_blk_rw_cmd_abort(card, old_req); - mmc_blk_rw_try_restart(mq, new_req); return; - case MMC_BLK_DATA_ERR: { - int err; - - err = mmc_blk_reset(md, card->host, type); - if (!err) - break; - if (err == -ENODEV) { - mmc_blk_rw_cmd_abort(card, old_req); - mmc_blk_rw_try_restart(mq, new_req); - return; - } - /* Fall through */ } - case MMC_BLK_ECC_ERR: - if (brq->data.blocks > 1) { - /* Redo read one sector at a time */ - pr_warn("%s: retrying using single block read\n", - old_req->rq_disk->disk_name); - disable_multi = 1; - break; - } - /* - * After an error, we redo I/O one sector at a - * time, so we only reach here after trying to - * read a single sector. 
- */ - req_pending = blk_end_request(old_req, -EIO, - brq->data.blksz); - if (!req_pending) { - mmc_blk_rw_try_restart(mq, new_req); - return; - } - break; - case MMC_BLK_NOMEDIUM: + break; + case MMC_BLK_CMD_ERR: + req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, req_pending); + if (mmc_blk_reset(md, host, type)) { mmc_blk_rw_cmd_abort(card, old_req); - mmc_blk_rw_try_restart(mq, new_req); + mmc_blk_rw_try_restart(mq); return; - default: - pr_err("%s: Unhandled return value (%d)", - old_req->rq_disk->disk_name, status); + } + if (!req_pending) { + mmc_blk_rw_try_restart(mq); + return; + } + break; + case MMC_BLK_RETRY: + retune_retry_done = brq->retune_retry_done; + if (retry++ < 5) + break; + /* Fall through */ + case MMC_BLK_ABORT: + if (!mmc_blk_reset(md, host, type)) + break; + mmc_blk_rw_cmd_abort(card, old_req); + mmc_blk_rw_try_restart(mq); + return; + case MMC_BLK_DATA_ERR: { + int err; + err = mmc_blk_reset(md, host, type); + if (!err) + break; + if (err == -ENODEV) { mmc_blk_rw_cmd_abort(card, old_req); - mmc_blk_rw_try_restart(mq, new_req); + mmc_blk_rw_try_restart(mq); return; } + /* Fall through */ + } + case MMC_BLK_ECC_ERR: + if (brq->data.blocks > 1) { + /* Redo read one sector at a time */ + pr_warn("%s: retrying using single block read\n", + old_req->rq_disk->disk_name); + disable_multi = 1; + break; + } + /* + * After an error, we redo I/O one sector at a + * time, so we only reach here after trying to + * read a single sector. + */ + req_pending = blk_end_request(old_req, -EIO, + brq->data.blksz); + if (!req_pending) { + mmc_blk_rw_try_restart(mq); + return; + } + break; + case MMC_BLK_NOMEDIUM: + mmc_blk_rw_cmd_abort(card, old_req); + mmc_blk_rw_try_restart(mq); + return; + default: + pr_err("%s: Unhandled return value (%d)", + old_req->rq_disk->disk_name, status); + mmc_blk_rw_cmd_abort(card, old_req); + mmc_blk_rw_try_restart(mq); + return; + } - if (req_pending) { - /* - * In case of a incomplete request - * prepare it again and resend. - */ - mmc_blk_rw_rq_prep(mq_rq, card, - disable_multi, mq); - mmc_start_areq(card->host, - &mq_rq->areq, NULL); - mq_rq->brq.retune_retry_done = retune_retry_done; + if (req_pending) { + /* + * In case of a incomplete request + * prepare it again and resend. + */ + mmc_blk_rw_rq_prep(mq_rq, card, + disable_multi, mq); + mq_rq->brq.retune_retry_done = retune_retry_done; + mmc_restart_areq(host, &mq->mqrq_cur->areq); + } +} + +static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) +{ + enum mmc_blk_status status; + struct mmc_async_req *new_areq; + struct mmc_async_req *old_areq; + struct mmc_card *card = mq->card; + + if (!new_req && !mq->mqrq_prev->req) + return; + + if (new_req) { + /* + * When 4KB native sector is enabled, only 8 blocks + * multiple read or write is allowed + */ + if (mmc_large_sector(card) && + !IS_ALIGNED(blk_rq_sectors(new_req), 8)) { + pr_err("%s: Transfer size is not 4KB sector size aligned\n", + new_req->rq_disk->disk_name); + mmc_blk_rw_cmd_abort(card, new_req); + return; } - } while (req_pending); + + mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq); + new_areq = &mq->mqrq_cur->areq; + } else + new_areq = NULL; + + old_areq = mmc_start_areq(card->host, new_areq, &status); + if (!old_areq) { + /* + * We have just put the first request into the pipeline + * and there is nothing more to do until it is + * complete. 
+		 */
+		return;
+	}
+	/* FIXME: yes, we just disregard the old_areq */
 }
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 860ca7c8df86..b4b489911599 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -1,9 +1,12 @@
 #ifndef _MMC_CORE_BLOCK_H
 #define _MMC_CORE_BLOCK_H
 
+struct mmc_async_req;
+enum mmc_blk_status;
 struct mmc_queue;
 struct request;
 
+void mmc_blk_rw_done(struct mmc_async_req *areq, enum mmc_blk_status status);
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 
 #endif
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 4b84f18518ac..34337ef6705e 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -39,6 +39,7 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/mmc.h>
 
+#include "block.h"
 #include "core.h"
 #include "card.h"
 #include "bus.h"
@@ -632,13 +633,25 @@ void mmc_finalize_areq(struct kthread_work *work)
 
 	/* Successfully postprocess the old request at this point */
 	mmc_post_req(host, areq->mrq, 0);
+	mmc_blk_rw_done(areq, status);
 
-	areq->finalization_status = status;
 	complete(&areq->complete);
 }
 EXPORT_SYMBOL(mmc_finalize_areq);
 
 /**
+ * mmc_restart_areq() - restart an asynchronous request
+ * @host: MMC host to restart the command on
+ * @areq: the asynchronous request to restart
+ */
+int mmc_restart_areq(struct mmc_host *host,
+		     struct mmc_async_req *areq)
+{
+	return __mmc_start_data_req(host, areq->mrq);
+}
+EXPORT_SYMBOL(mmc_restart_areq);
+
+/**
  * mmc_start_areq - start an asynchronous request
  * @host: MMC host to start command
  * @areq: asynchronous request to start
@@ -667,12 +680,10 @@ struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
 	mmc_pre_req(host, areq->mrq);
 
 	/* Finalize previous request, if there is one */
-	if (previous) {
+	if (previous)
 		wait_for_completion(&previous->complete);
-		status = previous->finalization_status;
-	} else {
-		status = MMC_BLK_SUCCESS;
-	}
+
+	status = MMC_BLK_SUCCESS;
 
 	if (ret_stat)
 		*ret_stat = status;
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index bc116709c806..ae6837317fe0 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -268,7 +268,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	if (!mq->mqrq)
 		goto blk_cleanup;
 	mq->mqrq_cur = &mq->mqrq[0];
+	mq->mqrq_cur->mq = mq;
 	mq->mqrq_prev = &mq->mqrq[1];
+	mq->mqrq_prev->mq = mq;
 	mq->queue->queuedata = mq;
 
 	blk_queue_prep_rq(mq->queue, mmc_prep_request);
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 39d8e710287e..c18d3f908433 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -34,6 +34,7 @@ struct mmc_queue_req {
 	struct scatterlist *bounce_sg;
 	unsigned int bounce_sg_len;
 	struct mmc_async_req areq;
+	struct mmc_queue *mq;
 };
 
 struct mmc_queue {
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index 5db0fb722c37..55b45dcddee6 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -159,6 +159,7 @@ struct mmc_card;
 struct mmc_async_req;
 
 void mmc_finalize_areq(struct kthread_work *work);
+int mmc_restart_areq(struct mmc_host *host, struct mmc_async_req *areq);
 struct mmc_async_req *mmc_start_areq(struct mmc_host *host,
 				     struct mmc_async_req *areq,
 				     enum mmc_blk_status *ret_stat);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index a7c0ed887391..47d80b8470cd 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -171,7 +171,6 @@ struct mmc_async_req {
 	 */
 	enum mmc_blk_status (*err_check)(struct mmc_card *, struct mmc_async_req *);
 	struct kthread_work finalization_work;
-	enum mmc_blk_status finalization_status;
 	struct completion complete;
 	struct mmc_host *host;
 };
Instead of doing retries at the same time as trying to submit new
requests, do the retries when the request is reported as completed
by the driver, in the finalization worker.

This is achieved by letting the core worker call back into the block
layer using mmc_blk_rw_done(), that will read the status and repeatedly
try to hammer the request using single request etc by calling back to
the core layer using mmc_restart_areq()

The beauty of it is that the completion will not complete until the
block layer has had the opportunity to hammer a bit at the card using
a bunch of different approaches in the while() loop in
mmc_blk_rw_done()

The algorithm for recapture, retry and handle errors is essentially
identical to the one we used to have in mmc_blk_issue_rw_rq(),
only augmented to get called in another path.

We have to add and initialize a pointer back to the struct mmc_queue
from the struct mmc_queue_req to find the queue from the asynchronous
request.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/mmc/core/block.c | 307 +++++++++++++++++++++++------------------------
 drivers/mmc/core/block.h |   3 +
 drivers/mmc/core/core.c  |  23 +++-
 drivers/mmc/core/queue.c |   2 +
 drivers/mmc/core/queue.h |   1 +
 include/linux/mmc/core.h |   1 +
 include/linux/mmc/host.h |   1 -
 7 files changed, 177 insertions(+), 161 deletions(-)
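For tracing the flow the commit message describes, here is a minimal
stand-alone user-space C sketch of the retry-on-completion idea. All
names in it (fake_areq, restart_areq(), rw_done(), finalize_areq()) are
hypothetical stand-ins, not kernel symbols, and the error handling is
reduced to one retry loop; it only illustrates that the finalization
step runs the block-layer-style retries first and signals completion
afterwards.

/*
 * Illustrative model only: a "finalization worker" step that calls
 * into block-layer-style error handling, which may resubmit the same
 * request several times before the completion flag is set.
 */
#include <stdbool.h>
#include <stdio.h>

enum blk_status { BLK_SUCCESS, BLK_RETRY, BLK_ABORT };

struct fake_areq {
	int retries;		/* how many times rw_done() has retried */
	bool completed;		/* set only after error handling is done */
};

/* Models restarting an asynchronous request on the host. */
static enum blk_status restart_areq(struct fake_areq *areq)
{
	printf("restarting request, retry %d\n", areq->retries);
	/* Pretend the retry succeeds on the third attempt. */
	return (areq->retries >= 3) ? BLK_SUCCESS : BLK_RETRY;
}

/* Models block-layer error handling invoked on completion. */
static void rw_done(struct fake_areq *areq, enum blk_status status)
{
	while (status == BLK_RETRY && areq->retries++ < 5)
		status = restart_areq(areq);

	if (status != BLK_SUCCESS)
		printf("giving up, aborting request\n");
}

/* Models the finalization step running in the completion worker. */
static void finalize_areq(struct fake_areq *areq, enum blk_status status)
{
	rw_done(areq, status);		/* hammer at the "card" first */
	areq->completed = true;		/* only now release the waiter */
}

int main(void)
{
	struct fake_areq areq = { 0 };

	finalize_areq(&areq, BLK_RETRY);
	printf("completed: %s\n", areq.completed ? "yes" : "no");
	return 0;
}

In this sketch the waiter is released only after rw_done() has finished
its retries, which is the property the commit message above relies on.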