Message ID | 1398854171-5187-1-git-send-email-ricardo.ribalda@gmail.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Wed, Apr 30, 2014 at 12:36:11PM +0200, Ricardo Ribalda Delgado wrote: > Since "786235eeba0e1e85e5cbbb9f97d1087ad03dfa21 kthread: > make kthread_create() killable" kthread_run can be killed by the user, > ie, by killing modprobe. > - flush_kthread_worker(&master->kworker); > - kthread_stop(master->kworker_task); > + if (!IS_ERR(master->kworker_task)) { > + flush_kthread_worker(&master->kworker); > + kthread_stop(master->kworker_task); > + } How does this fix avoid racing with the task being killed? It will improve things but it doesn't look like a full fix. Please don't include entire backtraces in commit messages, they're typically extremely long and don't really add anything - edited highlights are fine but try to keep it to the point.
Hello Mark Thanks for your comments. This fix does not avoid the task being killed (which is not an error). What it does is that IF the task is killed or we are out of memory we will exit with all the resources properly released and no locks helds, giving the user a chance to reload/rebind the module instead of getting and unstable system that cannot even reboot without a powercycle. The mention of 786235eeba0e1e85e5cbbb9f97d1087ad03dfa2 is because the condition if (IS_ERR(master->kworker_task)) { dev_err(&master->dev, "failed to create message pump task\n"); return -ENOMEM; } was very unlikely to happen/difficult to force . I will reword the message and clean out he backtrace and upload a v2. Thanks! On Thu, May 1, 2014 at 3:34 AM, Mark Brown <broonie@kernel.org> wrote: > On Wed, Apr 30, 2014 at 12:36:11PM +0200, Ricardo Ribalda Delgado wrote: > >> Since "786235eeba0e1e85e5cbbb9f97d1087ad03dfa21 kthread: >> make kthread_create() killable" kthread_run can be killed by the user, >> ie, by killing modprobe. > >> - flush_kthread_worker(&master->kworker); >> - kthread_stop(master->kworker_task); >> + if (!IS_ERR(master->kworker_task)) { >> + flush_kthread_worker(&master->kworker); >> + kthread_stop(master->kworker_task); >> + } > > How does this fix avoid racing with the task being killed? It will > improve things but it doesn't look like a full fix. > > Please don't include entire backtraces in commit messages, they're > typically extremely long and don't really add anything - edited > highlights are fine but try to keep it to the point.
On Thu, May 01, 2014 at 07:53:58AM +0200, Ricardo Ribalda Delgado wrote: > Thanks for your comments. This fix does not avoid the task being > killed (which is not an error). What it does is that IF the task is > killed or we are out of memory we will exit with all the resources > properly released and no locks helds, giving the user a chance to > reload/rebind the module instead of getting and unstable system that > cannot even reboot without a powercycle. How does it ensure that? What's to stop the task being killed after we check to see if the task was killed.
On Thu, May 1, 2014 at 4:11 PM, Mark Brown <broonie@kernel.org> wrote: > On Thu, May 01, 2014 at 07:53:58AM +0200, Ricardo Ribalda Delgado wrote: > >> Thanks for your comments. This fix does not avoid the task being >> killed (which is not an error). What it does is that IF the task is >> killed or we are out of memory we will exit with all the resources >> properly released and no locks helds, giving the user a chance to >> reload/rebind the module instead of getting and unstable system that >> cannot even reboot without a powercycle. > > How does it ensure that? What's to stop the task being killed after we > check to see if the task was killed. master->kworker_task is set like this: master->kworker_task = kthread_run(...) so it just contains the status of the creation of the kthread, not if it was killed, right? Hence we don't check if it was killed. So Ricardo's patch prevents the stopping and destruction of the thread if it failed to be _created_. What if it is killed? I suppose the kthread API handles that internally, as it could happen to any thread (e.g. OOM)? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 01, 2014 at 04:33:58PM +0200, Geert Uytterhoeven wrote: > On Thu, May 1, 2014 at 4:11 PM, Mark Brown <broonie@kernel.org> wrote: > master->kworker_task is set like this: > master->kworker_task = kthread_run(...) > so it just contains the status of the creation of the kthread, not if it was > killed, right? Hence we don't check if it was killed. > So Ricardo's patch prevents the stopping and destruction of the thread > if it failed to be _created_. OK, so that means that the description is very confusing then since it's talking about the issue being due to kthread_run being killed. > What if it is killed? I suppose the kthread API handles that internally, as it > could happen to any thread (e.g. OOM)? I'm not 100% clear on this to be honest. Since I'm at ELC I've not investigated fully yet and I'm mostly going on the patch descriptions here.
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 4eb9bf0..8331556 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1114,8 +1114,10 @@ static int spi_destroy_queue(struct spi_master *master) return ret; } - flush_kthread_worker(&master->kworker); - kthread_stop(master->kworker_task); + if (!IS_ERR(master->kworker_task)) { + flush_kthread_worker(&master->kworker); + kthread_stop(master->kworker_task); + } return 0; }
Since "786235eeba0e1e85e5cbbb9f97d1087ad03dfa21 kthread: make kthread_create() killable" kthread_run can be killed by the user, ie, by killing modprobe. When this happens spi_destroy_queue() fails to cleanout the resources ending up in a hung process that outputs a error message and avoids the computer to be halted. This bug is particulary critical on machines that relay on the spi subsystem to hold a file system (MTD over SPI). [ 95.900328] spi_master spi32765: failed to create message pump task [ 95.900358] spi_master spi32765: problem initializing queue [ 95.912333] qtec_pcie b0030000.pcie_bridge: Axi-pcie bridge supervisor [ 240.463698] INFO: task udevd:129 blocked for more than 120 seconds. [ 240.463724] Not tainted 3.13.0-qtec-standard+ #191 [ 240.463733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 240.463743] udevd D ffff880087815ac0 0 129 118 0x00000006 [ 240.463762] ffff880086593590 0000000000000002 ffff880087815ac0 ffff880086593fd8 [ 240.463778] 0000000000012c00 0000000000012c00 ffff880087815ac0 ffff880086593520 [ 240.463792] ffffffff813debe7 ffffffff810be6df ffffffff81da539c 0000000000000000 [ 240.463807] Call Trace: [ 240.463836] [<ffffffff813debe7>] ? vt_console_print+0x57/0x3e0 [ 240.463852] [<ffffffff810be6df>] ? print_prefix+0x6f/0xb0 [ 240.463867] [<ffffffff810bf45c>] ? wake_up_klogd+0x3c/0x50 [ 240.463880] [<ffffffff810bf67b>] ? console_unlock+0x20b/0x3f0 [ 240.463897] [<ffffffff8178b799>] schedule+0x29/0x70 [ 240.463910] [<ffffffff8178a8d9>] schedule_timeout+0x189/0x240 [ 240.463925] [<ffffffff813fd510>] ? dev_vprintk_emit+0x50/0x60 [ 240.463939] [<ffffffff8178c167>] wait_for_common+0xe7/0x190 [ 240.463954] [<ffffffff810a7780>] ? wake_up_process+0x40/0x40 [ 240.463968] [<ffffffff8178c22d>] wait_for_completion+0x1d/0x20 [ 240.463981] [<ffffffff810996bc>] flush_kthread_worker+0x6c/0x80 [ 240.463995] [<ffffffff8178c2cd>] ? wait_for_completion_killable+0x1d/0x40 [ 240.464007] [<ffffffff81099162>] ? kthread_create_on_node+0xd2/0x190 [ 240.464020] [<ffffffff81099000>] ? kthread_freezable_should_stop+0x80/0x80 [ 240.464036] [<ffffffff8146c637>] spi_destroy_queue+0x27/0x60 [ 240.464050] [<ffffffff8146cc46>] spi_register_master+0x5d6/0x900 [ 240.464072] [<ffffffffa0000713>] spi_bitbang_start+0x83/0x110 [spi_bitbang] [ 240.464089] [<ffffffffa0005960>] xilinx_spi_probe+0x230/0x3aa [spi_xilinx] [ 240.464104] [<ffffffff81403645>] platform_drv_probe+0x45/0xb0 [ 240.464120] [<ffffffff81401052>] ? driver_sysfs_add+0x82/0xb0 [ 240.464134] [<ffffffff81401795>] driver_probe_device+0x125/0x3b0 [ 240.464149] [<ffffffff81401a20>] ? driver_probe_device+0x3b0/0x3b0 [ 240.464163] [<ffffffff81401a5b>] __device_attach+0x3b/0x40 [ 240.464177] [<ffffffff813ff7c3>] bus_for_each_drv+0x63/0xa0 [ 240.464191] [<ffffffff814015f8>] device_attach+0x88/0xa0 [ 240.464205] [<ffffffff81400a08>] bus_probe_device+0x98/0xc0 [ 240.464218] [<ffffffff813fe865>] device_add+0x495/0x600 [ 240.464233] [<ffffffff815d860f>] of_device_add+0x2f/0x40 [ 240.464246] [<ffffffff815d8dab>] of_platform_device_create_pdata+0x6b/0xb0 [ 240.464259] [<ffffffff815d8f03>] of_platform_bus_create+0xf3/0x200 [ 240.464272] [<ffffffff815d7228>] ? __of_match_node+0x48/0xe0 [ 240.464286] [<ffffffff815d8f5e>] of_platform_bus_create+0x14e/0x200 [ 240.464298] [<ffffffff815d6adc>] ? __of_find_property+0x3c/0x80 [ 240.464312] [<ffffffff815d9177>] of_platform_populate+0x47/0x90 [ 240.464334] [<ffffffffa000bcf2>] qt5023_of_populate+0x562/0x980 [qt5023] [ 240.464357] [<ffffffffa000c363>] qt5023_of_probe+0x1d3/0x300 [qt5023] [ 240.464472] [<ffffffffa000a123>] qt5023_pci_probe+0xd3/0x1c0 [qt5023] [ 240.464491] [<ffffffff813682d4>] pci_device_probe+0x84/0xe0 [ 240.464507] [<ffffffff81401795>] driver_probe_device+0x125/0x3b0 [ 240.464522] [<ffffffff81401af3>] __driver_attach+0x93/0xa0 [ 240.464536] [<ffffffff81401a60>] ? __device_attach+0x40/0x40 [ 240.464550] [<ffffffff813ff703>] bus_for_each_dev+0x63/0xa0 [ 240.464585] [<ffffffff8140114e>] driver_attach+0x1e/0x20 [ 240.464640] [<ffffffff81400d30>] bus_add_driver+0x180/0x250 [ 240.464687] [<ffffffffa0010000>] ? 0xffffffffa000ffff [ 240.464738] [<ffffffff81402164>] driver_register+0x64/0xf0 [ 240.464789] [<ffffffffa0010000>] ? 0xffffffffa000ffff [ 240.464845] [<ffffffff81367d9b>] __pci_register_driver+0x4b/0x50 [ 240.464899] [<ffffffffa0010031>] qt5023_init+0x31/0x33 [qt5023] [ 240.464943] [<ffffffff810002f2>] do_one_initcall+0xd2/0x180 [ 240.464960] [<ffffffff8109da88>] ? __blocking_notifier_call_chain+0x58/0x70 [ 240.464976] [<ffffffff810e07c6>] load_module+0x1ad6/0x23a0 [ 240.464992] [<ffffffff810dd5d0>] ? store_uevent+0x40/0x40 [ 240.465010] [<ffffffff810de5ea>] ? copy_module_from_fd.isra.50+0x12a/0x190 [ 240.465024] [<ffffffff810e1226>] SyS_finit_module+0x86/0xb0 [ 240.465041] [<ffffffff81797049>] tracesys+0xd0/0xd5 [ 360.540806] INFO: task udevd:129 blocked for more than 120 seconds. [ 360.540832] Not tainted 3.13.0-qtec-standard+ #191 [ 360.540841] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 360.540851] udevd D ffff880087815ac0 0 129 118 0x00000006 [ 360.540871] ffff880086593590 0000000000000002 ffff880087815ac0 ffff880086593fd8 [ 360.540886] 0000000000012c00 0000000000012c00 ffff880087815ac0 ffff880086593520 [ 360.540900] ffffffff813debe7 ffffffff810be6df ffffffff81da539c 0000000000000000 [ 360.540915] Call Trace: [ 360.540943] [<ffffffff813debe7>] ? vt_console_print+0x57/0x3e0 [ 360.540959] [<ffffffff810be6df>] ? print_prefix+0x6f/0xb0 [ 360.540974] [<ffffffff810bf45c>] ? wake_up_klogd+0x3c/0x50 [ 360.540988] [<ffffffff810bf67b>] ? console_unlock+0x20b/0x3f0 [ 360.541004] [<ffffffff8178b799>] schedule+0x29/0x70 [ 360.541017] [<ffffffff8178a8d9>] schedule_timeout+0x189/0x240 [ 360.541032] [<ffffffff813fd510>] ? dev_vprintk_emit+0x50/0x60 [ 360.541046] [<ffffffff8178c167>] wait_for_common+0xe7/0x190 [ 360.541061] [<ffffffff810a7780>] ? wake_up_process+0x40/0x40 [ 360.541075] [<ffffffff8178c22d>] wait_for_completion+0x1d/0x20 [ 360.541089] [<ffffffff810996bc>] flush_kthread_worker+0x6c/0x80 [ 360.541102] [<ffffffff8178c2cd>] ? wait_for_completion_killable+0x1d/0x40 [ 360.541115] [<ffffffff81099162>] ? kthread_create_on_node+0xd2/0x190 [ 360.541127] [<ffffffff81099000>] ? kthread_freezable_should_stop+0x80/0x80 [ 360.541144] [<ffffffff8146c637>] spi_destroy_queue+0x27/0x60 [ 360.541157] [<ffffffff8146cc46>] spi_register_master+0x5d6/0x900 [ 360.541179] [<ffffffffa0000713>] spi_bitbang_start+0x83/0x110 [spi_bitbang] [ 360.541196] [<ffffffffa0005960>] xilinx_spi_probe+0x230/0x3aa [spi_xilinx] [ 360.541211] [<ffffffff81403645>] platform_drv_probe+0x45/0xb0 [ 360.541226] [<ffffffff81401052>] ? driver_sysfs_add+0x82/0xb0 [ 360.541240] [<ffffffff81401795>] driver_probe_device+0x125/0x3b0 [ 360.541255] [<ffffffff81401a20>] ? driver_probe_device+0x3b0/0x3b0 [ 360.541269] [<ffffffff81401a5b>] __device_attach+0x3b/0x40 [ 360.541283] [<ffffffff813ff7c3>] bus_for_each_drv+0x63/0xa0 [ 360.541297] [<ffffffff814015f8>] device_attach+0x88/0xa0 [ 360.541311] [<ffffffff81400a08>] bus_probe_device+0x98/0xc0 [ 360.541324] [<ffffffff813fe865>] device_add+0x495/0x600 [ 360.541339] [<ffffffff815d860f>] of_device_add+0x2f/0x40 [ 360.541352] [<ffffffff815d8dab>] of_platform_device_create_pdata+0x6b/0xb0 [ 360.541366] [<ffffffff815d8f03>] of_platform_bus_create+0xf3/0x200 [ 360.541378] [<ffffffff815d7228>] ? __of_match_node+0x48/0xe0 [ 360.541392] [<ffffffff815d8f5e>] of_platform_bus_create+0x14e/0x200 [ 360.541404] [<ffffffff815d6adc>] ? __of_find_property+0x3c/0x80 [ 360.541418] [<ffffffff815d9177>] of_platform_populate+0x47/0x90 [ 360.541441] [<ffffffffa000bcf2>] qt5023_of_populate+0x562/0x980 [qt5023] [ 360.541463] [<ffffffffa000c363>] qt5023_of_probe+0x1d3/0x300 [qt5023] [ 360.541484] [<ffffffffa000a123>] qt5023_pci_probe+0xd3/0x1c0 [qt5023] [ 360.541502] [<ffffffff813682d4>] pci_device_probe+0x84/0xe0 [ 360.541517] [<ffffffff81401795>] driver_probe_device+0x125/0x3b0 [ 360.541620] [<ffffffff81401af3>] __driver_attach+0x93/0xa0 [ 360.541636] [<ffffffff81401a60>] ? __device_attach+0x40/0x40 [ 360.541650] [<ffffffff813ff703>] bus_for_each_dev+0x63/0xa0 [ 360.541665] [<ffffffff8140114e>] driver_attach+0x1e/0x20 [ 360.541680] [<ffffffff81400d30>] bus_add_driver+0x180/0x250 [ 360.541748] [<ffffffffa0010000>] ? 0xffffffffa000ffff [ 360.541793] [<ffffffff81402164>] driver_register+0x64/0xf0 [ 360.541845] [<ffffffffa0010000>] ? 0xffffffffa000ffff [ 360.541895] [<ffffffff81367d9b>] __pci_register_driver+0x4b/0x50 [ 360.541954] [<ffffffffa0010031>] qt5023_init+0x31/0x33 [qt5023] [ 360.542007] [<ffffffff810002f2>] do_one_initcall+0xd2/0x180 [ 360.542062] [<ffffffff8109da88>] ? __blocking_notifier_call_chain+0x58/0x70 [ 360.542085] [<ffffffff810e07c6>] load_module+0x1ad6/0x23a0 [ 360.542101] [<ffffffff810dd5d0>] ? store_uevent+0x40/0x40 [ 360.542119] [<ffffffff810de5ea>] ? copy_module_from_fd.isra.50+0x12a/0x190 [ 360.542133] [<ffffffff810e1226>] SyS_finit_module+0x86/0xb0 [ 360.542150] [<ffffffff81797049>] tracesys+0xd0/0xd5 Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> --- drivers/spi/spi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)