Message ID | 20131001203520.GA8248@p100.box (mailing list archive) |
---|---|
State | Not Applicable, archived |
Hello,

On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
> print_worker_info() includes no validity check on the pwq and wq
> pointers before handing them over to the probe_kernel_read() functions.
>
> It seems that most architectures don't care about that, but at least on
> the parisc architecture this leads to a kernel crash since accesses to
> page zero are protected by the kernel for security reasons.
>
> Fix this problem by verifying the contents of pwq and wq before usage.
> Even if probe_kernel_read() usually prevents such crashes by disabling
> page faults, clean code should always include such checks.
>
> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
> crash the Linux kernel on the parisc architecture.

Hmm... um had similar problem but the root cause here is that the arch
isn't implementing probe_kernel_read() properly. We really have no
idea what the pointer value may be at the dump point and that's why we
use probe_kernel_read(). If something like the above is necessary for
the time being, the correct place would be the arch
probe_kernel_read() implementation. James, would it be difficult
implement proper probe_kernel_read() on parisc?

Thanks.
On 10/01/2013 10:43 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
>> print_worker_info() includes no validity check on the pwq and wq
>> pointers before handing them over to the probe_kernel_read() functions.
>>
>> It seems that most architectures don't care about that, but at least on
>> the parisc architecture this leads to a kernel crash since accesses to
>> page zero are protected by the kernel for security reasons.
>>
>> Fix this problem by verifying the contents of pwq and wq before usage.
>> Even if probe_kernel_read() usually prevents such crashes by disabling
>> page faults, clean code should always include such checks.
>>
>> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
>> crash the Linux kernel on the parisc architecture.
>
> Hmm... um had similar problem but the root cause here is that the arch
> isn't implementing probe_kernel_read() properly. We really have no
> idea what the pointer value may be at the dump point and that's why we
> use probe_kernel_read(). If something like the above is necessary for
> the time being, the correct place would be the arch
> probe_kernel_read() implementation. James, would it be difficult
> implement proper probe_kernel_read() on parisc?

No, it's not really complicated. That was my initial way to work around that problem.
But is this really necessary? Isn't a pointer which points to mem zero most likely wrong on any architecture?

In addition I wrote another patch to work around that problem in the parisc page fault handler (which is needed anyway) too:
https://patchwork.kernel.org/patch/2971701/

So, in summary my patch here is not really necessary, but for the sake of
clean code I think it doesn't hurt either and as such it would be nice if
you could apply it.

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
> So, in summary my patch here is not really necessary, but for the sake of
> clean code I think it doesn't hurt either and as such it would be nice if
> you could apply it.

What? function *must* take any value and try to access it and not
cause failure. That's the *whole* purpose of that interface. How is
having incomplete spurious checks around it "clean code" in any sense
of the word? That doesn't make any sense.

Nacked-by: Tejun Heo <tj@kernel.org>

and *please* don't add any checks like that anywhere else in the
kernel.

Thanks.
On Tue, Oct 01, 2013 at 05:03:48PM -0400, Tejun Heo wrote:
> On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
> > So, in summary my patch here is not really necessary, but for the sake of
> > clean code I think it doesn't hurt either and as such it would be nice if
> > you could apply it.
>
> What? function *must* take any value and try to access it and not
> cause failure. That's the *whole* purpose of that interface. How is
> having incomplete spurious checks around it "clean code" in any sense
> of the word? That doesn't make any sense.

Just in case you didn't know already. probe_kernel_read()'s role is
to take any ulong value and dereference it if it can. If not, it can
return any value, but it shouldn't crash in any case. If you're just
adding NULL test in probe_kernel_read(), you're just masking a common
failure pattern and the kernel still *will* panic while dumping the
states. If a specific arch doesn't have proper probe_kernel_read()
implementation, adding if (!NULL) test there could be a temporary
workaround, but it should be clearly marked as such.
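The contract described in this message (accept any value, copy if possible, never crash) can be illustrated outside the kernel. The following is only a user-space sketch, not the kernel's implementation: `probe_read()` and `probe_read_demo()` are hypothetical names, and the helper relies on the assumption that a Linux/POSIX `write()` to a pipe reports an unreadable source buffer with `EFAULT` instead of faulting the caller. It also assumes `len` fits into the pipe buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical user-space analogue of probe_kernel_read(): copy len bytes
 * from src to dst. Returns 0 on success and a negative errno value if src
 * cannot be read. The kernel validates src for us when we write() it into
 * a pipe, so a bad pointer produces an error code rather than a fault.
 */
static int probe_read(void *dst, const void *src, size_t len)
{
	int fds[2];
	int ret = -EFAULT;
	ssize_t n;

	if (pipe(fds) != 0)
		return -errno;

	n = write(fds[1], src, len);	/* fails with EFAULT on a bad src */
	if (n == (ssize_t)len && read(fds[0], dst, len) == (ssize_t)len)
		ret = 0;

	close(fds[0]);
	close(fds[1]);
	return ret;
}

/* Exercise the helper with one good pointer and several bad ones. */
static void probe_read_demo(void)
{
	long value = 42, out = 0;

	assert(probe_read(&out, &value, sizeof(out)) == 0 && out == 42);

	/* NULL is only one of many invalid values; none of them may crash. */
	assert(probe_read(&out, NULL, sizeof(out)) == -EFAULT);
	assert(probe_read(&out, (const void *)1, sizeof(out)) == -EFAULT);
	assert(probe_read(&out, (const void *)UINTPTR_MAX, sizeof(out)) == -EFAULT);
}
```

Note that the caller never tests the pointer itself; all three invalid values take the same error path as NULL, which is the point made in the message.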
On Tue, 2013-10-01 at 16:43 -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
> > print_worker_info() includes no validity check on the pwq and wq
> > pointers before handing them over to the probe_kernel_read() functions.
> >
> > It seems that most architectures don't care about that, but at least on
> > the parisc architecture this leads to a kernel crash since accesses to
> > page zero are protected by the kernel for security reasons.
> >
> > Fix this problem by verifying the contents of pwq and wq before usage.
> > Even if probe_kernel_read() usually prevents such crashes by disabling
> > page faults, clean code should always include such checks.
> >
> > Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
> > crash the Linux kernel on the parisc architecture.
>
> Hmm... um had similar problem but the root cause here is that the arch
> isn't implementing probe_kernel_read() properly. We really have no
> idea what the pointer value may be at the dump point and that's why we
> use probe_kernel_read(). If something like the above is necessary for
> the time being, the correct place would be the arch
> probe_kernel_read() implementation. James, would it be difficult
> implement proper probe_kernel_read() on parisc?

The problem seems to be that some traps bypass our exception table
handling. Helge, do you have the actual stack trace for this? That
should show where the exception handling is missing.

Thanks,

James
On 10/01/2013 11:40 PM, James Bottomley wrote:
> On Tue, 2013-10-01 at 16:43 -0400, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
>>> print_worker_info() includes no validity check on the pwq and wq
>>> pointers before handing them over to the probe_kernel_read() functions.
>>>
>>> It seems that most architectures don't care about that, but at least on
>>> the parisc architecture this leads to a kernel crash since accesses to
>>> page zero are protected by the kernel for security reasons.
>>>
>>> Fix this problem by verifying the contents of pwq and wq before usage.
>>> Even if probe_kernel_read() usually prevents such crashes by disabling
>>> page faults, clean code should always include such checks.
>>>
>>> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
>>> crash the Linux kernel on the parisc architecture.
>>
>> Hmm... um had similar problem but the root cause here is that the arch
>> isn't implementing probe_kernel_read() properly. We really have no
>> idea what the pointer value may be at the dump point and that's why we
>> use probe_kernel_read(). If something like the above is necessary for
>> the time being, the correct place would be the arch
>> probe_kernel_read() implementation. James, would it be difficult
>> implement proper probe_kernel_read() on parisc?
>
> The problem seems to be that some traps bypass our exception table
> handling.

Yes, that's correct.
It's trap #26 and we directly call parisc_terminate() for fault_space==0
without checking the exception table.
See my patch I posted a few hours ago which fixes this:
https://patchwork.kernel.org/patch/2971701/

> Helge, do you have the actual stack trace for this? That
> should show where the exception handling is missing.
Here it is:

[47072.976000] ksoftirqd/0 R running task 0 3 2 0x00000000
[47072.976000] Backtrace:
[47072.976000] [<0000000040113a54>] __schedule+0x62c/0x808
[47072.976000]
[47072.976000] kworker/0:0H S 00000000401040c0 0 5 2 0x00000000
[47073.468000] Backtrace:
[47073.468000] [<0000000040464264>] pa_memcpy+0x44/0xb0
[47073.468000] [<00000000404643e0>] __copy_from_user+0x60/0x90
[47073.468000] [<00000000401d99bc>] __probe_kernel_read+0x54/0x90
[47073.468000] [<000000004016cc70>] print_worker_info+0x158/0x2c0
[47073.468000] [<0000000040185a60>] sched_show_task+0x1c8/0x210
[47073.468000] [<0000000040185b64>] show_state_filter+0xbc/0x138
[47073.468000] [<00000000404e85c4>] sysrq_handle_showstate+0x34/0x48
[47073.468000] [<00000000404e9154>] __handle_sysrq+0x174/0x2f0
[47073.468000] [<00000000404e933c>] write_sysrq_trigger+0x6c/0x90
[47073.468000] [<00000000402ca2fc>] proc_reg_write+0xbc/0x130
[47073.468000] [<0000000040236d44>] vfs_write+0x114/0x268
[47073.468000] [<00000000402373a4>] SyS_write+0x94/0xf8
[47073.468000] [<0000000040105fc0>] syscall_exit+0x0/0x14
[47073.468000]
[47073.468000]
[47073.468000] Kernel Fault: Code=26 regs=00000000958a09b0 (Addr=0000000000000008)
[47073.468000] CPU: 0 PID: 30189 Comm: bash Not tainted 3.12.0-rc3-64bit+ #1
[47073.468000] task: 000000007ba64100 ti: 00000000958a0000 task.ti: 00000000958a0000
[47073.468000]
[47073.468000] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[47073.468000] PSW: 00001000000001001111111100001110 Not tainted
[47073.468000] r00-03 000000ff0804ff0e 00000000958a08c0 0000000040464264 00000000958a0960
[47073.468000] r04-07 0000000040d73db0 0000000000000008 0000000000000008 00000000958a06f8
[47073.468000] r08-11 00000000958a0600 0000000040c49d18 00000000af535494 00000000958a0370
[47073.468000] r12-15 0000000000000000 0000000000000000 000000000010e7e8 00000000000fde28
[47073.468000] r16-19 0000000000000000 00000000000c7800 0000000000000000 0000000000000000
[47073.468000] r20-23 00000000958a06e0 0000000000000018 0000000000000018 0000000000000003
[47073.468000] r24-27 0000000000000008 0000000000000008 00000000958a06f8 0000000040d73db0
[47073.468000] r28-31 00000000958a06f8 00000000958a0930 00000000958a09b0 0000000000000008
[47073.468000] sr00-03 0000000005dc5000 0000000000000000 0000000000000000 0000000005dc5000
[47073.468000] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[47073.468000]
[47073.468000] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040463fdc 0000000040463fe0
[47073.468000] IIR: 0fe25033 ISR: 0000000000000000 IOR: 0000000000000008
[47073.468000] CPU: 0 CR30: 00000000958a0000 CR31: 0000000011111111
[47073.468000] ORIG_R28: 00000000958a0b40
[47073.468000] IAOQ[0]: pa_memcpy_internal+0xec/0x2b4
[47073.468000] IAOQ[1]: pa_memcpy_internal+0xf0/0x2b4
[47073.468000] RP(r2): pa_memcpy+0x44/0xb0
[47073.468000] Backtrace:
[47073.468000] [<0000000040464264>] pa_memcpy+0x44/0xb0
[47073.468000] [<00000000404643e0>] __copy_from_user+0x60/0x90
[47073.468000] [<00000000401d99bc>] __probe_kernel_read+0x54/0x90
[47073.468000] [<000000004016cc70>] print_worker_info+0x158/0x2c0
[47073.468000] [<0000000040185a60>] sched_show_task+0x1c8/0x210
[47073.468000] [<0000000040185b64>] show_state_filter+0xbc/0x138
[47073.468000] [<00000000404e85c4>] sysrq_handle_showstate+0x34/0x48
[47073.468000] [<00000000404e9154>] __handle_sysrq+0x174/0x2f0
[47073.468000] [<00000000404e933c>] write_sysrq_trigger+0x6c/0x90
[47073.468000] [<00000000402ca2fc>] proc_reg_write+0xbc/0x130
[47073.468000] [<0000000040236d44>] vfs_write+0x114/0x268
[47073.468000] [<00000000402373a4>] SyS_write+0x94/0xf8
[47073.468000] [<0000000040105fc0>] syscall_exit+0x0/0x14
[47073.468000]
[47073.468000] Kernel panic - not syncing: Kernel Fault
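The diagnosis in this message is that the trap handler terminates without first consulting the exception table. As a rough illustration of that ordering, here is a simplified, self-contained C model. It is not the parisc code: `struct ex_entry`, the table contents, `find_fixup()` and `handle_kernel_fault()` are all hypothetical stand-ins, and the addresses are made up.

```c
#include <assert.h>
#include <stddef.h>

/* One entry: an instruction that may fault, and where to resume if it does. */
struct ex_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Hypothetical table with invented addresses, for illustration only. */
static const struct ex_entry ex_table[] = {
	{ 0x1000, 0x1f00 },
	{ 0x2040, 0x2f00 },
};

/* Return the fixup address for a faulting instruction, or 0 if none exists. */
static unsigned long find_fixup(unsigned long fault_ip)
{
	for (size_t i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++)
		if (ex_table[i].insn == fault_ip)
			return ex_table[i].fixup;
	return 0;
}

enum fault_action { FAULT_RESUME_AT_FIXUP, FAULT_TERMINATE };

/*
 * Model of the handler's decision: look up the faulting instruction first,
 * and treat the fault as fatal only when no fixup is registered for it.
 * On FAULT_RESUME_AT_FIXUP, *resume_ip holds the address to continue from.
 */
static enum fault_action handle_kernel_fault(unsigned long fault_ip,
					      unsigned long *resume_ip)
{
	unsigned long fixup;

	assert(resume_ip != NULL);
	fixup = find_fixup(fault_ip);
	if (fixup) {
		*resume_ip = fixup;
		return FAULT_RESUME_AT_FIXUP;
	}
	return FAULT_TERMINATE;
}
```

In this model, a handler that skipped `find_fixup()` would return `FAULT_TERMINATE` for every fault, which corresponds to the panic shown in the trace.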
On 10/01/2013 11:07 PM, Tejun Heo wrote:
> On Tue, Oct 01, 2013 at 05:03:48PM -0400, Tejun Heo wrote:
>> On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
>>> So, in summary my patch here is not really necessary, but for the sake of
>>> clean code I think it doesn't hurt either and as such it would be nice if
>>> you could apply it.
>>
>> What? function *must* take any value and try to access it and not
>> cause failure. That's the *whole* purpose of that interface. How is
>> having incomplete spurious checks around it "clean code" in any sense
>> of the word? That doesn't make any sense.
>
> Just in case you didn't know already. probe_kernel_read()'s role is
> to take any ulong value and dereference it if it can. If not, it can
> return any value, but it shouldn't crash in any case. If you're just
> adding NULL test in probe_kernel_read(), you're just masking a common
> failure pattern and the kernel still *will* panic while dumping the
> states. If a specific arch doesn't have proper probe_kernel_read()
> implementation, adding if (!NULL) test there could be a temporary
> workaround, but it should be clearly marked as such.

Sure, probe_kernel_read() takes care that no segfaults will happen.
Nevertheless, if we know that "pwq" might become NULL, why access pwq->wq at all?

	struct pool_workqueue *pwq = NULL;
	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));

If you wouldn't have used probe_kernel_read() you would never code it
like that. That's what I meant when I wrote "clean coding" (aka "similar
to what you would have done without probe_kernel_read()").

Helge
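One detail behind the snippet in this message: `&pwq->wq` does not read through `pwq`, it only computes an address, namely the pointer value plus the offset of the member. With `pwq == NULL` the probed address is therefore a small non-zero number rather than 0. The sketch below shows that arithmetic with a hypothetical two-pointer struct (`struct demo_pwq`, which is not the kernel's `struct pool_workqueue`). The `Addr=0000000000000008` in the fault report earlier in the thread would be consistent with such an offset on a 64-bit machine, but the real structure layout is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: two pointer-sized members, the second named wq. */
struct demo_pwq {
	void *pool;
	void *wq;
};

/*
 * Address that &pwq->wq evaluates to for a given pointer value, computed
 * with integer arithmetic so that no invalid pointer is ever formed.
 */
static uintptr_t wq_member_address(uintptr_t pwq_value)
{
	/* Two same-typed members in a row: wq sits one pointer past the start. */
	assert(offsetof(struct demo_pwq, wq) == sizeof(void *));
	return pwq_value + offsetof(struct demo_pwq, wq);
}
```

Calling `wq_member_address(0)` yields `sizeof(void *)`, so a fault-tolerant read has to cope with low addresses in general, not only with the value 0.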
Hello,

On Wed, Oct 02, 2013 at 12:34:53AM +0200, Helge Deller wrote:
> Sure, probe_kernel_read() takes care that no segfaults will happen.
> Nevertheless, if we know that "pwq" might become NULL, why access pwq->wq at all?
> struct pool_workqueue *pwq = NULL;
> probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
>
> If you wouldn't have used probe_kernel_read() you would never code it
> like that. That's what I meant when I wrote "clean coding" (aka "similar
> to what you would have done without probe_kernel_read()").

Because it is using probe_kernel_read() and such test wouldn't mean
anything? It may be NULL, it may be 1 or full Fs. NULL is just one
of many illegal pointers which may happen. Why add code which doesn't
achieve anything when you're explicitly trying to access pointers
which you know could be invalid? Why is that "clean"? Is "if (p)
kfree(p)" cleaner than "kfree(p)"?

Thanks.
On Tue, Oct 01, 2013 at 06:40:23PM -0400, Tejun Heo wrote:
> Because it is using probe_kernel_read() and such test wouldn't mean
> anything? It may be NULL, it may be 1 or full Fs. NULL is just one
> of many illegal pointers which may happen. Why add code which doesn't
> achieve anything when you're explicitly trying to access pointers
> which you know could be invalid? Why is that "clean"? Is "if (p)
> kfree(p)" cleaner than "kfree(p)"?

Here's one general rule of thumb for "cleanliness" - try to do the
minimal because that's something many people can agree on. If people
do stuff which aren't necessary, naturally different people would have
different opinions on what's cleaner / better and inevitably end up
with different choices. As the choices made are functionally
superfluous, none would fail, and we'll end up with various variants
for the same thing for no good reason, which is messy.

Adding if (p) in front of probe_kernel_read(p) is inherently
superfluous and you wouldn't have any way to enforce or even encourage
such practice and the end result would inevitably be if (p) being
sprayed randomly, which is the opposite of cleanliness.

So, no, please don't add random tests which aren't essential. It is
inherently messy thing to do.

Thanks.
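The `kfree()` comparison has a direct counterpart in standard C, where `free(NULL)` is defined to do nothing, so a guarding `if (p)` adds a branch without changing behaviour. A minimal sketch of the unguarded style follows; `struct worker_desc` and the helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct worker_desc {
	char *name;	/* may legitimately be NULL */
};

/* Release the name unconditionally: free(NULL) is a defined no-op in ISO C. */
static void worker_desc_reset(struct worker_desc *d)
{
	free(d->name);
	d->name = NULL;
}

/* Replace the name with a heap copy of s; returns 0 on success, -1 if malloc fails. */
static int worker_desc_set(struct worker_desc *d, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, s, len);
	worker_desc_reset(d);	/* no "if (d->name)" needed */
	d->name = copy;
	return 0;
}

/* Usage: resetting an empty descriptor is harmless, with or without a name set. */
static void worker_desc_demo(void)
{
	struct worker_desc d = { NULL };

	worker_desc_reset(&d);	/* free(NULL): nothing happens */
	assert(worker_desc_set(&d, "events") == 0);
	assert(strcmp(d.name, "events") == 0);
	worker_desc_reset(&d);
	assert(d.name == NULL);
}
```

Because the callee already accepts every input the guard would filter, there is a single way to write the call, which is the consistency argument made in the message.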
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d..c03b47f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4512,8 +4512,10 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
 	 */
 	probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
 	probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
-	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
-	probe_kernel_read(name, wq->name, sizeof(name) - 1);
+	if (pwq)
+		probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
+	if (wq)
+		probe_kernel_read(name, wq->name, sizeof(name) - 1);
 
 	/* copy worker description */
 	probe_kernel_read(&desc_valid, &worker->desc_valid, sizeof(desc_valid));
print_worker_info() includes no validity check on the pwq and wq
pointers before handing them over to the probe_kernel_read() functions.

It seems that most architectures don't care about that, but at least on
the parisc architecture this leads to a kernel crash since accesses to
page zero are protected by the kernel for security reasons.

Fix this problem by verifying the contents of pwq and wq before usage.
Even if probe_kernel_read() usually prevents such crashes by disabling
page faults, clean code should always include such checks.

Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
crash the Linux kernel on the parisc architecture.

CC: Tejun Heo <tj@kernel.org>
CC: Libin <huawei.libin@huawei.com>
CC: linux-parisc@vger.kernel.org
CC: James.Bottomley@HansenPartnership.com
Signed-off-by: Helge Deller <deller@gmx.de>