[workqueue] check values of pwq and wq in print_worker_info() before use

Message ID 20131001203520.GA8248@p100.box (mailing list archive)
State Not Applicable, archived

Commit Message

Helge Deller Oct. 1, 2013, 8:35 p.m. UTC
print_worker_info() includes no validity check on the pwq and wq
pointers before handing them over to the probe_kernel_read() functions.

It seems that most architectures don't care about that, but at least on
the parisc architecture this leads to a kernel crash since accesses to
page zero are protected by the kernel for security reasons.

Fix this problem by verifying the contents of pwq and wq before usage.
Even if probe_kernel_read() usually prevents such crashes by disabling
page faults, clean code should always include such checks. 

Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
crash the Linux kernel on the parisc architecture.

CC: Tejun Heo <tj@kernel.org>
CC: Libin <huawei.libin@huawei.com>
CC: linux-parisc@vger.kernel.org
CC: James.Bottomley@HansenPartnership.com
Signed-off-by: Helge Deller <deller@gmx.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tejun Heo Oct. 1, 2013, 8:43 p.m. UTC | #1
Hello,

On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
> print_worker_info() includes no validity check on the pwq and wq
> pointers before handing them over to the probe_kernel_read() functions.
> 
> It seems that most architectures don't care about that, but at least on
> the parisc architecture this leads to a kernel crash since accesses to
> page zero are protected by the kernel for security reasons.
> 
> Fix this problem by verifying the contents of pwq and wq before usage.
> Even if probe_kernel_read() usually prevents such crashes by disabling
> page faults, clean code should always include such checks. 
> 
> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
> crash the Linux kernel on the parisc architecture.

Hmm... um had similar problem but the root cause here is that the arch
isn't implementing probe_kernel_read() properly.  We really have no
idea what the pointer value may be at the dump point and that's why we
use probe_kernel_read().  If something like the above is necessary for
the time being, the correct place would be the arch
probe_kernel_read() implementation.  James, would it be difficult to
implement proper probe_kernel_read() on parisc?

Thanks.
Helge Deller Oct. 1, 2013, 8:53 p.m. UTC | #2
On 10/01/2013 10:43 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
>> print_worker_info() includes no validity check on the pwq and wq
>> pointers before handing them over to the probe_kernel_read() functions.
>>
>> It seems that most architectures don't care about that, but at least on
>> the parisc architecture this leads to a kernel crash since accesses to
>> page zero are protected by the kernel for security reasons.
>>
>> Fix this problem by verifying the contents of pwq and wq before usage.
>> Even if probe_kernel_read() usually prevents such crashes by disabling
>> page faults, clean code should always include such checks. 
>>
>> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
>> crash the Linux kernel on the parisc architecture.
> 
> Hmm... um had similar problem but the root cause here is that the arch
> isn't implementing probe_kernel_read() properly.  We really have no
> idea what the pointer value may be at the dump point and that's why we
> use probe_kernel_read().  If something like the above is necessary for
> the time being, the correct place would be the arch
> probe_kernel_read() implementation.  James, would it be difficult to
> implement proper probe_kernel_read() on parisc?

No, it's not really complicated.
That was my initial way to work around that problem.

But is this really necessary? Isn't a pointer which points to mem zero most
likely wrong on any architecture?

In addition I wrote another patch to work around that problem in the parisc
page fault handler (which is needed anyway) too:
https://patchwork.kernel.org/patch/2971701/

So, in summary my patch here is not really necessary, but for the sake of
clean code I think it doesn't hurt either and as such it would be nice if
you could apply it.

Helge
Tejun Heo Oct. 1, 2013, 9:03 p.m. UTC | #3
On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
> So, in summary my patch here is not really necessary, but for the sake of
> clean code I think it doesn't hurt either and as such it would be nice if
> you could apply it.

What? function *must* take any value and try to access it and not
cause failure.  That's the *whole* purpose of that interface.  How is
having incomplete spurious checks around it "clean code" in any sense
of the word?  That doesn't make any sense.

 Nacked-by: Tejun Heo <tj@kernel.org>

and *please* don't add any checks like that anywhere else in the
kernel.

Thanks.
Tejun Heo Oct. 1, 2013, 9:07 p.m. UTC | #4
On Tue, Oct 01, 2013 at 05:03:48PM -0400, Tejun Heo wrote:
> On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
> > So, in summary my patch here is not really necessary, but for the sake of
> > clean code I think it doesn't hurt either and as such it would be nice if
> > you could apply it.
> 
> What? function *must* take any value and try to access it and not
> cause failure.  That's the *whole* purpose of that interface.  How is
> having incomplete spurious checks around it "clean code" in any sense
> of the word?  That doesn't make any sense.

Just in case you didn't know already.  probe_kernel_read()'s role is
to take any ulong value and dereference it if it can.  If not, it can
return any value, but it shouldn't crash in any case.  If you're just
adding NULL test in probe_kernel_read(), you're just masking a common
failure pattern and the kernel still *will* panic while dumping the
states.  If a specific arch doesn't have proper probe_kernel_read()
implementation, adding if (!NULL) test there could be a temporary
workaround, but it should be clearly marked as such.
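
The contract described above (accept any value, copy if the address is
readable, report failure otherwise, never crash) can be illustrated outside
the kernel.  What follows is only a userspace sketch under stated
assumptions: probe_read() and probe_read_demo() are hypothetical names, not
kernel interfaces, and the sketch relies on Linux write(2) returning EFAULT
for an unreadable source buffer instead of faulting the caller.  It is not
how probe_kernel_read() is implemented; it only mimics the behaviour the
caller is entitled to expect.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of probe_kernel_read(); not kernel code.
 * Copies size bytes from src to dst and returns 0, or a negative errno if
 * src cannot be read.  The kernel validates src on our behalf inside
 * write(2), so a bad pointer yields an error instead of a SIGSEGV.
 * size must stay well below the pipe capacity (small objects only).
 */
static int probe_read(void *dst, const void *src, size_t size)
{
	int fds[2];
	int ret = -EFAULT;

	if (pipe(fds) != 0)
		return -errno;

	/* A short or failed write means src was not fully readable. */
	if (write(fds[1], src, size) == (ssize_t)size &&
	    read(fds[0], dst, size) == (ssize_t)size)
		ret = 0;

	close(fds[0]);
	close(fds[1]);
	return ret;
}

/* Usage sketch: NULL, 1 and all-Fs are all rejected the same way. */
void probe_read_demo(void)
{
	long value = 42, out = 0;

	assert(probe_read(&out, &value, sizeof(out)) == 0);
	assert(out == 42);

	assert(probe_read(&out, NULL, sizeof(out)) != 0);
	assert(probe_read(&out, (const void *)(uintptr_t)1, sizeof(out)) != 0);
	assert(probe_read(&out, (const void *)UINTPTR_MAX, sizeof(out)) != 0);

	/* Failed probes leave the destination untouched in this sketch. */
	assert(out == 42);
}
```

In this sketch a NULL test in front of probe_read() would change nothing:
the other two invalid values still reach the probe and have to be handled
there anyway.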
James Bottomley Oct. 1, 2013, 9:40 p.m. UTC | #5
On Tue, 2013-10-01 at 16:43 -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
> > print_worker_info() includes no validity check on the pwq and wq
> > pointers before handing them over to the probe_kernel_read() functions.
> > 
> > It seems that most architectures don't care about that, but at least on
> > the parisc architecture this leads to a kernel crash since accesses to
> > page zero are protected by the kernel for security reasons.
> > 
> > Fix this problem by verifying the contents of pwq and wq before usage.
> > Even if probe_kernel_read() usually prevents such crashes by disabling
> > page faults, clean code should always include such checks. 
> > 
> > Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
> > crash the Linux kernel on the parisc architecture.
> 
> Hmm... um had similar problem but the root cause here is that the arch
> isn't implementing probe_kernel_read() properly.  We really have no
> idea what the pointer value may be at the dump point and that's why we
> use probe_kernel_read().  If something like the above is necessary for
> the time being, the correct place would be the arch
> probe_kernel_read() implementation.  James, would it be difficult to
> implement proper probe_kernel_read() on parisc?

The problem seems to be that some traps bypass our exception table
handling.  Helge, do you have the actual stack trace for this?  That
should show where the exception handling is missing.

Thanks,

James


Helge Deller Oct. 1, 2013, 10:07 p.m. UTC | #6
On 10/01/2013 11:40 PM, James Bottomley wrote:
> On Tue, 2013-10-01 at 16:43 -0400, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Oct 01, 2013 at 10:35:20PM +0200, Helge Deller wrote:
>>> print_worker_info() includes no validity check on the pwq and wq
>>> pointers before handing them over to the probe_kernel_read() functions.
>>>
>>> It seems that most architectures don't care about that, but at least on
>>> the parisc architecture this leads to a kernel crash since accesses to
>>> page zero are protected by the kernel for security reasons.
>>>
>>> Fix this problem by verifying the contents of pwq and wq before usage.
>>> Even if probe_kernel_read() usually prevents such crashes by disabling
>>> page faults, clean code should always include such checks. 
>>>
>>> Without this fix issuing "echo t > /proc/sysrq-trigger" will immediately
>>> crash the Linux kernel on the parisc architecture.
>>
>> Hmm... um had similar problem but the root cause here is that the arch
>> isn't implementing probe_kernel_read() properly.  We really have no
>> idea what the pointer value may be at the dump point and that's why we
>> use probe_kernel_read().  If something like the above is necessary for
>> the time being, the correct place would be the arch
>> probe_kernel_read() implementation.  James, would it be difficult to
>> implement proper probe_kernel_read() on parisc?
> 
> The problem seems to be that some traps bypass our exception table
> handling.  

Yes, that's correct.
It's trap #26 and we directly call parisc_terminate() for fault_space==0
without checking the exception table.
See my patch I posted a few hours ago which fixes this:
https://patchwork.kernel.org/patch/2971701/

> Helge, do you have the actual stack trace for this?  That
> should show where the exception handling is missing.

Here it is:
[47072.976000] ksoftirqd/0     R  running task        0     3      2 0x00000000
[47072.976000] Backtrace:
[47072.976000]  [<0000000040113a54>] __schedule+0x62c/0x808
[47072.976000]
[47072.976000] kworker/0:0H    S 00000000401040c0     0     5      2 0x00000000
[47073.468000] Backtrace:
[47073.468000]  [<0000000040464264>] pa_memcpy+0x44/0xb0
[47073.468000]  [<00000000404643e0>] __copy_from_user+0x60/0x90
[47073.468000]  [<00000000401d99bc>] __probe_kernel_read+0x54/0x90
[47073.468000]  [<000000004016cc70>] print_worker_info+0x158/0x2c0
[47073.468000]  [<0000000040185a60>] sched_show_task+0x1c8/0x210
[47073.468000]  [<0000000040185b64>] show_state_filter+0xbc/0x138
[47073.468000]  [<00000000404e85c4>] sysrq_handle_showstate+0x34/0x48
[47073.468000]  [<00000000404e9154>] __handle_sysrq+0x174/0x2f0
[47073.468000]  [<00000000404e933c>] write_sysrq_trigger+0x6c/0x90
[47073.468000]  [<00000000402ca2fc>] proc_reg_write+0xbc/0x130
[47073.468000]  [<0000000040236d44>] vfs_write+0x114/0x268
[47073.468000]  [<00000000402373a4>] SyS_write+0x94/0xf8
[47073.468000]  [<0000000040105fc0>] syscall_exit+0x0/0x14
[47073.468000]
[47073.468000]
[47073.468000] Kernel Fault: Code=26 regs=00000000958a09b0 (Addr=0000000000000008)
[47073.468000] CPU: 0 PID: 30189 Comm: bash Not tainted 3.12.0-rc3-64bit+ #1
[47073.468000] task: 000000007ba64100 ti: 00000000958a0000 task.ti: 00000000958a0000
[47073.468000]
[47073.468000]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[47073.468000] PSW: 00001000000001001111111100001110 Not tainted
[47073.468000] r00-03  000000ff0804ff0e 00000000958a08c0 0000000040464264 00000000958a0960
[47073.468000] r04-07  0000000040d73db0 0000000000000008 0000000000000008 00000000958a06f8
[47073.468000] r08-11  00000000958a0600 0000000040c49d18 00000000af535494 00000000958a0370
[47073.468000] r12-15  0000000000000000 0000000000000000 000000000010e7e8 00000000000fde28
[47073.468000] r16-19  0000000000000000 00000000000c7800 0000000000000000 0000000000000000
[47073.468000] r20-23  00000000958a06e0 0000000000000018 0000000000000018 0000000000000003
[47073.468000] r24-27  0000000000000008 0000000000000008 00000000958a06f8 0000000040d73db0
[47073.468000] r28-31  00000000958a06f8 00000000958a0930 00000000958a09b0 0000000000000008
[47073.468000] sr00-03  0000000005dc5000 0000000000000000 0000000000000000 0000000005dc5000
[47073.468000] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[47073.468000]
[47073.468000] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040463fdc 0000000040463fe0
[47073.468000]  IIR: 0fe25033    ISR: 0000000000000000  IOR: 0000000000000008
[47073.468000]  CPU:        0   CR30: 00000000958a0000 CR31: 0000000011111111
[47073.468000]  ORIG_R28: 00000000958a0b40
[47073.468000]  IAOQ[0]: pa_memcpy_internal+0xec/0x2b4
[47073.468000]  IAOQ[1]: pa_memcpy_internal+0xf0/0x2b4
[47073.468000]  RP(r2): pa_memcpy+0x44/0xb0
[47073.468000] Backtrace:
[47073.468000]  [<0000000040464264>] pa_memcpy+0x44/0xb0
[47073.468000]  [<00000000404643e0>] __copy_from_user+0x60/0x90
[47073.468000]  [<00000000401d99bc>] __probe_kernel_read+0x54/0x90
[47073.468000]  [<000000004016cc70>] print_worker_info+0x158/0x2c0
[47073.468000]  [<0000000040185a60>] sched_show_task+0x1c8/0x210
[47073.468000]  [<0000000040185b64>] show_state_filter+0xbc/0x138
[47073.468000]  [<00000000404e85c4>] sysrq_handle_showstate+0x34/0x48
[47073.468000]  [<00000000404e9154>] __handle_sysrq+0x174/0x2f0
[47073.468000]  [<00000000404e933c>] write_sysrq_trigger+0x6c/0x90
[47073.468000]  [<00000000402ca2fc>] proc_reg_write+0xbc/0x130
[47073.468000]  [<0000000040236d44>] vfs_write+0x114/0x268
[47073.468000]  [<00000000402373a4>] SyS_write+0x94/0xf8
[47073.468000]  [<0000000040105fc0>] syscall_exit+0x0/0x14
[47073.468000]
[47073.468000] Kernel panic - not syncing: Kernel Fault
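
The trace reports the fault at Addr=0000000000000008 (IOR: 8), not at 0.
That is consistent with print_worker_info() reading &pwq->wq while pwq is
NULL on a 64-bit kernel, if wq is the second pointer-sized member of
struct pool_workqueue.  The layout below is an assumption made for
illustration and is not taken from this thread; struct pwq_sketch and
wq_field_address() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins; only pointers to these types are needed here. */
struct worker_pool;
struct workqueue_struct;

/*
 * Hypothetical two-member sketch.  The real struct pool_workqueue has
 * more fields; only the assumed order of the first two matters.
 */
struct pwq_sketch {
	struct worker_pool *pool;	/* assumed first member */
	struct workqueue_struct *wq;	/* assumed second member */
};

/*
 * Address that &pwq->wq evaluates to for a given pwq value.  Integer
 * arithmetic is used so that nothing is dereferenced.
 */
uintptr_t wq_field_address(uintptr_t pwq)
{
	return pwq + offsetof(struct pwq_sketch, wq);
}
```

With pwq equal to 0, wq_field_address() returns the size of one pointer,
which is 8 on a 64-bit build and would match the reported fault address
under the layout assumed above.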

Helge Deller Oct. 1, 2013, 10:34 p.m. UTC | #7
On 10/01/2013 11:07 PM, Tejun Heo wrote:
> On Tue, Oct 01, 2013 at 05:03:48PM -0400, Tejun Heo wrote:
>> On Tue, Oct 01, 2013 at 10:53:31PM +0200, Helge Deller wrote:
>>> So, in summary my patch here is not really necessary, but for the sake of
>>> clean code I think it doesn't hurt either and as such it would be nice if
>>> you could apply it.
>>
>> What? function *must* take any value and try to access it and not
>> cause failure.  That's the *whole* purpose of that interface.  How is
>> having incomplete spurious checks around it "clean code" in any sense
>> of the word?  That doesn't make any sense.
> 
> Just in case you didn't know already.  probe_kernel_read()'s role is
> to take any ulong value and dereference it if it can.  If not, it can
> return any value, but it shouldn't crash in any case.  If you're just
> adding NULL test in probe_kernel_read(), you're just masking a common
> failure pattern and the kernel still *will* panic while dumping the
> states.  If a specific arch doesn't have proper probe_kernel_read()
> implementation, adding if (!NULL) test there could be a temporary
> workaround, but it should be clearly marked as such.

Sure, probe_kernel_read() takes care that no segfaults will happen.
Nevertheless, if we know that "pwq" might become NULL, why access pwq->wq at all?
  struct pool_workqueue *pwq = NULL;
  probe_kernel_read(&wq, &pwq->wq, sizeof(wq));

If you wouldn't have used probe_kernel_read() you would never code it 
like that. That's what I meant when I wrote "clean coding" (aka "similar
to what you would have done without probe_kernel_read()").

Helge
Tejun Heo Oct. 1, 2013, 10:40 p.m. UTC | #8
Hello,

On Wed, Oct 02, 2013 at 12:34:53AM +0200, Helge Deller wrote:
> Sure, probe_kernel_read() takes care that no segfaults will happen.
> Nevertheless, if we know that "pwq" might become NULL, why access pwq->wq at all?
>   struct pool_workqueue *pwq = NULL;
>   probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
> 
> If you wouldn't have used probe_kernel_read() you would never code it 
> like that. That's what I meant when I wrote "clean coding" (aka "similar
> to what you would have done without probe_kernel_read()").

Because it is using probe_kernel_read() and such test wouldn't mean
anything?  It may be NULL, it may be 1 or full Fs.  NULL is just one
of many illegal pointers which may happen.  Why add code which doesn't
achieve anything when you're explicitly trying to access pointers
which you know could be invalid?  Why is that "clean"?  Is "if (p)
kfree(p)" cleaner than "kfree(p)"?

Thanks.
Tejun Heo Oct. 1, 2013, 10:47 p.m. UTC | #9
On Tue, Oct 01, 2013 at 06:40:23PM -0400, Tejun Heo wrote:
> Because it is using probe_kernel_read() and such test wouldn't mean
> anything?  It may be NULL, it may be 1 or full Fs.  NULL is just one
> of many illegal pointers which may happen.  Why add code which doesn't
> achieve anything when you're explicitly trying to access pointers
> which you know could be invalid?  Why is that "clean"?  Is "if (p)
> kfree(p)" cleaner than "kfree(p)"?

Here's one general rule of thumb for "cleanliness" - try to do the
minimal because that's something many people can agree on.  If people
do stuff which isn't necessary, naturally different people would have
different opinions on what's cleaner / better and inevitably end up
with different choices.  As the choices made are functionally
superfluous, none would fail and we'll end up with various variants for
the same thing for no good reason, which is messy.  Adding if (p) in
front of probe_kernel_read(p) is inherently superfluous and you wouldn't
have any way to enforce or even encourage such practice and the end
result would inevitably be if (p) being sprayed randomly, which is the
opposite of cleanliness.

So, no, please don't add random tests which aren't essential.  It is
an inherently messy thing to do.

Thanks.

Patch

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d..c03b47f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4512,8 +4512,10 @@  void print_worker_info(const char *log_lvl, struct task_struct *task)
 	 */
 	probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
 	probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
-	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
-	probe_kernel_read(name, wq->name, sizeof(name) - 1);
+	if (pwq)
+		probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
+	if (wq)
+		probe_kernel_read(name, wq->name, sizeof(name) - 1);
 
 	/* copy worker description */
 	probe_kernel_read(&desc_valid, &worker->desc_valid, sizeof(desc_valid));