printk: introduce kptr_restrict level 3
diff mbox

Message ID 1475690686-16138-1-git-send-email-william.c.roberts@intel.com
State New
Headers show

Commit Message

Roberts, William C Oct. 5, 2016, 6:04 p.m. UTC
From: William Roberts <william.c.roberts@intel.com>

Some out-of-tree modules do not use %pK and just use %p, as it's
the common C paradigm for printing pointers. Because of this,
kptr_restrict has no affect on the output and thus, no way to
contain the kernel address leak.

Introduce kptr_restrict level 3 that causes the kernel to
treat %p as if it was %pK and thus always prints zeros.

Sample Output:
kptr_restrict == 2:
p: 00000000604369f4
pK: 0000000000000000

kptr_restrict == 3:
p: 0000000000000000
pK: 0000000000000000

Signed-off-by: William Roberts <william.c.roberts@intel.com>
---
 Documentation/sysctl/kernel.txt |   3 ++
 kernel/sysctl.c                 |   3 +-
 lib/vsprintf.c                  | 107 ++++++++++++++++++++++++----------------
 3 files changed, 69 insertions(+), 44 deletions(-)

Comments

Kees Cook Oct. 5, 2016, 7:34 p.m. UTC | #1
On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
> From: William Roberts <william.c.roberts@intel.com>
>
> Some out-of-tree modules do not use %pK and just use %p, as it's
> the common C paradigm for printing pointers. Because of this,
> kptr_restrict has no affect on the output and thus, no way to
> contain the kernel address leak.

Solving this is certainly a good idea -- I'm all for finding a solid solution.

> Introduce kptr_restrict level 3 that causes the kernel to
> treat %p as if it was %pK and thus always prints zeros.

I'm worried that this could break kernel internals where %p is being
used and not exposed to userspace. Maybe those situations don't
exist...

Regardless, I would rather do what Grsecurity has done in this area,
and whitelist known-safe values instead. For example, they have %pP
for approved pointers, and %pX for approved
dereference_function_descriptor() output. Everything else is censored
if it is a value in kernel memory and destined for a user-space memory
buffer:

        if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt !=
'X' && *fmt != 'K' && is_usercopy_object(buf)) {
                printk(KERN_ALERT "grsec: kernel infoleak detected!
Please report this log to spender@grsecurity.net.\n");
                dump_stack();
                ptr = NULL;
        }

The "is_usercopy_object()" test is something we can add, which is
testing for a new SLAB flag that is used to mark slab caches as either
used by user-space or not, which is done also through whitelisting.
(For more details on this, see:
http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)

Would you have time/interest to add the slab flags and
is_usercopy_object()? The hardened usercopy part of the slab
whitelisting can be separate, since it likely needs a different
usercopy interface to sanely integrate with upstream.

-Kees
Roberts, William C Oct. 6, 2016, 1:17 p.m. UTC | #2
> -----Original Message-----

> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees

> Cook

> Sent: Wednesday, October 5, 2016 3:34 PM

> To: Roberts, William C <william.c.roberts@intel.com>

> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet <corbet@lwn.net>;

> linux-doc@vger.kernel.org; LKML <linux-kernel@vger.kernel.org>; Nick

> Desaulniers <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>

> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3

> 

> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:

> > From: William Roberts <william.c.roberts@intel.com>

> >

> > Some out-of-tree modules do not use %pK and just use %p, as it's the

> > common C paradigm for printing pointers. Because of this,

> > kptr_restrict has no affect on the output and thus, no way to contain

> > the kernel address leak.

> 

> Solving this is certainly a good idea -- I'm all for finding a solid solution.

> 

> > Introduce kptr_restrict level 3 that causes the kernel to treat %p as

> > if it was %pK and thus always prints zeros.

> 

> I'm worried that this could break kernel internals where %p is being used and not

> exposed to userspace. Maybe those situations don't exist...


Not saying they don't I didn't find any.

> 

> Regardless, I would rather do what Grsecurity has done in this area, and whitelist

> known-safe values instead. For example, they have %pP for approved pointers,

> and %pX for approved

> dereference_function_descriptor() output. Everything else is censored if it is a

> value in kernel memory and destined for a user-space memory

> buffer:

> 

>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt != 'X' && *fmt !=

> 'K' && is_usercopy_object(buf)) {

>                 printk(KERN_ALERT "grsec: kernel infoleak detected!

> Please report this log to spender@grsecurity.net.\n");

>                 dump_stack();

>                 ptr = NULL;

>         }

> 

> The "is_usercopy_object()" test is something we can add, which is testing for a

> new SLAB flag that is used to mark slab caches as either used by user-space or

> not, which is done also through whitelisting.

> (For more details on this, see:

> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)

> 

> Would you have time/interest to add the slab flags and is_usercopy_object()?

> The hardened usercopy part of the slab whitelisting can be separate, since it likely

> needs a different usercopy interface to sanely integrate with upstream.


I could likely take this on. I would need to read up on the links and have a better concept
of what it is.

> 

> -Kees

> 

> --

> Kees Cook

> Nexus Security
Christoph Hellwig Oct. 6, 2016, 1:31 p.m. UTC | #3
On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.roberts@intel.com wrote:
> From: William Roberts <william.c.roberts@intel.com>
> 
> Some out-of-tree modules do not use %pK and just use %p, as it's
> the common C paradigm for printing pointers. Because of this,
> kptr_restrict has no affect on the output and thus, no way to
> contain the kernel address leak.

So what?  We a) don't care about out of tree modules and b) you could
just triviall fix them up if you care.

No need to bloat the kernel with crap like this.
Roberts, William C Oct. 6, 2016, 1:47 p.m. UTC | #4
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Thursday, October 6, 2016 9:32 AM
> To: Roberts, William C <william.c.roberts@intel.com>
> Cc: kernel-hardening@lists.openwall.com; corbet@lwn.net; linux-
> doc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> 
> On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.roberts@intel.com wrote:
> > From: William Roberts <william.c.roberts@intel.com>
> >
> > Some out-of-tree modules do not use %pK and just use %p, as it's the
> > common C paradigm for printing pointers. Because of this,
> > kptr_restrict has no affect on the output and thus, no way to contain
> > the kernel address leak.
> 
> So what?  We a) don't care about out of tree modules and b) you could just triviall
> fix them up if you care.

Out of tree modules still affect core kernel security. I would also bet money, that somewhere
In-tree someone has put a %p when they wanted a %pK. So this method is just quite error
prone. We currently have a blacklist approach versus whitelist.

> 
> No need to bloat the kernel with crap like this.

It's unconstructive comments like this that do the whole community harm. Notice how
responses from Kees Cook were aimed at finding a different solution to the problem and were
very constructive. As far as "bloating" goes, it really didn't change a whole lot, most of it was moved
lines, and adds maybe a few lines of code.
Christoph Hellwig Oct. 6, 2016, 1:56 p.m. UTC | #5
On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> Out of tree modules still affect core kernel security.

So don't use them.

> I would also bet money, that somewhere
> In-tree someone has put a %p when they wanted a %pK.

So fix them.

> So this method is just quite error
> prone. We currently have a blacklist approach versus whitelist.

Or fix the entire thing, get rid of %pK and always protect %p if you
can show that it doesn't break anything.

But stop posting patches with bullshit arguments like out of tree
modules.
Jann Horn Oct. 6, 2016, 2:05 p.m. UTC | #6
On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> > -----Original Message-----
> > From: Christoph Hellwig [mailto:hch@infradead.org]
> > Sent: Thursday, October 6, 2016 9:32 AM
> > To: Roberts, William C <william.c.roberts@intel.com>
> > Cc: kernel-hardening@lists.openwall.com; corbet@lwn.net; linux-
> > doc@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > 
> > On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.roberts@intel.com wrote:
> > > From: William Roberts <william.c.roberts@intel.com>
> > >
> > > Some out-of-tree modules do not use %pK and just use %p, as it's the
> > > common C paradigm for printing pointers. Because of this,
> > > kptr_restrict has no affect on the output and thus, no way to contain
> > > the kernel address leak.
> > 
> > So what?  We a) don't care about out of tree modules and b) you could just triviall
> > fix them up if you care.
> 
> Out of tree modules still affect core kernel security. I would also bet money, that somewhere
> In-tree someone has put a %p when they wanted a %pK. So this method is just quite error
> prone. We currently have a blacklist approach versus whitelist.

grep says you have a point:

$ grep -IR 'seq_printf.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_trca=%p\n", &dmadev->dev_trca);
drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_evca=%p\n", &dmadev->dev_evca);

$ grep -IR 'pr_info.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
drivers/misc/lkdtm_heap.c:	pr_info("Allocated memory %p-%p\n", base, &base[offset * 2]);

$ grep -IR 'pr_err.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:	pr_err("rx_ring->cqicb = %p\n", &rx_ring->cqicb);

And these are just trivially greppable, low-hanging-fruit ones.
With somewhat broader greps, there seem to be lots more, but they'd
require manual review.

And in total, there are 13578 matches for %p[^FfSsBRrhbMmIiEUVKNadCDgG]
throughout the kernel. Reviewing all of those manually would suck.
Jann Horn Oct. 6, 2016, 2:46 p.m. UTC | #7
On Thu, Oct 06, 2016 at 04:05:53PM +0200, Jann Horn wrote:
> On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> > > -----Original Message-----
> > > From: Christoph Hellwig [mailto:hch@infradead.org]
> > > Sent: Thursday, October 6, 2016 9:32 AM
> > > To: Roberts, William C <william.c.roberts@intel.com>
> > > Cc: kernel-hardening@lists.openwall.com; corbet@lwn.net; linux-
> > > doc@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > > 
> > > On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.roberts@intel.com wrote:
> > > > From: William Roberts <william.c.roberts@intel.com>
> > > >
> > > > Some out-of-tree modules do not use %pK and just use %p, as it's the
> > > > common C paradigm for printing pointers. Because of this,
> > > > kptr_restrict has no affect on the output and thus, no way to contain
> > > > the kernel address leak.
> > > 
> > > So what?  We a) don't care about out of tree modules and b) you could just triviall
> > > fix them up if you care.
> > 
> > Out of tree modules still affect core kernel security. I would also bet money, that somewhere
> > In-tree someone has put a %p when they wanted a %pK. So this method is just quite error
> > prone. We currently have a blacklist approach versus whitelist.
> 
> grep says you have a point:
> 
> $ grep -IR 'seq_printf.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_trca=%p\n", &dmadev->dev_trca);
> drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_evca=%p\n", &dmadev->dev_evca);
> 
> $ grep -IR 'pr_info.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/misc/lkdtm_heap.c:	pr_info("Allocated memory %p-%p\n", base, &base[offset * 2]);
> 
> $ grep -IR 'pr_err.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:	pr_err("rx_ring->cqicb = %p\n", &rx_ring->cqicb);
> 
> And these are just trivially greppable, low-hanging-fruit ones.
> With somewhat broader greps, there seem to be lots more, but they'd
> require manual review.

(Although, of course, most matches for seq_printf are in debugfs
files or stuff that's only enabled with some DEBUG config option
or so. But there are also e.g. pr_warn() users of %p that are not
in debug code.)
Roberts, William C Oct. 6, 2016, 2:59 p.m. UTC | #8
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Thursday, October 6, 2016 9:56 AM
> To: Roberts, William C <william.c.roberts@intel.com>
> Cc: Christoph Hellwig <hch@infradead.org>; kernel-
> hardening@lists.openwall.com; corbet@lwn.net; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> 
> On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> > Out of tree modules still affect core kernel security.
> 
> So don't use them.
> 
> > I would also bet money, that somewhere In-tree someone has put a %p
> > when they wanted a %pK.
> 
> So fix them.

As Jann Horn points out, "And in total, there are 13578 matches for %p[^FfSsBRrhbMmIiEUVKNadCDgG] throughout the kernel. Reviewing all of those manually would suck."

> 
> > So this method is just quite error
> > prone. We currently have a blacklist approach versus whitelist.
> 
> Or fix the entire thing, get rid of %pK and always protect %p if you can show that
> it doesn't break anything.
> 
> But stop posting patches with bullshit arguments like out of tree modules.

Ok perhaps the commit message sucks, and I should have included the large spread usages of %p throughout
the kernel, I assumed those would just be known, I shouldn't have made that assumption.

We should care about out-of-tree modules wrt security as they affect the security of the whole system, especially when the
modules are linking to core symbols like printing and string routines. There are tons of %p usages throughout the
kernel as noted above.

This is pretty low hanging fruit and we should fix this, as Kees points out.
Roberts, William C Oct. 6, 2016, 3:18 p.m. UTC | #9
> -----Original Message-----

> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees

> Cook

> Sent: Wednesday, October 5, 2016 3:34 PM

> To: Roberts, William C <william.c.roberts@intel.com>

> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet <corbet@lwn.net>;

> linux-doc@vger.kernel.org; LKML <linux-kernel@vger.kernel.org>; Nick

> Desaulniers <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>

> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3

> 

> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:

> > From: William Roberts <william.c.roberts@intel.com>

> >

> > Some out-of-tree modules do not use %pK and just use %p, as it's the

> > common C paradigm for printing pointers. Because of this,

> > kptr_restrict has no affect on the output and thus, no way to contain

> > the kernel address leak.

> 

> Solving this is certainly a good idea -- I'm all for finding a solid solution.

> 

> > Introduce kptr_restrict level 3 that causes the kernel to treat %p as

> > if it was %pK and thus always prints zeros.

> 

> I'm worried that this could break kernel internals where %p is being used and not

> exposed to userspace. Maybe those situations don't exist...

> 

> Regardless, I would rather do what Grsecurity has done in this area, and whitelist

> known-safe values instead. For example, they have %pP for approved pointers,

> and %pX for approved

> dereference_function_descriptor() output. Everything else is censored if it is a

> value in kernel memory and destined for a user-space memory

> buffer:

> 

>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt != 'X' && *fmt !=

> 'K' && is_usercopy_object(buf)) {

>                 printk(KERN_ALERT "grsec: kernel infoleak detected!

> Please report this log to spender@grsecurity.net.\n");

>                 dump_stack();

>                 ptr = NULL;

>         }

> 

> The "is_usercopy_object()" test is something we can add, which is testing for a

> new SLAB flag that is used to mark slab caches as either used by user-space or

> not, which is done also through whitelisting.

> (For more details on this, see:

> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)

> 

> Would you have time/interest to add the slab flags and is_usercopy_object()?

> The hardened usercopy part of the slab whitelisting can be separate, since it likely

> needs a different usercopy interface to sanely integrate with upstream.


A couple of questions off hand:
1. What about bss statics? I am assuming that when the loader loads up a module
     That it's dynamically allocating the .bss section or some equivalent. I would
     Also assume the method you describe would catch that, is that correct?

2. What about stack variables? 

> 

> -Kees

> 

> --

> Kees Cook

> Nexus Security
Kees Cook Oct. 6, 2016, 9 p.m. UTC | #10
On Thu, Oct 6, 2016 at 6:56 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> > On Thu, Oct 6, 2016 at 6:31 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > > So what?  We a) don't care about out of tree modules and b) you could just triviall
> > > fix them up if you care.
> >
> > Out of tree modules still affect core kernel security.
>
> So don't use them.

I'm really interested in keeping these discussions productive.
Declaring that the upstream kernel community (your implied "we")
doesn't care about out-of-tree code is both false and short-sighted.
Huge numbers of people depend on Linux on their phones, their IoT
devices, cars, etc, and huge volumes of those (unfortunately) are
running out-of-tree code.

This isn't about supporting some special interface that a single third
party driver uses; this is about robust engineering and security
practices that benefits everything, even out-of-tree code.

>> I would also bet money, that somewhere
>> In-tree someone has put a %p when they wanted a %pK.
>
> So fix them.

%pK is an ugly situation that I'd like to fix correctly. It was
designed originally as a blacklisting method, when it should have been
designed as a whitelist (as William mentions). This is what will need
fixing (and is what I was suggesting in my original reply to the
patch).

All that said, "so fix them" sounds very much like "just fix the bugs"
which is another troublesome attitude to have for dealing with
potential security flaws. The kernel community already finds/fixes
obvious flaws; the efforts with changes like what William is proposing
are based around assuming (correctly, I can argue) that there are
already bugs present, and they will remain in the kernel for years to
come. Designing the kernel to safely deal with bugs being present is a
fundamental design principle for kernel security self-defense
technologies.

The challenge is not "fix them", the challenge is "design the kernel
to do the right thing even when the bugs are unfixed".

>> So this method is just quite error
>> prone. We currently have a blacklist approach versus whitelist.
>
> Or fix the entire thing, get rid of %pK and always protect %p if you
> can show that it doesn't break anything.
>
> But stop posting patches with bullshit arguments like out of tree
> modules.

As a _singlular_ argument, "it's for out-of-tree code" is weak. As an
_additional_ argument, it has value. Saying "this only helps
out-of-tree code" doesn't carry much weight. Saying "this helps kernel
security, even for out-of-tree code" is perfectly valid. And a wrinkle
in this is that some day, either that out-of-tree code, or brand new
code, will land in the kernel, and we don't want to continue to
require authors be aware of an opt-in security feature. The kernel
should protect itself (and all of itself, including out-of-tree or
future code) by default.

And based on my read of this thread, we all appear to be in violent
agreement. :) "always protect %p" is absolutely the goal, and we can
figure out the best way to get there.

-Kees
Kees Cook Oct. 6, 2016, 9:04 p.m. UTC | #11
On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C
<william.c.roberts@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees
>> Cook
>> Sent: Wednesday, October 5, 2016 3:34 PM
>> To: Roberts, William C <william.c.roberts@intel.com>
>> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet <corbet@lwn.net>;
>> linux-doc@vger.kernel.org; LKML <linux-kernel@vger.kernel.org>; Nick
>> Desaulniers <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>
>> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
>>
>> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
>> > From: William Roberts <william.c.roberts@intel.com>
>> >
>> > Some out-of-tree modules do not use %pK and just use %p, as it's the
>> > common C paradigm for printing pointers. Because of this,
>> > kptr_restrict has no affect on the output and thus, no way to contain
>> > the kernel address leak.
>>
>> Solving this is certainly a good idea -- I'm all for finding a solid solution.
>>
>> > Introduce kptr_restrict level 3 that causes the kernel to treat %p as
>> > if it was %pK and thus always prints zeros.
>>
>> I'm worried that this could break kernel internals where %p is being used and not
>> exposed to userspace. Maybe those situations don't exist...
>>
>> Regardless, I would rather do what Grsecurity has done in this area, and whitelist
>> known-safe values instead. For example, they have %pP for approved pointers,
>> and %pX for approved
>> dereference_function_descriptor() output. Everything else is censored if it is a
>> value in kernel memory and destined for a user-space memory
>> buffer:
>>
>>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt != 'X' && *fmt !=
>> 'K' && is_usercopy_object(buf)) {
>>                 printk(KERN_ALERT "grsec: kernel infoleak detected!
>> Please report this log to spender@grsecurity.net.\n");
>>                 dump_stack();
>>                 ptr = NULL;
>>         }
>>
>> The "is_usercopy_object()" test is something we can add, which is testing for a
>> new SLAB flag that is used to mark slab caches as either used by user-space or
>> not, which is done also through whitelisting.
>> (For more details on this, see:
>> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)
>>
>> Would you have time/interest to add the slab flags and is_usercopy_object()?
>> The hardened usercopy part of the slab whitelisting can be separate, since it likely
>> needs a different usercopy interface to sanely integrate with upstream.
>
> A couple of questions off hand:
> 1. What about bss statics? I am assuming that when the loader loads up a module
>      That it's dynamically allocating the .bss section or some equivalent. I would
>      Also assume the method you describe would catch that, is that correct?
>
> 2. What about stack variables?

It looks like what Grsecurity is doing is saying "if the address is
outside of user-space" (" > TASK_SIZE") and it's not whitelisted ('P',
'X') and it's going to land in a user-space buffer
("is_usercopy_object()", censor it. ("K" is already censored --
they're just optimizing to avoid re-checking it needlessly.)

So, in this case, all kernel memory, bss and stack included, would be
outside the user-space address range. (I am curious, however, how to
apply this to an architecture like s390 which has overlapping address
ranges... probably the TASK_SIZE test needs to use some other "is in
kernel memory" check that compiles down to TASK_SIZE on non-s390, and
DTRT on s390, etc.)

-Kees
Joe Perches Oct. 6, 2016, 9:19 p.m. UTC | #12
On Thu, 2016-10-06 at 14:00 -0700, Kees Cook wrote:

> And based on my read of this thread, we all appear to be in violent
> agreement. :) "always protect %p" is absolutely the goal, and we can
> figure out the best way to get there.

I proposed emitting pointers from the const and text sections by default
and using NULL for data pointers.

https://lkml.org/lkml/2016/8/5/380
Kees Cook Oct. 6, 2016, 9:25 p.m. UTC | #13
On Thu, Oct 6, 2016 at 2:19 PM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2016-10-06 at 14:00 -0700, Kees Cook wrote:
>
>> And based on my read of this thread, we all appear to be in violent
>> agreement. :) "always protect %p" is absolutely the goal, and we can
>> figure out the best way to get there.
>
> I proposed emitting pointers from the const and text sections by default
> and using NULL for data pointers.
>
> https://lkml.org/lkml/2016/8/5/380

Leaks of const and text (while not useful for write-attacks) can leak
KASLR offset (though yes, yes, there are many existing leaks -- but we
should avoid adding a new one regardless).

I think the logic of "is this destined for userspace" is likely the
cleanest approach. There still may be many things this breaks, though.
(I expect perf. Everything breaks perf. ;)

-Kees
Jann Horn Oct. 7, 2016, 11:52 a.m. UTC | #14
On Thu, Oct 06, 2016 at 04:05:53PM +0200, Jann Horn wrote:
> On Thu, Oct 06, 2016 at 01:47:47PM +0000, Roberts, William C wrote:
> > > -----Original Message-----
> > > From: Christoph Hellwig [mailto:hch@infradead.org]
> > > Sent: Thursday, October 6, 2016 9:32 AM
> > > To: Roberts, William C <william.c.roberts@intel.com>
> > > Cc: kernel-hardening@lists.openwall.com; corbet@lwn.net; linux-
> > > doc@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > > 
> > > On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.roberts@intel.com wrote:
> > > > From: William Roberts <william.c.roberts@intel.com>
> > > >
> > > > Some out-of-tree modules do not use %pK and just use %p, as it's the
> > > > common C paradigm for printing pointers. Because of this,
> > > > kptr_restrict has no affect on the output and thus, no way to contain
> > > > the kernel address leak.
> > > 
> > > So what?  We a) don't care about out of tree modules and b) you could just triviall
> > > fix them up if you care.
> > 
> > Out of tree modules still affect core kernel security. I would also bet money, that somewhere
> > In-tree someone has put a %p when they wanted a %pK. So this method is just quite error
> > prone. We currently have a blacklist approach versus whitelist.
> 
> grep says you have a point:
> 
> $ grep -IR 'seq_printf.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_trca=%p\n", &dmadev->dev_trca);
> drivers/dma/qcom/hidma_dbg.c:	seq_printf(s, "dev_evca=%p\n", &dmadev->dev_evca);
> 
> $ grep -IR 'pr_info.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/misc/lkdtm_heap.c:	pr_info("Allocated memory %p-%p\n", base, &base[offset * 2]);
> 
> $ grep -IR 'pr_err.*%p[^FfSsBRrhbMmIiEUVKNadCDgG].*&'
> drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:	pr_err("rx_ring->cqicb = %p\n", &rx_ring->cqicb);

Actually, I think I missed something here.

We don't really want to censor pointers in dmesg, right? dmesg can
contain pointers by design - and in particular, when an oops
happens, the register dump will reveal pointers. dmesg should just
be restricted using dmesg_restrict.

(What is annoying, though, is that e.g. Debian's default rsyslog
config sends all KERN_EMERG messages to all active PTYs by default,
meaning that on a system with such a config, making the kernel oops
can be used to leak some ASLR information.)

(And why does __die() in arch/x86/kernel/dumpstack.c use KERN_EMERG
for CONFIG_X86_32, but KERN_ALERT for !CONFIG_X86_32?)
Roberts, William C Oct. 7, 2016, 2:19 p.m. UTC | #15
> -----Original Message-----

> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees

> Cook

> Sent: Thursday, October 6, 2016 5:05 PM

> To: Roberts, William C <william.c.roberts@intel.com>

> Cc: kernel-hardening@lists.openwall.com

> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3

> 

> On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C <william.c.roberts@intel.com>

> wrote:

> >

> >

> >> -----Original Message-----

> >> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of

> >> Kees Cook

> >> Sent: Wednesday, October 5, 2016 3:34 PM

> >> To: Roberts, William C <william.c.roberts@intel.com>

> >> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet

> >> <corbet@lwn.net>; linux-doc@vger.kernel.org; LKML

> >> <linux-kernel@vger.kernel.org>; Nick Desaulniers

> >> <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>

> >> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3

> >>

> >> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:

> >> > From: William Roberts <william.c.roberts@intel.com>

> >> >

> >> > Some out-of-tree modules do not use %pK and just use %p, as it's

> >> > the common C paradigm for printing pointers. Because of this,

> >> > kptr_restrict has no affect on the output and thus, no way to

> >> > contain the kernel address leak.

> >>

> >> Solving this is certainly a good idea -- I'm all for finding a solid solution.

> >>

> >> > Introduce kptr_restrict level 3 that causes the kernel to treat %p

> >> > as if it was %pK and thus always prints zeros.

> >>

> >> I'm worried that this could break kernel internals where %p is being

> >> used and not exposed to userspace. Maybe those situations don't exist...

> >>

> >> Regardless, I would rather do what Grsecurity has done in this area,

> >> and whitelist known-safe values instead. For example, they have %pP

> >> for approved pointers, and %pX for approved

> >> dereference_function_descriptor() output. Everything else is censored

> >> if it is a value in kernel memory and destined for a user-space

> >> memory

> >> buffer:

> >>

> >>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt !=

> >> 'X' && *fmt != 'K' && is_usercopy_object(buf)) {

> >>                 printk(KERN_ALERT "grsec: kernel infoleak detected!

> >> Please report this log to spender@grsecurity.net.\n");

> >>                 dump_stack();

> >>                 ptr = NULL;

> >>         }

> >>

> >> The "is_usercopy_object()" test is something we can add, which is

> >> testing for a new SLAB flag that is used to mark slab caches as

> >> either used by user-space or not, which is done also through whitelisting.

> >> (For more details on this, see:

> >> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)

> >>

> >> Would you have time/interest to add the slab flags and is_usercopy_object()?

> >> The hardened usercopy part of the slab whitelisting can be separate,

> >> since it likely needs a different usercopy interface to sanely integrate with

> upstream.

> >

> > A couple of questions off hand:

> > 1. What about bss statics? I am assuming that when the loader loads up a

> module

> >      That it's dynamically allocating the .bss section or some equivalent. I would

> >      Also assume the method you describe would catch that, is that correct?

> >

> > 2. What about stack variables?

> 

> It looks like what Grsecurity is doing is saying "if the address is outside of user-

> space" (" > TASK_SIZE") and it's not whitelisted ('P',

> 'X') and it's going to land in a user-space buffer ("is_usercopy_object()", censor it.

> ("K" is already censored -- they're just optimizing to avoid re-checking it

> needlessly.)

> 

> So, in this case, all kernel memory, bss and stack included, would be outside the

> user-space address range. (I am curious, however, how to apply this to an

> architecture like s390 which has overlapping address ranges... probably the

> TASK_SIZE test needs to use some other "is in kernel memory" check that

> compiles down to TASK_SIZE on non-s390, and DTRT on s390, etc.)

> 


Before I go off and attempt this, I just have another dumb question to ask:

If the printk copies it into the kernel ring buffer, at some point, someone comes
And asks for a copy into a userspace buffer either via dmesg or proc/kmsg interfaces.
Heck, it may even be on an open uart serial. How would the check actually work,
My understanding, at the time %p is resolved into a string, it might not be heading
to a userspace buffer. Perhaps you can help fill in what I am missing?
Roberts, William C Oct. 7, 2016, 2:21 p.m. UTC | #16
<snip>

> 

> As a _singlular_ argument, "it's for out-of-tree code" is weak. As an _additional_

> argument, it has value. Saying "this only helps out-of-tree code" doesn't carry

> much weight. Saying "this helps kernel security, even for out-of-tree code" is

> perfectly valid. And a wrinkle in this is that some day, either that out-of-tree

> code, or brand new code, will land in the kernel, and we don't want to continue

> to require authors be aware of an opt-in security feature. The kernel should

> protect itself (and all of itself, including out-of-tree or future code) by default.

> 


I should have made this more clear in my message, this was in my head and I assumed
that people would just get it. But I shouldn't have made such an assumption.

> And based on my read of this thread, we all appear to be in violent agreement. :)

> "always protect %p" is absolutely the goal, and we can figure out the best way to

> get there.

> 

> -Kees

> 

> --

> Kees Cook

> Nexus Security
Jann Horn Oct. 7, 2016, 2:29 p.m. UTC | #17
On Fri, Oct 07, 2016 at 02:19:43PM +0000, Roberts, William C wrote:
> 
> 
> > -----Original Message-----
> > From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees
> > Cook
> > Sent: Thursday, October 6, 2016 5:05 PM
> > To: Roberts, William C <william.c.roberts@intel.com>
> > Cc: kernel-hardening@lists.openwall.com
> > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > 
> > On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C <william.c.roberts@intel.com>
> > wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of
> > >> Kees Cook
> > >> Sent: Wednesday, October 5, 2016 3:34 PM
> > >> To: Roberts, William C <william.c.roberts@intel.com>
> > >> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet
> > >> <corbet@lwn.net>; linux-doc@vger.kernel.org; LKML
> > >> <linux-kernel@vger.kernel.org>; Nick Desaulniers
> > >> <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>
> > >> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > >>
> > >> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
> > >> > From: William Roberts <william.c.roberts@intel.com>
> > >> >
> > >> > Some out-of-tree modules do not use %pK and just use %p, as it's
> > >> > the common C paradigm for printing pointers. Because of this,
> > >> > kptr_restrict has no affect on the output and thus, no way to
> > >> > contain the kernel address leak.
> > >>
> > >> Solving this is certainly a good idea -- I'm all for finding a solid solution.
> > >>
> > >> > Introduce kptr_restrict level 3 that causes the kernel to treat %p
> > >> > as if it was %pK and thus always prints zeros.
> > >>
> > >> I'm worried that this could break kernel internals where %p is being
> > >> used and not exposed to userspace. Maybe those situations don't exist...
> > >>
> > >> Regardless, I would rather do what Grsecurity has done in this area,
> > >> and whitelist known-safe values instead. For example, they have %pP
> > >> for approved pointers, and %pX for approved
> > >> dereference_function_descriptor() output. Everything else is censored
> > >> if it is a value in kernel memory and destined for a user-space
> > >> memory
> > >> buffer:
> > >>
> > >>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt !=
> > >> 'X' && *fmt != 'K' && is_usercopy_object(buf)) {
> > >>                 printk(KERN_ALERT "grsec: kernel infoleak detected!
> > >> Please report this log to spender@grsecurity.net.\n");
> > >>                 dump_stack();
> > >>                 ptr = NULL;
> > >>         }
> > >>
> > >> The "is_usercopy_object()" test is something we can add, which is
> > >> testing for a new SLAB flag that is used to mark slab caches as
> > >> either used by user-space or not, which is done also through whitelisting.
> > >> (For more details on this, see:
> > >> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)
> > >>
> > >> Would you have time/interest to add the slab flags and is_usercopy_object()?
> > >> The hardened usercopy part of the slab whitelisting can be separate,
> > >> since it likely needs a different usercopy interface to sanely integrate with
> > upstream.
> > >
> > > A couple of questions off hand:
> > > 1. What about bss statics? I am assuming that when the loader loads up a
> > module
> > >      That it's dynamically allocating the .bss section or some equivalent. I would
> > >      Also assume the method you describe would catch that, is that correct?
> > >
> > > 2. What about stack variables?
> > 
> > It looks like what Grsecurity is doing is saying "if the address is outside of user-
> > space" (" > TASK_SIZE") and it's not whitelisted ('P',
> > 'X') and it's going to land in a user-space buffer ("is_usercopy_object()", censor it.
> > ("K" is already censored -- they're just optimizing to avoid re-checking it
> > needlessly.)
> > 
> > So, in this case, all kernel memory, bss and stack included, would be outside the
> > user-space address range. (I am curious, however, how to apply this to an
> > architecture like s390 which has overlapping address ranges... probably the
> > TASK_SIZE test needs to use some other "is in kernel memory" check that
> > compiles down to TASK_SIZE on non-s390, and DTRT on s390, etc.)
> > 
> 
> Before I go off and attempt this, I just have another dumb question to ask:
> 
> If the printk copies it into the kernel ring buffer, at some point, someone comes
> And asks for a copy into a userspace buffer either via dmesg or proc/kmsg interfaces.

IMO that's fine - I don't think pointers in the kernel ring buffer should be restricted.
Instead, access to dmesg / proc/kmsg should be restricted appropriately.

I guess it depends on what the goal here is. Do we really want to stop root from
ever seeing a kernel pointer (in which case OOPS messages wouldn't really work
anymore)? My view is that restricting these interfaces so far that only root can
access them and it's unlikely that root accidentally does so is sufficient.
Roberts, William C Oct. 7, 2016, 3:05 p.m. UTC | #18
> -----Original Message-----
> From: Jann Horn [mailto:jann@thejh.net]
> Sent: Friday, October 7, 2016 10:30 AM
> To: Roberts, William C <william.c.roberts@intel.com>
> Cc: Kees Cook <keescook@chromium.org>; kernel-
> hardening@lists.openwall.com
> Subject: Re: [kernel-hardening] RE: [PATCH] printk: introduce kptr_restrict level 3
> 
> On Fri, Oct 07, 2016 at 02:19:43PM +0000, Roberts, William C wrote:
> >
> >
> > > -----Original Message-----
> > > From: keescook@google.com [mailto:keescook@google.com] On Behalf Of
> > > Kees Cook
> > > Sent: Thursday, October 6, 2016 5:05 PM
> > > To: Roberts, William C <william.c.roberts@intel.com>
> > > Cc: kernel-hardening@lists.openwall.com
> > > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > >
> > > On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C
> > > <william.c.roberts@intel.com>
> > > wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: keescook@google.com [mailto:keescook@google.com] On Behalf
> > > >> Of Kees Cook
> > > >> Sent: Wednesday, October 5, 2016 3:34 PM
> > > >> To: Roberts, William C <william.c.roberts@intel.com>
> > > >> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet
> > > >> <corbet@lwn.net>; linux-doc@vger.kernel.org; LKML
> > > >> <linux-kernel@vger.kernel.org>; Nick Desaulniers
> > > >> <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>
> > > >> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > > >>
> > > >> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
> > > >> > From: William Roberts <william.c.roberts@intel.com>
> > > >> >
> > > >> > Some out-of-tree modules do not use %pK and just use %p, as
> > > >> > it's the common C paradigm for printing pointers. Because of
> > > >> > this, kptr_restrict has no affect on the output and thus, no
> > > >> > way to contain the kernel address leak.
> > > >>
> > > >> Solving this is certainly a good idea -- I'm all for finding a solid solution.
> > > >>
> > > >> > Introduce kptr_restrict level 3 that causes the kernel to treat
> > > >> > %p as if it was %pK and thus always prints zeros.
> > > >>
> > > >> I'm worried that this could break kernel internals where %p is
> > > >> being used and not exposed to userspace. Maybe those situations don't
> exist...
> > > >>
> > > >> Regardless, I would rather do what Grsecurity has done in this
> > > >> area, and whitelist known-safe values instead. For example, they
> > > >> have %pP for approved pointers, and %pX for approved
> > > >> dereference_function_descriptor() output. Everything else is
> > > >> censored if it is a value in kernel memory and destined for a
> > > >> user-space memory
> > > >> buffer:
> > > >>
> > > >>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt
> > > >> != 'X' && *fmt != 'K' && is_usercopy_object(buf)) {
> > > >>                 printk(KERN_ALERT "grsec: kernel infoleak detected!
> > > >> Please report this log to spender@grsecurity.net.\n");
> > > >>                 dump_stack();
> > > >>                 ptr = NULL;
> > > >>         }
> > > >>
> > > >> The "is_usercopy_object()" test is something we can add, which is
> > > >> testing for a new SLAB flag that is used to mark slab caches as
> > > >> either used by user-space or not, which is done also through whitelisting.
> > > >> (For more details on this, see:
> > > >> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)
> > > >>
> > > >> Would you have time/interest to add the slab flags and
> is_usercopy_object()?
> > > >> The hardened usercopy part of the slab whitelisting can be
> > > >> separate, since it likely needs a different usercopy interface to
> > > >> sanely integrate with
> > > upstream.
> > > >
> > > > A couple of questions off hand:
> > > > 1. What about bss statics? I am assuming that when the loader
> > > > loads up a
> > > module
> > > >      That it's dynamically allocating the .bss section or some equivalent. I
> would
> > > >      Also assume the method you describe would catch that, is that correct?
> > > >
> > > > 2. What about stack variables?
> > >
> > > It looks like what Grsecurity is doing is saying "if the address is
> > > outside of user- space" (" > TASK_SIZE") and it's not whitelisted
> > > ('P',
> > > 'X') and it's going to land in a user-space buffer ("is_usercopy_object()",
> censor it.
> > > ("K" is already censored -- they're just optimizing to avoid
> > > re-checking it
> > > needlessly.)
> > >
> > > So, in this case, all kernel memory, bss and stack included, would
> > > be outside the user-space address range. (I am curious, however, how
> > > to apply this to an architecture like s390 which has overlapping
> > > address ranges... probably the TASK_SIZE test needs to use some
> > > other "is in kernel memory" check that compiles down to TASK_SIZE on
> > > non-s390, and DTRT on s390, etc.)
> > >
> >
> > Before I go off and attempt this, I just have another dumb question to ask:
> >
> > If the printk copies it into the kernel ring buffer, at some point,
> > someone comes And asks for a copy into a userspace buffer either via dmesg or
> proc/kmsg interfaces.
> 
> IMO that's fine - I don't think pointers in the kernel ring buffer should be
> restricted.
> Instead, access to dmesg / proc/kmsg should be restricted appropriately.
> 
> I guess it depends on what the goal here is. Do we really want to stop root from
> ever seeing a kernel pointer (in which case OOPS messages wouldn't really work
> anymore)? My view is that restricting these interfaces so far that only root can
> access them and it's unlikely that root accidentally does so is sufficient.

I'm running Ubuntu 14.04, and perhaps they just got it wrong, but I can do dmesg as
an unprivileged user. I usually only work on Android, and its restricted to root. I'm
not sure how other distro's set this up. Ideally, on non-debug systems, I'd like to
see %p's filtered no matter what, I'd think opt in (whitelist) is ok. At least than,
you did it intentionally. I verified that %p filtering gets us coredumps and kernel
oops's have addresses.

Ideally on user images debug interfaces are closed, but I would like to make rogue %p's
without opt in go away. 

I did a little research, and /proc/sys/kernel/dmesg_restrict seems to enable or disable this
ability.

After setting that to 1, it appears I got the restriction I wanted:
$ dmesg
dmesg: klogctl failed: Operation not permitted

One of my friends who does hypervisor development and is familiar with a few systems
told me that Apple products just kill %p on production builds from logs.

I'm kind of torn between the various approaches discussed on this thread.
Jann Horn Oct. 7, 2016, 3:15 p.m. UTC | #19
On Fri, Oct 07, 2016 at 03:05:43PM +0000, Roberts, William C wrote:
> 
> 
> > -----Original Message-----
> > From: Jann Horn [mailto:jann@thejh.net]
> > Sent: Friday, October 7, 2016 10:30 AM
> > To: Roberts, William C <william.c.roberts@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>; kernel-
> > hardening@lists.openwall.com
> > Subject: Re: [kernel-hardening] RE: [PATCH] printk: introduce kptr_restrict level 3
> > 
> > On Fri, Oct 07, 2016 at 02:19:43PM +0000, Roberts, William C wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: keescook@google.com [mailto:keescook@google.com] On Behalf Of
> > > > Kees Cook
> > > > Sent: Thursday, October 6, 2016 5:05 PM
> > > > To: Roberts, William C <william.c.roberts@intel.com>
> > > > Cc: kernel-hardening@lists.openwall.com
> > > > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > > >
> > > > On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C
> > > > <william.c.roberts@intel.com>
> > > > wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: keescook@google.com [mailto:keescook@google.com] On Behalf
> > > > >> Of Kees Cook
> > > > >> Sent: Wednesday, October 5, 2016 3:34 PM
> > > > >> To: Roberts, William C <william.c.roberts@intel.com>
> > > > >> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet
> > > > >> <corbet@lwn.net>; linux-doc@vger.kernel.org; LKML
> > > > >> <linux-kernel@vger.kernel.org>; Nick Desaulniers
> > > > >> <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>
> > > > >> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
> > > > >>
> > > > >> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
> > > > >> > From: William Roberts <william.c.roberts@intel.com>
> > > > >> >
> > > > >> > Some out-of-tree modules do not use %pK and just use %p, as
> > > > >> > it's the common C paradigm for printing pointers. Because of
> > > > >> > this, kptr_restrict has no affect on the output and thus, no
> > > > >> > way to contain the kernel address leak.
> > > > >>
> > > > >> Solving this is certainly a good idea -- I'm all for finding a solid solution.
> > > > >>
> > > > >> > Introduce kptr_restrict level 3 that causes the kernel to treat
> > > > >> > %p as if it was %pK and thus always prints zeros.
> > > > >>
> > > > >> I'm worried that this could break kernel internals where %p is
> > > > >> being used and not exposed to userspace. Maybe those situations don't
> > exist...
> > > > >>
> > > > >> Regardless, I would rather do what Grsecurity has done in this
> > > > >> area, and whitelist known-safe values instead. For example, they
> > > > >> have %pP for approved pointers, and %pX for approved
> > > > >> dereference_function_descriptor() output. Everything else is
> > > > >> censored if it is a value in kernel memory and destined for a
> > > > >> user-space memory
> > > > >> buffer:
> > > > >>
> > > > >>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt
> > > > >> != 'X' && *fmt != 'K' && is_usercopy_object(buf)) {
> > > > >>                 printk(KERN_ALERT "grsec: kernel infoleak detected!
> > > > >> Please report this log to spender@grsecurity.net.\n");
> > > > >>                 dump_stack();
> > > > >>                 ptr = NULL;
> > > > >>         }
> > > > >>
> > > > >> The "is_usercopy_object()" test is something we can add, which is
> > > > >> testing for a new SLAB flag that is used to mark slab caches as
> > > > >> either used by user-space or not, which is done also through whitelisting.
> > > > >> (For more details on this, see:
> > > > >> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)
> > > > >>
> > > > >> Would you have time/interest to add the slab flags and
> > is_usercopy_object()?
> > > > >> The hardened usercopy part of the slab whitelisting can be
> > > > >> separate, since it likely needs a different usercopy interface to
> > > > >> sanely integrate with
> > > > upstream.
> > > > >
> > > > > A couple of questions off hand:
> > > > > 1. What about bss statics? I am assuming that when the loader
> > > > > loads up a
> > > > module
> > > > >      That it's dynamically allocating the .bss section or some equivalent. I
> > would
> > > > >      Also assume the method you describe would catch that, is that correct?
> > > > >
> > > > > 2. What about stack variables?
> > > >
> > > > It looks like what Grsecurity is doing is saying "if the address is
> > > > outside of user- space" (" > TASK_SIZE") and it's not whitelisted
> > > > ('P',
> > > > 'X') and it's going to land in a user-space buffer ("is_usercopy_object()",
> > censor it.
> > > > ("K" is already censored -- they're just optimizing to avoid
> > > > re-checking it
> > > > needlessly.)
> > > >
> > > > So, in this case, all kernel memory, bss and stack included, would
> > > > be outside the user-space address range. (I am curious, however, how
> > > > to apply this to an architecture like s390 which has overlapping
> > > > address ranges... probably the TASK_SIZE test needs to use some
> > > > other "is in kernel memory" check that compiles down to TASK_SIZE on
> > > > non-s390, and DTRT on s390, etc.)
> > > >
> > >
> > > Before I go off and attempt this, I just have another dumb question to ask:
> > >
> > > If the printk copies it into the kernel ring buffer, at some point,
> > > someone comes And asks for a copy into a userspace buffer either via dmesg or
> > proc/kmsg interfaces.
> > 
> > IMO that's fine - I don't think pointers in the kernel ring buffer should be
> > restricted.
> > Instead, access to dmesg / proc/kmsg should be restricted appropriately.
> > 
> > I guess it depends on what the goal here is. Do we really want to stop root from
> > ever seeing a kernel pointer (in which case OOPS messages wouldn't really work
> > anymore)? My view is that restricting these interfaces so far that only root can
> > access them and it's unlikely that root accidentally does so is sufficient.
> 
> I'm running Ubuntu 14.04, and perhaps they just got it wrong, but I can do dmesg as
> an unprivileged user. I usually only work on Android, and its restricted to root. I'm
> not sure how other distro's set this up. Ideally, on non-debug systems, I'd like to
> see %p's filtered no matter what, I'd think opt in (whitelist) is ok. At least than,
> you did it intentionally. I verified that %p filtering gets us coredumps and kernel
> oops's have addresses.

On a system with panic_on_oops=0 and dmesg_restrict=0, if you want to leak where the
kernel and data are to be able to exploit a memory corruption properly, you can
probably often just trigger the bug, get the addresses from the resulting oops, set up
the addresses in your exploit properly and retry the attack.


> Ideally on user images debug interfaces are closed, but I would like to make rogue %p's
> without opt in go away.


> I did a little research, and /proc/sys/kernel/dmesg_restrict seems to enable or disable this
> ability.
> 
> After setting that to 1, it appears I got the restriction I wanted:
> $ dmesg
> dmesg: klogctl failed: Operation not permitted

Yup, exactly.


> One of my friends who does hypervisor development and is familiar with a few systems
> told me that Apple products just kill %p on production builds from logs.
> 
> I'm kind of torn between the various approaches discussed on this thread.
Kees Cook Oct. 7, 2016, 7:12 p.m. UTC | #20
On Fri, Oct 7, 2016 at 7:29 AM, Jann Horn <jann@thejh.net> wrote:
> On Fri, Oct 07, 2016 at 02:19:43PM +0000, Roberts, William C wrote:
>>
>>
>> > -----Original Message-----
>> > From: keescook@google.com [mailto:keescook@google.com] On Behalf Of Kees
>> > Cook
>> > Sent: Thursday, October 6, 2016 5:05 PM
>> > To: Roberts, William C <william.c.roberts@intel.com>
>> > Cc: kernel-hardening@lists.openwall.com
>> > Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
>> >
>> > On Thu, Oct 6, 2016 at 8:18 AM, Roberts, William C <william.c.roberts@intel.com>
>> > wrote:
>> > >
>> > >
>> > >> -----Original Message-----
>> > >> From: keescook@google.com [mailto:keescook@google.com] On Behalf Of
>> > >> Kees Cook
>> > >> Sent: Wednesday, October 5, 2016 3:34 PM
>> > >> To: Roberts, William C <william.c.roberts@intel.com>
>> > >> Cc: kernel-hardening@lists.openwall.com; Jonathan Corbet
>> > >> <corbet@lwn.net>; linux-doc@vger.kernel.org; LKML
>> > >> <linux-kernel@vger.kernel.org>; Nick Desaulniers
>> > >> <ndesaulniers@google.com>; Dave Weinstein <olorin@google.com>
>> > >> Subject: Re: [PATCH] printk: introduce kptr_restrict level 3
>> > >>
>> > >> On Wed, Oct 5, 2016 at 11:04 AM,  <william.c.roberts@intel.com> wrote:
>> > >> > From: William Roberts <william.c.roberts@intel.com>
>> > >> >
>> > >> > Some out-of-tree modules do not use %pK and just use %p, as it's
>> > >> > the common C paradigm for printing pointers. Because of this,
>> > >> > kptr_restrict has no affect on the output and thus, no way to
>> > >> > contain the kernel address leak.
>> > >>
>> > >> Solving this is certainly a good idea -- I'm all for finding a solid solution.
>> > >>
>> > >> > Introduce kptr_restrict level 3 that causes the kernel to treat %p
>> > >> > as if it was %pK and thus always prints zeros.
>> > >>
>> > >> I'm worried that this could break kernel internals where %p is being
>> > >> used and not exposed to userspace. Maybe those situations don't exist...
>> > >>
>> > >> Regardless, I would rather do what Grsecurity has done in this area,
>> > >> and whitelist known-safe values instead. For example, they have %pP
>> > >> for approved pointers, and %pX for approved
>> > >> dereference_function_descriptor() output. Everything else is censored
>> > >> if it is a value in kernel memory and destined for a user-space
>> > >> memory
>> > >> buffer:
>> > >>
>> > >>         if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt !=
>> > >> 'X' && *fmt != 'K' && is_usercopy_object(buf)) {
>> > >>                 printk(KERN_ALERT "grsec: kernel infoleak detected!
>> > >> Please report this log to spender@grsecurity.net.\n");
>> > >>                 dump_stack();
>> > >>                 ptr = NULL;
>> > >>         }
>> > >>
>> > >> The "is_usercopy_object()" test is something we can add, which is
>> > >> testing for a new SLAB flag that is used to mark slab caches as
>> > >> either used by user-space or not, which is done also through whitelisting.
>> > >> (For more details on this, see:
>> > >> http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)
>> > >>
>> > >> Would you have time/interest to add the slab flags and is_usercopy_object()?
>> > >> The hardened usercopy part of the slab whitelisting can be separate,
>> > >> since it likely needs a different usercopy interface to sanely integrate with
>> > upstream.
>> > >
>> > > A couple of questions off hand:
>> > > 1. What about bss statics? I am assuming that when the loader loads up a
>> > module
>> > >      That it's dynamically allocating the .bss section or some equivalent. I would
>> > >      Also assume the method you describe would catch that, is that correct?
>> > >
>> > > 2. What about stack variables?
>> >
>> > It looks like what Grsecurity is doing is saying "if the address is outside of user-
>> > space" (" > TASK_SIZE") and it's not whitelisted ('P',
>> > 'X') and it's going to land in a user-space buffer ("is_usercopy_object()", censor it.
>> > ("K" is already censored -- they're just optimizing to avoid re-checking it
>> > needlessly.)
>> >
>> > So, in this case, all kernel memory, bss and stack included, would be outside the
>> > user-space address range. (I am curious, however, how to apply this to an
>> > architecture like s390 which has overlapping address ranges... probably the
>> > TASK_SIZE test needs to use some other "is in kernel memory" check that
>> > compiles down to TASK_SIZE on non-s390, and DTRT on s390, etc.)
>> >
>>
>> Before I go off and attempt this, I just have another dumb question to ask:
>>
>> If the printk copies it into the kernel ring buffer, at some point, someone comes
>> And asks for a copy into a userspace buffer either via dmesg or proc/kmsg interfaces.
>
> IMO that's fine - I don't think pointers in the kernel ring buffer should be restricted.
> Instead, access to dmesg / proc/kmsg should be restricted appropriately.
>
> I guess it depends on what the goal here is. Do we really want to stop root from
> ever seeing a kernel pointer (in which case OOPS messages wouldn't really work
> anymore)? My view is that restricting these interfaces so far that only root can
> access them and it's unlikely that root accidentally does so is sufficient.

I don't think it's worth worrying about dmesg right now. Maybe later
on, but I don't think it's worth it right now. Assuming it'll work for
things landing in /proc and /sys, focusing on the user-buffer-destined
stuff seems the best use of time to me.

-Kees
Roberts, William C Oct. 11, 2016, 6:11 p.m. UTC | #21
<snip>

> > I guess it depends on what the goal here is. Do we really want to stop

> > root from ever seeing a kernel pointer (in which case OOPS messages

> > wouldn't really work anymore)? My view is that restricting these

> > interfaces so far that only root can access them and it's unlikely that root

> accidentally does so is sufficient.

> 

> I don't think it's worth worrying about dmesg right now. Maybe later on, but I

> don't think it's worth it right now. Assuming it'll work for things landing in /proc

> and /sys, focusing on the user-buffer-destined stuff seems the best use of time

> to me.

> 


SGTM, I'll start this soon once I get a few other things off of my plate.

Patch
diff mbox

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..bca72a0 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -393,6 +393,9 @@  values to unprivileged users is a concern.
 When kptr_restrict is set to (2), kernel pointers printed using
 %pK will be replaced with 0's regardless of privileges.
 
+When kptr_restrict is set to (3), kernel pointers printed using
+%p and %pK will be replaced with 0's regardless of privileges.
+
 ==============================================================
 
 kstack_depth_to_print: (X86 only)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a377bfa..0d4e4af 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -128,6 +128,7 @@  static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
 #ifdef CONFIG_PRINTK
+static int three = 3;
 static int ten_thousand = 10000;
 #endif
 #ifdef CONFIG_PERF_EVENTS
@@ -847,7 +848,7 @@  static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax_sysadmin,
 		.extra1		= &zero,
-		.extra2		= &two,
+		.extra2		= &three,
 	},
 #endif
 	{
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 0967771..371cfab 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1472,6 +1472,56 @@  char *flags_string(char *buf, char *end, void *flags_ptr, const char *fmt)
 
 int kptr_restrict __read_mostly;
 
+static inline void *cleanse_pointer(void *ptr, struct printf_spec spec,
+	char *buf, char *end, int default_width)
+{
+	switch (kptr_restrict) {
+	case 0:
+		/* Always print %p values */
+		break;
+	case 1: {
+		const struct cred *cred;
+
+		/*
+		 * kptr_restrict==1 cannot be used in IRQ context
+		 * because its test for CAP_SYSLOG would be meaningless.
+		 */
+		if (in_irq() || in_serving_softirq() || in_nmi()) {
+			if (spec.field_width == -1)
+				spec.field_width = default_width;
+			return string(buf, end, "pK-error", spec);
+		}
+
+		/*
+		 * Only print the real pointer value if the current
+		 * process has CAP_SYSLOG and is running with the
+		 * same credentials it started with. This is because
+		 * access to files is checked at open() time, but %p
+		 * checks permission at read() time. We don't want to
+		 * leak pointer values if a binary opens a file using
+		 * %pK and then elevates privileges before reading it.
+		 */
+		cred = current_cred();
+		if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+		    !uid_eq(cred->euid, cred->uid) ||
+		    !gid_eq(cred->egid, cred->gid))
+			ptr = NULL;
+		break;
+	}
+	case 2: /* restrict only %pK */
+	case 3: /* restrict all non-extensioned %p and %pK */
+	default:
+		ptr = NULL;
+	break;
+	}
+	return ptr;
+}
+
+static inline int kptr_restrict_always_cleanse(void)
+{
+	return kptr_restrict == 3;
+}
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -1569,6 +1619,9 @@  int kptr_restrict __read_mostly;
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
  * pointer to the real address.
+ *
+ * Note: That for kptr_restrict set to 3, %p and %pK have the same
+ * meaning.
  */
 static noinline_for_stack
 char *pointer(const char *fmt, char *buf, char *end, void *ptr,
@@ -1576,7 +1629,7 @@  char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 {
 	const int default_width = 2 * sizeof(void *);
 
-	if (!ptr && *fmt != 'K') {
+	if (!ptr && *fmt != 'K' && !kptr_restrict_always_cleanse()) {
 		/*
 		 * Print (null) with the same width as a pointer so it makes
 		 * tabular output look nice.
@@ -1657,48 +1710,6 @@  char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 			va_end(va);
 			return buf;
 		}
-	case 'K':
-		switch (kptr_restrict) {
-		case 0:
-			/* Always print %pK values */
-			break;
-		case 1: {
-			const struct cred *cred;
-
-			/*
-			 * kptr_restrict==1 cannot be used in IRQ context
-			 * because its test for CAP_SYSLOG would be meaningless.
-			 */
-			if (in_irq() || in_serving_softirq() || in_nmi()) {
-				if (spec.field_width == -1)
-					spec.field_width = default_width;
-				return string(buf, end, "pK-error", spec);
-			}
-
-			/*
-			 * Only print the real pointer value if the current
-			 * process has CAP_SYSLOG and is running with the
-			 * same credentials it started with. This is because
-			 * access to files is checked at open() time, but %pK
-			 * checks permission at read() time. We don't want to
-			 * leak pointer values if a binary opens a file using
-			 * %pK and then elevates privileges before reading it.
-			 */
-			cred = current_cred();
-			if (!has_capability_noaudit(current, CAP_SYSLOG) ||
-			    !uid_eq(cred->euid, cred->uid) ||
-			    !gid_eq(cred->egid, cred->gid))
-				ptr = NULL;
-			break;
-		}
-		case 2:
-		default:
-			/* Always print 0's for %pK */
-			ptr = NULL;
-			break;
-		}
-		break;
-
 	case 'N':
 		return netdev_bits(buf, end, ptr, fmt);
 	case 'a':
@@ -1718,6 +1729,16 @@  char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 
 	case 'G':
 		return flags_string(buf, end, ptr, fmt);
+	default:
+		/*
+		 * plain %p, no extension, check if we should always cleanse and
+		 * treat like %pK.
+		 */
+		if (!kptr_restrict_always_cleanse())
+			break;
+	case 'K':
+		/* always check whether or not to cleanse kernel addresses */
+		ptr = cleanse_pointer(ptr, spec, buf, end, default_width);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {