Message ID | 20170421220939.GA65363@beast (mailing list archive) |
---|---|
State | New, archived |
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> This patch ports the x86-specific atomic overflow handling from PaX's
> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> from PaX that eliminates the saturation race condition by resetting the
> atomic counter back to the INT_MAX saturation value on both overflow and
> underflow. To win a race, a system would have to have INT_MAX threads
> simultaneously overflow before the saturation handler runs.

And is this impossible? Highly unlikely I'll grant you, but absolutely
impossible?

Also, you forgot nr_cpus in your bound. Afaict the worst case here is
O(nr_tasks + 3*nr_cpus).

Because PaX does it, is not a correctness argument. And this really
wants one.
On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> This patch ports the x86-specific atomic overflow handling from PaX's
>> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
>> from PaX that eliminates the saturation race condition by resetting the
>> atomic counter back to the INT_MAX saturation value on both overflow and
>> underflow. To win a race, a system would have to have INT_MAX threads
>> simultaneously overflow before the saturation handler runs.
>
> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?
>
> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).
>
> Because PaX does it, is not a correctness argument. And this really
> wants one.

From include/linux/threads.h:

/*
 * A maximum of 4 million PIDs should be enough for a while.
 * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 */
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

AFAICS that means you can only have up to 2^22 running tasks at
a time, since every running task requires a PID in the init pid namespace.
On Mon, Apr 24, 2017 at 10:53:30AM +0200, Jann Horn wrote:
> On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> >> This patch ports the x86-specific atomic overflow handling from PaX's
> >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> >> from PaX that eliminates the saturation race condition by resetting the
> >> atomic counter back to the INT_MAX saturation value on both overflow and
> >> underflow. To win a race, a system would have to have INT_MAX threads
> >> simultaneously overflow before the saturation handler runs.
> >
> > And is this impossible? Highly unlikely I'll grant you, but absolutely
> > impossible?
> >
> > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > O(nr_tasks + 3*nr_cpus).
> >
> > Because PaX does it, is not a correctness argument. And this really
> > wants one.
>
> From include/linux/threads.h:
>
> /*
>  * A maximum of 4 million PIDs should be enough for a while.
>  * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
>  */
> #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
>	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
>
> AFAICS that means you can only have up to 2^22 running tasks at
> a time, since every running task requires a PID in the init pid namespace.

That's but an artificial limit and bound to be increased sooner rather
than later, given the trend in building ever larger machines.

If we go with the futex limit on the full tid space, we end up with 2^30
(the comment here about 29 is wrong, see FUTEX_TID_MASK). We then still
have to add a multiple of nr_cpus. Granted, that will still be slightly
short of 2^31, but not to any amount I'm really comfortable with, we're
_really_ close.

Point remains though; Changelog (and arguably the code itself) should
very much include such bits.
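For context on the bound being discussed above: the futex word reserves its top
two bits for state flags, leaving 30 bits of TID space, which is where the 2^30
figure comes from. The relevant definitions are sketched below as I recall them
from include/uapi/linux/futex.h; worth verifying against the tree in use.

	/* Sketch of the futex word layout (verify against include/uapi/linux/futex.h): */
	#define FUTEX_WAITERS		0x80000000
	#define FUTEX_OWNER_DIED	0x40000000
	#define FUTEX_TID_MASK		0x3fffffff	/* TIDs fit in the low 30 bits -> 2^30 bound */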
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> +{
> +	const int a = 1;
> +	const int u = 0;
> +	int c, old;
> +
> +	c = atomic_read(&(r->refs));
> +	for (;;) {
> +		if (unlikely(c == (u)))
> +			break;
> +		old = atomic_cmpxchg(&(r->refs), c, c + (a));

Please use atomic_try_cmpxchg(), that generates saner code.

> +		if (likely(old == c))
> +			break;
> +		c = old;
> +	}
> +	return c != u;
> +}
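A sketch of what is being suggested here (my reading, not code from the thread):
atomic_try_cmpxchg() performs the compare-and-swap and, on failure, writes the
current counter value back into the expected-value variable, so the explicit
reload disappears and the compiler can branch directly on the cmpxchg result.

	/*
	 * Sketch only: refcount_inc_not_zero() rebuilt around
	 * atomic_try_cmpxchg(). On failure try_cmpxchg() updates 'c'
	 * in place, so no separate re-read of the counter is needed.
	 */
	static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
	{
		int c = atomic_read(&r->refs);

		do {
			if (unlikely(c == 0))
				return false;	/* object already (being) freed */
		} while (!atomic_try_cmpxchg(&r->refs, &c, c + 1));

		return true;
	}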
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
> index e3f4cd8876b5..1bdafb29b802 100644
> --- a/drivers/misc/lkdtm_bugs.c
> +++ b/drivers/misc/lkdtm_bugs.c
> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void)
>  	schedule();
>  }
>
> +#ifdef CONFIG_FAST_REFCOUNT
> +#define REFCOUNT_MAX INT_MAX
> +#else
> +#define REFCOUNT_MAX UINT_MAX
> +#endif

That doesn't seem like a sensible place for this.
On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: > > This patch ports the x86-specific atomic overflow handling from PaX's > > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version > > from PaX that eliminates the saturation race condition by resetting the > > atomic counter back to the INT_MAX saturation value on both overflow and > > underflow. To win a race, a system would have to have INT_MAX threads > > simultaneously overflow before the saturation handler runs. note that the above is wrong (and even contradicting itself and the code). > And is this impossible? Highly unlikely I'll grant you, but absolutely > impossible? here's my analysis from a while ago: http://www.openwall.com/lists/kernel-hardening/2017/01/05/19 > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > O(nr_tasks + 3*nr_cpus). what does nr_cpus have to do with winning the race? > Because PaX does it, is not a correctness argument. And this really > wants one. heh, do you want to tell me about how checking for a 0 refcount prevents exploiting a bug?
On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > O(nr_tasks + 3*nr_cpus). > > what does nr_cpus have to do with winning the race? The CPUs could each run nested softirq/hardirq/nmi context poking at the refcount, irrespective of the (preempted) task context. > > Because PaX does it, is not a correctness argument. And this really > > wants one. > > heh, do you want to tell me about how checking for a 0 refcount prevents > exploiting a bug? Not the point. All I said was that saying somebody else does it (anybody else, doesn't matter it was you) isn't an argument for correctness.
On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > O(nr_tasks + 3*nr_cpus). > > > > what does nr_cpus have to do with winning the race? > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > refcount, irrespective of the (preempted) task context. that's fine but are you also assuming that the code executed in each of those contexts leaks the same refcount? otherwise whatever they do to the refcount is no more relevant than a non-leaking preemptible path that runs to completion in a bounded amount of time (i.e., you get temporary bumps and thus need to win yet another set of races to get their effects at once). > > > Because PaX does it, is not a correctness argument. And this really > > > wants one. > > > > heh, do you want to tell me about how checking for a 0 refcount prevents > > exploiting a bug? > > Not the point. All I said was that saying somebody else does it (anybody > else, doesn't matter it was you) isn't an argument for correctness. that was exactly my point: all this applies to you as well. so let me ask the 3rd time: what is your "argument for correctness" for a 0 refcount value check? how does it prevent exploitation?
On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > > > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > > O(nr_tasks + 3*nr_cpus). > > > > > > what does nr_cpus have to do with winning the race? > > > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > > refcount, irrespective of the (preempted) task context. > > that's fine but are you also assuming that the code executed in each of > those contexts leaks the same refcount? otherwise whatever they do to the > refcount is no more relevant than a non-leaking preemptible path that runs > to completion in a bounded amount of time (i.e., you get temporary bumps > and thus need to win yet another set of races to get their effects at once). For worst case analysis we have to assume it does, unless we can proof it doesn't. And that proof is very very hard, and would need to be redone every time the kernel changes. > that was exactly my point: all this applies to you as well. so let me ask > the 3rd time: what is your "argument for correctness" for a 0 refcount > value check? how does it prevent exploitation? What 0 count check are you talking about, the one that triggers when we want to increment 0 ? I think I've explained that before; per reference count rules 0 means freed (or about to be freed when we talk RCU). The whole pattern: if (dec_and_test(&obj->ref)) kfree(obj); expresses this etc.. Other reference counts also do this. No references means its getting freed. Can you agree with this? If so; any attempt to increase the reference count while its (being) freed() is a use-after-free. Therefore we disallow 0 increment. Yes, this is an annoyance when you consider usage-counts, where 0 means something else. But then, we were talking about reference counts, not something else.
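To make the rule being argued for above concrete, here is a minimal sketch of the
lookup-side idiom it encodes; the struct, list, and lock names are made up for
illustration and are not from the patch or the thread.

	/*
	 * Illustrative sketch only -- 'struct obj', 'obj_list' and 'obj_lock'
	 * are hypothetical names. An object found in the list may already have
	 * had dec_and_test() drop its count to 0 with the kfree() in flight;
	 * taking a new reference is only safe while the count is still
	 * non-zero, which is what refcount_inc_not_zero() (and the
	 * inc-from-0 check) express.
	 */
	struct obj {
		struct list_head	node;
		int			key;
		refcount_t		ref;
	};

	static struct obj *obj_lookup(struct list_head *obj_list, spinlock_t *obj_lock, int key)
	{
		struct obj *o, *found = NULL;

		spin_lock(obj_lock);
		list_for_each_entry(o, obj_list, node) {
			if (o->key == key && refcount_inc_not_zero(&o->ref)) {
				found = o;	/* 0 would mean the free is already underway */
				break;
			}
		}
		spin_unlock(obj_lock);

		return found;	/* caller must drop the reference when done */
	}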
On 24 Apr 2017 at 15:33, Peter Zijlstra wrote: > On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: > > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > > > > > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > > > O(nr_tasks + 3*nr_cpus). > > > > > > > > what does nr_cpus have to do with winning the race? > > > > > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > > > refcount, irrespective of the (preempted) task context. > > > > that's fine but are you also assuming that the code executed in each of > > those contexts leaks the same refcount? otherwise whatever they do to the > > refcount is no more relevant than a non-leaking preemptible path that runs > > to completion in a bounded amount of time (i.e., you get temporary bumps > > and thus need to win yet another set of races to get their effects at once). > > For worst case analysis we have to assume it does, unless we can proof > it doesn't. And that proof is very very hard, and would need to be > redone every time the kernel changes. for worst case analysis you need to show the existence of an amd64 system that can spawn 2G tasks. then you'll have to show the feasibility of making all of them get preempted (without a later reschedule) inside a 2 insn window. > > that was exactly my point: all this applies to you as well. so let me ask > > the 3rd time: what is your "argument for correctness" for a 0 refcount > > value check? how does it prevent exploitation? > > What 0 count check are you talking about, the one that triggers when we > want to increment 0 ? are there any other 0 checks in there? > I think I've explained that before; per reference count rules 0 means > freed (or about to be freed when we talk RCU). you only said the same thing, what 0 means. you (still) didn't explain how checking for it prevents exploitation. > The whole pattern: > > if (dec_and_test(&obj->ref)) > kfree(obj); > > expresses this etc.. Other reference counts also do this. No references > means its getting freed. > > Can you agree with this? sure, so far so good. > If so; any attempt to increase the reference count while its (being) > freed() is a use-after-free. why would ever be there such an attempt? a freed object with intact memory content is as useful for an attacker as a live one, that is, not at all.
On Mon, Apr 24, 2017 at 1:32 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> This patch ports the x86-specific atomic overflow handling from PaX's >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version >> from PaX that eliminates the saturation race condition by resetting the >> atomic counter back to the INT_MAX saturation value on both overflow and >> underflow. To win a race, a system would have to have INT_MAX threads >> simultaneously overflow before the saturation handler runs. > > And is this impossible? Highly unlikely I'll grant you, but absolutely > impossible? I'll adjust the language. "Highly unlikely" is still better than "trivially doable with a single thread". :) > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > O(nr_tasks + 3*nr_cpus). > > Because PaX does it, is not a correctness argument. And this really > wants one. Sure, I didn't mean to imply anything other than a demonstration of what PaX is doing (and that it's better than not having it). If we can improve it, that's great. -Kees
On Mon, Apr 24, 2017 at 3:45 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r) >> +{ >> + const int a = 1; >> + const int u = 0; >> + int c, old; >> + >> + c = atomic_read(&(r->refs)); >> + for (;;) { >> + if (unlikely(c == (u))) >> + break; >> + old = atomic_cmpxchg(&(r->refs), c, c + (a)); > > Please use atomic_try_cmpxchg(), that generates saner code. Ah-ha, thanks. I actually copied this directly out of the existing atomic_t function, so we should probably update it there too. -Kees > >> + if (likely(old == c)) >> + break; >> + c = old; >> + } >> + return c != u; >> +}
On Mon, Apr 24, 2017 at 3:48 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c >> index e3f4cd8876b5..1bdafb29b802 100644 >> --- a/drivers/misc/lkdtm_bugs.c >> +++ b/drivers/misc/lkdtm_bugs.c >> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void) >> schedule(); >> } >> >> +#ifdef CONFIG_FAST_REFCOUNT >> +#define REFCOUNT_MAX INT_MAX >> +#else >> +#define REFCOUNT_MAX UINT_MAX >> +#endif > > That doesn't seem like a sensible place for this. I'll drop the LKDTM changes from this particular patch. As for the define, I think it's only interesting to LKDTM since it's the only part interested in refcount_t internals. (i.e. nothing else would (or should) use this information.) -Kees
On Mon, Apr 24, 2017 at 4:00 AM, PaX Team <pageexec@freemail.hu> wrote: > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > >> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> > This patch ports the x86-specific atomic overflow handling from PaX's >> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version >> > from PaX that eliminates the saturation race condition by resetting the >> > atomic counter back to the INT_MAX saturation value on both overflow and >> > underflow. To win a race, a system would have to have INT_MAX threads >> > simultaneously overflow before the saturation handler runs. > > note that the above is wrong (and even contradicting itself and the code). True, this changelog could be more accurate (it resets to INT_MAX on overflow and INT_MIN on underflow). I think I'm right in saying that a system would need INT_MAX threads running a refcount_inc() (and a refcount_dec_and_test() at exactly the right moment) before the reset handler got scheduled, though, yes? I'll attempt to clarify this. -Kees
On Mon, Apr 24, 2017 at 8:15 AM, PaX Team <pageexec@freemail.hu> wrote: > On 24 Apr 2017 at 15:33, Peter Zijlstra wrote: >> On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: >> > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: >> > that was exactly my point: all this applies to you as well. so let me ask >> > the 3rd time: what is your "argument for correctness" for a 0 refcount >> > value check? how does it prevent exploitation? >> >> I think I've explained that before; per reference count rules 0 means >> freed (or about to be freed when we talk RCU). > > you only said the same thing, what 0 means. you (still) didn't explain how > checking for it prevents exploitation. > >> The whole pattern: >> >> if (dec_and_test(&obj->ref)) >> kfree(obj); >> >> expresses this etc.. Other reference counts also do this. No references >> means its getting freed. >> >> Can you agree with this? > > sure, so far so good. > >> If so; any attempt to increase the reference count while its (being) >> freed() is a use-after-free. > > why would ever be there such an attempt? a freed object with intact memory > content is as useful for an attacker as a live one, that is, not at all. I think we're way off in the weeds here. The "cannot inc from 0" check is about general sanity checks on refcounts. It should never happen, and if it does, there's a bug. However, what the refcount hardening protection is trying to do is protect again the exploitable condition: overflow. Inc-from-0 isn't an exploitable condition since in theory the memory suddenly becomes correctly managed again. We're just discussing different things. The point is to have very fast refcount_t that protects against overflow so the mm, net, and block subsystems aren't worried about making the atomic_t -> refcount_t changes there. -Kees
On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> I think we're way off in the weeds here. The "cannot inc from 0" check
> is about general sanity checks on refcounts.

I disagree, although sanity checks are good too.

> It should never happen, and if it does, there's a bug.

The very same is true of the overflow thing.

> However, what the refcount hardening protection is trying to do is
> protect again the exploitable condition: overflow.

Sure..

> Inc-from-0 isn't an exploitable condition since in theory
> the memory suddenly becomes correctly managed again.

It does not. It just got free'ed. Nothing will stop the free from
happening (or already having happened).

> We're just discussing different things.

No, both are bugs, neither overflow nor inc-from-zero _should_ happen.
The whole point is that code is buggy. If it weren't for that, we'd not
be doing this.

How is the below not useful fodder for an exploit? It might be a less
common bug, and perhaps a bit more fiddly to make work, but afaict it's
still a full use-after-free and therefore useful.

---

	Thread-A			Thread-B

	if(dec_and_test(&obj->ref)) { // true, ref==0

					inc(&obj->ref) // ref: 0->1

		kfree(obj);
	}


	~~~/ Thread-C /~~~

	obj = kmalloc(); // is obj from Thread-A
	obj->ref = 1;
	obj->func = ....

					obj->func();

					-- OR --

					put(obj);
					if (dec_and_test(&obj->ref))
						kfree(obj); // which was of Thread-C
On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote: >> I think we're way off in the weeds here. The "cannot inc from 0" check >> is about general sanity checks on refcounts. > > I disagree, although sanity check are good too. > >> It should never happen, and if it does, there's a bug. > > The very same is true of the overflow thing. > >> However, what the refcount hardening protection is trying to do is >> protect again the exploitable condition: overflow. > > Sure.. > >> Inc-from-0 isn't an exploitable condition since in theory >> the memory suddenly becomes correctly managed again. > > It does not. It just got free'ed. Nothing will stop the free from > happening (or already having happened). Well, yes, but that's kind of my point. Detecting inc-from-0 is "too late" to offer a protection. It offers notification of a bug, rather than stopping an exploit from happening. >> We're just discussing different things. > > No, both are bugs, neither overflow not inc-from-zero _should_ happen. > The whole point is that code is buggy. If it weren't for that, we'd not > be doing this. > > How is the below not useful fodder for an exploit? It might be a less > common bug, and perhaps a bit more fiddly to make work, but afaict its > still a full use-after-free and therefore useful. > > --- > > Thread-A Thread-B > > if(dec_and_test(&obj->ref)) { // true, ref==0 > > inc(&obj->ref) // ref: 0->1 > > kfree(obj); > } > > > > ~~~/ Thread-C /~~~ > > obj = kmalloc(); // is obj from Thread-A > > obj->ref = 1; > obj->func = .... > > > obj->func(); > > -- OR -- > > put(obj); > if (dec_and_test(&obj->ref)) > kfree(obj); // which was of Thread-C Right, both are bugs, but the exploitable condition is "refcount hit zero and got freed while there were still things using it". Having too many put()s that cause the refcount to reach zero early can't be detected, but having too many get()s that wrap around, ultimately to zero, can be (the overflow condition). With overflow protection, the refcount can never (realistically) hit zero since the refcount will get saturated at INT_MAX, so no exploit is possible -- no memory is ever freed (it just leaks the allocation). The inc-from-zero case performing a notification is certainly nice, but the exploit already happened. I want the overflow protection to work everywhere the kernel uses refcounts, which means dealing with performance concerns from mm, net, block, etc. Since this is a 1 instruction change (which branch prediction should make almost no cost at all), this will easily address any performance concerns from the other subsystems. I'd rather have the overflow protection everywhere than only in some areas, and I want to break the deadlock of this catch-22 of subsystems not being able to say what benchmarks are important to them but refusing to switch to refcount_t due to performance concerns. -Kees
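For readers following along, a plain-C sketch of the saturation behaviour described
above (an illustration under the assumption of a 32-bit signed counter, not the
patch's x86 fast path, which achieves the same effect with a single checked
instruction and an exception handler):

	/*
	 * Sketch only: a saturating increment. Once the counter would pass
	 * INT_MAX it is pinned there, so no amount of further increments can
	 * wrap it around to 0 and trigger a premature kfree(); the object is
	 * leaked instead, which is the safe failure mode.
	 */
	static inline void refcount_inc_saturating(refcount_t *r)
	{
		int c = atomic_read(&r->refs);

		do {
			if (unlikely(c == INT_MAX))
				return;		/* saturated: leak rather than wrap */
		} while (!atomic_try_cmpxchg(&r->refs, &c, c + 1));
	}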
On Mon, 2017-04-24 at 15:37 -0700, Kees Cook wrote: > On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra <peterz@infradead.org > > wrote: > > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote: > > > I think we're way off in the weeds here. The "cannot inc from 0" > > > check > > > is about general sanity checks on refcounts. > > > > I disagree, although sanity check are good too. > > > > > It should never happen, and if it does, there's a bug. > > > > The very same is true of the overflow thing. > > > > > However, what the refcount hardening protection is trying to do > > > is > > > protect again the exploitable condition: overflow. > > > > Sure.. > > > > > Inc-from-0 isn't an exploitable condition since in theory > > > the memory suddenly becomes correctly managed again. > > > > It does not. It just got free'ed. Nothing will stop the free from > > happening (or already having happened). > > Well, yes, but that's kind of my point. Detecting inc-from-0 is "too > late" to offer a protection. It offers notification of a bug, rather > than stopping an exploit from happening. inc-from-0 could allow the attacker to gain access to an object which gets allocated to a new user afterwards. Certainly much less useful as an exploit, but still a potential privilege escalation.
On Mon, Apr 24, 2017 at 03:37:32PM -0700, Kees Cook wrote:
> On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > It does not. It just got free'ed. Nothing will stop the free from
> > happening (or already having happened).
>
> Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
> late" to offer a protection. It offers notification of a bug, rather
> than stopping an exploit from happening.

Well, your setup (panic_on_warn et al) would have it panic the box. That
will effectively stop the exploit by virtue of stopping everything.

And warn/bug/panic etc.. are I think a better option than silently
letting it happen.
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline void refcount_inc(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "incl %0\n\t"
> +		REFCOUNT_CHECK_OVERFLOW(4)
> +		: [counter] "+m" (r->refs.counter)
> +		: : "cc", "cx");
> +}
> +
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "decl %0\n\t"
> +		REFCOUNT_CHECK_UNDERFLOW(4)
> +		: [counter] "+m" (r->refs.counter)
> +		: : "cc", "cx");
> +}

> +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
> +{
> +	const char *str = NULL;
> +
> +	BUG_ON(!(regs->flags & X86_EFLAGS_OF));
> +
> +#define range_check(size, direction, type, value) \
> +	if ((unsigned long)__##size##_##direction##_start <= regs->ip && \
> +	    regs->ip < (unsigned long)__##size##_##direction##_end) { \
> +		*(type *)regs->cx = value; \
> +		str = #size " " #direction; \
> +	}
> +
> +	range_check(refcount, overflow, int, INT_MAX)
> +	range_check(refcount, underflow, int, INT_MIN)
> +
> +#undef range_check
> +
> +	BUG_ON(!str);
> +	do_error_trap(regs, error_code, (char *)str, X86_REFCOUNT_VECTOR,
> +		SIGILL);
> +}
> +#endif

So what avoids this:

	CPU0				CPU1

	lock inc %[val]; # 0x7fffffff
	jo 2f
1:	...

					lock dec %[val]; # 0x80000000
					jo 2f
				1:	...

2:	mov $0x7fffffff, %[val]
	jmp 1b

				2:	mov $0x80000000, %[val]
					jmp 1b

	~~~~//~~~~

	lock inc %val; #0x80000000
	....
	lock inc %val; 0xffffffff
	lock inc %val; 0x00000000
On 25 Apr 2017 at 0:01, Peter Zijlstra wrote: > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote: > > I think we're way off in the weeds here. The "cannot inc from 0" check > > is about general sanity checks on refcounts. > > I disagree, although sanity check are good too. exactly, an attacker doesn't care how a premature free occurs due to reaching a 0 refcount, afterwards it's memory corruption time for both old and new references regardless. > > However, what the refcount hardening protection is trying to do is > > protect again the exploitable condition: overflow. > > Sure.. underflow is also exploitable, it's just much harder to defend against (there're no known practical solutions). > > Inc-from-0 isn't an exploitable condition since in theory > > the memory suddenly becomes correctly managed again. > > It does not. It just got free'ed. Nothing will stop the free from > happening (or already having happened). now hold this thought... > How is the below not useful fodder for an exploit? It might be a less > common bug, and perhaps a bit more fiddly to make work, but afaict its > still a full use-after-free and therefore useful. > > --- > > Thread-A Thread-B > > if(dec_and_test(&obj->ref)) { // true, ref==0 > > inc(&obj->ref) // ref: 0->1 > > kfree(obj); > } ... and tell me why an attacker would let Thread-B do that increment (that you're trying to detect) *before* the underlying memory gets reused and thus the 0 changed to something else? hint: he'll do everything in his power to prevent that, either by winning the race or if there's no race (no refcount users outside his control), he'll win every time. IOW, checking for 0 is pointless and you kinda proved it yourself now.
On 24 Apr 2017 at 13:33, Kees Cook wrote: > On Mon, Apr 24, 2017 at 4:00 AM, PaX Team <pageexec@freemail.hu> wrote: > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > >> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: > >> > This patch ports the x86-specific atomic overflow handling from PaX's > >> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version > >> > from PaX that eliminates the saturation race condition by resetting the > >> > atomic counter back to the INT_MAX saturation value on both overflow and > >> > underflow. To win a race, a system would have to have INT_MAX threads > >> > simultaneously overflow before the saturation handler runs. > > > > note that the above is wrong (and even contradicting itself and the code). > > True, this changelog could be more accurate (it resets to INT_MAX on > overflow and INT_MIN on underflow). I think I'm right in saying that a > system would need INT_MAX threads running a refcount_inc() (and a > refcount_dec_and_test() at exactly the right moment) before the reset > handler got scheduled, though, yes? there's no uniform answer to this as there're several conditions that can affect the effectiveness of the refcount protection. e.g., how many independent leaking paths can the attacker exercise (typically one), are these paths under some kind of locks (would already prevent unbounded leaks/increments should the overflow detecting thread be preempted), are negative refcounts allowed and checked for or only signed overflow, etc. INT_MAX threads would be needed when the leaking path is locked so that it can only be exercised once and you'll need to get normal (balanced) paths preempted just after the increment. if the leaking path is lockless (can be exercised in parallel without bounds) then 2 threads are enough where the one triggering the signed overflow would have to be preempted while the other one does INT_MAX increments and trigger the UAF. this is where the other mechanisms i talked about in the past become relevant: preemption or interrupts can be disabled or negative refcount values can be detected and acted upon (your blind copy-pasting effort passed upon this latter opportunity by not specializing the 'jo' into 'js' for the refcount case).
On 25 Apr 2017 at 12:23, Peter Zijlstra wrote:

> So what avoids this:

simple, you noted it yourself in your previous mail:

> Well, your setup (panic_on_warn et al) would have it panic the box. That
> will effectively stop the exploit by virtue of stopping everything.

with that in mind the actual code looks like this:

>	CPU0				CPU1
>
>	lock inc %[val]; # 0x7fffffff
>	jo 2f
> 1:	...
>
>					lock dec %[val]; # 0x80000000
>					jo 2f
>				1:	...
>
> 2:	mov $0x7fffffff, %[val]
	panic()
>	jmp 1b
>
>				2:	mov $0x80000000, %[val]
					panic()
>				   	jmp 1b
>

... and we never get this far.
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team <pageexec@freemail.hu> wrote: > On 25 Apr 2017 at 0:01, Peter Zijlstra wrote: >> How is the below not useful fodder for an exploit? It might be a less >> common bug, and perhaps a bit more fiddly to make work, but afaict its >> still a full use-after-free and therefore useful. >> >> --- >> >> Thread-A Thread-B >> >> if(dec_and_test(&obj->ref)) { // true, ref==0 >> >> inc(&obj->ref) // ref: 0->1 >> >> kfree(obj); >> } > > ... and tell me why an attacker would let Thread-B do that increment > (that you're trying to detect) *before* the underlying memory gets > reused and thus the 0 changed to something else? hint: he'll do everything > in his power to prevent that, either by winning the race or if there's > no race (no refcount users outside his control), he'll win every time. > IOW, checking for 0 is pointless and you kinda proved it yourself now. Right, having a deterministic protection (checking for overflow) is best since it stops all exploits using that path. Hoping that an attacker is unlucky and hits a notification after they've already landed their corruption is not a very useful defense. It certainly has a non-zero value, but stopping overflow 100% is better. Especially when we can do it with no meaningful change in performance which lets us actually do the atomic_t -> refcount_t conversion everywhere. -Kees
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team <pageexec@freemail.hu> wrote: > INT_MAX threads would be needed when the leaking path is locked so > that it can only be exercised once and you'll need to get normal > (balanced) paths preempted just after the increment. if the leaking > path is lockless (can be exercised in parallel without bounds) then > 2 threads are enough where the one triggering the signed overflow > would have to be preempted while the other one does INT_MAX increments > and trigger the UAF. this is where the other mechanisms i talked about > in the past become relevant: preemption or interrupts can be disabled > or negative refcount values can be detected and acted upon (your blind > copy-pasting effort passed upon this latter opportunity by not > specializing the 'jo' into 'js' for the refcount case). Well, it's not "blind" -- I'm trying to bring the code as-is to upstream for discussion/examination with as little functional differences as possible so it's easier to compare apples to apples. (Which already resulted in more eyes looking at the code to find a bug -- thanks Jann!) But yes, jo -> js hugely increases the coverage. I'll make that change for v2. Thanks! -Kees
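To spell out the js-versus-jo distinction in C terms, here is an illustrative
sketch of the flag semantics after a "lock incl" on a 32-bit counter; this is
not code from either tree, just the conditions the two jumps would act on.

	/* Illustration only: which old counter values trip each check after "incl". */
	static inline bool trips_jo(int old_val)
	{
		/* OF is set only on the single INT_MAX -> INT_MIN transition. */
		return old_val == INT_MAX;
	}

	static inline bool trips_js(int old_val)
	{
		/*
		 * SF is set whenever the result is negative: the overflow step
		 * itself, plus every further increment of a counter already
		 * sitting anywhere in INT_MIN..-2 (i.e. saturated/underflowed),
		 * hence the wider coverage.
		 */
		return old_val == INT_MAX || old_val < -1;
	}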
On 25 Apr 2017 at 9:39, Kees Cook wrote: > On Tue, Apr 25, 2017 at 4:26 AM, PaX Team <pageexec@freemail.hu> wrote: > > INT_MAX threads would be needed when the leaking path is locked so > > that it can only be exercised once and you'll need to get normal > > (balanced) paths preempted just after the increment. if the leaking > > path is lockless (can be exercised in parallel without bounds) then > > 2 threads are enough where the one triggering the signed overflow > > would have to be preempted while the other one does INT_MAX increments > > and trigger the UAF. this is where the other mechanisms i talked about > > in the past become relevant: preemption or interrupts can be disabled > > or negative refcount values can be detected and acted upon (your blind > > copy-pasting effort passed upon this latter opportunity by not > > specializing the 'jo' into 'js' for the refcount case). > > Well, it's not "blind" -- I'm trying to bring the code as-is to > upstream for discussion/examination with as little functional > differences as possible so it's easier to compare apples to apples. you copied code from a version which is at least 2 major kernel revisions behind (so much for those apples), you chose the one version which had a bug that you didn't spot nor fix properly, you didn't realize the opportunity that a special refcount type represents, you claimed refcount underflows aren't exploitable but copied code that would detect signed underflow, you didn't understand the limits and edge cases i explained above... need i go on? doesn't leave one with great confidence in your ability to understand and maintain this code...
On Tue, Apr 25, 2017 at 7:14 PM, PaX Team <pageexec@freemail.hu> wrote: > On 25 Apr 2017 at 9:39, Kees Cook wrote: > >> On Tue, Apr 25, 2017 at 4:26 AM, PaX Team <pageexec@freemail.hu> wrote: >> > INT_MAX threads would be needed when the leaking path is locked so >> > that it can only be exercised once and you'll need to get normal >> > (balanced) paths preempted just after the increment. if the leaking >> > path is lockless (can be exercised in parallel without bounds) then >> > 2 threads are enough where the one triggering the signed overflow >> > would have to be preempted while the other one does INT_MAX increments >> > and trigger the UAF. this is where the other mechanisms i talked about >> > in the past become relevant: preemption or interrupts can be disabled >> > or negative refcount values can be detected and acted upon (your blind >> > copy-pasting effort passed upon this latter opportunity by not >> > specializing the 'jo' into 'js' for the refcount case). >> >> Well, it's not "blind" -- I'm trying to bring the code as-is to >> upstream for discussion/examination with as little functional >> differences as possible so it's easier to compare apples to apples. > > you copied code from a version which is at least 2 major kernel revisions > behind (so much for those apples) Hmm, this was from your 4.9 port. Linus hasn't quite released 4.11 yet, so that's actually "at most 2 major kernel revisions behind". :) Regardless, I'd be happy to refresh the port. Will you share a URL to your latest rebase against upstream? > you chose the one version which had a > bug that you didn't spot nor fix properly, you didn't realize the opportunity > that a special refcount type represents, you claimed refcount underflows > aren't exploitable but copied code that would detect signed underflow, you > didn't understand the limits and edge cases i explained above... need i go As I said, I was trying to minimize changes to your implementation, which included the bug and the other issues. The point of this was to share it with others, and work collaboratively on it. I think this clearly succeeded with benefits to both upstream and PaX: Jann spotted the fix for the bug causing weird crashes I saw when doing initial testing, you pointed out the benefit of using js over jo, I've reorganized the RMWcc macros for more easily adding trailing instructions, Peter is thinking about ways around the protection, etc. > on? doesn't leave one with great confidence in your ability to understand > and maintain this code... Well, that's your opinion. I think the patch and its discussion helped several people, including myself, understand this code. Since many people will share its maintenance, I think this is the right way to handle upstreaming these kinds of things. I don't claim to be omniscient, just persistent. Getting this protection into upstream means every Linux user will benefit from what you created, which I think is awesome; this whole class of refcount flaws goes away. Thank you for writing it, sharing it, and discussing it! -Kees
diff --git a/arch/Kconfig b/arch/Kconfig index cd211a14a88f..2cd150f03175 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -847,4 +847,23 @@ config STRICT_MODULE_RWX config ARCH_WANT_RELAX_ORDER bool +config ARCH_HAS_FAST_REFCOUNT + bool + help + An architecture selects this when it has implemented refcount_t + using primitizes that provide a faster runtime at the expense + of some refcount state checks. The refcount overflow condition, + however, must be retained. Catching overflows is the primary + security concern for protecting against bugs in reference counts. + +config FAST_REFCOUNT + bool "Speed up reference counting at the expense of full validation" + depends on ARCH_HAS_FAST_REFCOUNT + help + The regular reference counting infrastructure in the kernel checks + many error conditions. If this option is selected, refcounting + is made faster using architecture-specific implementions that may + only check for reference count overflows (which is the primary + way reference counting bugs are turned into security exploits). + source "kernel/gcov/Kconfig" diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index cc98d5a294ee..a13db97e0d71 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -50,6 +50,7 @@ config X86 select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FAST_MULTIPLIER + select ARCH_HAS_FAST_REFCOUNT select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_MMIO_FLUSH diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 57f7ec35216e..9e8d9e2d70bf 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -792,6 +792,15 @@ ENTRY(spurious_interrupt_bug) jmp common_exception END(spurious_interrupt_bug) +#ifdef CONFIG_FAST_REFCOUNT +ENTRY(refcount_error) + ASM_CLAC + pushl $0 + pushl $do_refcount_error + jmp error_code +ENDPROC(refcount_error) +#endif + #ifdef CONFIG_XEN ENTRY(xen_hypervisor_callback) pushl $-1 /* orig_ax = -1 => not a system call */ diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 044d18ebc43c..a736b882ec76 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -858,6 +858,9 @@ idtentry coprocessor_error do_coprocessor_error has_error_code=0 idtentry alignment_check do_alignment_check has_error_code=1 idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0 +#ifdef CONFIG_FAST_REFCOUNT +idtentry refcount_error do_refcount_error has_error_code=0 +#endif /* * Reload gs selector with exception handling diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index 6ca9fd6234e1..64ca4dcc29ec 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -48,6 +48,9 @@ #define IA32_SYSCALL_VECTOR 0x80 +/* Refcount Overflow or Underflow Exception. */ +#define X86_REFCOUNT_VECTOR 0x81 + /* * Vectors 0x30-0x3f are used for ISA interrupts. * round up to the next 16-vector boundary diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h new file mode 100644 index 000000000000..79e35981e42f --- /dev/null +++ b/arch/x86/include/asm/refcount.h @@ -0,0 +1,123 @@ +#ifndef __ASM_X86_REFCOUNT_H +#define __ASM_X86_REFCOUNT_H +/* + * x86-specific implementation of refcount_t. Ported from PAX_REFCOUNT in + * PaX/grsecurity. 
+ */ +#include <linux/refcount.h> +#include <asm/irq_vectors.h> + +#define __REFCOUNT_CHECK(size) \ + "jo 111f\n" \ + ".if "__stringify(size)" == 4\n\t" \ + ".pushsection .text.refcount_overflow\n" \ + ".elseif "__stringify(size)" == -4\n\t" \ + ".pushsection .text.refcount_underflow\n" \ + ".else\n" \ + ".error \"invalid size\"\n" \ + ".endif\n" \ + "111:\tlea %[counter],%%"_ASM_CX"\n\t" \ + "int $"__stringify(X86_REFCOUNT_VECTOR)"\n" \ + "222:\n\t" \ + ".popsection\n" \ + "333:\n" \ + _ASM_EXTABLE(222b, 333b) + +#define REFCOUNT_CHECK_OVERFLOW(size) __REFCOUNT_CHECK(size) +#define REFCOUNT_CHECK_UNDERFLOW(size) __REFCOUNT_CHECK(-(size)) + +#if !defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(CC_HAVE_ASM_GOTO) +/* Use asm goto */ +#define __GEN_CHECKED_RMWcc(fullop, var, size, cc, ...) \ +do { \ + asm_volatile_goto(fullop \ + "\n\t"__REFCOUNT_CHECK(size) \ + ";j" #cc " %l[cc_label]" \ + : : [counter] "m" (var), ## __VA_ARGS__ \ + : "memory", "cc", "cx" : cc_label); \ + return 0; \ +cc_label: \ + return 1; \ +} while (0) + +#define GEN_BINARY_CHECKED_RMWcc(op, var, size, vcon, val, arg0, cc) \ + __GEN_CHECKED_RMWcc(op " %1, " arg0, var, size, cc, vcon (val)) + +#else /* defined(__GCC_ASM_FLAG_OUTPUTS__) || !defined(CC_HAVE_ASM_GOTO) */ + +#define __GEN_CHECKED_RMWcc(fullop, var, size, cc, ...) \ +do { \ + bool c; \ + asm volatile (fullop \ + "\n\t"__REFCOUNT_CHECK(size) \ + ";" CC_SET(cc) \ + : [counter] "+m" (var), CC_OUT(cc) (c) \ + : __VA_ARGS__ : "memory", "cc", "cx"); \ + return c != 0; \ +} while (0) + +#define GEN_BINARY_CHECKED_RMWcc(op, var, size, vcon, val, arg0, cc) \ + __GEN_CHECKED_RMWcc(op " %2, " arg0, var, size, cc, vcon (val)) + +#endif /* defined(__GCC_ASM_FLAG_OUTPUTS__) || !defined(CC_HAVE_ASM_GOTO) */ + +#define GEN_UNARY_CHECKED_RMWcc(op, var, size, arg0, cc) \ + __GEN_CHECKED_RMWcc(op " " arg0, var, size, cc) + +static __always_inline void refcount_add(unsigned int i, refcount_t *r) +{ + asm volatile(LOCK_PREFIX "addl %1,%0\n\t" + REFCOUNT_CHECK_OVERFLOW(4) + : [counter] "+m" (r->refs.counter) + : "ir" (i) + : "cc", "cx"); +} + +static __always_inline void refcount_inc(refcount_t *r) +{ + asm volatile(LOCK_PREFIX "incl %0\n\t" + REFCOUNT_CHECK_OVERFLOW(4) + : [counter] "+m" (r->refs.counter) + : : "cc", "cx"); +} + +static __always_inline void refcount_dec(refcount_t *r) +{ + asm volatile(LOCK_PREFIX "decl %0\n\t" + REFCOUNT_CHECK_UNDERFLOW(4) + : [counter] "+m" (r->refs.counter) + : : "cc", "cx"); +} + +static __always_inline __must_check +bool refcount_sub_and_test(unsigned int i, refcount_t *r) +{ + GEN_BINARY_CHECKED_RMWcc(LOCK_PREFIX "subl", r->refs.counter, + -4, "er", i, "%0", e); +} + +static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r) +{ + GEN_UNARY_CHECKED_RMWcc(LOCK_PREFIX "decl", r->refs.counter, + -4, "%0", e); +} + +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r) +{ + const int a = 1; + const int u = 0; + int c, old; + + c = atomic_read(&(r->refs)); + for (;;) { + if (unlikely(c == (u))) + break; + old = atomic_cmpxchg(&(r->refs), c, c + (a)); + if (likely(old == c)) + break; + c = old; + } + return c != u; +} + +#endif diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 01fd0a7f48cd..e4d8db75d85e 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -38,6 +38,10 @@ asmlinkage void machine_check(void); #endif /* CONFIG_X86_MCE */ asmlinkage void simd_coprocessor_error(void); +#ifdef CONFIG_FAST_REFCOUNT +asmlinkage void refcount_error(void); 
+#endif + #ifdef CONFIG_TRACING asmlinkage void trace_page_fault(void); #define trace_stack_segment stack_segment @@ -54,6 +58,7 @@ asmlinkage void trace_page_fault(void); #define trace_alignment_check alignment_check #define trace_simd_coprocessor_error simd_coprocessor_error #define trace_async_page_fault async_page_fault +#define trace_refcount_error refcount_error #endif dotraplinkage void do_divide_error(struct pt_regs *, long); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 4e496379a871..999d324119c0 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -192,6 +192,13 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str, tsk->thread.trap_nr = trapnr; die(str, regs, error_code); } + +#ifdef CONFIG_FAST_REFCOUNT + if (trapnr == X86_REFCOUNT_VECTOR) { + regs->ip -= 2; /* sizeof(int $xx) */ + refcount_error_report(regs, str); + } +#endif return 0; } @@ -308,6 +315,32 @@ __visible void __noreturn handle_stack_overflow(const char *message, } #endif +#ifdef CONFIG_FAST_REFCOUNT + +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code) +{ + const char *str = NULL; + + BUG_ON(!(regs->flags & X86_EFLAGS_OF)); + +#define range_check(size, direction, type, value) \ + if ((unsigned long)__##size##_##direction##_start <= regs->ip && \ + regs->ip < (unsigned long)__##size##_##direction##_end) { \ + *(type *)regs->cx = value; \ + str = #size " " #direction; \ + } + + range_check(refcount, overflow, int, INT_MAX) + range_check(refcount, underflow, int, INT_MIN) + +#undef range_check + + BUG_ON(!str); + do_error_trap(regs, error_code, (char *)str, X86_REFCOUNT_VECTOR, + SIGILL); +} +#endif + #ifdef CONFIG_X86_64 /* Runs on IST stack */ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) @@ -983,6 +1016,11 @@ void __init trap_init(void) set_bit(IA32_SYSCALL_VECTOR, used_vectors); #endif +#ifdef CONFIG_FAST_REFCOUNT + set_intr_gate(X86_REFCOUNT_VECTOR, refcount_error); + set_bit(X86_REFCOUNT_VECTOR, used_vectors); +#endif + /* * Set the IDT descriptor to a fixed read-only location, so that the * "sidt" instruction will not leak the location of the kernel, and diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c index e3f4cd8876b5..1bdafb29b802 100644 --- a/drivers/misc/lkdtm_bugs.c +++ b/drivers/misc/lkdtm_bugs.c @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void) schedule(); } +#ifdef CONFIG_FAST_REFCOUNT +#define REFCOUNT_MAX INT_MAX +#else +#define REFCOUNT_MAX UINT_MAX +#endif + void lkdtm_REFCOUNT_SATURATE_INC(void) { - refcount_t over = REFCOUNT_INIT(UINT_MAX - 1); + refcount_t over = REFCOUNT_INIT(REFCOUNT_MAX - 1); pr_info("attempting good refcount decrement\n"); refcount_dec(&over); @@ -146,7 +152,7 @@ void lkdtm_REFCOUNT_SATURATE_INC(void) pr_info("attempting bad refcount inc overflow\n"); refcount_inc(&over); refcount_inc(&over); - if (refcount_read(&over) == UINT_MAX) + if (refcount_read(&over) == REFCOUNT_MAX) pr_err("Correctly stayed saturated, but no BUG?!\n"); else pr_err("Fail: refcount wrapped\n"); @@ -154,7 +160,7 @@ void lkdtm_REFCOUNT_SATURATE_INC(void) void lkdtm_REFCOUNT_SATURATE_ADD(void) { - refcount_t over = REFCOUNT_INIT(UINT_MAX - 1); + refcount_t over = REFCOUNT_INIT(REFCOUNT_MAX - 1); pr_info("attempting good refcount decrement\n"); refcount_dec(&over); @@ -162,7 +168,7 @@ void lkdtm_REFCOUNT_SATURATE_ADD(void) pr_info("attempting bad refcount add overflow\n"); refcount_add(2, &over); - if (refcount_read(&over) == UINT_MAX) + if (refcount_read(&over) == REFCOUNT_MAX) 
pr_err("Correctly stayed saturated, but no BUG?!\n"); else pr_err("Fail: refcount wrapped\n"); @@ -178,6 +184,11 @@ void lkdtm_REFCOUNT_ZERO_DEC(void) pr_err("Stayed at zero, but no BUG?!\n"); else pr_err("Fail: refcount went crazy\n"); + + pr_info("attempting bad refcount decrement past INT_MIN\n"); + atomic_set(&zero.refs, INT_MIN); + refcount_dec(&zero); + pr_err("Fail: wrap not detected\n"); } void lkdtm_REFCOUNT_ZERO_SUB(void) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 532372c6cf15..0590f384f234 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -20,6 +20,8 @@ * may be out of this range on some architectures. * [_sinittext, _einittext]: contains .init.text.* sections * [__bss_start, __bss_stop]: contains BSS sections + * [__refcount_overflow/underflow_start, ..._end]: contains .text sections + * for refcount error handling. * * Following global variables are optional and may be unavailable on some * architectures and/or kernel configurations. @@ -39,6 +41,8 @@ extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[]; extern char __kprobes_text_start[], __kprobes_text_end[]; extern char __entry_text_start[], __entry_text_end[]; extern char __start_rodata[], __end_rodata[]; +extern char __refcount_overflow_start[], __refcount_overflow_end[]; +extern char __refcount_underflow_start[], __refcount_underflow_end[]; /* Start and end of .ctors section - used for constructor calls. */ extern char __ctors_start[], __ctors_end[]; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 143db9c523e2..a04aae39e820 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -448,9 +448,18 @@ ALIGN_FUNCTION(); \ *(.text.hot .text .text.fixup .text.unlikely) \ *(.ref.text) \ + REFCOUNT_TEXT \ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ +#define __REFCOUNT_TEXT(section) \ + VMLINUX_SYMBOL(__##section##_start) = .; \ + *(.text.##section) \ + VMLINUX_SYMBOL(__##section##_end) = .; + +#define REFCOUNT_TEXT \ + __REFCOUNT_TEXT(refcount_overflow) \ + __REFCOUNT_TEXT(refcount_underflow) /* sched.text is aling to function alignment to secure we have same * address even at second ld pass when generating System.map */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 4c26dc3a8295..bc15822b24eb 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -275,6 +275,8 @@ extern int oops_may_print(void); void do_exit(long error_code) __noreturn; void complete_and_exit(struct completion *, long) __noreturn; +void refcount_error_report(struct pt_regs *regs, const char *kind); + /* Internal, do not use. 
*/ int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res); int __must_check _kstrtol(const char *s, unsigned int base, long *res); diff --git a/include/linux/refcount.h b/include/linux/refcount.h index 0023fee4bbbc..fdb82bcaf975 100644 --- a/include/linux/refcount.h +++ b/include/linux/refcount.h @@ -22,6 +22,9 @@ static inline unsigned int refcount_read(const refcount_t *r) return atomic_read(&r->refs); } +#ifdef CONFIG_FAST_REFCOUNT +#include <asm/refcount.h> +#else extern __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r); extern void refcount_add(unsigned int i, refcount_t *r); @@ -33,6 +36,7 @@ extern void refcount_sub(unsigned int i, refcount_t *r); extern __must_check bool refcount_dec_and_test(refcount_t *r); extern void refcount_dec(refcount_t *r); +#endif extern __must_check bool refcount_dec_if_one(refcount_t *r); extern __must_check bool refcount_dec_not_one(refcount_t *r); diff --git a/kernel/panic.c b/kernel/panic.c index a58932b41700..a1745b60cc36 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -26,6 +26,7 @@ #include <linux/nmi.h> #include <linux/console.h> #include <linux/bug.h> +#include <linux/ratelimit.h> #define PANIC_TIMER_STEP 100 #define PANIC_BLINK_SPD 18 @@ -601,6 +602,28 @@ EXPORT_SYMBOL(__stack_chk_fail); #endif +#ifdef CONFIG_FAST_REFCOUNT +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3); + +void refcount_error_report(struct pt_regs *regs, const char *kind) +{ + do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true); + + if (!__ratelimit(&refcount_ratelimit)) + return; + + pr_emerg("%s detected in: %s:%d, uid/euid: %u/%u\n", + kind ? kind : "refcount error", + current->comm, task_pid_nr(current), + from_kuid_munged(&init_user_ns, current_uid()), + from_kuid_munged(&init_user_ns, current_euid())); + print_symbol(KERN_EMERG "refcount error occurred at: %s\n", + instruction_pointer(regs)); + BUG(); +} +EXPORT_SYMBOL(refcount_error_report); +#endif + core_param(panic, panic_timeout, int, 0644); core_param(pause_on_oops, pause_on_oops, int, 0644); core_param(panic_on_warn, panic_on_warn, int, 0644); diff --git a/lib/refcount.c b/lib/refcount.c index aa09ad3c30b0..903a59557893 100644 --- a/lib/refcount.c +++ b/lib/refcount.c @@ -37,6 +37,9 @@ #include <linux/refcount.h> #include <linux/bug.h> +/* Leave out architecture-specific implementations. */ +#ifndef CONFIG_FAST_REFCOUNT + bool refcount_add_not_zero(unsigned int i, refcount_t *r) { unsigned int old, new, val = atomic_read(&r->refs); @@ -168,6 +171,8 @@ void refcount_dec(refcount_t *r) } EXPORT_SYMBOL_GPL(refcount_dec); +#endif /* CONFIG_FAST_REFCOUNT */ + /* * No atomic_t counterpart, it attempts a 1 -> 0 transition and returns the * success thereof. @@ -264,4 +269,3 @@ bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock) return true; } EXPORT_SYMBOL_GPL(refcount_dec_and_lock); -
This patch ports the x86-specific atomic overflow handling from PaX's
PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
from PaX that eliminates the saturation race condition by resetting the
atomic counter back to the INT_MAX saturation value on both overflow and
underflow. To win a race, a system would have to have INT_MAX threads
simultaneously overflow before the saturation handler runs.

With this, the commonly used inc/dec_and_test usage patterns present in
performance-sensitive areas of the kernel (mm, net, block) will use the
regular inline atomic operations with only a single overflow test
instruction added to the fast path.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                       |  19 ++++
 arch/x86/Kconfig                   |   1 +
 arch/x86/entry/entry_32.S          |   9 +++
 arch/x86/entry/entry_64.S          |   3 +
 arch/x86/include/asm/irq_vectors.h |   3 +
 arch/x86/include/asm/refcount.h    | 123 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/traps.h       |   5 ++
 arch/x86/kernel/traps.c            |  38 ++++++++++++
 drivers/misc/lkdtm_bugs.c          |  19 ++++--
 include/asm-generic/sections.h     |   4 ++
 include/asm-generic/vmlinux.lds.h  |   9 +++
 include/linux/kernel.h             |   2 +
 include/linux/refcount.h           |   4 ++
 kernel/panic.c                     |  23 +++++++
 lib/refcount.c                     |   6 +-
 15 files changed, 263 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/refcount.h