Message ID | 1493160997-126108-3-git-send-email-keescook@chromium.org (mailing list archive) |
---|---|
State | New, archived |
On Wed, Apr 26, 2017 at 12:56 AM, Kees Cook <keescook@chromium.org> wrote:
> This protection is a modified version of the x86 PAX_REFCOUNT
> implementation from PaX/grsecurity. This speeds up the refcount_t API by
> duplicating the existing atomic_t implementation with a single instruction
> added to detect if the refcount has wrapped past INT_MAX (or below 0)
> resulting in a signed value.
[...]
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "decl %0\n\t"
> +		REFCOUNT_CHECK_UNDERFLOW(4)
> +		: [counter] "+m" (r->refs.counter)
> +		: : "cc", "cx");
> +}

What purpose do checks on decrement now have? The mitigation is only
intended to deal with (positive) overflows, right? AFAICS if you hit this code,
similar to the inc-from-0 case, you're already in a UAF situation?
On Tue, Apr 25, 2017 at 5:25 PM, Jann Horn <jannh@google.com> wrote:
> On Wed, Apr 26, 2017 at 12:56 AM, Kees Cook <keescook@chromium.org> wrote:
>> This protection is a modified version of the x86 PAX_REFCOUNT
>> implementation from PaX/grsecurity. This speeds up the refcount_t API by
>> duplicating the existing atomic_t implementation with a single instruction
>> added to detect if the refcount has wrapped past INT_MAX (or below 0)
>> resulting in a signed value.
> [...]
>> +static __always_inline void refcount_dec(refcount_t *r)
>> +{
>> +	asm volatile(LOCK_PREFIX "decl %0\n\t"
>> +		REFCOUNT_CHECK_UNDERFLOW(4)
>> +		: [counter] "+m" (r->refs.counter)
>> +		: : "cc", "cx");
>> +}
>
> What purpose do checks on decrement now have? The mitigation is only
> intended to deal with (positive) overflows, right? AFAICS if you hit this code,
> similar to the inc-from-0 case, you're already in a UAF situation?

Yeah, I think that's true, but as Peter has mentioned: it's better
than not having it. The inc path can be deterministic, and the dec
path can be lucky? :)

-Kees
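Editor's sketch: the exchange above turns on what the single added "js" check can and cannot catch. The following user-space model is only an illustration under stated assumptions, not the patch's code. C11 atomics stand in for the LOCK-prefixed instructions, a plain function call stands in for the trap, and every model_* name is hypothetical. It shows that an increment from INT_MAX and a decrement from 0 both produce a result with the sign bit set, and that the error path pins the counter at INT_MAX in both cases.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for refcount_t; not the kernel type. */
typedef struct {
	atomic_int refs;
} model_refcount_t;

/* Models the "js" test: true when the 32-bit result has its sign bit set. */
static bool model_result_is_negative(unsigned int result)
{
	return result > (unsigned int)INT_MAX;
}

/* Models the error path: pin the counter at INT_MAX and flag the event. */
static bool model_refcount_error(model_refcount_t *r)
{
	atomic_store(&r->refs, INT_MAX);
	return true;
}

/* Increment, then check the sign of the result. Returns true on error. */
static bool model_refcount_inc(model_refcount_t *r)
{
	int old = atomic_fetch_add(&r->refs, 1);

	/* Recompute the result in unsigned arithmetic to avoid signed overflow. */
	if (model_result_is_negative((unsigned int)old + 1u))
		return model_refcount_error(r);
	return false;
}

/* Decrement, then check the sign of the result. Returns true on error. */
static bool model_refcount_dec(model_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (model_result_is_negative((unsigned int)old - 1u))
		return model_refcount_error(r);
	return false;
}
```

In this model the increment check fires before the counter can wrap back to small values, which is the deterministic case Kees describes. The decrement check only fires after the count has already gone below zero, which matches Jann's point that the object may already have been freed by then.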
Hi Kees,

[auto build test WARNING on next-20170424]
[cannot apply to tip/x86/core linus/master linux/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kees-Cook/x86-refcount-Implement-fast-refcount-overflow/20170426-210530
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

>> drivers//scsi/scsi_scan.o: warning: objtool: .text.refcount_overflow+0x5: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> drivers/scsi/qla2xxx/qla_target.o: warning: objtool: .text.refcount_overflow+0x23: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> drivers/nvme/target/core.o: warning: objtool: .text.refcount_overflow+0x19: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> drivers/nvme/target/rdma.o: warning: objtool: .text.refcount_overflow+0x5: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> drivers/nvme/target/fcloop.o: warning: objtool: .text.refcount_overflow+0x6: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/bat_v.o: warning: objtool: .text.refcount_underflow+0xc: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/bat_v_elp.o: warning: objtool: .text.refcount_overflow+0xa: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/fragmentation.o: warning: objtool: .text.refcount_overflow+0xa: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/icmp_socket.o: warning: objtool: .text.refcount_overflow+0xb: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/main.o: warning: objtool: .text.refcount_overflow+0xb: special: can't find orig instruction
--
   /kbuild/src/consumer/include/linux/compiler.h:264:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> net/batman-adv/multicast.o: warning: objtool: .text.refcount_overflow+0xf: special: can't find orig instruction
..

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
On Wed, Apr 26, 2017 at 6:31 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi Kees,
>
> [auto build test WARNING on next-20170424]
> [cannot apply to tip/x86/core linus/master linux/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc8]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Kees-Cook/x86-refcount-Implement-fast-refcount-overflow/20170426-210530
> config: x86_64-allmodconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
>>> drivers//scsi/scsi_scan.o: warning: objtool: .text.refcount_overflow+0x5: special: can't find orig instruction

Hi Josh,

I'm seeing this error being generated on areas that are using a
cross-section exception handler. I can't quite see why the .o checker
is unhappy, so I figured I'd ask you first. :)

The code is generated with calls to __REFCOUNT_CHECK() which is
defined like this:

+#define __REFCOUNT_EXCEPTION(size)	\
+	".if "__stringify(size)" == 4\n\t"	\
+	".pushsection .text.refcount_overflow\n"	\
+	".elseif "__stringify(size)" == -4\n\t"	\
+	".pushsection .text.refcount_underflow\n"	\
+	".else\n"	\
+	".error \"invalid size\"\n"	\
+	".endif\n"	\
+	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
+	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
+	"222:\n\t"	\
+	".popsection\n"	\
+	"333:\n"	\
+	_ASM_EXTABLE(222b, 333b)
+
+#define __REFCOUNT_CHECK(size)	\
+	"js 111f\n"	\
+	__REFCOUNT_EXCEPTION(size)
+
+#define __REFCOUNT_ERROR(size)	\
+	"jmp 111f\n"	\
+	__REFCOUNT_EXCEPTION(size)

I assume it doesn't like seeing an exception split across .text and
.text.refcount_overflow, but I haven't been able to figure out how
that distinction would be made by the checker. :P

Thanks!

-Kees
On Thu, Apr 27, 2017 at 01:22:05PM -0700, Kees Cook wrote:
> On Wed, Apr 26, 2017 at 6:31 PM, kbuild test robot <lkp@intel.com> wrote:
> > Hi Kees,
> >
> > [auto build test WARNING on next-20170424]
> > [cannot apply to tip/x86/core linus/master linux/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc8]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> >
> > url:    https://github.com/0day-ci/linux/commits/Kees-Cook/x86-refcount-Implement-fast-refcount-overflow/20170426-210530
> > config: x86_64-allmodconfig (attached as .config)
> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > reproduce:
> >         # save the attached .config to linux build tree
> >         make ARCH=x86_64
> >
> > All warnings (new ones prefixed by >>):
> >
> >>> drivers//scsi/scsi_scan.o: warning: objtool: .text.refcount_overflow+0x5: special: can't find orig instruction
>
> Hi Josh,
>
> I'm seeing this error being generated on areas that are using a
> cross-section exception handler. I can't quite see why the .o checker
> is unhappy, so I figured I'd ask you first. :)
>
> The code is generated with calls to __REFCOUNT_CHECK() which is
> defined like this:
>
> +#define __REFCOUNT_EXCEPTION(size)	\
> +	".if "__stringify(size)" == 4\n\t"	\
> +	".pushsection .text.refcount_overflow\n"	\
> +	".elseif "__stringify(size)" == -4\n\t"	\
> +	".pushsection .text.refcount_underflow\n"	\
> +	".else\n"	\
> +	".error \"invalid size\"\n"	\
> +	".endif\n"	\
> +	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
> +	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
> +	"222:\n\t"	\
> +	".popsection\n"	\
> +	"333:\n"	\
> +	_ASM_EXTABLE(222b, 333b)
> +
> +#define __REFCOUNT_CHECK(size)	\
> +	"js 111f\n"	\
> +	__REFCOUNT_EXCEPTION(size)
> +
> +#define __REFCOUNT_ERROR(size)	\
> +	"jmp 111f\n"	\
> +	__REFCOUNT_EXCEPTION(size)
>
> I assume it doesn't like seeing an exception split across .text and
> .text.refcount_overflow, but I haven't been able to figure out how
> that distinction would be made by the checker. :P

This code uses the exception table a little differently than normal.
Usually it's used for catching page faults, where the exception table
points to the faulting instruction.

But instead of a page fault, here it's doing a software interrupt. So
the __ex_table entry doesn't point to the 'int 0x81' instruction, it
points to the instruction immediately after it. In this case there
isn't actually an instruction there, which is why objtool is
complaining.

Is it superfluous to use the exception table here, when a simple 'jmp
333f' could be used instead after the 'int'?

Also it looks like the handler sends a SIGKILL to the current task. I
wonder if something like BUG_ON() could be used instead of implementing
a custom error interrupt.
> +#define __REFCOUNT_EXCEPTION(size)	\
> +	".if "__stringify(size)" == 4\n\t"	\
> +	".pushsection .text.refcount_overflow\n"	\
> +	".elseif "__stringify(size)" == -4\n\t"	\
> +	".pushsection .text.refcount_underflow\n"	\
> +	".else\n"	\
> +	".error \"invalid size\"\n"	\
> +	".endif\n"	\
> +	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
> +	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
> +	"222:\n\t"	\
> +	".popsection\n"	\
> +	"333:\n"	\
> +	_ASM_EXTABLE(222b, 333b)

The 'size' argument doesn't seem to correspond to an actual size of
anything. Its value '4' or '-4' only seems to indicate whether it's an
overflow or an underflow.

Also there's some inconsistent use of "\n\t" on some lines, with "\n" on
others.

> +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
> +{
> +	const char *str = NULL;
> +
> +	BUG_ON(!(regs->flags & X86_EFLAGS_SF));
> +
> +#define range_check(size, dir, type, value)	\
> +	do {	\
> +		if ((unsigned long)__##size##_##dir##_start <= regs->ip && \
> +		    regs->ip < (unsigned long)__##size##_##dir##_end) { \
> +			*(type *)regs->cx = (value);	\
> +			str = #size " " #dir;	\
> +		}	\
> +	} while (0)

An interrupt was used, not a faulting exception, so regs->ip refers to
the address *after* the 'int' instruction. So the beginning of the
range should be exclusive, and the end of the range should be inclusive,
like:

> +		if ((unsigned long)__##size##_##dir##_start < regs->ip && \
> +		    regs->ip <= (unsigned long)__##size##_##dir##_end) { \

> +
> +	/*
> +	 * Reset to INT_MAX in both cases to attempt to let system
> +	 * continue operating.
> +	 */
> +	range_check(refcount, overflow,  int, INT_MAX);
> +	range_check(refcount, underflow, int, INT_MAX);

I think "range_check" doesn't adequately describe the macro. In
addition to checking, it has a subtle side effect: it updates the
counter value with INT_MAX.

It's not clear why the 'size' argument has its name. Also, three of the
arguments are always called with the same value. Anyway I suspect the
code would be more readable if it were open coded without the macro.

> +#ifdef CONFIG_FAST_REFCOUNT
> +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3);
> +
> +void refcount_error_report(struct pt_regs *regs, const char *kind)
> +{
> +	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
> +
> +	if (!__ratelimit(&refcount_ratelimit))
> +		return;
> +
> +	pr_emerg("%s detected in: %s:%d, uid/euid: %u/%u\n",
> +		kind ? kind : "refcount error",
> +		current->comm, task_pid_nr(current),
> +		from_kuid_munged(&init_user_ns, current_uid()),
> +		from_kuid_munged(&init_user_ns, current_euid()));
> +	print_symbol(KERN_EMERG "refcount error occurred at: %s\n",
> +		instruction_pointer(regs));
> +	preempt_disable();
> +	show_regs(regs);
> +	preempt_enable();
> +}

Why is preemption disabled before calling show_regs()?

> +EXPORT_SYMBOL(refcount_error_report);

Why is this exported? It looks like it's only called internally from
traps.c.
On Mon, May 1, 2017 at 8:54 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Apr 27, 2017 at 01:22:05PM -0700, Kees Cook wrote:
>> On Wed, Apr 26, 2017 at 6:31 PM, kbuild test robot <lkp@intel.com> wrote:
>> > Hi Kees,
>> >
>> > [auto build test WARNING on next-20170424]
>> > [cannot apply to tip/x86/core linus/master linux/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc8]
>> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>> >
>> > url:    https://github.com/0day-ci/linux/commits/Kees-Cook/x86-refcount-Implement-fast-refcount-overflow/20170426-210530
>> > config: x86_64-allmodconfig (attached as .config)
>> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>> > reproduce:
>> >         # save the attached .config to linux build tree
>> >         make ARCH=x86_64
>> >
>> > All warnings (new ones prefixed by >>):
>> >
>> >>> drivers//scsi/scsi_scan.o: warning: objtool: .text.refcount_overflow+0x5: special: can't find orig instruction
>>
>> Hi Josh,
>>
>> I'm seeing this error being generated on areas that are using a
>> cross-section exception handler. I can't quite see why the .o checker
>> is unhappy, so I figured I'd ask you first. :)
>>
>> The code is generated with calls to __REFCOUNT_CHECK() which is
>> defined like this:
>>
>> +#define __REFCOUNT_EXCEPTION(size)	\
>> +	".if "__stringify(size)" == 4\n\t"	\
>> +	".pushsection .text.refcount_overflow\n"	\
>> +	".elseif "__stringify(size)" == -4\n\t"	\
>> +	".pushsection .text.refcount_underflow\n"	\
>> +	".else\n"	\
>> +	".error \"invalid size\"\n"	\
>> +	".endif\n"	\
>> +	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
>> +	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
>> +	"222:\n\t"	\
>> +	".popsection\n"	\
>> +	"333:\n"	\
>> +	_ASM_EXTABLE(222b, 333b)
>> +
>> +#define __REFCOUNT_CHECK(size)	\
>> +	"js 111f\n"	\
>> +	__REFCOUNT_EXCEPTION(size)
>> +
>> +#define __REFCOUNT_ERROR(size)	\
>> +	"jmp 111f\n"	\
>> +	__REFCOUNT_EXCEPTION(size)
>>
>> I assume it doesn't like seeing an exception split across .text and
>> .text.refcount_overflow, but I haven't been able to figure out how
>> that distinction would be made by the checker. :P
>
> This code uses the exception table a little differently than normal.
> Usually it's used for catching page faults, where the exception table
> points to the faulting instruction.
>
> But instead of a page fault, here it's doing a software interrupt. So
> the __ex_table entry doesn't point to the 'int 0x81' instruction, it
> points to the instruction immediately after it. In this case there
> isn't actually an instruction there, which is why objtool is
> complaining.

What would it take to adjust objtool for this case?

>
> Is it superfluous to use the exception table here, when a simple 'jmp
> 333f' could be used instead after the 'int'?

I thought the exception tables were needed to have the trap handler
notice it correctly, and do the right thing as far as continuing
execution. (This is currently written as a survivable condition: the
kernel can keep running even though it will kill the userspace
process.)

> Also it looks like the handler sends a SIGKILL to the current task. I
> wonder if something like BUG_ON() could be used instead of implementing
> a custom error interrupt.

It's a rate limited report, but it must always kill. BUG doesn't fit
this usage case (I've got similar problems with other areas; my
intention is to go create something that is configurable WARN vs Oops,
respects panic_on_oops, etc, but this doesn't exist yet).

-Kees
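Editor's sketch: the "rate limited report, but it must always kill" behaviour can be modelled in a few lines. This is a simplified user-space model and not the kernel's __ratelimit() implementation. The model_* names, the explicit `now` argument, and the reset rule are assumptions made for illustration. The interval of 15 and burst of 3 mirror the DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3) line in the patch, with time counted in whole seconds instead of jiffies.

```c
#include <stdbool.h>

/* Hypothetical rate-limit state: at most `burst` reports per `interval`. */
struct model_ratelimit {
	long interval;	/* window length, in seconds for this model */
	int burst;	/* reports allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* reports issued in the current window */
	bool started;	/* false until the first call opens a window */
};

/* Returns true when a report is allowed at time `now`. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
	/* Open a fresh window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

struct model_outcome {
	bool killed;	/* stands in for sending SIGKILL to the current task */
	bool reported;	/* stands in for the pr_emerg()/show_regs() output */
};

/* The kill is unconditional; only the report is subject to rate limiting. */
static struct model_outcome model_refcount_error_report(struct model_ratelimit *rs,
							 long now)
{
	struct model_outcome out = { .killed = true, .reported = false };

	out.reported = model_ratelimit_ok(rs, now);
	return out;
}
```

With an interval of 15 and a burst of 3, four errors in quick succession all kill the offending task, but only the first three produce a report. A later error after the window has expired is reported again.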
On Mon, May 1, 2017 at 9:30 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> +#define __REFCOUNT_EXCEPTION(size)	\
>> +	".if "__stringify(size)" == 4\n\t"	\
>> +	".pushsection .text.refcount_overflow\n"	\
>> +	".elseif "__stringify(size)" == -4\n\t"	\
>> +	".pushsection .text.refcount_underflow\n"	\
>> +	".else\n"	\
>> +	".error \"invalid size\"\n"	\
>> +	".endif\n"	\
>> +	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
>> +	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
>> +	"222:\n\t"	\
>> +	".popsection\n"	\
>> +	"333:\n"	\
>> +	_ASM_EXTABLE(222b, 333b)
>
> The 'size' argument doesn't seem to correspond to an actual size of
> anything. Its value '4' or '-4' only seems to indicate whether it's an
> overflow or an underflow.

This is to allow for expansion to refcount64_t if we ever move to it,
then we'll have 4 cases: 4, -4, 8, -8.

> Also there's some inconsistent use of "\n\t" on some lines, with "\n" on
> others.

It's not inconsistent, it's leaving directives at column 0, and
section and instructions at tab-stop 1.

>> +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
>> +{
>> +	const char *str = NULL;
>> +
>> +	BUG_ON(!(regs->flags & X86_EFLAGS_SF));
>> +
>> +#define range_check(size, dir, type, value)	\
>> +	do {	\
>> +		if ((unsigned long)__##size##_##dir##_start <= regs->ip && \
>> +		    regs->ip < (unsigned long)__##size##_##dir##_end) { \
>> +			*(type *)regs->cx = (value);	\
>> +			str = #size " " #dir;	\
>> +		}	\
>> +	} while (0)
>
> An interrupt was used, not a faulting exception, so regs->ip refers to
> the address *after* the 'int' instruction. So the beginning of the
> range should be exclusive, and the end of the range should be inclusive,
> like:
>
>> +		if ((unsigned long)__##size##_##dir##_start < regs->ip && \
>> +		    regs->ip <= (unsigned long)__##size##_##dir##_end) { \

Ah, yes, good catch.

>> +
>> +	/*
>> +	 * Reset to INT_MAX in both cases to attempt to let system
>> +	 * continue operating.
>> +	 */
>> +	range_check(refcount, overflow,  int, INT_MAX);
>> +	range_check(refcount, underflow, int, INT_MAX);
>
> I think "range_check" doesn't adequately describe the macro. In
> addition to checking, it has a subtle side effect: it updates the
> counter value with INT_MAX.
>
> It's not clear why the 'size' argument has its name. Also, three of the
> arguments are always called with the same value. Anyway I suspect the
> code would be more readable if it were open coded without the macro.

Yeah, and I think I may drop the over/under distinction, since I think
I've convinced myself that we always need to reset to the same
position regardless of direction. This was originally for handling
generic atomic_t operations, not refcount_t... PeterZ may convince me
yet, but I'll send the next version without the over/under
distinction.

>> +#ifdef CONFIG_FAST_REFCOUNT
>> +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3);
>> +
>> +void refcount_error_report(struct pt_regs *regs, const char *kind)
>> +{
>> +	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
>> +
>> +	if (!__ratelimit(&refcount_ratelimit))
>> +		return;
>> +
>> +	pr_emerg("%s detected in: %s:%d, uid/euid: %u/%u\n",
>> +		kind ? kind : "refcount error",
>> +		current->comm, task_pid_nr(current),
>> +		from_kuid_munged(&init_user_ns, current_uid()),
>> +		from_kuid_munged(&init_user_ns, current_euid()));
>> +	print_symbol(KERN_EMERG "refcount error occurred at: %s\n",
>> +		instruction_pointer(regs));
>> +	preempt_disable();
>> +	show_regs(regs);
>> +	preempt_enable();
>> +}
>
> Why is preemption disabled before calling show_regs()?

I thought it was to avoid interleaving show_regs() output (I can't
think of a way regs would be externally modified).

>> +EXPORT_SYMBOL(refcount_error_report);
>
> Why is this exported? It looks like it's only called internally from
> traps.c.

Ah yes, good point. I'll drop this.

Thanks for the review!

-Kees
On Mon, May 01, 2017 at 10:28:53AM -0700, Kees Cook wrote:
> On Mon, May 1, 2017 at 8:54 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Apr 27, 2017 at 01:22:05PM -0700, Kees Cook wrote:
> >> +#define __REFCOUNT_EXCEPTION(size)	\
> >> +	".if "__stringify(size)" == 4\n\t"	\
> >> +	".pushsection .text.refcount_overflow\n"	\
> >> +	".elseif "__stringify(size)" == -4\n\t"	\
> >> +	".pushsection .text.refcount_underflow\n"	\
> >> +	".else\n"	\
> >> +	".error \"invalid size\"\n"	\
> >> +	".endif\n"	\
> >> +	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
> >> +	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
> >> +	"222:\n\t"	\
> >> +	".popsection\n"	\
> >> +	"333:\n"	\
> >> +	_ASM_EXTABLE(222b, 333b)
> >> +
> >> +#define __REFCOUNT_CHECK(size)	\
> >> +	"js 111f\n"	\
> >> +	__REFCOUNT_EXCEPTION(size)
> >> +
> >> +#define __REFCOUNT_ERROR(size)	\
> >> +	"jmp 111f\n"	\
> >> +	__REFCOUNT_EXCEPTION(size)
> >>
> >> I assume it doesn't like seeing an exception split across .text and
> >> .text.refcount_overflow, but I haven't been able to figure out how
> >> that distinction would be made by the checker. :P
> >
> > This code uses the exception table a little differently than normal.
> > Usually it's used for catching page faults, where the exception table
> > points to the faulting instruction.
> >
> > But instead of a page fault, here it's doing a software interrupt. So
> > the __ex_table entry doesn't point to the 'int 0x81' instruction, it
> > points to the instruction immediately after it. In this case there
> > isn't actually an instruction there, which is why objtool is
> > complaining.
>
> What would it take to adjust objtool for this case?

I still need to think about it some more. I doubt it would be
straightforward. But I am ok with making such a change, if it makes
sense to do so.

> > Is it superfluous to use the exception table here, when a simple 'jmp
> > 333f' could be used instead after the 'int'?
>
> I thought the exception tables were needed to have the trap handler
> notice it correctly, and do the right thing as far as continuing
> execution. (This is currently written as a survivable condition: the
> kernel can keep running even though it will kill the userspace
> process.)

Nothing needs to be done to make it continue execution, because regs->ip
will already have the address immediately after the 'int' instruction,
which is where the iret will return. So fixup_exception() isn't needed.

Instead, along with the aforementioned 'jmp 333f', you could move the
refcount_error_report() call to outside the fixup_exception() clause:

	if (!user_mode(regs)) {
		if (fixup_exception(regs, trapnr))
			return 0;

		if (fixup_bug(regs, trapnr))
			return 0;

		if (IS_ENABLED(CONFIG_FAST_REFCOUNT) &&
		    trapnr == X86_REFCOUNT_VECTOR) {
			refcount_error_report(regs, str);
			return 0;
		}

(For better readability the refcount parts could be moved to a
fixup_refcount() function.)

Or, another way to handle it would be to use a real exception. One idea
would be to share UD0 with WARN somehow, and handle it with fixup_bug().
At least then, IMO, the error handling code would be closer to where it
belongs, with WARN and BUG.

But stepping back a bit, why is the interrupt/exception even needed? Is
there a reason why it can't do the stack dump from the context of the
original overflow? i.e., instead of 'int 0x81', just call the error
handler directly. Which could then WARN, send the self-signal, update
the refcount to INT_MAX, etc.

> > Also it looks like the handler sends a SIGKILL to the current task. I
> > wonder if something like BUG_ON() could be used instead of implementing
> > a custom error interrupt.
>
> It's a rate limited report, but it must always kill. BUG doesn't fit
> this usage case (I've got similar problems with other areas; my
> intention is to go create something that is configurable WARN vs Oops,
> respects panic_on_oops, etc, but this doesn't exist yet).

Yeah, it would be good to make BUG and WARN flexible enough such that we
don't need these custom errors, so we can get more consistent error
handling behavior.
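Editor's sketch: the parenthetical suggestion above, moving the refcount handling into its own fixup_refcount()-style helper that runs after the other fixups, can be modelled in user space as follows. Nothing here is kernel code. Every model_* name is hypothetical, the two stub fixups simply return false (assuming no exception-table entry and no BUG trap match), and a counter stands in for the call to refcount_error_report(). The vector number 0x81 is the one the patch defines as X86_REFCOUNT_VECTOR.

```c
#include <stdbool.h>

#define MODEL_REFCOUNT_VECTOR 0x81	/* same value as X86_REFCOUNT_VECTOR */

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct model_regs {
	bool user_mode;
};

/* Counts how often the stand-in for refcount_error_report() ran. */
static int model_reports;

/* Stub: assume no exception-table entry matches in this model. */
static bool model_fixup_exception(const struct model_regs *regs, int trapnr)
{
	(void)regs;
	(void)trapnr;
	return false;
}

/* Stub: assume the trap did not come from a BUG/WARN site. */
static bool model_fixup_bug(const struct model_regs *regs, int trapnr)
{
	(void)regs;
	(void)trapnr;
	return false;
}

/* The helper suggested in review: handle only the refcount vector. */
static bool model_fixup_refcount(const struct model_regs *regs, int trapnr)
{
	(void)regs;
	if (trapnr != MODEL_REFCOUNT_VECTOR)
		return false;
	model_reports++;	/* stands in for refcount_error_report(regs, str) */
	return true;
}

/* Returns 0 when a kernel-mode trap was handled, -1 otherwise. */
static int model_do_trap_no_signal(const struct model_regs *regs, int trapnr)
{
	if (!regs->user_mode) {
		if (model_fixup_exception(regs, trapnr))
			return 0;
		if (model_fixup_bug(regs, trapnr))
			return 0;
		if (model_fixup_refcount(regs, trapnr))
			return 0;
	}
	return -1;
}
```

The design point is that the refcount report no longer depends on fixup_exception() succeeding, so no exception-table entry is needed for the stub.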
On Mon, May 01, 2017 at 10:36:59AM -0700, Kees Cook wrote:
> >> +#ifdef CONFIG_FAST_REFCOUNT
> >> +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3);
> >> +
> >> +void refcount_error_report(struct pt_regs *regs, const char *kind)
> >> +{
> >> +	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
> >> +
> >> +	if (!__ratelimit(&refcount_ratelimit))
> >> +		return;
> >> +
> >> +	pr_emerg("%s detected in: %s:%d, uid/euid: %u/%u\n",
> >> +		kind ? kind : "refcount error",
> >> +		current->comm, task_pid_nr(current),
> >> +		from_kuid_munged(&init_user_ns, current_uid()),
> >> +		from_kuid_munged(&init_user_ns, current_euid()));
> >> +	print_symbol(KERN_EMERG "refcount error occurred at: %s\n",
> >> +		instruction_pointer(regs));
> >> +	preempt_disable();
> >> +	show_regs(regs);
> >> +	preempt_enable();
> >> +}
> >
> > Why is preemption disabled before calling show_regs()?
>
> I thought it was to avoid interleaving show_regs() output (I can't
> think of a way regs would be externally modified).

This code is running from interrupt context, so preemption shouldn't be
an issue unless I'm missing something.
diff --git a/arch/Kconfig b/arch/Kconfig
index 6c00e5b00f8b..3983737fd919 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -867,4 +867,23 @@ config STRICT_MODULE_RWX
 config ARCH_WANT_RELAX_ORDER
 	bool
 
+config ARCH_HAS_FAST_REFCOUNT
+	bool
+	help
+	  An architecture selects this when it has implemented refcount_t
+	  using primitives that provide a faster runtime at the expense
+	  of some refcount state checks. The refcount overflow condition,
+	  however, must be retained. Catching overflows is the primary
+	  security concern for protecting against bugs in reference counts.
+
+config FAST_REFCOUNT
+	bool "Speed up reference counting at the expense of full validation"
+	depends on ARCH_HAS_FAST_REFCOUNT
+	help
+	  The regular reference counting infrastructure in the kernel checks
+	  many error conditions. If this option is selected, refcounting
+	  is made faster using architecture-specific implementations that may
+	  only check for reference count overflows (which is the most common
+	  way reference counting bugs are turned into security exploits).
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a694d0002758..a1bbb09ae667 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -50,6 +50,7 @@ config X86
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
+	select ARCH_HAS_FAST_REFCOUNT
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV if X86_64
 	select ARCH_HAS_MMIO_FLUSH
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 50bc26949e9e..bba69761ec24 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -789,6 +789,15 @@ ENTRY(spurious_interrupt_bug)
 	jmp common_exception
 END(spurious_interrupt_bug)
 
+#ifdef CONFIG_FAST_REFCOUNT
+ENTRY(refcount_error)
+	ASM_CLAC
+	pushl $0
+	pushl $do_refcount_error
+	jmp common_exception
+ENDPROC(refcount_error)
+#endif
+
 #ifdef CONFIG_XEN
 ENTRY(xen_hypervisor_callback)
 	pushl $-1 /* orig_ax = -1 => not a system call */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 607d72c4a485..783045d3887c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -855,6 +855,9 @@ idtentry coprocessor_error do_coprocessor_error has_error_code=0
 idtentry alignment_check do_alignment_check has_error_code=1
 idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
 
+#ifdef CONFIG_FAST_REFCOUNT
+idtentry refcount_error do_refcount_error has_error_code=0
+#endif
 
 	/*
 	 * Reload gs selector with exception handling
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 6ca9fd6234e1..64ca4dcc29ec 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -48,6 +48,9 @@
 
 #define IA32_SYSCALL_VECTOR 0x80
 
+/* Refcount Overflow or Underflow Exception. */
+#define X86_REFCOUNT_VECTOR 0x81
+
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
  * round up to the next 16-vector boundary
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..3d3125717154
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,97 @@
+#ifndef __ASM_X86_REFCOUNT_H
+#define __ASM_X86_REFCOUNT_H
+/*
+ * x86-specific implementation of refcount_t. Ported from PAX_REFCOUNT in
+ * PaX/grsecurity and changed to use "js" instead of "jo" to trap on all
+ * signed results, not just when overflowing.
+ */
+#include <linux/refcount.h>
+#include <asm/irq_vectors.h>
+
+#define __REFCOUNT_EXCEPTION(size)	\
+	".if "__stringify(size)" == 4\n\t"	\
+	".pushsection .text.refcount_overflow\n"	\
+	".elseif "__stringify(size)" == -4\n\t"	\
+	".pushsection .text.refcount_underflow\n"	\
+	".else\n"	\
+	".error \"invalid size\"\n"	\
+	".endif\n"	\
+	"111:\tlea %[counter],%%"_ASM_CX"\n\t"	\
+	"int $"__stringify(X86_REFCOUNT_VECTOR)"\n"	\
+	"222:\n\t"	\
+	".popsection\n"	\
+	"333:\n"	\
+	_ASM_EXTABLE(222b, 333b)
+
+#define __REFCOUNT_CHECK(size)	\
+	"js 111f\n"	\
+	__REFCOUNT_EXCEPTION(size)
+
+#define __REFCOUNT_ERROR(size)	\
+	"jmp 111f\n"	\
+	__REFCOUNT_EXCEPTION(size)
+
+#define REFCOUNT_CHECK_OVERFLOW(size) __REFCOUNT_CHECK(size)
+#define REFCOUNT_CHECK_UNDERFLOW(size) __REFCOUNT_CHECK(-(size))
+
+static __always_inline void refcount_add(unsigned int i, refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
+		REFCOUNT_CHECK_OVERFLOW(4)
+		: [counter] "+m" (r->refs.counter)
+		: "ir" (i)
+		: "cc", "cx");
+}
+
+static __always_inline void refcount_inc(refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_OVERFLOW(4)
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
+static __always_inline void refcount_dec(refcount_t *r)
+{
+	asm volatile(LOCK_PREFIX "decl %0\n\t"
+		REFCOUNT_CHECK_UNDERFLOW(4)
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
+static __always_inline __must_check
+bool refcount_sub_and_test(unsigned int i, refcount_t *r)
+{
+	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+				  REFCOUNT_CHECK_UNDERFLOW(4), r->refs.counter,
+				  "er", i, "%0", e);
+}
+
+static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
+{
+	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+				 REFCOUNT_CHECK_UNDERFLOW(4), r->refs.counter,
+				 "%0", e);
+}
+
+static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
+{
+	int c;
+
+	c = atomic_read(&(r->refs));
+	do {
+		if (unlikely(c <= 0))
+			break;
+	} while (!atomic_try_cmpxchg(&(r->refs), &c, c + 1));
+
+	/* Did we start or finish in an undesirable state? */
+	if (unlikely(c <= 0 || c + 1 < 0)) {
+		asm volatile(__REFCOUNT_ERROR(4)
+			: : [counter] "m" (r->refs.counter)
+			: "cc", "cx");
+	}
+
+	return c != 0;
+}
+
+#endif
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 01fd0a7f48cd..e4d8db75d85e 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -38,6 +38,10 @@ asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
 
+#ifdef CONFIG_FAST_REFCOUNT
+asmlinkage void refcount_error(void);
+#endif
+
 #ifdef CONFIG_TRACING
 asmlinkage void trace_page_fault(void);
 #define trace_stack_segment stack_segment
@@ -54,6 +58,7 @@ asmlinkage void trace_page_fault(void);
 #define trace_alignment_check alignment_check
 #define trace_simd_coprocessor_error simd_coprocessor_error
 #define trace_async_page_fault async_page_fault
+#define trace_refcount_error refcount_error
 #endif
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3995d3a777d4..4b1c318c96ff 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -218,8 +218,13 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
 	}
 
 	if (!user_mode(regs)) {
-		if (fixup_exception(regs, trapnr))
+		if (fixup_exception(regs, trapnr)) {
+			if (IS_ENABLED(CONFIG_FAST_REFCOUNT) &&
+			    trapnr == X86_REFCOUNT_VECTOR)
+				refcount_error_report(regs, str);
+
 			return 0;
+		}
 
 		if (fixup_bug(regs, trapnr))
 			return 0;
@@ -342,6 +347,38 @@ __visible void __noreturn handle_stack_overflow(const char *message,
 }
 #endif
 
+#ifdef CONFIG_FAST_REFCOUNT
+
+dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
+{
+	const char *str = NULL;
+
+	BUG_ON(!(regs->flags & X86_EFLAGS_SF));
+
+#define range_check(size, dir, type, value)	\
+	do {	\
+		if ((unsigned long)__##size##_##dir##_start <= regs->ip && \
+		    regs->ip < (unsigned long)__##size##_##dir##_end) { \
+			*(type *)regs->cx = (value);	\
+			str = #size " " #dir;	\
+		}	\
+	} while (0)
+
+	/*
+	 * Reset to INT_MAX in both cases to attempt to let system
+	 * continue operating.
+	 */
+	range_check(refcount, overflow, int, INT_MAX);
+	range_check(refcount, underflow, int, INT_MAX);
+
+#undef range_check
+
+	BUG_ON(!str);
+	do_error_trap(regs, error_code, (char *)str, X86_REFCOUNT_VECTOR,
+		      SIGILL);
+}
+#endif
+
 #ifdef CONFIG_X86_64
 /* Runs on IST stack */
 dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
@@ -1017,6 +1054,11 @@ void __init trap_init(void)
 	set_bit(IA32_SYSCALL_VECTOR, used_vectors);
 #endif
 
+#ifdef CONFIG_FAST_REFCOUNT
+	set_intr_gate(X86_REFCOUNT_VECTOR, refcount_error);
+	set_bit(X86_REFCOUNT_VECTOR, used_vectors);
+#endif
+
 	/*
 	 * Set the IDT descriptor to a fixed read-only location, so that the
 	 * "sidt" instruction will not leak the location of the kernel, and
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 532372c6cf15..0590f384f234 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -20,6 +20,8 @@
  * may be out of this range on some architectures.
  * [_sinittext, _einittext]: contains .init.text.* sections
  * [__bss_start, __bss_stop]: contains BSS sections
+ * [__refcount_overflow/underflow_start, ..._end]: contains .text sections
+ * for refcount error handling.
* * Following global variables are optional and may be unavailable on some * architectures and/or kernel configurations. @@ -39,6 +41,8 @@ extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[]; extern char __kprobes_text_start[], __kprobes_text_end[]; extern char __entry_text_start[], __entry_text_end[]; extern char __start_rodata[], __end_rodata[]; +extern char __refcount_overflow_start[], __refcount_overflow_end[]; +extern char __refcount_underflow_start[], __refcount_underflow_end[]; /* Start and end of .ctors section - used for constructor calls. */ extern char __ctors_start[], __ctors_end[]; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 314a0b9219c6..9b94c3f1e5ec 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -446,9 +446,18 @@ ALIGN_FUNCTION(); \ *(.text.hot .text .text.fixup .text.unlikely) \ *(.ref.text) \ + REFCOUNT_TEXT \ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ +#define __REFCOUNT_TEXT(section) \ + VMLINUX_SYMBOL(__##section##_start) = .; \ + *(.text.##section) \ + VMLINUX_SYMBOL(__##section##_end) = .; + +#define REFCOUNT_TEXT \ + __REFCOUNT_TEXT(refcount_overflow) \ + __REFCOUNT_TEXT(refcount_underflow) /* sched.text is aling to function alignment to secure we have same * address even at second ld pass when generating System.map */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 13bc08aba704..94f87d5642e4 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -276,6 +276,8 @@ extern int oops_may_print(void); void do_exit(long error_code) __noreturn; void complete_and_exit(struct completion *, long) __noreturn; +void refcount_error_report(struct pt_regs *regs, const char *kind); + /* Internal, do not use. 
*/ int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res); int __must_check _kstrtol(const char *s, unsigned int base, long *res); diff --git a/include/linux/refcount.h b/include/linux/refcount.h index b34aa649d204..d09ad4e91e55 100644 --- a/include/linux/refcount.h +++ b/include/linux/refcount.h @@ -41,6 +41,9 @@ static inline unsigned int refcount_read(const refcount_t *r) return atomic_read(&r->refs); } +#ifdef CONFIG_FAST_REFCOUNT +#include <asm/refcount.h> +#else extern __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r); extern void refcount_add(unsigned int i, refcount_t *r); @@ -52,6 +55,7 @@ extern void refcount_sub(unsigned int i, refcount_t *r); extern __must_check bool refcount_dec_and_test(refcount_t *r); extern void refcount_dec(refcount_t *r); +#endif extern __must_check bool refcount_dec_if_one(refcount_t *r); extern __must_check bool refcount_dec_not_one(refcount_t *r); diff --git a/kernel/panic.c b/kernel/panic.c index a58932b41700..7d5a3eedd1bb 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -26,6 +26,7 @@ #include <linux/nmi.h> #include <linux/console.h> #include <linux/bug.h> +#include <linux/ratelimit.h> #define PANIC_TIMER_STEP 100 #define PANIC_BLINK_SPD 18 @@ -601,6 +602,30 @@ EXPORT_SYMBOL(__stack_chk_fail); #endif +#ifdef CONFIG_FAST_REFCOUNT +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3); + +void refcount_error_report(struct pt_regs *regs, const char *kind) +{ + do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true); + + if (!__ratelimit(&refcount_ratelimit)) + return; + + pr_emerg("%s detected in: %s:%d, uid/euid: %u/%u\n", + kind ? 
kind : "refcount error", + current->comm, task_pid_nr(current), + from_kuid_munged(&init_user_ns, current_uid()), + from_kuid_munged(&init_user_ns, current_euid())); + print_symbol(KERN_EMERG "refcount error occurred at: %s\n", + instruction_pointer(regs)); + preempt_disable(); + show_regs(regs); + preempt_enable(); +} +EXPORT_SYMBOL(refcount_error_report); +#endif + core_param(panic, panic_timeout, int, 0644); core_param(pause_on_oops, pause_on_oops, int, 0644); core_param(panic_on_warn, panic_on_warn, int, 0644); diff --git a/lib/refcount.c b/lib/refcount.c index f42124ccf295..6bfe1b7f3e30 100644 --- a/lib/refcount.c +++ b/lib/refcount.c @@ -37,6 +37,9 @@ #include <linux/refcount.h> #include <linux/bug.h> +/* Leave out architecture-specific implementations. */ +#ifndef CONFIG_FAST_REFCOUNT + /** * refcount_add_not_zero - add a value to a refcount unless it is 0 * @i: the value to add to the refcount @@ -225,6 +228,7 @@ void refcount_dec(refcount_t *r) WARN_ONCE(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n"); } EXPORT_SYMBOL_GPL(refcount_dec); +#endif /* CONFIG_FAST_REFCOUNT */ /** * refcount_dec_if_one - decrement a refcount if it is 1 @@ -345,4 +349,3 @@ bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock) return true; } EXPORT_SYMBOL_GPL(refcount_dec_and_lock); -
This protection is a modified version of the x86 PAX_REFCOUNT
implementation from PaX/grsecurity. This speeds up the refcount_t API by
duplicating the existing atomic_t implementation with a single instruction
added to detect if the refcount has wrapped past INT_MAX (or below 0)
resulting in a signed value.

Note that this protection is only meaningful for the overflow case, as that
can be detected and stopped before the reference is freed and left to be
abused by an attacker. Catching the "inc from 0" case is nice to have,
but only indicates that a use-after-free has already happened. Such
notifications are likely avoidable by an attacker that has already
exploited a use-after-free vulnerability. With this overflow protection,
the use-after-free cannot happen in the first place, avoiding the
vulnerability entirely.

On overflow detection (actually "signed value" detection), the offending
process is killed, a report is generated, and the refcount value is reset
to INT_MAX. This allows the system to attempt to keep operating. Another
option, not done in this patch, would be to reset the counter to
(INT_MIN / 2) to trap all future refcount inc or dec actions. Yet another
option would be to choose (INT_MAX - N) with some small N to provide some
headroom for legitimate users of the reference counter.

On the matter of races, since the entire range beyond INT_MAX but before 0
is signed, every inc will trap, leaving no overflow-only race condition.

As for performance, this implementation adds a single "js" instruction to
a copy of the regular atomic_t operations, making this comparable to the
existing atomic_t operations. The detection routine uses a combination of
an alternative section exception handler and trap to return back to C for
handling the error condition with minimal increase in text size.

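
The detect-and-reset behavior described in the commit message can be
modeled in userspace with C11 atomics. This is only a sketch under stated
assumptions, not the kernel code: struct model_refcount and the model_*
functions are hypothetical names, the "trap" is an ordinary branch rather
than an interrupt, and the reset to INT_MAX happens inline instead of in
an exception handler.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcount_t (not the kernel type). */
struct model_refcount {
	atomic_int refs;
};

/* Stand-in for the trap handler: saturate the counter at INT_MAX. */
static void model_refcount_saturate(struct model_refcount *r)
{
	atomic_store(&r->refs, INT_MAX);
}

/*
 * Models "lock incl" followed by "js": increment atomically, then take the
 * error path if the result is negative. Returns true if that path ran.
 */
static bool model_refcount_inc(struct model_refcount *r)
{
	/* C11 atomic arithmetic on signed types wraps; it is not undefined. */
	int old = atomic_fetch_add(&r->refs, 1);
	int result = (old == INT_MAX) ? INT_MIN : old + 1;

	if (result < 0) {
		model_refcount_saturate(r);
		return true;
	}
	return false;
}

/* Models "lock decl" followed by "js", with the same reset value. */
static bool model_refcount_dec(struct model_refcount *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);
	int result = (old == INT_MIN) ? INT_MAX : old - 1;

	if (result < 0) {
		model_refcount_saturate(r);
		return true;
	}
	return false;
}
```

In this model an increment from INT_MAX takes the error path and leaves
the counter at INT_MAX, and a decrement from 0 does the same, which
matches the "reset to INT_MAX in both cases" choice in the patch.
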
Various differences from PaX:
- applied only to refcount_t, not atomic_t
- rebased to -next
- reorganized refcount error handler and section declaration locations
- uses "js" instead of "jo" to trap all signed results instead of just
  under/overflow transitions

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                       | 19 ++++++++
 arch/x86/Kconfig                   |  1 +
 arch/x86/entry/entry_32.S          |  9 ++++
 arch/x86/entry/entry_64.S          |  3 ++
 arch/x86/include/asm/irq_vectors.h |  3 ++
 arch/x86/include/asm/refcount.h    | 97 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/traps.h       |  5 ++
 arch/x86/kernel/traps.c            | 44 ++++++++++++++++-
 include/asm-generic/sections.h     |  4 ++
 include/asm-generic/vmlinux.lds.h  |  9 ++++
 include/linux/kernel.h             |  2 +
 include/linux/refcount.h           |  4 ++
 kernel/panic.c                     | 25 ++++++++++
 lib/refcount.c                     |  5 +-
 14 files changed, 228 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/refcount.h
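
The "js" versus "jo" difference listed among the changes from PaX can be
illustrated with a small C sketch of how the x86 overflow flag (OF) and
sign flag (SF) behave after a 32-bit increment. The helper names below are
hypothetical and exist only for this illustration; they model the flag
results in plain C rather than executing the instructions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: value left in memory by a 32-bit "incl" of old. */
static int wrapped_inc(int old)
{
	return (old == INT_MAX) ? INT_MIN : old + 1;
}

/* "jo" follows OF: set only when the increment crosses INT_MAX -> INT_MIN. */
static bool inc_would_trap_jo(int old)
{
	return old == INT_MAX;
}

/* "js" follows SF: set whenever the result has its sign bit set. */
static bool inc_would_trap_js(int old)
{
	return wrapped_inc(old) < 0;
}
```

For example, if one CPU has just wrapped the counter to INT_MIN and a
second CPU increments it before the handler resets the value, the second
increment starts from INT_MIN: a "jo" check would stay silent, while the
"js" check still takes the error path.
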