Message ID | 1626665029-49104-1-git-send-email-xiyuyang19@fudan.edu.cn (mailing list archive)
State      | New
Series     | mm/rmap: Convert from atomic_t to refcount_t on anon_vma->refcount
On Mon, 19 Jul 2021 11:23:35 +0800 Xiyu Yang <xiyuyang19@fudan.edu.cn> wrote:

> refcount_t type and corresponding API can protect refcounters from
> accidental underflow and overflow and further use-after-free situations.

Grumble.

For x86_64 defconfig this takes rmap.o text size from 13226 bytes to
13622.

For x86_64 allmodconfig this takes rmap.o text size from 66576 bytes to
67858.

I didn't check which config option is making the bloat so much worse,
but this really is quite bad.  We bust a gut to make savings which are
1% the size of this!  Is the refcount_t really so much better than a
bare atomic_t that this impact is justified?

Can someone pleeeeeeze take a look at what is happening here and put
the refcount code on a very serious diet?
[+Peter and Boris]

Sorry for not responding sooner; I looked at this and couldn't see a way
to improve it at first, but then I had an idea the other day.

On Tue, Jul 20, 2021 at 04:01:27PM -0700, Andrew Morton wrote:
> On Mon, 19 Jul 2021 11:23:35 +0800 Xiyu Yang <xiyuyang19@fudan.edu.cn> wrote:
>
> > refcount_t type and corresponding API can protect refcounters from
> > accidental underflow and overflow and further use-after-free situations.
>
> Grumble.
>
> For x86_64 defconfig this takes rmap.o text size from 13226 bytes to
> 13622.
>
> For x86_64 allmodconfig this takes rmap.o text size from 66576 bytes to
> 67858.
>
> I didn't check which config option is making the bloat so much worse,
> but this really is quite bad.  We bust a gut to make savings which are
> 1% the size of this!  Is the refcount_t really so much better than a
> bare atomic_t that this impact is justified?

I think so: the saturation semantics have been shown to mitigate attacks
that rely on over/underflow so it's a good security feature. On top of
that, the memory ordering is what you'd expect so you don't have to worry
about adding (or missing) the necessary barriers -- there's more info
about that in Documentation/core-api/refcount-vs-atomic.rst.

> Can someone pleeeeeeze take a look at what is happening here and put
> the refcount code on a very serious diet?

I suppose the sanitisers might contribute a fair bit to the cmpxchg()
loops in the allmodconfig case, but looking at how defconfig affects
mm/rmap.o the *big* difference there seems to be due to the
dec_and_test() operation:

atomic_dec_and_test() looks like:

	lock decl (%rdi)
	sete   %al
	retq

whereas refcount_dec_and_test() looks like:

	push   %r12
	mov    $0xffffffff,%eax
	lock xadd %eax,(%rdi)
	cmp    $0x1,%eax
	je     97d
	xor    %r12d,%r12d
	test   %eax,%eax
	jle    989
	mov    %r12d,%eax
	pop    %r12
	retq
	mov    $0x1,%r12d
	mov    %r12d,%eax
	pop    %r12
	retq
	mov    $0x3,%esi
	callq  993
	mov    %r12d,%eax
	pop    %r12
	retq

which is a monster by comparison!
This is because refcount_dec_and_test() wraps __refcount_sub_and_test(),
which is necessary so that we can look at the old value:

 (1) If it was equal to 1, then ensure we have acquire semantics and
     return true.

 (2) If it was < 0 or the new value written is < 0, then saturate (UAF).

However, talking to Peter and Boris, we may be able to achieve the same
thing by looking at the flags after a DECL instruction:

 * We can implement (1) by checking if we hit zero (ZF=1)

 * We can implement (2) by checking if the new value is < 0 (SF=1).
   We then need to catch the case where the old value was < 0 but the
   new value is 0. I think this is (SF=0 && OF=1).

So maybe the second check is actually SF != OF? I could benefit from some
x86 expertise here, but hopefully you get the idea.

We could do something similar for refcount_dec() and refcount_inc(),
although commit a435b9a14356 ("locking/refcount: Provide __refcount API
to obtain the old value") added versions which return the old value,
which we can't reconstruct from the flags. Since nobody is using these
variants, I suggest we remove them.

The other thing we could do is to inline the saturation logic and allow
the WARN_ONCE() to be disabled with a Kconfig option, but I'm not sure
that the gain is really worth it considering that you lose the useful
diagnostic (and silent saturation is likely to be very confusing if it
manifests as a crash somewhere else).

Thoughts?

Will
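[Editor's note: the two conditions above can be modelled in standalone C11, away from the kernel API, to see exactly what the generic path has to compute after the atomic op. This is only a sketch: `model_dec_and_test` and `REFCOUNT_SATURATED` here mimic the kernel's semantics but are not kernel code.]

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REFCOUNT_SATURATED (INT_MIN / 2)

/* Model of the generic dec_and_test path: fetch the old value with a
 * release op, then branch on it in C.  Conditions (1) and (2) below are
 * exactly the ones the x86 flags trick tries to fold into DECL itself. */
static bool model_dec_and_test(atomic_int *refs)
{
	int old = atomic_fetch_sub_explicit(refs, 1, memory_order_release);

	if (old == 1) {
		/* (1) hit zero: pair with an acquire to order the free */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	if (old < 0 || old - 1 < 0)	/* (2) underflow: saturate (UAF) */
		atomic_store(refs, REFCOUNT_SATURATED);

	return false;
}
```

The extra comparisons and the third outcome (saturate) are what force the compiler to emit the branchy "monster" shown above, compared with the single flag test after `lock decl`.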
On Thu, Aug 19, 2021 at 02:21:32PM +0100, Will Deacon wrote:
> I suppose the sanitisers might contribute a fair bit to the cmpxchg()
> loops in the allmodconfig case, but looking at how defconfig affects
> mm/rmap.o the *big* difference there seems to be due to the
> dec_and_test() operation:
>
> atomic_dec_and_test() looks like:
>
> 	lock decl (%rdi)
> 	sete   %al
> 	retq
>
> whereas refcount_dec_and_test() looks like:
>
> 	push   %r12
> 	mov    $0xffffffff,%eax
> 	lock xadd %eax,(%rdi)
> 	cmp    $0x1,%eax
> 	je     97d
> 	xor    %r12d,%r12d
> 	test   %eax,%eax
> 	jle    989
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
> 	mov    $0x1,%r12d
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
> 	mov    $0x3,%esi
> 	callq  993
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
>
> which is a monster by comparison!
>
> This is because refcount_dec_and_test() wraps __refcount_sub_and_test(),
> which is necessary so that we can look at the old value:
>
>  (1) If it was equal to 1, then ensure we have acquire semantics and
>      return true.
>
>  (2) If it was < 0 or the new value written is < 0, then saturate (UAF)
>
> However, talking to Peter and Boris, we may be able to achieve the same
> thing by looking at the flags after a DECL instruction:
>
>  * We can implement (1) by checking if we hit zero (ZF=1)
>
>  * We can implement (2) by checking if the new value is < 0 (SF=1).
>    We then need to catch the case where the old value was < 0 but the
>    new value is 0. I think this is (SF=0 && OF=1).
>
> So maybe the second check is actually SF != OF? I could benefit from some
> x86 expertise here, but hopefully you get the idea.

Right, so the first condition is ZF=1, we hit zero.
The second condition is SF=1, the result is negative.

I'm not sure we need OF, if we hit that condition we've already lost.
But it's easy enough to add I suppose.
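[Editor's note: the flag algebra being discussed can be sanity-checked in plain C by modelling what a 32-bit DECL does to ZF, SF and OF. `decl_flags` is a hypothetical helper for illustration, not kernel code; it assumes the standard x86 definitions of the flags.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dec_flags { bool zf, sf, of; };

/* Model the flags a 32-bit DECL sets: res = old - 1 with wrap-around;
 * ZF/SF come from the result, and OF is set only when old == INT32_MIN
 * (the single signed-overflow case for a decrement). */
static struct dec_flags decl_flags(int32_t old)
{
	int32_t res = (int32_t)((uint32_t)old - 1u);	/* wrap, no UB */
	struct dec_flags f = {
		.zf = (res == 0),
		.sf = (res < 0),
		.of = (old == INT32_MIN),
	};
	return f;
}
```

Iterating over sample values confirms the two conditions: ZF=1 exactly when the old value was 1, and SF != OF (the signed "less than" condition) exactly when the old value was <= 0, including the INT32_MIN wrap case where SF=0 but OF=1.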
> We could do something similar for refcount_dec() and refcount_inc(),
> although commit a435b9a14356 ("locking/refcount: Provide __refcount API
> to obtain the old value") added versions which return the old value,
> which we can't reconstruct from the flags. Since nobody is using these
> variants, I suggest we remove them.

dhowells asked for them... he was very adamant they were required. David?

> Thoughts?

Something like so?

---
 arch/x86/include/asm/refcount.h | 29 +++++++++++++++++++++++++++++
 include/linux/refcount.h        |  6 +++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..bbc0d4835bbb
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_REFCOUNT_H
+#define _ASM_X86_REFCOUNT_H
+
+#define refcount_dec_and_test refcount_dec_and_test
+static inline bool refcount_dec_and_test(refcount_t *r)
+{
+	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
+			   "jz %l[cc_zero]\n\t"
+			   "js %l[cc_fail]\n\t"
+			   "jo %l[cc_fail]"
+			   : : [var] "m" (r->refs.counter)
+			   : "memory" : cc_zero, cc_fail);
+
+	if (0) {
+cc_zero:
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+
+	if (0) {
+cc_fail:
+		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+	}
+
+	return false;
+}
+
+#endif /* _ASM_X86_REFCOUNT_H */
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index b8a6e387f8f9..42e987549d1f 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -117,7 +117,7 @@ typedef struct refcount_struct {
 #define REFCOUNT_SATURATED	(INT_MIN / 2)
 
 enum refcount_saturation_type {
-	REFCOUNT_ADD_NOT_ZERO_OVF,
+	REFCOUNT_ADD_NOT_ZERO_OVF = 0,
 	REFCOUNT_ADD_OVF,
 	REFCOUNT_ADD_UAF,
 	REFCOUNT_SUB_UAF,
@@ -126,6 +126,8 @@ enum refcount_saturation_type {
 
 void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
+#include <asm/refcount.h>
+
 /**
  * refcount_set - set a refcount's value
  * @r: the refcount
@@ -328,10 +330,12 @@ static inline __must_check bool __refcount_dec_and_test(refcount_t *r, int *oldp)
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
+#ifndef refcount_dec_and_test
 static inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
 	return __refcount_dec_and_test(r, NULL);
 }
+#endif
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
 {
On Thu, Aug 19, 2021 at 04:06:45PM +0200, Peter Zijlstra wrote:
> > * We can implement (1) by checking if we hit zero (ZF=1)
> > * We can implement (2) by checking if the new value is < 0 (SF=1).
> >   We then need to catch the case where the old value was < 0 but the
> >   new value is 0. I think this is (SF=0 && OF=1).
> >
> > So maybe the second check is actually SF != OF? I could benefit from some
> > x86 expertise here, but hopefully you get the idea.
>
> Right, so the first condition is ZF=1, we hit zero.
> The second condition is SF=1, the result is negative.
>
> I'm not sure we need OF, if we hit that condition we've already lost.
> But it's easy enough to add I suppose.

If we can skip the OF... we can do something like this:

static inline bool refcount_dec_and_test(refcount_t *r)
{
	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
			   "jz %l[cc_zero]\n\t"
			   "jns 1f\n\t"
			   "ud1 %[var], %%ebx\n\t"
			   "1:"
			   : : [var] "m" (r->refs.counter)
			   : "memory" : cc_zero);

	return false;

cc_zero:
	smp_acquire__after_ctrl_dep();
	return true;
}

where we encode the whole refcount_warn_saturate() thing into UD1. The
first argument is @r and the second argument the REFCOUNT_* thing encoded
in register space.

It would mean adding something 'clever' to the #UD handler that decodes
the trapping instruction and extracts these arguments, but this is the
smallest I could get it.
On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> If we can skip the OF... we can do something like this:

Honestly, I think a lot of the refcount code is questionable. It was
absolutely written with no care for performance AT ALL.

I'm not sure it helps to then add arch-specific code for it without
thinking it through a _lot_ first.

It might be better to just have a "atomic_t with overflow handling" in
general, exactly because the refcount_t was designed and written
without any regard for code that cares about performance.

> static inline bool refcount_dec_and_test(refcount_t *r)
> {
>         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
>                            "jz %l[cc_zero]\n\t"
>                            "jns 1f\n\t"

I think you can use "jl" for the bad case. You want to fail if the old
value was negative, or if the old value was less than what you
subtracted. That's literally the "less than" operator.

I think it's better to handle that case out-of-line than play games
with UD, though - this is going to be the rare case, the likelihood
that we get the fixup wrong is just too big. Once it's out-of-line
it's not as critical any more, even if it does add to the size of the
code.

So if we do this, I think it should be something like

	static inline __must_check bool refcount_dec_and_test(refcount_t *r)
	{
		asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
				   "jz %l[cc_zero]\n\t"
				   "jl %l[cc_error]"
				   : : [var] "m" (r->refs.counter)
				   : "memory" : cc_zero, cc_error);

		return false;

	cc_zero:
		return true;
	cc_error:
		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
		return false;
	}

and we can discuss whether we could improve on the
refcount_warn_saturate() separately.

But see above: maybe just make this a separate "careful atomic_t",
with the option to panic-on-overflow. So then we could get rid of
refcount_warn_saturate() entirely above, and instead just have a
(compile-time option) BUG() case, with the non-careful version just
being our existing atomic_dec_and_test.

Giving people who want a smaller kernel the option to do so, while
giving people who want checking the option too.

               Linus
On Thu, Aug 19, 2021 at 12:09:37PM -0700, Linus Torvalds wrote:
> On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > If we can skip the OF... we can do something like this:
>
> Honestly, I think a lot of the refcount code is questionable. It was
> absolutely written with no care for performance AT ALL.

That's a bit unfair I feel. Will's last rewrite of the stuff was
specifically to address performance issues.

> I'm not sure it helps to then add arch-specific code for it without
> thinking it through a _lot_ first.
>
> It might be better to just have a "atomic_t with overflow handling" in
> general, exactly because the refcount_t was designed and written
> without any regard for code that cares about performance.

The primary concern was to use a single unconditional atomic op where
possible (mostly fetch_add), as the atomic op dominates whatever else it
does. The rest is just because C absolutely sucks at conditions.

Doing atomic_t with overflow handling would require doing the whole
thing in arch asm.

> > static inline bool refcount_dec_and_test(refcount_t *r)
> > {
> >         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> >                            "jz %l[cc_zero]\n\t"
> >                            "jns 1f\n\t"
>
> I think you can use "jl" for the bad case.

Duh yes. I clearly didn't have my head on straight.

> I think it's better to handle that case out-of-line than play games
> with UD, though - this is going to be the rare case, the likelihood
> that we get the fixup wrong is just too big. Once it's out-of-line
> it's not as critical any more, even if it does add to the size of the
> code.

Fine with me; although the immediate complaint from Andrew was about
size, hence my UD1 hackery.

> So if we do this, I think it should be something like
>
> 	static inline __must_check bool refcount_dec_and_test(refcount_t *r)
> 	{
> 		asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> 				   "jz %l[cc_zero]\n\t"
> 				   "jl %l[cc_error]"
> 				   : : [var] "m" (r->refs.counter)
> 				   : "memory" : cc_zero, cc_error);
>
> 		return false;
>
> 	cc_zero:
> 		return true;
> 	cc_error:
> 		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> 		return false;
> 	}
>
> and we can discuss whether we could improve on the
> refcount_warn_saturate() separately.

I can do the refcount_warn_saturate() change separately.

Let me go check how small I can get it...

> But see above: maybe just make this a separate "careful atomic_t",
> with the option to panic-on-overflow. So then we could get rid of
> refcount_warn_saturate() entirely above, and instead just have a
> (compile-time option) BUG() case, with the non-careful version just
> being our existing atomic_dec_and_test.

We used to have that option; the argument was made that everybody cares
about security and as long as this doesn't show up on benchmarks we
good.

Also, I don't think most people want the overflow to go BUG, WARN is
mostly the right thing and only the super paranoid use panic-on-warn or
something.
On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 19, 2021 at 12:09:37PM -0700, Linus Torvalds wrote:
> > On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > If we can skip the OF... we can do something like this:
> >
> > Honestly, I think a lot of the refcount code is questionable. It was
> > absolutely written with no care for performance AT ALL.
>
> That's a bit unfair I feel. Will's last rewrite of the stuff was
> specifically to address performance issues.

Well, to address performance issues with the "full" version. The default
x86-specific code was already as fast as atomic_t. Will got it to nearly
match while making it catch all conditions, not just the exploitable ones
(i.e. it didn't bother trying to catch underflow; there's no way to
mitigate it).

Will's version gave us three properties: correctness (it catches all the
pathological conditions), speed (it was very nearly the same speed as
regular atomic_t), and arch-agnosticism, which expanded this protection
to things beyond just x86 and arm64.

> > But see above: maybe just make this a separate "careful atomic_t",
> > with the option to panic-on-overflow. So then we could get rid of
> > refcount_warn_saturate() entirely above, and instead just have a
> > (compile-time option) BUG() case, with the non-careful version just
> > being our existing atomic_dec_and_test.

This is nearly what we had before. But refcount_t should always saturate
on overflow -- that's specifically the mitigation needed to defang the
traditional atomic_t overflow exploits (of which we had several a year
before refcount_t and now we've seen zero since).

> We used to have that option; the argument was made that everybody cares
> about security and as long as this doesn't show up on benchmarks we
> good.
>
> Also, I don't think most people want the overflow to go BUG, WARN is
> mostly the right thing and only the super paranoid use panic-on-warn or
> something.

Saturating on overflow stops exploitability. WARNing is informational.
BUG kills the system for no good reason: the saturation is the defense
against attack, and the WARN is the "oh, I found a bug" details needed to
fix it.

I prefer the arch-agnostic, fully checked, very fast version of this
(i.e. what we have right now). :P I appreciate it's larger, but in my
opinion size isn't as important as correctness and speed. If it's just as
fast as a small version but has greater coverage, that seems worth the
size.
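[Editor's note: the reason saturation defangs these exploits is the choice of the saturation value, INT_MIN / 2. A tiny standalone sketch (hypothetical helper names, not kernel code) shows the headroom it leaves: once saturated, roughly a billion further unbalanced increments or decrements would be needed to walk the counter to zero (a fake release) or wrap it, far more than any realistic race window allows before the WARN is acted on.]

```c
#include <assert.h>
#include <limits.h>

#define REFCOUNT_SATURATED (INT_MIN / 2)

/* Distance from the saturated value to the two dangerous states:
 * zero (would trigger a bogus free) and INT_MIN wrap-around. */
static long headroom_to_zero(void) { return -(long)REFCOUNT_SATURATED; }
static long headroom_to_wrap(void) { return (long)REFCOUNT_SATURATED - INT_MIN; }
```

Both distances come out to 2^30 (1073741824) on 32-bit int, which is why a saturated counter stays "stuck" in safe territory while the WARN reports the bug.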
On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:
> Fine with me; although the immediate complaint from Andrew was about
> size, hence my UD1 hackery.
>
> > So if we do this, I think it should be something like
> >
> > 	static inline __must_check bool refcount_dec_and_test(refcount_t *r)
> > 	{
> > 		asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> > 				   "jz %l[cc_zero]\n\t"
> > 				   "jl %l[cc_error]"
> > 				   : : [var] "m" (r->refs.counter)
> > 				   : "memory" : cc_zero, cc_error);
> >
> > 		return false;
> >
> > 	cc_zero:
> > 		return true;
> > 	cc_error:
> > 		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> > 		return false;
> > 	}
> >
> > and we can discuss whether we could improve on the
> > refcount_warn_saturate() separately.
>
> I can do the refcount_warn_saturate() change separately.
>
> Let me go check how small I can get it...

gcc-10.2.1, x86_64-defconfig

kernel/event/core.o-inline-ud1:		96454
kernel/event/core.o-outofline-ud1:	96604
kernel/event/core.o-outofline-call:	97072

(42 refcount_warn_saturate/ud1 instances in that file,
10 of which are refcount_dec_and_test)
On Fri, Aug 20, 2021 at 10:16:18AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:
>
> > Fine with me; although the immediate complaint from Andrew was about
> > size, hence my UD1 hackery.
> >
> > > So if we do this, I think it should be something like
> > >
> > > 	static inline __must_check bool refcount_dec_and_test(refcount_t *r)
> > > 	{
> > > 		asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> > > 				   "jz %l[cc_zero]\n\t"
> > > 				   "jl %l[cc_error]"
> > > 				   : : [var] "m" (r->refs.counter)
> > > 				   : "memory" : cc_zero, cc_error);
> > >
> > > 		return false;
> > >
> > > 	cc_zero:
> > > 		return true;
> > > 	cc_error:
> > > 		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> > > 		return false;
> > > 	}
> > >
> > > and we can discuss whether we could improve on the
> > > refcount_warn_saturate() separately.
> >
> > I can do the refcount_warn_saturate() change separately.
> >
> > Let me go check how small I can get it...
>
> gcc-10.2.1, x86_64-defconfig
>
> kernel/event/core.o-inline-ud1:		96454
> kernel/event/core.o-outofline-ud1:	96604
> kernel/event/core.o-outofline-call:	97072
>
> (42 refcount_warn_saturate/ud1 instances in that file,
> 10 of which are refcount_dec_and_test)

Is that with the saturation moved to the UD handler as well?

I think it would be good to keep that as close to the point at which
we detect the problem as we can, so perhaps we can inline that part
and leave the diagnostics to the exception handler?

Will
On Fri, Aug 20, 2021 at 09:24:58AM +0100, Will Deacon wrote:
> > gcc-10.2.1, x86_64-defconfig
> >
> > kernel/event/core.o-inline-ud1:		96454
> > kernel/event/core.o-outofline-ud1:	96604
> > kernel/event/core.o-outofline-call:	97072

kernel/event/core.o-outofline-saturate-ud2:	96954
kernel/event/core.o:				97248

> Is that with the saturation moved to the UD handler as well?

Yep, that's the full function call replaced with an exception.

> I think it would be good to keep that as close to the point at which
> we detect the problem as we can, so perhaps we can inline that part
> and leave the diagnostics to the exception handler?

That's simpler exception code too, we can abuse the existing WARN/UD2
stuff.

---
 arch/x86/include/asm/refcount.h | 31 +++++++++++++++++++++++++++++++
 include/asm-generic/bug.h       |  4 ++++
 include/linux/refcount.h        | 15 +++++++++++----
 lib/bug.c                       | 13 ++++++++++++-
 lib/refcount.c                  |  7 ++-----
 5 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..bed52b95d24c
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_REFCOUNT_H
+#define _ASM_X86_REFCOUNT_H
+
+#define refcount_warn_saturate refcount_warn_saturate
+static __always_inline void refcount_warn_saturate(refcount_t *r, const enum refcount_saturation_type t)
+{
+	refcount_set(r, REFCOUNT_SATURATED);
+	__WARN_FLAGS(BUGFLAG_ONCE|BUGFLAG_REFCOUNT|BUGFLAG_REFCOUNT_TYPE(t));
+}
+
+#define refcount_dec_and_test refcount_dec_and_test
+static inline bool refcount_dec_and_test(refcount_t *r)
+{
+	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
+			   "jz %l[cc_zero]\n\t"
+			   "jl %l[cc_error]"
+			   : : [var] "m" (r->refs.counter)
+			   : "memory" : cc_zero, cc_error);
+
+	return false;
+
+cc_zero:
+	return true;
+
+cc_error:
+	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+	return false;
+}
+
+#endif /* _ASM_X86_REFCOUNT_H */
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index edb0e2a602a8..9937c70138b8 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -13,6 +13,10 @@
 #define BUGFLAG_ONCE		(1 << 1)
 #define BUGFLAG_DONE		(1 << 2)
 #define BUGFLAG_NO_CUT_HERE	(1 << 3)	/* CUT_HERE already sent */
+
+#define BUGFLAG_REFCOUNT	(1 << 4)
+#define BUGFLAG_REFCOUNT_TYPE(x)((x&3) << 5)
+
 #define BUGFLAG_TAINT(taint)	((taint) << 8)
 #define BUG_GET_TAINT(bug)	((bug)->flags >> 8)
 #endif
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index b8a6e387f8f9..7db2b024a75d 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -117,14 +117,13 @@ typedef struct refcount_struct {
 #define REFCOUNT_SATURATED	(INT_MIN / 2)
 
 enum refcount_saturation_type {
-	REFCOUNT_ADD_NOT_ZERO_OVF,
-	REFCOUNT_ADD_OVF,
+	REFCOUNT_ADD_OVF = 0,
 	REFCOUNT_ADD_UAF,
 	REFCOUNT_SUB_UAF,
 	REFCOUNT_DEC_LEAK,
 };
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
+extern void __refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
 /**
  * refcount_set - set a refcount's value
@@ -136,6 +135,12 @@ static inline void refcount_set(refcount_t *r, int n)
 	atomic_set(&r->refs, n);
 }
 
+#include <asm/refcount.h>
+
+#ifndef refcount_warn_saturate
+#define refcount_warn_saturate __refcount_warn_saturate
+#endif
+
 /**
  * refcount_read - get a refcount's value
  * @r: the refcount
@@ -160,7 +165,7 @@ static inline __must_check bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
 		*oldp = old;
 
 	if (unlikely(old < 0 || old + i < 0))
-		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+		refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
 
 	return old;
 }
@@ -328,10 +333,12 @@ static inline __must_check bool __refcount_dec_and_test(refcount_t *r, int *oldp)
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
+#ifndef refcount_dec_and_test
 static inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
 	return __refcount_dec_and_test(r, NULL);
 }
+#endif
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
 {
diff --git a/lib/bug.c b/lib/bug.c
index 45a0584f6541..3878df956143 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -154,11 +154,18 @@ struct bug_entry *find_bug(unsigned long bugaddr)
 	return module_find_bug(bugaddr);
 }
 
+static const char *refstr[] = {
+	"refcount_t: saturated; leaking memory",
+	"refcount_t: addition on 0; use-after-free",
+	"refcount_t: underflow; use-after-free",
+	"refcount_t: decrement hit 0; leaking memory",
+};
+
 enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 {
+	unsigned line, warning, once, done, refcount;
 	struct bug_entry *bug;
 	const char *file;
-	unsigned line, warning, once, done;
 
 	if (!is_valid_bugaddr(bugaddr))
 		return BUG_TRAP_TYPE_NONE;
@@ -174,6 +181,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 	warning = (bug->flags & BUGFLAG_WARNING) != 0;
 	once = (bug->flags & BUGFLAG_ONCE) != 0;
 	done = (bug->flags & BUGFLAG_DONE) != 0;
+	refcount = (bug->flags & BUGFLAG_REFCOUNT) != 0;
 
 	if (warning && once) {
 		if (done)
@@ -195,6 +203,9 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 		printk(KERN_DEFAULT CUT_HERE);
 
 	if (warning) {
+		if (refcount)
+			pr_warn("%s\n", refstr[(bug->flags >> 5) & 3]);
+
 		/* this is a WARN_ON rather than BUG/BUG_ON */
 		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
 		       NULL);
diff --git a/lib/refcount.c b/lib/refcount.c
index a207a8f22b3c..a36da0611f25 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -10,14 +10,11 @@
 
 #define REFCOUNT_WARN(str)	WARN_ONCE(1, "refcount_t: " str ".\n")
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
+void __refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 {
 	refcount_set(r, REFCOUNT_SATURATED);
 
 	switch (t) {
-	case REFCOUNT_ADD_NOT_ZERO_OVF:
-		REFCOUNT_WARN("saturated; leaking memory");
-		break;
 	case REFCOUNT_ADD_OVF:
 		REFCOUNT_WARN("saturated; leaking memory");
 		break;
@@ -34,7 +31,7 @@ void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 		REFCOUNT_WARN("unknown saturation event!?");
 	}
 }
-EXPORT_SYMBOL(refcount_warn_saturate);
+EXPORT_SYMBOL(__refcount_warn_saturate);
 
 /**
  * refcount_dec_if_one - decrement a refcount if it is 1
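[Editor's note: the flag encoding in this patch packs the saturation type into two bits of the bug_entry flags, which is why the enum is renumbered from 0 and one redundant value is dropped (2 bits hold only 4 types). The round-trip can be checked in isolation; the macros below are copied from the patch, `decode_type` is a hypothetical helper mirroring the `(bug->flags >> 5) & 3` decode in report_bug().]

```c
#include <assert.h>

/* Flag layout from the patch: bit 4 marks a refcount warning and
 * bits 5-6 carry the 2-bit saturation type. */
#define BUGFLAG_REFCOUNT         (1 << 4)
#define BUGFLAG_REFCOUNT_TYPE(x) (((x) & 3) << 5)

enum refcount_saturation_type {
	REFCOUNT_ADD_OVF = 0,
	REFCOUNT_ADD_UAF,
	REFCOUNT_SUB_UAF,
	REFCOUNT_DEC_LEAK,
};

/* Mirror of report_bug()'s decode of the 2-bit type field. */
static enum refcount_saturation_type decode_type(unsigned flags)
{
	return (enum refcount_saturation_type)((flags >> 5) & 3);
}
```

Each of the four types survives an encode/decode round trip, and bit 4 stays set, matching the refstr[] indexing in lib/bug.c.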
On Fri, Aug 20, 2021 at 11:03:00AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 20, 2021 at 09:24:58AM +0100, Will Deacon wrote:
>
> > > gcc-10.2.1, x86_64-defconfig
> > >
> > > kernel/event/core.o-inline-ud1:		96454
> > > kernel/event/core.o-outofline-ud1:	96604
> > > kernel/event/core.o-outofline-call:	97072
>
> kernel/event/core.o-outofline-saturate-ud2:	96954
> kernel/event/core.o:				97248
>
> > Is that with the saturation moved to the UD handler as well?
>
> Yep, that's the full function call replaced with an exception.
>
> > I think it would be good to keep that as close to the point at which
> > we detect the problem as we can, so perhaps we can inline that part
> > and leave the diagnostics to the exception handler?
>
> That's simpler exception code too, we can abuse the existing WARN/UD2
> stuff.
>
> ---
>  arch/x86/include/asm/refcount.h | 31 +++++++++++++++++++++++++++++++
>  include/asm-generic/bug.h       |  4 ++++
>  include/linux/refcount.h        | 15 +++++++++++----
>  lib/bug.c                       | 13 ++++++++++++-
>  lib/refcount.c                  |  7 ++-----
>  5 files changed, 60 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> new file mode 100644
> index 000000000000..bed52b95d24c
> --- /dev/null
> +++ b/arch/x86/include/asm/refcount.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_REFCOUNT_H
> +#define _ASM_X86_REFCOUNT_H
> +
> +#define refcount_warn_saturate refcount_warn_saturate
> +static __always_inline void refcount_warn_saturate(refcount_t *r, const enum refcount_saturation_type t)
> +{
> +	refcount_set(r, REFCOUNT_SATURATED);
> +	__WARN_FLAGS(BUGFLAG_ONCE|BUGFLAG_REFCOUNT|BUGFLAG_REFCOUNT_TYPE(t));
> +}

Instead of using up warn flags, what was done in the past was to use an
explicit EXTABLE in a cold text section:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/asm.h?h=v4.15#n80
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/mm/extable.c?h=v4.15#n45
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/refcount.h?h=v4.15

> +
> +#define refcount_dec_and_test refcount_dec_and_test
> +static inline bool refcount_dec_and_test(refcount_t *r)
> +{
> +	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> +			   "jz %l[cc_zero]\n\t"
> +			   "jl %l[cc_error]"
> +			   : : [var] "m" (r->refs.counter)
> +			   : "memory" : cc_zero, cc_error);
> +
> +	return false;
> +
> +cc_zero:
> +	return true;
> +
> +cc_error:
> +	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> +	return false;
> +}

This looks basically the same as what we had before (i.e. the earlier
REFCOUNT_CHECK_LE_ZERO within GEN_UNARY_SUFFIXED_RMWcc).
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c976cc6de257..38151efe1a59 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -12,6 +12,8 @@
 #include <linux/memcontrol.h>
 #include <linux/highmem.h>
 
+#include <linux/refcount.h>
+
 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
  * an anonymous page pointing to this anon_vma needs to be unmapped:
@@ -36,7 +38,7 @@ struct anon_vma {
 	 * the reference is responsible for clearing up the
 	 * anon_vma if they are the last user on release
 	 */
-	atomic_t refcount;
+	refcount_t refcount;
 
 	/*
 	 * Count of child anon_vmas and VMAs which points to this anon_vma.
@@ -100,14 +102,14 @@ enum ttu_flags {
 #ifdef CONFIG_MMU
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
-	atomic_inc(&anon_vma->refcount);
+	refcount_inc(&anon_vma->refcount);
 }
 
 void __put_anon_vma(struct anon_vma *anon_vma);
 
 static inline void put_anon_vma(struct anon_vma *anon_vma)
 {
-	if (atomic_dec_and_test(&anon_vma->refcount))
+	if (refcount_dec_and_test(&anon_vma->refcount))
 		__put_anon_vma(anon_vma);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index b9eb5c12f3fe..7badd786e095 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -88,7 +88,7 @@ static inline struct anon_vma *anon_vma_alloc(void)
 	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
 	if (anon_vma) {
-		atomic_set(&anon_vma->refcount, 1);
+		refcount_set(&anon_vma->refcount, 1);
 		anon_vma->degree = 1;	/* Reference for first vma */
 		anon_vma->parent = anon_vma;
 		/*
@@ -103,7 +103,7 @@ static inline struct anon_vma *anon_vma_alloc(void)
 
 static inline void anon_vma_free(struct anon_vma *anon_vma)
 {
-	VM_BUG_ON(atomic_read(&anon_vma->refcount));
+	VM_BUG_ON(refcount_read(&anon_vma->refcount));
 
 	/*
 	 * Synchronize against page_lock_anon_vma_read() such that
@@ -445,7 +445,7 @@ static void anon_vma_ctor(void *data)
 	struct anon_vma *anon_vma = data;
 
 	init_rwsem(&anon_vma->rwsem);
-	atomic_set(&anon_vma->refcount, 0);
+	refcount_set(&anon_vma->refcount, 0);
 	anon_vma->rb_root = RB_ROOT_CACHED;
 }
 
@@ -495,7 +495,7 @@ struct anon_vma *page_get_anon_vma(struct page *page)
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+	if (!refcount_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
 		goto out;
 	}
@@ -554,7 +554,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	}
 
 	/* trylock failed, we got to sleep */
-	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+	if (!refcount_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
 		goto out;
 	}
@@ -569,7 +569,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	rcu_read_unlock();
 
 	anon_vma_lock_read(anon_vma);
-	if (atomic_dec_and_test(&anon_vma->refcount)) {
+	if (refcount_dec_and_test(&anon_vma->refcount)) {
 		/*
 		 * Oops, we held the last refcount, release the lock
 		 * and bail -- can't simply use put_anon_vma() because
@@ -2221,7 +2221,7 @@ void __put_anon_vma(struct anon_vma *anon_vma)
 	struct anon_vma *root = anon_vma->root;
 
 	anon_vma_free(anon_vma);
-	if (root != anon_vma && atomic_dec_and_test(&root->refcount))
+	if (root != anon_vma && refcount_dec_and_test(&root->refcount))
 		anon_vma_free(root);
 }