
mm/rmap: Convert from atomic_t to refcount_t on anon_vma->refcount

Message ID 1626665029-49104-1-git-send-email-xiyuyang19@fudan.edu.cn (mailing list archive)
State New

Commit Message

Xiyu Yang July 19, 2021, 3:23 a.m. UTC
The refcount_t type and its corresponding API can protect reference counters from
accidental underflow and overflow, and from the resulting use-after-free situations.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
---
 include/linux/rmap.h |  8 +++++---
 mm/rmap.c            | 14 +++++++-------
 2 files changed, 12 insertions(+), 10 deletions(-)

Comments

Andrew Morton July 20, 2021, 11:01 p.m. UTC | #1
On Mon, 19 Jul 2021 11:23:35 +0800 Xiyu Yang <xiyuyang19@fudan.edu.cn> wrote:

> The refcount_t type and its corresponding API can protect reference counters from
> accidental underflow and overflow, and from the resulting use-after-free situations.

Grumble.

For x86_64 defconfig this takes rmap.o text size from 13226 bytes to
13622.

For x86_64 allmodconfig this takes rmap.o text size from 66576 bytes to
67858.

I didn't check which config option is making the bloat so much worse,
but this really is quite bad.  We bust a gut to make savings which are
1% the size of this!  Is the refcount_t really so much better than a
bare atomic_t that this impact is justified?

Can someone pleeeeeeze take a look at what is happening here and put
the refcount code on a very serious diet?
Will Deacon Aug. 19, 2021, 1:21 p.m. UTC | #2
[+Peter and Boris]

Sorry for not responding sooner; I looked at this and couldn't see a way
to improve it at first, but then I had an idea the other day.

On Tue, Jul 20, 2021 at 04:01:27PM -0700, Andrew Morton wrote:
> On Mon, 19 Jul 2021 11:23:35 +0800 Xiyu Yang <xiyuyang19@fudan.edu.cn> wrote:
> 
> > The refcount_t type and its corresponding API can protect reference counters from
> > accidental underflow and overflow, and from the resulting use-after-free situations.
> 
> Grumble.
> 
> For x86_64 defconfig this takes rmap.o text size from 13226 bytes to
> 13622.
> 
> For x86_64 allmodconfig this takes rmap.o text size from 66576 bytes to
> 67858.
> 
> I didn't check which config option is making the bloat so much worse,
> but this really is quite bad.  We bust a gut to make savings which are
> 1% the size of this!  Is the refcount_t really so much better than a
> bare atomic_t that this impact is justified?

I think so: the saturation semantics have been shown to mitigate attacks
that rely on over/underflow so it's a good security feature. On top of
that, the memory ordering is what you'd expect so you don't have to worry
about adding (or missing) the necessary barriers -- there's more info about
that in Documentation/core-api/refcount-vs-atomic.rst.
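To illustrate the ordering point: the thread that drops the last reference is the one that frees the object, so it must observe every write made while other threads still held references. A minimal userspace sketch (not kernel code; plain C11 atomics standing in for the kernel primitives) of the pattern refcount_dec_and_test() guarantees for free:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <assert.h>

struct obj {
	atomic_int refs;
	int payload;
};

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));
	atomic_init(&o->refs, 1);
	o->payload = 0;
	return o;
}

static void obj_get(struct obj *o)
{
	/* relaxed is enough for the increment: we already hold a reference */
	atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

static bool obj_put(struct obj *o)
{
	/*
	 * release on the decrement, acquire before the free: the freeing
	 * thread sees all writes made under previously-held references
	 */
	if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) == 1) {
		atomic_thread_fence(memory_order_acquire);
		free(o);
		return true;
	}
	return false;
}
```

With atomic_t you have to remember those barriers yourself; refcount_dec_and_test() bakes them in.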

> Can someone pleeeeeeze take a look at what is happening here and put
> the refcount code on a very serious diet?

I suppose the sanitisers might contribute a fair bit to the cmpxchg() loops
in the allmodconfig case, but looking at how defconfig affects mm/rmap.o the
*big* difference there seems to be due to the dec_and_test() operation:

atomic_dec_and_test() looks like:

	lock decl (%rdi)
	sete   %al
	retq

whereas refcount_dec_and_test() looks like:

	push   %r12
	mov    $0xffffffff,%eax
	lock xadd %eax,(%rdi)
	cmp    $0x1,%eax
	je     97d
	xor    %r12d,%r12d
	test   %eax,%eax
	jle    989
	mov    %r12d,%eax
	pop    %r12
	retq
	mov    $0x1,%r12d
	mov    %r12d,%eax
	pop    %r12
	retq
	mov    $0x3,%esi
	callq  993
	mov    %r12d,%eax
	pop    %r12
	retq

which is a monster by comparison!

This is because refcount_dec_and_test() wraps __refcount_sub_and_test(),
which is necessary so that we can look at the old value:

  (1) If it was equal to 1, then ensure we have acquire semantics and
      return true.

  (2) If it was < 0 or the new value written is < 0, then saturate (UAF)
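Those two checks can be modelled in userspace C (a sketch of the generic logic only, not the kernel implementation; REFCOUNT_SATURATED mirrors the kernel's INT_MIN / 2 value):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <limits.h>
#include <assert.h>

#define REFCOUNT_SATURATED	(INT_MIN / 2)

static bool model_dec_and_test(atomic_int *refs)
{
	int old = atomic_fetch_sub_explicit(refs, 1, memory_order_release);

	if (old == 1) {
		/* (1) old value was 1: we hit zero; pair with an acquire */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}

	if (old <= 0) {
		/* (2) old < 0, or the new value (old - 1) < 0: saturate */
		atomic_store(refs, REFCOUNT_SATURATED);
		fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
	}

	return false;
}
```

The cost comes from needing the *old* value back, which forces the fetch_sub and the branchy comparison tail seen in the disassembly above.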

However, talking to Peter and Boris, we may be able to achieve the same
thing by looking at the flags after a DECL instruction:

  * We can implement (1) by checking if we hit zero (ZF=1)
  * We can implement (2) by checking if the new value is < 0 (SF=1).
    We then need to catch the case where the old value was < 0 but the
    new value is 0. I think this is (SF=0 && OF=1).

So maybe the second check is actually SF != OF? I could benefit from some
x86 expertise here, but hopefully you get the idea.

We could do something similar for refcount_dec() and refcount_inc(),
although commit a435b9a14356 ("locking/refcount: Provide __refcount API
to obtain the old value") add versions which return the old value,
which we can't reconstruct from the flags. Since nobody is using these
variants, I suggest we remove them.

The other thing we could do is to inline the saturation logic and allow
the WARN_ONCE() to be disabled with a Kconfig option, but I'm not sure
that the gain is really worth it considering that you lose the useful
diagnostic (and silent saturation is likely to be very confusing if it
manifests as a crash somewhere else).

Thoughts?

Will
Peter Zijlstra Aug. 19, 2021, 2:06 p.m. UTC | #3
On Thu, Aug 19, 2021 at 02:21:32PM +0100, Will Deacon wrote:

> I suppose the sanitisers might contribute a fair bit to the cmpxchg() loops
> in the allmodconfig case, but looking at how defconfig affects mm/rmap.o the
> *big* difference there seems to be due to the dec_and_test() operation:
> 
> atomic_dec_and_test() looks like:
> 
> 	lock decl (%rdi)
> 	sete   %al
> 	retq
> 
> whereas refcount_dec_and_test() looks like:
> 
> 	push   %r12
> 	mov    $0xffffffff,%eax
> 	lock xadd %eax,(%rdi)
> 	cmp    $0x1,%eax
> 	je     97d
> 	xor    %r12d,%r12d
> 	test   %eax,%eax
> 	jle    989
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
> 	mov    $0x1,%r12d
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
> 	mov    $0x3,%esi
> 	callq  993
> 	mov    %r12d,%eax
> 	pop    %r12
> 	retq
> 
> which is a monster by comparison!
> 
> This is because refcount_dec_and_test() wraps __refcount_sub_and_test(),
> which is necessary so that we can look at the old value:
> 
>   (1) If it was equal to 1, then ensure we have acquire semantics and
>       return true.
> 
>   (2) If it was < 0 or the new value written is < 0, then saturate (UAF)
>
> However, talking to Peter and Boris, we may be able to achieve the same
> thing by looking at the flags after a DECL instruction:
> 
>   * We can implement (1) by checking if we hit zero (ZF=1)
>   * We can implement (2) by checking if the new value is < 0 (SF=1).
>     We then need to catch the case where the old value was < 0 but the
>     new value is 0. I think this is (SF=0 && OF=1).
> 
> So maybe the second check is actually SF != OF? I could benefit from some
> x86 expertise here, but hopefully you get the idea.

Right, so the first condition is ZF=1, we hit zero.
The second condition is SF=1, the result is negative.

I'm not sure we need OF, if we hit that condition we've already lost.
But it's easy enough to add I suppose.

> We could do something similar for refcount_dec() and refcount_inc(),
> although commit a435b9a14356 ("locking/refcount: Provide __refcount API
> to obtain the old value") added versions which return the old value,
> which we can't reconstruct from the flags. Since nobody is using these
> variants, I suggest we remove them.

dhowells asked for them... he was very adamant they were required. David?

> Thoughts?

Something like so?

---
 arch/x86/include/asm/refcount.h | 29 +++++++++++++++++++++++++++++
 include/linux/refcount.h        |  6 +++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..bbc0d4835bbb
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_REFCOUNT_H
+#define _ASM_X86_REFCOUNT_H
+
+#define refcount_dec_and_test refcount_dec_and_test
+static inline bool refcount_dec_and_test(refcount_t *r)
+{
+	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
+			   "jz %l[cc_zero]\n\t"
+			   "js %l[cc_fail]\n\t"
+			   "jo %l[cc_fail]"
+			   : : [var] "m" (r->refs.counter)
+			   : "memory" : cc_zero, cc_fail);
+
+	if (0) {
+cc_zero:
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+
+	if (0) {
+cc_fail:
+		refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+	}
+
+	return false;
+}
+
+#endif /* _ASM_X86_REFCOUNT_H */
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index b8a6e387f8f9..42e987549d1f 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -117,7 +117,7 @@ typedef struct refcount_struct {
 #define REFCOUNT_SATURATED	(INT_MIN / 2)
 
 enum refcount_saturation_type {
-	REFCOUNT_ADD_NOT_ZERO_OVF,
+	REFCOUNT_ADD_NOT_ZERO_OVF = 0,
 	REFCOUNT_ADD_OVF,
 	REFCOUNT_ADD_UAF,
 	REFCOUNT_SUB_UAF,
@@ -126,6 +126,8 @@ enum refcount_saturation_type {
 
 void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
+#include <asm/refcount.h>
+
 /**
  * refcount_set - set a refcount's value
  * @r: the refcount
@@ -328,10 +330,12 @@ static inline __must_check bool __refcount_dec_and_test(refcount_t *r, int *oldp
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
+#ifndef refcount_dec_and_test
 static inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
 	return __refcount_dec_and_test(r, NULL);
 }
+#endif
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
 {
Peter Zijlstra Aug. 19, 2021, 3:19 p.m. UTC | #4
On Thu, Aug 19, 2021 at 04:06:45PM +0200, Peter Zijlstra wrote:
> > 
> >   * We can implement (1) by checking if we hit zero (ZF=1)
> >   * We can implement (2) by checking if the new value is < 0 (SF=1).
> >     We then need to catch the case where the old value was < 0 but the
> >     new value is 0. I think this is (SF=0 && OF=1).
> > 
> > So maybe the second check is actually SF != OF? I could benefit from some
> > x86 expertise here, but hopefully you get the idea.
> 
> Right, so the first condition is ZF=1, we hit zero.
> The second condition is SF=1, the result is negative.
> 
> I'm not sure we need OF, if we hit that condition we've already lost.
> But it's easy enough to add I suppose.

If we can skip the OF... we can do something like this:

static inline bool refcount_dec_and_test(refcount_t *r)
{
	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
			   "jz %l[cc_zero]\n\t"
			   "jns 1f\n\t"
			   "ud1 %[var], %%ebx\n\t"
			   "1:"
			   : : [var] "m" (r->refs.counter)
			   : "memory" : cc_zero);

	return false;

cc_zero:
	smp_acquire__after_ctrl_dep();
	return true;
}

where we encode the whole refcount_warn_saturate() thing into UD1. The
first argument is @r and the second argument the REFCOUNT_* thing
encoded in register space.

It would mean adding something 'clever' to the #UD handler that decodes
the trapping instruction and extracts these arguments, but this is the
smallest I could get it.
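The decode step itself is small: UD1 is encoded as 0F B9 /r, so with the REFCOUNT_* type stashed in the register operand, the handler only has to read the reg field of the ModRM byte following the opcode. A hypothetical userspace sketch (ignoring instruction prefixes and the memory-operand side, which a real handler would have to cope with):

```c
#include <stdint.h>
#include <assert.h>

/* Return the REFCOUNT_* type encoded in a ud1 instruction's register
 * operand, or -1 if the bytes aren't a ud1 (0F B9 /r). */
static int ud1_refcount_type(const uint8_t *insn)
{
	if (insn[0] != 0x0f || insn[1] != 0xb9)
		return -1;			/* not a UD1 */
	return (insn[2] >> 3) & 0x7;		/* ModRM reg field */
}
```

For "ud1 (%rdi), %ebx" the reg field is %ebx = 3, which lines up with REFCOUNT_SUB_UAF when the enum starts at 0.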
Linus Torvalds Aug. 19, 2021, 7:09 p.m. UTC | #5
On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> If we can skip the OF... we can do something like this:

Honestly, I think a lot of the refcount code is questionable. It was
absolutely written with no care for performance AT ALL.

I'm not sure it helps to then add arch-specific code for it without
thinking it through a _lot_ first.

It might be better to just have a "atomic_t with overflow handling" in
general, exactly because the refcount_t was designed and written
without any regard for code that cares about performance.

> static inline bool refcount_dec_and_test(refcount_t *r)
> {
>         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
>                            "jz %l[cc_zero]\n\t"
>                            "jns 1f\n\t"

I think you can use "jl" for the bad case.

You want to fail if the old value was negative, or if the old value
was less than what you subtracted. That's literally the "less than"
operator.
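As a sanity check, the flag behaviour can be modelled in plain C (a userspace sketch of what `decl` leaves in EFLAGS, not kernel code): `jz` fires exactly when the old value was 1, and `jl` (taken when SF != OF) fires for every old value <= 0, i.e. all the underflow cases including the INT_MIN wrap.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

struct x86_flags { bool zf, sf, of; };

static struct x86_flags dec_flags(int32_t old)
{
	int32_t res = (int32_t)((uint32_t)old - 1u);	/* wraps like hardware */
	struct x86_flags f = {
		.zf = (res == 0),
		.sf = (res < 0),
		.of = (old == INT32_MIN),	/* DEC overflows only on INT32_MIN */
	};
	return f;
}

static bool jz_taken(struct x86_flags f) { return f.zf; }
static bool jl_taken(struct x86_flags f) { return f.sf != f.of; }
```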

I think it's better to handle that case out-of-line than play games
with UD, though - this is going to be the rare case, the likelihood
that we get the fixup wrong is just too big. Once it's out-of-line
it's not as critical any more, even if it does add to the size of the
code.

So if we do this, I think it should be something like

   static inline __must_check bool refcount_dec_and_test(refcount_t *r)
   {
        asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
                "jz %l[cc_zero]\n\t"
                "jl %l[cc_error]"
                : : [var] "m" (r->refs.counter)
                : "memory" : cc_zero, cc_error);

        return false;

   cc_zero:
        return true;
   cc_error:
        refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
        return false;
   }

and we can discuss whether we could improve on the
refcount_warn_saturate() separately.

But see above: maybe just make this a separate "careful atomic_t",
with the option to panic-on-overflow. So then we could get rid of
refcount_warn_saturate() entirely above, and instead just have a
(compile-time option) BUG() case, with the non-careful version just
being our existing atomic_dec_and_test.

Giving people who want a smaller kernel the option to do so, while
giving people who want checking the option too.

                Linus
Peter Zijlstra Aug. 20, 2021, 6:43 a.m. UTC | #6
On Thu, Aug 19, 2021 at 12:09:37PM -0700, Linus Torvalds wrote:
> On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > If we can skip the OF... we can do something like this:
> 
> Honestly, I think a lot of the refcount code is questionable. It was
> absolutely written with no care for performance AT ALL.

That's a bit unfair I feel. Will's last rewrite of the stuff was
specifically to address performance issues.

> I'm not sure it helps to then add arch-specific code for it without
> thinking it through a _lot_ first.
> 
> It might be better to just have a "atomic_t with overflow handling" in
> general, exactly because the refcount_t was designed and written
> without any regard for code that cares about performance.

The primary concern was to use a single unconditional atomic op where
possible (mostly fetch_add), as the atomic op dominates whatever else it
does. The rest is just because C absolutely sucks at conditions.

Doing atomic_t with overflow handling would require doing the whole
thing in arch asm.

> > static inline bool refcount_dec_and_test(refcount_t *r)
> > {
> >         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> >                            "jz %l[cc_zero]\n\t"
> >                            "jns 1f\n\t"
> 
> I think you can use "jl" for the bad case.

Duh yes. I clearly didn't have my head on straight.

> I think it's better to handle that case out-of-line than play games
> with UD, though - this is going to be the rare case, the likelihood
> that we get the fixup wrong is just too big. Once it's out-of-line
> it's not as critical any more, even if it does add to the size of the
> code.

Fine with me; although the immediate complaint from Andrew was about
size, hence my UD1 hackery.

> So if we do this, I think it should be something like
> 
>    static inline __must_check bool refcount_dec_and_test(refcount_t *r)
>    {
>         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
>                 "jz %l[cc_zero]\n\t"
>                 "jl %l[cc_error]"
>                 : : [var] "m" (r->refs.counter)
>                 : "memory" : cc_zero, cc_error);
> 
>         return false;
> 
>    cc_zero:
>         return true;
>    cc_error:
>         refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
>         return false;
>    }
> 
> and we can discuss whether we could improve on the
> refcount_warn_saturate() separately.

I can do the refcount_warn_saturate() change separately.

Let me go check how small I can get it...

> But see above: maybe just make this a separate "careful atomic_t",
> with the option to panic-on-overflow. So then we could get rid of
> refcount_warn_saturate() entirely above, and instead just have a
> (compile-time option) BUG() case, with the non-careful version just
> being our existing atomic_dec_and_test.

We used to have that option; the argument was made that everybody cares
about security and as long as this doesn't show up on benchmarks we're
good.

Also, I don't think most people want the overflow to go BUG, WARN is
mostly the right thing and only the super paranoid use panic-on-warn or
something.
Kees Cook Aug. 20, 2021, 7:33 a.m. UTC | #7
On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 19, 2021 at 12:09:37PM -0700, Linus Torvalds wrote:
> > On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > If we can skip the OF... we can do something like this:
> > 
> > Honestly, I think a lot of the refcount code is questionable. It was
> > absolutely written with no care for performance AT ALL.
> 
> That's a bit unfair I feel. Will's last rewrite of the stuff was
> specifically to address performance issues.

Well, to address performance issues with the "full" version. The default
x86-specific code was already as fast as atomic_t. Will got it to nearly
match while making it catch all conditions, not just the exploitable
ones. (i.e. it didn't bother trying to catch underflow; there's no way
to mitigate it).

Will's version gave us three properties: correctness (it catches all the
pathological conditions), speed (it was very nearly the same speed as
regular atomic_t), and arch-agnosticism, which expanded this protection
to things beyond just x86 and arm64.

> > But see above: maybe just make this a separate "careful atomic_t",
> > with the option to panic-on-overflow. So then we could get rid of
> > refcount_warn_saturate() entirely above, and instead just have a
> > (compile-time option) BUG() case, with the non-careful version just
> > being our existing atomic_dec_and_test.

This is nearly what we had before. But refcount_t should always saturate
on overflow -- that's specifically the mitigation needed to defang the
traditional atomic_t overflow exploits (of which we had several a year
before refcount_t and now we've seen zero since).

> We used to have that option; the argument was made that everybody cares
> about security and as long as this doesn't show up on benchmarks we're
> good.
> 
> Also, I don't think most people want the overflow to go BUG, WARN is
> mostly the right thing and only the super paranoid use panic-on-warn or
> something.

Saturating on overflow stops exploitability. WARNing is informational.
BUG kills the system for no good reason: the saturation is the defense
against attack, and the WARN is the "oh, I found a bug" details needed
to fix it.

I prefer the arch-agnostic, fully checked, very fast version of this
(i.e. what we have right now). :P I appreciate it's larger, but in my
opinion size isn't as important as correctness and speed. If it's just
as fast as a small version but has greater coverage, that seems worth
the size.
Peter Zijlstra Aug. 20, 2021, 8:16 a.m. UTC | #8
On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:

> Fine with me; although the immediate complaint from Andrew was about
> size, hence my UD1 hackery.
> 
> > So if we do this, I think it should be something like
> > 
> >    static inline __must_check bool refcount_dec_and_test(refcount_t *r)
> >    {
> >         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> >                 "jz %l[cc_zero]\n\t"
> >                 "jl %l[cc_error]"
> >                 : : [var] "m" (r->refs.counter)
> >                 : "memory" : cc_zero, cc_error);
> > 
> >         return false;
> > 
> >    cc_zero:
> >         return true;
> >    cc_error:
> >         refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> >         return false;
> >    }
> > 
> > and we can discuss whether we could improve on the
> > refcount_warn_saturate() separately.
> 
> I can do the refcount_warn_saturate() change separately.
> 
> Let me go check how small I can get it...

gcc-10.2.1, x86_64-defconfig

kernel/event/core.o-inline-ud1:     96454
kernel/event/core.o-outofline-ud1:  96604
kernel/event/core.o-outofline-call: 97072

(42 refcount_warn_saturate/ud1 instances in that file,
 10 of which are refcount_dec_and_test)
Will Deacon Aug. 20, 2021, 8:24 a.m. UTC | #9
On Fri, Aug 20, 2021 at 10:16:18AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 20, 2021 at 08:43:40AM +0200, Peter Zijlstra wrote:
> 
> > Fine with me; although the immediate complaint from Andrew was about
> > size, hence my UD1 hackery.
> > 
> > > So if we do this, I think it should be something like
> > > 
> > >    static inline __must_check bool refcount_dec_and_test(refcount_t *r)
> > >    {
> > >         asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> > >                 "jz %l[cc_zero]\n\t"
> > >                 "jl %l[cc_error]"
> > >                 : : [var] "m" (r->refs.counter)
> > >                 : "memory" : cc_zero, cc_error);
> > > 
> > >         return false;
> > > 
> > >    cc_zero:
> > >         return true;
> > >    cc_error:
> > >         refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> > >         return false;
> > >    }
> > > 
> > > and we can discuss whether we could improve on the
> > > refcount_warn_saturate() separately.
> > 
> > I can do the refcount_warn_saturate() change separately.
> > 
> > Let me go check how small I can get it...
> 
> gcc-10.2.1, x86_64-defconfig
> 
> kernel/event/core.o-inline-ud1:     96454
> kernel/event/core.o-outofline-ud1:  96604
> kernel/event/core.o-outofline-call: 97072
> 
> (42 refcount_warn_saturate/ud1 instances in that file,
>  10 of which are refcount_dec_and_test)

Is that with the saturation moved to the UD handler as well? I think it
would be good to keep that as close to the point at which we detect the
problem as we can, so perhaps we can inline that part and leave the
diagnostics to the exception handler?

Will
Peter Zijlstra Aug. 20, 2021, 9:03 a.m. UTC | #10
On Fri, Aug 20, 2021 at 09:24:58AM +0100, Will Deacon wrote:

> > gcc-10.2.1, x86_64-defconfig
> > 
> > kernel/event/core.o-inline-ud1:     96454
> > kernel/event/core.o-outofline-ud1:  96604
> > kernel/event/core.o-outofline-call: 97072

    kernel/event/core.o-outofline-saturate-ud2: 96954
    kernel/event/core.o:                97248

> Is that with the saturation moved to the UD handler as well? 

Yep, that's the full function call replaced with an exception.

> I think it would be good to keep that as close to the point at which
> we detect the problem as we can, so perhaps we can inline that part
> and leave the diagnostics to the exception handler?

That's simpler exception code too, we can abuse the existing WARN/UD2
stuff.

---
 arch/x86/include/asm/refcount.h | 31 +++++++++++++++++++++++++++++++
 include/asm-generic/bug.h       |  4 ++++
 include/linux/refcount.h        | 15 +++++++++++----
 lib/bug.c                       | 13 ++++++++++++-
 lib/refcount.c                  |  7 ++-----
 5 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
new file mode 100644
index 000000000000..bed52b95d24c
--- /dev/null
+++ b/arch/x86/include/asm/refcount.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_REFCOUNT_H
+#define _ASM_X86_REFCOUNT_H
+
+#define refcount_warn_saturate refcount_warn_saturate
+static __always_inline void refcount_warn_saturate(refcount_t *r, const enum refcount_saturation_type t)
+{
+	refcount_set(r, REFCOUNT_SATURATED);
+	__WARN_FLAGS(BUGFLAG_ONCE|BUGFLAG_REFCOUNT|BUGFLAG_REFCOUNT_TYPE(t));
+}
+
+#define refcount_dec_and_test refcount_dec_and_test
+static inline bool refcount_dec_and_test(refcount_t *r)
+{
+	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
+			   "jz %l[cc_zero]\n\t"
+			   "jl %l[cc_error]"
+			   : : [var] "m" (r->refs.counter)
+			   : "memory" : cc_zero, cc_error);
+
+	return false;
+
+cc_zero:
+	return true;
+
+cc_error:
+	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
+	return false;
+}
+
+#endif /* _ASM_X86_REFCOUNT_H */
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index edb0e2a602a8..9937c70138b8 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -13,6 +13,10 @@
 #define BUGFLAG_ONCE		(1 << 1)
 #define BUGFLAG_DONE		(1 << 2)
 #define BUGFLAG_NO_CUT_HERE	(1 << 3)	/* CUT_HERE already sent */
+
+#define BUGFLAG_REFCOUNT	(1 << 4)
+#define BUGFLAG_REFCOUNT_TYPE(x)((x&3) << 5)
+
 #define BUGFLAG_TAINT(taint)	((taint) << 8)
 #define BUG_GET_TAINT(bug)	((bug)->flags >> 8)
 #endif
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index b8a6e387f8f9..7db2b024a75d 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -117,14 +117,13 @@ typedef struct refcount_struct {
 #define REFCOUNT_SATURATED	(INT_MIN / 2)
 
 enum refcount_saturation_type {
-	REFCOUNT_ADD_NOT_ZERO_OVF,
-	REFCOUNT_ADD_OVF,
+	REFCOUNT_ADD_OVF = 0,
 	REFCOUNT_ADD_UAF,
 	REFCOUNT_SUB_UAF,
 	REFCOUNT_DEC_LEAK,
 };
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
+extern void __refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t);
 
 /**
  * refcount_set - set a refcount's value
@@ -136,6 +135,12 @@ static inline void refcount_set(refcount_t *r, int n)
 	atomic_set(&r->refs, n);
 }
 
+#include <asm/refcount.h>
+
+#ifndef refcount_warn_saturate
+#define refcount_warn_saturate __refcount_warn_saturate
+#endif
+
 /**
  * refcount_read - get a refcount's value
  * @r: the refcount
@@ -160,7 +165,7 @@ static inline __must_check bool __refcount_add_not_zero(int i, refcount_t *r, in
 		*oldp = old;
 
 	if (unlikely(old < 0 || old + i < 0))
-		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+		refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
 
 	return old;
 }
@@ -328,10 +333,12 @@ static inline __must_check bool __refcount_dec_and_test(refcount_t *r, int *oldp
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
+#ifndef refcount_dec_and_test
 static inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
 	return __refcount_dec_and_test(r, NULL);
 }
+#endif
 
 static inline void __refcount_dec(refcount_t *r, int *oldp)
 {
diff --git a/lib/bug.c b/lib/bug.c
index 45a0584f6541..3878df956143 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -154,11 +154,18 @@ struct bug_entry *find_bug(unsigned long bugaddr)
 	return module_find_bug(bugaddr);
 }
 
+static const char *refstr[] = {
+	"refcount_t: saturated; leaking memory",
+	"refcount_t: addition on 0; use-after-free",
+	"refcount_t: underflow; use-after-free",
+	"refcount_t: decrement hit 0; leaking memory",
+};
+
 enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 {
+	unsigned line, warning, once, done, refcount;
 	struct bug_entry *bug;
 	const char *file;
-	unsigned line, warning, once, done;
 
 	if (!is_valid_bugaddr(bugaddr))
 		return BUG_TRAP_TYPE_NONE;
@@ -174,6 +181,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 	warning = (bug->flags & BUGFLAG_WARNING) != 0;
 	once = (bug->flags & BUGFLAG_ONCE) != 0;
 	done = (bug->flags & BUGFLAG_DONE) != 0;
+	refcount = (bug->flags & BUGFLAG_REFCOUNT) != 0;
 
 	if (warning && once) {
 		if (done)
@@ -195,6 +203,9 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 		printk(KERN_DEFAULT CUT_HERE);
 
 	if (warning) {
+		if (refcount)
+			pr_warn("%s\n", refstr[(bug->flags >> 5) & 3]);
+
 		/* this is a WARN_ON rather than BUG/BUG_ON */
 		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
 		       NULL);
diff --git a/lib/refcount.c b/lib/refcount.c
index a207a8f22b3c..a36da0611f25 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -10,14 +10,11 @@
 
 #define REFCOUNT_WARN(str)	WARN_ONCE(1, "refcount_t: " str ".\n")
 
-void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
+void __refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 {
 	refcount_set(r, REFCOUNT_SATURATED);
 
 	switch (t) {
-	case REFCOUNT_ADD_NOT_ZERO_OVF:
-		REFCOUNT_WARN("saturated; leaking memory");
-		break;
 	case REFCOUNT_ADD_OVF:
 		REFCOUNT_WARN("saturated; leaking memory");
 		break;
@@ -34,7 +31,7 @@ void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
 		REFCOUNT_WARN("unknown saturation event!?");
 	}
 }
-EXPORT_SYMBOL(refcount_warn_saturate);
+EXPORT_SYMBOL(__refcount_warn_saturate);
 
 /**
  * refcount_dec_if_one - decrement a refcount if it is 1
Kees Cook Aug. 20, 2021, 5:26 p.m. UTC | #11
On Fri, Aug 20, 2021 at 11:03:00AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 20, 2021 at 09:24:58AM +0100, Will Deacon wrote:
> 
> > > gcc-10.2.1, x86_64-defconfig
> > > 
> > > kernel/event/core.o-inline-ud1:     96454
> > > kernel/event/core.o-outofline-ud1:  96604
> > > kernel/event/core.o-outofline-call: 97072
> 
>     kernel/event/core.o-outofline-saturate-ud2: 96954
>     kernel/event/core.o:                97248
> 
> > Is that with the saturation moved to the UD handler as well? 
> 
> Yep, that's the full function call replaced with an exception.
> 
> > I think it would be good to keep that as close to the point at which
> > we detect the problem as we can, so perhaps we can inline that part
> > and leave the diagnostics to the exception handler?
> 
> That's simpler execption code too, we can abuse the existing WARN/UD2
> stuff.
> 
> ---
>  arch/x86/include/asm/refcount.h | 31 +++++++++++++++++++++++++++++++
>  include/asm-generic/bug.h       |  4 ++++
>  include/linux/refcount.h        | 15 +++++++++++----
>  lib/bug.c                       | 13 ++++++++++++-
>  lib/refcount.c                  |  7 ++-----
>  5 files changed, 60 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> new file mode 100644
> index 000000000000..bed52b95d24c
> --- /dev/null
> +++ b/arch/x86/include/asm/refcount.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_REFCOUNT_H
> +#define _ASM_X86_REFCOUNT_H
> +
> +#define refcount_warn_saturate refcount_warn_saturate
> +static __always_inline void refcount_warn_saturate(refcount_t *r, const enum refcount_saturation_type t)
> +{
> +	refcount_set(r, REFCOUNT_SATURATED);
> +	__WARN_FLAGS(BUGFLAG_ONCE|BUGFLAG_REFCOUNT|BUGFLAG_REFCOUNT_TYPE(t));
> +}

Instead of using up warn flags, what was done in
the past was to use an explicit EXTABLE in a cold text section:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/asm.h?h=v4.15#n80
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/mm/extable.c?h=v4.15#n45
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/refcount.h?h=v4.15

> +
> +#define refcount_dec_and_test refcount_dec_and_test
> +static inline bool refcount_dec_and_test(refcount_t *r)
> +{
> +	asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t"
> +			   "jz %l[cc_zero]\n\t"
> +			   "jl %l[cc_error]"
> +			   : : [var] "m" (r->refs.counter)
> +			   : "memory" : cc_zero, cc_error);
> +
> +	return false;
> +
> +cc_zero:
> +	return true;
> +
> +cc_error:
> +	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> +	return false;
> +}

This looks basically the same as what we had before (i.e. the earlier
REFCOUNT_CHECK_LE_ZERO within GEN_UNARY_SUFFIXED_RMWcc).

Patch

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c976cc6de257..38151efe1a59 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -12,6 +12,8 @@ 
 #include <linux/memcontrol.h>
 #include <linux/highmem.h>
 
+#include <linux/refcount.h>
+
 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
  * an anonymous page pointing to this anon_vma needs to be unmapped:
@@ -36,7 +38,7 @@  struct anon_vma {
 	 * the reference is responsible for clearing up the
 	 * anon_vma if they are the last user on release
 	 */
-	atomic_t refcount;
+	refcount_t refcount;
 
 	/*
 	 * Count of child anon_vmas and VMAs which points to this anon_vma.
@@ -100,14 +102,14 @@  enum ttu_flags {
 #ifdef CONFIG_MMU
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
-	atomic_inc(&anon_vma->refcount);
+	refcount_inc(&anon_vma->refcount);
 }
 
 void __put_anon_vma(struct anon_vma *anon_vma);
 
 static inline void put_anon_vma(struct anon_vma *anon_vma)
 {
-	if (atomic_dec_and_test(&anon_vma->refcount))
+	if (refcount_dec_and_test(&anon_vma->refcount))
 		__put_anon_vma(anon_vma);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index b9eb5c12f3fe..7badd786e095 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -88,7 +88,7 @@  static inline struct anon_vma *anon_vma_alloc(void)
 
 	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
 	if (anon_vma) {
-		atomic_set(&anon_vma->refcount, 1);
+		refcount_set(&anon_vma->refcount, 1);
 		anon_vma->degree = 1;	/* Reference for first vma */
 		anon_vma->parent = anon_vma;
 		/*
@@ -103,7 +103,7 @@  static inline struct anon_vma *anon_vma_alloc(void)
 
 static inline void anon_vma_free(struct anon_vma *anon_vma)
 {
-	VM_BUG_ON(atomic_read(&anon_vma->refcount));
+	VM_BUG_ON(refcount_read(&anon_vma->refcount));
 
 	/*
 	 * Synchronize against page_lock_anon_vma_read() such that
@@ -445,7 +445,7 @@  static void anon_vma_ctor(void *data)
 	struct anon_vma *anon_vma = data;
 
 	init_rwsem(&anon_vma->rwsem);
-	atomic_set(&anon_vma->refcount, 0);
+	refcount_set(&anon_vma->refcount, 0);
 	anon_vma->rb_root = RB_ROOT_CACHED;
 }
 
@@ -495,7 +495,7 @@  struct anon_vma *page_get_anon_vma(struct page *page)
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+	if (!refcount_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
 		goto out;
 	}
@@ -554,7 +554,7 @@  struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	}
 
 	/* trylock failed, we got to sleep */
-	if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+	if (!refcount_inc_not_zero(&anon_vma->refcount)) {
 		anon_vma = NULL;
 		goto out;
 	}
@@ -569,7 +569,7 @@  struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	rcu_read_unlock();
 	anon_vma_lock_read(anon_vma);
 
-	if (atomic_dec_and_test(&anon_vma->refcount)) {
+	if (refcount_dec_and_test(&anon_vma->refcount)) {
 		/*
 		 * Oops, we held the last refcount, release the lock
 		 * and bail -- can't simply use put_anon_vma() because
@@ -2221,7 +2221,7 @@  void __put_anon_vma(struct anon_vma *anon_vma)
 	struct anon_vma *root = anon_vma->root;
 
 	anon_vma_free(anon_vma);
-	if (root != anon_vma && atomic_dec_and_test(&root->refcount))
+	if (root != anon_vma && refcount_dec_and_test(&root->refcount))
 		anon_vma_free(root);
 }