[v2] lib/refcount: Document interaction with PID_MAX_LIMIT

Message ID	20200303105427.260620-1-jannh@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=LwRM=4U=lists.openwall.com=kernel-hardening-return-18049-patchwork-kernel-hardening=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F0E8D2080C Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk Date: Tue, 3 Mar 2020 11:54:27 +0100 Message-Id: <20200303105427.260620-1-jannh@google.com> Mime-Version: 1.0 Subject: [PATCH v2] lib/refcount: Document interaction with PID_MAX_LIMIT From: Jann Horn <jannh@google.com> To: Will Deacon <will@kernel.org>, Kees Cook <keescook@chromium.org>, Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org> Cc: kernel list <linux-kernel@vger.kernel.org>, Elena Reshetova <elena.reshetova@intel.com>, Ard Biesheuvel <ard.biesheuvel@linaro.org>, Hanjun Guo <guohanjun@huawei.com>, Jan Glauber <jglauber@marvell.com>, Kernel Hardening <kernel-hardening@lists.openwall.com> Content-Type: text/plain; charset="UTF-8"
Series	[v2] lib/refcount: Document interaction with PID_MAX_LIMIT \| expand [v2] lib/refcount: Document interaction with PID_MAX_LIMIT

Jann Horn March 3, 2020, 10:54 a.m. UTC

Document the circumstances under which refcount_t's saturation mechanism
works deterministically.

Signed-off-by: Jann Horn <jannh@google.com>
---

Notes:
    v2:
     - write down the math (Kees)

 include/linux/refcount.h | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)


base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b

Ard Biesheuvel March 3, 2020, 1:07 p.m. UTC | #1

On Tue, 3 Mar 2020 at 11:54, Jann Horn <jannh@google.com> wrote:
>
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
>
> Signed-off-by: Jann Horn <jannh@google.com>

I /think/ the main point of Kees's suggestion was that FUTEX_TID_MASK
is UAPI, so unlikely to change.


> ---
>
> Notes:
>     v2:
>      - write down the math (Kees)
>
>  include/linux/refcount.h | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
>   * atomic operations, then the count will continue to edge closer to 0. If it
>   * reaches a value of 1 before /any/ of the threads reset it to the saturated
>   * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + *     (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + *     0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
>   *
>   * Memory ordering
>   * ===============
>
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
> --
> 2.25.0.265.gbab2e86ba0-goog
>

Jann Horn March 3, 2020, 1:57 p.m. UTC | #2

On Tue, Mar 3, 2020 at 2:07 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On Tue, 3 Mar 2020 at 11:54, Jann Horn <jannh@google.com> wrote:
> >
> > Document the circumstances under which refcount_t's saturation mechanism
> > works deterministically.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> I /think/ the main point of Kees's suggestion was that FUTEX_TID_MASK
> is UAPI, so unlikely to change.

Yeah, but it has already changed three times in git history:

76b81e2b0e224 ("[PATCH] lightweight robust futexes updates 2"):
0x1fffffff -> 0x3fffffff
d0aa7a70bf03b ("futex_requeue_pi optimization"): 0x3fffffff -> 0x0fffffff
bd197234b0a6 ("Revert "futex_requeue_pi optimization""): 0x0fffffff ->
0x3fffffff

I just sent a patch to fix up a comment that still claimed the mask
was 0x1fffffff... so I didn't want to explicitly write the new value
here.

While making the value *bigger* would probably be a bit hard (and
unnecessary), making it smaller would be fairly easy here - the field
is populated by userspace, so even though the mask is 0x3fffffff,
userspace will never set the upper bits, so they're effectively
reserved bits with value 0.

Kees Cook March 3, 2020, 9:09 p.m. UTC | #3

On Tue, Mar 03, 2020 at 11:54:27AM +0100, Jann Horn wrote:
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
> 
> Signed-off-by: Jann Horn <jannh@google.com>

Acked-by: Kees Cook <keescook@chromium.org>

Thanks!

-Kees

> 
> Notes:
>     v2:
>      - write down the math (Kees)
> 
>  include/linux/refcount.h | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
>   * atomic operations, then the count will continue to edge closer to 0. If it
>   * reaches a value of 1 before /any/ of the threads reset it to the saturated
>   * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + *     (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + *     0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
>   *
>   * Memory ordering
>   * ===============
> 
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
> -- 
> 2.25.0.265.gbab2e86ba0-goog
>

Will Deacon March 17, 2020, 10:27 p.m. UTC | #4

On Tue, Mar 03, 2020 at 11:54:27AM +0100, Jann Horn wrote:
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
> 
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> 
> Notes:
>     v2:
>      - write down the math (Kees)
> 
>  include/linux/refcount.h | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
>   * atomic operations, then the count will continue to edge closer to 0. If it
>   * reaches a value of 1 before /any/ of the threads reset it to the saturated
>   * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + *     (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + *     0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
>   *
>   * Memory ordering
>   * ===============
> 
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b

Acked-by: Will Deacon <will@kernel.org>

Peter -- would you be able to take this through -tip, please?

Will

Peter Zijlstra March 24, 2020, 3:29 p.m. UTC | #5

On Tue, Mar 17, 2020 at 10:27:18PM +0000, Will Deacon wrote:

> Acked-by: Will Deacon <will@kernel.org>
> 
> Peter -- would you be able to take this through -tip, please?

Got it, I'll stick it in locking/core.

Thanks!

[v2] lib/refcount: Document interaction with PID_MAX_LIMIT

Commit Message

Comments

Patch