[1/5] fs: Do not allow get_file() to resurrect 0 f_count

Message ID	20240502223341.1835070-1-keescook@chromium.org (mailing list archive)
State	New, archived
Headers	show Received: from mail-pf1-f171.google.com (mail-pf1-f171.google.com [209.85.210.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B79C1C286 for <linux-fsdevel@vger.kernel.org>; Thu, 2 May 2024 22:33:46 +0000 (UTC) From: Kees Cook <keescook@chromium.org> To: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org>, Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org, Zack Rusin <zack.rusin@broadcom.com>, Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>, Maarten Lankhorst <maarten.lankhorst@linux.intel.com>, Maxime Ripard <mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, Jani Nikula <jani.nikula@linux.intel.com>, Joonas Lahtinen <joonas.lahtinen@linux.intel.com>, Rodrigo Vivi <rodrigo.vivi@intel.com>, Tvrtko Ursulin <tursulin@ursulin.net>, Andi Shyti <andi.shyti@linux.intel.com>, Lucas De Marchi <lucas.demarchi@intel.com>, Matt Atwood <matthew.s.atwood@intel.com>, Matthew Auld <matthew.auld@intel.com>, Nirmoy Das <nirmoy.das@intel.com>, Jonathan Cavitt <jonathan.cavitt@intel.com>, Will Deacon <will@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Boqun Feng <boqun.feng@gmail.com>, Mark Rutland <mark.rutland@arm.com>, Kent Overstreet <kent.overstreet@linux.dev>, Masahiro Yamada <masahiroy@kernel.org>, Nathan Chancellor <nathan@kernel.org>, Nicolas Schier <nicolas@fjasle.eu>, Andrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, linux-kbuild@vger.kernel.org, linux-hardening@vger.kernel.org Subject: [PATCH 1/5] fs: Do not allow get_file() to resurrect 0 f_count Date: Thu, 2 May 2024 15:33:36 -0700 Message-Id: <20240502223341.1835070-1-keescook@chromium.org> In-Reply-To: <20240502222252.work.690-kees@kernel.org> References: <20240502222252.work.690-kees@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	fs: Do not allow get_file() to resurrect 0 f_count \| expand [0/5] fs: Do not allow get_file() to resurrect 0 f_count [1/5] fs: Do not allow get_file() to resurrect 0 f_count [2/5] drm/vmwgfx: Do not directly manipulate file->f_count [3/5] drm/i915: Do not directly manipulate file->f_count [4/5] refcount: Introduce refcount_long_t and APIs [5/5] fs: Convert struct file::f_count to refcount_long_t

Message ID

20240502223341.1835070-1-keescook@chromium.org (mailing list archive)

State

New, archived

Headers

From: Kees Cook <keescook@chromium.org>
To: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Zack Rusin <zack.rusin@broadcom.com>,
	Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Tvrtko Ursulin <tursulin@ursulin.net>,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Lucas De Marchi <lucas.demarchi@intel.com>,
	Matt Atwood <matthew.s.atwood@intel.com>,
	Matthew Auld <matthew.auld@intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>,
	Jonathan Cavitt <jonathan.cavitt@intel.com>,
	Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nicolas Schier <nicolas@fjasle.eu>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org,
	linux-kbuild@vger.kernel.org,
	linux-hardening@vger.kernel.org
Subject: [PATCH 1/5] fs: Do not allow get_file() to resurrect 0 f_count
Date: Thu,  2 May 2024 15:33:36 -0700
Message-Id: <20240502223341.1835070-1-keescook@chromium.org>
In-Reply-To: <20240502222252.work.690-kees@kernel.org>
References: <20240502222252.work.690-kees@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

fs: Do not allow get_file() to resurrect 0 f_count | expand

Commit Message

Kees Cook May 2, 2024, 10:33 p.m. UTC

If f_count reaches 0, calling get_file() should be a failure. Adjust to
use atomic_long_inc_not_zero() and return NULL on failure. In the future
get_file() can be annotated with __must_check, though that is not
currently possible.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
---
 include/linux/fs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jann Horn May 2, 2024, 10:53 p.m. UTC | #1

On Fri, May 3, 2024 at 12:34 AM Kees Cook <keescook@chromium.org> wrote:
> If f_count reaches 0, calling get_file() should be a failure. Adjust to
> use atomic_long_inc_not_zero() and return NULL on failure. In the future
> get_file() can be annotated with __must_check, though that is not
> currently possible.
[...]
>  static inline struct file *get_file(struct file *f)
>  {
> -       atomic_long_inc(&f->f_count);
> +       if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
> +               return NULL;
>         return f;
>  }

Oh, I really don't like this...

In most code, if you call get_file() on a file and see refcount zero,
that basically means you're in a UAF write situation, or that you
could be in such a situation if you had raced differently. It's
basically just like refcount_inc() in that regard.

And get_file() has semantics just like refcount_inc(): The caller
guarantees that it is already holding a reference to the file; and if
the caller is wrong about that, their subsequent attempt to clean up
the reference that they think they were already holding will likely
lead to UAF too. If get_file() sees a zero refcount, there is no safe
way to continue. And all existing callers of get_file() expect the
return value to be the same as the non-NULL pointer they passed in, so
they'll either ignore the result of this check and barrel on, or oops
with a NULL deref.

For callers that want to actually try incrementing file refcounts that
could be zero, which is only possible under specific circumstances, we
have helpers like get_file_rcu() and get_file_active().

Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
there instead?

Kees Cook May 2, 2024, 11:03 p.m. UTC | #2

On Fri, May 03, 2024 at 12:53:56AM +0200, Jann Horn wrote:
> On Fri, May 3, 2024 at 12:34 AM Kees Cook <keescook@chromium.org> wrote:
> > If f_count reaches 0, calling get_file() should be a failure. Adjust to
> > use atomic_long_inc_not_zero() and return NULL on failure. In the future
> > get_file() can be annotated with __must_check, though that is not
> > currently possible.
> [...]
> >  static inline struct file *get_file(struct file *f)
> >  {
> > -       atomic_long_inc(&f->f_count);
> > +       if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
> > +               return NULL;
> >         return f;
> >  }
> 
> Oh, I really don't like this...
> 
> In most code, if you call get_file() on a file and see refcount zero,
> that basically means you're in a UAF write situation, or that you
> could be in such a situation if you had raced differently. It's
> basically just like refcount_inc() in that regard.

Shouldn't the system attempt to not make things worse if it encounters
an inc-from-0 condition? Yes, we've already lost the race for a UaF
condition, but maybe don't continue on.

> And get_file() has semantics just like refcount_inc(): The caller
> guarantees that it is already holding a reference to the file; and if

Yes, but if that guarantee is violated, we should do something about it.

> the caller is wrong about that, their subsequent attempt to clean up
> the reference that they think they were already holding will likely
> lead to UAF too. If get_file() sees a zero refcount, there is no safe
> way to continue. And all existing callers of get_file() expect the
> return value to be the same as the non-NULL pointer they passed in, so
> they'll either ignore the result of this check and barrel on, or oops
> with a NULL deref.
> 
> For callers that want to actually try incrementing file refcounts that
> could be zero, which is only possible under specific circumstances, we
> have helpers like get_file_rcu() and get_file_active().

So what's going on in here:
https://lore.kernel.org/linux-hardening/20240502223341.1835070-2-keescook@chromium.org/

> Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
> there instead?

I'm open to suggestions, but given what's happening with struct dma_buf
above, it seems like this is a state worth checking for?

Christian Brauner May 3, 2024, 9:02 a.m. UTC | #3

On Thu, May 02, 2024 at 04:03:24PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:53:56AM +0200, Jann Horn wrote:
> > On Fri, May 3, 2024 at 12:34 AM Kees Cook <keescook@chromium.org> wrote:
> > > If f_count reaches 0, calling get_file() should be a failure. Adjust to
> > > use atomic_long_inc_not_zero() and return NULL on failure. In the future
> > > get_file() can be annotated with __must_check, though that is not
> > > currently possible.
> > [...]
> > >  static inline struct file *get_file(struct file *f)
> > >  {
> > > -       atomic_long_inc(&f->f_count);
> > > +       if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
> > > +               return NULL;
> > >         return f;
> > >  }
> > 
> > Oh, I really don't like this...
> > 
> > In most code, if you call get_file() on a file and see refcount zero,
> > that basically means you're in a UAF write situation, or that you
> > could be in such a situation if you had raced differently. It's
> > basically just like refcount_inc() in that regard.
> 
> Shouldn't the system attempt to not make things worse if it encounters
> an inc-from-0 condition? Yes, we've already lost the race for a UaF
> condition, but maybe don't continue on.
> 
> > And get_file() has semantics just like refcount_inc(): The caller
> > guarantees that it is already holding a reference to the file; and if
> 
> Yes, but if that guarantee is violated, we should do something about it.
> 
> > the caller is wrong about that, their subsequent attempt to clean up
> > the reference that they think they were already holding will likely
> > lead to UAF too. If get_file() sees a zero refcount, there is no safe
> > way to continue. And all existing callers of get_file() expect the
> > return value to be the same as the non-NULL pointer they passed in, so
> > they'll either ignore the result of this check and barrel on, or oops
> > with a NULL deref.
> > 
> > For callers that want to actually try incrementing file refcounts that
> > could be zero, which is only possible under specific circumstances, we
> > have helpers like get_file_rcu() and get_file_active().
> 
> So what's going on in here:
> https://lore.kernel.org/linux-hardening/20240502223341.1835070-2-keescook@chromium.org/

Afaict, there's dma_buf_export() that allocates a new file and sets:

file->private_data = dmabuf;
dmabuf->file = file;

The file has f_op->release::dma_buf_file_release() as it's f_op->release
method. When that's called the file's refcount is already zero but the
file has not been freed yet. This will remove the dmabuf from some
public list but it won't free it.

Then we see that any dentry allocated for such a dmabuf file will have
dma_buf_dentry_ops which in turn has
dentry->d_release::dma_buf_release() which is where the actual release
of the dma buffer happens taken from dentry->d_fsdata.

That whole thing calls allocate_file_pseudo() which allocates a new
dentry specific to that struct file. That dentry is unhashed (no lookup)
and thus isn't retained so when dput() is called and it's the last
reference it's immediately followed by
dentry->d_release::dma_buf_release() which wipes the dmabuf itself.

The lifetime of the dmabuf is managed via fget()/fput(). So the lifetime
of the dmabuf and the lifetime of the file are almost identical afaict:

__fput()
-> f_op->release::dma_buf_file_release() // handles file specific freeing
-> dput()
   -> d_op->d_release::dma_buf_release() // handles dmabuf freeing
                                         // including the driver specific stuff.

If you fput() the file then the dmabuf will be freed as well immediately
after it when the dput() happens in __fput() (I struggle to come up with
an explanation why the freeing of the dmabuf is moved to
dentry->d_release instead of f_op->release itself but that's a separate
matter.).

So on the face of it without looking a little closer

static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
{
        return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L;
}

looks wrong or broken. Because if dmabuf->file->f_count is 0 it implies
that @dmabuf should have already been freed. So the bug would be in
accessing @dmabuf. And if @dmabuf is valid then it automatically means
that dmabuf->file->f_count isn't 0. So it looks like it could just use
get_file().

_But_ the interesting bits are in ttm_object_device_init(). This steals
the dma_buf_ops into tdev->ops. It then takes dma_buf_ops->release and
stores it away into tdev->dma_buf_release. Then it overrides the
dma_buf_ops->release with ttm_prime_dmabuf_release(). And that's where
the very questionable magic happens.

So now let's say the dmabuf is freed because of lat fput(). We now get
f_op->release::dma_buf_file_release(). Then it's followed by dput() and
ultimately dentry->d_release::dma_buf_release() as mentioned above.

But now when we get:

dentry->d_release::dma_buf_release()
-> dmabuf->ops->release::ttm_prime_dmabuf_release()

instead of the original dmabuf->ops->release method that was stolen into
tdev->dmabuf_release. And ttm_prime_dmabuf_release() now calls
tdev->dma_buf_release() which just frees the data associated with the
dmabuf not the dmabuf itself.

ttm_prime_dmabuf_release() then takes prime->mutex_lock replacing
prime->dma_buf with NULL.

The same lock is taken in ttm_prime_handle_to_fd() which is picking that
dmabuf from prime->dmabuf. So the interesting case is when
ttm_prime_dma_buf_release() has called tdev->dmabuf_release() and but
someone else maanged to grab prime->mutex_lock before
ttm_prime_dma_buf_release() could grab it to NULL prime->dma_buf.

So at that point @dmabuf hasn't been freed yet and is still valid. So
dereferencing prime->dma_buf is still valid and by extension
dma_buf->file as their lifetimes are tied.

IOW, that should just use get_file_active() which handles that just
fine.

And while that get_dma_buf_unless_doomed() thing is safe that whole code
reeks of a level of complexity that's asking for trouble.

But that has zero to do with get_file() and it is absolutely not a
reason to mess with it's semantics impacting every caller in the tree.

> 
> > Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
> > there instead?
> 
> I'm open to suggestions, but given what's happening with struct dma_buf
> above, it seems like this is a state worth checking for?

No, it's really not. If you use get_file() you better know that you're
already holding a valid reference that's no reason to make it suddenly
fail.

Hillf Danton May 6, 2024, 10:41 a.m. UTC | #4

On Fri, 3 May 2024 11:02:57 +0200 Christian Brauner <brauner@kernel.org>
> On Thu, May 02, 2024 at 04:03:24PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 12:53:56AM +0200, Jann Horn wrote:
> > > On Fri, May 3, 2024 at 12:34 AM Kees Cook <keescook@chromium.org> wrote:
> > > > If f_count reaches 0, calling get_file() should be a failure. Adjust to
> > > > use atomic_long_inc_not_zero() and return NULL on failure. In the future
> > > > get_file() can be annotated with __must_check, though that is not
> > > > currently possible.
> > > [...]
> > > >  static inline struct file *get_file(struct file *f)
> > > >  {
> > > > -       atomic_long_inc(&f->f_count);
> > > > +       if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
> > > > +               return NULL;
> > > >         return f;
> > > >  }
> > > 
> > > Oh, I really don't like this...
> > > 
> > > In most code, if you call get_file() on a file and see refcount zero,
> > > that basically means you're in a UAF write situation, or that you
> > > could be in such a situation if you had raced differently. It's
> > > basically just like refcount_inc() in that regard.
> > 
> > Shouldn't the system attempt to not make things worse if it encounters
> > an inc-from-0 condition? Yes, we've already lost the race for a UaF
> > condition, but maybe don't continue on.
> > 
> > > And get_file() has semantics just like refcount_inc(): The caller
> > > guarantees that it is already holding a reference to the file; and if
> > 
> > Yes, but if that guarantee is violated, we should do something about it.
> > 
> > > the caller is wrong about that, their subsequent attempt to clean up
> > > the reference that they think they were already holding will likely
> > > lead to UAF too. If get_file() sees a zero refcount, there is no safe
> > > way to continue. And all existing callers of get_file() expect the
> > > return value to be the same as the non-NULL pointer they passed in, so
> > > they'll either ignore the result of this check and barrel on, or oops
> > > with a NULL deref.
> > > 
> > > For callers that want to actually try incrementing file refcounts that
> > > could be zero, which is only possible under specific circumstances, we
> > > have helpers like get_file_rcu() and get_file_active().
> > 
> > So what's going on in here:
> > https://lore.kernel.org/linux-hardening/20240502223341.1835070-2-keescook@chromium.org/
> 
> Afaict, there's dma_buf_export() that allocates a new file and sets:
> 
> file->private_data = dmabuf;
> dmabuf->file = file;
> 
> The file has f_op->release::dma_buf_file_release() as it's f_op->release
> method. When that's called the file's refcount is already zero but the
> file has not been freed yet. This will remove the dmabuf from some
> public list but it won't free it.
> 
> Then we see that any dentry allocated for such a dmabuf file will have
> dma_buf_dentry_ops which in turn has
> dentry->d_release::dma_buf_release() which is where the actual release
> of the dma buffer happens taken from dentry->d_fsdata.
> 
> That whole thing calls allocate_file_pseudo() which allocates a new
> dentry specific to that struct file. That dentry is unhashed (no lookup)
> and thus isn't retained so when dput() is called and it's the last
> reference it's immediately followed by
> dentry->d_release::dma_buf_release() which wipes the dmabuf itself.
> 
> The lifetime of the dmabuf is managed via fget()/fput(). So the lifetime
> of the dmabuf and the lifetime of the file are almost identical afaict:
> 
> __fput()
> -> f_op->release::dma_buf_file_release() // handles file specific freeing
> -> dput()
>    -> d_op->d_release::dma_buf_release() // handles dmabuf freeing
>                                          // including the driver specific stuff.
> 
> If you fput() the file then the dmabuf will be freed as well immediately
> after it when the dput() happens in __fput() (I struggle to come up with
> an explanation why the freeing of the dmabuf is moved to
> dentry->d_release instead of f_op->release itself but that's a separate
> matter.).
> 
> So on the face of it without looking a little closer
> 
> static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
> {
>         return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L;
> }
> 
> looks wrong or broken. Because if dmabuf->file->f_count is 0 it implies
> that @dmabuf should have already been freed. So the bug would be in
> accessing @dmabuf. And if @dmabuf is valid then it automatically means
> that dmabuf->file->f_count isn't 0. So it looks like it could just use
> get_file().
> 
> _But_ the interesting bits are in ttm_object_device_init(). This steals
> the dma_buf_ops into tdev->ops. It then takes dma_buf_ops->release and
> stores it away into tdev->dma_buf_release. Then it overrides the
> dma_buf_ops->release with ttm_prime_dmabuf_release(). And that's where
> the very questionable magic happens.
> 
> So now let's say the dmabuf is freed because of lat fput(). We now get
> f_op->release::dma_buf_file_release(). Then it's followed by dput() and
> ultimately dentry->d_release::dma_buf_release() as mentioned above.
> 
> But now when we get:
> 
> dentry->d_release::dma_buf_release()
> -> dmabuf->ops->release::ttm_prime_dmabuf_release()
> 
> instead of the original dmabuf->ops->release method that was stolen into
> tdev->dmabuf_release. And ttm_prime_dmabuf_release() now calls
> tdev->dma_buf_release() which just frees the data associated with the
> dmabuf not the dmabuf itself.
> 
> ttm_prime_dmabuf_release() then takes prime->mutex_lock replacing
> prime->dma_buf with NULL.
> 
> The same lock is taken in ttm_prime_handle_to_fd() which is picking that
> dmabuf from prime->dmabuf. So the interesting case is when
> ttm_prime_dma_buf_release() has called tdev->dmabuf_release() and but
> someone else maanged to grab prime->mutex_lock before
> ttm_prime_dma_buf_release() could grab it to NULL prime->dma_buf.
> 
> So at that point @dmabuf hasn't been freed yet and is still valid. So
> dereferencing prime->dma_buf is still valid and by extension
> dma_buf->file as their lifetimes are tied.
> 
> IOW, that should just use get_file_active() which handles that just
> fine.
> 
> And while that get_dma_buf_unless_doomed() thing is safe that whole code
> reeks of a level of complexity that's asking for trouble.
> 
> But that has zero to do with get_file() and it is absolutely not a
> reason to mess with it's semantics impacting every caller in the tree.
> 
> > 
> > > Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
> > > there instead?
> > 
> > I'm open to suggestions, but given what's happening with struct dma_buf
> > above, it seems like this is a state worth checking for?
> 
> No, it's really not. If you use get_file() you better know that you're
> already holding a valid reference that's no reason to make it suddenly
> fail.
> 
But the simple question is, why are you asking dma_buf to poll a 0 f_count
file? All fine if you are not.

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 00fc429b0af0..210bbbfe9b83 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1038,7 +1038,8 @@  struct file_handle {
 
 static inline struct file *get_file(struct file *f)
 {
-	atomic_long_inc(&f->f_count);
+	if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
+		return NULL;
 	return f;
 }

[1/5] fs: Do not allow get_file() to resurrect 0 f_count

Commit Message

Comments

Patch