diff mbox series

[5/5] fs: Convert struct file::f_count to refcount_long_t

Message ID 20240502223341.1835070-5-keescook@chromium.org (mailing list archive)
State New, archived
Headers show
Series fs: Do not allow get_file() to resurrect 0 f_count | expand

Commit Message

Kees Cook May 2, 2024, 10:33 p.m. UTC
Underflow of f_count needs to be more carefully detected than it
currently is. The results of get_file() should be checked, but the
first step is detection. Redefine f_count from atomic_long_t to
refcount_long_t.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/file.c          | 4 ++--
 fs/file_table.c    | 6 +++---
 include/linux/fs.h | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

Comments

Al Viro May 2, 2024, 10:42 p.m. UTC | #1
On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> Underflow of f_count needs to be more carefully detected than it
> currently is. The results of get_file() should be checked, but the
> first step is detection. Redefine f_count from atomic_long_t to
> refcount_long_t.

	It is used on fairly hot paths.  What's more, it's not
at all obvious what the hell would right semantics be.

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>
Kees Cook May 2, 2024, 10:52 p.m. UTC | #2
On Thu, May 02, 2024 at 11:42:50PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> > Underflow of f_count needs to be more carefully detected than it
> > currently is. The results of get_file() should be checked, but the
> > first step is detection. Redefine f_count from atomic_long_t to
> > refcount_long_t.
> 
> 	It is used on fairly hot paths.  What's more, it's not
> at all obvious what the hell would right semantics be.

I think we've put performance concerns between refcount_t and atomic_t
to rest long ago. If there is a real workload where it's a problem,
let's find it! :)

As for semantics, what do you mean? Detecting dec-below-zero means we
catch underflow, and detected inc-from-zero means we catch resurrection
attempts. In both cases we avoid double-free, but we have already lost
to a potential dangling reference to a freed struct file. But just
letting f_count go bad seems dangerous.
Al Viro May 2, 2024, 11:12 p.m. UTC | #3
On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:

> As for semantics, what do you mean? Detecting dec-below-zero means we
> catch underflow, and detected inc-from-zero means we catch resurrection
> attempts. In both cases we avoid double-free, but we have already lost
> to a potential dangling reference to a freed struct file. But just
> letting f_count go bad seems dangerous.

Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
in the middle of getting closed.  And it's more subtle than that, actually,
thanks to SLAB_TYPESAFE_BY_RCU for struct file.
Kees Cook May 2, 2024, 11:21 p.m. UTC | #4
On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> 
> > As for semantics, what do you mean? Detecting dec-below-zero means we
> > catch underflow, and detected inc-from-zero means we catch resurrection
> > attempts. In both cases we avoid double-free, but we have already lost
> > to a potential dangling reference to a freed struct file. But just
> > letting f_count go bad seems dangerous.
> 
> Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> in the middle of getting closed.  And it's more subtle than that, actually,
> thanks to SLAB_TYPESAFE_BY_RCU for struct file.

But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
impossible for a simple get_file() to ever see a 0 f_count under normal
conditions?
Al Viro May 2, 2024, 11:41 p.m. UTC | #5
On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> > 
> > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > attempts. In both cases we avoid double-free, but we have already lost
> > > to a potential dangling reference to a freed struct file. But just
> > > letting f_count go bad seems dangerous.
> > 
> > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > in the middle of getting closed.  And it's more subtle than that, actually,
> > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
> 
> But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> impossible for a simple get_file() to ever see a 0 f_count under normal
> conditions?

For get_file() it is impossible.  The comment about semantics had been
about the sane ways to recover if such crap gets detected.

__get_file_rcu() is a separate story - consider the comment in there: 
	* atomic_long_inc_not_zero() above provided a full memory
	* barrier when we acquired a reference.
         *
         * This is paired with the write barrier from assigning to the
         * __rcu protected file pointer so that if that pointer still
         * matches the current file, we know we have successfully
         * acquired a reference to the right file.

and IIRC, refcount_t is weaker wrt barriers.
Kees Cook May 3, 2024, 12:10 a.m. UTC | #6
On Fri, May 03, 2024 at 12:41:52AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> > > 
> > > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > > attempts. In both cases we avoid double-free, but we have already lost
> > > > to a potential dangling reference to a freed struct file. But just
> > > > letting f_count go bad seems dangerous.
> > > 
> > > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > > in the middle of getting closed.  And it's more subtle than that, actually,
> > > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
> > 
> > But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> > impossible for a simple get_file() to ever see a 0 f_count under normal
> > conditions?
> 
> For get_file() it is impossible.  The comment about semantics had been
> about the sane ways to recover if such crap gets detected.
> 
> __get_file_rcu() is a separate story - consider the comment in there: 
> 	* atomic_long_inc_not_zero() above provided a full memory
> 	* barrier when we acquired a reference.
>          *
>          * This is paired with the write barrier from assigning to the
>          * __rcu protected file pointer so that if that pointer still
>          * matches the current file, we know we have successfully
>          * acquired a reference to the right file.
> 
> and IIRC, refcount_t is weaker wrt barriers.

I think that was also fixed for refcount_t. I'll need to go dig out the
commit...

But anyway, there needs to be a general "oops I hit 0"-aware form of
get_file(), and it seems like it should just be get_file() itself...
Al Viro May 3, 2024, 12:14 a.m. UTC | #7
On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:

> But anyway, there needs to be a general "oops I hit 0"-aware form of
> get_file(), and it seems like it should just be get_file() itself...

... which brings back the question of what's the sane damage mitigation
for that.  Adding arseloads of never-exercised failure exits is generally
a bad idea - it's asking for bitrot and making the thing harder to review
in future.
Kees Cook May 3, 2024, 12:41 a.m. UTC | #8
On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> 
> > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > get_file(), and it seems like it should just be get_file() itself...
> 
> ... which brings back the question of what's the sane damage mitigation
> for that.  Adding arseloads of never-exercised failure exits is generally
> a bad idea - it's asking for bitrot and making the thing harder to review
> in future.

Linus seems to prefer best-effort error recovery to sprinkling BUG()s
around.  But if that's really the solution, then how about get_file()
switching to to use inc_not_zero and BUG on 0?
Christian Brauner May 3, 2024, 9:37 a.m. UTC | #9
On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > 
> > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > get_file(), and it seems like it should just be get_file() itself...
> > 
> > ... which brings back the question of what's the sane damage mitigation
> > for that.  Adding arseloads of never-exercised failure exits is generally
> > a bad idea - it's asking for bitrot and making the thing harder to review
> > in future.
> 
> Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> around.  But if that's really the solution, then how about get_file()
> switching to to use inc_not_zero and BUG on 0?

Making get_file() return an error is not an option. For all current
callers that's pointless churn for a condition that's not supposed to
happen at all.

Additionally, iirc *_inc_not_zero() variants are implemented with
try_cmpxchg() which scales poorly under contention for a condition
that's not supposed to happen.
Peter Zijlstra May 3, 2024, 10:36 a.m. UTC | #10
On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > > 
> > > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > > get_file(), and it seems like it should just be get_file() itself...
> > > 
> > > ... which brings back the question of what's the sane damage mitigation
> > > for that.  Adding arseloads of never-exercised failure exits is generally
> > > a bad idea - it's asking for bitrot and making the thing harder to review
> > > in future.
> > 
> > Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> > around.  But if that's really the solution, then how about get_file()
> > switching to to use inc_not_zero and BUG on 0?
> 
> Making get_file() return an error is not an option. For all current
> callers that's pointless churn for a condition that's not supposed to
> happen at all.
> 
> Additionally, iirc *_inc_not_zero() variants are implemented with
> try_cmpxchg() which scales poorly under contention for a condition
> that's not supposed to happen.

	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
	WARN_ON(!old);

Or somesuch might be an option?
Christian Brauner May 3, 2024, 11:35 a.m. UTC | #11
On Fri, May 03, 2024 at 12:36:14PM +0200, Peter Zijlstra wrote:
> On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> > On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> > > On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > > > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > > > 
> > > > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > > > get_file(), and it seems like it should just be get_file() itself...
> > > > 
> > > > ... which brings back the question of what's the sane damage mitigation
> > > > for that.  Adding arseloads of never-exercised failure exits is generally
> > > > a bad idea - it's asking for bitrot and making the thing harder to review
> > > > in future.
> > > 
> > > Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> > > around.  But if that's really the solution, then how about get_file()
> > > switching to to use inc_not_zero and BUG on 0?
> > 
> > Making get_file() return an error is not an option. For all current
> > callers that's pointless churn for a condition that's not supposed to
> > happen at all.
> > 
> > Additionally, iirc *_inc_not_zero() variants are implemented with
> > try_cmpxchg() which scales poorly under contention for a condition
> > that's not supposed to happen.
> 
> 	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
> 	WARN_ON(!old);
> 
> Or somesuch might be an option?

Yeah, I'd be fine with that. WARN_ON() (or WARN_ON_ONCE() even?) and
then people can do their panic_on_warn stuff to get the BUG_ON()
behavior if they want to.
diff mbox series

Patch

diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..570424dd634b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -865,7 +865,7 @@  static struct file *__get_file_rcu(struct file __rcu **f)
 	if (!file)
 		return NULL;
 
-	if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+	if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
 		return ERR_PTR(-EAGAIN);
 
 	file_reloaded = rcu_dereference_raw(*f);
@@ -987,7 +987,7 @@  static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 * barrier. We only really need an 'acquire' one to
 		 * protect the loads below, but we don't have that.
 		 */
-		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+		if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
 			continue;
 
 		/*
diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..f29e7b94bca1 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -167,7 +167,7 @@  static int init_file(struct file *f, int flags, const struct cred *cred)
 	 * fget-rcu pattern users need to be able to handle spurious
 	 * refcount bumps we should reinitialize the reused file first.
 	 */
-	atomic_long_set(&f->f_count, 1);
+	refcount_long_set(&f->f_count, 1);
 	return 0;
 }
 
@@ -470,7 +470,7 @@  static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
 void fput(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
+	if (refcount_long_dec_and_test(&file->f_count)) {
 		struct task_struct *task = current;
 
 		if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
@@ -503,7 +503,7 @@  void fput(struct file *file)
  */
 void __fput_sync(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count))
+	if (refcount_long_dec_and_test(&file->f_count))
 		__fput(file);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 210bbbfe9b83..b8f6cce7c39d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1001,7 +1001,7 @@  struct file {
 	 */
 	spinlock_t		f_lock;
 	fmode_t			f_mode;
-	atomic_long_t		f_count;
+	refcount_long_t		f_count;
 	struct mutex		f_pos_lock;
 	loff_t			f_pos;
 	unsigned int		f_flags;
@@ -1038,7 +1038,7 @@  struct file_handle {
 
 static inline struct file *get_file(struct file *f)
 {
-	if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
+	if (unlikely(!refcount_long_inc_not_zero(&f->f_count)))
 		return NULL;
 	return f;
 }
@@ -1046,7 +1046,7 @@  static inline struct file *get_file(struct file *f)
 struct file *get_file_rcu(struct file __rcu **f);
 struct file *get_file_active(struct file **f);
 
-#define file_count(x)	atomic_long_read(&(x)->f_count)
+#define file_count(x)	refcount_long_read(&(x)->f_count)
 
 #define	MAX_NON_LFS	((1UL<<31) - 1)