diff mbox series

[MEH] fs: sort out a stale comment about races between fd alloc and dup2

Message ID 20241205154743.1586584-1-mjguzik@gmail.com (mailing list archive)
State New
Headers show
Series [MEH] fs: sort out a stale comment about races between fd alloc and dup2 | expand

Commit Message

Mateusz Guzik Dec. 5, 2024, 3:47 p.m. UTC
It claims the issue is only relevant for shared descriptor tables which
is of no concern for POSIX (but then is POSIX of concern to anyone
today?), which I presume predates standarized threading.

The comment also mentions the following systems:
- OpenBSD installing a larval file -- this remains accurate
- FreeBSD returning EBADF -- not accurate, the system uses the same
  idea as OpenBSD
- NetBSD "deadlocks in amusing ways" -- their solution looks
  Solaris-inspired (not a compliment) and I would not be particularly
  surprised if it indeed deadlocked, in amusing ways or otherwise

I don't believe mentioning any of these adds anything and the statement
about the issue not being POSIX-relevant is outdated.

dup2 description in POSIX still does not mention the problem.

Just shorten the comment and be done with it.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

I'm pretty sure the comment adds nothing in the current form, but I'm
not going to argue about it.

 fs/file.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

Comments

Christian Brauner Dec. 6, 2024, 12:13 p.m. UTC | #1
On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote:
> It claims the issue is only relevant for shared descriptor tables which
> is of no concern for POSIX (but then is POSIX of concern to anyone
> today?), which I presume predates standarized threading.
> 
> The comment also mentions the following systems:
> - OpenBSD installing a larval file -- this remains accurate
> - FreeBSD returning EBADF -- not accurate, the system uses the same
>   idea as OpenBSD
> - NetBSD "deadlocks in amusing ways" -- their solution looks
>   Solaris-inspired (not a compliment) and I would not be particularly
>   surprised if it indeed deadlocked, in amusing ways or otherwise
> 
> [...]

Applied to the vfs-6.14.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.14.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.14.misc

[1/1] fs: sort out a stale comment about races between fd alloc and dup2
      https://git.kernel.org/vfs/vfs/c/9c9e176d7546
Al Viro Dec. 9, 2024, 7:56 p.m. UTC | #2
On Fri, Dec 06, 2024 at 01:13:47PM +0100, Christian Brauner wrote:
> On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote:
> > It claims the issue is only relevant for shared descriptor tables which
> > is of no concern for POSIX (but then is POSIX of concern to anyone
> > today?), which I presume predates standarized threading.
> > 
> > The comment also mentions the following systems:
> > - OpenBSD installing a larval file -- this remains accurate
> > - FreeBSD returning EBADF -- not accurate, the system uses the same
> >   idea as OpenBSD
> > - NetBSD "deadlocks in amusing ways" -- their solution looks
> >   Solaris-inspired (not a compliment) and I would not be particularly
> >   surprised if it indeed deadlocked, in amusing ways or otherwise

FWIW, the note about OpenBSD approach is potentially still interesting,
but probably not in that place...

These days "not an embryo" indicator would probably map to FMODE_OPENED,
so fget side would be fairly easy (especially if we invert that bit -
then the same logics we have for "don't accept FMODE_PATH" would apply
verbatim).

IIRC, the last time I looked into it the main obstacle to untangle had
boiled down to "how do we guarantee that do_dentry_open() failing with
-ESTALE will leave struct file in pristine state" and _that_ got boggled
down in S&M shite - ->file_open() is not idempotent and has no reverse
operation - ->file_free_security() takes care of everything.

If that gets solved, we could lift alloc_empty_file() out of path_openat()
into do_filp_open()/do_file_open_root() - it would require a non-trivial
amount of massage, but a couple of years ago that appeared to have been
plausible; would need to recheck that...  It would reduce the amount of
free/reallocate cycles for struct file in those, which might make it
worthwhile on its own.

Lifting it further would require some massage in the callers, but that
would be on per-caller basis; used to look plausible...

Hell knows - right now I'm ears-deep in ->d_revalidate() crap, but once
that settles down a bit...  Might be worth looking into that again.

LSM ->file_open() behaviour changes would need to be discussed with LSM
crowd, obviously.  Ideally we want it idempotent, so that calling it
twice in a row would have the second call work in the same way as if
the first one hadn't happened.  In-tree instances seem to be trivial
to massage to that (in the worst case you'd need to clear some fields
if the first call hadn't taken a fast path out, but the second one
had), but that really needs a buy-in from maintainers.
Mateusz Guzik Dec. 10, 2024, 4:48 a.m. UTC | #3
On Mon, Dec 9, 2024 at 8:56 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Dec 06, 2024 at 01:13:47PM +0100, Christian Brauner wrote:
> > On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote:
> > > It claims the issue is only relevant for shared descriptor tables which
> > > is of no concern for POSIX (but then is POSIX of concern to anyone
> > > today?), which I presume predates standarized threading.
> > >
> > > The comment also mentions the following systems:
> > > - OpenBSD installing a larval file -- this remains accurate
> > > - FreeBSD returning EBADF -- not accurate, the system uses the same
> > >   idea as OpenBSD
> > > - NetBSD "deadlocks in amusing ways" -- their solution looks
> > >   Solaris-inspired (not a compliment) and I would not be particularly
> > >   surprised if it indeed deadlocked, in amusing ways or otherwise
>
> FWIW, the note about OpenBSD approach is potentially still interesting,
> but probably not in that place...
>
> These days "not an embryo" indicator would probably map to FMODE_OPENED,
> so fget side would be fairly easy (especially if we invert that bit -
> then the same logics we have for "don't accept FMODE_PATH" would apply
> verbatim).
>
> IIRC, the last time I looked into it the main obstacle to untangle had
> boiled down to "how do we guarantee that do_dentry_open() failing with
> -ESTALE will leave struct file in pristine state" and _that_ got boggled
> down in S&M shite - ->file_open() is not idempotent and has no reverse
> operation - ->file_free_security() takes care of everything.
>
> If that gets solved, we could lift alloc_empty_file() out of path_openat()
> into do_filp_open()/do_file_open_root() - it would require a non-trivial
> amount of massage, but a couple of years ago that appeared to have been
> plausible; would need to recheck that...  It would reduce the amount of
> free/reallocate cycles for struct file in those, which might make it
> worthwhile on its own.

Oh huh. I had seen that code before, did not mentally register there
may be repeat file alloc/free calls due to repeat path_openat.

Indeed it would be nice if someone(tm) sorted it out, but I don't see
how this has any relation to installing the file early and thus having
fget worry about it.

Suppose the "embryo"/"larval" file pointer is to be installed early
and populated later. I don't see a benefit but do see a downside: this
requires protection against close() on the fd (on top of dup2 needed
now).
The options that I see are:
- install the file with a refcount of 2, let dup2/close whack it, do a
fput in open to bring back to 1 or get rid of it if it raced (yuck)
(freebsd is doing this)
- dup2 is already special casing to not mess with it, add that to
close as well (also yuck imo)

From userspace side the only programs which can ever see EBUSY are
buggy or trying to screw the kernel, so not a concern on that front.

Now something amusing, I did not realize I had a super stale copy of
the OpenBSD source code hanging around -- they stopped pre-installing
files in 2018! Instead they install late and do the in dup2 returning
EBUSY, i.e. the same thing as Linux. I do have up to date FreeBSD and
NetBSD though. :)

Christian, would you mind massaging the OS entries in the commit
message (or should i send a v2?):
- OpenBSD installing a larval file -- they moved away from it, file is
installed late and EBUSY is returned on conflict
- FreeBSD returning EBADF -- reworked to install the file early like
OpenBSD used to do
Christian Brauner Dec. 10, 2024, 10:18 a.m. UTC | #4
> Christian, would you mind massaging the OS entries in the commit
> message (or should i send a v2?):

No need, updated the commit message. Thanks!
Al Viro Dec. 10, 2024, 6:15 p.m. UTC | #5
On Tue, Dec 10, 2024 at 05:48:40AM +0100, Mateusz Guzik wrote:
 
> Oh huh. I had seen that code before, did not mentally register there
> may be repeat file alloc/free calls due to repeat path_openat.
> 
> Indeed it would be nice if someone(tm) sorted it out, but I don't see
> how this has any relation to installing the file early and thus having
> fget worry about it.

Other than the former being an obvious prereq for the latter?  Not much...

> Suppose the "embryo"/"larval" file pointer is to be installed early
> and populated later. I don't see a benefit but do see a downside: this
> requires protection against close() on the fd (on top of dup2 needed
> now).
> The options that I see are:
> - install the file with a refcount of 2, let dup2/close whack it, do a
> fput in open to bring back to 1 or get rid of it if it raced (yuck)
> (freebsd is doing this)
> - dup2 is already special casing to not mess with it, add that to
> close as well (also yuck imo)

As a possibility (again, I'm not sold on the benefits of that scheme,
just looking into feasibility):
	dup2() when evicting an embryo:
		mark it evicted
		remove from descriptor table
		do nothing to refcount (in effect, transfer it to open())
		then proceed as if it hadn't been there
		[== pretend that dup2() always loses the race]
	close() when running into an embryo
		return -EBADF
		[== pretend that close() always loses the race]
	open() when it's done setting file up:
		if opening failed
			if not marked evicted
				remove from descriptor table
			fput()
			return whatever error we've got
		else
			if marked evicted
				fput()
			return the descriptor
		[== pretend that open() always wins the race]
"open" in the above stands for everything that opens a descriptor - socket(2),
pipe(2), eventfd(2), whatever.

> >From userspace side the only programs which can ever see EBUSY are
> buggy or trying to screw the kernel, so not a concern on that front.

Agreed.  I'm not saying we should go that way.
diff mbox series

Patch

diff --git a/fs/file.c b/fs/file.c
index d065a24980da..ad8aabc08122 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1258,17 +1258,9 @@  __releases(&files->file_lock)
 
 	/*
 	 * We need to detect attempts to do dup2() over allocated but still
-	 * not finished descriptor.  NB: OpenBSD avoids that at the price of
-	 * extra work in their equivalent of fget() - they insert struct
-	 * file immediately after grabbing descriptor, mark it larval if
-	 * more work (e.g. actual opening) is needed and make sure that
-	 * fget() treats larval files as absent.  Potentially interesting,
-	 * but while extra work in fget() is trivial, locking implications
-	 * and amount of surgery on open()-related paths in VFS are not.
-	 * FreeBSD fails with -EBADF in the same situation, NetBSD "solution"
-	 * deadlocks in rather amusing ways, AFAICS.  All of that is out of
-	 * scope of POSIX or SUS, since neither considers shared descriptor
-	 * tables and this condition does not arise without those.
+	 * not finished descriptor.
+	 *
+	 * POSIX is silent on the issue, we return -EBUSY.
 	 */
 	fdt = files_fdtable(files);
 	fd = array_index_nospec(fd, fdt->max_fds);