Message ID | 20241205154743.1586584-1-mjguzik@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [MEH] fs: sort out a stale comment about races between fd alloc and dup2 | expand |
On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote: > It claims the issue is only relevant for shared descriptor tables which > is of no concern for POSIX (but then is POSIX of concern to anyone > today?), which I presume predates standarized threading. > > The comment also mentions the following systems: > - OpenBSD installing a larval file -- this remains accurate > - FreeBSD returning EBADF -- not accurate, the system uses the same > idea as OpenBSD > - NetBSD "deadlocks in amusing ways" -- their solution looks > Solaris-inspired (not a compliment) and I would not be particularly > surprised if it indeed deadlocked, in amusing ways or otherwise > > [...] Applied to the vfs-6.14.misc branch of the vfs/vfs.git tree. Patches in the vfs-6.14.misc branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.14.misc [1/1] fs: sort out a stale comment about races between fd alloc and dup2 https://git.kernel.org/vfs/vfs/c/9c9e176d7546
On Fri, Dec 06, 2024 at 01:13:47PM +0100, Christian Brauner wrote: > On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote: > > It claims the issue is only relevant for shared descriptor tables which > > is of no concern for POSIX (but then is POSIX of concern to anyone > > today?), which I presume predates standarized threading. > > > > The comment also mentions the following systems: > > - OpenBSD installing a larval file -- this remains accurate > > - FreeBSD returning EBADF -- not accurate, the system uses the same > > idea as OpenBSD > > - NetBSD "deadlocks in amusing ways" -- their solution looks > > Solaris-inspired (not a compliment) and I would not be particularly > > surprised if it indeed deadlocked, in amusing ways or otherwise FWIW, the note about OpenBSD approach is potentially still interesting, but probably not in that place... These days "not an embryo" indicator would probably map to FMODE_OPENED, so fget side would be fairly easy (especially if we invert that bit - then the same logics we have for "don't accept FMODE_PATH" would apply verbatim). IIRC, the last time I looked into it the main obstacle to untangle had boiled down to "how do we guarantee that do_dentry_open() failing with -ESTALE will leave struct file in pristine state" and _that_ got boggled down in S&M shite - ->file_open() is not idempotent and has no reverse operation - ->file_free_security() takes care of everything. If that gets solved, we could lift alloc_empty_file() out of path_openat() into do_filp_open()/do_file_open_root() - it would require a non-trivial amount of massage, but a couple of years ago that appeared to have been plausible; would need to recheck that... It would reduce the amount of free/reallocate cycles for struct file in those, which might make it worthwhile on its own. Lifting it further would require some massage in the callers, but that would be on per-caller basis; used to look plausible... Hell knows - right now I'm ears-deep in ->d_revalidate() crap, but once that settles down a bit... Might be worth looking into that again. LSM ->file_open() behaviour changes would need to be discussed with LSM crowd, obviously. Ideally we want it idempotent, so that calling it twice in a row would have the second call work in the same way as if the first one hadn't happened. In-tree instances seem to be trivial to massage to that (in the worst case you'd need to clear some fields if the first call hadn't taken a fast path out, but the second one had), but that really needs a buy-in from maintainers.
On Mon, Dec 9, 2024 at 8:56 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Fri, Dec 06, 2024 at 01:13:47PM +0100, Christian Brauner wrote: > > On Thu, 05 Dec 2024 16:47:43 +0100, Mateusz Guzik wrote: > > > It claims the issue is only relevant for shared descriptor tables which > > > is of no concern for POSIX (but then is POSIX of concern to anyone > > > today?), which I presume predates standarized threading. > > > > > > The comment also mentions the following systems: > > > - OpenBSD installing a larval file -- this remains accurate > > > - FreeBSD returning EBADF -- not accurate, the system uses the same > > > idea as OpenBSD > > > - NetBSD "deadlocks in amusing ways" -- their solution looks > > > Solaris-inspired (not a compliment) and I would not be particularly > > > surprised if it indeed deadlocked, in amusing ways or otherwise > > FWIW, the note about OpenBSD approach is potentially still interesting, > but probably not in that place... > > These days "not an embryo" indicator would probably map to FMODE_OPENED, > so fget side would be fairly easy (especially if we invert that bit - > then the same logics we have for "don't accept FMODE_PATH" would apply > verbatim). > > IIRC, the last time I looked into it the main obstacle to untangle had > boiled down to "how do we guarantee that do_dentry_open() failing with > -ESTALE will leave struct file in pristine state" and _that_ got boggled > down in S&M shite - ->file_open() is not idempotent and has no reverse > operation - ->file_free_security() takes care of everything. > > If that gets solved, we could lift alloc_empty_file() out of path_openat() > into do_filp_open()/do_file_open_root() - it would require a non-trivial > amount of massage, but a couple of years ago that appeared to have been > plausible; would need to recheck that... It would reduce the amount of > free/reallocate cycles for struct file in those, which might make it > worthwhile on its own. Oh huh. I had seen that code before, did not mentally register there may be repeat file alloc/free calls due to repeat path_openat. Indeed it would be nice if someone(tm) sorted it out, but I don't see how this has any relation to installing the file early and thus having fget worry about it. Suppose the "embryo"/"larval" file pointer is to be installed early and populated later. I don't see a benefit but do see a downside: this requires protection against close() on the fd (on top of dup2 needed now). The options that I see are: - install the file with a refcount of 2, let dup2/close whack it, do a fput in open to bring back to 1 or get rid of it if it raced (yuck) (freebsd is doing this) - dup2 is already special casing to not mess with it, add that to close as well (also yuck imo) From userspace side the only programs which can ever see EBUSY are buggy or trying to screw the kernel, so not a concern on that front. Now something amusing, I did not realize I had a super stale copy of the OpenBSD source code hanging around -- they stopped pre-installing files in 2018! Instead they install late and do the in dup2 returning EBUSY, i.e. the same thing as Linux. I do have up to date FreeBSD and NetBSD though. :) Christian, would you mind massaging the OS entries in the commit message (or should i send a v2?): - OpenBSD installing a larval file -- they moved away from it, file is installed late and EBUSY is returned on conflict - FreeBSD returning EBADF -- reworked to install the file early like OpenBSD used to do
> Christian, would you mind massaging the OS entries in the commit > message (or should i send a v2?): No need, updated the commit message. Thanks!
On Tue, Dec 10, 2024 at 05:48:40AM +0100, Mateusz Guzik wrote: > Oh huh. I had seen that code before, did not mentally register there > may be repeat file alloc/free calls due to repeat path_openat. > > Indeed it would be nice if someone(tm) sorted it out, but I don't see > how this has any relation to installing the file early and thus having > fget worry about it. Other than the former being an obvious prereq for the latter? Not much... > Suppose the "embryo"/"larval" file pointer is to be installed early > and populated later. I don't see a benefit but do see a downside: this > requires protection against close() on the fd (on top of dup2 needed > now). > The options that I see are: > - install the file with a refcount of 2, let dup2/close whack it, do a > fput in open to bring back to 1 or get rid of it if it raced (yuck) > (freebsd is doing this) > - dup2 is already special casing to not mess with it, add that to > close as well (also yuck imo) As a possibility (again, I'm not sold on the benefits of that scheme, just looking into feasibility): dup2() when evicting an embryo: mark it evicted remove from descriptor table do nothing to refcount (in effect, transfer it to open()) then proceed as if it hadn't been there [== pretend that dup2() always loses the race] close() when running into an embryo return -EBADF [== pretend that close() always loses the race] open() when it's done setting file up: if opening failed if not marked evicted remove from descriptor table fput() return whatever error we've got else if marked evicted fput() return the descriptor [== pretend that open() always wins the race] "open" in the above stands for everything that opens a descriptor - socket(2), pipe(2), eventfd(2), whatever. > >From userspace side the only programs which can ever see EBUSY are > buggy or trying to screw the kernel, so not a concern on that front. Agreed. I'm not saying we should go that way.
diff --git a/fs/file.c b/fs/file.c index d065a24980da..ad8aabc08122 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1258,17 +1258,9 @@ __releases(&files->file_lock) /* * We need to detect attempts to do dup2() over allocated but still - * not finished descriptor. NB: OpenBSD avoids that at the price of - * extra work in their equivalent of fget() - they insert struct - * file immediately after grabbing descriptor, mark it larval if - * more work (e.g. actual opening) is needed and make sure that - * fget() treats larval files as absent. Potentially interesting, - * but while extra work in fget() is trivial, locking implications - * and amount of surgery on open()-related paths in VFS are not. - * FreeBSD fails with -EBADF in the same situation, NetBSD "solution" - * deadlocks in rather amusing ways, AFAICS. All of that is out of - * scope of POSIX or SUS, since neither considers shared descriptor - * tables and this condition does not arise without those. + * not finished descriptor. + * + * POSIX is silent on the issue, we return -EBUSY. */ fdt = files_fdtable(files); fd = array_index_nospec(fd, fdt->max_fds);
It claims the issue is only relevant for shared descriptor tables which is of no concern for POSIX (but then is POSIX of concern to anyone today?), which I presume predates standarized threading. The comment also mentions the following systems: - OpenBSD installing a larval file -- this remains accurate - FreeBSD returning EBADF -- not accurate, the system uses the same idea as OpenBSD - NetBSD "deadlocks in amusing ways" -- their solution looks Solaris-inspired (not a compliment) and I would not be particularly surprised if it indeed deadlocked, in amusing ways or otherwise I don't believe mentioning any of these adds anything and the statement about the issue not being POSIX-relevant is outdated. dup2 description in POSIX still does not mention the problem. Just shorten the comment and be done with it. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- I'm pretty sure the comment adds nothing in the current form, but I'm not going to argue about it. fs/file.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)