Message ID | 20231019-kampfsport-metapher-e5211d7be247@brauner (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] vfs fixes | expand |
On Thu, 19 Oct 2023 at 03:09, Christian Brauner <brauner@kernel.org> wrote: > > An openat() call from io_uring triggering an audit call can apparently > cause the refcount of struct filename to be incremented from multiple > threads concurrently during async execution, triggering a refcount > underflow and hitting a BUG_ON(). That bug has been lurking around since > at least v5.16 apparently. Ouch. That filename ref by audit was always supposed to be thread-local in a "for this system call" kind of sense. But yes, looks like the io_uring stuff ended up making it no longer thread-local. That said, using atomics for reference counting is our default behavior and should be normal, so the patch isn't wrong, it's just annoying since getname/putname is very much in the critical path of filename handling. That said, the extra atomics are hopefully not really noticeable. Some people might want to use the non-refcounted version (ie we have getname/putname used by ksmbd too, for example), if they really care. It already exists, as __getname/__putname. But the normal open/stat/etc system call paths are obviously now going to hit those extra atomics. Not lovely, but I guess it's the best we can do. > Switch to an atomic counter to fix that. The underflow check is > downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily > remove that check altogether tbh and not waste an additional atomic. So > if you feel that extra check isn't needed you could just remove in case > you're pulling. Well, the atomic *read* is cheap - the expensive part is the atomic_dec_and_test() (and the atomic_inc in the audit code. I'm not sure why you made it check just for zero in the WARN_ON_ONCE, rather than <= 0 as it used to, but that check is racy regardless, so it doesn't matter. It would miss two concurrent decrements coming in with a count of 1. We don't have the ternary test of atomic decrement results (positive, zero or negative), so it is what it is. Linus
The pull request you sent on Thu, 19 Oct 2023 12:07:08 +0200:
> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc7.vfs.fixes
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ea1cc20cd4ce55dd920a87a317c43da03ccea192
Thank you!
> Ouch. That filename ref by audit was always supposed to be > thread-local in a "for this system call" kind of sense. Yeah, I wasn't happy when that bug showed up. > That said, using atomics for reference counting is our default > behavior and should be normal, so the patch isn't wrong, it's just > annoying since getname/putname is very much in the critical path of > filename handling. Yeah. > That said, the extra atomics are hopefully not really noticeable. > > Some people might want to use the non-refcounted version (ie we have > getname/putname used by ksmbd too, for example), if they really care. > > It already exists, as __getname/__putname. > > But the normal open/stat/etc system call paths are obviously now going > to hit those extra atomics. Not lovely, but I guess it's the best we > can do. I didn't spend too much time on this issue because it's -rc7 and the straightforward seemed ok, if annoying. But if we really really really really cared we could probably do a deranged thing and massage both audit and io_uring to allows us to separate the regular system call getname from the io_uring getname. But I think that would be ugly and likely error prone.