Message ID | tencent_7588DFD1F365950A757310D764517A14B306@qq.com (mailing list archive)
---|---
State | New, archived
Series | eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0
On Sun, Jul 09, 2023 at 09:13:28PM +0200, Markus Elfring wrote:
> > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>
> Please choose an imperative change suggestion.

Markus, stop this nitpicking. It puts off contributors. The changelog is
perfectly clear.
On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@foxmail.com wrote:
> From: Wen Yang <wenyang.linux@foxmail.com>
>
> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dylan Yudaken <dylany@fb.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---

So this looks ok but I would like to see an analysis how the overflow
can happen. I'm looking at the callers and it seems that once ctx->count
hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
there a caller that can call directly or indirectly
eventfd_ctx_do_read() on a ctx->count == 0?

I'm just slightly skeptical about patches that fix issues without an
analysis how this can happen.

>  fs/eventfd.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 8aa36cd37351..10a101df19cd 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>  {
>  	lockdep_assert_held(&ctx->wqh.lock);
>
> -	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
> +	*cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
>  	ctx->count -= *cnt;
>  }
>  EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>  		return -EFAULT;
>  	if (ucnt == ULLONG_MAX)
>  		return -EINVAL;
> +	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
> +		return -EINVAL;
>  	spin_lock_irq(&ctx->wqh.lock);
>  	res = -EAGAIN;
>  	if (ULLONG_MAX - ctx->count > ucnt)
> --
> 2.25.1
>
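The one-line change to eventfd_ctx_do_read() can be illustrated with a small userspace model of the counter logic. This is a sketch only: the struct and function names below are made up for illustration, and the real code operates on struct eventfd_ctx under ctx->wqh.lock.

```c
#include <assert.h>
#include <stdint.h>

#define EFD_SEMAPHORE 1 /* matches the uapi flag value */

/* Minimal stand-in for the kernel's struct eventfd_ctx. */
struct ctx { unsigned int flags; uint64_t count; };

/* Pre-patch logic: semaphore mode always hands out 1,
 * even when count is already 0, so count wraps to UINT64_MAX. */
static uint64_t do_read_old(struct ctx *c) {
    uint64_t cnt = (c->flags & EFD_SEMAPHORE) ? 1 : c->count;
    c->count -= cnt;
    return cnt;
}

/* Patched logic: hand out 1 only if there is something to take. */
static uint64_t do_read_new(struct ctx *c) {
    uint64_t cnt = ((c->flags & EFD_SEMAPHORE) && c->count) ? 1 : c->count;
    c->count -= cnt;
    return cnt;
}
```

Calling do_read_old() on a semaphore context whose count is already 0 wraps the counter to UINT64_MAX, which is exactly the underflow the patch prevents; do_read_new() leaves the counter at 0 and reports 0 taken.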
On 2023/7/10 22:12, Christian Brauner wrote:
> On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@foxmail.com wrote:
>> From: Wen Yang <wenyang.linux@foxmail.com>
>>
>> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
>> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>>
>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Dylan Yudaken <dylany@fb.com>
>> Cc: David Woodhouse <dwmw@amazon.co.uk>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: linux-fsdevel@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
> So this looks ok but I would like to see an analysis how the overflow
> can happen. I'm looking at the callers and it seems that once ctx->count
> hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> there a caller that can call directly or indirectly
> eventfd_ctx_do_read() on a ctx->count == 0?

eventfd_read() ensures that ctx->count is not 0 before calling
eventfd_ctx_do_read() and it is correct.

But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
eventfd_ctx_do_read() unconditionally, as it not only causes ctx->count
to overflow, but also unnecessarily calls wake_up_locked_poll().

I am sorry for just adding the following string in the patch:
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")

Looking forward to your suggestions.

--

Best wishes,

Wen

> I'm just slightly skeptical about patches that fix issues without an
> analysis how this can happen.
>
>>  fs/eventfd.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>> index 8aa36cd37351..10a101df19cd 100644
>> --- a/fs/eventfd.c
>> +++ b/fs/eventfd.c
>> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>>  {
>>  	lockdep_assert_held(&ctx->wqh.lock);
>>
>> -	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
>> +	*cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
>>  	ctx->count -= *cnt;
>>  }
>>  EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
>> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>>  		return -EFAULT;
>>  	if (ucnt == ULLONG_MAX)
>>  		return -EINVAL;
>> +	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
>> +		return -EINVAL;
>>  	spin_lock_irq(&ctx->wqh.lock);
>>  	res = -EAGAIN;
>>  	if (ULLONG_MAX - ctx->count > ucnt)
>> --
>> 2.25.1
>>
On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>
> On 2023/7/10 22:12, Christian Brauner wrote:
> > On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@foxmail.com wrote:
> > > From: Wen Yang <wenyang.linux@foxmail.com>
> > >
> > > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
> > >
> > > Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> > > Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: Jens Axboe <axboe@kernel.dk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Dylan Yudaken <dylany@fb.com>
> > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: linux-fsdevel@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > So this looks ok but I would like to see an analysis how the overflow
> > can happen. I'm looking at the callers and it seems that once ctx->count
> > hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> > there a caller that can call directly or indirectly
> > eventfd_ctx_do_read() on a ctx->count == 0?
> eventfd_read() ensures that ctx->count is not 0 before calling
> eventfd_ctx_do_read() and it is correct.
>
> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
> eventfd_ctx_do_read() unconditionally,
>
> as it may not only causes ctx->count to overflow, but also unnecessarily
> calls wake_up_locked_poll().

Hm, so I think you're right and an underflow can be triggered for at
least three subsystems:

(1) virt/kvm/eventfd.c
(2) drivers/vfio/virqfd.c
(3) drivers/virt/acrn/irqfd.c

where (2) and (3) are just modeled after (1). The eventfd must've been
set to EFD_SEMAPHORE and ctx->count must have been, or have been
decremented to, zero.

The only way I can see the _underflow_ happening is if the irqfd is
shutdown through an ioctl() like KVM_IRQFD with KVM_IRQFD_FLAG_DEASSIGN
raised while ctx->count is zero:

kvm_vm_ioctl()
-> kvm_irqfd()
   -> kvm_irqfd_deassign()
      -> irqfd_deactivate()
         -> irqfd_shutdown()
            -> eventfd_ctx_remove_wait_queue(&cnt)

which would underflow @cnt and cause a spurious wakeup. Userspace would
still read one because of EFD_SEMAPHORE semantics and wouldn't notice
the underflow.

I think it's probably not that bad because afaict, this really can only
happen when (1)-(3) are shutdown. But we should still fix it ofc.

>
>
> I am sorry for just adding the following string in the patch:
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>
>
> Looking forward to your suggestions.

What I usually look for is some callchain/analysis that explain under
what circumstance what this is fixing can happen. That makes life for
reviewers a lot easier because they don't have to dig out that work
themselves which takes time.
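On the normal read path, the semantics described above can be checked from userspace: with EFD_SEMAPHORE each successful read returns 1 and decrements the counter, and once the counter is 0 a non-blocking read fails with EAGAIN rather than decrementing further. A Linux-only sketch (the helper name `drain_semaphore` is made up for illustration):

```c
#include <sys/eventfd.h>
#include <unistd.h>
#include <stdint.h>

/* Write `initial` tokens into a semaphore eventfd, then count how many
 * non-blocking reads succeed before the counter runs dry. */
static int drain_semaphore(uint64_t initial) {
    int fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
    if (fd < 0)
        return -1;
    if (write(fd, &initial, sizeof(initial)) != sizeof(initial)) {
        close(fd);
        return -1;
    }
    int reads = 0;
    uint64_t val;
    /* In semaphore mode each read hands out exactly one token (val == 1);
     * once the counter hits 0 the read fails with EAGAIN. */
    while (read(fd, &val, sizeof(val)) == (ssize_t)sizeof(val) && val == 1)
        reads++;
    close(fd);
    return reads;
}
```

So the regular read() path cannot trigger the underflow; only the in-kernel eventfd_ctx_remove_wait_queue() callers bypass the count check.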
On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>
> On 2023/7/10 22:12, Christian Brauner wrote:
> > On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@foxmail.com wrote:
> > > From: Wen Yang <wenyang.linux@foxmail.com>
> > >
> > > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
> > >
> > > Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> > > Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: Jens Axboe <axboe@kernel.dk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Dylan Yudaken <dylany@fb.com>
> > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: linux-fsdevel@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > So this looks ok but I would like to see an analysis how the overflow
> > can happen. I'm looking at the callers and it seems that once ctx->count
> > hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> > there a caller that can call directly or indirectly
> > eventfd_ctx_do_read() on a ctx->count == 0?
> eventfd_read() ensures that ctx->count is not 0 before calling
> eventfd_ctx_do_read() and it is correct.
>
> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
> eventfd_ctx_do_read() unconditionally,
>
> as it may not only causes ctx->count to overflow, but also unnecessarily
> calls wake_up_locked_poll().
>
>
> I am sorry for just adding the following string in the patch:
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>
>
> Looking forward to your suggestions.
>
> --
>
> Best wishes,
>
> Wen
>
>
> > I'm just slightly skeptical about patches that fix issues without an
> > analysis how this can happen.
> >
> > > fs/eventfd.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/eventfd.c b/fs/eventfd.c
> > > index 8aa36cd37351..10a101df19cd 100644
> > > --- a/fs/eventfd.c
> > > +++ b/fs/eventfd.c
> > > @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
> > >  {
> > >  	lockdep_assert_held(&ctx->wqh.lock);
> > >
> > > -	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
> > > +	*cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
> > >  	ctx->count -= *cnt;
> > >  }
> > >  EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
> > > @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
> > >  		return -EFAULT;
> > >  	if (ucnt == ULLONG_MAX)
> > >  		return -EINVAL;
> > > +	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
> > > +		return -EINVAL;

Hm, why is this bit necessary though? What's wrong with specifying
ucnt == 0 with EFD_SEMAPHORE? This also looks like a (very low
potential) uapi break.
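The uapi concern is easy to demonstrate from userspace: on kernels without the proposed check, a write of the value 0 to a semaphore eventfd succeeds (it returns 8, i.e. sizeof(uint64_t), and leaves the counter untouched), so turning it into -EINVAL would change observable behavior. A Linux-only sketch (the helper name is made up for illustration):

```c
#include <sys/eventfd.h>
#include <unistd.h>
#include <stdint.h>

/* Write an explicit 0 to a semaphore eventfd and report write()'s result.
 * On current kernels this is accepted (returns 8); the proposed
 * EFD_SEMAPHORE && !ucnt check would have rejected it with -EINVAL. */
static long write_zero_to_semaphore(void) {
    int fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
    if (fd < 0)
        return -1;
    uint64_t zero = 0;
    long ret = write(fd, &zero, sizeof(zero));
    close(fd);
    return ret;
}
```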
On Sun, 09 Jul 2023 14:54:51 +0800, wenyang.linux@foxmail.com wrote:
> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>

I've tweaked the commit message to explain how an underflow can happen.
And for now I dropped that bit:

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 10a101df19cd..33a918f9566c 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -269,8 +269,6 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 		return -EFAULT;
 	if (ucnt == ULLONG_MAX)
 		return -EINVAL;
-	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
-		return -EINVAL;
 	spin_lock_irq(&ctx->wqh.lock);
 	res = -EAGAIN;
 	if (ULLONG_MAX - ctx->count > ucnt)

because I don't yet understand why that should be forbidden. Please
explain the reason for wanting that check in there and I can add it
back. I might just be missing the obvious.

---

Applied to the vfs.misc branch of the vfs/vfs.git tree. Patches in the
vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] eventfd: prevent underflow for eventfd semaphores
      https://git.kernel.org/vfs/vfs/c/7b2edd278691
On 2023/7/11 17:39, Christian Brauner wrote:
> On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>> On 2023/7/10 22:12, Christian Brauner wrote:
>>> On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@foxmail.com wrote:
>>>> From: Wen Yang <wenyang.linux@foxmail.com>
>>>>
>>>> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
>>>> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>>>>
>>>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>>>> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
>>>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>> Cc: Christian Brauner <brauner@kernel.org>
>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>> Cc: Dylan Yudaken <dylany@fb.com>
>>>> Cc: David Woodhouse <dwmw@amazon.co.uk>
>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>> Cc: linux-fsdevel@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> ---
>>> So this looks ok but I would like to see an analysis how the overflow
>>> can happen. I'm looking at the callers and it seems that once ctx->count
>>> hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
>>> there a caller that can call directly or indirectly
>>> eventfd_ctx_do_read() on a ctx->count == 0?
>> eventfd_read() ensures that ctx->count is not 0 before calling
>> eventfd_ctx_do_read() and it is correct.
>>
>> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
>> eventfd_ctx_do_read() unconditionally,
>>
>> as it may not only causes ctx->count to overflow, but also unnecessarily
>> calls wake_up_locked_poll().
>>
>>
>> I am sorry for just adding the following string in the patch:
>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>>
>>
>> Looking forward to your suggestions.
>>
>> --
>>
>> Best wishes,
>>
>> Wen
>>
>>
>>> I'm just slightly skeptical about patches that fix issues without an
>>> analysis how this can happen.
>>>
>>>> fs/eventfd.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>>>> index 8aa36cd37351..10a101df19cd 100644
>>>> --- a/fs/eventfd.c
>>>> +++ b/fs/eventfd.c
>>>> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>>>>  {
>>>>  	lockdep_assert_held(&ctx->wqh.lock);
>>>>
>>>> -	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
>>>> +	*cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
>>>>  	ctx->count -= *cnt;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
>>>> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>>>>  		return -EFAULT;
>>>>  	if (ucnt == ULLONG_MAX)
>>>>  		return -EINVAL;
>>>> +	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
>>>> +		return -EINVAL;
> Hm, why is this bit necessary though? What's wrong with specifying
> ucnt == 0 with EFD_SEMAPHORE? This also looks like a (very low
> potential) uapi break.

Thank you for your careful review.

As you pointed out, this may break the uapi. Although the manual has the
following description (man 2 eventfd):

* The file descriptor is readable (the select(2) readfds argument; the
  poll(2) POLLIN flag) if the counter has a value greater than 0.

* The file descriptor is writable (the select(2) writefds argument; the
  poll(2) POLLOUT flag) if it is possible to write a value of at least
  "1" without blocking.

it does not specify that the ucnt cannot be zero, so we can only delete
the two lines of code above.

Could we propose another patch specifically to address the issue you
have identified? Since there are indeed some corner cases when ucnt is 0
and ctx->count is also 0:

static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (ULLONG_MAX - ctx->count > ucnt)
		res = sizeof(ucnt);                 ---> always > 0
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;                 ---> unnecessary addition of 0
		current->in_eventfd = 1;            ---> may affect eventfd_signal()
		if (waitqueue_active(&ctx->wqh))
			wake_up_locked_poll(&ctx->wqh, EPOLLIN);  ---> heavyweight wakeup
		current->in_eventfd = 0;
	}
	...
}

static __poll_t eventfd_poll(struct file *file, poll_table *wait)
{
	...
	count = READ_ONCE(ctx->count);
	if (count > 0)
		events |= EPOLLIN;   ---> if count is 0, all previous operations are wasted
	if (count == ULLONG_MAX)
		events |= EPOLLERR;
	if (ULLONG_MAX - 1 > count)
		events |= EPOLLOUT;

	return events;
}

Could we optimize it like this?

static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;
		if (ctx->count) {    ---> avoid unnecessary heavyweight operations
			current->in_eventfd = 1;
			if (waitqueue_active(&ctx->wqh))
				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
			current->in_eventfd = 0;
		}
	}
	...
}

--

Best wishes,

Wen
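The eventfd_poll() checks quoted above, and the user-visible damage of an underflow, can be captured in a small userspace model. This is a sketch only: plain enum constants stand in for the EPOLL* masks, and the function name is illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

enum { EV_IN = 1, EV_OUT = 2, EV_ERR = 4 }; /* stand-ins for EPOLLIN/EPOLLOUT/EPOLLERR */

/* Mirrors the three checks in eventfd_poll(): readable above 0,
 * error state at ULLONG_MAX, writable while at least one more unit fits. */
static unsigned poll_events(uint64_t count) {
    unsigned events = 0;
    if (count > 0)
        events |= EV_IN;
    if (count == UINT64_MAX)
        events |= EV_ERR;
    if (UINT64_MAX - 1 > count)
        events |= EV_OUT;
    return events;
}
```

An underflowed counter (UINT64_MAX) therefore reports the error bit and stops being writable, which is how the eventfd_ctx_remove_wait_queue() underflow would surface to pollers.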
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8aa36cd37351..10a101df19cd 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
 {
 	lockdep_assert_held(&ctx->wqh.lock);
 
-	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+	*cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
 	ctx->count -= *cnt;
 }
 EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
@@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 		return -EFAULT;
 	if (ucnt == ULLONG_MAX)
 		return -EINVAL;
+	if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
+		return -EINVAL;
 	spin_lock_irq(&ctx->wqh.lock);
 	res = -EAGAIN;
 	if (ULLONG_MAX - ctx->count > ucnt)