[v2] signal: Adjust error codes according to restore_user_sigmask()

Message ID 20190522032144.10995-1-deepa.kernel@gmail.com
State New

Commit Message

Deepa Dinamani May 22, 2019, 3:21 a.m. UTC
Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced a
regression that made users of epoll_pwait, io_pgetevents, and ppoll notice
a latent problem in signal handling during these syscalls.

That patch (854a6ed56839) moved the signal_pending() check closer
to the restoring of the user sigmask, but it failed to update the error
code accordingly.  From the userspace perspective, the patch widened
the time window for signal discovery and subsequent delivery to
userspace, but did not always adjust the errno afterwards. The behavior
before 854a6ed56839a was that the signals were dropped after the error
code was decided. This resulted in lost signals but the userspace did not
notice it as the syscalls had finished executing the core functionality
and the error codes returned indicated success.

For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.
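
The semantics described above can be observed from userspace. The following
is a minimal sketch, not part of the patch: it blocks SIGUSR1, makes it
pending, and then calls epoll_pwait() with a mask that unblocks it only for
the duration of the wait. The struct and function names are made up for
illustration. The return value is recorded but deliberately not interpreted,
since that is exactly what this thread is debating.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical result record, used only by this sketch. */
struct pwait_result {
	int ret;		/* return value of epoll_pwait() */
	int err;		/* errno after the call (meaningful if ret == -1) */
	bool handler_ran;	/* did the SIGUSR1 handler run? */
	bool blocked_after;	/* is SIGUSR1 blocked again after the call? */
};

static volatile sig_atomic_t got_usr1;

static void on_usr1(int sig)
{
	(void)sig;
	got_usr1 = 1;
}

struct pwait_result demo_temporary_sigmask(void)
{
	struct pwait_result res = { 0 };
	struct sigaction sa = { 0 }, ign = { 0 }, old_sa;
	sigset_t block, old_mask, during, now;
	struct epoll_event ev;
	int epfd, rc;

	got_usr1 = 0;
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGUSR1, &sa, &old_sa);
	assert(rc == 0);

	/* Block SIGUSR1 in the caller, then make it pending. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	rc = sigprocmask(SIG_BLOCK, &block, &old_mask);
	assert(rc == 0);
	raise(SIGUSR1);

	/* Mask that applies only during the wait: current mask minus SIGUSR1. */
	sigprocmask(SIG_BLOCK, NULL, &during);
	sigdelset(&during, SIGUSR1);

	/* Empty epoll set: only the signal (or the 100 ms timeout) ends the wait. */
	epfd = epoll_create1(0);
	assert(epfd >= 0);
	res.ret = epoll_pwait(epfd, &ev, 1, 100, &during);
	res.err = errno;
	close(epfd);

	res.handler_ran = got_usr1 != 0;
	sigprocmask(SIG_BLOCK, NULL, &now);
	res.blocked_after = sigismember(&now, SIGUSR1) == 1;

	/* Cleanup: discard any still-pending SIGUSR1, then restore old state. */
	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	sigaction(SIGUSR1, &ign, NULL);
	sigaction(SIGUSR1, &old_sa, NULL);
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	(void)rc;
	return res;
}
```

On the kernels I am aware of, the handler runs because the wait is
interrupted while the user mask is in effect, and SIGUSR1 is blocked again
once the call has returned.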

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.
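
The precedence rule in the note above can be written down as a tiny pure
function. This is only a sketch of the rule as stated in the commit text;
the helper name and its parameters are hypothetical and do not appear in
the patch. It assumes kernel-style negative error codes and assumes that
"no other error" means a return value of exactly 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: pick the value a syscall would return.
 *   ret             - result computed by the syscall body (0, a count, or -errno)
 *   signal_detected - a signal was found pending when the mask was restored
 *   interrupted     - negative "interrupted" code to report, e.g. -EINTR
 */
int adjust_for_signal(int ret, bool signal_detected, int interrupted)
{
	assert(interrupted < 0);

	/* Report the interruption only when there is nothing else to report. */
	if (signal_detected && ret == 0)
		return interrupted;

	/* A real error such as -EINVAL, or a positive count, takes precedence. */
	return ret;
}
```

With this rule a caller that passed an invalid argument still sees -EINVAL
even if a signal arrived in the meantime.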

Reported-by: Eric Wong <e@80x24.org>
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <stable@vger.kernel.org> # 5.0.x
Cc: <stable@vger.kernel.org> # 5.1.x
---
Changes since v1:
* updated the commit text for more context of the pre-existing condition
* added stable tags as requested

 fs/aio.c               | 24 ++++++++++++------------
 fs/eventpoll.c         | 14 ++++++++++----
 fs/io_uring.c          |  7 +++++--
 fs/select.c            | 37 +++++++++++++++++++++----------------
 include/linux/signal.h |  2 +-
 kernel/signal.c        | 13 ++++++++++---
 6 files changed, 59 insertions(+), 38 deletions(-)

Comments

Oleg Nesterov May 22, 2019, 3:05 p.m. UTC | #1
On 05/21, Deepa Dinamani wrote:
>
> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> etc) only when there is no other error. If there is a signal and an error
> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> error codes.

Ugh. I need to re-check, but at first glance I really dislike this change.

I think we can fix the problem _and_ simplify the code. Something like below.
The patch is obviously incomplete, it changes only one caller of
set_user_sigmask(), epoll_pwait() to explain what I mean.

restore_user_sigmask() should simply die. Although perhaps another helper
makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).

Oleg.


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d..85f56e4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 		size_t, sigsetsize)
 {
 	int error;
-	sigset_t ksigmask, sigsaved;
 
 	/*
 	 * If the caller wants a certain signal mask to be set during the wait,
 	 * we apply it here.
 	 */
-	error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
+	error = set_user_sigmask(sigmask, sigsetsize);
 	if (error)
 		return error;
 
 	error = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	if (error != -EINTR)
+		restore_saved_sigmask();
 
 	return error;
 }
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e412c09..1e82ae0 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
 static inline void set_restore_sigmask(void)
 {
 	set_thread_flag(TIF_RESTORE_SIGMASK);
-	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
 }
 
 static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
@@ -447,7 +446,6 @@ static inline bool test_and_clear_restore_sigmask(void)
 static inline void set_restore_sigmask(void)
 {
 	current->restore_sigmask = true;
-	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
 }
 static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
 {
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016..887cea6 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -273,8 +273,7 @@ extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
-extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
-	sigset_t *oldset, size_t sigsetsize);
+extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
 extern void restore_user_sigmask(const void __user *usigmask,
 				 sigset_t *sigsaved);
 extern void set_current_blocked(sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 227ba17..76f4f9a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2801,19 +2801,21 @@ EXPORT_SYMBOL(sigprocmask);
  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
  * epoll_pwait where a new sigmask is passed from userland for the syscalls.
  */
-int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
-		     sigset_t *oldset, size_t sigsetsize)
+int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize)
 {
-	if (!usigmask)
+	sigset_t kmask;
+
+	if (!umask)
 		return 0;
 
 	if (sigsetsize != sizeof(sigset_t))
 		return -EINVAL;
-	if (copy_from_user(set, usigmask, sizeof(sigset_t)))
+	if (copy_from_user(&kmask, umask, sizeof(sigset_t)))
 		return -EFAULT;
 
-	*oldset = current->blocked;
-	set_current_blocked(set);
+	set_restore_sigmask();
+	current->saved_sigmask = current->blocked;
+	set_current_blocked(&kmask);
 
 	return 0;
 }
@@ -2840,39 +2842,6 @@ int set_compat_user_sigmask(const compat_sigset_t __user *usigmask,
 EXPORT_SYMBOL(set_compat_user_sigmask);
 #endif
 
-/*
- * restore_user_sigmask:
- * usigmask: sigmask passed in from userland.
- * sigsaved: saved sigmask when the syscall started and changed the sigmask to
- *           usigmask.
- *
- * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
- * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
- */
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
-{
-
-	if (!usigmask)
-		return;
-	/*
-	 * When signals are pending, do not restore them here.
-	 * Restoring sigmask here can lead to delivering signals that the above
-	 * syscalls are intended to block because of the sigmask passed in.
-	 */
-	if (signal_pending(current)) {
-		current->saved_sigmask = *sigsaved;
-		set_restore_sigmask();
-		return;
-	}
-
-	/*
-	 * This is needed because the fast syscall return path does not restore
-	 * saved_sigmask when signals are not pending.
-	 */
-	set_current_blocked(sigsaved);
-}
-EXPORT_SYMBOL(restore_user_sigmask);
-
 /**
  *  sys_rt_sigprocmask - change the list of currently blocked signals
  *  @how: whether to add, remove, or set signals

Deepa Dinamani May 22, 2019, 3:55 p.m. UTC | #2
-Deepa

> On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
>> On 05/21, Deepa Dinamani wrote:
>>
>> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
>> etc) only when there is no other error. If there is a signal and an error
>> like EINVAL, the syscalls return -EINVAL rather than the interrupted
>> error codes.
>
> Ugh. I need to re-check, but at first glance I really dislike this change.
>
> I think we can fix the problem _and_ simplify the code. Something like below.
> The patch is obviously incomplete, it changes only one caller of
> set_user_sigmask(), epoll_pwait() to explain what I mean.
> restore_user_sigmask() should simply die. Although perhaps another helper
> makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).

restore_user_sigmask() was added because of all the variants of these
syscalls we added because of y2038 as noted in commit message:

  signal: Add restore_user_sigmask()

    Refactor the logic to restore the sigmask before the syscall
    returns into an api.
    This is useful for versions of syscalls that pass in the
    sigmask and expect the current->sigmask to be changed during
    the execution and restored after the execution of the syscall.

    With the advent of new y2038 syscalls in the subsequent patches,
    we add two more new versions of the syscalls (for pselect, ppoll
    and io_pgetevents) in addition to the existing native and compat
    versions. Adding such an api reduces the logic that would need to
    be replicated otherwise.


>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4a0e98d..85f56e4 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
>        size_t, sigsetsize)
> {
>    int error;
> -    sigset_t ksigmask, sigsaved;
>
>    /*
>     * If the caller wants a certain signal mask to be set during the wait,
>     * we apply it here.
>     */
> -    error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> +    error = set_user_sigmask(sigmask, sigsetsize);
>    if (error)
>        return error;
>
>    error = do_epoll_wait(epfd, events, maxevents, timeout);
>
> -    restore_user_sigmask(sigmask, &sigsaved);
> +    if (error != -EINTR)

As you address all the other syscalls this condition becomes more and
more complicated.

> +        restore_saved_sigmask();
>
>    return error;
> }
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index e412c09..1e82ae0 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> static inline void set_restore_sigmask(void)
> {
>    set_thread_flag(TIF_RESTORE_SIGMASK);
> -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));

So you always want do_signal() to be called?
You will have to check each architecture's implementation of
do_signal() to check if that has any side effects.

Although this is not what the patch is solving. What we want is to
adjust return codes on all these syscalls to user and not drop
signals. Please check v2/v3 of the patch. I've updated the commit text
to provide more context into what is actually being fixed here.

If we really want to simplify, we should rewrite the internal
handling in all the ppoll, epoll_pwait and io_pgetevents syscalls
where we set the error code.
As new versions of syscalls were added, the internal logic got
reworked rather haphazardly. But, as the current issue points out,
these are delicate changes.

-Deepa
> }
>
> static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
> @@ -447,7 +446,6 @@ static inline bool test_and_clear_restore_sigmask(void)
> static inline void set_restore_sigmask(void)
> {
>    current->restore_sigmask = true;
> -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> }
> static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
> {
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 9702016..887cea6 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -273,8 +273,7 @@ extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
>                   struct task_struct *p, enum pid_type type);
> extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> extern int sigprocmask(int, sigset_t *, sigset_t *);
> -extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
> -    sigset_t *oldset, size_t sigsetsize);
> +extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
> extern void restore_user_sigmask(const void __user *usigmask,
>                 sigset_t *sigsaved);
> extern void set_current_blocked(sigset_t *);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 227ba17..76f4f9a 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2801,19 +2801,21 @@ EXPORT_SYMBOL(sigprocmask);
>  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
>  * epoll_pwait where a new sigmask is passed from userland for the syscalls.
>  */
> -int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
> -             sigset_t *oldset, size_t sigsetsize)
> +int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize)
> {
> -    if (!usigmask)
> +    sigset_t kmask;
> +
> +    if (!umask)
>        return 0;
>
>    if (sigsetsize != sizeof(sigset_t))
>        return -EINVAL;
> -    if (copy_from_user(set, usigmask, sizeof(sigset_t)))
> +    if (copy_from_user(&kmask, umask, sizeof(sigset_t)))
>        return -EFAULT;
>
> -    *oldset = current->blocked;
> -    set_current_blocked(set);
> +    set_restore_sigmask();
> +    current->saved_sigmask = current->blocked;
> +    set_current_blocked(&kmask);
>
>    return 0;
> }
> @@ -2840,39 +2842,6 @@ int set_compat_user_sigmask(const compat_sigset_t __user *usigmask,
> EXPORT_SYMBOL(set_compat_user_sigmask);
> #endif
>
> -/*
> - * restore_user_sigmask:
> - * usigmask: sigmask passed in from userland.
> - * sigsaved: saved sigmask when the syscall started and changed the sigmask to
> - *           usigmask.
> - *
> - * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
> - * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
> - */
> -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> -{
> -
> -    if (!usigmask)
> -        return;
> -    /*
> -     * When signals are pending, do not restore them here.
> -     * Restoring sigmask here can lead to delivering signals that the above
> -     * syscalls are intended to block because of the sigmask passed in.
> -     */
> -    if (signal_pending(current)) {
> -        current->saved_sigmask = *sigsaved;
> -        set_restore_sigmask();
> -        return;
> -    }
> -
> -    /*
> -     * This is needed because the fast syscall return path does not restore
> -     * saved_sigmask when signals are not pending.
> -     */
> -    set_current_blocked(sigsaved);
> -}
> -EXPORT_SYMBOL(restore_user_sigmask);
> -
> /**
>  *  sys_rt_sigprocmask - change the list of currently blocked signals
>  *  @how: whether to add, remove, or set signals
>

Oleg Nesterov May 22, 2019, 4:14 p.m. UTC | #3
On 05/22, Deepa Dinamani wrote:
>
> -Deepa
>
> > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> >> On 05/21, Deepa Dinamani wrote:
> >>
> >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> >> etc) only when there is no other error. If there is a signal and an error
> >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> >> error codes.
> >
> > Ugh. I need to re-check, but at first glance I really dislike this change.
> >
> > I think we can fix the problem _and_ simplify the code. Something like below.
> > The patch is obviously incomplete, it changes only one caller of
> > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > restore_user_sigmask() should simply die. Although perhaps another helper
> > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
>
> restore_user_sigmask() was added because of all the variants of these
> syscalls we added because of y2038 as noted in commit message:
>
>   signal: Add restore_user_sigmask()
>
>     Refactor the logic to restore the sigmask before the syscall
>     returns into an api.
>     This is useful for versions of syscalls that pass in the
>     sigmask and expect the current->sigmask to be changed during
>     the execution and restored after the execution of the syscall.
>
>     With the advent of new y2038 syscalls in the subsequent patches,
>     we add two more new versions of the syscalls (for pselect, ppoll
>     and io_pgetevents) in addition to the existing native and compat
>     versions. Adding such an api reduces the logic that would need to
>     be replicated otherwise.

Again, I need to re-check, will continue tomorrow. But so far I am not sure
this helper can actually help.

> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> >        size_t, sigsetsize)
> > {
> >    int error;
> > -    sigset_t ksigmask, sigsaved;
> >
> >    /*
> >     * If the caller wants a certain signal mask to be set during the wait,
> >     * we apply it here.
> >     */
> > -    error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > +    error = set_user_sigmask(sigmask, sigsetsize);
> >    if (error)
> >        return error;
> >
> >    error = do_epoll_wait(epfd, events, maxevents, timeout);
> >
> > -    restore_user_sigmask(sigmask, &sigsaved);
> > +    if (error != -EINTR)
>
> As you address all the other syscalls this condition becomes more and
> more complicated.

Maybe.

> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > static inline void set_restore_sigmask(void)
> > {
> >    set_thread_flag(TIF_RESTORE_SIGMASK);
> > -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
>
> So you always want do_signal() to be called?

Why do you think so? No. This is just to avoid the warning, because with the
patch I sent set_restore_sigmask() is called "in advance".

> You will have to check each architecture's implementation of
> do_signal() to check if that has any side effects.

I don't think so.

> Although this is not what the patch is solving.

Sure. But you know, after I tried to read the changelog, I am not sure
I understand what exactly you are trying to fix. Could you please explain
this part

	The behavior
	before 854a6ed56839a was that the signals were dropped after the error
	code was decided. This resulted in lost signals but the userspace did not
	notice it

? I fail to understand it, sorry. It looks as if the code was already buggy before
that commit and it could miss a signal or something like this, but I do not see how.

Oleg.

Deepa Dinamani May 22, 2019, 4:33 p.m. UTC | #4
On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/22, Deepa Dinamani wrote:
> >
> > -Deepa
> >
> > > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > >> On 05/21, Deepa Dinamani wrote:
> > >>
> > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > >> etc) only when there is no other error. If there is a signal and an error
> > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > >> error codes.
> > >
> > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > >
> > > I think we can fix the problem _and_ simplify the code. Something like below.
> > > The patch is obviously incomplete, it changes only one caller of
> > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> >
> > restore_user_sigmask() was added because of all the variants of these
> > syscalls we added because of y2038 as noted in commit message:
> >
> >   signal: Add restore_user_sigmask()
> >
> >     Refactor the logic to restore the sigmask before the syscall
> >     returns into an api.
> >     This is useful for versions of syscalls that pass in the
> >     sigmask and expect the current->sigmask to be changed during
> >     the execution and restored after the execution of the syscall.
> >
> >     With the advent of new y2038 syscalls in the subsequent patches,
> >     we add two more new versions of the syscalls (for pselect, ppoll
> >     and io_pgetevents) in addition to the existing native and compat
> >     versions. Adding such an api reduces the logic that would need to
> >     be replicated otherwise.
>
> Again, I need to re-check, will continue tomorrow. But so far I am not sure
> this helper can actually help.
>
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> > >        size_t, sigsetsize)
> > > {
> > >    int error;
> > > -    sigset_t ksigmask, sigsaved;
> > >
> > >    /*
> > >     * If the caller wants a certain signal mask to be set during the wait,
> > >     * we apply it here.
> > >     */
> > > -    error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > > +    error = set_user_sigmask(sigmask, sigsetsize);
> > >    if (error)
> > >        return error;
> > >
> > >    error = do_epoll_wait(epfd, events, maxevents, timeout);
> > >
> > > -    restore_user_sigmask(sigmask, &sigsaved);
> > > +    if (error != -EINTR)
> >
> > As you address all the other syscalls this condition becomes more and
> > more complicated.
>
> Maybe.
>
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > static inline void set_restore_sigmask(void)
> > > {
> > >    set_thread_flag(TIF_RESTORE_SIGMASK);
> > > -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> >
> > So you always want do_signal() to be called?
>
> Why do you think so? No. This is just to avoid the warning, because with the
> patch I sent set_restore_sigmask() is called "in advance".
>
> > You will have to check each architecture's implementation of
> > do_signal() to check if that has any side effects.
>
> I don't think so.

Why not?

> > Although this is not what the patch is solving.
>
> Sure. But you know, after I tried to read the changelog, I am not sure
> I understand what exactly you are trying to fix. Could you please explain
> this part
>
>         The behavior
>         before 854a6ed56839a was that the signals were dropped after the error
>         code was decided. This resulted in lost signals but the userspace did not
>         notice it
>
> ? I fail to understand it, sorry. It looks as if the code was already buggy before
> that commit and it could miss a signal or something like this, but I do not see how.

Did you read the explanation pointed to in the commit text? :

https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Let me know what part you don't understand and I can explain more.

It would be better to understand the issue before we start discussing the fix.

-Deepa

Chris Down May 22, 2019, 10:18 p.m. UTC | #5
+Cc: linux-mm, since this broke mmots tree and has been applied there

This patch is missing a definition for signal_detected in io_cqring_wait, which 
breaks the build.

diff --git fs/io_uring.c fs/io_uring.c
index b785c8d7efc4..b34311675d2d 100644
--- fs/io_uring.c
+++ fs/io_uring.c
@@ -2182,7 +2182,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 {
        struct io_cq_ring *ring = ctx->cq_ring;
        sigset_t ksigmask, sigsaved;
-       int ret;
+       int ret, signal_detected;
 
        if (io_cqring_events(ring) >= min_events)
                return 0;

Deepa Dinamani May 22, 2019, 10:52 p.m. UTC | #6
On Wed, May 22, 2019 at 3:18 PM Chris Down <chris@chrisdown.name> wrote:
>
> +Cc: linux-mm, since this broke mmots tree and has been applied there
>
> This patch is missing a definition for signal_detected in io_cqring_wait, which
> breaks the build.

This patch does not break the build.
The patch that breaks the build was the v2 of this patch, since there
was an accidental deletion.
That's what the v3 fixed. I think v3 got picked up this morning into
the mm tree.


-Deepa

David Laight May 23, 2019, 9:03 a.m. UTC | #7
From: Deepa Dinamani
> Sent: 22 May 2019 17:34
> On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 05/22, Deepa Dinamani wrote:
> > >
> > > -Deepa
> > >
> > > > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > > >
> > > >> On 05/21, Deepa Dinamani wrote:
> > > >>
> > > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > > >> etc) only when there is no other error. If there is a signal and an error
> > > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > > >> error codes.
> > > >
> > > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > > >
> > > > I think we can fix the problem _and_ simplify the code. Something like below.
> > > > > The patch is obviously incomplete, it changes only one caller of
> > > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> > >
> > > restore_user_sigmask() was added because of all the variants of these
> > > syscalls we added because of y2038 as noted in commit message:
> > >
> > >   signal: Add restore_user_sigmask()
> > >
> > >     Refactor the logic to restore the sigmask before the syscall
> > >     returns into an api.
> > >     This is useful for versions of syscalls that pass in the
> > >     sigmask and expect the current->sigmask to be changed during
> > >     the execution and restored after the execution of the syscall.
> > >
> > >     With the advent of new y2038 syscalls in the subsequent patches,
> > >     we add two more new versions of the syscalls (for pselect, ppoll
> > >     and io_pgetevents) in addition to the existing native and compat
> > >     versions. Adding such an api reduces the logic that would need to
> > >     be replicated otherwise.
> >
> > Again, I need to re-check, will continue tomorrow. But so far I am not sure
> > this helper can actually help.
> >
> > > > --- a/fs/eventpoll.c
> > > > +++ b/fs/eventpoll.c
> > > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *,
> events,
> > > >        size_t, sigsetsize)
> > > > {
> > > >    int error;
> > > > -    sigset_t ksigmask, sigsaved;
> > > >
> > > >    /*
> > > >     * If the caller wants a certain signal mask to be set during the wait,
> > > >     * we apply it here.
> > > >     */
> > > > -    error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > > > +    error = set_user_sigmask(sigmask, sigsetsize);
> > > >    if (error)
> > > >        return error;
> > > >
> > > >    error = do_epoll_wait(epfd, events, maxevents, timeout);
> > > >
> > > > -    restore_user_sigmask(sigmask, &sigsaved);
> > > > +    if (error != -EINTR)
> > >
> > > As you address all the other syscalls this condition becomes more and
> > > more complicated.
> >
> > Maybe.
> >
> > > > --- a/include/linux/sched/signal.h
> > > > +++ b/include/linux/sched/signal.h
> > > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > > static inline void set_restore_sigmask(void)
> > > > {
> > > >    set_thread_flag(TIF_RESTORE_SIGMASK);
> > > > -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> > >
> > > So you always want do_signal() to be called?
> >
> > Why do you think so? No. This is just to avoid the warning, because with the
> > patch I sent set_restore_sigmask() is called "in advance".
> >
> > > You will have to check each architecture's implementation of
> > > do_signal() to check if that has any side effects.
> >
> > I don't think so.
> 
> Why not?
> 
> > > Although this is not what the patch is solving.
> >
> > Sure. But you know, after I tried to read the changelog, I am not sure
> > I understand what exactly you are trying to fix. Could you please explain
> > this part
> >
> >         The behavior
> >         before 854a6ed56839a was that the signals were dropped after the error
> >         code was decided. This resulted in lost signals but the userspace did not
> >         notice it
> >
> > ? I fail to understand it, sorry. It looks as if the code was already buggy before
> > that commit and it could miss a signal or something like this, but I do not see how.
> 
> Did you read the explanation pointed to in the commit text? :
> 
> https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
> 
> Let me know what part you don't understand and I can explain more.
> 
> It would be better to understand the issue before we start discussing the fix.


I'm confused...
I thought:

EINTR should only be returned if a blocking sleep (eg in do_epoll_wait() itself)
was interrupted by a signal that was enabled at the time of the sleep.

The handlers for all unblocked signals should be run on return to user.
This is after the mask has been restored and regardless of the error code.

So epoll() can return 'success' or 'timeout' (etc) and the handler for SIGURG
should still be called.
This is exactly equivalent to the interrupt that generates the signal happening
just after the 'return to user' of the system call.

	David
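
The case described above, where the syscall has a real result and a signal
is around at the same time, can be probed from userspace. The following is
only a sketch under stated assumptions: the struct and function names are
invented for illustration, and SIGUSR1 stands in for whatever signal the
application cares about. It makes one fd readable and one signal pending,
calls epoll_pwait() with that signal unblocked, and then checks the one
property that should hold whichever way this discussion is settled: the
signal was either handled or is still pending, never silently dropped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical result record, used only by this sketch. */
struct ready_result {
	int ret;		/* return value of epoll_pwait() */
	bool handler_ran;	/* the SIGUSR1 handler ran */
	bool still_pending;	/* SIGUSR1 is still pending after the call */
};

static volatile sig_atomic_t seen_usr1;

static void note_usr1(int sig)
{
	(void)sig;
	seen_usr1 = 1;
}

struct ready_result demo_signal_with_ready_fd(void)
{
	struct ready_result res = { 0 };
	struct sigaction sa = { 0 }, ign = { 0 }, old_sa;
	sigset_t block, old_mask, during, pending;
	struct epoll_event ev = { .events = EPOLLIN }, out;
	int pipefd[2], epfd, rc;

	seen_usr1 = 0;
	sa.sa_handler = note_usr1;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGUSR1, &sa, &old_sa);
	assert(rc == 0);

	/* Block SIGUSR1 and make it pending before the wait starts. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	rc = sigprocmask(SIG_BLOCK, &block, &old_mask);
	assert(rc == 0);
	raise(SIGUSR1);

	/* Make the read end of a pipe ready so the wait has a real result. */
	rc = pipe(pipefd);
	assert(rc == 0);
	rc = (int)write(pipefd[1], "x", 1);
	assert(rc == 1);
	epfd = epoll_create1(0);
	assert(epfd >= 0);
	ev.data.fd = pipefd[0];
	rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
	assert(rc == 0);

	/* Unblock SIGUSR1 only for the duration of the syscall. */
	sigprocmask(SIG_BLOCK, NULL, &during);
	sigdelset(&during, SIGUSR1);
	res.ret = epoll_pwait(epfd, &out, 1, 0, &during);

	sigpending(&pending);
	res.handler_ran = seen_usr1 != 0;
	res.still_pending = sigismember(&pending, SIGUSR1) == 1;

	/* Cleanup: discard a still-pending SIGUSR1, then restore old state. */
	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	sigaction(SIGUSR1, &ign, NULL);
	sigaction(SIGUSR1, &old_sa, NULL);
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	close(epfd);
	close(pipefd[0]);
	close(pipefd[1]);
	(void)rc;
	return res;
}
```

Which of the two outcomes is seen depends on the kernel version, so the
sketch does not assume either one.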


Oleg Nesterov May 23, 2019, 2:33 p.m. UTC | #8
On 05/22, Deepa Dinamani wrote:
>
> > > > --- a/include/linux/sched/signal.h
> > > > +++ b/include/linux/sched/signal.h
> > > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > > static inline void set_restore_sigmask(void)
> > > > {
> > > >    set_thread_flag(TIF_RESTORE_SIGMASK);
> > > > -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> > >
> > > So you always want do_signal() to be called?
> >
> > Why do you think so? No. This is just to avoid the warning, because with the
> > patch I sent set_restore_sigmask() is called "in advance".
> >
> > > You will have to check each architecture's implementation of
> > > do_signal() to check if that has any side effects.
> >
> > I don't think so.
>
> Why not?

Why yes?

It seems that we have some communication problems. OK, please look at the code
I proposed, I only added a couple of TODO comments

	static inline void set_restore_sigmask(void)
	{
		// WARN_ON(!TIF_SIGPENDING) was removed by this patch
		current->restore_sigmask = true;
	}

	int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize)
	{
		sigset_t kmask;

		if (!umask)
			return 0;

		if (sigsetsize != sizeof(sigset_t))
			return -EINVAL;
		if (copy_from_user(&kmask, umask, sizeof(sigset_t)))
			return -EFAULT;

		set_restore_sigmask();
		current->saved_sigmask = current->blocked;
		set_current_blocked(&kmask);

		return 0;
	}

	SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
			int, maxevents, int, timeout, const sigset_t __user *, sigmask,
			size_t, sigsetsize)
	{
		int error;

		/*
		 * If the caller wants a certain signal mask to be set during the wait,
		 * we apply it here.
		 */
		error = set_user_sigmask(sigmask, sigsetsize);
		if (error)
			return error;

		error = do_epoll_wait(epfd, events, maxevents, timeout);

		// TODO. Add another helper to restore WARN_ON(!TIF_SIGPENDING)
		// in case restore_saved_sigmask() is NOT called.

		if (error != -EINTR)
			restore_saved_sigmask();

		return error;
	}

Note that it looks much simpler. Now, could you please explain

	- why do you think this code is not correct ?

	- why do you think we need to audit do_signal() ???



> > > Although this is not what the patch is solving.
> >
> > Sure. But you know, after I tried to read the changelog, I am not sure
> > I understand what exactly you are trying to fix. Could you please explain
> > this part
> >
> >         The behavior
> >         before 854a6ed56839a was that the signals were dropped after the error
> >         code was decided. This resulted in lost signals but the userspace did not
> >         notice it
> >
> > ? I fail to understand it, sorry. It looks as if the code was already buggy before
> > that commit and it could miss a signal or something like this, but I do not see how.
>
> Did you read the explanation pointed to in the commit text? :
>
> https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

This link points to a lengthy and confusing discussion... After a quick glance
I didn't find an answer to my question, so let me repeat it: why do you think
the kernel was buggy even before 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal:
Add restore_user_sigmask()")?

Just in case...
https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=P3QQ@mail.gmail.com/
doesn't look right to me... let me quote some parts of your email:


	-       /*
	-        * If we changed the signal mask, we need to restore the original one.
	-        * In case we've got a signal while waiting, we do not restore the
	-        * signal mask yet, and we allow do_signal() to deliver the signal on
	-        * the way back to userspace, before the signal mask is restored.
	-        */
	-       if (sigmask) {
	-               if (error == -EINTR) {
	-                       memcpy(&current->saved_sigmask, &sigsaved,
	-                              sizeof(sigsaved));
	-                       set_restore_sigmask();
	-               } else

	**** Execution reaches this else statement and the sigmask is restored
	directly, ignoring the newly generated signal.

I see nothing wrong. This is what we want.

	The signal is never
	handled.

Well, "never" is not right. It won't be handled now, because it is blocked, but
for example think of another pselect/whatever call with the same sigmask.

> It would be better to understand the issue before we start discussing the fix.

Agreed. And that is why I am asking for your explanations, quite possibly I missed
something, but so far I fail to understand you.

Oleg.
Oleg Nesterov May 23, 2019, 2:59 p.m. UTC | #9
On 05/23, David Laight wrote:
>
> I'm confused...

Me too. To clarify, the current code is obviously buggy, pselect/whatever
shouldn't return 0 (or anything else) if it was interrupted and we are going
to deliver the signal.

But it seems that Deepa has other concerns which I do not understand at all.

In any case, the signal_pending() check _inside_ restore_user_sigmask() can't
be right, with or without this patch. If nothing else, a signal can come right
after the check.

> So epoll() can return 'success' or 'timeout' (etc) and the handler for SIG_URG
> should still be called.

Not sure I understand... OK, suppose that you do

	block-all-signals;
	ret = pselect(..., sigmask(SIG_URG));

if it returns success/timeout then the handler for SIG_URG should not be called?

or I am totally confused...

Oleg.
David Laight May 23, 2019, 4:18 p.m. UTC | #10
From: Oleg Nesterov
> On 05/23, David Laight wrote:
> >
> > I'm confused...
> 
> Me too. To clarify, the current code is obviously buggy, pselect/whatever
> shouldn't return 0 (or anything else) if it was interrupted and we are going
> to deliver the signal.

If it was interrupted the return value has to be EINTR.
Whether any signal handlers are called is a separate matter.

> But it seems that Deepa has other concerns which I do not understand at all.
> 
> In any case, the signal_pending() check _inside_ restore_user_sigmask() can't
> be right, with or without this patch. If nothing else, a signal can come right
> after the check.
> 
> > So epoll() can return 'success' or 'timeout' (etc) and the handler for SIG_URG
> > should still be called.
> 
> Not sure I understand... OK, suppose that you do
> 
> 	block-all-signals;
> 	ret = pselect(..., sigmask(SIG_URG));
> 
> if it returns success/timeout then the handler for SIG_URG should not be called?

Ugg...
POSIX probably allows the signal handler to be called at the point the event
happens rather than being deferred until the system call completes.
Queueing up the signal handler to be run at a later time (syscall exit)
certainly makes sense.
Definitely safest to call the signal handler even if success/timeout
is returned.
pselect() exists to stop the entry race, not the exit one.


> or I am totally confused...

The pselect(2) man page says that the signal handler for a signal that is
enabled for the duration should run.
Clearly it is also valid to call the signal handlers for any signals that
are allowed on entry/exit (they could happen just after the return).
Also remember that pselect() can also be used to disable signals.

So ISTM that signal handlers allowed by either signal mask
should be called during syscall exit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Oleg Nesterov May 23, 2019, 4:36 p.m. UTC | #11
On 05/23, David Laight wrote:
>
> From: Oleg Nesterov
> > On 05/23, David Laight wrote:
> > >
> > > I'm confused...
> >
> > Me too. To clarify, the current code is obviously buggy, pselect/whatever
> > shouldn't return 0 (or anything else) if it was interrupted and we are going
> > to deliver the signal.
>
> If it was interrupted the return value has to be EINTR.

Yes, and this is what we need to fix.

> Whether any signal handlers are called is a separate matter.

Not really... because in this case we know that the signal will be delivered.

> > Not sure I understand... OK, suppose that you do
> >
> > 	block-all-signals;
> > 	ret = pselect(..., sigmask(SIG_URG));
> >
> > if it returns success/timeout then the handler for SIG_URG should not be called?
>
> Ugg...
> Posix probably allows the signal handler be called at the point the event
> happens rather than being deferred until the system call completes.
> Queueing up the signal handler to be run at a later time (syscall exit)
> certainly makes sense.
> Definitely safest to call the signal handler even if success/timeout
> is returned.

Why?

> pselect() exists to stop the entry race, not the exit one.

pselect() has to block SIG_URG again before it returns to user-mode, right?

Suppose pselect() finds a ready fd, and this races with SIG_URG.

Why do you think the handler should run?

What if SIG_URG comes right after pselect() blocks SIG_URG again? I mean,
how does this differ from the case when it comes before, but a ready fd was
already found?

Oleg.
David Laight May 23, 2019, 4:56 p.m. UTC | #12
From: Oleg Nesterov
> Sent: 23 May 2019 17:36
> On 05/23, David Laight wrote:
> >
> > From: Oleg Nesterov
> > > On 05/23, David Laight wrote:
...
> > > Not sure I understand... OK, suppose that you do
> > >
> > > 	block-all-signals;
> > > 	ret = pselect(..., sigmask(SIG_URG));
> > >
> > > if it returns success/timeout then the handler for SIG_URG should not be called?
> >
> > Ugg...
> > Posix probably allows the signal handler be called at the point the event
> > happens rather than being deferred until the system call completes.
> > Queueing up the signal handler to be run at a later time (syscall exit)
> > certainly makes sense.
> > Definitely safest to call the signal handler even if success/timeout
> > is returned.
> 
> Why?
> 
> > pselect() exists to stop the entry race, not the exit one.
> 
> pselect() has to block SIG_URG again before it returns to user-mode, right?

Yep.
So the signal handler can't be called for a signal that happens after
pselect() returns.

> Suppose pselect() finds a ready fd, and this races with SIG_URG.

You mean if SIG_URG is raised after a ready fd is found (or even timeout)?
So the return value isn't EINTR.
(If an fd is readable on entry, the SIG_URG could have happened much earlier.)

> Why do you think the handler should run?

Think of the application code loop.
Consider what happens if the signal is SIG_INT - to request the program
stop.
After every pselect() call the application looks to see if the handler
has been called.
If one of the fds is always readable pselect() will never return EINTR
but you want the SIG_INT handler run so that the loop gets terminated.
If you only call the signal handler when EINTR is returned the process
will never stop.
So you need to call the handler even when pselect() succeeds/times out.

> What if SIG_URG comes right after pselect() blocks SIG_URG again? I mean,
> how this differs the case when it comes before, but a ready fd was already
> found?

I suspect you need to defer the re-instatement of the original mask
to the code that calls the signal handlers (which probably should
be called with the program's signal mask).
So that particular window doesn't exist.

	David

Deepa Dinamani May 23, 2019, 6:06 p.m. UTC | #13
Ok, since there has been quite a bit of argument here, I will
backtrack a little bit and maybe it will help us understand what's
happening here.
There are many scenarios being discussed on this thread:
a. State of code before 854a6ed56839a
b. State after 854a6ed56839a
c. Proposed fix as per the patchset in question.

Oleg, I will discuss these first and then we can discuss the
additional changes you suggested.

Some background on why we have these syscalls that take sigmask as an
argument. This is just for the sake of completeness of the argument.

These are particularly meant for a scenario(d) such as below:

1. block the signals you don't care about.
2. syscall()
3. unblock the signals blocked in 1.

The problem here is that if there is a signal that is not blocked by 1
and such a signal is delivered between 1 and 2(since they are not
atomic), the syscall in 2 might block forever as it never found out
about the signal.

As per [a] and let's consider the case of epoll_pwait only first for simplicity.

As I said before, ep_poll() is what checks for signal_pending() and is
responsible for setting errno to -EINTR when there is a signal.

So if a signal is received after ep_poll() and ep_poll() returns
success, it is never noticed by the syscall during execution.
So the question is does the userspace have to know about this signal
or not. From scenario [d] above, I would say it should, even if all
the fd's completed successfully.
This does not happen in [a]. So this is what I said was already broken.

What [b] does is to move the signal check closer to the restoration of
the signal. This way it is good. So, if there is a signal after
ep_poll() returns success, it is noticed and the signal is delivered
when the syscall exits. But, the syscall error status itself is 0.

So now [c] is adjusting the return values based on whether extra
signals were detected after ep_poll(). This part was needed even for
[a].

Let me know if this clarifies things a bit.

-Deepa
Deepa Dinamani May 23, 2019, 8:41 p.m. UTC | #14
On Thu, May 23, 2019 at 11:06 AM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> Ok, since there has been quite a bit of argument here, I will
> backtrack a little bit and maybe it will help us understand what's
> happening here.
> There are many scenarios being discussed on this thread:
> a. State of code before 854a6ed56839a
> b. State after 854a6ed56839a
> c. Proposed fix as per the patchset in question.
>
> Oleg, I will discuss these first and then we can discuss the
> additional changes you suggested.
>
> Some background on why we have these syscalls that take sigmask as an
> argument. This is just for the sake of completeness of the argument.
>
> These are particularly meant for a scenario(d) such as below:
>
> 1. block the signals you don't care about.
> 2. syscall()
> 3. unblock the signals blocked in 1.
>
> The problem here is that if there is a signal that is not blocked by 1
> and such a signal is delivered between 1 and 2(since they are not
> atomic), the syscall in 2 might block forever as it never found out
> about the signal.
>
> As per [a] and let's consider the case of epoll_pwait only first for simplicity.
>
> As I said before, ep_poll() is what checks for signal_pending() and is
> responsible for setting errno to -EINTR when there is a signal.
>
> So if a signal is received after ep_poll() and ep_poll() returns
> success, it is never noticed by the syscall during execution.
> So the question is does the userspace have to know about this signal
> or not. From scenario [d] above, I would say it should, even if all
> the fd's completed successfully.
> This does not happen in [a]. So this is what I said was already broken.
>
> What [b] does is to move the signal check closer to the restoration of
> the signal. This way it is good. So, if there is a signal after
> ep_poll() returns success, it is noticed and the signal is delivered
> when the syscall exits. But, the syscall error status itself is 0.
>
> So now [c] is adjusting the return values based on whether extra
> signals were detected after ep_poll(). This part was needed even for
> [a].
>
> Let me know if this clarifies things a bit.

Just adding a little more clarification, there is an additional change
between [a] and [b].
As per [a] we would just restore the signal instead of changing the
saved_sigmask and the signal could get delivered right then. [b]
changes this to happen at syscall exit:

void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{

	<snip>

	/*
	 * When signals are pending, do not restore them here.
	 * Restoring sigmask here can lead to delivering signals that the above
	 * syscalls are intended to block because of the sigmask passed in.
	 */
	if (signal_pending(current)) {
		current->saved_sigmask = *sigsaved;
		set_restore_sigmask();
		return;
	}

	<snip>
}

-Deepa
Deepa Dinamani May 23, 2019, 9:06 p.m. UTC | #15
> Just adding a little more clarification, there is an additional change
> between [a] and [b].
> As per [a] we would just restore the signal instead of changing the
> saved_sigmask and the signal could get delivered right then. [b]
> changes this to happen at syscall exit:

Rewording the above, as there seem to be a few misrepresentations:

Just adding a little more clarification, there is an additional change
between [a] and [b].
As per [a] we would just restore the signal mask instead of changing
the saved_sigmask and even the blocked signals could get delivered
right then. [b] changes the restoration to happen at syscall exit:

> void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> {
>
> 	<snip>
>
> 	/*
> 	 * When signals are pending, do not restore them here.
> 	 * Restoring sigmask here can lead to delivering signals that the above
> 	 * syscalls are intended to block because of the sigmask passed in.
> 	 */
> 	if (signal_pending(current)) {
> 		current->saved_sigmask = *sigsaved;
> 		set_restore_sigmask();
> 		return;
> 	}
>
> 	<snip>
> }

 -Deepa
David Laight May 24, 2019, 9:58 a.m. UTC | #16
From: Deepa Dinamani
> Sent: 23 May 2019 19:07
> Ok, since there has been quite a bit of argument here, I will
> backtrack a little bit and maybe it will help us understand what's
> happening here.
> There are many scenarios being discussed on this thread:
> a. State of code before 854a6ed56839a
> b. State after 854a6ed56839a
> c. Proposed fix as per the patchset in question.
> 
> Oleg, I will discuss these first and then we can discuss the
> additional changes you suggested.
> 
> Some background on why we have these syscalls that take sigmask as an
> argument. This is just for the sake of completeness of the argument.
> 
> These are particularly meant for a scenario(d) such as below:
> 
> 1. block the signals you don't care about.
> 2. syscall()
> 3. unblock the signals blocked in 1.
> 
> The problem here is that if there is a signal that is not blocked by 1
> and such a signal is delivered between 1 and 2(since they are not
> atomic), the syscall in 2 might block forever as it never found out
> about the signal.

I think we all agree about the underlying problem these system calls solve.

> As per [a] and let's consider the case of epoll_pwait only first for simplicity.
For simplicity you ought to consider sigwaitinfo() :-)

> As I said before, ep_poll() is what checks for signal_pending() and is
> responsible for setting errno to -EINTR when there is a signal.

Ah, therein lies the problem (well, one of them).
ep_poll() should only return -EINTR if its sleep (waiting for an fd to
be ready) is interrupted.
The signal handler(s) should still be called though.
If the timeout is 0 then any signal handler should be called, but the
return value is still 0 (if no fd are 'ready').

> So if a signal is received after ep_poll() and ep_poll() returns
> success, it is never noticed by the syscall during execution.
> So the question is does the userspace have to know about this signal
> or not. From scenario [d] above, I would say it should, even if all
> the fd's completed successfully.

What is scenario [d]? You've code versions a/b/c but no scenarios.

> This does not happen in [a]. So this is what I said was already broken.
> 
> What [b] does is to move the signal check closer to the restoration of
> the signal. This way it is good. So, if there is a signal after
> ep_poll() returns success, it is noticed and the signal is delivered
> when the syscall exits. But, the syscall error status itself is 0.

By 0 you mean >= 0 ??

> So now [c] is adjusting the return values based on whether extra
> signals were detected after ep_poll(). This part was needed even for
> [a].

IMHO The return value should never be changed.
Much like write() can return a partial length if a signal happens.

ISTM that a user signal handler should be scheduled to be run
if the signal is pending and not masked.
On return to user all scheduled signal handlers are called
(regardless as to whether the signal is masked at that time).
This might mean getting the 'return to user' code to restore
the original signal mask saved for epoll_pwait() and pselect() etc.

If, for some perverted reason (compatibility with broken apps),
you need epoll_pwait() to return EINTR instead of 0 then you
probably need a special 'kernel internal' errno value that
is always converted by the syscall exit code to EINTR/0
(and a second one that is EINTR/EAGAIN etc).

	David

Oleg Nesterov May 24, 2019, 1:29 p.m. UTC | #17
It seems that we all are just trying to confuse each other. I got lost.

On 05/23, David Laight wrote:
>
> From: Oleg Nesterov
> > Sent: 23 May 2019 17:36
> > On 05/23, David Laight wrote:
> > >
> > > From: Oleg Nesterov
> > > > On 05/23, David Laight wrote:
> ...
> > > > Not sure I understand... OK, suppose that you do
> > > >
> > > > 	block-all-signals;
> > > > 	ret = pselect(..., sigmask(SIG_URG));
> > > >
> > > > if it returns success/timeout then the handler for SIG_URG should not be called?
> > >
> > > Ugg...
> > > Posix probably allows the signal handler be called at the point the event
> > > happens rather than being deferred until the system call completes.
> > > Queueing up the signal handler to be run at a later time (syscall exit)
> > > certainly makes sense.
> > > Definitely safest to call the signal handler even if success/timeout
> > > is returned.
> >
> > Why?
> >
> > > pselect() exists to stop the entry race, not the exit one.
> >
> > pselect() has to block SIG_URG again before it returns to user-mode, right?
>
> Yep.
> So the signal handler can't be called for a signal that happens after
> pselect() returns.

Yes. And "after pselect() returns" actually means "after pselect() restores
the old sigmask while it returns to user mode".

> > Suppose pselect() finds a ready fd, and this races with SIG_URG.
>
> You mean if SIG_URG is raised after a ready fd is found (or even timeout)?
> So the return value isn't EINTR.

Yes.

> (If an fd is readable on entry, the SIG_URG could have happened much earlier.)

Why not? See the pseudo code above. It was blocked before pselect() was called.
So SIG_URG can be already pending when pselect() is called but since an fd is
already ready on entry pselect() restores the old sigmask (and thus blocks SIG_URG
again) and returns success. The handler is not called.

However, if there is no ready fd, pselect won't block. It will notice SIG_URG,
deliver this signal, and return -EINTR.
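
The "no ready fd" case can be checked from userspace. Below is a minimal sketch, not kernel code: it assumes a single-threaded caller, and the names on_urg() and pselect_with_pending_sigurg() are made up for illustration. SIGURG is blocked, raised so that it stays pending, and then pselect() is called with no fds and a mask that unblocks SIGURG. The 5 second timeout is only a safety net in case the assumption does not hold.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t urg_seen;

/* Hypothetical handler: only records that SIGURG was delivered. */
static void on_urg(int sig)
{
	(void)sig;
	urg_seen = 1;
}

/*
 * Block SIGURG, make it pending, then call pselect() with no fds and a
 * mask that unblocks SIGURG. Returns pselect()'s return value; *err gets
 * errno and *handler_ran tells whether on_urg() was invoked.
 */
int pselect_with_pending_sigurg(int *err, int *handler_ran)
{
	struct sigaction sa;
	sigset_t block, during, old;
	struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_urg;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGURG, &sa, NULL);

	/* "block-all-signals" from the pseudo code, reduced to SIGURG. */
	sigemptyset(&block);
	sigaddset(&block, SIGURG);
	sigprocmask(SIG_BLOCK, &block, &old);

	urg_seen = 0;
	raise(SIGURG);		/* stays pending: SIGURG is blocked */

	/* Mask in effect during pselect(): the old one minus SIGURG. */
	during = old;
	sigdelset(&during, SIGURG);

	ret = pselect(0, NULL, NULL, NULL, &timeout, &during);
	*err = errno;
	*handler_ran = urg_seen;

	sigprocmask(SIG_SETMASK, &old, NULL);
	return ret;
}
```

With nothing to wait for, the call is expected to return -1 with errno set to EINTR without sleeping, and the handler to have run by the time it returns.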


> > Why do you think the handler should run?
>
> Think of the application code loop.
> Consider what happens if the signal is SIG_INT - to request the program
> stop.

SIG_INT or SIG_URG ? Again, please look at the pseudo code above. SIG_INT is
blocked and never unblocked.

> After every pselect() call the application looks to see if the handler
> has been called.
> If one of the fds is always readable pselect() will never return EINTR
> but you want the SIG_INT handler run so that the loop gets terminated.
> If you only call the signal handler when EINTR is returned the process
> will never stop.
> So you need to call the handler even when pselect() succeeds/time out.

Then do not block SIG_INT ?

	block-all-signals-except-SIG_INT;
	ret = pselect(..., sigmask{SIG_URG, SIG_INT});
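
A minimal userspace sketch of such a loop, assuming SIGINT is simply left unblocked outside pselect() as suggested here. The names run_loop(), on_int() and the raise_after parameter are illustrative only; the always-readable fd is modelled with a pipe holding one unread byte, and the iteration cap is just a safety net.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested;

/* Hypothetical handler: asks the loop to terminate. */
static void on_int(int sig)
{
	(void)sig;
	stop_requested = 1;
}

/*
 * Loop on pselect() over an fd that is always readable. SIGINT is never
 * blocked, so its handler runs as soon as the signal is raised, whatever
 * pselect() returns. Returns the number of completed iterations, or -1
 * on setup failure.
 */
int run_loop(int raise_after)
{
	struct sigaction sa;
	sigset_t outside, during, old;
	int pfd[2];
	int iterations = 0;

	if (pipe(pfd) < 0)
		return -1;
	if (write(pfd[1], "x", 1) != 1) {	/* never read: always ready */
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_int;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGINT, &sa, NULL);

	/* block-all-signals-except-SIGINT */
	sigfillset(&outside);
	sigdelset(&outside, SIGINT);
	sigprocmask(SIG_SETMASK, &outside, &old);

	/* During pselect(): additionally unblock SIGURG. */
	during = outside;
	sigdelset(&during, SIGURG);

	stop_requested = 0;
	while (!stop_requested && iterations < 1000) {
		fd_set rfds;

		FD_ZERO(&rfds);
		FD_SET(pfd[0], &rfds);
		if (pselect(pfd[0] + 1, &rfds, NULL, NULL, NULL, &during) < 0 &&
		    errno != EINTR)
			break;

		iterations++;
		if (iterations == raise_after)
			raise(SIGINT);	/* unblocked: handler runs right here */
	}

	sigprocmask(SIG_SETMASK, &old, NULL);
	close(pfd[0]);
	close(pfd[1]);
	return iterations;
}
```

Because SIGINT is never blocked in this sketch, the loop stops on the iteration in which the signal is raised even though pselect() never reports EINTR.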


> > What if SIG_URG comes right after pselect() blocks SIG_URG again? I mean,
> > how this differs the case when it comes before, but a ready fd was already
> > found?
>
> I suspect you need to defer the re-instatement of the original mask
> to the code that calls the signal handlers (which probably should
> be called with the programs signal mask).

This is what the kernel does when the signal is delivered, the original mask
is restored after the signal handler runs.

> So that particular window doesn't exist.

Which window???

Oleg.
Oleg Nesterov May 24, 2019, 2:10 p.m. UTC | #18
On 05/23, Deepa Dinamani wrote:
>
> Ok, since there has been quite a bit of argument here, I will
> backtrack a little bit and maybe it will help us understand what's
> happening here.
> There are many scenarios being discussed on this thread:
> a. State of code before 854a6ed56839a

I think everything was correct,

> b. State after 854a6ed56839a

obviously buggy,

> c. Proposed fix as per the patchset in question.

Nack, sorry. I'll try to finish my patch on Monday. It will restore the state
before 854a6ed56839a and (imo) cleanup/simplify this code.

At least this is what I think right now. Maybe I will have to change my mind
after this discussion. But in any case I can't believe I will ever agree with
your fix ;)

> These are particularly meant for a scenario(d) such as below:
>
> 1. block the signals you don't care about.
> 2. syscall()
> 3. unblock the signals blocked in 1.
>
> The problem here is that if there is a signal that is not blocked by 1
> and such a signal is delivered between 1 and 2(since they are not
> atomic), the syscall in 2 might block forever as it never found out
> about the signal.

and that is why we have pselect/etc to make this sequence "atomic".

> As per [a] and let's consider the case of epoll_pwait only first for simplicity.
>
> As I said before, ep_poll() is what checks for signal_pending() and is
> responsible for setting errno to -EINTR when there is a signal.

To clarify, if do_epoll_wait() returns -EINTR then signal_pending() is true,
right?

> So if a signal is received after ep_poll() and ep_poll() returns
> success, it is never noticed by the syscall during execution.

What you are saying looks very confusing to me, I will assume that you
meant something like

	- a signal SIG_XXX was blocked before sys_epoll_pwait() was called

	- sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask

	- sys_epoll_pwait() calls do_epoll_wait() which returns success

	- SIG_XXX comes after that and it is "never noticed"

Yes. Everything is correct. And see my reply to David, SIG_XXX can even
come _before_ sys_epoll_pwait() was called.

> So the question is does the userspace have to know about this signal
> or not.

If userspace needs to know about SIG_XXX it should not block it, that is all.

> What [b] does is to move the signal check closer to the restoration of
> the signal.

FOR NO REASON, afaics (to simplify, let's forget the problem with the wrong
return value you are trying to fix).

And even if there were ANY reason to do this, note that (with or without this
fix) the signal_pending() check inside restore_user_sigmask() can NOT help,
simply because SIG_XXX can come right after this check.

Oleg.
Oleg Nesterov May 24, 2019, 2:19 p.m. UTC | #19
On 05/23, Deepa Dinamani wrote:
>
> 1. block the signals you don't care about.
> 2. syscall()
> 3. unblock the signals blocked in 1.

And even this part of your email is very confusing, because in this case
we can never miss a signal. I'd say

	1. block the signals you don't care about
	2. unblock the signals which should interrupt the syscall below
	3. syscall()
	4. block the signals unblocked in 2.
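
The window between steps 2 and 3 can be made visible with an open-coded version of this sequence. This is only a sketch under assumptions: a single-threaded caller, SIGURG standing in for "the signal which should interrupt the syscall", and the made-up names on_sig() and open_coded_wait(). To keep it deterministic the signal is raised while still blocked, so it is delivered at the moment step 2 unblocks it.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sig_seen;

/* Hypothetical handler: only records that the signal was delivered. */
static void on_sig(int sig)
{
	(void)sig;
	sig_seen = 1;
}

/*
 * Open-coded "unblock, syscall, block again" using sigprocmask() and
 * poll(). Returns poll()'s return value; *handler_ran tells whether the
 * handler was invoked.
 */
int open_coded_wait(int timeout_ms, int *handler_ran)
{
	struct sigaction sa;
	struct pollfd unused;	/* poll() is called with nfds == 0 */
	sigset_t set, old;
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sig;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGURG, &sa, NULL);

	/* Start with SIGURG blocked and already pending. */
	sigemptyset(&set);
	sigaddset(&set, SIGURG);
	sigprocmask(SIG_BLOCK, &set, &old);
	sig_seen = 0;
	raise(SIGURG);

	/* Step 2: unblock. The pending signal is delivered here. */
	sigprocmask(SIG_UNBLOCK, &set, NULL);

	/* Step 3: the syscall. Nothing is left to interrupt it. */
	memset(&unused, 0, sizeof(unused));
	ret = poll(&unused, 0, timeout_ms);

	/* Step 4: block again, then restore the caller's mask. */
	sigprocmask(SIG_BLOCK, &set, NULL);
	sigprocmask(SIG_SETMASK, &old, NULL);

	*handler_ran = sig_seen;
	return ret;
}
```

Here the handler runs, but poll() still sleeps for the whole timeout and returns 0; with an infinite timeout it would not return at all. Passing the mask to the syscall itself closes that window.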

Oleg.
Deepa Dinamani May 24, 2019, 2:29 p.m. UTC | #20
I think you are misunderstanding what I said. You are taking things
out of context. I was saying here what I did was inspired by why the
syscall was designed to begin with. The syscall below refers to
epoll_wait and not epoll_pwait.

-Deepa

On Fri, May 24, 2019 at 7:19 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/23, Deepa Dinamani wrote:
> >
> > 1. block the signals you don't care about.
> > 2. syscall()
> > 3. unblock the signals blocked in 1.
>
> and even this part of your email is very confusing. because in this case
> we can never miss a signal. I'd say
>
>         1. block the signals you don't care about
>         2. unblock the signals which should interrupt the syscall below
>         3. syscall()
>         4. block the signals unblocked in 2.
>
> Oleg.
>
Oleg Nesterov May 24, 2019, 2:51 p.m. UTC | #21
On 05/24, Deepa Dinamani wrote:
>
> I think you are misunderstanding what I said.

Probably. Everything was very confusing to me from the very beginning.
And yes, I can hardly understand your emails, sorry. This one too :/

> You are taking things
> out of context. I was saying here what I did was inspired by why the
> syscall was designed to begin with.

which syscall?

> The syscall below refers to
> epoll_wait and not epoll_pwait.

So you tried to explain why epoll_pwait() was designed? Or what?

Either way, everything I said below still looks right to me. This probably
means that I still can't understand you.

But this is irrelevant. My main point is that the kernel was correct before
854a6ed568 ("signal: Add restore_user_sigmask()"), the (incomplete) patch I sent
tries to a) restore the correct behaviour and b) simplify/cleanup the code.

> On Fri, May 24, 2019 at 7:19 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 05/23, Deepa Dinamani wrote:
> > >
> > > 1. block the signals you don't care about.
> > > 2. syscall()
> > > 3. unblock the signals blocked in 1.
> >
> > and even this part of your email is very confusing. because in this case
> > we can never miss a signal. I'd say
> >
> >         1. block the signals you don't care about
> >         2. unblock the signals which should interrupt the syscall below
> >         3. syscall()
> >         4. block the signals unblocked in 2.
> >
> > Oleg.
> >
David Laight May 24, 2019, 2:59 p.m. UTC | #22
From: Oleg Nesterov
> Sent: 24 May 2019 14:29
> It seems that we all are just trying to confuse each other. I got lost.

I'm always lost :-)

> On 05/23, David Laight wrote:
> >
> > From: Oleg Nesterov
> > > Sent: 23 May 2019 17:36
> > > On 05/23, David Laight wrote:
> > > >
> > > > From: Oleg Nesterov
> > > > > On 05/23, David Laight wrote:
> > ...
> > > > > Not sure I understand... OK, suppose that you do
> > > > >
> > > > > 	block-all-signals;
> > > > > 	ret = pselect(..., sigmask(SIG_URG));
> > > > >
> > > > > if it returns success/timeout then the handler for SIG_URG should not be called?
> > > >
> > > > Ugg...
> > > > Posix probably allows the signal handler be called at the point the event
> > > > happens rather than being deferred until the system call completes.
> > > > Queueing up the signal handler to be run at a later time (syscall exit)
> > > > certainly makes sense.
> > > > Definitely safest to call the signal handler even if success/timeout
> > > > is returned.
> > >
> > > Why?
> > >
> > > > pselect() exists to stop the entry race, not the exit one.
> > >
> > > pselect() has to block SIG_URG again before it returns to user-mode, right?
> >
> > Yep.
> > So the signal handler can't be called for a signal that happens after
> > pselect() returns.
> 
> Yes. And "after pselect() returns" actually means "after pselect() restores
> the old sigmask while it returns to user mode".
> 
> > > Suppose pselect() finds a ready fd, and this races with SIG_URG.
> >
> > You mean if SIG_URG is raised after a ready fd is found (or even timeout)?
> > So the return value isn't EINTR.
> 
> Yes.
> 
> > (If an fd is readable on entry, the SIG_URG could have happened much earlier.)
> 
> Why not? See the pseudo code above. It was blocked before pselect() was called.
> So SIG_URG can be already pending when pselect() is called but since an fd is
> already ready on entry pselect() restores the old sigmask (and thus blocks SIG_URG
> again) and returns success. The handler is not called.
> 
> However, if there is no a ready fd, pselect won't block. It will notice SIG_URG,
> deliver this signal, and return -EINTR.

To my mind changing the signal mask should be enough to get a masked
signal handler called - even if the mask is reset before the syscall exits.
There shouldn't be any need for an interruptible wait to be interrupted.

I suspect that if you send a signal to a process that is looping
in userspace (on a different) the signal handler is called on the next
exit to userspace regardless as to whether the kernel blocks.

epoll and pselect shouldn't be any different.
Having the signal unmasked at any time should be enough to get it called.

...
> > > What if SIG_URG comes right after pselect() blocks SIG_URG again? I mean,
> > > how this differs the case when it comes before, but a ready fd was already
> > > found?
> >
> > I suspect you need to defer the re-instatement of the original mask
> > to the code that calls the signal handlers (which probably should
> > be called with the programs signal mask).
> 
> This is what the kernel does when the signal is delivered, the original mask
> is restored after the signal handler runs.

I'd have thought that the original signal mask (all blocked in the examples)
should be restored before the signal handler is called.
After all, the signal handler is allowed to modify the process's signal mask.
I've had horrid thoughts about SIG_SUSPEND :-)

	David

David Laight May 24, 2019, 3:09 p.m. UTC | #23
From: David Laight
> Sent: 24 May 2019 16:00
...
> I've had horrid thoughts about SIG_SUSPEND :-)

Not to mention exiting signal handlers with longjmp().

	David

Deepa Dinamani May 24, 2019, 3:16 p.m. UTC | #24
On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/23, Deepa Dinamani wrote:
> >
> > Ok, since there has been quite a bit of argument here, I will
> > backtrack a little bit and maybe it will help us understand what's
> > happening here.
> > There are many scenarios being discussed on this thread:
> > a. State of code before 854a6ed56839a
>
> I think everything was correct,

There were 2 things that were wrong:

1. If an unblocked signal was received, after the ep_poll(), then the
return status did not indicate that. This is expected behavior
according to the man page. If this is indeed what is expected then the man
page should note that the signal will be delivered in this case and the
return code will still be 0.

"EINTR
The call was interrupted by a signal handler before either any of the
requested events occurred or the timeout expired; see signal(7)."

2. The restoring of the sigmask is done right in the syscall part and
not while exiting the syscall and if you get a blocked signal here,
you will deliver this to userspace.

> > b. State after 854a6ed56839a
>
> obviously buggy,

Ok, then can you point out what specifically was wrong with
854a6ed56839a? And, not how it could be more simple?

> > c. Proposed fix as per the patchset in question.
>
> > As per [a] and let's consider the case of epoll_pwait only first for simplicity.
> >
> > As I said before, ep_poll() is what checks for signal_pending() and is
> > responsible for setting errno to -EINTR when there is a signal.
>
> To clarify, if do_epoll_wait() return -EINTR then signal_pending() is true,
> right?

Yes, the case I'm talking about is when do_epoll_wait() returns 0 and
then you get a signal.

> > So if a signal is received after ep_poll() and ep_poll() returns
> > success, it is never noticed by the syscall during execution.
>
> What you are saying looks very confusing to me, I will assume that you
> meant something like
>
>         - a signal SIG_XXX was blocked before sys_epoll_pwait() was called
>
>         - sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask
>
>         - sys_epoll_pwait() calls do_epoll_wait() which returns success
>
>         - SIG_XXX comes after that and it is "never noticed"
>
> Yes. Everything is correct. And see my reply to David, SIG_XXX can even
> come _before_ sys_epoll_pwait() was called.

No, I'm talking about a signal that was not blocked.

> > So the question is does the userspace have to know about this signal
> > or not.
>
> If userspace needs to know about SIG_XXX it should not block it, that is all.

What should be the return value if a signal is detected after a fd completed?

> > What [b] does is to move the signal check closer to the restoration of
> > the signal.
>
> FOR NO REASON, afaics (to simplify, lets forget the problem with the wrong
> return value you are trying to fix).

As I already pointed out, the restoring of the sigmask is done during
the syscall and not while exiting the syscall and if you get a blocked
signal here, you will deliver this to userspace.

> And even if there were ANY reason to do this, note that (with or without this
> fix) the signal_pending() check inside restore_user_sigmask() can NOT help,
> simply because SIG_XXX can come right after this check.

As I already pointed out, we should probably make this sequence atomic.


-Deepa
Oleg Nesterov May 24, 2019, 3:44 p.m. UTC | #25
On 05/24, David Laight wrote:
>
> From: Oleg Nesterov
> > Sent: 24 May 2019 14:29
> > It seems that we all are just trying to confuse each other. I got lost.
>
> I'm always lost :-)

same here ;)

> To my mind changing the signal mask should be enough to get a masked
> signal handler called - even if the mask is reset before the syscall exits.

well, the kernel doesn't do this, and on purpose.

> There shouldn't be any need for an interruptible wait to be interrupted.

can't parse ;)

> I suspect that if you send a signal to a process that is looping
> in userspace (on a different) the signal handler is called on the next
> exit to userspace regardless as to whether the kernel blocks.
>
> epoll and pselect shouldn't be any different.

They differ exactly because they manipulate the blocked mask,

> Having the signal unmasked at any time should be enough to get it called.

No. The sigmask passed to pselect() tells the kernel which signals should
interrupt the syscall if it blocks. The fact that pselect() actually unblocks
a signal is just the internal implementation detail.
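A minimal userspace sketch of the case described here, assuming Linux with
glibc; the names usr1_seen, note_usr1 and pselect_with_pending_signal are made
up for illustration. SIGUSR1 is blocked and made pending before the call, no
fds are watched, and the mask passed to pselect() leaves SIGUSR1 unblocked so
it can interrupt the wait:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t usr1_seen;

static void note_usr1(int sig)
{
	(void)sig;
	usr1_seen = 1;
}

/*
 * Returns the pselect() return value. *saved_errno receives errno as seen
 * right after the call and *handler_ran tells whether the handler ran.
 * Returns -2 if the setup itself failed.
 */
int pselect_with_pending_signal(int *saved_errno, int *handler_ran)
{
	struct sigaction sa;
	sigset_t block, old, wait_mask;
	struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -2;

	/* Block SIGUSR1 for normal execution. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
		return -2;

	/* The signal is now pending but cannot be delivered yet. */
	usr1_seen = 0;
	if (raise(SIGUSR1) != 0)
		return -2;

	/* Mask for the duration of pselect(): the old mask minus SIGUSR1. */
	wait_mask = old;
	sigdelset(&wait_mask, SIGUSR1);

	/* No fds are watched, so the only way out is the signal or the timeout. */
	ret = pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
	*saved_errno = errno;
	*handler_ran = usr1_seen;
	return ret;
}
```

With nothing ready, the call is expected to fail with EINTR after the handler
has run. The sketch deliberately does not cover the case where an fd is
already ready, since that is the behaviour under discussion in this thread.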

> > > I suspect you need to defer the re-instatement of the original mask
> > > to the code that calls the signal handlers (which probably should
> > > be called with the programs signal mask).
> >
> > This is what the kernel does when the signal is delivered, the original mask
> > is restored after the signal handler runs.
>
> I'd have thought that the original signal mask (all blocked in the examples)
> should be restored before the signal handler is called.

No. And this means that if you have 2 pending signals, they both will be delivered.
Unless of course sigaction->sa_mask includes the 2nd one.
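A minimal sketch of the two-pending-signals case, assuming Linux with glibc.
It uses sigsuspend(), which also replaces the signal mask for the duration of
the call, as a stand-in for pselect()/epoll_pwait(); handlers_run,
count_handler and deliver_two_pending are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handlers_run;

static void count_handler(int sig)
{
	(void)sig;
	handlers_run++;
}

/*
 * Blocks SIGUSR1 and SIGUSR2, makes both pending, then waits with a mask
 * that unblocks them. Returns the number of handler invocations observed
 * once sigsuspend() has returned, or -1 if the setup failed.
 */
int deliver_two_pending(void)
{
	struct sigaction sa;
	sigset_t both, old, wait_mask;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = count_handler;
	sigemptyset(&sa.sa_mask);	/* sa_mask does not include the other signal */
	if (sigaction(SIGUSR1, &sa, NULL) == -1 ||
	    sigaction(SIGUSR2, &sa, NULL) == -1)
		return -1;

	sigemptyset(&both);
	sigaddset(&both, SIGUSR1);
	sigaddset(&both, SIGUSR2);
	if (sigprocmask(SIG_BLOCK, &both, &old) == -1)
		return -1;

	handlers_run = 0;
	if (raise(SIGUSR1) != 0 || raise(SIGUSR2) != 0)
		return -1;

	/* Temporary mask: the old mask with both signals unblocked. */
	wait_mask = old;
	sigdelset(&wait_mask, SIGUSR1);
	sigdelset(&wait_mask, SIGUSR2);

	/* Returns -1/EINTR only after the handlers have run. */
	sigsuspend(&wait_mask);
	return handlers_run;
}
```

Both handlers run before the original mask is back in effect, because the
mask is restored only when the handlers return.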

> After all the signal handler is allowed to modify the processes signal mask.

only until the handler returns.

> I've had horrid thoughts about SIG_SUSPEND :-)

google knows nothing about SIG_SUSPEND, and neither do I ;)

Oleg.
Oleg Nesterov May 24, 2019, 3:46 p.m. UTC | #26
On 05/24, David Laight wrote:
>
> From: David Laight
> > Sent: 24 May 2019 16:00
> ...
> > I've had horrid thoughts about SIG_SUSPEND :-)
>
> Not to mention exiting signal handlers with longjmp().

that is why we have siglongjmp().
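A minimal sketch of what siglongjmp() buys, assuming Linux with glibc;
jump_env, jump_out_handler and mask_restored_after_siglongjmp are made-up
names. The handler runs with SIGUSR1 blocked, and leaving it through
siglongjmp() puts back the mask that sigsetjmp(env, 1) saved:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_env;

static void jump_out_handler(int sig)
{
	(void)sig;
	/* SIGUSR1 is blocked while this handler runs (no SA_NODEFER). */
	siglongjmp(jump_env, 1);
}

/*
 * Returns 1 if SIGUSR1 is unblocked again after jumping out of its handler,
 * 0 if it is still blocked, -1 if the setup failed.
 */
int mask_restored_after_siglongjmp(void)
{
	struct sigaction sa;
	sigset_t unblock, cur;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = jump_out_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	if (sigprocmask(SIG_UNBLOCK, &unblock, NULL) == -1)
		return -1;

	/* A non-zero second argument asks sigsetjmp() to save the signal mask. */
	if (sigsetjmp(jump_env, 1) == 0) {
		raise(SIGUSR1);		/* the handler jumps back to sigsetjmp() */
		return -1;		/* not reached when the handler jumps */
	}

	if (sigprocmask(SIG_SETMASK, NULL, &cur) == -1)
		return -1;
	return sigismember(&cur, SIGUSR1) == 0;
}
```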

Oleg.
Oleg Nesterov May 24, 2019, 4:33 p.m. UTC | #27
On 05/24, Deepa Dinamani wrote:
>
> On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 05/23, Deepa Dinamani wrote:
> > >
> > > Ok, since there has been quite a bit of argument here, I will
> > > backtrack a little bit and maybe it will help us understand what's
> > > happening here.
> > > There are many scenarios being discussed on this thread:
> > > a. State of code before 854a6ed56839a
> >
> > I think everything was correct,
>
> There were 2 things that were wrong:
>
> 1. If an unblocked signal was received, after the ep_poll(), then the
> return status did not indicate that.

Yes,

> This is expected behavior
> according to man page. If this is indeed what is expected then the man
> page should note that signal will be delivered in this case and return
> code will still be 0.
>
> "EINTR
> The call was interrupted by a signal handler before either any of the
> requested events occurred or the timeout expired; see signal(7)."

and what do you think the man page could say?

This is obviously possible for any syscall, and we can't avoid this. A signal
can come right after syscall insn completes. The signal handler will be called
but this won't change $rax, user-space can see return code == 0 or anything else.

And this doesn't differ from the case when the signal comes before syscall returns.

> 2. The restoring of the sigmask is done right in the syscall part and
> not while exiting the syscall and if you get a blocked signal here,
> you will deliver this to userspace.

So I assume that this time you are talking about epoll_pwait() and not epoll_wait()...

And I simply can't understand you. But yes, if the original mask doesn't include
the pending signal it will be delivered while the syscall can return success/timeout
or -EFAULT or anything.

This is correct, see above.

> > > b. State after 854a6ed56839a
> >
> > obviously buggy,
>
> Ok, then can you point out what specifically was wrong with
> 854a6ed56839a?

Cough. If nothing else the lost -EINTR?

> And, not how it could be more simple?

Well, I already sent the patch and after that I even showed you the code with the
patch applied. See https://lore.kernel.org/lkml/20190523143340.GA23070@redhat.com/

> > What you are saying looks very confusing to me, I will assume that you
> > meant something like
> >
> >         - a signal SIG_XXX was blocked before sys_epoll_pwait() was called
> >
> >         - sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask
> >
> >         - sys_epoll_pwait() calls do_epoll_wait() which returns success
> >
> >         - SIG_XXX comes after that and it is "never noticed"
> >
> > Yes. Everything is correct. And see my reply to David, SIG_XXX can even
> > come _before_ sys_epoll_pwait() was called.
>
> No, I'm talking about a signal that was not blocked.

OK, see above.

> > > So the question is does the userspace have to know about this signal
> > > or not.
> >
> > If userspace needs to know about SIG_XXX it should not block it, that is all.
>
> What should be the return value if a signal is detected after a fd completed?

Did you mean "if a signal is detected after a ready fd was already found" ?

In this case the return value should report success. But I am already lost,
this all looks irrelevant wrt the fix we need.

> > > What [b] does is to move the signal check closer to the restoration of
> > > the signal.
> >
> > FOR NO REASON, afaics (to simplify, lets forget the problem with the wrong
> > return value you are trying to fix).
>
> As I already pointed out, the restoring of the sigmask is done during
> the syscall and not while exiting the syscall and if you get a blocked
> signal here, you will deliver this to userspace.
>
> > And even if there were ANY reason to do this, note that (with or without this
> > fix) the signal_pending() check inside restore_user_sigmask() can NOT help,
> > simply because SIG_XXX can come right after this check.
>
> This I pointed out already that we should probably make this sequence atomic.

See above.

Oleg.
David Laight May 24, 2019, 4:40 p.m. UTC | #28
From: Oleg Nesterov
> Sent: 24 May 2019 16:44
> > To my mind changing the signal mask should be enough to get a masked
> > signal handler called - even if the mask is reset before the syscall exits.
> 
> well, the kernel doesn't do this, and on purpose.
> 
> > There shouldn't be any need for an interruptible wait to be interrupted.
> 
> can't parse ;)
> 
> > I suspect that if you send a signal to a process that is looping
> > in userspace (on a different) the signal handler is called on the next
> > exit to userspace regardless as to whether the kernel blocks.
> >
> > epoll and pselect shouldn't be any different.
> 
> They differ exactly because they manipulate the blocked mask,
> 
> > Having the signal unmasked at any time should be enough to get it called.
> 
> No. The sigmask passed to pselect() tells the kernel which signals should
> interrupt the syscall if it blocks. The fact that pselect() actually unblocks
> a signal is just the internal implementation detail.

If you take that line of reasoning the signal handler shouldn't be called
at all.

For pselect() (which ought to work the same way as epoll_pwait()) the
man page states that the current signal mask is replaced by the specified
one for the duration of the call - so you'd expect signal handlers to run
even if pselect() returns >= 0.

Consider a program that disables all signals at the top of main()
and then has a processing loop with epoll_pwait() (or pselect()) at the
top that enables a variety of signals.

It would be reasonable to expect that a signal handler would run
even if one of the fds was always 'ready'.

	David

Deepa Dinamani May 24, 2019, 5:01 p.m. UTC | #29
On Fri, May 24, 2019 at 9:33 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/24, Deepa Dinamani wrote:
> >
> > On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > On 05/23, Deepa Dinamani wrote:
> > > >
> > > > Ok, since there has been quite a bit of argument here, I will
> > > > backtrack a little bit and maybe it will help us understand what's
> > > > happening here.
> > > > There are many scenarios being discussed on this thread:
> > > > a. State of code before 854a6ed56839a
> > >
> > > I think everything was correct,
> >
> > There were 2 things that were wrong:
> >
> > 1. If an unblocked signal was received, after the ep_poll(), then the
> > return status did not indicate that.
>
> Yes,
>
> > This is expected behavior
> > according to man page. If this is indeed what is expected then the man
> > page should note that signal will be delivered in this case and return
> > code will still be 0.
> >
> > "EINTR
> > The call was interrupted by a signal handler before either any of the
> > requested events occurred or the timeout expired; see signal(7)."
>
> and what do you think the man page could say?

Maybe clarify that a signal handler can be invoked even if the syscall
return indicates a success.

Maybe a crude userspace application could do something like this:

sig_handler()
{
        set global abort = 1
}

poll_the_fds()
{
        ret = epoll_pwait()
        if (ret)
                return ret
        if (abort)
                # but this abort should be ignored if ret was 0.
                return try_again
}

> This is obviously possible for any syscall, and we can't avoid this. A signal
> can come right after syscall insn completes. The signal handler will be called
> but this won't change $rax, user-space can see return code == 0 or anything else.
>
> And this doesn't differ from the case when the signal comes before syscall returns.

But, these syscalls are depending on these signals. I would assume for
the purpose of these syscalls that the execution is done when we
updated the saved_sigmask. We can pick a different point per syscall
like ep_poll() also, but then we need to probably make it clear for
each such syscall.

> > 2. The restoring of the sigmask is done right in the syscall part and
> > not while exiting the syscall and if you get a blocked signal here,
> > you will deliver this to userspace.
>
> So I assume that this time you are talking about epoll_pwait() and not epoll_wait()...

Yes.

> And I simply can't understand you. But yes, if the original mask doesn't include
> the pending signal it will be delivered while the syscall can return success/timout
> or -EFAULT or anything.
>
> This is correct, see above.

Look at the code before 854a6ed56839a:

       /*
        * If we changed the signal mask, we need to restore the original one.
        * In case we've got a signal while waiting, we do not restore the
        * signal mask yet, and we allow do_signal() to deliver the signal on
        * the way back to userspace, before the signal mask is restored.
        */
       if (sigmask) {
               ####### This err has not been changed since ep_poll()
               ####### So if there is a signal before this point, but err = 0, then we goto else.
               if (err == -EINTR) {
                       memcpy(&current->saved_sigmask, &sigsaved,
                              sizeof(sigsaved));
                       set_restore_sigmask();
               } else
                       ####### This is a problem if there is a signal pending that the sigmask should block.
                       ####### This is the whole reason we have current->saved_sigmask?
                       set_current_blocked(&sigsaved);
       }

> > > > b. State after 854a6ed56839a
> > >
> > > obviously buggy,
> >
> > Ok, then can you point out what specifically was wrong with
> > 854a6ed56839a?
>
> Cough. If nothing else the lost -EINTR?

This was my theory. My basis behind the theory was [1](the issue with
return value not being updated) above. And, you are saying this is ok.

854a6ed56839a also has timing differences compared to the original
code. So unless we are sure what was uncovered because of
854a6ed56839a, we might just be masking a pre-existing problem by
reverting it. So I think we should code review 854a6ed56839a and
figure out what is wrong programmatically before just reverting it.

> > And, not how it could be more simple?

Oh, I was not asking here. I was saying let's please discuss what's
wrong before simplifying the code.

-Deepa
Oleg Nesterov May 27, 2019, 3:04 p.m. UTC | #30
Deepa,

it seems that we both are saying the same things again and again, and we
simply can't understand each other.

I'll try to write another email to restart this discussion. Tomorrow, somehow
I can't wake up today.

And let me repeat, of course I can be wrong. IOW, it is not that I am trying
to blame you for all this confusion.

On 05/24, Deepa Dinamani wrote:
>
> > > Ok, then can you point out what specifically was wrong with
> > > 854a6ed56839a?
> >
> > Cough. If nothing else the lost -EINTR?
>
> This was my theory. My basis behind the theory was [1](the issue with
> return value not being updated) above. And, you are saying this is ok.

I agree that "the lost -EINTR" above was not clear. I'll try to clarify
what I think is not OK.

Oleg.
David Laight May 28, 2019, 9:02 a.m. UTC | #31
From: Deepa Dinamani
> Sent: 24 May 2019 18:02
...
> Maybe a crude userspace application could do something like this:
> 
> sig_handler()
> {
>   set global abort = 1
> }
> 
> poll_the_fds()
> {
>         ret = epoll_pwait()
>         if (ret)
>                 return ret
>         if (abort)
>                 // but this abort should be ignored if ret was 0.
>                 return try_again
> 
> }

As an application writer I'd want to see 'abort == 1' even
if epoll_pwait() returned that an fd was 'ready'.

So the code above should probably be:
    wait_again:
        ret = epoll_pwait();
        if (abort)
            process_abort();
        if (ret <= 0) {
            if (ret == 0)
                process_timeout();
            if (ret == 0 || errno == EINTR)
                goto wait_again;
            // Something went horribly wrong in epoll_pwait()
            return ret;
        }
        // process the 'ready' fds

It would not be unreasonable for the application to have
all signals blocked except during the epoll_pwait().
So the application needs the signal handler for SIG_INT (etc)
to be called even if one of the fds is always 'ready'.
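A minimal sketch of one iteration of such a loop, assuming Linux with glibc.
The names abort_flag, abort_handler, setup_abort_signal and wait_once are
hypothetical. The abort flag is reported to the caller regardless of what
epoll_pwait() returned, so the caller can act on it before looking at the
return value:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>

static volatile sig_atomic_t abort_flag;

static void abort_handler(int sig)
{
	(void)sig;
	abort_flag = 1;
}

/*
 * Installs the handler, blocks signo for normal execution and fills
 * *wait_mask with the mask to use during epoll_pwait() (the previous
 * mask with signo unblocked). Returns 0 on success, -1 on failure.
 */
int setup_abort_signal(int signo, sigset_t *wait_mask)
{
	struct sigaction sa;
	sigset_t block;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = abort_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) == -1)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, signo);
	if (sigprocmask(SIG_BLOCK, &block, wait_mask) == -1)
		return -1;
	return sigdelset(wait_mask, signo);
}

/*
 * One loop iteration. Returns what epoll_pwait() returned, with errno
 * preserved, and reports through *abort_seen whether the handler ran.
 */
int wait_once(int epfd, const sigset_t *wait_mask, int timeout_ms,
	      struct epoll_event *ev, int *abort_seen)
{
	int ret = epoll_pwait(epfd, ev, 1, timeout_ms, wait_mask);
	int saved_errno = errno;

	/*
	 * The signal is blocked again at this point, so the handler cannot
	 * run between reading the flag and clearing it.
	 */
	*abort_seen = abort_flag;
	abort_flag = 0;

	errno = saved_errno;
	return ret;
}
```

A caller would check *abort_seen first, then treat a return of 0 as a timeout,
-1 with EINTR as a reason to wait again, and a positive value as ready fds.
Whether the flag is ever seen set together with a positive return value
depends on the kernel behaviour being debated in this thread.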

    David

David Laight May 28, 2019, 9:12 a.m. UTC | #32
From: Deepa Dinamani
> Sent: 24 May 2019 18:02
...
> Look at the code before 854a6ed56839a:
> 
>   /*
>         * If we changed the signal mask, we need to restore the original one.
>         * In case we've got a signal while waiting, we do not restore the
>         * signal mask yet, and we allow do_signal() to deliver the signal on
>         * the way back to userspace, before the signal mask is restored.
>         */
>        if (sigmask) {
>               ####### This err has not been changed since ep_poll()
>               ####### So if there is a signal before this point, but
> err = 0, then we goto else.
>                if (err == -EINTR) {
>                        memcpy(&current->saved_sigmask, &sigsaved,
>                               sizeof(sigsaved));
>                        set_restore_sigmask();
>                } else
>                      ############ This is a problem if there is signal
> pending that is sigmask should block.
>                      ########### This is the whole reason we have
> current->saved_sigmask?
>                        set_current_blocked(&sigsaved);
>        }

What happens if all that crap is just deleted (I presume from the
bottom of ep_wait()) ?

I'm guessing that on the way back to userspace signal handlers for
signals enabled in the process's current mask (the one specified
to epoll_pwait) get called.
Then the signal mask is loaded from current->saved_sigmask and
any enabled signal handlers are called again.
No special code there that depends on the syscall result, errno
or the syscall number.

That seems exactly correct!

	David

Deepa Dinamani May 28, 2019, 11:37 a.m. UTC | #33
> On May 28, 2019, at 2:12 AM, David Laight <David.Laight@aculab.com> wrote:
> 
> From: Deepa Dinamani
>> Sent: 24 May 2019 18:02
> ...
>> Look at the code before 854a6ed56839a:
>> 
>> /*
>>       * If we changed the signal mask, we need to restore the original one.
>>       * In case we've got a signal while waiting, we do not restore the
>>       * signal mask yet, and we allow do_signal() to deliver the signal on
>>       * the way back to userspace, before the signal mask is restored.
>>       */
>>      if (sigmask) {
>>             ####### This err has not been changed since ep_poll()
>>             ####### So if there is a signal before this point, but
>> err = 0, then we goto else.
>>              if (err == -EINTR) {
>>                      memcpy(&current->saved_sigmask, &sigsaved,
>>                             sizeof(sigsaved));
>>                      set_restore_sigmask();
>>              } else
>>                    ############ This is a problem if there is signal
>> pending that is sigmask should block.
>>                    ########### This is the whole reason we have
>> current->saved_sigmask?
>>                      set_current_blocked(&sigsaved);
>>      }
> 
> What happens if all that crap is just deleted (I presume from the
> bottom of ep_wait()) ?

Hmm, you have to update the saved_sigmask or the sigmask.

> I'm guessing that on the way back to userspace signal handlers for
> signals enabled in the process's current mask (the one specified
> to epoll_pwait) get called.
> Then the signal mask is loaded from current->saved_sigmask and
> and enabled signal handlers are called again.

Who is saving this saved_sigmask that is being restored on the way back?

> No special code there that depends on the syscall result, errno
> of the syscall number.

I didn’t say this has anything to do with errno.

-Deepa
David Laight May 28, 2019, 12:04 p.m. UTC | #34
From: Deepa Dinamani
> Sent: 28 May 2019 12:38
> > On May 28, 2019, at 2:12 AM, David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Deepa Dinamani
> >> Sent: 24 May 2019 18:02
> > ...
> >> Look at the code before 854a6ed56839a:
> >>
> >> /*
> >>       * If we changed the signal mask, we need to restore the original one.
> >>       * In case we've got a signal while waiting, we do not restore the
> >>       * signal mask yet, and we allow do_signal() to deliver the signal on
> >>       * the way back to userspace, before the signal mask is restored.
> >>       */
> >>      if (sigmask) {
> >>             ####### This err has not been changed since ep_poll()
> >>             ####### So if there is a signal before this point, but
> >> err = 0, then we goto else.
> >>              if (err == -EINTR) {
> >>                      memcpy(&current->saved_sigmask, &sigsaved,
> >>                             sizeof(sigsaved));
> >>                      set_restore_sigmask();
> >>              } else
> >>                    ############ This is a problem if there is signal
> >> pending that is sigmask should block.
> >>                    ########### This is the whole reason we have
> >> current->saved_sigmask?
> >>                      set_current_blocked(&sigsaved);
> >>      }
> >
> > What happens if all that crap is just deleted (I presume from the
> > bottom of ep_wait()) ?
> 
> Hmm, you have to update the saved_sigmask or the sigmask.

Doesn't the top of ep_poll() do all that?
Something like copying the current sigmask to the saved_sigmask
and the user-supplied sigmask to the current sigmask.
Otherwise the sleeps won't be interruptible.

It is worth noting that both pselect() and epoll_pwait() differ
from sigwait() (and friends) which were (IIRC) the first system
calls to try to remove the timing windows associated with waiting
for signals.
sigwait() returns the signal number and removes it from the pending
mask - so the signal handler won't be called for the returned signal.
(Unless it wasn't actually masked!)
So the sigwait() code does have to restore the signal mask itself.
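A minimal sketch of the sigwait() behaviour described here, assuming Linux
with glibc; sigwait_handler_ran, sigwait_demo_handler, struct
sigwait_observation and sigwait_demo are made-up names. The signal is consumed
by sigwait() and the installed handler is never called:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigwait_handler_ran;

static void sigwait_demo_handler(int sig)
{
	(void)sig;
	sigwait_handler_ran = 1;
}

struct sigwait_observation {
	int signo;		/* signal number returned by sigwait() */
	int handler_ran;	/* non-zero if the handler was invoked */
	int still_pending;	/* non-zero if SIGUSR1 is still pending */
};

/* Returns 0 and fills *obs on success, -1 if any step failed. */
int sigwait_demo(struct sigwait_observation *obs)
{
	struct sigaction sa;
	sigset_t set, pending;
	int signo = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigwait_demo_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	/* The signal must be blocked, otherwise the handler would run. */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
		return -1;

	sigwait_handler_ran = 0;
	if (raise(SIGUSR1) != 0)
		return -1;

	/* sigwait() returns 0 on success and stores the signal number. */
	if (sigwait(&set, &signo) != 0)
		return -1;
	if (sigpending(&pending) == -1)
		return -1;

	obs->signo = signo;
	obs->handler_ran = sigwait_handler_ran;
	obs->still_pending = sigismember(&pending, SIGUSR1);
	return 0;
}
```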

> > I'm guessing that on the way back to userspace signal handlers for
> > signals enabled in the process's current mask (the one specified
> > to epoll_pwait) get called.
> > Then the signal mask is loaded from current->saved_sigmask and
> > and enabled signal handlers are called again.
> 
> Who is saving this saved_sigmask that is being restored on the way back?

It has to be saved when the user-supplied one is installed.

> > No special code there that depends on the syscall result, errno
> > of the syscall number.
> 
> I didn’t say this has anything to do with errno.

The code snippet above checks it ...

	David

Deepa Dinamani May 28, 2019, 8:47 p.m. UTC | #35
On Mon, May 27, 2019 at 8:04 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Deepa,
>
> it seems that we both are saying the same things again and again, and we
> simply can't understand each other.

Oleg, I'm sorry for the confusion.  Maybe I should point out what I
agree with also.

I agree that the signal handler being called and the return value not being
altered is an issue with other syscalls also. I was just wondering if
some userspace code would be relying on this. This is not a
kernel bug.

But, I do not think we have an understanding of what was wrong in
854a6ed56839a anymore since you pointed out that my assumption was not
correct that the signal handler being called without errno being set
is wrong.

One open question: this part of epoll_pwait was already broken before
854a6ed56839a. Do you agree?

if (err == -EINTR) {
                   memcpy(&current->saved_sigmask, &sigsaved,
                          sizeof(sigsaved));
                    set_restore_sigmask();
  } else
                   set_current_blocked(&sigsaved);

What to do next?
We could just see if your optimization patch resolves Eric's issue.
Or, I could revert the signal_pending() check and provide a fix
something like the one below (not a complete patch), since mainline has this
regression. Eric had tested that something like this also works. And, I can
continue to look at what was wrong with 854a6ed56839a in the first
place. Let me know what you prefer:

-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved, int sig_pending)
 {

        if (!usigmask)
               return;

        /*
         * When signals are pending, do not restore them here.
         * Restoring sigmask here can lead to delivering signals that the above
         * syscalls are intended to block because of the sigmask passed in.
         */
+       if (sig_pending) {
                current->saved_sigmask = *sigsaved;
                set_restore_sigmask();
               return;
           }

@@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,

        error = do_epoll_wait(epfd, events, maxevents, timeout);

-       restore_user_sigmask(sigmask, &sigsaved);
+       signal_detected = restore_user_sigmask(sigmask, &sigsaved, error == -EINTR);

-Deepa
Oleg Nesterov May 29, 2019, 4:57 p.m. UTC | #36
On 05/28, Deepa Dinamani wrote:
>
> I agree that signal handller being called and return value not being
> altered is an issue with other syscalls also. I was just wondering if
> some userspace code assumption would be assuming this. This is not a
> kernel bug.
>
> But, I do not think we have an understanding of what was wrong in
> 854a6ed56839a anymore since you pointed out that my assumption was not
> correct that the signal handler being called without errno being set
> is wrong.

Deepa, sorry, I simply can't parse the above... most probably because of
my bad English.

> One open question: this part of epoll_pwait was already broken before
> 854a6ed56839a. Do you agree?
>
> if (err == -EINTR) {
>                    memcpy(&current->saved_sigmask, &sigsaved,
>                           sizeof(sigsaved));
>                     set_restore_sigmask();
>   } else
>                    set_current_blocked(&sigsaved);

I do not understand why do you think this part was broken :/

> Or, I could revert the signal_pending() check and provide a fix
> something like below(not a complete patch)

...

> -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> +int restore_user_sigmask(const void __user *usigmask, sigset_t
> *sigsaved, int sig_pending)
>  {
> 
>         if (!usigmask)
>                return;
> 
>         /*
>          * When signals are pending, do not restore them here.
>          * Restoring sigmask here can lead to delivering signals that the above
>          * syscalls are intended to block because of the sigmask passed in.
>          */
> +       if (sig_pending) {
>                 current->saved_sigmask = *sigsaved;
>                 set_restore_sigmask();
>                return;
>            }
> 
> @@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> epoll_event __user *, events,
> 
>         error = do_epoll_wait(epfd, events, maxevents, timeout);
> 
> -       restore_user_sigmask(sigmask, &sigsaved);
> +       signal_detected = restore_user_sigmask(sigmask, &sigsaved,
> error == -EINTR);

I fail to understand this pseudo-code, sorry. In particular, do not understand
why restore_user_sigmask() needs to return a boolean.

The only thing I _seem to_ understand is the "sig_pending" flag passed by the
caller which replaces the signal_pending() check. Yes, this is what I think we
should do, and this is what I tried to propose from the very beginning in my
1st email in this thread.
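A toy userspace model of the difference, not kernel code. It assumes signals
can be represented as bits in a 64-bit word and ignores handler masks; every
name here (toy_mask, struct toy_task, toy_restore_signal_pending,
toy_restore_if_interrupted, toy_exit_to_user) is hypothetical. The first
helper mimics the signal_pending() check in restore_user_sigmask() from
854a6ed56839, the second the variant where the caller passes the flag:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_mask;	/* bit (n - 1) stands for signal n */

struct toy_task {
	toy_mask blocked;	/* models current->blocked */
	toy_mask saved_sigmask;	/* models current->saved_sigmask */
	toy_mask pending;	/* signals sent but not yet delivered */
	bool restore_sigmask;	/* models the "restore sigmask" flag */
};

/* Defer restoring the mask so pending signals are delivered first. */
static void toy_defer_restore(struct toy_task *t, toy_mask sigsaved)
{
	t->saved_sigmask = sigsaved;
	t->restore_sigmask = true;
}

/* Variant that decides by looking at whether any signal is deliverable. */
void toy_restore_signal_pending(struct toy_task *t, toy_mask sigsaved)
{
	if (t->pending & ~t->blocked)
		toy_defer_restore(t, sigsaved);
	else
		t->blocked = sigsaved;
}

/* Variant where the caller says whether the wait itself was interrupted. */
void toy_restore_if_interrupted(struct toy_task *t, toy_mask sigsaved,
				bool interrupted)
{
	if (interrupted)
		toy_defer_restore(t, sigsaved);
	else
		t->blocked = sigsaved;
}

/*
 * Models the return to userspace: deliver what is pending and unblocked,
 * then apply a deferred mask restore. Returns the delivered signals.
 */
toy_mask toy_exit_to_user(struct toy_task *t)
{
	toy_mask delivered = t->pending & ~t->blocked;

	t->pending &= ~delivered;
	if (t->restore_sigmask) {
		t->blocked = t->saved_sigmask;
		t->restore_sigmask = false;
	}
	return delivered;
}
```

In this model, a signal that arrives after the wait has already succeeded is
delivered by the first variant even though the syscall does not return -EINTR,
while the second variant restores the mask at once and leaves the signal
pending.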

Oleg.
Deepa Dinamani May 29, 2019, 6:42 p.m. UTC | #37
On Wed, May 29, 2019 at 9:57 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/28, Deepa Dinamani wrote:
> >
> > I agree that signal handller being called and return value not being
> > altered is an issue with other syscalls also. I was just wondering if
> > some userspace code assumption would be assuming this. This is not a
> > kernel bug.
> >
> > But, I do not think we have an understanding of what was wrong in
> > 854a6ed56839a anymore since you pointed out that my assumption was not
> > correct that the signal handler being called without errno being set
> > is wrong.
>
> Deepa, sorry, I simply can't parse the above... most probably because of
> my bad English.

OK, all I meant was that I had thought a signal handler being invoked
without the error value reflecting it was wrong. That is what I had
thought was wrong with 854a6ed56839a. Now that we agree that a signal
handler can be invoked while the syscall still returns success, I thought
I did not know what was wrong with 854a6ed56839a anymore.

But, you have now pointed out that the signals we care about should not be
delivered after an event is ready. This points to what was
wrong with 854a6ed56839a. Thanks.

> > One open question: this part of epoll_pwait was already broken before
> > 854a6ed56839a. Do you agree?
> >
> > if (err == -EINTR) {
> >                    memcpy(&current->saved_sigmask, &sigsaved,
> >                           sizeof(sigsaved));
> >                     set_restore_sigmask();
> >   } else
> >                    set_current_blocked(&sigsaved);
>
> I do not understand why do you think this part was broken :/

OK, given your other statement that the application does not need to
know about the signals it cares about once an event is ready, this is
also not a problem.

> > Or, I could revert the signal_pending() check and provide a fix
> > something like below(not a complete patch)
>
> ...
>
> > -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> > +int restore_user_sigmask(const void __user *usigmask, sigset_t
> > *sigsaved, int sig_pending)
> >  {
> >
> >         if (!usigmask)
> >                return;
> >
> >         /*
> >          * When signals are pending, do not restore them here.
> >          * Restoring sigmask here can lead to delivering signals that the above
> >          * syscalls are intended to block because of the sigmask passed in.
> >          */
> > +       if (sig_pending) {
> >                 current->saved_sigmask = *sigsaved;
> >                 set_restore_sigmask();
> >                return;
> >            }
> >
> > @@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> >
> >         error = do_epoll_wait(epfd, events, maxevents, timeout);
> >
> > -       restore_user_sigmask(sigmask, &sigsaved);
> > +       signal_detected = restore_user_sigmask(sigmask, &sigsaved,
> > error == -EINTR);
>
> I fail to understand this pseudo-code, sorry. In particular, do not understand
> why restore_user_sigmask() needs to return a boolean.

That was a remnant from the other patch. Return type needs to be void.

> The only thing I _seem to_ understand is the "sig_pending" flag passed by the
> caller which replaces the signal_pending() check.

Correct. This is the main change I was proposing.
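
To make the proposal concrete, here is a minimal user-space model of the
sig_pending variant, not kernel code. The names task_model,
restore_user_sigmask_model and epoll_pwait_tail_model are hypothetical,
and a 64-bit integer stands in for sigset_t. It only illustrates that
the caller, not signal_pending(), decides whether the mask restore is
deferred.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the task_struct state touched here. */
struct task_model {
	uint64_t blocked;        /* models current->blocked */
	uint64_t saved_sigmask;  /* models current->saved_sigmask */
	bool restore_sigmask;    /* models the flag set by set_restore_sigmask() */
};

/*
 * Models restore_user_sigmask(usigmask, sigsaved, sig_pending) with a
 * void return type. have_usigmask stands for "usigmask != NULL".
 */
static void restore_user_sigmask_model(struct task_model *t,
				       bool have_usigmask,
				       const uint64_t *sigsaved,
				       bool sig_pending)
{
	if (!have_usigmask)
		return;

	if (sig_pending) {
		/* Defer the restore until after the handlers have run. */
		t->saved_sigmask = *sigsaved;
		t->restore_sigmask = true;
		return;
	}

	/* The syscall is not reporting an interruption: restore now. */
	t->blocked = *sigsaved;
}

/* Models the tail of epoll_pwait(): only -EINTR defers the restore. */
static long epoll_pwait_tail_model(struct task_model *t, bool have_usigmask,
				   const uint64_t *sigsaved, long error)
{
	restore_user_sigmask_model(t, have_usigmask, sigsaved,
				   error == -EINTR);
	return error;
}
```

In this model a call that returns ready events restores the original
mask immediately, so a signal blocked by that mask stays pending instead
of being delivered alongside a successful return.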

> Yes, this is what I think we
> should do, and this is what I tried to propose from the very beginning in my
> 1st email in this thread.

It was not clear to me from your first response that you did not want
the signal_pending() check in restore_user_sigmask():
https://lore.kernel.org/lkml/20190522150505.GA4915@redhat.com/

"Ugh. I need to re-check, but at first glance I really dislike this change.

I think we can fix the problem _and_ simplify the code. Something like below.
The patch is obviously incomplete, it changes only one caller of
set_user_sigmask(), epoll_pwait() to explain what I mean.

restore_user_sigmask() should simply die. Although perhaps another helper
makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending)."

-Deepa

Patch

diff --git a/fs/aio.c b/fs/aio.c
index 3490d1fa0e16..ebd2b1980161 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2095,7 +2095,7 @@  SYSCALL_DEFINE6(io_pgetevents,
 	struct __aio_sigset	ksig = { NULL, };
 	sigset_t		ksigmask, sigsaved;
 	struct timespec64	ts;
-	int ret;
+	int ret, signal_detected;
 
 	if (timeout && unlikely(get_timespec64(&ts, timeout)))
 		return -EFAULT;
@@ -2108,8 +2108,8 @@  SYSCALL_DEFINE6(io_pgetevents,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+	signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+	if (signal_detected && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2128,7 +2128,7 @@  SYSCALL_DEFINE6(io_pgetevents_time32,
 	struct __aio_sigset	ksig = { NULL, };
 	sigset_t		ksigmask, sigsaved;
 	struct timespec64	ts;
-	int ret;
+	int ret, signal_detected;
 
 	if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
 		return -EFAULT;
@@ -2142,8 +2142,8 @@  SYSCALL_DEFINE6(io_pgetevents_time32,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+	signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+	if (signal_detected && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2193,7 +2193,7 @@  COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 	struct __compat_aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 t;
-	int ret;
+	int ret, signal_detected;
 
 	if (timeout && get_old_timespec32(&t, timeout))
 		return -EFAULT;
@@ -2206,8 +2206,8 @@  COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+	signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+	if (signal_detected && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2226,7 +2226,7 @@  COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
 	struct __compat_aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 t;
-	int ret;
+	int ret, signal_detected;
 
 	if (timeout && get_timespec64(&t, timeout))
 		return -EFAULT;
@@ -2239,8 +2239,8 @@  COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+	signal_detected = restore_user_sigmask(ksig.sigmask, &sigsaved);
+	if (signal_detected && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..fe5a0724b417 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@  SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
 		size_t, sigsetsize)
 {
-	int error;
+	int error, signal_detected;
 	sigset_t ksigmask, sigsaved;
 
 	/*
@@ -2330,7 +2330,10 @@  SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 
 	error = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+	if (signal_detected && !error)
+		error = -EINTR;
 
 	return error;
 }
@@ -2342,7 +2345,7 @@  COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 			const compat_sigset_t __user *, sigmask,
 			compat_size_t, sigsetsize)
 {
-	long err;
+	long err, signal_detected;
 	sigset_t ksigmask, sigsaved;
 
 	/*
@@ -2355,7 +2358,10 @@  COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 
 	err = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+	if (signal_detected && !err)
+		err = -EINTR;
 
 	return err;
 }
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e11d77181398..b785c8d7efc4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2205,8 +2205,11 @@  static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
 
-	if (sig)
-		restore_user_sigmask(sig, &sigsaved);
+	if (sig) {
+		signal_detected = restore_user_sigmask(sig, &sigsaved);
+		if (signal_detected && !ret)
+			ret  = -EINTR;
+	}
 
 	return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
 }
diff --git a/fs/select.c b/fs/select.c
index 6cbc9ff56ba0..da9cfea35159 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -732,7 +732,7 @@  static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		switch (type) {
@@ -760,7 +760,9 @@  static long do_pselect(int n, fd_set __user *inp, fd_set __user *outp,
 	ret = core_sys_select(n, inp, outp, exp, to);
 	ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+	if (signal_detected && !ret)
+		ret = -EINTR;
 
 	return ret;
 }
@@ -1089,7 +1091,7 @@  SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		if (get_timespec64(&ts, tsp))
@@ -1106,10 +1108,10 @@  SYSCALL_DEFINE5(ppoll, struct pollfd __user *, ufds, unsigned int, nfds,
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
 
 	/* We can restart this syscall, usually */
-	if (ret == -EINTR)
+	if (ret == -EINTR || (signal_detected && !ret))
 		ret = -ERESTARTNOHAND;
 
 	ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
@@ -1125,7 +1127,7 @@  SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		if (get_old_timespec32(&ts, tsp))
@@ -1142,10 +1144,10 @@  SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds, unsigned int, nfds,
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
 
 	/* We can restart this syscall, usually */
-	if (ret == -EINTR)
+	if (ret == -EINTR || (signal_detected && !ret))
 		ret = -ERESTARTNOHAND;
 
 	ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1324,7 +1326,7 @@  static long do_compat_pselect(int n, compat_ulong_t __user *inp,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		switch (type) {
@@ -1352,7 +1354,10 @@  static long do_compat_pselect(int n, compat_ulong_t __user *inp,
 	ret = compat_core_sys_select(n, inp, outp, exp, to);
 	ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
+
+	if (signal_detected && !ret)
+		ret = -EINTR;
 
 	return ret;
 }
@@ -1408,7 +1413,7 @@  COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		if (get_old_timespec32(&ts, tsp))
@@ -1425,10 +1430,10 @@  COMPAT_SYSCALL_DEFINE5(ppoll_time32, struct pollfd __user *, ufds,
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
 
 	/* We can restart this syscall, usually */
-	if (ret == -EINTR)
+	if (ret == -EINTR || (signal_detected && !ret))
 		ret = -ERESTARTNOHAND;
 
 	ret = poll_select_copy_remaining(&end_time, tsp, PT_OLD_TIMESPEC, ret);
@@ -1444,7 +1449,7 @@  COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
 {
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts, end_time, *to = NULL;
-	int ret;
+	int ret, signal_detected;
 
 	if (tsp) {
 		if (get_timespec64(&ts, tsp))
@@ -1461,10 +1466,10 @@  COMPAT_SYSCALL_DEFINE5(ppoll_time64, struct pollfd __user *, ufds,
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	signal_detected = restore_user_sigmask(sigmask, &sigsaved);
 
 	/* We can restart this syscall, usually */
-	if (ret == -EINTR)
+	if (ret == -EINTR || (signal_detected && !ret))
 		ret = -ERESTARTNOHAND;
 
 	ret = poll_select_copy_remaining(&end_time, tsp, PT_TIMESPEC, ret);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..1d36e8629edf 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -275,7 +275,7 @@  extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struc
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
 	sigset_t *oldset, size_t sigsetsize);
-extern void restore_user_sigmask(const void __user *usigmask,
+extern int restore_user_sigmask(const void __user *usigmask,
 				 sigset_t *sigsaved);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 1c86b78a7597..7cc33d23ee4b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2916,15 +2916,21 @@  EXPORT_SYMBOL(set_compat_user_sigmask);
  * usigmask: sigmask passed in from userland.
  * sigsaved: saved sigmask when the syscall started and changed the sigmask to
  *           usigmask.
+ * returns 1 in case a pending signal is detected.
+ *
+ * Users of the api need to adjust their return values based on whether the
+ * signal was detected here. If a signal is detected, it is delivered to the
+ * userspace. So without an error like -EINTR, userspace might fail to
+ * adjust the flow of execution.
  *
  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
  * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
  */
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
 {
 
 	if (!usigmask)
-		return;
+		return 0;
 	/*
 	 * When signals are pending, do not restore them here.
 	 * Restoring sigmask here can lead to delivering signals that the above
@@ -2933,7 +2939,7 @@  void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
 	if (signal_pending(current)) {
 		current->saved_sigmask = *sigsaved;
 		set_restore_sigmask();
-		return;
+		return 1;
 	}
 
 	/*
@@ -2941,6 +2947,7 @@  void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
 	 * saved_sigmask when signals are not pending.
 	 */
 	set_current_blocked(sigsaved);
+	return 0;
 }
 EXPORT_SYMBOL(restore_user_sigmask);
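
The caller-side rule repeated in the hunks above can be summarised with a
small user-space sketch. This is an illustration only: adjust_ret and
ppoll_adjust_ret are hypothetical helper names that do not appear in the
patch, and MODEL_ERESTARTNOHAND mirrors the kernel-internal
ERESTARTNOHAND value, which is not exposed through the userspace
errno.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart code, redefined here for the model only. */
#define MODEL_ERESTARTNOHAND 514

/*
 * Models the epoll_pwait, pselect, io_pgetevents and io_uring callers:
 * an interrupted error is reported only when ret is 0, i.e. there is
 * neither a result nor another error to return.
 */
static long adjust_ret(long ret, bool signal_detected, long intr_code)
{
	if (signal_detected && !ret)
		return -intr_code;
	return ret;
}

/*
 * Models the ppoll callers, which additionally convert -EINTR from
 * do_sys_poll() into the restartable code.
 */
static long ppoll_adjust_ret(long ret, bool signal_detected)
{
	if (ret == -EINTR || (signal_detected && !ret))
		return -MODEL_ERESTARTNOHAND;
	return ret;
}
```

Under this rule adjust_ret(-EINVAL, true, EINTR) stays -EINVAL, matching
the commit message, and a positive ret (events ready) is also returned
unchanged even when a signal was detected.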