diff mbox series

[RFC,net-next] vsock: Reschedule connect_work for O_NONBLOCK connect() requests

Message ID 20220804020925.32167-1-yepeilin.cs@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next] vsock: Reschedule connect_work for O_NONBLOCK connect() requests | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/apply fail Patch does not apply to net-next

Commit Message

Peilin Ye Aug. 4, 2022, 2:09 a.m. UTC
From: Peilin Ye <peilin.ye@bytedance.com>

An O_NONBLOCK vsock_connect() request may try to reschedule
@connect_work.  Consider the following vsock_connect() requests:

  1. The 1st, non-blocking request schedules @connect_work, which will
     expire after, say, 200 jiffies.  Socket state is now SS_CONNECTING;

  2. Later, the 2nd, blocking request gets interrupted by a signal after
     5 jiffies while waiting for the connection to be established.
     Socket state is back to SS_UNCONNECTED, and @connect_work will
     expire after 100 jiffies;

  3. Now, the 3rd, non-blocking request tries to schedule @connect_work
     again, but @connect_work has already been scheduled, and will
     expire in, say, 50 jiffies.

In this scenario, currently this 3rd request simply decreases the sock
reference count and returns.  Instead, let it reschedules @connect_work
and resets the timeout back to @connect_timeout.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
Hi all,

This patch is RFC because it bases on Stefano's WIP fix [1] for a bug [2]
reported by syzbot, and it won't apply on current net-next.  I think it
solves a separate issue.

Please advise, thanks!
Peilin Ye

[1] https://gitlab.com/sgarzarella/linux/-/commit/2d0f0b9cbbb30d58fdcbca7c1a857fd8f3110d61
[2] https://syzkaller.appspot.com/bug?id=cd9103dc63346d26acbbdbf5c6ba9bd74e48c860

 net/vmw_vsock/af_vsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Stefano Garzarella Aug. 4, 2022, 6:59 a.m. UTC | #1
On Wed, Aug 03, 2022 at 07:09:25PM -0700, Peilin Ye wrote:
>From: Peilin Ye <peilin.ye@bytedance.com>
>
>An O_NONBLOCK vsock_connect() request may try to reschedule
>@connect_work.  Consider the following vsock_connect() requests:
>
>  1. The 1st, non-blocking request schedules @connect_work, which will
>     expire after, say, 200 jiffies.  Socket state is now SS_CONNECTING;
>
>  2. Later, the 2nd, blocking request gets interrupted by a signal after
>     5 jiffies while waiting for the connection to be established.
>     Socket state is back to SS_UNCONNECTED, and @connect_work will
>     expire after 100 jiffies;
>
>  3. Now, the 3rd, non-blocking request tries to schedule @connect_work
>     again, but @connect_work has already been scheduled, and will
>     expire in, say, 50 jiffies.
>
>In this scenario, currently this 3rd request simply decreases the sock
>reference count and returns.  Instead, let it reschedules @connect_work
>and resets the timeout back to @connect_timeout.
>
>Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
>---
>Hi all,
>
>This patch is RFC because it bases on Stefano's WIP fix [1] for a bug 
>[2]
>reported by syzbot, and it won't apply on current net-next.  I think it
>solves a separate issue.

Nice, this is better!

Feel free to include my patch in this (inclunding also the Fixes tag and 
maybe senidng to syzbot and including its tag as well).

The last thing I was trying to figure out before sending the patch was 
whether to set sock->state = SS_UNCONNECTED in vsock_connect_timeout(). 

I think we should do that, otherwise a subsequent to connect() with 
O_NONBLOCK set would keep returning -EALREADY, even though the timeout 
has expired.

What do you think?

I don't think it changes anything for the bug raised by sysbot, so it 
could be a separate patch.

Thanks,
Stefano

>
>Please advise, thanks!
>Peilin Ye
>
>[1] https://gitlab.com/sgarzarella/linux/-/commit/2d0f0b9cbbb30d58fdcbca7c1a857fd8f3110d61
>[2] https://syzkaller.appspot.com/bug?id=cd9103dc63346d26acbbdbf5c6ba9bd74e48c860
>
> net/vmw_vsock/af_vsock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 194d22291d8b..417e4ad17c03 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1395,7 +1395,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
> 			/* If the timeout function is already scheduled, ungrab
> 			 * the socket refcount to not leave it unbalanced.
> 			 */
>-			if (!schedule_delayed_work(&vsk->connect_work, timeout))
>+			if (mod_delayed_work(system_wq, &vsk->connect_work, timeout))
> 				sock_put(sk);
>
> 			/* Skip ahead to preserve error code set above. */
>-- 
>2.20.1
>
Peilin Ye Aug. 4, 2022, 11:44 p.m. UTC | #2
Hi Stefano,

On Thu, Aug 04, 2022 at 08:59:23AM +0200, Stefano Garzarella wrote:
> The last thing I was trying to figure out before sending the patch was
> whether to set sock->state = SS_UNCONNECTED in vsock_connect_timeout().
> 
> I think we should do that, otherwise a subsequent to connect() with
> O_NONBLOCK set would keep returning -EALREADY, even though the timeout has
> expired.
> 
> What do you think?

Thanks for bringing this up, after thinking about sock->state, I have 3
thoughts:

1. I think the root cause of this memleak is, we keep @connect_work
   pending, even after the 2nd, blocking request times out (or gets
   interrupted) and sets sock->state back to SS_UNCONNECTED.

   @connect_work is effectively no-op when sk->sk_state is
   TCP_CLOS{E,ING} anyway, so why not we just cancel @connect_work when
   blocking requests time out or get interrupted?  Something like:

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..62628af84164 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1402,6 +1402,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
                lock_sock(sk);

                if (signal_pending(current)) {
+                       if (cancel_delayed_work(&vsk->connect_work))
+                               sock_put(sk);
+
                        err = sock_intr_errno(timeout);
                        sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
                        sock->state = SS_UNCONNECTED;
@@ -1409,6 +1412,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
                        vsock_remove_connected(vsk);
                        goto out_wait;
                } else if (timeout == 0) {
+                       if (cancel_delayed_work(&vsk->connect_work))
+                               sock_put(sk);
+
                        err = -ETIMEDOUT;
                        sk->sk_state = TCP_CLOSE;
                        sock->state = SS_UNCONNECTED;

   Then no need to worry about rescheduling @connect_work, and the state
   machine becomes more accurate.  What do you think?  I will ask syzbot
   to test this.

2. About your suggestion of setting sock->state = SS_UNCONNECTED in
   vsock_connect_timeout(), I think it makes sense.  Are you going to
   send a net-next patch for this?

3. After a TCP_SYN_SENT sock receives VIRTIO_VSOCK_OP_RESPONSE in
   virtio_transport_recv_connecting(), why don't we cancel @connect_work?
   Am I missing something?

Thanks,
Peilin Ye
Stefano Garzarella Aug. 5, 2022, 12:42 p.m. UTC | #3
On Thu, Aug 04, 2022 at 04:44:47PM -0700, Peilin Ye wrote:
>Hi Stefano,
>
>On Thu, Aug 04, 2022 at 08:59:23AM +0200, Stefano Garzarella wrote:
>> The last thing I was trying to figure out before sending the patch was
>> whether to set sock->state = SS_UNCONNECTED in vsock_connect_timeout().
>>
>> I think we should do that, otherwise a subsequent to connect() with
>> O_NONBLOCK set would keep returning -EALREADY, even though the timeout has
>> expired.
>>
>> What do you think?
>
>Thanks for bringing this up, after thinking about sock->state, I have 3
>thoughts:
>
>1. I think the root cause of this memleak is, we keep @connect_work
>   pending, even after the 2nd, blocking request times out (or gets
>   interrupted) and sets sock->state back to SS_UNCONNECTED.
>
>   @connect_work is effectively no-op when sk->sk_state is
>   TCP_CLOS{E,ING} anyway, so why not we just cancel @connect_work when
>   blocking requests time out or get interrupted?  Something like:
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index f04abf662ec6..62628af84164 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1402,6 +1402,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
>                lock_sock(sk);
>
>                if (signal_pending(current)) {
>+                       if (cancel_delayed_work(&vsk->connect_work))
>+                               sock_put(sk);
>+
>                        err = sock_intr_errno(timeout);
>                        sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
>                        sock->state = SS_UNCONNECTED;
>@@ -1409,6 +1412,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
>                        vsock_remove_connected(vsk);
>                        goto out_wait;
>                } else if (timeout == 0) {
>+                       if (cancel_delayed_work(&vsk->connect_work))
>+                               sock_put(sk);
>+
>                        err = -ETIMEDOUT;
>                        sk->sk_state = TCP_CLOSE;
>                        sock->state = SS_UNCONNECTED;
>
>   Then no need to worry about rescheduling @connect_work, and the state
>   machine becomes more accurate.  What do you think?  I will ask syzbot
>   to test this.

It could work, but should we set `sk->sk_err` and call sk_error_report() 
to wake up thread waiting on poll()?

Maybe the previous version is simpler.

>
>2. About your suggestion of setting sock->state = SS_UNCONNECTED in
>   vsock_connect_timeout(), I think it makes sense.  Are you going to
>   send a net-next patch for this?

If you have time, feel free to send it.

Since it is a fix, I believe you can use the "net" tree. (Also for this 
patch).

Remember to put the "Fixes" tag that should be the same.

>
>3. After a TCP_SYN_SENT sock receives VIRTIO_VSOCK_OP_RESPONSE in
>   virtio_transport_recv_connecting(), why don't we cancel 
>   @connect_work?
>   Am I missing something?

Because when the timeout will fire, vsock_connect_timeout() will just 
call sock_put() since sk->sk_state is changed.

Of course, we can cancel it if we want, but I think it's not worth it.
In the end, this rescheduling patch should solve all the problems.

Thanks,
Stefano
Peilin Ye Aug. 5, 2022, 6:27 p.m. UTC | #4
On Fri, Aug 05, 2022 at 02:42:39PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 04, 2022 at 04:44:47PM -0700, Peilin Ye wrote:
> > 1. I think the root cause of this memleak is, we keep @connect_work
> >   pending, even after the 2nd, blocking request times out (or gets
> >   interrupted) and sets sock->state back to SS_UNCONNECTED.
> > 
> >   @connect_work is effectively no-op when sk->sk_state is
> >   TCP_CLOS{E,ING} anyway, so why not we just cancel @connect_work when
> >   blocking requests time out or get interrupted?  Something like:
> > 
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index f04abf662ec6..62628af84164 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1402,6 +1402,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
> >                lock_sock(sk);
> > 
> >                if (signal_pending(current)) {
> > +                       if (cancel_delayed_work(&vsk->connect_work))
> > +                               sock_put(sk);
> > +
> >                        err = sock_intr_errno(timeout);
> >                        sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
> >                        sock->state = SS_UNCONNECTED;
> > @@ -1409,6 +1412,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
> >                        vsock_remove_connected(vsk);
> >                        goto out_wait;
> >                } else if (timeout == 0) {
> > +                       if (cancel_delayed_work(&vsk->connect_work))
> > +                               sock_put(sk);
> > +
> >                        err = -ETIMEDOUT;
> >                        sk->sk_state = TCP_CLOSE;
> >                        sock->state = SS_UNCONNECTED;
> > 
> >   Then no need to worry about rescheduling @connect_work, and the state
> >   machine becomes more accurate.  What do you think?  I will ask syzbot
> >   to test this.
> 
> It could work, but should we set `sk->sk_err` and call sk_error_report() to
> wake up thread waiting on poll()?
> 
> Maybe the previous version is simpler.

Right, I forgot about sk_error_report().  Let us use the simpler version
then.

> > 2. About your suggestion of setting sock->state = SS_UNCONNECTED in
> >   vsock_connect_timeout(), I think it makes sense.  Are you going to
> >   send a net-next patch for this?
> 
> If you have time, feel free to send it.
> 
> Since it is a fix, I believe you can use the "net" tree. (Also for this
> patch).
> 
> Remember to put the "Fixes" tag that should be the same.

Sure, I will send them this week.  Thanks!

Peilin Ye
diff mbox series

Patch

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 194d22291d8b..417e4ad17c03 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1395,7 +1395,7 @@  static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 			/* If the timeout function is already scheduled, ungrab
 			 * the socket refcount to not leave it unbalanced.
 			 */
-			if (!schedule_delayed_work(&vsk->connect_work, timeout))
+			if (mod_delayed_work(system_wq, &vsk->connect_work, timeout))
 				sock_put(sk);
 
 			/* Skip ahead to preserve error code set above. */