
[resend] vsock/virtio: avoid potential deadlock when vsock device remove

Message ID 20210812053056.1699-1-longpeng2@huawei.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Series [resend] vsock/virtio: avoid potential deadlock when vsock device remove

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success

Commit Message

Longpeng(Mike) Aug. 12, 2021, 5:30 a.m. UTC
There's a potential deadlock when removing the vsock device or
processing the RESET event:

  vsock_for_each_connected_socket:
      spin_lock_bh(&vsock_table_lock) ----------- (1)
      ...
          virtio_vsock_reset_sock:
              lock_sock(sk) --------------------- (2)
      ...
      spin_unlock_bh(&vsock_table_lock)

lock_sock() may sleep if the 'sk' is owned by another thread at the
same time, so we would hit a "scheduling while atomic" warning.

Even worse, if the next task (selected by the scheduler) tries to
release a 'sk', it needs to acquire vsock_table_lock and the deadlock
occurs, driving the system into a softlockup state.
  Call trace:
   queued_spin_lock_slowpath
   vsock_remove_bound
   vsock_remove_sock
   virtio_transport_release
   __vsock_release
   vsock_release
   __sock_release
   sock_close
   __fput
   ____fput

So we should not take sk_lock in this case, just like vhost_vsock
and vmci.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 net/vmw_vsock/virtio_transport.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
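
For reference, the locking pattern described in the commit message looks roughly like the sketch below. It condenses vsock_for_each_connected_socket() from net/vmw_vsock/af_vsock.c and the pre-patch virtio_vsock_reset_sock(); the identifiers match the upstream code, but the bodies are simplified, so treat this as an illustration rather than the exact sources.

/* Condensed sketch: the walk over connected sockets runs entirely under
 * a BH-disabled spinlock, so the callback is invoked in atomic context
 * and must not sleep.
 */
void vsock_for_each_connected_socket(void (*fn)(struct sock *sk))
{
	int i;

	spin_lock_bh(&vsock_table_lock);		/* (1) atomic from here on */

	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++) {
		struct vsock_sock *vsk;

		list_for_each_entry(vsk, &vsock_connected_table[i],
				    connected_table)
			fn(sk_vsock(vsk));		/* e.g. virtio_vsock_reset_sock() */
	}

	spin_unlock_bh(&vsock_table_lock);
}

/* Pre-patch callback: lock_sock() may sleep when the socket is owned by
 * another task, which is what triggers "scheduling while atomic" and,
 * in the worst case, the softlockup described above.
 */
static void virtio_vsock_reset_sock(struct sock *sk)
{
	lock_sock(sk);					/* (2) may sleep */
	sk->sk_state = TCP_CLOSE;
	sk->sk_err = ECONNRESET;
	sk_error_report(sk);
	release_sock(sk);
}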

Comments

Stefano Garzarella Aug. 12, 2021, 8:03 a.m. UTC | #1
On Thu, Aug 12, 2021 at 01:30:56PM +0800, Longpeng(Mike) wrote:
>There's a potential deadlock when removing the vsock device or
>processing the RESET event:
>
>  vsock_for_each_connected_socket:
>      spin_lock_bh(&vsock_table_lock) ----------- (1)
>      ...
>          virtio_vsock_reset_sock:
>              lock_sock(sk) --------------------- (2)
>      ...
>      spin_unlock_bh(&vsock_table_lock)
>
>lock_sock() may sleep if the 'sk' is owned by another thread at the
>same time, so we would hit a "scheduling while atomic" warning.
>
>Even worse, if the next task (selected by the scheduler) tries to
>release a 'sk', it needs to acquire vsock_table_lock and the deadlock
>occurs, driving the system into a softlockup state.
>  Call trace:
>   queued_spin_lock_slowpath
>   vsock_remove_bound
>   vsock_remove_sock
>   virtio_transport_release
>   __vsock_release
>   vsock_release
>   __sock_release
>   sock_close
>   __fput
>   ____fput
>
>So we should not take sk_lock in this case, just like vhost_vsock
>and vmci.

The difference from vhost_vsock is that here we also call it when we
receive an event on the event queue (for example because we are
migrating the VM).

I think the idea of this lock was to prevent concurrency with the RX
loop, but actually, if a socket is connected, it can only change state
to TCP_CLOSING/TCP_CLOSE.

I don't think there is any problem in not taking the lock; at most we
could take the rx_lock in virtio_vsock_event_handle(), but I'm not sure
it's necessary.
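
For what it's worth, a hedged sketch of the rx_lock idea mentioned above: this is not part of the patch under review, just an illustration that assumes the existing rx_lock mutex in struct virtio_vsock and the upstream shape of virtio_vsock_event_handle().

/* Hypothetical variant, NOT the proposed fix: serialize the transport
 * reset against the RX worker by holding the existing rx_lock mutex
 * while the connected sockets are reset.
 */
static void virtio_vsock_event_handle(struct virtio_vsock *vsock,
				      struct virtio_vsock_event *event)
{
	switch (le32_to_cpu(event->id)) {
	case VIRTIO_VSOCK_EVENT_TRANSPORT_RESET:
		virtio_vsock_update_guest_cid(vsock);

		mutex_lock(&vsock->rx_lock);	/* hypothetical extra serialization */
		vsock_for_each_connected_socket(virtio_vsock_reset_sock);
		mutex_unlock(&vsock->rx_lock);
		break;
	}
}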

>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Jakub Kicinski <kuba@kernel.org>

We should add:
Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")
>Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
>---
> net/vmw_vsock/virtio_transport.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index e0c2c99..4f7c99d 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -357,11 +357,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
>
> static void virtio_vsock_reset_sock(struct sock *sk)
> {
>-	lock_sock(sk);
>+	/* vmci_transport.c doesn't take sk_lock here either.  At least we're
>+	 * under vsock_table_lock so the sock cannot disappear while we're
>+	 * executing.
>+	 */
>+
> 	sk->sk_state = TCP_CLOSE;
> 	sk->sk_err = ECONNRESET;
> 	sk_error_report(sk);
>-	release_sock(sk);
> }
>
> static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
>-- 
>1.8.3.1
>

With the Fixes tag added:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Jakub Kicinski Aug. 12, 2021, 5:59 p.m. UTC | #2
On Thu, 12 Aug 2021 10:03:32 +0200 Stefano Garzarella wrote:
> On Thu, Aug 12, 2021 at 01:30:56PM +0800, Longpeng(Mike) wrote:
> >There's a potential deadlock when removing the vsock device or
> >processing the RESET event:
> >
> >  vsock_for_each_connected_socket:
> >      spin_lock_bh(&vsock_table_lock) ----------- (1)
> >      ...
> >          virtio_vsock_reset_sock:
> >              lock_sock(sk) --------------------- (2)
> >      ...
> >      spin_unlock_bh(&vsock_table_lock)
> >
> >lock_sock() may sleep if the 'sk' is owned by another thread at the
> >same time, so we would hit a "scheduling while atomic" warning.
> >
> >Even worse, if the next task (selected by the scheduler) tries to
> >release a 'sk', it needs to acquire vsock_table_lock and the deadlock
> >occurs, driving the system into a softlockup state.
> >  Call trace:
> >   queued_spin_lock_slowpath
> >   vsock_remove_bound
> >   vsock_remove_sock
> >   virtio_transport_release
> >   __vsock_release
> >   vsock_release
> >   __sock_release
> >   sock_close
> >   __fput
> >   ____fput
> >
> >So we should not take sk_lock in this case, just like vhost_vsock
> >and vmci.
>
> We should add:
> Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")

Added.

> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

And applied, thanks!

Patch

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e0c2c99..4f7c99d 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -357,11 +357,14 @@  static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
 
 static void virtio_vsock_reset_sock(struct sock *sk)
 {
-	lock_sock(sk);
+	/* vmci_transport.c doesn't take sk_lock here either.  At least we're
+	 * under vsock_table_lock so the sock cannot disappear while we're
+	 * executing.
+	 */
+
 	sk->sk_state = TCP_CLOSE;
 	sk->sk_err = ECONNRESET;
 	sk_error_report(sk);
-	release_sock(sk);
 }
 
 static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)