mbox series

[v3,00/33] block/nbd: rework client connection

Message ID 20210416080911.83197-1-vsementsov@virtuozzo.com (mailing list archive)
Headers show
Series block/nbd: rework client connection | expand

Message

Vladimir Sementsov-Ogievskiy April 16, 2021, 8:08 a.m. UTC
The series substitutes "[PATCH v2 00/10] block/nbd: move connection code to separate file"
Supersedes: <20210408140827.332915-1-vsementsov@virtuozzo.com>
so it's called v3

block/nbd.c is overcomplicated. These series is a big refactoring, which
finally drops all the complications around drained sections and context
switching, including abuse of bs->in_flight counter.

Also, at the end of the series we don't cancel reconnect on drained
sections (and don't cancel requests waiting for reconnect on drained
section begin), which fixes a problem reported by Roman.

The series is also available at tag up-nbd-client-connection-v3 in
git https://src.openvz.org/scm/~vsementsov/qemu.git 

v3:
Changes in first part of the series (main thing is not using refcnt, but instead (modified) Roman's patch):

01-04: new
05: add Roman's r-b
06: new
07: now, new aio_co_schedule(NULL, thr->wait_co) is used
08: reworked, we now need also bool detached, as we don't have refcnt
09,10: add Roman's r-b
11: rebased, don't modify nbd_free_connect_thread() name at this point
12: add Roman's r-b
13: new
14: rebased

Other patches are new.

Roman Kagan (2):
  block/nbd: fix channel object leak
  block/nbd: ensure ->connection_thread is always valid

Vladimir Sementsov-Ogievskiy (31):
  block/nbd: fix how state is cleared on nbd_open() failure paths
  block/nbd: nbd_client_handshake(): fix leak of s->ioc
  block/nbd: BDRVNBDState: drop unused connect_err and connect_status
  util/async: aio_co_schedule(): support reschedule in same ctx
  block/nbd: simplify waking of nbd_co_establish_connection()
  block/nbd: drop thr->state
  block/nbd: bs-independent interface for nbd_co_establish_connection()
  block/nbd: make nbd_co_establish_connection_cancel() bs-independent
  block/nbd: rename NBDConnectThread to NBDClientConnection
  block/nbd: introduce nbd_client_connection_new()
  block/nbd: introduce nbd_client_connection_release()
  nbd: move connection code from block/nbd to nbd/client-connection
  nbd/client-connection: use QEMU_LOCK_GUARD
  nbd/client-connection: add possibility of negotiation
  nbd/client-connection: implement connection retry
  nbd/client-connection: shutdown connection on release
  block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()
  block/nbd: use negotiation of NBDClientConnection
  qemu-socket: pass monitor link to socket_get_fd directly
  block/nbd: pass monitor directly to connection thread
  block/nbd: nbd_teardown_connection() don't touch s->sioc
  block/nbd: drop BDRVNBDState::sioc
  nbd/client-connection: return only one io channel
  block-coroutine-wrapper: allow non bdrv_ prefix
  block/nbd: split nbd_co_do_establish_connection out of
    nbd_reconnect_attempt
  nbd/client-connection: do qio_channel_set_delay(false)
  nbd/client-connection: add option for non-blocking connection attempt
  block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
  block/nbd: add nbd_clinent_connected() helper
  block/nbd: safer transition to receiving request
  block/nbd: drop connection_co

 block/coroutines.h                 |   6 +
 include/block/aio.h                |   2 +-
 include/block/nbd.h                |  19 +
 include/io/channel-socket.h        |  20 +
 include/qemu/sockets.h             |   2 +-
 block/nbd.c                        | 908 +++++++----------------------
 io/channel-socket.c                |  17 +-
 nbd/client-connection.c            | 364 ++++++++++++
 nbd/client.c                       |   2 -
 tests/unit/test-util-sockets.c     |  16 +-
 util/async.c                       |   8 +
 util/qemu-sockets.c                |  10 +-
 nbd/meson.build                    |   1 +
 scripts/block-coroutine-wrapper.py |   7 +-
 14 files changed, 666 insertions(+), 716 deletions(-)
 create mode 100644 nbd/client-connection.c

Comments

Vladimir Sementsov-Ogievskiy April 16, 2021, 8:21 a.m. UTC | #1
16.04.2021 11:14, Vladimir Sementsov-Ogievskiy wrote:
> 16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote:
>> OK, that's a big rewrite of the logic.
>>
>> Pre-patch we have an always running coroutine - connection_co. It does
>> reply receiving and reconnecting. And it leads to a lot of difficult
>> and unobvious code around drained sections and context switch. We also
>> abuse bs->in_flight counter which is increased for connection_co and
>> temporary decreased in points where we want to allow drained section to
>> begin. One of these place is in another file: in nbd_read_eof() in
>> nbd/client.c.
>>
>> We also cancel reconnect and requests waiting for reconnect on drained
>> begin which is not correct.
>>
>> Let's finally drop this always running coroutine and go another way:
>>
>> 1. reconnect_attempt() goes to nbd_co_send_request and called under
>>     send_mutex
>>
>> 2. We do receive headers in request coroutine. But we also should
>>     dispatch replies for another pending requests. So,
>>     nbd_connection_entry() is turned into nbd_receive_replies(), which
>>     does reply dispatching until it receive another request headers, and
>>     returns when it receive the requested header.
>>
>> 3. All old staff around drained sections and context switch is dropped.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> 
> Please consider this last patch as RFC for now:
> 
> 1. It is complicated, and doesn't have good documentation. Please look through and ask everything that is not obvious, I'll explain. Don't waste your time trying to understand what is not clean.
> 
> 2. I also failed to image, how to split the patch into smaller simple patches.. Ideas are welcome.
> 
> 3. It actually reverts what was done in
> 
> commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date:   Thu Sep 3 22:02:58 2020 +0300
> 
>     block/nbd: fix drain dead-lock because of nbd reconnect-delay
> 
> and I didn't check yet, does this dead-lock still here or not. Even if it still here I believe that nbd driver is a wrong place to workaround this bug, but I should check it first at least.
> 

4. As Roman said, there is a problem in new architecture: when guest is idle, we will not detect disconnect immediately but only on the next request from the guest. That may be considered as a degradation.
Still, let's implement a kind of keep-alive on top of this series, some ideas:

   - add an idle-timeout, and do simple NBD request by timeout, which will result in some expected error reply from the server.
   - or add an idle coroutine, which will do endless "read" when there no requests. It will be a kind of old connection_co, but it will have only one function and will be extremely simple. And we may just cancel it on drained-begin and restart on drained-end.
Paolo Bonzini May 12, 2021, 6:54 a.m. UTC | #2
On 16/04/21 10:08, Vladimir Sementsov-Ogievskiy wrote:
> The series substitutes "[PATCH v2 00/10] block/nbd: move connection code to separate file"
> Supersedes: <20210408140827.332915-1-vsementsov@virtuozzo.com>
> so it's called v3
> 
> block/nbd.c is overcomplicated. These series is a big refactoring, which
> finally drops all the complications around drained sections and context
> switching, including abuse of bs->in_flight counter.
> 
> Also, at the end of the series we don't cancel reconnect on drained
> sections (and don't cancel requests waiting for reconnect on drained
> section begin), which fixes a problem reported by Roman.
> 
> The series is also available at tag up-nbd-client-connection-v3 in
> git https://src.openvz.org/scm/~vsementsov/qemu.git

I have independently done some rework of the connection state machine, 
mostly in order to use the QemuCoSleep API instead of aio_co_wake.  In 
general it seems to be independent of this work.  I'll review this series.

Paolo

> v3:
> Changes in first part of the series (main thing is not using refcnt, but instead (modified) Roman's patch):
> 
> 01-04: new
> 05: add Roman's r-b
> 06: new
> 07: now, new aio_co_schedule(NULL, thr->wait_co) is used
> 08: reworked, we now need also bool detached, as we don't have refcnt
> 09,10: add Roman's r-b
> 11: rebased, don't modify nbd_free_connect_thread() name at this point
> 12: add Roman's r-b
> 13: new
> 14: rebased
> 
> Other patches are new.
> 
> Roman Kagan (2):
>    block/nbd: fix channel object leak
>    block/nbd: ensure ->connection_thread is always valid
> 
> Vladimir Sementsov-Ogievskiy (31):
>    block/nbd: fix how state is cleared on nbd_open() failure paths
>    block/nbd: nbd_client_handshake(): fix leak of s->ioc
>    block/nbd: BDRVNBDState: drop unused connect_err and connect_status
>    util/async: aio_co_schedule(): support reschedule in same ctx
>    block/nbd: simplify waking of nbd_co_establish_connection()
>    block/nbd: drop thr->state
>    block/nbd: bs-independent interface for nbd_co_establish_connection()
>    block/nbd: make nbd_co_establish_connection_cancel() bs-independent
>    block/nbd: rename NBDConnectThread to NBDClientConnection
>    block/nbd: introduce nbd_client_connection_new()
>    block/nbd: introduce nbd_client_connection_release()
>    nbd: move connection code from block/nbd to nbd/client-connection
>    nbd/client-connection: use QEMU_LOCK_GUARD
>    nbd/client-connection: add possibility of negotiation
>    nbd/client-connection: implement connection retry
>    nbd/client-connection: shutdown connection on release
>    block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()
>    block/nbd: use negotiation of NBDClientConnection
>    qemu-socket: pass monitor link to socket_get_fd directly
>    block/nbd: pass monitor directly to connection thread
>    block/nbd: nbd_teardown_connection() don't touch s->sioc
>    block/nbd: drop BDRVNBDState::sioc
>    nbd/client-connection: return only one io channel
>    block-coroutine-wrapper: allow non bdrv_ prefix
>    block/nbd: split nbd_co_do_establish_connection out of
>      nbd_reconnect_attempt
>    nbd/client-connection: do qio_channel_set_delay(false)
>    nbd/client-connection: add option for non-blocking connection attempt
>    block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
>    block/nbd: add nbd_clinent_connected() helper
>    block/nbd: safer transition to receiving request
>    block/nbd: drop connection_co
> 
>   block/coroutines.h                 |   6 +
>   include/block/aio.h                |   2 +-
>   include/block/nbd.h                |  19 +
>   include/io/channel-socket.h        |  20 +
>   include/qemu/sockets.h             |   2 +-
>   block/nbd.c                        | 908 +++++++----------------------
>   io/channel-socket.c                |  17 +-
>   nbd/client-connection.c            | 364 ++++++++++++
>   nbd/client.c                       |   2 -
>   tests/unit/test-util-sockets.c     |  16 +-
>   util/async.c                       |   8 +
>   util/qemu-sockets.c                |  10 +-
>   nbd/meson.build                    |   1 +
>   scripts/block-coroutine-wrapper.py |   7 +-
>   14 files changed, 666 insertions(+), 716 deletions(-)
>   create mode 100644 nbd/client-connection.c
>