Message ID | 20210416080911.83197-1-vsementsov@virtuozzo.com (mailing list archive) |
---|---|
Headers | show |
Series | block/nbd: rework client connection | expand |
16.04.2021 11:14, Vladimir Sementsov-Ogievskiy wrote: > 16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote: >> OK, that's a big rewrite of the logic. >> >> Pre-patch we have an always running coroutine - connection_co. It does >> reply receiving and reconnecting. And it leads to a lot of difficult >> and unobvious code around drained sections and context switch. We also >> abuse bs->in_flight counter which is increased for connection_co and >> temporary decreased in points where we want to allow drained section to >> begin. One of these place is in another file: in nbd_read_eof() in >> nbd/client.c. >> >> We also cancel reconnect and requests waiting for reconnect on drained >> begin which is not correct. >> >> Let's finally drop this always running coroutine and go another way: >> >> 1. reconnect_attempt() goes to nbd_co_send_request and called under >> send_mutex >> >> 2. We do receive headers in request coroutine. But we also should >> dispatch replies for another pending requests. So, >> nbd_connection_entry() is turned into nbd_receive_replies(), which >> does reply dispatching until it receive another request headers, and >> returns when it receive the requested header. >> >> 3. All old staff around drained sections and context switch is dropped. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> > > Please consider this last patch as RFC for now: > > 1. It is complicated, and doesn't have good documentation. Please look through and ask everything that is not obvious, I'll explain. Don't waste your time trying to understand what is not clean. > > 2. I also failed to image, how to split the patch into smaller simple patches.. Ideas are welcome. > > 3. It actually reverts what was done in > > commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace > Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > Date: Thu Sep 3 22:02:58 2020 +0300 > > block/nbd: fix drain dead-lock because of nbd reconnect-delay > > and I didn't check yet, does this dead-lock still here or not. Even if it still here I believe that nbd driver is a wrong place to workaround this bug, but I should check it first at least. > 4. As Roman said, there is a problem in new architecture: when guest is idle, we will not detect disconnect immediately but only on the next request from the guest. That may be considered as a degradation. Still, let's implement a kind of keep-alive on top of this series, some ideas: - add an idle-timeout, and do simple NBD request by timeout, which will result in some expected error reply from the server. - or add an idle coroutine, which will do endless "read" when there no requests. It will be a kind of old connection_co, but it will have only one function and will be extremely simple. And we may just cancel it on drained-begin and restart on drained-end.
On 16/04/21 10:08, Vladimir Sementsov-Ogievskiy wrote: > The series substitutes "[PATCH v2 00/10] block/nbd: move connection code to separate file" > Supersedes: <20210408140827.332915-1-vsementsov@virtuozzo.com> > so it's called v3 > > block/nbd.c is overcomplicated. These series is a big refactoring, which > finally drops all the complications around drained sections and context > switching, including abuse of bs->in_flight counter. > > Also, at the end of the series we don't cancel reconnect on drained > sections (and don't cancel requests waiting for reconnect on drained > section begin), which fixes a problem reported by Roman. > > The series is also available at tag up-nbd-client-connection-v3 in > git https://src.openvz.org/scm/~vsementsov/qemu.git I have independently done some rework of the connection state machine, mostly in order to use the QemuCoSleep API instead of aio_co_wake. In general it seems to be independent of this work. I'll review this series. Paolo > v3: > Changes in first part of the series (main thing is not using refcnt, but instead (modified) Roman's patch): > > 01-04: new > 05: add Roman's r-b > 06: new > 07: now, new aio_co_schedule(NULL, thr->wait_co) is used > 08: reworked, we now need also bool detached, as we don't have refcnt > 09,10: add Roman's r-b > 11: rebased, don't modify nbd_free_connect_thread() name at this point > 12: add Roman's r-b > 13: new > 14: rebased > > Other patches are new. > > Roman Kagan (2): > block/nbd: fix channel object leak > block/nbd: ensure ->connection_thread is always valid > > Vladimir Sementsov-Ogievskiy (31): > block/nbd: fix how state is cleared on nbd_open() failure paths > block/nbd: nbd_client_handshake(): fix leak of s->ioc > block/nbd: BDRVNBDState: drop unused connect_err and connect_status > util/async: aio_co_schedule(): support reschedule in same ctx > block/nbd: simplify waking of nbd_co_establish_connection() > block/nbd: drop thr->state > block/nbd: bs-independent interface for nbd_co_establish_connection() > block/nbd: make nbd_co_establish_connection_cancel() bs-independent > block/nbd: rename NBDConnectThread to NBDClientConnection > block/nbd: introduce nbd_client_connection_new() > block/nbd: introduce nbd_client_connection_release() > nbd: move connection code from block/nbd to nbd/client-connection > nbd/client-connection: use QEMU_LOCK_GUARD > nbd/client-connection: add possibility of negotiation > nbd/client-connection: implement connection retry > nbd/client-connection: shutdown connection on release > block/nbd: split nbd_handle_updated_info out of nbd_client_handshake() > block/nbd: use negotiation of NBDClientConnection > qemu-socket: pass monitor link to socket_get_fd directly > block/nbd: pass monitor directly to connection thread > block/nbd: nbd_teardown_connection() don't touch s->sioc > block/nbd: drop BDRVNBDState::sioc > nbd/client-connection: return only one io channel > block-coroutine-wrapper: allow non bdrv_ prefix > block/nbd: split nbd_co_do_establish_connection out of > nbd_reconnect_attempt > nbd/client-connection: do qio_channel_set_delay(false) > nbd/client-connection: add option for non-blocking connection attempt > block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() > block/nbd: add nbd_clinent_connected() helper > block/nbd: safer transition to receiving request > block/nbd: drop connection_co > > block/coroutines.h | 6 + > include/block/aio.h | 2 +- > include/block/nbd.h | 19 + > include/io/channel-socket.h | 20 + > include/qemu/sockets.h | 2 +- > block/nbd.c | 908 +++++++---------------------- > io/channel-socket.c | 17 +- > nbd/client-connection.c | 364 ++++++++++++ > nbd/client.c | 2 - > tests/unit/test-util-sockets.c | 16 +- > util/async.c | 8 + > util/qemu-sockets.c | 10 +- > nbd/meson.build | 1 + > scripts/block-coroutine-wrapper.py | 7 +- > 14 files changed, 666 insertions(+), 716 deletions(-) > create mode 100644 nbd/client-connection.c >