[0/7] block/nbd: decouple reconnect from drain

Message ID: 20210315060611.2989049-1-rvkagan@yandex-team.ru

Message

Roman Kagan March 15, 2021, 6:06 a.m. UTC
The reconnection logic doesn't need to stop while in a drained section.
Moreover it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored.  Otherwise such
requests can only either stall resulting in a deadlock (before
8c517de24a), or be aborted defeating the purpose of the reconnection
machinery (after 8c517de24a).

This series aims to just stop messing with the drained section in the
reconnection code.

While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
connection_co reentrance"); as I've missed the point of that commit I'd
appreciate more scrutiny in this area.

Roman Kagan (7):
  block/nbd: avoid touching freed connect_thread
  block/nbd: use uniformly nbd_client_connecting_wait
  block/nbd: assert attach/detach runs in the proper context
  block/nbd: transfer reconnection stuff across aio_context switch
  block/nbd: better document a case in nbd_co_establish_connection
  block/nbd: decouple reconnect from drain
  block/nbd: stop manipulating in_flight counter

 block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
 nbd/client.c |   2 -
 2 files changed, 86 insertions(+), 107 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy March 15, 2021, 7:45 p.m. UTC | #1
15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.


The actual point is:

connection_co (together with all the functions called from it) has a lot of yield points. And we can't just enter the coroutine at any of them whenever we want, as it may break some BH which is actually being waited for at that yield point.

Still, we should care only about yield points that are possible during a drained section, so we don't need to care about a direct qemu_coroutine_yield() inside nbd_connection_entry().

Many things have changed since 5ad81b4946. So probably all the yield points in nbd_connection_entry that are possible during a drained section now support reentering. But some analysis of the possible yield points should be done.
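
To make the hazard concrete, a minimal sketch (the function names are invented, only the coroutine/BH APIs are real): the coroutine yields expecting one specific BH to wake it, so entering it from anywhere else resumes it prematurely, and the still-pending BH later fires into a coroutine that has already moved on.

/* Sketch of the reentrance hazard (hypothetical names). */
static void wake_bh(void *opaque)
{
    aio_co_wake(opaque);    /* the wakeup this yield point expects */
}

static void coroutine_fn conn_co_entry(void *opaque)
{
    AioContext *ctx = qemu_get_current_aio_context();

    aio_bh_schedule_oneshot(ctx, wake_bh, qemu_coroutine_self());
    /*
     * Reentering is safe only via wake_bh: a stray
     * qemu_aio_coroutine_enter() here would resume the coroutine early,
     * and the pending BH would then wake it again at a completely
     * different yield point.
     */
    qemu_coroutine_yield();
}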

> 
> Roman Kagan (7):
>    block/nbd: avoid touching freed connect_thread
>    block/nbd: use uniformly nbd_client_connecting_wait
>    block/nbd: assert attach/detach runs in the proper context
>    block/nbd: transfer reconnection stuff across aio_context switch
>    block/nbd: better document a case in nbd_co_establish_connection
>    block/nbd: decouple reconnect from drain
>    block/nbd: stop manipulating in_flight counter
> 
>   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>   nbd/client.c |   2 -
>   2 files changed, 86 insertions(+), 107 deletions(-)
>
Eric Blake March 16, 2021, 2:41 p.m. UTC | #2
On 3/15/21 1:06 AM, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.

Soft freeze is today.  I'm leaning towards declaring this series a bug
fix (which gives it some more soak time to get right while still being
okay for -rc1) rather than a feature addition (which would need to be in
a pull request today).  Speak up now if this characterization is off
base.

> 
> Roman Kagan (7):
>   block/nbd: avoid touching freed connect_thread
>   block/nbd: use uniformly nbd_client_connecting_wait
>   block/nbd: assert attach/detach runs in the proper context
>   block/nbd: transfer reconnection stuff across aio_context switch
>   block/nbd: better document a case in nbd_co_establish_connection
>   block/nbd: decouple reconnect from drain
>   block/nbd: stop manipulating in_flight counter
> 
>  block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>  nbd/client.c |   2 -
>  2 files changed, 86 insertions(+), 107 deletions(-)
>
Roman Kagan March 16, 2021, 3:52 p.m. UTC | #3
On Mon, Mar 15, 2021 at 10:45:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> 
> 
> The actual point is:
> 
> connection_co (together with all the functions called from it) has a lot of yield points. And we can't just enter the coroutine at any of them whenever we want, as it may break some BH which is actually being waited for at that yield point.
> 
> Still, we should care only about yield points that are possible during a drained section, so we don't need to care about a direct qemu_coroutine_yield() inside nbd_connection_entry().
> 
> Many things have changed since 5ad81b4946. So probably all the yield points in nbd_connection_entry that are possible during a drained section now support reentering. But some analysis of the possible yield points should be done.

Thanks for the explanation.  Will do this analysis.

Roman.
Roman Kagan March 16, 2021, 4:10 p.m. UTC | #4
On Tue, Mar 16, 2021 at 09:41:36AM -0500, Eric Blake wrote:
> On 3/15/21 1:06 AM, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> 
> Soft freeze is today.  I'm leaning towards declaring this series as a
> bug fix (and so give it some more soak time to get right, but still okay
> for -rc1) rather than a feature addition (and therefore would need to be
> in a pull request today).  Speak up now if this characterization is off
> base.

Yes, I'd consider it a bug fix, too.  I'll do my best to beat it into
shape before -rc2.

Thanks,
Roman.
Vladimir Sementsov-Ogievskiy March 17, 2021, 8:35 a.m. UTC | #5
15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.
> 
> Roman Kagan (7):
>    block/nbd: avoid touching freed connect_thread
>    block/nbd: use uniformly nbd_client_connecting_wait
>    block/nbd: assert attach/detach runs in the proper context
>    block/nbd: transfer reconnection stuff across aio_context switch
>    block/nbd: better document a case in nbd_co_establish_connection
>    block/nbd: decouple reconnect from drain
>    block/nbd: stop manipulating in_flight counter
> 
>   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>   nbd/client.c |   2 -
>   2 files changed, 86 insertions(+), 107 deletions(-)
> 


Hmm. The huge source of problems for this series is the weird logic around drain and aio context switch in the NBD driver.

Why do we have all this overly complicated logic with abuse of the in_flight counter in NBD? The answer is connection_co. NBD differs from other drivers: it has a coroutine independent of the request coroutines. And we have to move this coroutine carefully to the new aio context. We can't just enter it from the new context; we want to be sure that connection_co is at one of the yield points that support reentering.

I have an idea of how to avoid this: drop connection_co altogether.

1. NBD negotiation goes to the connection thread and becomes independent of any aio context.

2. waiting for the server reply goes to the request code. So, instead of always reading the reply from the socket in connection_co, we read it in the request coroutine, after sending the request. We'll need a CoMutex for it (as only one request coroutine should read from the socket), and we must be prepared for the incoming reply not being for _this_ request (in which case we should wake the other request and continue reading from the socket).

but this may be too much for soft freeze.
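
For illustration, a minimal sketch of such a reading loop (nbd_co_wait_reply, receive_mutex and the stash helpers are invented names, not an actual implementation; nbd_receive_reply is the existing helper from nbd/client.c):

static int coroutine_fn nbd_co_wait_reply(BDRVNBDState *s,
                                          uint64_t my_handle,
                                          NBDReply *reply)
{
    for (;;) {
        /* Only one request coroutine reads from the socket at a time. */
        qemu_co_mutex_lock(&s->receive_mutex);

        /* Another coroutine may already have read our reply for us. */
        if (nbd_steal_stashed_reply(s, my_handle, reply)) {
            qemu_co_mutex_unlock(&s->receive_mutex);
            return 0;
        }

        if (nbd_receive_reply(s->bs, s->ioc, reply, NULL) < 0) {
            qemu_co_mutex_unlock(&s->receive_mutex);
            return -EIO;    /* connection broken: caller starts reconnect */
        }

        if (reply->handle == my_handle) {
            qemu_co_mutex_unlock(&s->receive_mutex);
            return 0;
        }

        /* Someone else's reply: stash it for its owner, who picks it up
         * once it acquires the mutex (or is woken explicitly). */
        nbd_stash_reply(s, reply);
        qemu_co_mutex_unlock(&s->receive_mutex);
    }
}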


Another idea:

You want all the requests to be completed on drain_begin(), not cancelled. Actually, you don't need reconnect running during the drained section for that. It should be enough to just wait for all current requests before disabling the reconnect in the drain_begin handler.
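
A rough sketch of such a handler (the in_flight counter, free_sema queue and reconnect_enabled flag are assumptions about the driver state, not necessarily the real fields):

static void coroutine_fn nbd_co_drain_begin(BlockDriverState *bs)
{
    BDRVNBDState *s = bs->opaque;

    /* Keep reconnect running until the requests that were in flight
     * have actually completed, then disable it for the drained section. */
    while (s->in_flight > 0) {
        qemu_co_queue_wait(&s->free_sema, NULL);
    }
    s->reconnect_enabled = false;
}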
Roman Kagan March 26, 2021, 8:07 a.m. UTC | #6
On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> > 
> > Roman Kagan (7):
> >    block/nbd: avoid touching freed connect_thread
> >    block/nbd: use uniformly nbd_client_connecting_wait
> >    block/nbd: assert attach/detach runs in the proper context
> >    block/nbd: transfer reconnection stuff across aio_context switch
> >    block/nbd: better document a case in nbd_co_establish_connection
> >    block/nbd: decouple reconnect from drain
> >    block/nbd: stop manipulating in_flight counter
> > 
> >   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
> >   nbd/client.c |   2 -
> >   2 files changed, 86 insertions(+), 107 deletions(-)
> > 
> 
> 
> Hmm. The huge source of problems for this series is the weird logic
> around drain and aio context switch in the NBD driver.
> 
> Why do we have all this overly complicated logic with abuse of the
> in_flight counter in NBD? The answer is connection_co. NBD differs
> from other drivers: it has a coroutine independent of the request
> coroutines. And we have to move this coroutine carefully to the new
> aio context. We can't just enter it from the new context; we want to
> be sure that connection_co is at one of the yield points that support
> reentering.
> 
> I have an idea of how to avoid this: drop connection_co altogether.
> 
> 1. NBD negotiation goes to the connection thread and becomes
> independent of any aio context.
> 
> 2. waiting for the server reply goes to the request code. So, instead
> of always reading the reply from the socket in connection_co, we read
> it in the request coroutine, after sending the request. We'll need a
> CoMutex for it (as only one request coroutine should read from the
> socket), and we must be prepared for the incoming reply not being for
> _this_ request (in which case we should wake the other request and
> continue reading from the socket).
> 
> but this may be too much for soft freeze.

This approach does look appealing to me, and I gave it a quick shot,
but the amount of changes it involves indeed exceeds the rc tolerance.

> Another idea:
> 
> You want all the requests to be completed on drain_begin(), not
> cancelled. Actually, you don't need reconnect running during the
> drained section for that. It should be enough to just wait for all
> current requests before disabling the reconnect in the drain_begin
> handler.

So effectively you suggest doing NBD's own drain within the
bdrv_co_drain_begin callback.  I'm not totally sure there are no
assumptions this may break, but I'll try to look into this possibility.

Thanks,
Roman.
Roman Kagan April 7, 2021, 7:45 a.m. UTC | #7
On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> > 
> > Roman Kagan (7):
> >    block/nbd: avoid touching freed connect_thread
> >    block/nbd: use uniformly nbd_client_connecting_wait
> >    block/nbd: assert attach/detach runs in the proper context
> >    block/nbd: transfer reconnection stuff across aio_context switch
> >    block/nbd: better document a case in nbd_co_establish_connection
> >    block/nbd: decouple reconnect from drain
> >    block/nbd: stop manipulating in_flight counter
> > 
> >   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
> >   nbd/client.c |   2 -
> >   2 files changed, 86 insertions(+), 107 deletions(-)
> > 
> 
> 
> Hmm. The huge source of problems for this series is the weird logic
> around drain and aio context switch in the NBD driver.
> 
> Why do we have all this overly complicated logic with abuse of the
> in_flight counter in NBD? The answer is connection_co. NBD differs
> from other drivers: it has a coroutine independent of the request
> coroutines. And we have to move this coroutine carefully to the new
> aio context. We can't just enter it from the new context; we want to
> be sure that connection_co is at one of the yield points that support
> reentering.
> 
> I have an idea of how to avoid this: drop connection_co altogether.
> 
> 1. NBD negotiation goes to the connection thread and becomes
> independent of any aio context.
> 
> 2. waiting for the server reply goes to the request code. So, instead
> of always reading the reply from the socket in connection_co, we read
> it in the request coroutine, after sending the request. We'll need a
> CoMutex for it (as only one request coroutine should read from the
> socket), and we must be prepared for the incoming reply not being for
> _this_ request (in which case we should wake the other request and
> continue reading from the socket).

The problem with this approach is that it would change the reconnect
behavior.

Currently connection_co's purpose is three-fold:

1) receive the header of the server response, identify the request it
   pertains to, and wake the respective request coroutine

2) take on the responsibility to reestablish the connection when it's
   lost

3) monitor the idle connection and initiate the reconnect as soon as the
   connection is lost

Points 1 and 2 can be moved to the request coroutines indeed.  However I
don't see how to do 3 without an extra ever-running coroutine.
Sacrificing it would mean that a connection loss wouldn't be noticed and
the recovery wouldn't be attempted until a request arrived.

This change looks to me like a degradation compared to the current
state.

Roman.
Vladimir Sementsov-Ogievskiy April 7, 2021, 10:13 a.m. UTC | #8
07.04.2021 10:45, Roman Kagan wrote:
> On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 15.03.2021 09:06, Roman Kagan wrote:
>>> The reconnection logic doesn't need to stop while in a drained section.
>>> Moreover it has to be active during the drained section, as the requests
>>> that were caught in-flight with the connection to the server broken can
>>> only usefully get drained if the connection is restored.  Otherwise such
>>> requests can only either stall resulting in a deadlock (before
>>> 8c517de24a), or be aborted defeating the purpose of the reconnection
>>> machinery (after 8c517de24a).
>>>
>>> This series aims to just stop messing with the drained section in the
>>> reconnection code.
>>>
>>> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
>>> connection_co reentrance"); as I've missed the point of that commit I'd
>>> appreciate more scrutiny in this area.
>>>
>>> Roman Kagan (7):
>>>     block/nbd: avoid touching freed connect_thread
>>>     block/nbd: use uniformly nbd_client_connecting_wait
>>>     block/nbd: assert attach/detach runs in the proper context
>>>     block/nbd: transfer reconnection stuff across aio_context switch
>>>     block/nbd: better document a case in nbd_co_establish_connection
>>>     block/nbd: decouple reconnect from drain
>>>     block/nbd: stop manipulating in_flight counter
>>>
>>>    block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>>>    nbd/client.c |   2 -
>>>    2 files changed, 86 insertions(+), 107 deletions(-)
>>>
>>
>>
>> Hmm. The huge source of problems for this series is the weird logic
>> around drain and aio context switch in the NBD driver.
>>
>> Why do we have all this overly complicated logic with abuse of the
>> in_flight counter in NBD? The answer is connection_co. NBD differs
>> from other drivers: it has a coroutine independent of the request
>> coroutines. And we have to move this coroutine carefully to the new
>> aio context. We can't just enter it from the new context; we want to
>> be sure that connection_co is at one of the yield points that support
>> reentering.
>>
>> I have an idea of how to avoid this: drop connection_co altogether.
>>
>> 1. NBD negotiation goes to the connection thread and becomes
>> independent of any aio context.
>>
>> 2. waiting for the server reply goes to the request code. So, instead
>> of always reading the reply from the socket in connection_co, we read
>> it in the request coroutine, after sending the request. We'll need a
>> CoMutex for it (as only one request coroutine should read from the
>> socket), and we must be prepared for the incoming reply not being for
>> _this_ request (in which case we should wake the other request and
>> continue reading from the socket).
> 
> The problem with this approach is that it would change the reconnect
> behavior.
> 
> Currently connection_co's purpose is three-fold:
> 
> 1) receive the header of the server response, identify the request it
>     pertains to, and wake the respective request coroutine
> 
> 2) take on the responsibility to reestablish the connection when it's
>     lost
> 
> 3) monitor the idle connection and initiate the reconnect as soon as the
>     connection is lost
> 
> Points 1 and 2 can be moved to the request coroutines indeed.  However I
> don't see how to do 3 without an extra ever-running coroutine.
> Sacrificing it would mean that a connection loss wouldn't be noticed and
> the recovery wouldn't be attempted until a request arrived.
> 
> This change looks to me like a degradation compared to the current
> state.
> 

For 3 we can check the connection on a timeout:

  - getsockopt(.. SO_ERROR ..), which could be done from the bs aio context, or even from the reconnect-thread context (see the sketch below)

  - or, we can create a PING request: just use some request with parameters for which we are sure the NBD server will take no action but report some expected error. We can issue such a request on a timeout when there are no more requests, just to check that the connection still works.

Note that these (just like the current [3], which is just an endless read from the socket) will only work with keep-alive set, which is not enabled by default for now.
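
A minimal sketch of the getsockopt variant (the callback name, the sock_fd/idle_timer fields and the nbd_start_reconnect helper are all invented, just to illustrate the shape):

static void nbd_idle_check_cb(void *opaque)
{
    BDRVNBDState *s = opaque;
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(s->sock_fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 ||
        err != 0) {
        /* The socket reported an error: kick off reconnect. */
        nbd_start_reconnect(s);
        return;
    }
    /* Connection still looks healthy: re-arm the timer. */
    timer_mod(s->idle_timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
}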

Anyway, I think the first step is splitting the connect thread out of nbd.c, which is overcomplicated now; I'm going to send a refactoring series for this.