[v3,3/3] nbd/server: Allow MULTI_CONN for shared writable exports

Message ID 20220314203818.3681277-4-eblake@redhat.com (mailing list archive)
State New, archived
Series nbd: MULTI_CONN for shared writable exports

Commit Message

Eric Blake March 14, 2022, 8:38 p.m. UTC
According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu when our block layer is backed by the local
filesystem (by virtue of the semantics of fdatasync(), and the fact
that qemu itself is not buffering writes beyond flushes).  It is
harder to state whether we satisfy these conditions for network-based
protocols, so the safest course of action is to allow users to opt-in
to advertising multi-conn.  We may later tweak defaults to advertise
by default when the block layer can confirm that the underlying
protocol driver is cache consistent between multiple writers, but for
now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
explicitly start qemu-nbd (new -m command-line option) or
qemu-storage-daemon (new qapi field 'multi-conn') with multi-conn
advertisement in a known-safe setup where the client end can then
benefit from parallel clients.
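
For illustration, an explicit opt-in in such a known-safe setup might
look like this (a sketch only; the qemu-storage-daemon 'multi-conn=on'
spelling assumes the usual keyval mapping of the new QAPI field, and
the image paths are placeholders):

  $ qemu-nbd -t -e 0 -m on -f qcow2 image.qcow2

  $ qemu-storage-daemon \
      --blockdev driver=qcow2,node-name=disk0,file.driver=file,file.filename=image.qcow2 \
      --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock \
      --export type=nbd,id=exp0,node-name=disk0,name=exp0,writable=on,multi-conn=on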

Note, however, that we don't want to advertise MULTI_CONN when we know
that a second client cannot connect (for historical reasons, qemu-nbd
defaults to a single connection while nbd-server-add and QMP commands
default to unlimited connections; but we already have existing means
to let either style of NBD server creation alter those defaults).  The
harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru>
---
 docs/interop/nbd.txt                       |   1 +
 docs/tools/qemu-nbd.rst                    |  14 +-
 qapi/block-export.json                     |  19 ++-
 include/block/nbd.h                        |   3 +-
 blockdev-nbd.c                             |   5 +
 nbd/server.c                               |  27 +++-
 qemu-nbd.c                                 |  20 ++-
 MAINTAINERS                                |   1 +
 tests/qemu-iotests/tests/nbd-multiconn     | 157 +++++++++++++++++++++
 tests/qemu-iotests/tests/nbd-multiconn.out |   5 +
 10 files changed, 240 insertions(+), 12 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

Comments

Richard W.M. Jones March 15, 2022, 1:14 p.m. UTC | #1
The patches seem OK to me, but I don't really know enough about the
internals of qemu-nbd to give a line-by-line review.  I did however
build and test qemu-nbd with the patches:

  $ ./build/qemu-nbd /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
	can_multi_conn: false


  $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
	can_multi_conn: false

^^^ Is this expected?  It also happens with -e 0.


  $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
	can_multi_conn: true


  $ ./build/qemu-nbd -e 2 -m off /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
	can_multi_conn: false


Rich.
Eric Blake March 16, 2022, 9:07 p.m. UTC | #2
On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote:
> The patches seem OK to me, but I don't really know enough about the
> internals of qemu-nbd to give a line-by-line review.  I did however
> build and test qemu-nbd with the patches:
> 
>   $ ./build/qemu-nbd /var/tmp/test.qcow2 
>   $ nbdinfo nbd://localhost
>   ...
> 	can_multi_conn: false
> 
> 
>   $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
>   $ nbdinfo nbd://localhost
>   ...
> 	can_multi_conn: false
> 
> ^^^ Is this expected?  It also happens with -e 0.

Yes, because qemu-nbd defaults to read-write connections, but to be
conservative, this patch defaults '-m auto' to NOT advertise
multi-conn for read-write; you need to be explicit:

> 
> 
>   $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
>   $ nbdinfo nbd://localhost
>   ...
> 	can_multi_conn: true

either with '-m on' as you did here, or with

build/qemu-nbd -r -e 2 /var/tmp/test.qcow2

where the '-m auto' default exposes multi-conn for a readonly client.
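
With this series applied, that readonly invocation is expected to
report (mirroring the transcripts above):

  $ nbdinfo nbd://localhost
  ...
	can_multi_conn: true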

> 
> 
>   $ ./build/qemu-nbd -e 2 -m off /var/tmp/test.qcow2 
>   $ nbdinfo nbd://localhost
>   ...
> 	can_multi_conn: false
> 
> 
> Rich.
Eric Blake March 16, 2022, 9:15 p.m. UTC | #3
On Wed, Mar 16, 2022 at 04:07:21PM -0500, Eric Blake wrote:
> On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote:
> > The patches seem OK to me, but I don't really know enough about the
> > internals of qemu-nbd to give a line-by-line review.  I did however
> > build and test qemu-nbd with the patches:
> > 
> >   $ ./build/qemu-nbd /var/tmp/test.qcow2 
> >   $ nbdinfo nbd://localhost
> >   ...
> > 	can_multi_conn: false
> > 
> > 
> >   $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
> >   $ nbdinfo nbd://localhost
> >   ...
> > 	can_multi_conn: false
> > 
> > ^^^ Is this expected?  It also happens with -e 0.
> 
> Yes, because qemu-nbd defaults to read-write connections, but to be
> conservative, this patch defaults '-m auto' to NOT advertise
> multi-conn for read-write; you need to be explicit:
> 
> > 
> > 
> >   $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
> >   $ nbdinfo nbd://localhost
> >   ...
> > 	can_multi_conn: true
> 
> either with '-m on' as you did here, or with
> 
> build/qemu-nbd -r -e 2 /var/tmp/test.qcow2
> 
> where the '-m auto' default exposes multi-conn for a readonly client.

I hit send prematurely - one more thought I wanted to make clear.  For
_this_ series, the behavior of '-m auto' depends solely on readonly
vs. read-write; that being the most conservative choice of preserving
pre-existing qemu-nbd behavior (that is, if you don't use -m, the only
change in behavior should be the fact that --help output now lists
-m).

But in future versions of qemu, we might be able to improve '-m auto'
to also take into consideration whether the backing image of a
read-write device is known-cache-consistent (such as a local file
system), where we can then manage to default to advertising multi-conn
for those writable images, while still avoiding the advertisement when
exposing other devices such as network-mounted devices where we are
less confident on whether there are sufficient cache consistency
guarantees.  Use of explicit '-m off' or '-m on' gives guaranteed
results no matter the qemu version, so it is only explicit (or
implied) '-m auto' where we might get smarter defaults over time.

Thus, testing whether advertising multi-conn makes a difference in
other tools like nbdcopy requires checking if qemu-nbd is new
enough to support -m, and then being explicit that we are opting in to
using it.
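
For example, a rough sketch of that opportunistic opt-in (the nbdcopy
--connections spelling is recalled from libnbd's nbdcopy and worth
double-checking; src.qcow2 and dst.img are placeholder names):

  $ multiconn=''
  $ qemu-nbd --help | grep -q -- '--multi-conn' && multiconn='-m on'
  $ # $multiconn left unquoted on purpose so '-m on' splits into two arguments
  $ qemu-nbd -t -e 0 $multiconn -f qcow2 src.qcow2
  $ nbdcopy --connections=4 nbd://localhost dst.img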
Richard W.M. Jones March 16, 2022, 11:01 p.m. UTC | #4
On Wed, Mar 16, 2022 at 04:15:53PM -0500, Eric Blake wrote:
> On Wed, Mar 16, 2022 at 04:07:21PM -0500, Eric Blake wrote:
> > On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote:
> > > The patches seem OK to me, but I don't really know enough about the
> > > internals of qemu-nbd to give a line-by-line review.  I did however
> > > build and test qemu-nbd with the patches:
> > > 
> > >   $ ./build/qemu-nbd /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > > 	can_multi_conn: false
> > > 
> > > 
> > >   $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > > 	can_multi_conn: false
> > > 
> > > ^^^ Is this expected?  It also happens with -e 0.
> > 
> > Yes, because qemu-nbd defaults to read-write connections, but to be
> > conservative, this patch defaults '-m auto' to NOT advertise
> > multi-conn for read-write; you need to be explicit:
> > 
> > > 
> > > 
> > >   $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > > 	can_multi_conn: true
> > 
> > either with '-m on' as you did here, or with
> > 
> > build/qemu-nbd -r -e 2 /var/tmp/test.qcow2
> > 
> > where the '-m auto' default exposes multi-conn for a readonly client.
> 
> I hit send prematurely - one more thought I wanted to make clear.  For
> _this_ series, the behavior of '-m auto' depends solely on readonly
> vs. read-write; that being the most conservative choice of preserving
> pre-existing qemu-nbd behavior (that is, if you don't use -m, the only
> change in behavior should be the fact that --help output now lists
> -m).
> 
> But in future versions of qemu, we might be able to improve '-m auto'
> to also take into consideration whether the backing image of a
> read-write device is known-cache-consistent (such as a local file
> system), where we can then manage to default to advertising multi-conn
> for those writable images, while still avoiding the advertisement when
> exposing other devices such as network-mounted devices where we are
> less confident on whether there are sufficient cache consistency
> guarantees.  Use of explicit '-m off' or '-m on' gives guaranteed
> results no matter the qemu version, so it is only explicit (or
> implied) '-m auto' where we might get smarter defaults over time.
> 
> Thus, testing whether advertising multi-conn makes a difference in
> other tools like nbdcopy requires checking if qemu-nbd is new
> enough to support -m, and then being explicit that we are opting in to
> using it.

I see.  The commit message of patch 3 confused me because
superficially it seems to say that multi-conn is safe (and therefore I
assumed safe == enabled) when backed by a local filesystem.  Could the
commit message be improved to list the default for common combinations
of flags?

Rich.
Kevin Wolf April 27, 2022, 3:52 p.m. UTC | #5
Am 14.03.2022 um 21:38 hat Eric Blake geschrieben:
> According to the NBD spec, a server that advertises
> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> not see any cache inconsistencies: when properly separated by a single
> flush, actions performed by one client will be visible to another
> client, regardless of which client did the flush.  We satisfy these
> conditions in qemu when our block layer is backed by the local
> filesystem (by virtue of the semantics of fdatasync(), and the fact
> that qemu itself is not buffering writes beyond flushes).  It is
> harder to state whether we satisfy these conditions for network-based
> protocols, so the safest course of action is to allow users to opt-in
> to advertising multi-conn.

Do you have an example of how this could be unsafe?

As I understand it, the NBD server has a single BlockBackend and
therefore is a single client for the backend, be it file-posix or any
network-based protocol. It doesn't really make a difference to the
storage how many different NBD clients the requests are coming from.

I would have expected that cache coherency of the protocol level driver
would only matter if you had two QEMU processes accessing the same file
concurrently.

In fact, I don't think we even need the flush restriction from the NBD
spec. All clients see the same state (that of the NBD server
BlockBackend) even without anyone issuing any flush. The flush is only
needed to make sure that cached data is written to the backing storage
when writeback caches are involved.

Please correct me if I'm misunderstanding something here.

> We may later tweak defaults to advertise
> by default when the block layer can confirm that the underlying
> protocol driver is cache consistent between multiple writers, but for
> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> explicitly start qemu-nbd (new -m command-line option) or
> qemu-storage-daemon (new qapi field 'multi-conn') with multi-conn
> advertisement in a known-safe setup where the client end can then
> benefit from parallel clients.
> 
> Note, however, that we don't want to advertise MULTI_CONN when we know
> that a second client cannot connect (for historical reasons, qemu-nbd
> defaults to a single connection while nbd-server-add and QMP commands
> default to unlimited connections; but we already have existing means
> to let either style of NBD server creation alter those defaults).  The
> harder part of this patch is setting up an iotest to demonstrate
> behavior of multiple NBD clients to a single server.  It might be
> possible with parallel qemu-io processes, but I found it easier to do
> in python with the help of libnbd, and help from Nir and Vladimir in
> writing the test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Suggested-by: Nir Soffer <nsoffer@redhat.com>
> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru>

> @@ -709,6 +714,17 @@ int main(int argc, char **argv)
>                  exit(EXIT_FAILURE);
>              }
>              break;
> +        case 'm':
> +        {
> +            Error *err = NULL;
> +            multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg,
> +                                         ON_OFF_AUTO_AUTO, &err);
> +            if (err) {
> +                error_report_err(err);
> +                exit(EXIT_FAILURE);
> +            }

I think this is the same as passing &error_fatal.

> +            break;
> +        }
>          case 'f':
>              fmt = optarg;
>              break;
> diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
> new file mode 100755
> index 000000000000..7d1179b33b05
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-multiconn
> @@ -0,0 +1,157 @@
> +#!/usr/bin/env python3
> +# group: rw auto quick
> +#
> +# Test cases for NBD multi-conn advertisement
> +#
> +# Copyright (C) 2022 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +import os
> +from contextlib import contextmanager
> +import iotests
> +from iotests import qemu_img_create, qemu_io_silent

qemu_io_silent() doesn't exist any more, commit 72cfb937 removed it.

Kevin
Eric Blake April 27, 2022, 9:39 p.m. UTC | #6
On Wed, Apr 27, 2022 at 05:52:09PM +0200, Kevin Wolf wrote:
> Am 14.03.2022 um 21:38 hat Eric Blake geschrieben:
> > According to the NBD spec, a server that advertises
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush.  We satisfy these
> > conditions in qemu when our block layer is backed by the local
> > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > that qemu itself is not buffering writes beyond flushes).  It is
> > harder to state whether we satisfy these conditions for network-based
> > protocols, so the safest course of action is to allow users to opt-in
> > to advertising multi-conn.
> 
> Do you have an example of how this could be unsafe?

Nothing direct.  I tried to turn this on unconditionally in an earlier
version, and we waffled about whether we could prove that network
block backends (such as gluster) provide us the safety that the NBD
spec demands:

https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html

> 
> As I understand it, the NBD server has a single BlockBackend and
> therefore is a single client for the backend, be it file-posix or any
> network-based protocol. It doesn't really make a difference to the
> storage how many different NBD clients the requests are coming from.
> 
> I would have expected that cache coherency of the protocol level driver
> would only matter if you had two QEMU processes accessing the same file
> concurrently.

Or a multi-pathed connection to network storage, where one QEMU
process accesses the network device, but those accesses may
round-robin which server they reach, and where any caching at an
individual server may be inconsistent with what is seen on another
server unless flushing is used to force the round-robin access to
synchronize between the multi-path views.

> 
> In fact, I don't think we even need the flush restriction from the NBD
> spec. All clients see the same state (that of the NBD server
> BlockBackend) even without anyone issuing any flush. The flush is only
> needed to make sure that cached data is written to the backing storage
> when writeback caches are involved.
> 
> Please correct me if I'm misunderstanding something here.

Likewise me, if I'm being overly cautious.

I can certainly write a simpler v4 that just always advertises
MULTI_CONN if we allow more than one client, without any knob to
override it; it's just that it is harder to write a commit message
justifying why I think it is safe to do so.

> 
> > We may later tweak defaults to advertise
> > by default when the block layer can confirm that the underlying
> > protocol driver is cache consistent between multiple writers, but for
> > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> > explicitly start qemu-nbd (new -m command-line option) or
> > qemu-storage-daemon (new qapi field 'multi-conn') with multi-conn
> > advertisement in a known-safe setup where the client end can then
> > benefit from parallel clients.
> > 
> > Note, however, that we don't want to advertise MULTI_CONN when we know
> > that a second client cannot connect (for historical reasons, qemu-nbd
> > defaults to a single connection while nbd-server-add and QMP commands
> > default to unlimited connections; but we already have existing means
> > to let either style of NBD server creation alter those defaults).  The
> > harder part of this patch is setting up an iotest to demonstrate
> > behavior of multiple NBD clients to a single server.  It might be
> > possible with parallel qemu-io processes, but I found it easier to do
> > in python with the help of libnbd, and help from Nir and Vladimir in
> > writing the test.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > Suggested-by: Nir Soffer <nsoffer@redhat.com>
> > Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru>
> 
> > @@ -709,6 +714,17 @@ int main(int argc, char **argv)
> >                  exit(EXIT_FAILURE);
> >              }
> >              break;
> > +        case 'm':
> > +        {
> > +            Error *err = NULL;
> > +            multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg,
> > +                                         ON_OFF_AUTO_AUTO, &err);
> > +            if (err) {
> > +                error_report_err(err);
> > +                exit(EXIT_FAILURE);
> > +            }
> 
> I think this is the same as passing &error_fatal.

Yes, sounds right.

> 
> > +            break;
> > +        }
> >          case 'f':
> >              fmt = optarg;
> >              break;
> > diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
> > new file mode 100755
> > index 000000000000..7d1179b33b05
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/nbd-multiconn
> > @@ -0,0 +1,157 @@
> > +#!/usr/bin/env python3

> > +
> > +import os
> > +from contextlib import contextmanager
> > +import iotests
> > +from iotests import qemu_img_create, qemu_io_silent
> 
> qemu_io_silent() doesn't exist any more, commit 72cfb937 removed it.

Sounds like I need to rebase and post v4 anyways, then.
Kevin Wolf April 29, 2022, 12:49 p.m. UTC | #7
Am 27.04.2022 um 23:39 hat Eric Blake geschrieben:
> On Wed, Apr 27, 2022 at 05:52:09PM +0200, Kevin Wolf wrote:
> > Am 14.03.2022 um 21:38 hat Eric Blake geschrieben:
> > > According to the NBD spec, a server that advertises
> > > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > > not see any cache inconsistencies: when properly separated by a single
> > > flush, actions performed by one client will be visible to another
> > > client, regardless of which client did the flush.  We satisfy these
> > > conditions in qemu when our block layer is backed by the local
> > > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > > that qemu itself is not buffering writes beyond flushes).  It is
> > > harder to state whether we satisfy these conditions for network-based
> > > protocols, so the safest course of action is to allow users to opt-in
> > > to advertising multi-conn.
> > 
> > Do you have an example of how this could be unsafe?
> 
> Nothing direct.  I tried to turn this on unconditionally in an earlier
> version, and we waffled about whether we could prove that network
> block backends (such as gluster) provide us the safety that the NBD
> spec demands:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html
> 
> > 
> > As I understand it, the NBD server has a single BlockBackend and
> > therefore is a single client for the backend, be it file-posix or any
> > network-based protocol. It doesn't really make a difference to the
> > storage how many different NBD clients the requests are coming from.
> > 
> > I would have expected that cache coherency of the protocol level driver
> > would only matter if you had two QEMU processes accessing the same file
> > concurrently.
> 
> Or a multi-pathed connection to network storage, where one QEMU
> process accesses the network device, but those accesses may
> round-robin which server they reach, and where any caching at an
> individual server may be inconsistent with what is seen on another
> server unless flushing is used to force the round-robin access to
> synchronize between the multi-path views.

I don't think this is a realistic scenario. It would mean that you
successfully write data to the storage, and when you then read the same
location, you get different data back. This would be inconsistent even
with a single client. So I'd call this broken storage that should be
replaced as soon as possible.

I could imagine problems of this kind with two separate connections to
the network storage, but here all the NBD clients share a single
BlockBackend, so for the storage they are a single connection.

> > In fact, I don't think we even need the flush restriction from the NBD
> > spec. All clients see the same state (that of the NBD server
> > BlockBackend) even without anyone issuing any flush. The flush is only
> > needed to make sure that cached data is written to the backing storage
> > when writeback caches are involved.
> > 
> > Please correct me if I'm misunderstanding something here.
> 
> Likewise me, if I'm being overly cautious.
> 
> I can certainly write a simpler v4 that just always advertises
> MULTI_CONN if we allow more than one client, without any knob to
> override it; it's just that it is harder to write a commit message
> justifying why I think it is safe to do so.

Having an explicit option doesn't hurt, but it's the reasoning in the
commit message that feels wrong to me.

We could consider changing "auto" to advertise MULTI_CONN even for
writable exports. There might still be a good reason not to do this by
default, though, because of the NBD clients. I'm quite sure that the
backend won't make any trouble, but clients might if someone else is
writing to the same image (this is why we require an explicit
share-rw=on for guest devices in the same case).

Kevin
Eric Blake May 2, 2022, 9:12 p.m. UTC | #8
On Fri, Apr 29, 2022 at 02:49:35PM +0200, Kevin Wolf wrote:
...
> > Or a multi-pathed connection to network storage, where one QEMU
> > process accesses the network device, but those accesses may
> > round-robin which server they reach, and where any caching at an
> > individual server may be inconsistent with what is seen on another
> > server unless flushing is used to force the round-robin access to
> > synchronize between the multi-path views.
> 
> I don't think this is a realistic scenario. It would mean that you
> successfully write data to the storage, and when you then read the same
> location, you get different data back. This would be inconsistent even
> with a single client. So I'd call this broken storage that should be
> replaced as soon as possible.
> 
> I could imagine problems of this kind with two separate connections to
> the network storage, but here all the NBD clients share a single
> BlockBackend, so for the storage they are a single connection.

I like that chain of reasoning.

> 
> > > In fact, I don't think we even need the flush restriction from the NBD
> > > spec. All clients see the same state (that of the NBD server
> > > BlockBackend) even without anyone issuing any flush. The flush is only
> > > needed to make sure that cached data is written to the backing storage
> > > when writeback caches are involved.
> > > 
> > > Please correct me if I'm misunderstanding something here.
> > 
> > Likewise me, if I'm being overly cautious.
> > 
> > I can certainly write a simpler v4 that just always advertises
> > MULTI_CONN if we allow more than one client, without any knob to
> > override it; it's just that it is harder to write a commit message
> > justifying why I think it is safe to do so.
> 
> Having an explicit option doesn't hurt, but it's the reasoning in the
> commit message that feels wrong to me.
> 
> We could consider changing "auto" to advertise MULTI_CONN even for
> writable exports. There might still be a good reason not to do this by
> default, though, because of the NBD clients. I'm quite sure that the
> backend won't make any trouble, but clients might if someone else is
> writing to the same image (this is why we require an explicit
> share-rw=on for guest devices in the same case).

If your worry is about a client trying to determine if writing to an
NBD server is going to race with some external process writing to the
direct image, I don't see how not advertising MULTI_CONN will make
things safer - the NBD client to qemu-nbd will still be going through
a single backend, and that race is present even if there is only one
NBD client.  The point of MULTI_CONN is informing the client that it
can open multiple sockets and see a consistent view across all of
them, and in your scenario of the server competing with some external
process over the underlying data file, that competition is not
controlled by how many NBD clients connect to the server, but by the
external process having access at the same time the server has access
through a single BlockBackend (and would be just as risky as if
MULTI_CONN were not advertised and the client limits itself to one NBD
connection).

If we can argue that our single BlockBackend point of access is safe
enough to default to advertising MULTI_CONN for writable connections
(when we support parallel clients), then exposing an OnOffAuto knob is
overkill.  I'm not even sure I can envision a case where needing to
not advertise the bit would matter to a client (clients are supposed
to ignore unknown feature bits).
Kevin Wolf May 3, 2022, 7:56 a.m. UTC | #9
Am 02.05.2022 um 23:12 hat Eric Blake geschrieben:
> On Fri, Apr 29, 2022 at 02:49:35PM +0200, Kevin Wolf wrote:
> ...
> > > Or a multi-pathed connection to network storage, where one QEMU
> > > process accesses the network device, but those accesses may
> > > round-robin which server they reach, and where any caching at an
> > > individual server may be inconsistent with what is seen on another
> > > server unless flushing is used to force the round-robin access to
> > > synchronize between the multi-path views.
> > 
> > I don't think this is a realistic scenario. It would mean that you
> > successfully write data to the storage, and when you then read the same
> > location, you get different data back. This would be inconsistent even
> > with a single client. So I'd call this broken storage that should be
> > replaced as soon as possible.
> > 
> > I could imagine problems of this kind with two separate connections to
> > the network storage, but here all the NBD clients share a single
> > BlockBackend, so for the storage they are a single connection.
> 
> I like that chain of reasoning.
> 
> > 
> > > > In fact, I don't think we even need the flush restriction from the NBD
> > > > spec. All clients see the same state (that of the NBD server
> > > > BlockBackend) even without anyone issuing any flush. The flush is only
> > > > needed to make sure that cached data is written to the backing storage
> > > > when writeback caches are involved.
> > > > 
> > > > Please correct me if I'm misunderstanding something here.
> > > 
> > > Likewise me, if I'm being overly cautious.
> > > 
> > > I can certainly write a simpler v4 that just always advertises
> > > MULTI_CONN if we allow more than one client, without any knob to
> > > override it; it's just that it is harder to write a commit message
> > > justifying why I think it is safe to do so.
> > 
> > Having an explicit option doesn't hurt, but it's the reasoning in the
> > commit message that feels wrong to me.
> > 
> > We could consider changing "auto" to advertise MULTI_CONN even for
> > writable exports. There might still be a good reason not to do this by
> > default, though, because of the NBD clients. I'm quite sure that the
> > backend won't make any trouble, but client might if someone else is
> > writing to the same image (this is why we require an explicit
> > share-rw=on for guest devices in the same case).
> 
> If your worry is about a client trying to determine if writing to an
> NBD server is going to race with some external process writing to the
> direct image

No, I'm not really thinking of a race, but just the fact that someone
else could write to the image at all. Guest OSes don't generally expect
the disk magically changing under their feet (unless you use cluster
filesystems etc.), so the safe default is not allowing multiple writers
to the same image.

> I don't see how not advertising MULTI_CONN will make things safer -
> the NBD client to qemu-nbd will still be going through a single
> backend, and that race is present even if there is only one NBD
> client.  The point of MULTI_CONN is informing the client that it can
> open multiple sockets and see a consistent view across all of them,
> and in your scenario of the server competing with some external
> process over the underlying data file, that competition is not
> controlled by how many NBD clients connect to the server, but by the
> external process having access at the same time the server has access
> through a single BlockBackend (and would be just as risky as if
> MULTI_CONN were not advertised and the client limits itself to one NBD
> connection).

Oh, right, that makes sense. So the other connections should really be
considered the same client in this case, not some random other writer.

Can we somehow tell whether a new connection is just the same client
opening an additional connection or whether someone else is connecting?

But maybe it's not worth spending too much time with this for a check
that would only try to catch configurations that the user set up in a
way they didn't actually intend.

> If we can argue that our single BlockBackend point of access is safe
> enough to default to advertising MULTI_CONN for writable connections
> (when we support parallel clients), then exposing an OnOffAuto knob is
> overkill.  I'm not even sure I can envision a case where needing to
> not advertise the bit would matter to a client (clients are supposed
> to ignore unknown feature bits).

Yes, my concern seems unrelated to MULTI_CONN after all.

Kevin

Patch

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index bdb0f2a41aca..6c99070b99c8 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@  NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 4c950f61998e..229414e3cdc7 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -139,8 +139,18 @@  driver options if :option:`--image-opts` is specified.
 .. option:: -e, --shared=NUM

   Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited. Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.
+
+.. option:: -m, --multi-conn=MODE
+
+  When multiple clients are allowed at once (via :option:`--shared`),
+  controls whether the NBD protocol MULTI_CONN bit is advertised to
+  clients.  This defaults to ``auto``, which advertises the bit for
+  readonly connections (:option:`--read-only`) but not for writeable
+  connections; it can also be set to ``on`` or ``off`` to override the
+  default.  Advertising this bit informs clients whether a flush from
+  one client has guaranteed consistency to subsequent access from a
+  parallel client.

 .. option:: -t, --persistent

diff --git a/qapi/block-export.json b/qapi/block-export.json
index f183522d0d2c..d9d38620156f 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -21,7 +21,9 @@ 
 #             recreated on the fly while the NBD server is active.
 #             If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0)
 #
 # Since: 4.2
 ##
@@ -50,7 +52,9 @@ 
 #             recreated on the fly while the NBD server is active.
 #             If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0).
 #
 # Returns: error if the server is already running.
 #
@@ -95,11 +99,20 @@ 
 #                    the metadata context name "qemu:allocation-depth" to
 #                    inspect allocation details. (since 5.2)
 #
+# @multi-conn: Controls whether multiple client support is advertised, if the
+#              server's max-connections is not 1. "auto" advertises support
+#              for read-only clients, but not for writeable (although this
+#              may be improved in the future for writeable devices that are
+#              known to be cache-consistent, such as the raw file driver),
+#              while "on" and "off" explicitly set the bit (default auto,
+#              since 7.0).
+#
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsNbd',
   'base': 'BlockExportOptionsNbdBase',
-  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } }
+  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
+            '*multi-conn': 'OnOffAuto' } }

 ##
 # @BlockExportOptionsVhostUserBlk:
diff --git a/include/block/nbd.h b/include/block/nbd.h
index c5a29ce1c61b..c74b7a9d2e6e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@ 
 /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2022 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -346,6 +346,7 @@  void nbd_client_put(NBDClient *client);

 void nbd_server_is_qemu_nbd(int max_connections);
 bool nbd_server_is_running(void);
+int nbd_server_max_connections(void);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       const char *tls_authz, uint32_t max_connections,
                       Error **errp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index add41a23af43..15ad7b8dbfd5 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -44,6 +44,11 @@  bool nbd_server_is_running(void)
     return nbd_server || qemu_nbd_connections >= 0;
 }

+int nbd_server_max_connections(void)
+{
+    return nbd_server ? nbd_server->max_connections : -1;
+}
+
 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 {
     nbd_client_put(client);
diff --git a/nbd/server.c b/nbd/server.c
index 5da884c2fc35..2cfdd56cda19 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@ 
 /*
- *  Copyright (C) 2016-2021 Red Hat, Inc.
+ *  Copyright (C) 2016-2022 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -1642,7 +1642,7 @@  static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     int64_t size;
     uint64_t perm, shared_perm;
     bool readonly = !exp_args->writable;
-    bool shared = !exp_args->writable;
+    bool multi_conn;
     strList *bitmaps;
     size_t i;
     int ret;
@@ -1680,6 +1680,23 @@  static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
         return size;
     }

+    /*
+     * Determine whether to advertise multi-conn.  Default is auto,
+     * which resolves to on for read-only and off for writable.  But
+     * if the server has max-connections 1, that forces the flag off.
+     */
+    if (!arg->has_multi_conn) {
+        arg->multi_conn = ON_OFF_AUTO_AUTO;
+    }
+    if (nbd_server_max_connections() == 1) {
+        arg->multi_conn = ON_OFF_AUTO_OFF;
+    }
+    if (arg->multi_conn == ON_OFF_AUTO_AUTO) {
+        multi_conn = readonly;
+    } else {
+        multi_conn = arg->multi_conn == ON_OFF_AUTO_ON;
+    }
+
     /* Don't allow resize while the NBD server is running, otherwise we don't
      * care what happens with the node. */
     blk_get_perm(blk, &perm, &shared_perm);
@@ -1693,11 +1710,11 @@  static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     exp->description = g_strdup(arg->description);
     exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH |
                      NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE);
+    if (multi_conn) {
+        exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
+    }
     if (readonly) {
         exp->nbdflags |= NBD_FLAG_READ_ONLY;
-        if (shared) {
-            exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
-        }
     } else {
         exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
                           NBD_FLAG_SEND_FAST_ZERO);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 8c25ae93df83..ddeb2fcf7f40 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -70,6 +70,7 @@ 
 #define QEMU_NBD_OPT_PID_FILE      265
 #define QEMU_NBD_OPT_SELINUX_LABEL 266
 #define QEMU_NBD_OPT_TLSHOSTNAME   267
+#define QEMU_NBD_OPT_MULTI_CONN    268

 #define MBR_SIZE 512

@@ -79,6 +80,7 @@  static SocketAddress *saddr;
 static int persistent = 0;
 static enum { RUNNING, TERMINATE, TERMINATED } state;
 static int shared = 1;
+static OnOffAuto multi_conn = ON_OFF_AUTO_AUTO;
 static int nb_fds;
 static QIONetListener *server;
 static QCryptoTLSCreds *tlscreds;
@@ -100,6 +102,8 @@  static void usage(const char *name)
 "  -k, --socket=PATH         path to the unix socket\n"
 "                            (default '"SOCKET_PATH"')\n"
 "  -e, --shared=NUM          device can be shared by NUM clients (default '1')\n"
+"  -m, --multi-conn=MODE     control multi-conn advertisement with -e, MODE is\n"
+"                            'on', 'off', or 'auto' (default based on -r)\n"
 "  -t, --persistent          don't exit on the last connection\n"
 "  -v, --verbose             display extra debugging information\n"
 "  -x, --export-name=NAME    expose export by name (default is empty string)\n"
@@ -513,7 +517,7 @@  int main(int argc, char **argv)
     char *device = NULL;
     QemuOpts *sn_opts = NULL;
     const char *sn_id_or_name = NULL;
-    const char *sopt = "hVb:o:p:rsnc:dvk:e:f:tl:x:T:D:AB:L";
+    const char *sopt = "hVb:o:p:rsnc:dvk:e:m:f:tl:x:T:D:AB:L";
     struct option lopt[] = {
         { "help", no_argument, NULL, 'h' },
         { "version", no_argument, NULL, 'V' },
@@ -536,6 +540,7 @@  int main(int argc, char **argv)
         { "detect-zeroes", required_argument, NULL,
           QEMU_NBD_OPT_DETECT_ZEROES },
         { "shared", required_argument, NULL, 'e' },
+        { "multi-conn", required_argument, NULL, 'm' },
         { "format", required_argument, NULL, 'f' },
         { "persistent", no_argument, NULL, 't' },
         { "verbose", no_argument, NULL, 'v' },
@@ -709,6 +714,17 @@  int main(int argc, char **argv)
                 exit(EXIT_FAILURE);
             }
             break;
+        case 'm':
+        {
+            Error *err = NULL;
+            multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg,
+                                         ON_OFF_AUTO_AUTO, &err);
+            if (err) {
+                error_report_err(err);
+                exit(EXIT_FAILURE);
+            }
+            break;
+        }
         case 'f':
             fmt = optarg;
             break;
@@ -1107,6 +1123,8 @@  int main(int argc, char **argv)
             .bitmaps              = bitmaps,
             .has_allocation_depth = alloc_depth,
             .allocation_depth     = alloc_depth,
+            .has_multi_conn       = true,
+            .multi_conn           = multi_conn,
         },
     };
     blk_exp_add(export_opts, &error_fatal);
diff --git a/MAINTAINERS b/MAINTAINERS
index f2e9ce1da2ac..678c34e8077f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3362,6 +3362,7 @@  F: qemu-nbd.*
 F: blockdev-nbd.c
 F: docs/interop/nbd.txt
 F: docs/tools/qemu-nbd.rst
+F: tests/qemu-iotests/tests/*nbd*
 T: git https://repo.or.cz/qemu/ericb.git nbd
 T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd

diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
new file mode 100755
index 000000000000..7d1179b33b05
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn
@@ -0,0 +1,157 @@ 
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for NBD multi-conn advertisement
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import os
+from contextlib import contextmanager
+import iotests
+from iotests import qemu_img_create, qemu_io_silent
+
+
+disk = os.path.join(iotests.test_dir, 'disk')
+size = '4M'
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd_sock')
+nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock
+
+
+@contextmanager
+def open_nbd(export_name):
+    h = nbd.NBD()
+    try:
+        h.connect_uri(nbd_uri.format(export_name))
+        yield h
+    finally:
+        h.shutdown()
+
+class TestNbdMulticonn(iotests.QMPTestCase):
+    def setUp(self):
+        assert qemu_img_create('-f', iotests.imgfmt, disk, size) == 0
+        assert qemu_io_silent('-c', 'w -P 1 0 2M',
+                              '-c', 'w -P 2 2M 2M', disk) == 0
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+        result = self.vm.qmp('blockdev-add', {
+            'driver': 'qcow2',
+            'node-name': 'n',
+            'file': {'driver': 'file', 'filename': disk}
+        })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(disk)
+        try:
+            os.remove(nbd_sock)
+        except OSError:
+            pass
+
+    @contextmanager
+    def run_server(self, max_connections=None):
+        args = {
+            'addr': {
+                'type': 'unix',
+                'data': {'path': nbd_sock}
+            }
+        }
+        if max_connections is not None:
+            args['max-connections'] = max_connections
+
+        result = self.vm.qmp('nbd-server-start', args)
+        self.assert_qmp(result, 'return', {})
+        yield
+
+        result = self.vm.qmp('nbd-server-stop')
+        self.assert_qmp(result, 'return', {})
+
+    def add_export(self, name, writable=None, multi_conn=None):
+        args = {
+            'type': 'nbd',
+            'id': name,
+            'node-name': 'n',
+            'name': name,
+        }
+        if writable is not None:
+            args['writable'] = writable
+        if multi_conn is not None:
+            args['multi-conn'] = multi_conn
+
+        result = self.vm.qmp('block-export-add', args)
+        self.assert_qmp(result, 'return', {})
+
+    def test_default_settings(self):
+        with self.run_server():
+            self.add_export('r')
+            self.add_export('w', writable=True)
+            with open_nbd('r') as h:
+                self.assertTrue(h.can_multi_conn())
+            with open_nbd('w') as h:
+                self.assertFalse(h.can_multi_conn())
+
+    def test_multiconn_option(self):
+        with self.run_server():
+            self.add_export('r', multi_conn='off')
+            self.add_export('w', writable=True, multi_conn='on')
+            with open_nbd('r') as h:
+                self.assertFalse(h.can_multi_conn())
+            with open_nbd('w') as h:
+                self.assertTrue(h.can_multi_conn())
+
+    def test_limited_connections(self):
+        with self.run_server(max_connections=1):
+            self.add_export('r', multi_conn='on')
+            self.add_export('w', writable=True, multi_conn='on')
+            with open_nbd('r') as h:
+                self.assertFalse(h.can_multi_conn())
+            with open_nbd('w') as h:
+                self.assertFalse(h.can_multi_conn())
+
+    def test_parallel_writes(self):
+        with self.run_server():
+            self.add_export('w', writable=True, multi_conn='on')
+
+            clients = [nbd.NBD() for _ in range(3)]
+            for c in clients:
+                c.connect_uri(nbd_uri.format('w'))
+                assert c.can_multi_conn()
+
+            initial_data = clients[0].pread(1024 * 1024, 0)
+            self.assertEqual(initial_data, b'\x01' * 1024 * 1024)
+
+            updated_data = b'\x03' * 1024 * 1024
+            clients[1].pwrite(updated_data, 0)
+            clients[2].flush()
+            current_data = clients[0].pread(1024 * 1024, 0)
+
+            self.assertEqual(updated_data, current_data)
+
+            for i in range(3):
+                clients[i].shutdown()
+
+
+if __name__ == '__main__':
+    try:
+        # Easier to use libnbd than to try and set up parallel
+        # 'qemu-nbd --list' or 'qemu-io' processes, but not all systems
+        # have libnbd installed.
+        import nbd  # type: ignore
+
+        iotests.main(supported_fmts=['qcow2'])
+    except ImportError:
+        iotests.notrun('libnbd not installed')
diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out b/tests/qemu-iotests/tests/nbd-multiconn.out
new file mode 100644
index 000000000000..89968f35d783
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn.out
@@ -0,0 +1,5 @@ 
+....
+----------------------------------------------------------------------
+Ran 4 tests
+
+OK
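
For reference, the new test can be run from a built tree with something
like the following (a rough sketch; the exact check invocation may vary,
and libnbd's Python bindings must be installed or the test reports
"not run"):

  $ cd build/tests/qemu-iotests
  $ ./check -qcow2 nbd-multiconn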