
nbd: Advertise multi-conn for shared read-only connections

Message ID 20190815185024.7010-1-eblake@redhat.com (mailing list archive)

Commit Message

Eric Blake Aug. 15, 2019, 6:50 p.m. UTC
The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
advertised when the server promises cache consistency between
simultaneous clients (basically, rules that determine what FUA and
flush from one client are able to guarantee for reads from another
client).  When we don't permit simultaneous clients (such as qemu-nbd
without -e), the bit makes no sense; and for writable images, we
probably have a lot more work before we can declare that actions from
one client are cache-consistent with actions from another.  But for
read-only images, where flush isn't changing any data, we might as
well advertise multi-conn support.  What's more, advertisement of the
bit makes it easier for clients to determine if 'qemu-nbd -e' was in
use, where a second connection will succeed rather than hang until the
first client goes away.

This patch affects qemu as server in advertising the bit.  We may want
to consider patches to qemu as client to attempt parallel connections
for higher throughput by spreading the load over those connections
when a server advertises multi-conn, but for now sticking to one
connection per nbd:// BDS is okay.

See also: https://bugzilla.redhat.com/1708300
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt | 1 +
 include/block/nbd.h  | 2 +-
 blockdev-nbd.c       | 2 +-
 nbd/server.c         | 4 +++-
 qemu-nbd.c           | 2 +-
 5 files changed, 7 insertions(+), 4 deletions(-)

Comments

Richard W.M. Jones Aug. 15, 2019, 8 p.m. UTC | #1
On Thu, Aug 15, 2019 at 01:50:24PM -0500, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).  When we don't permit simultaneous clients (such as qemu-nbd
> without -e), the bit makes no sense; and for writable images, we
> probably have a lot more work before we can declare that actions from
> one client are cache-consistent with actions from another.  But for
> read-only images, where flush isn't changing any data, we might as
> well advertise multi-conn support.  What's more, advertisement of the
> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
> use, where a second connection will succeed rather than hang until the
> first client goes away.
> 
> This patch affects qemu as server in advertising the bit.  We may want
> to consider patches to qemu as client to attempt parallel connections
> for higher throughput by spreading the load over those connections
> when a server advertises multi-conn, but for now sticking to one
> connection per nbd:// BDS is okay.
> 
> See also: https://bugzilla.redhat.com/1708300
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  docs/interop/nbd.txt | 1 +
>  include/block/nbd.h  | 2 +-
>  blockdev-nbd.c       | 2 +-
>  nbd/server.c         | 4 +++-
>  qemu-nbd.c           | 2 +-
>  5 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index fc64473e02b2..6dfec7f47647 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -53,3 +53,4 @@ the operation of that feature.
>  * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>  * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>  NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 7b36d672f046..991fd52a5134 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
> 
>  NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                            uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                            void (*close)(NBDExport *), bool writethrough,
>                            BlockBackend *on_eject_blk, Error **errp);
>  void nbd_export_close(NBDExport *exp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 66eebab31875..e5d228771292 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>      }
> 
>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>                           NULL, false, on_eject_blk, errp);
>      if (!exp) {
>          return;
> diff --git a/nbd/server.c b/nbd/server.c
> index a2cf085f7635..a602d85070ff 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> 
>  NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                            uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                            void (*close)(NBDExport *), bool writethrough,
>                            BlockBackend *on_eject_blk, Error **errp)
>  {
> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>      perm = BLK_PERM_CONSISTENT_READ;
>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>          perm |= BLK_PERM_WRITE;
> +    } else if (shared) {
> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>      }
>      blk = blk_new(bdrv_get_aio_context(bs), perm,
>                    BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 049645491dab..55f5ceaf5c92 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
>      }
> 
>      export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> -                            export_description, bitmap, nbdflags,
> +                            export_description, bitmap, nbdflags, shared > 1,
>                              nbd_export_closed, writethrough, NULL,
>                              &error_fatal);
> 

Multi-conn is a no-brainer.  For nbdkit it more than doubled
throughput:

https://github.com/libguestfs/nbdkit/commit/910a220aa454b410c44731e8d965e92244b536f5

Those results are for loopback mounts of a file located on /dev/shm and
served by the nbdkit file plugin; I would imagine that without the
loop-mounting / filesystem overhead the results could be even better.

For read-only connections where the server can handle more than one
connection (-e), it ought to be safe.  You still have to tell the client
how many connections the server may accept, but that's a limitation of
the current protocol.

So yes ACK, patch makes sense.

Worth noting that fio has NBD support so you can test NBD servers
directly these days:

https://github.com/axboe/fio/commit/d643a1e29d31bf974a613866819dde241c928b6d
https://github.com/axboe/fio/blob/master/examples/nbd.fio#L5
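For instance, a job file in the spirit of the linked examples/nbd.fio might look like the sketch below (the socket path, sizes, and runtime are assumptions, not values from this thread):

```ini
# Hypothetical fio job exercising an NBD server over a Unix socket.
# Modeled loosely on fio's examples/nbd.fio; point uri at your server,
# e.g. one started with: qemu-nbd -r -t -e 4 --socket=/tmp/nbd.sock disk.img
[global]
ioengine=nbd
uri=nbd+unix:///?socket=/tmp/nbd.sock
rw=randread
size=256m
blocksize=64k
time_based
runtime=30

# numjobs > 1 opens one NBD connection per job, which is where a server
# advertising NBD_FLAG_CAN_MULTI_CONN can show a throughput win.
[parallel-readers]
numjobs=4
```

Comparing numjobs=1 against numjobs=4 on a read-only, shared export is a quick way to see whether multiple connections help a given server.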

Rich.
John Snow Aug. 15, 2019, 9:45 p.m. UTC | #2
On 8/15/19 2:50 PM, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).  When we don't permit simultaneous clients (such as qemu-nbd
> without -e), the bit makes no sense; and for writable images, we
> probably have a lot more work before we can declare that actions from
> one client are cache-consistent with actions from another.  But for
> read-only images, where flush isn't changing any data, we might as
> well advertise multi-conn support.  What's more, advertisement of the
> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
> use, where a second connection will succeed rather than hang until the
> first client goes away.
> 
> This patch affects qemu as server in advertising the bit.  We may want
> to consider patches to qemu as client to attempt parallel connections
> for higher throughput by spreading the load over those connections
> when a server advertises multi-conn, but for now sticking to one
> connection per nbd:// BDS is okay.
> 
> See also: https://bugzilla.redhat.com/1708300
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  docs/interop/nbd.txt | 1 +
>  include/block/nbd.h  | 2 +-
>  blockdev-nbd.c       | 2 +-
>  nbd/server.c         | 4 +++-
>  qemu-nbd.c           | 2 +-
>  5 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index fc64473e02b2..6dfec7f47647 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -53,3 +53,4 @@ the operation of that feature.
>  * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>  * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>  NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 7b36d672f046..991fd52a5134 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
> 
>  NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                            uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                            void (*close)(NBDExport *), bool writethrough,
>                            BlockBackend *on_eject_blk, Error **errp);
>  void nbd_export_close(NBDExport *exp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 66eebab31875..e5d228771292 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>      }
> 
>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>                           NULL, false, on_eject_blk, errp);

Why is it okay to force the share bit on regardless of the value of
'writable' ?

>      if (!exp) {
>          return;
> diff --git a/nbd/server.c b/nbd/server.c
> index a2cf085f7635..a602d85070ff 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> 
>  NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                            uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                            void (*close)(NBDExport *), bool writethrough,
>                            BlockBackend *on_eject_blk, Error **errp)
>  {
> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>      perm = BLK_PERM_CONSISTENT_READ;
>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>          perm |= BLK_PERM_WRITE;
> +    } else if (shared) {
> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>      }
>      blk = blk_new(bdrv_get_aio_context(bs), perm,
>                    BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 049645491dab..55f5ceaf5c92 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
>      }
> 
>      export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> -                            export_description, bitmap, nbdflags,
> +                            export_description, bitmap, nbdflags, shared > 1,
>                              nbd_export_closed, writethrough, NULL,
>                              &error_fatal);
>
Eric Blake Aug. 15, 2019, 9:54 p.m. UTC | #3
On 8/15/19 4:45 PM, John Snow wrote:
> 
> 
> On 8/15/19 2:50 PM, Eric Blake wrote:
>> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
>> advertised when the server promises cache consistency between
>> simultaneous clients (basically, rules that determine what FUA and
>> flush from one client are able to guarantee for reads from another
>> client).  When we don't permit simultaneous clients (such as qemu-nbd
>> without -e), the bit makes no sense; and for writable images, we
>> probably have a lot more work before we can declare that actions from
>> one client are cache-consistent with actions from another.  But for
>> read-only images, where flush isn't changing any data, we might as
>> well advertise multi-conn support.  What's more, advertisement of the
>> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
>> use, where a second connection will succeed rather than hang until the
>> first client goes away.
>>
>> This patch affects qemu as server in advertising the bit.  We may want
>> to consider patches to qemu as client to attempt parallel connections
>> for higher throughput by spreading the load over those connections
>> when a server advertises multi-conn, but for now sticking to one
>> connection per nbd:// BDS is okay.
>>

>> +++ b/blockdev-nbd.c
>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>      }
>>
>>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>                           NULL, false, on_eject_blk, errp);
> 
> Why is it okay to force the share bit on regardless of the value of
> 'writable' ?

Well, it's probably not, except that...


>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>      perm = BLK_PERM_CONSISTENT_READ;
>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>          perm |= BLK_PERM_WRITE;
>> +    } else if (shared) {
>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>>      }

requesting shared=true has no effect for a writable export.

I can tweak it for less confusion, though.
John Snow Aug. 15, 2019, 10:02 p.m. UTC | #4
On 8/15/19 5:54 PM, Eric Blake wrote:
> On 8/15/19 4:45 PM, John Snow wrote:
>>
>>
>> On 8/15/19 2:50 PM, Eric Blake wrote:
>>> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
>>> advertised when the server promises cache consistency between
>>> simultaneous clients (basically, rules that determine what FUA and
>>> flush from one client are able to guarantee for reads from another
>>> client).  When we don't permit simultaneous clients (such as qemu-nbd
>>> without -e), the bit makes no sense; and for writable images, we
>>> probably have a lot more work before we can declare that actions from
>>> one client are cache-consistent with actions from another.  But for
>>> read-only images, where flush isn't changing any data, we might as
>>> well advertise multi-conn support.  What's more, advertisement of the
>>> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
>>> use, where a second connection will succeed rather than hang until the
>>> first client goes away.
>>>
>>> This patch affects qemu as server in advertising the bit.  We may want
>>> to consider patches to qemu as client to attempt parallel connections
>>> for higher throughput by spreading the load over those connections
>>> when a server advertises multi-conn, but for now sticking to one
>>> connection per nbd:// BDS is okay.
>>>
> 
>>> +++ b/blockdev-nbd.c
>>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>>      }
>>>
>>>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>>                           NULL, false, on_eject_blk, errp);
>>
>> Why is it okay to force the share bit on regardless of the value of
>> 'writable' ?
> 
> Well, it's probably not, except that...
> 
> 
>>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>      perm = BLK_PERM_CONSISTENT_READ;
>>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>>          perm |= BLK_PERM_WRITE;
>>> +    } else if (shared) {
>>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>>>      }
> 
> requesting shared=true has no effect for a writable export.
> 
> I can tweak it for less confusion, though.
> 

"Yes John, when it's an else-if it really does matter what specific
condition it's following."

(Ah, there it is.)

Yeah, if you have hopes of supporting this flag for writable exports in
the future, I think it might be nicer to reject this bit for RW and
adjust the caller to only request it conditionally.

Or not. I guess we don't have to maintain backwards compatibility for
internal API like that, so ... dealer's choice:

Reviewed-by: John Snow <jsnow@redhat.com>
no-reply@patchew.org Aug. 15, 2019, 10:36 p.m. UTC | #5
Patchew URL: https://patchew.org/QEMU/20190815185024.7010-1-eblake@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC      mips64-softmmu/trace/control-target.o
  CC      mips64-softmmu/trace/generated-helpers.o
  LINK    mips64-softmmu/qemu-system-mips64
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:209: qemu-system-mips64] Error 1
make: *** [Makefile:472: mips64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....


The full log is available at
http://patchew.org/logs/20190815185024.7010-1-eblake@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Vladimir Sementsov-Ogievskiy Aug. 16, 2019, 10:23 a.m. UTC | #6
15.08.2019 21:50, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).  When we don't permit simultaneous clients (such as qemu-nbd
> without -e), the bit makes no sense; and for writable images, we
> probably have a lot more work before we can declare that actions from
> one client are cache-consistent with actions from another.  But for
> read-only images, where flush isn't changing any data, we might as
> well advertise multi-conn support.  What's more, advertisement of the
> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
> use, where a second connection will succeed rather than hang until the
> first client goes away.
> 
> This patch affects qemu as server in advertising the bit.  We may want
> to consider patches to qemu as client to attempt parallel connections
> for higher throughput by spreading the load over those connections
> when a server advertises multi-conn, but for now sticking to one
> connection per nbd:// BDS is okay.
> 
> See also: https://bugzilla.redhat.com/1708300
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   docs/interop/nbd.txt | 1 +
>   include/block/nbd.h  | 2 +-
>   blockdev-nbd.c       | 2 +-
>   nbd/server.c         | 4 +++-
>   qemu-nbd.c           | 2 +-
>   5 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index fc64473e02b2..6dfec7f47647 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -53,3 +53,4 @@ the operation of that feature.
>   * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>   * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>   NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 7b36d672f046..991fd52a5134 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
> 
>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                             uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                             void (*close)(NBDExport *), bool writethrough,
>                             BlockBackend *on_eject_blk, Error **errp);
>   void nbd_export_close(NBDExport *exp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 66eebab31875..e5d228771292 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>       }
> 
>       exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,

s/true/!writable ?

>                            NULL, false, on_eject_blk, errp);
>       if (!exp) {
>           return;
> diff --git a/nbd/server.c b/nbd/server.c
> index a2cf085f7635..a602d85070ff 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> 
>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>                             uint64_t size, const char *name, const char *desc,
> -                          const char *bitmap, uint16_t nbdflags,
> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>                             void (*close)(NBDExport *), bool writethrough,
>                             BlockBackend *on_eject_blk, Error **errp)
>   {
> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>       perm = BLK_PERM_CONSISTENT_READ;
>       if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>           perm |= BLK_PERM_WRITE;
> +    } else if (shared) {
> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>       }
>       blk = blk_new(bdrv_get_aio_context(bs), perm,
>                     BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 049645491dab..55f5ceaf5c92 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
>       }
> 
>       export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> -                            export_description, bitmap, nbdflags,
> +                            export_description, bitmap, nbdflags, shared > 1,
>                               nbd_export_closed, writethrough, NULL,
>                               &error_fatal);
>
Vladimir Sementsov-Ogievskiy Aug. 16, 2019, 10:47 a.m. UTC | #7
16.08.2019 13:23, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2019 21:50, Eric Blake wrote:
>> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
>> advertised when the server promises cache consistency between
>> simultaneous clients (basically, rules that determine what FUA and
>> flush from one client are able to guarantee for reads from another
>> client).  When we don't permit simultaneous clients (such as qemu-nbd
>> without -e), the bit makes no sense; and for writable images, we
>> probably have a lot more work before we can declare that actions from
>> one client are cache-consistent with actions from another.  But for
>> read-only images, where flush isn't changing any data, we might as
>> well advertise multi-conn support.  What's more, advertisement of the
>> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
>> use, where a second connection will succeed rather than hang until the
>> first client goes away.
>>
>> This patch affects qemu as server in advertising the bit.  We may want
>> to consider patches to qemu as client to attempt parallel connections
>> for higher throughput by spreading the load over those connections
>> when a server advertises multi-conn, but for now sticking to one
>> connection per nbd:// BDS is okay.
>>
>> See also: https://bugzilla.redhat.com/1708300
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>   docs/interop/nbd.txt | 1 +
>>   include/block/nbd.h  | 2 +-
>>   blockdev-nbd.c       | 2 +-
>>   nbd/server.c         | 4 +++-
>>   qemu-nbd.c           | 2 +-
>>   5 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
>> index fc64473e02b2..6dfec7f47647 100644
>> --- a/docs/interop/nbd.txt
>> +++ b/docs/interop/nbd.txt
>> @@ -53,3 +53,4 @@ the operation of that feature.
>>   * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>>   * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>>   NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 7b36d672f046..991fd52a5134 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
>>
>>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>                             uint64_t size, const char *name, const char *desc,
>> -                          const char *bitmap, uint16_t nbdflags,
>> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>>                             void (*close)(NBDExport *), bool writethrough,
>>                             BlockBackend *on_eject_blk, Error **errp);
>>   void nbd_export_close(NBDExport *exp);
>> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
>> index 66eebab31875..e5d228771292 100644
>> --- a/blockdev-nbd.c
>> +++ b/blockdev-nbd.c
>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>       }
>>
>>       exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
> 
> s/true/!writable ?

Oh, I see, John already noticed this; it's checked in nbd_export_new anyway.

> 
>>                            NULL, false, on_eject_blk, errp);
>>       if (!exp) {
>>           return;
>> diff --git a/nbd/server.c b/nbd/server.c
>> index a2cf085f7635..a602d85070ff 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
>>
>>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>                             uint64_t size, const char *name, const char *desc,
>> -                          const char *bitmap, uint16_t nbdflags,
>> +                          const char *bitmap, uint16_t nbdflags, bool shared,
>>                             void (*close)(NBDExport *), bool writethrough,
>>                             BlockBackend *on_eject_blk, Error **errp)
>>   {
>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>       perm = BLK_PERM_CONSISTENT_READ;
>>       if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>           perm |= BLK_PERM_WRITE;
>> +    } else if (shared) {
>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;

This looks a bit strange to me: we already have the nbdflags parameter for
nbd_export_new(), so why add a separate boolean to pass one of the nbdflags bits?

Also, for qemu-nbd, shouldn't we allow -e only together with -r?

>>       }
>>       blk = blk_new(bdrv_get_aio_context(bs), perm,
>>                     BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 049645491dab..55f5ceaf5c92 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
>>       }
>>
>>       export = nbd_export_new(bs, dev_offset, fd_size, export_name,
>> -                            export_description, bitmap, nbdflags,
>> +                            export_description, bitmap, nbdflags, shared > 1,
>>                               nbd_export_closed, writethrough, NULL,
>>                               &error_fatal);
>>
> 
>
Eric Blake Aug. 17, 2019, 2:30 p.m. UTC | #8
On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> +++ b/blockdev-nbd.c
>>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>>       }
>>>
>>>       exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>
>> s/true/!writable ?
> 
> Oh, I see, John already noticed this, it's checked in nbd_export_new anyway..

Still, since two reviewers have caught it, I'm fixing it :)


>>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>       perm = BLK_PERM_CONSISTENT_READ;
>>>       if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>>           perm |= BLK_PERM_WRITE;
>>> +    } else if (shared) {
>>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
> 
> This looks a bit strange to me: we already have the nbdflags parameter for
> nbd_export_new(), so why add a separate boolean to pass one of the nbdflags bits?

Because I want to get rid of the nbdflags in my next patch.

> 
> Also, for qemu-nbd, shouldn't we allow -e only together with -r?

I'm reluctant to; it might break whatever existing user is okay exposing
it (although such users are questionable, so maybe we can argue they
were already broken).  Maybe it's time to start a deprecation cycle?
Nir Soffer Aug. 18, 2019, 1:31 a.m. UTC | #9
On Sat, Aug 17, 2019 at 5:30 PM Eric Blake <eblake@redhat.com> wrote:

> On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> >>> +++ b/blockdev-nbd.c
> >>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
> >>>       }
> >>>
> >>>       exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> >>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> >>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
> >>
> >> s/true/!writable ?
> >
> > Oh, I see, John already noticed this; it's checked in nbd_export_new anyway.
>
> Still, since two reviewers have caught it, I'm fixing it :)
>
>
> >>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> >>>       perm = BLK_PERM_CONSISTENT_READ;
> >>>       if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
> >>>           perm |= BLK_PERM_WRITE;
> >>> +    } else if (shared) {
> >>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
> >
> > This looks a bit strange to me: we already have the nbdflags parameter for
> > nbd_export_new(), so why add a separate boolean to pass one of the nbdflags bits?
>
> Because I want to get rid of the nbdflags in my next patch.
>
> >
> > Also, for qemu-nbd, shouldn't we allow -e only together with -r?
>
> I'm reluctant to; it might break whatever existing user is okay exposing
> it (although such users are questionable, so maybe we can argue they
> were already broken).  Maybe it's time to start a deprecation cycle?
>

man qemu-nbd (on CentOS 7.6) says:

       -e, --shared=num
           Allow up to num clients to share the device (default 1)

I see that in qemu-nbd 4.1 there is a note about consistency with writers:

       -e, --shared=num
           Allow up to num clients to share the device (default 1). Safe
           for readers, but for now, consistency is not guaranteed
           between multiple writers.

But it is not clear what the consistency guarantees are.

Supporting multiple writers is important. oVirt has been giving the user
a URL (since 4.3), and the user can use multiple connections using the
same URL, each having a connection to the same qemu-nbd socket. I know
that some backup vendors tried to use multiple connections to speed up
backups, and they may try to do this also for restore.

An interesting use case would be using multiple connections on the
client side to write in parallel to the same image, when every client is
writing different ranges.

Do we have a real issue with qemu-nbd serving multiple clients writing
to different parts of the same image?

Nir
Eric Blake Aug. 19, 2019, 6:04 p.m. UTC | #10
On 8/17/19 8:31 PM, Nir Soffer wrote:
>>> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
>>
>> I'm reluctant to; it might break whatever existing user is okay exposing
>> it (although such users are questionable, so maybe we can argue they
>> were already broken).  Maybe it's time to start a deprecation cycle?
>>
> 
> man qemu-nbd (on Centos 7.6) says:
> 
>        -e, --shared=num
>            Allow up to num clients to share the device (default 1)
> 
> I see that in qemu-img 4.1 there is a note about consistency with writers:
> 
>        -e, --shared=num
>            Allow up to num clients to share the device (default 1). Safe
> for readers, but for now, consistency is not guaranteed between multiple
> writers.
> But it is not clear what are the consistency guarantees.
> 
> Supporting multiple writers is important. oVirt is giving the user a URL
> (since 4.3), and the user
> can use multiple connections using the same URL, each having a connection
> to the same qemu-nbd
> socket. I know that some backup vendors tried to use multiple connections
> to speed up backups, and
> they may try to do this also for restore.
> 
> An interesting use case would be using multiple connections on client side
> to write in parallel to
> same image, when every client is writing different ranges.

Good to know.

> 
> Do we have real issue in qemu-nbd serving multiple clients writing to
> different parts of
> the same image?

If a server advertises multi-conn on a writable image, then clients have
stronger guarantees about behavior on what happens with flush on one
client vs. write in another, to the point that you can make some better
assumptions about image consistency, including what one client will read
after another has written.  But as long as multiple clients only ever
access distinct portions of the disk, then multi-conn is not important
to that client (whether for reading or for writing).

So it sounds like I have no reason to deprecate qemu-nbd -e 2, even for
writable images.
Vladimir Sementsov-Ogievskiy Aug. 20, 2019, 9:07 a.m. UTC | #11
17.08.2019 17:30, Eric Blake wrote:
> On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>>> +++ b/blockdev-nbd.c
>>>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>>>        }
>>>>
>>>>        exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>>>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>>>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>>
>>> s/true/!writable ?
>>
>> Oh, I see, John already noticed this, it's checked in nbd_export_new anyway..
> 
> Still, since two reviewers have caught it, I'm fixing it :)

With it or without:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> 
> 
>>>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>>        perm = BLK_PERM_CONSISTENT_READ;
>>>>        if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>>>            perm |= BLK_PERM_WRITE;
>>>> +    } else if (shared) {
>>>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>>
>> For me it looks a bit strange: we already have nbdflags parameter for nbd_export_new(), why
>> to add a separate boolean to pass one of nbdflags flags?
> 
> Because I want to get rid of the nbdflags in my next patch.
> 
>>
>> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
> 
> I'm reluctant to; it might break whatever existing user is okay exposing
> it (although such users are questionable, so maybe we can argue they
> were already broken).  Maybe it's time to start a deprecation cycle?
>
Nir Soffer Aug. 20, 2019, 9:19 p.m. UTC | #12
On Mon, Aug 19, 2019 at 9:04 PM Eric Blake <eblake@redhat.com> wrote:

> On 8/17/19 8:31 PM, Nir Soffer wrote:
> >>> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
> >>
> >> I'm reluctant to; it might break whatever existing user is okay exposing
> >> it (although such users are questionable, so maybe we can argue they
> >> were already broken).  Maybe it's time to start a deprecation cycle?
> >>
> >
> > man qemu-nbd (on Centos 7.6) says:
> >
> >        -e, --shared=num
> >            Allow up to num clients to share the device (default 1)
> >
> > I see that in qemu-nbd 4.1 there is a note about consistency with
> > writers:
> >
> >        -e, --shared=num
> >            Allow up to num clients to share the device (default 1). Safe
> >            for readers, but for now, consistency is not guaranteed
> >            between multiple writers.
> >
> > But it is not clear what the consistency guarantees are.
> >
> > Supporting multiple writers is important. oVirt has been giving the
> > user a URL (since 4.3), and the user can use multiple connections
> > using the same URL, each having a connection to the same qemu-nbd
> > socket. I know that some backup vendors tried to use multiple
> > connections to speed up backups, and they may try to do this also
> > for restore.
> >
> > An interesting use case would be using multiple connections on the
> > client side to write in parallel to the same image, when every client
> > is writing different ranges.
>
> Good to know.
>
> >
> > Do we have a real issue with qemu-nbd serving multiple clients
> > writing to different parts of the same image?
>
> If a server advertises multi-conn on a writable image, then clients have
> stronger guarantees about behavior on what happens with flush on one
> client vs. write in another, to the point that you can make some better
> assumptions about image consistency, including what one client will read
> after another has written.  But as long as multiple clients only ever
> access distinct portions of the disk, then multi-conn is not important
> to that client (whether for reading or for writing).
>

Thanks for making this clear. I think we need to document this in oVirt,
so users will be careful about using multiple connections.



>
> So it sounds like I have no reason to deprecate qemu-nbd -e 2, even for
> writable images.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>
diff mbox series

Patch

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index fc64473e02b2..6dfec7f47647 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -53,3 +53,4 @@  the operation of that feature.
 * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
 * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
+* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7b36d672f046..991fd52a5134 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -326,7 +326,7 @@  typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
                           uint64_t size, const char *name, const char *desc,
-                          const char *bitmap, uint16_t nbdflags,
+                          const char *bitmap, uint16_t nbdflags, bool shared,
                           void (*close)(NBDExport *), bool writethrough,
                           BlockBackend *on_eject_blk, Error **errp);
 void nbd_export_close(NBDExport *exp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 66eebab31875..e5d228771292 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -189,7 +189,7 @@  void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
     }

     exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
-                         writable ? 0 : NBD_FLAG_READ_ONLY,
+                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
                          NULL, false, on_eject_blk, errp);
     if (!exp) {
         return;
diff --git a/nbd/server.c b/nbd/server.c
index a2cf085f7635..a602d85070ff 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1460,7 +1460,7 @@  static void nbd_eject_notifier(Notifier *n, void *data)

 NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
                           uint64_t size, const char *name, const char *desc,
-                          const char *bitmap, uint16_t nbdflags,
+                          const char *bitmap, uint16_t nbdflags, bool shared,
                           void (*close)(NBDExport *), bool writethrough,
                           BlockBackend *on_eject_blk, Error **errp)
 {
@@ -1486,6 +1486,8 @@  NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
     perm = BLK_PERM_CONSISTENT_READ;
     if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
         perm |= BLK_PERM_WRITE;
+    } else if (shared) {
+        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
     }
     blk = blk_new(bdrv_get_aio_context(bs), perm,
                   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 049645491dab..55f5ceaf5c92 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -1173,7 +1173,7 @@  int main(int argc, char **argv)
     }

     export = nbd_export_new(bs, dev_offset, fd_size, export_name,
-                            export_description, bitmap, nbdflags,
+                            export_description, bitmap, nbdflags, shared > 1,
                             nbd_export_closed, writethrough, NULL,
                             &error_fatal);