[1/1] qemu-nbd: add the option to use pre-created server socket
diff mbox

Message ID 1475146935-12118-1-git-send-email-den@openvz.org
State New
Headers show

Commit Message

Denis V. Lunev Sept. 29, 2016, 11:02 a.m. UTC
From: Denis Plotnikov <dplotnikov@virtuozzo.com>

Originally NBD server socket was created by qemu-nbd code. This leads to
the race when the management layer starts qemu-nbd server and allows a
client to connect to the server. In this case there is a possibility that
qemu-nbd does not open listening server socket yet. Creating listening
socket before starting of qemu-ndb and passing socket fd via command line
solves this issue completely.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-nbd.c    | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 qemu-nbd.texi |  6 ++++
 2 files changed, 84 insertions(+), 10 deletions(-)

Comments

Daniel P. Berrangé Sept. 29, 2016, 11:13 a.m. UTC | #1
On Thu, Sep 29, 2016 at 02:02:15PM +0300, Denis V. Lunev wrote:
> From: Denis Plotnikov <dplotnikov@virtuozzo.com>
> 
> Originally NBD server socket was created by qemu-nbd code. This leads to
> the race when the management layer starts qemu-nbd server and allows a
> client to connect to the server. In this case there is a possibility that
> qemu-nbd does not open listening server socket yet. Creating listening
> socket before starting of qemu-ndb and passing socket fd via command line
> solves this issue completely.

FWIW, this could be solved in qemu-nbd itself if we had a general
ability to request "daemon" mode - currently it only daemonizes
if attaching to a nbd block device.

The key would be that qemu-nbd would open the listening socket
before daemonizing. Thus when the mgmt application spawned
qemu-nbd in daemon mode, it can be sure that the listener
socket is present when waitpid() completes.

Regards,
Daniel
Paolo Bonzini Sept. 29, 2016, 12:18 p.m. UTC | #2
On 29/09/2016 13:13, Daniel P. Berrange wrote:
> On Thu, Sep 29, 2016 at 02:02:15PM +0300, Denis V. Lunev wrote:
>> From: Denis Plotnikov <dplotnikov@virtuozzo.com>
>>
>> Originally NBD server socket was created by qemu-nbd code. This leads to
>> the race when the management layer starts qemu-nbd server and allows a
>> client to connect to the server. In this case there is a possibility that
>> qemu-nbd does not open listening server socket yet. Creating listening
>> socket before starting of qemu-ndb and passing socket fd via command line
>> solves this issue completely.
> 
> FWIW, this could be solved in qemu-nbd itself if we had a general
> ability to request "daemon" mode - currently it only daemonizes
> if attaching to a nbd block device.
> 
> The key would be that qemu-nbd would open the listening socket
> before daemonizing. Thus when the mgmt application spawned
> qemu-nbd in daemon mode, it can be sure that the listener
> socket is present when waitpid() completes.

This is 100% true, but I find daemonization to be a hack, so file
descriptor passing has its place.

Still, I'd prefer to have a --socket-activation option which uses the
systemd socket activation protocol.  The systemd protocol can be
implemented easily by any management layer.

Paolo
Daniel P. Berrangé Sept. 29, 2016, 12:32 p.m. UTC | #3
On Thu, Sep 29, 2016 at 02:18:54PM +0200, Paolo Bonzini wrote:
> 
> 
> On 29/09/2016 13:13, Daniel P. Berrange wrote:
> > On Thu, Sep 29, 2016 at 02:02:15PM +0300, Denis V. Lunev wrote:
> >> From: Denis Plotnikov <dplotnikov@virtuozzo.com>
> >>
> >> Originally NBD server socket was created by qemu-nbd code. This leads to
> >> the race when the management layer starts qemu-nbd server and allows a
> >> client to connect to the server. In this case there is a possibility that
> >> qemu-nbd does not open listening server socket yet. Creating listening
> >> socket before starting of qemu-ndb and passing socket fd via command line
> >> solves this issue completely.
> > 
> > FWIW, this could be solved in qemu-nbd itself if we had a general
> > ability to request "daemon" mode - currently it only daemonizes
> > if attaching to a nbd block device.
> > 
> > The key would be that qemu-nbd would open the listening socket
> > before daemonizing. Thus when the mgmt application spawned
> > qemu-nbd in daemon mode, it can be sure that the listener
> > socket is present when waitpid() completes.
> 
> This is 100% true, but I find daemonization to be a hack, so file
> descriptor passing has its place.
> 
> Still, I'd prefer to have a --socket-activation option which uses the
> systemd socket activation protocol.  The systemd protocol can be
> implemented easily by any management layer.

That's a nice idea - system socket activation is pretty simple todo


Regards,
Daniel
Stefan Hajnoczi Sept. 29, 2016, 1:41 p.m. UTC | #4
On Thu, Sep 29, 2016 at 12:13:18PM +0100, Daniel P. Berrange wrote:
> On Thu, Sep 29, 2016 at 02:02:15PM +0300, Denis V. Lunev wrote:
> > From: Denis Plotnikov <dplotnikov@virtuozzo.com>
> > 
> > Originally NBD server socket was created by qemu-nbd code. This leads to
> > the race when the management layer starts qemu-nbd server and allows a
> > client to connect to the server. In this case there is a possibility that
> > qemu-nbd does not open listening server socket yet. Creating listening
> > socket before starting of qemu-ndb and passing socket fd via command line
> > solves this issue completely.
> 
> FWIW, this could be solved in qemu-nbd itself if we had a general
> ability to request "daemon" mode - currently it only daemonizes
> if attaching to a nbd block device.
> 
> The key would be that qemu-nbd would open the listening socket
> before daemonizing. Thus when the mgmt application spawned
> qemu-nbd in daemon mode, it can be sure that the listener
> socket is present when waitpid() completes.

A patch to do that was just posted:
[PATCH v4 1/3] qemu-nbd: Add --fork option
Eric Blake Sept. 29, 2016, 3:05 p.m. UTC | #5
On 09/29/2016 06:02 AM, Denis V. Lunev wrote:
> From: Denis Plotnikov <dplotnikov@virtuozzo.com>
> 
> Originally NBD server socket was created by qemu-nbd code. This leads to
> the race when the management layer starts qemu-nbd server and allows a
> client to connect to the server. In this case there is a possibility that
> qemu-nbd does not open listening server socket yet. Creating listening
> socket before starting of qemu-ndb and passing socket fd via command line
> solves this issue completely.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> ---

> @@ -78,6 +79,7 @@ static void usage(const char *name)
>  "  -b, --bind=IFACE          interface to bind to (default `0.0.0.0')\n"
>  "  -k, --socket=PATH         path to the unix socket\n"
>  "                            (default '"SOCKET_PATH"')\n"
> +"  --server-sock-fd=NUM      pre-opened listening server socket file descriptor\n"
>  "  -e, --shared=NUM          device can be shared by NUM clients (default '1')\n"

Looking at 'qemu-nbd --help', all other long-option-only options are
indented so that the -- line up to the same column (see for example
--cache and --aio).

>  
> -static SocketAddress *nbd_build_socket_address(const char *sockpath,
> +static SocketAddress *nbd_build_sock_fd(const char *sockpath,
>                                                 const char *bindto,
>                                                 const char *port)

Indentation now looks off.

> +static void setup_address_and_port(const char **address, const char **port)
> +{
> +    if (*address == NULL) {
> +        *address = "0.0.0.0";
> +    }
> +
> +    if (*port == NULL) {
> +        *port = g_strdup_printf("%d", NBD_DEFAULT_PORT);;

s/;;/;/

Why is one of these parameters malloc'd, but the other pointing to const
storage? Do you have a memory leak, or else a risk of corrupting
read-only memory?
Denis V. Lunev Sept. 29, 2016, 3:52 p.m. UTC | #6
On 09/29/2016 02:02 PM, Denis V. Lunev wrote:
> From: Denis Plotnikov <dplotnikov@virtuozzo.com>
>
> Originally NBD server socket was created by qemu-nbd code. This leads to
> the race when the management layer starts qemu-nbd server and allows a
> client to connect to the server. In this case there is a possibility that
> qemu-nbd does not open listening server socket yet. Creating listening
> socket before starting of qemu-ndb and passing socket fd via command line
> solves this issue completely.
>
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
This patch also solves another problem.

Let us assume that qemu-nbd is started by the management software.
After that the software should report the port qemu-nbd is bound
to the another software.

Right now as far as I know there is no good way to do that. That
management software could try to bind to a port, release the
port and start qemu-nbd. This is definitely racy. With the approach
with passing file descriptor management layer could create a socket
and pass it to qemu-nbd without a race at all.

Den



> ---
>  qemu-nbd.c    | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  qemu-nbd.texi |  6 ++++
>  2 files changed, 84 insertions(+), 10 deletions(-)
>
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 99297a5..99bba38 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -48,6 +48,7 @@
>  #define QEMU_NBD_OPT_OBJECT        260
>  #define QEMU_NBD_OPT_TLSCREDS      261
>  #define QEMU_NBD_OPT_IMAGE_OPTS    262
> +#define QEMU_NBD_OPT_SRV_SOCKET_FD 263
>  
>  #define MBR_SIZE 512
>  
> @@ -78,6 +79,7 @@ static void usage(const char *name)
>  "  -b, --bind=IFACE          interface to bind to (default `0.0.0.0')\n"
>  "  -k, --socket=PATH         path to the unix socket\n"
>  "                            (default '"SOCKET_PATH"')\n"
> +"  --server-sock-fd=NUM      pre-opened listening server socket file descriptor\n"
>  "  -e, --shared=NUM          device can be shared by NUM clients (default '1')\n"
>  "  -t, --persistent          don't exit on the last connection\n"
>  "  -v, --verbose             display extra debugging information\n"
> @@ -382,7 +384,7 @@ static void nbd_update_server_watch(void)
>  }
>  
>  
> -static SocketAddress *nbd_build_socket_address(const char *sockpath,
> +static SocketAddress *nbd_build_sock_fd(const char *sockpath,
>                                                 const char *bindto,
>                                                 const char *port)
>  {
> @@ -459,6 +461,41 @@ static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
>      return creds;
>  }
>  
> +static void setup_address_and_port(const char **address, const char **port)
> +{
> +    if (*address == NULL) {
> +        *address = "0.0.0.0";
> +    }
> +
> +    if (*port == NULL) {
> +        *port = g_strdup_printf("%d", NBD_DEFAULT_PORT);;
> +    }
> +}
> +
> +/*
> ++ * Check socket parameters compatibility when sock_fd is given
> ++ */
> +static const char *sock_fd_validate_opts(char *device, char *sockpath,
> +                                         const char *address, const char *port)
> +{
> +    if (device != NULL) {
> +        return "NBD device can't be set when socket fd is specified";
> +    }
> +
> +    if (sockpath != NULL) {
> +        return "Unix socket can't be set when socket fd is specified";
> +    }
> +
> +    if (address != NULL) {
> +        return "The interface can't be set when socket fd is specified";
> +    }
> +
> +    if (port != NULL) {
> +        return "TCP port number can't be set when socket fd is specified";
> +    }
> +
> +    return NULL;
> +}
>  
>  int main(int argc, char **argv)
>  {
> @@ -467,7 +504,7 @@ int main(int argc, char **argv)
>      off_t dev_offset = 0;
>      uint16_t nbdflags = 0;
>      bool disconnect = false;
> -    const char *bindto = "0.0.0.0";
> +    const char *bindto = NULL;
>      const char *port = NULL;
>      char *sockpath = NULL;
>      char *device = NULL;
> @@ -503,6 +540,8 @@ int main(int argc, char **argv)
>          { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
>          { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
>          { "trace", required_argument, NULL, 'T' },
> +        { "server-sock-fd", required_argument, NULL,
> +          QEMU_NBD_OPT_SRV_SOCKET_FD },
>          { NULL, 0, NULL, 0 }
>      };
>      int ch;
> @@ -524,6 +563,7 @@ int main(int argc, char **argv)
>      bool imageOpts = false;
>      bool writethrough = true;
>      char *trace_file = NULL;
> +    int sock_fd = -1;
>  
>      /* The client thread uses SIGTERM to interrupt the server.  A signal
>       * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
> @@ -714,6 +754,15 @@ int main(int argc, char **argv)
>              g_free(trace_file);
>              trace_file = trace_opt_parse(optarg);
>              break;
> +        case QEMU_NBD_OPT_SRV_SOCKET_FD:
> +            sock_fd = qemu_parse_fd(optarg);
> +
> +            if (sock_fd < 0) {
> +                error_report("Invalid socket fd value: "
> +                             "it must be a non-negative integer");
> +                exit(EXIT_FAILURE);
> +            }
> +            break;
>          }
>      }
>  
> @@ -735,6 +784,17 @@ int main(int argc, char **argv)
>      trace_init_file(trace_file);
>      qemu_set_log(LOG_TRACE);
>  
> +    if (sock_fd != -1) {
> +        const char *err_msg = sock_fd_validate_opts(device, sockpath,
> +                                                    bindto, port);
> +        if (err_msg != NULL) {
> +            error_report("%s", err_msg);
> +            exit(EXIT_FAILURE);
> +        }
> +    } else {
> +        setup_address_and_port(&bindto, &port);
> +    }
> +
>      if (tlscredsid) {
>          if (sockpath) {
>              error_report("TLS is only supported with IPv4/IPv6");
> @@ -838,7 +898,22 @@ int main(int argc, char **argv)
>          snprintf(sockpath, 128, SOCKET_PATH, basename(device));
>      }
>  
> -    saddr = nbd_build_socket_address(sockpath, bindto, port);
> +    if (sock_fd == -1) {
> +        server_ioc = qio_channel_socket_new();
> +        saddr = nbd_build_sock_fd(sockpath, bindto, port);
> +        if (qio_channel_socket_listen_sync(server_ioc, saddr, &local_err) < 0) {
> +            object_unref(OBJECT(server_ioc));
> +            error_report_err(local_err);
> +            return 1;
> +        }
> +    } else {
> +        server_ioc = qio_channel_socket_new_fd(sock_fd, &local_err);
> +        if (server_ioc == NULL) {
> +            error_report("Failed to use the given server socket fd: %s",
> +                         error_get_pretty(local_err));
> +            exit(EXIT_FAILURE);
> +        }
> +    }
>  
>      if (qemu_init_main_loop(&local_err)) {
>          error_report_err(local_err);
> @@ -921,13 +996,6 @@ int main(int argc, char **argv)
>          newproto = true;
>      }
>  
> -    server_ioc = qio_channel_socket_new();
> -    if (qio_channel_socket_listen_sync(server_ioc, saddr, &local_err) < 0) {
> -        object_unref(OBJECT(server_ioc));
> -        error_report_err(local_err);
> -        return 1;
> -    }
> -
>      if (device) {
>          int ret;
>  
> diff --git a/qemu-nbd.texi b/qemu-nbd.texi
> index 91ebf04..4c9ea00 100644
> --- a/qemu-nbd.texi
> +++ b/qemu-nbd.texi
> @@ -34,6 +34,12 @@ The offset into the image
>  The interface to bind to (default @samp{0.0.0.0})
>  @item -k, --socket=@var{path}
>  Use a unix socket with path @var{path}
> +@item --server-sock-fd=@var{fd_num}
> +A server socket fd to be used for client connections.
> +The socket fd must be configured and switched to listening
> +state before passing to the program. The program uses
> +the given socket fd as is, without any additional
> +modifications.
>  @item --image-opts
>  Treat @var{filename} as a set of image options, instead of a plain
>  filename. If this flag is specified, the @var{-f} flag should

Patch
diff mbox

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 99297a5..99bba38 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -48,6 +48,7 @@ 
 #define QEMU_NBD_OPT_OBJECT        260
 #define QEMU_NBD_OPT_TLSCREDS      261
 #define QEMU_NBD_OPT_IMAGE_OPTS    262
+#define QEMU_NBD_OPT_SRV_SOCKET_FD 263
 
 #define MBR_SIZE 512
 
@@ -78,6 +79,7 @@  static void usage(const char *name)
 "  -b, --bind=IFACE          interface to bind to (default `0.0.0.0')\n"
 "  -k, --socket=PATH         path to the unix socket\n"
 "                            (default '"SOCKET_PATH"')\n"
+"  --server-sock-fd=NUM      pre-opened listening server socket file descriptor\n"
 "  -e, --shared=NUM          device can be shared by NUM clients (default '1')\n"
 "  -t, --persistent          don't exit on the last connection\n"
 "  -v, --verbose             display extra debugging information\n"
@@ -382,7 +384,7 @@  static void nbd_update_server_watch(void)
 }
 
 
-static SocketAddress *nbd_build_socket_address(const char *sockpath,
+static SocketAddress *nbd_build_sock_fd(const char *sockpath,
                                                const char *bindto,
                                                const char *port)
 {
@@ -459,6 +461,41 @@  static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, Error **errp)
     return creds;
 }
 
+static void setup_address_and_port(const char **address, const char **port)
+{
+    if (*address == NULL) {
+        *address = "0.0.0.0";
+    }
+
+    if (*port == NULL) {
+        *port = g_strdup_printf("%d", NBD_DEFAULT_PORT);;
+    }
+}
+
+/*
++ * Check socket parameters compatibility when sock_fd is given
++ */
+static const char *sock_fd_validate_opts(char *device, char *sockpath,
+                                         const char *address, const char *port)
+{
+    if (device != NULL) {
+        return "NBD device can't be set when socket fd is specified";
+    }
+
+    if (sockpath != NULL) {
+        return "Unix socket can't be set when socket fd is specified";
+    }
+
+    if (address != NULL) {
+        return "The interface can't be set when socket fd is specified";
+    }
+
+    if (port != NULL) {
+        return "TCP port number can't be set when socket fd is specified";
+    }
+
+    return NULL;
+}
 
 int main(int argc, char **argv)
 {
@@ -467,7 +504,7 @@  int main(int argc, char **argv)
     off_t dev_offset = 0;
     uint16_t nbdflags = 0;
     bool disconnect = false;
-    const char *bindto = "0.0.0.0";
+    const char *bindto = NULL;
     const char *port = NULL;
     char *sockpath = NULL;
     char *device = NULL;
@@ -503,6 +540,8 @@  int main(int argc, char **argv)
         { "tls-creds", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
         { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
         { "trace", required_argument, NULL, 'T' },
+        { "server-sock-fd", required_argument, NULL,
+          QEMU_NBD_OPT_SRV_SOCKET_FD },
         { NULL, 0, NULL, 0 }
     };
     int ch;
@@ -524,6 +563,7 @@  int main(int argc, char **argv)
     bool imageOpts = false;
     bool writethrough = true;
     char *trace_file = NULL;
+    int sock_fd = -1;
 
     /* The client thread uses SIGTERM to interrupt the server.  A signal
      * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -714,6 +754,15 @@  int main(int argc, char **argv)
             g_free(trace_file);
             trace_file = trace_opt_parse(optarg);
             break;
+        case QEMU_NBD_OPT_SRV_SOCKET_FD:
+            sock_fd = qemu_parse_fd(optarg);
+
+            if (sock_fd < 0) {
+                error_report("Invalid socket fd value: "
+                             "it must be a non-negative integer");
+                exit(EXIT_FAILURE);
+            }
+            break;
         }
     }
 
@@ -735,6 +784,17 @@  int main(int argc, char **argv)
     trace_init_file(trace_file);
     qemu_set_log(LOG_TRACE);
 
+    if (sock_fd != -1) {
+        const char *err_msg = sock_fd_validate_opts(device, sockpath,
+                                                    bindto, port);
+        if (err_msg != NULL) {
+            error_report("%s", err_msg);
+            exit(EXIT_FAILURE);
+        }
+    } else {
+        setup_address_and_port(&bindto, &port);
+    }
+
     if (tlscredsid) {
         if (sockpath) {
             error_report("TLS is only supported with IPv4/IPv6");
@@ -838,7 +898,22 @@  int main(int argc, char **argv)
         snprintf(sockpath, 128, SOCKET_PATH, basename(device));
     }
 
-    saddr = nbd_build_socket_address(sockpath, bindto, port);
+    if (sock_fd == -1) {
+        server_ioc = qio_channel_socket_new();
+        saddr = nbd_build_sock_fd(sockpath, bindto, port);
+        if (qio_channel_socket_listen_sync(server_ioc, saddr, &local_err) < 0) {
+            object_unref(OBJECT(server_ioc));
+            error_report_err(local_err);
+            return 1;
+        }
+    } else {
+        server_ioc = qio_channel_socket_new_fd(sock_fd, &local_err);
+        if (server_ioc == NULL) {
+            error_report("Failed to use the given server socket fd: %s",
+                         error_get_pretty(local_err));
+            exit(EXIT_FAILURE);
+        }
+    }
 
     if (qemu_init_main_loop(&local_err)) {
         error_report_err(local_err);
@@ -921,13 +996,6 @@  int main(int argc, char **argv)
         newproto = true;
     }
 
-    server_ioc = qio_channel_socket_new();
-    if (qio_channel_socket_listen_sync(server_ioc, saddr, &local_err) < 0) {
-        object_unref(OBJECT(server_ioc));
-        error_report_err(local_err);
-        return 1;
-    }
-
     if (device) {
         int ret;
 
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 91ebf04..4c9ea00 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -34,6 +34,12 @@  The offset into the image
 The interface to bind to (default @samp{0.0.0.0})
 @item -k, --socket=@var{path}
 Use a unix socket with path @var{path}
+@item --server-sock-fd=@var{fd_num}
+A server socket fd to be used for client connections.
+The socket fd must be configured and switched to listening
+state before passing to the program. The program uses
+the given socket fd as is, without any additional
+modifications.
 @item --image-opts
 Treat @var{filename} as a set of image options, instead of a plain
 filename. If this flag is specified, the @var{-f} flag should