From patchwork Fri Aug 23 14:40:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 11111919 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 87F501399 for ; Fri, 23 Aug 2019 14:53:38 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 55B4922CE3 for ; Fri, 23 Aug 2019 14:53:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 55B4922CE3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:57368 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1Awb-0005BX-3w for patchwork-qemu-devel@patchwork.kernel.org; Fri, 23 Aug 2019 10:53:37 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52208) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1AkW-00082N-4h for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1i1AkT-0008K0-Nn for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:50392) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1i1AkM-0008AK-Qs; Fri, 23 Aug 2019 10:40:59 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3D2BE3090FC1; Fri, 23 Aug 2019 14:40:58 +0000 (UTC) Received: from blue.redhat.com (ovpn-116-234.phx2.redhat.com [10.3.116.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id C1D3E60126; Fri, 23 Aug 2019 14:40:57 +0000 (UTC) From: Eric Blake To: libguestfs@redhat.com Date: Fri, 23 Aug 2019 09:40:52 -0500 Message-Id: <20190823144054.27420-2-eblake@redhat.com> In-Reply-To: <20190823144054.27420-1-eblake@redhat.com> References: <25ead363-4f37-5450-b985-1876374e314d@redhat.com> <20190823144054.27420-1-eblake@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.43]); Fri, 23 Aug 2019 14:40:58 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [nbdkit PATCH 1/3] server: Add internal support for NBDKIT_FLAG_FAST_ZERO X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Qemu was able to demonstrate that knowing whether a zero operation is fast is useful when copying from one image to another: there is a choice between bulk pre-zeroing and then revisiting the data sections (fewer transactions, but depends on the zeroing to be fast), vs. visiting every portion of the disk only once (more transactions, but no time lost to duplicated I/O due to slow zeroes). As such, the NBD protocol is adding an extension to allow clients to request fast failure when zero is not efficient, from servers that advertise support for the new flag. In nbdkit, start by plumbing a new .can_fast_zero through the backends (although it stays 0 in this patch, later patches will actually expose it to plugins and filters to override). Advertise the flag to the client when the plugin provides a .can_fast_zero, and wire in passing the flag down to .zero in the plugin. In turn, if the flag is set and the implementation .zero fails with ENOTSUP/EOPNOTSUPP, we no longer attempt the .pwrite fallback. Signed-off-by: Eric Blake --- docs/nbdkit-protocol.pod | 11 +++++++++ server/internal.h | 2 ++ common/protocol/protocol.h | 11 +++++---- server/filters.c | 20 +++++++++++++--- server/plugins.c | 28 +++++++++++++++++++--- server/protocol-handshake.c | 7 ++++++ server/protocol.c | 46 +++++++++++++++++++++++++++++-------- include/nbdkit-common.h | 7 +++--- plugins/ocaml/ocaml.c | 1 - plugins/sh/call.c | 1 - 10 files changed, 110 insertions(+), 24 deletions(-) diff --git a/docs/nbdkit-protocol.pod b/docs/nbdkit-protocol.pod index 35db07b3..76c50bb8 100644 --- a/docs/nbdkit-protocol.pod +++ b/docs/nbdkit-protocol.pod @@ -173,6 +173,17 @@ This protocol extension allows a client to inform the server about intent to access a portion of the export, to allow the server an opportunity to cache things appropriately. +=item C + +Supported in nbdkit E 1.13.9. + +This protocol extension allows a server to advertise that it can rank +all zero requests as fast or slow, at which point the client can make +fast zero requests which fail immediately with C if the +request is no faster than a counterpart write would be, while normal +zero requests still benefit from compressed network traffic regardless +of the time taken. + =item Resize Extension I. diff --git a/server/internal.h b/server/internal.h index 22e13b6d..784c8b5c 100644 --- a/server/internal.h +++ b/server/internal.h @@ -168,6 +168,7 @@ struct connection { bool is_rotational; bool can_trim; bool can_zero; + bool can_fast_zero; bool can_fua; bool can_multi_conn; bool can_cache; @@ -275,6 +276,7 @@ struct backend { int (*is_rotational) (struct backend *, struct connection *conn); int (*can_trim) (struct backend *, struct connection *conn); int (*can_zero) (struct backend *, struct connection *conn); + int (*can_fast_zero) (struct backend *, struct connection *conn); int (*can_extents) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); int (*can_multi_conn) (struct backend *, struct connection *conn); diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h index e9386430..bf548390 100644 --- a/common/protocol/protocol.h +++ b/common/protocol/protocol.h @@ -96,6 +96,7 @@ extern const char *name_of_nbd_flag (int); #define NBD_FLAG_SEND_DF (1 << 7) #define NBD_FLAG_CAN_MULTI_CONN (1 << 8) #define NBD_FLAG_SEND_CACHE (1 << 10) +#define NBD_FLAG_SEND_FAST_ZERO (1 << 11) /* NBD options (new style handshake only). */ extern const char *name_of_nbd_opt (int); @@ -223,10 +224,11 @@ extern const char *name_of_nbd_cmd (int); #define NBD_CMD_BLOCK_STATUS 7 extern const char *name_of_nbd_cmd_flag (int); -#define NBD_CMD_FLAG_FUA (1<<0) -#define NBD_CMD_FLAG_NO_HOLE (1<<1) -#define NBD_CMD_FLAG_DF (1<<2) -#define NBD_CMD_FLAG_REQ_ONE (1<<3) +#define NBD_CMD_FLAG_FUA (1<<0) +#define NBD_CMD_FLAG_NO_HOLE (1<<1) +#define NBD_CMD_FLAG_DF (1<<2) +#define NBD_CMD_FLAG_REQ_ONE (1<<3) +#define NBD_CMD_FLAG_FAST_ZERO (1<<4) /* Error codes (previously errno). * See http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ca4414804114fd0095b317785bc0b51862e62ebb @@ -239,6 +241,7 @@ extern const char *name_of_nbd_error (int); #define NBD_EINVAL 22 #define NBD_ENOSPC 28 #define NBD_EOVERFLOW 75 +#define NBD_ENOTSUP 95 #define NBD_ESHUTDOWN 108 #endif /* NBDKIT_PROTOCOL_H */ diff --git a/server/filters.c b/server/filters.c index 14ca0cc6..0dd2393e 100644 --- a/server/filters.c +++ b/server/filters.c @@ -403,8 +403,11 @@ next_zero (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, int r; r = b_conn->b->zero (b_conn->b, b_conn->conn, count, offset, flags, err); - if (r == -1) - assert (*err && *err != ENOTSUP && *err != EOPNOTSUPP); + if (r == -1) { + assert (*err); + if (!(flags & NBDKIT_FLAG_FAST_ZERO)) + assert (*err != ENOTSUP && *err != EOPNOTSUPP); + } return r; } @@ -586,6 +589,15 @@ filter_can_zero (struct backend *b, struct connection *conn) return f->backend.next->can_zero (f->backend.next, conn); } +static int +filter_can_fast_zero (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + + debug ("%s: can_fast_zero", f->name); + return 0; /* Next patch will query or pass through */ +} + static int filter_can_extents (struct backend *b, struct connection *conn) { @@ -738,7 +750,8 @@ filter_zero (struct backend *b, struct connection *conn, void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; - assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA))); + assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA | + NBDKIT_FLAG_FAST_ZERO))); debug ("%s: zero count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, f->name, count, offset, flags); @@ -818,6 +831,7 @@ static struct backend filter_functions = { .is_rotational = filter_is_rotational, .can_trim = filter_can_trim, .can_zero = filter_can_zero, + .can_fast_zero = filter_can_fast_zero, .can_extents = filter_can_extents, .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, diff --git a/server/plugins.c b/server/plugins.c index 34cc3cb5..c6dcf408 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -401,6 +401,16 @@ plugin_can_zero (struct backend *b, struct connection *conn) return plugin_can_write (b, conn); } +static int +plugin_can_fast_zero (struct backend *b, struct connection *conn) +{ + assert (connection_get_handle (conn, 0)); + + debug ("can_fast_zero"); + + return 0; /* Upcoming patch will actually add support. */ +} + static int plugin_can_extents (struct backend *b, struct connection *conn) { @@ -621,15 +631,18 @@ plugin_zero (struct backend *b, struct connection *conn, int r = -1; bool may_trim = flags & NBDKIT_FLAG_MAY_TRIM; bool fua = flags & NBDKIT_FLAG_FUA; + bool fast_zero = flags & NBDKIT_FLAG_FAST_ZERO; bool emulate = false; bool need_flush = false; int can_zero = 1; /* TODO cache this per-connection? */ assert (connection_get_handle (conn, 0)); - assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA))); + assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA | + NBDKIT_FLAG_FAST_ZERO))); - debug ("zero count=%" PRIu32 " offset=%" PRIu64 " may_trim=%d fua=%d", - count, offset, may_trim, fua); + debug ("zero count=%" PRIu32 " offset=%" PRIu64 + " may_trim=%d fua=%d fast_zero=%d", + count, offset, may_trim, fua, fast_zero); if (fua && plugin_can_fua (b, conn) != NBDKIT_FUA_NATIVE) { flags &= ~NBDKIT_FLAG_FUA; @@ -643,6 +656,8 @@ plugin_zero (struct backend *b, struct connection *conn, } if (can_zero) { + /* if (!can_fast_zero) */ + flags &= ~NBDKIT_FLAG_FAST_ZERO; errno = 0; if (p->plugin.zero) r = p->plugin.zero (connection_get_handle (conn, 0), count, offset, @@ -658,6 +673,12 @@ plugin_zero (struct backend *b, struct connection *conn, goto done; } + if (fast_zero) { + assert (r == -1); + *err = EOPNOTSUPP; + goto done; + } + assert (p->plugin.pwrite || p->plugin._pwrite_old); flags &= ~NBDKIT_FLAG_MAY_TRIM; threadlocal_set_error (0); @@ -762,6 +783,7 @@ static struct backend plugin_functions = { .is_rotational = plugin_is_rotational, .can_trim = plugin_can_trim, .can_zero = plugin_can_zero, + .can_fast_zero = plugin_can_fast_zero, .can_extents = plugin_can_extents, .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c index 0f3bd280..84fcacfd 100644 --- a/server/protocol-handshake.c +++ b/server/protocol-handshake.c @@ -66,6 +66,13 @@ protocol_compute_eflags (struct connection *conn, uint16_t *flags) if (fl) { eflags |= NBD_FLAG_SEND_WRITE_ZEROES; conn->can_zero = true; + fl = backend->can_fast_zero (backend, conn); + if (fl == -1) + return -1; + if (fl) { + eflags |= NBD_FLAG_SEND_FAST_ZERO; + conn->can_fast_zero = true; + } } fl = backend->can_trim (backend, conn); diff --git a/server/protocol.c b/server/protocol.c index 72f6f2a8..77a045dc 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -110,7 +110,8 @@ validate_request (struct connection *conn, /* Validate flags */ if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE | - NBD_CMD_FLAG_DF | NBD_CMD_FLAG_REQ_ONE)) { + NBD_CMD_FLAG_DF | NBD_CMD_FLAG_REQ_ONE | + NBD_CMD_FLAG_FAST_ZERO)) { nbdkit_error ("invalid request: unknown flag (0x%x)", flags); *error = EINVAL; return false; @@ -121,6 +122,21 @@ validate_request (struct connection *conn, *error = EINVAL; return false; } + if (flags & NBD_CMD_FLAG_FAST_ZERO) { + if (cmd != NBD_CMD_WRITE_ZEROES) { + nbdkit_error ("invalid request: " + "FAST_ZERO flag needs WRITE_ZEROES request"); + *error = EINVAL; + return false; + } + if (!conn->can_fast_zero) { + nbdkit_error ("invalid request: " + "%s: fast zero support was not advertised", + name_of_nbd_cmd (cmd)); + *error = EINVAL; + return false; + } + } if (flags & NBD_CMD_FLAG_DF) { if (cmd != NBD_CMD_READ) { nbdkit_error ("invalid request: DF flag needs READ request"); @@ -297,8 +313,12 @@ handle_request (struct connection *conn, f |= NBDKIT_FLAG_MAY_TRIM; if (fua) f |= NBDKIT_FLAG_FUA; + if (flags & NBD_CMD_FLAG_FAST_ZERO) + f |= NBDKIT_FLAG_FAST_ZERO; if (backend->zero (backend, conn, count, offset, f, &err) == -1) { - assert (err && err != ENOTSUP && err != EOPNOTSUPP); + assert (err); + if (!(flags & NBD_CMD_FLAG_FAST_ZERO)) + assert (err != ENOTSUP && err != EOPNOTSUPP); return err; } break; @@ -368,7 +388,7 @@ skip_over_write_buffer (int sock, size_t count) /* Convert a system errno to an NBD_E* error code. */ static int -nbd_errno (int error, bool flag_df) +nbd_errno (int error, uint16_t flags) { switch (error) { case 0: @@ -390,10 +410,17 @@ nbd_errno (int error, bool flag_df) case ESHUTDOWN: return NBD_ESHUTDOWN; #endif + case ENOTSUP: +#if ENOTSUP != EOPNOTSUPP + case EOPNOTSUPP: +#endif + if (flags & NBD_CMD_FLAG_FAST_ZERO) + return NBD_ENOTSUP; + return NBD_EINVAL; case EOVERFLOW: - if (flag_df) + if (flags & NBD_CMD_FLAG_DF) return NBD_EOVERFLOW; - /* fallthrough */ + return NBD_EINVAL; case EINVAL: default: return NBD_EINVAL; @@ -402,7 +429,7 @@ nbd_errno (int error, bool flag_df) static int send_simple_reply (struct connection *conn, - uint64_t handle, uint16_t cmd, + uint64_t handle, uint16_t cmd, uint16_t flags, const char *buf, uint32_t count, uint32_t error) { @@ -413,7 +440,7 @@ send_simple_reply (struct connection *conn, reply.magic = htobe32 (NBD_SIMPLE_REPLY_MAGIC); reply.handle = handle; - reply.error = htobe32 (nbd_errno (error, false)); + reply.error = htobe32 (nbd_errno (error, flags)); r = conn->send (conn, &reply, sizeof reply, f); if (r == -1) { @@ -640,7 +667,7 @@ send_structured_reply_error (struct connection *conn, } /* Send the error. */ - error_data.error = htobe32 (nbd_errno (error, flags & NBD_CMD_FLAG_DF)); + error_data.error = htobe32 (nbd_errno (error, flags)); error_data.len = htobe16 (0); r = conn->send (conn, &error_data, sizeof error_data, 0); if (r == -1) { @@ -790,5 +817,6 @@ protocol_recv_request_send_reply (struct connection *conn) error); } else - return send_simple_reply (conn, request.handle, cmd, buf, count, error); + return send_simple_reply (conn, request.handle, cmd, flags, buf, count, + error); } diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h index e004aa18..f33ce133 100644 --- a/include/nbdkit-common.h +++ b/include/nbdkit-common.h @@ -57,9 +57,10 @@ extern "C" { #define NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS 2 #define NBDKIT_THREAD_MODEL_PARALLEL 3 -#define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */ -#define NBDKIT_FLAG_FUA (1<<1) /* Maps to NBD_CMD_FLAG_FUA */ -#define NBDKIT_FLAG_REQ_ONE (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */ +#define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */ +#define NBDKIT_FLAG_FUA (1<<1) /* Maps to NBD_CMD_FLAG_FUA */ +#define NBDKIT_FLAG_REQ_ONE (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */ +#define NBDKIT_FLAG_FAST_ZERO (1<<3) /* Maps to NBD_CMD_FLAG_FAST_ZERO */ #define NBDKIT_FUA_NONE 0 #define NBDKIT_FUA_EMULATE 1 diff --git a/plugins/ocaml/ocaml.c b/plugins/ocaml/ocaml.c index c472281f..144a449e 100644 --- a/plugins/ocaml/ocaml.c +++ b/plugins/ocaml/ocaml.c @@ -857,7 +857,6 @@ ocaml_nbdkit_set_error (value nv) case 5: err = ENOSPC; break; case 6: err = ESHUTDOWN; break; case 7: err = EOVERFLOW; break; - /* Necessary for .zero support */ case 8: err = EOPNOTSUPP; break; /* Other errno values that server/protocol.c treats specially */ case 9: err = EROFS; break; diff --git a/plugins/sh/call.c b/plugins/sh/call.c index b86e7c9c..ab43e5ea 100644 --- a/plugins/sh/call.c +++ b/plugins/sh/call.c @@ -347,7 +347,6 @@ handle_script_error (char *ebuf, size_t len) err = ESHUTDOWN; skip = 9; } - /* Necessary for .zero support */ else if (strncasecmp (ebuf, "ENOTSUP", 7) == 0) { err = ENOTSUP; skip = 7; From patchwork Fri Aug 23 14:40:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 11111973 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5706B1399 for ; Fri, 23 Aug 2019 14:56:06 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 26D3822CE3 for ; Fri, 23 Aug 2019 14:56:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26D3822CE3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:57430 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1Ayy-000864-Vk for patchwork-qemu-devel@patchwork.kernel.org; Fri, 23 Aug 2019 10:56:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52223) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1AkY-00087E-MD for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1i1AkV-0008M1-Cf for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53276) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1i1AkN-0008Af-Tu; Fri, 23 Aug 2019 10:41:00 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C8FB18D5BBB; Fri, 23 Aug 2019 14:40:58 +0000 (UTC) Received: from blue.redhat.com (ovpn-116-234.phx2.redhat.com [10.3.116.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 583FA60126; Fri, 23 Aug 2019 14:40:58 +0000 (UTC) From: Eric Blake To: libguestfs@redhat.com Date: Fri, 23 Aug 2019 09:40:53 -0500 Message-Id: <20190823144054.27420-3-eblake@redhat.com> In-Reply-To: <20190823144054.27420-1-eblake@redhat.com> References: <25ead363-4f37-5450-b985-1876374e314d@redhat.com> <20190823144054.27420-1-eblake@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (mx1.redhat.com [10.5.110.69]); Fri, 23 Aug 2019 14:40:58 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [nbdkit PATCH 2/3] filters: Add .can_fast_zero hook X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Allow filters to affect the handling of the new NBD_CMD_FLAG_FAST_ZERO flag, then update affected filters. In particular, the log filter logs the state of the new flag (requires a tweak to expected output in test-fua.sh), the delay filter gains a bool parameter delay-fast-zero, several filters reject all fast requests because of local writes or splitting a single client request into multiple plugin requests, and the nozero filter gains additional modes for controlling fast zero advertisement and support: zeromode→ none emulate notrim plugin ↓fastzeromode (new) --------------------------------------------- default 0 2 4 4 none 0 1 1 1 slow 0 2 2 2 ignore 0 3 3 3 0 - no zero advertised, thus no fast zero advertised 1 - fast zero not advertised 2 - fast zero advertised, but fast zero requests fail with ENOTSUP (ie. a fast zero was not possible) 3 - fast zero advertised, but fast zero requests are treated the same as normal requests (ignoring the fast zero flag, aids testing at the probable cost of spec non-compliance) 4 - fast zero advertisement/reaction is up to the plugin Mode none/default remains the default for back-compat, and mode plugin/default has no semantic change compared to omitting the nozero filter from the command line. Filters untouched by this patch are fine inheriting whatever fast-zero behavior the underlying plugin uses. Signed-off-by: Eric Blake --- docs/nbdkit-filter.pod | 27 +++++++---- filters/delay/nbdkit-delay-filter.pod | 15 +++++- filters/log/nbdkit-log-filter.pod | 2 +- filters/nozero/nbdkit-nozero-filter.pod | 41 +++++++++++++--- server/filters.c | 15 +++++- include/nbdkit-filter.h | 3 ++ filters/blocksize/blocksize.c | 12 +++++ filters/cache/cache.c | 20 ++++++++ filters/cow/cow.c | 20 ++++++++ filters/delay/delay.c | 28 ++++++++++- filters/log/log.c | 16 ++++--- filters/nozero/nozero.c | 62 +++++++++++++++++++++++-- filters/truncate/truncate.c | 15 ++++++ tests/test-fua.sh | 4 +- 14 files changed, 248 insertions(+), 32 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 6e2bea61..ebce8961 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -361,6 +361,8 @@ calls. =head2 C<.can_zero> +=head2 C<.can_fast_zero> + =head2 C<.can_extents> =head2 C<.can_fua> @@ -380,6 +382,8 @@ calls. void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_fast_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -517,22 +521,25 @@ turn, the filter should not call Czero> if Ccan_zero> did not return true. On input, the parameter C may include C -unconditionally, and C based on the result of -C<.can_fua>. In turn, the filter may pass C -unconditionally, but should only pass C on to -Czero> if Ccan_fua> returned a positive -value. +unconditionally, C based on the result of +C<.can_fua>, and C based on the result of +C<.can_fast_zero>. In turn, the filter may pass +C unconditionally, but should only pass +C or C on to +Czero> if the corresponding Ccan_fua> or +Ccan_fast_zero> returned a positive value. Note that unlike the plugin C<.zero> which is permitted to fail with C or C to force a fallback to C<.pwrite>, the -function Czero> will never fail with C set to -C or C because the fallback has already taken -place. +function Czero> will not fail with C set to +C or C unless C was used, +because otherwise the fallback has already taken place. If there is an error, C<.zero> should call C with an error message B return -1 with C set to the positive errno -value to return to the client. The filter should never fail with -C or C (while plugins have automatic fallback to +value to return to the client. The filter should not fail with +C or C unless C includes +C (while plugins have automatic fallback to C<.pwrite>, filters do not). =head2 C<.extents> diff --git a/filters/delay/nbdkit-delay-filter.pod b/filters/delay/nbdkit-delay-filter.pod index 730cea4c..0a9c77f7 100644 --- a/filters/delay/nbdkit-delay-filter.pod +++ b/filters/delay/nbdkit-delay-filter.pod @@ -12,6 +12,7 @@ nbdkit-delay-filter - nbdkit delay filter delay-read=(SECS|NNms) delay-write=(SECS|NNms) delay-zero=(SECS|NNms) delay-trim=(SECS|NNms) delay-extents=(SECS|NNms) delay-cache=(SECS|NNms) + delay-fast-zero=BOOL =head1 DESCRIPTION @@ -56,7 +57,8 @@ Delay write operations by C seconds or C milliseconds. =item BNNB -Delay zero operations by C seconds or C milliseconds. +Delay zero operations by C seconds or C milliseconds. See +also B. =item BSECS @@ -85,6 +87,17 @@ milliseconds. Delay write, zero and trim operations by C seconds or C milliseconds. +=item BBOOL + +The NBD specification documents an extension called fast zero, in +which the client may request that a server should reply with +C as soon as possible if the zero operation offers no real +speedup over a corresponding write. By default, this parameter is +true, and fast zero requests are serviced by the plugin after the same +delay as any other zero request; but setting this parameter to false +instantly fails a fast zero response without waiting for or consulting +the plugin. + =back =head1 SEE ALSO diff --git a/filters/log/nbdkit-log-filter.pod b/filters/log/nbdkit-log-filter.pod index 9e102bc0..5d9625a1 100644 --- a/filters/log/nbdkit-log-filter.pod +++ b/filters/log/nbdkit-log-filter.pod @@ -55,7 +55,7 @@ on the connection). An example logging session of a client that performs a single successful read is: - 2018-01-27 20:38:22.959984 connection=1 Connect size=0x400 write=1 flush=1 rotational=0 trim=0 zero=1 fua=1 + 2018-01-27 20:38:22.959984 connection=1 Connect size=0x400 write=1 flush=1 rotational=0 trim=0 zero=1 fua=1 extents=1 cache=0 fast_zero=0 2018-01-27 20:38:23.001720 connection=1 Read id=1 offset=0x0 count=0x100 ... 2018-01-27 20:38:23.001995 connection=1 ...Read id=1 return=0 (Success) 2018-01-27 20:38:23.044259 connection=1 Disconnect transactions=1 diff --git a/filters/nozero/nbdkit-nozero-filter.pod b/filters/nozero/nbdkit-nozero-filter.pod index 144b8230..4fc7dc63 100644 --- a/filters/nozero/nbdkit-nozero-filter.pod +++ b/filters/nozero/nbdkit-nozero-filter.pod @@ -4,7 +4,8 @@ nbdkit-nozero-filter - nbdkit nozero filter =head1 SYNOPSIS - nbdkit --filter=nozero plugin [zeromode=MODE] [plugin-args...] + nbdkit --filter=nozero plugin [plugin-args...] \ + [zeromode=MODE] [fastzeromode=MODE] =head1 DESCRIPTION @@ -18,7 +19,7 @@ testing client or server fallbacks. =over 4 -=item B +=item B Optional, controls which mode the filter will use. Mode B (default) means that zero support is not advertised to the @@ -29,8 +30,30 @@ efficient way to write zeros. Since nbdkit E 1.13.4, mode B means that zero requests are forwarded on to the plugin, except that the plugin will never see the NBDKIT_MAY_TRIM flag, to determine if the client permitting trimming during zero operations -makes a difference (it is an error to request this mode if the plugin -does not support the C callback). +makes a difference. Since nbdkit E 1.13.9, mode B leaves +normal zero requests up to the plugin, useful when combined with +C for experimenting with the effects of fast zero +requests. It is an error to request B or B if the +plugin does not support the C callback. + +=item B + +Optional since nbdkit E 1.13.9, controls whether fast zeroes are +advertised to the client, and if so, how the filter will react to a +client fast zero request. Mode B avoids advertising fast zero +support. Mode B advertises fast zero support unconditionally, +but treats all fast zero requests as an immediate C failure +rather than performing a fallback. Mode B advertises fast +zero support, but treats all client fast zero requests as if the flag +had not been used (this behavior is typically contrary to the NBD +specification, but can be useful for comparison against the actual +fast zero implementation to see if fast zeroes make a difference). +Mode B is selected by default; when paired with +C, fast zeroes are advertised but fast zero requests +always fail (similar to C); when paired with C +or C, fast zero support is left to the plugin +(although in the latter case, the nozero filter could be omitted for +the same behavior). =back @@ -42,11 +65,17 @@ explicitly rather than with C: nbdkit --filter=nozero file disk.img Serve the file F, allowing the client to take advantage of -less network traffic via C, but still forcing -the data to be written explicitly rather than punching any holes: +less network traffic via C, but fail any fast +zero requests up front and force all other zero requests to write data +explicitly rather than punching any holes: nbdkit --filter=nozero file zeromode=emulate disk.img +Serve the file F, but do not advertise fast zero support to +the client even if the plugin supports it: + + nbdkit --filter=nozero file zeromode=plugin fastzeromode=none disk.img + =head1 SEE ALSO L, diff --git a/server/filters.c b/server/filters.c index 0dd2393e..f2de5e4e 100644 --- a/server/filters.c +++ b/server/filters.c @@ -314,6 +314,13 @@ next_can_zero (void *nxdata) return b_conn->b->can_zero (b_conn->b, b_conn->conn); } +static int +next_can_fast_zero (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_fast_zero (b_conn->b, b_conn->conn); +} + static int next_can_extents (void *nxdata) { @@ -445,6 +452,7 @@ static struct nbdkit_next_ops next_ops = { .is_rotational = next_is_rotational, .can_trim = next_can_trim, .can_zero = next_can_zero, + .can_fast_zero = next_can_fast_zero, .can_extents = next_can_extents, .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, @@ -593,9 +601,14 @@ static int filter_can_fast_zero (struct backend *b, struct connection *conn) { struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; debug ("%s: can_fast_zero", f->name); - return 0; /* Next patch will query or pass through */ + if (f->filter.can_fast_zero) + return f->filter.can_fast_zero (&next_ops, &nxdata, handle); + else + return f->backend.next->can_fast_zero (f->backend.next, conn); } static int diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 94f17789..d11cf881 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -71,6 +71,7 @@ struct nbdkit_next_ops { int (*is_rotational) (void *nxdata); int (*can_trim) (void *nxdata); int (*can_zero) (void *nxdata); + int (*can_fast_zero) (void *nxdata); int (*can_extents) (void *nxdata); int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); @@ -139,6 +140,8 @@ struct nbdkit_filter { void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_fast_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 0978887f..47638c74 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -307,6 +307,18 @@ blocksize_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint32_t drop; bool need_flush = false; + if (flags & NBDKIT_FLAG_FAST_ZERO) { + /* If we have to split the transaction, an ENOTSUP fast failure in + * a later call would be unnecessarily delayed behind earlier + * calls; it's easier to just declare that anything that can't be + * done in one call to the plugin is not fast. + */ + if ((offs | count) & (minblock - 1) || count > maxlen) { + *err = ENOTSUP; + return -1; + } + } + if ((flags & NBDKIT_FLAG_FUA) && next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { flags &= ~NBDKIT_FLAG_FUA; diff --git a/filters/cache/cache.c b/filters/cache/cache.c index b5dbccd2..7c1d6c4f 100644 --- a/filters/cache/cache.c +++ b/filters/cache/cache.c @@ -250,6 +250,17 @@ cache_can_cache (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) return NBDKIT_CACHE_NATIVE; } +/* Override the plugin's .can_fast_zero, because our .zero is not fast */ +static int +cache_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* It is better to advertise support even when we always reject fast + * zero attempts. + */ + return 1; +} + /* Read data. */ static int cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -418,6 +429,14 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, int r; bool need_flush = false; + /* We are purposefully avoiding next_ops->zero, so a zero request is + * never faster than plain writes. + */ + if (flags & NBDKIT_FLAG_FAST_ZERO) { + *err = ENOTSUP; + return -1; + } + block = malloc (blksize); if (block == NULL) { *err = errno; @@ -624,6 +643,7 @@ static struct nbdkit_filter filter = { .prepare = cache_prepare, .get_size = cache_get_size, .can_cache = cache_can_cache, + .can_fast_zero = cache_can_fast_zero, .pread = cache_pread, .pwrite = cache_pwrite, .zero = cache_zero, diff --git a/filters/cow/cow.c b/filters/cow/cow.c index 9d91d432..a5c1f978 100644 --- a/filters/cow/cow.c +++ b/filters/cow/cow.c @@ -179,6 +179,17 @@ cow_can_cache (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) return NBDKIT_FUA_NATIVE; } +/* Override the plugin's .can_fast_zero, because our .zero is not fast */ +static int +cow_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* It is better to advertise support even when we always reject fast + * zero attempts. + */ + return 1; +} + static int cow_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t flags, int *err); /* Read data. */ @@ -340,6 +351,14 @@ cow_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, blkoffs; int r; + /* We are purposefully avoiding next_ops->zero, so a zero request is + * never faster than plain writes. + */ + if (flags & NBDKIT_FLAG_FAST_ZERO) { + *err = ENOTSUP; + return -1; + } + block = malloc (BLKSIZE); if (block == NULL) { *err = errno; @@ -496,6 +515,7 @@ static struct nbdkit_filter filter = { .can_extents = cow_can_extents, .can_fua = cow_can_fua, .can_cache = cow_can_cache, + .can_fast_zero = cow_can_fast_zero, .pread = cow_pread, .pwrite = cow_pwrite, .zero = cow_zero, diff --git a/filters/delay/delay.c b/filters/delay/delay.c index c92a12d7..207d101e 100644 --- a/filters/delay/delay.c +++ b/filters/delay/delay.c @@ -37,6 +37,8 @@ #include #include #include +#include +#include #include @@ -46,6 +48,7 @@ static int delay_zero_ms = 0; /* zero delay (milliseconds) */ static int delay_trim_ms = 0; /* trim delay (milliseconds) */ static int delay_extents_ms = 0;/* extents delay (milliseconds) */ static int delay_cache_ms = 0; /* cache delay (milliseconds) */ +static int delay_fast_zero = 1; /* whether delaying zero includes fast zero */ static int parse_delay (const char *key, const char *value) @@ -182,6 +185,12 @@ delay_config (nbdkit_next_config *next, void *nxdata, return -1; return 0; } + else if (strcmp (key, "delay-fast-zero") == 0) { + delay_fast_zero = nbdkit_parse_bool (value); + if (delay_fast_zero < 0) + return -1; + return 0; + } else return next (nxdata, key, value); } @@ -194,7 +203,19 @@ delay_config (nbdkit_next_config *next, void *nxdata, "delay-trim=[ms] Trim delay in seconds/milliseconds.\n" \ "delay-extents=[ms] Extents delay in seconds/milliseconds.\n" \ "delay-cache=[ms] Cache delay in seconds/milliseconds.\n" \ - "wdelay=[ms] Write, zero and trim delay in secs/msecs." + "wdelay=[ms] Write, zero and trim delay in secs/msecs.\n" \ + "delay-fast-zero= Delay fast zero requests (default true).\n" + +/* Override the plugin's .can_fast_zero if needed */ +static int +delay_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* Advertise if we are handling fast zero requests locally */ + if (delay_zero_ms && !delay_fast_zero) + return 1; + return next_ops->can_fast_zero (nxdata); +} /* Read data. */ static int @@ -225,6 +246,10 @@ delay_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, uint32_t flags, int *err) { + if ((flags & NBDKIT_FLAG_FAST_ZERO) && delay_zero_ms && !delay_fast_zero) { + *err = ENOTSUP; + return -1; + } if (zero_delay (err) == -1) return -1; return next_ops->zero (nxdata, count, offset, flags, err); @@ -269,6 +294,7 @@ static struct nbdkit_filter filter = { .version = PACKAGE_VERSION, .config = delay_config, .config_help = delay_config_help, + .can_fast_zero = delay_can_fast_zero, .pread = delay_pread, .pwrite = delay_pwrite, .zero = delay_zero, diff --git a/filters/log/log.c b/filters/log/log.c index 7cf741e1..95667c61 100644 --- a/filters/log/log.c +++ b/filters/log/log.c @@ -260,14 +260,15 @@ log_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) int F = next_ops->can_fua (nxdata); int e = next_ops->can_extents (nxdata); int c = next_ops->can_cache (nxdata); + int Z = next_ops->can_fast_zero (nxdata); if (size < 0 || w < 0 || f < 0 || r < 0 || t < 0 || z < 0 || F < 0 || - e < 0 || c < 0) + e < 0 || c < 0 || Z < 0) return -1; output (h, "Connect", 0, "size=0x%" PRIx64 " write=%d flush=%d " - "rotational=%d trim=%d zero=%d fua=%d extents=%d cache=%d", - size, w, f, r, t, z, F, e, c); + "rotational=%d trim=%d zero=%d fua=%d extents=%d cache=%d " + "fast_zero=%d", size, w, f, r, t, z, F, e, c, Z); return 0; } @@ -360,10 +361,13 @@ log_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t id = get_id (h); int r; - assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); - output (h, "Zero", id, "offset=0x%" PRIx64 " count=0x%x trim=%d fua=%d ...", + assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM | + NBDKIT_FLAG_FAST_ZERO))); + output (h, "Zero", id, + "offset=0x%" PRIx64 " count=0x%x trim=%d fua=%d fast=%d...", offs, count, !!(flags & NBDKIT_FLAG_MAY_TRIM), - !!(flags & NBDKIT_FLAG_FUA)); + !!(flags & NBDKIT_FLAG_FUA), + !!(flags & NBDKIT_FLAG_FAST_ZERO)); r = next_ops->zero (nxdata, count, offs, flags, err); output_return (h, "...Zero", id, r, err); return r; diff --git a/filters/nozero/nozero.c b/filters/nozero/nozero.c index 964cce9f..e54f7c62 100644 --- a/filters/nozero/nozero.c +++ b/filters/nozero/nozero.c @@ -38,6 +38,7 @@ #include #include #include +#include #include @@ -49,8 +50,16 @@ static enum ZeroMode { NONE, EMULATE, NOTRIM, + PLUGIN, } zeromode; +static enum FastZeroMode { + DEFAULT, + SLOW, + IGNORE, + NOFAST, +} fastzeromode; + static int nozero_config (nbdkit_next_config *next, void *nxdata, const char *key, const char *value) @@ -60,17 +69,35 @@ nozero_config (nbdkit_next_config *next, void *nxdata, zeromode = EMULATE; else if (strcmp (value, "notrim") == 0) zeromode = NOTRIM; + else if (strcmp (value, "plugin") == 0) + zeromode = PLUGIN; else if (strcmp (value, "none") != 0) { nbdkit_error ("unknown zeromode '%s'", value); return -1; } return 0; } + + if (strcmp (key, "fastzeromode") == 0) { + if (strcmp (value, "none") == 0) + fastzeromode = NOFAST; + else if (strcmp (value, "ignore") == 0) + fastzeromode = IGNORE; + else if (strcmp (value, "slow") == 0) + fastzeromode = SLOW; + else if (strcmp (value, "default") != 0) { + nbdkit_error ("unknown fastzeromode '%s'", value); + return -1; + } + return 0; + } + return next (nxdata, key, value); } #define nozero_config_help \ - "zeromode= Either 'none' (default), 'emulate', or 'notrim'.\n" \ + "zeromode= One of 'none' (default), 'emulate', 'notrim', 'plugin'.\n" \ + "fastzeromode= One of 'default', 'none', 'slow', 'ignore'.\n" /* Check that desired mode is supported by plugin. */ static int @@ -78,12 +105,13 @@ nozero_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { int r; - if (zeromode == NOTRIM) { + if (zeromode == NOTRIM || zeromode == PLUGIN) { r = next_ops->can_zero (nxdata); if (r == -1) return -1; if (!r) { - nbdkit_error ("zeromode 'notrim' requires plugin zero support"); + nbdkit_error ("zeromode '%s' requires plugin zero support", + zeromode == NOTRIM ? "notrim" : "plugin"); return -1; } } @@ -94,9 +122,22 @@ nozero_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) static int nozero_can_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { + /* For NOTRIM and PLUGIN modes, we've already verified next_ops->can_zero */ return zeromode != NONE; } +/* Advertise desired FAST_ZERO mode. */ +static int +nozero_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + if (zeromode == NONE) + return 0; + if (zeromode != EMULATE && fastzeromode == DEFAULT) + return next_ops->can_fast_zero (nxdata); + return fastzeromode != NOFAST; +} + static int nozero_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offs, uint32_t flags, @@ -106,9 +147,21 @@ nozero_zero (struct nbdkit_next_ops *next_ops, void *nxdata, bool need_flush = false; assert (zeromode != NONE); - flags &= ~NBDKIT_FLAG_MAY_TRIM; + if (flags & NBDKIT_FLAG_FAST_ZERO) { + assert (fastzeromode != NOFAST); + if (fastzeromode == SLOW || + (fastzeromode == DEFAULT && zeromode == EMULATE)) { + *err = ENOTSUP; + return -1; + } + if (fastzeromode == IGNORE) + flags &= ~NBDKIT_FLAG_FAST_ZERO; + } if (zeromode == NOTRIM) + flags &= ~NBDKIT_FLAG_MAY_TRIM; + + if (zeromode != EMULATE) return next_ops->zero (nxdata, count, offs, flags, err); if (flags & NBDKIT_FLAG_FUA) { @@ -144,6 +197,7 @@ static struct nbdkit_filter filter = { .config_help = nozero_config_help, .prepare = nozero_prepare, .can_zero = nozero_can_zero, + .can_fast_zero = nozero_can_fast_zero, .zero = nozero_zero, }; diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c index 93d8f074..47d70b31 100644 --- a/filters/truncate/truncate.c +++ b/filters/truncate/truncate.c @@ -201,6 +201,14 @@ truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, return h->size; } +/* Override the plugin's .can_fast_zero, because zeroing a tail is fast. */ +static int +truncate_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + return 1; +} + /* Read data. */ static int truncate_pread (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -297,6 +305,12 @@ truncate_zero (struct nbdkit_next_ops *next_ops, void *nxdata, n = count; else n = h->real_size - offset; + if (flags & NBDKIT_FLAG_FAST_ZERO && + next_ops->can_fast_zero (nxdata) <= 0) { + /* TODO: Cache per connection? */ + *err = ENOTSUP; + return -1; + } return next_ops->zero (nxdata, n, offset, flags, err); } return 0; @@ -392,6 +406,7 @@ static struct nbdkit_filter filter = { .close = truncate_close, .prepare = truncate_prepare, .get_size = truncate_get_size, + .can_fast_zero = truncate_can_fast_zero, .pread = truncate_pread, .pwrite = truncate_pwrite, .trim = truncate_trim, diff --git a/tests/test-fua.sh b/tests/test-fua.sh index c0d82db7..1c869e96 100755 --- a/tests/test-fua.sh +++ b/tests/test-fua.sh @@ -106,14 +106,14 @@ test $(grep -c 'connection=1 Flush' fua1.log) -lt \ # all earlier parts of the transaction do not have fua flush1=$(grep -c 'connection=1 Flush' fua2.log || :) flush2=$(grep -c 'connection=2 Flush' fua2.log || :) -fua=$(grep -c 'connection=2.*fua=1 \.' fua2.log || :) +fua=$(grep -c 'connection=2.*fua=1 .*\.' fua2.log || :) test $(( $flush2 - $flush1 + $fua )) = 2 # Test 3: every part of split has fua, and no flushes are added flush1=$(grep -c 'connection=1 Flush' fua3.log || :) flush2=$(grep -c 'connection=2 Flush' fua3.log || :) test $flush1 = $flush2 -test $(grep -c 'connection=2.*fua=1 \.' fua3.log) = 32 +test $(grep -c 'connection=2.*fua=1 .*\.' fua3.log) = 32 # Test 4: flush is no-op, and every transaction has fua if grep 'fua=0' fua4.log; then From patchwork Fri Aug 23 14:40:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 11111901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BDC311813 for ; Fri, 23 Aug 2019 14:47:21 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8DF3221848 for ; Fri, 23 Aug 2019 14:47:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DF3221848 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:57256 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1AqV-0006Vc-Nh for patchwork-qemu-devel@patchwork.kernel.org; Fri, 23 Aug 2019 10:47:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52222) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1AkY-00086n-Ow for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1i1AkV-0008Lh-45 for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:41:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51692) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1i1AkN-0008Bz-VG; Fri, 23 Aug 2019 10:41:00 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6051A30A7B83; Fri, 23 Aug 2019 14:40:59 +0000 (UTC) Received: from blue.redhat.com (ovpn-116-234.phx2.redhat.com [10.3.116.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id E0F6960126; Fri, 23 Aug 2019 14:40:58 +0000 (UTC) From: Eric Blake To: libguestfs@redhat.com Date: Fri, 23 Aug 2019 09:40:54 -0500 Message-Id: <20190823144054.27420-4-eblake@redhat.com> In-Reply-To: <20190823144054.27420-1-eblake@redhat.com> References: <25ead363-4f37-5450-b985-1876374e314d@redhat.com> <20190823144054.27420-1-eblake@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Fri, 23 Aug 2019 14:40:59 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [nbdkit PATCH 3/3] plugins: Add .can_fast_zero hook X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, nbd@other.debian.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Allow plugins to affect the handling of the new NBD_CMD_FLAG_FAST_ZERO flag, then update affected plugins. In particular, in-memory plugins are always fast; the full plugin is better served by omitting .zero and relying on .pwrite fallback; nbd takes advantage of libnbd extensions proposed in parallel to pass through support; and v2 language bindings expose the choice to their scripts. The testsuite is updated thanks to the sh plugin to cover this. In turn, the sh plugin has to be a bit smarter about handling missing can_fast_zero to get decent fallback support from nbdkit. Plugins untouched by this patch either do not support .zero with flags (including v1 plugins; these are all okay with the default behavior of advertising but always failing fast zeroes), or are too difficult to analyze in this patch (so not advertising is easier than having to decide - in particular, the file plugin is tricky, since BLKZEROOUT is not reliably fast). The nozero filter can be used to adjust fast zero settings for a plugin that has not yet updated. Signed-off-by: Eric Blake --- docs/nbdkit-plugin.pod | 74 +++++++++++++++---- plugins/sh/nbdkit-sh-plugin.pod | 13 +++- server/plugins.c | 25 +++++-- include/nbdkit-plugin.h | 2 + plugins/data/data.c | 14 +++- plugins/full/full.c | 12 ++-- plugins/memory/memory.c | 14 +++- plugins/nbd/nbd.c | 28 +++++++- plugins/null/null.c | 8 +++ plugins/ocaml/ocaml.c | 25 +++++++ plugins/sh/sh.c | 39 +++++++--- plugins/ocaml/NBDKit.ml | 10 ++- plugins/ocaml/NBDKit.mli | 2 + plugins/rust/src/lib.rs | 3 + tests/test-eflags.sh | 122 ++++++++++++++++++++++++++++---- 15 files changed, 332 insertions(+), 59 deletions(-) diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index bc3d9749..f3793e7a 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -609,19 +609,47 @@ C<.trim> callback has been defined. This is called during the option negotiation phase to find out if the plugin wants the C<.zero> callback to be utilized. Support for -writing zeroes is still advertised to the client (unless the nbdkit -filter nozero is also used), so returning false merely serves as a way -to avoid complicating the C<.zero> callback to have to fail with -C or C on the connections where it will never be -more efficient than using C<.pwrite> up front. +writing zeroes is still advertised to the client (unless the +L is also used), so returning false merely +serves as a way to avoid complicating the C<.zero> callback to have to +fail with C or C on the connections where it will +never be more efficient than using C<.pwrite> up front. If there is an error, C<.can_zero> should call C with an error message and return C<-1>. -This callback is not required. If omitted, then nbdkit always tries -C<.zero> first if it is present, and gracefully falls back to -C<.pwrite> if C<.zero> was absent or failed with C or -C. +This callback is not required. If omitted, then for a normal zero +request, nbdkit always tries C<.zero> first if it is present, and +gracefully falls back to C<.pwrite> if C<.zero> was absent or failed +with C or C. + +=head2 C<.can_fast_zero> + + int can_fast_zero (void *handle); + +This is called during the option negotiation phase to find out if the +plugin wants to advertise support for fast zero requests. If this +support is not advertised, a client cannot attempt fast zero requests, +and has no way to tell if writing zeroes offers any speedups compared +to using C<.pwrite> (other than compressed network traffic). If +support is advertised, then C<.zero> will have +C set when the client has requested a fast +zero, in which case the plugin must fail with C or +C up front if the request would not offer any benefits +over C<.pwrite>. Advertising support for fast zero requests does not +require that writing zeroes be fast, only that the result (whether +success or failure) is fast, so this should be advertised when +feasible. + +If there is an error, C<.can_fast_zero> should call C +with an error message and return C<-1>. + +This callback is not required. If omitted, then nbdkit returns true +if C<.zero> is absent or C<.can_zero> returns false (in those cases, +nbdkit fails all fast zero requests, as its fallback to C<.pwrite> is +not inherently faster), otherwise false (since it cannot be determined +in advance if the plugin's C<.zero> will properly honor the semantics +of C). =head2 C<.can_extents> @@ -804,15 +832,25 @@ bytes of zeroes at C in the backing store. This function will not be called if C<.can_zero> returned false. On input, the parameter C may include C -unconditionally, and C based on the result of -C<.can_fua>. +unconditionally, C based on the result of +C<.can_fua>, and C based on the result of +C<.can_fast_zero>. If C is requested, the operation can punch a hole instead of writing actual zero bytes, but only if subsequent -reads from the hole read as zeroes. If this callback is omitted, or -if it fails with C or C (whether by -C or C), then C<.pwrite> will be used -instead. +reads from the hole read as zeroes. + +If C is requested, the plugin must decide up +front if the implementation is likely to be faster than a +corresponding C<.pwrite>; if not, then it must immediately fail with +C or C (whether by C or +C) and preferably without modifying the exported image. It is +acceptable to always fail a fast zero request (as a fast failure is +better than attempting the write only to find out after the fact that +it was not fast after all). Note that on Linux, support for +C is insufficient for determining whether a zero +request to a block device will be fast (because the kernel will +perform a slow fallback when needed). The callback must write the whole C bytes if it can. The NBD protocol doesn't allow partial writes (instead, these would be @@ -823,6 +861,11 @@ If there is an error, C<.zero> should call C with an error message, and C to record an appropriate error (unless C is sufficient), then return C<-1>. +If this callback is omitted, or if it fails with C or +C (whether by C or C), then +C<.pwrite> will be used as an automatic fallback except when the +client requested a fast zero. + =head2 C<.extents> int extents (void *handle, uint32_t count, uint64_t offset, @@ -1221,6 +1264,7 @@ and then users will be able to run it like this: =head1 SEE ALSO L, +L, L. Standard plugins provided by nbdkit: diff --git a/plugins/sh/nbdkit-sh-plugin.pod b/plugins/sh/nbdkit-sh-plugin.pod index 9e9a133e..adb8a0db 100644 --- a/plugins/sh/nbdkit-sh-plugin.pod +++ b/plugins/sh/nbdkit-sh-plugin.pod @@ -289,7 +289,10 @@ The script should exit with code C<0> for true or code C<3> for false. =item C +=item C + /path/to/script is_rotational + /path/to/script can_fast_zero The script should exit with code C<0> for true or code C<3> for false. @@ -361,12 +364,18 @@ also provide a C method which exits with code C<0> (true). /path/to/script zero The C parameter can be an empty string or a comma-separated -list of the flags: C<"fua"> and C<"may_trim"> (eg. C<"">, C<"fua">, -C<"fua,may_trim"> are all possible values). +list of the flags: C<"fua">, C<"may_trim">, and C<"fast"> (eg. C<"">, +C<"fua">, C<"fua,may_trim,fast"> are some of the 8 possible values). Unlike in other languages, if you provide a C method you B also provide a C method which exits with code C<0> (true). +To trigger a fallback to on a normal zero request, or to +respond quickly to the C<"fast"> flag that a specific zero request is +no faster than a corresponding write, the script must output +C or C to stderr (possibly followed by a +description of the problem) before exiting with code C<1> (failure). + =item C /path/to/script extents diff --git a/server/plugins.c b/server/plugins.c index c6dcf408..84329cf4 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -404,11 +404,25 @@ plugin_can_zero (struct backend *b, struct connection *conn) static int plugin_can_fast_zero (struct backend *b, struct connection *conn) { + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + int r; + assert (connection_get_handle (conn, 0)); debug ("can_fast_zero"); - return 0; /* Upcoming patch will actually add support. */ + if (p->plugin.can_fast_zero) + return p->plugin.can_fast_zero (connection_get_handle (conn, 0)); + /* Advertise support for fast zeroes if no .zero or .can_zero is + * false: in those cases, we fail fast instead of using .pwrite. + * This also works when v1 plugin has only ._zero_old. + */ + if (p->plugin.zero == NULL) + return 1; + r = plugin_can_zero (b, conn); + if (r == -1) + return -1; + return !r; } static int @@ -656,15 +670,18 @@ plugin_zero (struct backend *b, struct connection *conn, } if (can_zero) { - /* if (!can_fast_zero) */ - flags &= ~NBDKIT_FLAG_FAST_ZERO; errno = 0; if (p->plugin.zero) r = p->plugin.zero (connection_get_handle (conn, 0), count, offset, flags); - else if (p->plugin._zero_old) + else if (p->plugin._zero_old) { + if (fast_zero) { + *err = EOPNOTSUPP; + return -1; + } r = p->plugin._zero_old (connection_get_handle (conn, 0), count, offset, may_trim); + } else emulate = true; if (r == -1) diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index 632df867..45ae7053 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -132,6 +132,8 @@ struct nbdkit_plugin { int (*cache) (void *handle, uint32_t count, uint64_t offset, uint32_t flags); int (*thread_model) (void); + + int (*can_fast_zero) (void *handle); }; extern void nbdkit_set_error (int err); diff --git a/plugins/data/data.c b/plugins/data/data.c index 14fb8997..9004a487 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -349,6 +349,13 @@ data_can_cache (void *handle) return NBDKIT_CACHE_NATIVE; } +/* Fast zero. */ +static int +data_can_fast_zero (void *handle) +{ + return 1; +} + /* Read data. */ static int data_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -375,8 +382,10 @@ data_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, static int data_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { - /* Flushing, and thus FUA flag, is a no-op */ - assert ((flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM)) == 0); + /* Flushing, and thus FUA flag, is a no-op. Assume that + * sparse_array_zero generally beats writes, so FAST_ZERO is a no-op. */ + assert ((flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM | + NBDKIT_FLAG_FAST_ZERO)) == 0); ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); sparse_array_zero (sa, count, offset); return 0; @@ -423,6 +432,7 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = data_can_multi_conn, .can_fua = data_can_fua, .can_cache = data_can_cache, + .can_fast_zero = data_can_fast_zero, .pread = data_pread, .pwrite = data_pwrite, .zero = data_zero, diff --git a/plugins/full/full.c b/plugins/full/full.c index 9cfbcfcd..0b69a8c9 100644 --- a/plugins/full/full.c +++ b/plugins/full/full.c @@ -129,13 +129,10 @@ full_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, return -1; } -/* Write zeroes. */ -static int -full_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) -{ - errno = ENOSPC; - return -1; -} +/* Omitting full_zero is intentional: that way, nbdkit defaults to + * permitting fast zeroes which respond with ENOTSUP, while normal + * zeroes fall back to pwrite and respond with ENOSPC. + */ /* Trim. */ static int @@ -172,7 +169,6 @@ static struct nbdkit_plugin plugin = { .can_cache = full_can_cache, .pread = full_pread, .pwrite = full_pwrite, - .zero = full_zero, .trim = full_trim, .extents = full_extents, /* In this plugin, errno is preserved properly along error return diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index 09162ea2..e831a7b5 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -147,6 +147,13 @@ memory_can_cache (void *handle) return NBDKIT_CACHE_NATIVE; } +/* Fast zero. */ +static int +memory_can_fast_zero (void *handle) +{ + return 1; +} + /* Read data. */ static int memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -173,8 +180,10 @@ memory_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, static int memory_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { - /* Flushing, and thus FUA flag, is a no-op */ - assert ((flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM)) == 0); + /* Flushing, and thus FUA flag, is a no-op. Assume that + * sparse_array_zero generally beats writes, so FAST_ZERO is a no-op. */ + assert ((flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM | + NBDKIT_FLAG_FAST_ZERO)) == 0); ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); sparse_array_zero (sa, count, offset); return 0; @@ -221,6 +230,7 @@ static struct nbdkit_plugin plugin = { .can_fua = memory_can_fua, .can_multi_conn = memory_can_multi_conn, .can_cache = memory_can_cache, + .can_fast_zero = memory_can_fast_zero, .pread = memory_pread, .pwrite = memory_pwrite, .zero = memory_zero, diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 09c8891e..cddcfde2 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -633,6 +633,24 @@ nbdplug_can_zero (void *handle) return i; } +static int +nbdplug_can_fast_zero (void *handle) +{ +#if LIBNBD_HAVE_NBD_CAN_FAST_ZERO + struct handle *h = handle; + int i = nbd_can_fast_zero (h->nbd); + + if (i == -1) { + nbdkit_error ("failure to check fast zero flag: %s", nbd_get_error ()); + return -1; + } + return i; +#else + /* libnbd 0.9.8 lacks fast zero support */ + return 0; +#endif +} + static int nbdplug_can_fua (void *handle) { @@ -724,12 +742,19 @@ nbdplug_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) struct transaction s; uint32_t f = 0; - assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); + assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM | + NBDKIT_FLAG_FAST_ZERO))); if (!(flags & NBDKIT_FLAG_MAY_TRIM)) f |= LIBNBD_CMD_FLAG_NO_HOLE; if (flags & NBDKIT_FLAG_FUA) f |= LIBNBD_CMD_FLAG_FUA; +#if LIBNBD_HAVE_NBD_CAN_FAST_ZERO + if (flags & NBDKIT_FLAG_FAST_ZERO) + f |= LIBNBD_CMD_FLAG_FAST_ZERO; +#else + assert (!(flags & NBDKIT_FLAG_FAST_ZERO)); +#endif nbdplug_prepare (&s); nbdplug_register (h, &s, nbd_aio_zero (h->nbd, count, offset, s.cb, f)); return nbdplug_reply (h, &s); @@ -831,6 +856,7 @@ static struct nbdkit_plugin plugin = { .is_rotational = nbdplug_is_rotational, .can_trim = nbdplug_can_trim, .can_zero = nbdplug_can_zero, + .can_fast_zero = nbdplug_can_fast_zero, .can_fua = nbdplug_can_fua, .can_multi_conn = nbdplug_can_multi_conn, .can_extents = nbdplug_can_extents, diff --git a/plugins/null/null.c b/plugins/null/null.c index 647624ba..559cb815 100644 --- a/plugins/null/null.c +++ b/plugins/null/null.c @@ -100,6 +100,13 @@ null_can_cache (void *handle) return NBDKIT_CACHE_NATIVE; } +/* Fast zero. */ +static int +null_can_fast_zero (void *handle) +{ + return 1; +} + /* Read data. */ static int null_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -167,6 +174,7 @@ static struct nbdkit_plugin plugin = { .get_size = null_get_size, .can_multi_conn = null_can_multi_conn, .can_cache = null_can_cache, + .can_fast_zero = null_can_fast_zero, .pread = null_pread, .pwrite = null_pwrite, .zero = null_zero, diff --git a/plugins/ocaml/ocaml.c b/plugins/ocaml/ocaml.c index 144a449e..a655f9ca 100644 --- a/plugins/ocaml/ocaml.c +++ b/plugins/ocaml/ocaml.c @@ -134,6 +134,8 @@ static value cache_fn; static value thread_model_fn; +static value can_fast_zero_fn; + /*----------------------------------------------------------------------*/ /* Wrapper functions that translate calls from C (ie. nbdkit) to OCaml. */ @@ -705,6 +707,25 @@ thread_model_wrapper (void) CAMLreturnT (int, Int_val (rv)); } +static int +can_fast_zero_wrapper (void *h) +{ + CAMLparam0 (); + CAMLlocal1 (rv); + + caml_leave_blocking_section (); + + rv = caml_callback_exn (can_fast_zero_fn, *(value *) h); + if (Is_exception_result (rv)) { + nbdkit_error ("%s", caml_format_exception (Extract_exception (rv))); + caml_enter_blocking_section (); + CAMLreturnT (int, -1); + } + + caml_enter_blocking_section (); + CAMLreturnT (int, Bool_val (rv)); +} + /*----------------------------------------------------------------------*/ /* set_* functions called from OCaml code at load time to initialize * fields in the plugin struct. @@ -792,6 +813,8 @@ SET(cache) SET(thread_model) +SET(can_fast_zero) + #undef SET static void @@ -836,6 +859,8 @@ remove_roots (void) REMOVE (thread_model); + REMOVE (can_fast_zero); + #undef REMOVE } diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c index c73b08b7..d5db8b59 100644 --- a/plugins/sh/sh.c +++ b/plugins/sh/sh.c @@ -478,6 +478,9 @@ flags_string (uint32_t flags, char *buf, size_t len) if (flags & NBDKIT_FLAG_REQ_ONE) flag_append ("req_one", &comma, &buf, &len); + + if (flags & NBDKIT_FLAG_FAST_ZERO) + flag_append("fast", &comma, &buf, &len); } static void @@ -536,7 +539,7 @@ sh_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, /* Common code for handling all boolean methods like can_write etc. */ static int -boolean_method (void *handle, const char *method_name) +boolean_method (void *handle, const char *method_name, int def) { char *h = handle; const char *args[] = { script, method_name, h, NULL }; @@ -546,8 +549,8 @@ boolean_method (void *handle, const char *method_name) return 1; case RET_FALSE: /* false */ return 0; - case MISSING: /* missing => assume false */ - return 0; + case MISSING: /* missing => caller chooses default */ + return def; case ERROR: /* error cases */ return -1; default: abort (); @@ -557,37 +560,37 @@ boolean_method (void *handle, const char *method_name) static int sh_can_write (void *handle) { - return boolean_method (handle, "can_write"); + return boolean_method (handle, "can_write", 0); } static int sh_can_flush (void *handle) { - return boolean_method (handle, "can_flush"); + return boolean_method (handle, "can_flush", 0); } static int sh_is_rotational (void *handle) { - return boolean_method (handle, "is_rotational"); + return boolean_method (handle, "is_rotational", 0); } static int sh_can_trim (void *handle) { - return boolean_method (handle, "can_trim"); + return boolean_method (handle, "can_trim", 0); } static int sh_can_zero (void *handle) { - return boolean_method (handle, "can_zero"); + return boolean_method (handle, "can_zero", 0); } static int sh_can_extents (void *handle) { - return boolean_method (handle, "can_extents"); + return boolean_method (handle, "can_extents", 0); } /* Not a boolean method, the method prints "none", "emulate" or "native". */ @@ -646,7 +649,7 @@ sh_can_fua (void *handle) static int sh_can_multi_conn (void *handle) { - return boolean_method (handle, "can_multi_conn"); + return boolean_method (handle, "can_multi_conn", 0); } /* Not a boolean method, the method prints "none", "emulate" or "native". */ @@ -696,6 +699,21 @@ sh_can_cache (void *handle) } } +static int +sh_can_fast_zero (void *handle) +{ + int r = boolean_method (handle, "can_fast_zero", 2); + if (r < 2) + return r; + /* We need to duplicate the logic of plugins.c: if can_fast_zero is + * missing, we advertise fast fail support when can_zero is false. + */ + r = sh_can_zero (handle); + if (r == -1) + return -1; + return !r; +} + static int sh_flush (void *handle, uint32_t flags) { @@ -962,6 +980,7 @@ static struct nbdkit_plugin plugin = { .can_fua = sh_can_fua, .can_multi_conn = sh_can_multi_conn, .can_cache = sh_can_cache, + .can_fast_zero = sh_can_fast_zero, .pread = sh_pread, .pwrite = sh_pwrite, diff --git a/plugins/ocaml/NBDKit.ml b/plugins/ocaml/NBDKit.ml index e54a7705..7002ac03 100644 --- a/plugins/ocaml/NBDKit.ml +++ b/plugins/ocaml/NBDKit.ml @@ -96,6 +96,8 @@ type 'a plugin = { cache : ('a -> int32 -> int64 -> flags -> unit) option; thread_model : (unit -> thread_model) option; + + can_fast_zero : ('a -> bool) option; } let default_callbacks = { @@ -141,6 +143,8 @@ let default_callbacks = { cache = None; thread_model = None; + + can_fast_zero = None; } external set_name : string -> unit = "ocaml_nbdkit_set_name" "noalloc" @@ -186,6 +190,8 @@ external set_cache : ('a -> int32 -> int64 -> flags -> unit) -> unit = "ocaml_nb external set_thread_model : (unit -> thread_model) -> unit = "ocaml_nbdkit_set_thread_model" +external set_can_fast_zero : ('a -> bool) -> unit = "ocaml_nbdkit_set_can_fast_zero" + let may f = function None -> () | Some a -> f a let register_plugin plugin = @@ -249,7 +255,9 @@ let register_plugin plugin = may set_can_cache plugin.can_cache; may set_cache plugin.cache; - may set_thread_model plugin.thread_model + may set_thread_model plugin.thread_model; + + may set_can_fast_zero plugin.can_fast_zero external _set_error : int -> unit = "ocaml_nbdkit_set_error" "noalloc" diff --git a/plugins/ocaml/NBDKit.mli b/plugins/ocaml/NBDKit.mli index 778250ef..06648b7f 100644 --- a/plugins/ocaml/NBDKit.mli +++ b/plugins/ocaml/NBDKit.mli @@ -101,6 +101,8 @@ type 'a plugin = { cache : ('a -> int32 -> int64 -> flags -> unit) option; thread_model : (unit -> thread_model) option; + + can_fast_zero : ('a -> bool) option; } (** The plugin fields and callbacks. ['a] is the handle type. *) diff --git a/plugins/rust/src/lib.rs b/plugins/rust/src/lib.rs index 53619dd9..313b4ca6 100644 --- a/plugins/rust/src/lib.rs +++ b/plugins/rust/src/lib.rs @@ -105,6 +105,8 @@ pub struct Plugin { flags: u32) -> c_int>, pub thread_model: Option ThreadModel>, + + pub can_fast_zero: Option c_int>, } #[repr(C)] @@ -163,6 +165,7 @@ impl Plugin { can_cache: None, cache: None, thread_model: None, + can_fast_zero: None, } } } diff --git a/tests/test-eflags.sh b/tests/test-eflags.sh index f5cd43ed..9b3a6a3a 100755 --- a/tests/test-eflags.sh +++ b/tests/test-eflags.sh @@ -68,6 +68,7 @@ SEND_DF=$(( 1 << 7 )) CAN_MULTI_CONN=$(( 1 << 8 )) SEND_RESIZE=$(( 1 << 9 )) SEND_CACHE=$(( 1 << 10 )) +SEND_FAST_ZERO=$(( 1 << 11 )) do_nbdkit () { @@ -133,8 +134,8 @@ EOF #---------------------------------------------------------------------- # can_write=true # -# NBD_FLAG_SEND_WRITE_ZEROES is set on writable connections -# even when can_zero returns false, because nbdkit reckons it +# NBD_FLAG_SEND_WRITE_ZEROES and NBD_FLAG_SEND_FAST_ZERO are set on writable +# connections even when can_zero returns false, because nbdkit reckons it # can emulate zeroing using pwrite. do_nbdkit <<'EOF' @@ -145,8 +146,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # --filter=nozero @@ -255,8 +256,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # can_write=true @@ -271,8 +272,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # -r @@ -304,8 +305,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # -r @@ -341,8 +342,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_FLUSH|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|SEND_FLUSH|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_FLUSH|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_FLUSH|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # can_write=true @@ -361,8 +362,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_FLUSH|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "$LINENO: expected HAS_FLAGS|SEND_FLUSH|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_FLUSH|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_FLUSH|SEND_WRITE_ZEROES|SEND_DF|SEND_FAST_ZERO" #---------------------------------------------------------------------- # -r @@ -448,3 +449,96 @@ EOF [ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || fail "$LINENO: expected HAS_FLAGS|READ_ONLY|SEND_DF" + +#---------------------------------------------------------------------- +# -r +# can_fast_zero=true +# +# Fast zero support isn't advertised without regular zero support + +do_nbdkit -r <<'EOF' +case "$1" in + get_size) echo 1M ;; + can_fast_zero) exit 0 ;; + *) exit 2 ;; +esac +EOF + +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || + fail "$LINENO: expected HAS_FLAGS|READ_ONLY|SEND_DF" + +#---------------------------------------------------------------------- +# --filter=nozero +# can_write=true +# can_fast_zero=true +# +# Fast zero support isn't advertised without regular zero support + +do_nbdkit --filter=nozero <<'EOF' +case "$1" in + get_size) echo 1M ;; + can_write) exit 0 ;; + can_fast_zero) exit 0 ;; + *) exit 2 ;; +esac +EOF + +[ $eflags -eq $(( HAS_FLAGS|SEND_DF )) ] || + fail "$LINENO: expected HAS_FLAGS|SEND_DF" + +#---------------------------------------------------------------------- +# can_write=true +# can_zero=true +# +# Fast zero support is omitted for a plugin that has .zero but did not opt in + +do_nbdkit -r <<'EOF' +case "$1" in + get_size) echo 1M ;; + can_write) exit 0 ;; + can_zero) exit 0 ;; + *) exit 2 ;; +esac +EOF + +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || + fail "$LINENO: expected HAS_FLAGS|READ_ONLY|SEND_DF" + +#---------------------------------------------------------------------- +# can_write=true +# can_zero=true +# can_fast_zero=false +# +# Fast zero support is omitted if the plugin says so + +do_nbdkit -r <<'EOF' +case "$1" in + get_size) echo 1M ;; + can_write) exit 0 ;; + can_zero) exit 0 ;; + can_fast_zero) exit 3 ;; + *) exit 2 ;; +esac +EOF + +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || + fail "$LINENO: expected HAS_FLAGS|READ_ONLY|SEND_DF" + +#---------------------------------------------------------------------- +# can_write=true +# can_zero=false +# can_fast_zero=false +# +# Fast zero support is omitted if the plugin says so + +do_nbdkit -r <<'EOF' +case "$1" in + get_size) echo 1M ;; + can_write) exit 0 ;; + can_fast_zero) exit 3 ;; + *) exit 2 ;; +esac +EOF + +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || + fail "$LINENO: expected HAS_FLAGS|READ_ONLY|SEND_DF" From patchwork Fri Aug 23 14:37:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 11111903 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 180C513B1 for ; Fri, 23 Aug 2019 14:47:56 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ECFD52166E for ; Fri, 23 Aug 2019 14:47:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ECFD52166E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:57270 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1Ar4-00075q-QU for patchwork-qemu-devel@patchwork.kernel.org; Fri, 23 Aug 2019 10:47:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51454) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i1AhG-00046Z-1X for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:37:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1i1AhE-0004jc-RL for qemu-devel@nongnu.org; Fri, 23 Aug 2019 10:37:45 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54338) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1i1AhB-0004eH-Un; Fri, 23 Aug 2019 10:37:42 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CFDC718C8910; Fri, 23 Aug 2019 14:37:36 +0000 (UTC) Received: from blue.redhat.com (ovpn-116-234.phx2.redhat.com [10.3.116.234]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6FB126CE58; Fri, 23 Aug 2019 14:37:36 +0000 (UTC) From: Eric Blake To: qemu-devel@nongnu.org Date: Fri, 23 Aug 2019 09:37:25 -0500 Message-Id: <20190823143726.27062-5-eblake@redhat.com> In-Reply-To: <20190823143726.27062-1-eblake@redhat.com> References: <25ead363-4f37-5450-b985-1876374e314d@redhat.com> <20190823143726.27062-1-eblake@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (mx1.redhat.com [10.5.110.70]); Fri, 23 Aug 2019 14:37:36 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH 4/5] nbd: Implement server use of NBD FAST_ZERO X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "open list:Network Block Dev..." , libguestfs@redhat.com, nbd@other.debian.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" The server side is fairly straightforward: we can always advertise support for detection of fast zero, and implement it by mapping the request to the block layer BDRV_REQ_NO_FALLBACK. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy --- Again, this may be worth merging with the previous patch. --- nbd/server.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index 981bc3cb1151..3c0799eda87f 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1514,7 +1514,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset, exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN; } } else { - exp->nbdflags |= NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES; + exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES | + NBD_FLAG_SEND_FAST_ZERO); } assert(size <= INT64_MAX - dev_offset); exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE); @@ -2167,7 +2168,7 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, if (request->type == NBD_CMD_READ && client->structured_reply) { valid_flags |= NBD_CMD_FLAG_DF; } else if (request->type == NBD_CMD_WRITE_ZEROES) { - valid_flags |= NBD_CMD_FLAG_NO_HOLE; + valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO; } else if (request->type == NBD_CMD_BLOCK_STATUS) { valid_flags |= NBD_CMD_FLAG_REQ_ONE; } @@ -2306,6 +2307,9 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, if (!(request->flags & NBD_CMD_FLAG_NO_HOLE)) { flags |= BDRV_REQ_MAY_UNMAP; } + if (request->flags & NBD_CMD_FLAG_FAST_ZERO) { + flags |= BDRV_REQ_NO_FALLBACK; + } ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset, request->len, flags); return nbd_send_generic_reply(client, request->handle, ret,