From patchwork Fri Apr 22 23:40:52 2016
From: Eric Blake
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Paolo Bonzini, alex@alex.org.uk, qemu-block@nongnu.org, Max Reitz
Date: Fri, 22 Apr 2016 17:40:52 -0600
Message-Id: <1461368452-10389-45-git-send-email-eblake@redhat.com>
In-Reply-To: <1461368452-10389-1-git-send-email-eblake@redhat.com>
References: <1461368452-10389-1-git-send-email-eblake@redhat.com>
Subject: [Qemu-devel] [PATCH v3 44/44] nbd: Implement NBD_OPT_BLOCK_SIZE on client

The upstream NBD protocol has defined a new extension that allows the
server to advertise block sizes to the client, as well as a way for the
client to inform the server that it intends to obey those block sizes.

Pass any received sizes on to the block layer: the minimum block size
can raise the request alignment, the preferred block size can raise
the optimal transfer length, and the maximum block size caps transfer,
discard, and write-zeroes requests.

Use the minimum block size as the sector size we pass to the kernel.
This has the nice side effect of cooperating with (non-qemu) servers
that don't do read-modify-write when exposing a block device with 4k
sectors. It can also allow us to access a file larger than 2T on a
32-bit kernel: the sector count handed to NBD_SET_SIZE_BLOCKS is an
unsigned long, so 2^32 sectors of 512 bytes cap out at 2 TiB, while
2^32 sectors of 4 KiB reach 16 TiB.
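For illustration, here is a minimal standalone sketch of the two calculations this patch relies on: decoding the 12-byte NBD_INFO_BLOCK_SIZE payload (three big-endian 32-bit values in the order minimum, preferred, maximum) with the same validity checks the patch applies, and choosing a kernel sector size of MAX(512, minimum) so the sector count still fits in 32 bits. The names `struct block_sizes`, `parse_block_size_info()`, `sectors_for_32bit_kernel()` and `SKETCH_SECTOR_SIZE` are hypothetical and do not exist in QEMU; the sketch also assumes a 32-bit `unsigned long`, modeled explicitly with `uint32_t` rather than the host's type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u /* hypothetical stand-in for BDRV_SECTOR_SIZE */

/* Hypothetical mirror of the three fields this patch adds to NbdExportInfo. */
struct block_sizes {
    uint32_t min_block;
    uint32_t opt_block;
    uint32_t max_block;
};

static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Decode a big-endian 32-bit value regardless of host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse an NBD_INFO_BLOCK_SIZE payload (minimum, preferred, maximum).
 * Returns 0 on success, -1 on a bad length or a value the patch rejects. */
static int parse_block_size_info(const unsigned char *buf, size_t len,
                                 struct block_sizes *out)
{
    if (len != 3 * sizeof(uint32_t)) {
        return -1;
    }
    out->min_block = load_be32(buf);
    out->opt_block = load_be32(buf + 4);
    out->max_block = load_be32(buf + 8);
    if (!is_pow2(out->min_block)) {
        return -1; /* minimum must be a power of two */
    }
    if (!is_pow2(out->opt_block) || out->opt_block < out->min_block) {
        return -1; /* preferred must be a power of two >= minimum */
    }
    return 0; /* like the patch, max_block is not range-checked */
}

/* Pick the kernel sector size as MAX(512, min_block) and check that the
 * sector count fits a 32-bit unsigned long. min_block is 0 when the server
 * advertised nothing, or a value already accepted above. Returns 0 on
 * success, -1 where nbd_init() would return -E2BIG. Trailing bytes past
 * the last full sector are dropped, as in the patch. */
static int sectors_for_32bit_kernel(uint64_t size, uint32_t min_block,
                                    uint32_t *sector_size, uint32_t *sectors)
{
    uint32_t ss = min_block > SKETCH_SECTOR_SIZE ? min_block
                                                 : SKETCH_SECTOR_SIZE;
    uint64_t count = size / ss;

    assert(is_pow2(ss));
    if (count > UINT32_MAX) {
        return -1;
    }
    *sector_size = ss;
    *sectors = (uint32_t)count;
    return 0;
}
```

For example, a 4 TiB export (2^42 bytes) fails the 32-bit check with the default 512-byte sectors (2^33 sectors), but fits as 2^30 sectors once the server advertises a 4096-byte minimum block size.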
Signed-off-by: Eric Blake
---
 include/block/nbd.h |  3 +++
 block/nbd-client.c  |  3 +++
 block/nbd.c         | 17 +++++++++---
 nbd/client.c        | 74 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a5c68df..27a6854 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -133,6 +133,9 @@ enum {
 struct NbdExportInfo {
     uint64_t size;
     uint16_t flags;
+    uint32_t min_block;
+    uint32_t opt_block;
+    uint32_t max_block;
 };
 typedef struct NbdExportInfo NbdExportInfo;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 2b6ac27..602a8ab 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -443,6 +443,9 @@ int nbd_client_init(BlockDriverState *bs,
         logout("Failed to negotiate with the NBD server\n");
         return ret;
     }
+    if (client->info.min_block > bs->request_alignment) {
+        bs->request_alignment = client->info.min_block;
+    }
 
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_mutex_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index 5172039..bb7df55 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -407,9 +407,20 @@ static int nbd_co_flush(BlockDriverState *bs)
 
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
-    bs->bl.max_write_zeroes = UINT32_MAX >> BDRV_SECTOR_BITS;
-    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
+    NbdClientSession *s = nbd_get_client_session(bs);
+    int max = UINT32_MAX >> BDRV_SECTOR_BITS;
+
+    if (s->info.max_block) {
+        max = s->info.max_block >> BDRV_SECTOR_BITS;
+    }
+    bs->bl.max_discard = max;
+    bs->bl.max_write_zeroes = max;
+    bs->bl.max_transfer_length = max;
+
+    if (s->info.opt_block &&
+        s->info.opt_block >> BDRV_SECTOR_BITS > bs->bl.opt_transfer_length) {
+        bs->bl.opt_transfer_length = s->info.opt_block >> BDRV_SECTOR_BITS;
+    }
 }
 
 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
diff --git a/nbd/client.c b/nbd/client.c
index dac4f29..24f6b0b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -232,6 +232,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
                    reply->option);
         break;
 
+    case NBD_REP_ERR_BLOCK_SIZE_REQD:
+        error_setg(errp, "Server wants OPT_BLOCK_SIZE before option %" PRIx32,
+                   reply->option);
+        break;
+
     default:
         error_setg(errp, "Unknown error code when asking for option %" PRIx32,
                    reply->option);
@@ -333,6 +338,21 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
      * flags still 0 is a witness of a broken server. */
     info->flags = 0;
 
+    /* Some servers use NBD_OPT_GO to advertise non-default block
+     * sizes, and require that we first use NBD_OPT_BLOCK_SIZE to
+     * agree to that. */
+    TRACE("Attempting NBD_OPT_BLOCK_SIZE");
+    if (nbd_send_option_request(ioc, NBD_OPT_BLOCK_SIZE, 0, NULL, errp) < 0) {
+        return -1;
+    }
+    if (nbd_receive_option_reply(ioc, NBD_OPT_BLOCK_SIZE, &reply, errp) < 0) {
+        return -1;
+    }
+    error = nbd_handle_reply_err(ioc, &reply, errp);
+    if (error < 0) {
+        return error;
+    }
+
     TRACE("Attempting NBD_OPT_GO for export '%s'", wantname);
     if (nbd_send_option_request(ioc, NBD_OPT_GO, -1, wantname, errp) < 0) {
         return -1;
@@ -402,6 +422,45 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
                   info->size, info->flags);
             break;
 
+        case NBD_INFO_BLOCK_SIZE:
+            if (len != sizeof(info->min_block) * 3) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                return -1;
+            }
+            if (read_sync(ioc, &info->min_block, sizeof(info->min_block)) !=
+                sizeof(info->min_block)) {
+                error_setg(errp, "failed to read info minimum block size");
+                return -1;
+            }
+            be32_to_cpus(&info->min_block);
+            if (!is_power_of_2(info->min_block)) {
+                error_setg(errp, "server minimum block size %" PRIu32
+                           " is not a power of two", info->min_block);
+                return -1;
+            }
+            if (read_sync(ioc, &info->opt_block, sizeof(info->opt_block)) !=
+                sizeof(info->opt_block)) {
+                error_setg(errp, "failed to read info preferred block size");
+                return -1;
+            }
+            be32_to_cpus(&info->opt_block);
+            if (!is_power_of_2(info->opt_block) ||
+                info->opt_block < info->min_block) {
+                error_setg(errp, "server preferred block size %" PRIu32
+                           " is not valid", info->opt_block);
+                return -1;
+            }
+            if (read_sync(ioc, &info->max_block, sizeof(info->max_block)) !=
+                sizeof(info->max_block)) {
+                error_setg(errp, "failed to read info maximum block size");
+                return -1;
+            }
+            be32_to_cpus(&info->max_block);
+            TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32,
+                  info->min_block, info->opt_block, info->max_block);
+            break;
+
         default:
             TRACE("ignoring unknown export info %" PRIu16, type);
             if (drop_sync(ioc, len) != len) {
@@ -710,8 +769,9 @@ fail:
 #ifdef __linux__
 int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
 {
-    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
-    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
+    unsigned long sectors = info->size / sector_size;
+    if (info->size / sector_size != sectors) {
         LOG("Export size %" PRId64 " too large for 32-bit kernel", info->size);
         return -E2BIG;
     }
@@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
         return -serrno;
     }
 
-    TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
+    TRACE("Setting block size to %lu", sector_size);
 
-    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
         int serrno = errno;
         LOG("Failed setting NBD block size");
         return -serrno;
     }
 
     TRACE("Setting size to %lu block(s)", sectors);
-    if (size % BDRV_SECTOR_SIZE) {
-        TRACE("Ignoring trailing %d bytes of export",
-              (int) (size % BDRV_SECTOR_SIZE));
+    if (info->size % sector_size) {
+        TRACE("Ignoring trailing %" PRIu64 " bytes of export",
+              info->size % sector_size);
     }
 
     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {