From patchwork Tue Jul 19 18:47:53 2016
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 19 Jul 2016 19:47:53 +0100
Subject: [Qemu-devel] [PULL for-2.7 03/25] block: Fragment writes to max transfer length
Message-Id: <1468954095-5159-4-git-send-email-stefanha@redhat.com>
In-Reply-To: <1468954095-5159-1-git-send-email-stefanha@redhat.com>
References: <1468954095-5159-1-git-send-email-stefanha@redhat.com>
X-Patchwork-Id: 9238299

From: Eric Blake <eblake@redhat.com>

Drivers should be able to rely on the block layer honoring the max
transfer length, rather than needing to return -EINVAL (iscsi) or
manually fragment things (nbd).  We already fragment write zeroes
at the block layer; this patch adds the fragmentation for normal
writes, after requests have been aligned (fragmenting before
alignment would lead to multiple unaligned requests, rather than
just the head and tail).

When fragmenting a large request where FUA was requested, but where
we know that FUA is implemented by flushing all requests rather
than the given request, then we can still get by with only one
flush.  Note, however, that we need a followup patch to the raw
format driver to avoid a regression in the number of flushes
actually issued.

The return value was previously nebulous on success (sometimes
zero, sometimes the length written); since we never have a short
write, and since fragmenting may store yet another positive value
in 'ret', change the function to always return 0 on success,
matching what we do in bdrv_aligned_preadv().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-4-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index c3574a4..410394d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1269,7 +1269,8 @@ fail:
 }

 /*
- * Forwards an already correctly aligned write request to the BlockDriver.
+ * Forwards an already correctly aligned write request to the BlockDriver,
+ * after possibly fragmenting it.
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
@@ -1281,6 +1282,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,

     int64_t start_sector = offset >> BDRV_SECTOR_BITS;
     int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+    uint64_t bytes_remaining = bytes;
+    int max_transfer;

     assert(is_power_of_2(align));
     assert((offset & (align - 1)) == 0);
@@ -1288,6 +1291,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     assert(!qiov || bytes == qiov->size);
     assert((bs->open_flags & BDRV_O_NO_IO) == 0);
     assert(!(flags & ~BDRV_REQ_MASK));
+    max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
+                                   align);

     waited = wait_serialising_requests(req);
     assert(!waited || !req->serialising);
@@ -1310,9 +1315,34 @@
     } else if (flags & BDRV_REQ_ZERO_WRITE) {
         bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
         ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
-    } else {
+    } else if (bytes <= max_transfer) {
         bdrv_debug_event(bs, BLKDBG_PWRITEV);
         ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
+    } else {
+        bdrv_debug_event(bs, BLKDBG_PWRITEV);
+        while (bytes_remaining) {
+            int num = MIN(bytes_remaining, max_transfer);
+            QEMUIOVector local_qiov;
+            int local_flags = flags;
+
+            assert(num);
+            if (num < bytes_remaining && (flags & BDRV_REQ_FUA) &&
+                !(bs->supported_write_flags & BDRV_REQ_FUA)) {
+                /* If FUA is going to be emulated by flush, we only
+                 * need to flush on the last iteration */
+                local_flags &= ~BDRV_REQ_FUA;
+            }
+            qemu_iovec_init(&local_qiov, qiov->niov);
+            qemu_iovec_concat(&local_qiov, qiov, bytes - bytes_remaining, num);
+
+            ret = bdrv_driver_pwritev(bs, offset + bytes - bytes_remaining,
+                                      num, &local_qiov, local_flags);
+            qemu_iovec_destroy(&local_qiov);
+            if (ret < 0) {
+                break;
+            }
+            bytes_remaining -= num;
+        }
     }
     bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);

@@ -1325,6 +1355,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,

     if (ret >= 0) {
         bs->total_sectors = MAX(bs->total_sectors, end_sector);
+        ret = 0;
     }

     return ret;
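
For readers skimming the patch, the following is a minimal standalone
sketch (not QEMU code) of the fragmentation strategy the hunk above
implements: split the request into chunks no larger than max_transfer,
and when FUA would be emulated by a flush, drop the FUA flag on every
chunk except the last so only one flush is issued.  The write_chunk_fn
callback, the flat buffer interface, and all names below are
hypothetical stand-ins; in QEMU the per-chunk write is
bdrv_driver_pwritev() on a sub-qiov built with qemu_iovec_concat().

  /* Standalone illustration only; assumes a hypothetical per-chunk
   * write callback instead of QEMU's BlockDriverState machinery. */
  #include <stdbool.h>
  #include <stdint.h>

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  /* Hypothetical driver callback: write one chunk, honoring FUA if set. */
  typedef int (*write_chunk_fn)(uint64_t offset, uint64_t len,
                                const uint8_t *buf, bool fua);

  static int fragmented_pwrite(uint64_t offset, uint64_t bytes,
                               const uint8_t *buf, uint64_t max_transfer,
                               bool want_fua, bool driver_supports_fua,
                               write_chunk_fn write_chunk)
  {
      uint64_t remaining = bytes;

      while (remaining) {
          uint64_t num = MIN(remaining, max_transfer);
          bool fua = want_fua;

          if (num < remaining && want_fua && !driver_supports_fua) {
              /* FUA will be emulated by a flush; only the final chunk
               * needs it, so drop the flag here to avoid one flush per
               * fragment. */
              fua = false;
          }

          int ret = write_chunk(offset + (bytes - remaining), num,
                                buf + (bytes - remaining), fua);
          if (ret < 0) {
              return ret;   /* fail the whole request on the first error */
          }
          remaining -= num;
      }

      return 0;             /* like the patch: always 0 on success */
  }

Because the chunk size is max_transfer aligned down to the request
alignment, every fragment except possibly the last stays aligned, which
is why the patch fragments only after the head/tail alignment has been
handled.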