From patchwork Thu Nov 10 23:10:26 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Blake X-Patchwork-Id: 9422199 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 5986B6022E for ; Thu, 10 Nov 2016 23:14:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 39742297A8 for ; Thu, 10 Nov 2016 23:14:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2B20C297B3; Thu, 10 Nov 2016 23:14:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 480D8297A8 for ; Thu, 10 Nov 2016 23:14:53 +0000 (UTC) Received: from localhost ([::1]:49820 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c4yYV-0002AW-CI for patchwork-qemu-devel@patchwork.kernel.org; Thu, 10 Nov 2016 18:14:51 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52467) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c4yUR-00086r-8p for qemu-devel@nongnu.org; Thu, 10 Nov 2016 18:10:42 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c4yUQ-0007uE-0x for qemu-devel@nongnu.org; Thu, 10 Nov 2016 18:10:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39498) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c4yUN-0007rj-7E; Thu, 10 Nov 2016 18:10:35 -0500 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 84EAFC05AA54; Thu, 10 Nov 2016 23:10:34 +0000 (UTC) Received: from red.redhat.com (ovpn-116-116.phx2.redhat.com [10.3.116.116]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uAANAQ4b020077; Thu, 10 Nov 2016 18:10:33 -0500 From: Eric Blake To: qemu-devel@nongnu.org Date: Thu, 10 Nov 2016 17:10:26 -0600 Message-Id: <1478819426-7564-3-git-send-email-eblake@redhat.com> In-Reply-To: <1478819426-7564-1-git-send-email-eblake@redhat.com> References: <1478819426-7564-1-git-send-email-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 10 Nov 2016 23:10:34 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH 2/2] block: Pass unaligned discard requests to drivers X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Fam Zheng , qemu-block@nongnu.org, pl@kamp.de, qemu-stable@nongnu.org, Max Reitz , Stefan Hajnoczi , pbonzini@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP Discard is advisory, so rounding the requests to alignment boundaries is never semantically wrong from the data that the guest sees. But at least the Dell Equallogic iSCSI SANs has an interesting property that its advertised discard alignment is 15M, yet documents that discarding a sequence of 1M slices will eventually result in the 15M page being marked as discarded, and it is possible to observe which pages have been discarded. Between commits 9f1963b and b8d0a980, we converted the block layer to a byte-based interface that ultimately ignores any unaligned head or tail based on the driver's advertised discard granularity, which means that qemu 2.7 refuses to pass any discard request smaller than 15M down to the Dell Equallogic hardware. This is a slight regression in behavior compared to earlier qemu, where a guest executing discards in power-of-2 chunks used to be able to get every page discarded, but is now left with various pages still allocated because the guest requests did not align with the hardware's 15M pages. Since the SCSI specification says nothing about a minimum discard granularity, and only documents the preferred alignment, it is best if the block layer gives the driver every bit of information about discard requests, rather than rounding it to alignment boundaries early. Rework the block layer discard algorithm to mirror the write zero algorithm: always peel off any unaligned head or tail and manage that in isolation, then do the bulk of the request on an aligned boundary. The fallback when the driver returns -ENOTSUP for an unaligned request is to silently ignore that portion of the discard request; but for devices that can pass the partial request all the way down to hardware, this can result in the hardware coalescing requests and discarding aligned pages after all. Reported by: Peter Lieven Signed-off-by: Eric Blake CC: qemu-stable@nongnu.org --- block/io.c | 36 +++++++++++++++++++++++------------- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/block/io.c b/block/io.c index aa532a5..76c3030 100644 --- a/block/io.c +++ b/block/io.c @@ -2421,7 +2421,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, { BdrvTrackedRequest req; int max_pdiscard, ret; - int head, align; + int head, tail, align; if (!bs->drv) { return -ENOMEDIUM; @@ -2444,19 +2444,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, return 0; } - /* Discard is advisory, so ignore any unaligned head or tail */ + /* Discard is advisory, but some devices track and coalesce + * unaligned requests, so we must pass everything down rather than + * round here. Still, most devices will just silently ignore + * unaligned requests (by returning -ENOTSUP), so we must fragment + * the request accordingly. */ align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment); assert(align % bs->bl.request_alignment == 0); head = offset % align; - if (head) { - head = MIN(count, align - head); - count -= head; - offset += head; - } - count = QEMU_ALIGN_DOWN(count, align); - if (!count) { - return 0; - } + tail = (offset + count) % align; bdrv_inc_in_flight(bs); tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD); @@ -2468,11 +2464,25 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX), align); - assert(max_pdiscard); + assert(max_pdiscard >= bs->bl.request_alignment); while (count > 0) { int ret; - int num = MIN(count, max_pdiscard); + int num = count; + + if (head) { + /* Make a small request up to the first aligned sector. */ + num = MIN(count, align - head); + head = (head + num) % align; + assert(num < max_pdiscard); + } else if (tail && num > align) { + /* Shorten the request to the last aligned sector. */ + num -= tail; + } + /* limit request size */ + if (num > max_pdiscard) { + num = max_pdiscard; + } if (bs->drv->bdrv_co_pdiscard) { ret = bs->drv->bdrv_co_pdiscard(bs, offset, num);