From patchwork Thu May 12 14:35:28 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kevin Wolf X-Patchwork-Id: 9082631 Return-Path: X-Original-To: patchwork-qemu-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id EC2029F1C3 for ; Thu, 12 May 2016 15:03:31 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 3F06220221 for ; Thu, 12 May 2016 15:03:31 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 643282020F for ; Thu, 12 May 2016 15:03:30 +0000 (UTC) Received: from localhost ([::1]:58108 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b0s9B-0002e9-Dc for patchwork-qemu-devel@patchwork.kernel.org; Thu, 12 May 2016 11:03:29 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52527) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b0rjk-0006Ju-Ur for qemu-devel@nongnu.org; Thu, 12 May 2016 10:37:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b0rjd-0004Ct-85 for qemu-devel@nongnu.org; Thu, 12 May 2016 10:37:11 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47051) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b0rjU-000488-VQ; Thu, 12 May 2016 10:36:57 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 687EBA14F7; Thu, 12 May 2016 14:36:56 +0000 (UTC) Received: from noname.redhat.com (ovpn-116-37.ams2.redhat.com [10.36.116.37]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u4CEZp9Q028730; Thu, 12 May 2016 10:36:55 -0400 From: Kevin Wolf To: qemu-block@nongnu.org Date: Thu, 12 May 2016 16:35:28 +0200 Message-Id: <1463063749-2201-49-git-send-email-kwolf@redhat.com> In-Reply-To: <1463063749-2201-1-git-send-email-kwolf@redhat.com> References: <1463063749-2201-1-git-send-email-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Thu, 12 May 2016 14:36:56 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 48/69] qcow2: improve qcow2_co_write_zeroes() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Denis V. Lunev" There is a possibility that qcow2_co_write_zeroes() will be called with the partial block. This could be synthetically triggered with qemu-io -c "write -z 32k 4k" and can happen in the real life in qemu-nbd. The latter happens under the following conditions: (1) qemu-nbd is started with --detect-zeroes=on and is connected to the kernel NBD client (2) third party program opens kernel NBD device with O_DIRECT (3) third party program performs write operation with memory buffer not aligned to the page In this case qcow2_co_write_zeroes() is unable to perform the operation and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller switches to non-optimized version and writes real zeroes to the disk. The patch creates a shortcut. If the block is read as zeroes, f.e. if it is unallocated, the request is extended to cover full block. User-visible situation with this block is not changed. Before the patch the block is filled in the image with real zeroes. After that patch the block is marked as zeroed in metadata. Thus any subsequent changes in backing store chain are not affected. Kevin, thank you for a cool suggestion. Signed-off-by: Denis V. Lunev Reviewed-by: Roman Kagan CC: Kevin Wolf CC: Max Reitz Signed-off-by: Kevin Wolf --- block/qcow2.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 59 insertions(+), 6 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 3090538..555627a 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2411,21 +2411,74 @@ finish: return ret; } + +static bool is_zero_cluster(BlockDriverState *bs, int64_t start) +{ + BDRVQcow2State *s = bs->opaque; + int nr; + BlockDriverState *file; + int64_t res = bdrv_get_block_status_above(bs, NULL, start, + s->cluster_sectors, &nr, &file); + return res >= 0 && ((res & BDRV_BLOCK_ZERO) || !(res & BDRV_BLOCK_DATA)); +} + +static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start) +{ + BDRVQcow2State *s = bs->opaque; + int nr = s->cluster_sectors; + uint64_t off; + int ret; + + ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off); + return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO; +} + static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) { int ret; BDRVQcow2State *s = bs->opaque; - /* Emulate misaligned zero writes */ - if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) { - return -ENOTSUP; + int head = sector_num % s->cluster_sectors; + int tail = (sector_num + nb_sectors) % s->cluster_sectors; + + if (head != 0 || tail != 0) { + int64_t cl_end = -1; + + sector_num -= head; + nb_sectors += head; + + if (tail != 0) { + nb_sectors += s->cluster_sectors - tail; + } + + if (!is_zero_cluster(bs, sector_num)) { + return -ENOTSUP; + } + + if (nb_sectors > s->cluster_sectors) { + /* Technically the request can cover 2 clusters, f.e. 4k write + at s->cluster_sectors - 2k offset. One of these cluster can + be zeroed, one unallocated */ + cl_end = sector_num + nb_sectors - s->cluster_sectors; + if (!is_zero_cluster(bs, cl_end)) { + return -ENOTSUP; + } + } + + qemu_co_mutex_lock(&s->lock); + /* We can have new write after previous check */ + if (!is_zero_cluster_top_locked(bs, sector_num) || + (cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) { + qemu_co_mutex_unlock(&s->lock); + return -ENOTSUP; + } + } else { + qemu_co_mutex_lock(&s->lock); } /* Whatever is left can use real zero clusters */ - qemu_co_mutex_lock(&s->lock); - ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, - nb_sectors); + ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors); qemu_co_mutex_unlock(&s->lock); return ret;