From patchwork Thu Mar 21 13:33:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fiona Ebner X-Patchwork-Id: 13598751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1BB0BC54E58 for ; Thu, 21 Mar 2024 13:35:00 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rnIYh-0003WX-Sl; Thu, 21 Mar 2024 09:34:16 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIYF-0003QP-Uq; Thu, 21 Mar 2024 09:33:49 -0400 Received: from proxmox-new.maurer-it.com ([94.136.29.106]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIY7-0007yH-Ge; Thu, 21 Mar 2024 09:33:47 -0400 Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 123C040C7A; Thu, 21 Mar 2024 14:33:27 +0100 (CET) From: Fiona Ebner To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, qemu-stable@nongnu.org, hreitz@redhat.com, kwolf@redhat.com, fam@euphon.net, stefanha@redhat.com, t.lamprecht@proxmox.com, w.bumiller@proxmox.com Subject: [PATCH v2 1/3] block/io: accept NULL qiov in bdrv_pad_request Date: Thu, 21 Mar 2024 14:33:21 +0100 Message-Id: <20240321133323.831133-2-f.ebner@proxmox.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240321133323.831133-1-f.ebner@proxmox.com> References: <20240321133323.831133-1-f.ebner@proxmox.com> MIME-Version: 1.0 Received-SPF: pass client-ip=94.136.29.106; envelope-from=f.ebner@proxmox.com; helo=proxmox-new.maurer-it.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Stefan Reiter Some operations, e.g. block-stream, perform reads while discarding the results (only copy-on-read matters). In this case, they will pass NULL as the target QEMUIOVector, which will however trip bdrv_pad_request, since it wants to extend its passed vector. In particular, this is the case for the blk_co_preadv() call in stream_populate(). If there is no qiov, no operation can be done with it, but the bytes and offset still need to be updated, so the subsequent aligned read will actually be aligned and not run into an assertion failure. In particular, this can happen when the request alignment of the top node is larger than the allocated part of the bottom node, in which case padding becomes necessary. For example: > ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M -o cluster_size=32768 > ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2 > ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2 > ./qemu-system-x86_64 --qmp stdio \ > --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \ > < {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "compress", "file": "node0", "node-name": "node1" } } > {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": "node1" } } > EOF Originally-by: Stefan Reiter Signed-off-by: Thomas Lamprecht [FE: do update bytes and offset in any case add reproducer to commit message] Signed-off-by: Fiona Ebner --- No changes in v2. block/io.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/block/io.c b/block/io.c index 33150c0359..395bea3bac 100644 --- a/block/io.c +++ b/block/io.c @@ -1726,22 +1726,29 @@ static int bdrv_pad_request(BlockDriverState *bs, return 0; } - sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes, - &sliced_head, &sliced_tail, - &sliced_niov); + /* + * For prefetching in stream_populate(), no qiov is passed along, because + * only copy-on-read matters. + */ + if (qiov && *qiov) { + sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes, + &sliced_head, &sliced_tail, + &sliced_niov); - /* Guaranteed by bdrv_check_request32() */ - assert(*bytes <= SIZE_MAX); - ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov, - sliced_head, *bytes); - if (ret < 0) { - bdrv_padding_finalize(pad); - return ret; + /* Guaranteed by bdrv_check_request32() */ + assert(*bytes <= SIZE_MAX); + ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov, + sliced_head, *bytes); + if (ret < 0) { + bdrv_padding_finalize(pad); + return ret; + } + *qiov = &pad->local_qiov; + *qiov_offset = 0; } + *bytes += pad->head + pad->tail; *offset -= pad->head; - *qiov = &pad->local_qiov; - *qiov_offset = 0; if (padded) { *padded = true; } From patchwork Thu Mar 21 13:33:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fiona Ebner X-Patchwork-Id: 13598750 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1BB55CD11BF for ; Thu, 21 Mar 2024 13:35:00 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rnIYP-0003UF-Mb; Thu, 21 Mar 2024 09:33:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIYE-0003Q1-55; Thu, 21 Mar 2024 09:33:48 -0400 Received: from proxmox-new.maurer-it.com ([94.136.29.106]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIY7-0007yJ-0m; Thu, 21 Mar 2024 09:33:45 -0400 Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 24CB940DF8; Thu, 21 Mar 2024 14:33:27 +0100 (CET) From: Fiona Ebner To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, qemu-stable@nongnu.org, hreitz@redhat.com, kwolf@redhat.com, fam@euphon.net, stefanha@redhat.com, t.lamprecht@proxmox.com, w.bumiller@proxmox.com Subject: [PATCH v2 2/3] block-backend: fix edge case in bdrv_next() where BDS associated to BB changes Date: Thu, 21 Mar 2024 14:33:22 +0100 Message-Id: <20240321133323.831133-3-f.ebner@proxmox.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240321133323.831133-1-f.ebner@proxmox.com> References: <20240321133323.831133-1-f.ebner@proxmox.com> MIME-Version: 1.0 Received-SPF: pass client-ip=94.136.29.106; envelope-from=f.ebner@proxmox.com; helo=proxmox-new.maurer-it.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The old_bs variable in bdrv_next() is currently determined by looking at the old block backend. However, if the block graph changes before the next bdrv_next() call, it might be that the associated BDS is not the same that was referenced previously. In that case, the wrong BDS is unreferenced, leading to an assertion failure later: > bdrv_unref: Assertion `bs->refcnt > 0' failed. In particular, this can happen in the context of bdrv_flush_all(), when polling for bdrv_co_flush() in the generated co-wrapper leads to a graph change (for example with a stream block job [0]). A racy reproducer: > #!/bin/bash > rm -f /tmp/backing.qcow2 > rm -f /tmp/top.qcow2 > ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M > ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2 > ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2 > ./qemu-system-x86_64 --qmp stdio \ > --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \ > < {"execute": "qmp_capabilities"} > {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": "node0" } } > {"execute": "quit"} > EOF [0]: > #0 bdrv_replace_child_tran (child=..., new_bs=..., tran=...) > #1 bdrv_replace_node_noperm (from=..., to=..., auto_skip=..., tran=..., errp=...) > #2 bdrv_replace_node_common (from=..., to=..., auto_skip=..., detach_subchain=..., errp=...) > #3 bdrv_drop_filter (bs=..., errp=...) > #4 bdrv_cor_filter_drop (cor_filter_bs=...) > #5 stream_prepare (job=...) > #6 job_prepare_locked (job=...) > #7 job_txn_apply_locked (fn=..., job=...) > #8 job_do_finalize_locked (job=...) > #9 job_exit (opaque=...) > #10 aio_bh_poll (ctx=...) > #11 aio_poll (ctx=..., blocking=...) > #12 bdrv_poll_co (s=...) > #13 bdrv_flush (bs=...) > #14 bdrv_flush_all () > #15 do_vm_stop (state=..., send_stop=...) > #16 vm_shutdown () Signed-off-by: Fiona Ebner --- Not sure if this is the correct fix, or if the call site should rather be adapted somehow? New in v2. block/block-backend.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 9c4de79e6b..28af1eb17a 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -599,14 +599,14 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it) /* Must be called from the main loop */ assert(qemu_get_current_aio_context() == qemu_get_aio_context()); + old_bs = it->bs; + /* First, return all root nodes of BlockBackends. In order to avoid * returning a BDS twice when multiple BBs refer to it, we only return it * if the BB is the first one in the parent list of the BDS. */ if (it->phase == BDRV_NEXT_BACKEND_ROOTS) { BlockBackend *old_blk = it->blk; - old_bs = old_blk ? blk_bs(old_blk) : NULL; - do { it->blk = blk_all_next(it->blk); bs = it->blk ? blk_bs(it->blk) : NULL; @@ -620,11 +620,10 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it) if (bs) { bdrv_ref(bs); bdrv_unref(old_bs); + it->bs = bs; return bs; } it->phase = BDRV_NEXT_MONITOR_OWNED; - } else { - old_bs = it->bs; } /* Then return the monitor-owned BDSes without a BB attached. Ignore all From patchwork Thu Mar 21 13:33:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fiona Ebner X-Patchwork-Id: 13598753 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2A3FBC54E58 for ; Thu, 21 Mar 2024 13:35:12 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rnIYo-0003qE-Mq; Thu, 21 Mar 2024 09:34:22 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIYJ-0003Tb-SO; Thu, 21 Mar 2024 09:33:52 -0400 Received: from proxmox-new.maurer-it.com ([94.136.29.106]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnIY7-0007yK-0n; Thu, 21 Mar 2024 09:33:50 -0400 Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 785CD40E55; Thu, 21 Mar 2024 14:33:27 +0100 (CET) From: Fiona Ebner To: qemu-devel@nongnu.org Cc: qemu-block@nongnu.org, qemu-stable@nongnu.org, hreitz@redhat.com, kwolf@redhat.com, fam@euphon.net, stefanha@redhat.com, t.lamprecht@proxmox.com, w.bumiller@proxmox.com Subject: [PATCH v2 3/3] iotests: add test for stream job with an unaligned prefetch read Date: Thu, 21 Mar 2024 14:33:23 +0100 Message-Id: <20240321133323.831133-4-f.ebner@proxmox.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240321133323.831133-1-f.ebner@proxmox.com> References: <20240321133323.831133-1-f.ebner@proxmox.com> MIME-Version: 1.0 Received-SPF: pass client-ip=94.136.29.106; envelope-from=f.ebner@proxmox.com; helo=proxmox-new.maurer-it.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Previously, bdrv_pad_request() could not deal with a NULL qiov when a read needed to be aligned. During prefetch, a stream job will pass a NULL qiov. Add a test case to cover this scenario. By accident, also covers a previous race during shutdown, where block graph changes during iteration in bdrv_flush_all() could lead to unreferencing the wrong block driver state and an assertion failure later. Signed-off-by: Fiona Ebner --- New in v2. .../tests/stream-unaligned-prefetch | 86 +++++++++++++++++++ .../tests/stream-unaligned-prefetch.out | 5 ++ 2 files changed, 91 insertions(+) create mode 100755 tests/qemu-iotests/tests/stream-unaligned-prefetch create mode 100644 tests/qemu-iotests/tests/stream-unaligned-prefetch.out diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch b/tests/qemu-iotests/tests/stream-unaligned-prefetch new file mode 100755 index 0000000000..546db1d369 --- /dev/null +++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +# group: rw quick +# +# Test what happens when a stream job does an unaligned prefetch read +# which requires padding while having a NULL qiov. +# +# Copyright (C) Proxmox Server Solutions GmbH +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# + +import os +import iotests +from iotests import imgfmt, qemu_img_create, qemu_io, QMPTestCase + +image_size = 1 * 1024 * 1024 +cluster_size = 64 * 1024 +base = os.path.join(iotests.test_dir, 'base.img') +top = os.path.join(iotests.test_dir, 'top.img') + +class TestStreamUnalignedPrefetch(QMPTestCase): + def setUp(self) -> None: + """ + Create two images: + - base image {base} with {cluster_size // 2} bytes allocated + - top image {top} without any data allocated and coarser + cluster size + + Attach a compress filter for the top image, because that + requires that the request alignment is the top image's cluster + size. + """ + qemu_img_create('-f', imgfmt, + '-o', 'cluster_size={}'.format(cluster_size // 2), + base, str(image_size)) + qemu_io('-c', f'write 0 {cluster_size // 2}', base) + qemu_img_create('-f', imgfmt, + '-o', 'cluster_size={}'.format(cluster_size), + top, str(image_size)) + + self.vm = iotests.VM() + self.vm.add_blockdev(self.vm.qmp_to_opts({ + 'driver': imgfmt, + 'node-name': 'base', + 'file': { + 'driver': 'file', + 'filename': base + } + })) + self.vm.add_blockdev(self.vm.qmp_to_opts({ + 'driver': 'compress', + 'node-name': 'compress-top', + 'file': { + 'driver': imgfmt, + 'node-name': 'top', + 'file': { + 'driver': 'file', + 'filename': top + }, + 'backing': 'base' + } + })) + self.vm.launch() + + def tearDown(self) -> None: + self.vm.shutdown() + os.remove(top) + os.remove(base) + + def test_stream_unaligned_prefetch(self) -> None: + self.vm.cmd('block-stream', job_id='stream', device='compress-top') + + +if __name__ == '__main__': + iotests.main(supported_fmts=['qcow2'], supported_protocols=['file']) diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch.out b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out new file mode 100644 index 0000000000..ae1213e6f8 --- /dev/null +++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out @@ -0,0 +1,5 @@ +. +---------------------------------------------------------------------- +Ran 1 tests + +OK