virtio_pmem: do flush synchronously

Message ID 20230620032838.1598793-1-houtao@huaweicloud.com (mailing list archive)
State New, archived
Series virtio_pmem: do flush synchronously

Commit Message

Hou Tao June 20, 2023, 3:28 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

The following warning was reported when doing fsync on a pmem device:

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct+0x340/0x520
 Modules linked in:
 CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:submit_bio_noacct+0x340/0x520
 ......
 Call Trace:
  <TASK>
  ? asm_exc_invalid_op+0x1b/0x20
  ? submit_bio_noacct+0x340/0x520
  ? submit_bio_noacct+0xd5/0x520
  submit_bio+0x37/0x60
  async_pmem_flush+0x79/0xa0
  nvdimm_flush+0x17/0x40
  pmem_submit_bio+0x370/0x390
  __submit_bio+0xbc/0x190
  submit_bio_noacct_nocheck+0x14d/0x370
  submit_bio_noacct+0x1ef/0x520
  submit_bio+0x55/0x60
  submit_bio_wait+0x5a/0xc0
  blkdev_issue_flush+0x44/0x60

The root cause is that submit_bio_noacct() requires bio_op() to be either
REQ_OP_WRITE or REQ_OP_ZONE_APPEND for a flush bio, but async_pmem_flush()
doesn't set REQ_OP_WRITE when allocating the flush bio.
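
For reference, the check that rejects the bio is the one added by the
Fixes commit below; paraphrased (the exact code in block/blk-core.c may
differ slightly), it looks like this:

	/* in submit_bio_noacct(): flush/FUA bios must be write-type ops */
	if (op_is_flush(bio->bi_opf) &&
	    !(bio_op(bio) == REQ_OP_WRITE ||
	      bio_op(bio) == REQ_OP_ZONE_APPEND))
		goto end_io;	/* bio is not submitted to the driver */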

The reason for allocating a new flush bio is to execute the flush command
asynchronously, so that the original submit_bio() invocation is not
blocked. However, the original submit_bio() is blocked anyway, because the
nested submit_bio() for the flush bio just places the flush bio on
current->bio_list, and the original submit_bio() only returns after
submitting all bios on that list.
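
In other words, the recursion-avoidance logic in the block layer looks
roughly like this (a simplified paraphrase of blk-core.c, not the exact
source):

	if (current->bio_list) {
		/* nested submit_bio(): just queue the bio for the caller */
		bio_list_add(&current->bio_list[0], bio);
		return;
	}
	/*
	 * Outermost submit_bio(): submit this bio, then drain whatever the
	 * nested calls queued on current->bio_list before returning.  The
	 * "asynchronous" flush bio is therefore still handled within the
	 * original submit_bio() invocation.
	 */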

So just remove the allocation of the new flush bio and do the flush
synchronously instead.

Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
Hi Jens & Dan,

I found that Pankaj was working on optimizing the virtio-pmem flush bio
[0], but considering that the last status update was on 1/12/2022, could
you please pick this patch up for v6.7 so that we can do the flush
optimization later?

[0]: https://lore.kernel.org/lkml/20220111161937.56272-1-pankaj.gupta.linux@gmail.com/T/

 drivers/nvdimm/nd_virtio.c | 16 ----------------
 1 file changed, 16 deletions(-)

Comments

Christoph Hellwig June 21, 2023, 12:13 p.m. UTC | #1
I think the proper minimal fix is to pass in a REQ_OP_WRITE in addition to
REQ_PREFLUSH.  We can then have a discussion on the merits of this
weird async pmem flush scheme separately.
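
For context, that minimal alternative would amount to something like the
following change in async_pmem_flush() (an untested sketch of the
suggestion, not part of the posted patch):

-		struct bio *child = bio_alloc(bio->bi_bdev, 0, REQ_PREFLUSH,
-					      GFP_ATOMIC);
+		struct bio *child = bio_alloc(bio->bi_bdev, 0,
+					      REQ_OP_WRITE | REQ_PREFLUSH,
+					      GFP_ATOMIC);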

Patch

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index c6a648fd8744..a7d510f446e0 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -100,22 +100,6 @@  static int virtio_pmem_flush(struct nd_region *nd_region)
 /* The asynchronous flush callback function */
 int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
 {
-	/*
-	 * Create child bio for asynchronous flush and chain with
-	 * parent bio. Otherwise directly call nd_region flush.
-	 */
-	if (bio && bio->bi_iter.bi_sector != -1) {
-		struct bio *child = bio_alloc(bio->bi_bdev, 0, REQ_PREFLUSH,
-					      GFP_ATOMIC);
-
-		if (!child)
-			return -ENOMEM;
-		bio_clone_blkg_association(child, bio);
-		child->bi_iter.bi_sector = -1;
-		bio_chain(child, bio);
-		submit_bio(child);
-		return 0;
-	}
 	if (virtio_pmem_flush(nd_region))
 		return -EIO;