
[v3,02/11] blk: Introduce ->corrupted_range() for block device

Message ID 20210208105530.3072869-3-ruansy.fnst@cn.fujitsu.com (mailing list archive)
State New, archived
Series fsdax: introduce fs query to support reflink

Commit Message

Ruan Shiyang Feb. 8, 2021, 10:55 a.m. UTC
In fsdax mode, memory failures happen on the block device, so an
interface for block devices needs to be introduced.  Each kind of block
device can then handle memory failures in its own way.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Christoph Hellwig Feb. 10, 2021, 1:21 p.m. UTC | #1
On Mon, Feb 08, 2021 at 06:55:21PM +0800, Shiyang Ruan wrote:
> In fsdax mode, memory failures happen on the block device, so an
> interface for block devices needs to be introduced.  Each kind of block
> device can then handle memory failures in its own way.

As I said before: please do not add anything to the block device for
DAX operations.  We've been working very hard to decouple DAX from the
block device, and while we're not done yet, we should not regress that
split.

Darrick J. Wong March 4, 2021, 10:42 p.m. UTC | #2
On Wed, Feb 10, 2021 at 02:21:39PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 08, 2021 at 06:55:21PM +0800, Shiyang Ruan wrote:
> > In fsdax mode, memory failures happen on the block device, so an
> > interface for block devices needs to be introduced.  Each kind of block
> > device can then handle memory failures in its own way.
> 
> As I said before: please do not add anything to the block device for
> DAX operations.  We've been working very hard to decouple DAX from the
> block device, and while we're not done yet, we should not regress that
> split.

I agree with you (Christoph) that, strictly speaking, this isn't needed
within the scope of the DAX work; xfs should be able to consume the
->memory_failure events directly and do the right thing.

My vision here, however, is to establish upcalls for /both/ types of
storage.

Regular block devices can use ->corrupted_range to push error
notifications upwards through the block stack to a filesystem.  We could
finally do a teensy bit more with SCSI sense data about media errors;
thinp could warn the filesystem that it's getting low on space and that
this might be an agreeable time to self-FITRIM; raid could report that a
mirror is inconsistent and ask whether the fs can do something to
resolve the dispute; etc.  Maybe we can use this mechanism to warn a
filesystem that someone did "echo 1 > /sys/block/sda/device/delete" and
that we had better persist everything while we still can.
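A minimal userspace sketch of the stacked upcall described above, assuming a single linear remapping layer (in the spirit of dm-linear) between the disk and the filesystem. `struct layer`, `notify_upper()`, `linear_corrupted_range()` and `fs_corrupted_range()` are hypothetical names, not kernel APIs. Each layer clips the reported byte range to the window it maps, translates it into its own offsets, and forwards it to the layer above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one level in a block stack (not kernel code). */
struct layer {
	int64_t dev_start;   /* first byte of the lower device this layer maps */
	int64_t size;        /* number of bytes it maps */
	struct layer *upper; /* next layer to notify; NULL at the top */
	/* Called with offsets in the lower device's byte address space. */
	int (*corrupted_range)(struct layer *self, int64_t offset, size_t len);
	int64_t last_offset; /* top layer only: last range it was told about */
	size_t last_len;
};

/* Forward a corruption report to whoever sits above @l. */
static int notify_upper(struct layer *l, int64_t offset, size_t len)
{
	if (!l->upper || !l->upper->corrupted_range)
		return 0; /* nobody above cares */
	return l->upper->corrupted_range(l->upper, offset, len);
}

/* Linear layer: exposes [dev_start, dev_start + size) of the lower device at 0. */
static int linear_corrupted_range(struct layer *self, int64_t offset, size_t len)
{
	int64_t lo = offset;
	int64_t hi = offset + (int64_t)len;
	int64_t win_lo, win_hi;

	assert(self != NULL);
	win_lo = self->dev_start;
	win_hi = self->dev_start + self->size;
	if (hi <= win_lo || lo >= win_hi)
		return 0; /* damage is outside the window we map */
	if (lo < win_lo)
		lo = win_lo;
	if (hi > win_hi)
		hi = win_hi;
	/* Translate to this layer's own offsets and pass it up. */
	return notify_upper(self, lo - win_lo, (size_t)(hi - lo));
}

/* Top of the stack: the "filesystem" just records what it was told. */
static int fs_corrupted_range(struct layer *self, int64_t offset, size_t len)
{
	self->last_offset = offset;
	self->last_len = len;
	return 0;
}
```

Ranges that fall entirely outside a layer's window are dropped at that layer, so the filesystem only hears about damage to space it actually owns.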

Memory devices will use ->memory_failure to tell us about ADR errors,
and I guess upcoming and past hotremove events.  For fsdax you'd
probably have to send the announcement and invalidate the current ptes
to force filesystem pagefaults and the like.

Either way, I think this piece is fine, but I would change the dax
side to send the ->memory_failure events directly to xfs.
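The alternative suggested here, sending ->memory_failure straight from the DAX side to the filesystem, could look roughly like the userspace sketch below. It assumes the filesystem registers itself as the holder of the DAX device together with an ops table; `struct dax_holder_ops`, `dax_set_holder()`, `dax_report_failure()` and the `fake_fs` stand-in are hypothetical names chosen for illustration, and the block layer is not involved at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table a filesystem hands to the DAX device. */
struct dax_holder_ops {
	int (*memory_failure)(void *holder, uint64_t dev_offset, uint64_t len);
};

/* Hypothetical model of a DAX device backed by one physical range. */
struct dax_dev {
	uint64_t phys_base; /* physical address of the first byte */
	uint64_t size;      /* bytes covered by this device */
	void *holder;       /* e.g. the mounted filesystem */
	const struct dax_holder_ops *ops;
};

static void dax_set_holder(struct dax_dev *d, void *holder,
			   const struct dax_holder_ops *ops)
{
	d->holder = holder;
	d->ops = ops;
}

/* Turn a poisoned physical address into a device offset and tell the holder. */
static int dax_report_failure(struct dax_dev *d, uint64_t phys_addr, uint64_t len)
{
	assert(d != NULL);
	if (phys_addr < d->phys_base || phys_addr - d->phys_base >= d->size)
		return -ENXIO;      /* not memory this device owns */
	if (!d->ops || !d->ops->memory_failure)
		return -EOPNOTSUPP; /* nobody registered to handle it */
	return d->ops->memory_failure(d->holder, phys_addr - d->phys_base, len);
}

/* A stand-in filesystem that just counts and records failures. */
struct fake_fs {
	uint64_t bad_offset;
	uint64_t bad_len;
	int nr_events;
};

static int fake_fs_memory_failure(void *holder, uint64_t dev_offset, uint64_t len)
{
	struct fake_fs *fs = holder;

	fs->bad_offset = dev_offset;
	fs->bad_len = len;
	fs->nr_events++;
	return 0;
}

static const struct dax_holder_ops fake_fs_ops = {
	.memory_failure = fake_fs_memory_failure,
};
```

The design point is that the filesystem, not a block driver, is the registered consumer, which keeps DAX error reporting independent of `struct block_device_operations`.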

A gap here is that xfs can attach to rt/log devices, but we don't
currently plumb in enough information for get_active_super to find the
correct filesystem.
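One way that gap might be closed, sketched under the assumption that the filesystem registers every device it attaches to (data, log, rt) at mount time: a lookup that covers all attached devices and also reports which role the device plays. `register_dev_owner()`, `find_owner()`, `enum dev_role` and `struct fake_sb` are hypothetical; `get_active_super()` itself is not modelled, and a plain `unsigned int` stands in for `dev_t`.

```c
#include <assert.h>
#include <stddef.h>

enum dev_role { ROLE_DATA, ROLE_LOG, ROLE_RT };

struct fake_sb {
	const char *fsname; /* stand-in for a superblock */
};

struct dev_owner {
	unsigned int dev;   /* stand-in for dev_t */
	struct fake_sb *sb; /* filesystem that attached this device */
	enum dev_role role; /* what the filesystem uses it for */
};

#define MAX_OWNERS 16
static struct dev_owner owners[MAX_OWNERS];
static size_t nr_owners;

/* Called once per attached device at mount time (hypothetical). */
static int register_dev_owner(unsigned int dev, struct fake_sb *sb, enum dev_role role)
{
	assert(sb != NULL);
	if (nr_owners == MAX_OWNERS)
		return -1; /* table full */
	owners[nr_owners++] = (struct dev_owner){ dev, sb, role };
	return 0;
}

/* Find the filesystem for any attached device, not just the data device. */
static struct fake_sb *find_owner(unsigned int dev, enum dev_role *role)
{
	for (size_t i = 0; i < nr_owners; i++) {
		if (owners[i].dev == dev) {
			if (role)
				*role = owners[i].role;
			return owners[i].sb;
		}
	}
	return NULL;
}
```

With the role in hand, an error on the log device could be handled differently from one on the rt or data device.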

I dunno, maybe we should add this to the thread here[1]?

[1] https://lore.kernel.org/linux-xfs/CAPcyv4g3ZwbdLFx8bqMcNvXyrob8y6sBXXu=xPTmTY0VSk5HCw@mail.gmail.com/T/#m55a5c67153d0d10f3ff05a69d7e502914d97ac9d

--D

Christoph Hellwig March 5, 2021, 6:10 a.m. UTC | #3
On Thu, Mar 04, 2021 at 02:42:50PM -0800, Darrick J. Wong wrote:
> My vision here, however, is to establish upcalls for /both/ types of
> storage.

I already have patches for doing these kinds of callbacks properly
for the block layer. They will be posted shortly.

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..e0f5585aa06f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1867,6 +1867,8 @@  struct block_device_operations {
 	int (*report_zones)(struct gendisk *, sector_t sector,
 			unsigned int nr_zones, report_zones_cb cb, void *data);
 	char *(*devnode)(struct gendisk *disk, umode_t *mode);
+	int (*corrupted_range)(struct gendisk *disk, struct block_device *bdev,
+			       loff_t offset, size_t len, void *data);
 	struct module *owner;
 	const struct pr_ops *pr_ops;
 };
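To show how the new hook might be wired up, here is a userspace model that reuses the member signature from the hunk above. Only that member of `struct block_device_operations` is modelled, `sketch_loff_t` stands in for the kernel's `loff_t`, and the `toy_pmem` driver and `bdev_report_corruption()` helper are hypothetical. The patch only adds the function pointer and does not say what `data` carries, so the toy driver ignores it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t sketch_loff_t; /* stand-in for the kernel's loff_t */

struct gendisk;
struct block_device;

/* Only the member added by this patch is modelled. */
struct block_device_operations {
	int (*corrupted_range)(struct gendisk *disk, struct block_device *bdev,
			       sketch_loff_t offset, size_t len, void *data);
};

struct gendisk {
	const struct block_device_operations *fops;
	void *private_data; /* driver-private state */
};

struct block_device {
	struct gendisk *bd_disk;
};

/* Hypothetical driver state: remembers the last bad range it was told about. */
struct toy_pmem {
	sketch_loff_t bad_offset;
	size_t bad_len;
};

static int toy_pmem_corrupted_range(struct gendisk *disk, struct block_device *bdev,
				    sketch_loff_t offset, size_t len, void *data)
{
	struct toy_pmem *pmem = disk->private_data;

	(void)bdev;
	(void)data; /* meaning not defined by this patch */
	if (offset < 0)
		return -EINVAL;
	pmem->bad_offset = offset;
	pmem->bad_len = len;
	return 0;
}

static const struct block_device_operations toy_pmem_fops = {
	.corrupted_range = toy_pmem_corrupted_range,
};

/* Hypothetical caller-side helper: the hook is optional, so check for it. */
static int bdev_report_corruption(struct block_device *bdev, sketch_loff_t offset,
				  size_t len, void *data)
{
	struct gendisk *disk;

	assert(bdev != NULL && bdev->bd_disk != NULL);
	disk = bdev->bd_disk;
	if (!disk->fops || !disk->fops->corrupted_range)
		return -EOPNOTSUPP;
	return disk->fops->corrupted_range(disk, bdev, offset, len, data);
}
```

Because not every driver would implement ->corrupted_range, the caller treats a NULL pointer as -EOPNOTSUPP rather than dereferencing it.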