
dm: track io errors per mapped device

Message ID: 20200507230532.5733-1-kj@orbekk.com
State: New, archived

Commit Message

Kjetil Orbekk May 7, 2020, 11:05 p.m. UTC
From: Khazhismel Kumykov <khazhy@google.com>

This will track ioerr_cnt on all dm targets and expose it as
<device>/dm/ioerr_cnt.

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
[kj@orbekk.com: added support for bio based devices in dm.c]
Signed-off-by: Kjetil Orbekk <kj@orbekk.com>
---
 drivers/md/dm-core.h  |  2 ++
 drivers/md/dm-rq.c    |  4 ++++
 drivers/md/dm-sysfs.c |  8 ++++++++
 drivers/md/dm.c       | 10 ++++++++++
 drivers/md/dm.h       |  1 +
 5 files changed, 25 insertions(+)
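
For illustration, a minimal userspace sketch that reads the counter this patch exposes. The device name dm-0 is an assumption; substitute the mapped device under test.

#include <stdio.h>

/*
 * Minimal sketch: read the ioerr_cnt sysfs attribute added by this
 * patch.  Assumes the mapped device is dm-0; adjust the path as needed.
 */
int main(void)
{
	FILE *f = fopen("/sys/block/dm-0/dm/ioerr_cnt", "r");
	int cnt;

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fscanf(f, "%d", &cnt) == 1)
		printf("ioerr_cnt: %d\n", cnt);
	fclose(f);
	return 0;
}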

Comments

Alasdair G Kergon May 8, 2020, 12:18 a.m. UTC | #1
On Thu, May 07, 2020 at 07:05:33PM -0400, Kjetil Orbekk wrote:
> This will track ioerr_cnt on all dm targets and expose it as
> <device>/dm/ioerr_cnt.

How do you propose to use this?
What are you trying to measure and why?
- How exact must the number be to meet your requirements?

Or to put it another way, do you need to know the exact number you are
exposing, or do you derive something else from this which could also be
derived from an alternative number?

In particular, given the way we split and clone and stack devices (so
there may be an element of multiplication), and reload tables (so
existing values might become irrelevant), did you consider alternative
semantics before selecting this approach?
(Or to put it another way, is there a need to reset it or track
the value since the last resume?)

(Documentation is also needed - which ought to explain the semantics
and how the observed values interact with the use of device-mapper
features.)

Alasdair
Chaitanya Kulkarni May 8, 2020, 1:12 a.m. UTC | #2
On 05/07/2020 04:06 PM, Kjetil Orbekk wrote:
> +		if (tio->error)
> +			atomic_inc(&md->ioerr_cnt);

Given that there are so many errors, how would the user know what
kind of error is generated and how many times?
Kjetil Orbekk May 8, 2020, 7:18 p.m. UTC | #3
Hi Alasdair,

Thank you for your time and comments.

On Thu, May 7, 2020, at 20:18, Alasdair G Kergon wrote:
> On Thu, May 07, 2020 at 07:05:33PM -0400, Kjetil Orbekk wrote:
> > This will track ioerr_cnt on all dm targets and expose it as
> > <device>/dm/ioerr_cnt.
> 
> How do you propose to use this?
> What are you trying to measure and why?
> - How exact must the number be to meet your requirements?

This is proposed in order to detect whether I/O errors have occurred on the dm device. Deriving this from the ioerr_cnt of the underlying devices was considered, but that is not reliable for dm devices that tolerate some underlying errors (raid setups and similar).

> Or to put it another way, do you need to know the exact number you are
> exposing, or do you derive something else from this which could also be
> derived from an alternative number?

Our use case is to detect if I/O errors have happened at all. We expect the ioerr_cnt to increase when there are errors, but the precise number is not important in our environment.

> In particular, given the way we split and clone and stack devices (so
> there may be an element of multiplication), and reload tables (so
> existing values might become irrelevant), did you consider alternative
> semantics before selecting this approach?
>
> (Or to put it another way, is there a need to reset it or track
> the value since the last resume?)

I'm not very familiar with dm and I don't follow how the cloning and stacking will lead to multiplication. Do you have any suggestions for how I might deal with that?

Resetting the value would not be desirable for our use case, because the probing process could miss I/O errors that happen right before a device is suspended and then resumed. I can imagine there might be cases where one would want that, though. Users could look at increases in ioerr_cnt instead of the absolute numbers, or I could provide a way to reset the counter if desired.
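
A minimal sketch of that kind of probe, assuming the sysfs path added by this patch and a one-second polling interval (both are assumptions, not part of the patch):

#include <stdio.h>
#include <unistd.h>

/* Hypothetical poller: act only on increases in ioerr_cnt, so the
 * absolute value (and any reset behaviour) does not matter. */
static int read_ioerr_cnt(const char *path)
{
	FILE *f = fopen(path, "r");
	int cnt = -1;

	if (f) {
		if (fscanf(f, "%d", &cnt) != 1)
			cnt = -1;
		fclose(f);
	}
	return cnt;
}

int main(void)
{
	const char *path = "/sys/block/dm-0/dm/ioerr_cnt";
	int prev = read_ioerr_cnt(path);

	while (prev >= 0) {
		int cur = read_ioerr_cnt(path);

		if (cur < 0)
			break;
		if (cur > prev)
			fprintf(stderr, "%d new I/O error(s)\n", cur - prev);
		prev = cur;
		sleep(1);
	}
	return 1;
}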

> (Documentation is also needed - which ought to explain the semantics
> and how the observed values interact with the use of device-mapper
> features.)

I will be happy to provide an updated patch with inline documentation once I have addressed your comments. Are there any other places where I need to update documentation?
Kjetil Orbekk May 8, 2020, 7:22 p.m. UTC | #4
On Thu, May 7, 2020, at 21:12, Chaitanya Kulkarni wrote:
> On 05/07/2020 04:06 PM, Kjetil Orbekk wrote:
> > +		if (tio->error)
> > +			atomic_inc(&md->ioerr_cnt);
> 
> Given that there are so many errors, how would the user know what
> kind of error is generated and how many times?

The intended use case is to provide an easy way to check if errors have occurred at all, and then the user needs to investigate using other means. I replied with more detail to Alasdair's email.
Mike Snitzer May 8, 2020, 9:09 p.m. UTC | #5
On Fri, May 08 2020 at  3:22pm -0400,
kj@orbekk.com <kj@orbekk.com> wrote:

> On Thu, May 7, 2020, at 21:12, Chaitanya Kulkarni wrote:
> > On 05/07/2020 04:06 PM, Kjetil Orbekk wrote:
> > > +		if (tio->error)
> > > +			atomic_inc(&md->ioerr_cnt);
> > 
> > Given that there are so many errors, how would the user know what
> > kind of error is generated and how many times?
> 
> The intended use case is to provide an easy way to check if errors
> have occurred at all, and then the user needs to investigate using
> other means. I replied with more detail to Alasdair's email.

But most operations initiated by the user that fail will be felt by the
upper layer that the user is interfacing with.

Only exception that springs to mind is dm-writecache's writeback that
occurs after writes have already been acknowledged back to the upper
layers -- in that case dm-writecache provides an error flag that is
exposed via writecache_status.

Anyway, just not seeing why you need an upper-layer, use-case-agnostic
flag to know an error occurred in DM.  The layers that interface with
the DM device will have been notified of all errors.

And why just for DM devices?  Why not all block devices (in which case
DM would get this feature "for free")?

Mike
Kjetil Orbekk May 9, 2020, 4:01 p.m. UTC | #6
On Fri, May 8, 2020, at 17:09, Mike Snitzer wrote:
> On Fri, May 08 2020 at  3:22pm -0400,
> kj@orbekk.com <kj@orbekk.com> wrote:
> 
> > On Thu, May 7, 2020, at 21:12, Chaitanya Kulkarni wrote:
> > > On 05/07/2020 04:06 PM, Kjetil Orbekk wrote:
> > > > +		if (tio->error)
> > > > +			atomic_inc(&md->ioerr_cnt);
> > > 
> > > Given that there are so many errors, how would the user know what
> > > kind of error is generated and how many times?
> > 
> > The intended use case is to provide an easy way to check if errors
> > have occurred at all, and then the user needs to investigate using
> > other means. I replied with more detail to Alasdair's email.
> 
> But most operations initiated by the user that fail will be felt by the
> upper layer that the user is interfacing with.
> 
> Only exception that springs to mind is dm-writecache's writeback that
> occurs after writes have already been acknowledged back to the upper
> layers -- in that case dm-writecache provides an error flag that is
> exposed via writecache_status.
> 
> Anyway, just not seeing why you need an upper-layer, use-case-agnostic
> flag to know an error occurred in DM.  The layers that interface with
> the DM device will have been notified of all errors.

It's used as a signal by a probing process which is not in the IO path itself.

> And why just for DM devices?  Why not all block devices (in which case
> DM would get this feature "for free")?

This sounds like a good idea to me. It looks like I could add this to disk_stats and expose it through the block/<device>/stat file. I'll see if this is feasible.
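
Purely as an illustration of that direction (the field name and placement are assumptions, not code from this thread or from the kernel), such a generic counter might be shaped like:

/* Hypothetical sketch only: an error counter kept next to the existing
 * block-layer accounting, so every block device would get it.  The
 * field name io_errors and the idea of appending it to the
 * block/<device>/stat output are assumptions. */
struct disk_stats {
	/* ... existing per-cpu I/O accounting fields ... */
	unsigned long io_errors;	/* requests completed with an error */
};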

Patch

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index c4ef1fceead6..c6cc0f9e4d9a 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -108,6 +108,8 @@  struct mapped_device {
 
 	struct dm_stats stats;
 
+	atomic_t ioerr_cnt;
+
 	/* for blk-mq request-based DM support */
 	struct blk_mq_tag_set *tag_set;
 	bool init_tio_pdu:1;
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3f8577e2c13b..1c1fe96ca7a4 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -171,6 +171,8 @@  static void dm_end_request(struct request *clone, blk_status_t error)
 	tio->ti->type->release_clone_rq(clone, NULL);
 
 	rq_end_stats(md, rq);
+	if (error)
+		atomic_inc(&md->ioerr_cnt);
 	blk_mq_end_request(rq, error);
 	rq_completed(md);
 }
@@ -268,6 +270,8 @@  static void dm_softirq_done(struct request *rq)
 		struct mapped_device *md = tio->md;
 
 		rq_end_stats(md, rq);
+		if (tio->error)
+			atomic_inc(&md->ioerr_cnt);
 		blk_mq_end_request(rq, tio->error);
 		rq_completed(md);
 		return;
diff --git a/drivers/md/dm-sysfs.c b/drivers/md/dm-sysfs.c
index a05fcd50e1b9..5d59790b4b17 100644
--- a/drivers/md/dm-sysfs.c
+++ b/drivers/md/dm-sysfs.c
@@ -74,6 +74,12 @@  static ssize_t dm_attr_name_show(struct mapped_device *md, char *buf)
 	return strlen(buf);
 }
 
+static ssize_t dm_attr_ioerr_cnt_show(struct mapped_device *md, char *buf)
+{
+	sprintf(buf, "%d\n", dm_ioerr_cnt(md));
+	return strlen(buf);
+}
+
 static ssize_t dm_attr_uuid_show(struct mapped_device *md, char *buf)
 {
 	if (dm_copy_name_and_uuid(md, NULL, buf))
@@ -99,6 +105,7 @@  static ssize_t dm_attr_use_blk_mq_show(struct mapped_device *md, char *buf)
 }
 
 static DM_ATTR_RO(name);
+static DM_ATTR_RO(ioerr_cnt);
 static DM_ATTR_RO(uuid);
 static DM_ATTR_RO(suspended);
 static DM_ATTR_RO(use_blk_mq);
@@ -106,6 +113,7 @@  static DM_ATTR_RW(rq_based_seq_io_merge_deadline);
 
 static struct attribute *dm_attrs[] = {
 	&dm_attr_name.attr,
+	&dm_attr_ioerr_cnt.attr,
 	&dm_attr_uuid.attr,
 	&dm_attr_suspended.attr,
 	&dm_attr_use_blk_mq.attr,
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index db9e46114653..7a264379473e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1017,6 +1017,9 @@  static void clone_endio(struct bio *bio)
 			disable_write_zeroes(md);
 	}
 
+	if (error)
+		atomic_inc(&md->ioerr_cnt);
+
 	if (endio) {
 		int r = endio(tio->ti, bio, &error);
 		switch (r) {
@@ -1304,6 +1307,7 @@  static blk_qc_t __map_bio(struct dm_target_io *tio)
 		break;
 	case DM_MAPIO_KILL:
 		free_tio(tio);
+		atomic_inc(&md->ioerr_cnt);
 		dec_pending(io, BLK_STS_IOERR);
 		break;
 	case DM_MAPIO_REQUEUE:
@@ -2014,6 +2018,7 @@  static struct mapped_device *alloc_dev(int minor)
 		goto bad;
 
 	dm_stats_init(&md->stats);
+	atomic_set(&md->ioerr_cnt, 0);
 
 	/* Populate the mapping, nobody knows we exist yet */
 	spin_lock(&_minor_lock);
@@ -2992,6 +2997,11 @@  int dm_noflush_suspending(struct dm_target *ti)
 }
 EXPORT_SYMBOL_GPL(dm_noflush_suspending);
 
+int dm_ioerr_cnt(struct mapped_device *md)
+{
+	return atomic_read(&md->ioerr_cnt);
+}
+
 struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
 					    unsigned integrity, unsigned per_io_data_size,
 					    unsigned min_pool_size)
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index d7c4f6606b5f..fafb9d379ce9 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -82,6 +82,7 @@  void dm_unlock_md_type(struct mapped_device *md);
 void dm_set_md_type(struct mapped_device *md, enum dm_queue_mode type);
 enum dm_queue_mode dm_get_md_type(struct mapped_device *md);
 struct target_type *dm_get_immutable_target_type(struct mapped_device *md);
+int dm_ioerr_cnt(struct mapped_device *md);
 
 int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t);
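
For context on the sysfs plumbing above: the existing DM_ATTR_RO macro in drivers/md/dm-sysfs.c ties an attribute name to its show function, roughly as follows, so static DM_ATTR_RO(ioerr_cnt) picks up dm_attr_ioerr_cnt_show, and the new attribute surfaces as /sys/block/dm-<N>/dm/ioerr_cnt once it is listed in dm_attrs[].

#define DM_ATTR_RO(_name) \
struct dm_sysfs_attr dm_attr_##_name = \
	__ATTR(_name, S_IRUGO, dm_attr_##_name##_show, NULL)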