[v3] block: introduce block_rq_error tracepoint

Message ID 20200203053650.8923-1-xiyou.wangcong@gmail.com (mailing list archive)
State New, archived

Commit Message

Cong Wang Feb. 3, 2020, 5:36 a.m. UTC
Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.

But there are a few problems with this approach:

1. Even though the kernel trace filter can do the filtering work, there
   is still some overhead once this tracepoint is enabled.

2. The filter is based merely on errno, which does not match the checks
   the kernel applies before calling print_req_error().

3. block_rq_complete only provides the dev major and minor numbers to
   identify the block device, which is inconvenient to use in user space.

So introduce a new tracepoint, block_rq_error, just for the error case,
and provide the device name for convenience too. With this patch,
rasdaemon could switch to block_rq_error.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 block/blk-core.c             |  4 +++-
 include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

Comments

Steven Rostedt Feb. 3, 2020, 6:26 p.m. UTC | #1
On Sun,  2 Feb 2020 21:36:50 -0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  block/blk-core.c             |  4 +++-
>  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 089e890ab208..0c7ad70d06be 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1450,8 +1450,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
>  #endif
>  
>  	if (unlikely(error && !blk_rq_is_passthrough(req) &&
> -		     !(req->rq_flags & RQF_QUIET)))
> +		     !(req->rq_flags & RQF_QUIET))) {
> +		trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);

I'm curious as to why you don't just pass error into the trace event.
It looks like blk_status_to_errno() is a function call, and that injects
code at the location of the call. Note, it is not a big deal, as I
believe (I haven't looked at the objdump of it) the call may be placed
in the nop portion of the code and not hit when the tracepoint is not
enabled. But moving the blk_status_to_errno() call into
TP_fast_assign() will move it to another section entirely.

I did see that trace_block_rq_complete() does the same thing, so perhaps
that could just be a cleanup change after this on both trace events.



> +
> +	TP_printk("%d,%d %s %s %llu + %u [%d]",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __get_str(name), __entry->rwbs,
> +		  (unsigned long long)__entry->sector,
> +		  __entry->nr_sector, __entry->error)
> +);
> +

Other than my comment above, from the trace event correctness point of view:

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve
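
The suggestion above, converting the status inside the event rather than at the call site, can be sketched with a user-space analogy. This is not kernel code: `tracing_enabled` stands in for the tracepoint's static key, `status_to_errno()` is a hypothetical stand-in for blk_status_to_errno(), and the counter exists only to make the conversion observable. In the real kernel the compiler may already place the conversion on the disabled path, as noted above, so the difference shown here is the worst case.

```c
#include <assert.h>
#include <stdbool.h>

static bool tracing_enabled;      /* stand-in for the tracepoint's static key */
static unsigned int conversions;  /* how many times the conversion ran */
static int last_recorded;         /* stand-in for the ring-buffer entry */

/* Hypothetical stand-in for blk_status_to_errno(); counts its own calls. */
static int status_to_errno(unsigned int status)
{
    conversions++;
    return status ? -5 : 0;       /* -5 is -EIO on Linux */
}

/* Variant 1: the caller converts first and passes the errno in. */
static void trace_error_errno(int err)
{
    if (tracing_enabled)
        last_recorded = err;
}

/*
 * Variant 2: the caller passes the raw status; the conversion happens
 * in the recording step, analogous to doing it in TP_fast_assign().
 */
static void trace_error_status(unsigned int status)
{
    if (tracing_enabled)
        last_recorded = status_to_errno(status);
}
```

With tracing disabled, `trace_error_errno(status_to_errno(10))` still runs the conversion once, while `trace_error_status(10)` does not run it at all. Once tracing is enabled, both variants record the same value.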
Cong Wang Feb. 3, 2020, 8:24 p.m. UTC | #2
On Mon, Feb 3, 2020 at 10:26 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sun,  2 Feb 2020 21:36:50 -0800
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  block/blk-core.c             |  4 +++-
> >  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 089e890ab208..0c7ad70d06be 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1450,8 +1450,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
> >  #endif
> >
> >       if (unlikely(error && !blk_rq_is_passthrough(req) &&
> > -                  !(req->rq_flags & RQF_QUIET)))
> > +                  !(req->rq_flags & RQF_QUIET))) {
> > +             trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
>
> I'm curious to why you don't just pass error into the trace event.
> Looks like blk_status_to_errno() is a function call and that injects
> code at the location of the call. Note, it is not a big deal as I
> believe (haven't looked at the objdump of it), the call may be placed
> in the nop portion of the code, and not hit when the trace point is not
> enabled. But moving the blk_status_to_errno() call to the
> TP_fast_assign() will move it to another section entirely.
>
> I did see trace_blk_rq_complete() does the same thing, so perhaps that
> could just be a clean up change after this on both trace events.

Yes, it is clearly another copy-and-paste of trace_block_rq_complete().
I trusted the current code base, as I believed it had already passed
your review when it was merged. It looks like that was not the case.

Anyway, I am happy to address all of these in a followup patch.

Thanks.
Cong Wang Feb. 18, 2020, 6:40 p.m. UTC | #3
Hi, Jens


On Sun, Feb 2, 2020 at 9:37 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Currently, rasdaemon uses the existing tracepoint block_rq_complete
> and filters out non-error cases in order to capture block disk errors.
>
> But there are a few problems with this approach:
>
> 1. Even kernel trace filter could do the filtering work, there is
>    still some overhead after we enable this tracepoint.
>
> 2. The filter is merely based on errno, which does not align with kernel
>    logic to check the errors for print_req_error().
>
> 3. block_rq_complete only provides dev major and minor to identify
>    the block device, it is not convenient to use in user-space.
>
> So introduce a new tracepoint block_rq_error just for the error case
> and provides the device name for convenience too. With this patch,
> rasdaemon could switch to block_rq_error.
>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Can you take this patch?

Thanks!
Cong Wang Feb. 25, 2020, 8:37 p.m. UTC | #4
On Tue, Feb 18, 2020 at 10:40 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi, Jens
>
>
> On Sun, Feb 2, 2020 at 9:37 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Currently, rasdaemon uses the existing tracepoint block_rq_complete
> > and filters out non-error cases in order to capture block disk errors.
> >
> > But there are a few problems with this approach:
> >
> > 1. Even kernel trace filter could do the filtering work, there is
> >    still some overhead after we enable this tracepoint.
> >
> > 2. The filter is merely based on errno, which does not align with kernel
> >    logic to check the errors for print_req_error().
> >
> > 3. block_rq_complete only provides dev major and minor to identify
> >    the block device, it is not convenient to use in user-space.
> >
> > So introduce a new tracepoint block_rq_error just for the error case
> > and provides the device name for convenience too. With this patch,
> > rasdaemon could switch to block_rq_error.
> >
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>
> Can you take this patch?

Any response?

Thanks.
Yang Shi Jan. 24, 2022, 8:54 p.m. UTC | #5
Hi folks,

I think the problems fixed by this patch still exist and we do need
this patch to make disk error handling in rasdaemon easier. I saw that
Steven already gave his Reviewed-by, so I'm wondering why this patch was
not merged upstream. I didn't see any unresolved comments.

If it looks fine, would Jens (I guess it should go through the block tree)
please merge this patch upstream? The latest kernel moved
blk_update_request() to blk-mq.c, so if it is OK to move forward, I could
prepare a new version.

Thanks,
Yang

On Sun, Feb 2, 2020 at 11:15 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Currently, rasdaemon uses the existing tracepoint block_rq_complete
> and filters out non-error cases in order to capture block disk errors.
>
> But there are a few problems with this approach:
>
> 1. Even kernel trace filter could do the filtering work, there is
>    still some overhead after we enable this tracepoint.
>
> 2. The filter is merely based on errno, which does not align with kernel
>    logic to check the errors for print_req_error().
>
> 3. block_rq_complete only provides dev major and minor to identify
>    the block device, it is not convenient to use in user-space.
>
> So introduce a new tracepoint block_rq_error just for the error case
> and provides the device name for convenience too. With this patch,
> rasdaemon could switch to block_rq_error.
>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  block/blk-core.c             |  4 +++-
>  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 089e890ab208..0c7ad70d06be 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1450,8 +1450,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
>  #endif
>
>         if (unlikely(error && !blk_rq_is_passthrough(req) &&
> -                    !(req->rq_flags & RQF_QUIET)))
> +                    !(req->rq_flags & RQF_QUIET))) {
> +               trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
>                 print_req_error(req, error, __func__);
> +       }
>
>         blk_account_io_completion(req, nr_bytes);
>
> diff --git a/include/trace/events/block.h b/include/trace/events/block.h
> index 81b43f5bdf23..575054e7cfa0 100644
> --- a/include/trace/events/block.h
> +++ b/include/trace/events/block.h
> @@ -145,6 +145,47 @@ TRACE_EVENT(block_rq_complete,
>                   __entry->nr_sector, __entry->error)
>  );
>
> +/**
> + * block_rq_error - block IO operation error reported by device driver
> + * @rq: block operations request
> + * @error: status code
> + * @nr_bytes: number of completed bytes
> + *
> + * The block_rq_error tracepoint event indicates that some portion
> + * of operation request has failed as reported by the device driver.
> + */
> +TRACE_EVENT(block_rq_error,
> +
> +       TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
> +
> +       TP_ARGS(rq, error, nr_bytes),
> +
> +       TP_STRUCT__entry(
> +               __field(  dev_t,        dev                     )
> +               __string( name,         rq->rq_disk ? rq->rq_disk->disk_name : "?")
> +               __field(  sector_t,     sector                  )
> +               __field(  unsigned int, nr_sector               )
> +               __field(  int,          error                   )
> +               __array(  char,         rwbs,   RWBS_LEN        )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->dev       = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
> +               __assign_str(name,   rq->rq_disk ? rq->rq_disk->disk_name : "?");
> +               __entry->sector    = blk_rq_pos(rq);
> +               __entry->nr_sector = nr_bytes >> 9;
> +               __entry->error     = error;
> +
> +               blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
> +       ),
> +
> +       TP_printk("%d,%d %s %s %llu + %u [%d]",
> +                 MAJOR(__entry->dev), MINOR(__entry->dev),
> +                 __get_str(name), __entry->rwbs,
> +                 (unsigned long long)__entry->sector,
> +                 __entry->nr_sector, __entry->error)
> +);
> +
>  DECLARE_EVENT_CLASS(block_rq,
>
>         TP_PROTO(struct request_queue *q, struct request *rq),
> --
> 2.21.1
>
Steven Rostedt Jan. 25, 2022, 2:37 p.m. UTC | #6
On Mon, 24 Jan 2022 12:54:01 -0800
Yang Shi <shy828301@gmail.com> wrote:

> Hi folks,
> 
> I think the problems fixed by this patch still exist and we do need
> this patch to make disk error handling in rasdaemon easier. I saw
> Steven already gave his reviewed-by, I'm wondering why this patch was
> not merged to upstream? I didn't see any unsolved comments.

Maybe I did that prematurely, as I think I found a mistake in the tracing
below.

> 
> If it looks fine, would Jens (I guess it should go with block tree)
> please merge this patch upstream? The latest kernel moved
> blk_update_request() to blk-mq.c, if it is ok to move forward, I could
> prepare a new version.
> 
> Thanks,
> Yang
> 
> On Sun, Feb 2, 2020 at 11:15 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Currently, rasdaemon uses the existing tracepoint block_rq_complete
> > and filters out non-error cases in order to capture block disk errors.
> >
> > But there are a few problems with this approach:
> >
> > 1. Even kernel trace filter could do the filtering work, there is
> >    still some overhead after we enable this tracepoint.
> >
> > 2. The filter is merely based on errno, which does not align with kernel
> >    logic to check the errors for print_req_error().
> >
> > 3. block_rq_complete only provides dev major and minor to identify
> >    the block device, it is not convenient to use in user-space.
> >
> > So introduce a new tracepoint block_rq_error just for the error case
> > and provides the device name for convenience too. With this patch,
> > rasdaemon could switch to block_rq_error.
> >
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  block/blk-core.c             |  4 +++-
> >  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 089e890ab208..0c7ad70d06be 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1450,8 +1450,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
> >  #endif
> >
> >         if (unlikely(error && !blk_rq_is_passthrough(req) &&
> > -                    !(req->rq_flags & RQF_QUIET)))
> > +                    !(req->rq_flags & RQF_QUIET))) {
> > +               trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
> >                 print_req_error(req, error, __func__);
> > +       }
> >
> >         blk_account_io_completion(req, nr_bytes);
> >
> > diff --git a/include/trace/events/block.h b/include/trace/events/block.h
> > index 81b43f5bdf23..575054e7cfa0 100644
> > --- a/include/trace/events/block.h
> > +++ b/include/trace/events/block.h
> > @@ -145,6 +145,47 @@ TRACE_EVENT(block_rq_complete,
> >                   __entry->nr_sector, __entry->error)
> >  );
> >
> > +/**
> > + * block_rq_error - block IO operation error reported by device driver
> > + * @rq: block operations request
> > + * @error: status code
> > + * @nr_bytes: number of completed bytes
> > + *
> > + * The block_rq_error tracepoint event indicates that some portion
> > + * of operation request has failed as reported by the device driver.
> > + */
> > +TRACE_EVENT(block_rq_error,
> > +
> > +       TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
> > +
> > +       TP_ARGS(rq, error, nr_bytes),
> > +
> > +       TP_STRUCT__entry(
> > +               __field(  dev_t,        dev                     )
> > +               __string( name,         rq->rq_disk ? rq->rq_disk->disk_name : "?")
> > +               __field(  sector_t,     sector                  )
> > +               __field(  unsigned int, nr_sector               )
> > +               __field(  int,          error                   )
> > +               __array(  char,         rwbs,   RWBS_LEN        )

Why is the above not "__string" ?

> > +       ),
> > +
> > +       TP_fast_assign(
> > +               __entry->dev       = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
> > +               __assign_str(name,   rq->rq_disk ? rq->rq_disk->disk_name : "?");

__assign_str() will not work on an __array() type. It only works here
because you added it at the end, but it's just sheer luck that it didn't
crash.

-- Steve


> > +               __entry->sector    = blk_rq_pos(rq);
> > +               __entry->nr_sector = nr_bytes >> 9;
> > +               __entry->error     = error;
> > +
> > +               blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
> > +       ),
> > +
> > +       TP_printk("%d,%d %s %s %llu + %u [%d]",
> > +                 MAJOR(__entry->dev), MINOR(__entry->dev),
> > +                 __get_str(name), __entry->rwbs,
> > +                 (unsigned long long)__entry->sector,
> > +                 __entry->nr_sector, __entry->error)
> > +);
> > +
> >  DECLARE_EVENT_CLASS(block_rq,
> >
> >         TP_PROTO(struct request_queue *q, struct request *rq),
> > --
> > 2.21.1
> >
Steven Rostedt Jan. 25, 2022, 2:38 p.m. UTC | #7
On Tue, 25 Jan 2022 09:37:02 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > > +TRACE_EVENT(block_rq_error,
> > > +
> > > +       TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
> > > +
> > > +       TP_ARGS(rq, error, nr_bytes),
> > > +
> > > +       TP_STRUCT__entry(
> > > +               __field(  dev_t,        dev                     )
> > > +               __string( name,         rq->rq_disk ? rq->rq_disk->disk_name : "?")
> > > +               __field(  sector_t,     sector                  )
> > > +               __field(  unsigned int, nr_sector               )
> > > +               __field(  int,          error                   )
> > > +               __array(  char,         rwbs,   RWBS_LEN        )  
> 
> Why is the above not "__string" ?
> 
> > > +       ),
> > > +
> > > +       TP_fast_assign(
> > > +               __entry->dev       = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
> > > +               __assign_str(name,   rq->rq_disk ? rq->rq_disk->disk_name : "?");  
> 
> __assign_str() will not work on an __array() type. It only works here
> because you added it at the end, but it's just sheer luck that it didn't
> crash.

Never mind :-p  I see the above is for name which is __string, and the
array is for rwbs which is filled below. I need to finish my first cup of
coffee before reviewing patches.

-- Steve

> 
> 
> > > +               __entry->sector    = blk_rq_pos(rq);
> > > +               __entry->nr_sector = nr_bytes >> 9;
> > > +               __entry->error     = error;
> > > +
> > > +               blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
> > > +       ),
> > > +
> > > +       TP_printk("%d,%d %s %s %llu + %u [%d]",
> > > +                 MAJOR(__entry->dev), MINOR(__entry->dev),
> > > +                 __get_str(name), __entry->rwbs,
> > > +                 (unsigned long long)__entry->sector,
> > > +                 __entry->nr_sector, __entry->error)
> > > +);
> > > +
Yang Shi Jan. 25, 2022, 7:58 p.m. UTC | #8
On Tue, Jan 25, 2022 at 6:38 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 25 Jan 2022 09:37:02 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > > +TRACE_EVENT(block_rq_error,
> > > > +
> > > > +       TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
> > > > +
> > > > +       TP_ARGS(rq, error, nr_bytes),
> > > > +
> > > > +       TP_STRUCT__entry(
> > > > +               __field(  dev_t,        dev                     )
> > > > +               __string( name,         rq->rq_disk ? rq->rq_disk->disk_name : "?")
> > > > +               __field(  sector_t,     sector                  )
> > > > +               __field(  unsigned int, nr_sector               )
> > > > +               __field(  int,          error                   )
> > > > +               __array(  char,         rwbs,   RWBS_LEN        )
> >
> > Why is the above not "__string" ?
> >
> > > > +       ),
> > > > +
> > > > +       TP_fast_assign(
> > > > +               __entry->dev       = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
> > > > +               __assign_str(name,   rq->rq_disk ? rq->rq_disk->disk_name : "?");
> >
> > __assign_str() will not work on an __array() type. It only works here
> > because you added it at the end, but it's just sheer luck that it didn't
> > crash.
>
> Never mind :-p  I see the above is for name which is __string, and the
> array is for rwbs which is filled below. I need to finish my first cup of
> coffee before reviewing patches.

Never mind. Other than the code restructuring, I also found that a data
structure (struct request) and a function (blk_fill_rwbs) have changed.
I think I'd better rebase the patch onto 5.17-rc1 and then resubmit it.
Since there is no fundamental change to the patch, can I keep your
Reviewed-by tag?


>
> -- Steve
>
> >
> >
> > > > +               __entry->sector    = blk_rq_pos(rq);
> > > > +               __entry->nr_sector = nr_bytes >> 9;
> > > > +               __entry->error     = error;
> > > > +
> > > > +               blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
> > > > +       ),
> > > > +
> > > > +       TP_printk("%d,%d %s %s %llu + %u [%d]",
> > > > +                 MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > +                 __get_str(name), __entry->rwbs,
> > > > +                 (unsigned long long)__entry->sector,
> > > > +                 __entry->nr_sector, __entry->error)
> > > > +);
> > > > +
Steven Rostedt Jan. 25, 2022, 8:03 p.m. UTC | #9
On Tue, 25 Jan 2022 11:58:10 -0800
Yang Shi <shy828301@gmail.com> wrote:

> Never mind. Other than the code restructure, I also found some data
> structure (struct request) and function (blk_fill_rwbs) change. I
> think I'd better rebase the patch to 5.17-rc1 then resubmit it. Since
> there is no fundamental change to the patch, can I keep your
> reviewed-by tag?

Sure, but please Cc me.

-- Steve
Yang Shi Jan. 25, 2022, 8:19 p.m. UTC | #10
On Tue, Jan 25, 2022 at 12:03 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 25 Jan 2022 11:58:10 -0800
> Yang Shi <shy828301@gmail.com> wrote:
>
> > Never mind. Other than the code restructure, I also found some data
> > structure (struct request) and function (blk_fill_rwbs) change. I
> > think I'd better rebase the patch to 5.17-rc1 then resubmit it. Since
> > there is no fundamental change to the patch, can I keep your
> > reviewed-by tag?
>
> Sure, but please Cc me.

Yeah, definitely. Thanks.

>
> -- Steve

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 089e890ab208..0c7ad70d06be 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1450,8 +1450,10 @@  bool blk_update_request(struct request *req, blk_status_t error,
 #endif
 
 	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-		     !(req->rq_flags & RQF_QUIET)))
+		     !(req->rq_flags & RQF_QUIET))) {
+		trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
 		print_req_error(req, error, __func__);
+	}
 
 	blk_account_io_completion(req, nr_bytes);
 
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 81b43f5bdf23..575054e7cfa0 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -145,6 +145,47 @@  TRACE_EVENT(block_rq_complete,
 		  __entry->nr_sector, __entry->error)
 );
 
+/**
+ * block_rq_error - block IO operation error reported by device driver
+ * @rq: block operations request
+ * @error: status code
+ * @nr_bytes: number of completed bytes
+ *
+ * The block_rq_error tracepoint event indicates that some portion
+ * of operation request has failed as reported by the device driver.
+ */
+TRACE_EVENT(block_rq_error,
+
+	TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
+
+	TP_ARGS(rq, error, nr_bytes),
+
+	TP_STRUCT__entry(
+		__field(  dev_t,	dev			)
+		__string( name,		rq->rq_disk ? rq->rq_disk->disk_name : "?")
+		__field(  sector_t,	sector			)
+		__field(  unsigned int,	nr_sector		)
+		__field(  int,		error			)
+		__array(  char,		rwbs,	RWBS_LEN	)
+	),
+
+	TP_fast_assign(
+		__entry->dev	   = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
+		__assign_str(name,   rq->rq_disk ? rq->rq_disk->disk_name : "?");
+		__entry->sector    = blk_rq_pos(rq);
+		__entry->nr_sector = nr_bytes >> 9;
+		__entry->error     = error;
+
+		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
+	),
+
+	TP_printk("%d,%d %s %s %llu + %u [%d]",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __get_str(name), __entry->rwbs,
+		  (unsigned long long)__entry->sector,
+		  __entry->nr_sector, __entry->error)
+);
+
 DECLARE_EVENT_CLASS(block_rq,
 
 	TP_PROTO(struct request_queue *q, struct request *rq),
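
For readers who consume the event as text, the TP_printk() format in the patch above ("%d,%d %s %s %llu + %u [%d]") can be parsed with a matching sscanf() format. The following is a rough sketch only. The struct and function names are hypothetical, the buffer sizes are assumptions meant to mirror the kernel's DISK_NAME_LEN and RWBS_LEN, and a real consumer such as rasdaemon may read the binary fields through a tracing library instead of parsing text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32   /* assumption: mirrors the kernel's DISK_NAME_LEN */
#define RWBS_LEN 8    /* assumption: mirrors the kernel's RWBS_LEN */

/* User-space view of one block_rq_error event payload (hypothetical). */
struct rq_error_event {
    unsigned int major, minor;
    char name[NAME_LEN];
    char rwbs[RWBS_LEN];
    unsigned long long sector;
    unsigned int nr_sector;
    int error;
};

/*
 * Parse the part of a trace line that follows "block_rq_error: ".
 * Returns 0 if all seven fields were read, -1 otherwise.
 */
static int parse_rq_error(const char *payload, struct rq_error_event *ev)
{
    int n;

    assert(payload != NULL && ev != NULL);
    n = sscanf(payload, "%u,%u %31s %7s %llu + %u [%d]",
               &ev->major, &ev->minor, ev->name, ev->rwbs,
               &ev->sector, &ev->nr_sector, &ev->error);
    return n == 7 ? 0 : -1;
}

/* The event stores nr_bytes >> 9, i.e. 512-byte sectors; undo that here. */
static unsigned long long sectors_to_bytes(unsigned int nr_sector)
{
    return (unsigned long long)nr_sector << 9;
}
```

Given a made-up payload such as `8,0 sda W 2048 + 8 [-5]`, the parser yields major 8, minor 0, name "sda", and error -5, and `sectors_to_bytes(8)` gives 4096. When the request has no disk, the patch records "?" as the name and 0 as the device number, which the same format still accepts.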