
block: only call sched requeue_request() for scheduled requests

Message ID 80b64d8a0408e7c8c30b363aaf6d39e735521c1d.1599597940.git.osandov@osandov.com (mailing list archive)
State New, archived
Series block: only call sched requeue_request() for scheduled requests

Commit Message

Omar Sandoval Sept. 8, 2020, 8:46 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

Yang Yang reported the following crash caused by requeueing a flush
request in Kyber:

  [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
  ...
  [    2.517468] pc : clear_bit+0x18/0x2c
  [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
  [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
  ...
  [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
  [    2.517602] Call trace:
  [    2.517606]  clear_bit+0x18/0x2c
  [    2.517619]  kyber_finish_request+0x74/0x80
  [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
  [    2.517637]  __scsi_queue_insert+0x11c/0x148
  [    2.517640]  scsi_softirq_done+0x114/0x130
  [    2.517643]  blk_done_softirq+0x7c/0xb0
  [    2.517651]  __do_softirq+0x208/0x3bc
  [    2.517657]  run_ksoftirqd+0x34/0x60
  [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
  [    2.517667]  kthread+0x110/0x120
  [    2.517669]  ret_from_fork+0x10/0x18

This happens because Kyber doesn't track flush requests, so
kyber_finish_request() reads a garbage domain token. Only call the
scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
the finish_request() hook in blk_mq_free_request()). Now that we're
handling it in blk-mq, also remove the check from BFQ.

Reported-by: Yang Yang <yang.yang@vivo.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/bfq-iosched.c  | 12 ------------
 block/blk-mq-sched.h |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)
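
The guard described in the commit message can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code: the struct layouts, the flag value, the sched_token field, and the model_* names are all assumptions made for illustration. It only shows that a request without RQF_ELVPRIV (as the flush request in the report appears to be) never reaches the scheduler's requeue hook, while a scheduled request still does.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the bit value is arbitrary here; the kernel defines its own. */
#define RQF_ELVPRIV (1u << 0)

/* Simplified stand-in for struct request (hypothetical fields). */
struct request {
	unsigned int rq_flags;
	int sched_token;	/* stands in for scheduler-private data */
};

/* Simplified stand-in for the elevator ops table. */
struct elevator_ops {
	void (*requeue_request)(struct request *rq);
};

/* Counts hook invocations so callers can observe the guard. */
static int hook_calls;

/* Hypothetical scheduler hook: touches scheduler-private data. */
static void model_requeue_hook(struct request *rq)
{
	hook_calls++;
	rq->sched_token = -1;
}

/*
 * Models the patched blk_mq_sched_requeue_request(): the hook runs only
 * when the request carries scheduler-private state (RQF_ELVPRIV).
 */
static void model_sched_requeue_request(const struct elevator_ops *ops,
					struct request *rq)
{
	if ((rq->rq_flags & RQF_ELVPRIV) && ops && ops->requeue_request)
		ops->requeue_request(rq);
}
```

Calling model_sched_requeue_request() with rq_flags == 0 leaves both hook_calls and sched_token unchanged, which is the behaviour the fix relies on. Putting the check in the common helper rather than in each scheduler is what lets the patch drop the equivalent early return from BFQ.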

Comments

Jens Axboe Sept. 8, 2020, 11:42 p.m. UTC | #1
On 9/8/20 2:46 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Yang Yang reported the following crash caused by requeueing a flush
> request in Kyber:
> 
>   [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
>   ...
>   [    2.517468] pc : clear_bit+0x18/0x2c
>   [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
>   [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
>   ...
>   [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
>   [    2.517602] Call trace:
>   [    2.517606]  clear_bit+0x18/0x2c
>   [    2.517619]  kyber_finish_request+0x74/0x80
>   [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
>   [    2.517637]  __scsi_queue_insert+0x11c/0x148
>   [    2.517640]  scsi_softirq_done+0x114/0x130
>   [    2.517643]  blk_done_softirq+0x7c/0xb0
>   [    2.517651]  __do_softirq+0x208/0x3bc
>   [    2.517657]  run_ksoftirqd+0x34/0x60
>   [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
>   [    2.517667]  kthread+0x110/0x120
>   [    2.517669]  ret_from_fork+0x10/0x18
> 
> This happens because Kyber doesn't track flush requests, so
> kyber_finish_request() reads a garbage domain token. Only call the
> scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
> the finish_request() hook in blk_mq_free_request()). Now that we're
> handling it in blk-mq, also remove the check from BFQ.

Thanks, applied.

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a4c0bec920cb..ee767fa000e4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5895,18 +5895,6 @@ static void bfq_finish_requeue_request(struct request *rq)
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
 
-	/*
-	 * Requeue and finish hooks are invoked in blk-mq without
-	 * checking whether the involved request is actually still
-	 * referenced in the scheduler. To handle this fact, the
-	 * following two checks make this function exit in case of
-	 * spurious invocations, for which there is nothing to do.
-	 *
-	 * First, check whether rq has nothing to do with an elevator.
-	 */
-	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
-		return;
-
 	/*
 	 * rq either is not associated with any icq, or is an already
 	 * requeued request that has not (yet) been re-inserted into
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..e81ca1bf6e10 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -66,7 +66,7 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
 	struct request_queue *q = rq->q;
 	struct elevator_queue *e = q->elevator;
 
-	if (e && e->type->ops.requeue_request)
+	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
 		e->type->ops.requeue_request(rq);
 }