Message ID | 20200907074346.5383-1-yang.yang@vivo.com |
---|---|
State | New, archived |
Series | kyber: Fix crash in kyber_finish_request() |
CC Omar

On 9/7/20 1:43 AM, Yang Yang wrote:
> Kernel crash when requeue flush request.
> It can be reproduced as below:
> [...]
On Mon, Sep 07, 2020 at 10:41:16AM -0600, Jens Axboe wrote:
> CC Omar
>
> On 9/7/20 1:43 AM, Yang Yang wrote:
> > Kernel crash when requeue flush request.
> > [...]
> > @@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
> >  {
> >  	struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
> >  
> > +	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
> > +		return;
> > +
> >  	rq_clear_domain_token(kqd, rq);
> >  }

It looks like BFQ also has this check. Wouldn't it make more sense to
check it in blk-mq, like we do for .finish_request() in
blk_mq_free_request()?
Something along these lines:

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c34b090178a9..fa98470df3f0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5895,18 +5895,6 @@ static void bfq_finish_requeue_request(struct request *rq)
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
 
-	/*
-	 * Requeue and finish hooks are invoked in blk-mq without
-	 * checking whether the involved request is actually still
-	 * referenced in the scheduler. To handle this fact, the
-	 * following two checks make this function exit in case of
-	 * spurious invocations, for which there is nothing to do.
-	 *
-	 * First, check whether rq has nothing to do with an elevator.
-	 */
-	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
-		return;
-
 	/*
 	 * rq either is not associated with any icq, or is an already
 	 * requeued request that has not (yet) been re-inserted into
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..e81ca1bf6e10 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -66,7 +66,7 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
 	struct request_queue *q = rq->q;
 	struct elevator_queue *e = q->elevator;
 
-	if (e && e->type->ops.requeue_request)
+	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
 		e->type->ops.requeue_request(rq);
 }
On 9/8/20 1:00 PM, Omar Sandoval wrote:
> On Mon, Sep 07, 2020 at 10:41:16AM -0600, Jens Axboe wrote:
>> CC Omar
>>
>> On 9/7/20 1:43 AM, Yang Yang wrote:
>>> Kernel crash when requeue flush request.
>>> [...]
>
> It looks like BFQ also has this check. Wouldn't it make more sense to
> check it in blk-mq, like we do for .finish_request() in
> blk_mq_free_request()? Something along these lines:

Yeah I think so, that's much better than working around it in the consumer of it.
Kernel crash when requeue flush request.
It can be reproduced as below:

[ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[ 2.517468] pc : clear_bit+0x18/0x2c
[ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
[ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[ 2.517602] Call trace:
[ 2.517606] clear_bit+0x18/0x2c
[ 2.517619] kyber_finish_request+0x74/0x80
[ 2.517627] blk_mq_requeue_request+0x3c/0xc0
[ 2.517637] __scsi_queue_insert+0x11c/0x148
[ 2.517640] scsi_softirq_done+0x114/0x130
[ 2.517643] blk_done_softirq+0x7c/0xb0
[ 2.517651] __do_softirq+0x208/0x3bc
[ 2.517657] run_ksoftirqd+0x34/0x60
[ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
[ 2.517667] kthread+0x110/0x120
[ 2.517669] ret_from_fork+0x10/0x18

Signed-off-by: Yang Yang <yang.yang@vivo.com>
---
 block/kyber-iosched.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a38c5ab103d1..af73afe7a05c 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -611,6 +611,9 @@ static void kyber_finish_request(struct request *rq)
 {
 	struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
 
+	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
+		return;
+
 	rq_clear_domain_token(kqd, rq);
 }