Message ID | 20190927081955.44680-1-yuyufen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v5] block: fix null pointer dereference in blk_mq_rq_timed_out() |
On Fri, Sep 27, 2019 at 04:19:55PM +0800, Yufen Yu wrote:
> [...]

Reviewed-by: Ming Lei <ming.lei@redhat.com>

thanks,
Ming
On 9/27/19 4:19 PM, Yufen Yu wrote:
> [...]

Looks good to me.
Reviewed-by: Bob Liu <bob.liu@oracle.com>
On 9/27/19 10:19 AM, Yufen Yu wrote:
> [...]

Thanks, applied.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index aedd9320e605..1eec9cbe5a0a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -214,6 +214,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
+
+	if (!refcount_dec_and_test(&flush_rq->ref)) {
+		fq->rq_status = error;
+		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
+		return;
+	}
+
+	if (fq->rq_status != BLK_STS_OK)
+		error = fq->rq_status;
+
 	hctx = flush_rq->mq_hctx;
 	if (!q->elevator) {
 		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20a49be536b5..e04fa9ab5574 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -912,7 +912,10 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	 */
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
-	if (refcount_dec_and_test(&rq->ref))
+
+	if (is_flush_rq(rq, hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 
 	return true;
diff --git a/block/blk.h b/block/blk.h
index ed347f7a97b1..2d8cdafee799 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -19,6 +19,7 @@ struct blk_flush_queue {
 	unsigned int		flush_queue_delayed:1;
 	unsigned int		flush_pending_idx:1;
 	unsigned int		flush_running_idx:1;
+	blk_status_t 		rq_status;
 	unsigned long		flush_pending_since;
 	struct list_head	flush_queue[2];
 	struct list_head	flush_data_in_flight;
@@ -47,6 +48,12 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
+static inline bool
+is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
+{
+	return hctx->fq->flush_rq == req;
+}
+
 struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
 		int node, int cmd_size, gfp_t flags);
 void blk_free_flush_queue(struct blk_flush_queue *q);
We got a NULL pointer dereference BUG in blk_mq_rq_timed_out(), as
follows:

[  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[  108.827059] PGD 0 P4D 0
[  108.827313] Oops: 0000 [#1] SMP PTI
[  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[  108.829503] Workqueue: kblockd blk_mq_timeout_work
[  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[  108.838191] Call Trace:
[  108.838406]  bt_iter+0x74/0x80
[  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
[  108.839074]  ? __switch_to_asm+0x34/0x70
[  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
[  108.840732]  blk_mq_timeout_work+0x74/0x200
[  108.841151]  process_one_work+0x297/0x680
[  108.841550]  worker_thread+0x29c/0x6f0
[  108.841926]  ? rescuer_thread+0x580/0x580
[  108.842344]  kthread+0x16a/0x1a0
[  108.842666]  ? kthread_flush_work+0x170/0x170
[  108.843100]  ret_from_fork+0x35/0x40

The bug is caused by a race between the timeout handler and the
completion of a flush request.

When the timeout handler blk_mq_rq_timed_out() tries to read
'req->q->mq_ops', 'req' may already have completed and been
reinitialized by the next flush request, which calls blk_rq_init() to
zero 'req'.

After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"), the
lifetime of a normal request is protected by a refcount. The request
can only be freed once 'rq->ref' drops to zero. Thus, these requests
cannot be reused before the timeout handler finishes.

However, a flush request defines .end_io, and rq->end_io() is still
called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
can be reused by the next flush request, resulting in the NULL pointer
dereference.

We fix this problem by covering the flush request with 'rq->ref'.
If the refcount is not zero, flush_end_io() returns and waits for the
last holder to call it again. To record the request status, we add a new
entry 'rq_status', which is used in flush_end_io().

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Yufen Yu <yuyufen@huawei.com>

-------
v2:
 - move rq_status from struct request to struct blk_flush_queue
v3:
 - remove unnecessary '{}' pair.
v4:
 - let spinlock protect 'fq->rq_status'
v5:
 - move rq_status after flush_running_idx member of struct blk_flush_queue
---
 block/blk-flush.c | 10 ++++++++++
 block/blk-mq.c    |  5 ++++-
 block/blk.h       |  7 +++++++
 3 files changed, 21 insertions(+), 1 deletion(-)