[2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()

Message ID 20191021224259.209542-3-bvanassche@acm.org
State New
Series
  • Reduce the amount of memory required for request queues

Commit Message

Bart Van Assche Oct. 21, 2019, 10:42 p.m. UTC
If blk_poll() is called while no requests are in progress, it may happen
that blk_mq_update_nr_hw_queues() modifies the data structures used by
blk_poll(), e.g. q->queue_hw_ctx[]. Fix this race by serializing blk_poll()
against blk_mq_update_nr_hw_queues().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

Comments

Ming Lei Oct. 22, 2019, 9:41 a.m. UTC | #1
On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
> If blk_poll() is called while no requests are in progress, it may happen
> that blk_mq_update_nr_hw_queues() modifies the data structures used by
> blk_poll(), e.g. q->queue_hw_ctx[]. Fix this race by serializing blk_poll()
> against blk_mq_update_nr_hw_queues().
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 38 +++++++++++++++++++++++++-------------
>  1 file changed, 25 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7528678ef41f..ea64d951f411 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3439,19 +3439,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
>  	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
>  }
>  
> -/**
> - * blk_poll - poll for IO completions
> - * @q:  the queue
> - * @cookie: cookie passed back at IO submission time
> - * @spin: whether to spin for completions
> - *
> - * Description:
> - *    Poll for completions on the passed in queue. Returns number of
> - *    completed entries found. If @spin is true, then blk_poll will continue
> - *    looping until at least one completion is found, unless the task is
> - *    otherwise marked running (or we need to reschedule).
> - */
> -int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> +static int __blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	long state;
> @@ -3503,6 +3491,30 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>  	__set_current_state(TASK_RUNNING);
>  	return 0;
>  }
> +
> +/**
> + * blk_poll - poll for IO completions
> + * @q:  the queue
> + * @cookie: cookie passed back at IO submission time
> + * @spin: whether to spin for completions
> + *
> + * Description:
> + *    Poll for completions on the passed in queue. Returns number of
> + *    completed entries found. If @spin is true, then blk_poll will continue
> + *    looping until at least one completion is found, unless the task is
> + *    otherwise marked running (or we need to reschedule).
> + */
> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> +{
> +	int ret;
> +
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +		return 0;
> +	ret = __blk_poll(q, cookie, spin);
> +	blk_queue_exit(q);
> +
> +	return ret;
> +}

IMO, this change isn't required. Callers of blk_poll() are supposed to
hold a refcount on the request queue, so the related hctx data structures
won't go away. While an hctx is in a transient state there can't be any
IO to poll, so it is safe to call into the IO path.

BTW, .poll is absolutely the fast path, so we should be careful about
adding code to it.

Thanks,
Ming
Bart Van Assche Oct. 23, 2019, 8:58 p.m. UTC | #2
On 2019-10-22 02:41, Ming Lei wrote:
> On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
>> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>> +{
>> +	int ret;
>> +
>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>> +		return 0;
>> +	ret = __blk_poll(q, cookie, spin);
>> +	blk_queue_exit(q);
>> +
>> +	return ret;
>> +}
> 
> IMO, this change isn't required. Callers of blk_poll() are supposed to
> hold a refcount on the request queue, so the related hctx data structures
> won't go away. While an hctx is in a transient state there can't be any
> IO to poll, so it is safe to call into the IO path.
>
> BTW, .poll is absolutely the fast path, so we should be careful about
> adding code to it.

Hi Ming,

I'm not sure whether all blk_poll() callers really hold a refcount on
the request queue. Anyway, I will convert this code change into a
comment that explains that blk_poll() callers must hold a request queue
reference.

Bart.
Ming Lei Oct. 24, 2019, 12:33 a.m. UTC | #3
On Wed, Oct 23, 2019 at 01:58:24PM -0700, Bart Van Assche wrote:
> On 2019-10-22 02:41, Ming Lei wrote:
> > On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
> >> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> >> +{
> >> +	int ret;
> >> +
> >> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> >> +		return 0;
> >> +	ret = __blk_poll(q, cookie, spin);
> >> +	blk_queue_exit(q);
> >> +
> >> +	return ret;
> >> +}
> > 
> > IMO, this change isn't required. Callers of blk_poll() are supposed to
> > hold a refcount on the request queue, so the related hctx data structures
> > won't go away. While an hctx is in a transient state there can't be any
> > IO to poll, so it is safe to call into the IO path.
> >
> > BTW, .poll is absolutely the fast path, so we should be careful about
> > adding code to it.
> 
> Hi Ming,
> 
> I'm not sure whether all blk_poll() callers really hold a refcount on
> the request queue.

Please see __device_add_disk(), which takes an extra queue reference that
isn't dropped until the disk is released.

Meanwhile, blkdev_get() holds a gendisk reference via bdev_get_gendisk(),
and blkdev_get() is called by blkdev_open().

Together these guarantee that the request queue won't go away while IO is
submitted to the queue via a blkdev fs inode.

> Anyway, I will convert this code change into a
> comment that explains that blk_poll() callers must hold a request queue
> reference.

Good point.


Thanks,
Ming

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7528678ef41f..ea64d951f411 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3439,19 +3439,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-/**
- * blk_poll - poll for IO completions
- * @q:  the queue
- * @cookie: cookie passed back at IO submission time
- * @spin: whether to spin for completions
- *
- * Description:
- *    Poll for completions on the passed in queue. Returns number of
- *    completed entries found. If @spin is true, then blk_poll will continue
- *    looping until at least one completion is found, unless the task is
- *    otherwise marked running (or we need to reschedule).
- */
-int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+static int __blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 	long state;
@@ -3503,6 +3491,30 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	__set_current_state(TASK_RUNNING);
 	return 0;
 }
+
+/**
+ * blk_poll - poll for IO completions
+ * @q:  the queue
+ * @cookie: cookie passed back at IO submission time
+ * @spin: whether to spin for completions
+ *
+ * Description:
+ *    Poll for completions on the passed in queue. Returns number of
+ *    completed entries found. If @spin is true, then blk_poll will continue
+ *    looping until at least one completion is found, unless the task is
+ *    otherwise marked running (or we need to reschedule).
+ */
+int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	int ret;
+
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return 0;
+	ret = __blk_poll(q, cookie, spin);
+	blk_queue_exit(q);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(blk_poll);
 
 unsigned int blk_mq_rq_cpu(struct request *rq)