[8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen

Message ID 20170831172728.15817-9-ming.lei@redhat.com (mailing list archive)
State Not Applicable, archived

Commit Message

Ming Lei Aug. 31, 2017, 5:27 p.m. UTC
RQF_PREEMPT requests are a bit special because they still have to be
dispatched to the LLD even when the SCSI device is quiesced.

So this patch introduces __blk_get_request() to allow the block
layer to allocate a request while the queue is frozen, since the
following patch will freeze the queue before quiescing the SCSI
device in order to support safe SCSI quiescing.
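
For illustration, a caller that must make progress while the device is
quiesced (for example the SCSI quiesce/resume handling that the rest of
this series deals with) would use the new interface roughly as below.
This is a sketch against the names added by this patch, not code taken
from the series itself:

#include <linux/blkdev.h>

/*
 * Sketch only: allocate a request that must still reach the driver while
 * the queue is frozen. BLK_REQ_PREEMPT (added by this patch) makes
 * __blk_get_request() enter the queue even when it is frozen.
 */
static struct request *get_preempt_request(struct request_queue *q,
					   unsigned int op)
{
	return __blk_get_request(q, op, GFP_KERNEL, BLK_REQ_PREEMPT);
}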

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 25 +++++++++++++++++--------
 block/blk-mq.c         | 11 +++++++++--
 block/blk.h            |  5 +++++
 include/linux/blk-mq.h |  7 ++++---
 include/linux/blkdev.h | 17 +++++++++++++++--
 5 files changed, 50 insertions(+), 15 deletions(-)

Comments

Bart Van Assche Aug. 31, 2017, 10:50 p.m. UTC | #1
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
>  	/* create ioc upfront */
>  	create_io_context(gfp_mask, q->node);
>  
> -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> +	/*
> +	 * When queue is frozen, we still need to allocate req for
> +	 * REQF_PREEMPT.
> +	 */
> +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> +		blk_queue_enter_live(q);
> +	else
> +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));

Hello Ming,

Sorry but I doubt that calling blk_queue_enter_live() from inside
blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
before a request queue has been marked dead. What prevents a kernel thread
that holds a reference on a request queue and that is running concurrently
with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
been marked dead?
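
For context, the difference between the two entry paths referred to here
can be sketched as follows; this is a simplified illustration, not the
exact kernel code:

#include <linux/blkdev.h>
#include <linux/percpu-refcount.h>

/* Simplified sketch: blk_queue_enter_live() has no failure path. */
static inline void enter_live_sketch(struct request_queue *q)
{
	/* Succeeds even after blk_cleanup_queue() has marked the queue dead. */
	percpu_ref_get(&q->q_usage_counter);
}

/* Simplified sketch of blk_queue_enter(): it can refuse a dying queue. */
static int enter_sketch(struct request_queue *q, bool nowait)
{
	if (percpu_ref_tryget_live(&q->q_usage_counter))
		return 0;		/* queue is live and not frozen */
	if (nowait)
		return -EBUSY;
	/*
	 * The real code sleeps on q->mq_freeze_wq and retries; it returns
	 * -ENODEV once blk_queue_dying(q) becomes true, a check that
	 * blk_queue_enter_live() never performs.
	 */
	return blk_queue_dying(q) ? -ENODEV : -EAGAIN;
}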

Thanks,

Bart.
Ming Lei Sept. 1, 2017, 3:55 a.m. UTC | #2
On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> >  	/* create ioc upfront */
> >  	create_io_context(gfp_mask, q->node);
> >  
> > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > +	/*
> > +	 * When queue is frozen, we still need to allocate req for
> > +	 * REQF_PREEMPT.
> > +	 */
> > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > +		blk_queue_enter_live(q);
> > +	else
> > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> 
> Hello Ming,
> 
> Sorry but I doubt that calling blk_queue_enter_live() from inside
> blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> before a request queue has been marked dead. What prevents a kernel thread
> that holds a reference on a request queue and that is running concurrently
> with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
> been marked dead?

I have sent a delta patch to the list, which will only call
blk_queue_enter_live() if the SCSI device is in the QUIESCE state.
Bart Van Assche Sept. 1, 2017, 3:43 p.m. UTC | #3
On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > >  	/* create ioc upfront */
> > >  	create_io_context(gfp_mask, q->node);
> > >  
> > > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > +	/*
> > > +	 * When queue is frozen, we still need to allocate req for
> > > +	 * REQF_PREEMPT.
> > > +	 */
> > > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > +		blk_queue_enter_live(q);
> > > +	else
> > > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > 
> > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > before a request queue has been marked dead. What prevents a kernel thread
> > that holds a reference on a request queue and that is running concurrently
> > with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
> > been marked dead?
> 
> I have sent a delta patch to the list, which will only call
> blk_queue_enter_live() if the SCSI device is in the QUIESCE state.

That wouldn't make this hack less ugly.

It is possible to trigger the running -> quiesce state transition through
/sys/class/scsi_device/*/device/state and the quiesce -> removed transition
through /sys/class/scsi_device/*/device/delete. An example:

modprobe scsi_debug delay=0 dev_size_mb=16
lsscsi | grep scsi_debug
cd /sys/class/scsi_device/8:0:0:0/device
echo quiesce > state
echo 1 > delete

So the code from your patch 8/9 can race against device removal.

I think we need a better approach than the REQ_PREEMPT hack. How about
implementing resume as a callback by adding an additional function pointer
to struct request_queue / struct blk_mq_ops instead of implementing it as
a request? For SCSI devices, races of resume against removal can e.g. be
handled by checking the scsi_device_set_state() return value. That function
does not allow the removing/removed to running state transition.
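
As a rough sketch of that idea (the resume_fn member and the SCSI helper
below are hypothetical, made up here only to illustrate the suggestion;
they do not exist upstream):

#include <linux/blk-mq.h>
#include <scsi/scsi_device.h>

/* Hypothetical: a resume hook instead of a preempt request. */
struct blk_resume_ops_sketch {
	void (*resume_fn)(struct request_queue *q);
};

/* Hypothetical SCSI-side implementation of such a hook. */
static void scsi_resume_fn_sketch(struct request_queue *q)
{
	struct scsi_device *sdev = q->queuedata;

	/*
	 * scsi_device_set_state() rejects the removing/removed -> running
	 * transition, so a resume racing with removal fails here instead of
	 * having to allocate a request against a frozen or dead queue.
	 */
	if (scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
		blk_mq_run_hw_queues(q, true);
}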

Bart.
Ming Lei Sept. 1, 2017, 4:56 p.m. UTC | #4
On Fri, Sep 01, 2017 at 03:43:39PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> > On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > > >  	/* create ioc upfront */
> > > >  	create_io_context(gfp_mask, q->node);
> > > >  
> > > > -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > > +	/*
> > > > +	 * When queue is frozen, we still need to allocate req for
> > > > +	 * REQF_PREEMPT.
> > > > +	 */
> > > > +	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > > +		blk_queue_enter_live(q);
> > > > +	else
> > > > +		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > 
> > > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > > before a request queue has been marked dead. What prevents a kernel thread
> > > that holds a reference on a request queue and that is running concurrently
> > > with blk_cleanup_queue() from calling blk_old_get_request() after a queue has
> > > been marked dead?
> > 
> > I have sent a delta patch to the list, which will only call
> > blk_queue_enter_live() if the SCSI device is in the QUIESCE state.
> 
> That wouldn't make this hack less ugly.
> 
> It is possible to trigger the running -> quiesce state transition through
> /sys/class/scsi_device/*/device/state and the quiesce -> removed transition
> through /sys/class/scsi_device/*/device/delete. An example:
> 
> modprobe scsi_debug delay=0 dev_size_mb=16
> lsscsi | grep scsi_debug
> cd /sys/class/scsi_device/8:0:0:0/device
> echo quiesce > state
> echo 1 > delete
> 
> So the code from your patch 8/9 can race against device removal.
> 
> I think we need a better approach than the REQ_PREEMPT hack. How about
> implementing resume as a callback by adding an additional function pointer
> to struct request_queue / struct blk_mq_ops instead of implementing it as
> a request? For SCSI devices, races of resume against removal can e.g. be
> handled by checking the scsi_device_set_state() return value. That function
> does not allow the removing/removed to running state transition.

If there is a race between resume and removal, it has nothing to do with
this patchset.

We definitely need to prevent new requests from being allocated after the
SCSI device is put into the quiesce state. I don't see a better way than
freezing the queue, because that is the only available way to prevent new
request allocation.

Actually, there is a race between normal freezing and the preempt freezing
introduced in this patchset; it will be fixed in V2.
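
For reference, the ordering this series aims for looks roughly like the
sketch below; the exact pairing and draining details live in the other
patches of the series, so treat this as an illustration only:

#include <linux/blk-mq.h>
#include <scsi/scsi_device.h>

/*
 * Sketch of the intended "safe quiesce": start freezing the queue first so
 * normal request allocation is blocked, then quiesce the device.
 * RQF_PREEMPT requests can still be allocated through the
 * __blk_get_request()/BLK_MQ_REQ_PREEMPT path added by this patch.
 */
static int safe_quiesce_sketch(struct scsi_device *sdev)
{
	struct request_queue *q = sdev->request_queue;

	blk_freeze_queue_start(q);
	return scsi_device_quiesce(sdev);
}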

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index a84672f4db32..4e048cbf82e1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1403,7 +1403,8 @@  static struct request *get_request(struct request_queue *q, unsigned int op,
 }
 
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, gfp_t gfp_mask)
+					   unsigned int op, gfp_t gfp_mask,
+					   unsigned int flags)
 {
 	struct request *rq;
 	int ret = 0;
@@ -1413,9 +1414,17 @@  static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+	/*
+	 * When queue is frozen, we still need to allocate req for
+	 * REQF_PREEMPT.
+	 */
+	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+		blk_queue_enter_live(q);
+	else
+		ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
 	if (ret)
 		return ERR_PTR(ret);
+
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
@@ -1431,26 +1440,26 @@  static struct request *blk_old_get_request(struct request_queue *q,
 	return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-				gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+				  gfp_t gfp_mask, unsigned int flags)
 {
 	struct request *req;
 
 	if (q->mq_ops) {
 		req = blk_mq_alloc_request(q, op,
-			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-				0 : BLK_MQ_REQ_NOWAIT);
+			flags | ((gfp_mask & __GFP_DIRECT_RECLAIM) ?
+				0 : BLK_MQ_REQ_NOWAIT));
 		if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
 			q->mq_ops->initialize_rq_fn(req);
 	} else {
-		req = blk_old_get_request(q, op, gfp_mask);
+		req = blk_old_get_request(q, op, gfp_mask, flags);
 		if (!IS_ERR(req) && q->initialize_rq_fn)
 			q->initialize_rq_fn(req);
 	}
 
 	return req;
 }
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
 
 /**
  * blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24de78afbe9a..bd0a498e04df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -382,9 +382,16 @@  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
-	int ret;
+	int ret = 0;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	/*
+	 * When queue is frozen, we still need to allocate req for
+	 * REQF_PREEMPT.
+	 */
+	if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+		blk_queue_enter_live(q);
+	else
+		ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk.h b/block/blk.h
index 6847c5435cca..784e487c7bb7 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -79,6 +79,11 @@  static inline void blk_queue_enter_live(struct request_queue *q)
 	percpu_ref_get(&q->q_usage_counter);
 }
 
+static inline bool blk_queue_is_frozen(struct request_queue *q)
+{
+	return percpu_ref_is_dead(&q->q_usage_counter);
+}
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 void blk_flush_integrity(void);
 bool __bio_integrity_endio(struct bio *);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f90d78eb85df..0ba5cb043172 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,9 +200,10 @@  void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
-	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
-	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
+	BLK_MQ_REQ_PREEMPT	= BLK_REQ_PREEMPT, /* allocate for RQF_PREEMPT */
+	BLK_MQ_REQ_NOWAIT	= (1 << 8), /* return when out of requests */
+	BLK_MQ_REQ_RESERVED	= (1 << 9), /* allocate from reserved pool */
+	BLK_MQ_REQ_INTERNAL	= (1 << 10), /* allocate internal/sched tag */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02959ca03b66..10369ba86102 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -858,6 +858,11 @@  enum {
 	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
 };
 
+/* passed to __blk_get_request */
+enum {
+	BLK_REQ_PREEMPT	= (1 << 0), /* allocate for RQF_PREEMPT */
+};
+
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
 
 /*
@@ -940,8 +945,9 @@  extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
-extern struct request *blk_get_request(struct request_queue *, unsigned int op,
-				       gfp_t gfp_mask);
+extern struct request *__blk_get_request(struct request_queue *,
+					 unsigned int op, gfp_t gfp_mask,
+					 unsigned int flags);
 extern void blk_requeue_request(struct request_queue *, struct request *);
 extern int blk_lld_busy(struct request_queue *q);
 extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
@@ -992,6 +998,13 @@  blk_status_t errno_to_blk_status(int errno);
 
 bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 
+static inline struct request *blk_get_request(struct request_queue *q,
+					      unsigned int op,
+					      gfp_t gfp_mask)
+{
+	return __blk_get_request(q, op, gfp_mask, 0);
+}
+
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
 	return bdev->bd_disk->queue;	/* this is never NULL */