[V6,13/14] block: mq-deadline: Limit write request dispatch for zoned block devices

Message ID 20171002071535.8007-14-damien.lemoal@wdc.com (mailing list archive)
State Changes Requested, archived

Commit Message

Damien Le Moal Oct. 2, 2017, 7:15 a.m. UTC
When dispatching write requests to a zoned block device, only allow
requests targeting an unlocked zone. Requests targeting a locked zone
are left in the scheduler queue to preserve the initial write order.
If no write request can be dispatched, allow reads to be dispatched
even if the write batch is not done.

To ensure that the search for an appropriate write request in
deadline_fifo_request() and deadline_next_request() is atomic with
regard to the zone lock state of write requests, introduce the
spinlock zone_lock. Holding this lock while searching in these
functions, as well as when unlocking the target zone of a completed
write request in dd_completed_request(), ensures that the search does
not pick up a write request in the middle of a zone's queued write
sequence.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 3 deletions(-)

Comments

Bart Van Assche Oct. 2, 2017, 11:44 p.m. UTC | #1
On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> When dispatching write requests to a zoned block device, only allow
> requests targeting an unlocked zone. Requests targeting a locked zone
> are left in the scheduler queue to preserve the initial write order.
> If no write request can be dispatched, allow reads to be dispatched
> even if the write batch is not done.
>
> To ensure that the search for an appropriate write request is atomic
> in deadline_fifo_request() and deadline_next_request() with reagrd to
                                                              ^^^^^^
                                                              regard?
> write requests zone lock state, introduce the spinlock zone_lock.
> Holding this lock while doing the search in these functions as well as
> when unlocking the target zone of a completed write request in
> dd_completed_request() ensure that the search does not pickup a write
> request in the middle of a zone queued write sequence.

Since there is already a spinlock in the mq-deadline scheduler that serializes
most operations, do we really need a new spinlock?

>  /*
>   * Write unlock the target zone of a write request.
> + * Clearing the target zone write lock bit is done with the scheduler zone_lock
> + * spinlock held so that deadline_next_request() and deadline_fifo_request()
> + * cannot see the lock state of a zone change due to a request completion during
> + * their eventual search for an appropriate write request. Otherwise, for a zone
> + * with multiple write requests queued, a non sequential write request
> + * can be chosen.
>   */
>  static void deadline_wunlock_zone(struct deadline_data *dd,
>  				  struct request *rq)
>  {
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dd->zone_lock, flags);
> +
>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>  	deadline_clear_request_zone_wlock(rq);
> +
> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>  }

Is a request removed from the sort and fifo lists before being dispatched? If so,
does that mean that obtaining zone_lock in the above function is not necessary?

>  static struct request *
>  deadline_fifo_request(struct deadline_data *dd, int data_dir)
>  {
> +	struct request *rq;
> +	unsigned long flags;
> +
>  	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
>  		return NULL;
>  
>  	if (list_empty(&dd->fifo_list[data_dir]))
>  		return NULL;
>  
> -	return rq_entry_fifo(dd->fifo_list[data_dir].next);
> +	if (!dd->zones_wlock || data_dir == READ)
> +		return rq_entry_fifo(dd->fifo_list[data_dir].next);
> +
> +	spin_lock_irqsave(&dd->zone_lock, flags);
> +
> +	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
> +		if (deadline_can_dispatch_request(dd, rq))
> +			goto out;
> +	}
> +	rq = NULL;
> +
> +out:
> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
> +
> +	return rq;
>  }

Same question here: is it really necessary to obtain zone_lock?

Thanks,

Bart.
Damien Le Moal Oct. 3, 2017, 12:19 a.m. UTC | #2
Bart,

On 10/3/17 08:44, Bart Van Assche wrote:
> On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
>> When dispatching write requests to a zoned block device, only allow
>> requests targeting an unlocked zone. Requests targeting a locked zone
>> are left in the scheduler queue to preserve the initial write order.
>> If no write request can be dispatched, allow reads to be dispatched
>> even if the write batch is not done.
>>
>> To ensure that the search for an appropriate write request is atomic
>> in deadline_fifo_request() and deadline_next_request() with reagrd to
>                                                               ^^^^^^
>                                                               regard?

Will fix.

>> write requests zone lock state, introduce the spinlock zone_lock.
>> Holding this lock while doing the search in these functions as well as
>> when unlocking the target zone of a completed write request in
>> dd_completed_request() ensure that the search does not pickup a write
>> request in the middle of a zone queued write sequence.
> 
> Since there is already a spinlock in the mq-deadline scheduler that serializes
> most operations, do we really need a new spinlock?

The dd->lock spinlock is used without IRQ disabling, so it does not
protect against the request completion method, which can run from IRQ
context. That spinlock being a "big lock", I did not want to start
taking it with IRQs disabled.

>>  /*
>>   * Write unlock the target zone of a write request.
>> + * Clearing the target zone write lock bit is done with the scheduler zone_lock
>> + * spinlock held so that deadline_next_request() and deadline_fifo_request()
>> + * cannot see the lock state of a zone change due to a request completion during
>> + * their eventual search for an appropriate write request. Otherwise, for a zone
>> + * with multiple write requests queued, a non sequential write request
>> + * can be chosen.
>>   */
>>  static void deadline_wunlock_zone(struct deadline_data *dd,
>>  				  struct request *rq)
>>  {
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>> +
>>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>>  	deadline_clear_request_zone_wlock(rq);
>> +
>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>>  }
> 
> Is a request removed from the sort and fifo lists before being dispatched? If so,
> does that mean that obtaining zone_lock in the above function is not necessary?

Yes, a request is removed from the sort tree and fifo list before
dispatching. But the dd->zone_lock spinlock is not there to protect
that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
added to prevent the completed_request() method from changing a zone's
lock state while deadline_fifo_request() or deadline_next_request() are
running. Ex:

Imagine this case: write request A for a zone Z is being executed (it
was dispatched) so Z is locked. Dispatch runs and inspects the next
requests in sort order. Let's say we have the sequential writes B, C, D, E
queued for the same zone Z. First B is inspected and cannot be
dispatched (zone locked). Inspection moves on to C, but before that,
A completes and Z is unlocked. C is then OK to go as the zone is
now unlocked. But it is the wrong choice as it will result in an
out-of-order write: B must be the next request dispatched after A.

dd->zone_lock prevents this from happening. Without this spinlock, the
bad example case above happens very easily.
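
The same race, laid out as a timeline (hypothetical interleaving;
function names as in the patch):

/*
 * Zone Z is write-locked by in-flight request A; sequential writes
 * B, C, D, E for Z are still queued:
 *
 *   dispatch-time search               completion path
 *   --------------------               ------------------------------
 *   inspect B: Z locked, skip B
 *                                      A completes
 *                                      deadline_wunlock_zone(): Z free
 *   inspect C: Z unlocked, pick C      <- wrong, B must go first
 *
 * Holding dd->zone_lock across the whole search, and taking it in
 * deadline_wunlock_zone(), keeps the zone lock state stable for the
 * duration of the walk.
 */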

>>  static struct request *
>>  deadline_fifo_request(struct deadline_data *dd, int data_dir)
>>  {
>> +	struct request *rq;
>> +	unsigned long flags;
>> +
>>  	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
>>  		return NULL;
>>  
>>  	if (list_empty(&dd->fifo_list[data_dir]))
>>  		return NULL;
>>  
>> -	return rq_entry_fifo(dd->fifo_list[data_dir].next);
>> +	if (!dd->zones_wlock || data_dir == READ)
>> +		return rq_entry_fifo(dd->fifo_list[data_dir].next);
>> +
>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>> +
>> +	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
>> +		if (deadline_can_dispatch_request(dd, rq))
>> +			goto out;
>> +	}
>> +	rq = NULL;
>> +
>> +out:
>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>> +
>> +	return rq;
>>  }
> 
> Same question here: is it really necessary to obtain zone_lock?

See above. Same comment.

Best regards.
Bart Van Assche Oct. 3, 2017, 8:56 p.m. UTC | #3
On Tue, 2017-10-03 at 09:19 +0900, Damien Le Moal wrote:
> On 10/3/17 08:44, Bart Van Assche wrote:
> > On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> > >  static void deadline_wunlock_zone(struct deadline_data *dd,
> > >  				  struct request *rq)
> > >  {
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&dd->zone_lock, flags);
> > > +
> > >  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
> > >  	deadline_clear_request_zone_wlock(rq);
> > > +
> > > +	spin_unlock_irqrestore(&dd->zone_lock, flags);
> > >  }
> >
> > Is a request removed from the sort and fifo lists before being dispatched? If so,
> > does that mean that obtaining zone_lock in the above function is not necessary?
>
> Yes, a request is removed from the sort tree and fifo list before
> dispatching. But the dd->zone_lock spinlock is not there to protect
> that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
> added to prevent the completed_request() method from changing a zone's
> lock state while deadline_fifo_request() or deadline_next_request() are
> running. Ex:
>
> Imagine this case: write request A for a zone Z is being executed (it
> was dispatched) so Z is locked. Dispatch runs and inspects the next
> requests in sort order. Let's say we have the sequential writes B, C, D, E
> queued for the same zone Z. First B is inspected and cannot be
> dispatched (zone locked). Inspection moves on to C, but before that,
> A completes and Z is unlocked. C is then OK to go as the zone is
> now unlocked. But it is the wrong choice as it will result in an
> out-of-order write: B must be the next request dispatched after A.
>
> dd->zone_lock prevents this from happening. Without this spinlock, the
> bad example case above happens very easily.

Hello Damien,

Thanks for the detailed and really clear reply. I hope you do not mind that I
have a few further questions about this patch?
- Does the zone_lock spinlock have to be acquired by both deadline_wunlock_zone()
  callers or only by the call from the request completion path?
- Why do both the mq-deadline and the sd driver each have their own instance of
  the zones_wlock bitmap? Has it been considered to convert both bitmaps into a
  single bitmap that is shared by both kernel components and that exists e.g. at
  the request queue level?

Thanks,

Bart.
Damien Le Moal Oct. 3, 2017, 11:03 p.m. UTC | #4
Bart,

On 10/4/17 05:56, Bart Van Assche wrote:
> On Tue, 2017-10-03 at 09:19 +0900, Damien Le Moal wrote:
>> On 10/3/17 08:44, Bart Van Assche wrote:
>>> On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
>>>>  static void deadline_wunlock_zone(struct deadline_data *dd,
>>>>  				  struct request *rq)
>>>>  {
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>>>> +
>>>>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>>>>  	deadline_clear_request_zone_wlock(rq);
>>>> +
>>>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>>>>  }
>>>
>>> Is a request removed from the sort and fifo lists before being dispatched? If so,
>>> does that mean that obtaining zone_lock in the above function is not necessary?
>>
>> Yes, a request is removed from the sort tree and fifo list before
>> dispatching. But the dd->zone_lock spinlock is not there to protect
>> that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
>> added to prevent the completed_request() method from changing a zone's
>> lock state while deadline_fifo_request() or deadline_next_request() are
>> running. Ex:
>>
>> Imagine this case: write request A for a zone Z is being executed (it
>> was dispatched) so Z is locked. Dispatch runs and inspects the next
>> requests in sort order. Let's say we have the sequential writes B, C, D, E
>> queued for the same zone Z. First B is inspected and cannot be
>> dispatched (zone locked). Inspection moves on to C, but before that,
>> A completes and Z is unlocked. C is then OK to go as the zone is
>> now unlocked. But it is the wrong choice as it will result in an
>> out-of-order write: B must be the next request dispatched after A.
>>
>> dd->zone_lock prevents this from happening. Without this spinlock, the
>> bad example case above happens very easily.
> 
> Hello Damien,
> 
> Thanks for the detailed and really clear reply. I hope you do not mind that I
> have a few further questions about this patch?
> - Does the zone_lock spinlock have to be acquired by both deadline_wunlock_zone()
>   callers or only by the call from the request completion path?

Not really. Only the call from the completion path strictly needs it,
as the insert path unlock is done under dd->lock, which prevents
concurrent execution of the sort or fifo request search. So the
zone_lock lock/unlock could be moved out of deadline_wunlock_zone()
and done directly in dd_completed_request().
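
A sketch of that alternative, with the locking hoisted into the
completion path; deadline_request_has_zone_wlock() is an assumed helper
paired with deadline_clear_request_zone_wlock(), not a function shown
in the patch:

/* Callers now provide the serialization. */
static void deadline_wunlock_zone(struct deadline_data *dd,
				  struct request *rq)
{
	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
	deadline_clear_request_zone_wlock(rq);
}

static void dd_completed_request(struct request *rq)
{
	struct deadline_data *dd = rq->q->elevator->elevator_data;
	unsigned long flags;

	/*
	 * Only the completion path races with the dispatch-time search;
	 * the insert-path unlock already runs under dd->lock.
	 */
	if (deadline_request_has_zone_wlock(rq)) {
		spin_lock_irqsave(&dd->zone_lock, flags);
		deadline_wunlock_zone(dd, rq);
		spin_unlock_irqrestore(&dd->zone_lock, flags);
	}
}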

> - Why do both the mq-deadline and the sd driver each have their own instance of
>   the zones_wlock bitmap? Has it been considered to convert both bitmaps into a
>   single bitmap that is shared by both kernel components and that exists e.g. at
>   the request queue level?

The sd driver level zone locking handles only the legacy path, hence
the zone lock bitmap attached to the scsi disk struct. For
scsi-mq/blk-mq, mq-deadline does the zone locking. The two bitmaps
never exist at the same time.

Indeed, we could move the zone lock bitmap into the request queue so
that it is common between the legacy and mq cases. Christoph has a
series doing that, and going further by doing the zone locking directly
within the block layer dispatch code for both the legacy and mq paths,
which makes the scsi level locking and the mq-deadline locking
unnecessary. I am working on that new series right now.

Best regards.

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 93a1aede5dd0..cbca24e1f96e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -61,6 +61,7 @@  struct deadline_data {
 	spinlock_t lock;
 	struct list_head dispatch;
 
+	spinlock_t zone_lock;
 	unsigned long *zones_wlock;
 };
 
@@ -244,12 +245,24 @@  static void deadline_wlock_zone(struct deadline_data *dd,
 
 /*
  * Write unlock the target zone of a write request.
+ * Clearing the target zone write lock bit is done with the scheduler zone_lock
+ * spinlock held so that deadline_next_request() and deadline_fifo_request()
+ * cannot see the lock state of a zone change due to a request completion during
+ * their eventual search for an appropriate write request. Otherwise, for a zone
+ * with multiple write requests queued, a non sequential write request
+ * can be chosen.
  */
 static void deadline_wunlock_zone(struct deadline_data *dd,
 				  struct request *rq)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+
 	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
 	deadline_clear_request_zone_wlock(rq);
+
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
 }
 
 /*
@@ -279,19 +292,47 @@  static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 }
 
 /*
+ * Test if a request can be dispatched.
+ */
+static inline bool deadline_can_dispatch_request(struct deadline_data *dd,
+						 struct request *rq)
+{
+	if (!deadline_request_needs_zone_wlock(dd, rq))
+		return true;
+	return !deadline_zone_is_wlocked(dd, rq);
+}
+
+/*
  * For the specified data direction, return the next request to
  * dispatch using arrival ordered lists.
  */
 static struct request *
 deadline_fifo_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
 	if (list_empty(&dd->fifo_list[data_dir]))
 		return NULL;
 
-	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+	if (!dd->zones_wlock || data_dir == READ)
+		return rq_entry_fifo(dd->fifo_list[data_dir].next);
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+
+	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
+		if (deadline_can_dispatch_request(dd, rq))
+			goto out;
+	}
+	rq = NULL;
+
+out:
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -301,10 +342,25 @@  deadline_fifo_request(struct deadline_data *dd, int data_dir)
 static struct request *
 deadline_next_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
-	return dd->next_rq[data_dir];
+	rq = dd->next_rq[data_dir];
+	if (!dd->zones_wlock || data_dir == READ)
+		return rq;
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+	while (rq) {
+		if (deadline_can_dispatch_request(dd, rq))
+			break;
+		rq = deadline_latter_request(rq);
+	}
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -346,7 +402,8 @@  static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	if (reads) {
 		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
 
-		if (writes && (dd->starved++ >= dd->writes_starved))
+		if (deadline_fifo_request(dd, WRITE) &&
+		    (dd->starved++ >= dd->writes_starved))
 			goto dispatch_writes;
 
 		data_dir = READ;
@@ -391,6 +448,13 @@  static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		rq = next_rq;
 	}
 
+	/*
+	 * If we only have writes queued and none of them can be dispatched,
+	 * rq will be NULL.
+	 */
+	if (!rq)
+		return NULL;
+
 	dd->batching = 0;
 
 dispatch_request:
@@ -490,6 +554,7 @@  static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
 	spin_lock_init(&dd->lock);
 	INIT_LIST_HEAD(&dd->dispatch);
 
+	spin_lock_init(&dd->zone_lock);
 	ret = deadline_init_zones_wlock(q, dd);
 	if (ret)
 		goto out_free_dd;
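
Condensed, the zone write-lock lifecycle the patch implements looks
roughly as follows; this is a sketch, not a literal excerpt, and the
dispatch-side deadline_wlock_zone() call is implied by the hunk context
above rather than shown in full:

/*
 * dispatch, under dd->lock:
 *   rq = deadline_next_request(dd, WRITE);  // skips write-locked zones,
 *                                           // scanning under dd->zone_lock
 *   if (rq) {
 *           deadline_wlock_zone(dd, rq);    // set bit for blk_rq_zone_no(rq)
 *           ...remove rq from the sort/fifo lists and dispatch it...
 *   }
 *
 * completion, possibly in IRQ context:
 *   deadline_wunlock_zone(dd, rq);          // clear the bit under
 *                                           // dd->zone_lock so a concurrent
 *                                           // fifo/sort search never sees a
 *                                           // zone unlock mid-walk
 */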