
[v2,2/2] block: Add iocontext priority to request

Message ID 1475694007-11999-3-git-send-email-adam.manzanares@hgst.com (mailing list archive)
State New, archived

Commit Message

Adam Manzanares Oct. 5, 2016, 7 p.m. UTC
Patch adds an association between iocontext ioprio and the ioprio of
a request. This feature is only enabled if a queue flag is set to
indicate that requests should have ioprio associated with them. The
queue flag is exposed as the req_prio queue sysfs entry.

Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com>
---
 block/blk-core.c       |  8 +++++++-
 block/blk-sysfs.c      | 32 ++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  2 ++
 3 files changed, 41 insertions(+), 1 deletion(-)
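
As a usage illustration (not part of the patch): per the blk-sysfs.c hunk
below, the knob would appear as /sys/block/<dev>/queue/req_prio. A minimal
userspace sketch to enable it, with "sda" as a placeholder device name:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* Placeholder device; the attribute is per request queue. */
		int fd = open("/sys/block/sda/queue/req_prio", O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* Writing "1" sets QUEUE_FLAG_REQ_PRIO, "0" clears it. */
		if (write(fd, "1", 1) != 1) {
			perror("write");
			close(fd);
			return 1;
		}
		close(fd);
		return 0;
	}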

Comments

Hannes Reinecke Oct. 6, 2016, 6:24 a.m. UTC | #1
On 10/05/2016 09:00 PM, Adam Manzanares wrote:
> Patch adds an association between iocontext ioprio and the ioprio of
> a request. This feature is only enabled if a queue flag is set to
> indicate that requests should have ioprio associated with them. The
> queue flag is exposed as the req_prio queue sysfs entry.
>
> Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com>
> ---
>  block/blk-core.c       |  8 +++++++-
>  block/blk-sysfs.c      | 32 ++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h |  2 ++
>  3 files changed, 41 insertions(+), 1 deletion(-)
>
As the previous patch depends on this one, it should actually be the 
first in the series.

But other than that:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Jeff Moyer Oct. 6, 2016, 7:46 p.m. UTC | #2
Hi, Adam,

Adam Manzanares <adam.manzanares@hgst.com> writes:

> Patch adds an association between iocontext ioprio and the ioprio of
> a request. This feature is only enabled if a queue flag is set to
> indicate that requests should have ioprio associated with them. The
> queue flag is exposed as the req_prio queue sysfs entry.
>
> Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com>

I like the idea of the patch, but I have a few comments.

First, don't add a tunable; there's no need for it.  (And in the future,
if you do add tunables, document them.)  That should make your patch
much smaller.

> @@ -1648,6 +1649,7 @@ out:
>  
>  void init_request_from_bio(struct request *req, struct bio *bio)
>  {
> +	struct io_context *ioc = rq_ioc(bio);

That can return NULL, and you blindly dereference it later.

> @@ -1656,7 +1658,11 @@ void init_request_from_bio(struct request *req, struct bio *bio)
>  
>  	req->errors = 0;
>  	req->__sector = bio->bi_iter.bi_sector;
> -	req->ioprio = bio_prio(bio);
> +	if (blk_queue_req_prio(req->q))
> +		req->ioprio = ioprio_best(bio_prio(bio), ioc->ioprio);
> +	else
> +		req->ioprio = bio_prio(bio);
> +

If the bio actually has an ioprio (only happens for bcache at this
point), you should use it.  Something like this:

        req->ioprio = bio_prio(bio);
        if (!req->ioprio && ioc)
		req->ioprio = ioc->ioprio;

Finally, please re-order your series as Hannes suggested.

Thanks!
Jeff
Adam Manzanares Oct. 10, 2016, 8:37 p.m. UTC | #3
Hello Jeff,

On 10/06/2016 15:46, Jeff Moyer wrote:
> Hi, Adam,
> 
> Adam Manzanares <adam.manzanares@hgst.com> writes:
> 
> > Patch adds an association between iocontext ioprio and the ioprio of
> > a request. This feature is only enabled if a queue flag is set to
> > indicate that requests should have ioprio associated with them. The
> > queue flag is exposed as the req_prio queue sysfs entry.
> >
> > Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com>
> 
> I like the idea of the patch, but I have a few comments.
> 
> First, don't add a tunable; there's no need for it.  (And in the future,
> if you do add tunables, document them.)  That should make your patch
> much smaller.
> 

I have a strong preference for making this a tunable, for the following 
reasons. I am concerned that this could negatively impact performance if the 
feature is not properly implemented on a device. In addition, this feature 
can make a dramatic difference in the relative performance of prioritized vs 
non-prioritized IO: prioritized IO improves, but at the cost of 
non-prioritized IO. If someone has tuned a system in such a way that things 
work well as is, I do not want to cause any surprises.

I can see the argument for not having the tunable in the block layer, but 
then we would need to add a tunable to every request-based driver that may 
leverage the iopriority information, which has the potential to generate a 
lot more code and documentation. I would also like the tunable to be 
consulted when the iopriority is set on the request, so we can preserve the 
default behavior. This matters for drivers that already use request 
iopriority information, such as the Fusion MPT SAS (mptsas) driver.
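
For illustration only (this is not code from any in-tree driver), a sketch
of how a request-based driver might consume the request iopriority; the
priority encoding returned here is invented:

	/* Sketch: map a request's ioprio class to a hypothetical device
	 * command priority. IOPRIO_PRIO_CLASS() and IOPRIO_CLASS_RT come
	 * from <linux/ioprio.h>; struct request from <linux/blkdev.h>. */
	static u8 example_cmd_prio(struct request *req)
	{
		if (IOPRIO_PRIO_CLASS(req->ioprio) == IOPRIO_CLASS_RT)
			return 1;	/* device high-priority queue */
		return 0;		/* normal priority */
	}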

I will also document the tunable :) if we agree that it is necessary.

> > @@ -1648,6 +1649,7 @@ out:
> >  
> >  void init_request_from_bio(struct request *req, struct bio *bio)
> >  {
> > +	struct io_context *ioc = rq_ioc(bio);
> 
> That can return NULL, and you blindly dereference it later.
>

Ouch, this will be cleaned up in the next revision.

> > @@ -1656,7 +1658,11 @@ void init_request_from_bio(struct request *req, struct bio *bio)
> >  
> >  	req->errors = 0;
> >  	req->__sector = bio->bi_iter.bi_sector;
> > -	req->ioprio = bio_prio(bio);
> > +	if (blk_queue_req_prio(req->q))
> > +		req->ioprio = ioprio_best(bio_prio(bio), ioc->ioprio);
> > +	else
> > +		req->ioprio = bio_prio(bio);
> > +
> 
> If the bio actually has an ioprio (only happens for bcache at this
> point), you should use it.  Something like this:
> 
>         req->ioprio = bio_prio(bio);
>         if (!req->ioprio && ioc)
> 		req->ioprio = ioc->ioprio;
>

I caught this in the explanation of the first patch I sent out. I am still
assuming that this will be a tunable, but I will have the bio_prio take 
precedence in the next patch.
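
Something like this, keeping the queue flag and guarding the possibly-NULL
ioc returned by rq_ioc(bio):

	/* Sketch of the planned rework: the bio prio wins; fall back to
	 * the iocontext prio only when the queue flag is set and an
	 * io_context exists. */
	req->ioprio = bio_prio(bio);
	if (!req->ioprio && blk_queue_req_prio(req->q) && ioc)
		req->ioprio = ioc->ioprio;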

> Finally, please re-order your series as Hannes suggested.

Will do. 

> 
> Thanks!
> Jeff

Take care,
Adam

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 36c7ac3..17c3ce5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -33,6 +33,7 @@ 
 #include <linux/ratelimit.h>
 #include <linux/pm_runtime.h>
 #include <linux/blk-cgroup.h>
+#include <linux/ioprio.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/block.h>
@@ -1648,6 +1649,7 @@  out:
 
 void init_request_from_bio(struct request *req, struct bio *bio)
 {
+	struct io_context *ioc = rq_ioc(bio);
 	req->cmd_type = REQ_TYPE_FS;
 
 	req->cmd_flags |= bio->bi_opf & REQ_COMMON_MASK;
@@ -1656,7 +1658,11 @@  void init_request_from_bio(struct request *req, struct bio *bio)
 
 	req->errors = 0;
 	req->__sector = bio->bi_iter.bi_sector;
-	req->ioprio = bio_prio(bio);
+	if (blk_queue_req_prio(req->q))
+		req->ioprio = ioprio_best(bio_prio(bio), ioc->ioprio);
+	else
+		req->ioprio = bio_prio(bio);
+
 	blk_rq_bio_prep(req->q, req, bio);
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f87a7e7..268a71a 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -384,6 +384,31 @@  static ssize_t queue_dax_show(struct request_queue *q, char *page)
 	return queue_var_show(blk_queue_dax(q), page);
 }
 
+static ssize_t queue_req_prio_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_req_prio(q), page);
+}
+
+static ssize_t queue_req_prio_store(struct request_queue *q, const char *page,
+				    size_t count)
+{
+	unsigned long req_prio_on;
+	ssize_t ret;
+
+	ret = queue_var_store(&req_prio_on, page, count);
+	if (ret < 0)
+		return ret;
+
+	spin_lock_irq(q->queue_lock);
+	if (req_prio_on)
+		queue_flag_set(QUEUE_FLAG_REQ_PRIO, q);
+	else
+		queue_flag_clear(QUEUE_FLAG_REQ_PRIO, q);
+	spin_unlock_irq(q->queue_lock);
+
+	return ret;
+}
+
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_requests_show,
@@ -526,6 +551,12 @@  static struct queue_sysfs_entry queue_dax_entry = {
 	.show = queue_dax_show,
 };
 
+static struct queue_sysfs_entry queue_req_prio_entry = {
+	.attr = {.name = "req_prio", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_req_prio_show,
+	.store = queue_req_prio_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
@@ -553,6 +584,7 @@  static struct attribute *default_attrs[] = {
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
 	&queue_dax_entry.attr,
+	&queue_req_prio_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e79055c..23e1e2d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@  struct request_queue {
 #define QUEUE_FLAG_FUA	       24	/* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
+#define QUEUE_FLAG_REQ_PRIO    27	/* Use iocontext ioprio */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -595,6 +596,7 @@  static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_req_prio(q)	test_bit(QUEUE_FLAG_REQ_PRIO, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \