nbd: create a recv workqueue per nbd device

Message ID 1484322673-10606-1-git-send-email-jbacik@fb.com (mailing list archive)
State New, archived

Commit Message

Josef Bacik Jan. 13, 2017, 3:51 p.m. UTC
Since we are in the memory reclaim path we need our recv work to be on a
workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
set WQ_HIGHPRI since we are in the completion path for IO.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
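
For context on the two flags: WQ_MEM_RECLAIM guarantees the queue a dedicated rescuer thread so queued work can still make forward progress while the system is reclaiming memory, and WQ_HIGHPRI places work items on the high-priority worker pool. Below is a minimal, self-contained sketch of that part of the workqueue API; all identifiers are illustrative, not taken from the driver (the real hunks are in the patch at the bottom).

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;	/* illustrative name */
static struct work_struct demo_work;

static void demo_recv_fn(struct work_struct *work)
{
	/* runs in process context on a rescuer-backed, high-priority pool */
}

static int demo_setup(void)
{
	/* max_active = 0 selects the default concurrency limit */
	demo_wq = alloc_workqueue("demo-recv",
				  WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!demo_wq)
		return -ENOMEM;

	INIT_WORK(&demo_work, demo_recv_fn);
	queue_work(demo_wq, &demo_work);
	return 0;
}

static void demo_teardown(void)
{
	destroy_workqueue(demo_wq);	/* waits for pending work to finish */
}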

Comments

Sagi Grimberg Jan. 13, 2017, 10:24 p.m. UTC | #1
Hey Josef,

> Since we are in the memory reclaim path we need our recv work to be on a
> workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
> set WQ_HIGHPRI since we are in the completion path for IO.

Really a workqueue per device?? Did this really give a performance
advantage? Can this really scale with the number of devices?
Josef Bacik Jan. 14, 2017, 1:04 a.m. UTC | #2
> On Jan 13, 2017, at 5:24 PM, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> Hey Josef,
> 
>> Since we are in the memory reclaim path we need our recv work to be on a
>> workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
>> set WQ_HIGHPRI since we are in the completion path for IO.
> 
> Really a workqueue per device?? Did this really give a performance
> advantage? Can this really scale with the number of devices?

I don't see why not, especially since these things run the whole time the device is active.  I have patches forthcoming to make device creation dynamic so we don't have a bunch all at once.  That being said, I'm not married to the idea; it just seemed like a good idea at the time and not particularly harmful.  Thanks,

Josef
Sagi Grimberg Jan. 14, 2017, 9:15 p.m. UTC | #3
>> Hey Josef,
>>
>>> Since we are in the memory reclaim path we need our recv work to be on a
>>> workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
>>> set WQ_HIGHPRI since we are in the completion path for IO.
>>
>> Really a workqueue per device?? Did this really give a performance
>> advantage? Can this really scale with the number of devices?
>
> I don't see why not, especially since these things run the whole time the device is active.  I have patches forthcoming to make device creation dynamic so we don't have a bunch all at once.  That being said, I'm not married to the idea; it just seemed like a good idea at the time and not particularly harmful.  Thanks,

I just don't see how having a workqueue per device helps anything. There
are plenty of active workers per workqueue, and even if it's not enough
you can specify more with max_active.

I guess what I'm trying to say is that I don't understand what this is
solving. The commit message explains why you need WQ_MEM_RECLAIM and why
you want WQ_HIGHPRI, but it does not explain why a workqueue per device
helps or solves anything.
Josef Bacik Jan. 14, 2017, 9:27 p.m. UTC | #4
> On Jan 14, 2017, at 4:15 PM, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> 
>>> Hey Josef,
>>> 
>>>> Since we are in the memory reclaim path we need our recv work to be on a
>>>> workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
>>>> set WQ_HIGHPRI since we are in the completion path for IO.
>>> 
>>> Really a workqueue per device?? Did this really give a performance
>>> advantage? Can this really scale with the number of devices?
>> 
>> I don't see why not, especially since these things run the whole time the device is active.  I have patches forthcoming to make device creation dynamic so we don't have a bunch all at once.  That being said, I'm not married to the idea; it just seemed like a good idea at the time and not particularly harmful.  Thanks,
> 
> I just don't see how having a workqueue per device helps anything. There
> are plenty of active workers per workqueue, and even if it's not enough
> you can specify more with max_active.
> 
> I guess what I'm trying to say is that I don't understand what this is
> solving. The commit message explains why you need WQ_MEM_RECLAIM and why
> you want WQ_HIGHPRI, but it does not explain why a workqueue per device
> helps or solves anything.

There's no reason for it; that's just the way I did it. I will test both ways on Tuesday, and if there's no measurable difference I'll do a global one.  Thanks,

Josef
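
For illustration only, the global variant being discussed could look roughly like the sketch below: one module-level queue allocated once and shared by every device, with max_active left at the default so many devices can still have recv work in flight concurrently. The queue name "knbd-recv" is taken from the posted patch; the other identifiers (shared_recv_wq, recv_args, recv_fn) are hypothetical, not from the driver.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* hypothetical sketch of the shared-queue alternative, not the posted patch */
static struct workqueue_struct *shared_recv_wq;

struct recv_args {
	struct work_struct work;
	int index;			/* which device/socket this recv serves */
};

static void recv_fn(struct work_struct *work)
{
	struct recv_args *args = container_of(work, struct recv_args, work);

	/* the per-connection receive loop would run here */
	(void)args;
}

static int shared_recv_setup(struct recv_args *args, int nr)
{
	int i;

	/* one rescuer-backed, high-priority queue shared by all devices */
	shared_recv_wq = alloc_workqueue("knbd-recv",
					 WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!shared_recv_wq)
		return -ENOMEM;

	for (i = 0; i < nr; i++) {
		INIT_WORK(&args[i].work, recv_fn);
		args[i].index = i;
		queue_work(shared_recv_wq, &args[i].work);
	}
	return 0;
}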

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 99c8446..e0a8d51 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -70,6 +70,7 @@  struct nbd_device {
 	struct task_struct *task_recv;
 	struct task_struct *task_setup;
 
+	struct workqueue_struct *recv_workqueue;
 #if IS_ENABLED(CONFIG_DEBUG_FS)
 	struct dentry *dbg_dir;
 #endif
@@ -787,7 +788,7 @@  static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 			INIT_WORK(&args[i].work, recv_work);
 			args[i].nbd = nbd;
 			args[i].index = i;
-			queue_work(system_long_wq, &args[i].work);
+			queue_work(nbd->recv_workqueue, &args[i].work);
 		}
 		wait_event_interruptible(nbd->recv_wq,
 					 atomic_read(&nbd->recv_threads) == 0);
@@ -1074,6 +1075,16 @@  static int __init nbd_init(void)
 			goto out;
 		}
 
+		nbd_dev[i].recv_workqueue =
+			alloc_workqueue("knbd-recv",
+					WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+		if (!nbd_dev[i].recv_workqueue) {
+			blk_mq_free_tag_set(&nbd_dev[i].tag_set);
+			blk_cleanup_queue(disk->queue);
+			put_disk(disk);
+			goto out;
+		}
+
 		/*
 		 * Tell the block layer that we are not a rotational device
 		 */
@@ -1115,6 +1126,7 @@  static int __init nbd_init(void)
 		blk_mq_free_tag_set(&nbd_dev[i].tag_set);
 		blk_cleanup_queue(nbd_dev[i].disk->queue);
 		put_disk(nbd_dev[i].disk);
+		destroy_workqueue(nbd_dev[i].recv_workqueue);
 	}
 	kfree(nbd_dev);
 	return err;
@@ -1134,6 +1146,7 @@  static void __exit nbd_cleanup(void)
 			blk_cleanup_queue(disk->queue);
 			blk_mq_free_tag_set(&nbd_dev[i].tag_set);
 			put_disk(disk);
+			destroy_workqueue(nbd_dev[i].recv_workqueue);
 		}
 	}
 	unregister_blkdev(NBD_MAJOR, "nbd");