Message ID | 20230412084730.51694-2-kch@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | null_blk: allow user to set QUEUE_FLAG_NOWAIT | expand |
+ dlemoal@kenrel.org. On 4/12/23 01:47, Chaitanya Kulkarni wrote: > QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when > it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that > gets assigned in following code path see blk_mq_init_allocated_queue():- > > null_add_dev() > if (dev->queue_mode == NULL_Q_MQ) { > blk_mq_alloc_disk() > __blk_mq_alloc_disk() > blk_mq_init_queue_data() > blk_mq_init_allocated_queue() > q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT; > } > > But it is not set when null_blk is loaded with NULL_Q_BIO mode in following > code path like other bio drivers do e.g. nvme-multipath :- > > if (dev->queue_mode == NULL_Q_BIO) { > nullb->disk = blk_alloc_disk(nullb->dev->home_node); > blk_alloc_disk() > blk_alloc_queue() > __alloc_disk_nodw() > } > > Add a new module parameter nowait and respective configfs attr that will > set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in > null_add_dev() irrespective of the queue mode, by default keep it > enabled to retain the original behaviour for the NULL_Q_MQ mode. > > Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction > in the null_alloc_page() for alloc_pages() and in null_insert_page() > for radix_tree_preload(). > > Observed performance difference with this patch for fio iouring with > configfs and non configfs null_blk with queue modes NULL_Q_BIO and > NULL_Q_MQ:- > > * Configfs Membacked mode:- > > NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :- > --------------------------------------------- > configfs-qmode-0-nowait-0-fio-1.log: read: IOPS=1295k, BW=5058MiB/s > configfs-qmode-0-nowait-0-fio-2.log: read: IOPS=1362k, BW=5318MiB/s > configfs-qmode-0-nowait-0-fio-3.log: read: IOPS=1332k, BW=5201MiB/s > > NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :- > --------------------------------------------- > configfs-qmode-0-nowait-1-fio-1.log: read: IOPS=2095k, BW=8183MiB/s > configfs-qmode-0-nowait-1-fio-2.log: read: IOPS=2085k, BW=8145MiB/s > configfs-qmode-0-nowait-1-fio-3.log: read: IOPS=2036k, BW=7955MiB/s > > NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :- > --------------------------------------------- > configfs-qmode-2-nowait-0-fio-1.log: read: IOPS=1288k, BW=5031MiB/s > configfs-qmode-2-nowait-0-fio-2.log: read: IOPS=1239k, BW=4839MiB/s > configfs-qmode-2-nowait-0-fio-3.log: read: IOPS=1252k, BW=4889MiB/s > > NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :- > --------------------------------------------- > configfs-qmode-2-nowait-1-fio-1.log: read: IOPS=2101k, BW=8208MiB/s > configfs-qmode-2-nowait-1-fio-2.log: read: IOPS=2091k, BW=8169MiB/s > configfs-qmode-2-nowait-1-fio-3.log: read: IOPS=2088k, BW=8155MiB/s > > * Non Configfs non-membacked mode:- > > NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :- > --------------------------------------------- > qmode-0-nowait-0-fio-1.log: read: IOPS=1362k, BW=5321MiB/s > qmode-0-nowait-0-fio-2.log: read: IOPS=1334k, BW=5210MiB/s > qmode-0-nowait-0-fio-3.log: read: IOPS=1386k, BW=5416MiB/s > > NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :- > --------------------------------------------- > qmode-0-nowait-1-fio-1.log: read: IOPS=5405k, BW=20.6GiB/s > qmode-0-nowait-1-fio-2.log: read: IOPS=5502k, BW=21.0GiB/s > qmode-0-nowait-1-fio-3.log: read: IOPS=5518k, BW=21.0GiB/s > > NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :- > --------------------------------------------- > qmode-2-nowait-0-fio-1.log: read: IOPS=1356k, BW=5296MiB/s > qmode-2-nowait-0-fio-2.log: read: IOPS=1318k, BW=5148MiB/s > qmode-2-nowait-0-fio-3.log: read: IOPS=1252k, BW=4891MiB/s > > NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :- > --------------------------------------------- > qmode-2-nowait-1-fio-1.log: read: IOPS=5466k, BW=20.9GiB/s > qmode-2-nowait-1-fio-2.log: read: IOPS=5446k, BW=20.8GiB/s > qmode-2-nowait-1-fio-3.log: read: IOPS=5482k, BW=20.9GiB/s > > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> > --- > > In caes prep patch is needed (ideally it should be) for adding > gfp args to null_alloc_page() please let me know. > > -ck > > drivers/block/null_blk/main.c | 22 +++++++++++++++++----- > drivers/block/null_blk/null_blk.h | 1 + > 2 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index bc2c58724df3..67683f2fd6e1 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -77,6 +77,10 @@ enum { > NULL_IRQ_TIMER = 2, > }; > > +static bool g_nowait = true; > +module_param_named(nowait, g_nowait, bool, 0444); > +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True"); > + > static bool g_virt_boundary = false; > module_param_named(virt_boundary, g_virt_boundary, bool, 0444); > MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); > @@ -413,6 +417,7 @@ NULLB_DEVICE_ATTR(irqmode, uint, NULL); > NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL); > NULLB_DEVICE_ATTR(index, uint, NULL); > NULLB_DEVICE_ATTR(blocking, bool, NULL); > +NULLB_DEVICE_ATTR(nowait, bool, NULL); > NULLB_DEVICE_ATTR(use_per_node_hctx, bool, NULL); > NULLB_DEVICE_ATTR(memory_backed, bool, NULL); > NULLB_DEVICE_ATTR(discard, bool, NULL); > @@ -554,6 +559,7 @@ static struct configfs_attribute *nullb_device_attrs[] = { > &nullb_device_attr_hw_queue_depth, > &nullb_device_attr_index, > &nullb_device_attr_blocking, > + &nullb_device_attr_nowait, > &nullb_device_attr_use_per_node_hctx, > &nullb_device_attr_power, > &nullb_device_attr_memory_backed, > @@ -628,7 +634,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item) > static ssize_t memb_group_features_show(struct config_item *item, char *page) > { > return snprintf(page, PAGE_SIZE, > - "badblocks,blocking,blocksize,cache_size," > + "badblocks,blocking,nowait,blocksize,cache_size," > "completion_nsec,discard,home_node,hw_queue_depth," > "irqmode,max_sectors,mbps,memory_backed,no_sched," > "poll_queues,power,queue_mode,shared_tag_bitmap,size," > @@ -696,6 +702,7 @@ static struct nullb_device *null_alloc_dev(void) > dev->irqmode = g_irqmode; > dev->hw_queue_depth = g_hw_queue_depth; > dev->blocking = g_blocking; > + dev->nowait = g_nowait; > dev->memory_backed = g_memory_backed; > dev->discard = g_discard; > dev->cache_size = g_cache_size; > @@ -830,7 +837,7 @@ static void null_complete_rq(struct request *rq) > end_cmd(blk_mq_rq_to_pdu(rq)); > } > > -static struct nullb_page *null_alloc_page(void) > +static struct nullb_page *null_alloc_page(gfp_t gfp) > { > struct nullb_page *t_page; > > @@ -838,7 +845,7 @@ static struct nullb_page *null_alloc_page(void) > if (!t_page) > return NULL; > > - t_page->page = alloc_pages(GFP_NOIO, 0); > + t_page->page = alloc_pages(gfp, 0); > if (!t_page->page) { > kfree(t_page); > return NULL; > @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, > > spin_unlock_irq(&nullb->lock); > > - t_page = null_alloc_page(); > + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); > if (!t_page) > goto out_lock; > > - if (radix_tree_preload(GFP_NOIO)) > + if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO)) > goto out_freepage; > > spin_lock_irq(&nullb->lock); > @@ -2093,6 +2100,11 @@ static int null_add_dev(struct nullb_device *dev) > blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q); > blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q); > > + if (dev->nowait) > + blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q); > + else > + blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q); > + > mutex_lock(&lock); > rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL); > if (rv < 0) { > diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h > index eb5972c50be8..1d7af446d728 100644 > --- a/drivers/block/null_blk/null_blk.h > +++ b/drivers/block/null_blk/null_blk.h > @@ -107,6 +107,7 @@ struct nullb_device { > unsigned int index; /* index of the disk, only valid with a disk */ > unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */ > bool blocking; /* blocking blk-mq device */ > + bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */ > bool use_per_node_hctx; /* use per-node allocation for hardware context */ > bool power; /* power on/off the device */ > bool memory_backed; /* if data is stored in memory */
On 4/12/23 17:47, Chaitanya Kulkarni wrote: > QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when > it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that > gets assigned in following code path see blk_mq_init_allocated_queue():- Can you fix the grammar/punctuation in this sentence ? Looks like some words are missing, making it hard to read. > > null_add_dev() > if (dev->queue_mode == NULL_Q_MQ) { > blk_mq_alloc_disk() > __blk_mq_alloc_disk() > blk_mq_init_queue_data() > blk_mq_init_allocated_queue() > q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT; > } > > But it is not set when null_blk is loaded with NULL_Q_BIO mode in following > code path like other bio drivers do e.g. nvme-multipath :- > > if (dev->queue_mode == NULL_Q_BIO) { > nullb->disk = blk_alloc_disk(nullb->dev->home_node); > blk_alloc_disk() > blk_alloc_queue() > __alloc_disk_nodw() > } > > Add a new module parameter nowait and respective configfs attr that will > set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in > null_add_dev() irrespective of the queue mode, by default keep it > enabled to retain the original behaviour for the NULL_Q_MQ mode. Nope. You are changing the behavior. See below. > > Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction > in the null_alloc_page() for alloc_pages() and in null_insert_page() > for radix_tree_preload(). > > Observed performance difference with this patch for fio iouring with > configfs and non configfs null_blk with queue modes NULL_Q_BIO and > NULL_Q_MQ:- [...] > @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, > > spin_unlock_irq(&nullb->lock); > > - t_page = null_alloc_page(); > + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); This can potentially result in failed allocations, so IO errors, which otherwise would not happen without this change. Not nice. Memory backing also sets BLK_MQ_F_BLOCKING, which I am not sure is compatible with QUEUE_FLAG_NOWAIT... Would need to check that again.
On 4/12/23 02:41, Damien Le Moal wrote: > On 4/12/23 17:47, Chaitanya Kulkarni wrote: >> QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when >> it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that >> gets assigned in following code path see blk_mq_init_allocated_queue():- > Can you fix the grammar/punctuation in this sentence ? Looks like some words are > missing, making it hard to read. done. >> null_add_dev() >> if (dev->queue_mode == NULL_Q_MQ) { >> blk_mq_alloc_disk() >> __blk_mq_alloc_disk() >> blk_mq_init_queue_data() >> blk_mq_init_allocated_queue() >> q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT; >> } >> >> But it is not set when null_blk is loaded with NULL_Q_BIO mode in following >> code path like other bio drivers do e.g. nvme-multipath :- >> >> if (dev->queue_mode == NULL_Q_BIO) { >> nullb->disk = blk_alloc_disk(nullb->dev->home_node); >> blk_alloc_disk() >> blk_alloc_queue() >> __alloc_disk_nodw() >> } >> >> Add a new module parameter nowait and respective configfs attr that will >> set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in >> null_add_dev() irrespective of the queue mode, by default keep it >> enabled to retain the original behaviour for the NULL_Q_MQ mode. > Nope. You are changing the behavior. See below. > true, changed it so that this flag is only set for the NULL_Q_BIO ... >> Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction >> in the null_alloc_page() for alloc_pages() and in null_insert_page() >> for radix_tree_preload(). >> >> Observed performance difference with this patch for fio iouring with >> configfs and non configfs null_blk with queue modes NULL_Q_BIO and >> NULL_Q_MQ:- > [...] > >> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, >> >> spin_unlock_irq(&nullb->lock); >> >> - t_page = null_alloc_page(); >> + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); > This can potentially result in failed allocations, so IO errors, which otherwise > would not happen without this change. Not nice. Memory backing also sets > BLK_MQ_F_BLOCKING, which I am not sure is compatible with QUEUE_FLAG_NOWAIT... > Would need to check that again. > > yes we should avoid that, here is explanation :- BLK_MQ_F_BLOCKING is set in null_init_tag_set(). null_init_tag_set() has two callers:- 1. null_init() :- checks if queue_mode=NULL_Q_MQ and shared_tags=1 before setting BLK_MQ_F_BLOCKING for shared tagset. 2. null_add_dev():- checks if queue_mode=NULL_Q_MQ before setting BLK_MQ_F_BLOCKING for non-shared tagset. With nowait only allowed for qmode=NULL_Q_BIO (see [1]) it will error out with qmode=NULL_Q_MQ and will never set BLK_MQ_F_BLOCKING, I'll add check in the null_validate_conf() just to make sure in V3. Thanks for review comments. -ck [1] V2:- https://marc.info/?l=linux-block&r=1&b=202304&w=2
> +static bool g_nowait = true; > +module_param_named(nowait, g_nowait, bool, 0444); > +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True"); Copy paste error. MODULE_PARM_DESC(nowait,... > + > static bool g_virt_boundary = false; > module_param_named(virt_boundary, g_virt_boundary, bool, 0444); > MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); > @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, > > spin_unlock_irq(&nullb->lock); > > - t_page = null_alloc_page(); > + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); > if (!t_page) > goto out_lock; > > - if (radix_tree_preload(GFP_NOIO)) > + if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO)) This is not correct. You need to use radix_tree_maybe_preload because GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in radix_tree_preload: /* Warn on non-sensical use... */ WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask)); I also verified this locally with your patch and while doing a simple fio write with setting memory_backed=1. WARNING: CPU: 2 PID: 515 at lib/radix-tree.c:366 radix_tree_preload+0x12/0x20 ... RIP: 0010:radix_tree_preload+0x12/0x20 <snip> Call Trace: <TASK> null_insert_page+0x186/0x4e0 [null_blk] null_transfer+0x588/0x990 [null_blk] <snip> > goto out_freepage; > > spin_lock_irq(&nullb->lock); > @@ -2093,6 +2100,11 @@ static int null_add_dev(struct nullb_device *dev) > blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q); > blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q); > > + if (dev->nowait) > + blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q); > + else > + blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q); > + > mutex_lock(&lock); > rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL); > if (rv < 0) { > diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h > index eb5972c50be8..1d7af446d728 100644 > --- a/drivers/block/null_blk/null_blk.h > +++ b/drivers/block/null_blk/null_blk.h > @@ -107,6 +107,7 @@ struct nullb_device { > unsigned int index; /* index of the disk, only valid with a disk */ > unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */ > bool blocking; /* blocking blk-mq device */ > + bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */ > bool use_per_node_hctx; /* use per-node allocation for hardware context */ > bool power; /* power on/off the device */ > bool memory_backed; /* if data is stored in memory */ > -- > 2.29.0 >
On 4/18/23 04:41, Pankaj Raghav wrote: >> +static bool g_nowait = true; >> +module_param_named(nowait, g_nowait, bool, 0444); >> +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True"); > Copy paste error. MODULE_PARM_DESC(nowait,... fixed.. >> + >> static bool g_virt_boundary = false; >> module_param_named(virt_boundary, g_virt_boundary, bool, 0444); >> MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); >> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, >> >> spin_unlock_irq(&nullb->lock); >> >> - t_page = null_alloc_page(); >> + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); >> if (!t_page) >> goto out_lock; >> >> - if (radix_tree_preload(GFP_NOIO)) >> + if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO)) > This is not correct. You need to use radix_tree_maybe_preload because > GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in > radix_tree_preload: > > /* Warn on non-sensical use... */ > WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask)); > > I also verified this locally with your patch and while doing a simple fio > write with setting memory_backed=1. > > WARNING: CPU: 2 PID: 515 at lib/radix-tree.c:366 radix_tree_preload+0x12/0x20 > ... > RIP: 0010:radix_tree_preload+0x12/0x20 > <snip> > Call Trace: > <TASK> > null_insert_page+0x186/0x4e0 [null_blk] > null_transfer+0x588/0x990 [null_blk] > <snip> > thanks for pointing this out, will sendout V3 with above fix... -ck
On 4/18/23 04:41, Pankaj Raghav wrote: > > This is not correct. You need to use radix_tree_maybe_preload because > GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in > radix_tree_preload: > > /* Warn on non-sensical use... */ > WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask)); > > I also verified this locally with your patch and while doing a simple fio > write with setting memory_backed=1. > > plz share the exact fio job you have used to generate this error, I need add that to the patch testlog. -ck
On 2023-04-18 21:46, Chaitanya Kulkarni wrote: > On 4/18/23 04:41, Pankaj Raghav wrote: >> >> This is not correct. You need to use radix_tree_maybe_preload because >> GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in >> radix_tree_preload: >> >> /* Warn on non-sensical use... */ >> WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask)); >> >> I also verified this locally with your patch and while doing a simple fio >> write with setting memory_backed=1. >> >> > > plz share the exact fio job you have used to generate this error, > I need add that to the patch testlog. > This should do it: $ modprobe null_blk nowait=1 queue_mode=0 memory_backed=1 $ fio -iodepth=1 -rw=write -ioengine=io_uring -size=2M -name=io_uring_1 -filename=/dev/nullb0
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index bc2c58724df3..67683f2fd6e1 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -77,6 +77,10 @@ enum { NULL_IRQ_TIMER = 2, }; +static bool g_nowait = true; +module_param_named(nowait, g_nowait, bool, 0444); +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True"); + static bool g_virt_boundary = false; module_param_named(virt_boundary, g_virt_boundary, bool, 0444); MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False"); @@ -413,6 +417,7 @@ NULLB_DEVICE_ATTR(irqmode, uint, NULL); NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL); NULLB_DEVICE_ATTR(index, uint, NULL); NULLB_DEVICE_ATTR(blocking, bool, NULL); +NULLB_DEVICE_ATTR(nowait, bool, NULL); NULLB_DEVICE_ATTR(use_per_node_hctx, bool, NULL); NULLB_DEVICE_ATTR(memory_backed, bool, NULL); NULLB_DEVICE_ATTR(discard, bool, NULL); @@ -554,6 +559,7 @@ static struct configfs_attribute *nullb_device_attrs[] = { &nullb_device_attr_hw_queue_depth, &nullb_device_attr_index, &nullb_device_attr_blocking, + &nullb_device_attr_nowait, &nullb_device_attr_use_per_node_hctx, &nullb_device_attr_power, &nullb_device_attr_memory_backed, @@ -628,7 +634,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item) static ssize_t memb_group_features_show(struct config_item *item, char *page) { return snprintf(page, PAGE_SIZE, - "badblocks,blocking,blocksize,cache_size," + "badblocks,blocking,nowait,blocksize,cache_size," "completion_nsec,discard,home_node,hw_queue_depth," "irqmode,max_sectors,mbps,memory_backed,no_sched," "poll_queues,power,queue_mode,shared_tag_bitmap,size," @@ -696,6 +702,7 @@ static struct nullb_device *null_alloc_dev(void) dev->irqmode = g_irqmode; dev->hw_queue_depth = g_hw_queue_depth; dev->blocking = g_blocking; + dev->nowait = g_nowait; dev->memory_backed = g_memory_backed; dev->discard = g_discard; dev->cache_size = g_cache_size; @@ -830,7 +837,7 @@ static void null_complete_rq(struct request *rq) end_cmd(blk_mq_rq_to_pdu(rq)); } -static struct nullb_page *null_alloc_page(void) +static struct nullb_page *null_alloc_page(gfp_t gfp) { struct nullb_page *t_page; @@ -838,7 +845,7 @@ static struct nullb_page *null_alloc_page(void) if (!t_page) return NULL; - t_page->page = alloc_pages(GFP_NOIO, 0); + t_page->page = alloc_pages(gfp, 0); if (!t_page->page) { kfree(t_page); return NULL; @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, spin_unlock_irq(&nullb->lock); - t_page = null_alloc_page(); + t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO); if (!t_page) goto out_lock; - if (radix_tree_preload(GFP_NOIO)) + if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO)) goto out_freepage; spin_lock_irq(&nullb->lock); @@ -2093,6 +2100,11 @@ static int null_add_dev(struct nullb_device *dev) blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q); blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q); + if (dev->nowait) + blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q); + else + blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q); + mutex_lock(&lock); rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL); if (rv < 0) { diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h index eb5972c50be8..1d7af446d728 100644 --- a/drivers/block/null_blk/null_blk.h +++ b/drivers/block/null_blk/null_blk.h @@ -107,6 +107,7 @@ struct nullb_device { unsigned int index; /* index of the disk, only valid with a disk */ unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */ bool blocking; /* blocking blk-mq device */ + bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */ bool use_per_node_hctx; /* use per-node allocation for hardware context */ bool power; /* power on/off the device */ bool memory_backed; /* if data is stored in memory */
QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that gets assigned in following code path see blk_mq_init_allocated_queue():- null_add_dev() if (dev->queue_mode == NULL_Q_MQ) { blk_mq_alloc_disk() __blk_mq_alloc_disk() blk_mq_init_queue_data() blk_mq_init_allocated_queue() q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT; } But it is not set when null_blk is loaded with NULL_Q_BIO mode in following code path like other bio drivers do e.g. nvme-multipath :- if (dev->queue_mode == NULL_Q_BIO) { nullb->disk = blk_alloc_disk(nullb->dev->home_node); blk_alloc_disk() blk_alloc_queue() __alloc_disk_nodw() } Add a new module parameter nowait and respective configfs attr that will set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in null_add_dev() irrespective of the queue mode, by default keep it enabled to retain the original behaviour for the NULL_Q_MQ mode. Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction in the null_alloc_page() for alloc_pages() and in null_insert_page() for radix_tree_preload(). Observed performance difference with this patch for fio iouring with configfs and non configfs null_blk with queue modes NULL_Q_BIO and NULL_Q_MQ:- * Configfs Membacked mode:- NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :- --------------------------------------------- configfs-qmode-0-nowait-0-fio-1.log: read: IOPS=1295k, BW=5058MiB/s configfs-qmode-0-nowait-0-fio-2.log: read: IOPS=1362k, BW=5318MiB/s configfs-qmode-0-nowait-0-fio-3.log: read: IOPS=1332k, BW=5201MiB/s NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :- --------------------------------------------- configfs-qmode-0-nowait-1-fio-1.log: read: IOPS=2095k, BW=8183MiB/s configfs-qmode-0-nowait-1-fio-2.log: read: IOPS=2085k, BW=8145MiB/s configfs-qmode-0-nowait-1-fio-3.log: read: IOPS=2036k, BW=7955MiB/s NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :- --------------------------------------------- configfs-qmode-2-nowait-0-fio-1.log: read: IOPS=1288k, BW=5031MiB/s configfs-qmode-2-nowait-0-fio-2.log: read: IOPS=1239k, BW=4839MiB/s configfs-qmode-2-nowait-0-fio-3.log: read: IOPS=1252k, BW=4889MiB/s NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :- --------------------------------------------- configfs-qmode-2-nowait-1-fio-1.log: read: IOPS=2101k, BW=8208MiB/s configfs-qmode-2-nowait-1-fio-2.log: read: IOPS=2091k, BW=8169MiB/s configfs-qmode-2-nowait-1-fio-3.log: read: IOPS=2088k, BW=8155MiB/s * Non Configfs non-membacked mode:- NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :- --------------------------------------------- qmode-0-nowait-0-fio-1.log: read: IOPS=1362k, BW=5321MiB/s qmode-0-nowait-0-fio-2.log: read: IOPS=1334k, BW=5210MiB/s qmode-0-nowait-0-fio-3.log: read: IOPS=1386k, BW=5416MiB/s NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :- --------------------------------------------- qmode-0-nowait-1-fio-1.log: read: IOPS=5405k, BW=20.6GiB/s qmode-0-nowait-1-fio-2.log: read: IOPS=5502k, BW=21.0GiB/s qmode-0-nowait-1-fio-3.log: read: IOPS=5518k, BW=21.0GiB/s NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :- --------------------------------------------- qmode-2-nowait-0-fio-1.log: read: IOPS=1356k, BW=5296MiB/s qmode-2-nowait-0-fio-2.log: read: IOPS=1318k, BW=5148MiB/s qmode-2-nowait-0-fio-3.log: read: IOPS=1252k, BW=4891MiB/s NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :- --------------------------------------------- qmode-2-nowait-1-fio-1.log: read: IOPS=5466k, BW=20.9GiB/s qmode-2-nowait-1-fio-2.log: read: IOPS=5446k, BW=20.8GiB/s qmode-2-nowait-1-fio-3.log: read: IOPS=5482k, BW=20.9GiB/s Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> --- In caes prep patch is needed (ideally it should be) for adding gfp args to null_alloc_page() please let me know. -ck drivers/block/null_blk/main.c | 22 +++++++++++++++++----- drivers/block/null_blk/null_blk.h | 1 + 2 files changed, 18 insertions(+), 5 deletions(-)