diff mbox series

[1/1] null_blk: allow user to set QUEUE_FLAG_NOWAIT

Message ID 20230412084730.51694-2-kch@nvidia.com (mailing list archive)
State New, archived
Headers show
Series null_blk: allow user to set QUEUE_FLAG_NOWAIT | expand

Commit Message

Chaitanya Kulkarni April 12, 2023, 8:47 a.m. UTC
QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when
it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that
gets assigned in following code path see blk_mq_init_allocated_queue():-

null_add_dev()
if (dev->queue_mode == NULL_Q_MQ) {
        blk_mq_alloc_disk()
          __blk_mq_alloc_disk()
	    blk_mq_init_queue_data()
              blk_mq_init_allocated_queue()
                q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
}

But it is not set when null_blk is loaded with NULL_Q_BIO mode in following
code path like other bio drivers do e.g. nvme-multipath :-

if (dev->queue_mode == NULL_Q_BIO) {
        nullb->disk = blk_alloc_disk(nullb->dev->home_node);
        	blk_alloc_disk()
        	  blk_alloc_queue()
        	  __alloc_disk_nodw()
}

Add a new module parameter nowait and respective configfs attr that will
set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in
null_add_dev() irrespective of the queue mode, by default keep it
enabled to retain the original behaviour for the NULL_Q_MQ mode.

Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction
in the null_alloc_page() for alloc_pages() and in null_insert_page()
for radix_tree_preload().   

Observed performance difference with this patch for fio iouring with
configfs and non configfs null_blk with queue modes NULL_Q_BIO and
NULL_Q_MQ:-

* Configfs Membacked mode:-

NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :-
---------------------------------------------
configfs-qmode-0-nowait-0-fio-1.log:  read: IOPS=1295k, BW=5058MiB/s
configfs-qmode-0-nowait-0-fio-2.log:  read: IOPS=1362k, BW=5318MiB/s
configfs-qmode-0-nowait-0-fio-3.log:  read: IOPS=1332k, BW=5201MiB/s

NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :-
---------------------------------------------
configfs-qmode-0-nowait-1-fio-1.log:  read: IOPS=2095k, BW=8183MiB/s
configfs-qmode-0-nowait-1-fio-2.log:  read: IOPS=2085k, BW=8145MiB/s
configfs-qmode-0-nowait-1-fio-3.log:  read: IOPS=2036k, BW=7955MiB/s

NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :-
---------------------------------------------
configfs-qmode-2-nowait-0-fio-1.log:  read: IOPS=1288k, BW=5031MiB/s
configfs-qmode-2-nowait-0-fio-2.log:  read: IOPS=1239k, BW=4839MiB/s
configfs-qmode-2-nowait-0-fio-3.log:  read: IOPS=1252k, BW=4889MiB/s

NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :-
---------------------------------------------
configfs-qmode-2-nowait-1-fio-1.log:  read: IOPS=2101k, BW=8208MiB/s
configfs-qmode-2-nowait-1-fio-2.log:  read: IOPS=2091k, BW=8169MiB/s
configfs-qmode-2-nowait-1-fio-3.log:  read: IOPS=2088k, BW=8155MiB/s

* Non Configfs non-membacked mode:-

NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :-
---------------------------------------------
qmode-0-nowait-0-fio-1.log:  read: IOPS=1362k, BW=5321MiB/s
qmode-0-nowait-0-fio-2.log:  read: IOPS=1334k, BW=5210MiB/s
qmode-0-nowait-0-fio-3.log:  read: IOPS=1386k, BW=5416MiB/s

NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :-
---------------------------------------------
qmode-0-nowait-1-fio-1.log:  read: IOPS=5405k, BW=20.6GiB/s
qmode-0-nowait-1-fio-2.log:  read: IOPS=5502k, BW=21.0GiB/s
qmode-0-nowait-1-fio-3.log:  read: IOPS=5518k, BW=21.0GiB/s

NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :-
---------------------------------------------
qmode-2-nowait-0-fio-1.log:  read: IOPS=1356k, BW=5296MiB/s
qmode-2-nowait-0-fio-2.log:  read: IOPS=1318k, BW=5148MiB/s
qmode-2-nowait-0-fio-3.log:  read: IOPS=1252k, BW=4891MiB/s

NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :-
---------------------------------------------
qmode-2-nowait-1-fio-1.log:  read: IOPS=5466k, BW=20.9GiB/s
qmode-2-nowait-1-fio-2.log:  read: IOPS=5446k, BW=20.8GiB/s
qmode-2-nowait-1-fio-3.log:  read: IOPS=5482k, BW=20.9GiB/s

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---

In caes prep patch is needed (ideally it should be) for adding
gfp args to null_alloc_page() please let me know.

-ck

 drivers/block/null_blk/main.c     | 22 +++++++++++++++++-----
 drivers/block/null_blk/null_blk.h |  1 +
 2 files changed, 18 insertions(+), 5 deletions(-)

Comments

Chaitanya Kulkarni April 12, 2023, 8:50 a.m. UTC | #1
+ dlemoal@kenrel.org.

On 4/12/23 01:47, Chaitanya Kulkarni wrote:
> QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when
> it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that
> gets assigned in following code path see blk_mq_init_allocated_queue():-
>
> null_add_dev()
> if (dev->queue_mode == NULL_Q_MQ) {
>          blk_mq_alloc_disk()
>            __blk_mq_alloc_disk()
> 	    blk_mq_init_queue_data()
>                blk_mq_init_allocated_queue()
>                  q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
> }
>
> But it is not set when null_blk is loaded with NULL_Q_BIO mode in following
> code path like other bio drivers do e.g. nvme-multipath :-
>
> if (dev->queue_mode == NULL_Q_BIO) {
>          nullb->disk = blk_alloc_disk(nullb->dev->home_node);
>          	blk_alloc_disk()
>          	  blk_alloc_queue()
>          	  __alloc_disk_nodw()
> }
>
> Add a new module parameter nowait and respective configfs attr that will
> set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in
> null_add_dev() irrespective of the queue mode, by default keep it
> enabled to retain the original behaviour for the NULL_Q_MQ mode.
>
> Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction
> in the null_alloc_page() for alloc_pages() and in null_insert_page()
> for radix_tree_preload().
>
> Observed performance difference with this patch for fio iouring with
> configfs and non configfs null_blk with queue modes NULL_Q_BIO and
> NULL_Q_MQ:-
>
> * Configfs Membacked mode:-
>
> NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :-
> ---------------------------------------------
> configfs-qmode-0-nowait-0-fio-1.log:  read: IOPS=1295k, BW=5058MiB/s
> configfs-qmode-0-nowait-0-fio-2.log:  read: IOPS=1362k, BW=5318MiB/s
> configfs-qmode-0-nowait-0-fio-3.log:  read: IOPS=1332k, BW=5201MiB/s
>
> NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :-
> ---------------------------------------------
> configfs-qmode-0-nowait-1-fio-1.log:  read: IOPS=2095k, BW=8183MiB/s
> configfs-qmode-0-nowait-1-fio-2.log:  read: IOPS=2085k, BW=8145MiB/s
> configfs-qmode-0-nowait-1-fio-3.log:  read: IOPS=2036k, BW=7955MiB/s
>
> NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :-
> ---------------------------------------------
> configfs-qmode-2-nowait-0-fio-1.log:  read: IOPS=1288k, BW=5031MiB/s
> configfs-qmode-2-nowait-0-fio-2.log:  read: IOPS=1239k, BW=4839MiB/s
> configfs-qmode-2-nowait-0-fio-3.log:  read: IOPS=1252k, BW=4889MiB/s
>
> NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :-
> ---------------------------------------------
> configfs-qmode-2-nowait-1-fio-1.log:  read: IOPS=2101k, BW=8208MiB/s
> configfs-qmode-2-nowait-1-fio-2.log:  read: IOPS=2091k, BW=8169MiB/s
> configfs-qmode-2-nowait-1-fio-3.log:  read: IOPS=2088k, BW=8155MiB/s
>
> * Non Configfs non-membacked mode:-
>
> NULL_Q_BIO mode QUEUE_FLAG_NOWAIT disabled :-
> ---------------------------------------------
> qmode-0-nowait-0-fio-1.log:  read: IOPS=1362k, BW=5321MiB/s
> qmode-0-nowait-0-fio-2.log:  read: IOPS=1334k, BW=5210MiB/s
> qmode-0-nowait-0-fio-3.log:  read: IOPS=1386k, BW=5416MiB/s
>
> NULL_Q_BIO mode QUEUE_FLAG_NOWAIT enabled :-
> ---------------------------------------------
> qmode-0-nowait-1-fio-1.log:  read: IOPS=5405k, BW=20.6GiB/s
> qmode-0-nowait-1-fio-2.log:  read: IOPS=5502k, BW=21.0GiB/s
> qmode-0-nowait-1-fio-3.log:  read: IOPS=5518k, BW=21.0GiB/s
>
> NULL_Q_MQ mode QUEUE_FLAG_NOWAIT disabled :-
> ---------------------------------------------
> qmode-2-nowait-0-fio-1.log:  read: IOPS=1356k, BW=5296MiB/s
> qmode-2-nowait-0-fio-2.log:  read: IOPS=1318k, BW=5148MiB/s
> qmode-2-nowait-0-fio-3.log:  read: IOPS=1252k, BW=4891MiB/s
>
> NULL_Q_MQ mode QUEUE_FLAG_NOWAIT enabled :-
> ---------------------------------------------
> qmode-2-nowait-1-fio-1.log:  read: IOPS=5466k, BW=20.9GiB/s
> qmode-2-nowait-1-fio-2.log:  read: IOPS=5446k, BW=20.8GiB/s
> qmode-2-nowait-1-fio-3.log:  read: IOPS=5482k, BW=20.9GiB/s
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>
> In caes prep patch is needed (ideally it should be) for adding
> gfp args to null_alloc_page() please let me know.
>
> -ck
>
>   drivers/block/null_blk/main.c     | 22 +++++++++++++++++-----
>   drivers/block/null_blk/null_blk.h |  1 +
>   2 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index bc2c58724df3..67683f2fd6e1 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -77,6 +77,10 @@ enum {
>   	NULL_IRQ_TIMER		= 2,
>   };
>   
> +static bool g_nowait = true;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True");
> +
>   static bool g_virt_boundary = false;
>   module_param_named(virt_boundary, g_virt_boundary, bool, 0444);
>   MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False");
> @@ -413,6 +417,7 @@ NULLB_DEVICE_ATTR(irqmode, uint, NULL);
>   NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
>   NULLB_DEVICE_ATTR(index, uint, NULL);
>   NULLB_DEVICE_ATTR(blocking, bool, NULL);
> +NULLB_DEVICE_ATTR(nowait, bool, NULL);
>   NULLB_DEVICE_ATTR(use_per_node_hctx, bool, NULL);
>   NULLB_DEVICE_ATTR(memory_backed, bool, NULL);
>   NULLB_DEVICE_ATTR(discard, bool, NULL);
> @@ -554,6 +559,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
>   	&nullb_device_attr_hw_queue_depth,
>   	&nullb_device_attr_index,
>   	&nullb_device_attr_blocking,
> +	&nullb_device_attr_nowait,
>   	&nullb_device_attr_use_per_node_hctx,
>   	&nullb_device_attr_power,
>   	&nullb_device_attr_memory_backed,
> @@ -628,7 +634,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
>   static ssize_t memb_group_features_show(struct config_item *item, char *page)
>   {
>   	return snprintf(page, PAGE_SIZE,
> -			"badblocks,blocking,blocksize,cache_size,"
> +			"badblocks,blocking,nowait,blocksize,cache_size,"
>   			"completion_nsec,discard,home_node,hw_queue_depth,"
>   			"irqmode,max_sectors,mbps,memory_backed,no_sched,"
>   			"poll_queues,power,queue_mode,shared_tag_bitmap,size,"
> @@ -696,6 +702,7 @@ static struct nullb_device *null_alloc_dev(void)
>   	dev->irqmode = g_irqmode;
>   	dev->hw_queue_depth = g_hw_queue_depth;
>   	dev->blocking = g_blocking;
> +	dev->nowait = g_nowait;
>   	dev->memory_backed = g_memory_backed;
>   	dev->discard = g_discard;
>   	dev->cache_size = g_cache_size;
> @@ -830,7 +837,7 @@ static void null_complete_rq(struct request *rq)
>   	end_cmd(blk_mq_rq_to_pdu(rq));
>   }
>   
> -static struct nullb_page *null_alloc_page(void)
> +static struct nullb_page *null_alloc_page(gfp_t gfp)
>   {
>   	struct nullb_page *t_page;
>   
> @@ -838,7 +845,7 @@ static struct nullb_page *null_alloc_page(void)
>   	if (!t_page)
>   		return NULL;
>   
> -	t_page->page = alloc_pages(GFP_NOIO, 0);
> +	t_page->page = alloc_pages(gfp, 0);
>   	if (!t_page->page) {
>   		kfree(t_page);
>   		return NULL;
> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
>   
>   	spin_unlock_irq(&nullb->lock);
>   
> -	t_page = null_alloc_page();
> +	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);
>   	if (!t_page)
>   		goto out_lock;
>   
> -	if (radix_tree_preload(GFP_NOIO))
> +	if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO))
>   		goto out_freepage;
>   
>   	spin_lock_irq(&nullb->lock);
> @@ -2093,6 +2100,11 @@ static int null_add_dev(struct nullb_device *dev)
>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>   	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
>   
> +	if (dev->nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q);
> +
>   	mutex_lock(&lock);
>   	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
>   	if (rv < 0) {
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index eb5972c50be8..1d7af446d728 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -107,6 +107,7 @@ struct nullb_device {
>   	unsigned int index; /* index of the disk, only valid with a disk */
>   	unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */
>   	bool blocking; /* blocking blk-mq device */
> +	bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */
>   	bool use_per_node_hctx; /* use per-node allocation for hardware context */
>   	bool power; /* power on/off the device */
>   	bool memory_backed; /* if data is stored in memory */
Damien Le Moal April 12, 2023, 9:41 a.m. UTC | #2
On 4/12/23 17:47, Chaitanya Kulkarni wrote:
> QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when
> it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that
> gets assigned in following code path see blk_mq_init_allocated_queue():-

Can you fix the grammar/punctuation in this sentence ? Looks like some words are
missing, making it hard to read.

> 
> null_add_dev()
> if (dev->queue_mode == NULL_Q_MQ) {
>         blk_mq_alloc_disk()
>           __blk_mq_alloc_disk()
> 	    blk_mq_init_queue_data()
>               blk_mq_init_allocated_queue()
>                 q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
> }
> 
> But it is not set when null_blk is loaded with NULL_Q_BIO mode in following
> code path like other bio drivers do e.g. nvme-multipath :-
> 
> if (dev->queue_mode == NULL_Q_BIO) {
>         nullb->disk = blk_alloc_disk(nullb->dev->home_node);
>         	blk_alloc_disk()
>         	  blk_alloc_queue()
>         	  __alloc_disk_nodw()
> }
> 
> Add a new module parameter nowait and respective configfs attr that will
> set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in
> null_add_dev() irrespective of the queue mode, by default keep it
> enabled to retain the original behaviour for the NULL_Q_MQ mode.

Nope. You are changing the behavior. See below.

> 
> Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction
> in the null_alloc_page() for alloc_pages() and in null_insert_page()
> for radix_tree_preload().   
> 
> Observed performance difference with this patch for fio iouring with
> configfs and non configfs null_blk with queue modes NULL_Q_BIO and
> NULL_Q_MQ:-

[...]

> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
>  
>  	spin_unlock_irq(&nullb->lock);
>  
> -	t_page = null_alloc_page();
> +	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);

This can potentially result in failed allocations, so IO errors, which otherwise
would not happen without this change. Not nice. Memory backing also sets
BLK_MQ_F_BLOCKING, which I am not sure is compatible with QUEUE_FLAG_NOWAIT...
Would need to check that again.
Chaitanya Kulkarni April 18, 2023, 7 a.m. UTC | #3
On 4/12/23 02:41, Damien Le Moal wrote:
> On 4/12/23 17:47, Chaitanya Kulkarni wrote:
>> QUEUE_FLAG_NOWAIT is set by default to mq drivers such null_blk when
>> it is used with NULL_Q_MQ mode as a part of QUEUE_FLAG_MQ_DEFAULT that
>> gets assigned in following code path see blk_mq_init_allocated_queue():-
> Can you fix the grammar/punctuation in this sentence ? Looks like some words are
> missing, making it hard to read.

done.

>> null_add_dev()
>> if (dev->queue_mode == NULL_Q_MQ) {
>>          blk_mq_alloc_disk()
>>            __blk_mq_alloc_disk()
>> 	    blk_mq_init_queue_data()
>>                blk_mq_init_allocated_queue()
>>                  q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
>> }
>>
>> But it is not set when null_blk is loaded with NULL_Q_BIO mode in following
>> code path like other bio drivers do e.g. nvme-multipath :-
>>
>> if (dev->queue_mode == NULL_Q_BIO) {
>>          nullb->disk = blk_alloc_disk(nullb->dev->home_node);
>>          	blk_alloc_disk()
>>          	  blk_alloc_queue()
>>          	  __alloc_disk_nodw()
>> }
>>
>> Add a new module parameter nowait and respective configfs attr that will
>> set or clear the QUEUE_FLAG_NOWAIT based on a value set by user in
>> null_add_dev() irrespective of the queue mode, by default keep it
>> enabled to retain the original behaviour for the NULL_Q_MQ mode.
> Nope. You are changing the behavior. See below.
>

true, changed it so that this flag is only set for the NULL_Q_BIO ...

>> Depending on nowait value use GFP_NOWAIT or GFP_NOIO for the alloction
>> in the null_alloc_page() for alloc_pages() and in null_insert_page()
>> for radix_tree_preload().
>>
>> Observed performance difference with this patch for fio iouring with
>> configfs and non configfs null_blk with queue modes NULL_Q_BIO and
>> NULL_Q_MQ:-
> [...]
>
>> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
>>   
>>   	spin_unlock_irq(&nullb->lock);
>>   
>> -	t_page = null_alloc_page();
>> +	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);
> This can potentially result in failed allocations, so IO errors, which otherwise
> would not happen without this change. Not nice. Memory backing also sets
> BLK_MQ_F_BLOCKING, which I am not sure is compatible with QUEUE_FLAG_NOWAIT...
> Would need to check that again.
>
>

yes we should avoid that, here is explanation :-

BLK_MQ_F_BLOCKING is set in null_init_tag_set().
null_init_tag_set() has two callers:-

1. null_init() :-   checks if queue_mode=NULL_Q_MQ and shared_tags=1
                     before setting BLK_MQ_F_BLOCKING for shared tagset.
2. null_add_dev():- checks if queue_mode=NULL_Q_MQ before
                     setting BLK_MQ_F_BLOCKING for non-shared tagset.

With nowait only allowed for qmode=NULL_Q_BIO (see [1]) it will error
out with qmode=NULL_Q_MQ and will never set BLK_MQ_F_BLOCKING,
I'll add check in the null_validate_conf() just to make sure in V3.

Thanks for review comments.

-ck

[1] V2:- https://marc.info/?l=linux-block&r=1&b=202304&w=2
Pankaj Raghav April 18, 2023, 11:41 a.m. UTC | #4
> +static bool g_nowait = true;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True");
Copy paste error. MODULE_PARM_DESC(nowait,...
> +
>  static bool g_virt_boundary = false;
>  module_param_named(virt_boundary, g_virt_boundary, bool, 0444);
>  MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False");
> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
>  
>  	spin_unlock_irq(&nullb->lock);
>  
> -	t_page = null_alloc_page();
> +	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);
>  	if (!t_page)
>  		goto out_lock;
>  
> -	if (radix_tree_preload(GFP_NOIO))
> +	if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO))

This is not correct. You need to use radix_tree_maybe_preload because
GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in
radix_tree_preload:

/* Warn on non-sensical use... */
WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));

I also verified this locally with your patch and while doing a simple fio 
write with setting memory_backed=1.

WARNING: CPU: 2 PID: 515 at lib/radix-tree.c:366 radix_tree_preload+0x12/0x20
...
RIP: 0010:radix_tree_preload+0x12/0x20
<snip>
Call Trace:
 <TASK>
 null_insert_page+0x186/0x4e0 [null_blk]
 null_transfer+0x588/0x990 [null_blk]
<snip>

>  		goto out_freepage;
>  
>  	spin_lock_irq(&nullb->lock);
> @@ -2093,6 +2100,11 @@ static int null_add_dev(struct nullb_device *dev)
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>  	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
>  
> +	if (dev->nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q);
> +
>  	mutex_lock(&lock);
>  	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
>  	if (rv < 0) {
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index eb5972c50be8..1d7af446d728 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -107,6 +107,7 @@ struct nullb_device {
>  	unsigned int index; /* index of the disk, only valid with a disk */
>  	unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */
>  	bool blocking; /* blocking blk-mq device */
> +	bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */
>  	bool use_per_node_hctx; /* use per-node allocation for hardware context */
>  	bool power; /* power on/off the device */
>  	bool memory_backed; /* if data is stored in memory */
> -- 
> 2.29.0
>
Chaitanya Kulkarni April 18, 2023, 7:40 p.m. UTC | #5
On 4/18/23 04:41, Pankaj Raghav wrote:
>> +static bool g_nowait = true;
>> +module_param_named(nowait, g_nowait, bool, 0444);
>> +MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True");
> Copy paste error. MODULE_PARM_DESC(nowait,...

fixed..

>> +
>>   static bool g_virt_boundary = false;
>>   module_param_named(virt_boundary, g_virt_boundary, bool, 0444);
>>   MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False");
>> @@ -983,11 +990,11 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
>>   
>>   	spin_unlock_irq(&nullb->lock);
>>   
>> -	t_page = null_alloc_page();
>> +	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);
>>   	if (!t_page)
>>   		goto out_lock;
>>   
>> -	if (radix_tree_preload(GFP_NOIO))
>> +	if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO))
> This is not correct. You need to use radix_tree_maybe_preload because
> GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in
> radix_tree_preload:
>
> /* Warn on non-sensical use... */
> WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
>
> I also verified this locally with your patch and while doing a simple fio
> write with setting memory_backed=1.
>
> WARNING: CPU: 2 PID: 515 at lib/radix-tree.c:366 radix_tree_preload+0x12/0x20
> ...
> RIP: 0010:radix_tree_preload+0x12/0x20
> <snip>
> Call Trace:
>   <TASK>
>   null_insert_page+0x186/0x4e0 [null_blk]
>   null_transfer+0x588/0x990 [null_blk]
> <snip>
>

thanks for pointing this out, will sendout V3 with above fix...

-ck
Chaitanya Kulkarni April 18, 2023, 7:46 p.m. UTC | #6
On 4/18/23 04:41, Pankaj Raghav wrote:
>
> This is not correct. You need to use radix_tree_maybe_preload because
> GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in
> radix_tree_preload:
>
> /* Warn on non-sensical use... */
> WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
>
> I also verified this locally with your patch and while doing a simple fio
> write with setting memory_backed=1.
>
>

plz share the exact fio job you have used to generate this error,
I need add that to the patch testlog.

-ck
Pankaj Raghav April 19, 2023, 7:42 a.m. UTC | #7
On 2023-04-18 21:46, Chaitanya Kulkarni wrote:
> On 4/18/23 04:41, Pankaj Raghav wrote:
>>
>> This is not correct. You need to use radix_tree_maybe_preload because
>> GFP_NOWAIT should not block and this WARN_ON_ONCE flag will trigger in
>> radix_tree_preload:
>>
>> /* Warn on non-sensical use... */
>> WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
>>
>> I also verified this locally with your patch and while doing a simple fio
>> write with setting memory_backed=1.
>>
>>
> 
> plz share the exact fio job you have used to generate this error,
> I need add that to the patch testlog.
> 
This should do it:

$ modprobe null_blk nowait=1 queue_mode=0 memory_backed=1
$ fio -iodepth=1 -rw=write -ioengine=io_uring -size=2M -name=io_uring_1  -filename=/dev/nullb0
diff mbox series

Patch

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index bc2c58724df3..67683f2fd6e1 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -77,6 +77,10 @@  enum {
 	NULL_IRQ_TIMER		= 2,
 };
 
+static bool g_nowait = true;
+module_param_named(nowait, g_nowait, bool, 0444);
+MODULE_PARM_DESC(virt_boundary, "Set QUEUE_FLAG_NOWAIT irrespective of queue mode. Default: True");
+
 static bool g_virt_boundary = false;
 module_param_named(virt_boundary, g_virt_boundary, bool, 0444);
 MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False");
@@ -413,6 +417,7 @@  NULLB_DEVICE_ATTR(irqmode, uint, NULL);
 NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
 NULLB_DEVICE_ATTR(index, uint, NULL);
 NULLB_DEVICE_ATTR(blocking, bool, NULL);
+NULLB_DEVICE_ATTR(nowait, bool, NULL);
 NULLB_DEVICE_ATTR(use_per_node_hctx, bool, NULL);
 NULLB_DEVICE_ATTR(memory_backed, bool, NULL);
 NULLB_DEVICE_ATTR(discard, bool, NULL);
@@ -554,6 +559,7 @@  static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_hw_queue_depth,
 	&nullb_device_attr_index,
 	&nullb_device_attr_blocking,
+	&nullb_device_attr_nowait,
 	&nullb_device_attr_use_per_node_hctx,
 	&nullb_device_attr_power,
 	&nullb_device_attr_memory_backed,
@@ -628,7 +634,7 @@  nullb_group_drop_item(struct config_group *group, struct config_item *item)
 static ssize_t memb_group_features_show(struct config_item *item, char *page)
 {
 	return snprintf(page, PAGE_SIZE,
-			"badblocks,blocking,blocksize,cache_size,"
+			"badblocks,blocking,nowait,blocksize,cache_size,"
 			"completion_nsec,discard,home_node,hw_queue_depth,"
 			"irqmode,max_sectors,mbps,memory_backed,no_sched,"
 			"poll_queues,power,queue_mode,shared_tag_bitmap,size,"
@@ -696,6 +702,7 @@  static struct nullb_device *null_alloc_dev(void)
 	dev->irqmode = g_irqmode;
 	dev->hw_queue_depth = g_hw_queue_depth;
 	dev->blocking = g_blocking;
+	dev->nowait = g_nowait;
 	dev->memory_backed = g_memory_backed;
 	dev->discard = g_discard;
 	dev->cache_size = g_cache_size;
@@ -830,7 +837,7 @@  static void null_complete_rq(struct request *rq)
 	end_cmd(blk_mq_rq_to_pdu(rq));
 }
 
-static struct nullb_page *null_alloc_page(void)
+static struct nullb_page *null_alloc_page(gfp_t gfp)
 {
 	struct nullb_page *t_page;
 
@@ -838,7 +845,7 @@  static struct nullb_page *null_alloc_page(void)
 	if (!t_page)
 		return NULL;
 
-	t_page->page = alloc_pages(GFP_NOIO, 0);
+	t_page->page = alloc_pages(gfp, 0);
 	if (!t_page->page) {
 		kfree(t_page);
 		return NULL;
@@ -983,11 +990,11 @@  static struct nullb_page *null_insert_page(struct nullb *nullb,
 
 	spin_unlock_irq(&nullb->lock);
 
-	t_page = null_alloc_page();
+	t_page = null_alloc_page(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO);
 	if (!t_page)
 		goto out_lock;
 
-	if (radix_tree_preload(GFP_NOIO))
+	if (radix_tree_preload(nullb->dev->nowait ? GFP_NOWAIT : GFP_NOIO))
 		goto out_freepage;
 
 	spin_lock_irq(&nullb->lock);
@@ -2093,6 +2100,11 @@  static int null_add_dev(struct nullb_device *dev)
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
 	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
 
+	if (dev->nowait)
+		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, nullb->q);
+	else
+		blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, nullb->q);
+
 	mutex_lock(&lock);
 	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
 	if (rv < 0) {
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index eb5972c50be8..1d7af446d728 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -107,6 +107,7 @@  struct nullb_device {
 	unsigned int index; /* index of the disk, only valid with a disk */
 	unsigned int mbps; /* Bandwidth throttle cap (in MB/s) */
 	bool blocking; /* blocking blk-mq device */
+	bool nowait; /* to set QUEUE_FLAG_NOWAIT on device queue */
 	bool use_per_node_hctx; /* use per-node allocation for hardware context */
 	bool power; /* power on/off the device */
 	bool memory_backed; /* if data is stored in memory */