diff mbox series

[v2,6/6] nvme-multipath: disable polling for underlying namespace request queue

Message ID 20181213063819.13614-7-sagi@grimberg.me (mailing list archive)
State New, archived
Headers show
Series restore nvme-rdma polling | expand

Commit Message

Sagi Grimberg Dec. 13, 2018, 6:38 a.m. UTC
Since the multipath device does not support polling (yet) we cannot
pass requests to the polling queue map as those will not generate
interrupt so we cannot reap the completion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Christoph Hellwig Dec. 13, 2018, 8:31 a.m. UTC | #1
On Wed, Dec 12, 2018 at 10:38:18PM -0800, Sagi Grimberg wrote:
> Since the multipath device does not support polling (yet) we cannot
> pass requests to the polling queue map as those will not generate
> interrupt so we cannot reap the completion.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

As said before I don't think this is the right place.  We need
to handle the stacking case in the block layer.  Something like this
untested patch below should do the trick:

--
From b810e53e8a9ec6f62202c2ad65e6e56277bcace7 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 13 Dec 2018 09:30:13 +0100
Subject: block: clear REQ_HIPRI if polling is not supported

This prevents a HIPRI bio from being submitted through a stacking
driver that does not support polling and thus won't poll for I/O
completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 268d2b8e9843..efa10789ddc0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -921,6 +921,9 @@ generic_make_request_checks(struct bio *bio)
 		}
 	}
 
+	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+		bio->bi_opf &= ~REQ_HIPRI;
+
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 		if (!blk_queue_discard(q))
Steve Wise Dec. 13, 2018, 3:28 p.m. UTC | #2
On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> Since the multipath device does not support polling (yet) we cannot
> pass requests to the polling queue map as those will not generate
> interrupt so we cannot reap the completion.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Looks good.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Sagi Grimberg Dec. 13, 2018, 3:49 p.m. UTC | #3
> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
> +		bio->bi_opf &= ~REQ_HIPRI;
> +

Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
avoid the extra atomic operation in the host path?

Would it make sense?
Christoph Hellwig Dec. 13, 2018, 3:52 p.m. UTC | #4
On Thu, Dec 13, 2018 at 07:49:42AM -0800, Sagi Grimberg wrote:
>
>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>> +		bio->bi_opf &= ~REQ_HIPRI;
>> +
>
> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
> avoid the extra atomic operation in the host path?
>
> Would it make sense?

test_bit is not usually implemented as an atomic operation.

Take a look at e.g.

arch/x86/include/asm/bitops.h:constant_test_bit()
Sagi Grimberg Dec. 13, 2018, 4:14 p.m. UTC | #5
>>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>>> +		bio->bi_opf &= ~REQ_HIPRI;
>>> +
>>
>> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
>> avoid the extra atomic operation in the host path?
>>
>> Would it make sense?
> 
> test_bit is not usually implemented as an atomic operation.
> 
> Take a look at e.g.
> 
> arch/x86/include/asm/bitops.h:constant_test_bit()

Ah.. But its still read from volatile argument so still more expensive?
Christoph Hellwig Dec. 13, 2018, 8:13 p.m. UTC | #6
On Thu, Dec 13, 2018 at 08:14:57AM -0800, Sagi Grimberg wrote:
>>>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>>>> +		bio->bi_opf &= ~REQ_HIPRI;
>>>> +
>>>
>>> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
>>> avoid the extra atomic operation in the host path?
>>>
>>> Would it make sense?
>>
>> test_bit is not usually implemented as an atomic operation.
>>
>> Take a look at e.g.
>>
>> arch/x86/include/asm/bitops.h:constant_test_bit()
>
> Ah.. But its still read from volatile argument so still more expensive?

I don't think the volatile should make a difference.  I actually
compiled both versions and the test_bit version generates a movq + testl
insted of testb:

-	movq	120(%rbx), %rdx	# MEM[(const long unsigned int *)q_38 + 120B], _135
-	testl	$524288, %edx	#, _135
+	testb	$8, 122(%rbx)	#, q_40->queue_flags

But actually generates a larger object:

 36966	   9470	     88	  46524	   b5bc	blk-core.o-opencode
 36956	   9470	     88	  46514	   b5b2	blk-core.o-test-bit

No idea what is going there.
diff mbox series

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index eb1c10b0eaf0..5a6c29ee669c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1550,6 +1550,8 @@  static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->head->disk) {
 		nvme_update_disk_info(ns->head->disk, ns, id);
 		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
+		/* multipath device does not support polling */
+		blk_queue_flag_clear(QUEUE_FLAG_POLL, ns->queue);
 	}
 #endif
 }