[dm-devel] split scsi passthrough fields out of struct request V2

Message ID e1d5f4e6-c63f-545d-710f-d4d10fa8d2e0@fb.com (mailing list archive)
State New, archived

Commit Message

Jens Axboe Jan. 26, 2017, 9:51 p.m. UTC
On 01/26/2017 02:47 PM, Bart Van Assche wrote:
> (gdb) list *(blk_mq_sched_get_request+0x310)
> 0xffffffff8132dcf0 is in blk_mq_sched_get_request (block/blk-mq-sched.c:136).
> 131                                     rq->rq_flags |= RQF_QUEUED;
> 132                     } else
> 133                             rq = __blk_mq_alloc_request(data, op);
> 134             } else {
> 135                     rq = __blk_mq_alloc_request(data, op);
> 136                     data->hctx->tags->rqs[rq->tag] = rq;
> 137             }
> 138
> 139             if (rq) {
> 140                     if (!op_is_flush(op)) {
> 
> (gdb) disas blk_mq_sched_get_request
> [ ... ]
>    0xffffffff8132dce3 <+771>:   callq  0xffffffff81324ab0 <__blk_mq_alloc_request>
>    0xffffffff8132dce8 <+776>:   mov    %rax,%rcx
>    0xffffffff8132dceb <+779>:   mov    0x18(%r12),%rax
>    0xffffffff8132dcf0 <+784>:   movslq 0x5c(%rcx),%rdx
> [ ... ]
> (gdb) print &((struct request *)0)->tag
> $1 = (int *) 0x5c <irq_stack_union+92>
> 
> I think this means that rq == NULL and that a test for rq is missing after the
> __blk_mq_alloc_request() call?

That is exactly what it means; it looks like that one path doesn't handle
that. You'd have to exhaust the pool with atomic allocs for this to
trigger, and we don't do that at all in the normal IO path. So good catch;
it must be the dm part that enables this, since it does NOWAIT allocations.
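
A minimal sketch, assuming the v4.10-era blk-mq API under discussion, of
the kind of NOWAIT allocation dm-rq issues (the helper name here is
hypothetical). It shows why everything downstream of
__blk_mq_alloc_request() has to tolerate a failed allocation once the tag
pool is exhausted:

#include <linux/blk-mq.h>
#include <linux/err.h>

/*
 * Sketch only, not dm-rq verbatim: BLK_MQ_REQ_NOWAIT means "do not sleep
 * waiting for a free tag", so when the pool is empty the allocation fails
 * instead of blocking, and the caller must requeue and retry later.
 */
static struct request *sketch_nowait_alloc(struct request_queue *q, int rw)
{
	struct request *rq = blk_mq_alloc_request(q, rw, BLK_MQ_REQ_NOWAIT);

	if (IS_ERR(rq))
		return NULL;	/* tag pool exhausted; retry later */
	return rq;
}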

Comments

Jens Axboe Jan. 26, 2017, 11:26 p.m. UTC | #1
On 01/26/2017 04:14 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 14:51 -0700, Jens Axboe wrote:
>> That is exactly what it means; it looks like that one path doesn't handle
>> that. You'd have to exhaust the pool with atomic allocs for this to
>> trigger, and we don't do that at all in the normal IO path. So good catch;
>> it must be the dm part that enables this, since it does NOWAIT allocations.
>>
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 3136696f4991..c27613de80c5 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -134,7 +134,8 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>>  			rq = __blk_mq_alloc_request(data, op);
>>  	} else {
>>  		rq = __blk_mq_alloc_request(data, op);
>> -		data->hctx->tags->rqs[rq->tag] = rq;
>> +		if (rq)
>> +			data->hctx->tags->rqs[rq->tag] = rq;
>>  	}
>>  
>>  	if (rq) {
> 
> Hello Jens,
> 
> With these two patches applied the scheduling-while-atomic complaint and
> the oops are gone. However, some tasks get stuck. Is the console output
> below enough to figure out what is going on or do you want me to bisect
> this? I don't think that any requests got stuck since no pending requests
> are shown in /sys/block/*/mq/*/{pending,*/rq_list}.

What device is stuck? Is it running with an mq scheduler attached, or
with "none"?

Would also be great to see the output of /sys/block/*/mq/*/tags and
sched_tags so we can see if they have anything pending.

From a quick look at the below, it looks like a request leak. Bisection
would most likely be very helpful.
Bart Van Assche Jan. 26, 2017, 11:47 p.m. UTC | #2
On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
> What device is stuck? Is it running with an mq scheduler attached, or
> with "none"?
> 
> Would also be great to see the output of /sys/block/*/mq/*/tags and
> sched_tags so we can see if they have anything pending.
> 
> From a quick look at the below, it looks like a request leak. Bisection
> would most likely be very helpful.

Hello Jens,

This happens with and without scheduler attached. The most recent test I ran
was with the deadline scheduler configured as default scheduler for all blk-mq
devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and
CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are
/dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:

# (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
dm-0/mq/0/tags:active_queues=0
dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
dm-1/mq/0/tags:active_queues=0
# (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
dm-0/mq/0/sched_tags:active_queues=0
dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
dm-1/mq/0/sched_tags:active_queues=0

Bart.
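
The leak is already visible in those counters: dm-0's sched_tags pool
shows nr_free=0 against nr_tags=256, i.e. every scheduler tag is held even
though no requests are reported pending. A shell sketch (sysfs layout
assumed from the grep output above) that turns the counters into an
in-flight count per hardware queue:

# Sketch: in-flight tags = nr_tags - nr_free, from the counters above.
# For dm-0's sched_tags this prints 256, i.e. the whole pool is held.
for f in /sys/class/block/dm-*/mq/*/sched_tags; do
	awk -F'[=,]' '/nr_tags=/ { total = $2 }
		      /nr_free=/ { print FILENAME ": in-flight = " total - $2 }' "$f"
done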
Jens Axboe Jan. 26, 2017, 11:50 p.m. UTC | #3
On 01/26/2017 04:47 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
>> What device is stuck? Is it running with an mq scheduler attached, or
>> with "none"?
>>
>> Would also be great to see the output of /sys/block/*/mq/*/tags and
>> sched_tags so we can see if they have anything pending.
>>
>> From a quick look at the below, it looks like a request leak. Bisection
>> would most likely be very helpful.
> 
> Hello Jens,
> 
> This happens with and without scheduler attached. The most recent test I ran
> was with the deadline scheduler configured as default scheduler for all blk-mq
> devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and
> CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are
> /dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:
> 
> # (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
> dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
> dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
> dm-0/mq/0/tags:active_queues=0
> dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
> dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
> dm-1/mq/0/tags:active_queues=0
> # (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
> dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
> dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
> dm-0/mq/0/sched_tags:active_queues=0
> dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
> dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
> dm-1/mq/0/sched_tags:active_queues=0

Clearly we are missing some requests. How do I set up dm similarly to
you?

Does it reproduce without Christoph's patchset?
Jens Axboe Jan. 27, 2017, 12:33 a.m. UTC | #4
On 01/26/2017 04:50 PM, Jens Axboe wrote:
> On 01/26/2017 04:47 PM, Bart Van Assche wrote:
>> On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
>>> What device is stuck? Is it running with an mq scheduler attached, or
>>> with "none"?
>>>
>>> Would also be great to see the output of /sys/block/*/mq/*/tags and
>>> sched_tags so we can see if they have anything pending.
>>>
>>> From a quick look at the below, it looks like a request leak. Bisection
>>> would most likely be very helpful.
>>
>> Hello Jens,
>>
>> This happens with and without scheduler attached. The most recent test I ran
>> was with the deadline scheduler configured as default scheduler for all blk-mq
>> devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and
>> CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are
>> /dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:
>>
>> # (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
>> dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
>> dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
>> dm-0/mq/0/tags:active_queues=0
>> dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
>> dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
>> dm-1/mq/0/tags:active_queues=0
>> # (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
>> dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
>> dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
>> dm-0/mq/0/sched_tags:active_queues=0
>> dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
>> dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
>> dm-1/mq/0/sched_tags:active_queues=0
> 
> Clearly we are missing some requests. How do I set up dm similarly to
> you?
> 
> Does it reproduce without Christoph's patchset?

I have dm-mpath running using blk_mq and with mq-deadline on both dm and
the lower level device, and it seems to be running just fine here.
Note, this is without Christoph's patchset, I'll try that next once
xfstest finishes.
Bart Van Assche Jan. 27, 2017, 12:38 a.m. UTC | #5
On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote:
> Clearly we are missing some requests. How do I set up dm similarly to
> you?
> 
> Does it reproduce without Christoph's patchset?

Hello Jens,

I see similar behavior with the blk-mq-sched branch of
git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
booting happens much slower than usual and I/O hangs if I run the
srp-test software.

Regarding creating a similar dm setup: I hope that in the future it
will become possible to run the srp-test software without any special
hardware and with in-tree drivers. Today, running the srp-test software
with in-tree drivers requires IB hardware. This is how to run the
srp-test software today with in-tree drivers:
* Find a system with at least two InfiniBand ports.
* Make sure that the appropriate IB driver in the kernel is enabled and
  also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp,
  ib_srpt and dm-mpath are built as kernel modules.
* If none of the IB ports are connected to an IB switch, connect the
  two ports to each other and configure and start the opensm software
  such that the port states change from "Initializing" to "Active".
* Check with "ibstat | grep State: Active" that at least one port is
  in the active state.
* Configure multipathd as explained in
  https://github.com/bvanassche/srp-test/blob/master/README.md.
* Restart multipathd to make sure it picks up /etc/multipath.conf.
* Clone https://github.com/bvanassche/srp-test and start it as follows:
  srp-test/run_tests -t 02-mq
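
Condensed into commands, the software-side steps above look roughly as
follows. This is a sketch only: the module names (dm_multipath for
dm-mpath; the LIO core modules are left to the distribution's target
packages) and the use of systemctl are assumptions, not taken from this
thread:

# Sketch of the pre-flight steps; adjust module/service names as needed.
modprobe -a ib_srp ib_srpt dm_multipath    # assumed module names
ibstat | grep -q 'State: Active' || echo 'no active port; start opensm'
systemctl restart multipathd               # re-read /etc/multipath.conf
git clone https://github.com/bvanassche/srp-test
srp-test/run_tests -t 02-mq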

Bart.
Jens Axboe Jan. 27, 2017, 12:41 a.m. UTC | #6
On 01/26/2017 05:38 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote:
>> Clearly we are missing some requests. How do I set up dm similarly to
>> you?
>>
>> Does it reproduce without Christoph's patchset?
> 
> Hello Jens,
> 
> I see similar behavior with the blk-mq-sched branch of
> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
> booting happens much slower than usual and I/O hangs if I run the
> srp-test software.

Please don't run that, run for-4.11/block and merge it to master.
Same behavior?
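
A sketch of assembling that test tree (branch names from this email; the
local branch name is made up):

# Sketch: for-4.11/block merged with master, as requested above.
git fetch git://git.kernel.dk/linux-block.git for-4.11/block
git checkout -b blk-test master
git merge FETCH_HEAD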

> Regarding creating a similar dm setup: I hope that in the future it
> will become possible to run the srp-test software without any special
> hardware and with in-tree drivers. Today, running the srp-test software
> with in-tree drivers requires IB hardware. This is how to run the
> srp-test software today with in-tree drivers:
> * Find a system with at least two InfiniBand ports.
> * Make sure that the appropriate IB driver in the kernel is enabled and
>   also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp,
>   ib_srpt and dm-mpath are built as kernel modules.
> * If none of the IB ports are connected to an IB switch, connect the
>   two ports to each other and configure and start the opensm software
>   such that the port states change from "Initializing" to "Active".
> * Check with "ibstat | grep State: Active" that at least one port is
>   in the active state.
> * Configure multipathd as explained in
>   https://github.com/bvanassche/srp-test/blob/master/README.md.
> * Restart multipathd to make sure it picks up /etc/multipath.conf.
> * Clone https://github.com/bvanassche/srp-test and start it as follows:
>   srp-test/run_tests -t 02-mq

I can't run that. Any chance of a test case that doesn't require IB?
Bart Van Assche Jan. 27, 2017, 1:15 a.m. UTC | #7
On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
> > I see similar behavior with the blk-mq-sched branch of
> > git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
> > booting happens much slower than usual and I/O hangs if I run the
> > srp-test software.
> 
> Please don't run that, run for-4.11/block and merge it to master.
> Same behavior?

I have not yet had the chance to run the srp-test software against that
kernel. But I already see that booting takes more than ten times longer
than usual. Note: as far as I know the dm-mpath driver is not involved
in the boot process of my test system.

> > Regarding creating a similar dm setup: I hope that in the future it
> > will become possible to run the srp-test software without any special
> > hardware and with in-tree drivers. Today, running the srp-test software
> > with in-tree drivers requires IB hardware. This is how to run the
> > srp-test software today with in-tree drivers:
> > * Find a system with at least two InfiniBand ports.
> > * Make sure that the appropriate IB driver in the kernel is enabled and
> >   also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp,
> >   ib_srpt and dm-mpath are built as kernel modules.
> > * If none of the IB ports are connected to an IB switch, connect the
> >   two ports to each other and configure and start the opensm software
> >   such that the port states change from "Initializing" to "Active".
> > * Check with "ibstat | grep State: Active" that at least one port is
> >   in the active state.
> > * Configure multipathd as explained in
> >   https://github.com/bvanassche/srp-test/blob/master/README.md.
> > * Restart multipathd to make sure it picks up /etc/multipath.conf.
> > * Clone https://github.com/bvanassche/srp-test and start it as follows:
> >   srp-test/run_tests -t 02-mq
> 
> I can't run that. Any chance of a test case that doesn't require IB?

It is possible to run that test on top of the SoftRoCE driver. I will first
check myself whether the latest version of the SoftRoCE driver is stable
enough to run srp-test on top of it (see also
https://github.com/dledford/linux/commits/k.o/for-4.11).

Bart.
Jens Axboe Jan. 27, 2017, 1:22 a.m. UTC | #8
On 01/26/2017 06:15 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
>> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
>>> I see similar behavior with the blk-mq-sched branch of
>>> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
>>> booting happens much slower than usual and I/O hangs if I run the
>>> srp-test software.
>>
>> Please don't run that, run for-4.11/block and merge it to master.
>> Same behavior?
> 
> I have not yet had the chance to run the srp-test software against that
> kernel. But I already see that booting takes more than ten times longer
> than usual. Note: as far as I know the dm-mpath driver is not involved
> in the boot process of my test system.

What's your boot device? I've been booting this on a variety of setups,
no problems observed. It's booting my laptop, and on SCSI and SATA as
well. What is your root drive? What is the queue depth of it?
Controller?
Bart Van Assche Jan. 27, 2017, 5:02 p.m. UTC | #9
On Thu, 2017-01-26 at 18:22 -0700, Jens Axboe wrote:
> What's your boot device? I've been booting this on a variety of setups,
> no problems observed. It's booting my laptop, and on SCSI and SATA as
> well. What is your root drive? What is the queue depth of it?
> Controller?

The boot device in my test setup is a SATA hard disk:

# cat /proc/cmdline  
BOOT_IMAGE=/boot/vmlinuz-4.10.0-rc5-dbg+ root=UUID=60a4b064-b3ef-4d28-96d3-3c13ecbec43e resume=/dev/sda2 showopts
# ls -l /dev/disk/by-uuid/60a4b064-b3ef-4d28-96d3-3c13ecbec43e
lrwxrwxrwx 1 root root 10 Jan 27 08:43 /dev/disk/by-uuid/60a4b064-b3ef-4d28-96d3-3c13ecbec43e -> ../../sda1
# cat /sys/block/sda/queue/nr_requests  
31
# lsscsi | grep sda
[0:0:0:0]    disk    ATA      ST1000NM0033-9ZM GA67  /dev/sda
# hdparm -i /dev/sda

/dev/sda:
 Model=ST1000NM0033-9ZM173, FwRev=GA67, SerialNo=Z1W2HM75
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4  
 DMA modes:  mdma0 mdma1 mdma2  
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6  
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode
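
The number that stands out above is nr_requests=31: a single-queue SATA
disk has a shallow tag pool, so even a modest request leak can stall it
quickly. A sketch of the same inspection in one place (the mq/ directories
assume sda is driven by scsi-mq, as in this test setup):

# Sketch: root-disk queue state; paths taken from the commands above.
cat /sys/block/sda/queue/scheduler     # active scheduler
cat /sys/block/sda/queue/nr_requests   # 31 here: easy to exhaust
grep -aH '' /sys/block/sda/mq/*/tags /sys/block/sda/mq/*/sched_tags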

Bart.

Patch

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3136696f4991..c27613de80c5 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -134,7 +134,8 @@  struct request *blk_mq_sched_get_request(struct request_queue *q,
 			rq = __blk_mq_alloc_request(data, op);
 	} else {
 		rq = __blk_mq_alloc_request(data, op);
-		data->hctx->tags->rqs[rq->tag] = rq;
+		if (rq)
+			data->hctx->tags->rqs[rq->tag] = rq;
 	}
 
 	if (rq) {