Message ID | e1d5f4e6-c63f-545d-710f-d4d10fa8d2e0@fb.com (mailing list archive) |
---|---|
State | New, archived |
On 01/26/2017 04:14 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 14:51 -0700, Jens Axboe wrote:
>> That is exactly what it means, looks like that one path doesn't handle that. You'd have to exhaust the pool with atomic allocs for this to trigger, we don't do that at all in the normal IO path. So good catch, must be the dm part that enables this since it does NOWAIT allocations.
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 3136696f4991..c27613de80c5 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -134,7 +134,8 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>>  		rq = __blk_mq_alloc_request(data, op);
>>  	} else {
>>  		rq = __blk_mq_alloc_request(data, op);
>> -		data->hctx->tags->rqs[rq->tag] = rq;
>> +		if (rq)
>> +			data->hctx->tags->rqs[rq->tag] = rq;
>>  	}
>>
>>  	if (rq) {
>
> Hello Jens,
>
> With these two patches applied the scheduling-while-atomic complaint and the oops are gone. However, some tasks get stuck. Is the console output below enough to figure out what is going on or do you want me to bisect this? I don't think that any requests got stuck since no pending requests are shown in /sys/block/*/mq/*/{pending,*/rq_list}.

What device is stuck? Is it running with an mq scheduler attached, or with "none"?

Would also be great to see the output of /sys/block/*/mq/*/tags and sched_tags so we can see if they have anything pending.

From a quick look at the below, it looks like a request leak. Bisection would most likely be very helpful.
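The failure mode described above comes from non-blocking tag allocation. As a minimal sketch of what such an allocation looks like from the caller side (the helper name is invented for illustration; it assumes the blk_mq_alloc_request()/BLK_MQ_REQ_NOWAIT interface of this kernel generation):

```c
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/err.h>

/*
 * Hypothetical helper (illustration only): allocate a request from a
 * blk-mq queue without sleeping.  With BLK_MQ_REQ_NOWAIT an exhausted
 * tag pool makes the allocation fail instead of waiting for a free tag,
 * so the error return must be handled rather than dereferenced.
 */
static struct request *example_alloc_nowait(struct request_queue *q, int rw)
{
	struct request *rq;

	rq = blk_mq_alloc_request(q, rw, BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(rq))
		return NULL;	/* typically -EWOULDBLOCK: no tags free */

	return rq;
}
```

Without the NOWAIT flag the allocator sleeps until a tag frees up, which matches the observation above that the normal I/O path never hits this case; only dm-style atomic allocations can drain the pool and get a failed allocation back.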
On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
> What device is stuck? Is it running with an mq scheduler attached, or with "none"?
>
> Would also be great to see the output of /sys/block/*/mq/*/tags and sched_tags so we can see if they have anything pending.
>
> From a quick look at the below, it looks like a request leak. Bisection would most likely be very helpful.

Hello Jens,

This happens with and without scheduler attached. The most recent test I ran was with the deadline scheduler configured as default scheduler for all blk-mq devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are /dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:

# (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
dm-0/mq/0/tags:active_queues=0
dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
dm-1/mq/0/tags:active_queues=0
# (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
dm-0/mq/0/sched_tags:active_queues=0
dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
dm-1/mq/0/sched_tags:active_queues=0

Bart.
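The leak is visible in the dm-0 sched_tags figures above: nr_free=0 out of nr_tags=256. The same check can be scripted; as a small illustration, here is a hypothetical standalone userspace helper (not part of srp-test or the kernel) that parses one of those sysfs tags files and prints how many tags are currently held:

```c
/*
 * Hypothetical helper: read a blk-mq tags attribute such as
 * /sys/class/block/dm-0/mq/0/sched_tags and report nr_tags - nr_free,
 * i.e. how many tags are currently in use.  Illustration only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	char line[256];
	long nr_tags = -1, nr_free = -1;
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /sys/class/block/<dev>/mq/<n>/tags\n", argv[0]);
		return 1;
	}
	f = fopen(argv[1], "r");
	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char *p;

		if ((p = strstr(line, "nr_tags=")))
			nr_tags = strtol(p + strlen("nr_tags="), NULL, 10);
		if ((p = strstr(line, "nr_free=")))
			nr_free = strtol(p + strlen("nr_free="), NULL, 10);
	}
	fclose(f);
	if (nr_tags < 0 || nr_free < 0) {
		fprintf(stderr, "could not parse %s\n", argv[1]);
		return 1;
	}
	printf("%s: %ld of %ld tags in use\n", argv[1], nr_tags - nr_free, nr_tags);
	return 0;
}
```

Run against dm-0's sched_tags it would report 256 of 256 tags in use, i.e. every scheduler tag is held even though almost no driver tags are.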
On 01/26/2017 04:47 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
>> What device is stuck? Is it running with an mq scheduler attached, or with "none"?
>>
>> Would also be great to see the output of /sys/block/*/mq/*/tags and sched_tags so we can see if they have anything pending.
>>
>> From a quick look at the below, it looks like a request leak. Bisection would most likely be very helpful.
>
> Hello Jens,
>
> This happens with and without scheduler attached. The most recent test I ran was with the deadline scheduler configured as default scheduler for all blk-mq devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are /dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:
>
> # (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
> dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
> dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
> dm-0/mq/0/tags:active_queues=0
> dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
> dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
> dm-1/mq/0/tags:active_queues=0
> # (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
> dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
> dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
> dm-0/mq/0/sched_tags:active_queues=0
> dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
> dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
> dm-1/mq/0/sched_tags:active_queues=0

Clearly we are missing some requests. How do I setup dm similarly to you?

Does it reproduce without Christoph's patchset?
On 01/26/2017 04:50 PM, Jens Axboe wrote:
> On 01/26/2017 04:47 PM, Bart Van Assche wrote:
>> On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote:
>>> What device is stuck? Is it running with an mq scheduler attached, or with "none"?
>>>
>>> Would also be great to see the output of /sys/block/*/mq/*/tags and sched_tags so we can see if they have anything pending.
>>>
>>> From a quick look at the below, it looks like a request leak. Bisection would most likely be very helpful.
>>
>> Hello Jens,
>>
>> This happens with and without scheduler attached. The most recent test I ran was with the deadline scheduler configured as default scheduler for all blk-mq devices (CONFIG_DEFAULT_SQ_IOSCHED="mq-deadline" and CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"). The block devices that hang are /dev/dm-0 and /dev/dm-1. The tags and sched_tags data is as follows:
>>
>> # (cd /sys/class/block && grep -aH '' dm*/mq/*/tags)
>> dm-0/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
>> dm-0/mq/0/tags:nr_free=1795, nr_reserved=0
>> dm-0/mq/0/tags:active_queues=0
>> dm-1/mq/0/tags:nr_tags=2048, reserved_tags=0, bits_per_word=64
>> dm-1/mq/0/tags:nr_free=2047, nr_reserved=0
>> dm-1/mq/0/tags:active_queues=0
>> # (cd /sys/class/block && grep -aH '' dm*/mq/*/sched_tags)
>> dm-0/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
>> dm-0/mq/0/sched_tags:nr_free=0, nr_reserved=0
>> dm-0/mq/0/sched_tags:active_queues=0
>> dm-1/mq/0/sched_tags:nr_tags=256, reserved_tags=0, bits_per_word=64
>> dm-1/mq/0/sched_tags:nr_free=254, nr_reserved=0
>> dm-1/mq/0/sched_tags:active_queues=0
>
> Clearly we are missing some requests. How do I setup dm similarly to you?
>
> Does it reproduce without Christoph's patchset?

I have dm-mpath running using blk_mq and with mq-deadline on both dm and the lower level device, and it seems to be running just fine here. Note, this is without Christoph's patchset, I'll try that next once xfstest finishes.
On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote:
> Clearly we are missing some requests. How do I setup dm similarly to you?
>
> Does it reproduce without Christoph's patchset?

Hello Jens,

I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): booting happens much slower than usual and I/O hangs if I run the srp-test software.

Regarding creating a similar dm setup: I hope that in the future it will become possible to run the srp-test software without any special hardware and with in-tree drivers. Today running the srp-test software with in-tree drivers namely requires IB hardware. This is how to run the srp-test software today with in-tree drivers:
* Find a system with at least two InfiniBand ports.
* Make sure that the appropriate IB driver in the kernel is enabled and also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp, ib_srpt and dm-mpath are built as kernel modules.
* If none of the IB ports are connected to an IB switch, connect the two ports to each other and configure and start the opensm software such that the port states change from "Initializing" to "Active".
* Check with "ibstat | grep State: Active" that at least one port is in the active state.
* Configure multipathd as explained in https://github.com/bvanassche/srp-test/blob/master/README.md.
* Restart multipathd to make sure it picks up /etc/multipath.conf.
* Clone https://github.com/bvanassche/srp-test and start it as follows: srp-test/run_tests -t 02-mq

Bart.
On 01/26/2017 05:38 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote:
>> Clearly we are missing some requests. How do I setup dm similarly to you?
>>
>> Does it reproduce without Christoph's patchset?
>
> Hello Jens,
>
> I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): booting happens much slower than usual and I/O hangs if I run the srp-test software.

Please don't run that, run for-4.11/block and merge it to master. Same behavior?

> Regarding creating a similar dm setup: I hope that in the future it will become possible to run the srp-test software without any special hardware and with in-tree drivers. Today running the srp-test software with in-tree drivers namely requires IB hardware. This is how to run the srp-test software today with in-tree drivers:
> * Find a system with at least two InfiniBand ports.
> * Make sure that the appropriate IB driver in the kernel is enabled and also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp, ib_srpt and dm-mpath are built as kernel modules.
> * If none of the IB ports are connected to an IB switch, connect the two ports to each other and configure and start the opensm software such that the port states change from "Initializing" to "Active".
> * Check with "ibstat | grep State: Active" that at least one port is in the active state.
> * Configure multipathd as explained in https://github.com/bvanassche/srp-test/blob/master/README.md.
> * Restart multipathd to make sure it picks up /etc/multipath.conf.
> * Clone https://github.com/bvanassche/srp-test and start it as follows: srp-test/run_tests -t 02-mq

I can't run that. Any chance of a test case that doesn't require IB?
On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
> > I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): booting happens much slower than usual and I/O hangs if I run the srp-test software.
>
> Please don't run that, run for-4.11/block and merge it to master. Same behavior?

I have not yet had the chance to run the srp-test software against that kernel. But I already see that booting takes more than ten times longer than usual. Note: as far as I know the dm-mpath driver is not involved in the boot process of my test system.

> > Regarding creating a similar dm setup: I hope that in the future it will become possible to run the srp-test software without any special hardware and with in-tree drivers. Today running the srp-test software with in-tree drivers namely requires IB hardware. This is how to run the srp-test software today with in-tree drivers:
> > * Find a system with at least two InfiniBand ports.
> > * Make sure that the appropriate IB driver in the kernel is enabled and also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp, ib_srpt and dm-mpath are built as kernel modules.
> > * If none of the IB ports are connected to an IB switch, connect the two ports to each other and configure and start the opensm software such that the port states change from "Initializing" to "Active".
> > * Check with "ibstat | grep State: Active" that at least one port is in the active state.
> > * Configure multipathd as explained in https://github.com/bvanassche/srp-test/blob/master/README.md.
> > * Restart multipathd to make sure it picks up /etc/multipath.conf.
> > * Clone https://github.com/bvanassche/srp-test and start it as follows: srp-test/run_tests -t 02-mq
>
> I can't run that. Any chance of a test case that doesn't require IB?

It is possible to run that test on top of the SoftRoCE driver. I will first check myself whether the latest version of the SoftRoCE driver is stable enough to run srp-test on top of it (see also https://github.com/dledford/linux/commits/k.o/for-4.11).

Bart.
On 01/26/2017 06:15 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
>> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
>>> I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): booting happens much slower than usual and I/O hangs if I run the srp-test software.
>>
>> Please don't run that, run for-4.11/block and merge it to master. Same behavior?
>
> I have not yet had the chance to run the srp-test software against that kernel. But I already see that booting takes more than ten times longer than usual. Note: as far as I know the dm-mpath driver is not involved in the boot process of my test system.

What's your boot device? I've been booting this on a variety of setups, no problems observed. It's booting my laptop, and on SCSI and SATA as well. What is your root drive? What is the queue depth of it? Controller?
On Thu, 2017-01-26 at 18:22 -0700, Jens Axboe wrote:
> What's your boot device? I've been booting this on a variety of setups, no problems observed. It's booting my laptop, and on SCSI and SATA as well. What is your root drive? What is the queue depth of it? Controller?

The boot device in my test setup is a SATA hard disk:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.0-rc5-dbg+ root=UUID=60a4b064-b3ef-4d28-96d3-3c13ecbec43e resume=/dev/sda2 showopts
# ls -l /dev/disk/by-uuid/60a4b064-b3ef-4d28-96d3-3c13ecbec43e
lrwxrwxrwx 1 root root 10 Jan 27 08:43 /dev/disk/by-uuid/60a4b064-b3ef-4d28-96d3-3c13ecbec43e -> ../../sda1
# cat /sys/block/sda/queue/nr_requests
31
# lsscsi | grep sda
[0:0:0:0]    disk    ATA      ST1000NM0033-9ZM GA67  /dev/sda
# hdparm -i /dev/sda

/dev/sda:

 Model=ST1000NM0033-9ZM173, FwRev=GA67, SerialNo=Z1W2HM75
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified: ATA/ATAPI-4,5,6,7

 * signifies the current active mode

Bart.
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3136696f4991..c27613de80c5 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -134,7 +134,8 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 		rq = __blk_mq_alloc_request(data, op);
 	} else {
 		rq = __blk_mq_alloc_request(data, op);
-		data->hctx->tags->rqs[rq->tag] = rq;
+		if (rq)
+			data->hctx->tags->rqs[rq->tag] = rq;
 	}
 
 	if (rq) {
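With this guard in place, blk_mq_sched_get_request() can return NULL when a BLK_MQ_REQ_NOWAIT allocation finds no free tag, and the submission path already tolerates that. The sketch below is reconstructed from memory rather than quoted from the tree: the wrapper name is invented, and it assumes the internal block/ headers of this branch are in scope.

```c
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include "blk-mq.h"		/* internal: struct blk_mq_alloc_data */
#include "blk-mq-sched.h"	/* internal: blk_mq_sched_get_request() */

/*
 * Sketch of the caller pattern (illustrative only): if no tag can be
 * allocated, the bio submission is abandoned with BLK_QC_T_NONE rather
 * than dereferencing a NULL request.
 */
static blk_qc_t example_make_request(struct request_queue *q, struct bio *bio)
{
	struct blk_mq_alloc_data data = { .flags = 0 };
	struct request *rq;

	rq = blk_mq_sched_get_request(q, bio, bio->bi_opf, &data);
	if (unlikely(!rq))
		return BLK_QC_T_NONE;	/* tag pool exhausted, nothing queued */

	/* ... merge/plug/dispatch handling would go here ... */
	return BLK_QC_T_NONE;	/* a real caller returns the issued request's cookie */
}
```

In other words, the patch only has to make the allocation failure observable; the callers already know how to back off when they get nothing back.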