| Message ID | alpine.LNX.2.00.1501121830010.4026@localhost.lm.intel.com (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Delegated to: | Mike Snitzer |
On Mon, Jan 12 2015 at 1:35pm -0500, Keith Busch <keith.busch@intel.com> wrote:

> On Mon, 12 Jan 2015, Keith Busch wrote:
> > Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> > blk_get_request() for the blk-mq based multipath types and overrides the
> > destination's cmd_flags with the source's even though the source was not
> > allocated from a blk-mq based queue, much less a shared tag.
>
> Untested patch. This will also preserve the failfast cmd_flag dm-mpath
> set after allocating.

Ah, good point. The failfast flag would get cleared with the patch I
just proposed (unless REQ_FAILFAST_TRANSPORT was added to
REQ_PRESERVE_CLONE_MASK).

Anyway, I'm happy to see this implemented however you guys think is
best. I think I like Keith's patch better than mine.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Mon, Jan 12 2015 at 2:11pm -0500, Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jan 12 2015 at 1:35pm -0500,
> Keith Busch <keith.busch@intel.com> wrote:
>
> > On Mon, 12 Jan 2015, Keith Busch wrote:
> > > Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> > > blk_get_request() for the blk-mq based multipath types and overrides the
> > > destination's cmd_flags with the source's even though the source was not
> > > allocated from a blk-mq based queue, much less a shared tag.
> >
> > Untested patch. This will also preserve the failfast cmd_flag dm-mpath
> > set after allocating.
>
> Ah, good point. The failfast flag would get cleared with the patch I
> just proposed (unless REQ_FAILFAST_TRANSPORT was added to
> REQ_PRESERVE_CLONE_MASK).
>
> Anyway, I'm happy to see this implemented however you guys think is
> best. I think I like Keith's patch better than mine.

FYI, I staged Keith's patch here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e

Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
Please test; it should hopefully take care of the stall you've been
seeing.

Mike
On 01/12/15 21:22, Mike Snitzer wrote:
> FYI, I staged Keith's patch here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e
>
> Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
> Please test, it should hopefully take care of the stall you've been
> seeing.

Hello Mike,

In the quick test I ran the I/O stalls were indeed gone. Thanks :-)

However, I hit another issue while running I/O on top of a multipath
device (on a kernel with lockdep and SLUB memory poisoning enabled):

NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G W 3.19.0-rc4-debug+ #1
Call Trace:
 [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
 [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
 [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
 [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
 [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
 [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
 [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
 [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
 [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
 [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
 [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
 [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
 [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
 [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
 [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
 [<ffffffff81075c0f>] kthread+0xef/0x110
 [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0

Bart.
On Tue, Jan 13 2015 at 7:29am -0500, Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 01/12/15 21:22, Mike Snitzer wrote:
> > FYI, I staged Keith's patch here:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e
> >
> > Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
> > Please test, it should hopefully take care of the stall you've been
> > seeing.
>
> Hello Mike,
>
> In the quick test I ran the I/O stalls were indeed gone. Thanks :-)

Good news, followed by a new mole rearing its head ;)

> However, I hit another issue while running I/O on top of a multipath
> device (on a kernel with lockdep and SLUB memory poisoning enabled):
>
> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G W 3.19.0-rc4-debug+ #1
> Call Trace:
> [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
> [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
> [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
> [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
> [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
> [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
> [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
> [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
> [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
> [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
> [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
> [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
> [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
> [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
> [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
> [<ffffffff81075c0f>] kthread+0xef/0x110
> [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0

Unfortunate. Is this still with a 16MB backing device or is it real
hardware? Can you share the workload so that myself and/or Keith could
try to reproduce?
On 01/13/15 15:18, Mike Snitzer wrote:
> On Tue, Jan 13 2015 at 7:29am -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> However, I hit another issue while running I/O on top of a multipath
>> device (on a kernel with lockdep and SLUB memory poisoning enabled):
>>
>> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
>> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G W 3.19.0-rc4-debug+ #1
>> Call Trace:
>> [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
>> [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
>> [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
>> [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
>> [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
>> [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
>> [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
>> [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
>> [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
>> [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
>> [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
>> [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
>> [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
>> [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
>> [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
>> [<ffffffff81075c0f>] kthread+0xef/0x110
>> [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0
>
> Unfortunate. Is this still with a 16MB backing device or is it real
> hardware? Can you share the workload so that myself and/or Keith could
> try to reproduce?

Hello Mike,

This is still with a 16MB RAM disk as backing device. The fio job I
used to trigger this was as follows:

dev=/dev/sdc
fio --bs=4K --ioengine=libaio --rw=randread --buffered=0 --numjobs=12 \
    --iodepth=128 --iodepth_batch=64 --iodepth_batch_complete=64 \
    --thread --norandommap --loops=$((2**31)) --runtime=60 \
    --group_reporting --gtod_reduce=1 --name=$dev --filename=$dev \
    --invalidate=1

Bart.
On 01/12/2015 11:35 AM, Keith Busch wrote:
> On Mon, 12 Jan 2015, Keith Busch wrote:
>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>> blk_get_request() for the blk-mq based multipath types and overrides the
>> destination's cmd_flags with the source's even though the source was not
>> allocated from a blk-mq based queue, much less a shared tag.
>
> Untested patch. This will also preserve the failfast cmd_flag dm-mpath
> set after allocating.
>
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 7e78931..6201090 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>  {
>  	dst->cpu = src->cpu;
> -	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> +	if (dst->q->mq_ops)
> +		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> +	else
> +		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;

Making the two cases different is a bit... nonsensical. We should do this
for both cases, if safe, or move the MQ_INFLIGHT flag and expand the
CLONE_MASK.
On Tue, 13 Jan 2015, Jens Axboe wrote:
> On 01/12/2015 11:35 AM, Keith Busch wrote:
>> On Mon, 12 Jan 2015, Keith Busch wrote:
>>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>>> blk_get_request() for the blk-mq based multipath types and overrides the
>>> destination's cmd_flags with the source's even though the source was not
>>> allocated from a blk-mq based queue, much less a shared tag.
>>
>> Untested patch. This will also preserve the failfast cmd_flag dm-mpath
>> set after allocating.
>>
>> ---
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 7e78931..6201090 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>>  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>>  {
>>  	dst->cpu = src->cpu;
>> -	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +	if (dst->q->mq_ops)
>> +		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +	else
>> +		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>
> Making the two cases different is a bit... nonsensical. We should do this for
> both cases, if safe, or move the MQ_INFLIGHT flag and expand the CLONE_MASK.

Expanding the clone mask won't do any good since the src doesn't come
from blk-mq and wouldn't ever have MQ_INFLIGHT set. Blk-mq initializes
the cmd_flags when you get one, so I assumed OR'ing was safe. For the
non-blk-mq case, we have no guarantees how the req was initialized and
it could have nonsense cmd_flags.
On Tue, Jan 13 2015 at 9:59am -0500, Jens Axboe <axboe@kernel.dk> wrote:

> On 01/12/2015 11:35 AM, Keith Busch wrote:
>> On Mon, 12 Jan 2015, Keith Busch wrote:
>>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>>> blk_get_request() for the blk-mq based multipath types and overrides the
>>> destination's cmd_flags with the source's even though the source was not
>>> allocated from a blk-mq based queue, much less a shared tag.
>>
>> Untested patch. This will also preserve the failfast cmd_flag dm-mpath
>> set after allocating.
>>
>> ---
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 7e78931..6201090 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>>  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>>  {
>>  	dst->cpu = src->cpu;
>> -	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +	if (dst->q->mq_ops)
>> +		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +	else
>> +		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>
> Making the two cases different is a bit... nonsensical. We should do
> this for both cases, if safe, or move the MQ_INFLIGHT flag and
> expand the CLONE_MASK.

ok, i'll work through it.

k
On Tue, Jan 13 2015 at 10:11am -0500, Keith Busch <keith.busch@intel.com> wrote:

> On Tue, 13 Jan 2015, Jens Axboe wrote:
>> On 01/12/2015 11:35 AM, Keith Busch wrote:
>>> On Mon, 12 Jan 2015, Keith Busch wrote:
>>>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>>>> blk_get_request() for the blk-mq based multipath types and overrides the
>>>> destination's cmd_flags with the source's even though the source was not
>>>> allocated from a blk-mq based queue, much less a shared tag.
>>>
>>> Untested patch. This will also preserve the failfast cmd_flag dm-mpath
>>> set after allocating.
>>>
>>> ---
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 7e78931..6201090 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>>>  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>>>  {
>>>  	dst->cpu = src->cpu;
>>> -	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>>> +	if (dst->q->mq_ops)
>>> +		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>>> +	else
>>> +		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>>
>> Making the two cases different is a bit... nonsensical. We should
>> do this for both cases, if safe, or move the MQ_INFLIGHT flag and
>> expand the CLONE_MASK.
>
> Expanding the clone mask won't do any good since the src doesn't come
> from blk-mq and wouldn't ever have MQ_INFLIGHT set. Blk-mq initializes
> the cmd_flags when you get one so I assumed OR'ing was safe. For the
> non-blk-mq case, we have no guarantees how the req was initialized and
> could have nonsense cmd_flags.

It _could_, but the only consumer of blk_rq_prep_clone() is
request-based DM, and the non-blk-mq case uses blk_rq_init() prior to
calling blk_rq_prep_clone(), so the cmd_flags will be 0.

I've revised the patch to just do it for both cases since it will be
safe, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=6a55c9861326dc2a731c7978d93567dd4e62d2f7
On Tue, Jan 13 2015 at 9:28am -0500, Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 01/13/15 15:18, Mike Snitzer wrote:
> > On Tue, Jan 13 2015 at 7:29am -0500,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> However, I hit another issue while running I/O on top of a multipath
> >> device (on a kernel with lockdep and SLUB memory poisoning enabled):
> >>
> >> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
> >> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G W 3.19.0-rc4-debug+ #1
> >> Call Trace:
> >> [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
> >> [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
> >> [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
> >> [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
> >> [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
> >> [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
> >> [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
> >> [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
> >> [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
> >> [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
> >> [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
> >> [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
> >> [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
> >> [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
> >> [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
> >> [<ffffffff81075c0f>] kthread+0xef/0x110
> >> [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0
> >
> > Unfortunate. Is this still with a 16MB backing device or is it real
> > hardware? Can you share the workload so that myself and/or Keith could
> > try to reproduce?
>
> Hello Mike,
>
> This is still with a 16MB RAM disk as backing device. The fio job I
> used to trigger this was as follows:
>
> dev=/dev/sdc
> fio --bs=4K --ioengine=libaio --rw=randread --buffered=0 --numjobs=12 \
>     --iodepth=128 --iodepth_batch=64 --iodepth_batch_complete=64 \
>     --thread --norandommap --loops=$((2**31)) --runtime=60 \
>     --group_reporting --gtod_reduce=1 --name=$dev --filename=$dev \
>     --invalidate=1

OK, I assume you specified the mpath device for the test that failed.
This test works fine on my 100MB scsi_debug device with 4 paths exported
over virtio-blk to a guest that assembles the mpath device. Could be a
hang that is unique to scsi-mq.

Any chance you'd be willing to provide a HOWTO for setting up your
SRP/iscsi configuration? Are you carrying any related changes that are
not upstream? (I can hunt down the email in this thread where you
describe your kernel tree...)

I'll try to reproduce, but this info could be useful to others who are
more scsi-mq inclined and might need to chase this too.
Hey Jens,

I _think_ we've resolved the issues Bart raised for request-based DM's
support for blk-mq devices (anything remaining seems specific to iSER's
blk-mq support, which is in development). Though Keith did have that one
additional patch for the block scatter/gather attribute that we still
need to review closer.

Anyway, I think what we have is a solid start and see no reason to hold
these changes back further. So I've rebased the 'dm-for-3.20' branch of
linux-dm.git on top of 3.19-rc6 and reordered the required block changes
to be at the front of the series, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20

(These changes have been in linux-next for a month, via linux-dm.git
'for-next'.)

With your OK, I'd be happy to carry the required block changes and
ultimately request Linus pull them for 3.20 (I can backfill your Acks if
you approve). BUT I also have no problem with you picking up the block
changes to submit via your block tree (I'd just have to rebase on top of
your 3.20 branch once you pull them in).

Let me know what you think, thanks.

Mike
On 01/27/2015 11:42 AM, Mike Snitzer wrote:
> Hey Jens,
>
> I _think_ we've resolved the issues Bart raised for request-based DM's
> support for blk-mq devices (anything remaining seems specific to iSER's
> blk-mq support which is in development). Though Keith did have that one
> additional patch for that block scatter gather attribute that we still
> need to review closer.
>
> Anyway, I think what we have is a solid start and see no reason to hold
> these changes back further. So I've rebased the 'dm-for-3.20' branch of
> linux-dm.git on top of 3.19-rc6 and reordered the required block changes
> to be at the front of the series, see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
>
> (these changes have been in linux-next for a month, via linux-dm.git
> 'for-next')
>
> With your OK, I'd be happy to carry the required block changes and
> ultimately request Linus pull them for 3.20 (I can backfill your Acks if
> you approve). BUT I also have no problem with you picking up the block
> changes to submit via your block tree (I'd just have to rebase on top of
> your 3.20 branch once you pull them in).

I'd prefer to take these prep patches through the block tree. Only one I
don't really like is this one:

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d

I prefer keeping the alloc path as lean as possible; normal allocs
always initialize ->bio since they need to associate a bio with it.

Do you have the oops trace from this one? Just curious if we can get rid
of it, depending on how deep in the caller this is.
On Wed, Jan 28 2015 at 11:42am -0500, Jens Axboe <axboe@kernel.dk> wrote:

> On 01/27/2015 11:42 AM, Mike Snitzer wrote:
> > Hey Jens,
> >
> > I _think_ we've resolved the issues Bart raised for request-based DM's
> > support for blk-mq devices (anything remaining seems specific to iSER's
> > blk-mq support which is in development). Though Keith did have that one
> > additional patch for that block scatter gather attribute that we still
> > need to review closer.
> >
> > Anyway, I think what we have is a solid start and see no reason to hold
> > these changes back further. So I've rebased the 'dm-for-3.20' branch of
> > linux-dm.git on top of 3.19-rc6 and reordered the required block changes
> > to be at the front of the series, see:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
> >
> > (these changes have been in linux-next for a month, via linux-dm.git
> > 'for-next')
> >
> > With your OK, I'd be happy to carry the required block changes and
> > ultimately request Linus pull them for 3.20 (I can backfill your Acks if
> > you approve). BUT I also have no problem with you picking up the block
> > changes to submit via your block tree (I'd just have to rebase on top of
> > your 3.20 branch once you pull them in).
>
> I'd prefer to take these prep patches through the block tree.

Great, should I send the patches or can you cherry-pick?

> Only one I don't really like is this one:
>
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d
>
> I prefer keeping the alloc path as lean as possible, normal allocs
> always initialize ->bio since they need to associate a bio with it.

Would be very surprised if this initialization were measurable, but...
I could push this initialization into the DM-mpath driver (just after
blk_get_request, like Keith opted for) but that seemed really gross.

> Do you have the oops trace from this one? Just curious if we can get
> rid of it, depending on how deep in the caller this is.

I didn't, but it was easy enough to recreate:

[ 3.112949] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3.113416] IP: [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[ 3.113416] PGD 0
[ 3.113416] Oops: 0002 [#1] SMP
[ 3.113416] Modules linked in: dm_service_time crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd serio_raw pcspkr virtio_balloon 8139too i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath dm_mod ext4 mbcache jbd2 sd_mod ata_generic cirrus pata_acpi syscopyarea sysfillrect sysimgblt drm_kms_helper ttm drm virtio_scsi virtio_blk 8139cp virtio_pci mii i2c_core virtio_ring ata_piix virtio libata floppy
[ 3.113416] CPU: 0 PID: 483 Comm: kdmwork-252:3 Tainted: G W 3.18.0+ #29
[ 3.113416] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3.113416] task: ffff880035c1ad20 ti: ffff8800d6900000 task.ti: ffff8800d6900000
[ 3.113416] RIP: 0010:[<ffffffff812f6734>] [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[ 3.113416] RSP: 0000:ffff8800d6903d48 EFLAGS: 00010286
[ 3.113416] RAX: 0000000000000000 RBX: ffffffffa0208500 RCX: 0000000000000001
[ 3.113416] RDX: ffff8800d7a3b0a0 RSI: ffff880035d0ab00 RDI: ffff880119f8f510
[ 3.113416] RBP: ffff8800d6903d98 R08: 00000000000185a0 R09: 00000000000000d0
[ 3.113416] R10: ffff8800d7547680 R11: ffff880035c1b8c8 R12: ffff8800d83d7900
[ 3.113416] R13: ffff880035d0ab00 R14: ffff880119f8f510 R15: ffff8800d7547680
[ 3.113416] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 3.113416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.113416] CR2: 0000000000000000 CR3: 00000000daeec000 CR4: 00000000000407f0
[ 3.113416] Stack:
[ 3.113416]  ffff8800d6903db8 ffff8800d71502e0 ffff8800d7a3b0a0 000000d0d71502e0
[ 3.113416]  ffff8800d6e89800 ffff8800d71502e0 ffff8800d6e89800 0000000000000001
[ 3.113416]  ffff8800d7a3b0a0 ffffc90000998040 ffff8800d6903df8 ffffffffa0209c69
[ 3.113416] Call Trace:
[ 3.113416]  [<ffffffffa0209c69>] map_tio_request+0x219/0x2b0 [dm_mod]
[ 3.113416]  [<ffffffff8109a4ee>] kthread_worker_fn+0x7e/0x1b0
[ 3.113416]  [<ffffffff8109a470>] ? __init_kthread_worker+0x60/0x60
[ 3.113416]  [<ffffffff8109a3f7>] kthread+0x107/0x120
[ 3.113416]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[ 3.113416]  [<ffffffff816952bc>] ret_from_fork+0x7c/0xb0
[ 3.113416]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[ 3.113416] Code: 89 c3 48 83 ec 28 4c 8b 6e 68 48 85 d2 4c 0f 44 25 22 b7 92 01 48 89 75 b8 89 4d cc 4c 89 4d c0 4d 85 ed 75 16 eb 60 49 8b 47 70 <4c> 89 30 4d 89 77 70 4d 8b 6d 00 4d 85 ed 74 4c 8b 75 cc 4c 89
[ 3.113416] RIP [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[ 3.113416] RSP <ffff8800d6903d48>
[ 3.113416] CR2: 0000000000000000
[ 3.113416] ---[ end trace 9b3bb6dd6cc4435d ]---

crash> dis -l blk_rq_prep_clone+0x44
/home/snitm/git/linux/block/blk-core.c: 2945
0xffffffff812f6734 <blk_rq_prep_clone+0x44>:    mov %r14,(%rax)

crash> l /home/snitm/git/linux/block/blk-core.c: 2945
2940
2941            if (bio_ctr && bio_ctr(bio, bio_src, data))
2942                    goto free_and_out;
2943
2944            if (rq->bio) {
2945                    rq->biotail->bi_next = bio;
2946                    rq->biotail = bio;
2947            } else
2948                    rq->bio = rq->biotail = bio;
2949    }

Given that the NULL pointer dereference occurs when dereferencing
rq->biotail, a revised check of "if (rq->bio && rq->biotail)" should
suffice, but I unfortunately then get:

[ 2.801634] general protection fault: 0000 [#1] SMP
[ 2.802504] Modules linked in: dm_service_time crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd pcspkr serio_raw 8139too virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath dm_mod ext4 mbcache jbd2 ata_generic sd_mod pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm virtio_scsi virtio_blk drm virtio_pci virtio_ring ata_piix 8139cp libata mii i2c_core virtio floppy
[ 2.802504] CPU: 0 PID: 474 Comm: kdmwork-252:1 Tainted: G W 3.18.0+ #30
[ 2.802504] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2.802504] task: ffff8801194b1690 ti: ffff880119abc000 task.ti: ffff880119abc000
[ 2.802504] RIP: 0010:[<ffffffff812f6739>] [<ffffffff812f6739>] blk_rq_prep_clone+0x49/0x160
[ 2.802504] RSP: 0018:ffff880119abfd48 EFLAGS: 00010206
[ 2.802504] RAX: 6de900000000e800 RBX: ffffffffa0218500 RCX: 0000000000000001
[ 2.802504] RDX: ffff8800daca30a0 RSI: ffff880119dcaf00 RDI: ffff880119dca310
[ 2.802504] RBP: ffff880119abfd98 R08: 00000000000185a0 R09: 00000000000000d0
[ 2.802504] R10: ffff880035937680 R11: ffff8801194b2238 R12: ffff880035876900
[ 2.802504] R13: ffff880119dcaf00 R14: ffff880119dca310 R15: ffff880035937680
[ 2.802504] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 2.802504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.802504] CR2: 00007f45598c2350 CR3: 000000003614a000 CR4: 00000000000407f0
[ 2.802504] Stack:
[ 2.802504]  ffff880119abfdb8 ffff8800dae602e0 ffff8800daca30a0 000000d0dae602e0
[ 2.802504]  ffff8800dad6a000 ffff8800dae602e0 ffff8800dad6a000 0000000000000001
[ 2.802504]  ffff8800daca30a0 ffffc9000097d040 ffff880119abfdf8 ffffffffa0219c69
[ 2.802504] Call Trace:
[ 2.802504]  [<ffffffffa0219c69>] map_tio_request+0x219/0x2b0 [dm_mod]
[ 2.802504]  [<ffffffff8109a4ee>] kthread_worker_fn+0x7e/0x1b0
[ 2.802504]  [<ffffffff8109a470>] ? __init_kthread_worker+0x60/0x60
[ 2.802504]  [<ffffffff8109a3f7>] kthread+0x107/0x120
[ 2.802504]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[ 2.802504]  [<ffffffff816952bc>] ret_from_fork+0x7c/0xb0
[ 2.802504]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[ 2.802504] Code: 28 4c 8b 6e 68 48 85 d2 4c 0f 44 25 22 b7 92 01 48 89 75 b8 89 4d cc 4c 89 4d c0 4d 85 ed 75 1b eb 64 49 8b 47 70 48 85 c0 74 4a <4c> 89 30 4d 89 77 70 4d 8b 6d 00 4d 85 ed 74 4b 8b 75 cc 4c 89
[ 2.802504] RIP [<ffffffff812f6739>] blk_rq_prep_clone+0x49/0x160
[ 2.802504] RSP <ffff880119abfd48>
[ 2.802386] general protection fault: 0000 [#2]
[ 2.893050] ---[ end trace 20d230269dc05eca ]---

Not sure what to make of this (other than rq->biotail is pointing at
crap too, which is actually likely if rq->bio is):

crash> dis -l blk_rq_prep_clone+0x49
/home/snitm/git/linux/block/blk-core.c: 2945
0xffffffff812f6739 <blk_rq_prep_clone+0x49>:    mov %r14,(%rax)

crash> l /home/snitm/git/linux/block/blk-core.c: 2945
2940
2941            if (bio_ctr && bio_ctr(bio, bio_src, data))
2942                    goto free_and_out;
2943
2944            if (rq->bio && rq->biotail) {
2945                    rq->biotail->bi_next = bio;
2946                    rq->biotail = bio;
2947            } else
2948                    rq->bio = rq->biotail = bio;
2949    }
On 01/28/2015 10:44 AM, Mike Snitzer wrote:
> On Wed, Jan 28 2015 at 11:42am -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On 01/27/2015 11:42 AM, Mike Snitzer wrote:
>>> Hey Jens,
>>>
>>> I _think_ we've resolved the issues Bart raised for request-based DM's
>>> support for blk-mq devices (anything remaining seems specific to iSER's
>>> blk-mq support which is in development). Though Keith did have that one
>>> additional patch for that block scatter gather attribute that we still
>>> need to review closer.
>>>
>>> Anyway, I think what we have is a solid start and see no reason to hold
>>> these changes back further. So I've rebased the 'dm-for-3.20' branch of
>>> linux-dm.git on top of 3.19-rc6 and reordered the required block changes
>>> to be at the front of the series, see:
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
>>>
>>> (these changes have been in linux-next for a month, via linux-dm.git
>>> 'for-next')
>>>
>>> With your OK, I'd be happy to carry the required block changes and
>>> ultimately request Linus pull them for 3.20 (I can backfill your Acks if
>>> you approve). BUT I also have no problem with you picking up the block
>>> changes to submit via your block tree (I'd just have to rebase on top of
>>> your 3.20 branch once you pull them in).
>>
>> I'd prefer to take these prep patches through the block tree.
>
> Great, should I send the patches or can you cherry-pick?

I already cherry-picked them; they are in the for-3.20/core branch.

>> Only one I don't really like is this one:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d
>>
>> I prefer keeping the alloc path as lean as possible, normal allocs
>> always initialize ->bio since they need to associate a bio with it.
>
> Would be very surprised if this initialization were measurable but..

That's what people always say, and then keep piling more crap in...

> I could push this initialization into the DM-mpath driver (just after
> blk_get_request, like Keith opted for) but that seemed really gross.

It's already doing blk_rq_init() now, so not a huge change and not that
nasty.
On Wed, 28 Jan 2015, Jens Axboe wrote:
> On 01/28/2015 10:44 AM, Mike Snitzer wrote:
>> On Wed, Jan 28 2015 at 11:42am -0500,
>> Jens Axboe <axboe@kernel.dk> wrote:
>>> I'd prefer to take these prep patches through the block tree.
>>
>> Great, should I send the patches or can you cherry-pick?
>
> I already cherry picked them, they are in the for-3.20/core branch.

I might be getting ahead of myself for trying this right now (sorry if
that's the case) but 'for-3.20/core' is missing necessary parts of the
series and hits a BUG_ON at blk-core.c:2333. The original request is
initialized when setting up the clone, and in this branch that happens
in prep_fn before the request was dequeued, so it's not in the queuelist
when it is started.

One of my commits relocated the initialization, but I didn't realize it
had a hard dependency on the follow-on commit. Should we reorder that
part of the series?
On Thu, Jan 29 2015 at  5:43pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Wed, 28 Jan 2015, Jens Axboe wrote:
> >On 01/28/2015 10:44 AM, Mike Snitzer wrote:
> >>On Wed, Jan 28 2015 at 11:42am -0500,
> >>Jens Axboe <axboe@kernel.dk> wrote:
> >>>I'd prefer to take these prep patches through the block tree.
> >>
> >>Great, should I send the patches or can you cherry-pick?
> >
> >I already cherry picked them, they are in the for-3.20/core branch.
>
> I might be getting ahead of myself for trying this right now (sorry if
> that's the case) but 'for-3.20/core' is missing necessary parts of the
> series and hits a BUG_ON at blk-core.c:2333. The original request is
> initialized when setting up the clone, and in this branch, that happens
> in prep_fn before the request was dequeued so it's not in the queuelist
> when it started.
>
> One of my commits relocated the initialization, but I didn't realize it
> had a hard dependency on the follow-on commit. Should we reorder that
> part of the series?

Which follow on commit are you referring to?  Please be specific about
which commits you think are out of order.

Also, what are you testing... are you using the linux-dm.git tree's
'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?

If not please test that and see if you still have problems (without the
associated DM changes all the block preparation changes that Jens
cherry-picked shouldn't cause any problems at all).
On Thu, 29 Jan 2015, Mike Snitzer wrote:
> On Thu, Jan 29 2015 at  5:43pm -0500,
>> One of my commits relocated the initialization, but I didn't realize it
>> had a hard dependency on the follow-on commit. Should we reorder that
>> part of the series?
>
> Which follow on commit are you referring to?  Please be specific about
> which commits you think are out of order.
>
> Also, what are you testing... are you using the linux-dm.git tree's
> 'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?
>
> If not please test that and see if you still have problems (without the
> associated DM changes all the block preparation changes that Jens
> cherry-picked shouldn't cause any problems at all).

I'm using Jens' linux-block for-3.20/core branch. The last dm commit is
this:

  commit febf71588c2a750e04dc2a8b0824ce120c48bd9e
  Author: Keith Busch <keith.busch@intel.com>
  Date:   Fri Oct 17 17:46:35 2014 -0600

      block: require blk_rq_prep_clone() be given an initialized clone request

Looking at this again, the above was incorrect in the first place: it
initialized the original, but the intent was to initialize the clone.
This slipped by me since the next part of the series fixed it. In your
linux-dm dm-for-3.20, it's this commit:

  commit 102e38b1030e883efc022dfdc7b7e7a3de70d1c5
  Author: Mike Snitzer <snitzer@redhat.com>
  Date:   Fri Dec 5 17:11:05 2014 -0500

      dm: split request structure out from dm_rq_target_io structure

So I was confused earlier, there's no need to reorder anything. I just
need to fix the broken part.
On Thu, Jan 29 2015 at  6:44pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Thu, 29 Jan 2015, Mike Snitzer wrote:
> >On Thu, Jan 29 2015 at 5:43pm -0500,
> >>One of my commits relocated the initialization, but I didn't realize it
> >>had a hard dependency on the follow-on commit. Should we reorder that
> >>part of the series?
> >
> >Which follow on commit are you referring to?  Please be specific about
> >which commits you think are out of order.
> >
> >Also, what are you testing... are you using the linux-dm.git tree's
> >'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?
> >
> >If not please test that and see if you still have problems (without the
> >associated DM changes all the block preparation changes that Jens
> >cherry-picked shouldn't cause any problems at all).
>
> I'm using Jens' linux-block for-3.20/core branch. The last dm commit
> is this:
>
>   commit febf71588c2a750e04dc2a8b0824ce120c48bd9e
>   Author: Keith Busch <keith.busch@intel.com>
>   Date:   Fri Oct 17 17:46:35 2014 -0600
>
>       block: require blk_rq_prep_clone() be given an initialized clone request
>
> Looking at this again, the above was incorrect in the first place: it
> initialized the original, but the intent was to initialize the clone. This
> slipped by me since the next part of the series fixed it. In your linux-dm
> dm-for-3.20, it's this commit:
>
>   commit 102e38b1030e883efc022dfdc7b7e7a3de70d1c5
>   Author: Mike Snitzer <snitzer@redhat.com>
>   Date:   Fri Dec 5 17:11:05 2014 -0500
>
>       dm: split request structure out from dm_rq_target_io structure
>
> So I was confused earlier, there's no need to reorder anything. I just
> need to fix the broken part.

Oof, yeah that first commit should be using: blk_rq_init(NULL, clone);

Jens, any chance you could rebase commit febf71588c2a with s/rq/clone/
in dm.c:clone_rq's call to blk_rq_init?
diff --git a/block/blk-core.c b/block/blk-core.c
index 7e78931..6201090 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
 static void __blk_rq_prep_clone(struct request *dst, struct request *src)
 {
 	dst->cpu = src->cpu;
-	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
+	if (dst->q->mq_ops)
+		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
+	else
+		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
 	dst->cmd_type = src->cmd_type;
 	dst->__sector = blk_rq_pos(src);
 	dst->__data_len = blk_rq_bytes(src);