
[7/7] dm-mpath: Fix a race condition in __multipath_map()

Message ID 81bc399e-df90-099d-1b25-bdb0fda3f27c@sandisk.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer

Commit Message

Bart Van Assche Nov. 15, 2016, 11:35 p.m. UTC
If a single-queue dm device is stacked on top of multi-queue block
devices and map_tio_request() is called while there are no paths, the
request will be prepared for a single-queue path. If a path is added
after a request has been prepared and before __multipath_map() is
called, return DM_MAPIO_REQUEUE so that the request gets unprepared
and reprepared as a blk-mq request.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/md/dm-mpath.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

Mike Snitzer Nov. 16, 2016, 12:37 a.m. UTC | #1
On Tue, Nov 15 2016 at  6:35pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> If a single-queue dm device is stacked on top of multi-queue block
> devices and map_tio_request() is called while there are no paths then
> the request will be prepared for a single-queue path. If a path is
> added after a request was prepared and before __multipath_map() is
> called return DM_MAPIO_REQUEUE such that it gets unprepared and
> reprepared as a blk-mq request.

This patch makes little sense to me.  There isn't a scenario that I'm
aware of that would allow the request_queue to transition between old
.request_fn and new blk-mq.

The dm-table code should prevent this.

Mike

> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 7559537..6b20349 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -541,6 +541,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  	size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
>  	struct pgpath *pgpath;
>  	struct block_device *bdev;
> +	struct request_queue *q;
>  	struct dm_mpath_io *mpio;
>  
>  	/* Do we need to select a new pgpath? */
> @@ -558,6 +559,18 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  		return r;
>  	}
>  
> +	bdev = pgpath->path.dev->bdev;
> +	q = bdev_get_queue(bdev);
> +
> +	/*
> +	 * When a request is prepared if there are no paths it may happen that
> +	 * the request was prepared for a single-queue path and that a
> +	 * multiqueue path is added before __multipath_map() is called. If
> +	 * that happens requeue to trigger unprepare and reprepare.
> +	 */
> +	if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
> +		return r;
> +
>  	mpio = set_mpio(m, map_context);
>  	if (!mpio)
>  		/* ENOMEM, requeue */
> @@ -566,22 +579,20 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
>  	mpio->pgpath = pgpath;
>  	mpio->nr_bytes = nr_bytes;
>  
> -	bdev = pgpath->path.dev->bdev;
> -
>  	if (clone) {
>  		/*
>  		 * Old request-based interface: allocated clone is passed in.
>  		 * Used by: .request_fn stacked on .request_fn path(s).
>  		 */
> -		clone->q = bdev_get_queue(bdev);
> +		clone->q = q;
>  	} else {
>  		/*
>  		 * blk-mq request-based interface; used by both:
>  		 * .request_fn stacked on blk-mq path(s) and
>  		 * blk-mq stacked on blk-mq path(s).
>  		 */
> -		clone = blk_mq_alloc_request(bdev_get_queue(bdev),
> -					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
> +		clone = blk_mq_alloc_request(q, rq_data_dir(rq),
> +					     BLK_MQ_REQ_NOWAIT);
>  		if (IS_ERR(clone)) {
>  			/* EBUSY, ENODEV or EWOULDBLOCK; requeue */
>  			clear_request_fn_mpio(m, map_context);
> -- 
> 2.10.1
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

Bart Van Assche Nov. 16, 2016, 12:40 a.m. UTC | #2
On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:35pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> If a single-queue dm device is stacked on top of multi-queue block
>> devices and map_tio_request() is called while there are no paths then
>> the request will be prepared for a single-queue path. If a path is
>> added after a request was prepared and before __multipath_map() is
>> called return DM_MAPIO_REQUEUE such that it gets unprepared and
>> reprepared as a blk-mq request.
>
> This patch makes little sense to me.  There isn't a scenario that I'm
> aware of that would allow the request_queue to transition between old
> .request_fn and new blk-mq.
>
> The dm-table code should prevent this.

Hello Mike,

Are you aware that dm_table_determine_type() sets "all_blk_mq" to false 
if there are no paths, even if the dm device is in blk-mq mode?

Bart.

Mike Snitzer Nov. 16, 2016, 1:01 a.m. UTC | #3
On Tue, Nov 15 2016 at  7:40pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  6:35pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>If a single-queue dm device is stacked on top of multi-queue block
> >>devices and map_tio_request() is called while there are no paths then
> >>the request will be prepared for a single-queue path. If a path is
> >>added after a request was prepared and before __multipath_map() is
> >>called return DM_MAPIO_REQUEUE such that it gets unprepared and
> >>reprepared as a blk-mq request.
> >
> >This patch makes little sense to me.  There isn't a scenario that I'm
> >aware of that would allow the request_queue to transition between old
> >.request_fn and new blk-mq.
> >
> >The dm-table code should prevent this.
> 
> Hello Mike,
> 
> Are you aware that dm_table_determine_type() sets "all_blk_mq" to
> false if there are no paths, even if the dm device is in blk-mq
> mode?

That shouldn't matter.  Once the type is established, it is used to
initialize the DM device's request_queue, and the type cannot change
across different table loads.

Bart Van Assche Nov. 16, 2016, 1:08 a.m. UTC | #4
On 11/15/2016 05:01 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  7:40pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> Are you aware that dm_table_determine_type() sets "all_blk_mq" to
>> false if there are no paths, even if the dm device is in blk-mq
>> mode?
>
> That shouldn't matter.  Once the type is established, it is used to
> initialize the DM device's request_queue, the type cannot change across
> different table loads.

For a single-queue dm device, what prevents a user from removing all
single-queue paths and adding one or more blk-mq paths? I think this
will cause dm_table_determine_type() to change the table type.

Bart.

Mike Snitzer Nov. 16, 2016, 1:50 a.m. UTC | #5
On Tue, Nov 15 2016 at  8:08pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 05:01 PM, Mike Snitzer wrote:
> >On Tue, Nov 15 2016 at  7:40pm -0500,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>Are you aware that dm_table_determine_type() sets "all_blk_mq" to
> >>false if there are no paths, even if the dm device is in blk-mq
> >>mode?
> >
> >That shouldn't matter.  Once the type is established, it is used to
> >initialize the DM device's request_queue, the type cannot change across
> >different table loads.
> 
> For a single queue dm device, what prevents a user from removing all
> single queue paths and to add one or more blk-mq paths? I think this
> will cause dm_table_determine_type() to change the table type.

A new table is created for every table load (any time a multipath device
is loaded into the kernel).  The DM core disallows loading a table with
a different type; such a load is rejected with an error.

See dm-ioctl.c:table_load()

Bart Van Assche Nov. 21, 2016, 9:44 p.m. UTC | #6
On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> On Tue, Nov 15 2016 at  6:35pm -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
>> If a single-queue dm device is stacked on top of multi-queue block
>> devices and map_tio_request() is called while there are no paths then
>> the request will be prepared for a single-queue path. If a path is
>> added after a request was prepared and before __multipath_map() is
>> called return DM_MAPIO_REQUEUE such that it gets unprepared and
>> reprepared as a blk-mq request.
> 
> This patch makes little sense to me.  There isn't a scenario that I'm
> aware of that would allow the request_queue to transition between old
> .request_fn and new blk-mq.
> 
> The dm-table code should prevent this.

Hello Mike,

After having added the following code in __multipath_map() just before
the set_mpio() call:

	bdev = pgpath->path.dev->bdev;
	q = bdev_get_queue(bdev);

	if (WARN_ON_ONCE(clone && q->mq_ops) ||
	    WARN_ON_ONCE(!clone && !q->mq_ops)) {
		pr_debug("q->queue_flags = %#lx\n", q->queue_flags);
		return r;
	}

I see the following warning appear (line 544 contains
WARN_ON_ONCE(clone && q->mq_ops)):

------------[ cut here ]------------
WARNING: CPU: 2 PID: 25384 at drivers/md/dm-mpath.c:544 __multipath_map.isra.17+0x325/0x360 [dm_multipath]
Modules linked in: ib_srp scsi_transport_srp ib_srpt(O) scst_vdisk(O) scst(O) dlm libcrc32c brd dm_service_time netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm msr ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel hid_generic ipmi_ssif usbhid ipmi_devintf kvm irqbypass mlx4_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel tg3 aes_x86_64 iTCO_wdt lrw gf128mul ptp dcdbas iTCO_vendor_support glue_helper pps_core ablk_helper libphy cryptd ipmi_si pcspkr mei_me fjes ipmi_msghandler mei shpchp tpm_tis tpm_tis_core lpc_ich tpm mfd_core wmi button mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last unloaded: brd]
CPU: 2 PID: 25384 Comm: kdmwork-254:0 Tainted: G           O    4.9.0-rc6-dbg+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffffc90002cd7d00 ffffffff81329bb5 0000000000000000 0000000000000000
 ffffc90002cd7d40 ffffffff810650e6 0000022000001000 ffff8804433a0008
 ffff88039134fc28 ffff88037e804008 ffff88039bacce98 0000000000001000
Call Trace:
 [<ffffffff81329bb5>] dump_stack+0x68/0x93
 [<ffffffff810650e6>] __warn+0xc6/0xe0
 [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
 [<ffffffffa0046125>] __multipath_map.isra.17+0x325/0x360 [dm_multipath]
 [<ffffffffa0046192>] multipath_map+0x12/0x20 [dm_multipath]
 [<ffffffffa002a356>] map_request+0x46/0x300 [dm_mod]
 [<ffffffffa002a621>] map_tio_request+0x11/0x30 [dm_mod]
 [<ffffffff8108a065>] kthread_worker_fn+0x105/0x1e0
 [<ffffffff81089f60>] ? __kthread_init_worker+0x70/0x70
 [<ffffffff81089ecb>] kthread+0xeb/0x110
 [<ffffffff81089de0>] ? kthread_park+0x60/0x60
 [<ffffffff8163fcc7>] ret_from_fork+0x27/0x40
---[ end trace b181de88e3eff2a0 ]---
dm_multipath:__multipath_map: q->queue_flags = 0x1d06a00

As one can see neither QUEUE_FLAG_DYING nor QUEUE_FLAG_DEAD was set:

$ grep -E 'define QUEUE_FLAG[^[:blank:]]*[[:blank:]](9|11|13|14|20|22|23|24)[[:blank:]]' include/linux/blkdev.h 
#define QUEUE_FLAG_SAME_COMP    9       /* complete on same CPU-group */
#define QUEUE_FLAG_STACKABLE   11       /* supports request stacking */
#define QUEUE_FLAG_IO_STAT     13       /* do IO stats */
#define QUEUE_FLAG_DISCARD     14       /* supports DISCARD */
#define QUEUE_FLAG_INIT_DONE   20       /* queue is initialized */
#define QUEUE_FLAG_POLL        22       /* IO polling enabled if set */
#define QUEUE_FLAG_WC          23       /* Write back caching */
#define QUEUE_FLAG_FUA         24       /* device supports FUA writes */

Do you want to comment on this?

Thanks,

Bart.

Mike Snitzer Nov. 21, 2016, 11:43 p.m. UTC | #7
On Mon, Nov 21 2016 at  4:44pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/15/2016 04:37 PM, Mike Snitzer wrote:
> > On Tue, Nov 15 2016 at  6:35pm -0500,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> > 
> >> If a single-queue dm device is stacked on top of multi-queue block
> >> devices and map_tio_request() is called while there are no paths then
> >> the request will be prepared for a single-queue path. If a path is
> >> added after a request was prepared and before __multipath_map() is
> >> called return DM_MAPIO_REQUEUE such that it gets unprepared and
> >> reprepared as a blk-mq request.
> > 
> > This patch makes little sense to me.  There isn't a scenario that I'm
> > aware of that would allow the request_queue to transition between old
> > .request_fn and new blk-mq.
> > 
> > The dm-table code should prevent this.
> 
> Hello Mike,
> 
> After having added the following code in __multipath_map() just before
> the set_mpio() call:
> 
> 	bdev = pgpath->path.dev->bdev;
> 	q = bdev_get_queue(bdev);
> 
> 	if (WARN_ON_ONCE(clone && q->mq_ops) ||
> 	    WARN_ON_ONCE(!clone && !q->mq_ops)) {
> 		pr_debug("q->queue_flags = %#lx\n", q->queue_flags);
> 		return r;
> 	}
> 
> I see the following warning appear (line 544 contains
> WARN_ON_ONCE(clone && q->mq_ops)):
> 
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 25384 at drivers/md/dm-mpath.c:544 __multipath_map.isra.17+0x325/0x360 [dm_multipath]
> Modules linked in: ib_srp scsi_transport_srp ib_srpt(O) scst_vdisk(O) scst(O) dlm libcrc32c brd dm_service_time netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm msr ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel hid_generic ipmi_ssif usbhid ipmi_devintf kvm irqbypass mlx4_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel tg3 aes_x86_64 iTCO_wdt lrw gf128mul ptp dcdbas iTCO_vendor_support glue_helper pps_core ablk_helper libphy cryptd ipmi_si pcspkr mei_me fjes ipmi_msghandler mei shpchp tpm_tis tpm_tis_core lpc_ich tpm mfd_core wmi button mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom crc32c_intel ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua [last unloaded: brd]
> CPU: 2 PID: 25384 Comm: kdmwork-254:0 Tainted: G           O    4.9.0-rc6-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
>  ffffc90002cd7d00 ffffffff81329bb5 0000000000000000 0000000000000000
>  ffffc90002cd7d40 ffffffff810650e6 0000022000001000 ffff8804433a0008
>  ffff88039134fc28 ffff88037e804008 ffff88039bacce98 0000000000001000
> Call Trace:
>  [<ffffffff81329bb5>] dump_stack+0x68/0x93
>  [<ffffffff810650e6>] __warn+0xc6/0xe0
>  [<ffffffff810651b8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0046125>] __multipath_map.isra.17+0x325/0x360 [dm_multipath]
>  [<ffffffffa0046192>] multipath_map+0x12/0x20 [dm_multipath]
>  [<ffffffffa002a356>] map_request+0x46/0x300 [dm_mod]
>  [<ffffffffa002a621>] map_tio_request+0x11/0x30 [dm_mod]
>  [<ffffffff8108a065>] kthread_worker_fn+0x105/0x1e0
>  [<ffffffff81089f60>] ? __kthread_init_worker+0x70/0x70
>  [<ffffffff81089ecb>] kthread+0xeb/0x110
>  [<ffffffff81089de0>] ? kthread_park+0x60/0x60
>  [<ffffffff8163fcc7>] ret_from_fork+0x27/0x40
> ---[ end trace b181de88e3eff2a0 ]---
> dm_multipath:__multipath_map: q->queue_flags = 0x1d06a00
> 
> As one can see neither QUEUE_FLAG_DYING nor QUEUE_FLAG_DEAD was set:
> 
> $ grep -E 'define QUEUE_FLAG[^[:blank:]]*[[:blank:]](9|11|13|14|20|22|23|24)[[:blank:]]' include/linux/blkdev.h 
> #define QUEUE_FLAG_SAME_COMP    9       /* complete on same CPU-group */
> #define QUEUE_FLAG_STACKABLE   11       /* supports request stacking */
> #define QUEUE_FLAG_IO_STAT     13       /* do IO stats */
> #define QUEUE_FLAG_DISCARD     14       /* supports DISCARD */
> #define QUEUE_FLAG_INIT_DONE   20       /* queue is initialized */
> #define QUEUE_FLAG_POLL        22       /* IO polling enabled if set */
> #define QUEUE_FLAG_WC          23       /* Write back caching */
> #define QUEUE_FLAG_FUA         24       /* device supports FUA writes */
> 
> Do you want to comment on this?

Shouldn't be possible.  The previous stacktrace you shared clearly
showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
was in the stack).

Whereas the stacktrace above is clearly the old request_fn interface.

I'm unaware of how the existing code can allow this.  As I said in my
earlier mails on this: the request-queue shouldn't be able to change
from blk-mq back to .request_fn or vice-versa.

So if you think you're only testing blk-mq DM mpath on blk-mq paths,
then you need to determine how dm_old_init_request_queue() is getting
called to even set up .request_fn (dm_old_request_fn) to be used.

If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
then determine how dm_mq_init_request_queue is getting called.

Basically dm_setup_md_queue() should only ever be called the first time
the "multipath" target is loaded.  If that isn't the case, then you've
exposed some seriously weird bug/regression.

Mike

Bart Van Assche Nov. 21, 2016, 11:57 p.m. UTC | #8
On 11/21/2016 03:43 PM, Mike Snitzer wrote:
> Shouldn't be possible.  The previous stacktrace you shared clearly
> showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
> was in the stack).
>
> Whereas the stacktrace above is clearly the old request_fn interface.
>
> I'm unaware of how the existing code can allow this.  As I said in my
> earlier mails on this: the request-queue shouldn't be able to change
> from blk-mq back to .request_fn or vice-versa.
>
> So if you think you're only testing blk-mq DM mpath on blk-mq paths,
> then you need to determine how dm_old_init_request_queue() is getting
> called to even setup .request_fn (dm_old_request_fn) to be used.
>
> If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
> then determine how dm_mq_init_request_queue is getting called.
>
> Basically dm_setup_md_queue() should only ever be called the first time
> the "multipath" target is loaded.  If that isn't the case, then you've
> exposed some seriously weird bug/regression.

Hello Mike,

Sorry that I had not yet mentioned this, but the test I'm running is as 
follows:

# while true; do for t in mq sq sq-on-mq; do echo ==== $t; srp-test/run_tests -s -t 02-$t; done; done

In other words, I'm alternating mq-on-mq, sq-on-sq and sq-on-mq tests. 
The above command does not log the time at which each test started so 
I'm not sure whether it is possible to determine which test triggered 
the call stack I posted in my previous e-mail.

Bart.

Mike Snitzer Nov. 22, 2016, 12:34 a.m. UTC | #9
On Mon, Nov 21 2016 at  6:57pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/21/2016 03:43 PM, Mike Snitzer wrote:
> >Shouldn't be possible.  The previous stacktrace you shared clearly
> >showed that the DM mpath request_queue was using blk-mq (dm_mq_queue_rq
> >was in the stack).
> >
> >Whereas the stacktrace above is clearly the old request_fn interface.
> >
> >I'm unaware of how the existing code can allow this.  As I said in my
> >earlier mails on this: the request-queue shouldn't be able to change
> >from blk-mq back to .request_fn or vice-versa.
> >
> >So if you think you're only testing blk-mq DM mpath on blk-mq paths,
> >then you need to determine how dm_old_init_request_queue() is getting
> >called to even setup .request_fn (dm_old_request_fn) to be used.
> >
> >If the opposite is true (old request_fn DM mpath stack on blk-mq paths)
> >then determine how dm_mq_init_request_queue is getting called.
> >
> >Basically dm_setup_md_queue() should only ever be called the first time
> >the "multipath" target is loaded.  If that isn't the case, then you've
> >exposed some seriously weird bug/regression.
> 
> Hello Mike,
> 
> Sorry that I had not yet mentioned this, but the test I'm running is
> as follows:
> 
> # while true; do for t in mq sq sq-on-mq; do echo ==== $t;
> srp-test/run_tests -s -t 02-$t; done

But your WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
So this would seem to be a false positive.

Bart Van Assche Nov. 22, 2016, 11:47 p.m. UTC | #10
On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> But you WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.

Hello Mike,

This behavior is indeed triggered by the sq-on-mq test. After having 
added the following code in __bind():

	if (old_map &&
	    dm_table_all_blk_mq_devices(old_map) !=
	    dm_table_all_blk_mq_devices(t))
		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
			 dm_device_name(md),
			 dm_table_all_blk_mq_devices(old_map),
			 dm_table_all_blk_mq_devices(t));

I see the following output appear frequently in the kernel log:

dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0

Could these all_blk_mq state changes explain why WARN_ON_ONCE(clone && 
q->mq_ops) is triggered in __multipath_map()? Does this mean that the 
comment in patch http://marc.info/?l=dm-devel&m=147925314306752 is correct?

Thanks,

Bart.

Mike Snitzer Nov. 23, 2016, 12:48 a.m. UTC | #11
On Tue, Nov 22 2016 at  6:47pm -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> >But you WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
> 
> Hello Mike,
> 
> This behavior is indeed triggered by the sq-on-mq test. After having
> added the following code in __bind():
> 
> 	if (old_map &&
> 	    dm_table_all_blk_mq_devices(old_map) !=
> 	    dm_table_all_blk_mq_devices(t))
> 		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
> 			 dm_device_name(md),
> 			 dm_table_all_blk_mq_devices(old_map),
> 			 dm_table_all_blk_mq_devices(t));
> 
> I see the following output appear frequently in the kernel log:
> 
> dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0
> 
> Could these all_blk_mq state changes explain that WARN_ON_ONCE(clone
> && q->mq_ops) is triggered in __multipath_map()? Does this mean that
> the comment in patch http://marc.info/?l=dm-devel&m=147925314306752
> is correct?

Yes, looks like you're on to something.  dm_old_prep_tio() has:

        /*
         * Must clone a request if this .request_fn DM device
         * is stacked on .request_fn device(s).
         */
        if (!dm_table_all_blk_mq_devices(table)) { ...

So if your table->all_blk_mq is false (likely due to a temporary
no-paths scenario in the multipath table), a clone will be passed to
__multipath_map().  But what isn't clear is how __multipath_map() would
then go on to have any underlying paths available to issue IO to (given
the table would have been empty -- or so I would think).

Would be great if you could verify that the table is in fact empty.

It should be noted that dm_table_determine_type() has:

        if (list_empty(devices) && __table_type_request_based(live_md_type)) {
                /* inherit live MD type */
                t->type = live_md_type;
                return 0;
        }

So this explains how/why an empty table will inherit the
DM_TYPE_MQ_REQUEST_BASED, and it also explains why the new (empty) table
would not have ->all_blk_mq set to true.  But again, no IO is able to be
issued when there are no underlying paths.

And looking closer, __multipath_map() _should_ return early with either
DM_MAPIO_DELAY_REQUEUE or DM_MAPIO_REQUEUE when no paths are available.

Not seeing how you could have this scenario where you prepared a clone
(as if the clone request were to be issued to a .request_fn, aka "sq",
device) and then by the time you get into __multipath_map() there is an
underlying path available with q->mq_ops.

But regardless, what certainly needs fixing is the inconsistency of
inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
(all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).

I'm now questioning why we even need the 'all_blk_mq' state within the
table.  I'll revisit the "why?" on all that in my previous commits.
I'll likely get back with you on this point tomorrow.  And will
hopefully have a fix for you.

FYI: given all this I do think something like your 7/7 patch (which you
referenced with the above URL) would be a possible added safety net to
guard against future inconsistencies/bugs/regressions.


Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7559537..6b20349 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -541,6 +541,7 @@  static int __multipath_map(struct dm_target *ti, struct request *clone,
 	size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
 	struct pgpath *pgpath;
 	struct block_device *bdev;
+	struct request_queue *q;
 	struct dm_mpath_io *mpio;
 
 	/* Do we need to select a new pgpath? */
@@ -558,6 +559,18 @@  static int __multipath_map(struct dm_target *ti, struct request *clone,
 		return r;
 	}
 
+	bdev = pgpath->path.dev->bdev;
+	q = bdev_get_queue(bdev);
+
+	/*
+	 * If a request is prepared while there are no paths, it may happen
+	 * that the request was prepared for a single-queue path and that a
+	 * multi-queue path was added before __multipath_map() is called. If
+	 * that happens, requeue to trigger unprepare and reprepare.
+	 */
+	if ((clone && q->mq_ops) || (!clone && !q->mq_ops))
+		return r;
+
 	mpio = set_mpio(m, map_context);
 	if (!mpio)
 		/* ENOMEM, requeue */
@@ -566,22 +579,20 @@  static int __multipath_map(struct dm_target *ti, struct request *clone,
 	mpio->pgpath = pgpath;
 	mpio->nr_bytes = nr_bytes;
 
-	bdev = pgpath->path.dev->bdev;
-
 	if (clone) {
 		/*
 		 * Old request-based interface: allocated clone is passed in.
 		 * Used by: .request_fn stacked on .request_fn path(s).
 		 */
-		clone->q = bdev_get_queue(bdev);
+		clone->q = q;
 	} else {
 		/*
 		 * blk-mq request-based interface; used by both:
 		 * .request_fn stacked on blk-mq path(s) and
 		 * blk-mq stacked on blk-mq path(s).
 		 */
-		clone = blk_mq_alloc_request(bdev_get_queue(bdev),
-					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
+		clone = blk_mq_alloc_request(q, rq_data_dir(rq),
+					     BLK_MQ_REQ_NOWAIT);
 		if (IS_ERR(clone)) {
 			/* EBUSY, ENODEV or EWOULDBLOCK; requeue */
 			clear_request_fn_mpio(m, map_context);