
blk-mq request allocation stalls

Message ID alpine.LNX.2.00.1501121830010.4026@localhost.lm.intel.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer

Commit Message

Keith Busch Jan. 12, 2015, 6:35 p.m. UTC
On Mon, 12 Jan 2015, Keith Busch wrote:
> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> blk_get_request() for the blk-mq based multipath types and overrides the
> destination's cmd_flags with the source's even though the source was not
> allocated from a blk-mq based queue, much less a shared tag.

Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
sets after allocating.


Comments

Mike Snitzer Jan. 12, 2015, 7:11 p.m. UTC | #1
On Mon, Jan 12 2015 at  1:35pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Mon, 12 Jan 2015, Keith Busch wrote:
> >Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> >blk_get_request() for the blk-mq based multipath types and overrides the
> >destination's cmd_flags with the source's even though the source was not
> >allocated from a blk-mq based queue, much less a shared tag.
> 
> Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
> sets after allocating.

Ah, good point.  The failfast flag would get cleared with the patch I
just proposed (unless REQ_FAILFAST_TRANSPORT was added to
REQ_PRESERVE_CLONE_MASK).

Anyway, I'm happy to see this implemented however you guys think is
best.  I think I like Keith's patch better than mine.
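
For context: the patch Mike refers to is not shown in this thread. With
REQ_FAILFAST_TRANSPORT added as he suggests, the mask-based alternative would
look roughly like this sketch; the mask contents here are an assumption, not
his actual patch:

#define REQ_PRESERVE_CLONE_MASK	(REQ_MQ_INFLIGHT | REQ_FAILFAST_TRANSPORT)

static void __blk_rq_prep_clone(struct request *dst, struct request *src)
{
	dst->cpu = src->cpu;
	/* keep flags already set on the clone (blk-mq's MQ_INFLIGHT,
	 * dm-mpath's failfast flag) instead of overwriting them */
	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE |
			 (dst->cmd_flags & REQ_PRESERVE_CLONE_MASK);
}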

Mike Snitzer Jan. 12, 2015, 8:21 p.m. UTC | #2
On Mon, Jan 12 2015 at  2:11pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jan 12 2015 at  1:35pm -0500,
> Keith Busch <keith.busch@intel.com> wrote:
> 
> > On Mon, 12 Jan 2015, Keith Busch wrote:
> > >Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> > >blk_get_request() for the blk-mq based multipath types and overrides the
> > >destination's cmd_flags with the source's even though the source was not
> > >allocated from a blk-mq based queue, much less a shared tag.
> > 
> > Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
> > sets after allocating.
> 
> Ah, good point.  The failfast flag would get cleared with the patch I
> just proposed (unless REQ_FAILFAST_TRANSPORT was added to
> REQ_PRESERVE_CLONE_MASK).
> 
> Anyway, I'm happy to see this implemented however you guys think is
> best.  I think I like Keith's patch better than mine.

FYI, I staged Keith's patch here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e

Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
Please test, it should hopefully take care of the stall you've been
seeing.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Bart Van Assche Jan. 13, 2015, 12:29 p.m. UTC | #3
On 01/12/15 21:22, Mike Snitzer wrote:
> FYI, I staged Keith's patch here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e
> 
> Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
> Please test, it should hopefully take care of the stall you've been
> seeing.

Hello Mike,

In the quick test I ran, the I/O stalls were indeed gone. Thanks :-)

However, I hit another issue while running I/O on top of a multipath
device (on a kernel with lockdep and SLUB memory poisoning enabled):

NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G        W      3.19.0-rc4-debug+ #1
Call Trace:
 [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
 [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
 [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
 [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
 [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
 [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
 [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
 [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
 [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
 [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
 [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
 [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
 [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
 [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
 [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
 [<ffffffff81075c0f>] kthread+0xef/0x110
 [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0

Bart.

Mike Snitzer Jan. 13, 2015, 2:17 p.m. UTC | #4
On Tue, Jan 13 2015 at  7:29am -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 01/12/15 21:22, Mike Snitzer wrote:
> > FYI, I staged Keith's patch here:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e
> > 
> > Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
> > Please test, it should hopefully take care of the stall you've been
> > seeing.
> 
> Hello Mike,
> 
> In the quick test I ran, the I/O stalls were indeed gone. Thanks :-)

Good news, followed by a new mole rearing its head ;)
 
> However, I hit another issue while running I/O on top of a multipath
> device (on a kernel with lockdep and SLUB memory poisoning enabled):
>
> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G        W      3.19.0-rc4-debug+ #1
> Call Trace:
>  [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
>  [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
>  [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
>  [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
>  [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
>  [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
>  [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
>  [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
>  [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
>  [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
>  [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
>  [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
>  [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
>  [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
>  [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
>  [<ffffffff81075c0f>] kthread+0xef/0x110
>  [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0

Unfortunate.  Is this still with a 16MB backing device or is it real
hardware?  Can you share the workload so that Keith and/or I can try to
reproduce it?

Bart Van Assche Jan. 13, 2015, 2:28 p.m. UTC | #5
On 01/13/15 15:18, Mike Snitzer wrote:
> On Tue, Jan 13 2015 at  7:29am -0500,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> However, I hit another issue while running I/O on top of a multipath
>> device (on a kernel with lockdep and SLUB memory poisoning enabled):
>>
>> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
>> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G        W      3.19.0-rc4-debug+ #1
>> Call Trace:
>>  [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
>>  [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
>>  [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
>>  [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
>>  [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
>>  [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
>>  [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
>>  [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
>>  [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
>>  [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
>>  [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
>>  [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
>>  [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
>>  [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
>>  [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
>>  [<ffffffff81075c0f>] kthread+0xef/0x110
>>  [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0
> 
> Unfortunate.  Is this still with a 16MB backing device or is it real
> hardware?  Can you share the workload so that Keith and/or I can try to
> reproduce it?
 
Hello Mike,

This is still with a 16MB RAM disk as the backing device. The fio job I
used to trigger this was as follows:

dev=/dev/sdc
fio --bs=4K --ioengine=libaio --rw=randread --buffered=0 --numjobs=12   \
    --iodepth=128 --iodepth_batch=64 --iodepth_batch_complete=64        \
    --thread --norandommap --loops=$((2**31)) --runtime=60              \
    --group_reporting --gtod_reduce=1 --name=$dev --filename=$dev       \
    --invalidate=1

Bart.
 

Jens Axboe Jan. 13, 2015, 2:59 p.m. UTC | #6
On 01/12/2015 11:35 AM, Keith Busch wrote:
> On Mon, 12 Jan 2015, Keith Busch wrote:
>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>> blk_get_request() for the blk-mq based multipath types and overrides the
>> destination's cmd_flags with the source's even though the source was not
>> allocated from a blk-mq based queue, much less a shared tag.
>
> Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
> sets after allocating.
>
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 7e78931..6201090 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>   static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>   {
>       dst->cpu = src->cpu;
> -    dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> +    if (dst->q->mq_ops)
> +        dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> +    else
> +        dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;

Making the two cases different is a bit... nonsensical. We should do 
this for both cases, if safe, or move the MQ_INFLIGHT flag and expand 
the CLONE_MASK.
Keith Busch Jan. 13, 2015, 3:11 p.m. UTC | #7
On Tue, 13 Jan 2015, Jens Axboe wrote:
> On 01/12/2015 11:35 AM, Keith Busch wrote:
>> On Mon, 12 Jan 2015, Keith Busch wrote:
>>> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
>>> blk_get_request() for the blk-mq based multipath types and overrides the
>>> destination's cmd_flags with the source's even though the source was not
>>> allocated from a blk-mq based queue, much less a shared tag.
>> 
>> Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
>> sets after allocating.
>> 
>> ---
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 7e78931..6201090 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
>>   static void __blk_rq_prep_clone(struct request *dst, struct request *src)
>>   {
>>       dst->cpu = src->cpu;
>> -    dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +    if (dst->q->mq_ops)
>> +        dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>> +    else
>> +        dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
>
> Making the two cases different is a bit... nonsensical. We should do this for 
> both cases, if safe, or move the MQ_INFLIGHT flag and expand the CLONE_MASK.

Expanding the clone mask won't do any good since the src doesn't come
from blk-mq and would never have MQ_INFLIGHT set. Blk-mq initializes
the cmd_flags when you get a request, so I assumed OR'ing was safe. For
the non-blk-mq case, we have no guarantee of how the req was initialized,
and it could have nonsense cmd_flags.

Mike Snitzer Jan. 13, 2015, 3:14 p.m. UTC | #8
On Tue, Jan 13 2015 at  9:59am -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 01/12/2015 11:35 AM, Keith Busch wrote:
> >On Mon, 12 Jan 2015, Keith Busch wrote:
> >>Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> >>blk_get_request() for the blk-mq based multipath types and overrides the
> >>destination's cmd_flags with the source's even though the source was not
> >>allocated from a blk-mq based queue, much less a shared tag.
> >
> >Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
> >sets after allocating.
> >
> >---
> >diff --git a/block/blk-core.c b/block/blk-core.c
> >index 7e78931..6201090 100644
> >--- a/block/blk-core.c
> >+++ b/block/blk-core.c
> >@@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
> >  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
> >  {
> >      dst->cpu = src->cpu;
> >-    dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> >+    if (dst->q->mq_ops)
> >+        dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> >+    else
> >+        dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> 
> Making the two cases different is a bit... nonsensical. We should do
> this for both cases, if safe, or move the MQ_INFLIGHT flag and
> expand the CLONE_MASK.

OK, I'll work through it.
k

Mike Snitzer Jan. 13, 2015, 3:41 p.m. UTC | #9
On Tue, Jan 13 2015 at 10:11am -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Tue, 13 Jan 2015, Jens Axboe wrote:
> >On 01/12/2015 11:35 AM, Keith Busch wrote:
> >>On Mon, 12 Jan 2015, Keith Busch wrote:
> >>>Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> >>>blk_get_request() for the blk-mq based multipath types and overrides the
> >>>destination's cmd_flags with the source's even though the source was not
> >>>allocated from a blk-mq based queue, much less a shared tag.
> >>
> >>Untested patch. This will also preserve the failfast cmd_flag that dm-mpath
> >>sets after allocating.
> >>
> >>---
> >>diff --git a/block/blk-core.c b/block/blk-core.c
> >>index 7e78931..6201090 100644
> >>--- a/block/blk-core.c
> >>+++ b/block/blk-core.c
> >>@@ -2895,7 +2895,10 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
> >>  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
> >>  {
> >>      dst->cpu = src->cpu;
> >>-    dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> >>+    if (dst->q->mq_ops)
> >>+        dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> >>+    else
> >>+        dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
> >
> >Making the two cases different is a bit... nonsensical. We should
> >do this for both cases, if safe, or move the MQ_INFLIGHT flag and
> >expand the CLONE_MASK.
> 
> Expanding the clone mask won't do any good since the src doesn't come
> from blk-mq and would never have MQ_INFLIGHT set. Blk-mq initializes
> the cmd_flags when you get a request, so I assumed OR'ing was safe. For
> the non-blk-mq case, we have no guarantee of how the req was initialized,
> and it could have nonsense cmd_flags.

It _could_, but the only consumer of blk_rq_prep_clone() is request-based
DM, and the non-blk-mq case uses blk_rq_init() prior to calling
blk_rq_prep_clone(), so the cmd_flags will be 0.

I've revised the patch to just do it for both cases since it will be
safe, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=6a55c9861326dc2a731c7978d93567dd4e62d2f7
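
The staged revision presumably amounts to OR'ing in both cases, along the
lines of this sketch (safe because blk-mq initializes cmd_flags at allocation
and the non-blk-mq clone is zeroed by blk_rq_init() first):

static void __blk_rq_prep_clone(struct request *dst, struct request *src)
{
	dst->cpu = src->cpu;
	/* OR rather than assign: dst came either from blk-mq (cmd_flags
	 * initialized, MQ_INFLIGHT possibly set) or from blk_rq_init()
	 * (cmd_flags zeroed), so flags dm-mpath already set, e.g. the
	 * failfast flag, survive */
	dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
	dst->cmd_type = src->cmd_type;
	dst->__sector = blk_rq_pos(src);
	dst->__data_len = blk_rq_bytes(src);
}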

Mike Snitzer Jan. 13, 2015, 4:20 p.m. UTC | #10
On Tue, Jan 13 2015 at  9:28am -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 01/13/15 15:18, Mike Snitzer wrote:
> > On Tue, Jan 13 2015 at  7:29am -0500,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> However, I hit another issue while running I/O on top of a multipath
> >> device (on a kernel with lockdep and SLUB memory poisoning enabled):
> >>
> >> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
> >> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G        W      3.19.0-rc4-debug+ #1
> >> Call Trace:
> >>  [<ffffffff8118e4be>] kmem_cache_alloc+0x28e/0x2c0
> >>  [<ffffffff81346aca>] alloc_iova_mem+0x1a/0x20
> >>  [<ffffffff81342c8e>] alloc_iova+0x2e/0x250
> >>  [<ffffffff81344b65>] intel_alloc_iova+0x95/0xd0
> >>  [<ffffffff81348a15>] intel_map_sg+0xc5/0x260
> >>  [<ffffffffa07e0661>] srp_queuecommand+0xa11/0xc30 [ib_srp]
> >>  [<ffffffffa001698e>] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
> >>  [<ffffffffa0017480>] scsi_queue_rq+0x630/0x700 [scsi_mod]
> >>  [<ffffffff8125683d>] __blk_mq_run_hw_queue+0x1dd/0x370
> >>  [<ffffffff81256aae>] blk_mq_alloc_request+0xde/0x150
> >>  [<ffffffff8124bade>] blk_get_request+0x2e/0xe0
> >>  [<ffffffffa07ebd0f>] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
> >>  [<ffffffffa07ebd6a>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
> >>  [<ffffffffa044abb5>] map_tio_request+0x1d5/0x3a0 [dm_mod]
> >>  [<ffffffff81075d16>] kthread_worker_fn+0x86/0x1b0
> >>  [<ffffffff81075c0f>] kthread+0xef/0x110
> >>  [<ffffffff814db42c>] ret_from_fork+0x7c/0xb0
> > 
> > Unfortunate.  Is this still with a 16MB backing device or is it real
> > hardware?  Can you share the workload so that Keith and/or I can try to
> > reproduce it?
>  
> Hello Mike,
> 
> This is still with a 16MB RAM disk as the backing device. The fio job I
> used to trigger this was as follows:
> 
> dev=/dev/sdc
> fio --bs=4K --ioengine=libaio --rw=randread --buffered=0 --numjobs=12   \
>     --iodepth=128 --iodepth_batch=64 --iodepth_batch_complete=64        \
>     --thread --norandommap --loops=$((2**31)) --runtime=60              \
>     --group_reporting --gtod_reduce=1 --name=$dev --filename=$dev       \
>     --invalidate=1

OK, I assume you specified the mpath device for the test that failed.

This test works fine on my 100MB scsi_debug device with 4 paths exported
over virtio-blk to a guest that assembles the mpath device.

Could be a hang that is unique to scsi-mq.

Any chance you'd be willing to provide a HOWTO for setting up your
SRP/iSCSI configuration?

Are you carrying any related changes that are not upstream?  (I can hunt
down the email in this thread where you describe your kernel tree...)

I'll try to reproduce this, but the info could be useful to others who are
more scsi-mq inclined and might need to chase it too.

Mike Snitzer Jan. 27, 2015, 6:42 p.m. UTC | #11
Hey Jens,

I _think_ we've resolved the issues Bart raised for request-based DM's
support for blk-mq devices (anything remaining seems specific to iSER's
blk-mq support, which is in development).  Though Keith did have that one
additional patch for the block scatter/gather attribute that we still
need to review more closely.

Anyway, I think what we have is a solid start and see no reason to hold
these changes back further.  So I've rebased the 'dm-for-3.20' branch of
linux-dm.git on top of 3.19-rc6 and reordered the required block changes
to be at the front of the series, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20

(these changes have been in linux-next for a month, via linux-dm.git
'for-next')

With your OK, I'd be happy to carry the required block changes and
ultimately request Linus pull them for 3.20 (I can backfill your Acks if
you approve).  BUT I also have no problem with you picking up the block
changes to submit via your block tree (I'd just have to rebase on top of
your 3.20 branch once you pull them in).

Let me know what you think, thanks.
Mike

Jens Axboe Jan. 28, 2015, 4:42 p.m. UTC | #12
On 01/27/2015 11:42 AM, Mike Snitzer wrote:
> Hey Jens,
>
> I _think_ we've resolved the issues Bart raised for request-based DM's
> support for blk-mq devices (anything remaining seems specific to iSER's
> blk-mq support, which is in development).  Though Keith did have that one
> additional patch for the block scatter/gather attribute that we still
> need to review more closely.
>
> Anyway, I think what we have is a solid start and see no reason to hold
> these changes back further.  So I've rebased the 'dm-for-3.20' branch of
> linux-dm.git on top of 3.19-rc6 and reordered the required block changes
> to be at the front of the series, see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
>
> (these changes have been in linux-next for a month, via linux-dm.git
> 'for-next')
>
> With your OK, I'd be happy to carry the required block changes and
> ultimately request Linus pull them for 3.20 (I can backfill your Acks if
> you approve).  BUT I also have no problem with you picking up the block
> changes to submit via your block tree (I'd just have to rebase on top of
> your 3.20 branch once you pull them in).

I'd prefer to take these prep patches through the block tree. The only
one I don't really like is this one:

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d

I prefer keeping the alloc path as lean as possible; normal allocs
always initialize ->bio since they need to associate a bio with it. Do
you have the oops trace from this one? Just curious if we can get rid of
it, depending on how deep in the caller this is.
Mike Snitzer Jan. 28, 2015, 5:44 p.m. UTC | #13
On Wed, Jan 28 2015 at 11:42am -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 01/27/2015 11:42 AM, Mike Snitzer wrote:
> >Hey Jens,
> >
> >I _think_ we've resolved the issues Bart raised for request-based DM's
> >support for blk-mq devices (anything remaining seems specific to iSER's
> >blk-mq support, which is in development).  Though Keith did have that one
> >additional patch for the block scatter/gather attribute that we still
> >need to review more closely.
> >
> >Anyway, I think what we have is a solid start and see no reason to hold
> >these changes back further.  So I've rebased the 'dm-for-3.20' branch of
> >linux-dm.git on top of 3.19-rc6 and reordered the required block changes
> >to be at the front of the series, see:
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
> >
> >(these changes have been in linux-next for a month, via linux-dm.git
> >'for-next')
> >
> >With your OK, I'd be happy to carry the required block changes and
> >ultimately request Linus pull them for 3.20 (I can backfill your Acks if
> >you approve).  BUT I also have no problem with you picking up the block
> >changes to submit via your block tree (I'd just have to rebase on top of
> >your 3.20 branch once you pull them in).
> 
> I'd prefer to take these prep patches through the block tree.

Great, should I send the patches or can you cherry-pick?

> The only one I don't really like is this one:
> 
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d
> 
> I prefer keeping the alloc path as lean as possible; normal allocs
> always initialize ->bio since they need to associate a bio with it.

I'd be very surprised if this initialization were measurable, but...
I could push this initialization into the DM-mpath driver (just after
blk_get_request, like Keith opted for), but that seemed really gross.

> Do you have the oops trace from this one? Just curious if we can get
> rid of it, depending on how deep in the caller this is.

I didn't, but it was easy enough to recreate:

[    3.112949] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    3.113416] IP: [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[    3.113416] PGD 0
[    3.113416] Oops: 0002 [#1] SMP
[    3.113416] Modules linked in: dm_service_time crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd serio_raw pcspkr virtio_balloon 8139too i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath dm_mod ext4 mbcache jbd2 sd_mod ata_generic cirrus pata_acpi syscopyarea sysfillrect sysimgblt drm_kms_helper ttm drm virtio_scsi virtio_blk 8139cp virtio_pci mii i2c_core virtio_ring ata_piix virtio libata floppy
[    3.113416] CPU: 0 PID: 483 Comm: kdmwork-252:3 Tainted: G        W      3.18.0+ #29
[    3.113416] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    3.113416] task: ffff880035c1ad20 ti: ffff8800d6900000 task.ti: ffff8800d6900000
[    3.113416] RIP: 0010:[<ffffffff812f6734>]  [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[    3.113416] RSP: 0000:ffff8800d6903d48  EFLAGS: 00010286
[    3.113416] RAX: 0000000000000000 RBX: ffffffffa0208500 RCX: 0000000000000001
[    3.113416] RDX: ffff8800d7a3b0a0 RSI: ffff880035d0ab00 RDI: ffff880119f8f510
[    3.113416] RBP: ffff8800d6903d98 R08: 00000000000185a0 R09: 00000000000000d0
[    3.113416] R10: ffff8800d7547680 R11: ffff880035c1b8c8 R12: ffff8800d83d7900
[    3.113416] R13: ffff880035d0ab00 R14: ffff880119f8f510 R15: ffff8800d7547680
[    3.113416] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[    3.113416] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.113416] CR2: 0000000000000000 CR3: 00000000daeec000 CR4: 00000000000407f0
[    3.113416] Stack:
[    3.113416]  ffff8800d6903db8 ffff8800d71502e0 ffff8800d7a3b0a0 000000d0d71502e0
[    3.113416]  ffff8800d6e89800 ffff8800d71502e0 ffff8800d6e89800 0000000000000001
[    3.113416]  ffff8800d7a3b0a0 ffffc90000998040 ffff8800d6903df8 ffffffffa0209c69
[    3.113416] Call Trace:
[    3.113416]  [<ffffffffa0209c69>] map_tio_request+0x219/0x2b0 [dm_mod]
[    3.113416]  [<ffffffff8109a4ee>] kthread_worker_fn+0x7e/0x1b0
[    3.113416]  [<ffffffff8109a470>] ? __init_kthread_worker+0x60/0x60
[    3.113416]  [<ffffffff8109a3f7>] kthread+0x107/0x120
[    3.113416]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[    3.113416]  [<ffffffff816952bc>] ret_from_fork+0x7c/0xb0
[    3.113416]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240
[    3.113416] Code: 89 c3 48 83 ec 28 4c 8b 6e 68 48 85 d2 4c 0f 44 25 22 b7 92 01 48 89 75 b8 89 4d cc 4c 89 4d c0 4d 85 ed 75 16 eb 60 49 8b 47 70 <4c> 89 30 4d 89 77 70 4d 8b 6d 00 4d 85 ed 74 4c 8b 75 cc 4c 89
[    3.113416] RIP  [<ffffffff812f6734>] blk_rq_prep_clone+0x44/0x160
[    3.113416]  RSP <ffff8800d6903d48>
[    3.113416] CR2: 0000000000000000
[    3.113416] ---[ end trace 9b3bb6dd6cc4435d ]---

crash> dis -l blk_rq_prep_clone+0x44
/home/snitm/git/linux/block/blk-core.c: 2945
0xffffffff812f6734 <blk_rq_prep_clone+0x44>:    mov    %r14,(%rax)

crash> l /home/snitm/git/linux/block/blk-core.c: 2945
2940    
2941                    if (bio_ctr && bio_ctr(bio, bio_src, data))
2942                            goto free_and_out;
2943    
2944                    if (rq->bio) {
2945                            rq->biotail->bi_next = bio;
2946                            rq->biotail = bio;
2947                    } else
2948                            rq->bio = rq->biotail = bio;
2949            }

Given that the NULL pointer dereference occurs when attempting to
dereference rq->biotail, a revised check of "if (rq->bio && rq->biotail)"
should suffice, but I unfortunately then get:

[    2.801634] general protection fault: 0000 [#1] SMP
[    2.802504] Modules linked in: dm_service_time crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd pcspkr serio_raw 8139too virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath dm_mod ext4 mbcache jbd2 ata_generic sd_mod pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm virtio_scsi virtio_blk drm virtio_pci virtio_ring ata_piix 8139cp libata mii i2c_core virtio floppy
[    2.802504] CPU: 0 PID: 474 Comm: kdmwork-252:1 Tainted: G        W      3.18.0+ #30
[    2.802504] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    2.802504] task: ffff8801194b1690 ti: ffff880119abc000 task.ti: ffff880119abc000
[    2.802504] RIP: 0010:[<ffffffff812f6739>]  [<ffffffff812f6739>] blk_rq_prep_clone+0x49/0x160
[    2.802504] RSP: 0018:ffff880119abfd48  EFLAGS: 00010206                                                                                                            
[    2.802504] RAX: 6de900000000e800 RBX: ffffffffa0218500 RCX: 0000000000000001                                                                                       
[    2.802504] RDX: ffff8800daca30a0 RSI: ffff880119dcaf00 RDI: ffff880119dca310                                                                                       
[    2.802504] RBP: ffff880119abfd98 R08: 00000000000185a0 R09: 00000000000000d0                                                                                       
[    2.802504] R10: ffff880035937680 R11: ffff8801194b2238 R12: ffff880035876900                                                                                       
[    2.802504] R13: ffff880119dcaf00 R14: ffff880119dca310 R15: ffff880035937680                                                                                       
[    2.802504] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000                                                                            
[    2.802504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                       
[    2.802504] CR2: 00007f45598c2350 CR3: 000000003614a000 CR4: 00000000000407f0                                                                                       
[    2.802504] Stack:                                                                                                                                                  
[    2.802504]  ffff880119abfdb8 ffff8800dae602e0 ffff8800daca30a0 000000d0dae602e0                                                                                    
[    2.802504]  ffff8800dad6a000 ffff8800dae602e0 ffff8800dad6a000 0000000000000001                                                                                    
[    2.802504]  ffff8800daca30a0 ffffc9000097d040 ffff880119abfdf8 ffffffffa0219c69                                                                                    
[    2.802504] Call Trace:                                                                                                                                             
[    2.802504]  [<ffffffffa0219c69>] map_tio_request+0x219/0x2b0 [dm_mod]                                                                                              
[    2.802504]  [<ffffffff8109a4ee>] kthread_worker_fn+0x7e/0x1b0                                                                                                      
[    2.802504]  [<ffffffff8109a470>] ? __init_kthread_worker+0x60/0x60                                                                                                 
[    2.802504]  [<ffffffff8109a3f7>] kthread+0x107/0x120                                                                                                               
[    2.802504]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240                                                                                              
[    2.802504]  [<ffffffff816952bc>] ret_from_fork+0x7c/0xb0                                                                                                           
[    2.802504]  [<ffffffff8109a2f0>] ? kthread_create_on_node+0x240/0x240                                                                                              
[    2.802504] Code: 28 4c 8b 6e 68 48 85 d2 4c 0f 44 25 22 b7 92 01 48 89 75 b8 89 4d cc 4c 89 4d c0 4d 85 ed 75 1b eb 64 49 8b 47 70 48 85 c0 74 4a <4c> 89 30 4d 89 77 70 4d 8b 6d 00 4d 85 ed 74 4b 8b 75 cc 4c 89
[    2.802504] RIP  [<ffffffff812f6739>] blk_rq_prep_clone+0x49/0x160                                                                                                  
[    2.802504]  RSP <ffff880119abfd48>                                                                                                                                 
[    2.802386] general protection fault: 0000 [#2]
[    2.893050] ---[ end trace 20d230269dc05eca ]---

Not sure what to make of this (other than that rq->biotail is pointing at
crap too, which is actually likely if rq->bio is):

crash> dis -l blk_rq_prep_clone+0x49
/home/snitm/git/linux/block/blk-core.c: 2945
0xffffffff812f6739 <blk_rq_prep_clone+0x49>:    mov    %r14,(%rax)

crash> l /home/snitm/git/linux/block/blk-core.c: 2945
2940    
2941                    if (bio_ctr && bio_ctr(bio, bio_src, data))
2942                            goto free_and_out;
2943    
2944                    if (rq->bio && rq->biotail) {
2945                            rq->biotail->bi_next = bio;
2946                            rq->biotail = bio;
2947                    } else
2948                            rq->bio = rq->biotail = bio;
2949            }

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Jens Axboe Jan. 28, 2015, 5:49 p.m. UTC | #14
On 01/28/2015 10:44 AM, Mike Snitzer wrote:
> On Wed, Jan 28 2015 at 11:42am -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On 01/27/2015 11:42 AM, Mike Snitzer wrote:
>>> Hey Jens,
>>>
>>> I _think_ we've resolved the issues Bart raised for request-based DM's
>>> support for blk-mq devices (anything remaining seems specific to iSER's
>>> blk-mq support, which is in development).  Though Keith did have that one
>>> additional patch for the block scatter/gather attribute that we still
>>> need to review more closely.
>>>
>>> Anyway, I think what we have is a solid start and see no reason to hold
>>> these changes back further.  So I've rebased the 'dm-for-3.20' branch of
>>> linux-dm.git on top of 3.19-rc6 and reordered the required block changes
>>> to be at the front of the series, see:
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.20
>>>
>>> (these changes have been in linux-next for a month, via linux-dm.git
>>> 'for-next')
>>>
>>> With your OK, I'd be happy to carry the required block changes and
>>> ultimately request Linus pull them for 3.20 (I can backfill your Acks if
>>> you approve).  BUT I also have no problem with you picking up the block
>>> changes to submit via your block tree (I'd just have to rebase on top of
>>> your 3.20 branch once you pull them in).
>>
>> I'd prefer to take these prep patches through the block tree.
>
> Great, should I send the patches or can you cherry-pick?

I already cherry-picked them; they are in the for-3.20/core branch.

>> The only one I don't really like is this one:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=23556c2461407495099d1eb20b0de43432dc727d
>>
>> I prefer keeping the alloc path as lean as possible; normal allocs
>> always initialize ->bio since they need to associate a bio with it.
>
> I'd be very surprised if this initialization were measurable, but...

That's what people always say, and then keep piling more crap in...

> I could push this initialization into the DM-mpath driver (just after
> blk_get_request, like Keith opted for), but that seemed really gross.

It's already doing blk_rq_init() now, so not a huge change and not that 
nasty.
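
A sketch of the direction being discussed, initializing the clone in the
caller right after allocation instead of in the block allocation path; the
surrounding code and error handling are simplified assumptions, not the
actual dm-mpath source:

	/* allocate the clone from the underlying path's request_queue */
	clone = blk_get_request(q, rq_data_dir(rq), GFP_ATOMIC);
	if (IS_ERR(clone))
		return DM_MAPIO_REQUEUE;
	/* initialize the fields blk_rq_prep_clone() chains bios onto,
	 * keeping the blk-mq allocation path itself lean */
	clone->bio = clone->biotail = NULL;
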
Keith Busch Jan. 29, 2015, 10:43 p.m. UTC | #15
On Wed, 28 Jan 2015, Jens Axboe wrote:
> On 01/28/2015 10:44 AM, Mike Snitzer wrote:
>> On Wed, Jan 28 2015 at 11:42am -0500,
>> Jens Axboe <axboe@kernel.dk> wrote:
>>> I'd prefer to take these prep patches through the block tree.
>> 
>> Great, should I send the patches or can you cherry-pick?
>
> I already cherry-picked them; they are in the for-3.20/core branch.

I might be getting ahead of myself by trying this right now (sorry if
that's the case), but 'for-3.20/core' is missing necessary parts of the
series and hits a BUG_ON at blk-core.c:2333. The original request is
initialized when setting up the clone, and in this branch that happens
in the prep_fn before the request is dequeued, so it's not on the
queuelist when it is started.

One of my commits relocated the initialization, but I didn't realize it
had a hard dependency on the follow-on commit.  Should we reorder that
part of the series?
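
The BUG_ON being hit is presumably the sanity check in blk_dequeue_request():
re-initializing the original request's ->queuelist before it is dequeued makes
list_empty() true. From 3.19-era block/blk-core.c, roughly (the cited line
number is approximate for this branch):

static void blk_dequeue_request(struct request *rq)
{
	BUG_ON(list_empty(&rq->queuelist));	/* fires if ->queuelist was re-inited */
	BUG_ON(ELV_ON_HASH(rq));

	list_del_init(&rq->queuelist);
	/* in-flight accounting (elided) follows */
}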

Mike Snitzer Jan. 29, 2015, 11:09 p.m. UTC | #16
On Thu, Jan 29 2015 at  5:43pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Wed, 28 Jan 2015, Jens Axboe wrote:
> >On 01/28/2015 10:44 AM, Mike Snitzer wrote:
> >>On Wed, Jan 28 2015 at 11:42am -0500,
> >>Jens Axboe <axboe@kernel.dk> wrote:
> >>>I'd prefer to take these prep patches through the block tree.
> >>
> >>Great, should I send the patches or can you cherry-pick?
> >
> >I already cherry-picked them; they are in the for-3.20/core branch.
> 
> I might be getting ahead of myself by trying this right now (sorry if
> that's the case), but 'for-3.20/core' is missing necessary parts of the
> series and hits a BUG_ON at blk-core.c:2333. The original request is
> initialized when setting up the clone, and in this branch that happens
> in the prep_fn before the request is dequeued, so it's not on the
> queuelist when it is started.
> 
> One of my commits relocated the initialization, but I didn't realize it
> had a hard dependency on the follow-on commit.  Should we reorder that
> part of the series?

Which follow-on commit are you referring to?  Please be specific about
which commits you think are out of order.

Also, what are you testing... are you using the linux-dm.git tree's
'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?

If not, please test that and see if you still have problems (without the
associated DM changes, all the block preparation changes that Jens
cherry-picked shouldn't cause any problems at all).

Keith Busch Jan. 29, 2015, 11:44 p.m. UTC | #17
On Thu, 29 Jan 2015, Mike Snitzer wrote:
> On Thu, Jan 29 2015 at  5:43pm -0500,
>> One of my commits relocated the initialization, but I didn't realize it
>> had a hard dependency on the follow-on commit.  Should we reorder that
>> part of the series?
>
> Which follow-on commit are you referring to?  Please be specific about
> which commits you think are out of order.
>
> Also, what are you testing... are you using the linux-dm.git tree's
> 'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?
>
> If not, please test that and see if you still have problems (without the
> associated DM changes, all the block preparation changes that Jens
> cherry-picked shouldn't cause any problems at all).

I'm using Jens' linux-block for-3.20/core branch. The last dm commit
is this:

  commit febf71588c2a750e04dc2a8b0824ce120c48bd9e
  Author: Keith Busch <keith.busch@intel.com>
  Date:   Fri Oct 17 17:46:35 2014 -0600

     block: require blk_rq_prep_clone() be given an initialized clone request

Looking at this again, the above was incorrect in the first place: it
initialized the original, but the intent was to initialize the clone. This
slipped by me since the next part of the series fixed it. In your linux-dm
dm-for-3.20, it's this commit:

  commit 102e38b1030e883efc022dfdc7b7e7a3de70d1c5
  Author: Mike Snitzer <snitzer@redhat.com>
  Date:   Fri Dec 5 17:11:05 2014 -0500

      dm: split request structure out from dm_rq_target_io structure


So I was confused earlier; there's no need to reorder anything. I just
need to fix the broken part.

Mike Snitzer Jan. 30, 2015, 12:32 a.m. UTC | #18
On Thu, Jan 29 2015 at  6:44pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Thu, 29 Jan 2015, Mike Snitzer wrote:
> >On Thu, Jan 29 2015 at  5:43pm -0500,
> >>One of my commits relocated the initialization, but I didn't realize it
> >>had a hard dependency on the follow-on commit.  Should we reorder that
> >>part of the series?
> >
> >Which follow-on commit are you referring to?  Please be specific about
> >which commits you think are out of order.
> >
> >Also, what are you testing... are you using the linux-dm.git tree's
> >'dm-for-3.20' branch which builds on Jens' 'for-3.20/core'?
> >
> >If not, please test that and see if you still have problems (without the
> >associated DM changes, all the block preparation changes that Jens
> >cherry-picked shouldn't cause any problems at all).
> 
> I'm using Jens' linux-block for-3.20/core branch. The last dm commit
> is this:
> 
>  commit febf71588c2a750e04dc2a8b0824ce120c48bd9e
>  Author: Keith Busch <keith.busch@intel.com>
>  Date:   Fri Oct 17 17:46:35 2014 -0600
> 
>     block: require blk_rq_prep_clone() be given an initialized clone request
> 
> Looking at this again, the above was incorrect in the first place: it
> initialized the original, but the intent was to initialize the clone. This
> slipped by me since the next part of the series fixed it. In your linux-dm
> dm-for-3.20, it's this commit:
> 
>  commit 102e38b1030e883efc022dfdc7b7e7a3de70d1c5
>  Author: Mike Snitzer <snitzer@redhat.com>
>  Date:   Fri Dec 5 17:11:05 2014 -0500
> 
>      dm: split request structure out from dm_rq_target_io structure
> 
> 
> So I was confused earlier; there's no need to reorder anything. I just
> need to fix the broken part.

Oof, yeah that first commit should be using: blk_rq_init(NULL, clone);

Jens, any chance you could rebase commit febf71588c2a with s/rq/clone/
in dm.c:clone_rq's call to blk_rq_init?
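
The requested fixup amounts to a one-line substitution in dm.c:clone_rq(),
roughly:

-	blk_rq_init(NULL, rq);		/* re-initializes the original request */
+	blk_rq_init(NULL, clone);	/* initialize the clone instead */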


Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 7e78931..6201090 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2895,7 +2895,10 @@  EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
  static void __blk_rq_prep_clone(struct request *dst, struct request *src)
  {
  	dst->cpu = src->cpu;
-	dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
+	if (dst->q->mq_ops)
+		dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
+	else
+		dst->cmd_flags = (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
  	dst->cmd_type = src->cmd_type;
  	dst->__sector = blk_rq_pos(src);
  	dst->__data_len = blk_rq_bytes(src);