diff mbox series

[bug,report] BUG for REQ_OP_WRITE_ZEROES to dm-zoned

Message ID 20220414083436.pweunapygdtuzwaf@shindev (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series [bug,report] BUG for REQ_OP_WRITE_ZEROES to dm-zoned | expand

Commit Message

Shinichiro Kawasaki April 14, 2022, 8:34 a.m. UTC
Hello Mike,

Let me share a BUG I observed with v5.18-rcX and ask comments for the fix.

BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO)) in dm_accept_partial_bio()
was triggered for dm-zoned. It happens when a bio with REQ_OP_WRITE_ZEROES and
sector range which goes across zone boundaries of the zoned devices that
dm-zoned maps. For such bios, dm-zoned calls dm_accept_partial_bio() to trim the
bio to fit in a zone. And dm core sets the flag DM_TIO_IS_DUPLICATE_BIO to the
tio of the bio.

    The BUG_ON symptom can be recreated with command as follows:

    # xfs_io -C "fzero 4096 $((512 * $(</sys/block/sdf/queue/chunk_sectors)))" /dev/dm-0

    In this command, /dev/dm-0 is the dm-zoned device. /dev/sdf is the zoned
    block device. Its zone size is obtained from sysfs chunk_sectors attribute.

The trigger commit is e6fc9f62ce6e ("dm: flag clones created by
__send_duplicate_bios") which introduced the new flag (it was named
is_duplicated_bio, and following commit renamed it to DM_TIO_IS_DUPLICATE_BIO).
I understand that the flag is set to the bios cloned in __send_duplicate_bios()
to guard tio->len_ptr shared among the cloned bios from updates in
dm_accept_partial_bio().

One point I can not understand is that the flag is set even when
__send_duplicate_bios() clones only single bio. I think bio is not duplicated in
this case, and there is no need to guard tio->len_ptr. Dm-zoned sets 1 to
ti->num_write_zeroes_bios (and ti->num_discard_bios), then I think
__send_duplicate_bios() always clones single bio for dm-zoned. I tried
following patch below, which removes the flag set for the single bio clone case.


With this patch, the BUG is no longer triggered. Is this a right fix approach?
It looks for me the DM_TIO_IS_DUPLICATE_BIO check is too tight and I think we
can relax it for the single clone case.

If I miss anything and the len_ptr guard by DM_TIO_IS_DUPLICATE_BIO is required
even for the single bio clone case, I will think about dm-zoned change to avoid
dm_accept_partial_bio() call, which will need bio split within dm-zoned.

Comments

Mike Snitzer April 14, 2022, 4:29 p.m. UTC | #1
On Thu, Apr 14 2022 at  4:34P -0400,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote:

> Hello Mike,
> 
> Let me share a BUG I observed with v5.18-rcX and ask comments for the fix.
> 
> BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO)) in dm_accept_partial_bio()
> was triggered for dm-zoned. It happens when a bio with REQ_OP_WRITE_ZEROES and
> sector range which goes across zone boundaries of the zoned devices that
> dm-zoned maps. For such bios, dm-zoned calls dm_accept_partial_bio() to trim the
> bio to fit in a zone. And dm core sets the flag DM_TIO_IS_DUPLICATE_BIO to the
> tio of the bio.
> 
>     The BUG_ON symptom can be recreated with command as follows:
> 
>     # xfs_io -C "fzero 4096 $((512 * $(</sys/block/sdf/queue/chunk_sectors)))" /dev/dm-0
> 
>     In this command, /dev/dm-0 is the dm-zoned device. /dev/sdf is the zoned
>     block device. Its zone size is obtained from sysfs chunk_sectors attribute.
> 
> The trigger commit is e6fc9f62ce6e ("dm: flag clones created by
> __send_duplicate_bios") which introduced the new flag (it was named
> is_duplicated_bio, and following commit renamed it to DM_TIO_IS_DUPLICATE_BIO).
> I understand that the flag is set to the bios cloned in __send_duplicate_bios()
> to guard tio->len_ptr shared among the cloned bios from updates in
> dm_accept_partial_bio().
> 
> One point I can not understand is that the flag is set even when
> __send_duplicate_bios() clones only single bio. I think bio is not duplicated in
> this case, and there is no need to guard tio->len_ptr. Dm-zoned sets 1 to
> ti->num_write_zeroes_bios (and ti->num_discard_bios), then I think
> __send_duplicate_bios() always clones single bio for dm-zoned. I tried
> following patch below, which removes the flag set for the single bio clone case.
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index f2397546b93f..d886c57e49ed 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1363,7 +1363,6 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
>                 break;
>         case 1:
>                 clone = alloc_tio(ci, ti, 0, len, GFP_NOIO);
> -               dm_tio_set_flag(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO);
>                 __map_bio(clone);
>                 break;
>         default:
> 
> With this patch, the BUG is no longer triggered. Is this a right fix approach?
> It looks for me the DM_TIO_IS_DUPLICATE_BIO check is too tight and I think we
> can relax it for the single clone case.
> 
> If I miss anything and the len_ptr guard by DM_TIO_IS_DUPLICATE_BIO is required
> even for the single bio clone case, I will think about dm-zoned change to avoid
> dm_accept_partial_bio() call, which will need bio split within dm-zoned.

Thanks for the report, I've staged a fix here (btw, your change above
needs to be paired with the 2nd hunk of my fix otherwise you won't get
the bio split you desire):

https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=3dfb6f5e95f26215ca08d348ca2ddb5ea6ea2349

I'll be sending this to Linus later today or tomorrow.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer April 14, 2022, 4:47 p.m. UTC | #2
On Thu, Apr 14 2022 at 12:29P -0400,
Mike Snitzer <snitzer@kernel.org> wrote:
 
> Thanks for the report, I've staged a fix here (btw, your change above
> needs to be paired with the 2nd hunk of my fix otherwise you won't get
> the bio split you desire):
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=3dfb6f5e95f26215ca08d348ca2ddb5ea6ea2349
> 
> I'll be sending this to Linus later today or tomorrow.

FYI, I revised that commit with further cleanup to not pass
'unsigned *len' to alloc_multiple_bios(), this commit is what will be
sent upstream soon:

https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=c2228f993c7592783b0a2bf7d169b17dfa4cbe2a

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Damien Le Moal April 14, 2022, 11:57 p.m. UTC | #3
On 4/15/22 01:47, Mike Snitzer wrote:
> On Thu, Apr 14 2022 at 12:29P -0400,
> Mike Snitzer <snitzer@kernel.org> wrote:
>  
>> Thanks for the report, I've staged a fix here (btw, your change above
>> needs to be paired with the 2nd hunk of my fix otherwise you won't get
>> the bio split you desire):
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=3dfb6f5e95f26215ca08d348ca2ddb5ea6ea2349
>>
>> I'll be sending this to Linus later today or tomorrow.
> 
> FYI, I revised that commit with further cleanup to not pass
> 'unsigned *len' to alloc_multiple_bios(), this commit is what will be
> sent upstream soon:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=c2228f993c7592783b0a2bf7d169b17dfa4cbe2a

Looks good to me.

Nit: there is a typo in the commit message:

dm_accept_paertial_bio() -> dm_accept_partial_bio()

Feel free to add:

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
>
Shinichiro Kawasaki April 15, 2022, 3:43 a.m. UTC | #4
On Apr 15, 2022 / 08:57, Damien Le Moal wrote:
> On 4/15/22 01:47, Mike Snitzer wrote:
> > On Thu, Apr 14 2022 at 12:29P -0400,
> > Mike Snitzer <snitzer@kernel.org> wrote:
> >  
> >> Thanks for the report, I've staged a fix here (btw, your change above
> >> needs to be paired with the 2nd hunk of my fix otherwise you won't get
> >> the bio split you desire):
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=3dfb6f5e95f26215ca08d348ca2ddb5ea6ea2349

Thank you Mike. Yes, I confirmed that the 2nd hunk is required for split.

> >>
> >> I'll be sending this to Linus later today or tomorrow.
> > 
> > FYI, I revised that commit with further cleanup to not pass
> > 'unsigned *len' to alloc_multiple_bios(), this commit is what will be
> > sent upstream soon:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.18&id=c2228f993c7592783b0a2bf7d169b17dfa4cbe2a
> 
> Looks good to me.
> 
> Nit: there is a typo in the commit message:
> 
> dm_accept_paertial_bio() -> dm_accept_partial_bio()
> 
> Feel free to add:
> 
> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Another nit in the commit message is "abormal". I think you meant "abnormal".

I tested this revised patch and confrimed it fixes the issue. Good. Thanks.

Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
diff mbox series

Patch

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f2397546b93f..d886c57e49ed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1363,7 +1363,6 @@  static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
                break;
        case 1:
                clone = alloc_tio(ci, ti, 0, len, GFP_NOIO);
-               dm_tio_set_flag(clone_to_tio(clone), DM_TIO_IS_DUPLICATE_BIO);
                __map_bio(clone);
                break;
        default: