diff mbox series

[3/3] block: set REQ_PREFLUSH to the final bio from __blkdev_issue_zero_pages()

Message ID 20201206055332.3144-3-tom.ty89@gmail.com (mailing list archive)
State Not Applicable
Headers show
Series [1/3] block: try one write zeroes request before going further | expand

Commit Message

Tom Yan Dec. 6, 2020, 5:53 a.m. UTC
Mimicking blkdev_issue_flush(). Seems like a right thing to do, as
they are a bunch of REQ_OP_WRITE.

Signed-off-by: Tom Yan <tom.ty89@gmail.com>
---
 block/blk-lib.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Hannes Reinecke Dec. 6, 2020, 11:31 a.m. UTC | #1
On 12/6/20 6:53 AM, Tom Yan wrote:
> Mimicking blkdev_issue_flush(). Seems like a right thing to do, as
> they are a bunch of REQ_OP_WRITE.
> 
> Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> ---
>   block/blk-lib.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 354dcab760c7..5579fdea893d 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -422,6 +422,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   	} else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
>   		ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
>   						gfp_mask, &bio);
> +		if (bio)
> +			bio->bi_opf |= REQ_PREFLUSH;
>   	} else {
>   		/* No zeroing offload support */
>   		ret = -EOPNOTSUPP;
> 
PREFLUSH is for the 'flush' machinery (cf blk-flush.c). Which is okay 
for blkdev_issue_flush(), but certainly not for zeroout.

Cheers,

Hannes
Tom Yan Dec. 6, 2020, 1:32 p.m. UTC | #2
Why? Did you miss that it is in the condition where
__blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but
WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache
(that is on the device; not sure about dirty pages) flush, wouldn't it
be a right thing to do after we performed a series of WRITE (which is
more or less purposed to get a drive wiped clean).

On Sun, 6 Dec 2020 at 19:31, Hannes Reinecke <hare@suse.de> wrote:
>
> On 12/6/20 6:53 AM, Tom Yan wrote:
> > Mimicking blkdev_issue_flush(). Seems like a right thing to do, as
> > they are a bunch of REQ_OP_WRITE.
> >
> > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > ---
> >   block/blk-lib.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 354dcab760c7..5579fdea893d 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -422,6 +422,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> >       } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
> >               ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
> >                                               gfp_mask, &bio);
> > +             if (bio)
> > +                     bio->bi_opf |= REQ_PREFLUSH;
> >       } else {
> >               /* No zeroing offload support */
> >               ret = -EOPNOTSUPP;
> >
> PREFLUSH is for the 'flush' machinery (cf blk-flush.c). Which is okay
> for blkdev_issue_flush(), but certainly not for zeroout.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke Dec. 6, 2020, 2:05 p.m. UTC | #3
On 12/6/20 2:32 PM, Tom Yan wrote:
> Why? Did you miss that it is in the condition where
> __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but
> WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache
> (that is on the device; not sure about dirty pages) flush, wouldn't it
> be a right thing to do after we performed a series of WRITE (which is
> more or less purposed to get a drive wiped clean).
> 

But what makes 'zero_pages' special as compared to, say, WRITE_SAME?
One could use WRITE SAME with '0' content, arriving at pretty much the 
same content than usine zeroout without unmapping. And neither of them 
worries about cache flushing.
Nor should they, IMO.

These are 'native' block layer calls, providing abstract accesses to 
hardware functionality. If an application wants to use them, it would be 
the task of the application to insert a 'flush' if it deems neccessary.
(There _is_ blkdev_issue_flush(), after all).

Cheers,

Hannes
Tom Yan Dec. 6, 2020, 2:14 p.m. UTC | #4
On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote:
>
> On 12/6/20 2:32 PM, Tom Yan wrote:
> > Why? Did you miss that it is in the condition where
> > __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but
> > WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache
> > (that is on the device; not sure about dirty pages) flush, wouldn't it
> > be a right thing to do after we performed a series of WRITE (which is
> > more or less purposed to get a drive wiped clean).
> >
>
> But what makes 'zero_pages' special as compared to, say, WRITE_SAME?
> One could use WRITE SAME with '0' content, arriving at pretty much the
> same content than usine zeroout without unmapping. And neither of them
> worries about cache flushing.
> Nor should they, IMO.

Because we are writing actual pages (just that they are zero and
"shared memory" in the system) to the device, instead of triggering a
special command (with a specific parameter)?

>
> These are 'native' block layer calls, providing abstract accesses to
> hardware functionality. If an application wants to use them, it would be
> the task of the application to insert a 'flush' if it deems neccessary.
> (There _is_ blkdev_issue_flush(), after all).

Well my argument would be the call has the purpose of "wiping" so it
should try to "atomically" guarantee that the wiping is synced. It's
like a complement to REQ_SYNC in the final submit_bio_wait().

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke Dec. 6, 2020, 4:05 p.m. UTC | #5
On 12/6/20 3:14 PM, Tom Yan wrote:
> On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 12/6/20 2:32 PM, Tom Yan wrote:
>>> Why? Did you miss that it is in the condition where
>>> __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but
>>> WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache
>>> (that is on the device; not sure about dirty pages) flush, wouldn't it
>>> be a right thing to do after we performed a series of WRITE (which is
>>> more or less purposed to get a drive wiped clean).
>>>
>>
>> But what makes 'zero_pages' special as compared to, say, WRITE_SAME?
>> One could use WRITE SAME with '0' content, arriving at pretty much the
>> same content than usine zeroout without unmapping. And neither of them
>> worries about cache flushing.
>> Nor should they, IMO.
> 
> Because we are writing actual pages (just that they are zero and
> "shared memory" in the system) to the device, instead of triggering a
> special command (with a specific parameter)?
> 

But these pages are ephemeral, and never visible to the user.

>>
>> These are 'native' block layer calls, providing abstract accesses to
>> hardware functionality. If an application wants to use them, it would be
>> the task of the application to insert a 'flush' if it deems neccessary.
>> (There _is_ blkdev_issue_flush(), after all).
> 
> Well my argument would be the call has the purpose of "wiping" so it
> should try to "atomically" guarantee that the wiping is synced. It's
> like a complement to REQ_SYNC in the final submit_bio_wait().
> 
That's an assumption.

It would be valid if blkdev_issue_zeroout() would only allow to wipe the 
entire disk. As it stands, it doesn't, and so we shouldn't presume what 
users might want to do with it.

Cheers,

Hannes
Christoph Hellwig Dec. 7, 2020, 1:35 p.m. UTC | #6
On Sun, Dec 06, 2020 at 01:53:32PM +0800, Tom Yan wrote:
> Mimicking blkdev_issue_flush(). Seems like a right thing to do, as
> they are a bunch of REQ_OP_WRITE.

Huh?  Zeroing out is in no way a data integrity operation that needs a
reflush.
Tom Yan Dec. 8, 2020, 12:51 p.m. UTC | #7
On Mon, 7 Dec 2020 at 00:05, Hannes Reinecke <hare@suse.de> wrote:
>
> On 12/6/20 3:14 PM, Tom Yan wrote:
> > On Sun, 6 Dec 2020 at 22:05, Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> On 12/6/20 2:32 PM, Tom Yan wrote:
> >>> Why? Did you miss that it is in the condition where
> >>> __blkdev_issue_zero_pages() is called (i.e. it's not WRITE SAME but
> >>> WRITE). From what I gathered REQ_PREFLUSH triggers a write back cache
> >>> (that is on the device; not sure about dirty pages) flush, wouldn't it
> >>> be a right thing to do after we performed a series of WRITE (which is
> >>> more or less purposed to get a drive wiped clean).
> >>>
> >>
> >> But what makes 'zero_pages' special as compared to, say, WRITE_SAME?
> >> One could use WRITE SAME with '0' content, arriving at pretty much the
> >> same content than usine zeroout without unmapping. And neither of them
> >> worries about cache flushing.
> >> Nor should they, IMO.
> >
> > Because we are writing actual pages (just that they are zero and
> > "shared memory" in the system) to the device, instead of triggering a
> > special command (with a specific parameter)?
> >
>
> But these pages are ephemeral, and never visible to the user.

What do you mean by the "user"? What I meant was, since it's no
different than "normal" write operation, the zero pages will go to the
volatile write cache of the device.

>
> >>
> >> These are 'native' block layer calls, providing abstract accesses to
> >> hardware functionality. If an application wants to use them, it would be
> >> the task of the application to insert a 'flush' if it deems neccessary.
> >> (There _is_ blkdev_issue_flush(), after all).
> >
> > Well my argument would be the call has the purpose of "wiping" so it
> > should try to "atomically" guarantee that the wiping is synced. It's
> > like a complement to REQ_SYNC in the final submit_bio_wait().
> >
> That's an assumption.
>
> It would be valid if blkdev_issue_zeroout() would only allow to wipe the
> entire disk. As it stands, it doesn't, and so we shouldn't presume what
> users might want to do with it.

Whether it's an entire disk doesn't matter. It still stands when it's
only a certain range of blocks.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
diff mbox series

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 354dcab760c7..5579fdea893d 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -422,6 +422,8 @@  int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	} else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
 		ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
 						gfp_mask, &bio);
+		if (bio)
+			bio->bi_opf |= REQ_PREFLUSH;
 	} else {
 		/* No zeroing offload support */
 		ret = -EOPNOTSUPP;