[23/27] drbd: make intelligent use of blkdev_issue_zeroout

Message ID	20170405172125.22600-24-hch@lst.de (mailing list archive)
State	New, archived
Headers

From: Christoph Hellwig <hch@lst.de>
To: axboe@kernel.dk, martin.petersen@oracle.com, agk@redhat.com,
	snitzer@redhat.com, shli@kernel.org, philipp.reisner@linbit.com,
	lars.ellenberg@linbit.com
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	drbd-dev@lists.linbit.com, dm-devel@redhat.com,
	linux-raid@vger.kernel.org
Subject: [PATCH 23/27] drbd: make intelligent use of blkdev_issue_zeroout
Date: Wed,  5 Apr 2017 19:21:21 +0200
Message-Id: <20170405172125.22600-24-hch@lst.de>
In-Reply-To: <20170405172125.22600-1-hch@lst.de>
References: <20170405172125.22600-1-hch@lst.de>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk

Commit Message

Christoph Hellwig April 5, 2017, 5:21 p.m. UTC

drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 drivers/block/drbd/drbd_debugfs.c  |   3 --
 drivers/block/drbd/drbd_int.h      |   6 ---
 drivers/block/drbd/drbd_receiver.c | 102 ++-----------------------------------
 drivers/block/drbd/drbd_req.c      |   6 +--
 4 files changed, 7 insertions(+), 110 deletions(-)

Comments

Eric Wheeler Jan. 13, 2018, 12:46 a.m. UTC | #1

Hello All,

We just noticed that discards to DRBD devices backed by dm-thin devices 
are fully allocating the thin blocks.

This behavior did not exist before 
ee472d83 block: add a flags argument to (__)blkdev_issue_zeroout

The problem exists somewhere between
[working] c20cfc27 block: stop using blkdev_issue_write_same for zeroing
  and
[broken]  45c21793 drbd: implement REQ_OP_WRITE_ZEROES

Note that c20cfc27 works as expected, but with 45c21793 discarded blocks 
are zeroed (and thus fully allocated) on the dm-thin backing device. All 
commits between those two produce the following error:

blkdiscard: /dev/drbd/by-res/test: BLKDISCARD ioctl failed: Input/output error

Also note that issuing a blkdiscard to the backing device directly 
discards as you would expect. This is just a problem when sending discards 
through DRBD.
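
For reference, the BLKDISCARD ioctl named in the error above is what
blkdiscard issues by default. Below is a minimal user-space sketch of that
call and of its zeroing counterpart, BLKZEROOUT. The helper names
discard_range and zeroout_range are made up for illustration, and the
caller is assumed to have opened the block device for writing.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD, BLKZEROOUT */

/*
 * Ask the kernel to discard (deallocate) a byte range of a block device.
 * Returns 0 on success or a negative errno value on failure.
 * (Hypothetical helper, not part of any library.)
 */
static int discard_range(int fd, uint64_t offset, uint64_t length)
{
	uint64_t range[2] = { offset, length };

	if (ioctl(fd, BLKDISCARD, range) < 0)
		return -errno;
	return 0;
}

/*
 * Ask the kernel to make a byte range read back as zeroes.
 * Whether the blocks end up deallocated or written is up to the
 * kernel and the device stack underneath.
 */
static int zeroout_range(int fd, uint64_t offset, uint64_t length)
{
	uint64_t range[2] = { offset, length };

	if (ioctl(fd, BLKZEROOUT, range) < 0)
		return -errno;
	return 0;
}
```

Running discard_range() against the DRBD device and against the dm-thin
backing device separately is one way to reproduce the comparison described
above without going through the blkdiscard tool.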

Is there an easy way to solve this in the short term, even if the ultimate 
fix is more involved?

Thank you for your help!

-Eric

--
Eric Wheeler

On Wed, 5 Apr 2017, Christoph Hellwig wrote:

> drbd always wants its discard wire operations to zero the blocks, so
> use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
> reinventing it poorly.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/block/drbd/drbd_debugfs.c  |   3 --
>  drivers/block/drbd/drbd_int.h      |   6 ---
>  drivers/block/drbd/drbd_receiver.c | 102 ++-----------------------------------
>  drivers/block/drbd/drbd_req.c      |   6 +--
>  4 files changed, 7 insertions(+), 110 deletions(-)
> 
> diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
> index de5c3ee8a790..494837e59f23 100644
> --- a/drivers/block/drbd/drbd_debugfs.c
> +++ b/drivers/block/drbd/drbd_debugfs.c
> @@ -236,9 +236,6 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
>  	seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
>  	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
>  	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
> -
> -	if (f & EE_IS_TRIM)
> -		__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
>  	seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
>  	seq_putc(m, '\n');
>  }
> diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
> index 724d1c50fc52..d5da45bb03a6 100644
> --- a/drivers/block/drbd/drbd_int.h
> +++ b/drivers/block/drbd/drbd_int.h
> @@ -437,9 +437,6 @@ enum {
>  
>  	/* is this a TRIM aka REQ_DISCARD? */
>  	__EE_IS_TRIM,
> -	/* our lower level cannot handle trim,
> -	 * and we want to fall back to zeroout instead */
> -	__EE_IS_TRIM_USE_ZEROOUT,
>  
>  	/* In case a barrier failed,
>  	 * we need to resubmit without the barrier flag. */
> @@ -482,7 +479,6 @@ enum {
>  #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
>  #define EE_MAY_SET_IN_SYNC     (1<<__EE_MAY_SET_IN_SYNC)
>  #define EE_IS_TRIM             (1<<__EE_IS_TRIM)
> -#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
>  #define EE_RESUBMITTED         (1<<__EE_RESUBMITTED)
>  #define EE_WAS_ERROR           (1<<__EE_WAS_ERROR)
>  #define EE_HAS_DIGEST          (1<<__EE_HAS_DIGEST)
> @@ -1561,8 +1557,6 @@ extern void start_resync_timer_fn(unsigned long data);
>  extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
>  
>  /* drbd_receiver.c */
> -extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
> -		sector_t start, unsigned int nr_sectors, bool discard);
>  extern int drbd_receiver(struct drbd_thread *thi);
>  extern int drbd_ack_receiver(struct drbd_thread *thi);
>  extern void drbd_send_ping_wf(struct work_struct *ws);
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index dc9a6dcd431c..bc1d296581f9 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1448,108 +1448,14 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
>  		drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
>  }
>  
> -/*
> - * We *may* ignore the discard-zeroes-data setting, if so configured.
> - *
> - * Assumption is that it "discard_zeroes_data=0" is only because the backend
> - * may ignore partial unaligned discards.
> - *
> - * LVM/DM thin as of at least
> - *   LVM version:     2.02.115(2)-RHEL7 (2015-01-28)
> - *   Library version: 1.02.93-RHEL7 (2015-01-28)
> - *   Driver version:  4.29.0
> - * still behaves this way.
> - *
> - * For unaligned (wrt. alignment and granularity) or too small discards,
> - * we zero-out the initial (and/or) trailing unaligned partial chunks,
> - * but discard all the aligned full chunks.
> - *
> - * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
> - */
> -int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
> -{
> -	struct block_device *bdev = device->ldev->backing_bdev;
> -	struct request_queue *q = bdev_get_queue(bdev);
> -	sector_t tmp, nr;
> -	unsigned int max_discard_sectors, granularity;
> -	int alignment;
> -	int err = 0;
> -
> -	if (!discard)
> -		goto zero_out;
> -
> -	/* Zero-sector (unknown) and one-sector granularities are the same.  */
> -	granularity = max(q->limits.discard_granularity >> 9, 1U);
> -	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
> -
> -	max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
> -	max_discard_sectors -= max_discard_sectors % granularity;
> -	if (unlikely(!max_discard_sectors))
> -		goto zero_out;
> -
> -	if (nr_sectors < granularity)
> -		goto zero_out;
> -
> -	tmp = start;
> -	if (sector_div(tmp, granularity) != alignment) {
> -		if (nr_sectors < 2*granularity)
> -			goto zero_out;
> -		/* start + gran - (start + gran - align) % gran */
> -		tmp = start + granularity - alignment;
> -		tmp = start + granularity - sector_div(tmp, granularity);
> -
> -		nr = tmp - start;
> -		err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -		nr_sectors -= nr;
> -		start = tmp;
> -	}
> -	while (nr_sectors >= granularity) {
> -		nr = min_t(sector_t, nr_sectors, max_discard_sectors);
> -		err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -		nr_sectors -= nr;
> -		start += nr;
> -	}
> - zero_out:
> -	if (nr_sectors) {
> -		err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -	}
> -	return err != 0;
> -}
> -
> -static bool can_do_reliable_discards(struct drbd_device *device)
> -{
> -	struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
> -	struct disk_conf *dc;
> -	bool can_do;
> -
> -	if (!blk_queue_discard(q))
> -		return false;
> -
> -	if (q->limits.discard_zeroes_data)
> -		return true;
> -
> -	rcu_read_lock();
> -	dc = rcu_dereference(device->ldev->disk_conf);
> -	can_do = dc->discard_zeroes_if_aligned;
> -	rcu_read_unlock();
> -	return can_do;
> -}
> -
>  static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
>  {
> -	/* If the backend cannot discard, or does not guarantee
> -	 * read-back zeroes in discarded ranges, we fall back to
> -	 * zero-out.  Unless configuration specifically requested
> -	 * otherwise. */
> -	if (!can_do_reliable_discards(device))
> -		peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
> +	struct block_device *bdev = device->ldev->backing_bdev;
>  
> -	if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
> -	    peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
> +	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
> +			GFP_NOIO, 0))
>  		peer_req->flags |= EE_WAS_ERROR;
> +
>  	drbd_endio_write_sec_final(peer_req);
>  }
>  
> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index 652114ae1a8a..6da9ea8c48b6 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1148,10 +1148,10 @@ static int drbd_process_write_request(struct drbd_request *req)
>  
>  static void drbd_process_discard_req(struct drbd_request *req)
>  {
> -	int err = drbd_issue_discard_or_zero_out(req->device,
> -				req->i.sector, req->i.size >> 9, true);
> +	struct block_device *bdev = req->device->ldev->backing_bdev;
>  
> -	if (err)
> +	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
> +			GFP_NOIO, 0))
>  		req->private_bio->bi_error = -EIO;
>  	bio_endio(req->private_bio);
>  }
> -- 
> 2.11.0
> 
> _______________________________________________
> drbd-dev mailing list
> drbd-dev@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-dev
>

Lars Ellenberg Jan. 15, 2018, 12:46 p.m. UTC | #2

On Sat, Jan 13, 2018 at 12:46:40AM +0000, Eric Wheeler wrote:
> Hello All,
> 
> We just noticed that discards to DRBD devices backed by dm-thin devices 
> are fully allocating the thin blocks.
> 
> This behavior does not exist before 
> ee472d83 block: add a flags argument to (__)blkdev_issue_zeroout
> 
> The problem exists somewhere between
> [working] c20cfc27 block: stop using blkdev_issue_write_same for zeroing
>   and
> [broken]  45c21793 drbd: implement REQ_OP_WRITE_ZEROES
> 
> Note that c20cfc27 works as expected, but 45c21793 discards blocks 
> being zeroed on the dm-thin backing device. All commits between those two 
> produce the following error:
> 
> blkdiscard: /dev/drbd/by-res/test: BLKDISCARD ioctl failed: Input/output error
> 
> Also note that issuing a blkdiscard to the backing device directly 
> discards as you would expect. This is just a problem when sending discards 
> through DRBD.
> 
> Is there an easy way to solve this in the short term, even if the ultimate 
> fix is more involved?

> On Wed, 5 Apr 2017, Christoph Hellwig wrote:
> 

commit 0dbed96a3cc9786bc4814dab98a7218753bde934
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Apr 5 19:21:21 2017 +0200

    drbd: make intelligent use of blkdev_issue_zeroout

> > drbd always wants its discard wire operations to zero the blocks, so
> > use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
> > reinventing it poorly.

> > -/*
> > - * We *may* ignore the discard-zeroes-data setting, if so configured.
> > - *
> > - * Assumption is that it "discard_zeroes_data=0" is only because the backend
> > - * may ignore partial unaligned discards.
> > - *
> > - * LVM/DM thin as of at least
> > - *   LVM version:     2.02.115(2)-RHEL7 (2015-01-28)
> > - *   Library version: 1.02.93-RHEL7 (2015-01-28)
> > - *   Driver version:  4.29.0
> > - * still behaves this way.
> > - *
> > - * For unaligned (wrt. alignment and granularity) or too small discards,
> > - * we zero-out the initial (and/or) trailing unaligned partial chunks,
> > - * but discard all the aligned full chunks.
> > - *
> > - * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
> > - */
> > -int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)


As I understood it,
blkdev_issue_zeroout() was supposed to "always try to unmap"
(deprovision) the relevant region, and zero out any unaligned
head or tail, just like my workaround above was doing.

And that device mapper thin was "about to" learn this, "soon",
or maybe block core would do the equivalent of my workaround
described above.

But it then did not.

See also:
https://www.redhat.com/archives/dm-devel/2017-March/msg00213.html
https://www.redhat.com/archives/dm-devel/2017-March/msg00226.html

I then did not follow this closely enough anymore,
and I missed that with a recent enough kernel,
discard on DRBD on dm-thin would fully allocate.

In our out-of-tree module, we had to keep the older code for
compat reasons, anyways. I will just re-enable our zeroout
workaround there again.

In tree, either dm-thin learns to do REQ_OP_WRITE_ZEROES "properly",
so the result in this scenario is what we expect:

  _: unprovisioned, not allocated, returns zero on read anyways
  *: provisioned, some arbitrary data
  0: explicitly zeroed:

  |gran|ular|ity |    |    |    |
  |****|****|____|****|
     to|-be-|zero|ed
  |**00|____|____|00**|

(leave unallocated blocks alone,
 de-allocate full blocks just like with discard,
 explicitly zero unaligned head and tail)

Or DRBD will have to resurrect that reinvented zeroout again,
with exactly those semantics. I did reinvent it for a reason ;)
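
The split pictured in the diagram can be sketched in plain user-space C.
This is a simplified illustration, not the removed
drbd_issue_discard_or_zero_out(): struct zero_plan and plan_zeroout() are
hypothetical names, all values are in sectors, and the sketch ignores
max_discard_sectors and overflow of start + nr.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a zero-out request (hypothetical type). */
struct zero_plan {
	uint64_t head_start, head_len;		/* zero explicitly */
	uint64_t discard_start, discard_len;	/* de-allocate whole chunks */
	uint64_t tail_start, tail_len;		/* zero explicitly */
};

/*
 * Split [start, start + nr) into an unaligned head, a run of whole
 * chunks, and an unaligned tail. A chunk boundary is any sector s
 * with s % granularity == alignment.
 */
static struct zero_plan plan_zeroout(uint64_t start, uint64_t nr,
				     uint64_t granularity, uint64_t alignment)
{
	struct zero_plan p = { 0 };
	uint64_t end = start + nr;
	uint64_t first;

	assert(granularity > 0 && alignment < granularity);

	/* First chunk boundary at or after start. */
	first = start +
		(granularity + alignment - start % granularity) % granularity;

	if (first >= end || end - first < granularity) {
		/* No whole chunk fits inside the range: zero all of it. */
		p.head_start = start;
		p.head_len = nr;
		return p;
	}

	p.head_start = start;
	p.head_len = first - start;
	p.discard_start = first;
	p.discard_len = (end - first) / granularity * granularity;
	p.tail_start = first + p.discard_len;
	p.tail_len = end - p.tail_start;
	return p;
}
```

With a granularity of 4 and a request covering sectors 2 through 13, as in
the diagram, this yields a 2-sector head, 8 sectors (two chunks) to
de-allocate, and a 2-sector tail.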

Mike Snitzer Jan. 15, 2018, 3:07 p.m. UTC | #3

On Mon, Jan 15 2018 at  7:46am -0500,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> As I understood it,
> blkdev_issue_zeroout() was supposed to "always try to unmap",
> deprovision, the relevant region, and zero-out any unaligned
> head or tail, just like my work around above was doing.
> 
> And that device mapper thin was "about to" learn this, "soon",
> or maybe block core would do the equivalent of my workaround
> described above.
> 
> But it then did not.
> 
> See also:
> https://www.redhat.com/archives/dm-devel/2017-March/msg00213.html
> https://www.redhat.com/archives/dm-devel/2017-March/msg00226.html

Right, now that you mention it, it is starting to ring a bell (especially
after I read your 2nd dm-devel archive URL above).

> I then did not follow this closely enough anymore,
> and I missed that with recent enough kernel,
> discard on DRBD on dm-thin would fully allocate.
> 
> In our out-of-tree module, we had to keep the older code for
> compat reasons, anyways. I will just re-enable our zeroout
> workaround there again.
> 
> In tree, either dm-thin learns to do REQ_OP_WRITE_ZEROES "properly",
> so the result in this scenario is what we expect:
> 
>   _: unprovisioned, not allocated, returns zero on read anyways
>   *: provisioned, some arbitrary data
>   0: explicitly zeroed:
> 
>   |gran|ular|ity |    |    |    |
>   |****|****|____|****|
>      to|-be-|zero|ed
>   |**00|____|____|00**|
> 
> (leave unallocated blocks alone,
>  de-allocate full blocks just like with discard,
>  explicitly zero unaligned head and tail)

"de-allocate full blocks just like with discard" is an interesting take
on what it means for dm-thin to handle REQ_OP_WRITE_ZEROES "properly".

> Or DRBD will have to resurrect that reinvented zeroout again,
> with exactly those semantics. I did reinvent it for a reason ;)

Yeah, I now recall dropping that line of development because it
became "hard" (or at least harder than originally thought).

Don't people use REQ_OP_WRITE_ZEROES to initialize a portion of the
disk?  E.g. zeroing superblocks, metadata areas, or whatever?

If we just discarded the logical extent and then a user did a partial
write to the block, areas that a user might expect to be zeroed wouldn't
be (at least in the case of dm-thinp if "skip_block_zeroing" is
enabled).  And yes if discard passdown is enabled and the device's
discard implementation does "discard_zeroes_data" then it'd be
fine.. but there are a lot of things that need to line up for drbd's
REQ_OP_WRITE_ZEROES to "just work" (as it expects).

(now I'm just echoing the kinds of concerns I had in that 2nd dm-devel
post above).

This post from mkp is interesting:
https://www.redhat.com/archives/dm-devel/2017-March/msg00228.html

Specifically:
"You don't have a way to mark those blocks as being full of zeroes
without actually writing them?

Note that the fallback to a zeroout command is to do a regular write. So
if DM doesn't zero the blocks, the block layer is going to do it."

No, dm-thinp doesn't have an easy way to mark an allocated block as
containing zeroes (without actually zeroing).  I toyed with adding that
but then realized that even if we had it it'd still require block
zeroing be enabled.  But block zeroing is done at allocation time.  So
we'd need to interpret the "this block is zeroes" flag to mean "on first
write or read to this block it needs to first zero it".  Fugly to say
the least...
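
To make the "zero on first write or read" reading of such a flag concrete,
here is a toy user-space model. It is only a sketch under stated
assumptions: struct lazy_block, the zero_pending flag, and all function
names are invented for illustration and do not correspond to anything in
dm-thinp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8	/* toy block size in bytes */

struct lazy_block {
	unsigned char data[BLOCK_SIZE];
	bool zero_pending;	/* hypothetical "this block is zeroes" flag */
};

/* Record that the block is logically zero without touching its data. */
static void mark_zeroed(struct lazy_block *b)
{
	b->zero_pending = true;
}

/* Perform the deferred zeroing if it is still outstanding. */
static void resolve_pending_zero(struct lazy_block *b)
{
	if (b->zero_pending) {
		memset(b->data, 0, BLOCK_SIZE);
		b->zero_pending = false;
	}
}

/* Every read has to resolve the flag first. */
static void lazy_read(struct lazy_block *b, unsigned char *out)
{
	resolve_pending_zero(b);
	memcpy(out, b->data, BLOCK_SIZE);
}

/* Every write, including a partial one, has to resolve it too. */
static void lazy_write(struct lazy_block *b, size_t off,
		       const unsigned char *buf, size_t len)
{
	assert(off + len <= BLOCK_SIZE);
	resolve_pending_zero(b);
	memcpy(b->data + off, buf, len);
}
```

The model shows where the cost lands: the zeroing is not avoided, only
moved into the first read or write path of each flagged block.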

I've been quite busy with other things but I can revisit all this with
Joe Thornber and see what we come up with after a 2nd discussion.

But sadly, in general, this is a low priority for me, so you might do
well to reintroduce your drbd workaround.. sorry about that :(

Mike

Lars Ellenberg Jan. 16, 2018, 8:55 a.m. UTC | #4

On Mon, Jan 15, 2018 at 10:07:38AM -0500, Mike Snitzer wrote:
> > See also:
> > https://www.redhat.com/archives/dm-devel/2017-March/msg00213.html
> > https://www.redhat.com/archives/dm-devel/2017-March/msg00226.html
> 
> Right, now that you mention it it is starting to ring a bell (especially
> after I read your 2nd dm-devel archive url above).

> > In tree, either dm-thin learns to do REQ_OP_WRITE_ZEROES "properly",
> > so the result in this scenario is what we expect:
> > 
> >   _: unprovisioned, not allocated, returns zero on read anyways
> >   *: provisioned, some arbitrary data
> >   0: explicitly zeroed:
> > 
> >   |gran|ular|ity |    |    |    |
> >   |****|****|____|****|
> >      to|-be-|zero|ed
> >   |**00|____|____|00**|
> > 
> > (leave unallocated blocks alone,
> >  de-allocate full blocks just like with discard,
> >  explicitly zero unaligned head and tail)
> 
> "de-allocate full blocks just like with discard" is an interesting take
> what it means for dm-thin to handle REQ_OP_WRITE_ZEROES "properly".
> 
> > Or DRBD will have to resurrect that reinvented zeroout again,
> > with exactly those semantics. I did reinvent it for a reason ;)
> 
> Yeah, I now recall dropping that line of development because it
> became "hard" (or at least harder than originally thought).
> 
> Don't people use REQ_OP_WRITE_ZEROES to initialize a portion of the
> disk?  E.g. zeroing superblocks, metadata areas, or whatever?
> 
> If we just discarded the logical extent and then a user did a partial
> write to the block, areas that a user might expect to be zeroed wouldn't
> be (at least in the case of dm-thinp if "skip_block_zeroing" is
> enabled).


Oh-kay.
So "immediately after" such an operation
("zero-out head and tail and de-alloc full blocks")
a read to that area would return all zeros, as expected.

But once you do a partial write of something to one of those
de-allocated blocks (and skip_block_zeroing is enabled,
which it likely is due to "performance"),
"magically" arbitrary old garbage data springs into existence
on the LBAs that just before read as zeros.

Would that not break a lot of other things
(any read-modify-write of "upper layers")?
Would that not even be a serious "information leak"
(old garbage of other completely unrelated LVs leaking into this one)?
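
The scenario can be played through with a toy user-space model. This is a
sketch, not dm-thin code: struct toy_pool, struct toy_thin and the
functions below are invented for illustration, the model has one logical
block, and the caller chooses which physical block the "allocator" hands
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8	/* toy block size in bytes */
#define POOL_BLOCKS 2	/* physical blocks in the toy pool */

struct toy_pool {
	unsigned char data[POOL_BLOCKS][BLOCK_SIZE];	/* physical storage */
	bool skip_block_zeroing;
};

struct toy_thin {
	struct toy_pool *pool;
	int mapping;	/* physical block index, or -1 if unprovisioned */
};

/* Read the logical block; an unprovisioned block reads back as zeroes. */
static void thin_read(const struct toy_thin *t, unsigned char *out)
{
	if (t->mapping < 0)
		memset(out, 0, BLOCK_SIZE);
	else
		memcpy(out, t->pool->data[t->mapping], BLOCK_SIZE);
}

/* Drop the mapping. The physical block keeps whatever it contained. */
static void thin_discard(struct toy_thin *t)
{
	t->mapping = -1;
}

/* Partial write: provision on demand, zeroing only if configured to. */
static void thin_write(struct toy_thin *t, int phys, size_t off,
		       const unsigned char *buf, size_t len)
{
	assert(phys >= 0 && phys < POOL_BLOCKS);
	assert(off + len <= BLOCK_SIZE);
	if (t->mapping < 0) {
		t->mapping = phys;
		if (!t->pool->skip_block_zeroing)
			memset(t->pool->data[phys], 0, BLOCK_SIZE);
	}
	memcpy(t->pool->data[t->mapping] + off, buf, len);
}
```

In this model, a discard followed by a read returns zeroes, but a
subsequent 2-byte write with skip_block_zeroing set makes the remaining
6 bytes of the newly provisioned physical block visible again.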

But thank you for that, I start to see the problem ;-)

> No, dm-thinp doesn't have an easy way to mark an allocated block as
> containing zeroes (without actually zeroing).  I toyed with adding that
> but then realized that even if we had it it'd still require block
> zeroing be enabled.  But block zeroing is done at allocation time.  So
> we'd need to interpret the "this block is zeroes" flag to mean "on first
> write or read to this block it needs to first zero it".  Fugly to say
> the least...


Maybe have a "known zeroed block" pool, allocate only from there,
and "lazy zero" unallocated blocks, add to the known-zero pool?
Fallback to zero-on-alloc if that known-zero-pool is depleted.

Easier said than done, I know.
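
Roughly what I have in mind, as a toy user-space sketch. Everything here
(struct zpool, the blk_state values, the function names) is invented for
illustration and says nothing about how dm-thin's allocator actually works.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8	/* toy block size in bytes */
#define NR_BLOCKS 4	/* physical blocks in the toy pool */

enum blk_state { BLK_FREE_DIRTY, BLK_FREE_ZEROED, BLK_IN_USE };

struct zpool {
	unsigned char data[NR_BLOCKS][BLOCK_SIZE];
	enum blk_state state[NR_BLOCKS];
};

/*
 * Background step: zero one dirty free block and move it to the
 * known-zero set. Returns false if there was nothing to do.
 */
static bool zpool_lazy_zero_one(struct zpool *p)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (p->state[i] == BLK_FREE_DIRTY) {
			memset(p->data[i], 0, BLOCK_SIZE);
			p->state[i] = BLK_FREE_ZEROED;
			return true;
		}
	}
	return false;
}

/*
 * Allocate a block that is guaranteed to read as zeroes. Prefer the
 * known-zero set; if it is depleted, fall back to zero-on-alloc.
 * Returns the block index, or -1 if the pool is full.
 */
static int zpool_alloc(struct zpool *p)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (p->state[i] == BLK_FREE_ZEROED) {
			p->state[i] = BLK_IN_USE;
			return i;
		}
	}
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (p->state[i] == BLK_FREE_DIRTY) {
			memset(p->data[i], 0, BLOCK_SIZE);
			p->state[i] = BLK_IN_USE;
			return i;
		}
	}
	return -1;
}

/* Freed blocks may hold old data, so they go back as dirty. */
static void zpool_free(struct zpool *p, int i)
{
	assert(i >= 0 && i < NR_BLOCKS && p->state[i] == BLK_IN_USE);
	p->state[i] = BLK_FREE_DIRTY;
}
```

The allocation path only pays for zeroing when the background step has
fallen behind, which is the trade-off the idea is after.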

> But sadly, in general, this is a low priority for me, so you might do
> well to reintroduce your drbd workaround.. sorry about that :(

No problem.
I'll put that back in, and document that we strongly recommend
NOT enabling skip_block_zeroing in those setups.

Thanks,

    Lars

Patch

diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index de5c3ee8a790..494837e59f23 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -236,9 +236,6 @@  static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
 	seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
 	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
 	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
-
-	if (f & EE_IS_TRIM)
-		__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
 	seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
 	seq_putc(m, '\n');
 }
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 724d1c50fc52..d5da45bb03a6 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -437,9 +437,6 @@  enum {
 
 	/* is this a TRIM aka REQ_DISCARD? */
 	__EE_IS_TRIM,
-	/* our lower level cannot handle trim,
-	 * and we want to fall back to zeroout instead */
-	__EE_IS_TRIM_USE_ZEROOUT,
 
 	/* In case a barrier failed,
 	 * we need to resubmit without the barrier flag. */
@@ -482,7 +479,6 @@  enum {
 #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
 #define EE_MAY_SET_IN_SYNC     (1<<__EE_MAY_SET_IN_SYNC)
 #define EE_IS_TRIM             (1<<__EE_IS_TRIM)
-#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
 #define EE_RESUBMITTED         (1<<__EE_RESUBMITTED)
 #define EE_WAS_ERROR           (1<<__EE_WAS_ERROR)
 #define EE_HAS_DIGEST          (1<<__EE_HAS_DIGEST)
@@ -1561,8 +1557,6 @@  extern void start_resync_timer_fn(unsigned long data);
 extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
 
 /* drbd_receiver.c */
-extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
-		sector_t start, unsigned int nr_sectors, bool discard);
 extern int drbd_receiver(struct drbd_thread *thi);
 extern int drbd_ack_receiver(struct drbd_thread *thi);
 extern void drbd_send_ping_wf(struct work_struct *ws);
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index dc9a6dcd431c..bc1d296581f9 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1448,108 +1448,14 @@  void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
 		drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
 }
 
-/*
- * We *may* ignore the discard-zeroes-data setting, if so configured.
- *
- * Assumption is that it "discard_zeroes_data=0" is only because the backend
- * may ignore partial unaligned discards.
- *
- * LVM/DM thin as of at least
- *   LVM version:     2.02.115(2)-RHEL7 (2015-01-28)
- *   Library version: 1.02.93-RHEL7 (2015-01-28)
- *   Driver version:  4.29.0
- * still behaves this way.
- *
- * For unaligned (wrt. alignment and granularity) or too small discards,
- * we zero-out the initial (and/or) trailing unaligned partial chunks,
- * but discard all the aligned full chunks.
- *
- * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
- */
-int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
-{
-	struct block_device *bdev = device->ldev->backing_bdev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	sector_t tmp, nr;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	int err = 0;
-
-	if (!discard)
-		goto zero_out;
-
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors))
-		goto zero_out;
-
-	if (nr_sectors < granularity)
-		goto zero_out;
-
-	tmp = start;
-	if (sector_div(tmp, granularity) != alignment) {
-		if (nr_sectors < 2*granularity)
-			goto zero_out;
-		/* start + gran - (start + gran - align) % gran */
-		tmp = start + granularity - alignment;
-		tmp = start + granularity - sector_div(tmp, granularity);
-
-		nr = tmp - start;
-		err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO,
-				BLKDEV_ZERO_NOUNMAP);
-		nr_sectors -= nr;
-		start = tmp;
-	}
-	while (nr_sectors >= granularity) {
-		nr = min_t(sector_t, nr_sectors, max_discard_sectors);
-		err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO,
-				BLKDEV_ZERO_NOUNMAP);
-		nr_sectors -= nr;
-		start += nr;
-	}
- zero_out:
-	if (nr_sectors) {
-		err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO,
-				BLKDEV_ZERO_NOUNMAP);
-	}
-	return err != 0;
-}
-
-static bool can_do_reliable_discards(struct drbd_device *device)
-{
-	struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
-	struct disk_conf *dc;
-	bool can_do;
-
-	if (!blk_queue_discard(q))
-		return false;
-
-	if (q->limits.discard_zeroes_data)
-		return true;
-
-	rcu_read_lock();
-	dc = rcu_dereference(device->ldev->disk_conf);
-	can_do = dc->discard_zeroes_if_aligned;
-	rcu_read_unlock();
-	return can_do;
-}
-
 static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
 {
-	/* If the backend cannot discard, or does not guarantee
-	 * read-back zeroes in discarded ranges, we fall back to
-	 * zero-out.  Unless configuration specifically requested
-	 * otherwise. */
-	if (!can_do_reliable_discards(device))
-		peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
+	struct block_device *bdev = device->ldev->backing_bdev;
 
-	if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
-	    peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
+	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
+			GFP_NOIO, 0))
 		peer_req->flags |= EE_WAS_ERROR;
+
 	drbd_endio_write_sec_final(peer_req);
 }
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 652114ae1a8a..6da9ea8c48b6 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1148,10 +1148,10 @@  static int drbd_process_write_request(struct drbd_request *req)
 
 static void drbd_process_discard_req(struct drbd_request *req)
 {
-	int err = drbd_issue_discard_or_zero_out(req->device,
-				req->i.sector, req->i.size >> 9, true);
+	struct block_device *bdev = req->device->ldev->backing_bdev;
 
-	if (err)
+	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
+			GFP_NOIO, 0))
 		req->private_bio->bi_error = -EIO;
 	bio_endio(req->private_bio);
 }
