
[v8,5/5] block: Pass unshare intent via REQ_OP_PROVISION

Message ID 20231007012817.3052558-6-sarthakkukreti@chromium.org (mailing list archive)
State New, archived
Series Introduce provisioning primitives

Commit Message

Sarthak Kukreti Oct. 7, 2023, 1:28 a.m. UTC
Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
annotate unshare requests to underlying layers. Layers that support
FALLOC_FL_UNSHARE will be able to use this as an indicator of which
fallocate() mode to use.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 block/blk-lib.c           |  6 +++++-
 block/fops.c              |  6 ++++--
 drivers/block/loop.c      | 35 +++++++++++++++++++++++++++++------
 include/linux/blk_types.h |  3 +++
 include/linux/blkdev.h    |  3 ++-
 5 files changed, 43 insertions(+), 10 deletions(-)
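
For illustration, here is a minimal sketch of how an in-kernel caller would request an unshare-annotated provision using the interface this patch adds (the helper and flag come from the diff below; the wrapper function itself is hypothetical):

    #include <linux/blkdev.h>

    /* Hypothetical caller: provision a range and ask lower layers to unshare it. */
    static int example_unshare_range(struct block_device *bdev,
                                     sector_t sector, sector_t nr_sects)
    {
            /*
             * BLKDEV_PROVISION_UNSHARE_RANGE makes blkdev_issue_provision()
             * set REQ_UNSHARE on the REQ_OP_PROVISION bios it submits, so
             * layers that understand FALLOC_FL_UNSHARE can act on it.
             */
            return blkdev_issue_provision(bdev, sector, nr_sects, GFP_KERNEL,
                                          BLKDEV_PROVISION_UNSHARE_RANGE);
    }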

Comments

Dave Chinner Oct. 8, 2023, 11:27 p.m. UTC | #1
On Fri, Oct 06, 2023 at 06:28:17PM -0700, Sarthak Kukreti wrote:
> Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
> annotate unshare requests to underlying layers. Layers that support
> FALLOC_FL_UNSHARE will be able to use this as an indicator of which
> fallocate() mode to use.
> 
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/blk-lib.c           |  6 +++++-
>  block/fops.c              |  6 ++++--
>  drivers/block/loop.c      | 35 +++++++++++++++++++++++++++++------
>  include/linux/blk_types.h |  3 +++
>  include/linux/blkdev.h    |  3 ++-
>  5 files changed, 43 insertions(+), 10 deletions(-)

I have no idea how filesystems (or even userspace applications, for
that matter) are supposed to use this - they have no idea if the
underlying block device has shared blocks for LBA ranges it already
has allocated and provisioned. IOWs, I don't know what the semantics
of this function are, it is not documented anywhere, and there is no
use case presented that tells me how it might get used.

Yes, unshare at the file level means the filesystem tries to break
internal data extent sharing, but if the block layers or backing
devices are doing deduplication and sharing unknown to the
application or filesystem, how do they ever know that this operation
might need to be performed? In what cases do we need to be able to
unshare block device ranges, and how is that different to the
guarantees that REQ_PROVISION is already supposed to give for
provisioned ranges that are then subsequently shared by the block
device (e.g. by snapshots)?

Also, from an API perspective, this is an "unshare" data operation,
not a "provision" operation. Hence I'd suggest that the API should
be blkdev_issue_unshare() rather than optional behaviour added to
_provision(), which - before this patch - had a clear and well defined
meaning....

Cheers,

Dave.
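
For concreteness, the dedicated helper suggested above could be a thin wrapper over the provisioning path rather than a flag; this is a hypothetical sketch only, not something this series provides:

    /*
     * Hypothetical blkdev_issue_unshare(): a separate entry point with the
     * same underlying mechanics, shown only to illustrate the proposed API
     * shape. Not part of this patch series.
     */
    int blkdev_issue_unshare(struct block_device *bdev, sector_t sector,
                             sector_t nr_sects, gfp_t gfp)
    {
            return blkdev_issue_provision(bdev, sector, nr_sects, gfp,
                                          BLKDEV_PROVISION_UNSHARE_RANGE);
    }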
Sarthak Kukreti Oct. 10, 2023, 10:42 p.m. UTC | #2
On Sun, Oct 8, 2023 at 4:27 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Oct 06, 2023 at 06:28:17PM -0700, Sarthak Kukreti wrote:
> > Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
> > annotate unshare requests to underlying layers. Layers that support
> > FALLOC_FL_UNSHARE will be able to use this as an indicator of which
> > fallocate() mode to use.
> >
> > Suggested-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  block/blk-lib.c           |  6 +++++-
> >  block/fops.c              |  6 ++++--
> >  drivers/block/loop.c      | 35 +++++++++++++++++++++++++++++------
> >  include/linux/blk_types.h |  3 +++
> >  include/linux/blkdev.h    |  3 ++-
> >  5 files changed, 43 insertions(+), 10 deletions(-)
>
> I have no idea how filesystems (or even userspace applications, for
> that matter) are supposed to use this - they have no idea if the
> underlying block device has shared blocks for LBA ranges it already
> has allocated and provisioned. IOWs, I don't know what the semantics
> of this function are, it is not documented anywhere, and there is no
> use case presented that tells me how it might get used.
>
> Yes, unshare at the file level means the filesystem tries to break
> internal data extent sharing, but if the block layers or backing
> devices are doing deduplication and sharing unknown to the
> application or filesystem, how do they ever know that this operation
> might need to be performed? In what cases do we need to be able to
> unshare block device ranges, and how is that different to the
> guarantees that REQ_PROVISION is already supposed to give for
> provisioned ranges that are then subsequently shared by the block
> device (e.g. by snapshots)?
>
> Also, from an API perspective, this is an "unshare" data operation,
> not a "provision" operation. Hence I'd suggest that the API should
> be blkdev_issue_unshare() rather than optional behaviour added to
> _provision(), which - before this patch - had a clear and well defined
> meaning....
>
Fair points. The intent, from the conversation with Darrick, was the
addition of support for FALLOC_FL_UNSHARE_RANGE in patch 2 of v4
(originally suggested by Brian Foster in [1]): if we allow
fallocate(UNSHARE_RANGE) on a loop device (e.g. for creating a
snapshot, similar in nature to the FICLONE example you mentioned on
the loop patch), we'd (ideally) want to pass it through to the
underlying layers and let them figure out what to do with it. But it
is only useful for situations where we explicitly know what the
underlying layers are and what mechanism they use.

I agree though that it clouds the API a bit, and I don't think it
necessarily needs to be part of the initial patch series: I propose
keeping just the mode zero (and FALLOC_FL_KEEP_SIZE) handling in the
block patch of the series and dropping this patch for now. WDYT?

Best
Sarthak

[1] https://patchwork.ozlabs.org/project/linux-ext4/patch/20230414000219.92640-2-sarthakkukreti@chromium.org/#3097746

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Dave Chinner Oct. 11, 2023, 12:06 a.m. UTC | #3
On Tue, Oct 10, 2023 at 03:42:39PM -0700, Sarthak Kukreti wrote:
> On Sun, Oct 8, 2023 at 4:27 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Oct 06, 2023 at 06:28:17PM -0700, Sarthak Kukreti wrote:
> > > Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
> > > annotate unshare requests to underlying layers. Layers that support
> > > FALLOC_FL_UNSHARE will be able to use this as an indicator of which
> > > fallocate() mode to use.
> > >
> > > Suggested-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > ---
> > >  block/blk-lib.c           |  6 +++++-
> > >  block/fops.c              |  6 ++++--
> > >  drivers/block/loop.c      | 35 +++++++++++++++++++++++++++++------
> > >  include/linux/blk_types.h |  3 +++
> > >  include/linux/blkdev.h    |  3 ++-
> > >  5 files changed, 43 insertions(+), 10 deletions(-)
> >
> > I have no idea how filesystems (or even userspace applications, for
> > that matter) are supposed to use this - they have no idea if the
> > underlying block device has shared blocks for LBA ranges it already
> > has allocated and provisioned. IOWs, I don't know what the semantics
> > of this function are, it is not documented anywhere, and there is no
> > use case presented that tells me how it might get used.
> >
> > Yes, unshare at the file level means the filesystem tries to break
> > internal data extent sharing, but if the block layers or backing
> > devices are doing deduplication and sharing unknown to the
> > application or filesystem, how do they ever know that this operation
> > might need to be performed? In what cases do we need to be able to
> > unshare block device ranges, and how is that different to the
> > guarantees that REQ_PROVISION is already supposed to give for
> > provisioned ranges that are then subsequently shared by the block
> > device (e.g. by snapshots)?
> >
> > Also, from an API perspective, this is an "unshare" data operation,
> > not a "provision" operation. Hence I'd suggest that the API should
> > be blkdev_issue_unshare() rather than optional behaviour added to
> > _provision(), which - before this patch - had a clear and well defined
> > meaning....
> >
> Fair points. The intent, from the conversation with Darrick, was the
> addition of support for FALLOC_FL_UNSHARE_RANGE in patch 2 of v4
> (originally suggested by Brian Foster in [1]): if we allow
> fallocate(UNSHARE_RANGE) on a loop device (e.g. for creating a
> snapshot, similar in nature to the FICLONE example you mentioned on
> the loop patch), we'd (ideally) want to pass it through to the
> underlying layers and let them figure out what to do with it. But it
> is only useful for situations where we explicitly know what the
> underlying layers are and what mechanism they use.
> 
> I agree though that it clouds the API a bit, and I don't think it
> necessarily needs to be part of the initial patch series: I propose
> keeping just the mode zero (and FALLOC_FL_KEEP_SIZE) handling in the
> block patch of the series and dropping this patch for now. WDYT?

Until we have an actual use case for unsharing (which explicitly
breaks extent sharing) as opposed to provisioning (which ensures
overwrites always succeed regardless of extent state), let's leave
it out of this -provisioning- series.

-Dave.

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index b1f720e198cd..d6cf572605f5 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -350,6 +350,7 @@  EXPORT_SYMBOL(blkdev_issue_secure_erase);
  * @sector:	start sector
  * @nr_sects:	number of sectors to provision
  * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @flags:	controls detailed behavior
  *
  * Description:
  *  Issues a provision request to the block device for the range of sectors.
@@ -357,7 +358,7 @@  EXPORT_SYMBOL(blkdev_issue_secure_erase);
  *  underlying storage pool to allocate space for this block range.
  */
 int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp)
+		sector_t nr_sects, gfp_t gfp, unsigned flags)
 {
 	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
 	unsigned int max_sectors = bdev_max_provision_sectors(bdev);
@@ -380,6 +381,9 @@  int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
 
+		if (flags & BLKDEV_PROVISION_UNSHARE_RANGE)
+			bio->bi_opf |= REQ_UNSHARE;
+
 		sector += req_sects;
 		nr_sects -= req_sects;
 		if (!nr_sects) {
diff --git a/block/fops.c b/block/fops.c
index 99b24bd9d461..dd442b6f0486 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -782,8 +782,10 @@  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	case FALLOC_FL_UNSHARE_RANGE:
 	case FALLOC_FL_KEEP_SIZE:
 	case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
-		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
-					       len >> SECTOR_SHIFT, GFP_KERNEL);
+		error = blkdev_issue_provision(
+				bdev, start >> SECTOR_SHIFT, len >> SECTOR_SHIFT, GFP_KERNEL,
+				(mode & FALLOC_FL_UNSHARE_RANGE) ?
+					BLKDEV_PROVISION_UNSHARE_RANGE : 0);
 		break;
 	case FALLOC_FL_ZERO_RANGE:
 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
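
From userspace, the effect of this hunk is that fallocate() on a block device file accepts FALLOC_FL_UNSHARE_RANGE and forwards it as an unshare-annotated provision. A minimal sketch, assuming a kernel with this series applied and a device whose driver supports provisioning (the device path is an example):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/loop0", O_RDWR);    /* example device */

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* Matches the FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE case
             * handled in blkdev_fallocate() above. */
            if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE,
                          0, 1024 * 1024) < 0)
                    perror("fallocate");
            close(fd);
            return 0;
    }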
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index abb4dddbd4fd..f30479deb615 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -306,6 +306,30 @@  static int lo_read_simple(struct loop_device *lo, struct request *rq,
 	return 0;
 }
 
+static bool validate_fallocate_mode(struct loop_device *lo, int mode)
+{
+	bool ret = true;
+
+	switch (mode) {
+	case FALLOC_FL_PUNCH_HOLE:
+	case FALLOC_FL_ZERO_RANGE:
+		if (!bdev_max_discard_sectors(lo->lo_device))
+			ret = false;
+		break;
+	case 0:
+	case FALLOC_FL_UNSHARE_RANGE:
+		if (!bdev_max_provision_sectors(lo->lo_device))
+			ret = false;
+		break;
+
+	default:
+		ret = false;
+	}
+
+	return ret;
+}
+
+
 static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 			int mode)
 {
@@ -316,11 +340,7 @@  static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 	struct file *file = lo->lo_backing_file;
 	int ret;
 
-	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE) &&
-	    !bdev_max_discard_sectors(lo->lo_device))
-		return -EOPNOTSUPP;
-
-	if (mode == 0 && !bdev_max_provision_sectors(lo->lo_device))
+	if (!validate_fallocate_mode(lo, mode))
 		return -EOPNOTSUPP;
 
 	mode |= FALLOC_FL_KEEP_SIZE;
@@ -493,7 +513,10 @@  static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_PROVISION:
-		return lo_fallocate(lo, rq, pos, 0);
+		return lo_fallocate(lo, rq, pos,
+				    (rq->cmd_flags & REQ_UNSHARE) ?
+					    FALLOC_FL_UNSHARE_RANGE :
+					    0);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e55828ddfafe..f16187ae4c4a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -430,6 +430,8 @@  enum req_flag_bits {
 	 */
 	/* for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
+	/* for REQ_OP_PROVISION: */
+	__REQ_UNSHARE,		/* unshare blocks */
 
 	__REQ_NR_BITS,		/* stops here */
 };
@@ -458,6 +460,7 @@  enum req_flag_bits {
 #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
 
 #define REQ_NOUNMAP	(__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
+#define REQ_UNSHARE	(__force blk_opf_t)(1ULL << __REQ_UNSHARE)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dcae5538f99a..0f88ccbde12f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1042,10 +1042,11 @@  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp);
 
 extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask);
+		sector_t nr_sects, gfp_t gfp_mask, unsigned int flags);
 
 #define BLKDEV_ZERO_NOUNMAP	(1 << 0)  /* do not free blocks */
 #define BLKDEV_ZERO_NOFALLBACK	(1 << 1)  /* don't write explicit zeroes */
+#define BLKDEV_PROVISION_UNSHARE_RANGE	(1 << 2)  /* unshare range on provision */
 
 extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,