Message ID | 20160315194244.30093.6483.stgit@birch.djwong.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Mar 15, 2016 at 12:42:44PM -0700, Darrick J. Wong wrote: > #include <linux/cleancache.h> > #include <linux/dax.h> > #include <asm/uaccess.h> > +#include <linux/falloc.h> Maybe keep this before asm/uaccess.h > +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len) should be marked static. > + /* We haven't a primitive for "ensure space exists" right now. */ > + if (!(mode & ~FALLOC_FL_KEEP_SIZE)) > + return -EOPNOTSUPP; I don't really understand the comment. But I think you'd be much better off with having blkdev_fallocate as just a tiny wrapper that has a switch for the supported modes, e.g. switch (mode) { case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: return blkdev_punch_hole(); case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:: return blkdev_zero_range(); default: return -EOPNOTSUPP; } > +EXPORT_SYMBOL_GPL(blkdev_fallocate); and no need to export it either.. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21, 2016 at 08:38:27AM -0700, Christoph Hellwig wrote: > On Tue, Mar 15, 2016 at 12:42:44PM -0700, Darrick J. Wong wrote: > > #include <linux/cleancache.h> > > #include <linux/dax.h> > > #include <asm/uaccess.h> > > +#include <linux/falloc.h> > > Maybe keep this before asm/uaccess.h > > > +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len) > > should be marked static. Ok (to both). > > > + /* We haven't a primitive for "ensure space exists" right now. */ > > + if (!(mode & ~FALLOC_FL_KEEP_SIZE)) > > + return -EOPNOTSUPP; > > I don't really understand the comment. But I think you'd be much I don't know of a block device primitive that corresponds to the "default" mode of fallocate, as documented in the manpage (i.e. mode == 0). I agree that the whole thing could be simplified in the manner you point out below. > better off with having blkdev_fallocate as just a tiny wrapper that has > a switch for the supported modes, e.g. > > switch (mode) { > case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: > return blkdev_punch_hole(); > case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:: > return blkdev_zero_range(); > default: > return -EOPNOTSUPP; > } > > > +EXPORT_SYMBOL_GPL(blkdev_fallocate); > > and no need to export it either.. Ok. --D > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21, 2016 at 10:52:35AM -0700, Darrick J. Wong wrote: > > I don't really understand the comment. But I think you'd be much > > I don't know of a block device primitive that corresponds to the "default" > mode of fallocate, as documented in the manpage (i.e. mode == 0). I agree > that the whole thing could be simplified in the manner you point out below. SCSI allows 'anchoring' blocks, which is pretty similar to a normal fallocate, but we don't support anchoring blocks in Linux yet. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: Christoph> On Mon, Mar 21, 2016 at 10:52:35AM -0700, Darrick J. Wong wrote: >> > I don't really understand the comment. But I think you'd be much >> >> I don't know of a block device primitive that corresponds to the >> "default" mode of fallocate, as documented in the manpage (i.e. mode >> == 0). I agree that the whole thing could be simplified in the >> manner you point out below. Christoph> SCSI allows 'anchoring' blocks, which is pretty similar to a Christoph> normal fallocate, but we don't support anchoring blocks in Christoph> Linux yet. Chicken and egg problem. I actually did tinker with this for a different project a while back and can easily revive it.
On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > After much discussion, it seems that the fallocate feature flag > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > whitelisted for zeroing SCSI UNMAP. Punch still requires that > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > device will be clamped to the device size if KEEP_SIZE is set; or will > return -EINVAL if not. Both start and length must be aligned to the > device's logical block size. > > Since the semantics of fallocate are fairly well established already, > wire up the two pieces. The other fallocate variants (collapse range, > insert range, and allocate blocks) are not supported. I'd like to see fallocate (block allocation) extend down to DM thinp. This more traditional use of fallocate would be useful for ensuring ENOSPC won't occur -- especially important if the FS has committed space in response to fallocate. As of now fallocate doesn't inform DM thinp at all. Curious why you decided not to wire it up? But I'm not sure what "it" (the "allocate blocks" variant) even is given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"... It would require a new block interface to pass the fallocate extent down. But it seems bizarre to implement "some of" fallocate but not the most widely used case for fallocate. Mike -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote: > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong > <darrick.wong@oracle.com> wrote: > > After much discussion, it seems that the fallocate feature flag > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > > whitelisted for zeroing SCSI UNMAP. Punch still requires that > > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > > device will be clamped to the device size if KEEP_SIZE is set; or will > > return -EINVAL if not. Both start and length must be aligned to the > > device's logical block size. > > > > Since the semantics of fallocate are fairly well established already, > > wire up the two pieces. The other fallocate variants (collapse range, > > insert range, and allocate blocks) are not supported. > > I'd like to see fallocate (block allocation) extend down to DM thinp. > This more traditional use of fallocate would be useful for ensuring > ENOSPC won't occur -- especially important if the FS has committed > space in response to fallocate. As of now fallocate doesn't inform DM > thinp at all. Curious why you decided not to wire it up? I don't know what to wire it up to. :) I didn't find any blkdev_* function that looked encouraging, though I haven't dug too deeply into bfoster's "prototype a block reservation allocation model" patchset yet. At a high level I'd guess that would be a reasonable piece to connect to? It looks like the piece I want is blk_provision_space(). > But I'm not sure what "it" (the "allocate blocks" variant) even is > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"... The default behavior of fallocate is to allocate blocks, which means that one invokes it by not passing any mode flags (except possibly KEEP_SIZE). > It would require a new block interface to pass the fallocate extent > down. But it seems bizarre to implement "some of" fallocate but not > the most widely used case for fallocate. Agreed. I'd like to get the existing functionality wired up sooner than later, and plumbing "allocate blocks" down to thinp can be done as a followup. (Or stall long enough that it becomes one patchset.) --D > > Mike -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21 2016 at 3:11pm -0400, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote: > > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong > > <darrick.wong@oracle.com> wrote: > > > After much discussion, it seems that the fallocate feature flag > > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > > > whitelisted for zeroing SCSI UNMAP. Punch still requires that > > > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > > > device will be clamped to the device size if KEEP_SIZE is set; or will > > > return -EINVAL if not. Both start and length must be aligned to the > > > device's logical block size. > > > > > > Since the semantics of fallocate are fairly well established already, > > > wire up the two pieces. The other fallocate variants (collapse range, > > > insert range, and allocate blocks) are not supported. > > > > I'd like to see fallocate (block allocation) extend down to DM thinp. > > This more traditional use of fallocate would be useful for ensuring > > ENOSPC won't occur -- especially important if the FS has committed > > space in response to fallocate. As of now fallocate doesn't inform DM > > thinp at all. Curious why you decided not to wire it up? > > I don't know what to wire it up to. :) Fair enough. Yes something needs to be invented. > I didn't find any blkdev_* function that looked encouraging, though I > haven't dug too deeply into bfoster's "prototype a block reservation > allocation model" patchset yet. At a high level I'd guess that would > be a reasonable piece to connect to? It looks like the piece I want > is blk_provision_space(). Yes, something like that. > > But I'm not sure what "it" (the "allocate blocks" variant) even is > > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"... > > The default behavior of fallocate is to allocate blocks, which means > that one invokes it by not passing any mode flags (except possibly > KEEP_SIZE). OK. > > It would require a new block interface to pass the fallocate extent > > down. But it seems bizarre to implement "some of" fallocate but not > > the most widely used case for fallocate. > > Agreed. I'd like to get the existing functionality wired up sooner than > later, and plumbing "allocate blocks" down to thinp can be done as a > followup. > > (Or stall long enough that it becomes one patchset.) Sure, sounds good. Glad we're in agreement. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21, 2016 at 03:22:29PM -0400, Mike Snitzer wrote: > On Mon, Mar 21 2016 at 3:11pm -0400, > Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote: > > > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong > > > <darrick.wong@oracle.com> wrote: > > > > After much discussion, it seems that the fallocate feature flag > > > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > > > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > > > > whitelisted for zeroing SCSI UNMAP. Punch still requires that > > > > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > > > > device will be clamped to the device size if KEEP_SIZE is set; or will > > > > return -EINVAL if not. Both start and length must be aligned to the > > > > device's logical block size. > > > > > > > > Since the semantics of fallocate are fairly well established already, > > > > wire up the two pieces. The other fallocate variants (collapse range, > > > > insert range, and allocate blocks) are not supported. > > > > > > I'd like to see fallocate (block allocation) extend down to DM thinp. > > > This more traditional use of fallocate would be useful for ensuring > > > ENOSPC won't occur -- especially important if the FS has committed > > > space in response to fallocate. As of now fallocate doesn't inform DM > > > thinp at all. Curious why you decided not to wire it up? > > > > I don't know what to wire it up to. :) > > Fair enough. Yes something needs to be invented. > > > I didn't find any blkdev_* function that looked encouraging, though I > > haven't dug too deeply into bfoster's "prototype a block reservation > > allocation model" patchset yet. At a high level I'd guess that would > > be a reasonable piece to connect to? It looks like the piece I want > > is blk_provision_space(). > > Yes, something like that. > Just a note that the caveat/hack with the provision call in there is that it returns an allocated block count. That was necessary to help maintain the local reservation accounting. I'd love to find a way to handle that more cleanly or take advantage of generic fallocate, but I don't have a clear idea on how to do that at the moment. (I do wonder whether an internal-only set of falloc "reserve" flags would fly...). Anyways, that's a separate topic. Feel free to steal any of that dm-thin provision code if it is useful for generic fallocate(). :) Brian > > > But I'm not sure what "it" (the "allocate blocks" variant) even is > > > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"... > > > > The default behavior of fallocate is to allocate blocks, which means > > that one invokes it by not passing any mode flags (except possibly > > KEEP_SIZE). > > OK. > > > > It would require a new block interface to pass the fallocate extent > > > down. But it seems bizarre to implement "some of" fallocate but not > > > the most widely used case for fallocate. > > > > Agreed. I'd like to get the existing functionality wired up sooner than > > later, and plumbing "allocate blocks" down to thinp can be done as a > > followup. > > > > (Or stall long enough that it becomes one patchset.) > > Sure, sounds good. Glad we're in agreement. > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/block_dev.c b/fs/block_dev.c index 826b164..6137c6e 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -30,6 +30,7 @@ #include <linux/cleancache.h> #include <linux/dax.h> #include <asm/uaccess.h> +#include <linux/falloc.h> #include "internal.h" struct bdev_inode { @@ -1786,6 +1787,73 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma) #define blkdev_mmap generic_file_mmap #endif +#define BLKDEV_FALLOC_FL_SUPPORTED \ + (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ + FALLOC_FL_ZERO_RANGE) + +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len) +{ + struct block_device *bdev = I_BDEV(bdev_file_inode(file)); + struct request_queue *q = bdev_get_queue(bdev); + struct address_space *mapping; + loff_t end = start + len - 1; + loff_t bs_mask, isize; + int error; + + /* We only support zero range and punch hole. */ + if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED) + return -EOPNOTSUPP; + + /* We haven't a primitive for "ensure space exists" right now. */ + if (!(mode & ~FALLOC_FL_KEEP_SIZE)) + return -EOPNOTSUPP; + + /* Only punch if the device can do zeroing discard. */ + if ((mode & FALLOC_FL_PUNCH_HOLE) && + (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)) + return -EOPNOTSUPP; + + /* Don't go off the end of the device */ + isize = i_size_read(bdev->bd_inode); + if (start >= isize) + return -EINVAL; + if (end > isize) { + if (mode & FALLOC_FL_KEEP_SIZE) { + len = isize - start; + end = start + len - 1; + } else + return -EINVAL; + } + + /* Don't allow IO that isn't aligned to logical block size */ + bs_mask = bdev_logical_block_size(bdev) - 1; + if ((start | len) & bs_mask) + return -EINVAL; + + /* Invalidate the page cache, including dirty pages. */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + error = -EINVAL; + if (mode & FALLOC_FL_ZERO_RANGE) + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, + GFP_KERNEL, false); + else if (mode & FALLOC_FL_PUNCH_HOLE) + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + if (error) + return error; + + /* + * Invalidate again; if someone wandered in and dirtied a page, + * the caller will be given -EBUSY; + */ + return invalidate_inode_pages2_range(mapping, + start >> PAGE_CACHE_SHIFT, + end >> PAGE_CACHE_SHIFT); +} +EXPORT_SYMBOL_GPL(blkdev_fallocate); + const struct file_operations def_blk_fops = { .open = blkdev_open, .release = blkdev_close, @@ -1800,6 +1868,7 @@ const struct file_operations def_blk_fops = { #endif .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, + .fallocate = blkdev_fallocate, }; int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg) diff --git a/fs/open.c b/fs/open.c index 55bdc75..4f99adc 100644 --- a/fs/open.c +++ b/fs/open.c @@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) * Let individual file system decide if it supports preallocation * for directories or not. */ - if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) && + !S_ISBLK(inode->i_mode)) return -ENODEV; /* Check for wrap through zero too */
After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/block_dev.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/open.c | 3 ++ 2 files changed, 71 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html