
[1/3] block: invalidate the page cache when issuing BLKZEROOUT.

Message ID: <146612625412.12764.6647932282740152837.stgit@birch.djwong.org>
State New, archived

Commit Message

Darrick J. Wong June 17, 2016, 1:17 a.m. UTC
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.

v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/ioctl.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
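
The commit message describes a userspace-visible problem, sketched below: a program zeroes a byte range with BLKZEROOUT and then reads it back with an ordinary buffered pread(). That read is served from the block device's page cache, so without the invalidation added by this patch it could return stale cached data instead of zeroes. The helper names (buffer_is_zero, zeroout_and_verify) are hypothetical. Only BLKZEROOUT and its uint64_t[2] {start, length} byte-range argument come from the kernel interface. This is an illustration, not a test from the patch series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKZEROOUT */

/* Hypothetical helper: return 1 if buf[0..len) is all zero bytes, else 0. */
static int buffer_is_zero(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/*
 * Hypothetical helper: zero [start, start + len) on a block device opened
 * for writing (the ioctl returns EBADF otherwise), then read the range back
 * through the page cache.  Returns 1 if the readback is all zeroes, 0 if it
 * still shows old data, or -errno on failure.
 */
static int zeroout_and_verify(int fd, uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };	/* byte offset, byte length */
	unsigned char *buf;
	ssize_t n;
	int ret;

	if (ioctl(fd, BLKZEROOUT, range) < 0)
		return -errno;

	buf = malloc(len);
	if (!buf)
		return -ENOMEM;
	n = pread(fd, buf, len, (off_t)start);	/* buffered read */
	if (n < 0)
		ret = -errno;
	else
		ret = (uint64_t)n == len && buffer_is_zero(buf, len);
	free(buf);
	return ret;
}
```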

Comments

Bart Van Assche June 20, 2016, 12:35 p.m. UTC
On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> Split the page invalidation and the new ioctl into separate patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/ioctl.c |   29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index ed2397f..d001f52 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>  		unsigned long arg)
>  {
>  	uint64_t range[2];
> -	uint64_t start, len;
> +	struct address_space *mapping;
> +	uint64_t start, end, len;
> +	int ret;
>
>  	if (!(mode & FMODE_WRITE))
>  		return -EBADF;
> @@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>
>  	start = range[0];
>  	len = range[1];
> +	end = start + len - 1;
>
>  	if (start & 511)
>  		return -EINVAL;
>  	if (len & 511)
>  		return -EINVAL;
> -	start >>= 9;
> -	len >>= 9;
> -
> -	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> +	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> +		return -EINVAL;
> +	if (end < start)
>  		return -EINVAL;
>
> -	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> +	/* Invalidate the page cache, including dirty pages */
> +	mapping = bdev->bd_inode->i_mapping;
> +	truncate_inode_pages_range(mapping, start, end);
> +
> +	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> +				    false);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Invalidate again; if someone wandered in and dirtied a page,
> +	 * the caller will be given -EBUSY.
> +	 */
> +	return invalidate_inode_pages2_range(mapping,
> +					     start >> PAGE_SHIFT,
> +					     end >> PAGE_SHIFT);
>  }

Hello Darrick,

Maybe this has already been discussed, but anyway: in the POSIX spec 
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I 
found the following: "This volume of POSIX.1-2008 does not specify 
behavior of concurrent writes to a file from multiple processes. 
Applications should use some form of concurrency control."

Do we really need the invalidate_inode_pages2_range() call?

Thanks,

Bart.

Darrick J. Wong June 28, 2016, 7:13 p.m. UTC
On Mon, Jun 20, 2016 at 02:35:29PM +0200, Bart Van Assche wrote:
> On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> [...]
> 
> Hello Darrick,
> 
> Maybe this has already been discussed, but anyway: in the POSIX spec
> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I
> found the following: "This volume of POSIX.1-2008 does not specify behavior
> of concurrent writes to a file from multiple processes. Applications should
> use some form of concurrency control."
> 
> Do we really need the invalidate_inode_pages2_range() call?

It's not strictly necessary.  I like the idea of having the kernel bonking
userspace when they don't coordinate and collide, but we could just jump
out after the blkdev_*() calls and let userspace fend for themselves. :)

--D

> 
> Thanks,
> 
> Bart.
> 
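
In the patch as posted, the comment on the final invalidate_inode_pages2_range() call says BLKZEROOUT can fail with -EBUSY when another writer dirties a page in the range during the zero-out. A caller that wants to recover from that might simply retry. The helper below does this. blkzeroout_retry and its max_tries parameter are illustrative names, not part of any existing API. If the second invalidation were dropped, as Darrick suggests, the ioctl would no longer return -EBUSY from that path, and the loop would normally run once.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKZEROOUT */

/*
 * Hypothetical helper: issue BLKZEROOUT on [start, start + len) and retry
 * while the kernel reports EBUSY, up to max_tries attempts in total.
 * Returns 0 on success or -errno from the last attempt.
 */
static int blkzeroout_retry(int fd, uint64_t start, uint64_t len, int max_tries)
{
	uint64_t range[2] = { start, len };
	int err;

	assert(max_tries > 0);
	do {
		err = ioctl(fd, BLKZEROOUT, range) < 0 ? -errno : 0;
	} while (err == -EBUSY && --max_tries > 0);
	return err;
}
```

A retry re-runs the whole ioctl. The first truncate_inode_pages_range() therefore discards the racing writer's dirty page again, and the range is zeroed once more. Whether that outcome is acceptable is the application's call.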

Martin K. Petersen June 29, 2016, 4:57 a.m. UTC
>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Invalidate the page cache (as a regular O_DIRECT write would
Darrick> do) to avoid returning stale cache contents at a later time.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Patch

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d001f52 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,9 @@  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
+	int ret;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -235,18 +237,33 @@  static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
+	if (ret)
+		return ret;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)