From patchwork Fri Nov 13 22:01:43 2015
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 7615331
Date: Fri, 13 Nov 2015 14:01:43 -0800
From: "Darrick J. Wong"
To: Jens Axboe, Christoph Hellwig
Cc: "Seymour, Shane M", "linux-kernel@vger.kernel.org",
 "linux-fsdevel@vger.kernel.org", "linux-api@vger.kernel.org",
 Jeff Layton, "J. Bruce Fields", "martin.petersen@oracle.com"
Subject: [PATCH v4] block: create ioctl to discard-or-zeroout a range of
 blocks
Message-ID: <20151113220143.GE2217@birch.djwong.org>

Create a new ioctl to expose the block layer's newfound ability to
issue either a zeroing discard, a WRITE SAME with a zero page, or a
regular write with the zero page.  This BLKZEROOUT2 ioctl takes
{start, length, flags} as parameters.  So far, the only flag available
is to enable the zeroing discard part -- without it, the call invokes
the old BLKZEROOUT behavior.  start and length have the same meaning as
in BLKZEROOUT.

Furthermore, because BLKZEROOUT2 issues commands directly to the
storage device, we must invalidate the page cache (as a regular
O_DIRECT write would do) to avoid returning stale cache contents at a
later time.

v3: Add extra padding for future expansion, and check the padding is
    zero.
v4: Check the start/len arguments for overflows prior to feeding the
    page cache bogus numbers (that it'll ignore anyway).

Signed-off-by: Darrick J. Wong
Reviewed-by: Shane Seymour
---
 block/ioctl.c           | 50 ++++++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/fs.h |  9 ++++++++
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 8061eba..5c092e8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -213,19 +213,41 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 }
 
 static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
-			     uint64_t len)
+			     uint64_t len, uint32_t flags)
 {
+	int ret;
+	struct address_space *mapping;
+	uint64_t end = start + len - 1;
+
+	if (flags & ~BLKZEROOUT2_DISCARD_OK)
+		return -EINVAL;
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    flags & BLKZEROOUT2_DISCARD_OK);
+	if (ret)
+		goto out;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	ret = invalidate_inode_pages2_range(mapping,
+					    start >> PAGE_CACHE_SHIFT,
+					    end >> PAGE_CACHE_SHIFT);
+out:
+	return ret;
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
@@ -353,7 +375,21 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
 			return -EFAULT;
 
-		return blk_ioctl_zeroout(bdev, range[0], range[1]);
+		return blk_ioctl_zeroout(bdev, range[0], range[1], 0);
+	}
+	case BLKZEROOUT2: {
+		struct blkzeroout2 p;
+
+		if (!(mode & FMODE_WRITE))
+			return -EBADF;
+
+		if (copy_from_user(&p, (void __user *)arg, sizeof(p)))
+			return -EFAULT;
+
+		if (p.padding || p.padding2)
+			return -EINVAL;
+
+		return blk_ioctl_zeroout(bdev, p.start, p.length, p.flags);
 	}
 
 	case HDIO_GETGEO: {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9b964a5..b811fa4 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -152,6 +152,15 @@ struct inodes_stat_t {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+struct blkzeroout2 {
+	__u64 start;
+	__u64 length;
+	__u32 flags;
+	__u32 padding;
+	__u64 padding2;
+};
+#define BLKZEROOUT2_DISCARD_OK	1
+#define BLKZEROOUT2	_IOR(0x12, 127, struct blkzeroout2)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
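
For anyone wanting to try the interface from userspace, here is a rough
sketch of a caller. It is not part of the patch. The struct and the two
macros are copied by hand from the fs.h hunk above, on the assumption
that no installed header provides them. zeroout2_check() and zeroout2()
are hypothetical helper names: the first mirrors the argument checks in
blk_ioctl_zeroout() so the overflow cases from the v4 note can be tried
without a block device, and the second just fills the struct and issues
the ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* ASSUMPTION: copied from the fs.h hunk in this patch, not from a header. */
struct blkzeroout2 {
	uint64_t start;
	uint64_t length;
	uint32_t flags;
	uint32_t padding;
	uint64_t padding2;
};
#define BLKZEROOUT2_DISCARD_OK	1
#define BLKZEROOUT2	_IOR(0x12, 127, struct blkzeroout2)

/* Two u64s, two u32s and a trailing u64: 32 bytes with no implicit holes. */
static_assert(sizeof(struct blkzeroout2) == 32, "unexpected struct layout");

/*
 * Hypothetical helper: the same checks, in the same order, as
 * blk_ioctl_zeroout() in the patch.  dev_size is the device size in bytes.
 * Returns 0 if the kernel side would accept the range, -EINVAL otherwise.
 */
int zeroout2_check(uint64_t start, uint64_t len, uint32_t flags,
		   uint64_t dev_size)
{
	/* Wraps around when len == 0 or when start + len overflows. */
	uint64_t end = start + len - 1;

	if (flags & ~(uint32_t)BLKZEROOUT2_DISCARD_OK)
		return -EINVAL;
	if (start & 511)
		return -EINVAL;
	if (len & 511)
		return -EINVAL;
	if (end >= dev_size)
		return -EINVAL;
	if (end < start)	/* catches the wrapped cases */
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical wrapper: fd must refer to a block device opened for
 * writing.  Both padding fields are left zero by the designated
 * initializer, which the kernel side requires.  Returns 0 or -errno.
 */
int zeroout2(int fd, uint64_t start, uint64_t len, uint32_t flags)
{
	struct blkzeroout2 p = {
		.start = start,
		.length = len,
		.flags = flags,
	};

	return ioctl(fd, BLKZEROOUT2, &p) ? -errno : 0;
}
```

A caller would open the device with O_WRONLY or O_RDWR and then call,
say, zeroout2(fd, 0, 1 << 20, BLKZEROOUT2_DISCARD_OK) to zero the first
MiB while allowing a zeroing discard. On a kernel without this patch the
request should simply fail.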