From patchwork Fri Jun 17 01:17:47 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 9182107 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 196B760760 for ; Fri, 17 Jun 2016 01:22:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A3482839F for ; Fri, 17 Jun 2016 01:22:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2F88283A1; Fri, 17 Jun 2016 01:22:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 Received: from mx5-phx2.redhat.com (mx5-phx2.redhat.com [209.132.183.37]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8D3712839F for ; Fri, 17 Jun 2016 01:22:23 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx5-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u5H1Igqi031527; Thu, 16 Jun 2016 21:18:42 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id u5H1IM9Y021434 for ; Thu, 16 Jun 2016 21:18:22 -0400 Received: from mx1.redhat.com (ext-mx10.extmail.prod.ext.phx2.redhat.com [10.5.110.39]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u5H1IMww010802 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 16 Jun 2016 21:18:22 -0400 Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7EF6062655; Fri, 17 Jun 2016 01:18:21 +0000 (UTC) Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u5H1HrML023235 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 17 Jun 2016 01:17:53 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u5H1Hq1C007532 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 17 Jun 2016 01:17:53 GMT Received: from abhmp0007.oracle.com (abhmp0007.oracle.com [141.146.116.13]) by aserv0122.oracle.com (8.13.8/8.13.8) with ESMTP id u5H1HoWS005675; Fri, 17 Jun 2016 01:17:51 GMT Received: from localhost (/10.145.178.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 16 Jun 2016 18:17:49 -0700 From: "Darrick J. Wong" To: axboe@kernel.dk, darrick.wong@oracle.com Date: Thu, 16 Jun 2016 18:17:47 -0700 Message-ID: <146612626773.12764.9800994140955661542.stgit@birch.djwong.org> In-Reply-To: <146612624734.12764.4316680863289411106.stgit@birch.djwong.org> References: <146612624734.12764.4316680863289411106.stgit@birch.djwong.org> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Source-IP: userv0021.oracle.com [156.151.31.71] X-Greylist: Sender IP whitelisted by DNSRBL, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Fri, 17 Jun 2016 01:18:21 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Fri, 17 Jun 2016 01:18:21 +0000 (UTC) for IP:'141.146.126.69' DOMAIN:'aserp1040.oracle.com' HELO:'aserp1040.oracle.com' FROM:'darrick.wong@oracle.com' RCPT:'' X-RedHat-Spam-Score: -103.046 (BAYES_50, DCC_REPUT_13_19, RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, RP_MATCHES_RCVD, SPF_PASS, UNPARSEABLE_RELAY, USER_IN_WHITELIST) 141.146.126.69 aserp1040.oracle.com 141.146.126.69 aserp1040.oracle.com X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Scanned-By: MIMEDefang 2.78 on 10.5.110.39 X-loop: dm-devel@redhat.com Cc: hch@infradead.org, tytso@mit.edu, martin.petersen@oracle.com, snitzer@redhat.com, linux-api@vger.kernel.org, bfoster@redhat.com, xfs@oss.sgi.com, linux-block@vger.kernel.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org Subject: [dm-devel] [PATCH 3/3] block: implement (some of) fallocate for block devices X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Virus-Scanned: ClamAV using ClamSMTP After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. v2: Incorporate feedback from Christoph & Linus. Tentatively add a requirement that the fallocate arguments be aligned to logical block size, and put in a few XXX comments ahead of LSF discussion. v3: Forward port to 4.7. Signed-off-by: Darrick J. Wong --- fs/block_dev.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/open.c | 3 +- 2 files changed, 86 insertions(+), 1 deletion(-) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel diff --git a/fs/block_dev.c b/fs/block_dev.c index 71ccab1..a3975c4 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include "internal.h" @@ -1802,6 +1803,88 @@ static const struct address_space_operations def_blk_aops = { .is_dirty_writeback = buffer_check_dirty_writeback, }; +#define BLKDEV_FALLOC_FL_SUPPORTED \ + (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ + FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE) + +static long blkdev_fallocate(struct file *file, int mode, loff_t start, + loff_t len) +{ + struct block_device *bdev = I_BDEV(bdev_file_inode(file)); + struct request_queue *q = bdev_get_queue(bdev); + struct address_space *mapping; + loff_t end = start + len - 1; + loff_t isize; + int error; + + /* Fail if we don't recognize the flags. */ + if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED) + return -EOPNOTSUPP; + + /* Don't go off the end of the device. */ + isize = i_size_read(bdev->bd_inode); + if (start >= isize) + return -EINVAL; + if (end > isize) { + if (mode & FALLOC_FL_KEEP_SIZE) { + len = isize - start; + end = start + len - 1; + } else + return -EINVAL; + } + + /* + * Don't allow IO that isn't aligned to logical block size. + */ + if ((start | len) & (bdev_logical_block_size(bdev) - 1)) + return -EINVAL; + + /* Invalidate the page cache, including dirty pages. */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + switch (mode) { + case FALLOC_FL_ZERO_RANGE: + case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE: + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, + GFP_KERNEL, false); + if (error) + return error; + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: + /* Only punch if the device can do zeroing discard. */ + if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data) + return -EOPNOTSUPP; + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + if (error) + return error; + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE: + /* + * XXX: a well known search engine vendor interprets this + * flag (in other circumstances) to mean "I don't care if + * we can read stale contents later". Is it appropriate + * to wire this up to the non-zeroing discard? + */ + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + if (error) + return error; + break; + default: + return -EOPNOTSUPP; + } + + /* + * Invalidate again; if someone wandered in and dirtied a page, + * the caller will be given -EBUSY; + */ + return invalidate_inode_pages2_range(mapping, + start >> PAGE_SHIFT, + end >> PAGE_SHIFT); +} + const struct file_operations def_blk_fops = { .open = blkdev_open, .release = blkdev_close, @@ -1816,6 +1899,7 @@ const struct file_operations def_blk_fops = { #endif .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, + .fallocate = blkdev_fallocate, }; int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg) diff --git a/fs/open.c b/fs/open.c index 93ae3cd..b2dbda4 100644 --- a/fs/open.c +++ b/fs/open.c @@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) * Let individual file system decide if it supports preallocation * for directories or not. */ - if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) && + !S_ISBLK(inode->i_mode)) return -ENODEV; /* Check for wrap through zero too */