From patchwork Tue Mar 15 19:42:44 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 8591941
Subject: [PATCH 3/3] block:
 implement (some of) fallocate for block devices
From: "Darrick J. Wong"
To: axboe@kernel.dk, torvalds@linux-foundation.org, darrick.wong@oracle.com
Cc: bfields@fieldses.org, tytso@mit.edu, martin.petersen@oracle.com,
 linux-api@vger.kernel.org, david@fromorbit.com,
 linux-kernel@vger.kernel.org, shane.seymour@hpe.com, hch@infradead.org,
 linux-fsdevel@vger.kernel.org, jlayton@poochiereds.net,
 akpm@linux-foundation.org
Date: Tue, 15 Mar 2016 12:42:44 -0700
Message-ID: <20160315194244.30093.6483.stgit@birch.djwong.org>
In-Reply-To: <20160315194221.30093.70506.stgit@birch.djwong.org>
References: <20160315194221.30093.70506.stgit@birch.djwong.org>
User-Agent: StGit/0.17.1-dirty
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong
---
 fs/block_dev.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 ++
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..6137c6e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 struct bdev_inode {
@@ -1786,6 +1787,73 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap	generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t bs_mask, isize;
+	int error;
+
+	/* We only support zero range and punch hole. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* We haven't a primitive for "ensure space exists" right now. */
+	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+		return -EOPNOTSUPP;
+
+	/* Only punch if the device can do zeroing discard. */
+	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/* Don't allow IO that isn't aligned to logical block size */
+	bs_mask = bdev_logical_block_size(bdev) - 1;
+	if ((start | len) & bs_mask)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	error = -EINVAL;
+	if (mode & FALLOC_FL_ZERO_RANGE)
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+	else if (mode & FALLOC_FL_PUNCH_HOLE)
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_CACHE_SHIFT,
+					     end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1800,6 +1868,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */
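
For anyone who wants to poke at the mode and range rules from the
changelog without a patched kernel, here is a rough userspace model of
those checks. It is only a sketch, not the kernel code:
model_blkdev_falloc_check() and MODEL_FALLOC_FL_SUPPORTED are
hypothetical names, the device size and logical block size are passed in
as plain integers, and the "can the queue do a zeroing discard" test is
left out because it has no userspace equivalent. The sketch compares the
length against the space left on the device directly, which is one
reading of "a length that goes past the end of the device"; it also
assumes start >= 0 and len > 0, as vfs_fallocate() rejects anything else
before the handler runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <linux/falloc.h>

/* Hypothetical name; same flag set as BLKDEV_FALLOC_FL_SUPPORTED. */
#define MODEL_FALLOC_FL_SUPPORTED \
	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)

/*
 * Userspace model of the mode and range checks described in the
 * changelog.  Returns 0 and stores the (possibly clamped) length in
 * *out_len, or a negative errno.  isize is the device size in bytes and
 * bs the logical block size; both are assumptions supplied by the caller.
 */
static int model_blkdev_falloc_check(int mode, int64_t start, int64_t len,
				     int64_t isize, int64_t bs,
				     int64_t *out_len)
{
	/* The mask test below only works for a power-of-two block size. */
	assert(bs > 0 && (bs & (bs - 1)) == 0);

	/* Only zero range and punch hole (plus KEEP_SIZE) are accepted. */
	if (mode & ~MODEL_FALLOC_FL_SUPPORTED)
		return -EOPNOTSUPP;

	/* Plain allocation (no flag, or KEEP_SIZE alone) is not supported. */
	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	/* The range must start inside the device. */
	if (start >= isize)
		return -EINVAL;

	/* Past the end: clamp with KEEP_SIZE, otherwise reject. */
	if (len > isize - start) {
		if (!(mode & FALLOC_FL_KEEP_SIZE))
			return -EINVAL;
		len = isize - start;
	}

	/* Start and length must be aligned to the logical block size. */
	if ((start | len) & (bs - 1))
		return -EINVAL;

	*out_len = len;
	return 0;
}
```

On a kernel carrying this patch, the matching call from userspace would
be fallocate(2) on a file descriptor opened on the block device, for
example with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE and a start and
length that are multiples of the logical block size.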