From patchwork Wed May 30 09:59:57 2018
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 10438203
From: Christoph Hellwig
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 02/18] iomap: add initial support for writes without buffer heads
Date: Wed, 30 May 2018 11:59:57 +0200
Message-Id: <20180530100013.31358-3-hch@lst.de>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180530100013.31358-1-hch@lst.de>
References: <20180530100013.31358-1-hch@lst.de>
For now this is limited to blocksize == PAGE_SIZE, where we can simply
read in the full page in write_begin and just set the whole page dirty
after copying data into it.  This code is enabled by default, and XFS
will now be fed pages without buffer heads in ->writepage and
->writepages.

If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap, the
old path is still used; this both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
---
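As an illustration only (not part of the patch): the standalone
userspace program below mirrors the alignment arithmetic that
__iomap_write_begin() uses to decide whether the page must be read
synchronously before data is copied into it.  It assumes
blocksize == PAGE_SIZE == 4096, the only case this series supports;
the need_read() helper and the sample offsets are made up for the
demonstration.

/*
 * Illustration only: mirrors the __iomap_write_begin() decision for
 * the blocksize == PAGE_SIZE case and reports whether a synchronous
 * read of the page is needed before the copy.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE       4096ULL

static bool need_read(unsigned long long pos, unsigned len, bool page_uptodate)
{
        unsigned long long block_size = PAGE_SIZE;      /* blocksize == PAGE_SIZE only */
        unsigned long long block_start = pos & ~(block_size - 1);
        unsigned long long block_end = (pos + len + block_size - 1) & ~(block_size - 1);
        unsigned poff = block_start & (PAGE_SIZE - 1);
        unsigned plen = (block_end - block_start < PAGE_SIZE - poff) ?
                        (unsigned)(block_end - block_start) :
                        (unsigned)(PAGE_SIZE - poff);
        unsigned from = pos & (PAGE_SIZE - 1), to = from + len;

        if (page_uptodate)
                return false;           /* page content already valid */
        if (from <= poff && to >= poff + plen)
                return false;           /* write overwrites the whole block */
        return true;                    /* partial write: read-modify-write */
}

int main(void)
{
        /* full page overwrite: no read needed (prints 0) */
        printf("overwrite whole page:  %d\n", need_read(8192, 4096, false));
        /* small write into the middle of a page we have not read yet (prints 1) */
        printf("partial, not uptodate: %d\n", need_read(8192 + 1000, 512, false));
        /* same write, but the page is already uptodate (prints 0) */
        printf("partial, uptodate:     %d\n", need_read(8192 + 1000, 512, true));
        return 0;
}

A write that covers the whole block, or lands in a page that is
already uptodate, skips the read entirely; any other partial write
falls back to iomap_read_page_sync() so data outside the copied range
is not lost.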
 fs/iomap.c            | 128 ++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iomap.c    |   6 +-
 include/linux/iomap.h |   2 +
 3 files changed, 123 insertions(+), 13 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 5e5a266e3325..0c9d9be59184 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -316,6 +316,48 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 	truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static int
+iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
+		unsigned poff, unsigned plen, unsigned from, unsigned to,
+		struct iomap *iomap)
+{
+	struct bio_vec bvec;
+	struct bio bio;
+
+	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
+		zero_user_segments(page, poff, from, to, poff + plen);
+		return 0;
+	}
+
+	bio_init(&bio, &bvec, 1);
+	bio.bi_opf = REQ_OP_READ;
+	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
+	bio_set_dev(&bio, iomap->bdev);
+	__bio_add_page(&bio, page, plen, poff);
+	return submit_bio_wait(&bio);
+}
+
+static int
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		struct page *page, struct iomap *iomap)
+{
+	loff_t block_size = i_blocksize(inode);
+	loff_t block_start = pos & ~(block_size - 1);
+	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
+	unsigned poff = block_start & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
+	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
+
+	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+
+	if (PageUptodate(page))
+		return 0;
+	if (from <= poff && to >= poff + plen)
+		return 0;
+	return iomap_read_page_sync(inode, block_start, page,
+			poff, plen, from, to, iomap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -333,7 +375,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
-	status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	else
+		status = __iomap_write_begin(inode, pos, len, page, iomap);
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
@@ -346,14 +391,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	return status;
 }
 
+int
+iomap_set_page_dirty(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	int newly_dirty;
+
+	if (unlikely(!mapping))
+		return !TestSetPageDirty(page);
+
+	/*
+	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * synchronized with per-memcg dirty page counters.
+	 */
+	lock_page_memcg(page);
+	newly_dirty = !TestSetPageDirty(page);
+	if (newly_dirty)
+		__set_page_dirty(page, mapping, 0);
+	unlock_page_memcg(page);
+
+	if (newly_dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	return newly_dirty;
+}
+EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
+
+static int
+__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		unsigned copied, struct page *page, struct iomap *iomap)
+{
+	flush_dcache_page(page);
+
+	/*
+	 * The blocks that were entirely written will now be uptodate, so we
+	 * don't have to worry about a readpage reading them and overwriting a
+	 * partial write.  However if we have encountered a short write and only
+	 * partially written into a block, it will not be marked uptodate, so a
+	 * readpage might come in and destroy our partial write.
+	 *
+	 * Do the simplest thing, and just treat any short write to a non
+	 * uptodate page as a zero-length write, and force the caller to redo
+	 * the whole thing.
+	 */
+	if (unlikely(copied < len && !PageUptodate(page))) {
+		copied = 0;
+	} else {
+		SetPageUptodate(page);
+		iomap_set_page_dirty(page);
+	}
+	return __generic_write_end(inode, pos, copied, page);
+}
+
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
-	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-			copied, page, NULL);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+				copied, page, NULL);
+	else
+		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
@@ -408,7 +508,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -502,7 +603,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -554,7 +655,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -640,11 +741,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct page *page = data;
 	int ret;
 
-	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
-	if (ret)
-		return ret;
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+		if (ret)
+			return ret;
+		block_commit_write(page, 0, length);
+	} else {
+		WARN_ON_ONCE(!PageUptodate(page));
+		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+	}
 
-	block_commit_write(page, 0, length);
 	return length;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..da6d1995e460 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
	 */
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
@@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
 
 out_finish:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7300d30ca495..4d3d9d0cd69f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,6 +30,7 @@ struct vm_fault;
  */
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
+#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
@@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
+int iomap_set_page_dirty(struct page *page);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,