From patchwork Wed Nov 19 07:22:35 2014
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 5334771
Date: Tue, 18 Nov 2014 23:22:35 -0800
From: Omar Sandoval
To: Christoph Hellwig
Cc: linux-btrfs@vger.kernel.org, Mel Gorman, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 0/6] btrfs: implement swap file support
Message-ID: <20141119072235.GA6541@mew>
References: <20141117154817.GA25670@infradead.org>
In-Reply-To: <20141117154817.GA25670@infradead.org>

On Mon, Nov 17, 2014 at 07:48:17AM -0800, Christoph Hellwig wrote:
> With the new iov_iter infrastructure that supports direct I/O to kernel
> pages, please get rid of the ->readpage hack first. I'm still utterly
> disappointed that this crap ever got merged.
>
That seems reasonable.
Using direct I/O circumvents the need for patches 3, 4, and 5, which were
workarounds for ->readpage being handed a swapcache page, and for patch 1,
which is a big, error-prone mess. Here's a nice little bit of insanity I
put together in that direction -- consider it a discussion point more than
a patch. It does two things:

- Uses an ITER_BVEC iov_iter to do direct_IO for swap_readpage. This makes
  swap_readpage a synchronous operation, but I think that's the best we
  can do with the existing interface. (A standalone sketch of this setup
  follows the patch below.)

- Unless I'm missing something, there don't appear to be any instances of
  ITER_BVEC | READ in the kernel, so the dio path doesn't know not to
  dirty pages it gets that way. Dave Kleikamp and Ming Lei each previously
  submitted patches that addressed this as part of adding an aio_kernel
  interface. (The NFS direct I/O implementation doesn't know how to deal
  with ITER_BVEC iterators either, so this patch actually breaks the only
  existing user of this code path, but in the interest of keeping the
  patch short, I didn't try to fix it :)

Obviously, there's more to be done if that's how you'd prefer I do this.
I'm far from being an expert in any of this, so please let me know if I'm
spewing nonsense :)

From e58c52e69a9aef07c0089f9ce552fca96d42bce9 Mon Sep 17 00:00:00 2001
From: Omar Sandoval
Date: Tue, 18 Nov 2014 22:42:10 -0800
Subject: [PATCH] swap: use direct_IO for swap_readpage

Signed-off-by: Omar Sandoval
---
 fs/direct-io.c |  8 +++++---
 mm/page_io.c   | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..e542ce4 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
 	spinlock_t bio_lock;		/* protects BIO fields below */
 	int page_errors;		/* errno from get_user_pages() */
 	int is_async;			/* is IO async ? */
+	int should_dirty;		/* should we mark read pages dirty? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
 	int io_error;			/* IO error in completion path */
 	unsigned long refcount;		/* direct_io_worker() and bios */
@@ -392,7 +393,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-	if (dio->is_async && dio->rw == READ)
+	if (dio->is_async && dio->rw == READ && dio->should_dirty)
 		bio_set_pages_dirty(bio);
 
 	if (sdio->submit_io)
@@ -463,13 +464,13 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (!uptodate)
 		dio->io_error = -EIO;
 
-	if (dio->is_async && dio->rw == READ) {
+	if (dio->is_async && dio->rw == READ && dio->should_dirty) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
 		bio_for_each_segment_all(bvec, bio, i) {
 			struct page *page = bvec->bv_page;
 
-			if (dio->rw == READ && !PageCompound(page))
+			if (dio->rw == READ && !PageCompound(page) && dio->should_dirty)
 				set_page_dirty_lock(page);
 			page_cache_release(page);
 		}
@@ -1177,6 +1178,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 
 	dio->inode = inode;
 	dio->rw = rw;
+	dio->should_dirty = !(iter->type & ITER_BVEC);
 
 	/*
 	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..b9b84b2 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -266,8 +266,8 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		struct address_space *mapping = swap_file->f_mapping;
 		struct bio_vec bv = {
 			.bv_page = page,
-			.bv_len  = PAGE_SIZE,
-			.bv_offset = 0
+			.bv_len = PAGE_SIZE,
+			.bv_offset = 0,
 		};
 		struct iov_iter from = {
 			.type = ITER_BVEC | WRITE,
@@ -283,8 +282,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 
 		set_page_writeback(page);
 		unlock_page(page);
-		ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
-						&kiocb, &from,
+		ret = mapping->a_ops->direct_IO(WRITE, &kiocb, &from,
 						kiocb.ki_pos);
 		if (ret == PAGE_SIZE) {
 			count_vm_event(PSWPOUT);
@@ -303,7 +302,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 			set_page_dirty(page);
 			ClearPageReclaim(page);
 			pr_err_ratelimited("Write error on dio swapfile (%Lu)\n",
-				page_file_offset(page));
+					   page_file_offset(page));
 		}
 		end_page_writeback(page);
 		return ret;
@@ -348,12 +347,36 @@ int swap_readpage(struct page *page)
 	}
 
 	if (sis->flags & SWP_FILE) {
+		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
+		struct bio_vec bv = {
+			.bv_page = page,
+			.bv_len = PAGE_SIZE,
+			.bv_offset = 0,
+		};
+		struct iov_iter to = {
+			.type = ITER_BVEC | READ,
+			.count = PAGE_SIZE,
+			.iov_offset = 0,
+			.nr_segs = 1,
+		};
+		to.bvec = &bv;	/* older gcc versions are broken */
+
+		init_sync_kiocb(&kiocb, swap_file);
+		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_nbytes = PAGE_SIZE;
 
-		ret = mapping->a_ops->readpage(swap_file, page);
-		if (!ret)
+		ret = mapping->a_ops->direct_IO(READ, &kiocb, &to,
+						kiocb.ki_pos);
+		if (ret == PAGE_SIZE) {
+			SetPageUptodate(page);
 			count_vm_event(PSWPIN);
+			ret = 0;
+		} else {
+			SetPageError(page); /* XXX: maybe? */
+		}
+		unlock_page(page);
 		return ret;
 	}
 
-- 
2.1.3
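
For readability, here is the new swap_readpage() path from the last hunk
pulled out of the diff as a standalone helper. This is only an
illustrative sketch, not part of the patch: the helper name is invented,
and it assumes the 3.18-era struct iov_iter layout and the
->direct_IO(rw, iocb, iter, offset) signature used above.

/*
 * Illustrative sketch only (not part of the patch): read one kernel page
 * synchronously through ->direct_IO() using an ITER_BVEC iov_iter.  The
 * helper name is invented for this example.
 */
static int swap_direct_read_page(struct file *swap_file, struct page *page)
{
	struct address_space *mapping = swap_file->f_mapping;
	struct kiocb kiocb;
	struct bio_vec bv = {
		.bv_page = page,
		.bv_len = PAGE_SIZE,
		.bv_offset = 0,
	};
	struct iov_iter to = {
		.type = ITER_BVEC | READ,
		.count = PAGE_SIZE,
		.iov_offset = 0,
		.nr_segs = 1,
	};
	int ret;

	to.bvec = &bv;	/* assigned separately; older gcc versions are broken */

	/*
	 * init_sync_kiocb() gives us a synchronous kiocb, so ->direct_IO()
	 * only returns once the read has completed or failed -- this is
	 * what makes swap_readpage() synchronous with this approach.
	 */
	init_sync_kiocb(&kiocb, swap_file);
	kiocb.ki_pos = page_file_offset(page);
	kiocb.ki_nbytes = PAGE_SIZE;

	/*
	 * Because the iov_iter is ITER_BVEC, the dio code (with the
	 * should_dirty change above) skips bio_set_pages_dirty() and
	 * set_page_dirty_lock() for this kernel page.
	 */
	ret = mapping->a_ops->direct_IO(READ, &kiocb, &to, kiocb.ki_pos);
	if (ret == PAGE_SIZE)
		return 0;
	return ret < 0 ? ret : -EIO;
}

A caller of something like this would still be responsible for
SetPageUptodate()/SetPageError() and unlock_page(), which the
swap_readpage() hunk above does inline.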