From patchwork Wed Oct 31 14:01:55 2018
X-Patchwork-Submitter: Brian Foster <bfoster@redhat.com>
X-Patchwork-Id: 10662749
From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 2/2] xfs: don't preempt writeback sequence on single page wb error
Date: Wed, 31 Oct 2018 10:01:55 -0400
Message-Id: <20181031140155.17996-3-bfoster@redhat.com>
In-Reply-To: <20181031140155.17996-1-bfoster@redhat.com>
References: <20181031140155.17996-1-bfoster@redhat.com>
X-Mailing-List: linux-xfs@vger.kernel.org

xfs_do_writepage() currently returns errors directly regardless of
whether it is called via ->writepages() or ->writepage(). In the case
of ->writepages(), an xfs_do_writepage() error return breaks the
current writeback sequence in write_cache_pages(). This means that an
integrity writeback (i.e., sync), for example, returns before all
associated pages have been processed. This can be problematic in cases
like unmount. If the writeback doesn't process all delalloc pages
before unmounting, we end up reclaiming inodes with non-zero delalloc
block counts. In turn, this breaks block accounting and leaves the fs
inconsistent.

XFS explicitly discards delalloc blocks on such writepage failures to
avoid this problem. This isn't terribly useful if we allow an
integrity writeback to complete (and thus a filesystem to unmount)
without addressing the entire set of dirty pages on an inode.
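To make the failure mode concrete, here is a minimal userspace C
sketch of the abort-on-first-error pattern that write_cache_pages()
follows. sweep_pages() and page_cb() are hypothetical stand-ins for
write_cache_pages() and xfs_do_writepage(); the real functions deal
with mappings, writeback control and page locks, none of which is
modeled here.

#include <stdio.h>

#define NR_PAGES	8

/*
 * Hypothetical stand-in for the per-page callback (xfs_do_writepage()
 * in the real code); pages 2 and 5 fail persistently.
 */
static int
page_cb(int page)
{
	if (page == 2 || page == 5) {
		fprintf(stderr, "page %d: write error\n", page);
		return -5;	/* -EIO */
	}
	printf("page %d: cleaned\n", page);
	return 0;
}

/*
 * Models the write_cache_pages() loop: the first nonzero return from
 * the callback terminates the sweep, so pages 3, 4, 6 and 7 above
 * stay dirty even when the caller asked for integrity writeback.
 */
static int
sweep_pages(void)
{
	int page, ret;

	for (page = 0; page < NR_PAGES; page++) {
		ret = page_cb(page);
		if (ret)
			return ret;	/* sweep preempted here */
	}
	return 0;
}

int
main(void)
{
	return sweep_pages() ? 1 : 0;
}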
Therefore, change ->writepage[s]() to track high-level error state in
the xfs_writepage_ctx structure and return it from the higher-level
operation callout rather than from xfs_do_writepage(). This ensures
that write_cache_pages() does not exit prematurely when called via
->writepages(), but both ->writepage() and ->writepages() still
ultimately return an error for the higher-level operation.

This patch introduces a subtle change in the behavior of background
writeback in the event of persistent errors. Currently, returning an
error preempts the background writeback. Writeback eventually comes
around again and, in practice, repeats the process for a few more
pages before it once again fails. This repeats over and over until the
entire set of dirty pages is cleaned. This behavior results in a
somewhat slower stream of "page discard" errors in the system log and
means that many repeated fsync calls may be required before the entire
data set is processed and the mapping error is consumed. With this
change in place, background writeback executes on as many pages as
necessary, as if each page writeback were successful. The pages are
cleaned immediately, and significantly more page discard errors can be
observed at once.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3feae3691467..438cfc66a40e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -32,6 +32,7 @@ struct xfs_writepage_ctx {
 	unsigned int		io_type;
 	unsigned int		cow_seq;
 	struct xfs_ioend	*ioend;
+	int			error;
 };
 
 struct block_device *
@@ -798,7 +799,9 @@ xfs_writepage_map(
 		end_page_writeback(page);
 done:
 	mapping_set_error(page->mapping, error);
-	return error;
+	if (!wpc->error)
+		wpc->error = error;
+	return 0;
 }
 
 /*
@@ -929,8 +932,8 @@ xfs_vm_writepage(
 
 	ret = xfs_do_writepage(page, wbc, &wpc);
 	if (wpc.ioend)
-		ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
-	return ret;
+		ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error);
+	return ret ? ret : wpc.error;
 }
 
 STATIC int
@@ -946,8 +949,8 @@ xfs_vm_writepages(
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
 	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
 	if (wpc.ioend)
-		ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
-	return ret;
+		ret = xfs_submit_ioend(wbc, wpc.ioend, wpc.error);
+	return ret ? ret : wpc.error;
 }
 
 STATIC int
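For comparison with the sketch above, the same userspace model can be
reworked along the lines this diff takes: stash the first error in a
context structure, let the sweep visit every page, and surface the
saved error once from the top level. Again, struct sweep_ctx,
page_cb() and sweep_pages() are hypothetical; only the shape (record
the first error, return 0 from the per-page callback, return
"ret ? ret : ctx.error" at the end) mirrors the patch.

#include <stdio.h>

#define NR_PAGES	8

/* Models the error field this patch adds to struct xfs_writepage_ctx. */
struct sweep_ctx {
	int	error;
};

static int
page_cb(struct sweep_ctx *ctx, int page)
{
	int error = (page == 2 || page == 5) ? -5 /* -EIO */ : 0;

	if (error) {
		fprintf(stderr, "page %d: write error (discarded)\n", page);
		if (!ctx->error)	/* remember only the first error */
			ctx->error = error;
		return 0;		/* don't preempt the sweep */
	}
	printf("page %d: cleaned\n", page);
	return 0;
}

static int
sweep_pages(void)
{
	struct sweep_ctx ctx = { 0 };
	int page, ret = 0;

	/* Every page is visited; failures no longer break the loop. */
	for (page = 0; page < NR_PAGES; page++)
		ret = page_cb(&ctx, page);

	/* Report the first recorded failure once the sweep completes. */
	return ret ? ret : ctx.error;
}

int
main(void)
{
	return sweep_pages() ? 1 : 0;
}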