From patchwork Wed Mar 15 05:07:43 2017
X-Patchwork-Submitter: Theodore Ts'o
X-Patchwork-Id: 9624833
From: Theodore Ts'o
To: linux-fsdevel@vger.kernel.org
Cc: Theodore Ts'o, Jan Kara, Michal Hocko, linux-mm@kvack.org
Subject: [RFC PATCH] mm: retry writepages() on ENOMEM when doing an data integrity writeback
Date: Wed, 15 Mar 2017 01:07:43 -0400
Message-Id: <20170315050743.5539-1-tytso@mit.edu>
In-Reply-To: <20170309090449.GD15874@quack2.suse.cz>
References: <20170309090449.GD15874@quack2.suse.cz>

Currently, a file system's writepages() function must not fail with
ENOMEM, since if it does, buffered data can be lost. This is because
on a data integrity writeback writepages() gets called only once; if
it returns ENOMEM and you're lucky, the error will get reflected back
to the userspace process calling fsync(), at which point the
application may or may not be properly checking error codes. If you
aren't lucky, the user is unmounting the file system, and the dirty
pages will simply be lost.

For this reason, file system code generally will use GFP_NOFS, and in
some cases will retry the allocation in a loop, on the theory that
"kernel livelocks are temporary; data loss is forever".
Unfortunately, this can indeed cause livelocks, since inside the
writepages() call the file system is holding various mutexes, and
these mutexes may prevent the OOM killer from killing its targeted
victim if the victim is also holding on to those mutexes.

A better solution would be to allow writepages() to call the memory
allocator with flags that give greater latitude to the allocator to
fail, and then release its locks and return ENOMEM. In the case of
background writeback, the writes can be retried at a later time. In
the case of data-integrity writeback, we retry after waiting a brief
amount of time.

Signed-off-by: Theodore Ts'o
---

As we had discussed in an e-mail thread last week, I'm interested in
allowing ext4_writepages() to return ENOMEM without causing dirty
pages from buffered writes to get lost. It looks like doing so should
be fairly straightforward. What do folks think?

 mm/page-writeback.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 290e8b7d3181..8666d3f3c57a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2352,10 +2352,16 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	if (wbc->nr_to_write <= 0)
 		return 0;
-	if (mapping->a_ops->writepages)
-		ret = mapping->a_ops->writepages(mapping, wbc);
-	else
-		ret = generic_writepages(mapping, wbc);
+	while (1) {
+		if (mapping->a_ops->writepages)
+			ret = mapping->a_ops->writepages(mapping, wbc);
+		else
+			ret = generic_writepages(mapping, wbc);
+		if ((ret != -ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL))
+			break;
+		cond_resched();
+		congestion_wait(BLK_RW_ASYNC, HZ/50);
+	}
 	return ret;
 }
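
For readers who want to poke at the retry policy outside the kernel,
here is a minimal userspace sketch of the loop. It is an illustration
under stated assumptions, not the kernel code: struct wbc_sketch,
struct fake_mapping, fake_writepages() and do_writepages_sketch() are
hypothetical stand-ins, and the cond_resched()/congestion_wait() back-off
is reduced to a comment. The sketch compares against -ENOMEM because
kernel functions report errors as negative errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's writeback types. */
enum sync_mode_sketch { WB_SYNC_NONE, WB_SYNC_ALL };

struct wbc_sketch {
	enum sync_mode_sketch sync_mode;
	long nr_to_write;
};

/* Hypothetical mapping whose writepages fails a fixed number of times. */
struct fake_mapping {
	int failures_left;	/* calls that will still return -ENOMEM */
	int calls;		/* total number of writepages calls made */
};

/* Stand-in for ->writepages(): returns -ENOMEM until failures run out. */
int fake_writepages(struct fake_mapping *m, struct wbc_sketch *wbc)
{
	(void)wbc;
	m->calls++;
	if (m->failures_left > 0) {
		m->failures_left--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Retry only when the failure is -ENOMEM *and* this is a data integrity
 * writeback (WB_SYNC_ALL). Background writeback returns the error at
 * once, since those pages stay dirty and can be written later.
 */
int do_writepages_sketch(struct fake_mapping *m, struct wbc_sketch *wbc)
{
	int ret;

	assert(m != NULL && wbc != NULL);
	if (wbc->nr_to_write <= 0)
		return 0;
	while (1) {
		ret = fake_writepages(m, wbc);
		if (ret != -ENOMEM || wbc->sync_mode != WB_SYNC_ALL)
			break;
		/* The kernel patch yields and waits briefly here before retrying. */
	}
	return ret;
}
```

With a mapping set to fail three times, a WB_SYNC_ALL request ends up
calling fake_writepages() four times and returns 0, while a
WB_SYNC_NONE request gives up after the first call and returns -ENOMEM.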