From patchwork Thu Jan 26 11:58:17 2017
From: "Kirill A. Shutemov"
To: "Theodore Ts'o", Andreas Dilger, Jan Kara, Andrew Morton
Cc: Alexander Viro, Hugh Dickins, Andrea Arcangeli, Dave Hansen, Vlastimil Babka, Matthew Wilcox, Ross Zwisler, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv6 35/37] ext4: reserve larger journal transaction for huge pages
Date: Thu, 26 Jan 2017 14:58:17 +0300
Message-Id: <20170126115819.58875-36-kirill.shutemov@linux.intel.com>
In-Reply-To: <20170126115819.58875-1-kirill.shutemov@linux.intel.com>

If huge pages are enabled, a single page may in the worst case be backed
by 2048 blocks (e.g. a 2 MiB huge page on a filesystem with 1 KiB blocks),
each possibly in a different block group, so much more metadata has to be
committed per page. Update the journal credit estimates accordingly.

I was not able to trigger a failure without this patch, as it is hard to
construct a sufficiently fragmented filesystem, but hopefully this change
is enough to address the concern.

Signed-off-by: Kirill A. Shutemov
---
 fs/ext4/ext4_jbd2.h | 16 +++++++++++++---
 fs/ext4/inode.c     | 34 +++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index f97611171023..6e4e534d6e98 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -353,11 +353,21 @@ static inline int ext4_journal_restart(handle_t *handle, int nblocks)
 	return 0;
 }
 
+static inline int __ext4_journal_blocks_per_page(struct inode *inode, bool thp)
+{
+	int bpp = 0;
+	if (EXT4_JOURNAL(inode) != NULL) {
+		bpp = jbd2_journal_blocks_per_page(inode);
+		if (thp)
+			bpp <<= HPAGE_PMD_ORDER;
+	}
+	return bpp;
+}
+
 static inline int ext4_journal_blocks_per_page(struct inode *inode)
 {
-	if (EXT4_JOURNAL(inode) != NULL)
-		return jbd2_journal_blocks_per_page(inode);
-	return 0;
+	return __ext4_journal_blocks_per_page(inode,
+			(inode->i_flags & S_HUGE_MODE) != S_HUGE_NEVER);
 }
 
 static inline int ext4_journal_force_commit(journal_t *journal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5bf68bbe65ec..c30562b6e685 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -141,6 +141,7 @@ static int __ext4_journalled_writepage(struct page *page, unsigned int len);
 static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
 static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
 				  int pextents);
+static int __ext4_writepage_trans_blocks(struct inode *inode, int bpp);
 
 /*
  * Test whether an inode is a fast symlink.
@@ -4496,6 +4497,21 @@ void ext4_set_inode_flags(struct inode *inode)
 	    !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
 	    !ext4_encrypted_inode(inode))
 		new_fl |= S_DAX;
+
+	if ((new_fl & S_HUGE_MODE) != S_HUGE_NEVER &&
+			EXT4_JOURNAL(inode) != NULL) {
+		int bpp = __ext4_journal_blocks_per_page(inode, true);
+		int credits = __ext4_writepage_trans_blocks(inode, bpp);
+
+		if (EXT4_JOURNAL(inode)->j_max_transaction_buffers < credits) {
+			pr_warn_once("EXT4-fs (%s): "
+					"journal is too small for huge pages. "
+					"Disabling huge page support.\n",
+					inode->i_sb->s_id);
+			new_fl &= ~S_HUGE_MODE;
+		}
+	}
+
 	inode_set_flags(inode, new_fl,
 			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
 }
@@ -5471,6 +5487,16 @@ static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
 	return ret;
 }
 
+static int __ext4_writepage_trans_blocks(struct inode *inode, int bpp)
+{
+	int ret = ext4_meta_trans_blocks(inode, bpp, bpp);
+
+	/* Account for data blocks for journalled mode */
+	if (ext4_should_journal_data(inode))
+		ret += bpp;
+	return ret;
+}
+
 /*
  * Calculate the total number of credits to reserve to fit
  * the modification of a single pages into a single transaction,
@@ -5484,14 +5510,8 @@ static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
 int ext4_writepage_trans_blocks(struct inode *inode)
 {
 	int bpp = ext4_journal_blocks_per_page(inode);
-	int ret;
-
-	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
 
-	/* Account for data blocks for journalled mode */
-	if (ext4_should_journal_data(inode))
-		ret += bpp;
-	return ret;
+	return __ext4_writepage_trans_blocks(inode, bpp);
 }
 
 /*