From patchwork Tue Nov 29 11:22:39 2016
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 9451587
From: "Kirill A. Shutemov"
To: "Theodore Ts'o", Andreas Dilger, Jan Kara, Andrew Morton
Cc: Alexander Viro, Hugh Dickins, Andrea Arcangeli, Dave Hansen,
    Vlastimil Babka, Matthew Wilcox, Ross Zwisler,
    linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-block@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv5 11/36] HACK: readahead: alloc huge pages, if allowed
Date: Tue, 29 Nov 2016 14:22:39 +0300
Message-Id: <20161129112304.90056-12-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.10.2
In-Reply-To: <20161129112304.90056-1-kirill.shutemov@linux.intel.com>
References: <20161129112304.90056-1-kirill.shutemov@linux.intel.com>
X-Mailing-List: linux-block@vger.kernel.org

Most page cache allocation happens via readahead (sync or async), so if we
want to have a significant number of huge pages in the page cache, we need
to find a way to allocate them from readahead.

Unfortunately, huge pages don't fit into the current readahead design: the
128KB max readahead window, assumptions about page size, PageReadahead() to
track hit/miss. I haven't found a way to get it right yet.

This patch just allocates a huge page if allowed, but doesn't really provide
any readahead if a huge page is allocated. We read out 2M at a time and I
would expect latency spikes without readahead. Hence HACK.

That said, I don't think it should prevent huge page support from being
applied. Time will tell whether the lack of readahead is a big deal for huge
pages in the page cache.

Any suggestions are welcome.
Signed-off-by: Kirill A. Shutemov
---
 mm/readahead.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index fb4c99f85618..87e38b522645 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -174,6 +174,21 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		if (page_offset > end_index)
 			break;
 
+		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
+				(!page_idx || !(page_offset % HPAGE_PMD_NR)) &&
+				page_cache_allow_huge(mapping, page_offset)) {
+			page = __page_cache_alloc_order(gfp_mask | __GFP_COMP,
+					HPAGE_PMD_ORDER);
+			if (page) {
+				prep_transhuge_page(page);
+				page->index = round_down(page_offset,
+						HPAGE_PMD_NR);
+				list_add(&page->lru, &page_pool);
+				ret++;
+				goto start_io;
+			}
+		}
+
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->page_tree, page_offset);
 		rcu_read_unlock();
@@ -189,7 +204,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			SetPageReadahead(page);
 		ret++;
 	}
-
+start_io:
 	/*
 	 * Now start the IO. We ignore I/O errors - if the page is not
 	 * uptodate then the caller will launch readpage again, and
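
For readers skimming the hunk, a minimal userspace sketch of the alignment
logic it adds may help. This is not kernel code: the 4 KiB base page size and
HPAGE_PMD_NR of 512 (so one huge page spans 2 MiB) are assumed x86-64
defaults, not values taken from the patch. The real code additionally
consults page_cache_allow_huge() and falls back to ordinary 4 KiB pages when
the huge allocation fails; the sketch only shows where along a readahead
batch a huge page would be attempted and which PMD-aligned index (the
round_down() in the hunk) it would cover.

/*
 * Standalone sketch (userspace, not kernel code) of the condition
 *   (!page_idx || !(page_offset % HPAGE_PMD_NR))
 * used in the readahead hunk above.  Sizes are assumed x86-64 defaults.
 */
#include <stdio.h>
#include <stdbool.h>

#define BASE_PAGE_SIZE	4096UL	/* assumed 4 KiB base pages */
#define HPAGE_PMD_NR	512UL	/* 512 * 4 KiB = one 2 MiB huge page */

/* Try a huge page on the first iteration of the batch, or whenever the
 * offset lands exactly on a PMD-sized (2 MiB) boundary. */
static bool try_huge_page(unsigned long page_idx, unsigned long page_offset)
{
	return page_idx == 0 || page_offset % HPAGE_PMD_NR == 0;
}

int main(void)
{
	unsigned long start = 100;	/* arbitrary, unaligned start offset */
	unsigned long nr_to_read = 1200;

	for (unsigned long i = 0; i < nr_to_read; i++) {
		unsigned long offset = start + i;
		/* Mirrors round_down(page_offset, HPAGE_PMD_NR) in the hunk. */
		unsigned long hindex = offset - offset % HPAGE_PMD_NR;

		if (try_huge_page(i, offset))
			printf("page_offset %5lu: huge page index %5lu, covers file bytes %lu..%lu\n",
			       offset, hindex,
			       hindex * BASE_PAGE_SIZE,
			       (hindex + HPAGE_PMD_NR) * BASE_PAGE_SIZE - 1);
	}
	return 0;
}

With start = 100 and 1200 pages to read, a huge page is attempted at the
unaligned first offset (its index rounded down to 0) and again at offsets 512
and 1024, which matches the "read out 2M at a time" behaviour described in
the commit message.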