From patchwork Fri May 29 02:58:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Wilcox
X-Patchwork-Id: 11577521
From: Matthew Wilcox
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v5 38/39] mm: Add large page readahead
Date: Thu, 28 May 2020 19:58:23 -0700
Message-Id: <20200529025824.32296-39-willy@infradead.org>
X-Mailer: git-send-email 2.21.1
In-Reply-To: <20200529025824.32296-1-willy@infradead.org>
References: <20200529025824.32296-1-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

If the filesystem supports large pages, allocate larger pages in the
readahead code when it seems worth doing.  The heuristic for choosing
larger page sizes will surely need some tuning, but this aggressive
ramp-up seems good for testing.

Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/readahead.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 74c7e1eff540..ac16e96a8828 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -149,7 +149,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 	blk_finish_plug(&plug);
 
-	BUG_ON(!list_empty(pages));
+	BUG_ON(pages && !list_empty(pages));
 	BUG_ON(readahead_count(rac));
 
 out:
@@ -428,13 +428,92 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int ra_alloc_page(struct readahead_control *rac, pgoff_t index,
+		pgoff_t mark, unsigned int order, gfp_t gfp)
+{
+	int err;
+	struct page *page = __page_cache_alloc_order(gfp, order);
+
+	if (!page)
+		return -ENOMEM;
+	if (mark - index < (1UL << order))
+		SetPageReadahead(page);
+	err = add_to_page_cache_lru(page, rac->mapping, index, gfp);
+	if (err)
+		put_page(page);
+	else
+		rac->_nr_pages += 1UL << order;
+	return err;
+}
+
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	struct address_space *mapping = rac->mapping;
+	unsigned int old_order = order;
+	pgoff_t index = readahead_index(rac);
+	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t mark = index + ra->size - ra->async_size;
+	int err = 0;
+	gfp_t gfp = readahead_gfp_mask(mapping);
+
+	if (!mapping_large_pages(mapping))
+		return false;
+
+	limit = min(limit, index + ra->size - 1);
+
+	/* Grow page size up to PMD size */
+	if (order < HPAGE_PMD_ORDER) {
+		order += 2;
+		if (order > HPAGE_PMD_ORDER)
+			order = HPAGE_PMD_ORDER;
+		while ((1 << order) > ra->size)
+			order--;
+	}
+
+	/* If size is somehow misaligned, fill with order-0 pages */
+	while (!err && index & ((1UL << old_order) - 1))
+		err = ra_alloc_page(rac, index++, mark, 0, gfp);
+
+	while (!err && index & ((1UL << order) - 1)) {
+		err = ra_alloc_page(rac, index, mark, old_order, gfp);
+		index += 1UL << old_order;
+	}
+
+	while (!err && index <= limit) {
+		err = ra_alloc_page(rac, index, mark, order, gfp);
+		index += 1UL << order;
+	}
+
+	if (index > limit) {
+		ra->size += index - limit - 1;
+		ra->async_size += index - limit - 1;
+	}
+
+	read_pages(rac, NULL, false);
+
+	/*
+	 * If there were already pages in the page cache, then we may have
+	 * left some gaps.  Let the regular readahead code take care of this
+	 * situation.
+	 */
+	return !err;
+}
+#else
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	return false;
+}
+#endif
+
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct address_space *mapping,
 		struct file_ra_state *ra, struct file *file,
-		bool hit_readahead_marker, pgoff_t index,
-		unsigned long req_size)
+		struct page *page, pgoff_t index, unsigned long req_size)
 {
 	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
@@ -473,7 +552,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * Query the pagecache for async_size, which normally equals to
 	 * readahead size. Ramp it up and use it as the new readahead size.
 	 */
-	if (hit_readahead_marker) {
+	if (page) {
 		pgoff_t start;
 
 		rcu_read_lock();
@@ -544,6 +623,8 @@ static void ondemand_readahead(struct address_space *mapping,
 	}
 
 	rac._index = ra->start;
+	if (page && page_cache_readahead_order(&rac, ra, compound_order(page)))
+		return;
 	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
@@ -578,7 +659,7 @@ void page_cache_sync_readahead(struct address_space *mapping,
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, false, index, req_count);
+	ondemand_readahead(mapping, ra, filp, NULL, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
@@ -624,7 +705,7 @@ page_cache_async_readahead(struct address_space *mapping,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, true, index, req_count);
+	ondemand_readahead(mapping, ra, filp, page, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
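
For anyone who wants to follow the ramp-up by hand, here is a rough user-space sketch of the arithmetic in page_cache_readahead_order(). It is not part of the patch and touches no kernel API. The names next_order(), plan_readahead() and struct ra_plan are hypothetical, HPAGE_PMD_ORDER is assumed to be 9 (x86-64 with 4KiB base pages), allocations are assumed never to fail, and i_size is assumed large enough not to clamp the limit. Unlike the patch, the sketch also stops shrinking the order at 0 so a zero-sized window cannot underflow it.

```c
#include <assert.h>

/* Assumption: x86-64 with 4KiB base pages, so a PMD covers 2^9 pages. */
#define HPAGE_PMD_ORDER 9U

/*
 * Hypothetical helper mirroring the "Grow page size up to PMD size" step:
 * bump the order by two, clamp to PMD order, then shrink until one page
 * of that order fits in the readahead window.
 */
static inline unsigned int next_order(unsigned int order, unsigned long ra_size)
{
	if (order < HPAGE_PMD_ORDER) {
		order += 2;
		if (order > HPAGE_PMD_ORDER)
			order = HPAGE_PMD_ORDER;
		/* The "order > 0" guard is an addition of this sketch. */
		while (order > 0 && (1UL << order) > ra_size)
			order--;
	}
	return order;
}

/* Hypothetical summary of what the three fill loops would allocate. */
struct ra_plan {
	unsigned long index;	/* first index past the last allocation */
	unsigned long nr_pages;	/* base pages covered, like rac->_nr_pages */
	unsigned int allocs[3];	/* count per loop: order-0, old_order, order */
};

/* Walk the three fill loops, assuming every allocation succeeds. */
static inline struct ra_plan plan_readahead(unsigned long index,
		unsigned long limit, unsigned int old_order, unsigned int order)
{
	struct ra_plan p = { .index = index };

	/* Order-0 pages until index is aligned to the old order. */
	while (p.index & ((1UL << old_order) - 1)) {
		p.index++;
		p.nr_pages++;
		p.allocs[0]++;
	}
	/* Old-order pages until index is aligned to the new order. */
	while (p.index & ((1UL << order) - 1)) {
		p.index += 1UL << old_order;
		p.nr_pages += 1UL << old_order;
		p.allocs[1]++;
	}
	/* New-order pages until the window is covered. */
	while (p.index <= limit) {
		p.index += 1UL << order;
		p.nr_pages += 1UL << order;
		p.allocs[2]++;
	}
	return p;
}
```

With index 5, old order 2 and ra->size 64, the new order is 4 and the limit is 68. The loops then take three order-0 pages (5..7), two order-2 pages (8..15) and four order-4 pages (16..79), which overshoots the limit by 11 pages. That overshoot is what the patch adds to ra->size and ra->async_size.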