From: Jan Kara
To: Andrew Morton
Cc: Matthew Wilcox, Guo Xuenan, Jan Kara
Subject: [PATCH v2] readahead: Avoid multiple marked readahead pages
Date: Tue, 23 Jan 2024 16:32:54 +0100
Message-Id: <20240123153254.5206-1-jack@suse.cz>

ra_alloc_folio() marks the page that should trigger the next round of async readahead. However, it rounds the computed mark index up to the order of the page being allocated, which can lead to multiple consecutive pages being marked with the readahead flag. Consider the situation with index == 1, mark == 1, order == 0. We insert an order-0 page at index 1 and mark it. Then we bump order to 1 and index to 2; mark (still == 1) is rounded up to 2, so the page at index 2 is marked as well.
Then we bump order to 2 and index to 4; mark gets rounded up to 4, so the page at index 4 is marked too. Having multiple pages marked within a single readahead window confuses the readahead logic and results in the readahead window being trimmed back to 1. This happens in particular when the maximum readahead window size is not a power of two (in the observed case it was 768 KB), and as a result sequential read throughput suffers.

Fix the problem by rounding 'mark' down instead of up. Because the index is naturally aligned to 'order', 'rounded mark' == index holds if and only if 'mark' lies within the page we are allocating at 'index'. Thus exactly one page is marked with the readahead flag, as the readahead code requires, and sequential read performance is restored.

This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios"). That commit changed the rounding with the rationale: "... we were setting the readahead flag on the folio which contains the last byte read from the block. This is wrong because we will trigger readahead at the end of the read without waiting to see if a subsequent read is going to use the pages we just read."

Although this is true, it was always the case with read sizes not aligned to folio boundaries; large folios in the page cache just make the situation more obvious (and more frequent). Also, for sequential read workloads it is better to trigger readahead earlier rather than later. It is true that the different rounding, and thus the earlier triggering of readahead, can result in reading more for semi-random workloads. However, workloads that really suffer from this seem to be rare.
In particular, I have verified that the workload described in commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios"), which reads random 100k blocks from a file with a job like:

[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s

is not impacted by the rounding change and achieves ~70MB/s in both cases.

Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
CC: Guo Xuenan
Signed-off-by: Jan Kara
---
 mm/readahead.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Changes since v1:
* Fixed one more place where mark rounding was done as well

v1: https://lore.kernel.org/all/20240104085839.21029-1-jack@suse.cz

diff --git a/mm/readahead.c b/mm/readahead.c
index 6925e6959fd3..1d1a84deb5bc 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 
 	if (!folio)
 		return -ENOMEM;
-	mark = round_up(mark, 1UL << order);
+	mark = round_down(mark, 1UL << order);
 	if (index == mark)
 		folio_set_readahead(folio);
 	err = filemap_add_folio(ractl->mapping, folio, index, gfp);
@@ -577,7 +577,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	 * It's the expected callback index, assume sequential access.
 	 * Ramp up sizes, and push forward the readahead window.
 	 */
-	expected = round_up(ra->start + ra->size - ra->async_size,
+	expected = round_down(ra->start + ra->size - ra->async_size,
 			1UL << order);
 	if (index == expected || index == (ra->start + ra->size)) {
 		ra->start += ra->size;