From: Jan Kara
To: Matthew Wilcox
Cc: Jan Kara, Guo Xuenan
Subject: [PATCH] readahead: Avoid multiple marked readahead pages
Date: Thu, 4 Jan 2024 09:58:39 +0100
Message-Id: <20240104085839.21029-1-jack@suse.cz>

ra_alloc_folio() marks a page that should trigger the next round of async
readahead. However, it rounds up the computed index to the order of the
page being allocated. This can lead to multiple consecutive pages being
marked with the readahead flag. Consider the situation with index == 1,
mark == 1, order == 0. We insert an order 0 page at index 1 and mark it.
Then we bump order to 1, index to 2, mark (still == 1) is rounded up to 2
so the page at index 2 is marked as well. Then we bump order to 2, index
is incremented to 4, mark gets rounded to 4 so the page at index 4 is
marked as well.

The fact that multiple pages get marked within a single readahead window
confuses the readahead logic and results in the readahead window being
trimmed back to 1. This situation is triggered in particular when the
maximum readahead window size is not a power of two (in the observed case
it was 768 KB) and as a result sequential read throughput suffers.

Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index'. Thus exactly
one page is marked with the readahead flag, as required by the readahead
code, and sequential read performance is restored.

This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:

"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."

Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries, and large folios in the page cache
just make the situation more obvious (and frequent). Also, for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding, and thus earlier
triggering of the readahead, can result in reading more for semi-random
workloads. However, workloads really suffering from this seem to be rare.
In particular, I have verified that the workload described in commit
b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:

[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s

is not impacted by the rounding change and achieves ~70MB/s in both cases.

Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
CC: Guo Xuenan
Signed-off-by: Jan Kara
---
 mm/readahead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 6925e6959fd3..3032fbdce276 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 
 	if (!folio)
 		return -ENOMEM;
-	mark = round_up(mark, 1UL << order);
+	mark = round_down(mark, 1UL << order);
 	if (index == mark)
 		folio_set_readahead(folio);
 	err = filemap_add_folio(ractl->mapping, folio, index, gfp);