From patchwork Wed Aug 19 12:53:49 2020
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 11724027
From: Nicholas Piggin
To: Linus Torvalds, Michal Hocko
Cc: Nicholas Piggin, Oleg Nesterov, Hugh Dickins, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Tim Chen, Anton Blanchard
Subject: [PATCH RFC] mm: increase page waitqueue hash size
Date: Wed, 19 Aug 2020 22:53:49 +1000
Message-Id: <20200819125349.558249-1-npiggin@gmail.com>
X-Mailer: git-send-email 2.23.0

The page waitqueue hash is a bit small (256 entries) on very big systems. A
16 socket 1536 thread POWER9 system was found to encounter hash collisions
and excessive time in waitqueue locking at times.
This was intermittent and hard to reproduce easily with the setup we had
(very little real IO capacity). The thought is some important pages happened
to collide in the hash, slowing down page locking, causing the problem to
snowball.

A small test case was made where threads would write and fsync different
pages, generating just a small amount of contention across many pages.

Increasing page waitqueue hash size to 262144 entries increased throughput
by 182% while also reducing standard deviation 3x. perf before the increase:

  36.23%  [k] _raw_spin_lock_irqsave                -      -
              |
              |--34.60%--wake_up_page_bit
              |          0
              |          iomap_write_end.isra.38
              |          iomap_write_actor
              |          iomap_apply
              |          iomap_file_buffered_write
              |          xfs_file_buffered_aio_write
              |          new_sync_write

  17.93%  [k] native_queued_spin_lock_slowpath      -      -
              |
              |--16.74%--_raw_spin_lock_irqsave
              |          |
              |           --16.44%--wake_up_page_bit
              |                     iomap_write_end.isra.38
              |                     iomap_write_actor
              |                     iomap_apply
              |                     iomap_file_buffered_write
              |                     xfs_file_buffered_aio_write

This patch uses alloc_large_system_hash to allocate a bigger system hash
that scales somewhat with memory size. This hash could be made per-node,
which should help reduce remote accesses on well localised workloads, but
that adds some complexity with hotplug, so until we get a less artificial
workload to test with, let's keep it simple.

Signed-off-by: Nicholas Piggin
---
 mm/filemap.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..d3cd158f0c3f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -969,19 +970,36 @@ EXPORT_SYMBOL(__page_cache_alloc);
  * at a cost of "thundering herd" phenomena during rare hash
  * collisions.
  */
-#define PAGE_WAIT_TABLE_BITS 8
-#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
+#define PAGE_WAIT_TABLE_SIZE (1 << page_wait_table_bits)
+#if CONFIG_BASE_SMALL
+static const unsigned int page_wait_table_bits = 4;
 static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+#else
+static unsigned int page_wait_table_bits __ro_after_init;
+static wait_queue_head_t *page_wait_table __ro_after_init;
+#endif
 
 static wait_queue_head_t *page_waitqueue(struct page *page)
 {
-	return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)];
+	return &page_wait_table[hash_ptr(page, page_wait_table_bits)];
 }
 
 void __init pagecache_init(void)
 {
 	int i;
 
+	if (!CONFIG_BASE_SMALL) {
+		page_wait_table = alloc_large_system_hash("Page waitqueue hash",
+							sizeof(wait_queue_head_t),
+							0,
+							21,
+							0,
+							&page_wait_table_bits,
+							NULL,
+							0,
+							0);
+	}
+
 	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
 		init_waitqueue_head(&page_wait_table[i]);
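
Illustration only, not part of the patch: a rough userspace sketch of why an
8-bit table has to collide once many pages are waited on at the same time.
It assumes hash_ptr() reduces to the generic hash_64() multiply-and-shift on
a 64-bit build, which an architecture may override. The names
sketch_hash_ptr() and sketch_buckets_used(), and the base address and stride
passed to them, are hypothetical stand-ins for an array of struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiplier used by the kernel's generic hash_64() (GOLDEN_RATIO_64). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Largest table this sketch supports; an arbitrary limit for the demo. */
#define SKETCH_MAX_BITS 20

/* Userspace stand-in for hash_ptr(): multiply, then keep the top 'bits' bits. */
static unsigned int sketch_hash_ptr(uint64_t ptr, unsigned int bits)
{
	assert(bits >= 1 && bits <= SKETCH_MAX_BITS);
	return (unsigned int)((ptr * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Count the distinct buckets hit by nr_pages addresses spaced 'stride' bytes
 * apart starting at 'base'. Both values are assumptions chosen to resemble
 * consecutive struct page entries, not values taken from a real system.
 */
static size_t sketch_buckets_used(uint64_t base, uint64_t stride,
				  size_t nr_pages, unsigned int bits)
{
	static unsigned char seen[1u << SKETCH_MAX_BITS];
	size_t used = 0;
	size_t i;

	assert(bits >= 1 && bits <= SKETCH_MAX_BITS);
	memset(seen, 0, (size_t)1 << bits);

	for (i = 0; i < nr_pages; i++) {
		unsigned int h = sketch_hash_ptr(base + i * stride, bits);

		if (!seen[h]) {
			seen[h] = 1;
			used++;
		}
	}
	return used;
}
```

Calling sketch_buckets_used(base, 64, 1536, 8) can never return more than
256, so with one page per hardware thread on the machine described above at
least 1280 waiters would share a queue with another page. With 18 bits
(262144 entries) the same pages have room to spread out, although how evenly
they do depends on the real addresses involved.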