From patchwork Wed Aug 23 05:06:07 2023
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13361597
From: Mateusz Guzik
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com,
    akpm@linux-foundation.org, shakeelb@google.com,
    vegard.nossum@oracle.com, linux-mm@kvack.org, Mateusz Guzik
Subject: [PATCH v3 0/2] execve scalability issues, part 1
Date: Wed, 23 Aug 2023 07:06:07 +0200
Message-Id: <20230823050609.2228718-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.34.1

To start I figured I would bench about as friendly a case as it gets --
statically linked *separate* binaries all doing execve in a loop.

I borrowed the bench from here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

It prints a result every second.

My test box is temporarily only 26 cores and even at this scale I run
into massive lock contention stemming from back-to-back calls to
percpu_counter_init (and _destroy later).

While not a panacea, one simple thing to do here is to batch these ops.
Since the term "batching" is already used in the file, I decided to
refer to it as "grouping" instead.

Even if this code could be patched to dodge these counters, I would
argue a high-traffic alloc/free consumer is only a matter of time so it
makes sense to facilitate it.

With the fix I get an ok win, to quote from the commit:
> Even at a very modest scale of 26 cores (ops/s):
> before: 133543.63
> after:  186061.81 (+39%)

While with the patch these allocations remain a significant problem, the
primary bottleneck shifts to:

        __pv_queued_spin_lock_slowpath+1
        _raw_spin_lock_irqsave+57
        folio_lruvec_lock_irqsave+91
        release_pages+590
        tlb_batch_pages_flush+61
        tlb_finish_mmu+101
        exit_mmap+327
        __mmput+61
        begin_new_exec+1245
        load_elf_binary+712
        bprm_execve+644
        do_execveat_common.isra.0+429
        __x64_sys_execve+50
        do_syscall_64+46
        entry_SYSCALL_64_after_hwframe+110

I intend to do more work on the area to mostly sort it out, but I would
not mind if someone else took the hammer to folio. :)

With this out of the way I'll be looking at some form of caching to
eliminate these allocs as a problem.

v3:
- fix !CONFIG_SMP build
- drop the backtrace from fork commit message

v2:
- force bigger alignment on alloc
- rename "counters" to "nr_counters" and pass prior to lock key
- drop {}'s for single-statement loops

Mateusz Guzik (2):
  pcpcntr: add group allocation/free
  fork: group allocation of per-cpu counters for mm struct

 include/linux/percpu_counter.h | 39 ++++++++++++++++++----
 kernel/fork.c                  | 14 ++------
 lib/percpu_counter.c           | 61 +++++++++++++++++++++++-----------
 3 files changed, 77 insertions(+), 37 deletions(-)
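
For illustration only, the grouping idea described in this letter can be
sketched in userspace C. This is not the code from the patches: struct
counter, counter_init, counter_init_many and the lock_acquisitions
instrumentation are all hypothetical names, and a single spinlock-protected
list is assumed as a stand-in for whatever shared state back-to-back
percpu_counter_init calls contend on. The only point shown is that setting
up N counters as a group costs one lock round trip instead of N.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a counter kept on a global list. */
struct counter {
	long count;
	struct counter *next;
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT; /* assumed global lock */
static struct counter *all_counters;             /* list of live counters */
static unsigned long lock_acquisitions;          /* instrumentation only */

static void list_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
	lock_acquisitions++;	/* updated while holding the lock */
}

static void list_lock_release(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* One-at-a-time init: one lock round trip per counter. */
void counter_init(struct counter *c, long amount)
{
	c->count = amount;
	list_lock_acquire();
	c->next = all_counters;
	all_counters = c;
	list_lock_release();
}

/* Grouped init: a single lock round trip covers all nr_counters. */
void counter_init_many(struct counter *c, long amount, size_t nr_counters)
{
	size_t i;

	assert(c != NULL || nr_counters == 0);

	/* Private setup needs no lock. */
	for (i = 0; i < nr_counters; i++)
		c[i].count = amount;

	/* Publish the whole group under one acquisition. */
	list_lock_acquire();
	for (i = 0; i < nr_counters; i++) {
		c[i].next = all_counters;
		all_counters = &c[i];
	}
	list_lock_release();
}

/* Accessor for the instrumentation counter. */
unsigned long counter_lock_acquisitions(void)
{
	return lock_acquisitions;
}
```

Calling counter_init on four counters takes the lock four times, while one
counter_init_many call over an array of four takes it once. With many CPUs
doing this back to back, fewer acquisitions means less time queued on the
lock, which is the effect the grouping is after.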