From patchwork Tue May 21 23:43:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik <mjguzik@gmail.com>
X-Patchwork-Id: 13669812
From: Mateusz Guzik <mjguzik@gmail.com>
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, vbabka@suse.cz, lstoakes@gmail.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mateusz Guzik <mjguzik@gmail.com>
Subject: [PATCH v2] mm: batch unlink_file_vma calls in free_pgd_range
Date: Wed, 22 May 2024 01:43:21 +0200
Message-ID: <20240521234321.359501-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Execs of dynamically linked binaries at 20-ish cores are bottlenecked on
the i_mmap_rwsem semaphore, while the biggest singular contributor is
free_pgd_range inducing the lock acquire back-to-back for all
consecutive mappings of a given file.

Tracing the count of said acquires while building the kernel shows:
[1, 2)     799579 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 3)          0 |                                                    |
[3, 4)       3009 |                                                    |
[4, 5)       3009 |                                                    |
[5, 6)     326442 |@@@@@@@@@@@@@@@@@@@@@                               |

So in particular there were 326442 opportunities to coalesce 5 acquires
into 1.

Doing so increases execs per second by 4% (~50k to ~52k) when running
the benchmark linked below.

The lock remains the main bottleneck, I have not looked at other spots
yet.

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -O2 -o shared-doexec doexec.c
$ ./shared-doexec $(nproc)

Note this particular test makes sure binaries are separate, but the
loader is shared.

Stats collected on the patched kernel (+ "noinline") with:
  bpftrace -e 'kprobe:unlink_file_vma_batch_process
  { @ = lhist(((struct unlink_vma_file_batch *)arg0)->count, 0, 8, 1); }'

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
v2:
- move new stuff to mm/internal.h

 mm/internal.h |  9 +++++++++
 mm/memory.c   | 10 ++++++++--
 mm/mmap.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 2adabe369403..2e7be1c773f2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1484,4 +1484,13 @@ static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
 void workingset_update_node(struct xa_node *node);
 extern struct list_lru shadow_nodes;
 
+struct unlink_vma_file_batch {
+	int count;
+	struct vm_area_struct *vmas[8];
+};
+
+void unlink_file_vma_batch_init(struct unlink_vma_file_batch *);
+void unlink_file_vma_batch_add(struct unlink_vma_file_batch *, struct vm_area_struct *);
+void unlink_file_vma_batch_final(struct unlink_vma_file_batch *);
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/memory.c b/mm/memory.c
index b5453b86ec4b..1b96dce19796 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -365,6 +365,8 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 		   struct vm_area_struct *vma, unsigned long floor,
 		   unsigned long ceiling, bool mm_wr_locked)
 {
+	struct unlink_vma_file_batch vb;
+
 	do {
 		unsigned long addr = vma->vm_start;
 		struct vm_area_struct *next;
@@ -384,12 +386,15 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 		if (mm_wr_locked)
 			vma_start_write(vma);
 		unlink_anon_vmas(vma);
-		unlink_file_vma(vma);
 
 		if (is_vm_hugetlb_page(vma)) {
+			unlink_file_vma(vma);
 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
 		} else {
+			unlink_file_vma_batch_init(&vb);
+			unlink_file_vma_batch_add(&vb, vma);
+
 			/*
 			 * Optimization: gather nearby vmas into one call down
 			 */
@@ -402,8 +407,9 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 				if (mm_wr_locked)
 					vma_start_write(vma);
 				unlink_anon_vmas(vma);
-				unlink_file_vma(vma);
+				unlink_file_vma_batch_add(&vb, vma);
 			}
+			unlink_file_vma_batch_final(&vb);
 			free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
 		}
diff --git a/mm/mmap.c b/mm/mmap.c
index d6d8ab119b72..1f9a43ecd053 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -131,6 +131,47 @@ void unlink_file_vma(struct vm_area_struct *vma)
 	}
 }
 
+void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
+{
+	vb->count = 0;
+}
+
+static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
+{
+	struct address_space *mapping;
+	int i;
+
+	mapping = vb->vmas[0]->vm_file->f_mapping;
+	i_mmap_lock_write(mapping);
+	for (i = 0; i < vb->count; i++) {
+		VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping);
+		__remove_shared_vm_struct(vb->vmas[i], mapping);
+	}
+	i_mmap_unlock_write(mapping);
+
+	unlink_file_vma_batch_init(vb);
+}
+
+void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
+			       struct vm_area_struct *vma)
+{
+	if (vma->vm_file == NULL)
+		return;
+
+	if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) ||
+	    vb->count == ARRAY_SIZE(vb->vmas))
+		unlink_file_vma_batch_process(vb);
+
+	vb->vmas[vb->count] = vma;
+	vb->count++;
+}
+
+void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb)
+{
+	if (vb->count > 0)
+		unlink_file_vma_batch_process(vb);
+}
+
 /*
  * Close a vm structure and free it.
  */
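
Illustration only, not part of the patch: the batching rule can be modelled
in userspace to see how many lock round trips a run of mappings costs. This
is a rough sketch under stated assumptions. struct file_stub, struct
vma_stub, struct batch, lock_acquires and demo_unlink() are hypothetical
names invented for this example, and a plain counter stands in for the
i_mmap_lock_write()/i_mmap_unlock_write() pair. The flush condition (file
changed, or the 8-slot array is full) is copied from
unlink_file_vma_batch_add() in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8	/* mirrors vmas[8] in the patch */

/* Hypothetical stand-ins for struct file and struct vm_area_struct. */
struct file_stub { int id; };
struct vma_stub { struct file_stub *vm_file; };

struct batch {
	int count;
	struct vma_stub *vmas[BATCH_MAX];
};

/* Counts simulated lock round trips; stands in for i_mmap_rwsem. */
static int lock_acquires;

static void batch_init(struct batch *vb)
{
	vb->count = 0;
}

static void batch_process(struct batch *vb)
{
	lock_acquires++;	/* one acquire covers every queued vma */
	/* the kernel would unlink vb->vmas[0..count-1] here */
	batch_init(vb);
}

static void batch_add(struct batch *vb, struct vma_stub *vma)
{
	if (vma->vm_file == NULL)
		return;		/* not file-backed, nothing to unlink */

	/* Flush when the file changes or the array is full. */
	if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) ||
	    vb->count == BATCH_MAX)
		batch_process(vb);

	vb->vmas[vb->count++] = vma;
}

static void batch_final(struct batch *vb)
{
	if (vb->count > 0)
		batch_process(vb);
}

/* Feed n vmas through one batch; return the number of lock acquires. */
static int demo_unlink(struct vma_stub *vmas, int n)
{
	struct batch vb;
	int i;

	lock_acquires = 0;
	batch_init(&vb);
	for (i = 0; i < n; i++)
		batch_add(&vb, &vmas[i]);
	batch_final(&vb);
	return lock_acquires;
}
```

In this model, five consecutive vmas backed by the same file cost one
acquire instead of five, which is the [5, 6) bucket case from the histogram
in the commit message. A run of nine such vmas costs two, because the array
holds at most eight entries before it is flushed.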