From patchwork Sat May 18 06:20:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13667495
From: Mateusz Guzik
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, vbabka@suse.cz, lstoakes@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mateusz Guzik
Subject: [PATCH] mm: batch unlink_file_vma calls in free_pgd_range
Date: Sat, 18 May 2024 08:20:05 +0200
Message-ID: <20240518062005.76129-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0

Execs of dynamically linked binaries at 20-ish cores are bottlenecked on
the i_mmap_rwsem semaphore, while the biggest singular contributor is
free_pgd_range inducing the lock acquire back-to-back for all
consecutive mappings of a given file.

Tracing the count of said acquires while building the kernel shows:
[1, 2)     799579 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 3)          0 |                                                    |
[3, 4)       3009 |                                                    |
[4, 5)       3009 |                                                    |
[5, 6)     326442 |@@@@@@@@@@@@@@@@@@@@@                               |

So in particular there were 326442 opportunities to coalesce 5 acquires
into 1.

Doing so increases execs per second by 4% (~50k to ~52k) when running
the benchmark linked below.

The lock remains the main bottleneck, I have not looked at other spots
yet.

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -O2 -o shared-doexec doexec.c
$ ./shared-doexec $(nproc)

Note this particular test makes sure binaries are separate, but the
loader is shared.

Stats collected on the patched kernel (+ "noinline") with:
  bpftrace -e 'kprobe:unlink_file_vma_batch_process
  { @ = lhist(((struct unlink_vma_file_batch *)arg0)->count, 0, 8, 1); }'

Signed-off-by: Mateusz Guzik
---
 include/linux/mm.h |  8 ++++++++
 mm/memory.c        | 10 ++++++++--
 mm/mmap.c          | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b6bdaa18b9e9..443d0c55df80 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3272,6 +3272,11 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
 	     avc; avc = anon_vma_interval_tree_iter_next(avc, start, last))
 
 /* mmap.c */
+struct unlink_vma_file_batch {
+	int count;
+	struct vm_area_struct *vmas[8];
+};
+
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
 extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end, pgoff_t pgoff,
@@ -3281,6 +3286,9 @@ extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
 extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void unlink_file_vma(struct vm_area_struct *);
+void unlink_file_vma_batch_init(struct unlink_vma_file_batch *);
+void unlink_file_vma_batch_add(struct unlink_vma_file_batch *, struct vm_area_struct *);
+void unlink_file_vma_batch_final(struct unlink_vma_file_batch *);
 extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
 	unsigned long addr, unsigned long len, pgoff_t pgoff,
 	bool *need_rmap_locks);
diff --git a/mm/memory.c b/mm/memory.c
index 0201f50d8307..048fde0e5a8a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -363,6 +363,8 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 		   struct vm_area_struct *vma, unsigned long floor,
 		   unsigned long ceiling, bool mm_wr_locked)
 {
+	struct unlink_vma_file_batch vb;
+
 	do {
 		unsigned long addr = vma->vm_start;
 		struct vm_area_struct *next;
@@ -382,12 +384,15 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 		if (mm_wr_locked)
 			vma_start_write(vma);
 		unlink_anon_vmas(vma);
-		unlink_file_vma(vma);
 
 		if (is_vm_hugetlb_page(vma)) {
+			unlink_file_vma(vma);
 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
 		} else {
+			unlink_file_vma_batch_init(&vb);
+			unlink_file_vma_batch_add(&vb, vma);
+
 			/*
 			 * Optimization: gather nearby vmas into one call down
 			 */
@@ -400,8 +405,9 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
 				if (mm_wr_locked)
 					vma_start_write(vma);
 				unlink_anon_vmas(vma);
-				unlink_file_vma(vma);
+				unlink_file_vma_batch_add(&vb, vma);
 			}
+			unlink_file_vma_batch_final(&vb);
 			free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
 		}
diff --git a/mm/mmap.c b/mm/mmap.c
index 3490af70f259..e928401df913 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -131,6 +131,47 @@ void unlink_file_vma(struct vm_area_struct *vma)
 	}
 }
 
+void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
+{
+	vb->count = 0;
+}
+
+static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
+{
+	struct address_space *mapping;
+	int i;
+
+	mapping = vb->vmas[0]->vm_file->f_mapping;
+	i_mmap_lock_write(mapping);
+	for (i = 0; i < vb->count; i++) {
+		VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping != mapping);
+		__remove_shared_vm_struct(vb->vmas[i], mapping);
+	}
+	i_mmap_unlock_write(mapping);
+
+	unlink_file_vma_batch_init(vb);
+}
+
+void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
+			       struct vm_area_struct *vma)
+{
+	if (vma->vm_file == NULL)
+		return;
+
+	if ((vb->count > 0 && vb->vmas[0]->vm_file != vma->vm_file) ||
+	    vb->count == ARRAY_SIZE(vb->vmas))
+		unlink_file_vma_batch_process(vb);
+
+	vb->vmas[vb->count] = vma;
+	vb->count++;
+}
+
+void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb)
+{
+	if (vb->count > 0)
+		unlink_file_vma_batch_process(vb);
+}
+
 /*
  * Close a vm structure and free it.
  */
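
Illustration only, not part of the patch: a minimal userspace sketch of
the batching logic above, handy for checking how many lock round trips a
given run of mappings would cost. Everything in it is a hypothetical
stand-in: struct fake_vma, file_id (0 meaning "no backing file"),
lock_acquires, and the batch_*/..._acquires helpers are invented for the
example, and the lock itself is reduced to a counter. It models a single
init..final batch lifetime, not the surrounding free_pgtables() loop.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8	/* mirrors vmas[8] in the patch */

/* Hypothetical stand-in for vm_area_struct; file_id 0 means anonymous. */
struct fake_vma {
	int file_id;
};

struct batch {
	int count;
	const struct fake_vma *vmas[BATCH_MAX];
	int lock_acquires;	/* simulated i_mmap_lock_write() calls */
};

static void batch_init(struct batch *b)
{
	b->count = 0;
}

/* One simulated lock/unlock round trip covers the whole pending batch. */
static void batch_process(struct batch *b)
{
	b->lock_acquires++;
	batch_init(b);
}

static void batch_add(struct batch *b, const struct fake_vma *vma)
{
	if (vma->file_id == 0)
		return;		/* nothing to unlink for anonymous memory */

	/* Flush when the file changes or the array is full. */
	if ((b->count > 0 && b->vmas[0]->file_id != vma->file_id) ||
	    b->count == BATCH_MAX)
		batch_process(b);

	assert(b->count < BATCH_MAX);
	b->vmas[b->count] = vma;
	b->count++;
}

static void batch_final(struct batch *b)
{
	if (b->count > 0)
		batch_process(b);
}

/* Lock acquisitions needed when the n VMAs go through one batch. */
int batched_acquires(const struct fake_vma *vmas, size_t n)
{
	struct batch b = { .lock_acquires = 0 };
	size_t i;

	batch_init(&b);
	for (i = 0; i < n; i++)
		batch_add(&b, &vmas[i]);
	batch_final(&b);
	return b.lock_acquires;
}

/* Baseline: one acquisition per file-backed VMA, as before the patch. */
int unbatched_acquires(const struct fake_vma *vmas, size_t n)
{
	int acquires = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (vmas[i].file_id != 0)
			acquires++;
	return acquires;
}
```

With five consecutive mappings of one file, batched_acquires() returns 1
where unbatched_acquires() returns 5, which is the 5-into-1 case counted
in the histogram. A ninth mapping of the same file overflows the array
and costs a second acquisition.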