From patchwork Thu Jan 16 12:35:52 2025
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13941646
From: Mateusz Guzik
To: mjguzik@gmail.com
Cc: brauner@kernel.org, jack@suse.cz, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, tavianator@tavianator.com,
 viro@zeniv.linux.org.uk, linux-mm@kvack.org, akpm@linux-foundation.org
Subject: [PATCH] fs: avoid mmap sem relocks when coredumping with many missing pages
Date: Thu, 16 Jan 2025 13:35:52 +0100
Message-ID: <20250116123552.1846273-1-mjguzik@gmail.com>

Dumping processes with large allocated and mostly not-faulted areas is
very slow.

Borrowing a test case from Tavian Barnes:

int main(void) {
    char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
    printf("%p %m\n", mem);
    if (mem != MAP_FAILED) {
            mem[0] = 1;
    }
    abort();
}

That's 1TB of almost completely not-populated area.

On my test box it takes 13-14 seconds to dump.

The profile shows:
-   99.89%     0.00%  a.out
     entry_SYSCALL_64_after_hwframe
     do_syscall_64
     syscall_exit_to_user_mode
     arch_do_signal_or_restart
   - get_signal
      - 99.89% do_coredump
         - 99.88% elf_core_dump
            - dump_user_range
               - 98.12% get_dump_page
                  - 64.19% __get_user_pages
                     - 40.92% gup_vma_lookup
                        - find_vma
                           - mt_find
                                4.21% __rcu_read_lock
                                1.33% __rcu_read_unlock
                     - 3.14% check_vma_flags
                          0.68% vma_is_secretmem
                       0.61% __cond_resched
                       0.60% vma_pgtable_walk_end
                       0.59% vma_pgtable_walk_begin
                       0.58% no_page_table
                  - 15.13% down_read_killable
                       0.69% __cond_resched
                    13.84% up_read
                 0.58% __cond_resched

Almost 29% of the time is spent relocking the mmap semaphore between
calls to get_dump_page() which find nothing.

Whacking that results in times of 10 seconds (down from 13-14).

While here make the thing killable.

The real problem is the page-sized iteration and the real fix would
patch it up instead. It is left as an exercise for the mm-familiar
reader.

Signed-off-by: Mateusz Guzik
---

Minimally tested, very plausible I missed something.

 arch/arm64/kernel/elfcore.c |  3 ++-
 fs/coredump.c               | 38 +++++++++++++++++++++++++++++++------
 include/linux/mm.h          |  2 +-
 mm/gup.c                    |  5 ++---
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/elfcore.c b/arch/arm64/kernel/elfcore.c
index 2e94d20c4ac7..b735f4c2fe5e 100644
--- a/arch/arm64/kernel/elfcore.c
+++ b/arch/arm64/kernel/elfcore.c
@@ -27,9 +27,10 @@ static int mte_dump_tag_range(struct coredump_params *cprm,
 	int ret = 1;
 	unsigned long addr;
 	void *tags = NULL;
+	int locked = 0;
 
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
-		struct page *page = get_dump_page(addr);
+		struct page *page = get_dump_page(addr, &locked);
 
 		/*
 		 * get_dump_page() returns NULL when encountering an empty
diff --git a/fs/coredump.c b/fs/coredump.c
index d48edb37bc35..84cf76f0d5b6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -925,14 +925,23 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 {
 	unsigned long addr;
 	struct page *dump_page;
+	int locked, ret;
 
 	dump_page = dump_page_alloc();
 	if (!dump_page)
 		return 0;
 
+	ret = 0;
+	locked = 0;
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
 		struct page *page;
 
+		if (!locked) {
+			if (mmap_read_lock_killable(current->mm))
+				goto out;
+			locked = 1;
+		}
+
 		/*
 		 * To avoid having to allocate page tables for virtual address
 		 * ranges that have never been used yet, and also to make it
@@ -940,21 +949,38 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		 * NULL when encountering an empty page table entry that would
 		 * otherwise have been filled with the zero page.
 		 */
-		page = get_dump_page(addr);
+		page = get_dump_page(addr, &locked);
 		if (page) {
+			if (locked) {
+				mmap_read_unlock(current->mm);
+				locked = 0;
+			}
 			int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
 			put_page(page);
-			if (stop) {
-				dump_page_free(dump_page);
-				return 0;
-			}
+			if (stop)
+				goto out;
 		} else {
 			dump_skip(cprm, PAGE_SIZE);
 		}
+
+		if (dump_interrupted())
+			goto out;
+
+		if (!need_resched())
+			continue;
+		if (locked) {
+			mmap_read_unlock(current->mm);
+			locked = 0;
+		}
 		cond_resched();
 	}
+	ret = 1;
+out:
+	if (locked)
+		mmap_read_unlock(current->mm);
+
 	dump_page_free(dump_page);
-	return 1;
+	return ret;
 }
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 75c9b4f46897..7df0d9200d8c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2633,7 +2633,7 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
 struct kvec;
-struct page *get_dump_page(unsigned long addr);
+struct page *get_dump_page(unsigned long addr, int *locked);
 
 bool folio_mark_dirty(struct folio *folio);
 bool folio_mark_dirty_lock(struct folio *folio);
diff --git a/mm/gup.c b/mm/gup.c
index 2304175636df..f3be2aa43543 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2266,13 +2266,12 @@ EXPORT_SYMBOL(fault_in_readable);
  * Called without mmap_lock (takes and releases the mmap_lock by itself).
  */
 #ifdef CONFIG_ELF_CORE
-struct page *get_dump_page(unsigned long addr)
+struct page *get_dump_page(unsigned long addr, int *locked)
 {
 	struct page *page;
-	int locked = 0;
 	int ret;
 
-	ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
+	ret = __get_user_pages_locked(current->mm, addr, 1, &page, locked,
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	return (ret == 1) ? page : NULL;
 }
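
For illustration only, the relocking cost described in the commit message can be modelled in userspace by counting lock acquisitions. This is a rough sketch, not kernel code: the function names acquisitions_per_lookup() and acquisitions_retained() and the "present" array are hypothetical, and the model assumes the lock is dropped only when a page is found. It ignores the need_resched() drop and any case where the lookup itself releases the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the old scheme: every page lookup takes and
 * releases the lock by itself, so there is one acquisition per page
 * regardless of whether the page is present.
 */
size_t acquisitions_per_lookup(const bool *present, size_t npages)
{
	(void)present;
	return npages;
}

/*
 * Hypothetical model of the patched loop: the lock is taken only when
 * it is not already held, kept across lookups that find nothing, and
 * dropped before a found page would be emitted.
 */
size_t acquisitions_retained(const bool *present, size_t npages)
{
	size_t taken = 0;
	bool locked = false;

	assert(present != NULL || npages == 0);

	for (size_t i = 0; i < npages; i++) {
		if (!locked) {
			taken++;
			locked = true;
		}
		if (present[i])
			locked = false;	/* unlock before emitting the page */
	}
	return taken;
}
```

Under these assumptions, a range with no present pages costs a single acquisition in the second model, against one per page in the first, which is the case the test program in the commit message exercises almost exclusively.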