From patchwork Thu Apr 21 23:35:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822484 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38081C433F5 for ; Thu, 21 Apr 2022 23:35:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AA3226B0073; Thu, 21 Apr 2022 19:35:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A50EE6B0074; Thu, 21 Apr 2022 19:35:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 919B16B0075; Thu, 21 Apr 2022 19:35:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 827E26B0073 for ; Thu, 21 Apr 2022 19:35:36 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 4972620070 for ; Thu, 21 Apr 2022 23:35:36 +0000 (UTC) X-FDA: 79382495472.08.A6328A0 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf08.hostedemail.com (Postfix) with ESMTP id 05C42160014 for ; Thu, 21 Apr 2022 23:35:32 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 2FD7D61EB9; Thu, 21 Apr 2022 23:35:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 83927C385A5; Thu, 21 Apr 2022 23:35:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584134; bh=lhRBJD/80fHH3oIZnKfupQJnQk/e8LTnMOu6tI8M3WI=; 
h=Date:To:From:In-Reply-To:Subject:From;
	b=TPAb8GDUS/2bal1hjKxcfgefgJM4U7YfrbHBhuwoJBRGdubKl2Py2g3kaCnPA9T37
	 V0EV37sJ4FuTDeopW5WfJoZhz7ZU16lCaeoTlIUJbOAH2hXFsg0tJU/bVs/Q/VSKuX
	 Cj8ejA0sKnAHZsDKdlCuJu0G8dZZFdN3Qz6qPHSo=
Date: Thu, 21 Apr 2022 16:35:33 -0700
To: stable@vger.kernel.org,shy828301@gmail.com,mike.kravetz@oracle.com,linmiaohe@huawei.com,dan.carpenter@oracle.com,naoya.horiguchi@nec.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org>
Subject: [patch 01/13] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Message-Id: <20220421233534.83927C385A5@smtp.kernel.org>
X-Rspam-User:
X-Rspamd-Server: rspam03
X-Rspamd-Queue-Id: 05C42160014
X-Stat-Signature: r5keytdhisp7yoz15yycwdedyrban1i6
Authentication-Results: imf08.hostedemail.com;
	dkim=pass header.d=linux-foundation.org header.s=korg header.b=TPAb8GDU;
	dmarc=none;
	spf=pass (imf08.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
X-HE-Tag: 1650584132-568270
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Naoya Horiguchi
Subject: mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()

There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which can set the PageHWPoison flag on the wrong page.  The
simpler consequence is that the wrong processes can be killed.  The more
serious one is that the actual error is left unhandled, so nothing
prevents later access to the bad page, which can lead to worse outcomes
such as consuming corrupted data.

Consider the following race window:

  CPU 1                                   CPU 2
  memory_failure_hugetlb
  struct page *head = compound_head(p);
                                          hugetlb page might be freed to
                                          buddy, or even changed to another
                                          compound page.

  get_hwpoison_page -- page is not what we want now...

The current code does rough prechecks first and then reconfirms them after
taking a refcount, but this has turned out to make the code overly
complicated, so move the prechecks into a single hugetlb_lock range.

A newly introduced function, try_memory_failure_hugetlb(), always takes
hugetlb_lock (even for non-hugetlb pages).  That can be improved, but
memory_failure() is rare in principle, so it should not be a big problem.

Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev
Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi
Reported-by: Mike Kravetz
Reviewed-by: Miaohe Lin
Reviewed-by: Mike Kravetz
Cc: Yang Shi
Cc: Dan Carpenter
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/hugetlb.h |    6 +
 include/linux/mm.h      |    8 ++
 mm/hugetlb.c            |   10 ++
 mm/memory-failure.c     |  145 ++++++++++++++++++++++++++------------
 4 files changed, 127 insertions(+), 42 deletions(-)

--- a/include/linux/hugetlb.h~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/include/linux/hugetlb.h
@@ -169,6 +169,7 @@ long hugetlb_unreserve_pages(struct inod
 						long freed);
 bool isolate_huge_page(struct page *page, struct list_head *list);
 int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
@@ -377,6 +378,11 @@ static inline int get_hwpoison_huge_page
 {
 	return 0;
 }
+
+static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+	return 0;
+}
 
 static inline void putback_active_hugepage(struct page *page)
 {
--- a/include/linux/mm.h~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/include/linux/mm.h
@@ -3197,6 +3197,14 @@ extern int sysctl_memory_failure_recover
 extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
+#ifdef CONFIG_MEMORY_FAILURE
+extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+#else
+static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+	return 0;
+}
+#endif
 
 #ifndef arch_memory_failure
 static inline int arch_memory_failure(unsigned long pfn, int flags)
--- a/mm/hugetlb.c~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/mm/hugetlb.c
@@ -6785,6 +6785,16 @@ int get_hwpoison_huge_page(struct page *
 	return ret;
 }
 
+int get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+	int ret;
+
+	spin_lock_irq(&hugetlb_lock);
+	ret = __get_huge_page_for_hwpoison(pfn, flags);
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void putback_active_hugepage(struct page *page)
 {
 	spin_lock_irq(&hugetlb_lock);
--- a/mm/memory-failure.c~mm-hwpoison-fix-race-between-hugetlb-free-demotion-and-memory_failure_hugetlb
+++ a/mm/memory-failure.c
@@ -1498,50 +1498,113 @@ static int try_to_split_thp_page(struct
 	return 0;
 }
 
-static int memory_failure_hugetlb(unsigned long pfn, int flags)
+/*
+ * Called from hugetlb code with hugetlb_lock held.
+ *
+ * Return values:
+ *   0             - free hugepage
+ *   1             - in-use hugepage
+ *   2             - not a hugepage
+ *   -EBUSY        - the hugepage is busy (try to retry)
+ *   -EHWPOISON    - the hugepage is already hwpoisoned
+ */
+int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct page *head = compound_head(page);
+	int ret = 2;	/* fallback to normal page handling */
+	bool count_increased = false;
+
+	if (!PageHeadHuge(head))
+		goto out;
+
+	if (flags & MF_COUNT_INCREASED) {
+		ret = 1;
+		count_increased = true;
+	} else if (HPageFreed(head) || HPageMigratable(head)) {
+		ret = get_page_unless_zero(head);
+		if (ret)
+			count_increased = true;
+	} else {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (TestSetPageHWPoison(head)) {
+		ret = -EHWPOISON;
+		goto out;
+	}
+
+	return ret;
+out:
+	if (count_increased)
+		put_page(head);
+	return ret;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * Taking refcount of hugetlb pages needs extra care about race conditions
+ * with basic operations like hugepage allocation/free/demotion.
+ * So some of prechecks for hwpoison (pinning, and testing/setting
+ * PageHWPoison) should be done in single hugetlb_lock range.
+ */
+static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
 {
-	struct page *p = pfn_to_page(pfn);
-	struct page *head = compound_head(p);
 	int res;
+	struct page *p = pfn_to_page(pfn);
+	struct page *head;
 	unsigned long page_flags;
+	bool retry = true;
 
-	if (TestSetPageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: already hardware poisoned\n",
-		       pfn);
-		res = -EHWPOISON;
-		if (flags & MF_ACTION_REQUIRED)
+	*hugetlb = 1;
+retry:
+	res = get_huge_page_for_hwpoison(pfn, flags);
+	if (res == 2) { /* fallback to normal page handling */
+		*hugetlb = 0;
+		return 0;
+	} else if (res == -EHWPOISON) {
+		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+		if (flags & MF_ACTION_REQUIRED) {
+			head = compound_head(p);
 			res = kill_accessing_process(current, page_to_pfn(head), flags);
+		}
 		return res;
+	} else if (res == -EBUSY) {
+		if (retry) {
+			retry = false;
+			goto retry;
+		}
+		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
+		return res;
+	}
+
+	head = compound_head(p);
+	lock_page(head);
+
+	if (hwpoison_filter(p)) {
+		ClearPageHWPoison(head);
+		res = -EOPNOTSUPP;
+		goto out;
 	}
 
 	num_poisoned_pages_inc();
 
-	if (!(flags & MF_COUNT_INCREASED)) {
-		res = get_hwpoison_page(p, flags);
-		if (!res) {
-			lock_page(head);
-			if (hwpoison_filter(p)) {
-				if (TestClearPageHWPoison(head))
-					num_poisoned_pages_dec();
-				unlock_page(head);
-				return -EOPNOTSUPP;
-			}
-			unlock_page(head);
-			res = MF_FAILED;
-			if (__page_handle_poison(p)) {
-				page_ref_inc(p);
-				res = MF_RECOVERED;
-			}
-			action_result(pfn, MF_MSG_FREE_HUGE, res);
-			return res == MF_RECOVERED ? 0 : -EBUSY;
-		} else if (res < 0) {
-			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
-			return -EBUSY;
+	/*
+	 * Handling free hugepage.  The possible race with hugepage allocation
+	 * or demotion can be prevented by PageHWPoison flag.
+	 */
+	if (res == 0) {
+		unlock_page(head);
+		res = MF_FAILED;
+		if (__page_handle_poison(p)) {
+			page_ref_inc(p);
+			res = MF_RECOVERED;
 		}
+		action_result(pfn, MF_MSG_FREE_HUGE, res);
+		return res == MF_RECOVERED ? 0 : -EBUSY;
 	}
 
-	lock_page(head);
-
 	/*
 	 * The page could have changed compound pages due to race window.
 	 * If this happens just bail out.
@@ -1554,14 +1617,6 @@ static int memory_failure_hugetlb(unsign
 
 	page_flags = head->flags;
 
-	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(head))
-			num_poisoned_pages_dec();
-		put_page(p);
-		res = -EOPNOTSUPP;
-		goto out;
-	}
-
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1588,6 +1643,12 @@ out:
 	unlock_page(head);
 	return res;
 }
+#else
+static inline int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
+{
+	return 0;
+}
+#endif
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		struct dev_pagemap *pgmap)
@@ -1712,6 +1773,7 @@ int memory_failure(unsigned long pfn, in
 	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
+	int hugetlb = 0;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1739,10 +1801,9 @@ int memory_failure(unsigned long pfn, in
 	}
 
 try_again:
-	if (PageHuge(p)) {
-		res = memory_failure_hugetlb(pfn, flags);
+	res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
+	if (hugetlb)
 		goto unlock_mutex;
-	}
 
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",

From patchwork Thu Apr 21 23:35:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12822485
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6C671C433FE
	for ; Thu, 21 Apr
2022 23:35:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DAFC46B0074; Thu, 21 Apr 2022 19:35:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D5EA96B0075; Thu, 21 Apr 2022 19:35:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C26E86B0078; Thu, 21 Apr 2022 19:35:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id B38F46B0074 for ; Thu, 21 Apr 2022 19:35:39 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 9366B1C3 for ; Thu, 21 Apr 2022 23:35:39 +0000 (UTC) X-FDA: 79382495598.10.96720C0 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf29.hostedemail.com (Postfix) with ESMTP id E4B3B120024 for ; Thu, 21 Apr 2022 23:35:37 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3BDC661EB5; Thu, 21 Apr 2022 23:35:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9173FC385A7; Thu, 21 Apr 2022 23:35:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584137; bh=kcEzpV0lqS75awhCu4CXLE0pMhrtBAVyXflyyGq9i30=; h=Date:To:From:In-Reply-To:Subject:From; b=VBytBotogYOcouACPPtVsS6Cpog9qsZOUYkExwI+J+rrz1k1FHJVGQ7ld7YLRhjnC 2jaXCVM4FNEiJ0S6Pj/DjMdatZYuApE8TaHzEGrM8awW250b3HFEqwWkifWwEtwOz+ 7eE6znYnMgZf2LcwZ7LzoSnFe0+Bht0b4IlUQ1FU= Date: Thu, 21 Apr 2022 16:35:37 -0700 To: 
	stable@vger.kernel.org,osalvador@suse.de,naoya.horiguchi@nec.com,linmiaohe@huawei.com,anshuman.khandual@arm.com,abaci@linux.alibaba.com,xuyu@linux.alibaba.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org>
Subject: [patch 02/13] mm/memory-failure.c: skip huge_zero_page in memory_failure()
Message-Id: <20220421233537.9173FC385A7@smtp.kernel.org>
Authentication-Results: imf29.hostedemail.com;
	dkim=pass header.d=linux-foundation.org header.s=korg header.b=VBytBoto;
	dmarc=none;
	spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
X-Rspam-User:
X-Rspamd-Server: rspam08
X-Rspamd-Queue-Id: E4B3B120024
X-Stat-Signature: wizq1zjxsfdtune8d78u4is6e8e7o5ap
X-HE-Tag: 1650584137-619096
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Xu Yu
Subject: mm/memory-failure.c: skip huge_zero_page in memory_failure()

The kernel panics when a memory failure is injected for the global
huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.

[ 5.582720] Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
[ 5.583786] page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
[ 5.584900] head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
[ 5.585796] flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 5.586712] raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
[ 5.587640] raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
[ 5.588565] page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
[ 5.589398] ------------[ cut here ]------------
[ 5.589952] kernel BUG at mm/huge_memory.c:2499!
[ 5.590516] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 5.591120] CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
[ 5.591904] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
[ 5.592817] RIP: 0010:split_huge_page_to_list+0x66a/0x880
[ 5.593469] Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
[ 5.595806] RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
[ 5.596434] RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
[ 5.597322] RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
[ 5.598162] RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
[ 5.598999] R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
[ 5.599849] R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
[ 5.600693] FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
[ 5.601640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.602304] CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
[ 5.603139] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.603977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.604806] Call Trace:
[ 5.605101]
[ 5.605357] ? __irq_work_queue_local+0x39/0x70
[ 5.605904] try_to_split_thp_page+0x3a/0x130
[ 5.606430] memory_failure+0x128/0x800
[ 5.606888] madvise_inject_error.cold+0x8b/0xa1
[ 5.607444] __x64_sys_madvise+0x54/0x60
[ 5.607915] do_syscall_64+0x35/0x80
[ 5.608347] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 5.608949] RIP: 0033:0x7fc3754f8bf9
[ 5.609374] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
[ 5.611554] RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
[ 5.612441] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
[ 5.613269] RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
[ 5.614108] RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
[ 5.614946] R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
[ 5.615787] R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
[ 5.616626]

Make memory_failure() bail out explicitly for huge_zero_page before
attempting the split, so that the panic above no longer occurs.

Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu
Reported-by: Abaci
Suggested-by: Naoya Horiguchi
Acked-by: Naoya Horiguchi
Reviewed-by: Miaohe Lin
Cc: Anshuman Khandual
Cc: Oscar Salvador
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/mm/memory-failure.c~mm-memory-failurec-skip-huge_zero_page-in-memory_failure
+++ a/mm/memory-failure.c
@@ -1861,6 +1861,19 @@ try_again:
 
 	if (PageTransHuge(hpage)) {
 		/*
+		 * Bail out before SetPageHasHWPoisoned() if hpage is
+		 * huge_zero_page, although PG_has_hwpoisoned is not
+		 * checked in set_huge_zero_page().
+		 *
+		 * TODO: Handle memory failure of huge_zero_page thoroughly.
+		 */
+		if (is_huge_zero_page(hpage)) {
+			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
+			res = -EBUSY;
+			goto unlock_mutex;
+		}
+
+		/*
 		 * The flag must be set after the refcount is bumped
 		 * otherwise it may race with THP split.
 		 * And the flag can't be set in get_hwpoison_page() since

From patchwork Thu Apr 21 23:35:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12822486
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 61EA2C433EF
	for ; Thu, 21 Apr 2022 23:35:45 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id F0E546B0075; Thu, 21 Apr 2022 19:35:44 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id EBD456B0078; Thu, 21 Apr 2022 19:35:44 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id D877D6B007B; Thu, 21 Apr 2022 19:35:44 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27])
	by kanga.kvack.org (Postfix) with ESMTP id CA9E86B0075
	for ; Thu, 21 Apr 2022 19:35:44 -0400 (EDT)
Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay08.hostedemail.com (Postfix) with ESMTP id 9EFF820263
	for ; Thu, 21 Apr 2022 23:35:44 +0000 (UTC)
X-FDA: 79382495808.16.728CA63
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75])
	by imf25.hostedemail.com (Postfix) with ESMTP id 8EEACA0002
	for ; Thu, 21 Apr 2022 23:35:40 +0000 (UTC)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ams.source.kernel.org (Postfix) with ESMTPS id 2017CB829AC;
	Thu, 21 Apr 2022 23:35:42 +0000
	(UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA29DC385AB;
	Thu, 21 Apr 2022 23:35:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1650584140;
	bh=nY8N0bNibmYZKD4dAvsEBjoo8JN1MQlNzfXmO/adiPc=;
	h=Date:To:From:In-Reply-To:Subject:From;
	b=QdxtCKIMgLb90C8APyRvSx3rYIBuEGW4RqChtuDZcj0EZslFVheNMVc44iKsFsxrM
	 QLuZRCUzsH5Rv47xX7+lqmn0zA8GkIYoJV3fhUE8U8Mcvyi5cf+tYhBcqBK1ryy28Z
	 L6Rtlp7YCX0N6EHaV08IMNR44iqRg49uqbROo0hg=
Date: Thu, 21 Apr 2022 16:35:40 -0700
To: stable@vger.kernel.org,roman.gushchin@linux.dev,mkoutny@suse.com,mhocko@suse.com,ivan@cloudflare.com,hannes@cmpxchg.org,fhofmann@cloudflare.com,dqminh@cloudflare.com,shakeelb@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org>
Subject: [patch 03/13] memcg: sync flush only if periodic flush is delayed
Message-Id: <20220421233540.AA29DC385AB@smtp.kernel.org>
Authentication-Results: imf25.hostedemail.com;
	dkim=pass header.d=linux-foundation.org header.s=korg header.b=QdxtCKIM;
	spf=pass (imf25.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org;
	dmarc=none
X-Stat-Signature: bn47rdmw67p5waagbiogkn1dm85jutry
X-Rspamd-Queue-Id: 8EEACA0002
X-Rspamd-Server: rspam04
X-Rspam-User:
X-HE-Tag: 1650584140-753316
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Shakeel Butt
Subject: memcg: sync flush only if periodic flush is delayed

Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file).  The underlying issue is that flushing
rstat is expensive.
Although rstat flushes are batched with (nr_cpus * MEMCG_BATCH) stat
updates, there appear to be workloads that genuinely perform more stat
updates than the batch value within a short amount of time.  Since the
rstat flush can happen in performance-critical codepaths such as page
faults, such workloads can suffer greatly.

This patch fixes the regression by making rstat flushing conditional in
the performance-critical codepaths.  More specifically, the kernel relies
on the async periodic rstat flusher to flush the stats, and only if the
periodic flusher is delayed by more than twice its normal time window
does the kernel allow rstat flushing from the performance-critical
codepaths.

Now the question: what are the side effects of this change?  The worst
that can happen is that the refault codepath sees 4sec old lruvec stats,
which may cause false (or missed) activations of the refaulted page and
in turn under- or overestimate the workingset size.  That is not very
concerning, though, as the kernel can already miss or do false
activations.

There are two more codepaths whose flushing behavior is not changed by
this patch and which we may need to revisit in the future.  One is the
writeback stats used by dirty throttling and the second is the
deactivation heuristic in reclaim.  For now we keep an eye on them, and
if there is a report of a regression due to these codepaths, we will
reevaluate then.
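
The gating rule described above can be sketched in userspace C.  This is
an illustration under stated assumptions, not the kernel code: `sim_now`
stands in for jiffies_64, `FLUSH_PERIOD` is a hypothetical 2-tick window
standing in for the 2-second periodic interval, and `after64()`,
`periodic_flush()` and `hot_path_flush_delayed()` are names invented for
this sketch.  The wrap-safe comparison is written by hand rather than
taken from a kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical periodic flusher window, in simulated ticks. */
#define FLUSH_PERIOD 2u

static uint64_t sim_now;          /* simulated clock (assumption) */
static uint64_t flush_next_time;  /* hot paths may flush only after this */
static unsigned int flush_count;  /* number of flushes actually performed */

/* Wrap-safe "a is strictly after b" using a signed difference. */
static bool after64(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Every real flush pushes the hot-path deadline two periods out. */
static void do_flush(void)
{
	flush_next_time = sim_now + 2 * FLUSH_PERIOD;
	flush_count++;
}

/* What the periodic worker would call: it always flushes. */
static void periodic_flush(void)
{
	do_flush();
}

/*
 * What a performance-critical path would call: flush only if the
 * periodic flusher is overdue by more than twice its window.
 * Returns true when a flush was performed.
 */
static bool hot_path_flush_delayed(void)
{
	if (!after64(sim_now, flush_next_time))
		return false;
	do_flush();
	return true;
}
```

With the periodic flusher running at tick 10, the deadline becomes tick
14, so hot-path calls at ticks 12 and 14 skip the flush and a call at
tick 15 performs it.  This is why a hot path may observe stats that are
up to two periods old.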

Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1]
Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt
Reported-by: Daniel Dao
Tested-by: Ivan Babrou
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Koutný
Cc: Frank Hofmann
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/memcontrol.h |    5 +++++
 mm/memcontrol.c            |   12 +++++++++++-
 mm/workingset.c            |    2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/include/linux/memcontrol.h
@@ -1012,6 +1012,7 @@ static inline unsigned long lruvec_page_
 }
 
 void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_delayed(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1455,6 +1456,10 @@ static inline void mem_cgroup_flush_stat
 {
 }
 
+static inline void mem_cgroup_flush_stats_delayed(void)
+{
+}
+
 static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 					    enum node_stat_item idx, int val)
 {
--- a/mm/memcontrol.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/memcontrol.c
@@ -587,6 +587,9 @@ static DECLARE_DEFERRABLE_WORK(stats_flu
 static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+static u64 flush_next_time;
+
+#define FLUSH_TIME (2UL*HZ)
 
 /*
  * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
@@ -637,6 +640,7 @@ static void __mem_cgroup_flush_stats(voi
 	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
 		return;
 
+	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
 	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
 	spin_unlock_irqrestore(&stats_flush_lock, flag);
@@ -648,10 +652,16 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }
 
+void mem_cgroup_flush_stats_delayed(void)
+{
+	if (time_after64(jiffies_64, flush_next_time))
+		mem_cgroup_flush_stats();
+}
+
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	__mem_cgroup_flush_stats();
-	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
 }
 
 /**
--- a/mm/workingset.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/workingset.c
@@ -355,7 +355,7 @@ void workingset_refault(struct folio *fo
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
 
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_delayed();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if

From patchwork Thu Apr 21 23:35:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12822487
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 54569C43219
	for ; Thu, 21 Apr 2022 23:35:46 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id B65D36B0078; Thu, 21 Apr 2022 19:35:45 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id AC6676B007B; Thu, 21 Apr 2022 19:35:45 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 96B4D6B007D; Thu, 21 Apr 2022 19:35:45 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24])
	by kanga.kvack.org (Postfix) with ESMTP id 752056B0078
	for ; Thu, 21 Apr 2022 19:35:45 -0400 (EDT)
Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay10.hostedemail.com (Postfix) with ESMTP id 5599D1BF
	for ; Thu,
21 Apr 2022 23:35:45 +0000 (UTC) X-FDA: 79382495850.11.A6648E7 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf16.hostedemail.com (Postfix) with ESMTP id 663DF180026 for ; Thu, 21 Apr 2022 23:35:43 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5C0A061EBF; Thu, 21 Apr 2022 23:35:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id ADE25C385A5; Thu, 21 Apr 2022 23:35:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584143; bh=9JrhIuldGcG+bMHN5M/dS6f6EI/wfPeIxpjhrcP2TPk=; h=Date:To:From:In-Reply-To:Subject:From; b=O4bThR3t0/Ni9wo8SZHZgSbSzGynagqkRTNXiXoFsfTNjbgEzn8xTTREOQmhyrGW7 70ic6pdny/bKtdfjxPvQOIXi4Ryj+xnz9cB81Bt/HDUHeiWPFPpmstIE3C1fo0uDyA 9mOBegFHYLsBGbK8ec9sFleGkn0Sj1fXISP94nfw= Date: Thu, 21 Apr 2022 16:35:43 -0700 To: rppt@linux.vnet.ibm.com,peterx@redhat.com,axelrasmussen@google.com,aarcange@redhat.com,namit@vmware.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 04/13] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Message-Id: <20220421233543.ADE25C385A5@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 663DF180026 X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=O4bThR3t; spf=pass (imf16.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: sb98pyfdg79eyr8ruez98zbmz3cga44h X-HE-Tag: 1650584143-44281 X-Bogosity: Ham, tests=bogofilter, 
	spamicity=0.000001, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Nadav Amit
Subject: userfaultfd: mark uffd_wp regardless of VM_WRITE flag

When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has the VM_WRITE flag
set.  This seems incorrect, or at least would be unexpected by users.

Consider the following sequence of operations performed on a certain
page:

	mprotect(PROT_READ)
	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
	mprotect(PROT_READ|PROT_WRITE)

At this point the user would expect to still get a UFFD notification when
the page is accessed for write, but would not get one, since the PTE was
not marked as UFFD_WP during UFFDIO_COPY.

Fix it by always marking PTEs as UFFD_WP regardless of the write
permission in the VMA flags.

Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com
Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Signed-off-by: Nadav Amit
Acked-by: Peter Xu
Cc: Axel Rasmussen
Cc: Mike Rapoport
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
---

 mm/userfaultfd.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- a/mm/userfaultfd.c~userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag
+++ a/mm/userfaultfd.c
@@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_s
 	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
-	if (writable) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
-	}
+
+	/*
+	 * Always mark a PTE as write-protected when needed, regardless of
+	 * VM_WRITE, which the user might change.
+ */ + if (wp_copy) + _dst_pte = pte_mkuffd_wp(_dst_pte); + else if (writable) + _dst_pte = pte_mkwrite(_dst_pte); dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); From patchwork Thu Apr 21 23:35:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822488 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21607C433F5 for ; Thu, 21 Apr 2022 23:35:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AF1536B007B; Thu, 21 Apr 2022 19:35:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A9EA56B007D; Thu, 21 Apr 2022 19:35:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 918C56B007E; Thu, 21 Apr 2022 19:35:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 82C0D6B007B for ; Thu, 21 Apr 2022 19:35:48 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 636151C5 for ; Thu, 21 Apr 2022 23:35:48 +0000 (UTC) X-FDA: 79382495976.14.43ED84F Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf14.hostedemail.com (Postfix) with ESMTP id 334C1100017 for ; Thu, 21 Apr 2022 23:35:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 52B3C61EC0; Thu, 21 Apr 2022 23:35:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 16372C385A5; Thu, 21 Apr 2022 23:35:47 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584147; bh=o9AmBZ97xph/P353g6OX4XLQm6BehR5eRzobhAKu0NY=; h=Date:To:From:In-Reply-To:Subject:From; b=GzGCXdJr1ALTj67BkvEvaT15IiVvYNyw1wHHWBoMG0hw5bMJ6lVZxewdDHH4vrByY lW5edXABeh9Rt41DqI1Gy0EYfznzsexOhtEuaOORXajZvVXsHe6HVKhmQkiyyMrkOc k99vfHvz3fqMJtPHMKoqIwZpHdTwAPx/GwoYG0fU= Date: Thu, 21 Apr 2022 16:35:46 -0700 To: will.deacon@arm.com,steve.capper@arm.com,stable@vger.kernel.org,catalin.marinas@arm.com,christophe.leroy@csgroup.eu,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 05/13] mm, hugetlb: allow for "high" userspace addresses Message-Id: <20220421233547.16372C385A5@smtp.kernel.org> X-Stat-Signature: js3fyy6spun5d14hocu5z4djbf44798d X-Rspam-User: Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=GzGCXdJr; dmarc=none; spf=pass (imf14.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 334C1100017 X-HE-Tag: 1650584147-303580 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Christophe Leroy Subject: mm, hugetlb: allow for "high" userspace addresses This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). 
Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of the hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h.

If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Catalin (ARM64) said
: We should have fixed hugetlb_get_unmapped_area() as well when we added
: support for 52-bit VA. The reason for commit f6795053dac8 was to prevent
: normal mmap() from returning addresses above 48-bit by default as some
: user-space had hard assumptions about this.
:
: It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
: but I doubt anyone would notice. It's more likely that the current
: behaviour would cause issues, so I'd rather have them consistent.
:
: Basically when arm64 gained support for 52-bit addresses we did not
: want user-space calling mmap() to suddenly get such high addresses,
: otherwise we could have inadvertently broken some programs (similar
: behaviour to x86 here). Hence we added commit f6795053dac8. But we
: missed hugetlbfs which could still get such high mmap() addresses. So
: in theory that's a potential regression that should have been addressed
: at the same time as commit f6795053dac8 (and before arm64 enabled
: 52-bit addresses).
Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy Reviewed-by: Catalin Marinas Cc: Steve Capper Cc: Will Deacon Cc: [5.0.x] Signed-off-by: Andrew Morton --- fs/hugetlbfs/inode.c | 9 +++++---- include/linux/sched/mm.h | 8 ++++++++ mm/mmap.c | 8 -------- 3 files changed, 13 insertions(+), 12 deletions(-) --- a/fs/hugetlbfs/inode.c~mm-hugetlbfs-allow-for-high-userspace-addresses +++ a/fs/hugetlbfs/inode.c @@ -206,7 +206,7 @@ hugetlb_get_unmapped_area_bottomup(struc info.flags = 0; info.length = len; info.low_limit = current->mm->mmap_base; - info.high_limit = TASK_SIZE; + info.high_limit = arch_get_mmap_end(addr); info.align_mask = PAGE_MASK & ~huge_page_mask(h); info.align_offset = 0; return vm_unmapped_area(&info); @@ -222,7 +222,7 @@ hugetlb_get_unmapped_area_topdown(struct info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; info.low_limit = max(PAGE_SIZE, mmap_min_addr); - info.high_limit = current->mm->mmap_base; + info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base); info.align_mask = PAGE_MASK & ~huge_page_mask(h); info.align_offset = 0; addr = vm_unmapped_area(&info); @@ -237,7 +237,7 @@ hugetlb_get_unmapped_area_topdown(struct VM_BUG_ON(addr != -ENOMEM); info.flags = 0; info.low_limit = current->mm->mmap_base; - info.high_limit = TASK_SIZE; + info.high_limit = arch_get_mmap_end(addr); addr = vm_unmapped_area(&info); } @@ -251,6 +251,7 @@ hugetlb_get_unmapped_area(struct file *f struct mm_struct *mm = current->mm; struct vm_area_struct *vma; struct hstate *h = hstate_file(file); + const unsigned long mmap_end = arch_get_mmap_end(addr); if (len & ~huge_page_mask(h)) return -EINVAL; @@ -266,7 +267,7 @@ hugetlb_get_unmapped_area(struct file *f if (addr) { addr = ALIGN(addr, huge_page_size(h)); vma = find_vma(mm, addr); - if (TASK_SIZE - len >= addr && + if (mmap_end - len >= 
addr && (!vma || addr + len <= vm_start_gap(vma))) return addr; } --- a/include/linux/sched/mm.h~mm-hugetlbfs-allow-for-high-userspace-addresses +++ a/include/linux/sched/mm.h @@ -136,6 +136,14 @@ static inline void mm_update_next_owner( #endif /* CONFIG_MEMCG */ #ifdef CONFIG_MMU +#ifndef arch_get_mmap_end +#define arch_get_mmap_end(addr) (TASK_SIZE) +#endif + +#ifndef arch_get_mmap_base +#define arch_get_mmap_base(addr, base) (base) +#endif + extern void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack); extern unsigned long --- a/mm/mmap.c~mm-hugetlbfs-allow-for-high-userspace-addresses +++ a/mm/mmap.c @@ -2117,14 +2117,6 @@ unsigned long vm_unmapped_area(struct vm return addr; } -#ifndef arch_get_mmap_end -#define arch_get_mmap_end(addr) (TASK_SIZE) -#endif - -#ifndef arch_get_mmap_base -#define arch_get_mmap_base(addr, base) (base) -#endif - /* Get an address range which is currently unmapped. * For shmat() with addr=0. * From patchwork Thu Apr 21 23:35:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5B2DC433F5 for ; Thu, 21 Apr 2022 23:35:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7E2DA6B0071; Thu, 21 Apr 2022 19:35:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 790C46B007D; Thu, 21 Apr 2022 19:35:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 682DE6B007E; Thu, 21 Apr 2022 19:35:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 59EEC6B0071 for ; Thu, 21 Apr 2022 19:35:53 
-0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 35261120148 for ; Thu, 21 Apr 2022 23:35:53 +0000 (UTC) X-FDA: 79382496186.27.0D65756 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf04.hostedemail.com (Postfix) with ESMTP id 8F1874001A for ; Thu, 21 Apr 2022 23:35:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 479D6B82936; Thu, 21 Apr 2022 23:35:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F2D34C385A7; Thu, 21 Apr 2022 23:35:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584150; bh=jEX6oGkjPnokPyiiCeNv+6w1fLQkDgMbOZS5HSm4XnA=; h=Date:To:From:In-Reply-To:Subject:From; b=ihS8SOPygXy8lxXmcwfZZdl/YraysyI7OGe8Wf1G+4/nXTsWqNIgqSWmFen564aJH TTlD4IJ46Rd1SYmPVdaJ1aEUoRN4sWtkGHg+T7z8Oc+pdlHO6QTW9p4pfJQj2dDIO7 xjgYrAY0iOCD3CEFIsfsFIewFkvtJJrB51G+Gvk8= Date: Thu, 21 Apr 2022 16:35:49 -0700 To: skhan@linuxfoundation.org,sidhartha.kumar@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 06/13] selftest/vm: verify mmap addr in mremap_test Message-Id: <20220421233549.F2D34C385A7@smtp.kernel.org> Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=ihS8SOPy; dmarc=none; spf=pass (imf04.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 8F1874001A X-Stat-Signature: 
pardbuf5x6sqa6paw59b4f9rjtuwjrm9 X-HE-Tag: 1650584150-244157 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sidhartha Kumar Subject: selftest/vm: verify mmap addr in mremap_test Avoid calling mmap with requested addresses that are less than the system's mmap_min_addr. When run as root, mmap returns EACCES when trying to map addresses < mmap_min_addr. This is not one of the error codes for the condition to retry the mmap in the test. Rather than arbitrarily retrying on EACCES, don't attempt an mmap until addr > vm.mmap_min_addr. Add a munmap call after an alignment check as the mappings are retained after the retry and can reach the vm.max_map_count sysctl. Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar Reviewed-by: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/mremap_test.c | 41 ++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) --- a/tools/testing/selftests/vm/mremap_test.c~selftest-vm-verify-mmap-addr-in-mremap_test +++ a/tools/testing/selftests/vm/mremap_test.c @@ -6,6 +6,7 @@ #include #include +#include #include #include #include @@ -63,6 +64,35 @@ enum { .expect_failure = should_fail \ } +/* Returns mmap_min_addr sysctl tunable from procfs */ +static unsigned long long get_mmap_min_addr(void) +{ + FILE *fp; + int n_matched; + static unsigned long long addr; + + if (addr) + return addr; + + fp = fopen("/proc/sys/vm/mmap_min_addr", "r"); + if (fp == NULL) { + ksft_print_msg("Failed to open /proc/sys/vm/mmap_min_addr: %s\n", + strerror(errno)); + exit(KSFT_SKIP); + } + + n_matched = fscanf(fp, "%llu", &addr); + if (n_matched != 1) { + ksft_print_msg("Failed to read /proc/sys/vm/mmap_min_addr: %s\n", + strerror(errno)); + fclose(fp); + exit(KSFT_SKIP); + } + + fclose(fp); + return addr; +} + /* * Returns the start address of the mapping 
on success, else returns * NULL on failure. @@ -71,8 +101,15 @@ static void *get_source_mapping(struct c { unsigned long long addr = 0ULL; void *src_addr = NULL; + unsigned long long mmap_min_addr; + + mmap_min_addr = get_mmap_min_addr(); + retry: addr += c.src_alignment; + if (addr < mmap_min_addr) + goto retry; + src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE, MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, -1, 0); @@ -90,8 +127,10 @@ retry: * alignment in the tests. */ if (((unsigned long long) src_addr & (c.src_alignment - 1)) || - !((unsigned long long) src_addr & c.src_alignment)) + !((unsigned long long) src_addr & c.src_alignment)) { + munmap(src_addr, c.region_size); goto retry; + } if (!src_addr) goto error; From patchwork Thu Apr 21 23:35:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822490 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F603C433EF for ; Thu, 21 Apr 2022 23:35:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B18616B007D; Thu, 21 Apr 2022 19:35:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AC6906B007E; Thu, 21 Apr 2022 19:35:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 98DAC6B0080; Thu, 21 Apr 2022 19:35:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 8983A6B007D for ; Thu, 21 Apr 2022 19:35:56 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 5D7BC229F0 for ; Thu, 21 Apr 2022 23:35:56 +0000 (UTC) X-FDA: 
79382496312.04.209AFE8 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf21.hostedemail.com (Postfix) with ESMTP id 3906F1C0024 for ; Thu, 21 Apr 2022 23:35:54 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 52982B829AA; Thu, 21 Apr 2022 23:35:54 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E24B5C385A7; Thu, 21 Apr 2022 23:35:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584153; bh=rlWd4GU0TXuFAdVrmoDjHHyTBnnvfudPC1YHq9S16ME=; h=Date:To:From:In-Reply-To:Subject:From; b=TOFwfCFTmU6WrBcbhovLgtQNZWBTFVEbu/WxV+i+u1wBHVhWtsBY85Yke79pr+qgx QAtxDXnV7KwUsbC1rhc8ib4YbQBBDsKSHVU3XcaZSMDJOEKi2UPWiR9yJ3f+YgGtFe Ih7xjUpD8dHvq/NGn43Uk7O/LE4SEa8GeSOQ6Mxk= Date: Thu, 21 Apr 2022 16:35:52 -0700 To: skhan@linuxfoundation.org,sidhartha.kumar@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 07/13] selftest/vm: verify remap destination address in mremap_test Message-Id: <20220421233552.E24B5C385A7@smtp.kernel.org> X-Stat-Signature: j4xahdcf668oq67tgkayjw7bam61i48k X-Rspam-User: Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=TOFwfCFT; dmarc=none; spf=pass (imf21.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 3906F1C0024 X-HE-Tag: 1650584154-641614 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: 
owner-majordomo@kvack.org List-ID: From: Sidhartha Kumar Subject: selftest/vm: verify remap destination address in mremap_test Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy existing mappings. This causes a segfault when regions such as text are remapped and the permissions are changed. Verify the requested mremap destination address does not overlap any existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep incrementing the destination address until a valid mapping is found or fail the current test once the max address is reached. Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar Reviewed-by: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/mremap_test.c | 42 +++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-) --- a/tools/testing/selftests/vm/mremap_test.c~selftest-vm-verify-remap-destination-address-in-mremap_test +++ a/tools/testing/selftests/vm/mremap_test.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "../kselftest.h" @@ -64,6 +65,30 @@ enum { .expect_failure = should_fail \ } +/* + * Returns false if the requested remap region overlaps with an + * existing mapping (e.g text, stack) else returns true. 
+ */ +static bool is_remap_region_valid(void *addr, unsigned long long size) +{ + void *remap_addr = NULL; + bool ret = true; + + /* Use MAP_FIXED_NOREPLACE flag to ensure region is not mapped */ + remap_addr = mmap(addr, size, PROT_READ | PROT_WRITE, + MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, + -1, 0); + + if (remap_addr == MAP_FAILED) { + if (errno == EEXIST) + ret = false; + } else { + munmap(remap_addr, size); + } + + return ret; +} + /* Returns mmap_min_addr sysctl tunable from procfs */ static unsigned long long get_mmap_min_addr(void) { @@ -111,8 +136,8 @@ retry: goto retry; src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE, - MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, - -1, 0); + MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED, + -1, 0); if (src_addr == MAP_FAILED) { if (errno == EPERM || errno == EEXIST) goto retry; @@ -179,9 +204,20 @@ static long long remap_region(struct con if (!((unsigned long long) addr & c.dest_alignment)) addr = (void *) ((unsigned long long) addr | c.dest_alignment); + /* Don't destroy existing mappings unless expected to overlap */ + while (!is_remap_region_valid(addr, c.region_size) && !c.overlapping) { + /* Check for unsigned overflow */ + if (addr + c.dest_alignment < addr) { + ksft_print_msg("Couldn't find a valid region to remap to\n"); + ret = -1; + goto out; + } + addr += c.dest_alignment; + } + clock_gettime(CLOCK_MONOTONIC, &t_start); dest_addr = mremap(src_addr, c.region_size, c.region_size, - MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr); + MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr); clock_gettime(CLOCK_MONOTONIC, &t_end); if (dest_addr == MAP_FAILED) { From patchwork Thu Apr 21 23:35:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822491 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78060C43219 for ; Thu, 21 Apr 2022 23:35:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B98E6B007E; Thu, 21 Apr 2022 19:35:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 066C36B0080; Thu, 21 Apr 2022 19:35:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DFCF76B0081; Thu, 21 Apr 2022 19:35:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id D15966B007E for ; Thu, 21 Apr 2022 19:35:57 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id A12A812014A for ; Thu, 21 Apr 2022 23:35:57 +0000 (UTC) X-FDA: 79382496354.10.E8AA84C Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf16.hostedemail.com (Postfix) with ESMTP id AF64918001B for ; Thu, 21 Apr 2022 23:35:55 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7846961EC1; Thu, 21 Apr 2022 23:35:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB9CAC385A7; Thu, 21 Apr 2022 23:35:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584155; bh=0saNveK0MxhvPf4d1SYMocPo57hTU9uX4KMm4m3Ue5Q=; h=Date:To:From:In-Reply-To:Subject:From; b=mUUWIIE/WDFo1ABXAeSqZ/ITgxk8haRigHVnbxYyyEsxrwo6musD4bxr2KFwUIgMZ 3Nuk7fG5qJ+HaZq2spFaWHMBLY6PtMP1pVH5XfK1lzAoXRY8kDpNDZzqQxYNApoLxX E4U2I1gA6vQJIwkJfHHACmXHsykPXWe6HeOOFs6E= Date: Thu, 21 Apr 2022 16:35:55 -0700 To: 
skhan@linuxfoundation.org,sidhartha.kumar@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 08/13] selftest/vm: support xfail in mremap_test Message-Id: <20220421233555.CB9CAC385A7@smtp.kernel.org> X-Stat-Signature: w94gj3jyiyhjyosbh8rg3uefmd86brnf X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="mUUWIIE/"; dmarc=none; spf=pass (imf16.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: AF64918001B X-HE-Tag: 1650584155-350086 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000002, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sidhartha Kumar Subject: selftest/vm: support xfail in mremap_test Use ksft_test_result_xfail for the tests which are expected to fail. 
Link: https://lkml.kernel.org/r/20220420215721.4868-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar Reviewed-by: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/mremap_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/tools/testing/selftests/vm/mremap_test.c~selftest-vm-support-xfail-in-mremap_test +++ a/tools/testing/selftests/vm/mremap_test.c @@ -268,7 +268,7 @@ static void run_mremap_test_case(struct if (remap_time < 0) { if (test_case.expect_failure) - ksft_test_result_pass("%s\n\tExpected mremap failure\n", + ksft_test_result_xfail("%s\n\tExpected mremap failure\n", test_case.name); else { ksft_test_result_fail("%s\n", test_case.name); From patchwork Thu Apr 21 23:35:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822492 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B029C433EF for ; Thu, 21 Apr 2022 23:36:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1A09E6B0080; Thu, 21 Apr 2022 19:36:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 150936B0081; Thu, 21 Apr 2022 19:36:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 018216B0082; Thu, 21 Apr 2022 19:36:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id E6CCD6B0080 for ; Thu, 21 Apr 2022 19:36:01 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id B2F9219F for ; Thu, 21 Apr 2022 23:36:01 +0000 (UTC) X-FDA: 79382496522.06.98965FE Received: from ams.source.kernel.org 
(ams.source.kernel.org [145.40.68.75]) by imf06.hostedemail.com (Postfix) with ESMTP id 67828180008 for ; Thu, 21 Apr 2022 23:36:00 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 175EDB829A3; Thu, 21 Apr 2022 23:36:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id AFD30C385A7; Thu, 21 Apr 2022 23:35:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584158; bh=+PTbRr32hR/2e7Qt1ov+7DavlYZYTFl1RC97L9AIeqc=; h=Date:To:From:In-Reply-To:Subject:From; b=u8qnaDVtuV/1xHggVMild2b9lYpUk58aKt5nTvjiacQEB7UYlTVS0nFWdWbDsT+la Uqsacaowh+GIjaW3OIdD550W/m38H/OOceUTzQYD1IEV3iaS7d1yyb08Tu2xC7uAUM 4V5jQiOLeoBiCNcoxUQ2LUPpGaycTPoqiRlHo6kA= Date: Thu, 21 Apr 2022 16:35:58 -0700 To: skhan@linuxfoundation.org,sidhartha.kumar@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 09/13] selftest/vm: add skip support to mremap_test Message-Id: <20220421233558.AFD30C385A7@smtp.kernel.org> Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=u8qnaDVt; spf=pass (imf06.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: fgp5134nkeejq8ze9ua89f3uwinnpag3 X-Rspamd-Queue-Id: 67828180008 X-Rspamd-Server: rspam04 X-Rspam-User: X-HE-Tag: 1650584160-952584 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sidhartha Kumar Subject: selftest/vm: 
add skip support to mremap_test Allow the mremap test to be skipped due to errors such as failing to parse the mmap_min_addr sysctl. Link: https://lkml.kernel.org/r/20220420215721.4868-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar Reviewed-by: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/run_vmtests.sh | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) --- a/tools/testing/selftests/vm/run_vmtests.sh~selftest-vm-add-skip-support-to-mremap_test +++ a/tools/testing/selftests/vm/run_vmtests.sh @@ -291,11 +291,16 @@ echo "-------------------" echo "running mremap_test" echo "-------------------" ./mremap_test -if [ $? -ne 0 ]; then +ret_val=$? + +if [ $ret_val -eq 0 ]; then + echo "[PASS]" +elif [ $ret_val -eq $ksft_skip ]; then + echo "[SKIP]" + exitcode=$ksft_skip +else echo "[FAIL]" exitcode=1 -else - echo "[PASS]" fi echo "-----------------" From patchwork Thu Apr 21 23:36:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822493 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C65E6C433EF for ; Thu, 21 Apr 2022 23:36:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 598C46B0081; Thu, 21 Apr 2022 19:36:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 548056B0082; Thu, 21 Apr 2022 19:36:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3E8536B0083; Thu, 21 Apr 2022 19:36:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 315156B0081 for ; Thu, 21 Apr 2022 19:36:05 -0400 (EDT) Received: from smtpin30.hostedemail.com 
(a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id 035086072B for ; Thu, 21 Apr 2022 23:36:04 +0000 (UTC) X-FDA: 79382496690.30.21C23DD Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf26.hostedemail.com (Postfix) with ESMTP id E49CE140012 for ; Thu, 21 Apr 2022 23:36:02 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3B023B829A3; Thu, 21 Apr 2022 23:36:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2D31C385A5; Thu, 21 Apr 2022 23:36:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584162; bh=77vg5Iz2wUpAfeZuamuPjE96MamAPIkQNqzjGYJf6Zk=; h=Date:To:From:In-Reply-To:Subject:From; b=eMVt6r3yLd17FYKkKHu55GWiQI66lDfl9cciKmdmaRrwxbZz0n1cV8NFdGRa3rBRN nOMrMOxbSi/u6XktKro8P+kuStPHM5DCnuid7CILsBl5oJKKzdgD8nqK6xobf71NiA sTkpeWnBUekdHyMuqdWhyT56bHPpWQQN4naCPpTY= Date: Thu, 21 Apr 2022 16:36:01 -0700 To: vincent.guittot@linaro.org,tglx@linutronix.de,stable@vger.kernel.org,rostedt@goodmis.org,rientjes@google.com,peterz@infradead.org,mingo@redhat.com,mhocko@suse.com,mgorman@suse.de,longman@redhat.com,juri.lelli@redhat.com,jsavitz@redhat.com,herton@redhat.com,dvhart@infradead.org,dietmar.eggemann@arm.com,dave@stgolabs.net,bsegall@google.com,bristot@redhat.com,aquini@redhat.com,aarcange@redhat.com,npache@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 10/13] oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup Message-Id: <20220421233601.E2D31C385A5@smtp.kernel.org> Authentication-Results: 
imf26.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=eMVt6r3y; spf=pass (imf26.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: spir5768f8mz1pa4gko63x6fowim9zbu X-Rspamd-Queue-Id: E49CE140012 X-Rspamd-Server: rspam04 X-Rspam-User: X-HE-Tag: 1650584162-173260 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: Nico Pache
Subject: oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death.

A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death:

    CPU1                               CPU2
    ------------------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Based on a patch by Michal Hocko.
[1] https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz Signed-off-by: Nico Pache Co-developed-by: Joel Savitz Suggested-by: Thomas Gleixner Acked-by: Thomas Gleixner Acked-by: Michal Hocko Cc: Rafael Aquini Cc: Waiman Long Cc: Herton R. Krzesinski Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: David Rientjes Cc: Andrea Arcangeli Cc: Davidlohr Bueso Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Joel Savitz Cc: Darren Hart Cc: Signed-off-by: Andrew Morton --- include/linux/sched.h | 1 mm/oom_kill.c | 54 +++++++++++++++++++++++++++++----------- 2 files changed, 41 insertions(+), 14 deletions(-) --- a/include/linux/sched.h~oom_killc-futex-delay-the-oom-reaper-to-allow-time-for-proper-futex-cleanup +++ a/include/linux/sched.h @@ -1443,6 +1443,7 @@ struct task_struct { int pagefault_disabled; #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; + struct timer_list oom_reaper_timer; #endif #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; --- a/mm/oom_kill.c~oom_killc-futex-delay-the-oom-reaper-to-allow-time-for-proper-futex-cleanup +++ a/mm/oom_kill.c @@ -632,7 +632,7 @@ done: */ set_bit(MMF_OOM_SKIP, &mm->flags); - /* Drop a reference taken by wake_oom_reaper */ + /* Drop a reference taken by queue_oom_reaper */ put_task_struct(tsk); } @@ -644,12 +644,12 @@ static int oom_reaper(void *unused) struct task_struct *tsk = NULL; wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL); - spin_lock(&oom_reaper_lock); + spin_lock_irq(&oom_reaper_lock); if (oom_reaper_list != NULL) { tsk = oom_reaper_list; oom_reaper_list = tsk->oom_reaper_list; } - spin_unlock(&oom_reaper_lock); + spin_unlock_irq(&oom_reaper_lock); if (tsk) oom_reap_task(tsk); @@ -658,22 +658,48 @@ static 
int oom_reaper(void *unused) return 0; } -static void wake_oom_reaper(struct task_struct *tsk) +static void wake_oom_reaper(struct timer_list *timer) { - /* mm is already queued? */ - if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) + struct task_struct *tsk = container_of(timer, struct task_struct, + oom_reaper_timer); + struct mm_struct *mm = tsk->signal->oom_mm; + unsigned long flags; + + /* The victim managed to terminate on its own - see exit_mmap */ + if (test_bit(MMF_OOM_SKIP, &mm->flags)) { + put_task_struct(tsk); return; + } - get_task_struct(tsk); - - spin_lock(&oom_reaper_lock); + spin_lock_irqsave(&oom_reaper_lock, flags); tsk->oom_reaper_list = oom_reaper_list; oom_reaper_list = tsk; - spin_unlock(&oom_reaper_lock); + spin_unlock_irqrestore(&oom_reaper_lock, flags); trace_wake_reaper(tsk->pid); wake_up(&oom_reaper_wait); } +/* + * Give the OOM victim time to exit naturally before invoking the oom_reaping. + * The timers timeout is arbitrary... the longer it is, the longer the worst + * case scenario for the OOM can take. If it is too small, the oom_reaper can + * get in the way and release resources needed by the process exit path. + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped + * before the exit path is able to wake the futex waiters. + */ +#define OOM_REAPER_DELAY (2*HZ) +static void queue_oom_reaper(struct task_struct *tsk) +{ + /* mm is already queued? 
*/ + if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) + return; + + get_task_struct(tsk); + timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); + tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; + add_timer(&tsk->oom_reaper_timer); +} + static int __init oom_init(void) { oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper"); @@ -681,7 +707,7 @@ static int __init oom_init(void) } subsys_initcall(oom_init) #else -static inline void wake_oom_reaper(struct task_struct *tsk) +static inline void queue_oom_reaper(struct task_struct *tsk) { } #endif /* CONFIG_MMU */ @@ -932,7 +958,7 @@ static void __oom_kill_process(struct ta rcu_read_unlock(); if (can_oom_reap) - wake_oom_reaper(victim); + queue_oom_reaper(victim); mmdrop(mm); put_task_struct(victim); @@ -968,7 +994,7 @@ static void oom_kill_process(struct oom_ task_lock(victim); if (task_will_free_mem(victim)) { mark_oom_victim(victim); - wake_oom_reaper(victim); + queue_oom_reaper(victim); task_unlock(victim); put_task_struct(victim); return; @@ -1067,7 +1093,7 @@ bool out_of_memory(struct oom_control *o */ if (task_will_free_mem(current)) { mark_oom_victim(current); - wake_oom_reaper(current); + queue_oom_reaper(current); return true; } From patchwork Thu Apr 21 23:36:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822495 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEC38C4332F for ; Thu, 21 Apr 2022 23:36:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 43B896B0083; Thu, 21 Apr 2022 19:36:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3E96E6B0085; Thu, 21 Apr 2022 19:36:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by 
kanga.kvack.org (Postfix, from userid 63042) id 2B1576B0087; Thu, 21 Apr 2022 19:36:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 13EFF6B0083 for ; Thu, 21 Apr 2022 19:36:11 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id EEF2122A15 for ; Thu, 21 Apr 2022 23:36:07 +0000 (UTC) X-FDA: 79382496774.03.360DF0D Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf12.hostedemail.com (Postfix) with ESMTP id B5C444000F for ; Thu, 21 Apr 2022 23:36:03 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5234DB829AA; Thu, 21 Apr 2022 23:36:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DB9B0C385A7; Thu, 21 Apr 2022 23:36:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584165; bh=MzcMSe+kMEsRMZ4YRLUrtJdcODmvPwMv63+fwiOkfm4=; h=Date:To:From:In-Reply-To:Subject:From; b=Fm9Xl7FU3ieQ2fFOI6Rwf8+JMlYE3FSYgXzBgmYgizSwEUueeig/V0/W6u5jF+j9t Zd7wxPHbBcqKLvWybXT1KbdqzDqlcm3vaLsNAXEX//ONo7HsrQafXx8MM7rynAsUAJ kpIYvcSsI4eK7rBETIJt25icgCVMOKuUSS+G4TKo= Date: Thu, 21 Apr 2022 16:36:04 -0700 To: ryabinin.a.a@gmail.com,glider@google.com,dvyukov@google.com,andreyknvl@gmail.com,vincenzo.frascino@arm.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 11/13] MAINTAINERS: add Vincenzo Frascino to KASAN reviewers Message-Id: <20220421233604.DB9B0C385A7@smtp.kernel.org> X-Rspam-User: 
X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: B5C444000F X-Stat-Signature: fxmi9ofqikpzrtbfdgwd8ybs16xnuddd Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Fm9Xl7FU; dmarc=none; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1650584163-318223 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000006, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Vincenzo Frascino Subject: MAINTAINERS: add Vincenzo Frascino to KASAN reviewers Add my email address to KASAN reviewers list to make sure that I am Cc'ed in all the KASAN changes that may affect arm64 MTE. Link: https://lkml.kernel.org/r/20220419170640.21404-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino Cc: Andrey Ryabinin Cc: Andrey Konovalov Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) --- a/MAINTAINERS~maintainers-add-vincenzo-frascino-to-kasan-reviewers +++ a/MAINTAINERS @@ -10549,6 +10549,7 @@ M: Andrey Ryabinin R: Andrey Konovalov R: Dmitry Vyukov +R: Vincenzo Frascino L: kasan-dev@googlegroups.com S: Maintained F: Documentation/dev-tools/kasan.rst From patchwork Thu Apr 21 23:36:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822494 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54FF1C433F5 for ; Thu, 21 Apr 2022 23:36:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DFD626B0082; Thu, 21 Apr 2022 19:36:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DAE706B0083; Thu, 21 
Apr 2022 19:36:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C73EF6B0085; Thu, 21 Apr 2022 19:36:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id B8E796B0082 for ; Thu, 21 Apr 2022 19:36:09 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8F1BD60279 for ; Thu, 21 Apr 2022 23:36:09 +0000 (UTC) X-FDA: 79382496858.01.CDBFAFC Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf21.hostedemail.com (Postfix) with ESMTP id A617F1C0024 for ; Thu, 21 Apr 2022 23:36:07 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9062161EB9; Thu, 21 Apr 2022 23:36:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E7E09C385A7; Thu, 21 Apr 2022 23:36:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584168; bh=IeHWkgVqOFZYBcD3UCEb6fsykXZrS6zVdkxAwURYZ8o=; h=Date:To:From:In-Reply-To:Subject:From; b=hWKhkPcddd81NiKFF2uKb+B8tduVIelemb/cNJk9mwRYCh7JZ81wSHbwyjP6Jjtq6 ER+TqT30mv4XUjKpRX4TjCGQtqxFkeC1cxQjeFp3uC1Afo4sLzBysb42faNHyq6G0f dKj6KhalQPdGZpNs/3c2hmkDTsCh9HbKpZgkbvdA= Date: Thu, 21 Apr 2022 16:36:07 -0700 To: tarasmadan@google.com,glider@google.com,elver@google.com,dvyukov@google.com,bigeasy@linutronix.de,andreyknvl@gmail.com,nogikh@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 12/13] kcov: don't 
generate a warning on vm_insert_page()'s failure Message-Id: <20220421233607.E7E09C385A7@smtp.kernel.org> X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: A617F1C0024 X-Stat-Signature: jt114m85asn9mbsec3qzyht4f7setrhm Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=hWKhkPcd; dmarc=none; spf=pass (imf21.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1650584167-140531 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Aleksandr Nogikh Subject: kcov: don't generate a warning on vm_insert_page()'s failure vm_insert_page()'s failure is not an unexpected condition, so don't do WARN_ONCE() in such a case. Instead, print a kernel message and just return an error code. This flaw has been reported under an OOM condition by syzbot (https://lkml.kernel.org/r/Ylkr2xrVbhQYwNLf@elver.google.com). The message is mainly for the benefit of the test log, in this case the fuzzer's log so that humans inspecting the log can figure out what was going on. KCOV is a testing tool, so I think being a little more chatty when KCOV unexpectedly is about to fail will save someone debugging time. We don't want the WARN, because it's not a kernel bug that syzbot should report, and failure can happen if the fuzzer tries hard enough (as above).
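As a rough userspace illustration of the behaviour described above (log once, then return the error rather than warn and carry on), the loop might look like the sketch below. The names `map_area()`, `insert_fn` and `fail_from()` are hypothetical, and `fprintf()` to stderr only stands in for the kernel's one-shot warning message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback standing in for vm_insert_page():
 * returns 0 on success or a negative errno-style value on failure. */
typedef int (*insert_fn)(size_t off, void *ctx);

/* Map an area page by page; stop at the first failure and report it. */
static int map_area(size_t size, size_t page_size, insert_fn insert, void *ctx)
{
    static int warned;     /* gives the message print-once behaviour */
    size_t off;
    int res;

    assert(page_size != 0);    /* a zero step would never terminate */
    for (off = 0; off < size; off += page_size) {
        res = insert(off, ctx);
        if (res) {
            if (!warned) {
                warned = 1;
                fprintf(stderr, "kcov: vm_insert_page() failed\n");
            }
            return res;    /* propagate the error to the caller */
        }
    }
    return 0;
}

/* Test double: fails with -ENOMEM once the offset reaches *ctx. */
static int fail_from(size_t off, void *ctx)
{
    size_t limit = *(size_t *)ctx;

    return off >= limit ? -ENOMEM : 0;
}
```

With this shape the caller sees the failure as an ordinary return value, which is what lets a fuzzer-induced out-of-memory condition surface as a failed call instead of a reportable warning.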
Link: https://lkml.kernel.org/r/20220401182512.249282-1-nogikh@google.com Fixes: b3d7fe86fbd0 ("kcov: properly handle subsequent mmap calls"), Signed-off-by: Aleksandr Nogikh Acked-by: Marco Elver Cc: Dmitry Vyukov Cc: Andrey Konovalov Cc: Alexander Potapenko Cc: Taras Madan Cc: Sebastian Andrzej Siewior Signed-off-by: Andrew Morton --- kernel/kcov.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- a/kernel/kcov.c~kcov-dont-generate-a-warning-on-vm_insert_pages-failure +++ a/kernel/kcov.c @@ -475,8 +475,11 @@ static int kcov_mmap(struct file *filep, vma->vm_flags |= VM_DONTEXPAND; for (off = 0; off < size; off += PAGE_SIZE) { page = vmalloc_to_page(kcov->area + off); - if (vm_insert_page(vma, vma->vm_start + off, page)) - WARN_ONCE(1, "vm_insert_page() failed"); + res = vm_insert_page(vma, vma->vm_start + off, page); + if (res) { + pr_warn_once("kcov: vm_insert_page() failed\n"); + return res; + } } return 0; exit: From patchwork Thu Apr 21 23:36:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12822496 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB684C433F5 for ; Thu, 21 Apr 2022 23:36:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3C3D46B0085; Thu, 21 Apr 2022 19:36:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 34E516B0087; Thu, 21 Apr 2022 19:36:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 213256B0088; Thu, 21 Apr 2022 19:36:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 0F6FF6B0085 for ; Thu, 21 Apr 2022 19:36:13 -0400 (EDT) 
Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id DC67F60749 for ; Thu, 21 Apr 2022 23:36:12 +0000 (UTC) X-FDA: 79382496984.30.BEF29C9 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf23.hostedemail.com (Postfix) with ESMTP id B1A3714002A for ; Thu, 21 Apr 2022 23:36:09 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 94C1A61EB9; Thu, 21 Apr 2022 23:36:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E9EFDC385AB; Thu, 21 Apr 2022 23:36:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1650584171; bh=Rz872EwhpkJz/H2MaVr7yZn1UCITr1zMVSOzm3WIcy8=; h=Date:To:From:In-Reply-To:Subject:From; b=mvbYeQaV5E6rrgsDY1NUacWjwhOxK4oaLqhjNg/ovRsVV64CKNx/ir2NYp8iJvYVM Gx3SNERDUHQFxuI8lvvqleC24ZIKx2VuLiHOHNbftA2EYlBYVaf+gsV5+5aqpNjRS+ F1s/kyWRat5PpPZ4y+zEeIdfDjqXaXi1mUZ0jLjw= Date: Thu, 21 Apr 2022 16:36:10 -0700 To: stable@vger.kernel.org,rcampbell@nvidia.com,jhubbard@nvidia.com,jgg@nvidia.com,christian.koenig@amd.com,apopple@nvidia.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220421163508.66028a9ac2d9fb6ea05b1342@linux-foundation.org> Subject: [patch 13/13] mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() Message-Id: <20220421233610.E9EFDC385AB@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B1A3714002A X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=mvbYeQaV; spf=pass (imf23.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted 
sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: s3o3qmxtn7h4tfjep8d5417m4u3x7wyd X-HE-Tag: 1650584169-46325 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alistair Popple Subject: mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() In some cases it is possible for mmu_interval_notifier_remove() to race with mn_tree_inv_end() allowing it to return while the notifier data structure is still in use. Consider the following sequence: CPU0 - mn_tree_inv_end() CPU1 - mmu_interval_notifier_remove() ----------------------------------- ------------------------------------ spin_lock(subscriptions->lock); seq = subscriptions->invalidate_seq; spin_lock(subscriptions->lock); spin_unlock(subscriptions->lock); subscriptions->invalidate_seq++; wait_event(invalidate_seq != seq); return; interval_tree_remove(interval_sub); kfree(interval_sub); spin_unlock(subscriptions->lock); wake_up_all(); As the wait_event() condition is true it will return immediately. This can lead to use-after-free type errors if the caller frees the data structure containing the interval notifier subscription while it is still on a deferred list. Fix this by taking the appropriate lock when reading invalidate_seq to ensure proper synchronisation. I observed this whilst running stress testing during some development. You do have to be pretty unlucky, but it leads to the usual problems of use-after-free (memory corruption, kernel crash, difficult to diagnose WARN_ON, etc). 
Link: https://lkml.kernel.org/r/20220420043734.476348-1-apopple@nvidia.com Fixes: 99cb252f5e68 ("mm/mmu_notifier: add an interval tree notifier") Signed-off-by: Alistair Popple Signed-off-by: Jason Gunthorpe Cc: Christian König Cc: John Hubbard Cc: Ralph Campbell Cc: Signed-off-by: Andrew Morton --- mm/mmu_notifier.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) --- a/mm/mmu_notifier.c~mm-mmu_notifierc-fix-race-in-mmu_interval_notifier_remove +++ a/mm/mmu_notifier.c @@ -1036,6 +1036,18 @@ int mmu_interval_notifier_insert_locked( } EXPORT_SYMBOL_GPL(mmu_interval_notifier_insert_locked); +static bool +mmu_interval_seq_released(struct mmu_notifier_subscriptions *subscriptions, + unsigned long seq) +{ + bool ret; + + spin_lock(&subscriptions->lock); + ret = subscriptions->invalidate_seq != seq; + spin_unlock(&subscriptions->lock); + return ret; +} + /** * mmu_interval_notifier_remove - Remove a interval notifier * @interval_sub: Interval subscription to unregister @@ -1083,7 +1095,7 @@ void mmu_interval_notifier_remove(struct lock_map_release(&__mmu_notifier_invalidate_range_start_map); if (seq) wait_event(subscriptions->wq, - READ_ONCE(subscriptions->invalidate_seq) != seq); + mmu_interval_seq_released(subscriptions, seq)); /* pairs with mmgrab in mmu_interval_notifier_insert() */ mmdrop(mm);
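The ordering problem fixed above can be walked through in userspace C. This is a single-threaded sketch under assumptions, not the kernel code and not a reproducer: `struct subscriptions` and every function name below are hypothetical, a pthread mutex stands in for the spinlock, and `pthread_mutex_trylock()` is used only so the walk-through cannot deadlock (a real waiter would block on the lock instead).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mmu_notifier_subscriptions. */
struct subscriptions {
    pthread_mutex_t lock;
    unsigned long invalidate_seq;
    bool deferred_done;    /* models the deferred interval tree removal */
};

static void subs_init(struct subscriptions *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->invalidate_seq = 0;
    s->deferred_done = false;
}

/* First half of the invalidation-end path: lock, bump the sequence. */
static void inv_end_begin(struct subscriptions *s)
{
    pthread_mutex_lock(&s->lock);
    s->invalidate_seq++;
}

/* Second half: finish the deferred work, then drop the lock. */
static void inv_end_finish(struct subscriptions *s)
{
    s->deferred_done = true;
    pthread_mutex_unlock(&s->lock);
}

/* Pre-patch style wait condition: unlocked read of the sequence. */
static bool seq_changed_unlocked(const struct subscriptions *s,
                                 unsigned long seq)
{
    return s->invalidate_seq != seq;
}

/* Patched style: sample the sequence only while holding the lock. */
static bool seq_released_locked(struct subscriptions *s, unsigned long seq)
{
    bool ret;

    if (pthread_mutex_trylock(&s->lock) != 0)
        return false;      /* lock busy: the deferred work is still running */
    ret = s->invalidate_seq != seq;
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Replays the interleaving from the commit message; returns 0 if it holds. */
static int walk_through(void)
{
    struct subscriptions s;
    unsigned long seq;

    subs_init(&s);
    seq = s.invalidate_seq;
    inv_end_begin(&s);
    assert(seq_changed_unlocked(&s, seq)); /* old check fires too early */
    assert(!s.deferred_done);              /* subscription still in use */
    assert(!seq_released_locked(&s, seq)); /* new check keeps waiting */
    inv_end_finish(&s);
    assert(seq_released_locked(&s, seq) && s.deferred_done);
    return 0;
}
```

Calling `walk_through()` from a `main()` exercises both checks at the midpoint of the invalidation-end path: the unlocked read already reports a change while the deferred work is pending, whereas the locked read cannot succeed until the lock has been dropped.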