From patchwork Fri Oct 16 02:49:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11840777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8667114B4 for ; Fri, 16 Oct 2020 02:49:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 372E7208E4 for ; Fri, 16 Oct 2020 02:49:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="RO13JgpU" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 372E7208E4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 35C6194006E; Thu, 15 Oct 2020 22:49:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 30C0194006D; Thu, 15 Oct 2020 22:49:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1D50994006E; Thu, 15 Oct 2020 22:49:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id E09DF94006D for ; Thu, 15 Oct 2020 22:49:37 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 86E968249980 for ; Fri, 16 Oct 2020 02:49:37 +0000 (UTC) X-FDA: 77376257994.08.pin12_4d0170727219 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 68A8B1819E766 for ; Fri, 16 Oct 2020 02:49:37 +0000 (UTC) X-Spam-Summary: 50,0,0,33d62eb7516f3128,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:2:41:69:355:379:800:960:966:967:968:973:981:988:989:1260:1263:1345:1359:1381:1431:1437:1535:1605:1730:1747:1777:1792:2196:2199:2393:2525:2560:2565:2682:2685:2693:2859:2898:2899:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4049:4119:4250:4321:4385:4605:5007:6119:6261:6653:7576:7875:7903:7904:7974:8599:8660:8784:9010:9025:9545:9592:10004:10913:11026:11473:11658:11914:12043:12048:12294:12296:12297:12438:12517:12519:12555:12663:12679:12683:12783:12986:13148:13161:13229:13230:13255:13846:14096:21080:21324:21325:21451:21524:21627:21740:21795:21939:21990:30003:30012:30051:30054:30070:30079,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-64.100.201.201 62.2.0.100;04ygydbmxb4pshgrudc8sk54nem6sopax5dt1g6tkuwurfbfuh5wnyhmdxac71n.onxfqd99cskqfgsef7wy7i967jeakouqzp fdmbsf5f X-HE-Tag: pin12_4d0170727219 X-Filterd-Recvd-Size: 8287 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Fri, 16 Oct 2020 02:49:36 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 77CE22145D; Fri, 16 Oct 2020 02:49:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1602816576; bh=mcabKatnSrGv9eQwPptKXwhqXaLyNcySWns1IXkEmLc=; h=Date:From:To:Subject:In-Reply-To:From; b=RO13JgpUnGP44S5SmSi77XFiyq36hBaCfWXW1wm+TJjRXK8vq88mbiyFmOhWx6eT9 1e2QPMwVL4HAdlzkojWFu6PDgVkM1UDXwYQX+jVEcmzJQwFIkUFzW/uHyeQgFRlyZr KtVcgAHRuqcJI8xu2yJ2R2yTXdvM8ULfo1Np0nfY= Date: Thu, 15 Oct 2020 19:49:34 -0700 From: Andrew Morton To: akpm@linux-foundation.org, ebiederm@xmission.com, hch@lst.de, hughd@google.com, jannh@google.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, oleg@redhat.com, torvalds@linux-foundation.org, viro@zeniv.linux.org.uk Subject: [patch 136/156] binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU Message-ID: <20201016024934.Y5feZQ2LY%akpm@linux-foundation.org> In-Reply-To: <20201015192732.f448da14e9854c7cb7299956@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jann Horn Subject: binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5. At the moment, we have that rather ugly mmget_still_valid() helper to work around : ELF core dumping doesn't take the mmap_sem while traversing the task's VMAs, and if anything (like userfaultfd) then remotely messes with the VMA tree, fireworks ensue. So at the moment we use mmget_still_valid() to bail out in any writers that might be operating on a remote mm's VMAs. With this series, I'm trying to get rid of the need for that as cleanly as possible. ("cleanly" meaning "avoid holding the mmap_lock across unbounded sleeps".) Patches 1, 2, 3 and 4 are relatively unrelated cleanups in the core dumping code. Patches 5 and 6 implement the main change: Instead of repeatedly accessing the VMA list with sleeps in between, we snapshot it at the start with proper locking, and then later we just use our copy of the VMA list. This ensures that the kernel won't crash, that VMA metadata in the coredump is consistent even in the presence of concurrent modifications, and that any virtual addresses that aren't being concurrently modified have their contents show up in the core dump properly. The disadvantage of this approach is that we need a bit more memory during core dumping for storing metadata about all VMAs. At the end of the series, patch 7 removes the old workaround for this issue (mmget_still_valid()). I have tested: - Creating a simple core dump on X86-64 still works. - The created coredump on X86-64 opens in GDB and looks plausible. - X86-64 core dumps contain the first page for executable mappings at offset 0, and don't contain the first page for non-executable file mappings or executable mappings at offset !=0. - NOMMU 32-bit ARM can still generate plausible-looking core dumps through the FDPIC implementation. (I can't test this with GDB because GDB is missing some structure definition for nommu ARM, but I've poked around in the hexdump and it looked decent.) This patch (of 7): dump_emit() is for kernel pointers, and VMAs describe userspace memory. Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, even if it probably doesn't matter much on !MMU systems - especially given that it looks like we can just use the same get_dump_page() as on MMU if we move it out of the CONFIG_MMU block. One small change we have to make in get_dump_page() is to use __get_user_pages_locked() instead of __get_user_pages(), since the latter doesn't exist on nommu. On mmu builds, __get_user_pages_locked() will just call __get_user_pages() for us. Link: http://lkml.kernel.org/r/20200827114932.3572699-1-jannh@google.com Link: http://lkml.kernel.org/r/20200827114932.3572699-2-jannh@google.com Signed-off-by: Jann Horn Acked-by: Linus Torvalds Cc: Christoph Hellwig Cc: Alexander Viro Cc: "Eric W . Biederman" Cc: Oleg Nesterov Cc: Hugh Dickins Signed-off-by: Andrew Morton --- fs/binfmt_elf_fdpic.c | 8 ----- mm/gup.c | 57 +++++++++++++++++++--------------------- 2 files changed, 28 insertions(+), 37 deletions(-) --- a/fs/binfmt_elf_fdpic.c~binfmt_elf_fdpic-stop-using-dump_emit-on-user-pointers-on-mmu +++ a/fs/binfmt_elf_fdpic.c @@ -1529,14 +1529,11 @@ static bool elf_fdpic_dump_segments(stru struct vm_area_struct *vma; for (vma = current->mm->mmap; vma; vma = vma->vm_next) { -#ifdef CONFIG_MMU unsigned long addr; -#endif if (!maydump(vma, cprm->mm_flags)) continue; -#ifdef CONFIG_MMU for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { bool res; @@ -1552,11 +1549,6 @@ static bool elf_fdpic_dump_segments(stru if (!res) return false; } -#else - if (!dump_emit(cprm, (void *) vma->vm_start, - vma->vm_end - vma->vm_start)) - return false; -#endif } return true; } --- a/mm/gup.c~binfmt_elf_fdpic-stop-using-dump_emit-on-user-pointers-on-mmu +++ a/mm/gup.c @@ -1490,35 +1490,6 @@ int __mm_populate(unsigned long start, u mmap_read_unlock(mm); return ret; /* 0 or negative error code */ } - -/** - * get_dump_page() - pin user page in memory while writing it to core dump - * @addr: user address - * - * Returns struct page pointer of user page pinned for dump, - * to be freed afterwards by put_page(). - * - * Returns NULL on any kind of failure - a hole must then be inserted into - * the corefile, to preserve alignment with its headers; and also returns - * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found - - * allowing a hole to be left in the corefile to save diskspace. - * - * Called without mmap_lock, but after all other threads have been killed. - */ -#ifdef CONFIG_ELF_CORE -struct page *get_dump_page(unsigned long addr) -{ - struct vm_area_struct *vma; - struct page *page; - - if (__get_user_pages(current->mm, addr, 1, - FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma, - NULL) < 1) - return NULL; - flush_cache_page(vma, addr, page_to_pfn(page)); - return page; -} -#endif /* CONFIG_ELF_CORE */ #else /* CONFIG_MMU */ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start, unsigned long nr_pages, struct page **pages, @@ -1564,6 +1535,34 @@ finish_or_fault: } #endif /* !CONFIG_MMU */ +/** + * get_dump_page() - pin user page in memory while writing it to core dump + * @addr: user address + * + * Returns struct page pointer of user page pinned for dump, + * to be freed afterwards by put_page(). + * + * Returns NULL on any kind of failure - a hole must then be inserted into + * the corefile, to preserve alignment with its headers; and also returns + * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found - + * allowing a hole to be left in the corefile to save diskspace. + * + * Called without mmap_lock, but after all other threads have been killed. + */ +#ifdef CONFIG_ELF_CORE +struct page *get_dump_page(unsigned long addr) +{ + struct vm_area_struct *vma; + struct page *page; + + if (__get_user_pages_locked(current->mm, addr, 1, &page, &vma, NULL, + FOLL_FORCE | FOLL_DUMP | FOLL_GET) < 1) + return NULL; + flush_cache_page(vma, addr, page_to_pfn(page)); + return page; +} +#endif /* CONFIG_ELF_CORE */ + #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA) static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages) {