From patchwork Tue Aug 1 23:27:44 2023
X-Patchwork-Submitter: Helge Deller
X-Patchwork-Id: 13337355
From: Helge Deller
To: qemu-devel@nongnu.org
Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
 Akihiko Odaki, Helge Deller
Subject: [PATCH v6 7/8] linux-user: Optimize memory layout for static and
 dynamic executables
Date: Wed, 2 Aug 2023 01:27:44 +0200
Message-ID: <20230801232745.4125-8-deller@gmx.de>
In-Reply-To: <20230801232745.4125-1-deller@gmx.de>
References: <20230801232745.4125-1-deller@gmx.de>

Reorganize the guest memory layout to leave as much memory as possible
for the guest application's heap.

This patch optimizes the memory layout by loading PIE executables
into lower memory and shared libs into higher memory (at
TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap
space, which will be located directly after the executable.
Up to now, PIE executables and shared libs were loaded directly behind
each other in the area at TASK_UNMAPPED_BASE, which leaves very little
space for heap.

I tested this patchset with chroots of alpha, arm, armel, arm64, hppa,
m68k, mips64el, mipsel, powerpc, ppc64, ppc64el, s390x, sh4 and sparc64
on an x86-64 host, and with a static armhf binary (which fails to run
without this patch).

This patch temporarily breaks the Thread Sanitizer (TSan) application,
which expects specific boundary definitions for memory mappings on
different platforms [1]; see commit aab613fb9597 ("linux-user: Update
TASK_UNMAPPED_BASE for aarch64") for aarch64. The follow-up patch fixes
it again.

[1] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h

Signed-off-by: Helge Deller
---
 linux-user/elfload.c | 55 +++++++++++++-------------------------------
 linux-user/mmap.c    |  8 ++++---
 2 files changed, 21 insertions(+), 42 deletions(-)

-- 
2.41.0

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 2aee2298ec..47a118e430 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int image_fd,
     abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
     int i, retval, prot_exec;
     Error *err = NULL;
+    bool is_main_executable;
 
     /* First of all, some simple consistency checks */
     if (!elf_check_ident(ehdr)) {
@@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd,
         }
     }
 
-    if (pinterp_name != NULL) {
-        /*
-         * This is the main executable.
-         *
-         * Reserve extra space for brk.
-         * We hold on to this space while placing the interpreter
-         * and the stack, lest they be placed immediately after
-         * the data segment and block allocation from the brk.
-         *
-         * 16MB is chosen as "large enough" without being so large as
-         * to allow the result to not fit with a 32-bit guest on a
-         * 32-bit host. However some 64 bit guests (e.g. s390x)
-         * attempt to place their heap further ahead and currently
-         * nothing stops them smashing into QEMUs address space.
-         */
-#if TARGET_LONG_BITS == 64
-        info->reserve_brk = 32 * MiB;
-#else
-        info->reserve_brk = 16 * MiB;
-#endif
-        hiaddr += info->reserve_brk;
-
+    is_main_executable = (pinterp_name != NULL);
+    if (is_main_executable) {
         if (ehdr->e_type == ET_EXEC) {
             /*
              * Make sure that the low address does not conflict with
@@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int image_fd,
             probe_guest_base(image_name, loaddr, hiaddr);
         } else {
             /*
-             * The binary is dynamic, but we still need to
+             * The binary is dynamic (pie-executable), but we still need to
              * select guest_base. In this case we pass a size.
              */
             probe_guest_base(image_name, 0, hiaddr - loaddr);
@@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
      */
     load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
                             MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+                            (is_main_executable ? MAP_FIXED : 0),
                             -1, 0);
     if (load_addr == -1) {
         goto exit_mmap;
@@ -3194,7 +3175,8 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->end_code = 0;
     info->start_data = -1;
     info->end_data = 0;
-    info->brk = 0;
+    /* possible start for brk is behind all sections of this ELF file. */
+    info->brk = TARGET_PAGE_ALIGN(hiaddr);
     info->elf_flags = ehdr->e_flags;
 
     prot_exec = PROT_EXEC;
@@ -3288,9 +3270,6 @@ static void load_elf_image(const char *image_name, int image_fd,
                     info->end_data = vaddr_ef;
                 }
             }
-            if (vaddr_em > info->brk) {
-                info->brk = vaddr_em;
-            }
 #ifdef TARGET_MIPS
         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
             Mips_elf_abiflags_v0 abiflags;
@@ -3618,6 +3597,15 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 
     if (elf_interpreter) {
         load_elf_interp(elf_interpreter, &interp_info, bprm->buf);
+        /*
+         * Use brk address of interpreter if it was loaded above the
+         * executable and leaves less than 16 MB for heap.
+         * This happens e.g. with static binaries on armhf.
+         */
+        if (interp_info.brk > info->brk &&
+            interp_info.load_bias - info->brk < 16 * MiB) {
+            info->brk = interp_info.brk;
+        }
 
         /* If the program interpreter is one of these two, then assume
            an iBCS2 image. Otherwise assume a native linux image. */
@@ -3672,17 +3660,6 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
     bprm->core_dump = &elf_core_dump;
 #endif
 
-    /*
-     * If we reserved extra space for brk, release it now.
-     * The implementation of do_brk in syscalls.c expects to be able
-     * to mmap pages in this space.
-     */
-    if (info->reserve_brk) {
-        abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
-        abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + info->reserve_brk);
-        target_munmap(start_brk, end_brk - start_brk);
-    }
-
     return 0;
 }
 
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 2f26cbaf5d..c624feead0 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -299,14 +299,16 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
 #ifdef TARGET_AARCH64
 # define TASK_UNMAPPED_BASE 0x5500000000
 #else
-# define TASK_UNMAPPED_BASE (1ul << 38)
+# define TASK_UNMAPPED_BASE 0x4000000000
 #endif
-#else
+#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
 #ifdef TARGET_HPPA
 # define TASK_UNMAPPED_BASE 0xfa000000
 #else
-# define TASK_UNMAPPED_BASE 0x40000000
+# define TASK_UNMAPPED_BASE 0xe0000000
 #endif
+#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
+# define TASK_UNMAPPED_BASE 0x40000000
 #endif
 abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
 
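
Note for readers: the brk selection that this patch adds to
load_elf_binary() can be looked at in isolation. The following is only a
minimal sketch, not QEMU code. The names guest_addr, struct image_layout,
MIN_HEAP_GAP and choose_initial_brk() are hypothetical stand-ins (for
abi_ulong, the brk and load_bias fields of struct image_info, and the
16 MiB threshold), and a 64-bit guest address type is assumed. The
condition itself mirrors the one in the hunk above.

```c
#include <stdint.h>

/* Assumption: same meaning as QEMU's MiB constant. */
#define MiB (1024ull * 1024ull)
/* The threshold used by the patch: less than 16 MiB of heap room. */
#define MIN_HEAP_GAP (16 * MiB)

/* Hypothetical stand-in for abi_ulong; a 64-bit guest is assumed. */
typedef uint64_t guest_addr;

/* Hypothetical, reduced view of struct image_info. */
struct image_layout {
    guest_addr brk;       /* page-aligned end of the image's segments */
    guest_addr load_bias; /* offset at which the image was mapped */
};

/*
 * Pick the initial program break. Normally the heap starts right behind
 * the executable. If the interpreter ended up above the executable and
 * leaves less than MIN_HEAP_GAP in between, start the heap behind the
 * interpreter instead.
 */
static guest_addr choose_initial_brk(const struct image_layout *exe,
                                     const struct image_layout *interp)
{
    if (interp->brk > exe->brk &&
        interp->load_bias - exe->brk < MIN_HEAP_GAP) {
        return interp->brk;
    }
    return exe->brk;
}
```

With an executable ending at 0x100000, an interpreter mapped at 0x200000
leaves only 1 MiB, so the sketch returns the interpreter's brk. An
interpreter mapped near TASK_UNMAPPED_BASE leaves far more than 16 MiB,
so the executable's own brk is kept.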